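I am trying to build a scraper in Python that reads a variable from JavaScript code embedded in a web page's HTML. The value of the variable changes over time. I need the first number in the yValues variable: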
jQuery(document).ready(function() {
  var draw = true;
  if ("Biblioteca di Ingegneria" == "") {
    draw = false;
  }
  if (draw) {
    var yValues = [
      "28",
      "100"
    ];
    var Titolo = "Biblioteca di Ingegneria";
    var sottoTitolo = "Posti Totali: 128";
    var barColors = [
      "#167d21",
      "#ed2135"
    ];
    var xValues = [
      "Liberi (28)",
      "Occupati (100)"
    ];
    new Chart("InOutChart", {
      type: "pie",
      data: {
        labels: xValues,
        datasets: [
          {
            backgroundColor: barColors,
            data: yValues
          }
        ]
      },
      options: {
        plugins: {
          title: {
            display: true,
            text: Titolo,
            font: {
              size: 25,
              style: 'normal',
              lineHeight: 1.2
            },
            // padding: {
            //   top: 10,
            //   bottom: 30
            // }
          },
          subtitle: {
            display: true,
            text: sottoTitolo,
            font: {
              size: 20,
              style: 'normal',
              lineHeight: 1.2
            },
            padding: {
              bottom: 30
            }
          },
          legend: {
            display: true,
            position: "bottom",
            labels: {
              font: {
                size: 20,
                style: 'normal',
                lineHeight: 1.2
              }
            }
          }
        },
        responsive: true,
        maintainAspectRatio: false,
        scales: {
          yAxes: [
            {
              display: true,
              ticks: {
                beginAtZero: true
              }
            }
          ]
        }
      }
    });
  }
});
This is the best I have managed so far:
from bs4 import BeautifulSoup
import requests
# Make a GET request to the URL of the web page.
base_url = 'https://qrbiblio.unipi.it/Home/Chart?IdCat=a96d84ba-46e8-47a1-b947-ab98a8746d6f'
response = requests.get(base_url)
# Parse the HTML content of the page.
soup = BeautifulSoup(response.text, "html.parser")
# Find all the `<script>` elements on the page.
scripts = soup.find_all("script")
# Get the 8th `<script>` element.
script8 = scripts[7]
# Transform the 8th `<script>` into a string.
script8_txt = "".join(script8)
# Get the useful string from the 8th `<script>`.
usefull_txt = script8_txt[248:251]
# Get the int from the string.
pl = int("".join(filter(str.isdigit, usefull_txt)))
print(pl)
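This works, but I would like to parse the JavaScript code automatically to find the variable and read its value because, as you can see, I located the characters I need by checking their position by hand. I am looking for a better solution because I plan to reuse this code on other, similar pages, where the position of the variable is different each time. I want to run this Python code inside an Alexa skill, so I am not sure whether the Selenium package would work well there.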
1 Answer
Try this:
Output:
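One way to find the variable by name instead of by character offset is a regular expression over the page source. The following is only a minimal sketch under a few assumptions: the helper name `extract_js_array` is hypothetical, the array is declared as `var yValues = [...]` exactly as in the question, it contains no nested arrays, and its strings are double-quoted so that the literal can be read as JSON. The `sample` string is copied from the script in the question so that the example runs without a network request.

```python
import json
import re


def extract_js_array(source: str, name: str) -> list:
    """Return the array literal assigned to `var <name> = [...]` in `source`.

    Hypothetical helper: assumes a flat array whose literal is valid JSON.
    """
    # DOTALL lets the array span several lines; the non-greedy body stops
    # at the first closing bracket, so nested arrays are not supported.
    pattern = rf"\b(?:var|let|const)\s+{re.escape(name)}\s*=\s*(\[.*?\])"
    match = re.search(pattern, source, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"variable {name!r} not found")
    return json.loads(match.group(1))


# Fragment of the script shown in the question.
sample = '''
var yValues = [
"28",
"100"
];
var xValues = [
"Liberi (28)",
"Occupati (100)"
];
'''

y_values = extract_js_array(sample, "yValues")
free_seats = int(y_values[0])  # first number of yValues
print(free_seats)
```

Run against the fragment from the question, this prints 28. In the scraper itself, `response.text` from the existing `requests.get` call could be passed as `source` in place of `sample`, which avoids indexing into the list of `<script>` elements. It also needs no browser, which may matter for an Alexa skill. Pages that use single-quoted strings or trailing commas in the array would need a different parsing step than `json.loads`.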