Can I extract the raw data from a highcharts.js chart?

Posted by ars1skjm on 2022-11-10 in Highcharts

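I want to scrape data from a page that displays a chart with highcharts.js. I have finished parsing all the pages leading up to the following page. However, the last page, the one that shows the dataset, uses highcharts.js to render the chart, and getting at the raw data seems almost impossible.
I am using Python 3.5 and BeautifulSoup.
Can it still be parsed? If so, how can I scrape it?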

Answer #1 (smtd7mpg)

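The data is inside a script tag. You can use bs4 with a regular expression to locate that script tag, and you could also use a regex to pull the data out of it, but I prefer to use js2xml to parse the JavaScript into an XML tree: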

from bs4 import BeautifulSoup
import requests
import re
import js2xml

soup = BeautifulSoup(requests.get("http://www.worldweatheronline.com/brussels-weather-averages/be.aspx").content, "html.parser")
script = soup.find("script", text=re.compile("Highcharts.Chart")).text

# script = soup.find("script", text=re.compile("precipchartcontainer")).text if you want precipitation data

parsed = js2xml.parse(script)
print(js2xml.pretty_print(parsed))  # print() call, since the question uses Python 3

This gives you:

<program>
  <functioncall>
    <function>
      <identifier name="$"/>
    </function>
    <arguments>
      <funcexpr>
        <identifier/>
        <parameters/>
        <body>
          <var name="chart"/>
          <functioncall>
            <function>
              <dotaccessor>
                <object>
                  <functioncall>
                    <function>
                      <identifier name="$"/>
                    </function>
                    <arguments>
                      <identifier name="document"/>
                    </arguments>
                  </functioncall>
                </object>
                <property>
                  <identifier name="ready"/>
                </property>
              </dotaccessor>
            </function>
            <arguments>
              <funcexpr>
                <identifier/>
                <parameters/>
                <body>
                  <assign operator="=">
                    <left>
                      <identifier name="chart"/>
                    </left>
                    <right>
                      <new>
                        <dotaccessor>
                          <object>
                            <identifier name="Highcharts"/>
                          </object>
                          <property>
                            <identifier name="Chart"/>
                          </property>
                        </dotaccessor>
                        <arguments>
                          <object>
                            <property name="chart">
                              <object>
                                <property name="renderTo">
                                  <string>tempchartcontainer</string>
                                </property>
                                <property name="type">
                                  <string>spline</string>
                                </property>
                              </object>
                            </property>
                            <property name="credits">
                              <object>
                                <property name="enabled">
                                  <boolean>false</boolean>
                                </property>
                              </object>
                            </property>
                            <property name="colors">
                              <array>
                                <string>#FF8533</string>
                                <string>#4572A7</string>
                              </array>
                            </property>
                            <property name="title">
                              <object>
                                <property name="text">
                                  <string>Average Temperature (°c) Graph for Brussels</string>
                                </property>
                              </object>
                            </property>
                            <property name="xAxis">
                              <object>
                                <property name="categories">
                                  <array>
                                    <string>January</string>
                                    <string>February</string>
                                    <string>March</string>
                                    <string>April</string>
                                    <string>May</string>
                                    <string>June</string>
                                    <string>July</string>
                                    <string>August</string>
                                    <string>September</string>
                                    <string>October</string>
                                    <string>November</string>
                                    <string>December</string>
                                  </array>
                                </property>
                                <property name="labels">
                                  <object>
                                    <property name="rotation">
                                      <number value="270"/>
                                    </property>
                                    <property name="y">
                                      <number value="40"/>
                                    </property>
                                  </object>
                                </property>
                              </object>
                            </property>
                            <property name="yAxis">
                              <object>
                                <property name="title">
                                  <object>
                                    <property name="text">
                                      <string>Temperature (°c)</string>
                                    </property>
                                  </object>
                                </property>
                              </object>
                            </property>
                            <property name="tooltip">
                              <object>
                                <property name="enabled">
                                  <boolean>true</boolean>
                                </property>
                              </object>
                            </property>
                            <property name="plotOptions">
                              <object>
                                <property name="spline">
                                  <object>
                                    <property name="dataLabels">
                                      <object>
                                        <property name="enabled">
                                          <boolean>true</boolean>
                                        </property>
                                      </object>
                                    </property>
                                    <property name="enableMouseTracking">
                                      <boolean>false</boolean>
                                    </property>
                                  </object>
                                </property>
                              </object>
                            </property>
                            <property name="series">
                              <array>
                                <object>
                                  <property name="name">
                                    <string>Average High Temp (°c)</string>
                                  </property>
                                  <property name="color">
                                    <string>#FF8533</string>
                                  </property>
                                  <property name="data">
                                    <array>
                                      <number value="6"/>
                                      <number value="8"/>
                                      <number value="11"/>
                                      <number value="14"/>
                                      <number value="19"/>
                                      <number value="21"/>
                                      <number value="23"/>
                                      <number value="23"/>
                                      <number value="19"/>
                                      <number value="15"/>
                                      <number value="9"/>
                                      <number value="6"/>
                                    </array>
                                  </property>
                                </object>
                                <object>
                                  <property name="name">
                                    <string>Average Low Temp (°c)</string>
                                  </property>
                                  <property name="color">
                                    <string>#4572A7</string>
                                  </property>
                                  <property name="data">
                                    <array>
                                      <number value="2"/>
                                      <number value="2"/>
                                      <number value="4"/>
                                      <number value="6"/>
                                      <number value="10"/>
                                      <number value="12"/>
                                      <number value="14"/>
                                      <number value="14"/>
                                      <number value="11"/>
                                      <number value="8"/>
                                      <number value="5"/>
                                      <number value="2"/>
                                    </array>
                                  </property>
                                </object>
                              </array>
                            </property>
                          </object>
                        </arguments>
                      </new>
                    </right>
                  </assign>
                </body>
              </funcexpr>
            </arguments>
          </functioncall>
        </body>
      </funcexpr>
    </arguments>
  </functioncall>
</program>

So, to get all of the data:

In [28]: from bs4 import BeautifulSoup  
In [29]: import requests
In [30]: import re    
In [31]: import js2xml    
In [32]: from itertools import repeat    
In [33]: from pprint import pprint as pp
In [34]: soup = BeautifulSoup(requests.get("http://www.worldweatheronline.com/brussels-weather-averages/be.aspx").content, "html.parser")

In [35]: script = soup.find("script", text=re.compile("Highcharts.Chart")).text

In [36]: parsed = js2xml.parse(script)

In [37]: data = [d.xpath(".//array/number/@value") for d in parsed.xpath("//property[@name='data']")]

In [38]: categories = parsed.xpath("//property[@name='categories']//string/text()")

In [39]: output =  list(zip(repeat(categories), data))    
In [40]: pp(output)
[(['January',
   'February',
   'March',
   'April',
   'May',
   'June',
   'July',
   'August',
   'September',
   'October',
   'November',
   'December'],
  ['6', '8', '11', '14', '19', '21', '23', '23', '19', '15', '9', '6']),
 (['January',
   'February',
   'March',
   'April',
   'May',
   'June',
   'July',
   'August',
   'September',
   'October',
   'November',
   'December'],
  ['2', '2', '4', '6', '10', '12', '14', '14', '11', '8', '5', '2'])]
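
The values come back as strings because they are read from XML attributes. A minimal sketch of turning the output above into month-to-value mappings, assuming every value is an integer as it is here. The series names are copied from the XML dump, and the variable names `names` and `by_series` are only illustrative:

```python
# Values copied from the output above; the xpath queries return strings.
categories = ["January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December"]
data = [
    ["6", "8", "11", "14", "19", "21", "23", "23", "19", "15", "9", "6"],
    ["2", "2", "4", "6", "10", "12", "14", "14", "11", "8", "5", "2"],
]
# Series names as they appear in the XML dump, in the same order as `data`.
names = ["Average High Temp (°c)", "Average Low Temp (°c)"]

# Map each series name to a {month: int value} dictionary.
by_series = {
    name: dict(zip(categories, (int(v) for v in values)))
    for name, values in zip(names, data)
}

print(by_series["Average High Temp (°c)"]["July"])
```

If a chart held fractional values, `float` would be the safer conversion than `int`.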

As I said, you could get by with just a regex, but I find js2xml more reliable when it comes to stray whitespace and similar quirks.
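
For comparison, a rough sketch of the regex-only route. The `script` string is a shortened stand-in for the real script tag's text, and the pattern is an assumption that only holds when each `data` array is flat and its contents happen to be valid JSON:

```python
import json
import re

# Stand-in for the text of the <script> tag; the real one is much longer.
script = """
series: [{
    name: 'Average High Temp (°c)',
    data: [6, 8, 11, 14, 19, 21, 23, 23, 19, 15, 9, 6]
}, {
    name: 'Average Low Temp (°c)',
    data: [2, 2, 4, 6, 10, 12, 14, 14, 11, 8, 5, 2]
}]
"""

# Capture "data: [ ... ]" where the array contains no nested brackets.
DATA_RE = re.compile(r"\bdata\s*:\s*(\[[^\[\]]*\])")

# Each captured array is plain numbers here, so json.loads can decode it.
series = [json.loads(match) for match in DATA_RE.findall(script)]
print(series)
```

This fails on nested point arrays, trailing commas, or values built by JavaScript expressions, which is the kind of fragility that parsing the script avoids.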

Answer #2 (cnwbcb6i)

For anyone else who stumbles on this:
If your page is loaded in Selenium, you can do the following:

driver.execute_script("return $('#YOUR_ID_LOCATION').highcharts().series[0].processedYData")

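This extracts the Y data of the first series; other series and properties can be read the same way. It assumes that driver is a Selenium WebDriver and that the page loads jQuery, since the call goes through $(...).highcharts().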
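
A variation that does not need jQuery or a container ID is sketched below. It assumes the page exposes the global `Highcharts` object and that the charts have finished rendering. The helper name `collect_series` is hypothetical, and `s.options.data` returns the data as configured, which may differ from the processed values:

```python
# JavaScript executed in the page: for every live chart, return the name and
# configured data of each series. Highcharts.charts can contain undefined
# slots for destroyed charts, hence filter(Boolean).
COLLECT_JS = """
return Highcharts.charts.filter(Boolean).map(function (chart) {
    return chart.series.map(function (s) {
        return {name: s.name, data: s.options.data};
    });
});
"""


def collect_series(driver):
    """Return one list per chart, each holding {'name', 'data'} dicts.

    `driver` is assumed to be a Selenium WebDriver with the page already loaded.
    """
    return driver.execute_script(COLLECT_JS)
```

After `driver.get(url)`, calling `collect_series(driver)` would return nested Python lists and dicts, because Selenium converts the JavaScript return value for you.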
