Reading data from the Chrome console into Python

wqnecbli  posted on 2023-03-10  in Go

I have some Python code that tries to read an XPath from a website (https://www.op.gg/summoners/kr/Hide%20on%20bush):

import requests
import lxml.html as html
import pandas as pd

url_padre = "https://www.op.gg/summoners/br/tercermundista"

link_farm = '//div[@class="stats"]//div[@class="cs"]'

r = requests.get(url_padre) 

home=r.content.decode("utf-8") 

parser=html.fromstring(home) 
farm=parser.xpath(link_farm) 

print(farm)

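This code prints "[]".
However, when I enter the following XPath in the Chrome console: $x('//div[@class="stats"]//div[@class="cs"]').map(x => x.innerText), it prints the numbers I want, while my Python code does not.
I would like code that fixes this error.
----------------------Edit--------------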

Error                                     Traceback (most recent call last)
c:\Users\GCO\Desktop\Analisis de datos\borradores\fsdfs.ipynb Cell 2 in 3
      1 from playwright.sync_api import sync_playwright
----> 3 with sync_playwright() as p, p.chromium.launch() as browser:
      4     page = browser.new_page()
      5     page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)

File c:\Users\GCO\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\sync_api\_context_manager.py:47, in PlaywrightContextManager.__enter__(self)
     45             self._own_loop = True
     46         if self._loop.is_running():
---> 47             raise Error(
     48                 """It looks like you are using Playwright Sync API inside the asyncio loop.
     49 Please use the Async API instead."""
     50             )
     52         # In Python 3.7, asyncio.Process.wait() hangs because it does not use ThreadedChildWatcher
     53         # which is used in Python 3.8+. This is unix specific and also takes care about
     54         # cleaning up zombie processes. See https://bugs.python.org/issue35621
     55         if (
     56             sys.version_info[0] == 3
     57             and sys.version_info[1] == 7
     58             and sys.platform != "win32"
     59             and isinstance(asyncio.get_child_watcher(), asyncio.SafeChildWatcher)
     60         ):

Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
toiithl6 #1

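As far as I know, you cannot get dynamically generated content with requests: it only downloads the initial HTML and does not run the page's JavaScript.
Here is a solution using playwright, which loads the whole page in a real browser before parsing.
1. Install playwright with pip install playwright
2. Install the browser and its dependencies with playwright install chromium --with-deps
3. Run the following code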

from playwright.sync_api import sync_playwright

with sync_playwright() as p, p.chromium.launch() as browser:
    page = browser.new_page()
    page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
    selector = "//div[@class='stats']//div[@class='cs']/div"
    cs_stats = page.query_selector_all(selector)
    print(len(cs_stats), [cs.inner_text() for cs in cs_stats])

If you want to keep lxml as the parsing tool, you can use the following code:

from lxml import html
from playwright.sync_api import sync_playwright

with sync_playwright() as p, p.chromium.launch() as browser:
    page = browser.new_page()
    page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
    selector = "//div[@class='stats']//div[@class='cs']/div"
    c = page.content()
    parser = html.fromstring(c)
    farm = parser.xpath(selector)
    print(len(farm), [cs.text for cs in farm])
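The traceback in the question's edit comes from running the sync API inside a Jupyter notebook, where an asyncio event loop is already running. The error message itself suggests the async API, so here is a minimal sketch of the first snippet rewritten with playwright.async_api. The names fetch_cs, main, URL and timeout_ms are hypothetical, introduced only for illustration, and this has not been tested against that particular notebook setup.

```python
import asyncio

URL = "https://www.op.gg/summoners/kr/Hide%20on%20bush"
SELECTOR = "//div[@class='stats']//div[@class='cs']/div"


async def fetch_cs(url: str, timeout_ms: int = 10000) -> list:
    """Hypothetical helper: return the inner text of every element matching SELECTOR."""
    # Imported here so that merely defining the function does not require playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url, timeout=timeout_ms)
            elements = await page.query_selector_all(SELECTOR)
            return [await el.inner_text() for el in elements]
        finally:
            # Close the browser even if navigation or the query fails.
            await browser.close()


def main() -> None:
    # Entry point for a plain script, where no event loop is running yet.
    cs_stats = asyncio.run(fetch_cs(URL))
    print(len(cs_stats), cs_stats)
```

In a notebook cell you would skip main() and write cs_stats = await fetch_cs(URL) instead, since the notebook already provides the running loop that asyncio.run() would refuse to nest inside.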

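P.S.

I also noticed that op.gg serves this data through very simple HTTP requests that require no authorization. You can find the information you want with the following code: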

import json
from urllib.request import urlopen
url = "https://op.gg/api/v1.0/internal/bypass/games/kr/summoners/4b4tvMrpRRDLvXAiQ_Vmh5yMOsD0R3GPGTUVfIanp1Httg?&limit=20"
r = urlopen(url)
games = json.load(r).get("data", [])
print(games)

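games is a list of dictionaries that stores all the information you need. The CS statistic is stored in each list element under the following keys: games[0]["myData"]["stats"]["minion_kill"]
The only difficult part is working out how to get the summoner_id for the desired user (in your example it is 4b4tvMrpRRDLvXAiQ_Vmh5yMOsD0R3GPGTUVfIanp1Httg).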
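To read that key for every game rather than only games[0], something like the following sketch could be used. The helper name minion_kills and the sample data are hypothetical; only the key path myData → stats → minion_kill comes from the response described above, and the assumption that some entries may lack it is a defensive guess, not something observed in the API.

```python
def minion_kills(games: list) -> list:
    """Collect game["myData"]["stats"]["minion_kill"] for every game that has it."""
    kills = []
    for game in games:
        # Tolerate entries where "myData" or "stats" is missing or null.
        stats = (game.get("myData") or {}).get("stats") or {}
        if "minion_kill" in stats:
            kills.append(stats["minion_kill"])
    return kills


# Hypothetical sample shaped like the structure described above (not real API output).
sample_games = [
    {"myData": {"stats": {"minion_kill": 212}}},
    {"myData": {"stats": {"minion_kill": 187}}},
    {"myData": None},
]

print(minion_kills(sample_games))  # [212, 187]
```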

ehxuflar #2

You can use the following example to load the data from the external API URL and compute the CS-per-minute value:

import re
import requests

url = "https://www.op.gg/summoners/kr/Hide%20on%20bush"
api_url = "https://op.gg/api/v1.0/internal/bypass/games/kr/summoners/{summoner_id}?=&limit=20&hl=en_US&game_type=total"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0"
}

html_doc = requests.get(url, headers=headers).text
summoner_id = re.search(r'"summoner_id":"(.*?)"', html_doc).group(1)

data = requests.get(api_url.format(summoner_id=summoner_id), headers=headers).json()

for d in data["data"]:
    stats = d["myData"]["stats"]
    kills = (
        stats["minion_kill"]
        + stats["neutral_minion_kill_team_jungle"]
        + stats["neutral_minion_kill_enemy_jungle"]
        + stats["neutral_minion_kill"]
    )
    cs = kills / (d['game_length_second'] / 60)
    print(f'{cs=:.1f}')

Prints:

cs=6.7
cs=8.5
cs=8.2
cs=1.4
cs=7.3
cs=8.5
cs=6.8
cs=7.7
cs=8.7
cs=8.8
cs=5.6
cs=9.9
cs=7.0
cs=9.6
cs=9.7
cs=5.0
cs=7.5
cs=9.2
cs=9.0
cs=7.9
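The calculation above fails with an AttributeError if the summoner_id pattern is not found in the page, and with a ZeroDivisionError if a game reports a length of zero. Below is a sketch of the same two steps as small functions that return None in those cases. The function names and the sample inputs are hypothetical; the regular expression and the stat keys are the ones used in the code above.

```python
import re
from typing import Optional

SUMMONER_ID_RE = re.compile(r'"summoner_id":"(.*?)"')

CS_KEYS = (
    "minion_kill",
    "neutral_minion_kill_team_jungle",
    "neutral_minion_kill_enemy_jungle",
    "neutral_minion_kill",
)


def find_summoner_id(html_doc: str) -> Optional[str]:
    """Return the first summoner_id embedded in the page source, or None."""
    match = SUMMONER_ID_RE.search(html_doc)
    return match.group(1) if match else None


def cs_per_minute(game: dict) -> Optional[float]:
    """Sum the four kill counters and divide by the game length in minutes."""
    stats = game["myData"]["stats"]
    kills = sum(stats.get(key, 0) for key in CS_KEYS)
    minutes = game["game_length_second"] / 60
    return kills / minutes if minutes > 0 else None


# Hypothetical inputs for illustration only (not real op.gg data).
sample_html = '<script>{"summoner_id":"abc123","name":"example"}</script>'
sample_game = {
    "game_length_second": 1800,
    "myData": {
        "stats": {
            "minion_kill": 180,
            "neutral_minion_kill_team_jungle": 10,
            "neutral_minion_kill_enemy_jungle": 5,
            "neutral_minion_kill": 15,
        }
    },
}

print(find_summoner_id(sample_html))             # abc123
print(f"cs={cs_per_minute(sample_game):.1f}")    # cs=7.0
```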
