I have some Python code that reads an XPath from a website (https://www.op.gg/summoners/kr/Hide%20on%20bush):
```python
import requests
import lxml.html as html
import pandas as pd

url_padre = "https://www.op.gg/summoners/br/tercermundista"
link_farm = '//div[@class="stats"]//div[@class="cs"]'

r = requests.get(url_padre)
# r.content is the raw HTML sent by the server, before any JavaScript runs
home = r.content.decode("utf-8")
parser = html.fromstring(home)
farm = parser.xpath(link_farm)  # list of matching <div> elements
print(farm)
```
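This code prints `[]`.

But when I run the following XPath in the Chrome console, `$x('//div[@class="stats"]//div[@class="cs"]').map(x=>x.innerText)`, it prints the numbers I want. My Python code does not.

I would like some code that fixes my mistake.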
---------------------- EDIT ----------------------
```
Error                                     Traceback (most recent call last)
c:\Users\GCO\Desktop\Analisis de datos\borradores\fsdfs.ipynb Cell 2 in 3
      1 from playwright.sync_api import sync_playwright
----> 3 with sync_playwright() as p, p.chromium.launch() as browser:
      4     page = browser.new_page()
      5     page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)

File c:\Users\GCO\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\sync_api\_context_manager.py:47, in PlaywrightContextManager.__enter__(self)
     45 self._own_loop = True
     46 if self._loop.is_running():
---> 47     raise Error(
     48         """It looks like you are using Playwright Sync API inside the asyncio loop.
     49 Please use the Async API instead."""
     50     )
     52 # In Python 3.7, asyncio.Process.wait() hangs because it does not use ThreadedChildWatcher
     53 # which is used in Python 3.8+. This is unix specific and also takes care about
     54 # cleaning up zombie processes. See https://bugs.python.org/issue35621
     55 if (
     56     sys.version_info[0] == 3
     57     and sys.version_info[1] == 7
     58     and sys.platform != "win32"
     59     and isinstance(asyncio.get_child_watcher(), asyncio.SafeChildWatcher)
     60 ):

Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
```
2 answers

toiithl61
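As far as I know, you can't get dynamically generated content with `requests`. Below is a solution using `playwright`, which loads the whole page before parsing it.

1. Install Playwright with `pip install playwright`.
2. Install the browser and its dependencies with `playwright install chromium --with-deps`.
3. Run the following code: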
If you want to stick with `lxml` as the parsing tool, you can use the following code:
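P.S. I also noticed that op.gg uses very simple HTTP requests that require no authorization. You can get the information you want with the following code: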
`games` is a list of dictionaries that stores all the information you need. The CS stats are stored in each list element under the keys `games[0]["myData"]["stats"]["minion_kill"]`.
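Given that key path, collecting the CS of every game is a list comprehension, and since the question already imports pandas, `pd.json_normalize` can flatten the nested dictionaries into columns. A sketch; the function names are illustrative and the sample data in the test is made up:

```python
import pandas as pd


def minion_kills(games: list[dict]) -> list[int]:
    """CS (minion_kill) of the tracked player for each game, in list order."""
    return [game["myData"]["stats"]["minion_kill"] for game in games]


def cs_series(games: list[dict]) -> pd.Series:
    """Same values as a pandas Series, via json_normalize's dotted column names."""
    flat = pd.json_normalize(games)  # nested keys become "myData.stats.minion_kill"
    return flat["myData.stats.minion_kill"]
```

The comprehension fails loudly with `KeyError` if a game lacks the key, which is useful while the response shape is still unverified; `json_normalize` fills missing values with `NaN` instead.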
The only tricky part is figuring out how to get the `summoner_id` for the user you want (in your example it is `4b4tvMrpRRDLvXAiQ_Vmh5yMOsD0R3GPGTUVfIanp1Httg`).

ehxuflar2
The following example shows how to load the data from an external URL and compute the CS values:
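The code and output of this answer were not preserved. One common way to "load data from the URL" without a browser is to read the JSON that a Next.js site embeds in `<script id="__NEXT_DATA__">`. Whether op.gg still embeds its game data there is an assumption to verify, so this sketch searches the whole JSON for `minion_kill` keys instead of hard-coding a path; all helper names are hypothetical:

```python
import json

import lxml.html as html
import requests

URL = "https://www.op.gg/summoners/kr/Hide%20on%20bush"


def fetch_html(url: str) -> str:
    """Plain HTTP GET; no JavaScript is executed."""
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    return resp.text


def find_key(obj, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


def cs_from_embedded_json(page_html: str) -> list:
    """Return all minion_kill values from the page's __NEXT_DATA__ script, if present."""
    tree = html.fromstring(page_html)
    scripts = tree.xpath('//script[@id="__NEXT_DATA__"]/text()')
    if not scripts:
        return []  # no embedded state: fall back to Playwright or the JSON API
    return list(find_key(json.loads(scripts[0]), "minion_kill"))
```

Usage: `print(cs_from_embedded_json(fetch_html(URL)))`. An empty list means the embedded state is missing or contains no `minion_kill`, in which case the Playwright approach from the other answer is the safer route.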