scrapy: How to extract HTML button and script content using Python?

Posted by oknwwptz on 2022-11-09 in Python

For example, I am trying to scrape a page with Python and get the content of a button and a script:

<button class="xxx" href=www.example.com link="www.link.com"></button>

I want to print the class, the href, and the link attribute of the button tag.

<script> let x = 10; let y = 20; let link = "www.link.com"; </script>

I want to get x, y, and link from the script. Can anyone help?

scrapy

Source: https://stackoverflow.com/questions/74031889/how-to-extract-html-button-and-script-content-using-python

1 Answer

oaxa6hgo #1

Try:

import re
from bs4 import BeautifulSoup

html_doc = """\
<button class="xxx" href=www.example.com link="www.link.com"></button>
<script>let x = 10; let y = 20; let link = "www.link.com";</script>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Print the <button> attributes (the f-string "=" specifier needs Python 3.8+)

button = soup.find("button", class_="xxx")
print(f"{button['class']=} {button['link']=} {button['href']=}")

# Pull the variables out of the <script> body with regular expressions

script = soup.find("script").text
x = re.search(r"let x = (\S+);", script).group(1)
y = re.search(r"let y = (\S+);", script).group(1)
link = re.search(r'let link = "(\S+)"', script).group(1)
print(f"{x=} {y=} {link=}")

Prints:

button['class']=['xxx'] button['link']='www.link.com' button['href']='www.example.com'
x='10' y='20' link='www.link.com'
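
One caveat with the approach above: each re.search(...).group(1) call raises AttributeError when a variable is missing from the script. If that matters, a possible variation is to collect every `let name = value;` declaration in one pass and look the names up afterwards. This is only a sketch, not part of the original answer: the helper name extract_let_vars and the pattern LET_PATTERN are made up for illustration, and it assumes simple declarations whose values are numbers or double-quoted strings without embedded quotes.

```python
import re

# Hypothetical pattern (an assumption, not from the original answer):
# matches `let name = value;` where value is either a double-quoted
# string or a run of characters without whitespace or semicolons.
LET_PATTERN = re.compile(r'let\s+(\w+)\s*=\s*("[^"]*"|[^;\s]+)\s*;')


def extract_let_vars(script_text):
    """Return a dict mapping variable names to their raw values as strings."""
    result = {}
    for match in LET_PATTERN.finditer(script_text):
        name, value = match.group(1), match.group(2)
        # Strip the surrounding double quotes from string literals.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        result[name] = value
    return result


script = 'let x = 10; let y = 20; let link = "www.link.com";'
variables = extract_let_vars(script)
print(variables)
# A name that is not declared yields None instead of raising.
print(variables.get("z"))
```

The values stay strings, as in the answer above, so convert with int() or float() where needed. For anything beyond simple declarations like these, a real JavaScript parser would be more reliable than a regular expression.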

Answered 2022-11-09
