scrapy: How do I extract HTML button and script content with Python?

oknwwptz  posted on 2022-11-09  in Python

For example, I am trying to scrape a page with Python and get the contents of a button and a script:

<button class="xxx" href=www.example.com link="www.link.com"></button>

I want to print the class, href, and link values from the button tag. There is also a script:

<script> let x = 10; let y = 20; let link = "www.link.com"; </script>

From the script I want to get x, y, and link. Can anyone help?

oaxa6hgo #1

Try:

import re
from bs4 import BeautifulSoup

html_doc = """\
<button class="xxx" href=www.example.com link="www.link.com"></button>
<script>let x = 10; let y = 20; let link = "www.link.com";</script>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Print the <button> attributes (BeautifulSoup returns class as a list)

button = soup.find("button", class_="xxx")
print(f"{button['class']=} {button['link']=} {button['href']=}")

# Pull x, y and link out of the <script> text with regular expressions

script = soup.find("script").text
x = re.search(r"let x = (\S+);", script).group(1)
y = re.search(r"let y = (\S+);", script).group(1)
link = re.search(r'let link = "(\S+)"', script).group(1)
print(f"{x=} {y=} {link=}")

Prints:

button['class']=['xxx'] button['link']='www.link.com' button['href']='www.example.com'
x='10' y='20' link='www.link.com'
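
If the script declares more variables than these three, writing one regex per name gets tedious. Below is a minimal sketch, not part of the original answer, that collects every simple `let name = value;` declaration into a dict using only the standard `re` module. The `script` string stands in for the text you would get from `soup.find("script")`, and the names `pattern` and `variables` are illustrative. It assumes values are either double-quoted strings or bare tokens without spaces; anything more complex (objects, nested quotes, multi-line expressions) would need a real JavaScript parser.

```python
import re

# Stand-in for the text extracted from the <script> tag.
script = 'let x = 10; let y = 20; let link = "www.link.com";'

# Match `let <name> = <value>;` where <value> is either a
# double-quoted string (group 2) or a bare token (group 3).
pattern = re.compile(r'let\s+(\w+)\s*=\s*(?:"([^"]*)"|([^;\s]+))\s*;')

variables = {}
for name, quoted, bare in pattern.findall(script):
    # Exactly one of `quoted` / `bare` is non-empty for a given match
    # (both are empty only for an empty string literal).
    variables[name] = bare or quoted

print(variables)
```

All values come back as strings, as in the answer above, so convert with `int(variables["x"])` if you need numbers.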
