For example, I am trying to scrape a page with Python and extract the contents of a button and a script:
<button class="xxx" href=www.example.com link="www.link.com"></button>
I want to print the class, href, and link attribute values from the button tag.
<script> let x = 10; let y = 20; let link = "www.link.com"; </script>
I also want to get x, y, and link from the script. Can anyone help?
1 Answer
Try:
import re
from bs4 import BeautifulSoup

html_doc = """\
<button class="xxx" href=www.example.com link="www.link.com"></button>
<script>let x = 10; let y = 20; let link = "www.link.com";</script>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Print the <button> attributes
button = soup.find("button", class_="xxx")
print(f"{button['class']=} {button['link']=} {button['href']=}")

# Print the values declared in the <script>
script = soup.find("script").text
x = re.search(r"let x = (\S+);", script).group(1)
y = re.search(r"let y = (\S+);", script).group(1)
link = re.search(r'let link = "(\S+)"', script).group(1)
print(f"{x=} {y=} {link=}")
Prints:
button['class']=['xxx'] button['link']='www.link.com' button['href']='www.example.com'
x='10' y='20' link='www.link.com'
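Note that the regexes return x and y as strings, and each variable needs its own pattern. If the script holds more declarations, something like the following might be easier to maintain. It is only a rough sketch, assuming every declaration has the simple form `let name = value;` with a number or a quoted string as the value. `parse_let_declarations` and `LET_PATTERN` are hypothetical names made up for this example, not part of BeautifulSoup or `re`.

```python
import ast
import re

# Matches simple declarations such as: let x = 10;  or  let link = "www.link.com";
# Assumption: values are plain numbers or quoted strings with no embedded semicolons.
LET_PATTERN = re.compile(r"\blet\s+(\w+)\s*=\s*([^;]+);")


def parse_let_declarations(script_text):
    """Return a dict mapping variable names to Python values (hypothetical helper)."""
    values = {}
    for name, raw in LET_PATTERN.findall(script_text):
        raw = raw.strip()
        try:
            # Numbers and quoted strings are also valid Python literals.
            values[name] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Keep the raw text for anything more complex than a literal.
            values[name] = raw
    return values


script = 'let x = 10; let y = 20; let link = "www.link.com";'
print(parse_let_declarations(script))
# {'x': 10, 'y': 20, 'link': 'www.link.com'}
```

In the answer's code you would pass `soup.find("script").text` as `script_text`. This is not a JavaScript parser, so it will miss or mangle anything beyond flat literal assignments.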