Python: How can I get the HTML of a website as seen in the browser?

Posted by bvjxkvbb on 2022-12-10 in Python


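Part of this website loads only after the page has opened. When I use libraries such as requests and urllib3, I cannot get the parts that load later. How can I get the HTML of this website as it appears in the browser? I can't open a browser with Selenium to get the HTML, because this process must not be slowed down by a browser.
I tried httpx, httplib2, urllib, and urllib3, but I could not get the parts that load later.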

python

Source: https://stackoverflow.com/questions/74742348/how-can-i-get-html-of-a-website-as-seen-on-browser

1 Answer


fnx2tebb1#

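You can use Selenium to simulate a user-like page load and wait for additional HTML elements to appear; BeautifulSoup can then parse the resulting HTML, although on its own it does not execute JavaScript.
I recommend Selenium because it includes the WebDriverWait class, which can help you capture the additional HTML elements.
Here is my simple example: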

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Replace with the URL of the website you want
url = "https://www.example.com"

# Configure Chrome to run headless (no visible browser window)
options = webdriver.ChromeOptions()
options.add_argument("--headless")

# Create a single Chrome webdriver instance with those options
driver = webdriver.Chrome(options=options)

driver.get(url)

# Wait for the additional HTML elements to load
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(@class, 'lazy-load')]")))

# Get the rendered HTML
html = driver.page_source

print(html)

# Shut down the browser and end the WebDriver session
driver.quit()

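In the example above, I use an explicit wait of up to 10 seconds for a specific condition to occur. More specifically, I wait until elements whose class contains 'lazy-load' are located via XPath, and then I retrieve the page's HTML.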
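To make the explicit-wait idea more concrete, here is a minimal, library-free sketch of the kind of polling loop such a wait performs: call a condition repeatedly until it returns something truthy or a timeout elapses. The names `wait_until`, `WaitTimeout`, and `FakePage` are hypothetical and invented for illustration; this is not Selenium's implementation, only an approximation of the behaviour described above.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy within the timeout."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition()` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration; it only mimics the general shape
    of an explicit wait and is not Selenium's code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose lazy-loaded elements appear after a delay."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def find_lazy_elements(self):
        # Returns an empty list until the simulated content has "loaded".
        if time.monotonic() >= self._ready_at:
            return ["<div class='lazy-load'>late content</div>"]
        return []


page = FakePage(ready_after=0.2)
elements = wait_until(page.find_lazy_elements, timeout=2.0, poll_interval=0.05)
print(len(elements))
```

In real Selenium code, `WebDriverWait(...).until(...)` raises `TimeoutException` when the time runs out, so wrapping the wait in a `try`/`except` block is a reasonable precaution if the 'lazy-load' elements may never appear.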
Finally, I recommend looking into both BeautifulSoup and Selenium, since both offer powerful features for scraping websites and automating web-based tasks.
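Once the rendered HTML is available as a string, the next step is usually to pull out the elements of interest. As a rough sketch of that parsing step, the block below uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup, so it runs without extra installs. The class name `LazyLoadCollector` and the sample markup are assumptions made for illustration; in practice the string would come from `driver.page_source`.

```python
from html.parser import HTMLParser


class LazyLoadCollector(HTMLParser):
    """Collect start tags whose class attribute contains a marker string."""

    def __init__(self, marker="lazy-load"):
        super().__init__()
        self.marker = marker
        self.matches = []  # list of (tag, attribute dict) pairs

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Substring match, mirroring XPath contains(@class, 'lazy-load').
        if self.marker in (attr_map.get("class") or ""):
            self.matches.append((tag, attr_map))


# Hypothetical sample standing in for driver.page_source.
html = """
<html><body>
  <div class="hero">Static content</div>
  <img class="thumb lazy-load" src="a.png">
  <section class="lazy-load comments">Loaded later</section>
</body></html>
"""

collector = LazyLoadCollector()
collector.feed(html)
for tag, attr_map in collector.matches:
    print(tag, attr_map["class"])
```

With BeautifulSoup installed, a CSS attribute selector along the lines of `soup.select('[class*="lazy-load"]')` should express the same substring match more compactly.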

Answered on 2022-12-10
