from bs4 import BeautifulSoup
import requests, lxml

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": "Narendra Modi",
    "hl": "en",  # language
    "gl": "us"   # country of the search, US -> USA
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

result = soup.select_one(".kno-rdesc span").text
print(result)
Output:
Narendra Damodardas Modi is an Indian politician serving as the 14th and current prime minister of India since 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament from Varanasi.
from serpapi import GoogleSearch
import os

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    "api_key": os.getenv("API_KEY"),  # your serpapi api key
    "engine": "google",               # search engine
    "q": "Narendra Modi"              # search query
    # other parameters
}

search = GoogleSearch(params)    # where data extraction happens on the SerpApi backend
result_dict = search.get_dict()  # JSON -> Python dict

result = result_dict["knowledge_graph"]["description"]
print(result)
Output:
Narendra Damodardas Modi is an Indian politician serving as the 14th and current prime minister of India since 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament from Varanasi.
2 Answers
cngwdvgl #1
I think this might help you; it gives the golden ratio in the search.
rekjcdws #2
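The Beautiful Soup library is well suited to this task. To find the element you need, use the select_one() method, which accepts a CSS selector to search for. To reach the description, reference the enclosing div by its .kno-rdesc class and select the span tag inside it. The resulting selector is: .kno-rdesc span. The method returns an HTML element; to extract the text from that element, use its text attribute. The first code snippet above uses this approach.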
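For illustration, here is a minimal offline sketch of the same selector applied to a small hand-written HTML string. The markup is a hypothetical stand-in for Google's knowledge panel, extract_description is a helper name introduced here, and the built-in html.parser is used instead of lxml only to keep the sketch light on dependencies. Unlike calling .text directly, it checks for None first, because select_one() returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for Google's knowledge panel;
# the real page is much larger and its class names may change over time.
SAMPLE_HTML = """
<div class="kno-rdesc">
  <h3>Description</h3>
  <span>Narendra Damodardas Modi is an Indian politician.</span>
  <span><a href="#">Wikipedia</a></span>
</div>
"""


def extract_description(html):
    """Return the text of the first span inside .kno-rdesc, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(".kno-rdesc span")  # first match, or None
    return element.text.strip() if element is not None else None


description = extract_description(SAMPLE_HTML)
missing = extract_description("<div>no knowledge panel here</div>")
print(description)
print(missing)
```

Guarding against None matters in practice: if Google serves a page without the knowledge panel (or a block page), the unguarded version raises AttributeError.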
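Also, make sure the user-agent request header you send looks like a "real" user visit. The default requests user-agent is python-requests, so websites can tell that the request most likely comes from a script. Code and a full example are available in an online IDE; the output is shown after the first snippet above.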
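To see the difference without sending anything over the network, the following sketch compares the default user-agent with the custom one. It assumes only the requests library; the request is prepared through a Session but never sent, and the variable names are illustrative.

```python
import requests

# What requests sends when no User-Agent is supplied, e.g. "python-requests/2.x.y".
default_ua = requests.utils.default_user_agent()

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

session = requests.Session()
request = requests.Request(
    "GET",
    "https://www.google.com/search",
    params={"q": "Narendra Modi", "hl": "en", "gl": "us"},
    headers=headers,
)

# prepare_request() merges session defaults with our headers; nothing is sent.
prepared = session.prepare_request(request)
sent_ua = prepared.headers["User-Agent"]

print("default:", default_ua)
print("sent:   ", sent_ua)
print("url:    ", prepared.url)
```

Inspecting the prepared request this way is a cheap check that the custom header actually overrides the library default before any real request is made.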
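Alternatively, you can use the Google Organic Results API from SerpApi. It is a paid API with a free plan.
The difference is that it bypasses blocks from Google and other search engines, so the end user does not have to figure out how to do that or maintain the parser, and only needs to think about what data to retrieve.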
The second snippet above shows example code to integrate, followed by its output.
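Since the knowledge graph is not present for every query, a slightly more defensive lookup may be useful. The sketch below works on a hand-written dictionary that only mimics the two keys used in the snippet ("knowledge_graph" and "description"); the "title" key, the sample data, and the get_description helper are assumptions made for illustration, not part of SerpApi's documented response.

```python
# Hypothetical, trimmed-down dict shaped like the result of search.get_dict();
# a real response contains many more keys.
sample_response = {
    "knowledge_graph": {
        "title": "Narendra Modi",  # assumed key, for illustration only
        "description": "Narendra Damodardas Modi is an Indian politician.",
    },
}


def get_description(result_dict):
    """Return knowledge_graph -> description, or None when either key is missing."""
    return result_dict.get("knowledge_graph", {}).get("description")


found = get_description(sample_response)
not_found = get_description({})  # e.g. a query with no knowledge panel
print(found)
print(not_found)
```

Using .get() with a default avoids a KeyError when a query returns no knowledge panel, at the cost of having to handle None in the calling code.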
Disclaimer: I work for SerpApi.