Chrome: Getting a response from a Google search with Python

xu3bshqb  posted on 2023-06-03  in  Go

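I am trying to get text from a Google search. My idea is to run a normal Google search from Python and then print the text from the box shown on the right-hand side, next to the search results. But the code I found does not work.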

from googlesearch import search
from bs4 import BeautifulSoup
import requests

def google_search(query):
    results = search(query, num_results=1)
    for result in results:
        response = requests.get(result)
        soup = BeautifulSoup(response.content, 'html.parser')
        answer = soup.find('div', class_='kno-rdesc')
        if answer:
            return answer.text
response = google_search("Was ist die Hauptstadt von Deutschland")
print(response)

So basically it should return the box that sometimes appears on the right-hand side of the results page. I hope someone can help. Thanks.

lymnna71  #1

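As I understand it, you want to extract the description from the knowledge graph.
To locate the element, you can use the select_one() method, which takes a CSS selector and returns the first match. To get the required element, refer to the general div with the .kno-rdesc class and select the span tag inside it. The resulting selector looks like this: .kno-rdesc span
Since the knowledge graph may be missing for some search queries, select_one() can return None, and that case needs to be handled: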

try:
    # select_one() returns None when nothing matches, so .text raises AttributeError
    result = soup.select_one(".kno-rdesc span").text
    print(result)
except AttributeError:
    print('There is no knowledge graph for this search query')
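The selector and the missing-panel case can be tried offline before any request is sent. The following is a minimal sketch, assuming a made-up HTML fragment: the helper name extract_description and both sample strings are hypothetical, and real Google markup is much larger and may change at any time.

```python
from bs4 import BeautifulSoup

SELECTOR = ".kno-rdesc span"  # selector taken from the answer above


def extract_description(html):
    """Return the knowledge graph description text, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(SELECTOR)  # None when nothing matches
    return node.get_text(strip=True) if node is not None else None


# Hypothetical markup for illustration only, not real Google output.
with_panel = (
    '<div class="kno-rdesc"><h3>Description</h3>'
    "<span>Berlin is the capital of Germany.</span></div>"
)
without_panel = '<div class="g"><span>An ordinary search result.</span></div>'

print(extract_description(with_panel))
print(extract_description(without_panel))
```

Checking for None explicitly is an alternative to catching the exception; either approach keeps the script from crashing when the panel is missing.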

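Also, make sure you send a user-agent request header so that the request looks like a visit from a "real" user. The default user-agent of requests is python-requests, so a website can tell that the request most likely comes from a script. You can check what your user-agent is.
Code (a full example is also available in an online IDE):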

from bs4 import BeautifulSoup
import requests, lxml

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": "Was ist die Hauptstadt von Deutschland",
    "hl": "en",  # language
    "gl": "us"   # country of the search, US -> USA
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
}

html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

try:
    # select_one() returns None when nothing matches, so .text raises AttributeError
    result = soup.select_one(".kno-rdesc span").text
    print(result)
except AttributeError:
    print('There is no knowledge graph for this search query')

Output:

Berlin, Germany’s capital, dates to the 13th century. Reminders of the city's turbulent 20th-century history include its Holocaust memorial and the Berlin Wall's graffitied remains. Divided during the Cold War, its 18th-century Brandenburg Gate has become a symbol of reunification. The city's also known for its art scene and modern landmarks like the gold-colored, swoop-roofed Berliner Philharmonie, built in 1963. ― Google
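To see how the query parameters and the user-agent header end up in a request without contacting Google, the request can be built and inspected locally. This is only a sketch: it uses the standard library's urllib instead of requests so that it runs with no extra packages, and the helper name build_search_request and the shortened user-agent string are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(query, user_agent, hl="en", gl="us"):
    """Build (but do not send) a Google search request with a custom user-agent."""
    params = {"q": query, "hl": hl, "gl": gl}
    url = "https://www.google.com/search?" + urlencode(params)
    return Request(url, headers={"User-Agent": user_agent})


# Placeholder user-agent for illustration; use a full browser string in practice.
req = build_search_request(
    "Was ist die Hauptstadt von Deutschland", "Mozilla/5.0 (example)"
)

print(req.full_url)
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

Nothing is sent over the network here; the object only shows the URL and header that would be transmitted.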

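Alternatively, you can use the Google Knowledge Graph API from SerpApi. It is a paid API with a free plan.
The difference is that it bypasses blocks from Google and other search engines, so the end user does not have to figure out how to do that or maintain the parser, and only has to decide which data to retrieve.
Example code to integrate: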

from serpapi import GoogleSearch

params = {
    "api_key": "...",               # https://serpapi.com/manage-api-key
    "engine": "google",             # search engine
    "q": "Was ist die Hauptstadt von Deutschland"
    # other parameters
}

search = GoogleSearch(params)       # data extraction on the SerpApi backend
result_dict = search.get_dict()     # JSON -> Python dict

result = result_dict.get("knowledge_graph", {}).get("description")
print(result)

Output:

Berlin, Germany’s capital, dates to the 13th century. Reminders of the city's turbulent 20th-century history include its Holocaust memorial and the Berlin Wall's graffitied remains. Divided during the Cold War, its 18th-century Brandenburg Gate has become a symbol of reunification. The city's also known for its art scene and modern landmarks like the gold-colored, swoop-roofed Berliner Philharmonie, built in 1963. ― Google
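The chained .get() calls in the snippet above are what keep the lookup safe when a response has no knowledge graph. A minimal sketch with plain dictionaries follows; both sample dictionaries are hypothetical stand-ins and do not reproduce a real SerpApi response.

```python
def get_description(result_dict):
    """Return the knowledge graph description, or None if either key is missing."""
    # The {} default lets the second .get() run even without a "knowledge_graph" key.
    return result_dict.get("knowledge_graph", {}).get("description")


# Hypothetical response shapes for illustration only.
with_graph = {"knowledge_graph": {"description": "Berlin is the capital of Germany."}}
without_graph = {"organic_results": []}

print(get_description(with_graph))
print(get_description(without_graph))
```

With this pattern no try/except is needed: a missing panel simply yields None.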

Disclaimer: I work for SerpApi.
