python-3.x: Why doesn't my script that searches Google work?

Posted by toe95027 on 2023-08-08 in Python

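I wrote a Python script to search Google. The prompts work, but no matter what I enter, I get no results. Here is the script: it should prompt you for a search query, ask how many results you want, and then write them to a .csv file.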

import requests
from bs4 import BeautifulSoup
import csv

def get_google_search_results_urls(query, num_results=10):
    base_url = "https://www.google.com/search"
    params = {"q": query, "num": num_results}

    response = requests.get(base_url, params=params)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    result_urls = []

    for result in soup.select(".tF2Cxc"):
        link = result.select_one("a")
        if link and link["href"].startswith("http"):
            result_urls.append(link["href"])

    return result_urls

def write_urls_to_csv(urls, csv_file):
    with open(csv_file, "w", newline="") as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(["URL"])

        for url in urls:
            csv_writer.writerow([url])

if __name__ == "__main__":
    search_query = input("Enter your Google search query: ")
    num_results = int(input("Enter the number of search results to fetch: "))

    search_results = get_google_search_results_urls(search_query, num_results)

    if search_results:
        csv_file_name = "google_search_results.csv"
        write_urls_to_csv(search_results, csv_file_name)
        print(f"Search results URLs written to {csv_file_name}")
    else:
        print("No search results found.")


qfe3c7zg  #1

In get_google_search_results_urls():

    base_url = "https://www.google.com/search?"
    params = {"q": query, "num": num_results}

    BROWSER_HEADER = {
        "Accept-Language": "en-US",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
    }

    response = requests.get(url=base_url, params=params, headers=BROWSER_HEADER)
    response.raise_for_status()

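Also, in the if __name__ == "__main__" block, if you want to check whether search_results is empty, you can make the if condition explicit (for a list, the original if search_results: check is equivalent):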

if len(search_results) != 0:
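
For a list, the truthiness check and the explicit length check give the same answer. A minimal sketch to confirm this; has_results is a hypothetical helper and the sample URL is a placeholder:

```python
def has_results(results):
    """Return True when the list holds at least one URL."""
    # An empty list is falsy, so bool(results) matches len(results) != 0.
    return bool(results)


if __name__ == "__main__":
    for sample in ([], ["https://example.com"]):
        # Both checks are printed side by side for comparison.
        print(sample, has_results(sample), len(sample) != 0)
```

Either form works in the script; the explicit one only states the intent more loudly.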


I passed "Accept-Language" and "User-Agent" in the HTTP headers.
To find your own User-Agent (assuming your browser is Chrome):

  • Go to the address bar
  • Enter chrome://version/
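
To see what the request looks like with these headers without touching the network, here is a minimal sketch. It uses the standard library's urllib instead of requests, purely so it runs without extra packages. The User-Agent string is the one from the code above and should be replaced with your own; build_search_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Headers from the answer; replace the User-Agent with your own browser's value.
BROWSER_HEADER = {
    "Accept-Language": "en-US",
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
    ),
}


def build_search_request(query, num_results=10):
    """Build (but do not send) a search request carrying browser-like headers."""
    # urlencode escapes the query, e.g. spaces become "+".
    query_string = urlencode({"q": query, "num": num_results})
    return Request(
        "https://www.google.com/search?" + query_string,
        headers=BROWSER_HEADER,
    )


if __name__ == "__main__":
    request = build_search_request("python csv module", 5)
    print(request.full_url)
    # urllib stores header names capitalized as "User-agent".
    print(request.get_header("User-agent"))
```

With requests, the same dictionary is passed as headers=BROWSER_HEADER, as in the code above.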

js5cn81o  #2

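Your base_url is Bing's, while the tF2Cxc class in your soup.select call is specific to Google. Also note that result URLs can start with either https or http.
With Bing, the following code seems to work: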

for link in soup.find_all("a", class_="tilk"):
    # Anchors without an href are skipped instead of raising KeyError.
    href = link.get("href", "")
    # The "http" prefix also matches "https" URLs.
    if href.startswith("http"):
        result_urls.append(href)
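
The point about site-specific class names can be checked offline. Below is a minimal sketch that uses the standard library's html.parser rather than BeautifulSoup, so it runs without extra packages. The HTML snippet is made up for illustration and is not real Bing or Google markup; ClassLinkCollector and collect_links are hypothetical helpers.

```python
from html.parser import HTMLParser


class ClassLinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href") or ""
        # Keep absolute links only; "http" also matches "https".
        if self.css_class in classes and href.startswith("http"):
            self.urls.append(href)


def collect_links(html, css_class):
    """Return absolute hrefs of anchors tagged with css_class."""
    collector = ClassLinkCollector(css_class)
    collector.feed(html)
    return collector.urls


# Made-up markup for illustration only.
SAMPLE_HTML = """
<a class="tilk" href="https://example.com/a">A</a>
<a class="tilk" href="/relative">B</a>
<a class="other" href="http://example.org/c">C</a>
"""

if __name__ == "__main__":
    print(collect_links(SAMPLE_HTML, "tilk"))    # class present in the page
    print(collect_links(SAMPLE_HTML, "tF2Cxc"))  # class absent from the page
```

A selector written for one site's markup silently matches nothing on another's, which is why the script reports no results rather than failing with an error.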
