Chrome: How do I scrape Google?

lskq00tm  posted on 2022-12-06  in Go

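I want to scrape Google. I have already scraped craigslist successfully with this method, but for some reason I can't seem to scrape Google (and yes, of course I changed the class names and so on). This is what I want to scrape:
I want to scrape the website descriptions: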

from selenium import webdriver

path = r"C:\Users\Skid\Desktop\chromedriver.exe"

driver = webdriver.Chrome(path)

driver.get("https://www.google.com/#q=python+webscape+google")

posts = driver.find_elements_by_class_name("r")
for post in posts:
    print(post.text)

nafvub8i #1

Solved: add a delay before scraping (import time, then call time.sleep(2) after loading the page and before looking up the elements).
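A minimal sketch of that fix, assuming the results are rendered by JavaScript after the page loads. The helper name `scrape_after_delay` and its two callable parameters are hypothetical, chosen only so the pause is easy to see; they are not part of Selenium.

```python
import time


def scrape_after_delay(load_page, find_elements, delay_seconds=2.0):
    """Load a page, pause, then return the text of every matched element.

    load_page and find_elements are hypothetical callables supplied by the
    caller, for example thin wrappers around a Selenium driver.
    """
    load_page()
    # Fixed pause so the dynamically rendered results have time to appear.
    time.sleep(delay_seconds)
    return [element.text for element in find_elements()]


# Possible usage with the driver from the question (not executed here):
# texts = scrape_after_delay(
#     lambda: driver.get("https://www.google.com/#q=python+webscape+google"),
#     lambda: driver.find_elements_by_class_name("r"),
# )
# for text in texts:
#     print(text)
```

A fixed two-second pause is the simplest option, but it can still be too short on a slow connection and is wasted time on a fast one.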

b5lpy0ml #2

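You can scrape the website descriptions from Google search results with the BeautifulSoup web scraping library.
Read more about what CSS selectors are and about the drawbacks of using them.
Check the code in the online IDE.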

from bs4 import BeautifulSoup
import requests, lxml, json

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls

# these URL parameters are taken from an actual Google search URL
# and written out in a more readable form
params = {
  "q": "python web scrape google",            # query
  "gl": "us",                                 # country to search from
  "hl": "en",                                 # language
}

html = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

website_description_data = []

for result in soup.select(".tF2Cxc"):
  website_name = result.select_one(".yuRUbf a")["href"]
  description = result.select_one(".lEBKkf").text  

  website_description_data.append({
    "website_name" : website_name,
    "description" : description
  })

# print once, after all results have been collected
print(json.dumps(website_description_data, indent=2))

Example output

[
  {
    "website_name": "https://practicaldatascience.co.uk/data-science/how-to-scrape-google-search-results-using-python",
    "description": "Mar 13, 2021 \u2014 First, we're using urllib.parse.quote_plus() to URL encode our search query. This will add + characters where spaces sit and ensure that the\u00a0..."
  },
  {
    "website_name": "https://stackoverflow.com/questions/38619478/google-search-web-scraping-with-python",
    "description": "You can always directly scrape Google results. To do this, you can use the URL https://google.com/search?q=<Query> this will return the top\u00a0..."
  }
  # ...
]
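
To see what the `params` dictionary in the code above contributes to the request, the following sketch builds the query string by hand with the standard library. The helper name `build_search_url` is hypothetical; requests does this encoding for you when you pass `params=`, so this is only an illustration of the resulting URL.

```python
from urllib.parse import urlencode


def build_search_url(params, base_url="https://www.google.com/search"):
    """Append URL-encoded query parameters to the base search URL."""
    # urlencode replaces spaces with "+" and joins the pairs with "&".
    return f"{base_url}?{urlencode(params)}"


params = {
    "q": "python web scrape google",  # query
    "gl": "us",                       # country to search from
    "hl": "en",                       # language
}

search_url = build_search_url(params)
print(search_url)
# https://www.google.com/search?q=python+web+scrape+google&gl=us&hl=en
```

Keeping the parameters in a dictionary, rather than hard-coding them into the URL, makes it easy to change the query, country, or language without touching the rest of the script.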
