Scrape Walmart search results with Python

Posted by vc6uscn9 on 2023-06-04 in Python


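I am trying to scrape Walmart's search results.
For example, take the URL "https://www.walmart.com/search/?query=coffee%20machine"
and try to extract the text from the elements whose class name is search-product-result, all in Python.
I tried Selenium and was asked to verify my identity. I tried requests and got a Forbidden page back from Walmart. I have tried other libraries as well and have run out of ideas. Any suggestions?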

python

Source: https://stackoverflow.com/questions/68992032/scrape-walmart-search-results-python

1 answer


iq3niunx1#

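The data at this URL is loaded by JavaScript, so BeautifulSoup cannot find it in the rendered product elements.
However, the data the page displays is present in its HTML source as a JSON string inside a <script> tag.
I located that <script> tag in the HTML, took its text content, and parsed it as JSON. You can extract whatever data you need from that JSON.
Here is the code that prints the product IDs of the search results.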

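from bs4 import BeautifulSoup
import requests
import json

# Send a browser-like User-Agent header with the request.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"}
url = 'https://www.walmart.com/search?query=coffee%20machine'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

# The search data is embedded as JSON in <script id="searchContent">.
# Read the tag's text directly: str.strip() removes a set of characters,
# not a literal prefix or suffix, so it is not a reliable way to drop the tags.
tag = soup.find('script', {'id': 'searchContent'})
if tag is None:
    raise SystemExit('script#searchContent not found (blocked or page layout changed)')
j = json.loads(tag.string)
x = j['searchContent']['preso']['items']

# Print the product ID of each search result.
for i in x:
    print(i['productId'])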

This prints the product IDs:

2RYLQXVZ80E8
7EYUEQ82RMBP
7A3VDQNS5R36
22GRP3PGSY4A
238DLP3R0M3W
52NMIX2M8SC5
1R4H630LRNSE
.
.
.
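
The extraction step described above can be tried offline as a rough sketch. The names `ScriptJSONExtractor`, `extract_embedded_json`, `product_ids` and the `SAMPLE_HTML` string are made up for illustration, and the sketch swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra dependencies. The key path `searchContent` / `preso` / `items` is taken from the answer; the live page may no longer use it.

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collects the text inside <script id="..."> for one given id (hypothetical helper)."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several pieces, so accumulate it.
        if self._inside:
            self.chunks.append(data)


def extract_embedded_json(html, script_id):
    """Return the parsed JSON from the matching script tag, or None if it is absent."""
    parser = ScriptJSONExtractor(script_id)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks).strip()
    return json.loads(text) if text else None


def product_ids(data):
    """Collect productId values, tolerating missing keys along the assumed path."""
    if not data:
        return []
    items = data.get("searchContent", {}).get("preso", {}).get("items", [])
    return [item["productId"] for item in items if "productId" in item]


# Made-up sample page; real Walmart markup may differ.
SAMPLE_HTML = """
<html><body>
<script id="searchContent" type="application/json">
{"searchContent": {"preso": {"items": [
  {"productId": "AAA111"}, {"title": "no id"}, {"productId": "BBB222"}
]}}}
</script>
</body></html>
"""

data = extract_embedded_json(SAMPLE_HTML, "searchContent")
ids = product_ids(data)
print(ids)
```

Returning None or an empty list instead of raising makes it easier to tell a blocked or changed page apart from a parsing bug when the request does not come back as expected.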
