python-3.x: How do I scrape a price?

nwlls2ji posted on 2022-11-26 in Python

How do I scrape the price of one specific item?
The HTML contains several divs with class="pb-current-price", but I am only interested in the $2,299.99 price. How can I do that?

Thanks.
<div class="pb pb-large-view pb-theme-default">
                  <div class="pb-current-price ">
                   <span class="">
                    $2,299.99
                   </span>
                  </div>
                 </div>

import requests
import bs4 as bs 
from lxml import html

url = ""
agent = {"User-Agent":""}
url_get = requests.get(url,headers=agent) #, cookies=cookies)

tree = html.fromstring(url_get.content)

prices = tree.xpath('//div[@class="pb-sale-price "]/span/text()')
print(prices)

Running the code above prints an empty list: []

qjp7pelc  #1

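I worked on your code. A few notes before the snippet:
1. You are searching for "pb-sale-price " instead of "pb-current-price ".
2. As mentioned in the comments, I could not process your HTML page, so I simulated the response from the HTML snippet you gave us.
3. For completeness, I also simulated a second item.
Now the code: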

import requests
import bs4 as bs 
from lxml import html

# simulating the html answer
string="""
<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
  <span class="">
  $2,299.99
  </span>
</div>
</div>

<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
  <span class="">
  $799.99
  </span>
</div>
</div>
"""

url = "https://www.bestbuy.com/site/lg-65-class-oled-b9-series-2160p-smart-4k-uhd-tv-with-hdr/6360611.p?skuId=6360611"
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'}
# cookies = {"cookie":"COPY_HERE_YOUR_COOKIE_FROM_BROWSER"}
#url_get = requests.get(url,headers=agent) #, cookies=cookies)

#tree = html.fromstring(url_get.content)
tree = html.fromstring(string)
#print(html.tostring(tree).decode("utf-8"))

prices = tree.xpath('//div[@class="pb-current-price "]/span/text()')

# output cleaning
prices = [x.strip(' ,\n') for x in prices]
print(prices)

Output:

['$2,299.99', '$799.99']
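
The XPath above compares the whole class attribute, trailing space included, so it stops matching if the site drops that space or adds a second class. The following is a minimal sketch of matching by class token instead. It uses the standard library's html.parser rather than lxml so that it runs without extra packages. The names PriceCollector and SAMPLE_HTML, the extra-class token, and the assumption that the item of interest appears first in the document are illustrative only and do not come from the real page.

```python
from html.parser import HTMLParser

# Hypothetical sample modelled on the snippet in the question.
# The second block carries an extra class to show token matching.
SAMPLE_HTML = """
<div class="pb pb-large-view pb-theme-default">
  <div class="pb-current-price ">
    <span class="">
      $2,299.99
    </span>
  </div>
</div>
<div class="pb pb-large-view pb-theme-default">
  <div class="pb-current-price extra-class">
    <span class="">
      $799.99
    </span>
  </div>
</div>
"""


class PriceCollector(HTMLParser):
    """Collect the text inside every div whose class list contains a token."""

    def __init__(self, token):
        super().__init__()
        self.token = token
        self.depth = 0     # div nesting depth inside a matching div (0 = outside)
        self.chunks = []   # text pieces of the match being read
        self.prices = []   # finished matches, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # Nested div inside a match: track it so the right </div> closes it.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.token in classes:
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.prices.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


collector = PriceCollector("pb-current-price")
collector.feed(SAMPLE_HTML)
prices = collector.prices
# Assumption: the item of interest is the first match in the document.
first_price = prices[0] if prices else None
print(prices)
print(first_price)
```

If you stay with lxml, the usual XPath 1.0 idiom for the same token test is `//div[contains(concat(" ", normalize-space(@class), " "), " pb-current-price ")]/span/text()`.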

PS: I strongly recommend that you also read this beautiful article.
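
The scraped values are strings such as '$2,299.99', which cannot be compared or sorted as numbers. A small sketch of converting them to Decimal follows. It assumes US-style formatting (a leading dollar sign and commas as thousands separators), and parse_price is a hypothetical helper name.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a string such as '$2,299.99' to a Decimal amount."""
    # Assumption: "$" prefix and "," as thousands separator.
    cleaned = text.strip().lstrip("$").replace(",", "")
    return Decimal(cleaned)


scraped = ['$2,299.99', '$799.99']
amounts = [parse_price(p) for p in scraped]
highest = max(amounts)
print(amounts)
print(highest)
```

Decimal is used instead of float so that amounts compare exactly.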

hpcdzsge  #2

The price you show is the regular price. You can get it from one of the script tags, like this:

import re
import requests

headers = {'User-Agent':'Mozilla/5.0'}
r = requests.get('https://www.bestbuy.com/site/lg-65-class-oled-b9-series-2160p-smart-4k-uhd-tv-with-hdr/6360611.p?skuId=6360611&intl=nosplash', headers = headers)
p = re.compile(r'regularPrice\\":([\d.]+),')
price = p.findall(r.text)[0]
print(price)
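
The pattern above expects the price inside a JSON string whose quotes are backslash-escaped, and `findall(...)[0]` raises IndexError if the page layout changes. Here is a sketch that exercises the same pattern offline and returns None when nothing matches. SAMPLE_TEXT is a hypothetical stand-in for r.text, not the real page content, and extract_regular_price is an illustrative helper name.

```python
import re

# Hypothetical stand-in for r.text: JSON embedded in a script tag with
# backslash-escaped quotes, which is what the pattern looks for.
SAMPLE_TEXT = r'<script>var state = "{\"price\":{\"regularPrice\":2299.99,\"currentPrice\":1999.99}}";</script>'

# Same pattern as in the answer: a literal backslash, a quote, a colon, then digits and dots.
PRICE_RE = re.compile(r'regularPrice\\":([\d.]+),')


def extract_regular_price(text):
    """Return the first regularPrice value as a string, or None if absent."""
    match = PRICE_RE.search(text)
    return match.group(1) if match else None


price = extract_regular_price(SAMPLE_TEXT)
missing = extract_regular_price("<html></html>")
print(price)
print(missing)
```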
