Wrong result when web scraping with Python

nue99wik  posted on 2023-01-22  in  Python

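I want to compare the price of coconuts on two websites. The two shops (websites) are called Laughs and Glomark.
I have two files, main.py and comparison.py. I think the problem is in the part that scrapes the Laughs price. That line runs without raising an error, but it returns the wrong value. My output and the expected output are shown below the code.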

main.py

from compare_prices import compare_prices 
laughs_coconut = 'https://scrape-sm1.github.io/site1/COCONUT%20market1super.html'
glomark_coconut = 'https://glomark.lk/coconut/p/11624'
compare_prices(laughs_coconut,glomark_coconut)

comparison.py

import requests
import json
from bs4 import BeautifulSoup

#Imitate the Mozilla browser.
user_agent = {'User-agent': 'Mozilla/5.0'}

def compare_prices(laughs_coconut,glomark_coconut):
    # Acquire the web pages which contain the product price
    laughs_coconut = requests.get(laughs_coconut)
    glomark_coconut = requests.get(glomark_coconut)

    # LaughsSuper supermarket website provides the price in a span text.
    soup_laughs = BeautifulSoup(laughs_coconut.text, 'html.parser')
    price_laughs = soup_laughs.find('span',{'class': 'price'}).text
    
    
    # Glomark supermarket website provides the data in JSON format in an inline script.
    soup_glomark = BeautifulSoup(glomark_coconut.text, 'html.parser')
    script_glomark = soup_glomark.find('script', {'type': 'application/ld+json'}).text
    data_glomark = json.loads(script_glomark)
    price_glomark = data_glomark['offers'][0]['price']

    
    #TODO: Parse the values as floats, and print them.
    price_laughs = price_laughs.replace("Rs.","")
    price_laughs = float(price_laughs)
    price_glomark = float(price_glomark)
    print('Laughs   COCONUT - Item#mr-2058 Rs.: ', price_laughs)
    print('Glomark  Coconut Rs.: ', price_glomark)
    
    # Compare the prices and print the result
    if price_laughs > price_glomark:
        print('Glomark is cheaper Rs.:', price_laughs - price_glomark)
    elif price_laughs < price_glomark:
        print('Laughs is cheaper Rs.:', price_glomark - price_laughs)    
    else:
        print('Price is the same')

My code runs without errors and prints this output:

Laughs   COCONUT - Item#mr-2058 Rs.:  0.0

Glomark  Coconut Rs.:  110.0

Laughs is cheaper Rs.: 110.0

But the expected output is:

Laughs   COCONUT - Item#mr-2058 Rs.:  95.0

Glomark  Coconut Rs.:  110.0

Laughs is cheaper Rs.: 15.0

Note: <span class="price">Rs.95.00</span> is the element that holds the Laughs coconut price.

Answer #1 (qgelzfjb)

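The page contains two elements that match 'span', {'class': 'price'}, and find() returns only the first one. Use findAll() instead and take the second match. Changing the line to price_laughs = soup_laughs.findAll('span',{'class': 'price'})[1].text solves the problem.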
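
To illustrate the difference, here is a minimal sketch that runs against an inline HTML snippet instead of the live page. The markup is an assumption made for the example (the real page may be structured differently). It uses find_all(), the current name of the method; findAll() is a legacy alias that behaves the same way. The helper to_float is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page is assumed to contain an earlier
# span.price (for example a cart total of Rs.0.00) before the product price.
html = """
<div class="cart"><span class="price">Rs.0.00</span></div>
<div class="product"><span class="price">Rs.95.00</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match, which here is the cart total.
first_price = soup.find("span", {"class": "price"}).text

# find_all() returns every match in document order; index 1 is the second one.
all_prices = [span.text for span in soup.find_all("span", {"class": "price"})]
second_price = all_prices[1]


def to_float(price_text):
    """Strip the 'Rs.' prefix and convert the rest to a float."""
    return float(price_text.replace("Rs.", "").strip())


print(to_float(first_price))   # 0.0
print(to_float(second_price))  # 95.0
```

Indexing by position is fragile: if the site adds or removes a price element, index 1 will point at a different value.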

Answer #2 (vltsax25)

Try changing your strategy for selecting the element: there is an id you can use to select a more specific container element.

price_laughs = soup_laughs.select_one('[id^="product-price"] .price').text
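
As a rough sketch of how that selector behaves, the snippet below applies it to an inline HTML fragment. The markup and the id value product-price-2058 are assumptions made for the example, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the id value "product-price-2058" is an assumption.
html = """
<span class="price">Rs.0.00</span>
<div id="product-price-2058"><span class="price">Rs.95.00</span></div>
"""
soup_laughs = BeautifulSoup(html, "html.parser")

# [id^="product-price"] matches any element whose id starts with that prefix;
# the descendant combinator then picks the .price element inside it.
price_tag = soup_laughs.select_one('[id^="product-price"] .price')

# select_one() returns None when nothing matches, so guard before using .text.
if price_tag is None:
    raise ValueError("price element not found")

price_laughs = float(price_tag.text.replace("Rs.", ""))
print(price_laughs)  # 95.0
```

Selecting through the container id does not depend on how many other price elements appear earlier in the page.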

For the other website, you could also use its API to get the price:

requests.get('https://glomark.lk/product-page/variation-detail/11624', headers={'x-requested-with': 'XMLHttpRequest'}).json()['price']
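
The one-liner above could be wrapped in a small function with a timeout and an HTTP status check. This is only a sketch: the endpoint and header come from the line above, the function names price_from_payload and fetch_glomark_price are hypothetical, and the response is assumed to be a JSON object with a 'price' key, as that line implies.

```python
import requests

# Endpoint and header are taken from the answer above; helper names are hypothetical.
GLOMARK_DETAIL_URL = "https://glomark.lk/product-page/variation-detail/{product_id}"


def price_from_payload(payload):
    """Return the 'price' field of a decoded JSON payload as a float."""
    return float(payload["price"])


def fetch_glomark_price(product_id, timeout=10):
    """Request the variation-detail endpoint and return the price as a float."""
    response = requests.get(
        GLOMARK_DETAIL_URL.format(product_id=product_id),
        headers={"x-requested-with": "XMLHttpRequest"},
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return price_from_payload(response.json())
```

Calling fetch_glomark_price(11624) would then take the place of the JSON-LD parsing in compare_prices, provided the endpoint still responds as described.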
