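I am trying to get the price of an item from the website at the URL below. However, I ran into some problems when looking at the page source.
The URL is: https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love
The part of the page source I am interested in is the following (I guess):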
<script type="application/ld+json">
[{
"@context":"http://schema.org",
"@type":"Product",
"productID":"25372685655708131",
"name":"LOVE bracelet, small model",
"description":"#LOVE# bracelet, small model, yellow gold 750/1000. Supplied with a screwdriver. Width: 3.65 mm (for size 17). Now available in a slimmer version, Cartier continues to write the story of the #LOVE# bracelet. Same design, same oval shape, same story: a timeless – yet slightly slimmer – creation which is fastened using a screwdriver. The closure is designed with a functional screw on one side of the bracelet and a hinge on the other. To determine the size of your #LOVE# bracelet, measure your wrist, adding one centimetre to your size for a tighter fit, or two centimetres for a looser fit.",
"image":["https://www.cartier.com/variants/images/25372685655708131/img1/w960.jpg"],
"offers":
[{"@type":"Offer","availability":"http://schema.org/InStock","priceCurrency":"GBP","price":"4100","sku":"0400574782829","url":"https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html"}]}]
</script>
I tried the following:
import json
from bs4 import BeautifulSoup
import requests
from multiprocessing import Pool
import pandas as pd

data = {'url':[],'offers_price':[]}

def get_price(url):
    soup = BeautifulSoup(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).content, "html.parser")
    data = json.loads(soup.find_all('script', {'type': 'application/ld+json'})[-1].get_text())
    return url, int(data['offers']['price'])

if __name__ == '__main__':
    urls = ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love']
    with Pool(processes=4) as pool:
        for url, price in pool.imap_unordered(get_price, urls):
            data['offers_price'].append(price)
            data['url'].append(url)
    print(data)
But it did not work. How would you handle this case?
1 Answer
aor9mmx11#
I was able to get the price, but I took it from the product-price tag:
Output:
By the way, are you sure you want to append the URLs and prices like that? I think you should do this:
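One way to read that suggestion, as a sketch rather than the answerer's actual code: keep each URL and its price together in a single record instead of appending to two parallel lists. The sketch also covers the JSON-LD route from the question. In the snippet quoted there, the script body is a list wrapping one Product, and its "offers" value is a list as well, so data['offers']['price'] cannot index it directly. The helper name extract_price, the rows layout, and the trimmed SAMPLE_LD_JSON string are illustrative assumptions. The sample is a shortened copy of the quoted block, so nothing is fetched from the live site.

```python
import json

# Trimmed copy of the JSON-LD quoted in the question
# (description and image fields left out for brevity).
SAMPLE_LD_JSON = '''
[{
"@context":"http://schema.org",
"@type":"Product",
"productID":"25372685655708131",
"name":"LOVE bracelet, small model",
"offers":
[{"@type":"Offer","availability":"http://schema.org/InStock","priceCurrency":"GBP","price":"4100","sku":"0400574782829","url":"https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html"}]}]
'''


def extract_price(ld_json_text):
    """Return (currency, price) from a JSON-LD Product block, or None.

    Hypothetical helper: it assumes the price is a whole-number string,
    as in the quoted snippet.
    """
    payload = json.loads(ld_json_text)
    # The top level may be a single object or a list of objects.
    items = payload if isinstance(payload, list) else [payload]
    for item in items:
        if item.get("@type") != "Product":
            continue
        offers = item.get("offers", [])
        # "offers" may likewise be one object or a list of them.
        if isinstance(offers, dict):
            offers = [offers]
        for offer in offers:
            if "price" in offer:
                return offer.get("priceCurrency"), int(offer["price"])
    return None


# One record per URL keeps the URL and its price from drifting apart.
url = "https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html"
currency, price = extract_price(SAMPLE_LD_JSON)
rows = [{"url": url, "currency": currency, "offers_price": price}]
print(rows)
```

In the question's get_price, the same helper could be given the text of the ld+json script found by BeautifulSoup, assuming the live page still serves the block shown above.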