Python: how to extract a JSON <script> element with BeautifulSoup

bihw5rsg posted on 2023-08-02 in Python

I need to extract the various elements from the JSON part of the page; each of them sits under an @type.
I tried this post, but it found nothing.

data = soup.findAll('script', {'type':'application/ld+json'})
oJson = json.loads(data.text)["model"]

This gives the error:

AttributeError: ResultSet object has no attribute 'text'

I would appreciate your help.

<script type="application/ld+json">
{"@context":"https://schema.org/",
"@type":"Product","brand":"Salomon","category":"Basecaps","description":"<p>Leichte Sportkappe f&uuml;r Sport bei Sonne und Regen</p>","image":"https://static.bergzeit.com/product_gallery_regular/1118101-006_pic1.jpg",

"model":[
{"@type":"ProductModel","color":"fiery red","image":"https://static.bergzeit.com/product_gallery_regular/1118101-003_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","name":"fiery red Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-003","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-003"},

{"@type":"ProductModel","color":"nightshade","image":"https://static.bergzeit.com/product_gallery_regular/1118101-005_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/InStock","name":"nightshade Cross Cap","price":24.77,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-005","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-005"},

{"@type":"ProductModel","color":"deep black","image":"https://static.bergzeit.com/product_gallery_regular/1118101-001_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/InStock","name":"deep black Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-001","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-001"},

{"@type":"ProductModel","color":"chambray blue","image":"https://static.bergzeit.com/product_gallery_regular/1118101-002_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","name":"chambray blue Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-002","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-002"},

{"@type":"ProductModel","color":"bering sea","image":"https://static.bergzeit.com/product_gallery_regular/1118101-008_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","name":"bering sea Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-008","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-008"},

{"@type":"ProductModel","color":"deep lichen green","image":"https://static.bergzeit.com/product_gallery_regular/1118101-007_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","name":"deep lichen green Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-007","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-007"},

{"@type":"ProductModel","color":"white","image":"https://static.bergzeit.com/product_gallery_regular/1118101-004_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/InStock","name":"white Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-004","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-004"},

{"@type":"ProductModel","color":"peach amber","image":"https://static.bergzeit.com/product_gallery_regular/1118101-006_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/InStock","name":"peach amber Cross Cap","price":24.63,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-006","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-006"}],"name":"Cross Cap","offers":[{"@type":"AggregateOffer","availability":"http://schema.org/InStock","highPrice":24.95,"lowPrice":24.63,"priceCurrency":"EUR"}],"productId":"1118101","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/"}
</script>

Answer #1 by yshpjwxd

You are using deprecated syntax: findAll should be find_all.
Next, find_all returns a list (a ResultSet), and a list has no attribute such as text.
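To make the difference concrete, here is a small sketch using a hypothetical one-product page (not the real bergzeit markup): find_all hands back a ResultSet that has to be indexed or looped over, while find returns a single Tag whose .string can go straight into json.loads.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical minimal page with one JSON-LD block (not the real bergzeit page)
html = """
<html><body>
<script type="application/ld+json">{"@type": "Product", "name": "Cross Cap"}</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet (a list of Tag objects), so it has no .text
results = soup.find_all("script", {"type": "application/ld+json"})
print(type(results).__name__, len(results))

# Option 1: find() returns only the first matching Tag, which does have .string
first = soup.find("script", {"type": "application/ld+json"})
product = json.loads(first.string)
print(product["name"])

# Option 2: loop over every Tag in the ResultSet
for tag in results:
    print(json.loads(tag.string)["@type"])
```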
Here is a working example that extracts that data:

from bs4 import BeautifulSoup as bs
import requests
import json

headers= {
    'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
}

r = requests.get("https://www.bergzeit.co.uk/p/black-diamond-womens-focus-climbing-shoe/3005027/#itemId=3005027-001", headers=headers)

soup = bs(r.text, 'html.parser')
# select_one returns a single Tag (or None), so .string works here, unlike on a ResultSet
script_w_data = soup.select_one('div[class^="product-detailed-page"] script[type="application/ld+json"]').string
json_obj = json.loads(script_w_data)
print(json_obj['brand'],'|',  json_obj['description'])

Terminal output:

Black Diamond | Performance statement from the first Black Diamond climbing shoe series - world premiere


See the Requests documentation and the BeautifulSoup documentation.
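To get at the objects under each @type (Product, ProductModel, Offer and AggregateOffer in the question's data), one option is to walk the parsed JSON recursively and group every object by its @type value. A minimal sketch, assuming the JSON text has already been pulled out of the <script> tag; it uses only the standard library, runs on a trimmed copy of the question's JSON, and the helper name collect_by_type is hypothetical.

```python
import json
from collections import defaultdict

# Trimmed copy of the JSON-LD from the question (only two models kept)
ld_json = """
{"@context": "https://schema.org/", "@type": "Product", "name": "Cross Cap",
 "model": [
  {"@type": "ProductModel", "color": "fiery red", "sku": "1118101-003",
   "offers": [{"@type": "Offer", "availability": "http://schema.org/OutOfStock", "price": 24.95}]},
  {"@type": "ProductModel", "color": "nightshade", "sku": "1118101-005",
   "offers": [{"@type": "Offer", "availability": "http://schema.org/InStock", "price": 24.77}]}
 ],
 "offers": [{"@type": "AggregateOffer", "lowPrice": 24.63, "highPrice": 24.95}]}
"""


def collect_by_type(node, found=None):
    """Recursively walk dicts/lists and group every dict that has an "@type" key."""
    if found is None:
        found = defaultdict(list)
    if isinstance(node, dict):
        if "@type" in node:
            types = node["@type"]
            # JSON-LD allows "@type" to be a single string or a list of strings
            for t in [types] if isinstance(types, str) else types:
                found[t].append(node)
        for value in node.values():
            collect_by_type(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_by_type(item, found)
    return found


by_type = collect_by_type(json.loads(ld_json))

for model in by_type["ProductModel"]:
    print(model["sku"], model["color"])
print([offer["price"] for offer in by_type["Offer"]])
```

Grouping by @type rather than hard-coding paths like ["model"][i]["offers"] means the same code also picks up objects nested at other depths.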

Answer #2 by yh2wf1be

Assuming the text in the question is assigned to a variable named data, then:

from bs4 import BeautifulSoup as BS
import json

soup = BS(data, 'lxml')

for script in soup.find_all('script'):
    # skip the first line of the script text, then rejoin the remaining lines
    j = ''.join(script.getText().splitlines()[1:])
    for model in json.loads(j)['model']:
        print(model['sku']) # for example

Output:

1118101-003
1118101-005
1118101-001
1118101-002
1118101-008
1118101-007
1118101-004
1118101-006
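If the <script> tag is well formed, the line-stripping step may not be needed, because json.loads ignores whitespace before and after the JSON value. A minimal sketch, assuming a trimmed copy of the question's markup and using the built-in html.parser instead of lxml so no extra parser has to be installed; filtering on type="application/ld+json" also keeps other <script> tags on a real page from being passed to json.loads.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, trimmed version of the question's markup (two models only)
data = """
<script type="application/ld+json">
{"@type": "Product", "name": "Cross Cap", "model": [
  {"@type": "ProductModel", "sku": "1118101-003", "color": "fiery red"},
  {"@type": "ProductModel", "sku": "1118101-005", "color": "nightshade"}
]}
</script>
"""

soup = BeautifulSoup(data, "html.parser")

skus = []
# Restrict the search to JSON-LD scripts so other <script> tags are skipped
for script in soup.find_all("script", type="application/ld+json"):
    # json.loads ignores the surrounding newlines, so no splitlines() is needed
    product = json.loads(script.string)
    for model in product.get("model", []):
        skus.append(model["sku"])

print(skus)
```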
