Scrapy - get content after an identified class

Posted by e7arh2l6 on 2022-11-09 in Other


I am trying to extract content from this HTML:

<div class=product_detail>
  <p>
    Random stuff
  </p>
  <p>
    <span class="brand_color">Brand:</span>Product Brand
  </p>
</div>

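I can get "Brand:" with response.css('span.brand_color::text'), but I cannot get "Product Brand".
What I would like to build:

1. Find the brand_color span (it does not exist in all cases).
2. Go up to its parent.
3. Go back down, somehow ignoring the span, and select the ::text.

(My logic might be completely twisted, though.)
Many thanks!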

scrapy

Source: https://stackoverflow.com/questions/72506311/scrapy-get-content-after-an-identified-class

1 Answer


jtjikinw1#

I suggest using BeautifulSoup, a very capable HTML parsing library.
You can read more about BeautifulSoup at: https://beautiful-soup-4.readthedocs.io/en/latest/
It is easy to install: pip install beautifulsoup4

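from bs4 import BeautifulSoup

HTML = '<div class=product_detail> <p> Random stuff </p> <p> <span class="brand_color">Brand:</span>Product Brand </p> </div>'

# Name the parser explicitly so the result does not depend on which parsers are installed
parsed_object = BeautifulSoup(HTML, 'html.parser')
# Collect the stripped text of every <p> element
res = [p.get_text().strip() for p in parsed_object.find_all('p')]
print(res)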

You will get the following:

['Random stuff', 'Brand:Product Brand']

Then you can use split to extract the data:

# Split on the first colon only, in case the brand itself contains one
label, brand_name = res[1].split(':', 1)
print(label)       # Brand
print(brand_name)  # Product Brand
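
Since the question notes that the brand_color span is not always present, the split above can fail on pages without it. A possible variation, still with BeautifulSoup, reads the text node that directly follows the span and returns None when the span is missing. This is a sketch under those assumptions; the function name brand_from_html is hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString

HTML = (
    '<div class=product_detail> <p> Random stuff </p> '
    '<p> <span class="brand_color">Brand:</span>Product Brand </p> </div>'
)


def brand_from_html(html):
    """Hypothetical helper: text right after span.brand_color, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    span = soup.find('span', class_='brand_color')
    if span is None:
        # The span does not exist on every page.
        return None
    # next_sibling is the node immediately after the span inside the same <p>.
    sibling = span.next_sibling
    if isinstance(sibling, NavigableString):
        return sibling.strip() or None
    return None


print(brand_from_html(HTML))
print(brand_from_html('<div><p>Random stuff</p></div>'))
```

This avoids depending on the ":" separator, at the cost of assuming the brand text comes immediately after the span.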

