How to get media:thumbnail from an RSS feed via a Scrapy CSS selector

Posted by ljsrvy3e on 2023-08-05 in Other


I'm trying to read an RSS feed that contains image URLs. The XML file looks like this:

<item>
<title>Item Title Goes Here</title>
<link>https://www.google.com</link>
<media:thumbnail url="https://i.pcmag.com/imagery/reviews/03aizylUVApdyLAIku1AvRV-39.fit_scale.size_1028x578.v1605559903.png" height="251" width="330"/>
<media:content url="https://i.pcmag.com/imagery/reviews/03aizylUVApdyLAIku1AvRV-39.fit_scale.size_1028x578.v1605559903.png" type="image/png" height="600" width="800"/>
</item>

Getting data from the title tag and other tags works fine:

title.css('title ::text').get()

However, I can't get the data from <media:thumbnail url> via a CSS selector. Any ideas?

scrapy

Source: https://stackoverflow.com/questions/76646731/how-to-get-mediathumbnail-from-rss-feed-via-scrapy-css-selector

1 Answer


j91ykkif1#

The thumbnail URL can be retrieved with BeautifulSoup:

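from bs4 import BeautifulSoup

# Load the saved RSS feed from disk
with open('RSS.xml', 'r') as f:
    contents = f.read()

# features='xml' selects the XML parser, which requires the lxml package
soup = BeautifulSoup(contents, features='xml')

# Read the url attribute of the first thumbnail element
print(soup.find('thumbnail').attrs['url'])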


Posted 2023-08-05
