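I am trying to scrape an image URL with Scrapy in Python. Instead of reading it from the img element, I want to take it from a source element that has neither a class attribute nor a src attribute. Can anyone help me with how to do this? Thanks in advance.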
<source media="(min-width: 1024px)" sizes="1140px" srcset="https://static1.simpleflyingimages.com/wordpress/wp-content/uploads/2022/09/Thomas-Boon-Air-Canada-2.jpg?q=50&fit=contain&w=1140&h=&dpr=1.5">
The code I tried:
from urllib.parse import urljoin
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from datetime import datetime
import pandas as pd


class NewsSpider(scrapy.Spider):
    name = "simpleflying"

    def start_requests(self):
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        Feature_Image = [i.strip() for i in response.css('source media="(min-width: 1024px)" ::attr(data-origin-srcset)').getall()][0]
        yield {
            'Feature_Image': Feature_Image,
        }
Here is the link to the website: https://simpleflying.com/best-airlines-travel-with-babies-young-children/
1 Answer
You can try the following example:
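A minimal sketch of one possible approach, assuming the page serves the `<source>` tag as shown in the question. In a CSS selector an attribute condition goes in square brackets, so the element can be matched with `source[media="(min-width: 1024px)"]` and its value read with Scrapy's `::attr()` extension. The `data-origin-srcset` fallback is an assumption taken from the attempted code (the site may lazy-load its images), and `feature_image` and `first_srcset_url` are hypothetical helper names, not part of Scrapy.

```python
from typing import Optional

# Attribute conditions belong in square brackets; the selector in the
# question omitted them, so it was not valid CSS.
SOURCE_SELECTOR = 'source[media="(min-width: 1024px)"]'


def first_srcset_url(srcset: str) -> Optional[str]:
    """Return the URL of the first candidate in a srcset value.

    A srcset value is a list of "URL [descriptor]" candidates separated by
    commas, so the first whitespace-delimited token (minus any trailing
    comma) is the first URL.
    """
    tokens = srcset.split()
    return tokens[0].rstrip(",") if tokens else None


def feature_image(response) -> Optional[str]:
    """Extract the image URL from the matching <source> element.

    `response` can be any object exposing Scrapy's .css() API, such as the
    response passed to a spider callback.
    """
    # Assumption: try the live attribute first, then the lazy-load variant
    # used in the question's attempt.
    for attr in ("srcset", "data-origin-srcset"):
        value = response.css(f"{SOURCE_SELECTOR}::attr({attr})").get()
        if value:
            return first_srcset_url(value)
    return None
```

Inside the spider callback this could be used as `yield {'Feature_Image': feature_image(response)}`. If neither attribute is present in the downloaded HTML, the helper returns `None`, which would suggest the tag is being added by JavaScript after the page loads.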
Output: