I am trying to scrape the image src using Scrapy in Python, but instead of the img element I want to scrape it from a <source> element that has no class

plupiseo posted on 2022-11-09 in Python


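I am trying to scrape the image src using Scrapy in Python, but instead of the img element, I want to scrape it from the <source> element below, which has no class attribute and no src attribute. Can anyone please help me with how to do this? Thanks in advance.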

<source media="(min-width: 1024px)" sizes="1140px" srcset="https://static1.simpleflyingimages.com/wordpress/wp-content/uploads/2022/09/Thomas-Boon-Air-Canada-2.jpg?q=50&amp;fit=contain&amp;w=1140&amp;h=&amp;dpr=1.5">
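A `srcset` value like the one above is a comma-separated list of image candidates, each a URL optionally followed by a width (`1140w`) or pixel-density (`2x`) descriptor; this particular snippet contains a single candidate with no descriptor. The sketch below splits such a value into `(url, descriptor)` pairs. It follows the basic shape of the HTML srcset parsing rules but is simplified (no handling of malformed descriptors or parentheses), and `parse_srcset` is a hypothetical helper name. Attribute values returned by Scrapy already have `&amp;` decoded to `&`; `html.unescape` is only needed here because the raw markup is pasted in.

```python
import html


def parse_srcset(value):
    """Split a srcset string into (url, descriptor) pairs.

    Hypothetical helper: a simplified version of the HTML srcset parsing
    rules that ignores malformed descriptors and parentheses.
    """
    candidates = []
    pos, n = 0, len(value)
    while pos < n:
        # Skip whitespace and commas that separate candidates.
        while pos < n and (value[pos].isspace() or value[pos] == ","):
            pos += 1
        if pos >= n:
            break
        # The URL is the next run of non-whitespace characters.
        start = pos
        while pos < n and not value[pos].isspace():
            pos += 1
        url = value[start:pos]
        descriptor = ""
        if url.endswith(","):
            # "a.jpg, b.jpg": a trailing comma ends the candidate, no descriptor.
            url = url.rstrip(",")
        else:
            # Otherwise everything up to the next comma is the descriptor.
            start = pos
            while pos < n and value[pos] != ",":
                pos += 1
            descriptor = value[start:pos].strip()
        candidates.append((url, descriptor))
    return candidates


# The srcset from the question, copied as raw markup (hence the &amp; entities).
raw = ("https://static1.simpleflyingimages.com/wordpress/wp-content/uploads/2022/09/"
       "Thomas-Boon-Air-Canada-2.jpg?q=50&amp;fit=contain&amp;w=1140&amp;h=&amp;dpr=1.5")
for url, descriptor in parse_srcset(html.unescape(raw)):
    print(url, descriptor or "(no descriptor)")
```

With a single candidate, the first pair's URL is the image link; on pages that list several widths you could instead pick the pair with the largest `w` descriptor.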

The code I tried:

from urllib.parse import urljoin
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from datetime import datetime
import pandas as pd

class NewsSpider(scrapy.Spider):
    name = "simpleflying"

    def start_requests(self):
        url = input("Enter the article url: ")

        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        Feature_Image = [i.strip() for i in response.css('source media="(min-width: 1024px)" ::attr(data-origin-srcset)').getall()][0]
        yield{
            'Feature_Image': Feature_Image,
        }

Here is the link to the website: https://simpleflying.com/best-airlines-travel-with-babies-young-children/

scrapy

Source: https://stackoverflow.com/questions/74073363/i-am-trying-to-scrape-the-image-src-using-scrapy-in-python-but-instead-form-img

1 Answer


qij5mzcb1#

You can try the following example:

import scrapy
class NewsSpider(scrapy.Spider):
    name = "articles"
    def start_requests(self):
        url='https://simpleflying.com/best-airlines-travel-with-babies-young-children/'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):

        img_url = response.xpath('//*[@class="heading_image responsive-img img-size-heading-image-full-width expandable "]/figure/picture/img/@data-img-url').get()
        yield {
            'img_url':img_url
        }

Output:

{'img_url': 'https://static1.simpleflyingimages.com/wordpress/wp-content/uploads/2022/09/Thomas-Boon-Air-Canada-2.jpg'}
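This output and the `srcset` in the question point to the same path; the `srcset` version only adds a query string (`q=50&fit=contain&w=1140&h=&dpr=1.5`), which looks like resize and quality options for the image CDN (an assumption; the page does not document it). If you take the URL from the `<source>` element but want it in the same form as this output, dropping the query is enough. A sketch with the standard library's `urllib.parse`; `strip_query` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return `url` without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


srcset_url = (
    "https://static1.simpleflyingimages.com/wordpress/wp-content/uploads/2022/09/"
    "Thomas-Boon-Air-Canada-2.jpg?q=50&fit=contain&w=1140&h=&dpr=1.5"
)
print(strip_query(srcset_url))
```

Whether the bare URL serves the full-size original or some default rendition depends on the CDN, so keep the query if you need a specific size.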

Upvotes (0) | Replies (0) | 2022-11-09
