Scraping images with Beautiful Soup, but the img/src tag is not found

n3schb8v  posted on 2021-09-08  in Java

I'm writing a Beautiful Soup script, shown below:

import re

import requests
from bs4 import BeautifulSoup

# urls, HEADERS and remove_extra_char_in_values are defined elsewhere in the script.
for i in urls:
    url = remove_extra_char_in_values(i)
    response_get = requests.get(url)
    if response_get.status_code == 200:
        site = requests.get(url, headers=HEADERS).text
        bs = BeautifulSoup(site, 'html.parser')
        # Keep only <img> tags whose src matches the pattern '.jpg'
        images = bs.find_all('img', {'src': re.compile('.jpg')})
        for image in images:
            name = image['alt']
            images_link = image['src']

            # Save each image under its alt text, with spaces replaced by hyphens
            with open(name.replace(' ', '-') + '.jpg', 'wb') as f:
                img = requests.get(images_link)
                f.write(img.content)
                print('Writing: ', name)
    else:
        print('could not download link: ', url)

I'm trying to extract images from tags that look like this:

<img data-test-id="Img" 
src="https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440" 
srcset="https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 
100w, https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 200w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 320w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 360w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 375w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 400w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 414w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 640w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 720w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 750w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 768w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 828w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1024w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1280w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1366w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1440w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1536w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 1920w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 2048w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 2560w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 2732w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 2880w, 
https://content.api.news/v3/images/bin/be02edb43dcdddb2e9d81eab1080cda1?width=1440 3840w" 
sizes="(min-width: 1400px) 100vw, (min-width: 1200px) 100vw, (min-width: 980px) 100vw, (min-width: 600px) 100vw, 100vw" class="Img-ifujty gIUKwk">

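The tag has an img/src, but the script doesn't pick it up. Is data-test-id="Img" related to that, or is it the class at the end? If so, how do I include these together with img/src in find_all('img', {'src': re.compile('jpg')})? The page doesn't seem to rely on JS either.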
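A minimal sketch of what the filter does with this tag, using a shortened copy of the markup above (srcset and sizes left out). It suggests that neither data-test-id nor class affects a filter on src; the src URL simply contains no "jpg", so the pattern has nothing to match. Filtering on src=True, or on the data attribute together with src, are two possible alternatives, not necessarily the right ones for the real page.

```python
import re

from bs4 import BeautifulSoup

# Shortened copy of the tag from the question (srcset and sizes omitted).
HTML = (
    '<img data-test-id="Img" '
    'src="https://content.api.news/v3/images/bin/'
    'be02edb43dcdddb2e9d81eab1080cda1?width=1440" '
    'class="Img-ifujty gIUKwk">'
)

soup = BeautifulSoup(HTML, "html.parser")

# The original filter: the URL contains no "jpg", so nothing matches.
by_jpg = soup.find_all("img", {"src": re.compile(".jpg")})

# Any <img> that has a src attribute at all.
by_src = soup.find_all("img", src=True)

# Filter on the data attribute and the presence of src together.
by_test_id = soup.find_all("img", {"data-test-id": "Img", "src": True})

print(len(by_jpg), len(by_src), len(by_test_id))
```

The tag as shown also has no alt attribute, so image['alt'] would raise a KeyError once the tag is matched; image.get('alt') returns None instead and lets the script fall back to another file name.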
While I'm here, how can I get bs to go through all the pages and fetch images from the whole site, not just the page being requested?
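Beautiful Soup only parses the HTML it is given, so covering a whole site means following links yourself. One possible approach is a breadth-first crawl limited to the starting host. The sketch below is illustrative only: crawl_images, fetch, and max_pages are hypothetical names, and fetch is assumed to be any callable that returns a page's HTML text, or None on failure.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup


def crawl_images(start_url, fetch, max_pages=50):
    """Breadth-first crawl of same-host pages; returns the set of image URLs found.

    `fetch` is a hypothetical callable: URL -> HTML text, or None on failure.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    images = set()
    pages_done = 0

    while queue and pages_done < max_pages:
        page_url = queue.popleft()
        html = fetch(page_url)
        pages_done += 1
        if html is None:
            continue
        soup = BeautifulSoup(html, "html.parser")

        # Collect every image source, resolved against the page URL.
        for img in soup.find_all("img", src=True):
            images.add(urljoin(page_url, img["src"]))

        # Queue links that stay on the same host and were not seen yet.
        for a in soup.find_all("a", href=True):
            link, _fragment = urldefrag(urljoin(page_url, a["href"]))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)

    return images
```

With requests, fetch could be something like lambda u: requests.get(u, headers=HEADERS, timeout=10).text, and the call would be crawl_images('https://www.vogue.com.au/vogue-living', fetch). The max_pages cap keeps the crawl bounded on large sites.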
Edit, to be clear:
The script does not download any images from the site in question, which in this case is https://www.vogue.com.au/vogue-living
Rgs,
