Selenium and Scrapy fail to fully load the page

xfyts7mz  posted on 2022-11-09 in Other

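I am trying to scrape product information from the official VMware website with Selenium plus Scrapy, but my code never gets the page fully loaded, even with longer wait times. Here is my script.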

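import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class VmwareSpiderSpider(scrapy.Spider):

    name = 'vmware_spider'
    allowed_domains = ['customerconnect.vmware.com']
    start_urls = [
        'https://customerconnect.vmware.com/en/downloads/details?downloadGroup=NSX-4011&productId=1339#product_downloads'
    ]

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before starting the browser
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)

        self.driver.implicitly_wait(30)
        wait = WebDriverWait(self.driver, 120, poll_frequency=5)
        # Returns as soon as the first "Read More" link is present, not all of them
        wait.until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Read More")))

        with open("source.html", "w", encoding="utf-8") as f:
            f.write(self.driver.page_source)

        self.driver.quit()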

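I am not familiar with web design and architecture, so I have a couple of questions:
1. If there are 20 items containing "Read More", how do I make sure all 20 have loaded before I start locating elements?
2. In the original web page, the "Read More" element has an onclick attribute, but in the page source I retrieve with Selenium that attribute is gone, so clicking leads nowhere. What causes this?
Any hints would be appreciated. Thanks a lot.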

Answer 1 (zvokhttg)

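All the required data is loaded as a JSON response from an API call made with the GET method. Press F12, open the Network tab, reload the page with the circular refresh icon at the top left, and then select the XHR filter. Click the request under Name and check its Headers and Preview panes to find everything you need about the API URL.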

import scrapy
import json

API_URL = "https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339"

class VmwareSpiderSpider(scrapy.Spider):
    name = "vm"
    start_urls = [API_URL]

    custom_settings = {
        'USER_AGENT' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
    }

    def parse(self, response):
        # The endpoint returns JSON; each file is an entry in "downloadFiles"
        json_response = json.loads(response.text)
        datas = json_response["downloadFiles"]
        for data in datas:
            # .get() yields None for entries that lack a key
            yield {
                "title": data.get("title"),
                "fileName": data.get("fileName"),
                "releaseDate": data.get("releaseDate"),
                "build": data.get("build"),
            }
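
The API URL used above carries the same downloadGroup and productId values as the page URL in the question. As a rough sketch, the mapping could be done with the standard library as shown here. It assumes the same endpoint and the locale=en_US parameter also apply to other download pages, which the thread does not confirm, and build_api_url is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the answer; assumed (not verified) to serve other download groups too.
API_BASE = "https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details"


def build_api_url(page_url, locale="en_US"):
    """Hypothetical helper: map a download details page URL to its API URL."""
    # urlsplit separates the query string from the "#product_downloads" fragment
    query = parse_qs(urlsplit(page_url).query)
    params = {
        "locale": locale,
        "downloadGroup": query["downloadGroup"][0],
        "productId": query["productId"][0],
    }
    return f"{API_BASE}?{urlencode(params)}"


page_url = (
    "https://customerconnect.vmware.com/en/downloads/details"
    "?downloadGroup=NSX-4011&productId=1339#product_downloads"
)
api_url = build_api_url(page_url)
print(api_url)
```

For the page in the question this reproduces the API_URL constant from the spider, so the result could be passed to start_urls in its place.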

Output:

{'title': 'NSX Manager/ NSX Global Manager / NSX Cloud Service Manager for VMware ESXi', 'fileName': 'nsx-unified-appliance-4.0.1.1.0.20598732.ova', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX Manager with vCenter Plugin', 'fileName': 'nsx-embedded-unified-appliance-4.0.1.1.0.20598732.ova', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX Application Platform', 'fileName': 'VMware-NSX-Application-Platform-4.0.1.0.0.20606727.tgz', 'releaseDate': '2022-10-13', 'build': '20606727'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'Kubernetes-tools 1.21', 'fileName': 'kubernetes-tools-1.21.9-00_3.8.0-1.tar.gz', 'releaseDate': '2022-10-13', 'build': '20596968'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'Kubernetes-tools 1.23', 'fileName': 'kubernetes-tools-1.23.3-00_3.8.0-1.tar.gz', 'releaseDate': '2022-10-13', 'build': '20596968'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX SVM Appliance', 'fileName': 'VMware-NSX-Malware-Prevention-appliance-4.0.1.1.0.20598729.ova', 'releaseDate': '2022-10-13', 'build': '20598729'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX Edge for Bare Metal', 'fileName': 'nsx-edge-4.0.1.1.0.20598735.iso', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX Edge for VMware ESXi', 'fileName': 'nsx-edge-4.0.1.1.0.20598735.ova', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX Kernel Module for VMware ESXi 7.0', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-esx70.zip', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX Kernel Module for VMware ESXi 8.0', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-esx80.zip', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'Standalone Edge - Client', 'fileName': 'nsx-l2vpn-client-ovf-19300606.tar.gz', 'releaseDate': '2022-10-13', 'build': '19307994'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': None, 'fileName': None, 'releaseDate': None, 'build': None}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX 4.0.1.1 Upgrade Bundle', 'fileName': 'VMware-NSX-upgrade-bundle-4.0.1.1.0.20598726.mub', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX Cloud Upgrade Bundle for NSX-T 4.0.1.1', 'fileName': 'VMware-CC-upgrade-bundle-4.0.1.1.0.20598726.mub', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'Upgrade bundle for NSX-T L2 VPN Client Appliance', 'fileName': 'VMware-NSX-edge-4.0.1.1.0.20598735.nub', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': None, 'fileName': None, 'releaseDate': None, 'build': None}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 7.6 / CentOS 7.6 / OEL 7.6', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-rhel76_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 7.6 Container', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-container-rhel76_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 7.7 / CentOS 7.7 / OEL 7.7', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-rhel77_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 7.7 Container', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-container-rhel77_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 7.8 / CentOS 7.8 / OEL 7.8', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-rhel78_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 7.8 Container', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-container-rhel78_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 7.9 / CentOS 7.9 / OEL 7.9', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-rhel79_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 8.0 / CentOS 8.0', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-rhel80_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for RHEL 8.3 / CentOS 8.3', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-rhel83_x86_64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for SUSE SLES 12sp3', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-linux64-sles12sp3.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for SUSE SLES 12sp4', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-linux64-sles12sp4.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for Ubuntu 16.04', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-ubuntu-xenial_amd64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for Ubuntu 18.04', 'fileName': 'nsx-lcp-4.0.1.1.0.20598730-baremetal-server-linux64-bionic_amd64.tar.gz', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.scraper] DEBUG: Scraped from <200 https://customerconnect.vmware.com/channel/public/api/v1.0/dlg/details?locale=en_US&downloadGroup=NSX-4011&productId=1339>
{'title': 'NSX BM Server Module for Windows 2016/ 2019', 'fileName': 'nsx-lcp-4.0.1.20598730-baremetal-server-win32_vs2017.zip', 'releaseDate': '2022-10-13', 'build': '20598726'}
2022-10-31 05:03:00 [scrapy.core.engine] INFO: Closing spider (finished)
2022-10-31 05:03:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 389,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 9030,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 1.133655,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 10, 30, 23, 3, 0, 669723),
 'httpcompression/response_bytes': 23588,
 'httpcompression/response_count': 1,
 'item_scraped_count': 30,
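
Two of the 30 scraped items in the output have every field set to None, which suggests that some entries in downloadFiles lack these keys (the thread does not say what those entries are). A minimal sketch of skipping such entries follows. It runs on a hand-made sample payload rather than real API data, and iter_download_items is a hypothetical helper name.

```python
import json

FIELDS = ("title", "fileName", "releaseDate", "build")


def iter_download_items(payload):
    """Yield one dict per entry in payload["downloadFiles"], skipping entries with none of FIELDS."""
    for entry in payload.get("downloadFiles", []):
        item = {field: entry.get(field) for field in FIELDS}
        # Keep the item only if at least one of the wanted fields is present
        if any(value is not None for value in item.values()):
            yield item


# Illustrative sample only: shaped like the scraped items, not copied from an API response.
sample = json.loads("""
{
  "downloadFiles": [
    {"title": "NSX Edge for VMware ESXi", "fileName": "nsx-edge-4.0.1.1.0.20598735.ova",
     "releaseDate": "2022-10-13", "build": "20598726"},
    {"header": "hypothetical row without file fields"}
  ]
}
""")

items = list(iter_download_items(sample))
print(items)
```

Inside the spider, the same check could sit in parse just before the yield, so that the empty rows never reach the item pipeline.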
