Extracting URLs from a div class with Scrapy

gj3fmq9x posted on 2022-11-09 in Other

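I'm trying to use Scrapy to get a list of URLs from this website. I know the class of the div, and I want all the <a> tags inside it.
Here is the link to the site where I'm trying to get the URL of each profile: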
https://www.letsmakeaplan.org/find-a-cfp-professional?limit=10&pg=1&sort=random&distance=5
This is the code that tries to extract the URLs from the page above:

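from scrapy.selector import Selector
# driver is a Selenium WebDriver that has already loaded the search page
sel = Selector(text=driver.page_source)
books1 = sel.xpath("//div[@class='faceted-search-results-container-listing']/a/@href").extract()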

This comes back empty.
This is the HTML on the site:

<div class="faceted-search-results-container-listing" style="">
        <a href="/find-a-cfp-professional/certified-professional-profile/a9a0ca36-3c70-4ea4-a853-7f704fe4cc98" class="find-cfp-item js-card-link">
          <div class="find-cfp-item-top">
            <div class="h5 find-cfp-item-name">C. H. Simmons, CFP®</div>
            <div class="find-cfp-item-read-more"><span>view details</span></div>
          </div>

          <div class="find-cfp-item-bottom">
            <div class="find-cfp-item-column" data-column="1">
              <img src="https://login.cfp.net/eweb/photos/91475.jpg" data-default-img="/-/media/feature/cfp/lmapprofile/default-profile-avatar.jpeg" data-default-img-backup="/images/default-profile-avatar.jpeg" alt="C. Simmons Headshot" class="find-cfp-item-headshot" onerror="handleImg(this, event);">
              <div class="find-cfp-item-text">

      Simmons and Starzl Wealth Management<br>
      110 Bay St<br>
      Gadsden, AL 35901-5229<br>

              </div>
            </div>

            <div class="find-cfp-item-column" data-column="2">
              <div class="h6 find-cfp-item-column-heading">Planning Services Offered</div>
              <div class="find-cfp-item-text" data-line-clamp="4">
                Investment Planning, Retirement Planning
              </div>
            </div>

            <div class="find-cfp-item-column" data-column="3">
              <div class="find-cfp-item-column-inner">
                <div class="h6 find-cfp-item-column-heading">Client Focus</div>
                <div class="find-cfp-item-text" data-line-clamp="1">
                  None Provided
                </div>
              </div>

              <div class="find-cfp-item-column-inner">
                <div class="h6 find-cfp-item-column-heading">Minimum Investable Assets</div>
                <div class="find-cfp-item-text" data-line-clamp="1">
                  $500,000
                </div>
              </div>

            </div>
          </div>
        </a>

Answer 1, by h22fl7wq

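The search results appear to come from an AJAX call to an API that returns JSON, and are then rendered dynamically in the page.
If you scrape the API URL instead, you can get all of the information from the raw JSON data...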
Output:

'/find-a-cfp-professional/certified-professional-profile/b1a27bac-77f0-4796-ab7f-7e15c19d8421'
'/find-a-cfp-professional/certified-professional-profile/e493f31f-88c7-4fdd-9863-9712ba85c95c'
'/find-a-cfp-professional/certified-professional-profile/2d634f05-331e-4699-b1a8-96e7a20aa0bf'
'/find-a-cfp-professional/certified-professional-profile/d9074216-7321-469f-b42f-2988d84d4a2b'
'/find-a-cfp-professional/certified-professional-profile/7f55e98c-df27-4922-b3a4-07c341a87f65'
'/find-a-cfp-professional/certified-professional-profile/1b0377a2-4545-45af-9ac4-18a8af2ffecd'
'/find-a-cfp-professional/certified-professional-profile/66b78e79-608b-4079-86c2-d9ae84c3a762'
'/find-a-cfp-professional/certified-professional-profile/e884f42b-8239-475a-b55f-5bb6f1130a36'
'/find-a-cfp-professional/certified-professional-profile/b00abd44-5969-4f02-a052-e6ef34b60e9b'
'/find-a-cfp-professional/certified-professional-profile/10ae9e9f-f11e-4f79-91c4-05f24e0c7a0e'
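The paths in this output are site-relative. A short sketch that turns them into absolute URLs with the standard library, assuming the site root https://www.letsmakeaplan.org/ (taken from the search URL in the question) as the base:

```python
from urllib.parse import urljoin

SITE = "https://www.letsmakeaplan.org/"

# Two of the relative paths from the output above.
paths = [
    "/find-a-cfp-professional/certified-professional-profile/b1a27bac-77f0-4796-ab7f-7e15c19d8421",
    "/find-a-cfp-professional/certified-professional-profile/e493f31f-88c7-4fdd-9863-9712ba85c95c",
]

# A path starting with "/" replaces the base URL's path but keeps its scheme and host.
absolute = [urljoin(SITE, p) for p in paths]
for url in absolute:
    print(url)
```

Inside a Scrapy callback, response.urljoin(path) does the same against the response's own URL, and response.follow(path, callback=...) builds the follow-up request directly from the relative path.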
