Scrapy: getting the information needed to make an API call from the web page that makes that call

ycggw6v2, posted 12 months ago in Other

I want to extract the rating elements from the following web page, whose data sits in the markup below:

<ol data-bv-v="contentItemCollection:2" class="bv-content-list bv-content-list-reviews">
   <li data-bv-v="contentItem:9" class="bv-content-item bv-content-top-review bv-content-review bv-content-loaded" itemprop="review" itemscope="" itemtype="http://schema.org/Review" data-content-id="Reviews-158638580">
      <div data-bv-v="inlineProfile:13" class="bv-author-profile">
         <div class="bv-inline-profile">
            <div class="bv-author-avatar">
               <div class="bv-author-avatar-nickname">
                  <div class="bv-content-author-name" role="presentation">
                     <button type="button" class="bv-author bv-fullprofile-popup-target bv-focusable" aria-label="Voir le profil de oceaned03.">
                        <h3>oceaned03</h3>
                     </button>
                  </div>
               </div>
            </div>
            <div class="bv-popup-prosnap-userinfo bv-contains-profile-button">
               <div class="bv-content-author-name" role="presentation">
                  <button type="button" class="bv-author bv-fullprofile-popup-target bv-focusable" aria-label="Voir le profil de oceaned03.">
                     <h3>oceaned03</h3>
                  </button>
               </div>
               <div class="bv-author-location">  <span> Clermont Ferrand </span>  </div>
               <div class="bv-author-userstats">
                  <ul class="bv-author-userstats-list" role="list">
                     <li class="bv-author-userstats-reviews">  <span class="bv-author-userstats-data"> Avis : </span> <span class="bv-author-userstats-value">1</span> </li>
                     <li class="bv-author-userstats-votes">   </li>
                  </ul>
               </div>
               <div class="bv-content-author-badges">
                  <ul class="bv-content-author-badges-list" role="presentation">              </ul>
               </div>
               <div class="bv-author-userinfo">
                  <ul role="list">
                     <li class="bv-author-cdv bv-first ">
                        <!-- UIA-7763 - removed default display so only translated strings matched by FB will display; can't remove defaultDisplay field entirely due to compilation errors, so used a value of '' --> <span class="bv-author-userinfo-data">Sexe </span> <span class="bv-author-userinfo-value">une femme</span> 
                     </li>
                     <li class="bv-author-cdv  ">
                        <!-- UIA-7763 - removed default display so only translated strings matched by FB will display; can't remove defaultDisplay field entirely due to compilation errors, so used a value of '' --> <span class="bv-author-userinfo-data">Age</span> <span class="bv-author-userinfo-value">18-24 ans</span> 
                     </li>
                     <li class="bv-author-cdv  ">
                        <!-- UIA-7763 - removed default display so only translated strings matched by FB will display; can't remove defaultDisplay field entirely due to compilation errors, so used a value of '' --> <span class="bv-author-userinfo-data">Couleur des yeux</span> <span class="bv-author-userinfo-value">Bleus</span> 
                     </li>
                     <li class="bv-author-cdv  bv-last">
                        <!-- UIA-7763 - removed default display so only translated strings matched by FB will display; can't remove defaultDisplay field entirely due to compilation errors, so used a value of '' --> <span class="bv-author-userinfo-data">Type de peau</span> <span class="bv-author-userinfo-value">Sèche</span> 
                     </li>
                  </ul>
               </div>
            </div>
         </div>
      </div>
      <div class="bv-content-item-author-profile-offset bv-content-item-author-profile-offset-on">
         <div class="bv-content-container">
            <div class="bv-content-core ">
               <div class="bv-content-header">
                  <div class="bv-content-data-summary">
                     <div class="bv-content-badges-container">
                        <ul class="bv-badge-summary bv-badge-first bv-badge-top-three" role="presentation">
                           <li class="bv-badge-image bv-badge-content-loyaltyyes--im-a-beauty-insider" role="presentation"> <img src="https://display.ugc.bazaarvoice.com/static/Sephora-FR/main_site/951/3232/fr_FR/images/badgeImages/loyaltyyes--im-a-beauty-insider.png" alt="Carte White" title="Carte White"> </li>
                        </ul>
                     </div>
                     <div class="bv-content-header-meta">
                        <span class="bv-content-rating bv-rating-ratio" itemprop="reviewRating" itemscope="" itemtype="http://schema.org/Rating">
                           <meta itemprop="ratingValue" content="5">
                           <meta itemprop="bestRating" content="5">
                           <span class="bv-rating-stars-container"> <abbr title="5 sur 5 étoiles." class="bv-rating bv-rating-stars bv-rating-stars-off" aria-hidden="true"> ★★★★★ </abbr> <abbr title="5 sur 5 étoiles." class="bv-rating-max bv-rating-stars bv-rating-stars-on bv-width-from-rating-stats-100" aria-hidden="true"> ★★★★★ </abbr> <span class="bv-off-screen">5 sur 5 étoiles.</span> </span> 
                        </span>
                        <div class="bv-content-meta-wrapper">
                           <div class="bv-content-meta" role="presentation">
                              <div class="bv-content-reference-data bv-content-author-name">
                                 <button type="button" class="bv-author bv-fullprofile-popup-target bv-focusable" aria-label="Voir le profil de oceaned03." itemprop="author">
                                    <h3>oceaned03</h3>
                                 </button>
                                 <div class="bv-content-datetime" role="presentation">
                                    <meta itemprop="dateCreated" content="2020-06-24">
                                    <meta itemprop="datePublished" content="2020-06-24">
                                    <span class="bv-content-datetime-dot" aria-hidden="true">·</span> <span class="bv-content-datetime-stamp">il y a 5 mois &nbsp;</span> 
                                 </div>
                              </div>
                           </div>
                        </div>
                     </div>
                     <div class="bv-content-title-container">
                        <h3 class="bv-content-title" itemprop="headline">    Satisfaite   </h3>
                     </div>
                  </div>
               </div>
               <div class="bv-content-details-offset-off">
                  <div class="bv-content-summary">
                     <div class="bv-content-summary-body" itemprop="reviewBody">
                        <div class="bv-content-summary-body-text">
                           <p>Très contente de mon achat. Je cherchais ce parfum depuis un temps en magasin et je suis heureuse qu’il soit disponible en ligne il sent tellement bon !! En plus en promo, génial ! <br>Livraison très rapide !</p>
                        </div>
                        <div class="bv-content-data">
                           <div class="bv-content-product-questions">  </div>
                           <div class="bv-content-tag-dimensions">  </div>
                           <ul class="bv-content-data-recommend-yes">
                              <li class="bv-content-data-label-container"> <span class="bv-content-data-icon" aria-hidden="true">✔</span> <span class="bv-content-data-label">Oui</span>, </li>
                              <li class="bv-content-data-value">  je recommande ce produit. </li>
                           </ul>
                        </div>
                     </div>
                  </div>
               </div>
            </div>
         </div>
         <div class="bv-content-actions-container bv-active-feedback">
            <div data-bv-v="feedback:12" class="bv-feedback-container">
               <div class="bv-content-feedback-vote bv-content-feedback-vote-active" role="group" aria-label="Utilité du contenu">
                  <div class="bv-content-feedback-vote-request">
                     <p>Avez-vous trouvé cet avis utile ?</p>
                  </div>
                  <div class="bv-content-feedback-btn-container"> <button type="button" class="bv-content-btn bv-content-btn-feedback-yes bv-focusable" aria-label="1&nbsp;personne a trouvé cet avis utile. Oui, review de oceaned03 est utile."> <span aria-hidden="true"> Oui · <span class="bv-content-btn-count" aria-hidden="true">1</span> </span> </button> <button type="button" class="bv-content-btn bv-content-btn-feedback-no bv-focusable" aria-label="0&nbsp;personne a trouvé cet avis inutile. Non, review de oceaned03 n'est pas utile."> <span aria-hidden="true"> Non · <span class="bv-content-btn-count" aria-hidden="true">0</span> </span> </button> </div>
                  <div class="bv-content-feedback-vote bv-content-feedback-vote-active"> <button type="button" class="bv-content-report-btn bv-focusable" aria-label="Marquer «&nbsp;Satisfaite&nbsp;» de oceaned03 comme inapproprié.">   Signalez un contenu inapproprié  </button> </div>
               </div>
            </div>
         </div>
         <div class="bv-inline-form-container"></div>
         <div data-bv-v="secondaryContentList:10" class="bv-secondary-content-list">
            <ol data-bv-v="secondaryContentItemCollection:11" class="bv-content-list bv-content-list-clientresponses" role="presentation">
            </ol>
         </div>
      </div>
   </li>
   <li data-bv-v="contentItem:14" class="bv-content-item bv-content-top-review bv-content-review bv-content-loaded" itemprop="review" itemscope="" itemtype="http://schema.org/Review" data-content-id="Reviews-156726085">
      <div data-bv-v="inlineProfile:18" class="bv-author-profile">
         <div class="bv-inline-profile">
            <div class="bv-author-avatar">
               <div class="bv-author-avatar-nickname">
                  <div class="bv-content-author-name" role="presentation">
                     <button type="button" class="bv-author bv-fullprofile-popup-target bv-focusable" aria-label="Voir le profil de Jo56.">
                        <h3>Jo56</h3>
                     </button>
                  </div>
               </div>
            </div>
            <div class="bv-popup-prosnap-userinfo bv-contains-profile-button">
               <div class="bv-content-author-name" role="presentation">
                  <button type="button" class="bv-author bv-fullprofile-popup-target bv-focusable" aria-label="Voir le profil de Jo56.">
                     <h3>Jo56</h3>
                  </button>
               </div>
               <div class="bv-author-location">  <span> Lorient </span>  </div>
               <div class="bv-author-userstats">
                  <ul class="bv-author-userstats-list" role="list">
                     <li class="bv-author-userstats-reviews">  <span class="bv-author-userstats-data"> Avis : </span> <span class="bv-author-userstats-value">3</span> </li>
                     <li class="bv-author-userstats-votes">   </li>
                  </ul>
               </div>
               <div class="bv-content-author-badges">
                  <ul class="bv-content-author-badges-list" role="presentation">              </ul>
               </div>
               <div class="bv-author-userinfo">
                  <ul role="list">
                     <li class="bv-author-cdv bv-first ">
                        <!-- UIA-7763 - removed default display so only translated strings matched by FB will display; can't remove defaultDisplay field entirely due to compilation errors, so used a value of '' --> <span class="bv-author-userinfo-data">Sexe </span> <span class="bv-author-userinfo-value">une femme</span> 
                     </li>
                     <li class="bv-author-cdv  ">
                        <!-- UIA-7763 - removed default display so only translated strings matched by FB will display; can't remove defaultDisplay field entirely due to compilation errors, so used a value of '' --> <span class="bv-author-userinfo-data">Age</span> <span class="bv-author-userinfo-value">18-24 ans</span> 
                     </li>
                     <li class="bv-author-cdv  ">
                        <!-- UIA-7763 - removed default display so only translated strings matched by FB will display; can't remove defaultDisplay field entirely due to compilation errors, so used a value of '' --> <span class="bv-author-userinfo-data">Couleur des yeux</span> <span class="bv-author-userinfo-value">Marrons</span> 
                     </li>
                     <li class="bv-author-cdv  bv-last">
                        <!-- UIA-7763 - removed default display so only translated strings matched by FB will display; can't remove defaultDisplay field entirely due to compilation errors, so used a value of '' --> <span class="bv-author-userinfo-data">Type de peau</span> <span class="bv-author-userinfo-value">Sèche</span> 
                     </li>
                  </ul>
               </div>
            </div>
         </div>
      </div>
      <div class="bv-content-item-author-profile-offset bv-content-item-author-profile-offset-on">
         <div class="bv-content-container">
            <div class="bv-content-core ">
               <div class="bv-content-header">
                  <div class="bv-content-data-summary">
                     <div class="bv-content-badges-container">
                        <ul class="bv-badge-summary bv-badge-first bv-badge-top-three" role="presentation">
                           <li class="bv-badge-image bv-badge-content-loyaltyyes--im-a-vib-rouge" role="presentation"> <img src="https://display.ugc.bazaarvoice.com/static/Sephora-FR/main_site/951/3232/fr_FR/images/badgeImages/loyaltyyes--im-a-vib-rouge.png" alt="Carte Gold" title="Carte Gold"> </li>
                        </ul>
                     </div>
                     <div class="bv-content-header-meta">
                        <span class="bv-content-rating bv-rating-ratio" itemprop="reviewRating" itemscope="" itemtype="http://schema.org/Rating">
                           <meta itemprop="ratingValue" content="5">
                           <meta itemprop="bestRating" content="5">
                           <span class="bv-rating-stars-container"> <abbr title="5 sur 5 étoiles." class="bv-rating bv-rating-stars bv-rating-stars-off" aria-hidden="true"> ★★★★★ </abbr> <abbr title="5 sur 5 étoiles." class="bv-rating-max bv-rating-stars bv-rating-stars-on bv-width-from-rating-stats-100" aria-hidden="true"> ★★★★★ </abbr> <span class="bv-off-screen">5 sur 5 étoiles.</span> </span> 
                        </span>
                        <div class="bv-content-meta-wrapper">
                           <div class="bv-content-meta" role="presentation">
                              <div class="bv-content-reference-data bv-content-author-name">
                                 <button type="button" class="bv-author bv-fullprofile-popup-target bv-focusable" aria-label="Voir le profil de Jo56." itemprop="author">
                                    <h3>Jo56</h3>
                                 </button>
                                 <div class="bv-content-datetime" role="presentation">
                                    <meta itemprop="dateCreated" content="2020-05-22">
                                    <meta itemprop="datePublished" content="2020-05-22">
                                    <span class="bv-content-datetime-dot" aria-hidden="true">·</span> <span class="bv-content-datetime-stamp">il y a 6 mois &nbsp;</span> 
                                 </div>
                              </div>
                           </div>
                        </div>
                     </div>
                     <div class="bv-content-title-container">
                        <h3 class="bv-content-title" itemprop="headline">    Excellent   </h3>
                     </div>
                  </div>
               </div>
               <div class="bv-content-details-offset-off">
                  <div class="bv-content-summary">
                     <div class="bv-content-summary-body" itemprop="reviewBody">
                        <div class="bv-content-summary-body-text">
                           <p>J’adore les parfums de cette marque car je trouve qu’ils sont captivant et surtout ils tiennent toute la journée ! Ils ont des odeurs originales et que l’on ne retrouve pas partout ! Je conseil fortement</p>
                        </div>
                        <div class="bv-content-data">
                           <div class="bv-content-product-questions">  </div>
                           <div class="bv-content-tag-dimensions">  </div>
                           <ul class="bv-content-data-recommend-yes">
                              <li class="bv-content-data-label-container"> <span class="bv-content-data-icon" aria-hidden="true">✔</span> <span class="bv-content-data-label">Oui</span>, </li>
                              <li class="bv-content-data-value">  je recommande ce produit. </li>
                           </ul>
                        </div>
                     </div>
                  </div>
               </div>
            </div>
         </div>
         <div class="bv-content-actions-container bv-active-feedback">
            <div data-bv-v="feedback:17" class="bv-feedback-container">
               <div class="bv-content-feedback-vote bv-content-feedback-vote-active" role="group" aria-label="Utilité du contenu">
                  <div class="bv-content-feedback-vote-request">
                     <p>Avez-vous trouvé cet avis utile ?</p>
                  </div>
                  <div class="bv-content-feedback-btn-container"> <button type="button" class="bv-content-btn bv-content-btn-feedback-yes bv-focusable" aria-label="2&nbsp;personnes ont trouvé cet avis utile. Oui, review de Jo56 est utile."> <span aria-hidden="true"> Oui · <span class="bv-content-btn-count" aria-hidden="true">2</span> </span> </button> <button type="button" class="bv-content-btn bv-content-btn-feedback-no bv-focusable" aria-label="0&nbsp;personne a trouvé cet avis inutile. Non, review de Jo56 n'est pas utile."> <span aria-hidden="true"> Non · <span class="bv-content-btn-count" aria-hidden="true">0</span> </span> </button> </div>
                  <div class="bv-content-feedback-vote bv-content-feedback-vote-active"> <button type="button" class="bv-content-report-btn bv-focusable" aria-label="Marquer «&nbsp;Excellent&nbsp;» de Jo56 comme inapproprié.">   Signalez un contenu inapproprié  </button> </div>
               </div>
            </div>
         </div>
         <div class="bv-inline-form-container"></div>
         <div data-bv-v="secondaryContentList:15" class="bv-secondary-content-list">
            <ol data-bv-v="secondaryContentItemCollection:16" class="bv-content-list bv-content-list-clientresponses" role="presentation">
            </ol>
         </div>
      </div>
   </li>
</ol>

For example, I tried the following:

response.css('li.data-content-id').extract()


But it returns an empty list.

Update

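After looking at other elements of the page in the developer tools, it seems that the data I am looking for is delivered in the batch.json document: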



The data is generated by this request:
https://api.bazaarvoice.com/data/batch.json?passkey=iohrnzjadededr160osgfvimy&apiversion=5.5&displaycode=3232-fr_fr&resource.q0=products&filter.q0=id%3Aeq%3AP618001&stats.q0=questions%2Creviews&filteredstats.q0=questions%2Creviews&filter_questions.q0=contentlocale%3Aeq%3Afr_FR&filter_answers.q0=contentlocale%3Aeq%3Afr_FR&filter_reviews.q0=contentlocale%3Aeq%3Afr_FR&filter_reviewcomments.q0=contentlocale%3Aeq%3Afr_FR&resource.q1=questions&filter.q1=productid%3Aeq%3AP618001&filter.q1=contentlocale%3Aeq%3Afr_FR&sort.q1=lastapprovedanswersubmissiontime%3Adesc&stats.q1=questions&filteredstats.q1=questions&include.q1=authors%2Cproducts%2Canswers&filter_questions.q1=contentlocale%3Aeq%3Afr_FR&filter_answers.q1=contentlocale%3Aeq%3Afr_FR&limit.q1=10&offset.q1=0&limit_answers.q1=10&resource.q2=reviews&filter.q2=isratingsonly%3Aeq%3Afalse&filter.q2=productid%3Aeq%3AP618001&filter.q2=contentlocale%3Aeq%3Afr_FR&sort.q2=submissiontime%3Adesc&stats.q2=reviews&filteredstats.q2=reviews&include.q2=authors%2Cproducts%2Ccomments&filter_reviews.q2=contentlocale%3Aeq%3Afr_FR&filter_reviewcomments.q2=contentlocale%3Aeq%3Afr_FR&filter_comments.q2=contentlocale%3Aeq%3Afr_FR&limit.q2=5&offset.q2=0&limit_comments.q2=3&callback=BV._internal.dataHandler0
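Since I want to automate this, I would like to know whether, and how, I can rebuild this request from the information it contains or from information I can get from the website.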

oxosxuxt #1

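Your code uses a class selector, but your HTML shows that data-content-id is an attribute. I am not familiar enough with Scrapy to say whether its CSS selectors support attribute selection, but you can select on data-content-id with XPath instead: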

response.xpath('//li[@data-content-id]')


vltsax25 #2

data-content-id may also carry other values for other kinds of content on the page, so the XPath above may pick up unwanted parts.
The following CSS should work:

response.css('[itemprop="review"]').get()

Or, if you really want to include data-content-id, then I believe we should use:

response.css('[data-content-id*="Reviews-"]').get()


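This selector matches every element whose data-content-id attribute contains Reviews-. We use the substring match (*=) here because the number after Reviews- looks like an ID that is different for each review element. Note that .get() returns only the first match; use .getall() to get all of them.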

xeufq47z #3

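The required data is not in the HTML code of the page!
It is loaded by a separate API call and rendered by JavaScript.
To see the raw HTML code you need (in Chrome) to use the View Source option (CTRL+U), not Inspect -> Elements.

Additional information: scraping dynamic content (docs)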
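As a rough sketch of what calling the API directly could look like: the helper below rebuilds a reduced version of the reviews part (q2) of the request shown in the question, and a second helper strips the JSONP wrapper that the `callback` parameter adds around the JSON. The function names are hypothetical, and it is an assumption that the API accepts this reduced parameter set; the parameter names and values themselves are copied from the question's URL. Where the passkey can be found in the page is not covered here.

```python
import json
import re
from urllib.parse import urlencode

API_URL = "https://api.bazaarvoice.com/data/batch.json"


def build_reviews_url(passkey, product_id, limit=5, offset=0):
    """Rebuild a reduced form of the q2 (reviews) query from the question.

    Assumption: the API tolerates dropping the q0/q1 sub-queries.
    """
    params = [
        ("passkey", passkey),
        ("apiversion", "5.5"),
        ("displaycode", "3232-fr_fr"),
        ("resource.q2", "reviews"),
        ("filter.q2", "isratingsonly:eq:false"),
        ("filter.q2", f"productid:eq:{product_id}"),
        ("filter.q2", "contentlocale:eq:fr_FR"),
        ("sort.q2", "submissiontime:desc"),
        ("include.q2", "authors,products,comments"),
        ("limit.q2", str(limit)),
        ("offset.q2", str(offset)),
    ]
    # urlencode percent-encodes ':' and ',' the same way as the original URL.
    return f"{API_URL}?{urlencode(params)}"


def unwrap_jsonp(text):
    """Parse a JSONP body such as callbackName({...}); fall back to plain JSON."""
    match = re.fullmatch(r"\s*[\w.$]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    return json.loads(match.group(1) if match else text)


# Values taken from the request in the question.
example_url = build_reviews_url("iohrnzjadededr160osgfvimy", "P618001")
print(example_url)

# Made-up payload, only to show the unwrapping step.
payload = unwrap_jsonp('BV._internal.dataHandler0({"ok": true})')
print(payload)
```

In a spider, one would presumably yield a `scrapy.Request` for the built URL and pass `response.text` to `unwrap_jsonp` in the callback; increasing `offset` by `limit` is the obvious candidate for paging, though that behaviour is not confirmed by the thread.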
