python-3.x: Problem scraping game data from Metacritic

Posted by kgsdhlau 11 months ago in Python

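I'm using BeautifulSoup to scrape game data from Metacritic. I'm trying to get each critic's score and review text. I thought everything was working, but when I looked at the response I saw things like this: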

class="c-siteReviewPlaceholder_header"

The site itself has no classes with the word "placeholder" in them. I know I need to target this specific class:

class_="c-pageProductReviews_row"


This is what my code looks like:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.metacritic.com/game/alien-isolation/critic-reviews/?platform=playstation-4'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
           'AppleWebKit/537.36 (KHTML, like Gecko) '
           'Chrome/75.0.3770.80 Safari/537.36'}

# Download the static HTML of the critic-reviews page
critic_review_page = requests.get(URL, headers=headers)
soup = BeautifulSoup(critic_review_page.content, "html.parser")

# Select every review row container
critic_review_rows = soup.find_all("div", class_="c-pageProductReviews_row")
print(critic_review_rows)


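When I print critic_review_rows, I see many classes containing the word "placeholder". I don't know whether Metacritic is blocking me from scraping the site or something else is going on. It's as if the data hasn't loaded yet at the moment I scrape it.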
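One way to test the "data hasn't loaded yet" suspicion is to look at the raw HTML that requests received, before any JavaScript runs, and count the elements whose class names mention placeholders. A minimal sketch using the standard library's html.parser; `count_class_matches` is a hypothetical helper, and matching a case-insensitive substring of the class attribute is an assumption about how the placeholder classes are named:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value and self.needle in value.lower():
                self.count += 1
                break


def count_class_matches(html_text, needle):
    """Hypothetical helper: number of elements whose class contains `needle`."""
    parser = ClassCounter(needle)
    parser.feed(html_text)
    parser.close()
    return parser.count


# Small hand-written sample, not real Metacritic markup
sample = (
    '<div class="c-pageProductReviews_row">'
    '<div class="c-siteReviewPlaceholder_header"></div>'
    "</div>"
)
print(count_class_matches(sample, "placeholder"))               # 1
print(count_class_matches(sample, "c-pageProductReviews_row"))  # 1
```

With the code from the question you would call, for example, `count_class_matches(critic_review_page.text, "placeholder")`. A high count together with missing review text suggests the rows are filled in client-side by JavaScript, which requests does not execute.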

Answer #1, by c3frrgcw:

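The main problem here is that the content is rendered dynamically by JavaScript. requests can't handle that: it doesn't work like a browser and only gives you the initial, static response.
The initial state is stored in a script at the end of the page source, so you could extract it from there, but a better approach is to call the API that the page uses: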

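import requests

# Endpoint the page calls to load its critic reviews
url = 'https://fandom-prod.apigee.net/v1/xapi/reviews/metacritic/critic/games/alien-isolation/platform/playstation-4/web?apiKey=1MOZgmNFxvmljaQR1X9KAij9Mo4xAY3u&offset=0&limit=50&sort=score&componentType=ReviewList'

# The endpoint returns JSON, so parse it directly instead of scraping HTML
data = requests.get(url).json()
print(data)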
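The API URL carries `offset=0&limit=50`, which suggests the reviews come back in pages. Assuming `offset` and `limit` behave as ordinary offset/limit pagination parameters (not confirmed in this thread), a minimal sketch that builds one URL per page with `urllib.parse`; `with_offset` and `page_urls` are hypothetical helpers, and `total` (the number of reviews) has to come from you, for example from the review count shown on the page:

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_offset(url, offset):
    """Return `url` with its `offset` query parameter replaced, other params kept."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query["offset"] = [str(offset)]
    # doseq=True encodes each list value as a plain key=value pair
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def page_urls(url, total, limit=50):
    """Hypothetical helper: one URL per page of `limit` items until `total` is covered."""
    return [with_offset(url, offset) for offset in range(0, total, limit)]


# Placeholder base URL with a dummy key, shaped like the one in the answer
base = "https://example.com/api?apiKey=KEY&offset=0&limit=50&sort=score"
for page_url in page_urls(base, total=120, limit=50):
    print(page_url)
```

Each generated URL could then be fetched with `requests.get(page_url).json()` as in the answer. Rebuilding the query with `parse_qs`/`urlencode` leaves the other parameters (`apiKey`, `sort`, `componentType`) untouched instead of relying on string replacement.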


Answer #2, by ufj5ltwl:

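The data you're looking for is embedded in the page as JavaScript inside a <script> element. To parse some of the review information from it, you can use, for example: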

import re
from ast import literal_eval

import requests

url = "https://www.metacritic.com/game/alien-isolation/critic-reviews/?platform=playstation-4"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0"
}

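html_text = requests.get(url, headers=headers).text

# Pull quote, score, URL and date out of the JavaScript state embedded in the page
for q, s, u, d in re.findall(
    r'quote:"(.*?)",.*?score:(\d+).*?url:"(.*?)",.*?date:"(.*?)"', html_text
):
    print(s)  # critic score
    print(q)  # review excerpt
    # Decode escape sequences in the URL string (e.g. \u002F -> /)
    print(literal_eval(f'"{u}"'))
    print(d)  # review date
    print("-" * 80)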

Prints:

90
The permanent threat of death keeps you forced to the ground – we can’t remember the last game where we willingly snuck around so much – and it feels like the claustrophobic corridors and catwalks of the Sevastopol were built from that angle. Being crouched, looking up at everything… it really does give you that feeling that Creative Assembly wanted it all along – that ‘prey being hunted’ effect. It makes a refreshing change from the feeling of being overpowered and able to kill anything that appears.
http://www.play-mag.co.uk/reviews/ps4-reviews/alien-isolation-review-2/
2014-10-03
--------------------------------------------------------------------------------
90
A masterwork of atmosphere and environmental design, Alien: Isolation may be standing on the shoulders of giants, but it's also one of the few franchise games to do so that doesn't topple off disastrously.
http://www.videogamer.com/reviews/alien_isolation_review.html
2014-10-03
--------------------------------------------------------------------------------
90
It’s comfortably the best Alien game ever made, and delivers authenticity along with a new story that is worth seeing, experiencing, and fleeing from into the darkness. Never once allowing the immersion to be broken, Creative Assembly have done it. They have actually done it.
http://www.godisageek.com/2014/10/alien-isolation-review/
2014-10-03
--------------------------------------------------------------------------------

...
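To see what the regular expression captures without hitting the network, here is the same pattern run against a small, made-up fragment in the shape the answer expects (unquoted keys, double-quoted values, a URL containing `\u002F` escapes). The fragment and the `parse_reviews` helper are hypothetical, not copied from Metacritic:

```python
import re
from ast import literal_eval

# Same pattern as in the answer
REVIEW_RE = re.compile(r'quote:"(.*?)",.*?score:(\d+).*?url:"(.*?)",.*?date:"(.*?)"')

# Hypothetical fragment shaped like the embedded JavaScript state
fragment = (
    '{quote:"Tense and atmospheric.",score:90,'
    'url:"http:\\u002F\\u002Fexample.com\\u002Freview",date:"2014-10-03"}'
)


def parse_reviews(text):
    """Hypothetical helper: return (score, quote, url, date) tuples found in `text`."""
    reviews = []
    for quote, score, url, date in REVIEW_RE.findall(text):
        # literal_eval reads the captured text as a Python string literal,
        # turning escape sequences such as \u002F back into "/"
        reviews.append((int(score), quote, literal_eval(f'"{url}"'), date))
    return reviews


print(parse_reviews(fragment))
```

The `literal_eval` step works because JavaScript and Python share the `\uXXXX` escape syntax; it is a convenience rather than a full JavaScript string parser, so unusual escapes or quotes inside a value may need extra handling.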
