Scrapy: how to reverse engineer the generation of a POST request body

sqougxex  posted on 2022-11-09 in Other

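I'm trying to scrape reviews from Google Play. Google Play loads reviews dynamically when the page is scrolled to the bottom. I intercepted the POST requests the browser sends to retrieve the reviews and found that the only thing that changes between requests is the request body. What I'm having trouble understanding is how the request body is generated.
The body of the first request looks like this: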

f.req: [[["UsvDTd","[null,null,[2,null,[40,null,\"CpUBCpIBKm0KOfc7ms0D_z7jKJielp7Fz8_Pz8_Pms3OzpuZyJvMnMXOxYmSxc3MyczPz8vIycjMysbHxszPysb__hAoITbZQaENmbWoMU2VCwWZPGwZOdccwQD8MmXEUABaCwlwT4zmNQBa2BADYMm1lu0EMiEKHwodYW5kcm9pZF9oZWxwZnVsbmVzc19xc2NvcmVfdjI\"],null,[]],[\"com.feelingtouch.zf3d\",7]]",null,"generic"]]]

This is the second request:

f.req: [[["UsvDTd","[null,null,[2,null,[40,null,\"CpUBCpIBKm0KOfc7msyg_28-Rpielp7Fz8_Pz8_Pm56eypyZzcycm8XOxYmSxc3MyczPz8vIycjMysbHxszPysb__hB4ITbZQaENmbWoMZI5V7V-7g3BObnBkABfM2XEUABaCwli2aizD1W9ExADYMm1lu0EMiEKHwodYW5kcm9pZF9oZWxwZnVsbmVzc19xc2NvcmVfdjI\"],null,[]],[\"com.feelingtouch.zf3d\",7]]",null,"generic"]]]

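Is there a way I can reverse engineer how the request body is generated?
I tried using Selenium, but after scrolling down a few dozen times, RAM usage climbs and Selenium becomes unresponsive.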

oaxa6hgo  #1

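What changes is the pagination token. However, there are a few other things as well.
Here is the full encoded request body, with the parameters wrapped in #{} (number_of_results, pagination_token, and product_id):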

f.req=%5B%5B%5B%22UsvDTd%22%2C%22%5Bnull%2Cnull%2C%5B2%2Cnull%2C%5B#{number_of_results}%2Cnull%2C#{pagination_token}%5D%2Cnull%2C%5B%5D%5D%2C%5B%5C%22#{product_id}%5C%22%2C7%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D
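
The template can be reproduced in Python by serializing the nested arrays twice and then form-encoding the result. This is a minimal sketch based only on the bodies shown in the question; the function name `build_reviews_body` is hypothetical. It wraps the token in escaped quotes, as in the captured requests, and the default of 40 results is simply the value seen in those requests.

```python
import json
from urllib.parse import urlencode


def build_reviews_body(product_id: str, pagination_token: str,
                       number_of_results: int = 40) -> str:
    """Return the form-encoded `f.req=...` body for one page of reviews."""
    # Inner payload, laid out as in the captured requests.
    inner = [
        None,
        None,
        [2, None, [number_of_results, None, pagination_token], None, []],
        [product_id, 7],
    ]
    # The inner payload travels as a JSON *string* inside the outer array,
    # which is why its quotes appear escaped (\") in the captured body.
    outer = [[["UsvDTd", json.dumps(inner, separators=(",", ":")), None, "generic"]]]
    # urlencode percent-encodes brackets, quotes, commas and backslashes.
    return urlencode({"f.req": json.dumps(outer, separators=(",", ":"))})
```

Building the body from data structures rather than string substitution avoids having to escape the token by hand.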

So every time you scroll the page, pagination_token changes. They use it to retrieve the next page of results.
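
To see where the token sits in a captured request, the `f.req` value can be unpacked as JSON nested inside JSON. The sketch below assumes the layout shown in the question's two bodies; the function name `parse_reviews_body` is hypothetical, and `TOKEN` stands in for a real pagination token.

```python
import json


def parse_reviews_body(f_req: str) -> dict:
    """Extract the variable fields from a decoded (not URL-encoded) f.req value."""
    outer = json.loads(f_req)
    # The second element of the innermost outer list is itself a JSON string.
    inner = json.loads(outer[0][0][1])
    number_of_results, _, pagination_token = inner[2][2]
    return {
        "number_of_results": number_of_results,
        "pagination_token": pagination_token,
        "product_id": inner[3][0],
    }


# Shortened sample in the same shape as the captured bodies.
sample = r'[[["UsvDTd","[null,null,[2,null,[40,null,\"TOKEN\"],null,[]],[\"com.feelingtouch.zf3d\",7]]",null,"generic"]]]'
fields = parse_reviews_body(sample)
```

Applied to the two bodies in the question, the result count and product ID should come out identical, with only the token differing.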
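You don't need to reverse engineer the token itself. You can find the first token by inspecting the page source, and each response you retrieve includes the next_page_token. So you just keep replacing the token until you reach the last page and have retrieved all the reviews.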
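
The token-replacement loop could look like the following. It is a sketch only: `fetch_page` is a hypothetical callable that you would write yourself to send the POST request with a given token and return the parsed reviews together with the next token (or `None` on the last page). The response format is not shown in this thread, so parsing is left out.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: token in, (reviews, next_token or None) out.
FetchPage = Callable[[str], Tuple[List[dict], Optional[str]]]


def iter_reviews(fetch_page: FetchPage, first_token: str,
                 max_pages: int = 1000) -> Iterator[dict]:
    """Yield reviews page by page, following the token from each response."""
    token = first_token
    for _ in range(max_pages):  # hard cap so a bad token cannot loop forever
        reviews, next_token = fetch_page(token)
        yield from reviews
        # Stop when no further token is returned or the token stops changing.
        if not next_token or next_token == token:
            break
        token = next_token
```

Because this is a generator, reviews can be processed as they arrive instead of being held in memory all at once, which avoids the RAM growth seen with a scrolling browser.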
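Alternatively, you can use a third-party solution such as SerpApi. We handle proxies, solve CAPTCHAs, and parse all the rich structured data for you.
Example Python code to retrieve reviews for the YouTube app (also available in other libraries):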

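# Requires SerpApi's Python client (the google-search-results package on PyPI).
from serpapi import GoogleSearch

params = {
  "api_key": "SECRET_API_KEY",  # replace with your own SerpApi key
  "engine": "google_play_product",
  "store": "apps",
  "gl": "us",
  "product_id": "com.google.android.youtube",
  "all_reviews": "true"
}

search = GoogleSearch(params)
results = search.get_dict()  # parsed JSON response as a Python dict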

Example JSON output:

"reviews": [
    {
      "title": "Qwerty Jones",
      "avatar": "https://play-lh.googleusercontent.com/a/AATXAJwSQC_a0OIQqkAkzuw8nAxt4vrVBgvkmwoSiEZ3=mo",
      "rating": 3,
      "snippet": "Overall a great app. Lots of videos to see, look at shorts, learn hacks, etc. However, every time I want to go on the app, it says I need to update the game and that it's \"not the current version\". I've done it about 3 times now, and it's starting to get ridiculous. It could just be my device, but try to update me if you have any clue how to fix this. Thanks :)",
      "likes": 586,
      "date": "November 26, 2021"
    },
    {
      "title": "matthew baxter",
      "avatar": "https://play-lh.googleusercontent.com/a/AATXAJy9NbOSrGscHXhJu8wmwBvR4iD-BiApImKfD2RN=mo",
      "rating": 1,
      "snippet": "App is broken, every video shows no dislikes even after I hit the button. I've tested this with multiple videos and now my recommended is all messed up because of it. The ads are longer than the videos that I'm trying to watch and there is always a second ad after the first one. This app seriously sucks. I would not recommend this app to anyone.",
      "likes": 352,
      "date": "November 28, 2021"
    },
    {
      "title": "Operation Blackout",
      "avatar": "https://play-lh.googleusercontent.com/a-/AOh14GjMRxVZafTAmwYA5xtamcfQbp0-rUWFRx_JzQML",
      "rating": 2,
      "snippet": "YouTube used to be great, but now theyve made questionable and arguably stupid decisions that have effectively ruined the platform. For instance, you now have the grand chance of getting 30 seconds of unskipable ad time before the start of a video (or even in the middle of it)! This happens so frequently that its actually a feasible option to buy an ad blocker just for YouTube itself... In correlation with this, YouTube is so sensitive twords the public they decided to remove dislikes. Why????",
      "likes": 370,
      "date": "November 24, 2021"
    },
    ...
  ],
  "serpapi_pagination": {
    "next": "https://serpapi.com/search.json?all_reviews=true&engine=google_play_product&gl=us&hl=en&next_page_token=CpEBCo4BKmgKR_8AwEEujFG0VLQA___-9zuazVT_jmsbmJ6WnsXPz8_Pz8_PxsfJx5vJns3Gxc7FiZLFxsrLysnHx8rIx87Mx8nNzsnLyv_-ECghlTCOpBLShpdQAFoLCZiJujt_EovhEANgmOjCATIiCiAKHmFuZHJvaWRfaGVscGZ1bG5lc3NfcXNjb3JlX3YyYQ&product_id=com.google.android.youtube&store=apps",
    "next_page_token": "CpEBCo4BKmgKR_8AwEEujFG0VLQA___-9zuazVT_jmsbmJ6WnsXPz8_Pz8_PxsfJx5vJns3Gxc7FiZLFxsrLysnHx8rIx87Mx8nNzsnLyv_-ECghlTCOpBLShpdQAFoLCZiJujt_EovhEANgmOjCATIiCiAKHmFuZHJvaWRfaGVscGZ1bG5lc3NfcXNjb3JlX3YyYQ"
  }
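
To page through this output, only the review list and the `next_page_token` under `serpapi_pagination` are needed. A small sketch, assuming the result dict has the shape shown above; the helper name `split_page` is hypothetical, and treating a missing token as the last page is an assumption.

```python
from typing import List, Optional, Tuple


def split_page(results: dict) -> Tuple[List[dict], Optional[str]]:
    """Return (reviews, next_page_token) from one result dict."""
    reviews = results.get("reviews", [])
    # Assumption: the pagination block or its token is absent on the last page.
    next_token = results.get("serpapi_pagination", {}).get("next_page_token")
    return reviews, next_token
```

The token would then be sent back as the `next_page_token` parameter, which is how it appears in the `next` URL above.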

Check out the documentation for more details.
Test the search live on the playground.

Disclaimer: I work at SerpApi.
