Viewing the full JSON on a webpage

Posted by xe55xuns on 2023-02-26 in Other

I have this webpage: https://www.reddit.com/r/FunnyandSad/comments/112yfey/really_surprised_how_this_didnt_become_a_big_news/
I want to extract all of the comments from it.
I learned how to do this in an earlier question (Converting JSON Lists into Data Frames):

library(jsonlite)
library(purrr)
library(dplyr)
library(tidyr)

URL  <- "https://www.reddit.com/r/FunnyandSad/comments/112yfey/really_surprised_how_this_didnt_become_a_big_news/.json"

results = fromJSON(URL) |>
  pluck("data", "children") |> 
  bind_rows() |>
  filter(row_number() > 1) |>
  unnest(data) |>
  select(id, author, body) |>
  mutate(comment_id = row_number(), .before = "id")

**My question:** When I look at the results, I see that only 37 comments were collected:

> dim(results)
[1] 37  4

But the actual page shows more than 1,000 comments.

Is there a way to modify the code above so that it extracts more comments? Is there a way to view the full JSON?

Thanks!

**Update:**

Following a suggestion in the comments, I tried the `read_json` function:

# results[[2]]$data$children[[i]]$data$body

results = read_json(URL)

body_list <- list()
for (i in seq_along(results[[2]]$data$children)) {
    body <- results[[2]]$data$children[[i]]$data$body
    body_list[[i]] <- body
}

But this returns only 36 comments rather than all of them.

Source: https://stackoverflow.com/questions/75549590/viewing-the-full-json-on-a-webpage

1 answer

6qftjkof1#

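The data has a nested structure: each comment carries its replies as another listing inside it. You can use the following function to expand it recursively: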

get_comments <- function(x) {
  if (is.null(x) || (length(x) ==1 && x=="")) return(NULL)
  result = list()
  if (is.null(names(x))) {
    for(p in x) {
      result = c(result, get_comments(p))
    }
  }
  else {
    if (x$kind == "Listing") {
      result = c(result, get_comments(x$data$children))
    } else if (x$kind == "t1") {
      result = c(result, list(x$data), get_comments(x$data$replies))
    }
  }
  if (length(result)>0) {
    result
  } else {
    NULL
  }
}

URL  <- "https://www.reddit.com/r/FunnyandSad/comments/112yfey/really_surprised_how_this_didnt_become_a_big_news/.json"
json <- jsonlite::read_json(URL)
comments <- get_comments(json)
sapply(comments, function(x) x$body)

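But this still returns only 198 values. The JSON also contains a number of "more" blocks that hold only comment IDs, and you need to make additional API calls to fetch those comments. See the morechildren endpoint for more details. It looks like you have to be authenticated to access those endpoints.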
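To see how many comments are hidden behind those "more" blocks, the same recursive walk can collect their IDs instead of the comment bodies. This is a minimal sketch, not part of the original answer: `get_more_ids` is a hypothetical helper name, and it assumes each "more" node stores its IDs in `data$children`, as in the listing returned by `read_json`. The `toy` list is a hand-built stand-in for the parsed JSON so the example runs without a network call.

```r
# Hypothetical helper: collect the comment IDs held by "more" placeholders.
get_more_ids <- function(x) {
  # Empty replies come back as "" rather than a Listing, so skip non-lists
  if (!is.list(x)) return(character(0))
  if (is.null(names(x))) {
    # Unnamed list: a sequence of nodes, so recurse into each one
    return(as.character(unlist(lapply(x, get_more_ids), use.names = FALSE)))
  }
  kind <- x[["kind"]]
  if (is.null(kind)) return(character(0))
  if (kind == "Listing") {
    get_more_ids(x$data$children)
  } else if (kind == "t1") {
    # A comment: its replies may contain further "more" nodes
    get_more_ids(x$data$replies)
  } else if (kind == "more") {
    # Assumed layout: the IDs of the unloaded comments live in data$children
    as.character(unlist(x$data$children))
  } else {
    character(0)
  }
}

# Hand-built stand-in for the parsed JSON (post listing + comment listing)
toy <- list(
  list(kind = "Listing", data = list(children = list(
    list(kind = "t3", data = list(title = "post"))
  ))),
  list(kind = "Listing", data = list(children = list(
    list(kind = "t1", data = list(body = "a", replies = list(
      kind = "Listing", data = list(children = list(
        list(kind = "more", data = list(children = list("id1", "id2")))
      ))
    ))),
    list(kind = "t1", data = list(body = "b", replies = "")),
    list(kind = "more", data = list(children = list("id3")))
  )))
)

more_ids <- get_more_ids(toy)
print(more_ids)
# [1] "id1" "id2" "id3"
```

Calling `get_more_ids(json)` on the object from the code above should give the IDs that would then have to be requested through the morechildren endpoint; `length()` of the result gives a rough idea of how many comments are still missing.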

Answered 2023-02-26
