I have a data frame in R ("my_file") that looks like this:
NAME Address_Parse
1 name1 [('372', 'StreetNumber'), ('river', 'StreetName'), ('St', 'StreetType'), ('S', 'StreetDirection'), ('toronto', 'Municipality'), ('ON', 'Province'), ('A1C', 'PostalCode'), ('9R7', 'PostalCode')]
2 name2 [('208', 'StreetNumber'), ('ocean', 'StreetName'), ('St', 'StreetType'), ('E', 'StreetDirection'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('J8N', 'PostalCode'), ('1G8', 'PostalCode')]
In case the structure is unclear, here is the data in reproducible form:
my_file = structure(list(NAME = c("name1", "name2"), Address_Parse = c("[('372', 'StreetNumber'), ('river', 'StreetName'), ('St', 'StreetType'), ('S', 'StreetDirection'), ('toronto', 'Municipality'), ('ON', 'Province'), ('A1C', 'PostalCode'), ('9R7', 'PostalCode')]",
"[('208', 'StreetNumber'), ('ocean', 'StreetName'), ('St', 'StreetType'), ('E', 'StreetDirection'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('J8N', 'PostalCode'), ('1G8', 'PostalCode')]"
)), class = "data.frame", row.names = c(NA, -2L))
Goal: for each row, I want to take each "element" (e.g. "StreetNumber", "StreetName", "StreetType", etc.) and turn it into a new column. The result would look like this:
name StreetNumber StreetName StreetType StreetDirection Municipality Province PostalCode
1 name1 372 river St S toronto ON A1C9R7
2 name2 208 ocean St E Toronto ON J8N1G8
The address field looks like JSON to me (I may be wrong), so I looked into different ways of parsing JSON. For example, I tried applying the answer given in "R: convert nested JSON in a data frame column to addtional columns in the same data frame":
library(dplyr)
library(tidyr)
library(purrr)
library(jsonlite)
final = my_file %>%
  mutate(
    json_parsed = map(Address_Parse, ~ fromJSON(., flatten=TRUE))
  ) %>%
  unnest(json_parsed)
However, this produces the following error:
Error in `mutate()`:
! Problem while computing `json_parsed = map(Address_Parse, ~fromJSON(., flatten = TRUE))`.
Caused by error:
! lexical error: invalid char in json text.
[('372', 'StreetNumber'), ('rive
(right here) ------^
Run `rlang::last_error()` to see where the error occurred.
I then tried another approach:
final <- my_file %>%
  rowwise() %>%
  do(data.frame(fromJSON(.$Address_Parse , flatten = T))) %>%
  ungroup() %>%
  bind_cols(my_file %>% select(-Address_Parse ))
But this fails with essentially the same error:
Error: lexical error: invalid char in json text.
[('372', 'StreetNumber'), ('rive
(right here) ------^
Can anyone tell me how to fix this?
Thank you!
2 Answers
#1 06odsfpq
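You may need to reformat the JSON slightly to get this to work.
I used the stream_in function instead of fromJSON, because it is usually faster and handles many things automatically.
#2 qc6wkl3g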
Before using fromJSON, we probably need to modify the text a bit: that is, use "key": value pairs rather than (value, 'key'), and insert { and } before and after the [ and ].
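The text-rewriting idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the answerer's original code: because "PostalCode" appears twice per address, it rewrites each tuple as a two-element JSON array instead of a "key": value object, and it assumes the values never contain quotes or parentheses. The helper name parse_address is hypothetical.

```r
library(jsonlite)

# Sample data copied from the question.
my_file <- data.frame(
  NAME = c("name1", "name2"),
  Address_Parse = c(
    "[('372', 'StreetNumber'), ('river', 'StreetName'), ('St', 'StreetType'), ('S', 'StreetDirection'), ('toronto', 'Municipality'), ('ON', 'Province'), ('A1C', 'PostalCode'), ('9R7', 'PostalCode')]",
    "[('208', 'StreetNumber'), ('ocean', 'StreetName'), ('St', 'StreetType'), ('E', 'StreetDirection'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('J8N', 'PostalCode'), ('1G8', 'PostalCode')]"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse one tuple string into a one-row data frame.
parse_address <- function(x) {
  # Assumption: values hold no quotes or parentheses, so a plain character
  # swap turns ('v', 'k') into ["v", "k"], which is valid JSON.
  json <- chartr("()'", "[]\"", x)
  m <- fromJSON(json)  # character matrix: column 1 = value, column 2 = key
  keys <- unique(m[, 2])
  # Concatenate values that share a key (the two PostalCode halves).
  vals <- vapply(
    keys,
    function(k) paste(m[m[, 2] == k, 1], collapse = ""),
    character(1)
  )
  as.data.frame(as.list(vals), stringsAsFactors = FALSE)
}

# rbind() assumes every row yields the same set of keys.
parsed <- do.call(rbind, lapply(my_file$Address_Parse, parse_address))
final <- data.frame(name = my_file$NAME, parsed, stringsAsFactors = FALSE)
print(final)
```

If some addresses lack certain elements, dplyr::bind_rows could replace rbind here, since it fills missing columns with NA.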
Alternatively, use reticulate, since the string appears to be a list of Python tuples.
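A rough sketch of the reticulate route, assuming a working Python installation that reticulate can find. It uses Python's ast.literal_eval to read the string as a Python literal; the variable names are illustrative only, and the shortened sample string is taken from the first row of the question.

```r
library(reticulate)

# Shortened sample string based on the first row of the question.
x <- "[('372', 'StreetNumber'), ('river', 'StreetName'), ('A1C', 'PostalCode'), ('9R7', 'PostalCode')]"

# ast.literal_eval only evaluates Python literals, so it does not run
# arbitrary code found in the string.
ast <- import("ast")
pairs <- ast$literal_eval(x)  # Python list of tuples, converted to an R list

values <- vapply(pairs, function(p) p[[1]], character(1))
keys   <- vapply(pairs, function(p) p[[2]], character(1))

# Join values that share a key, keeping keys in order of first appearance.
address <- tapply(values, factor(keys, levels = unique(keys)), paste, collapse = "")
print(address)
```

The same steps could be wrapped in a function and applied to every element of my_file$Address_Parse.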