R: How to hoist latitude and longitude from a dataset in one go

Posted by vxf3dgd4 on 2023-11-14 in Other

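I've recently been playing with nested lists and have been able to extract data from deep inside them. I've run into a small problem with the tidyr function hoist(). I can pull the latitude and longitude for all 7 addresses in the 5 cities, but only with two separate commands. I'm wondering whether hoist() can access the list structure in a way that extracts lat and lng with a single command. Here is the example: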

library(tidyr)
library(dplyr)
library(repurrrsive)

gmaps_cities_o <- repurrrsive::gmaps_cities
gmaps_cities_o

Output:

A tibble:5 × 2
  city         json
  <chr>       <list>
 Houston     <list [2]>         
 Washington  <list [2]>         
 New York    <list [2]>         
 Chicago     <list [2]>         
 Arlington   <list [2]>         
5 rows


To extract lat and lng, I have to write two pieces of code:

# extract lat, long for the first address
gmaps_cities_o %>% 
    hoist(json, 
           lat = list("results", 1, "geometry", "location", "lat"),
           lng = list("results", 1, "geometry", "location", "lng")
           )


Output:

A tibble:5 × 4
 city        lat          lng         json
 <chr>       <dbl>        <dbl>      <list>
Houston     29.76043    -95.36980   <list [2]>  
Washington  47.75107    -120.74014  <list [2]>  
New York    40.71278    -74.00597   <list [2]>  
Chicago     41.87811    -87.62980   <list [2]>  
Arlington   32.73569    -97.10807   <list [2]>  
5 rows


For the second address:

# extract lat, long for the second address
gmaps_cities_o %>% 
    hoist(json, 
           lat = list("results", 2, "geometry", "location", "lat"),
           lng = list("results", 2, "geometry", "location", "lng")
           )


Output:

A tibble:5 × 4
 city          lat          lng        json
 <chr>        <dbl>        <dbl>      <list>
Houston           NA           NA   <list [2]>  
Washington  38.90719    -77.03687   <list [2]>  
New York          NA           NA   <list [2]>  
Chicago           NA           NA   <list [2]>  
Arlington   38.87997    -77.10677   <list [2]>  
5 rows


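So it takes two separate operations to get lat and lng for the 7 addresses across the 5 cities.
I can also extract lat and lng in a single pipeline with this code, which unnests the list first: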

gmaps_cities_o %>% 
    unnest_wider(json) %>% 
    unnest_longer(results) %>% 
    hoist(results,
          lat = list("geometry", "location", "lat"),
          lng = list("geometry", "location", "lng")
          ) %>% 
    select(city, lat, lng)


Output:

A tibble:7 × 3
 city         lat          lng
 <chr>       <dbl>        <dbl>
Houston     29.76043    -95.36980       
Washington  47.75107   -120.74014       
Washington  38.90719    -77.03687       
New York    40.71278    -74.00597       
Chicago     41.87811    -87.62980       
Arlington   32.73569    -97.10807       
Arlington   38.87997    -77.10677       
7 rows

But I can't do the same with hoist() in a single operation, which doesn't seem right. What should go in place of (?) below?

gmaps_cities_o %>% 
    hoist(json, 
           lat = list("results", (?), "geometry", "location", "lat"),
           lng = list("results", (?), "geometry", "location", "lng")
           )

Could someone with experience in nested lists give me a hand?


u2nhd7ah1#

This is inspired by the nice answer by I_O, but it probably deserves to be a separate answer. You can create a function my_hoist:

my_hoist <- function(x, path) {
    x_flat <- unlist(x)
    x_flat[grepl(paste(path, collapse = "\\."), names(x_flat))]
}

This can be used in a similar way to hoist(), but without specifying the index:

gmaps_cities_o |>
    group_by(city) |>
    reframe(
        lat = my_hoist(json, c("results", "geometry", "location", "lat")),
        lng = my_hoist(json, c("results", "geometry", "location", "lng")),
    )

# # A tibble: 7 × 3
#   city       lat        lng         
#   <chr>      <chr>      <chr>       
# 1 Arlington  32.735687  -97.1080656 
# 2 Arlington  38.8799697 -77.1067698 
# 3 Chicago    41.8781136 -87.6297982 
# 4 Houston    29.7604267 -95.3698028 
# 5 New York   40.7127753 -74.0059728 
# 6 Washington 47.7510741 -120.7401386
# 7 Washington 38.9071923 -77.0368707
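
Note that lat and lng come back as <chr> here: unlist() coerces every leaf to a common type, and the list also contains strings. Below is a minimal sketch of a numeric variant, assuming every matched leaf is number-like. The names my_hoist_num and toy_json are hypothetical, made up for illustration; toy_json is a small stand-in for one element of the json column, not the real gmaps_cities data.

```r
# Toy stand-in for one element of the `json` column (hypothetical data).
toy_json <- list(
  results = list(
    list(formatted_address = "A",
         geometry = list(location = list(lat = 1.5, lng = -2.5))),
    list(formatted_address = "B",
         geometry = list(location = list(lat = 3.5, lng = -4.5)))
  ),
  status = "OK"
)

# Same idea as my_hoist(), but the pattern is anchored at the end of the
# name and the matches are converted back to numeric, because unlist()
# has coerced everything to character.
my_hoist_num <- function(x, path) {
  x_flat <- unlist(x)
  pattern <- paste0(paste(path, collapse = "\\."), "$")
  as.numeric(x_flat[grepl(pattern, names(x_flat))])
}

lat <- my_hoist_num(toy_json, c("results", "geometry", "location", "lat"))
lng <- my_hoist_num(toy_json, c("results", "geometry", "location", "lng"))
```

Inside reframe() this variant could stand in for my_hoist() if numeric columns are wanted; wrapping the original call in as.numeric() would do the same job.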


6za6bjd02#

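If you are fine with base R's rapply (which recursively applies a function to a list) instead of hoist, you could do the following. Note that the edited code below flattens the list with unlist().

**Edit:** incorporating @SamR's helpful comment and adding some reshaping: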

library(dplyr)
library(tidyr)        # needed for pivot_wider()
library(repurrrsive)

gmaps_cities_o |>
    group_by(city) |>
    reframe(prop_value = json |> unlist(),
            prop_name = names(prop_value)
            ) |>
    filter(grepl('results\\.geometry\\.location\\.(lat|lng)', prop_name)) |>
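    ## reshape and clean up (strip the path prefix before grouping,
    ## so the grouping column is not modified inside a grouped mutate()):
    mutate(prop_name = gsub('.*\\.', '', prop_name)) |>
    group_by(prop_name, city) |>
    mutate(coords_no = row_number()) |>
    ungroup() |>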
    pivot_wider(id_cols = c(coords_no, city),
                names_from = prop_name,
                values_from = prop_value
                )

which gives:

## # A tibble: 7 x 4
##   coords_no city       lat        lng         
##       <int> <chr>      <chr>      <chr>       
## 1         1 Arlington  32.735687  -97.1080656 
## 2         2 Arlington  38.8799697 -77.1067698 
## 3         1 Chicago    41.8781136 -87.6297982 
## 4         1 Houston    29.7604267 -95.3698028 
## 5         1 New York   40.7127753 -74.0059728 
## 6         1 Washington 47.7510741 -120.7401386
## 7         2 Washington 38.9071923 -77.0368707
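
The rapply() route named at the top of this answer is not shown in the edited code, which relies on unlist(). Here is a minimal base R sketch of it, run on a hypothetical toy list (toy_json) rather than the real gmaps_cities data. With classes = "numeric" only the double leaves are kept, so the coordinates stay numeric instead of being coerced to character.

```r
# Toy stand-in for one element of the `json` column (hypothetical data).
toy_json <- list(
  results = list(
    list(formatted_address = "A",
         geometry = list(location = list(lat = 1.5, lng = -2.5))),
    list(formatted_address = "B",
         geometry = list(location = list(lat = 3.5, lng = -4.5)))
  ),
  status = "OK"
)

# Visit every leaf; numeric leaves are returned unchanged, all other
# leaves are replaced by the default (NULL) and vanish when unlisted.
nums <- rapply(toy_json, function(leaf) leaf,
               classes = "numeric", how = "unlist")

# The names are dot-joined paths, so coordinates can be picked by suffix.
lat <- unname(nums[grepl("geometry\\.location\\.lat$", names(nums))])
lng <- unname(nums[grepl("geometry\\.location\\.lng$", names(nums))])

coords <- data.frame(lat = lat, lng = lng)
```

On the real data the same call would presumably be applied per city, for example inside reframe() as in the pipeline above; that step is left out of the sketch.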
