Extracting coordinate data from a FeatureCollection stored in CSV format in R

vxqlmq5t  posted on 2023-08-08 in Other

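I have data that currently sits in a CSV file, in which one column is named "journeyroute". That column contains the following data [truncated due to size]: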

{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}

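There are five thousand rows of data. What I am trying to do is extract the LineString data for use in R, but I am stuck. Can anyone help?
I tried converting to JSON and then unnesting, but got an error (the code is adapted from another answer that used Google Earth Engine):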

new_df <- df %>%
    mutate(geo = map(Journey.Route, ~ jsonlite::fromJSON(.))) %>%
    as.data.frame() %>%
    unnest(geo) %>%
    filter(geo != "FeatureCollection") %>%
    mutate(coord = rep(c("x", "y"))) %>%
    pivot_wider(names_from = coord, values_from = coordinates)

Error in `mutate()`:
ℹ In argument: `coord = rep(c("x", "y"))`.
Caused by error:
! `coord` must be size 5000 or 1, not 2.
Run `rlang::last_trace()` to see where the error occurred.


The expected output is an sf geometry column holding the LineString coordinates.

xa9qqrwz  #1

library(geojsonsf) can read a vector of GeoJSON strings, so no row-wise operations are needed.

  • Create some data
json <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'

df <- data.frame(json = rep(json, 3))


  • Convert to an sf object
sf <- geojsonsf::geojson_sf(df$json)

  • Do whatever else you need with the data
## Remove empty geometries
sf <- sf[ !sf::st_is_empty(sf), ]

## Extract just the LINESTRINGS
sf <- sf[sf::st_geometry_type(sf) == "LINESTRING", ]

## Convert to a long data.frame
df <- sfheaders::sf_to_df(sf = sf, fill = TRUE)
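
For the CSV starting point described in the question, the same steps might look like the sketch below. It is only an illustration: the temporary file, the shortened sample string and the names `csv_path`, `sf_all` and `sf_lines` are assumptions rather than part of the original data, and the column is assumed to be called `journeyroute`.

```r
library(sf)
library(geojsonsf)

# Shortened version of the sample FeatureCollection (assumption: fewer points)
json <- paste0(
  '{"type": "FeatureCollection", "features": [',
  '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, ',
  '{"type": "Feature", "geometry": null, "properties": {"name": "end"}}, ',
  '{"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.09597, 50.410397]]}, "properties": {"name": "Raw"}}]}'
)

# Hypothetical stand-in for the real file: a temporary CSV with three rows
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(journeyroute = rep(json, 3)), csv_path, row.names = FALSE)

# Read the CSV back, keeping the GeoJSON column as plain character strings
df <- read.csv(csv_path, stringsAsFactors = FALSE)

# Parse every row in one call; all features are returned in a single sf object
sf_all <- geojson_sf(df$journeyroute)

# Drop the null ("end") geometries, then keep only the LineStrings
sf_all   <- sf_all[!st_is_empty(sf_all), ]
sf_lines <- sf_all[st_geometry_type(sf_all) == "LINESTRING", ]
```

With the real data, `csv_path` would simply point at the existing file and the `write.csv()` step would be dropped.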

ru9i0ody  #2

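Since we are dealing with GeoJSON strings, we can parse them with sf::st_read(), or switch to geojsonsf::geojson_sfc() for a performance boost (about 2x when geojson_sfc() is used as a drop-in replacement for st_read(), and about 100x when comparing rowwise st_read() with vectorised geojson_sfc()).
Group by row to work on one row at a time, and keep only the LINESTRING geometry (assuming there is one per FeatureCollection, as in the provided example).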

library(dplyr)
library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(geojsonsf)

json_str <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'

# 100-row test sample
df_100 <- tibble(journey_id = 1:100, journeyroute = rep(json_str, 100))
df_100
#> # A tibble: 100 × 2
#>    journey_id journeyroute                                                      
#>         <int> <chr>                                                             
#>  1          1 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#>  2          2 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#>  3          3 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#> ...

microbenchmark::microbenchmark(
  sf = {
    # parse GeoJSON strings with sf / GDAL (st_read() goes through GDAL)
    routes_sf <- df_100 %>% 
      rowwise() %>% 
      mutate(geometry = st_read(journeyroute, quiet = TRUE) %>% 
                        st_geometry() %>% 
                        `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>% 
      ungroup() %>% 
      st_as_sf()
  },
  geojson_sf = {
    # parse GeoJSON strings with geojsonsf
    routes_gj <- df_100 %>% 
      rowwise() %>% 
      mutate(geometry = geojson_sfc(journeyroute) %>% 
                        `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>% 
      ungroup() %>% 
      st_as_sf()
  }
)

Benchmark results and the resulting sf object:

#> Unit: milliseconds
#>        expr      min       lq     mean   median       uq      max neval cld
#>          sf 437.4351 453.1961 476.8028 464.1172 487.9901 628.0495   100  a 
#>  geojson_sf 198.3025 207.9465 219.1129 212.6965 221.7101 309.2461   100   b

routes_sf
#> Simple feature collection with 100 features and 1 field
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -4.096064 ymin: 50.40939 xmax: -4.095772 ymax: 50.4104
#> Geodetic CRS:  WGS 84
#> # A tibble: 100 × 2
#>    journey_id                                                           geometry
#>         <int>                                                   <LINESTRING [°]>
#>  1          1 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  2          2 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  3          3 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  4          4 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  5          5 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  6          6 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  7          7 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  8          8 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  9          9 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 10         10 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> # ℹ 90 more rows


Created on 2023-08-04 with reprex v2.0.2
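
Both benchmarked pipelines call the parser once per row. A vectorised variant, which is mentioned in the text but not included in the benchmark, could look roughly like the sketch below. It relies on the stated assumption of exactly one LineString per FeatureCollection, so that the i-th LineString belongs to the i-th row; the shortened sample string and the names `geoms`, `lines` and `routes` are illustrative only.

```r
library(sf)
library(geojsonsf)

# Shortened version of the sample FeatureCollection (assumption: fewer points)
json_str <- paste0(
  '{"type": "FeatureCollection", "features": [',
  '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, ',
  '{"type": "Feature", "geometry": null, "properties": {"name": "end"}}, ',
  '{"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.09597, 50.410397]]}, "properties": {"name": "Raw"}}]}'
)

df <- data.frame(journey_id = 1:5, journeyroute = rep(json_str, 5))

# One vectorised call: the geometries of all rows come back in a single sfc
geoms <- geojson_sfc(df$journeyroute)

# Keep only the LineStrings (drops the start points and the null geometries)
lines <- geoms[st_geometry_type(geoms) == "LINESTRING"]

# Matching by position is only valid with exactly one LineString per row
stopifnot(length(lines) == nrow(df))

# Attach the geometries back to the journey ids
routes <- st_sf(journey_id = df$journey_id, geometry = lines)
```

If some rows can hold zero or several LineStrings, the positional match no longer holds and the row-wise approach is the safer choice.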
