R: using gsub to split a string into city and country

lrl1mhuk  posted on 2023-06-19  in  Other

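I am trying to split a string into city and country, but I run into trouble when the city or the country is more than one word (for example, aix-en-provence or United States). The code I am currently using works for most entries, such as Paris, France, but not for ones like those above.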

locations
 paris_france
 miami_united states
 new york_united states
 aix-en-provence_france
 auckland_new_zealand

current code used
city = gsub("([A-z]+)_([A-z]+)", "\\1", locations)
country = gsub("([A-z]+)_([A-z]+)", "\\2", locations)

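So right now city returns paris and country returns france, which is fine, but auckland_new_zealand is not split as intended. I'm guessing it is a matter of getting the pattern to recognise multiple words before or after the "_".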

ddrv8njm  #1

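Because of new_zealand, we have to take a little extra care: split on the first "_" only and keep everything after it as the country.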
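To see why the pattern in the question struggles, it may help to check which separator-like characters the class `[A-z]` accepts. The sketch below assumes an ASCII-ordered interpretation of the range (R's documentation warns that ranges are locale-dependent); the names `chars`, `accepted` and `full_match` are made up for illustration.

```r
locations <- c("paris_france", "miami_united states", "new york_united states",
               "aix-en-provence_france", "auckland_new_zealand")

# "_" sits between "Z" and "a" in ASCII, so the range A-z usually includes it,
# while space and hyphen fall outside the range.
chars <- c("_", " ", "-")
accepted <- setNames(grepl("^[A-z]$", chars), chars)
accepted

# Which entries can the question's pattern cover from start to end?
full_match <- grepl("^([A-z]+)_([A-z]+)$", locations)
setNames(full_match, locations)
```

Under that assumption, entries containing a space or a hyphen are only partly matched, and because `[A-z]+` can also swallow "_", the split in auckland_new_zealand lands on the last underscore rather than the first.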

base R

strcapture("^([^_]+)_(.*)$", locs$locations, proto = c(city="", country=""))
#              city       country
# 1           paris        france
# 2           miami united states
# 3        new york united states
# 4 aix-en-provence        france
# 5        auckland   new_zealand
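If you would rather stay close to the gsub approach from the question, the same pattern should also work with sub(). This is a minimal sketch, assuming the input is a plain character vector called `locations` as in the question; `pattern` is just an illustrative name.

```r
locations <- c("paris_france", "miami_united states", "new york_united states",
               "aix-en-provence_france", "auckland_new_zealand")

# [^_]+ takes everything up to the first "_"; (.*) takes the rest,
# including any further underscores.
pattern <- "^([^_]+)_(.*)$"

city    <- sub(pattern, "\\1", locations)
country <- sub(pattern, "\\2", locations)

data.frame(city, country)
```

Anchoring with ^ and $ makes the pattern consume the whole string, so nothing outside the captured groups is left behind in the result.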

tidyr

library(tidyr)
separate_wider_delim(locs, locations, delim = "_", names = c("city", "country"), too_many = "merge")
# # A tibble: 5 × 2
#   city            country      
#   <chr>           <chr>        
# 1 paris           france       
# 2 miami           united states
# 3 new york        united states
# 4 aix-en-provence france       
# 5 auckland        new_zealand

Data

locs <- structure(list(locations = c("paris_france", "miami_united states", "new york_united states", "aix-en-provence_france", "auckland_new_zealand")), row.names = c(NA, -5L), class = "data.frame")
