Splitting a field with tidyr::separate, keeping the delimiter, using a negative lookbehind

Posted by nfzehxib on 2023-04-03 in Other

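I want to use separate with a negative lookbehind and keep the delimiter. My solution below does not keep the first capital letter of the last name.
There is an answer that does not use a negative lookbehind, but I don't know how to adapt it to use one:
How do I split a string with tidyr::separate in R and retain the values of the separator string?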
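The negative lookbehind `(?<!^)` means "not preceded by the start of the string". A minimal base-R sketch of its effect, using `gregexpr()` with `perl = TRUE` (the variable names `all_upper` and `not_first` are illustrative only): without the lookbehind every uppercase letter matches, with it the leading "H" is skipped.

```r
x <- "HarlanNelson"

# Positions of every uppercase letter, including the leading "H"
all_upper <- gregexpr("[[:upper:]]", x, perl = TRUE)[[1]]

# (?<!^) is a negative lookbehind: the match must NOT be preceded by the
# start of the string, so the leading "H" is skipped
not_first <- gregexpr("(?<!^)[[:upper:]]", x, perl = TRUE)[[1]]

print(as.integer(all_upper))
#> [1] 1 7
print(as.integer(not_first))
#> [1] 7
```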

# Attempt 1: use an uppercase letter that is not at the start of the string as the separator
tidyr::tibble(myname = c("HarlanNelson")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<!^)[[:upper:]]")
#> # A tibble: 1 × 2
#>   first  last 
#>   <chr>  <chr>
#> 1 Harlan elson

Created on 2022-10-20 with reprex package (v2.0.1)
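The output shows why the "N" disappears: `separate()` removes whatever text the `sep` regex matched, and this regex matches the "N" itself. A minimal sketch, assuming only base R (the final tidyr call is skipped if tidyr is not installed): wrapping the uppercase class in a lookahead `(?=...)` makes the separator zero-width, so the split happens before the "N" without consuming it. `consumed`, `zero_width` and `pieces` are illustrative names.

```r
x <- "HarlanNelson"

# The question's separator matches the "N" itself, and that matched text
# is what gets discarded
consumed <- regmatches(x, gregexpr("(?<!^)[[:upper:]]", x, perl = TRUE))[[1]]
print(consumed)
#> [1] "N"

# A lookahead only asserts that an uppercase letter follows, so the match is
# empty and marks a split position without removing any character
zero_width <- "(?<!^)(?=[[:upper:]])"
pieces <- regmatches(x, gregexpr(zero_width, x, perl = TRUE), invert = TRUE)[[1]]
print(pieces)
#> [1] "Harlan" "Nelson"

# Same pattern as `sep` in the original call, only if tidyr is available
if (requireNamespace("tidyr", quietly = TRUE)) {
  print(tidyr::separate(tidyr::tibble(myname = x), col = myname,
                        into = c("first", "last"), sep = zero_width))
}
```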

# Attempt 2: also split on a space by passing two separator patterns
tidyr::tibble(myname = c("HarlanNelson", "Another Person")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = c(" ", "(?<!^)[[:upper:]]"))
#> Warning in gregexpr(pattern, x, perl = TRUE): argument 'pattern' has length > 1
#> and only the first element will be used
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
#> # A tibble: 2 × 2
#>   first        last  
#>   <chr>        <chr> 
#> 1 HarlanNelson <NA>  
#> 2 Another      Person


# Attempt 3: same as attempt 2, with an all-lowercase name added
tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = c(" ", "(?<!^)[[:upper:]]"))
#> Warning in gregexpr(pattern, x, perl = TRUE): argument 'pattern' has length > 1
#> and only the first element will be used
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
#> # A tibble: 3 × 2
#>   first        last  
#>   <chr>        <chr> 
#> 1 HarlanNelson <NA>  
#> 2 Another      Person
#> 3 someone      else

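The warning explains attempts 2 and 3: `sep` is a single regular expression, so only `" "` was used and "HarlanNelson" was never split. One possible alternative, sketched here as an assumption rather than the asker's method, is to combine both rules into one pattern with `|`. The extra `| ` inside the negative lookbehind stops a second, zero-width split before "P" right after the space in "Another Person", which would otherwise leave an empty piece. The split is checked with base `regmatches(..., invert = TRUE)`; the tidyr call is skipped if tidyr is not installed.

```r
names_vec <- c("HarlanNelson", "Another Person", "someone else")

# One regex with two alternatives:
#   " "                      split on a space, or
#   (?<!^| )(?=[[:upper:]])  zero-width split before an uppercase letter that
#                            is neither at the start nor right after a space
pattern <- " |(?<!^| )(?=[[:upper:]])"

pieces <- regmatches(names_vec, gregexpr(pattern, names_vec, perl = TRUE),
                     invert = TRUE)
str(pieces)

# The same pattern passed as `sep`, only if tidyr is available
if (requireNamespace("tidyr", quietly = TRUE)) {
  print(tidyr::separate(tidyr::tibble(myname = names_vec), col = myname,
                        into = c("first", "last"), sep = pattern))
}
```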

Answer #1 (mzsu5hc0)

This is what I came up with.
It is really just my reading of @Cameron's answer at https://stackoverflow.com/a/51415101/4629916, applied to my problem.

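# Step 1: zero-width split between a lowercase and an uppercase letter,
# so no character is consumed and the capital "N" is kept
tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<=[[:lower:]])(?=[[:upper:]])", extra = 'merge', fill = 'right') |> 
  # Step 2: split names that are still space-separated in `first`
  tidyr::separate(col = first, into = c("first", "last2"), sep = " ", fill = 'right', extra = 'merge') |> 
  # Step 3: take the last name from whichever step produced one
  dplyr::mutate(last = dplyr::coalesce(last, last2)) |>  
  dplyr::select(-last2)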
#> # A tibble: 3 × 2
#>   first   last  
#>   <chr>   <chr> 
#> 1 Harlan  Nelson
#> 2 Another Person
#> 3 someone else

# Variant with the negative lookbehind from the question: step 1 splits at any
# zero-width position that is not the start of the string and is followed by an uppercase letter
tidyr::tibble(myname = c("HarlanNelson", "Another Person", "someone else")) |>  
  tidyr::separate(col = myname, into = c("first", "last"), sep = "(?<!^)(?=[[:upper:]])", extra = 'merge', fill = 'right') |> 
  tidyr::separate(col = first, into = c("first", "last2"), sep = " ", extra = 'merge', fill = 'right') |> 
  dplyr::mutate(last = dplyr::coalesce(last, last2)) |> 
  dplyr::select(-last2)
#> # A tibble: 3 × 2
#>   first   last  
#>   <chr>   <chr> 
#> 1 Harlan  Nelson
#> 2 Another Person
#> 3 someone else
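The two step-1 patterns in this answer are not equivalent, even though both pipelines print the same table. A small base-R comparison (the helper `split_at()` is a hypothetical name introduced here) suggests they agree on "HarlanNelson" and "someone else" but differ on "Another Person": the positive lookbehind `(?<=[[:lower:]])` leaves it whole, while the negative lookbehind `(?<!^)` splits before "P" and leaves "Another " with a trailing space. In both pipelines it is the second `separate()` on `" "` that produces the final `first` value for that row.

```r
names_vec <- c("HarlanNelson", "Another Person", "someone else")

# Hypothetical helper: split each string at the matches of `pattern`,
# keeping the text between matches
split_at <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE), invert = TRUE)
}

# Step-1 pattern of the first pipeline (positive lookbehind)
lookbehind_lower <- split_at(names_vec, "(?<=[[:lower:]])(?=[[:upper:]])")
# Step-1 pattern of the second pipeline (negative lookbehind)
negative_lookbehind <- split_at(names_vec, "(?<!^)(?=[[:upper:]])")

str(lookbehind_lower)
str(negative_lookbehind)
```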
