R: split multiple columns on "/" and ","

j0pj023g posted on 2022-12-06 in Other

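I'm cleaning some data that has several columns which need to be split into rows on both ',' and '/'. The data table below shows what the source data looks like.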

library(data.table)

df <- data.table(
  b = c("a", "d/e/f", "g,h"),
  c = c("1", "2,3,4", "5/6")
)

I tried separate_rows, but it only splits one column on one of these delimiters at a time.
Edit: the data table I'm looking for looks roughly like this:

df_clean <- data.table(
  b = c("a", "d", "d", "d", 
        "e", "e", "e", "f", 
        "f", "f", "g", "g",
        "h", "h"),
  c = c("1", "2", "3", "4",
        "2", "3", "4",
        "2", "3", "4",
        "5", "6", 
        "5", "6")
)
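The expected output pairs every piece of b with every piece of c within the same original row, so its row count is the sum over rows of (pieces in b) × (pieces in c). A quick base R check of that arithmetic, assuming the same data rebuilt as a plain data.frame so no packages are needed:

```r
# Same data as above, as a plain data.frame (base R only)
df <- data.frame(b = c("a", "d/e/f", "g,h"),
                 c = c("1", "2,3,4", "5/6"),
                 stringsAsFactors = FALSE)

# Number of pieces per row when splitting on either "," or "/"
n_b <- lengths(strsplit(df$b, "[,/]"))
n_c <- lengths(strsplit(df$c, "[,/]"))

# Output rows produced by each input row, and the total: 1*1 + 3*3 + 2*2
rows_per_input <- n_b * n_c
total_rows <- sum(rows_per_input)
print(rows_per_input)  # 1 9 4
print(total_rows)      # 14
```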

amrnrhlw1#

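Updated the answer based on the clarification.
Run separate_rows once on each column to get every combination. You can specify multiple delimiters with a regular expression pattern such as '/|,'.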

library(tidyr)

df %>%
  separate_rows(b, sep = '/|,') %>%
  separate_rows(c, sep = '/|,')

#> # A tibble: 14 × 2
#>    b     c    
#>    <chr> <chr>
#>  1 a     1    
#>  2 d     2    
#>  3 d     3    
#>  4 d     4    
#>  5 e     2    
#>  6 e     3    
#>  7 e     4    
#>  8 f     2    
#>  9 f     3    
#> 10 f     4    
#> 11 g     5    
#> 12 g     6    
#> 13 h     5    
#> 14 h     6
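Why two passes give every combination: the first pass turns "d/e/f" into three rows that each still carry c = "2,3,4", and the second pass splits each of those rows three ways. A minimal base R sketch of the same two-pass idea; split_col_long() is a hypothetical helper written for illustration, not tidyr's implementation:

```r
# Hypothetical helper imitating one separate_rows() pass (not tidyr's code):
# split column `col` on the regex `sep` and repeat the other columns to match
split_col_long <- function(df, col, sep) {
  pieces <- strsplit(df[[col]], sep)
  out <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

df <- data.frame(b = c("a", "d/e/f", "g,h"),
                 c = c("1", "2,3,4", "5/6"),
                 stringsAsFactors = FALSE)

# Pass 1 splits b: each new row still carries the unsplit c value
step1 <- split_col_long(df, "b", "/|,")
# Pass 2 splits c: every b piece is paired with every c piece of its row
res <- split_col_long(step1, "c", "/|,")
print(res)
```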

ogq8wdun2#

Maybe this helps. For the first column:

# split column b on either "," or "/" (regex alternation)
s <- strsplit(df$b, split = ",|/")
# repeat each value of c once per piece of b
data.frame(b = unlist(s), c = rep(df$c, lengths(s)))
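A note on strsplit(): when split has more than one element, it is recycled along x (element 1 uses the first pattern, element 2 the second, and so on) rather than treated as a set of alternative delimiters. A regex alternation such as ",|/" applies both delimiters to every element. A small base R sketch with made-up input strings:

```r
x <- c("d/e/f", "d,e", "g,h")

# split = c(",", "/") is recycled: "d/e/f" is split on ",", "d,e" on "/",
# and "g,h" on "," again, so the first two strings come back unsplit
recycled <- strsplit(x, split = c(",", "/"))

# a single regex alternation splits every element on both delimiters
alternation <- strsplit(x, split = ",|/")

str(recycled)
str(alternation)
```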

mpbci0fu3#

An option with cSplit:

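library(splitstackshape)
# fixed = FALSE lets sep be a regex ("/" or ","); direction = "long" adds rows
cSplit(df, "b", sep = "/|,", direction = "long", fixed = FALSE) |>
  cSplit("c", sep = "/|,", direction = "long", fixed = FALSE)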
Output:

    b c
 1: a 1
 2: d 2
 3: d 3
 4: d 4
 5: e 2
 6: e 3
 7: e 4
 8: f 2
 9: f 3
10: f 4
11: g 5
12: g 6
13: h 5
14: h 6
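The fixed = FALSE argument matters here: as I understand the splitstackshape documentation, cSplit's default fixed = TRUE treats sep as a literal string, so "/|," would only match that exact three-character sequence. Base R's strsplit() has an argument of the same name with that meaning, which makes the difference easy to see without extra packages (this sketch only demonstrates strsplit(), not cSplit itself):

```r
x <- "d/e/f"

# fixed = TRUE: "/|," is matched literally, and "d/e/f" does not contain it
literal <- strsplit(x, "/|,", fixed = TRUE)[[1]]

# fixed = FALSE (strsplit's default): "/|," is a regex meaning "/" or ","
regex <- strsplit(x, "/|,", fixed = FALSE)[[1]]

print(literal)  # "d/e/f"
print(regex)    # "d" "e" "f"
```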

jqjz2hbq4#

A data.table option:

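library(data.table)
# option 1: helper that splits a string on either "," or "/"
foo = \(x) unlist(strsplit(x, ",|/"))
# per row (by = .I): cross join (CJ) of the split b and c values, then drop the index column I
df[, do.call(CJ, lapply(.SD, foo)), by = .I][, !"I"]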

Similarly, in base R:

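sep <- ",|/"
Map(
  # all b/c combinations for one row, kept as character columns named b and c
  # (within each row expand.grid() varies b fastest, so row order differs from the output below)
  \(x, y) expand.grid(b = x, c = y, stringsAsFactors = FALSE),
  strsplit(df$b, sep),
  strsplit(df$c, sep)
) |>
  # stack the per-row grids (the `_` placeholder needs R >= 4.2)
  do.call(rbind, args = _)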

Output:

#          b      c
#     <char> <char>
#  1:      a      1
#  2:      d      2
#  3:      d      3
#  4:      d      4
#  5:      e      2
#  6:      e      3
#  7:      e      4
#  8:      f      2
#  9:      f      3
# 10:      f      4
# 11:      g      5
# 12:      g      6
# 13:      h      5
# 14:      h      6
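Both snippets above are hard-wired to the two columns b and c. A sketch of the same per-row expand.grid() idea generalized to any number of columns; expand_split() is a hypothetical helper, and it assumes every column is character (or can be coerced with as.character()):

```r
# Hypothetical helper: for each row, split every column on the regex `sep`
# and return all combinations of the pieces (first column varies slowest)
expand_split <- function(df, sep = ",|/") {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    # pieces of each column's value in row i, as a named list
    pieces <- lapply(df[i, , drop = FALSE],
                     function(v) strsplit(as.character(v), sep)[[1]])
    # expand.grid() varies its first argument fastest, so pass the columns
    # in reverse order and then restore the original column order
    grid <- do.call(expand.grid, c(rev(pieces),
                                   KEEP.OUT.ATTRS = FALSE,
                                   stringsAsFactors = FALSE))
    grid[, names(df), drop = FALSE]
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

df <- data.frame(b = c("a", "d/e/f", "g,h"),
                 c = c("1", "2,3,4", "5/6"),
                 stringsAsFactors = FALSE)
res <- expand_split(df)
print(res)
```

Reversing the columns before expand.grid() is only there to reproduce the row order of the expected output; drop the rev() if order does not matter.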
