Splitting multiple text columns into separate columns in R

igetnqfo, posted on 2023-07-31 in Other

I have a dataset with columns like the following:

Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2
18/24        14/14    NA       1               1
4/24         NA       6/14     0               0

df <- structure(list(Initial_Data = c("18/24", "4/24"),
                     `3Mo_Data` = c("14/14", NA),
                     `6Mo_Data` = c(NA, "6/14"),
                     Irrelevant_Col1 = 1:0,
                     Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))

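I want to split it so that every column whose name contains "Data" is identified and then split into three columns:
1. The original column (initially a character variable), expressed as a decimal.
2. A second column with the numerator.
3. A third, new column with the denominator.
The irrelevant columns should be left untouched, so that the result looks like this: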

Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2 Initial_Data_Numerator Initial_Data_Denominator 3Mo_Data_Numerator 3Mo_Data_Denominator 6Mo_Data_Numerator 6Mo_Data_Denominator
0.75         1        NA       1               1               18                     24                       14                 14                   NA                 NA
0.17         NA       0.43     0               0               4                      24                       NA                 NA                   6                  14


I tried something like the following to generate the numerator and denominator columns:

test <- df %>%
  mutate(across(contains("Data"),
         ~ paste0(.x, "Numerator") = str_extract(., "^\\d+"),
         ~ paste0(.x, "Denominator") = str_extract(.,"(?<=\\D)\\d+"))


But this gives me an error about the equals sign. Perhaps I can't use paste0 this way?
Thanks in advance for your help!
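
The error most likely comes from the parser rather than from paste0 itself: inside a function call, the left-hand side of `=` has to be a plain argument name, so an expression such as `paste0(.x, "Numerator")` cannot appear there. A minimal sketch of one way to get the same effect is to pass across() a named list of functions and let its `.names` argument build the new column names. The helpers `num` and `den` are hypothetical names introduced here for illustration, and the second mutate() step (turning the original columns into decimals) is an assumption about the desired result, not part of the original attempt.

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  Initial_Data = c("18/24", "4/24"),
  `3Mo_Data` = c("14/14", NA),
  `6Mo_Data` = c(NA, "6/14"),
  Irrelevant_Col1 = 1:0,
  Irrelevant_Col2 = 1:0,
  check.names = FALSE
)

# Hypothetical helpers: digits before and after the slash, as numbers
num <- function(x) as.numeric(str_extract(x, "^\\d+"))
den <- function(x) as.numeric(str_extract(x, "(?<=/)\\d+"))

test <- df %>%
  # A named list of functions plus .names creates <column>_<function name>
  mutate(across(contains("Data"),
                list(Numerator = num, Denominator = den),
                .names = "{.col}_{.fn}")) %>%
  # Overwrite the original "n/d" strings with their decimal value
  mutate(across(ends_with("_Data"), ~ num(.x) / den(.x)))
```

The new columns are appended after the existing ones, and NA inputs simply stay NA in all three derived columns.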

Answer #1 (yr9zkbsy)

A tidyverse workflow:

library(dplyr)
library(tidyr)  # separate_wider_delim() requires tidyr >= 1.3.0

df %>%
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  mutate(across(ends_with("Num"), as.numeric, .names = "{sub('_Num', '', .col)}") /
         across(ends_with("Denom"), as.numeric),
         .before = 1)

# # A tibble: 2 × 11
#   Initial_Data `3Mo_Data` `6Mo_Data` Initial_Data_Num Initial_Data_Denom `3Mo_Data_Num` `3Mo_Data_Denom` `6Mo_Data_Num` `6Mo_Data_Denom` Irrelevant_Col1 Irrelevant_Col2
#          <dbl>      <dbl>      <dbl> <chr>            <chr>              <chr>          <chr>            <chr>          <chr>                      <int>           <int>
# 1        0.75           1     NA     18               24                 14             14               NA             NA                             1               1
# 2        0.167         NA      0.429 4                24                 NA             NA               6              14                             0               0
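
In the output above, the Num and Denom columns are still character (`<chr>`), whereas the question shows them as numbers. A minimal sketch of one possible follow-up step, assuming the `_Num` / `_Denom` suffixes used in this answer, converts them with a second across():

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  Initial_Data = c("18/24", "4/24"),
  `3Mo_Data` = c("14/14", NA),
  `6Mo_Data` = c(NA, "6/14"),
  Irrelevant_Col1 = 1:0,
  Irrelevant_Col2 = 1:0,
  check.names = FALSE
)

res <- df %>%
  separate_wider_delim(ends_with("Data"), delim = "/",
                       names_sep = "_", names = c("Num", "Denom")) %>%
  # Convert every column ending in _Num or _Denom from character to numeric
  mutate(across(matches("_(Num|Denom)$"), as.numeric))
```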

An alternative way to write the mutate() step, using cur_column() inside across():

df %>%
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  mutate(across(ends_with("Num"),
                ~ as.numeric(.x) / as.numeric(get(sub("Num", "Denom", cur_column()))),
                .names = "{sub('_Num', '', .col)}"),
         .before = 1)

Answer #2 (k3fezbri)

Here is one approach, using separate_wider_delim:

library(tidyverse)

df <- separate_wider_delim(df, cols= c("Initial_Data", "3Mo_Data", "6Mo_Data"), delim = "/", names_sep = "_")
colnames(df) <- str_replace_all(colnames(df), c("_1$" = "_Numerator", "_2$" = "_Denominator"))
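
The code above stops after renaming, so the split pieces are still character and the decimal columns from the question are not yet there. A minimal sketch of how this approach might be completed with a plain loop follows; the result name `out` and the loop variables are assumptions made for illustration, not part of the original answer.

```r
library(tidyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  Initial_Data = c("18/24", "4/24"),
  `3Mo_Data` = c("14/14", NA),
  `6Mo_Data` = c(NA, "6/14"),
  Irrelevant_Col1 = 1:0,
  Irrelevant_Col2 = 1:0,
  check.names = FALSE
)

data_cols <- c("Initial_Data", "3Mo_Data", "6Mo_Data")

# Split and rename exactly as in the answer
out <- separate_wider_delim(df, cols = all_of(data_cols),
                            delim = "/", names_sep = "_")
colnames(out) <- str_replace_all(colnames(out),
                                 c("_1$" = "_Numerator", "_2$" = "_Denominator"))

# Convert the pieces to numbers and add the decimal under the original name
for (col in data_cols) {
  n <- paste0(col, "_Numerator")
  d <- paste0(col, "_Denominator")
  out[[n]] <- as.numeric(out[[n]])
  out[[d]] <- as.numeric(out[[d]])
  out[[col]] <- out[[n]] / out[[d]]
}
```

Here the decimal columns are appended at the end of the table; they can be moved to the front afterwards if the column order in the question matters.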
