R: Extracting specific values from two columns of a data frame

cig3rfwq  posted on 2023-02-06  in  Other

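I have a data frame in which only two columns interest me. These two columns contain the labels I need to extract. There are 4 labels: CR, PD, PR, SD.
In the example below you can see the two columns and the 4 labels, mixed with other text that is not needed, such as io.response and pfs.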

df <- structure(list(`!Sample_characteristics_ch1.22` = c("duration.of.io.tx: 174", 
"io.response: PD", "io.response: PD", "duration.of.io.tx: 21", 
"io.response: PD", "duration.of.io.tx: 21", "io.response: PD", 
"io.response: PD", "io.response: PR", "duration.of.io.tx: 157", 
"io.response: PD"), `!Sample_characteristics_ch1.23` = c("io.response: PD", 
"pfs: 106", "pfs: 57", "io.response: PD", "pfs: 30", "io.response: PD", 
"pfs: 25", "pfs: 17", "pfs: 338", "io.response: SD", "pfs: 41"
)), row.names = c("Patient sample BACI139", "Patient sample BACI140", 
"Patient sample BACI142", "Patient sample BACI143", "Patient sample BACI144", 
"Patient sample BACI148", "Patient sample BACI149", "Patient sample BACI150", 
"Patient sample BACI151", "Patient sample BACI152", "Patient sample BACI153"
), class = "data.frame")
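What I need

Add a new column (call it whatever you like) that contains only these 4 labels. I do not want to delete or change the original columns, because I like to keep the raw data untouched.

Example

In the first row, the second column is io.response: PD, so the new column would simply be PD.
In the second row, the first column is io.response: PD, so the new column is also PD in that row.
Thanks, everyone!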

hwamh0ep  #1

This code should do what you need:

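library(dplyr)
library(stringr)

# str_c() and str_extract() are vectorized, so rowwise() is not needed.
# sep = " " keeps the two strings apart; \\b matches whole labels only.
df |>
  mutate(newcol = str_extract(
    str_c(`!Sample_characteristics_ch1.22`, `!Sample_characteristics_ch1.23`, sep = " "),
    "\\b(CR|PD|PR|SD)\\b"
  ))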
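If you would rather avoid extra packages, the same idea can be sketched in base R. This is only a minimal sketch: the shortened column names `ch22` and `ch23`, the three-row stand-in data frame, and the helper name `extract_label` are assumptions made for illustration, not part of the original data.

```r
# Small stand-in for the question's data frame (hypothetical short column names).
df_small <- data.frame(
  ch22 = c("duration.of.io.tx: 174", "io.response: PD", "io.response: PR"),
  ch23 = c("io.response: PD", "pfs: 106", "pfs: 338"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the label when the string is exactly
# "io.response: <label>", otherwise NA.
extract_label <- function(x) {
  pattern <- "^io\\.response: (CR|PD|PR|SD)$"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

first  <- extract_label(df_small$ch22)
second <- extract_label(df_small$ch23)

# Use the label from the first column, falling back to the second one.
# The original columns are left untouched.
df_small$label <- ifelse(is.na(first), second, first)
print(df_small$label)
```

Rows where neither column holds one of the 4 labels end up as NA in the new column, which makes them easy to spot.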
ua4mk5z4  #2

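If dplyr works for you, you can use coalesce() to get the first non-NA value (if there is one). To extract the labels, here is a fairly strict regular expression with a lookbehind ((?<=...)) and the set of labels ((CR|PD|PR|SD)):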

library(dplyr)
library(stringr)
# Dots are escaped so that "io.response" is matched literally.
df %>% tibble::rownames_to_column() %>% as_tibble() %>% 
  mutate(io.response = coalesce(
    str_extract(`!Sample_characteristics_ch1.22`, "(?<=^io\\.response: )(CR|PD|PR|SD)$"),
    str_extract(`!Sample_characteristics_ch1.23`, "(?<=^io\\.response: )(CR|PD|PR|SD)$")))
#> # A tibble: 11 × 4
#>    rowname                `!Sample_characteristics_ch1.22` !Sample_cha…¹ io.re…²
#>    <chr>                  <chr>                            <chr>         <chr>  
#>  1 Patient sample BACI139 duration.of.io.tx: 174           io.response:… PD     
#>  2 Patient sample BACI140 io.response: PD                  pfs: 106      PD     
#>  3 Patient sample BACI142 io.response: PD                  pfs: 57       PD     
#>  4 Patient sample BACI143 duration.of.io.tx: 21            io.response:… PD     
#>  5 Patient sample BACI144 io.response: PD                  pfs: 30       PD     
#>  6 Patient sample BACI148 duration.of.io.tx: 21            io.response:… PD     
#>  7 Patient sample BACI149 io.response: PD                  pfs: 25       PD     
#>  8 Patient sample BACI150 io.response: PD                  pfs: 17       PD     
#>  9 Patient sample BACI151 io.response: PR                  pfs: 338      PR     
#> 10 Patient sample BACI152 duration.of.io.tx: 157           io.response:… SD     
#> 11 Patient sample BACI153 io.response: PD                  pfs: 41       PD     
#> # … with abbreviated variable names ¹​`!Sample_characteristics_ch1.23`,
#> #   ²​io.response

Created on 2023-02-01 with reprex v2.0.2
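To see why the strict pattern and coalesce() combine well, it may help to try them on a few hand-made strings. The vectors below are made up for illustration and are not taken from the question's data. The sketch assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical test strings: only the first one has the exact expected shape.
x <- c("io.response: PD", "pfs: 106", "io.response: XX", "my io.response: CR")
pattern <- "(?<=^io\\.response: )(CR|PD|PR|SD)$"

# The lookbehind requires the string to start with "io.response: " and the
# trailing $ requires it to end right after one of the 4 labels.
extracted <- str_extract(x, pattern)
print(extracted)
#> [1] "PD" NA   NA   NA

# coalesce() takes, position by position, the first non-NA value.
a <- c(NA, "PD", NA)
b <- c("SD", "CR", NA)
combined <- coalesce(a, b)
print(combined)
#> [1] "SD" "PD" NA
```

Because non-matching strings give NA rather than a partial match, a row only receives a label when one of the two columns really is an io.response entry.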
