Replace strings in all columns of a data frame in R

Posted by x4shl7ld on 2023-02-27 in Other

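I have a data frame in R that contains coverage information for some sequencing samples. The columns hold a lot of text, and I only want to extract the coverage numbers from them.
Here is the code: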

df <- data.frame(sampleA = c("There is a 91.24% of reference with a coverageData >= 1X", "There is a 90.89% of reference with a coverageData >= 2X", "There is a 90.46% of reference with a coverageData >= 3X"),
        sampleB = c("There is a 91.22% of reference with a coverageData >= 1X", "There is a 90.99% of reference with a coverageData >= 2X", "There is a 90.77% of reference with a coverageData >= 3X")
        )

This is what the data frame looks like:

                                                   sampleA
1 There is a 91.24% of reference with a coverageData >= 1X
2 There is a 90.89% of reference with a coverageData >= 2X
3 There is a 90.46% of reference with a coverageData >= 3X
                                                   sampleB
1 There is a 91.22% of reference with a coverageData >= 1X
2 There is a 90.99% of reference with a coverageData >= 2X
3 There is a 90.77% of reference with a coverageData >= 3X

I would like to get the following output:

  sampleA sampleB
1   91.24   91.22
2   90.89   90.99
3   90.46   90.77

I've seen that mutate_all can be used for this, but I'm not sure about the syntax.


jjhzyzn0 (answer #1)

We can use readr::parse_number() inside dplyr::across():

library(readr)
library(dplyr)

df %>% 
  mutate(across(everything(), parse_number))
#>   sampleA sampleB
#> 1   91.24   91.22
#> 2   90.89   90.99
#> 3   90.46   90.77

Data from the OP:

df <- data.frame(sampleA = c("There is a 91.24% of reference with a coverageData >= 1X", "There is a 90.89% of reference with a coverageData >= 2X", "There is a 90.46% of reference with a coverageData >= 3X"),
                 sampleB = c("There is a 91.22% of reference with a coverageData >= 1X", "There is a 90.99% of reference with a coverageData >= 2X", "There is a 90.77% of reference with a coverageData >= 3X")
)

Created on 2023-02-21 with reprex v2.0.1
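Since the question asks about mutate_all(), here is a minimal sketch of the same call written with that verb. mutate_all() is superseded by across() in current dplyr but is still exported; the sketch assumes dplyr and readr are installed. One caveat worth knowing: parse_number() drops non-numeric characters before and after the first number it finds, so it works here only because the percentage is the first number in each string. If a cell began with, say, the "1X" threshold, it would return 1 instead.

```r
library(dplyr)
library(readr)

# Same data as the OP; stringsAsFactors = FALSE keeps the columns character on R < 4.0
df <- data.frame(
  sampleA = c("There is a 91.24% of reference with a coverageData >= 1X",
              "There is a 90.89% of reference with a coverageData >= 2X",
              "There is a 90.46% of reference with a coverageData >= 3X"),
  sampleB = c("There is a 91.22% of reference with a coverageData >= 1X",
              "There is a 90.99% of reference with a coverageData >= 2X",
              "There is a 90.77% of reference with a coverageData >= 3X"),
  stringsAsFactors = FALSE
)

# Superseded verb: apply parse_number() to every column
out_all <- df %>% mutate_all(parse_number)

# Current idiom from the answer above, for comparison
out_across <- df %>% mutate(across(everything(), parse_number))

print(out_all)
```

Both forms return a data frame of doubles; across() is the one to prefer in new code.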


cgfeq70w (answer #2)

Using sub() with % as the marker, we can target the percentage value rather than the other numbers in the string.

data.frame(sapply(df, function(x) 
  as.numeric(sub(".* ", "", sub("%.*", "", x)))))
  sampleA sampleB
1   91.24   91.22
2   90.89   90.99
3   90.46   90.77
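To show why the nested sub() calls work, the sketch below traces one cell through both steps. It then adds a hypothetical helper, extract_pct (a name introduced here, not part of the answer), that captures the number directly in front of % with a single Perl-style regex, so it still finds the percentage even if another number appears earlier in the string. It uses base R only and assumes percentages look like 91.24 (digits with an optional decimal part).

```r
df <- data.frame(
  sampleA = c("There is a 91.24% of reference with a coverageData >= 1X",
              "There is a 90.89% of reference with a coverageData >= 2X",
              "There is a 90.46% of reference with a coverageData >= 3X"),
  sampleB = c("There is a 91.22% of reference with a coverageData >= 1X",
              "There is a 90.99% of reference with a coverageData >= 2X",
              "There is a 90.77% of reference with a coverageData >= 3X"),
  stringsAsFactors = FALSE
)

x <- df$sampleA[1]
step1 <- sub("%.*", "", x)      # drop "%" and everything after it: "There is a 91.24"
step2 <- sub(".* ", "", step1)  # greedy ".* " removes up to the last space: "91.24"

# Hypothetical helper: lazily skip ahead to the first number that is followed by "%"
extract_pct <- function(x) {
  as.numeric(sub(".*?(\\d+(?:\\.\\d+)?)%.*", "\\1", x, perl = TRUE))
}

# Assigning into res[] keeps the data.frame class and column names
res <- df
res[] <- lapply(df, extract_pct)
print(res)
```

Using `res[] <- lapply(...)` instead of `data.frame(sapply(...))` avoids sapply's simplification: with a one-row data frame, sapply returns a plain named vector and data.frame() would turn it into a single column. A cell without any "number%" is left unchanged by sub(), so as.numeric() turns it into NA with a warning rather than silently returning a wrong value.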
