R: How do I prevent pmax/pmin from considering non-numeric values?

ih99xse1, posted 2023-05-04 in Other

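I am using pmax and pmin to extract the maximum and minimum values from each row. Some of my values are not statistically significant, and these are wrapped in exclamation marks (!xx!). For some reason pmax and pmin still take these values into account, so I cannot calculate the difference between the significant values. Here is an example:

| ID | Var1 | Var2 | Var3 | Var4 |
| --- | --- | --- | --- | --- |
| A | 1 | !5! | NA | 10 |
| B | 20 | NA | NA | 3 |
| C | !20! | 10 | NA | NA |
| D | NA | NA | 30 | NA |
| E | !10! | NA | NA | NA |

I want the !xx! values to be excluded when I do the following: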

DF1 = data.frame(ID=c("A","B","C","D","E"), 
                 Var1=c("1","20","!20!","NA","!10!"), 
                 Var2=c("!5!","NA","10","NA","NA"), 
                 Var3=c("NA","NA","NA","30","NA"), 
                 Var4=c("10","NA","NA","NA","NA"),
                 Var5=c("NA","!50!","20","NA","NA"))
DF1$max <- pmax(DF1$Var1,DF1$Var2,DF1$Var3,DF1$Var4,na.rm = TRUE)
DF1$min <- pmin(DF1$Var1,DF1$Var2,DF1$Var3,DF1$Var4,na.rm = TRUE)

This gives me the following result:

whereas this is what I want:

How can I prevent pmax and pmin from picking up the !xx! values? I appreciate any help!

ecbunoof #1

Assuming your "NA" values are meant to be real NA values (not string literals):

DF1[-1] <- lapply(DF1[-1], function(z) replace(z, z=="NA", NA))

We can then mask every value containing "!" as NA before calling pmax:

do.call(pmax, c(lapply(DF1[-1], function(z) replace(z, grepl("!", z), NA)), list(na.rm = TRUE)))
# [1] "10" "20" "20" "30" NA  
### and converting to numbers
do.call(pmax, c(lapply(DF1[-1], function(z) suppressWarnings(as.numeric(replace(z, grepl("!", z), NA)))), list(na.rm = TRUE)))
# [1] 10 20 20 30 NA
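
A small illustration of why the conversion to numbers matters: on character vectors, pmax and pmin compare strings rather than numeric values, so the result can differ from the numeric one. The vectors `a` and `b` below are made-up toy values for illustration, not part of the question's data.

```r
# Hypothetical toy vectors (not from the question's data)
a <- c("9", "100")
b <- c("10", "20")

# Character input: elements are compared as strings, not as numbers
chr_max <- pmax(a, b)

# Numeric input: elements are compared by value
num_max <- pmax(as.numeric(a), as.numeric(b))
num_max
# [1]  10 100

# The string-based result is generally not the numeric maximum
identical(chr_max, as.character(num_max))
```

The first output above ("10" "20" "20" "30" NA) happens to agree with the numeric result for this data, but that should not be relied on in general.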

To store the results:

nums <- lapply(DF1[-1], function(z) suppressWarnings(as.numeric(replace(z, grepl("!", z), NA))))
DF1$min <- do.call(pmin, c(nums, na.rm = TRUE))
DF1$max <- do.call(pmax, c(nums, na.rm = TRUE))
DF1
#   ID Var1 Var2 Var3 Var4 Var5 min max
# 1  A    1  !5!   NA   10   NA   1  10
# 2  B   20   NA   NA   NA !50!  20  20
# 3  C !20!   10   NA   NA   20  10  20
# 4  D   NA   NA   30   NA   NA  30  30
# 5  E !10!   NA   NA   NA   NA  NA  NA
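
Since the question's goal was the difference between the significant values, here is a minimal self-contained sketch that rebuilds the data, masks the !xx! entries, and adds a difference column. The column name `diff` and the explicit `stringsAsFactors = FALSE` are my own additions for illustration, not part of the original answer.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps the columns
# as character vectors on R versions before 4.0 as well
DF1 <- data.frame(ID   = c("A", "B", "C", "D", "E"),
                  Var1 = c("1", "20", "!20!", "NA", "!10!"),
                  Var2 = c("!5!", "NA", "10", "NA", "NA"),
                  Var3 = c("NA", "NA", "NA", "30", "NA"),
                  Var4 = c("10", "NA", "NA", "NA", "NA"),
                  Var5 = c("NA", "!50!", "20", "NA", "NA"),
                  stringsAsFactors = FALSE)

# Mask the !xx! entries, then convert every Var column to numeric
nums <- lapply(DF1[-1], function(z) {
  suppressWarnings(as.numeric(replace(z, grepl("!", z), NA)))
})

DF1$min <- do.call(pmin, c(nums, na.rm = TRUE))
DF1$max <- do.call(pmax, c(nums, na.rm = TRUE))

# Hypothetical extra column: difference between the significant extremes
DF1$diff <- DF1$max - DF1$min
DF1$diff
# [1]  9  0 10  0 NA
```

Rows with a single significant value get a difference of 0, and rows with none stay NA.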

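Note that we also need to pass na.rm = TRUE, because pmax and pmin default to na.rm = FALSE and would otherwise return NA for any row that contains an NA.
Alternatively, we can use readr::parse_number like this: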

nums <- lapply(DF1[-1], function(z) readr::parse_number(replace(z, grepl("!", z), NA)))
### ... as above

goqiplq2 #2

Here is a solution using dplyr.

library(dplyr)

suppressWarnings( 
DF1 %>% 
  mutate(across(starts_with("Var"), ~as.numeric(.x), .names = "{col}_num")) %>% 
  mutate(max = do.call(pmax, c(subset(., select = Var1_num:Var5_num), na.rm = TRUE)),
         min = do.call(pmin, c(subset(., select = Var1_num:Var5_num), na.rm = TRUE))) %>% 
  select(-contains("num"))
)
#>   ID Var1 Var2 Var3 Var4 Var5 max min
#> 1  A    1  !5!   NA   10   NA  10   1
#> 2  B   20   NA   NA   NA !50!  20  20
#> 3  C !20!   10   NA   NA   20  20  10
#> 4  D   NA   NA   30   NA   NA  30  30
#> 5  E !10!   NA   NA   NA   NA  NA  NA
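
The approach above appears to rely on as.numeric returning NA for any string it cannot parse, which is why the !xx! values drop out and why the call is wrapped in suppressWarnings. A minimal base R sketch of that behaviour, using a made-up vector `x`:

```r
# Hypothetical toy vector mixing plain numbers, a flagged value and "NA"
x <- c("1", "!5!", "NA", "10")

# "!5!" cannot be parsed, so it becomes NA (with a coercion warning)
x_num <- suppressWarnings(as.numeric(x))
x_num
# [1]  1 NA NA 10
```

Note that this also silently turns any other malformed entry into NA, so it may be worth checking the data for unexpected strings first.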
