R: find the maximum value of each column along with its Group and ID

wyyhbhjk  posted on 2023-05-20  in Other

I have the following data frame, df1:

ID   Group    Time1    Time2    Time3
A00194       1    0.733    0.777    0.433
A00195       1    0.903    0.116    0.308
A00198       1    0.422    0.863    0.220
A00199       1    0.485    0.846    0.203
A02111       2    0.682    0.522    0.700
A02114       2    0.699    0.208    0.686
A02116       2    0.911    0.802    0.041
A02197       2    0.083    0.082    0.900

I want to get the ID and Group of the row that holds the highest value in each of Time1:Time3.
The desired output would look like:

ID   Group   Value   Test
A02116       2   0.911  Time1
A00198       1   0.863  Time2
A02197       2   0.900  Time3

I tried the code below, but I would have to run it three times (once per Time column) to get the desired output.

df1[which.max(df1$Time1), c(1:4)]

How can I do this in one step?


drnojrws1#

To find the maximum for each Time column, the data first needs to be in long format.
You can use dplyr and tidyr:

library(dplyr)
library(tidyr)
df1 |> 
  pivot_longer(contains("Time"),names_to = "Test") |> 
  filter(value == max(value),.by=Test) |> 
  arrange(Test)

Output:

  ID     Group Test  value
  <chr>  <dbl> <chr> <dbl>
1 A02116     2 Time1 0.911
2 A00198     1 Time2 0.863
3 A02197     2 Time3 0.9

Or use data.table:

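library(data.table)

# data.table's melt() expects a data.table, so convert df1 first
df_melt = melt(as.data.table(df1),
     id.vars = c("ID","Group"),
     variable.name = "Test")

# .I is the row index of the maximum within each Test group
df_melt[df_melt[,.I[which.max(value)],by=Test]$V1]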

Output:

       ID Group   Test value
1: A02116     2  Time1 0.911
2: A00198     1  Time2 0.863
3: A02197     2  Time3 0.900

cs7cruho2#

library(tidyverse)

df1 %>%
  pivot_longer(cols = starts_with("Time"), names_to = "Test", values_to = "Value") %>%
  group_by(Test) %>%
  slice_max(Value, n = 1) %>%
  select(ID, Group, Value, Test)

  ID     Group Value Test 
1 A02116     2 0.911 Time1
2 A00198     1 0.863 Time2
3 A02197     2 0.9   Time3

2admgd593#

library(dplyr)
library(tidyr)

df1 %>% 
  mutate(across(Time1:Time3, ~if_else(.x == max(.x), .x, NA))) %>% 
  pivot_longer(-c(ID, Group), values_drop_na = TRUE, names_to = "Test")

#> # A tibble: 3 x 4
#>   ID     Group Test  value
#>   <chr>  <int> <chr> <dbl>
#> 1 A00198     1 Time2 0.863
#> 2 A02116     2 Time1 0.911
#> 3 A02197     2 Time3 0.9
Data:
read.table(text= "    ID   Group    Time1    Time2    Time3
A00194       1    0.733    0.777    0.433
A00195       1    0.903    0.116    0.308
A00198       1    0.422    0.863    0.220
A00199       1    0.485    0.846    0.203
A02111       2    0.682    0.522    0.700
A02114       2    0.699    0.208    0.686
A02116       2    0.911    0.802    0.041
A02197       2    0.083    0.082    0.900", header = TRUE, stringsAsFactors = FALSE) -> df1

siv3szwd4#

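You can sapply over the columns and use which.max to get the row of the first maximum in each one; those row indices can then be used to subset the data frame and build the desired result.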

i <- sapply(df1[3:5], which.max)
cbind(df1[i,1:2], value=mapply(`[[`, df1[3:5], i), test=names(i))
#      ID Group value  test
#7 A02116     2 0.911 Time1
#3 A00198     1 0.863 Time2
#8 A02197     2 0.900 Time3
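
Since which.max returns only the first maximum, a tie within a column would silently drop the other rows. The following is a minimal base R sketch of one way to keep every tied row, assuming that is the behaviour you want; the names time_cols, hit and res are illustrative and not part of the answer above.

```r
# Same data as in the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  ID    = c("A00194", "A00195", "A00198", "A00199",
            "A02111", "A02114", "A02116", "A02197"),
  Group = c(1, 1, 1, 1, 2, 2, 2, 2),
  Time1 = c(0.733, 0.903, 0.422, 0.485, 0.682, 0.699, 0.911, 0.083),
  Time2 = c(0.777, 0.116, 0.863, 0.846, 0.522, 0.208, 0.802, 0.082),
  Time3 = c(0.433, 0.308, 0.220, 0.203, 0.700, 0.686, 0.041, 0.900)
)

# Pick up the Time columns by name rather than by position
time_cols <- grep("^Time", names(df1), value = TRUE)

# For each Time column, keep every row that attains the column maximum
res <- do.call(rbind, lapply(time_cols, function(col) {
  hit <- which(df1[[col]] == max(df1[[col]], na.rm = TRUE))
  data.frame(df1[hit, c("ID", "Group")],
             Value = df1[[col]][hit],
             Test  = col)
}))
rownames(res) <- NULL
res
```

With the question's data there are no ties, so this should return the same three rows as the which.max version; it only differs when a column's maximum occurs more than once.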

egmofgnx5#

A base R option with max.col:

d <- df1[max.col(t(df1[-(1:2)])), ]
cbind(
    d[1:2],
    Test = names(d)[-(1:2)],
    Value = diag(t(d[-(1:2)]))
)

which gives

      ID Group  Test Value
7 A02116     2 Time1 0.911
3 A00198     1 Time2 0.863
8 A02197     2 Time3 0.900

bxpogfeg6#

A two-liner with pivot_longer + slice_max:

library(dplyr)
library(tidyr)
pivot_longer(df1, matches("Time"), names_to = "Test") %>% 
  slice_max(value, by = Test)

# # A tibble: 3 × 4
#   ID     Group Test  value
#   <chr>  <int> <chr> <dbl>
# 1 A02116     2 Time1 0.911
# 2 A00198     1 Time2 0.863
# 3 A02197     2 Time3 0.9

The same approach in base R:

df_long <- reshape(df1, direction = "long", varying = list(paste0("Time", 1:3)), v.names = "Value") 
do.call(rbind, by(df_long, df_long["time"], function(x) x[which.max(x$Value),]) )
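
The reshape call above labels the measurements with a numeric time index (1 to 3) rather than the column names. As a hedged variant, the sketch below passes the timevar and times arguments of reshape so the long data carries a Test column holding "Time1" to "Time3" directly. The use of ID as idvar and the names time_cols and res are choices made for this illustration, not part of the original answer.

```r
# Same data as in the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  ID    = c("A00194", "A00195", "A00198", "A00199",
            "A02111", "A02114", "A02116", "A02197"),
  Group = c(1, 1, 1, 1, 2, 2, 2, 2),
  Time1 = c(0.733, 0.903, 0.422, 0.485, 0.682, 0.699, 0.911, 0.083),
  Time2 = c(0.777, 0.116, 0.863, 0.846, 0.522, 0.208, 0.802, 0.082),
  Time3 = c(0.433, 0.308, 0.220, 0.203, 0.700, 0.686, 0.041, 0.900)
)

time_cols <- paste0("Time", 1:3)

# Long format: one row per (ID, Test) pair, with the measurement in Value
df_long <- reshape(df1, direction = "long",
                   varying = list(time_cols), v.names = "Value",
                   timevar = "Test", times = time_cols, idvar = "ID")

# Split by Test and keep the row with the (first) maximum in each piece
res <- do.call(rbind, lapply(split(df_long, df_long$Test),
                             function(x) x[which.max(x$Value), ]))
res[, c("ID", "Group", "Value", "Test")]
```

Using split with lapply instead of by is only a stylistic choice here; both hand one sub-data-frame per Test value to the function.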
