R: Filter groups based on the difference between the two highest values

idv4meu8  posted on 2023-01-18  in  Other

I have the following data frame called df (dput below):

> df
   group value
1      A     5
2      A     1
3      A     1
4      A     5
5      B     8
6      B     2
7      B     2
8      B     3
9      C    10
10     C     1
11     C     1
12     C     8

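I want to filter groups based on the difference between each group's highest value (max) and its second-highest value. The difference should be less than or equal to 2 (<= 2), which means group B should be removed, because its highest value is 8 and its second-highest value is 3, a difference of 5. The desired output should look like this: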

  group value
1     A     5
2     A     1
3     A     1
4     A     5
5     C    10
6     C     1
7     C     1
8     C     8

So I was wondering if anyone knows how to filter groups based on the difference between the highest and second-highest values?
dput of df:

df<-structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C"), value = c(5, 1, 1, 5, 8, 2, 2, 3, 10, 1, 
1, 8)), class = "data.frame", row.names = c(NA, -12L))
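
As a quick check of the rule described above, the gap between the two largest values can be computed per group. This is only a sketch: the name `gap` is illustrative and not part of the original question. Note that a tie at the top (group A has two 5s) counts as a gap of 0.

```r
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 4),
  value = c(5, 1, 1, 5, 8, 2, 2, 3, 10, 1, 1, 8)
)

# Gap between the largest and second-largest value in each group
gap <- tapply(df$value, df$group, function(x) {
  s <- sort(x, decreasing = TRUE)
  s[1] - s[2]
})

print(gap)
# A B C 
# 0 5 2 
```

Only group B exceeds the threshold of 2, which matches the desired output.
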
qacovj5a #1

You can use ave:

df[ave(df$value, df$group, FUN=\(x) diff(sort(c(-x, Inf)))[1]) <= 2,]
#   group value
#1      A     5
#2      A     1
#3      A     1
#4      A     5
#9      C    10
#10     C     1
#11     C     1
#12     C     8
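
To see why the expression inside `ave` works, it may help to step through it for a single group. The sketch below uses group B's values; the variable names are illustrative only.

```r
x <- c(8, 2, 2, 3)             # values of group B

# Negating the values makes an ascending sort put the largest value first;
# the appended Inf guarantees there is always a second element.
neg_sorted <- sort(c(-x, Inf)) # -8 -3 -2 -2 Inf

# The first difference is (second largest) subtracted from (largest)
first_gap <- diff(neg_sorted)[1]
first_gap                      # 5

# With a single value the gap becomes Inf, so that group is dropped by `<= 2`
single_gap <- diff(sort(c(-7, Inf)))[1]
single_gap                     # Inf
```
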

If you can be sure that every group always has at least two values, you can use either of the following:

df[ave(df$value, df$group, FUN=\(x) diff(tail(sort(x), 2))) <= 2,]
df[ave(df$value, df$group, FUN=\(x) diff(sort(-x)[1:2])) <= 2,]
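
The reason for the "at least two values" condition can be sketched by applying the two shorter expressions to a single value (the names `gap_tail` and `gap_neg` are illustrative):

```r
x <- 7  # a group that contains only one value

gap_tail <- diff(tail(sort(x), 2))  # numeric(0): there is nothing to difference
gap_neg  <- diff(sort(-x)[1:2])     # NA: the second element does not exist

length(gap_tail)  # 0
is.na(gap_neg)    # TRUE
```

Neither result is a usable gap for the `<= 2` test, whereas the `Inf`-padded version returns `Inf` for such a group and simply drops it.
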
qlvxas9a #2

Using dplyr:

library(dplyr)

df %>% 
  group_by(group) %>% 
  filter(abs(diff(sort(value, decreasing = TRUE)[1:2])) <= 2) %>%
  ungroup()
# A tibble: 8 × 2
  group value
  <chr> <int>
1 A         5
2 A         1
3 A         1
4 A         5
5 C        10
6 C         1
7 C         1
8 C         8

A base R alternative:

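# Flag each group: TRUE when the gap between its two largest values is <= 2
grp <- na.omit(aggregate(. ~ group, df, function(x) 
  abs(diff(sort(x, decreasing=TRUE)[1:2])) <= 2))

# Keep the rows of the flagged groups and bind them back together
do.call(rbind, c(mapply(function(g, v) 
  list(df[df$group == g & v,]), grp$group, grp$value), make.row.names=FALSE))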
  group value
1     A     5
2     A     1
3     A     1
4     A     5
5     C    10
6     C     1
7     C     1
8     C     8
fhity93d #3

I would probably first create a vector of the groups that satisfy your condition, and then filter the original data.frame.

library(dplyr)

group_to_keep <-
  df %>% 
  group_by(group) %>% 
  slice_max(value, n = 2, with_ties = FALSE) %>% 
  filter(abs(diff(value)) <= 2) %>% 
  pull(group) %>% 
  unique()

df %>% 
  filter(group %in% group_to_keep)
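
The same two-step idea (collect the qualifying groups, then filter) can also be sketched without dplyr. The helper `top2_gap` and the variable names below are hypothetical and not part of the original answer; groups with fewer than two values are treated as not qualifying, which is an assumption.

```r
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 4),
  value = c(5, 1, 1, 5, 8, 2, 2, 3, 10, 1, 1, 8)
)

# Hypothetical helper: gap between the two largest values of x
# (NA when x has fewer than two values)
top2_gap <- function(x) {
  s <- sort(x, decreasing = TRUE)
  s[1] - s[2]
}

# Step 1: names of the groups whose gap is at most 2
gaps <- vapply(split(df$value, df$group), top2_gap, numeric(1))
groups_to_keep <- names(gaps)[!is.na(gaps) & gaps <= 2]

# Step 2: filter the original data frame
result <- df[df$group %in% groups_to_keep, ]
result
```

With the example data, `groups_to_keep` contains "A" and "C", so the eight rows of those two groups remain.
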
