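I can't get dplyr (v. 1.1.2) to keep only the row with the minimum value within groups defined by two variables when a group contains NULL (NA) values. Using na.rm = TRUE causes the whole group to be dropped, instead of ignoring the NA in that group and keeping the smaller of the two non-NA values: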
library(dplyr)

# Original data frame
my_df <-
structure(list(uid = c("id100", "id100", "id100", "id100", "id200",
"id200", "id200", "id200", "id300", "id300", "id300", "id300"
), cat = c("franklin", "franklin", "aretha", "aretha", "franklin",
"aretha", "aretha", "aretha", "franklin", "franklin", "aretha",
"franklin"), food = c("fish", "beef", "chicken", "chicken", "beef",
"pork", "turkey", "fish", "beef", "pork", "chicken", "beef"),
date = structure(c(8674, 8703, 8685, 8689, 8675, 8677, 8680,
8691, 8701, NA, 8698, 8697), class = "Date")), class = "data.frame", row.names = c(NA,
-12L))
My attempt to filter and keep the minimum of each group:
# Group by two variables, filter
new_df <-
my_df %>%
group_by(uid, cat) %>%
filter(date == min(date), na.rm = TRUE)
Actual result:
  uid   cat      food    date
  <chr> <chr>    <chr>   <date>
1 id100 franklin fish    1993-10-01
2 id100 aretha   chicken 1993-10-12
3 id200 franklin beef    1993-10-02
4 id200 aretha   pork    1993-10-04
5 id300 aretha   chicken 1993-10-25
Expected result:
  uid   cat      food    date
  <chr> <chr>    <chr>   <date>
1 id100 franklin fish    1993-10-01
2 id100 aretha   chicken 1993-10-12
3 id200 franklin beef    1993-10-02
4 id200 aretha   pork    1993-10-04
5 id300 aretha   chicken 1993-10-25
6 id300 franklin beef    1993-10-24
3 Answers
8mmmxcuj #1
You can try slice_min() with the na_rm = TRUE option, which picks the row with the smallest non-missing date in each group.
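The snippet and output that accompanied this answer are not preserved here. As a minimal sketch of the suggestion, assuming dplyr 1.1.0 or later (where slice_min() accepts na_rm), it might look like the following. The data frame is rebuilt from the question so the block runs on its own, and the name slice_df is made up for illustration.

```r
library(dplyr)

# Same data as in the question, rebuilt so this snippet is self-contained
my_df <- data.frame(
  uid  = rep(c("id100", "id200", "id300"), each = 4),
  cat  = c("franklin", "franklin", "aretha", "aretha", "franklin",
           "aretha", "aretha", "aretha", "franklin", "franklin", "aretha",
           "franklin"),
  food = c("fish", "beef", "chicken", "chicken", "beef",
           "pork", "turkey", "fish", "beef", "pork", "chicken", "beef"),
  date = as.Date(c(8674, 8703, 8685, 8689, 8675, 8677, 8680,
                   8691, 8701, NA, 8698, 8697), origin = "1970-01-01")
)

# slice_df is a hypothetical name, not taken from the original answer
slice_df <-
  my_df %>%
  group_by(uid, cat) %>%
  # na_rm = TRUE removes rows with a missing date before taking the minimum
  slice_min(date, n = 1, na_rm = TRUE) %>%
  ungroup()

print(slice_df)
```

The rows may come back ordered by group rather than in the original row order, so add an arrange() step if the order matters. With the default with_ties = TRUE, a group whose earliest date occurs more than once would return all of the tied rows.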
jslywgbw #2
na.rm should go inside min(), not filter().
z31licg0 #3
Alternatively, try:
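The code for this answer did not survive, so what was actually posted is unknown. Purely as a hedged illustration, the sketch below first applies the fix described in answer #2 (na.rm moved inside min()) and then shows one other possible approach, slice(which.min(date)). The second approach is an assumption and not necessarily what this answerer wrote, and the names min_df and which_df are made up for illustration.

```r
library(dplyr)

# Same data as in the question, rebuilt so this snippet is self-contained
my_df <- data.frame(
  uid  = rep(c("id100", "id200", "id300"), each = 4),
  cat  = c("franklin", "franklin", "aretha", "aretha", "franklin",
           "aretha", "aretha", "aretha", "franklin", "franklin", "aretha",
           "franklin"),
  food = c("fish", "beef", "chicken", "chicken", "beef",
           "pork", "turkey", "fish", "beef", "pork", "chicken", "beef"),
  date = as.Date(c(8674, 8703, 8685, 8689, 8675, 8677, 8680,
                   8691, 8701, NA, 8698, 8697), origin = "1970-01-01")
)

# Fix from answer #2: na.rm belongs to min(), not to filter().
# Rows whose date is NA compare as NA and are dropped by filter().
min_df <-
  my_df %>%
  group_by(uid, cat) %>%
  filter(date == min(date, na.rm = TRUE)) %>%
  ungroup()

# Assumed alternative (not from the thread): which.min() skips NA values
# and returns the position of the first minimum within each group.
which_df <-
  my_df %>%
  group_by(uid, cat) %>%
  slice(which.min(date)) %>%
  ungroup()

print(min_df)
print(which_df)
```

On this data both versions keep six rows, including the id300/franklin row dated 1993-10-24. In the original attempt, na.rm = TRUE was passed to filter(), so min(date) still returned NA for that group and every row in it was dropped.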