dplyr: unable to filter for the minimum value per group with na.rm = TRUE

c9x0cxw0 posted on 2023-09-27 in dplyr

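I can't get dplyr (v1.1.2) to keep only the row with the minimum value in each group, grouped by two variables, when a group contains a missing (NA) value. Using na.rm = TRUE causes the whole group to be dropped, instead of ignoring the NA in that group and keeping the minimum of the two non-missing values: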

# Original data frame
my_df <-
structure(list(uid = c("id100", "id100", "id100", "id100", "id200", 
"id200", "id200", "id200", "id300", "id300", "id300", "id300"
), cat = c("franklin", "franklin", "aretha", "aretha", "franklin", 
"aretha", "aretha", "aretha", "franklin", "franklin", "aretha", 
"franklin"), food = c("fish", "beef", "chicken", "chicken", "beef", 
"pork", "turkey", "fish", "beef", "pork", "chicken", "beef"), 
    date = structure(c(8674, 8703, 8685, 8689, 8675, 8677, 8680, 
    8691, 8701, NA, 8698, 8697), class = "Date")), class = "data.frame", row.names = c(NA, 
-12L))

My attempt to filter and keep the minimum of each group:

# Group by two variables, filter
new_df <-
  my_df %>%
  group_by(uid, cat) %>%
  filter(date == min(date), na.rm = TRUE)

Actual result:

uid   cat      food    date      
  <chr> <chr>    <chr>   <date>    
1 id100 franklin fish    1993-10-01
2 id100 aretha   chicken 1993-10-12
3 id200 franklin beef    1993-10-02
4 id200 aretha   pork    1993-10-04
5 id300 aretha   chicken 1993-10-25

Expected result:

uid   cat      food    date      
  <chr> <chr>    <chr>   <date>    
1 id100 franklin fish    1993-10-01
2 id100 aretha   chicken 1993-10-12
3 id200 franklin beef    1993-10-02
4 id200 aretha   pork    1993-10-04
5 id300 aretha   chicken 1993-10-25
6 id300 franklin beef    1993-10-24

8mmmxcuj #1

You can try slice_min with the na_rm = TRUE option:

my_df %>%
    slice_min(date, by = c(uid, cat), na_rm = TRUE)

This gives:

uid      cat    food       date
1 id100 franklin    fish 1993-10-01
2 id100   aretha chicken 1993-10-12
3 id200 franklin    beef 1993-10-02
4 id200   aretha    pork 1993-10-04
5 id300 franklin    beef 1993-10-24
6 id300   aretha chicken 1993-10-25
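
One thing to be aware of with slice_min is how it treats ties. The sketch below uses a small made-up data frame (toy, grp, and item are hypothetical names, not from the question) and assumes dplyr 1.1.0 or later, where the by and na_rm arguments are available. By default tied rows are all kept; with_ties = FALSE keeps a single row per group.

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows tied for the earliest date,
# group "b" has one missing date.
toy <- data.frame(
  grp  = c("a", "a", "a", "b", "b"),
  item = c("x", "y", "z", "p", "q"),
  date = as.Date(c("1993-10-01", "1993-10-01", "1993-10-05", NA, "1993-10-03"))
)

# Default behaviour: ties are kept, so group "a" contributes two rows.
with_ties <- toy %>%
  slice_min(date, by = grp, na_rm = TRUE)

# with_ties = FALSE keeps exactly one row per group.
one_each <- toy %>%
  slice_min(date, by = grp, na_rm = TRUE, with_ties = FALSE)

nrow(with_ties)
nrow(one_each)
```

In the question's data there are no tied minimum dates within a group, so both variants would return the same six rows there.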

jslywgbw #2

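na.rm belongs inside min(). In the original call it was passed to filter() instead, so min(date) still returned NA for the group containing the missing date, every comparison in that group evaluated to NA, and filter() dropped all of its rows. Move it into the min() call: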

new_df <-
  my_df %>%
  group_by(uid, cat) %>%
  filter(date == min(date, na.rm = TRUE))
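
Why the placement matters can be checked with base R alone. This is a minimal sketch using the three dates of the id300 / franklin group from the question; the variable name d is made up for the illustration.

```r
# Dates of the id300 / franklin group from the question, including the NA.
d <- as.Date(c("1993-10-28", NA, "1993-10-24"))

# Without na.rm, a single missing value makes the minimum missing,
# so every comparison against it is NA as well.
min(d)
d == min(d)

# With na.rm = TRUE inside min(), the minimum is the earliest non-missing date
# and only the matching row compares as TRUE.
min(d, na.rm = TRUE)
d == min(d, na.rm = TRUE)
```

filter() keeps only rows where the condition is TRUE, which is why the whole group disappears in the first case and one row survives in the second.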

z31licg0 #3

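Alternatively, try filling the missing dates first with fill(), which by default carries the previous value in each group downward: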

library(tidyverse)

my_df %>% group_by(uid, cat) %>% fill(date) %>% filter(date==min(date)) %>% ungroup()

# A tibble: 6 × 4
  uid   cat      food    date      
  <chr> <chr>    <chr>   <date>    
1 id100 franklin fish    1993-10-01
2 id100 aretha   chicken 1993-10-12
3 id200 franklin beef    1993-10-02
4 id200 aretha   pork    1993-10-04
5 id300 aretha   chicken 1993-10-25
6 id300 franklin beef    1993-10-24
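
Because fill() fills downward by default, this approach depends on row order. The sketch below uses a made-up single-group data frame (toy and grp are hypothetical names) in which the missing date comes first, and compares it with putting na.rm = TRUE inside min().

```r
library(dplyr)
library(tidyr)

# Hypothetical group where the missing date is the first row.
toy <- data.frame(
  grp  = c("a", "a", "a"),
  date = as.Date(c(NA, "1993-10-24", "1993-10-28"))
)

# fill() has nothing above the first row to carry down, so the NA stays,
# min(date) is NA, and filter() drops every row of the group.
down <- toy %>%
  group_by(grp) %>%
  fill(date) %>%
  filter(date == min(date)) %>%
  ungroup()

# Dropping the NA inside min() does not depend on where the NA sits.
robust <- toy %>%
  group_by(grp) %>%
  filter(date == min(date, na.rm = TRUE)) %>%
  ungroup()

nrow(down)
nrow(robust)
```

For the question's data the missing date is preceded by a valid one in its group, so the fill() version works there; it also replaces the NA with a copied date, which may or may not be acceptable depending on the use case.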
