dplyr::rowwise and min output a single value

qacovj5a, posted 2023-02-10 in Other

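I've run into a strange situation: when I use min inside mutate after dplyr::rowwise(), it outputs a single value across all rows instead of a row-by-row result. The same approach works with my other data frames in the same session, so I'm not sure what the problem is. I have also restarted RStudio.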

df <- indf %>%
  dplyr::rowwise() %>%
  mutate(test = min(as.Date(date1), as.Date(date2), na.rm = TRUE))

structure(list(id = structure(c("5001", "3002", "2001", "1001", 
"6001", "9001"), label = "Subject name or identifier", format.sas = "$"), 
    date1 = structure(c(NA, 18599, NA, NA, NA, NA), class = "Date"), 
    date2 = structure(c(18472, 18597, 18638, 18675, 18678, 18696
    ), class = "Date"), test = structure(c(18472, 18472, 18472, 
    18472, 18472, 18472), class = "Date")), class = c("rowwise_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
    .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame")))

Answer #1 (o2g1uqev)

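This is probably because the plyr package was loaded after dplyr, so plyr's mutate masks the one from dplyr. plyr::mutate does not honor the rowwise grouping, so min is taken over both columns as a whole and the single result is repeated in every row: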

library(dplyr)
indf %>% 
   rowwise %>% 
   plyr::mutate(test = min(date1, date2, na.rm = TRUE))
# A tibble: 6 × 4
# Rowwise: 
  id    date1      date2      test      
  <chr> <date>     <date>     <date>    
1 5001  NA         2020-07-29 2020-07-29
2 3002  2020-12-03 2020-12-01 2020-07-29
3 2001  NA         2021-01-11 2020-07-29
4 1001  NA         2021-02-17 2020-07-29
5 6001  NA         2021-02-20 2020-07-29
6 9001  NA         2021-03-10 2020-07-29

Compare that with calling the function from dplyr explicitly via :::

indf %>%
   rowwise %>%
   dplyr::mutate(test = min(date1, date2, na.rm = TRUE))
# A tibble: 6 × 4
# Rowwise: 
  id    date1      date2      test      
  <chr> <date>     <date>     <date>    
1 5001  NA         2020-07-29 2020-07-29
2 3002  2020-12-03 2020-12-01 2020-12-01
3 2001  NA         2021-01-11 2021-01-11
4 1001  NA         2021-02-17 2021-02-17
5 6001  NA         2021-02-20 2021-02-20
6 9001  NA         2021-03-10 2021-03-10
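
To confirm whether masking is really the cause in a given session, one option is to ask R which package the unqualified mutate resolves to. The following is only a minimal sketch; the variable name mutate_source is made up for illustration, and it assumes dplyr is installed.

```r
library(dplyr)

# Package whose mutate() is picked up by an unqualified call.
# If plyr was attached after dplyr, this reports "plyr" instead.
mutate_source <- environmentName(environment(mutate))
print(mutate_source)

# Every attached package that provides an object named mutate,
# listed in search-path order (the first entry wins).
print(find("mutate"))
```

If the first result is not dplyr, either call dplyr::mutate explicitly as above or attach plyr before dplyr so that dplyr's version is found first.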

Note that rowwise is slow; it is better to use the vectorized pmin:

indf %>%
   ungroup %>%
   dplyr::mutate(test = pmin(date1, date2, na.rm = TRUE))
# A tibble: 6 × 4
  id    date1      date2      test      
  <chr> <date>     <date>     <date>    
1 5001  NA         2020-07-29 2020-07-29
2 3002  2020-12-03 2020-12-01 2020-12-01
3 2001  NA         2021-01-11 2021-01-11
4 1001  NA         2021-02-17 2021-02-17
5 6001  NA         2021-02-20 2021-02-20
6 9001  NA         2021-03-10 2021-03-10
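
The difference between min and pmin can also be seen without dplyr at all. The sketch below rebuilds the two date columns from the question as plain vectors (the names overall and per_row are illustrative) and shows that min collapses everything to one value, which is what ends up repeated in every row when the rowwise grouping is ignored, while pmin works element by element.

```r
# Same values as date1 and date2 in the question's data
date1 <- as.Date(c(NA, "2020-12-03", NA, NA, NA, NA))
date2 <- as.Date(c("2020-07-29", "2020-12-01", "2021-01-11",
                   "2021-02-17", "2021-02-20", "2021-03-10"))

# min() pools all arguments and returns one overall minimum
overall <- min(date1, date2, na.rm = TRUE)
print(overall)

# pmin() compares position by position and returns one value per element;
# with na.rm = TRUE an NA on one side is replaced by the other side's value
per_row <- pmin(date1, date2, na.rm = TRUE)
print(per_row)
```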
