Equivalent of SUM() OVER (PARTITION BY) in R

Posted by 4sup72z8 on 2023-01-15 in Other

Here is my data:

df <- data.frame(
  id=c("1", "1", "2", "2", "3", "4"),
  tube_placement=c("2020-01-01", "2020-01-10", "2020-01-01", "2020-01-15", "2020-01-01", "" ),
  tube_removal = c("2020-01-02", "2020-01-12", "2020-01-02", "", "2020-01-02", ""),
  attempts = c(1, 2, 1, "", 1, "")
)
df[df==""] <- NA

df$attempts <- as.numeric(df$attempts)

I want to compute a new column, "total_attempts", as follows:

id  tube_placement  tube_removal    attempts    total_attempts
1   2020-01-01      2020-01-02      1           3
1   2020-01-10      2020-01-12      2           3
2   2020-01-01      2020-01-02      1           1
2   2020-01-15      NA              NA          1
3   2020-01-01      2020-01-02      1           1
4   NA              NA              NA          NA

I tried the following:

library(dplyr)  # provides %>%, group_by() and mutate()
df1 <- df %>% group_by(id) %>%
  mutate(total_attempts = sum(attempts))

The problem is that it cannot compute the sum for IDs whose rows contain NA. Any suggestions for improving my code?
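For context, a minimal illustration of the behaviour described above: by default, sum() returns NA as soon as any element is NA, which is why every group with a missing attempts value ends up as NA.

```r
# sum() propagates missing values unless told to drop them
with_na    <- sum(c(1, NA))                # NA
without_na <- sum(c(1, NA), na.rm = TRUE)  # 1
```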

Answer #1 (1qczuiv0)

You can use na.rm = TRUE to exclude missing values from the calculation, like this:

library(tidyverse)
df <- data.frame(
  id=c("1", "1", "2", "2", "3", "4"),
  tube_placement=c("2020-01-01", "2020-01-10", "2020-01-01", "2020-01-15", "2020-01-01", "" ),
  tube_removal = c("2020-01-02", "2020-01-12", "2020-01-02", "", "2020-01-02", ""),
  attempts = c(1, 2, 1, "", 1, "")
)

df[df==""] <- NA

df$attempts <- as.numeric(df$attempts)

df1 <- df %>% group_by(id) %>%
  mutate(total_attempts = sum(attempts, na.rm = TRUE))
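Note that with na.rm = TRUE, a group whose values are all NA (id 4 here) sums to 0, not the NA shown in the desired output. Below is a minimal sketch of one way to keep NA for such groups, assuming dplyr is installed. The helper name sum_or_na is hypothetical and not part of any package, and the data frame is reduced to the two columns that matter.

```r
library(dplyr)

# Hypothetical helper: sum while ignoring NAs, but return NA
# when every value in the group is missing.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Reduced version of the question's data (id and attempts only)
df <- data.frame(
  id = c("1", "1", "2", "2", "3", "4"),
  attempts = c(1, 2, 1, NA, 1, NA)
)

df1 <- df %>%
  group_by(id) %>%
  mutate(total_attempts = sum_or_na(attempts)) %>%
  ungroup()

df1$total_attempts
# values: 3, 3, 1, 1, 1, NA
```

If 0 is an acceptable total for an ID with no recorded attempts, plain sum(attempts, na.rm = TRUE) is enough and the helper is unnecessary.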
