Problem computing a sum of products with group_by

Posted by h6my8fg2 on 2023-02-26 in Other

I want to compute a sum of products by group, but the groups have different numbers of rows. Here is my tibble:

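library(tidyverse)  # tibble, dplyr (mutate, group_by, summarise), tidyr (pivot_wider)
library(magrittr)   # %<>% is not attached by tidyverse

d<-c("2019-01-22", "2019-02-05", "2019-02-19" ,"2019-02-19" ,"2019-03-07" ,"2019-03-19" ,"2019-03-19" ,"2019-04-02" ,"2019-04-16",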
        "2019-04-16" ,"2019-04-30" ,"2019-05-14" ,"2019-05-14" ,"2019-05-27" ,"2019-01-22" ,"2019-02-05" ,"2019-02-19",
        "2019-02-19" ,"2019-03-07" ,"2019-03-19" ,"2019-03-19" ,"2019-04-02" ,"2019-04-16" ,"2019-04-16" ,"2019-04-30" ,"2019-05-14",
        "2019-05-14" ,"2019-05-27")
mat<-rep(c("092000884483","092000884505"),each=14)
mung<-c("M" ,"M" ,"M" ,"S" ,"M" ,"M" ,"S" ,"M" ,"M" ,"S" ,"M" ,"M" ,"S" ,"M" ,"M" ,"M" ,"M" ,"S" ,"M" ,"M" ,"S" ,"M" ,"M" ,"S" ,"M" ,"M" ,"S" ,"M")
Tg<-c(5.42,4.40,6.39,7.79,3.77,4.65,3.26,5.42,4.17,5.33,4.65,6.43,9.68,8.10,6.68,4.46,6.37,8.90,3.79,5.59,6.66,6.06,6.28,9.48,6.00,6.24,10.48,8.31)
C4<-c(4.29, 5.07, 4.45, 4.15, 4.24, 3.78, 3.62, 4.16, 3.84, 3.54, 3.80, 3.77, 3.93, 3.70, 4.00, 4.22, 4.36, 4.04, 3.92, 3.69, 3.64, 4.27, 3.59, 3.91, 3.84, 3.74, 4.04, 3.01)

my_tbl<-tibble(Matricola=mat,datc=as.Date(d),Mung=mung,tg=Tg,C4_0=C4)

I need the sum of the products tg*C4_0 for each date and each Matricola. If I compute the sum of products by hand, I do the following:

my_tbl_t<-my_tbl%>%pivot_wider(id_cols = c(Matricola,datc),values_from =c(tg,C4_0),names_from = Mung )
# and calculate the sum of the products, conditioning on "missing" data
my_prd1<-my_tbl_t%>%mutate(C4_0ps1=case_when(!is.na(tg_M) & !is.na(tg_S)~(tg_M*C4_0_M+tg_S*C4_0_S),
                       !is.na(tg_M) & is.na(tg_S)~(tg_M*C4_0_M),
                       is.na(tg_M) & !is.na(tg_S)~(tg_S*C4_0_S)))

Alternatively, I can compute the product first and then summarise by Matricola and date, as follows:

my_tbl%<>%mutate(C4_0p=C4_0*tg)
#and summarise by group
my_prd2<-my_tbl%>%group_by(Matricola,datc)%>%
  summarise(n=n(),C4_0ps2=sum(C4_0p,n.rm=T))

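I expected C4_0ps1 in my_prd1 to equal C4_0ps2 in my_prd2, but it does not: the sum of products in my_prd2 is higher than the one in my_prd1 (by 1 unit). I can see that my_prd2 is still grouped by Matricola, but I don't understand why the sum of products is wrong.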

j9per5c4 (answer #1)

I found the error! Instead of this code, which contains a typo (n.rm):

my_prd2 <- my_tbl %>% group_by(Matricola,datc) %>%
  summarise(n=n(),C4_0ps2=sum(C4_0p,n.rm=T))

I should have written this, which has the correct na.rm argument:

my_prd2 <- my_tbl %>% group_by(Matricola,datc) %>%
  summarise(n=n(),C4_0ps2=sum(C4_0p,na.rm=T))
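
For anyone wondering why the typo shifts every group total by exactly 1: sum() takes its values through `...`, and n.rm is not one of its named parameters, so n.rm=T is treated as one more value to add, with TRUE coerced to 1. Here is a minimal base R sketch of that behaviour. The toy numbers and variable names are made up for illustration and are not taken from the data in the question.

```r
# Toy values, invented for this illustration only.
x_complete <- c(2, 3)
x_missing  <- c(2, 3, NA)

# Misspelled argument: n.rm falls into `...`, so TRUE is summed as 1.
typo_complete <- sum(x_complete, n.rm = TRUE)   # 2 + 3 + 1

# The typo also means NA values are NOT removed.
typo_missing <- sum(x_missing, n.rm = TRUE)     # NA

# Correct spelling: NA values are dropped and nothing extra is added.
fixed_missing <- sum(x_missing, na.rm = TRUE)   # 2 + 3

print(c(typo_complete = typo_complete,
        typo_missing  = typo_missing,
        fixed_missing = fixed_missing))
```

The note about my_prd2 still being grouped by Matricola is unrelated to the wrong totals. If the leftover grouping is unwanted, passing .groups = "drop" to summarise() (available since dplyr 1.0.0) should return an ungrouped tibble.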
