Counting factor levels in R when aggregating by two grouping columns

Posted by mtb9vblg on 2023-09-27 in Other

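I'm trying to figure out how to aggregate a column by two grouping columns, where the aggregated column is a factor with a binary outcome (Success/Fail). One challenge is that the summary should still contain a row for group combinations in which one of the two outcomes has a count of zero. For example, suppose this is the data frame: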

dat <- data.frame(
  Group1 = c("A", "A", "A", "B", "B", "C", "C", "C"), 
  Group2 = c("D", "D", "D", "E", "E", "F", "F", "F"), 
  Result = c("Success", "Success", "Fail", "Success", "Success", "Success", "Fail", "Fail")
)

Ideally, I would like a summary that looks like this:

# Group1  Group2  Success Fail
# A       D       2       1
# B       E       2       0
# C       F       1       2

However, I'm struggling to produce a result with the outcome split into separate columns.
I tried aggregate() as follows:

aggregate(Result ~ Group1 + Group2, data = dat, FUN=length)

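But this only returns the total number of rows per group; it does not break the count down by the levels of the binary factor.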


yiytaume1#

You can use dplyr::summarise() and sum a logical comparison for each outcome:

library(dplyr)  # the .by argument requires dplyr >= 1.1.0

dat %>%
  summarise(Success = sum(Result == "Success"),
            Fail = sum(Result == "Fail"), 
            .by = c(Group1, Group2))

Output:

#  Group1 Group2 Success Fail
#1      A      D       2    1
#2      B      E       2    0
#3      C      F       1    2

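Or, in base R with aggregate() (though I'm sure there is a more elegant way to do this), you can compute the two counts separately, merge() them, and then rename the columns with setNames():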

setNames(
  merge(
    aggregate(Result ~ Group1 + Group2, data = dat, FUN = \(x) sum(x == "Success")),
    aggregate(Result ~ Group1 + Group2, data = dat, FUN = \(x) sum(x == "Fail")),
    by = c("Group1", "Group2")),
  c("Group1", "Group2", "Success", "Fail")
  )

#  Group1 Group2 Success Fail
#1      A      D       2    1
#2      B      E       2    0
#3      C      F       1    2

w1jd8yoj2#

With aggregate():

dat$Result <- factor(dat$Result)
aggregate(Result ~ Group1 + Group2, data = dat, FUN = table)
#   Group1 Group2 Result.Fail Result.Success
# 1      A      D           1              2
# 2      B      E           0              2
# 3      C      F           2              1
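One thing to be aware of with FUN = table: the counts come back as a single matrix column named Result, and the Result.Fail / Result.Success headers are only how that matrix is printed. The following is a minimal sketch of one way to expand it into ordinary columns using do.call(data.frame, ...); the variable names res and flat are just illustrative.

```r
dat <- data.frame(
  Group1 = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Group2 = c("D", "D", "D", "E", "E", "F", "F", "F"),
  Result = c("Success", "Success", "Fail", "Success", "Success", "Success", "Fail", "Fail")
)

# A factor keeps both levels in every group, so zero counts are preserved
dat$Result <- factor(dat$Result)

res <- aggregate(Result ~ Group1 + Group2, data = dat, FUN = table)
ncol(res)              # 3: Group1, Group2, and one matrix column "Result"
is.matrix(res$Result)  # TRUE

# Expand the matrix column into separate data frame columns
flat <- do.call(data.frame, res)
names(flat)            # "Group1" "Group2" "Result.Fail" "Result.Success"
```

After flattening, the count columns can be renamed or used like any other column, for example flat$Result.Success.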

data.table

library(data.table)
setDT(dat)
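# Count rows per cell; stating fun.aggregate and value.var avoids the default-guessing messages
dcast(dat, Group1 + Group2 ~ Result, fun.aggregate = length, value.var = "Result")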
# Key: <Group1, Group2>
#    Group1 Group2  Fail Success
#    <char> <char> <int>   <int>
# 1:      A      D     1       2
# 2:      B      E     0       2
# 3:      C      F     2       1

With sqldf:

library(sqldf)
sqldf("SELECT 
         Group1, Group2, 
         SUM(CASE WHEN Result = 'Success' THEN 1 ELSE 0 END) AS Success,
         SUM(CASE WHEN Result = 'Fail'    THEN 1 ELSE 0 END) AS Fail 
       FROM dat 
       GROUP BY Group1, Group2")
#   Group1 Group2 Success Fail
# 1      A      D       2    1
# 2      B      E       2    0
# 3      C      F       1    2

ego6inou3#

Here is a similar approach using pivot_wider() (which lives in tidyr, not dplyr):

tidyr::pivot_wider(dat, names_from = Result, values_from = Result, values_fn = length, values_fill = 0)

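Notes:

  1. values_fn = length counts the rows for each combination of Group1, Group2, and Result in the dataset.
  2. For combinations that have no rows in the dataset, values_fill fills in 0.
  3. I'm not sure what your actual dataset looks like, but if the outcome is like your example and has only two possible values (Success and Fail), it makes sense to store it as a logical (TRUE/FALSE) or as 1/0. If the values are categories, making them a factor is the way to go.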
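To illustrate the last note, here is a minimal base R sketch that stores the outcome as logical columns and then sums them per group. It assumes Result only ever takes the two values shown in the question; the helper columns Success and Fail and the name counts are illustrative and not part of the original data.

```r
dat <- data.frame(
  Group1 = c("A", "A", "A", "B", "B", "C", "C", "C"),
  Group2 = c("D", "D", "D", "E", "E", "F", "F", "F"),
  Result = c("Success", "Success", "Fail", "Success", "Success", "Success", "Fail", "Fail")
)

# Assumption: exactly two outcomes, so Fail is simply "not Success"
dat$Success <- dat$Result == "Success"
dat$Fail    <- !dat$Success

# Summing a logical vector counts its TRUE values; a group with no
# failures naturally gets 0 rather than being dropped
counts <- aggregate(cbind(Success, Fail) ~ Group1 + Group2, data = dat, FUN = sum)
counts
```

Because each row contributes TRUE or FALSE to both columns, no special handling is needed for groups in which one outcome never occurs.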
