Summarizing a data frame with multiple repeating grouping variable values in R

Posted by n1bvdmb6 on 2023-10-13 in Other


I have the following df:

df1<- structure(list(Type.of.Goal = c("Nutrition/Hydration Goal", "Nutrition/Hydration Goal", 
"Fitness Goal", "Fitness Goal", "Lifestyle Goal", "Fitness Goal", 
"Lifestyle Goal", "Fitness Goal", "Nutrition/Hydration Goal", 
"Nutrition/Hydration Goal", "Nutrition/Hydration Goal", "Lifestyle Goal", 
"Lifestyle Goal", "Lifestyle Goal", "Nutrition/Hydration Goal", 
"Lifestyle Goal", "Fitness Goal", "Fitness Goal", "Lifestyle Goal", 
"Lifestyle Goal", "Fitness Goal", "Lifestyle Goal", "Lifestyle Goal"
), progress_made = c(1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 
1, 1, 0, 1, 1, 1, 1, 1, 1), id = c("a", "a", "a", "b", "b", "b", 
"c", "c", "c", "c", "d", "d", "d", "e", "e", "e", "e", "f", "f", 
"f", "g", "g", "g")), row.names = c(734L, 736L, 737L, 964L, 965L, 
966L, 1446L, 1447L, 1448L, 1449L, 1485L, 1486L, 1487L, 1553L, 
1554L, 1555L, 1556L, 1918L, 1919L, 1920L, 1952L, 1953L, 1954L
), class = "data.frame")

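I am trying to summarize the df to show which people (identified by the id column) 1) made progress on all of the goals they set, 2) made progress on some goals but not others, or 3) made no progress on any of the goals they set.
If a person made progress on a given goal, progress_made = 1; if not, progress_made = 0.
For people who set only one goal of each type (fitness, nutrition/hydration, and lifestyle), I was able to do this without any problem, but I keep running into trouble with people who, for example, set three goals that fall under only two goal types.
Basically, I am looking for a final data frame with a structure similar to this: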

df_results<- data.frame(id= c("a", "b", "c", "d", "e", "f", "g"),
                     results= c("All goals saw progress", "No goals saw progress", 
                                 "Some goals saw progress, but not all", 
                                 "Some goals saw progress, but not all", 
                                 "Some goals saw progress, but not all", 
                                 "All goals saw progress", "All goals saw progress"))

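It doesn't have to be this exact structure; this is just the final information I need it to convey in some way.
My initial strategy was to pivot the df wider so that id and the Type.of.Goal values were the columns and the progress_made values were the cell values. After that, I used a combination of rowSums and mutate to evaluate the values for each outcome category, then used ifelse to create a new column that mapped those values to the text categories listed in df_results. However, this pivot approach does not work when any given individual has set more than one goal under the same type.
Any ideas/help would be much appreciated.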
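
To make the pivot problem concrete, here is a minimal base R sketch. The toy data frame is modeled on the rows for id "e" in df1; the names toy, counts, and dup_cells are illustrative assumptions, not part of the original post. A count above 1 for an id/type pair means a wide pivot has no single progress_made value to put in that cell.

```r
# Toy data modeled on id "e": two Lifestyle goals, one Nutrition, one Fitness.
toy <- data.frame(
  id = c("e", "e", "e", "e"),
  Type.of.Goal = c("Lifestyle Goal", "Nutrition/Hydration Goal",
                   "Lifestyle Goal", "Fitness Goal"),
  progress_made = c(1, 1, 1, 0)
)

# Count how many goals each id has under each goal type.
counts <- table(toy$id, toy$Type.of.Goal)

# TRUE marks id/type pairs with more than one goal, i.e. the cells
# where a one-value-per-cell wide layout breaks down.
dup_cells <- counts > 1

print(counts)
print(dup_cells)
```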

r

Source: https://stackoverflow.com/questions/77275654/summarizing-data-frame-with-multiple-repeating-grouping-variable-values

2 Answers


2admgd591#

We can use dplyr: summarise by id, using case_when to pick the category.

library(dplyr)

df1 |> 
    summarise(results = case_when(all(progress_made ==1) ~ "All goals saw progress",
                                  all(progress_made ==0) ~ "No goals saw progress",
                                  .default = "Some goals saw progress, but not all"
                                  ),
              .by = id
              )
  id                              results
1  a               All goals saw progress
2  b                No goals saw progress
3  c Some goals saw progress, but not all
4  d Some goals saw progress, but not all
5  e Some goals saw progress, but not all
6  f               All goals saw progress
7  g               All goals saw progress
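
The .by argument of summarise and the .default argument of case_when were added in dplyr 1.1.0. For older dplyr versions, a sketch of the same idea with an explicit group_by() might look like the following. The toy data frame and the name res are illustrative assumptions, and TRUE ~ stands in for .default.

```r
library(dplyr)

# Toy data: "a" progressed on all goals, "b" on none, "c" on some.
toy <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  progress_made = c(1, 1, 0, 0, 1, 0)
)

res <- toy %>%
  group_by(id) %>%
  summarise(
    results = case_when(
      all(progress_made == 1) ~ "All goals saw progress",
      all(progress_made == 0) ~ "No goals saw progress",
      TRUE ~ "Some goals saw progress, but not all"  # fallback branch
    ),
    .groups = "drop"  # return an ungrouped result
  )

print(res)
```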

Answered 2023-10-13

83qze16e2#

A base R option:

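# Formula first, then data, matching aggregate()'s formula method.
# setNames() renames the aggregated column to "results", as in the output below.
setNames(
  aggregate(
    progress_made ~ id, data = df1,
    FUN = \(x) {
      ifelse(
        all(x == 1),
        "All goals saw progress",
        ifelse(
          any(x == 1),
          "Some goals saw progress, but not all",
          "No goals saw progress"
        )
      )
    }
  ),
  c("id", "results")
)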

#   id                              results
# 1  a               All goals saw progress
# 2  b                No goals saw progress
# 3  c Some goals saw progress, but not all
# 4  d Some goals saw progress, but not all
# 5  e Some goals saw progress, but not all
# 6  f               All goals saw progress
# 7  g               All goals saw progress
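
If a count of people per category is also useful, one possible follow-up in base R is to classify each id with tapply() and tabulate the result. This is only a sketch on toy data; the names toy, classify, per_id, and category_counts are illustrative assumptions.

```r
# Toy data: "a" progressed on all goals, "b" on none, "c" on some.
toy <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  progress_made = c(1, 1, 0, 0, 1, 0)
)

# Map one person's vector of 0/1 values to a text category.
classify <- function(x) {
  if (all(x == 1)) {
    "All goals saw progress"
  } else if (any(x == 1)) {
    "Some goals saw progress, but not all"
  } else {
    "No goals saw progress"
  }
}

# One category per id, returned as a named character vector.
per_id <- tapply(toy$progress_made, toy$id, classify)

# Number of ids falling into each category.
category_counts <- table(per_id)

print(per_id)
print(category_counts)
```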

Answered 2023-10-13
