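I am trying to get grouped summary statistics for several variables, each conditioned on a different column. For example, I have three total-difference variables (n.diff.total_*) and three variables that indicate whether each observation should be excluded from the corresponding summary (na.diff.total_* = 1 means the observation should be excluded from the calculation). The summary statistics are grouped by the id1 variable. Is there a better, more efficient way to get these values than the code below?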
Example data frame
library(dplyr)    # %>%, select(), filter(), mutate(), group_by(), summarise()
library(stringi)  # stri_rand_strings()

set.seed(100)
df <-
  data.frame(
    id1 = c(rep('A', 10), rep('B', 10)),
    id2 = stri_rand_strings(20, 1),
    n.diff.total_rare = sample(0:30, 20, replace=TRUE),
    n.diff.total_general = sample(0:30, 20, replace=TRUE),
    n.diff.total_specialty = sample(0:30, 20, replace=TRUE),
    na.diff.total_rare = sample(0:1, 20, replace=TRUE),
    na.diff.total_general = sample(0:1, 20, replace=TRUE),
    na.diff.total_specialty = sample(0:1, 20, replace=TRUE)
  )
Current code and example output
output_rare <-
  df %>%
  select(id1, id2, n.diff.total_rare, na.diff.total_rare) %>%
  filter(na.diff.total_rare == 0) %>%
  mutate(zero = ifelse(n.diff.total_rare == 0, 1, 0)) %>%
  group_by(id1) %>%
  summarise(
    min = min(n.diff.total_rare, na.rm = T),
    max = max(n.diff.total_rare, na.rm = T),
    sd = sd(n.diff.total_rare, na.rm = T),
    mean = mean(n.diff.total_rare, na.rm = T),
    zeros = sum(zero, na.rm = T)
  ) %>%
  ungroup %>%
  mutate(variable = 'n.diff.total_rare')
output_specialty <-
  df %>%
  select(id1, id2, n.diff.total_specialty, na.diff.total_specialty) %>%
  filter(na.diff.total_specialty == 0) %>%
  mutate(zero = ifelse(n.diff.total_specialty == 0, 1, 0)) %>%
  group_by(id1) %>%
  summarise(
    min = min(n.diff.total_specialty, na.rm = T),
    max = max(n.diff.total_specialty, na.rm = T),
    sd = sd(n.diff.total_specialty, na.rm = T),
    mean = mean(n.diff.total_specialty, na.rm = T),
    zeros = sum(zero, na.rm = T)
  ) %>%
  ungroup %>%
  mutate(variable = 'n.diff.total_specialty')
output_general <-
  df %>%
  select(id1, id2, n.diff.total_general, na.diff.total_general) %>%
  filter(na.diff.total_general == 0) %>%
  mutate(zero = ifelse(n.diff.total_general == 0, 1, 0)) %>%
  group_by(id1) %>%
  summarise(
    min = min(n.diff.total_general, na.rm = T),
    max = max(n.diff.total_general, na.rm = T),
    sd = sd(n.diff.total_general, na.rm = T),
    mean = mean(n.diff.total_general, na.rm = T),
    zeros = sum(zero, na.rm = T)
  ) %>%
  ungroup %>%
  mutate(variable = 'n.diff.total_general')
output <-
  output_rare %>%
  rbind(
    output_specialty
  ) %>%
  rbind(
    output_general
  )
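The three pipelines above differ only in the column suffix, so one way to cut the repetition (a sketch, not taken from either answer) is to wrap a single pipeline in a function that builds both column names from the suffix and to stack the results with `bind_rows()`. The function name `summarise_diff` and the small `toy` data frame are hypothetical; the code assumes dplyr's `.data[[...]]` pronoun is available (dplyr 0.8 or later) and counts zeros directly instead of creating the helper `zero` column.

```r
library(dplyr)

# Summarise one n.diff.<type> column per id1, keeping only rows whose
# matching na.diff.<type> flag is 0 (the same steps as each output_* pipeline).
# summarise_diff is a hypothetical helper name.
summarise_diff <- function(data, type) {
  value_col <- paste0("n.diff.", type)
  flag_col  <- paste0("na.diff.", type)
  data %>%
    filter(.data[[flag_col]] == 0) %>%
    group_by(id1) %>%
    summarise(
      min   = min(.data[[value_col]], na.rm = TRUE),
      max   = max(.data[[value_col]], na.rm = TRUE),
      sd    = sd(.data[[value_col]], na.rm = TRUE),
      mean  = mean(.data[[value_col]], na.rm = TRUE),
      # counting values equal to 0 replaces the helper `zero` column
      zeros = sum(.data[[value_col]] == 0, na.rm = TRUE)
    ) %>%
    ungroup() %>%
    mutate(variable = value_col)
}

# Hypothetical toy data using the same column naming scheme as df
toy <- data.frame(
  id1 = c("A", "A", "A", "B", "B", "B"),
  n.diff.total_rare     = c(0, 4, 8, 0, 0, 6),
  na.diff.total_rare    = c(0, 0, 1, 0, 0, 0),
  n.diff.total_general  = c(3, 0, 5, 2, 7, 1),
  na.diff.total_general = c(1, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# One summary per suffix, stacked in order like the rbind() chain
res <- bind_rows(lapply(c("total_rare", "total_general"), summarise_diff, data = toy))
res
```

With the question's `df`, the call would presumably be `bind_rows(lapply(c("total_rare", "total_specialty", "total_general"), summarise_diff, data = df))`, which should produce the same rows in the same order as `output`.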
2 answers
Answer 1
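To merge these three steps into one and output a single data frame in one go, you can use pivot_ … grouped by three variables. This gives the same values as the bound data frame above.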
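Answer 2
This is probably easier to work with in long format.
First, add a column with the row number, because the filter refers to a value in the same row.
Then use pivot_longer to convert the data to long format. The function extract can split each label into "n" or "na" and the type of total (for example, "total_rare"). Hope this helps.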
Output