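I'm trying to produce output that computes, for each factor level, its count as a proportion of the total count (in the data frame), but I can't figure out how to keep the grouping structure in the output.
I can get the total count to divide by...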
df %>% summarise(sum(num))
# 15
...
df %>% group_by(species) %>% summarise(sum(num))
# A tibble: 3 × 2
# species `sum(num)`
# <chr> <int>
# 1 Farfantepenaeus duorarum 4
# 2 Farfantepenaeus notialis 0
# 3 Farfantepenaeus spp 11
But I can't get it into something like this...
# ???
# species Percent
# <chr> <int>
# 1 Farfantepenaeus duorarum 4 / 15 = 0.267
# 2 Farfantepenaeus notialis 0 / 15 = 0.000
# 3 Farfantepenaeus spp 11 / 15 = 0.733
The closest I've gotten is the following, but because I used reframe(), it returns ungrouped data and the species column is gone:
df %>% group_by(species) %>%
summarise(factor_count=sum(num)) %>%
# ungroup() %>%
# Warning: Please use `reframe()` instead. When switching from `summarise()`
# to `reframe()`, remember that `reframe()` always returns an ungrouped data
reframe(percent=factor_count/sum(df$num))
# A tibble: 3 × 1
percent
<dbl>
1 0.267
2 0
3 0.733
Data:
> dput(df)
structure(list(species = c("Farfantepenaeus notialis", "Farfantepenaeus spp",
"Farfantepenaeus notialis", "Farfantepenaeus notialis", "Farfantepenaeus duorarum",
"Farfantepenaeus duorarum", "Farfantepenaeus notialis", "Farfantepenaeus spp",
"Farfantepenaeus duorarum", "Farfantepenaeus spp", "Farfantepenaeus notialis",
"Farfantepenaeus duorarum", "Farfantepenaeus spp", "Farfantepenaeus notialis",
"Farfantepenaeus notialis", "Farfantepenaeus spp", "Farfantepenaeus duorarum",
"Farfantepenaeus spp", "Farfantepenaeus spp", "Farfantepenaeus duorarum",
"Farfantepenaeus duorarum", "Farfantepenaeus spp", "Farfantepenaeus spp",
"Farfantepenaeus spp", "Farfantepenaeus notialis"), num = c(0L,
0L, 0L, 0L, 1L, 0L, 0L, 2L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 3L, 0L, 2L, 4L, 0L)), row.names = c(159897L, 174698L,
236857L, 190237L, 327321L, 272931L, 304567L, 75538L, 109206L,
351373L, 280332L, 163966L, 282183L, 341197L, 316962L, 354703L,
343971L, 95333L, 244258L, 254061L, 87561L, 186908L, 221318L,
258688L, 97737L), class = "data.frame")
4 Answers
b1zrtrql1#
Two steps: summarise the group totals, then compute each total's share across all groups.
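The answer's original code did not survive in this copy, so here is a minimal sketch of what the two steps might look like. The small df below is a stand-in I made up with the same per-species totals as the question's data (4, 0, 11); the column names factor_count and percent are taken from the question's attempt.

```r
library(dplyr)

# Stand-in for the question's df (assumption: same species totals, fewer rows)
df <- data.frame(
  species = c("Farfantepenaeus duorarum", "Farfantepenaeus duorarum",
              "Farfantepenaeus notialis",
              "Farfantepenaeus spp", "Farfantepenaeus spp",
              "Farfantepenaeus spp", "Farfantepenaeus spp"),
  num = c(1L, 3L, 0L, 2L, 3L, 2L, 4L)
)

res <- df %>%
  group_by(species) %>%
  # Step 1: one row per species with its total
  summarise(factor_count = sum(num)) %>%
  # Step 2: summarise() dropped the only grouping level, so sum() here
  # runs over all species and gives the grand total
  mutate(percent = factor_count / sum(factor_count))

res
```

Because mutate() adds a column instead of replacing the result, the species column is kept next to the proportion.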
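About your code:
reframe is unnecessary. For the most part it can stand in for summarise when the number of rows *changes* (though I haven't verified whether or where the two differ significantly), and here it actually drops the species column.
Using df$ inside a pipeline that starts with df: df$num ignores everything done since the start of the pipeline, so any grouping, filtering, added or changed columns, etc. are not present in that version of df. Sure, sometimes it is useful or even necessary, but those cases are rare.
yjghlzjz2#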
Use xtabs.
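The code for this answer is missing from this copy. A possible base R sketch, not necessarily what the answerer wrote: xtabs sums num within each species, and prop.table divides by the grand total. The df is a made-up stand-in with the same species totals as the question's data, and the column name Percent is borrowed from the question's desired output.

```r
# Stand-in for the question's df (assumption: same species totals, fewer rows)
df <- data.frame(
  species = c("Farfantepenaeus duorarum", "Farfantepenaeus duorarum",
              "Farfantepenaeus notialis",
              "Farfantepenaeus spp", "Farfantepenaeus spp",
              "Farfantepenaeus spp", "Farfantepenaeus spp"),
  num = c(1L, 3L, 0L, 2L, 3L, 2L, 4L)
)

# Sum of num per species, as a one-dimensional table
tab <- xtabs(num ~ species, data = df)

# Divide every cell by the grand total
prop <- prop.table(tab)

# Back to a data frame with a species column and a Percent column
out <- as.data.frame(prop, responseName = "Percent")
out
```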
cgfeq70w3#
Pass the value to the wt argument of the count function.
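The code is missing here as well. A sketch of how this might look: with wt = num, count() sums num per species into a column named n instead of counting rows. The df is a made-up stand-in with the question's species totals, and the percent column name is my own choice.

```r
library(dplyr)

# Stand-in for the question's df (assumption: same species totals, fewer rows)
df <- data.frame(
  species = c("Farfantepenaeus duorarum", "Farfantepenaeus duorarum",
              "Farfantepenaeus notialis",
              "Farfantepenaeus spp", "Farfantepenaeus spp",
              "Farfantepenaeus spp", "Farfantepenaeus spp"),
  num = c(1L, 3L, 0L, 2L, 3L, 2L, 4L)
)

res <- df %>%
  # Weighted count: n is sum(num) within each species
  count(species, wt = num) %>%
  # The result is ungrouped, so sum(n) is the grand total
  mutate(percent = n / sum(n))

res
```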
bejyjqdl4#
Here are two alternatives:
Using map_vec
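The answerer's map_vec code is not in this copy, so the following is only my guess at the general idea: split num by species, then map over the pieces. It assumes purrr 1.0.0 or later (where map_vec was introduced); the df is a made-up stand-in with the question's species totals, and by_species and grand_total are names I introduced.

```r
library(purrr)

# Stand-in for the question's df (assumption: same species totals, fewer rows)
df <- data.frame(
  species = c("Farfantepenaeus duorarum", "Farfantepenaeus duorarum",
              "Farfantepenaeus notialis",
              "Farfantepenaeus spp", "Farfantepenaeus spp",
              "Farfantepenaeus spp", "Farfantepenaeus spp"),
  num = c(1L, 3L, 0L, 2L, 3L, 2L, 4L)
)

grand_total <- sum(df$num)

# One integer vector of num values per species
by_species <- split(df$num, df$species)

# Share of the grand total for each species, simplified to a numeric vector
percent <- map_vec(by_species, function(x) sum(x) / grand_total)

res <- data.frame(species = names(by_species), percent = unname(percent))
res
```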
base R:
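The base R code was also lost in this copy. One way it could be done, offered as a sketch and not as the answerer's version: aggregate() for the per-species totals, then a plain division. The df is a made-up stand-in with the question's species totals.

```r
# Stand-in for the question's df (assumption: same species totals, fewer rows)
df <- data.frame(
  species = c("Farfantepenaeus duorarum", "Farfantepenaeus duorarum",
              "Farfantepenaeus notialis",
              "Farfantepenaeus spp", "Farfantepenaeus spp",
              "Farfantepenaeus spp", "Farfantepenaeus spp"),
  num = c(1L, 3L, 0L, 2L, 3L, 2L, 4L)
)

# Per-species totals; the result keeps the species column
totals <- aggregate(num ~ species, data = df, FUN = sum)

# Share of the grand total
totals$percent <- totals$num / sum(totals$num)
totals
```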