R: calculating frequencies by group

7cwmlq89  posted 12 months ago in Other

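I have a data frame like the one shown below. I want to group the data by recid and add the columns count1 and count2, so that each value is counted across both col1 and col2. For example, count1 in row 1 would be 2, because 10 appears twice across the two columns, and Freqcol1 would be 2/4.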

ID col1 col2 recid
 1   10   12 abc_12
 2   10   15 abc_12
 3   10   10 def_34
Desired output:

ID col1 col2 recid count1 count2 Freqcol1 Freqcol2
 1   10   12 abc_12     2     1     0.5     1
 2   10   15 abc_12     2     1     0.5     1
 3   10   10 def_34     2     2     0.5     0.5

Counting the occurrences of the numbers across the two columns:

df %>%
  pivot_longer(-ID) %>%
  mutate(count = n(), .by = value) %>%
  mutate(freq = count / n()) %>%
  pivot_wider(values_from = c(value, count, freq))

a9wyjsp7  1#

Another dplyr approach, without pivoting, that adapts to the number of col# columns (in case that is useful):

library(dplyr)
quux %>%
  mutate(
    across(starts_with("col"),
           ~ colSums(sapply(.x, `==`, c(col1, col2))),
           .names = "{sub('col', 'count', .col)}"),
    across(starts_with("count"), 
           ~ .x / sum(.x), 
           .names = "{sub('count', 'freq', .col)}"),
    .by = recid
  )
#   ID col1 col2  recid count1 count2 freq1 freq2
# 1  1   10   12 abc_12      2      1   0.5   0.5
# 2  2   10   15 abc_12      2      1   0.5   0.5
# 3  3   10   10 def_34      2      2   1.0   1.0

Data

quux <- structure(list(ID = 1:3, col1 = c(10L, 10L, 10L), col2 = c(12L, 15L, 10L), recid = c("abc_12", "abc_12", "def_34")), class = "data.frame", row.names = c(NA, -3L))
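
The dplyr code above selects the columns dynamically with starts_with("col"), but the pool of values it compares against is still written out as c(col1, col2). As a rough sketch only, the same per-group counts can be computed in base R with the pool built from the column names, so a col3 would be picked up without further edits. The names value_cols and count_in_group are made up for this illustration and do not come from the answer.

```r
# Same sample data as in the answer above.
quux <- data.frame(
  ID = 1:3,
  col1 = c(10L, 10L, 10L),
  col2 = c(12L, 15L, 10L),
  recid = c("abc_12", "abc_12", "def_34")
)

# Every column whose name starts with "col" takes part in the count.
value_cols <- grep("^col", names(quux), value = TRUE)

# Hypothetical helper: for one recid group, count how often each value
# occurs in the pooled values of all col* columns of that group.
count_in_group <- function(g) {
  pool <- unlist(g[value_cols], use.names = FALSE)
  for (cn in value_cols) {
    g[[sub("col", "count", cn)]] <-
      vapply(g[[cn]], function(v) sum(pool == v), integer(1))
  }
  g
}

# Split by recid, count within each group, and put the rows back together.
result <- do.call(rbind, lapply(split(quux, quux$recid), count_in_group))
result <- result[order(result$ID), ]
rownames(result) <- NULL
result
```

The frequency columns are left out here because they depend on which denominator is wanted; they can be derived from the count columns afterwards.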

ut6juiuv  2#

I think what you want is

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(col1:col2) %>%
  mutate(count = n(), .by = c(recid, value)) %>%     # occurrences of each value within its recid group
  mutate(freq = 1 / count, .by = c(recid, name)) %>%
  pivot_wider(names_from = name, values_from = c(value, count, freq))

which returns

# A tibble: 3 × 8
     ID recid  value_col1 value_col2 count_col1 count_col2 freq_col1 freq_col2
  <int> <chr>       <int>      <int>      <int>      <int>     <dbl>     <dbl>
1     1 abc_12         10         12          2          1       0.5       1  
2     2 abc_12         10         15          2          1       0.5       1  
3     3 def_34         10         10          2          2       0.5       0.5


for the sample data

df <- read.table(text="ID col1 col2 recid
1   10   12 abc_12
2   10   15 abc_12
3   10   10 def_34", header=T)

This produces slightly different column names; if needed, you can rename them in a later step.
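
One possible way to do that renaming, sketched with dplyr's rename_with() and relocate(). The target names (col1, count1, Freqcol1, and so on) are taken from the desired output in the question; the intermediate objects res and renamed are names chosen only for this example.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "ID col1 col2 recid
1   10   12 abc_12
2   10   15 abc_12
3   10   10 def_34", header = TRUE)

# The pipeline from the answer above, stored so it can be renamed afterwards.
res <- df %>%
  pivot_longer(col1:col2) %>%
  mutate(count = n(), .by = c(recid, value)) %>%
  mutate(freq = 1 / count) %>%
  pivot_wider(names_from = name, values_from = c(value, count, freq))

# Map value_col1 -> col1, count_col1 -> count1, freq_col1 -> Freqcol1,
# then move recid back to its original position after col2.
renamed <- res %>%
  rename_with(~ sub("^value_", "", .x), starts_with("value_")) %>%
  rename_with(~ sub("^count_col", "count", .x), starts_with("count_")) %>%
  rename_with(~ sub("^freq_col", "Freqcol", .x), starts_with("freq_")) %>%
  relocate(recid, .after = col2)

names(renamed)
```

The .by argument of mutate() needs dplyr 1.1.0 or later; with older versions, group_by() followed by ungroup() should give the same result.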

laximzn5  3#

library(dplyr)

df %>%
  group_by(recid) %>%
  mutate(
    count1 = sum(col1 == col1[1]),  # Count occurrences of col1 value within group
    count2 = sum(col2 == col2[1]),  # Count occurrences of col2 value within group
    Freqcol1 = count1 / (count1 + count2),  # Calculate frequency for col1
    Freqcol2 = count2 / (count1 + count2)  # Calculate frequency for col2
  ) %>%
  ungroup()  # Release grouping
# A tibble: 3 × 8
     ID  col1  col2 recid  count1 count2 Freqcol1 Freqcol2
  <dbl> <dbl> <dbl> <chr>   <int>  <int>    <dbl>    <dbl>
1     1    10    12 abc_12      2      1    0.667    0.333
2     2    10    15 abc_12      2      1    0.667    0.333
3     3    10    10 def_34      1      1    0.5      0.5
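
Note that the code above compares each column only with the first value of that same column in the group, so its result differs from the desired output in the question (for def_34 it gives counts of 1 rather than 2). A minimal sketch of a variant that counts each value across both columns within the recid group is shown below. It assumes the frequency is meant to be 1 / count, which is what the desired output table shows; the question's wording ("2/4") could also be read differently.

```r
library(dplyr)

df <- read.table(text = "ID col1 col2 recid
1   10   12 abc_12
2   10   15 abc_12
3   10   10 def_34", header = TRUE)

res <- df %>%
  group_by(recid) %>%
  mutate(
    # For every row, count how often its value occurs in col1 and col2
    # combined, within the current recid group.
    count1 = vapply(col1, function(v) sum(c(col1, col2) == v), integer(1)),
    count2 = vapply(col2, function(v) sum(c(col1, col2) == v), integer(1)),
    # Assumption: frequency defined as 1 / count, following the desired output.
    Freqcol1 = 1 / count1,
    Freqcol2 = 1 / count2
  ) %>%
  ungroup()

res
```

For the sample data this gives count1 = 2, 2, 2 and count2 = 1, 1, 2, in line with the table in the question.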
