R: Create new columns from the frequency of values

uqdfh47h, posted 2023-04-18 in Other

This is similar to a question I posted earlier: Create new column with distinct character values
But I would like to get some more information out of the data.
df:

ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green")
df <- data.frame(ID,color)

   ID  color
1   1    red
2   1    red
3   1    red
4   1   blue
5   1  green
6   1  green
7   1   blue
8   2 yellow
9   2 yellow
10  2    red
11  2   blue
12  2  green

Creating n_distinct_color (the number of distinct colors each ID has):

library(dplyr)

df %>% 
  group_by(ID) %>% 
  distinct(color, .keep_all = TRUE) %>%   # one row per ID/color pair
  mutate(n_distinct_color = n(), .after = ID) %>% 
  ungroup()

# A tibble: 7 × 3
     ID n_distinct_color color 
  <dbl>            <int> <chr> 
1     1                3 red   
2     1                3 blue  
3     1                3 green 
4     2                4 yellow
5     2                4 red   
6     2                4 blue  
7     2                4 green
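
As a cross-check on the counts above, the same per-ID numbers can be computed in base R without dropping the duplicate rows. This is only a minimal sketch, assuming the df defined earlier (rebuilt here so the snippet runs on its own); the name n_distinct_by_id is illustrative and not part of the original question.

```r
# Rebuild the example data from the question
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green")
df <- data.frame(ID, color)

# Number of distinct colors per ID; the result is named by ID ("1", "2")
# n_distinct_by_id is a hypothetical helper name used for illustration
n_distinct_by_id <- tapply(df$color, df$ID, function(x) length(unique(x)))

# Look up each row's ID in that result, keeping all 12 original rows
df$n_distinct_color <- as.integer(n_distinct_by_id[as.character(df$ID)])
```

Every row with ID 1 should get 3 and every row with ID 2 should get 4, matching the tibble above.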

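Now I would like to create:
1. A new "frequency" column showing how many times each color occurs for each ID (in the original df, ID 1 has 3 red, 2 blue, 2 green, and so on).
2. A new "most frequent color" column showing the most common color for each ID (in the original df, the most common color for ID 1 is red and for ID 2 it is yellow).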

     ID n_distinct_color color  frequency_of_color most_frequent_color
  <dbl>            <int> <chr>               <int> <chr>
1     1                3 red                     3 red
2     1                3 blue                    2 red
3     1                3 green                   2 red
4     2                4 yellow                  2 yellow
5     2                4 red                     1 yellow
6     2                4 blue                    1 yellow
7     2                4 green                   1 yellow

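Also, what would the data table look like if two colors have the same frequency (for example, if the most common colors for ID 2 are both yellow and red)?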
df_new:

ID <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green","red")
df_new <- data.frame(ID,color)

   ID  color
1   1    red
2   1    red
3   1    red
4   1   blue
5   1  green
6   1  green
7   1   blue
8   2 yellow
9   2 yellow
10  2    red
11  2   blue
12  2  green
13  2    red

I would appreciate any help! Thank you!!!

brccelvz #1

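You can get what you want with a chain of mutate and summarise calls. In case of a tie, the [1] here picks the first tied color (which.max already returns only the first maximum, so the [1] just makes that explicit):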

library(dplyr) # 1.1.0 or above required for the .by argument
df_new %>%     # the tied example; use df for the original data
  mutate(n_distinct = n_distinct(color), .by = ID) %>% 
  summarise(frequency = n(), .by = c(ID, n_distinct, color)) %>% 
  mutate(most_frequent = color[which.max(frequency)[1]], .by = ID)

Output (for df_new, where yellow and red tie for ID 2):

  ID n_distinct  color frequency most_frequent
1  1          3    red         3           red
2  1          3   blue         2           red
3  1          3  green         2           red
4  2          4 yellow         2        yellow
5  2          4    red         2        yellow
6  2          4   blue         1        yellow
7  2          4  green         1        yellow
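
If you would rather see every tied color than only the first one, one option is to collapse all colors that reach the per-ID maximum into a single string. This is a minimal sketch and not part of the answer above: it assumes dplyr 1.1.0 or above, rebuilds df_new so that it runs on its own, and the result name res and the ", " separator are arbitrary choices.

```r
library(dplyr)  # 1.1.0 or above, for the .by argument

# Rebuild the tied example from the question
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green","red")
df_new <- data.frame(ID, color)

res <- df_new %>%
  mutate(n_distinct = n_distinct(color), .by = ID) %>%
  summarise(frequency = n(), .by = c(ID, n_distinct, color)) %>%
  # Keep every color whose count equals the maximum within its ID,
  # joined in order of first appearance
  mutate(most_frequent = paste(color[frequency == max(frequency)],
                               collapse = ", "),
         .by = ID)

print(res)
```

For df_new this should give "red" for ID 1 and "yellow, red" for ID 2. Whether a collapsed string or one row per tied color is more convenient depends on what you do with the column afterwards.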
