This is similar to a question I posted earlier: Create new column with distinct character values
However, I would like some additional information.
df:
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
"yellow","yellow","red","blue","green")
df <- data.frame(ID,color)
   ID  color
1   1    red
2   1    red
3   1    red
4   1   blue
5   1  green
6   1  green
7   1   blue
8   2 yellow
9   2 yellow
10  2    red
11  2   blue
12  2  green
Creating n_distinct_color (the number of distinct colors each ID has):
library(dplyr)

df %>%
  group_by(ID) %>%
  distinct(color, .keep_all = TRUE) %>%
  mutate(n_distinct_color = n(), .after = ID) %>%
  ungroup()
# A tibble: 7 × 3
     ID n_distinct_color color
  <dbl>            <int> <chr>
1     1                3 red
2     1                3 blue
3     1                3 green
4     2                4 yellow
5     2                4 red
6     2                4 blue
7     2                4 green
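Now I would like to create:
1. A new "frequency" column showing how many times each color occurs for each ID (in the original df, ID 1 has 3 red, 2 blue, 2 green, and so on).
2. A new "most frequent color" column showing the most common color for each ID (in the original df, the most common color is red for ID 1 and yellow for ID 2).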
     ID n_distinct_color color  frequency_of_color most_frequent_color
  <dbl>            <int> <chr>               <int> <chr>
1     1                3 red                     3 red
2     1                3 blue                    2 red
3     1                3 green                   2 red
4     2                4 yellow                  2 yellow
5     2                4 red                     1 yellow
6     2                4 blue                    1 yellow
7     2                4 green                   1 yellow
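Also, what would the data table look like if two colors have the same frequency (i.e., the most frequent colors for ID 2 are both yellow and red)?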
df_new:
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
"yellow","yellow","red","blue","green","red")
df_new <- data.frame(ID,color)
   ID  color
1   1    red
2   1    red
3   1    red
4   1   blue
5   1  green
6   1  green
7   1   blue
8   2 yellow
9   2 yellow
10  2    red
11  2   blue
12  2  green
13  2    red
I would appreciate any help! Thank you!
1 Answer
With a series of mutate and summarise calls, you can achieve your goal. In the case of a tie, the [1] here selects the first of the tied colors.
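The answer's original code did not survive on this page, so the following is only a minimal sketch of one way to get the requested columns with dplyr. It is not necessarily the answerer's exact code: it uses grouped mutate calls followed by distinct rather than summarise, and the column all_top_colors is a hypothetical addition (not in the question) that lists every tied color.

```r
library(dplyr)

# Data from the question (the tie case: ID 2 has two yellow and two red)
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green","red")
df_new <- data.frame(ID, color)

result <- df_new %>%
  # number of distinct colors per ID
  group_by(ID) %>%
  mutate(n_distinct_color = n_distinct(color)) %>%
  # how often each color occurs within its ID
  group_by(ID, color) %>%
  mutate(frequency_of_color = n()) %>%
  # most frequent color per ID
  group_by(ID) %>%
  mutate(
    # [1] keeps the first tied color in row order of the original data
    most_frequent_color =
      color[frequency_of_color == max(frequency_of_color)][1],
    # hypothetical extra column: all tied colors, comma-separated
    all_top_colors = paste(
      unique(color[frequency_of_color == max(frequency_of_color)]),
      collapse = ", "
    )
  ) %>%
  # keep one row per ID/color pair, in order of first appearance
  distinct(ID, color, .keep_all = TRUE) %>%
  ungroup() %>%
  select(ID, n_distinct_color, color, frequency_of_color,
         most_frequent_color, all_top_colors)

print(result)
```

Because mutate keeps the original row order, the [1] picks whichever tied color appears first in the data, so with df_new the most_frequent_color for ID 2 is yellow while all_top_colors holds both yellow and red. Running the same pipeline on the original df gives the table requested in the question, plus the extra column.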