Create new column with frequency of values in R

uqdfh47h posted on 2023-04-18 in Other

This is similar to a question I posted earlier: Create new column with distinct character values
But I would like to get some more information.
df:

ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green")
df <- data.frame(ID,color)

   ID  color
1   1    red
2   1    red
3   1    red
4   1   blue
5   1  green
6   1  green
7   1   blue
8   2 yellow
9   2 yellow
10  2    red
11  2   blue
12  2  green

Creating n_distinct_color (the number of distinct colors each ID has):

library(dplyr)

df %>% 
  group_by(ID) %>% 
  distinct(color, .keep_all = TRUE) %>%   # keep one row per color within each ID
  mutate(n_distinct_color = n(), .after = ID) %>% 
  ungroup()

# A tibble: 7 × 3
     ID n_distinct_color color 
  <dbl>            <int> <chr> 
1     1                3 red   
2     1                3 blue  
3     1                3 green 
4     2                4 yellow
5     2                4 red   
6     2                4 blue  
7     2                4 green

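Now I would like to create:
1. A new "frequency" column showing how many times each color occurs for each ID (in the original df, ID 1 has 3 red, 2 blue, 2 green, and so on).
2. A new "most frequent color" column showing the most common color for each ID (in the original df, the most common color is red for ID 1 and yellow for ID 2).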

     ID n_distinct_color color    frequency_of_color   most_frequent_color 
  <dbl>            <int> <chr>    <int>                <chr>
1     1                3 red      3                    red
2     1                3 blue     2                    red
3     1                3 green    2                    red
4     2                4 yellow   2                    yellow
5     2                4 red      1                    yellow
6     2                4 blue     1                    yellow
7     2                4 green    1                    yellow
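
As a quick sanity check on the counts in the desired table, base R's `table()` can cross-tabulate ID against color. This is only a sketch built on the `df` from the question; the names `freq` and `most_frequent` are introduced here for illustration.

```r
# Rebuild the example data from the question
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green")
df <- data.frame(ID, color)

# Cross-tabulate: rows are IDs, columns are colors, cells are counts
freq <- table(df$ID, df$color)
print(freq)

# Most frequent color per ID; which.max() returns the first maximum on ties
most_frequent <- apply(freq, 1, function(x) names(x)[which.max(x)])
print(most_frequent)
```

With this data, the counts for ID 1 are 3 red, 2 blue and 2 green, and the most frequent colors come out as red for ID 1 and yellow for ID 2, matching the table above.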

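Also, what would the table look like if two colors have the same frequency (for example, if the most frequent colors for ID 2 were both yellow and red)?
df_new: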

ID <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green","red")
df_new <- data.frame(ID,color)

   ID  color
1   1    red
2   1    red
3   1    red
4   1   blue
5   1  green
6   1  green
7   1   blue
8   2 yellow
9   2 yellow
10  2    red
11  2   blue
12  2  green
13  2    red

I would appreciate any help! Thank you!!!

r

Source: https://stackoverflow.com/questions/75996434/create-new-column-with-frequency-of-values

1 Answer

brccelvz1#

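You can get there with a sequence of mutate and summarise calls. In case of a tie, the [1] here selects the first of the tied colors (which.max already returns only the first maximum, so the [1] simply makes that choice explicit):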

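library(dplyr) # dplyr 1.1.0 or later is required for the .by argument
# df_new is the tie example from the question (yellow and red each occur twice for ID 2)
df_new %>% 
  mutate(n_distinct = n_distinct(color), .by = ID) %>% 
  summarise(frequency = n(), .by = c(ID, n_distinct, color)) %>% 
  mutate(most_frequent = color[which.max(frequency)[1]], .by = ID)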

Output (for df_new, where yellow and red tie for ID 2):

  ID n_distinct  color frequency most_frequent
1  1          3    red         3           red
2  1          3   blue         2           red
3  1          3  green         2           red
4  2          4 yellow         2        yellow
5  2          4    red         2        yellow
6  2          4   blue         1        yellow
7  2          4  green         1        yellow
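
The pipeline above keeps only the first tied color. If every tied color should be reported instead, one possible variant is to collapse all colors whose count equals the group maximum into a single string. This is a sketch rather than part of the original answer: the column name `most_frequent_all`, the result name `res`, and the `", "` separator are choices made here for illustration, and the `.by` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)  # 1.1.0 or later for the .by argument

# Tie example from the question: yellow and red both occur twice for ID 2
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","yellow","red","blue","green","red")
df_new <- data.frame(ID, color)

res <- df_new %>%
  mutate(n_distinct = n_distinct(color), .by = ID) %>%
  summarise(frequency = n(), .by = c(ID, n_distinct, color)) %>%
  mutate(
    # keep every color whose count equals the per-ID maximum
    most_frequent_all = paste(color[frequency == max(frequency)], collapse = ", "),
    .by = ID
  )
print(res)
```

For ID 2 in df_new, the new column would read "yellow, red" on every row, while ID 1 still shows "red" alone.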

Answered on 2023-04-18
