R: Keep the x most frequent observations in a variable? [duplicate]

Posted by mrfwxfqh on 2023-05-20 in Other

This question already has answers here:

Filter R dataframe to n most frequent cases and order by frequency (2 answers)
Find the most frequent value in a column and take a subset of that (1 answer)
Closed 4 days ago.

I have a dataframe like this:

> dput(dt)
structure(list(ID = 1:10, City = c("New York", "New York", "LA", 
"LA", "LA", "Boston", "Chicago ", "New York", "LA", "New York"
), Random_Info_To_Keep = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L)), class = "data.frame", row.names = c(NA, -10L))

I want to keep only the rows whose city is one of the 2 most common cities in the dataset (New York and LA). The output should look like this:

> dput(dt2)
structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 8L, 9L, 10L), City = c("New York", 
"New York", "LA", "LA", "LA", "New York", "LA", "New York"), 
    Random_Info_To_Keep = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-8L))

s4n0splo #1

Base R:

dt[dt$City %in% names(sort(table(dt$City), decreasing = TRUE)[1:2]), ]

# version by @M-- using tail; sort() is needed because table() orders by name, not by frequency
dt[dt$City %in% names(tail(sort(table(dt$City)), 2)), ]

   ID     City Random_Info_To_Keep
1   1 New York                   1
2   2 New York                   1
3   3       LA                   1
4   4       LA                   1
5   5       LA                   1
8   8 New York                   1
9   9       LA                   1
10 10 New York                   1
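
To make the one-liner easier to follow, here is a minimal sketch that splits it into its steps. The data is rebuilt from the question's `dput()` output; the intermediate names `counts`, `ranked`, and `top2` are only illustrative and do not appear in the original answer.

```r
# Sample data from the question
dt <- data.frame(
  ID = 1:10,
  City = c("New York", "New York", "LA", "LA", "LA",
           "Boston", "Chicago ", "New York", "LA", "New York"),
  Random_Info_To_Keep = 1L
)

counts <- table(dt$City)                   # occurrences of each city
ranked <- sort(counts, decreasing = TRUE)  # most frequent first
top2   <- names(ranked)[1:2]               # names of the two most frequent cities
dt2    <- dt[dt$City %in% top2, ]          # keep only rows for those cities
```

Printing `ranked` before filtering is a quick way to check which cities will be kept.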

sshcrbum #2

You can first count how many times each city occurs in the dataset, then filter the data to the two most frequent cities. Here is a tidyverse solution:

library(tidyverse)

top_cities <- dt %>%
  count(City, sort = TRUE) %>% # frequency of each city, most frequent first
  slice_head(n = 2) %>% # two most common cities
  pull(City)

dt %>%
  filter(City %in% top_cities) # keep only rows for those cities
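
Since the title asks about the "x" most frequent values, the same count-then-filter idea can be wrapped in a small helper that takes the number of values to keep as a parameter. The sketch below uses base R so that it runs without extra packages; `keep_top_n` and its arguments `data`, `col`, and `n` are hypothetical names, not part of any library.

```r
# Hypothetical helper: keep rows whose value in `col` is among the `n` most frequent.
keep_top_n <- function(data, col, n = 2) {
  counts <- sort(table(data[[col]]), decreasing = TRUE)  # frequencies, highest first
  top <- head(names(counts), n)                          # up to n most frequent values
  data[as.character(data[[col]]) %in% top, , drop = FALSE]
}

# Sample data from the question
dt <- data.frame(
  ID = 1:10,
  City = c("New York", "New York", "LA", "LA", "LA",
           "Boston", "Chicago ", "New York", "LA", "New York"),
  Random_Info_To_Keep = 1L
)

dt2 <- keep_top_n(dt, "City", n = 2)
```

Note that this sketch keeps exactly `n` distinct values: if several values tie at the cutoff, the sort order decides which of them is kept, so a different rule would be needed if tied values should be kept together.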
