How to rank within groups in R?

Posted by ntjbwcob on 2023-04-03 in Other


Here is my data frame:

customer_name order_dates order_values
1          John  2010-11-01           15
2           Bob  2008-03-25           12
3          Alex  2009-11-15            5
4          John  2012-08-06           15
5          John  2015-05-07           20

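Let's say I want to add a rank variable that ranks each customer's orders from the highest order value down, using the most recent order date as the tie-breaker.
So, ultimately the data should look like this: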

customer_name order_dates order_values ranked_order_values_by_max_value_date
1          John  2010-11-01           15                               3
2           Bob  2008-03-25           12                               1
3          Alex  2009-11-15            5                               1
4          John  2012-08-06           15                               2
5          John  2015-05-07           20                               1

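Here every customer with a single order gets 1, and all other orders are ranked by value, with the most recent order date taking priority in a tie. In this example, John's 2012-08-06 order gets rank 2 because it was placed after the 2010-11-01 order of the same value. The 2015-05-07 order gets rank 1 because it is the largest. Even if that order had been placed 20 years ago, it should still be rank 1, because it is John's highest order value.
Does anyone know how I can do this in R? How can I rank within groups defined by specified variables in a data frame?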

Source: https://stackoverflow.com/questions/31859175/how-to-rank-within-groups-in-r

6 Answers


hlswsv351#

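The top-rated answer (by cdeterman) is actually incorrect. The order function returns the positions of the 1st, 2nd, 3rd, etc. ranked values, not the ranks of the values in their current order.
Let's take a simple example where we want to rank, starting with the largest, grouping by customer name. I've included a manual ranking so we can check the values: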

> df
       customer_name order_values manual_rank
    1           John            2           5
    2           John            5           2
    3           John            9           1
    4           John            1           6
    5           John            4           3
    6           John            3           4
    7           Lucy            4           4
    8           Lucy            9           1
    9           Lucy            6           3
    10          Lucy            2           6
    11          Lucy            8           2
    12          Lucy            3           5

If I run the code suggested by cdeterman, I get the following incorrect ranks:

> df %>%
    +   group_by(customer_name) %>%
    +   mutate(my_ranks = order(order_values, decreasing=TRUE))
    Source: local data frame [12 x 4]
    Groups: customer_name [2]

       customer_name order_values manual_rank my_ranks
              <fctr>        <dbl>       <dbl>    <int>
    1           John            2           5        3
    2           John            5           2        2
    3           John            9           1        5
    4           John            1           6        6
    5           John            4           3        1
    6           John            3           4        4
    7           Lucy            4           4        2
    8           Lucy            9           1        5
    9           Lucy            6           3        3
    10          Lucy            2           6        1
    11          Lucy            8           2        6
    12          Lucy            3           5        4

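order is meant for reordering a data frame into descending or ascending order. What we actually want is to run the order function twice, with the second call to order giving us the actual ranks we want.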

> df %>%
    +   group_by(customer_name) %>%
    +   mutate(good_ranks = order(order(order_values, decreasing=TRUE)))
    Source: local data frame [12 x 4]
    Groups: customer_name [2]

       customer_name order_values manual_rank good_ranks
              <fctr>        <dbl>       <dbl>      <int>
    1           John            2           5          5
    2           John            5           2          2
    3           John            9           1          1
    4           John            1           6          6
    5           John            4           3          3
    6           John            3           4          4
    7           Lucy            4           4          4
    8           Lucy            9           1          1
    9           Lucy            6           3          3
    10          Lucy            2           6          6
    11          Lucy            8           2          2
    12          Lucy            3           5          5
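To see why the nested call works, here is a minimal base R sketch on a plain vector. The vector x is just John's order_values copied from the table above, and the variable names pos and rnk are made up for illustration.

```r
# Illustrative vector: John's order_values from the table above
x <- c(2, 5, 9, 1, 4, 3)

# order() returns the positions of the elements in sorted order:
# the largest value is at position 3, the next largest at position 2, ...
pos <- order(x, decreasing = TRUE)
print(pos)  # 3 2 5 6 1 4

# Applying order() a second time inverts that permutation,
# which gives the rank of each element in its original position
rnk <- order(pos)
print(rnk)  # 5 2 1 6 3 4

# With no ties, this agrees with rank() on the negated values
stopifnot(all(rnk == rank(-x)))
```

The second vector matches the manual_rank column for John, while the first matches the incorrect my_ranks column.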


xtupzzrd2#

You can do this quite concisely with dplyr:

library(dplyr)
df %>%
    group_by(customer_name) %>%
    mutate(my_ranks = order(order(order_values, order_dates, decreasing=TRUE)))

Source: local data frame [5 x 4]
Groups: customer_name

  customer_name order_dates order_values my_ranks
1          John  2010-11-01           15        3
2           Bob  2008-03-25           12        1
3          Alex  2009-11-15            5        1
4          John  2012-08-06           15        2
5          John  2015-05-07           20        1


bis0qfac3#

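This can be achieved with ave and rank. ave passes the appropriate groups to rank. Because the most recent date should get rank 1, rank is applied to the negated dates; reversing the output of rank with rev only gives the right answer when each group is already sorted by date. Note that this ranks on order_dates alone, which matches the requested result for this sample only because John's largest order is also his most recent:

with(x, ave(as.numeric(order_dates), customer_name, FUN=function(d) rank(-d)))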
## [1] 3 1 1 2 1
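When ranking in descending order it is worth checking that the ranks still line up with the original rows. The sketch below contrasts ranking negated values with reversing a rank vector. The dates are a hypothetical single-customer group, deliberately not in chronological order, and the variable names are made up.

```r
# Hypothetical order dates for one customer, not sorted by date
d <- as.Date(c("2012-08-06", "2015-05-07", "2010-11-01"))

# Ascending rank: the oldest date gets rank 1
asc <- rank(as.numeric(d))
print(asc)        # 2 3 1

# Descending rank: negate the values so the most recent date gets rank 1
desc_rank <- rank(-as.numeric(d))
print(desc_rank)  # 2 1 3

# rev() only reverses positions, so the ranks no longer match their rows
reversed <- rev(asc)
print(reversed)   # 1 3 2
```

The two approaches agree only when the group happens to be sorted by date already, as it is in the question's sample data.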


nfs0ujit4#

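In base R, you can use the slightly clunky

# order(order(...)) turns the sort permutation into ranks for each group
transform(df,rank=ave(1:nrow(df),customer_name,
  FUN=function(x) order(order(order_values[x],order_dates[x],decreasing=TRUE))))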

customer_name order_dates order_values rank
1          John  2010-11-01           15    3
2           Bob  2008-03-25           12    1
3          Alex  2009-11-15            5    1
4          John  2012-08-06           15    2
5          John  2015-05-07           20    1

where order is given the primary value and the tie-breaker value for each group.
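For anyone who wants to try this on the question's data, here is a self-contained base R sketch. The data frame is rebuilt by hand from the question, and storing order_dates as Date is an assumption, since the question does not state the column type.

```r
# Data from the question; the Date class for order_dates is an assumption
df <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

# Rank within each customer: highest value first,
# most recent date wins when values are tied.
# ave() hands each group's row indices to FUN as i.
df$rank <- ave(seq_len(nrow(df)), df$customer_name, FUN = function(i) {
  order(order(df$order_values[i], df$order_dates[i], decreasing = TRUE))
})

print(df$rank)  # 3 1 1 2 1
```

Passing row indices through ave, rather than the values themselves, is what lets FUN look up two columns at once for the tie-breaker.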


fcipmucu5#

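df %>% 
  group_by(customer_name) %>% 
  # sort so the largest value comes first and, among ties, the most recent date
  arrange(customer_name, desc(order_values), desc(order_dates)) %>% 
  # rank descending; ties.method = "first" numbers tied values in row order
  mutate(rank2 = rank(desc(order_values), ties.method = "first"))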
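One thing to watch with rank is how it treats ties, since the question depends on a tie-breaker. A small base R sketch, where v is an illustrative vector holding John's order values already sorted with the later date first:

```r
# John's order values, sorted descending with the more recent 15 first
v <- c(20, 15, 15)

# Default ties.method = "average": tied values share an averaged rank
avg <- rank(-v)
print(avg)    # 1.0 2.5 2.5

# "first": tied values are numbered in the order they appear
first <- rank(-v, ties.method = "first")
print(first)  # 1 2 3

# "min": tied values all take the lowest rank of the tie
low <- rank(-v, ties.method = "min")
print(low)    # 1 2 2
```

Only the "first" variant produces distinct ranks here, and it relies on the rows having been sorted by the tie-breaker beforehand.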


mo49yndu6#

Similar to @t-himmel's answer, you can get the ranks using data.table.

dt[ , rnk := order(order(order_values, decreasing = TRUE)), customer_name ]
