R: Count a categorical variable based on a group over time (the date column)

Posted by v2g6jxz6 on 2022-12-25 in Other


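Suppose I have the data below:

| date | name | rolename |
|------------|------|----------------|
| 2009-12-01 | John | helper |
| 2010-12-01 | John | helper |
| 2011-12-01 | John | senior helper |
| 2012-12-01 | John | manager |
| 2009-12-01 | Will | helper |
| 2010-12-01 | Will | senior helper |
| 2011-12-01 | Will | manager |
| 2012-12-01 | Will | senior manager |

I am trying to count, for each person in the name column, the number of distinct roles (from the rolename column) that the person has held so far. For example, for the data above, I would like a fourth column that measures the number of positions the person has worked in up to that date:

| date | name | rolename | nopositions |
|------------|------|----------------|-------------|
| 2009-12-01 | John | helper | 1 |
| 2010-12-01 | John | helper | 1 |
| 2011-12-01 | John | senior helper | 2 |
| 2012-12-01 | John | manager | 3 |
| 2009-12-01 | Will | helper | 1 |
| 2010-12-01 | Will | senior helper | 2 |
| 2011-12-01 | Will | manager | 3 |
| 2012-12-01 | Will | senior manager | 4 |

My failed attempts: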

#attempt 1
library(dplyr)

data %>%
group_by(name) %>%
mutate(nopositions = count(rolename))

#attempt2
library(runner)

data %>%
group_by(name) %>%
mutate(nopositions = runner(x = rolename,
                            k = inf,
                            idx = date,
                            f = function(x) length(x))

Source: https://stackoverflow.com/questions/74858383/count-a-categorical-variable-based-on-a-group-over-time-the-date-column

1 answer


x4shl7ld1#

Assuming that the ordering by date is assured,

library(dplyr)
quux %>%
  group_by(name) %>%
  mutate(noposition = cummax(match(rolename, unique(rolename)))) %>%
  ungroup()
# # A tibble: 8 × 4
#   date       name  rolename       noposition
#   <chr>      <chr> <chr>               <int>
# 1 2009-12-01 John  helper                  1
# 2 2010-12-01 John  helper                  1
# 3 2011-12-01 John  senior helper           2
# 4 2012-12-01 John  manager                 3
# 5 2009-12-01 Will  helper                  1
# 6 2010-12-01 Will  senior helper           2
# 7 2011-12-01 Will  manager                 3
# 8 2012-12-01 Will  senior manager          4
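
To see why this works, it can help to run the pieces on one person's roles. The sketch below is not part of the original answer. It rebuilds the question's table as quux (assuming the dates are ISO-formatted character strings, as in the printed output), sorts by name and date so that the ordering assumption holds, and then inspects each intermediate step for John. The helper names john, first_seen, role_index and running are only for illustration.

```r
# Sample data reconstructed from the question's table (dates kept as character).
quux <- data.frame(
  date = rep(c("2009-12-01", "2010-12-01", "2011-12-01", "2012-12-01"), times = 2),
  name = rep(c("John", "Will"), each = 4),
  rolename = c("helper", "helper", "senior helper", "manager",
               "helper", "senior helper", "manager", "senior manager"),
  stringsAsFactors = FALSE
)

# ISO-formatted date strings sort chronologically, so a plain order() is enough here.
quux <- quux[order(quux$name, quux$date), ]

# Walk through the steps for a single person.
john <- quux$rolename[quux$name == "John"]
first_seen <- unique(john)              # "helper" "senior helper" "manager"
role_index <- match(john, first_seen)   # 1 1 2 3: position of each role's first appearance
running    <- cummax(role_index)        # 1 1 2 3: highest index reached so far
```

If the dates were stored in another text format, converting them with as.Date() before sorting would likely be the safer choice.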

We could do without the cummax, except that if a name ever returns to a previous rolename, its noposition would *decrease* (revert to the earlier value).
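
A small illustration of that case, using a made-up role history that is not in the question's data (the vector roles and the role "director" are hypothetical):

```r
# Hypothetical role history in which the person returns to an earlier role.
roles <- c("helper", "manager", "helper", "director")

# Without cummax the index drops back to 1 when "helper" reappears.
raw_index <- match(roles, unique(roles))  # 1 2 1 3

# cummax keeps the running maximum, so the count never decreases.
running <- cummax(raw_index)              # 1 2 2 3
```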
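The cummax(match(..)) approach assumes that unique preserves the natural order of first occurrence. If that ever turns out to be a problem (I can't think of how offhand), we can use an explicit expanding window instead: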

quux %>%
  group_by(name) %>%
  mutate(noposition = sapply(seq_along(rolename), \(i) length(unique(rolename[1:i])))) %>%
  ungroup()
# # A tibble: 8 × 4
#   date       name  rolename       noposition
#   <chr>      <chr> <chr>               <int>
# 1 2009-12-01 John  helper                  1
# 2 2010-12-01 John  helper                  1
# 3 2011-12-01 John  senior helper           2
# 4 2012-12-01 John  manager                 3
# 5 2009-12-01 Will  helper                  1
# 6 2010-12-01 Will  senior helper           2
# 7 2011-12-01 Will  manager                 3
# 8 2012-12-01 Will  senior manager          4

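This produces the same results here, but it will perform worse on larger groups (since it iterates more). I offer it as a fallback in case your assumptions rule out the use of cummax(match(..)).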
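
As a rough check that the two versions agree, the sketch below wraps each one in a small function and runs both on a hypothetical vector that revisits earlier roles. The function names are invented for illustration. count_cumsum is an alternative that does not appear in the answer above: it relies on the base R idiom cumsum(!duplicated(x)), which adds one each time a value is seen for the first time.

```r
# Fast version from the answer: index of first occurrence, then running maximum.
count_fast <- function(x) cummax(match(x, unique(x)))

# Expanding-window version from the answer: recount distinct values for each prefix.
count_window <- function(x) {
  sapply(seq_along(x), function(i) length(unique(x[1:i])))
}

# Alternative (not from the answer): count first appearances cumulatively.
count_cumsum <- function(x) cumsum(!duplicated(x))

# Hypothetical history with repeated roles.
roles <- c("helper", "manager", "helper", "director", "manager")

count_fast(roles)    # 1 2 2 3 3
count_window(roles)  # 1 2 2 3 3
count_cumsum(roles)  # 1 2 2 3 3
```

Any of these could presumably be dropped into the mutate() call after group_by(name); only the expanding-window version rescans each prefix, which is where the extra cost on large groups comes from.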

