R: Converting transitions in a panel structure into a string sequence

xnifntxz, posted on 2022-12-30 in Other

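I have panel data recording individuals' employment status across years. Many of them changed jobs during the time span my data covers. I want to capture these transitions and merge them into a string sequence. For example: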

Year Person Employment_Status
1990 Bob    High School Teacher 
1991 Bob    High School Teacher 
1992 Bob    Freelancer
1993 Bob    High School Teacher 
1990 Peter  Singer
1991 Peter  Singer
1990 James  Actor
1991 James  Actor
1992 James  Producer
1993 James  Producer
1994 James  Investor

The ideal output would look like this:

Person Job_Sequence
Bob    High School Teacher-Freelancer-High School Teacher 
Peter  Singer
James  Actor-Producer-Investor

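Essentially, each person is reduced to a single row. The challenge for me is that different people have different numbers of transitions (anywhere from zero to a dozen or so).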

lyr7nygr #1

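We can apply rleid to "Employment_Status" so that adjacent identical elements fall into the same group, take the distinct combinations of "Person" and "grp", and then paste the statuses together per person. This assumes the rows are already ordered by Person and Year, as in the example.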

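library(dplyr)
library(data.table)  # for rleid()
library(stringr)     # for str_c()
df1 %>%
   # run id: increments whenever Employment_Status differs from the previous row
   mutate(grp = rleid(Employment_Status)) %>%
   # keep only the first row of each run within each person
   distinct(Person, grp, .keep_all = TRUE) %>%
   group_by(Person) %>%
   summarise(Job_Sequence = str_c(Employment_Status,
     collapse = '-'), .groups = 'drop')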
  • Output
# A tibble: 3 × 2
  Person Job_Sequence                                      
  <chr>  <chr>                                             
1 Bob    High School Teacher-Freelancer-High School Teacher
2 James  Actor-Producer-Investor                           
3 Peter  Singer
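
To see what the run-id step does in isolation, here is a minimal sketch on a short status vector (the vector `status` is made up for illustration and is not part of the original answer). It uses base R's rle() to build the same kind of run ids; data.table's rleid() is expected to give the same ids when called on a single vector.

```r
# Hypothetical example vector: two people's statuses, one after the other
status <- c("High School Teacher", "High School Teacher", "Freelancer",
            "High School Teacher", "Singer", "Singer")

# rle() compresses the vector into runs of identical adjacent values
runs <- rle(status)

# Expand the runs back into one id per original element
run_id <- rep(seq_along(runs$values), runs$lengths)

print(run_id)
# [1] 1 1 2 3 4 4

# The run values are already the "transition sequence" of the vector
print(runs$values)
```

Note that the second "High School Teacher" run gets a new id (3) rather than reusing id 1, which is why returning to an earlier job still shows up as a separate step in the sequence.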

Or using base R:

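# Build run ids over Person + Employment_Status so that a run never spans two people,
# keep the first row of each run, then paste the statuses together per person
aggregate(cbind(Job_Sequence = Employment_Status) ~ Person, 
  subset(df1, !duplicated(with(rle(paste(Person, Employment_Status)), 
   rep(seq_along(values), lengths)))), FUN = paste, collapse = '-')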
  • Output
Person                                       Job_Sequence
1    Bob High School Teacher-Freelancer-High School Teacher
2  James                            Actor-Producer-Investor
3  Peter                                             Singer
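
As a possible alternative, the same result can be sketched by running rle() separately within each person, which needs no extra packages and cannot merge runs across two people. This is only a sketch under the assumption that sorting by Person and Year gives the right chronological order; the names `job_seq` and `result` are made up for illustration.

```r
# Same example data as in the question
df1 <- data.frame(
  Year = c(1990:1993, 1990:1991, 1990:1994),
  Person = rep(c("Bob", "Peter", "James"), times = c(4, 2, 5)),
  Employment_Status = c("High School Teacher", "High School Teacher",
                        "Freelancer", "High School Teacher",
                        "Singer", "Singer",
                        "Actor", "Actor", "Producer", "Producer", "Investor")
)

# Make sure each person's rows are in chronological order
df1 <- df1[order(df1$Person, df1$Year), ]

# Split statuses by person, collapse consecutive repeats with rle(),
# and join the remaining values with "-"
job_seq <- vapply(
  split(df1$Employment_Status, df1$Person),
  function(x) paste(rle(x)$values, collapse = "-"),
  character(1)
)

result <- data.frame(Person = names(job_seq),
                     Job_Sequence = unname(job_seq))
print(result)
```

Because rle() is applied per person, the number of transitions can differ freely between people: someone with no job change simply ends up with a single value.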

Data

df1 <- structure(list(Year = c(1990L, 1991L, 1992L, 1993L, 1990L, 1991L, 
1990L, 1991L, 1992L, 1993L, 1994L), Person = c("Bob", "Bob", 
"Bob", "Bob", "Peter", "Peter", "James", "James", "James", "James", 
"James"), Employment_Status = c("High School Teacher", "High School Teacher", 
"Freelancer", "High School Teacher", "Singer", "Singer", "Actor", 
"Actor", "Producer", "Producer", "Investor")), 
class = "data.frame", row.names = c(NA, 
-11L))
