R: How do I convert longitudinal data from wide to long format?

6uxekuva  posted on 2023-09-27  in Other

My dataset looks like this:

Id  smoke_2m  Smoke_3m  smoke_4m  alcohol_3m  age  sex
1   2         3         5         7           30   M
2   3         3         5         6           26   M
3   7         5         7         2           25   F

I would like my output to look like this:

Id  Time  smoking  alcohol  age sex
1    2m        2      -       30  M
1    3m        3      7       30  M
1    4m        5      -       30  M
2    2m        3      -       26  M
2    3m        3      6       26  M
2    4m        5      -       26  M
3    2m        7      -       25  F
3    3m        5      2       25  F
3    4m        7      -       25  F

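Here, alcohol is only available at 3m, so I want the other time points (2m, 4m) to be either empty (-) or filled from the available value, so that all three alcohol time points for Id 1 are, for example, 7.
I tried the code below, but it only handles one variable, smoking: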

data <- data %>%
  pivot_longer(
    cols = c(smoke_2m, smoke_3m, smoke_4m),  
    names_to = "time_point",                     
    values_to = "smoking_status",
)

How can I do this for all variables at once?

ttcibm8c  #1

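The trick here is to use names_pattern to split each column name into two parts, and to pass the special value ".value" as one of the names_to entries. The part matched for ".value" becomes the name of an output column (smoke, alcohol), while the other part goes into the Time column.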

library(tidyr)  # provides pivot_longer()

pivot_longer(quux, cols = -c(Id, sex, age),
             names_pattern = "(.*)_(.*)", names_to = c(".value", "Time"))
# # A tibble: 9 × 7
#      Id   age sex   Time  smoke Smoke alcohol
#   <int> <int> <chr> <chr> <int> <int>   <int>
# 1     1    30 M     2m        2    NA      NA
# 2     1    30 M     3m       NA     3       7
# 3     1    30 M     4m        5    NA      NA
# 4     2    26 M     2m        3    NA      NA
# 5     2    26 M     3m       NA     3       6
# 6     2    26 M     4m        5    NA      NA
# 7     3    25 F     2m        7    NA      NA
# 8     3    25 F     3m       NA     5       2
# 9     3    25 F     4m        7    NA      NA
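
To see why smoke and Smoke end up as two separate columns, it can help to apply the same regular expression to the column names by hand. The following is only a sketch using base R's sub() to imitate the split; it is not how tidyr implements names_pattern internally, and the names cols, pattern, and parts are made up for illustration.

```r
# Column names that are being pivoted (taken from the question's data)
cols <- c("smoke_2m", "Smoke_3m", "smoke_4m", "alcohol_3m")

# The same pattern that is passed to names_pattern
pattern <- "(.*)_(.*)"

# First capture group -> ".value" (output column name),
# second capture group -> "Time"
parts <- data.frame(
  column       = cols,
  value_column = sub(pattern, "\\1", cols),
  Time         = sub(pattern, "\\2", cols)
)

print(parts)
```

Because the match is case-sensitive, "Smoke" and "smoke" are treated as different output columns, which is what produces the extra Smoke column above.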

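However, assuming you intended Smoke_3m to be grouped with the other smoke columns, let's fix that by lower-casing the column names first: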

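library(dplyr)  # provides rename_with(); pivot_longer() comes from tidyr

quux |>
  # starts_with() ignores case by default, so this matches all three smoke
  # columns; tolower() turns Smoke_3m into smoke_3m
  rename_with(.fn = tolower, .cols = starts_with("Smoke")) |>
  pivot_longer(cols = -c(Id, sex, age),
               names_pattern = "(.*)_(.*)", names_to = c(".value", "Time"))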
# # A tibble: 9 × 6
#      Id   age sex   Time  smoke alcohol
#   <int> <int> <chr> <chr> <int>   <int>
# 1     1    30 M     2m        2      NA
# 2     1    30 M     3m        3       7
# 3     1    30 M     4m        5      NA
# 4     2    26 M     2m        3      NA
# 5     2    26 M     3m        3       6
# 6     2    26 M     4m        5      NA
# 7     3    25 F     2m        7      NA
# 8     3    25 F     3m        5       2
# 9     3    25 F     4m        7      NA
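
The question also mentions filling the missing alcohol values from the one available measurement instead of leaving them empty. A minimal sketch of that, assuming it is acceptable to copy the 3m value to the 2m and 4m rows of the same Id, is to group by Id and use tidyr's fill() with .direction = "downup". The name long_filled is made up for this example, and the data is rebuilt with data.frame() so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the example is self-contained
quux <- data.frame(
  Id = 1:3,
  smoke_2m = c(2L, 3L, 7L), Smoke_3m = c(3L, 3L, 5L), smoke_4m = c(5L, 5L, 7L),
  alcohol_3m = c(7L, 6L, 2L),
  age = c(30L, 26L, 25L), sex = c("M", "M", "F")
)

long_filled <- quux |>
  rename_with(.fn = tolower, .cols = starts_with("Smoke")) |>
  pivot_longer(cols = -c(Id, sex, age),
               names_pattern = "(.*)_(.*)", names_to = c(".value", "Time")) |>
  # Fill within each person only, so values never leak between Ids
  group_by(Id) |>
  # "downup" fills later rows first (4m), then earlier rows (2m)
  fill(alcohol, .direction = "downup") |>
  ungroup()

print(long_filled)
```

Whether carrying a single measurement to other time points is appropriate depends on the analysis; leaving the NA values in place keeps it explicit that alcohol was only measured at 3m.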

Data

quux <- structure(list(Id = 1:3, smoke_2m = c(2L, 3L, 7L), Smoke_3m = c(3L, 3L, 5L), smoke_4m = c(5L, 5L, 7L), alcohol_3m = c(7L, 6L, 2L), age = c(30L, 26L, 25L), sex = c("M", "M", "F")), class = "data.frame", row.names = c(NA, -3L))
