R: How to pivot a longitudinal dataset longer

41zrol4v  posted on 2023-09-27  in Other

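I have a longitudinal dataset that I created by merging different datasets on a personal identifier column. The columns are ordered as: personal identifier, a_sex, a_countryofbirth, a_health, a_educationstatus, b_sex, b_countryofbirth, b_health, b_educationstatus, c_sex, c_countryofbirth, c_health, c_educationstatus, and so on up to l. All variables starting with a_ belong to the first wave, those starting with b_ to the second wave, and so on.
I tried to use pivot_longer to create a new variable named Wave, so that my table looks like this: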

Table: InPreg_transformed

Person ID    Wave    Sex    CountryofBirth    Health

I tried this code, among others, but it does not work.

InPreg_transformed<- InPregDF %>%
  pivot_longer(cols = contains("_"),
               names_to = c("_value", "Wave"),
               names_pattern = "(_+)"

Other code I used:

InPreg_transformed<- InPreg %>%

          pivot_longer(cols = contains("."), names_to = c(".value", 
          "Wave"), names_pattern = "(.+).(.+)")

 summary(InPreg_transformed)

Please help.

5jvtdoz2


To make sure I understand correctly, I created a random, meaningless example with n "individuals".
First load the libraries:

library(tibble)
library(dplyr)
library(tidyr)

Then create the dataset:

sex <- c("Male", "Female")
europe <- c("Belarus", "Belgium", "Bulgaria",
            "Croatia", "CzechRepublic", "Estonia", "France", 
            "Germany", "Hungary", "Ireland", "Italia", "Latvia", "Lithuania", 
            "Luxembourg", "Netherlands", "Poland", "Portugal", "Romania", 
            "Slovakia", "Slovenia", "Spain")
health <- c("Excellent", "Good", "Fair", "Poor")
education <- c("High School", "Bachelor's", "Master's", "PhD")
n <- 10

wdat <- tibble(
  ID = sprintf("Ind%02i", 1:n), # IDs
  # wave a
  a_sex = sample(sex, n, replace = TRUE),
  a_countryofbirth = sample(europe, n, replace = TRUE),
  a_health = sample(health, n, replace = TRUE),
  a_educationstatus = sample(education, n, replace = TRUE),
  # wave b
  b_sex = sample(sex, n, replace = TRUE),
  b_countryofbirth = sample(europe, n, replace = TRUE),
  b_health = sample(health, n, replace = TRUE),
  b_educationstatus = sample(education, n, replace = TRUE),
  # wave c
  c_sex = sample(sex, n, replace = TRUE),
  c_countryofbirth = sample(europe, n, replace = TRUE),
  c_health = sample(health, n, replace = TRUE),
  c_educationstatus = sample(education, n, replace = TRUE))

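The data wdat contains a unique ID for each individual, followed by three blocks of columns, one per wave.
These data can be converted to "long" format with the pivot_longer function, as follows: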

wdat %>% 
  pivot_longer(
    -ID,
    names_to = c("wave", ".value"),
    names_pattern = "(.)_(.*)"
    )

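where

  • names_pattern = "(.)_(.*)" means each column name carries two pieces of information separated by _: first a single character (the wave), then a string (the variable name),
  • names_to = c("wave", ".value") means the single character goes into a column named wave, while the values from the wide columns go into columns named after the shared part of the name; for example, all values from a_sex, b_sex and c_sex go into a column named sex.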
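
To see how the pattern splits the names, here is a minimal sketch on a tiny hand-made table. The toy data and its values are invented for illustration and are not taken from the question.

```r
library(tibble)
library(tidyr)

# Hypothetical two-person, two-wave table (values invented for illustration)
toy <- tibble(
  ID       = c("Ind01", "Ind02"),
  a_sex    = c("Male", "Female"),
  a_health = c("Good", "Fair"),
  b_sex    = c("Male", "Female"),
  b_health = c("Poor", "Good")
)

toy_long <- pivot_longer(
  toy,
  -ID,
  names_to = c("wave", ".value"),   # first group -> wave, second group -> column name
  names_pattern = "(.)_(.*)"
)

# One row per person per wave; one column per measured variable
names(toy_long)  # "ID" "wave" "sex" "health"
nrow(toy_long)   # 4
```

Each of the two individuals contributes one row per wave, so the four wide value columns collapse into two (sex and health) plus the wave indicator.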

Edit: in this case it is much easier to use names_sep

wdat %>% 
  pivot_longer(
    -ID,
    names_sep = "_",
    names_to = c("wave", ".value")
  )
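
If a numeric Wave column is wanted instead of the letter prefix, one possible follow-up is to look the letter up in R's built-in letters vector. This is only a sketch: it assumes the prefixes are the single lowercase letters a to l described in the question, and the toy data below is invented for illustration.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical two-person, two-wave table (values invented for illustration)
toy <- tibble(
  ID       = c("Ind01", "Ind02"),
  a_sex    = c("Male", "Female"),
  a_health = c("Good", "Fair"),
  b_sex    = c("Male", "Female"),
  b_health = c("Poor", "Good")
)

toy_waves <- toy %>%
  pivot_longer(
    -ID,
    names_sep = "_",
    names_to = c("wave", ".value")
  ) %>%
  # Assumption: prefixes are single lowercase letters, so "a" -> 1, "b" -> 2, ...
  mutate(Wave = match(wave, letters))

toy_waves
```

Note that names_sep = "_" relies on every column name containing exactly one underscore; names with more underscores would need names_pattern instead.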
