R: data wrangling, from long to wide

hpxqektj  posted on 2023-03-10  in Other

I have the following data:

A<-tibble::tribble(
  ~ID,   ~NR,   ~INDX.DATE,   ~REOP_DATE,  ~REC, ~TYPE1, ~TYPE2, ~TYPE3, ~TYPE4,
   1L, 1234L, "2012-11-04", "2013-12-31", "YES",     NA,     NA,     NA,     NA,
   1L,    NA,           NA, "2015-01-01",  "NO",     0L,     0L,     1L,     0L,
   1L,    NA,           NA, "2015-02-02", "YES",     NA,     NA,     NA,     NA,
   1L,    NA,           NA, "2016-01-01",  "NO",     1L,     0L,     0L,     0L,
   2L,    NA,           NA,           NA,    NA,     NA,     NA,     NA,     NA,
   3L,    NA,           NA,           NA,    NA,     NA,     NA,     NA,     NA
  )

It looks like this:

# A tibble: 6 × 9
     ID    NR INDX.DATE  REOP_DATE  REC   TYPE1 TYPE2 TYPE3 TYPE4
  <int> <int> <chr>      <chr>      <chr> <int> <int> <int> <int>
1     1  1234 2012-11-04 2013-12-31 YES      NA    NA    NA    NA
2     1    NA NA         2015-01-01 NO        0     0     1     0
3     1    NA NA         2015-02-02 YES      NA    NA    NA    NA
4     1    NA NA         2016-01-01 NO        1     0     0     0
5     2    NA NA         NA         NA       NA    NA    NA    NA
6     3    NA NA         NA         NA       NA    NA    NA    NA

But I would like to reshape it into this:

     ID    NR INDX.DATE  REC1_DATE  REC2_DATE  REOP1_DATE REOP1_TYPE REOP2_DATE REOP2_TYPE
  <int> <int> <chr>      <chr>      <chr>      <chr>           <int> <chr>           <int>
1     1  1234 2012-11-04 2013-12-31 2015-02-02 2015-01-01          3 2016-01-01          1
2     2    NA NA         NA         NA         NA                 NA NA                 NA
3     3    NA NA         NA         NA         NA                 NA NA                 NA

I have been looking at tidyr's pivot_wider, but I cannot figure out how to create the columns dynamically.

pu82cl6c1#

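Your data is messy, so you may want to change its structure, but here is a way to get to your final output. Note that it requires dplyr 1.1.0 because of the .by argument, but on older versions you can replace .by with group_by.
The first mutate is only there to get a tidier data frame. You can check here what I mean by tidy; it makes the data easier for pivot_wider to handle. Then I create an id for each row within each group and pivot the data. Finally, I drop every column that contains only NA (no useful information) and summarise each group's data into a single row: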

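library(tidyr)
library(dplyr)
A %>% 
  # Label each row as a REC or REOP event, collapse TYPE1..TYPE4 into the index
  # of the flagged column, and copy the date; .keep = "unused" drops the inputs
  mutate(EVENT = ifelse(REC == "YES", "REC", "REOP"),
         TYPE = ifelse(if_all(TYPE1:TYPE4, is.na), NA, max.col(across(TYPE1:TYPE4))),
         DATE = REOP_DATE,
         .keep = "unused") %>% 
  # Number the events within each ID and event kind (REC1, REC2, REOP1, ...)
  mutate(row_n = row_number(), .by = c(ID, EVENT)) %>% 
  # Spread DATE and TYPE into columns named like REOP1_DATE, REOP1_TYPE
  pivot_wider(names_from = c("EVENT", "row_n"), values_from = c("DATE", "TYPE"),
              names_glue = "{EVENT}{row_n}_{.value}") %>%
  # Drop columns that contain only NA
  select(where(~ any(complete.cases(.x)))) %>% 
  # Collapse the remaining rows to one row per ID
  summarise(across(everything(), ~ ifelse(all(is.na(.x)), NA, max(.x, na.rm = TRUE))), .by = ID)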

# A tibble: 3 × 9
     ID    NR INDX.DATE  REC1_DATE  REOP1_DATE REC2_DATE  REOP2_DATE REOP1_TYPE REOP2_TYPE
  <int> <int> <chr>      <chr>      <chr>      <chr>      <chr>           <int>      <int>
1     1  1234 2012-11-04 2013-12-31 2015-01-01 2015-02-02 2016-01-01          3          1
2     2    NA NA         NA         NA         NA         NA                 NA         NA
3     3    NA NA         NA         NA         NA         NA                 NA         NA
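
For dplyr versions older than 1.1.0, the note above about swapping .by for group_by could look roughly like the sketch below. It is the same pipeline with each .by replaced by an explicit group_by (plus an ungroup before pivoting); the name wide is only an illustrative variable, and I am assuming the rest of the pipeline behaves the same on the older version.

```r
library(dplyr)
library(tidyr)

# Example data from the question
A <- tibble::tribble(
  ~ID,   ~NR,   ~INDX.DATE,   ~REOP_DATE,  ~REC, ~TYPE1, ~TYPE2, ~TYPE3, ~TYPE4,
   1L, 1234L, "2012-11-04", "2013-12-31", "YES",     NA,     NA,     NA,     NA,
   1L,    NA,           NA, "2015-01-01",  "NO",     0L,     0L,     1L,     0L,
   1L,    NA,           NA, "2015-02-02", "YES",     NA,     NA,     NA,     NA,
   1L,    NA,           NA, "2016-01-01",  "NO",     1L,     0L,     0L,     0L,
   2L,    NA,           NA,           NA,    NA,     NA,     NA,     NA,     NA,
   3L,    NA,           NA,           NA,    NA,     NA,     NA,     NA,     NA
)

# "wide" is a hypothetical name used only for this sketch
wide <- A %>%
  mutate(EVENT = ifelse(REC == "YES", "REC", "REOP"),
         TYPE = ifelse(if_all(TYPE1:TYPE4, is.na), NA, max.col(across(TYPE1:TYPE4))),
         DATE = REOP_DATE,
         .keep = "unused") %>%
  # group_by() + ungroup() stands in for mutate(..., .by = c(ID, EVENT))
  group_by(ID, EVENT) %>%
  mutate(row_n = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = c("EVENT", "row_n"), values_from = c("DATE", "TYPE"),
              names_glue = "{EVENT}{row_n}_{.value}") %>%
  select(where(~ any(complete.cases(.x)))) %>%
  # group_by() stands in for summarise(..., .by = ID)
  group_by(ID) %>%
  summarise(across(everything(), ~ ifelse(all(is.na(.x)), NA, max(.x, na.rm = TRUE))))

wide
```

The ungroup before pivot_wider matters: without it the data would still be grouped by ID and EVENT when it is pivoted.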

ffx8fchx2#

To solve this I had to draw on everything I know. The solution is lengthy and not very elegant:

library(dplyr)
library(tidyr)
library(stringr)
library(glue)

# coalesce the first two values of a column (applied per ID group at the end)
coalesce_by_column <- function(df) {
  return(coalesce(df[1], df[2]))
}

A %>% 
  mutate(REC = ifelse(REC=="YES", REOP_DATE, REC),
         REOP_DATE = ifelse(REC == "NO", REOP_DATE, NA_character_),
         REC = ifelse(REC == "NO", NA_character_, REC)) %>% 
  mutate(across(TYPE1:TYPE4, ~case_when(. == 1 ~ cur_column()), .names = 'new_{col}')) %>%
  unite(REOP_TYPE, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(REOP_TYPE = na_if(REOP_TYPE, "")) %>% 
  pivot_longer(c(REOP_DATE, REC, REOP_TYPE)) %>% 
  arrange(name) %>% 
  group_by(name) %>% 
  mutate(x = cumsum(!is.na(value))) %>% 
  arrange(ID) %>% 
  ungroup() %>% 
  mutate(Col1 = rep(row_number(), each=2, length.out = n())) %>% 
  group_by(Col1) %>% 
  arrange(value, .by_group = TRUE) %>% 
  slice(1) %>% 
  mutate(name = ifelse(name=="REC", paste0(name,x,"_DATE"), str_replace(name, "OP", glue::glue("OP{x}")))) %>%
  ungroup() %>% 
  dplyr::select(-starts_with("TYPE"),-x, -Col1) %>% 
  pivot_wider(names_from = name, values_from = value) %>% 
  group_by(ID) %>% 
  summarise_all(coalesce_by_column)

     ID    NR INDX.DATE  REC1_DATE  REC2_DATE  REOP1_DATE REOP2_DATE REOP1_TYPE REOP2_TYPE
  <int> <int> <chr>      <chr>      <chr>      <chr>      <chr>      <chr>      <chr>
1     1  1234 2012-11-04 2013-12-31 2015-02-02 2015-01-01 2016-01-01 TYPE3      TYPE1
2     2    NA NA         NA         NA         NA         NA         NA         NA
3     3    NA NA         NA         NA         NA         NA         NA         NA
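
The desired output in the question stores the type as an integer (3, 1), whereas the result above holds labels such as TYPE3. A minimal sketch of one way to convert them follows. The tibble res is a hypothetical stand-in for the relevant columns of the result, and the sketch assumes each cell holds at most one label (a cell like "TYPE1 TYPE3" would become NA with a warning).

```r
library(dplyr)

# Hypothetical stand-in for the *_TYPE columns of the result above
res <- tibble::tibble(
  ID = 1:3,
  REOP1_TYPE = c("TYPE3", NA, NA),
  REOP2_TYPE = c("TYPE1", NA, NA)
)

# Strip the "TYPE" prefix and convert what is left to integer; NA stays NA
res_int <- res %>%
  mutate(across(ends_with("_TYPE"), ~ as.integer(sub("^TYPE", "", .x))))

res_int
```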
