Separating columns in R

Posted by kyvafyod on 2023-05-20 in Other

I have the following data in a .CSV file that I read into R:

team_a <- c('John', 'Tim')
team_b <- c('David', 'Jack')
team_c <- c('Sue', 'Frank', 'Jane')

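Col D, Row 1 contains: John; Tim; David; Jack
Col D, Row 2 contains: Tim
Col D, Row 3 contains: Jack; David; Sue
and so on.
I want to create additional columns labeled team_a, team_b, and team_c, and put the names from each row into the corresponding team column. For example, row 1 would have John and Tim in the team_a column and David and Jack in the team_b column. Row 2 would have only Tim in the team_a column, and so on. The cells will also contain other names that are not team members, which I want to ignore. I would also like to count the number of people from each team in each row, but I'm guessing I can work that out easily once the names are placed correctly.
Thanks, everyone!
I know how to use separate to put each name in its own column, but I don't know how to put the names into a column based on which team they belong to.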

rjee0c15

Data:

team_a <- c('John', 'Tim')
team_b <- c('David', 'Jack')
team_c <- c('Sue', 'Frank', 'Jane')

df <- data.frame(D = c("John; Tim; David; Jack", "Tim", "Jack; David; Sue"))

# equivalently, build the same data frame from the team vectors:
D <- c(paste(c(team_a, team_b), collapse = "; "),
       paste(team_a[2], collapse = "; "),
       paste(c(team_b, team_c[1]), collapse = "; "))

df <- data.frame(D = D)

task1:

library(dplyr)
library(tidyr)

df %>%
  mutate(id = row_number()) %>% 
  separate_rows(D, sep = "; ") %>%
  mutate(team = case_when(
    D %in% team_a ~ "team_a",
    D %in% team_b ~ "team_b",
    D %in% team_c ~ "team_c",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(team)) %>%
  pivot_wider(names_from = team, values_from = D, values_fn = list(D = toString))

     id team_a    team_b      team_c
  <int> <chr>     <chr>       <chr> 
1     1 John, Tim David, Jack NA    
2     2 Tim       NA          NA    
3     3 NA        Jack, David Sue

task2:

df %>%
  mutate(id = row_number()) %>% 
  separate_rows(D, sep = "; ") %>%
  mutate(team = case_when(
    D %in% team_a ~ "team_a",
    D %in% team_b ~ "team_b",
    D %in% team_c ~ "team_c",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(team)) %>%
  group_by(id, team) %>%
  summarise(n = n(), .groups = "drop") %>%
  pivot_wider(names_from = team, values_from = n, values_fill = 0)

     id team_a team_b team_c
  <int>  <int>  <int>  <int>
1     1      2      2      0
2     2      1      0      0
3     3      0      2      1
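
One thing to watch with both pipelines above: a row whose cell contains only non-members is removed by the filter step, so it never shows up in the result. If every original row should be kept, and the names and counts are wanted side by side, a base R version could look roughly like the sketch below. The `teams` list, the `members` helper, the `n_` column prefix, and the fourth row ("Alice; Bob") are all made up for illustration and are not part of the original data.

```r
team_a <- c("John", "Tim")
team_b <- c("David", "Jack")
team_c <- c("Sue", "Frank", "Jane")

# Hypothetical lookup list: one entry per team, named after the target column
teams <- list(team_a = team_a, team_b = team_b, team_c = team_c)

# Same data as above, plus a hypothetical fourth row with no team members
df <- data.frame(
  D = c("John; Tim; David; Jack", "Tim", "Jack; David; Sue", "Alice; Bob"),
  stringsAsFactors = FALSE
)

# Split each cell on ";" and trim the surrounding whitespace
members <- lapply(strsplit(df$D, ";", fixed = TRUE), trimws)

for (nm in names(teams)) {
  # Keep only the names that belong to the current team
  hits <- lapply(members, function(x) x[x %in% teams[[nm]]])
  # Collapse matches into one string; use NA when the row has no match
  df[[nm]] <- vapply(
    hits,
    function(x) if (length(x) > 0) toString(x) else NA_character_,
    character(1)
  )
  # Per-row head count for this team
  df[[paste0("n_", nm)]] <- lengths(hits)
}

print(df)
```

Because the columns are written back onto `df` by position, the fourth row stays in place with NA in every team column and 0 in every count column, and any names that are not on a team are simply ignored.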
