Convert a column into separate columns in wide format in R

uz75evzq posted 8 months ago in Other

I am trying to convert the "FRUITS" column into separate columns ("Apple" and "Banana") in wide format.

  Gender  AgeGroup EAT FRUITS
1 Female 30yr_39yr Yes  Apple
2 Female 20yr_29yr Yes  Apple
3 Female 70yr_80yr Yes  Apple
4   Male 50yr_59yr Yes Banana
5 Female 40yr_49yr Yes  Apple
6 Female 70yr_80yr Yes  Apple

How can I turn the FRUITS column into this?

  Gender  AgeGroup EAT Apple Banana
1 Female 30yr_39yr Yes  TRUE  FALSE
2 Female 20yr_29yr Yes  TRUE  FALSE
3 Female 70yr_80yr Yes  TRUE  FALSE
4   Male 50yr_59yr Yes FALSE   TRUE
5 Female 40yr_49yr Yes  TRUE  FALSE
6 Female 70yr_80yr Yes  TRUE  FALSE


Here is the R code I used to create the data:

df <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  AgeGroup = c("30yr_39yr", "20yr_29yr", "70yr_80yr", "50yr_59yr", "40yr_49yr", "70yr_80yr"),
  EAT = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
  FRUITS = c("Apple", "Apple", "Apple", "Banana", "Apple", "Apple")
)

Answer #1 (jjjwad0x)

# Each comparison returns a logical vector; assign both as new columns
df[c("Apple", "Banana")] <- list(df$FRUITS == "Apple", df$FRUITS == "Banana")

#   Gender  AgeGroup EAT FRUITS Apple Banana
# 1 Female 30yr_39yr Yes  Apple  TRUE  FALSE
# 2 Female 20yr_29yr Yes  Apple  TRUE  FALSE
# 3 Female 70yr_80yr Yes  Apple  TRUE  FALSE
# 4   Male 50yr_59yr Yes Banana FALSE   TRUE
# 5 Female 40yr_49yr Yes  Apple  TRUE  FALSE
# 6 Female 70yr_80yr Yes  Apple  TRUE  FALSE

To generalize to more values, you can use:

cols <- c("Apple", "Banana")
# \(x) is the lambda shorthand added in R 4.1; use function(x) on older versions
df[cols] <- lapply(cols, \(x) df$FRUITS == x)
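If you would rather not type the fruit names at all, one option (a sketch, not part of the original answer) is to take cols from the data with unique(). It also drops FRUITS so the result matches the layout asked for in the question, and uses function(x) so it runs on R versions before 4.1:

```r
# Example data from the question
df <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  AgeGroup = c("30yr_39yr", "20yr_29yr", "70yr_80yr", "50yr_59yr", "40yr_49yr", "70yr_80yr"),
  EAT = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
  FRUITS = c("Apple", "Apple", "Apple", "Banana", "Apple", "Apple")
)

# Take the new column names from the data instead of listing them by hand
cols <- sort(unique(df$FRUITS))

# One logical column per fruit
df[cols] <- lapply(cols, function(x) df$FRUITS == x)

# Drop the original column so the layout matches the desired output
df$FRUITS <- NULL
df
```

Any new fruit that appears in FRUITS then gets its own column without changing the code.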

Answer #2 (rslzwgfq)

To reshape to wide format with tidyr::pivot_wider, you need to add a value column to the dataset, as well as a column with a unique id for each row:

df <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  AgeGroup = c("30yr_39yr", "20yr_29yr", "70yr_80yr", "50yr_59yr", "40yr_49yr", "70yr_80yr"),
  EAT = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
  FRUITS = c("Apple", "Apple", "Apple", "Banana", "Apple", "Apple")
)

library(tidyr)
library(dplyr, warn.conflicts = FALSE)

df |>
  mutate(
    value = TRUE,
    id = row_number()
  ) |>
  pivot_wider(
    names_from = FRUITS,
    values_from = value, values_fill = FALSE
  ) |>
  select(-id)
#> # A tibble: 6 × 5
#>   Gender AgeGroup  EAT   Apple Banana
#>   <chr>  <chr>     <chr> <lgl> <lgl> 
#> 1 Female 30yr_39yr Yes   TRUE  FALSE 
#> 2 Female 20yr_29yr Yes   TRUE  FALSE 
#> 3 Female 70yr_80yr Yes   TRUE  FALSE 
#> 4 Male   50yr_59yr Yes   FALSE TRUE  
#> 5 Female 40yr_49yr Yes   TRUE  FALSE 
#> 6 Female 70yr_80yr Yes   TRUE  FALSE
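To see why the id column matters: without it, pivot_wider() identifies output rows by the remaining columns (Gender, AgeGroup, EAT), and rows 3 and 6 are identical in those columns. The sketch below compares both versions; values_fn = function(x) TRUE is my own choice here, used to collapse the duplicated cell into a single TRUE instead of a list-column:

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  AgeGroup = c("30yr_39yr", "20yr_29yr", "70yr_80yr", "50yr_59yr", "40yr_49yr", "70yr_80yr"),
  EAT = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
  FRUITS = c("Apple", "Apple", "Apple", "Banana", "Apple", "Apple")
)

# No id: rows 3 and 6 share Gender/AgeGroup/EAT, so they land in one output row
without_id <- df |>
  mutate(value = TRUE) |>
  pivot_wider(names_from = FRUITS, values_from = value,
              values_fn = function(x) TRUE, values_fill = FALSE)

# With a per-row id, every input row stays a separate output row
with_id <- df |>
  mutate(value = TRUE, id = row_number()) |>
  pivot_wider(names_from = FRUITS, values_from = value, values_fill = FALSE) |>
  select(-id)

nrow(without_id)
nrow(with_id)
```

The first version comes back with 5 rows, the second with all 6.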


Answer #3 (fkvaft9z)

Here is an approach using unnest_wider():

library(purrr)
library(dplyr)
library(tidyr)

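df %>% 
  # Turn each FRUITS value into a named logical vector, e.g. c(Apple = TRUE, Banana = FALSE)
  mutate(FRUITS = map(FRUITS, ~ set_names(levels(factor(FRUITS)) == .x, levels(factor(FRUITS))))) %>% 
  # unnest_wider() spreads each named vector into one column per name
  unnest_wider(FRUITS)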

  Gender AgeGroup  EAT   Apple Banana
  <chr>  <chr>     <chr> <lgl> <lgl> 
1 Female 30yr_39yr Yes   TRUE  FALSE 
2 Female 20yr_29yr Yes   TRUE  FALSE 
3 Female 70yr_80yr Yes   TRUE  FALSE 
4 Male   50yr_59yr Yes   FALSE TRUE  
5 Female 40yr_49yr Yes   TRUE  FALSE 
6 Female 70yr_80yr Yes   TRUE  FALSE
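To see what unnest_wider() receives here, the sketch below runs the same map() lambda on the FRUITS values alone (the fruits and lv names are mine, for illustration). Each element is a named logical vector, and unnest_wider() turns those names into the Apple and Banana columns:

```r
library(purrr)

fruits <- c("Apple", "Apple", "Apple", "Banana", "Apple", "Apple")
lv <- levels(factor(fruits))  # the distinct fruits, in sorted order

# Same lambda as in the answer: one named logical vector per row
per_row <- map(fruits, ~ set_names(lv == .x, lv))

per_row[[1]]  # Apple TRUE, Banana FALSE
per_row[[4]]  # Apple FALSE, Banana TRUE
```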

Here is a slightly modified version using values_fn:

library(dplyr)
library(tidyr)

df %>% 
  mutate(row_id = row_number()) %>% 
  pivot_wider(names_from = FRUITS, values_from = FRUITS, 
              # any value present in a cell becomes TRUE
              values_fn = list(FRUITS = ~length(.x) > 0), 
              values_fill = FALSE) %>% 
  select(-row_id)
  Gender AgeGroup  EAT   Apple Banana
  <chr>  <chr>     <chr> <lgl> <lgl> 
1 Female 30yr_39yr Yes   TRUE  FALSE 
2 Female 20yr_29yr Yes   TRUE  FALSE 
3 Female 70yr_80yr Yes   TRUE  FALSE 
4 Male   50yr_59yr Yes   FALSE TRUE  
5 Female 40yr_49yr Yes   TRUE  FALSE 
6 Female 70yr_80yr Yes   TRUE  FALSE

For this specific example, you can simply write:

library(dplyr)

df %>% 
  mutate(Apple = FRUITS == "Apple", Banana = FRUITS == "Banana")

Answer #4 (ruarlubt)

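Edit

My original answer using pivot_wider was wrong and I have deleted it. As @stefan pointed out in the comments, it dropped a row. A proper solution needs the preliminary step of adding an index column, as shown in @TarJae's answer; the only thing I would add is the simplification values_fn = ~ TRUE. The dummy_wider suggestion is a valid alternative.

If a one-hot/dummy representation of the values is acceptable (1 and 0 instead of logicals), fastDummies::dummy_cols also works well: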

library(fastDummies)
df |> 
    dummy_cols(select_columns = 'FRUITS',
               # drop FRUITS itself and name the new columns Apple/Banana, not FRUITS_Apple
               remove_selected_columns = TRUE,
               omit_colname_prefix = TRUE)

  Gender  AgeGroup EAT Apple Banana
1 Female 30yr_39yr Yes     1      0
2 Female 20yr_29yr Yes     1      0
3 Female 70yr_80yr Yes     1      0
4   Male 50yr_59yr Yes     0      1
5 Female 40yr_49yr Yes     1      0
6 Female 70yr_80yr Yes     1      0
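If you start from these 0/1 columns but need TRUE/FALSE as in the question, as.logical() converts them. A sketch, with the printed dummy_cols() result rebuilt by hand (the dummies and fruit_cols names are illustrative) so it runs without fastDummies installed:

```r
# The 0/1 result printed above, rebuilt by hand
dummies <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  AgeGroup = c("30yr_39yr", "20yr_29yr", "70yr_80yr", "50yr_59yr", "40yr_49yr", "70yr_80yr"),
  EAT = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
  Apple = c(1, 1, 1, 0, 1, 1),
  Banana = c(0, 0, 0, 1, 0, 0)
)

# as.logical() maps 1 to TRUE and 0 to FALSE
fruit_cols <- c("Apple", "Banana")
dummies[fruit_cols] <- lapply(dummies[fruit_cols], as.logical)
dummies
```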


Answer #5 (xcitsw88)

Here are two more:

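library(dplyr)
# ~ FRUITS + 0 gives one 0/1 indicator column per fruit (FRUITSApple, FRUITSBanana);
# "== 1" turns them into TRUE/FALSE before they are bound to df
df %>% cbind(model.matrix(~ FRUITS + 0, .) == 1)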
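The model.matrix() columns come out named FRUITSApple and FRUITSBanana. A sketch (the m and out names are mine) that strips the prefix with sub() and drops the original FRUITS column, so the result matches the desired output:

```r
df <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Female", "Female"),
  AgeGroup = c("30yr_39yr", "20yr_29yr", "70yr_80yr", "50yr_59yr", "40yr_49yr", "70yr_80yr"),
  EAT = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
  FRUITS = c("Apple", "Apple", "Apple", "Banana", "Apple", "Apple")
)

# "+ 0" removes the intercept, so every fruit gets its own indicator column;
# model.matrix() converts the character column to a factor internally
m <- model.matrix(~ FRUITS + 0, data = df) == 1

# Strip the "FRUITS" prefix from the column names
colnames(m) <- sub("^FRUITS", "", colnames(m))

# Keep every column except FRUITS, then append the logical indicators
out <- cbind(df[setdiff(names(df), "FRUITS")], m)
out
```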

