R: splitting one column into multiple columns when rows have different numbers of elements

Posted by 7vhp5slm on 2022-12-24 in Other

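I have a column to split, like ColumnA below, where each observation (a song) has a different number of elements (genres). Can I split the column in R without specifying the target columns?
| ColumnA |
| ------- |
| "['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']" |
| ['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop'] |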
I would like to get a result like this:
| Genre 1 | Genre 2 | Genre ... | Genre 6 | Genre 7 |
| ------- | ------- | ------- | ------- | ------- |
| hip hop | pop | ... | trap | trap soul |
| dance pop | girl group | ... | uk pop | NA |
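The number of new columns should equal the maximum number of genres any single song has (for example, if the song with the most genres has ten, I should get ten columns).
Another option is to create one dummy (indicator) column for every genre found in the column:
| hip hop | pop | pop rap | r&b | ... |
| ------- | ------- | ------- | ------- | ------- |
| 1 | 1 | 1 | 1 | ... |
| 0 | 1 | 0 | 0 | ... |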
I tried using separate in R, but I got an error.

Source: https://stackoverflow.com/questions/74899197/splitting-a-column-in-columns-different-number-of-elements

2 Answers

jfewjypa #1

In base R, we can use read.csv after removing the [, ], and the quote characters (' and "):

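# Strip the brackets and both quote characters, then parse each string as one CSV row.
# col.names is sized for the 7 genres of the longest row in the sample data;
# shorter rows are padded with NA.
df2 <- read.csv(text = gsub('\\[|\\]|\'|"', "", df1$ColumnA), 
 header = FALSE, na.strings = "", col.names = paste0("genre", 1:7))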

Output

df2
     genre1      genre2   genre3         genre4            genre5  genre6     genre7
1   hip hop         pop  pop rap            r&b  southern hip hop    trap  trap soul
2 dance pop  girl group      pop  post-teen pop       talent show  uk pop       <NA>
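The call above hard-codes seven columns, while the question asks for the column count to follow the data. A minimal base R sketch of one way to do that is shown here, counting the elements per row first. The helper names `cleaned` and `n_max` are introduced only for illustration, and `strip.white = TRUE` is an addition not present in the answer: because the elements are separated by ", ", every value after the first otherwise keeps a leading space.

```r
# Sample data as given in this answer (each string kept on a single line)
df1 <- data.frame(ColumnA = c(
  "['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']",
  "['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']"
))

# Remove the brackets and both kinds of quotes
cleaned <- gsub("\\[|\\]|'|\"", "", df1$ColumnA)

# The row with the most elements decides how many columns are needed
n_max <- max(lengths(strsplit(cleaned, ",", fixed = TRUE)))

# Parse each cleaned string as one CSV row; shorter rows are padded with NA
df2 <- read.csv(text = cleaned, header = FALSE, na.strings = "",
                strip.white = TRUE,
                col.names = paste0("genre", seq_len(n_max)))
```

Note that this assumes no genre name contains a comma of its own; if one did, splitting on commas would no longer match the element boundaries.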

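The second layout (one indicator column per genre) can be created from the output above with mtabulate from the qdapTools package: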

library(qdapTools)
mtabulate(as.data.frame(t(df2)))

Output

   girl group pop pop rap post-teen pop r&b southern hip hop talent show trap trap soul uk pop dance pop hip hop
V1          0   1       1             0   1                1           0    1         1      0         0       1
V2          1   1       0             1   0                0           1    0         0      1         1       0
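If installing qdapTools is not an option, a similar indicator table can be sketched in base R alone. This is an alternative technique, not the answer's method: it splits the strings directly with strsplit and cross-tabulates song index against genre with table. The names `genres`, `long`, `tab`, and `dummies` are hypothetical, chosen for this example.

```r
# Sample data as given in this answer (each string kept on a single line)
df1 <- data.frame(ColumnA = c(
  "['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']",
  "['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']"
))

# One character vector of genres per song; ",\\s*" also drops the space after each comma
genres <- strsplit(gsub("\\[|\\]|'|\"", "", df1$ColumnA), ",\\s*")

# Long format: one row per (song, genre) pair
long <- data.frame(song  = rep(seq_along(genres), lengths(genres)),
                   genre = unlist(genres))

# Cross-tabulate: rows are songs, columns are genres, cells are counts
tab <- table(long$song, long$genre)

# Convert the contingency table to a wide data frame
dummies <- as.data.frame.matrix(tab)
```

The cells are counts, so a genre listed twice for the same song would show 2; wrapping the table as `(tab > 0) * 1L` would force strict 0/1 indicators.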

Data

df1 <- structure(list(ColumnA = c(
"['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']", 
"['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']"
)), class = "data.frame", row.names = c(NA, -2L))

Posted 2022-12-24

nlejzf6q #2

I'm not entirely sure what output you want, but here is one idea:

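library(dplyr)
library(stringr)
library(tidyr)

# df is assumed to hold the raw strings in a column named col_a.
# Strip the brackets, split on ", ", then count each genre across all songs.
df %>%  
  mutate(col_a = col_a %>% str_remove_all("\\[") %>% 
           str_remove_all("\\]") %>% 
           str_split(pattern = ", ")) %>% 
  unnest(col_a) %>% 
  count(col_a) %>% 
  pivot_wider(names_from = col_a, values_from = n)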

# A tibble: 1 × 12
  `'dance pop'` 'girl group…¹ 'hip …² `'pop'` 'pop …³ 'post…⁴ `'r&b'` 'sout…⁵ 'tale…⁶ 'trap…⁷ 'trap…⁸ 'uk p…⁹
          <int>         <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
1             1             1       1       2       1       1       1       1       1       1       1       1
# … with abbreviated variable names ¹`'girl group'`, ²`'hip hop'`, ³`'pop rap'`, ⁴`'post-teen pop'`,
#   ⁵`'southern hip hop'`, ⁶`'talent show'`, ⁷`'trap'`, ⁸`'trap soul'`, ⁹`'uk pop'`
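The result above is a single row of totals across all songs, and the genre names still carry their single quotes. To get the per-song dummy columns the question describes, the same pipeline idea can be adapted as in the sketch below. It is a variation under assumptions rather than this answer's code: the `song` index column, the `present` flag, and the result name `dummies` are hypothetical names introduced here, and the sample tibble mirrors the data from the first answer.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Sample data, using this answer's column name col_a
df <- tibble(col_a = c(
  "['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']",
  "['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']"
))

dummies <- df %>%
  # Keep a song index so rows can be told apart after unnesting
  mutate(song  = row_number(),
         # Drop brackets and single quotes, then split on commas
         genre = str_split(str_remove_all(col_a, "[\\[\\]']"), ",\\s*")) %>%
  select(song, genre) %>%
  unnest(genre) %>%
  # Guard against a genre listed twice for the same song
  distinct(song, genre) %>%
  mutate(present = 1L) %>%
  # One column per genre; genres a song lacks are filled with 0
  pivot_wider(names_from = genre, values_from = present, values_fill = 0L)
```

Each row of `dummies` is then one song, with a 1 wherever the song has that genre.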

Posted 2022-12-24
