Regex: negative lookahead for whitespace when using the cSplit_e function from the splitstackshape package

vktxenjb  posted 12 months ago  in  Other

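I want to split a column that contains multiple comma-separated responses into multiple columns. I am using the cSplit_e function from the splitstackshape package. Unfortunately, some of the items contain commas within a single item, so I am trying to tell it to split only at commas that are not followed by whitespace.
This is the syntax I have so far: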

cSplit_e(data=df,split.col="question",sep=",",type="character")

This takes:

Behavior; green, pink, blue,Sleep; indigo, violet, puce


and creates separate columns for:

question_Behavior; green
question_pink
question_blue
question_Sleep; indigo
question_violet
question_puce


But I would like it to split like this:

question_Behavior; green, pink, blue
question_Sleep; indigo, violet, puce


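I don't know how to indicate, in the cSplit_e syntax, that I only want it to split at commas that are immediately followed by a non-whitespace character. Any help would be appreciated!
A sample data frame: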

id_num <- c("1","2","3","4","5")
question <- c("Behavior; green, pink, blue,Sleep; indigo, violet, puce","Behavior; green, pink, blue","","Sleep; indigo, violet, puce","Behavior; green, pink, blue,Sleep; indigo, violet, puce")

df <- data.frame(id_num,question)

Answer #1 (mcvgt66p)

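If you don't mind using the tidyr package, here is a suggestion for a possible solution. It may not be as elegant or as simple as using the splitstackshape package, but I can't say for sure.
I had to drop the id_num that has empty values for both answers (id = 3).
My code: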

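library(dplyr)
library(tidyr)

df %>%
  # split only at commas that have non-whitespace on both sides
  separate_rows(question, sep = "(?<=\\S),(?=\\S)", convert = FALSE) %>%
  # split each piece at the first ";" into a question name and its response
  separate(question, into = c("question", "response"), sep = ";", extra = "merge") %>%
  # drop rows without a response (the empty entry, id_num 3)
  filter(!is.na(response)) %>%
  pivot_wider(names_from = question, values_from = response) %>%
  rename_all(~gsub("\\.", "_", .))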

Output:

# A tibble: 4 × 3
  id_num Behavior             Sleep                  
  <chr>  <chr>                <chr>                  
1 1      " green, pink, blue" " indigo, violet, puce"
2 2      " green, pink, blue"  NA                    
3 4       NA                  " indigo, violet, puce"
4 5      " green, pink, blue" " indigo, violet, puce"
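
To check the separator pattern on its own, here is a minimal base R sketch. It assumes, as in the sample data, that the commas separating responses are followed directly by a non-whitespace character, while the commas inside a response are followed by a space. It uses only the lookahead half of the pattern with strsplit() and perl = TRUE (lookarounds need the Perl-compatible engine). The name split_responses is a hypothetical helper for illustration, not part of any package.

```r
# Hypothetical helper: split only at commas that are immediately
# followed by a non-whitespace character.
split_responses <- function(x) {
  strsplit(x, ",(?=\\S)", perl = TRUE)
}

question <- c(
  "Behavior; green, pink, blue,Sleep; indigo, violet, puce",
  "Behavior; green, pink, blue"
)

parts <- split_responses(question)

# parts[[1]] holds two elements:
#   "Behavior; green, pink, blue" and "Sleep; indigo, violet, puce"
# parts[[2]] holds one element:
#   "Behavior; green, pink, blue"
print(parts)
```

A negative lookahead, ",(?!\\s)", behaves the same on this data, but it would also match a comma at the very end of a string, so the positive form is used here.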
