csv: How do I split a string on commas but keep the dates intact?

Posted by nzkunb0c on 2022-12-06 in Other

I have a string like this in R:

ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,

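I would like to do something like str.split(), splitting on every comma into an array of strings, but keeping the commas that sit inside the quoted dates, so that I get: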

ABCDE
January 10, 2010
F
GH
March 9, 2009

Thanks

2admgd59 #1

Here is one approach:

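# Read every field as character so that "F" is not converted to logical FALSE,
# and treat empty fields as NA so that na.omit() can drop them.
data.frame(list = na.omit(
  unname(unlist(read.csv(
    text = 'ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,',
    header = FALSE, colClasses = "character", na.strings = "")))))
              list
1            ABCDE
2 January 10, 2010
3                F
4               GH
5    March 9, 2009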

nx7onnlm #2

You should probably use a CSV parser here, but if you want a pure regex approach, you could try:

library(stringr)
library(dplyr)

x <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"
y <- str_match_all(x, "\"(.*?)\"|[^,]+")[[1]]
output <- coalesce(y[,2], y[,1])
output

[1] "ABCDE"            "January 10, 2010" "F"                "GH"
[5] "March 9, 2009"

The regex pattern uses an alternation trick, and says to match:

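  • "(.*?)" matches a quoted date, capturing the text between the quotes but not the quotes themselves
  • | OR
  • [^,]+ matches a single unquoted CSV field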
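
For comparison, the same alternation idea can be sketched in base R, without stringr or dplyr. This is only an illustration and not part of the answer above: it matches the quotes as part of each token and strips them afterwards, the names `tokens` and `fields` are made up for the example, and it assumes quoted fields never contain escaped quotes.

```r
x <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"

# Alternation: try a whole quoted field first, otherwise a run of non-comma characters.
tokens <- regmatches(x, gregexpr("\"[^\"]*\"|[^,]+", x, perl = TRUE))[[1]]

# Remove the surrounding quotes from the quoted fields.
fields <- gsub("^\"|\"$", "", tokens)
print(fields)
```

Because the quoted alternative is tried first, a comma inside quotes is consumed before `[^,]+` gets a chance to split on it. Empty fields match neither alternative, so they simply drop out.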

368yc8dk #3

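If the pattern is as shown, a regex option is to turn the field separators into a new delimiter (a newline) and then read the result with read.table: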

read.table(text = gsub('"', '', gsub('("[^,"]+,)(*SKIP)(*FAIL)|,',
   '\n', trimws(gsub(",{2,}", ",", str1), whitespace = ","), perl = TRUE)), 
    header = FALSE, fill = TRUE, sep = "\n")
Output:
                V1
1            ABCDE
2 January 10, 2010
3                F
4               GH
5    March 9, 2009
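
To make the nested call easier to follow, here is a rough step-by-step sketch of the same transformation. The intermediate names `step1` to `step4` and `fields` are invented for illustration, and the final `strsplit` stands in for `read.table`. It assumes an R version in which `trimws` accepts the `whitespace` argument.

```r
str1 <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"

# Step 1: collapse every run of two or more commas into a single comma.
step1 <- gsub(",{2,}", ",", str1)

# Step 2: strip leading and trailing commas.
step2 <- trimws(step1, whitespace = ",")

# Step 3: replace commas with newlines, except the comma that follows
# an opening quote; (*SKIP)(*FAIL) discards that part of the match.
step3 <- gsub('("[^,"]+,)(*SKIP)(*FAIL)|,', "\n", step2, perl = TRUE)

# Step 4: drop the quotes, then split on the new delimiter.
step4 <- gsub('"', "", step3)
fields <- strsplit(step4, "\n", fixed = TRUE)[[1]]
print(fields)
```

Note that the pattern only protects the first comma inside a quoted field, which is enough for dates written like the ones in the question.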

Or with scan, which respects the quotes by default:

data.frame(V1 = setdiff(scan(text = str1, sep = ",",
    what = character()), ""))
Output:
                V1
1            ABCDE
2 January 10, 2010
3                F
4               GH
5    March 9, 2009
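
One thing to be aware of: `setdiff` also removes duplicate values, which does not matter for this input but could for others. A possible variant, sketched here as an assumption about what is wanted rather than as part of the answer, filters the empty fields with `nzchar` instead. The `dup` string is a made-up input used only to show the difference.

```r
str1 <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"

# scan() honours double quotes, so the dates stay in one piece.
raw <- scan(text = str1, sep = ",", what = character(), quiet = TRUE)

# Keep only the non-empty fields; repeated values are preserved.
fields <- raw[nzchar(raw)]
print(fields)

# Hypothetical input with a repeated value, to compare both filters.
dup <- scan(text = "A,,B,A", sep = ",", what = character(), quiet = TRUE)
kept_nzchar <- dup[nzchar(dup)]
kept_setdiff <- setdiff(dup, "")
```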

Data:

str1 <- "ABCDE,\"January 10, 2010\",F,,,,GH,\"March 9, 2009\",,,"

niknxzdl #4

Another option could be:
First
