How to ignore separators inside quoted strings when using fread in R

Posted by qlvxas9a on 2022-12-20 in Other

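I have a text file whose columns are separated by "|". Some invoice numbers also contain a pipe, but only inside a quoted string. The code below still splits those invoice numbers at the pipe, which is not what I want. How can I make fread ignore separators that appear between two quotes?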

Text file contents (as viewed in Notepad):

DETAIL|29117|Lake Louise Payment Request Policy|Centex Petroleum(CEP001)-39105||39105|2022-11-27|2022-12-17|57562.5100|57562.5100|2022-12-06|2022-12-10|||||CDEWITT||||||||||||||||||LLSA|||||||||||LLSA|LLSA|Y|0.0000|0.0000|B5AC4BCA16504EAA8391|2022-12-01|CLIENT|||||0.0000|Sage 300|Sage 300|9999|3704.5100|3889.7400|DR|CAD|124|Canada, Dollar|9999|LLSA|01|1000|6085|CDEWITT||||||||||||||||191472|||||101351|accounts.payable@skilouise.com|John||Staffieri|accounts.payable@skilouise.com|||||||||||||||||||||||||||LLSA/LLI|LLSA||||7|6|FED FUEL CHARGE|2249   |DEFAULT|27625.0000|0.1341|3704.5100|LLSA|01|1000|6085||||||||||||||||||||||Centex Petroleum|CEP001|CEP001|7EC7866E97F74B85B04D|203, 1717 - 10th Street NW|||CALGARY|AB|T2M 4S2|CA|||0||||||||||||||||N|||||||||||||||||||185.2300||0.0000||0.0000||0.0000|||||0.0000|0.0000||||||N|||||||||||||||||0.0000|2741.0800|0.0000|||54821.4300|0.0000||3704.5100|
DETAIL|29117|Lake Louise Payment Request Policy|Centex Petroleum(CEP001)-39105||39105|2022-11-27|2022-12-17|57562.5100|57562.5100|2022-12-06|2022-12-10|||||CDEWITT||||||||||||||||||LLSA|||||||||||LLSA|LLSA|Y|0.0000|0.0000|B5AC4BCA16504EAA8391|2022-12-01|CLIENT|||||0.0000|Sage 300|Sage 300|9999|454.1000|476.8100|DR|CAD|124|Canada, Dollar|9999|LLSA|01|1000|6085|CDEWITT||||||||||||||||191473|||||101351|accounts.payable@skilouise.com|John||Staffieri|accounts.payable@skilouise.com|||||||||||||||||||||||||||LLSA/LLI|LLSA||||7|7|ExtenData(EXD001)-5642530-IN|2249   |DEFAULT|10091.0000|0.0450|454.1000|LLSA|01|1000|6085||||||||||||||||||||||Centex Petroleum|CEP001|CEP001|7EC7866E97F74B85B04D|203, 1717 - 10th Street NW|||CALGARY|AB|T2M 4S2|CA|||0||||||||||||||||N|||||||||||||||||||22.7100||0.0000||0.0000||0.0000|||||0.0000|0.0000||||||N|||||||||||||||||0.0000|2741.0800|0.0000|||54821.4300|0.0000||454.1000|
DETAIL|28329|Lake Louise Payment Request Policy|Coinamatic(COI001)-SALES000000546031||SALES000000546031|2022-12-01|2022-12-01|916.5200|916.5200|2022-11-14|2022-12-11|||||NBAGGLEY||||||||||||||||||LLSA|||||||||||LLSA|LLSA|Y|0.0000|0.0000|E7B64B5B064F4987A2EF|2022-11-11|CLIENT|||||0.0000|Sage 300|Sage 300|9999|872.8800|916.5200|DR|CAD|124|Canada, Dollar|9999|LLSA|01|9500|6350|NBAGGLEY||||||||||||||||191875|||||101351|accounts.payable@skilouise.com|John||Staffieri|accounts.payable@skilouise.com|||||||||||||||||||||||||||LLSA/LLI|LLSA||||1|1|Rental Fee|2249   |DEFAULT|1.0000|872.8800|872.8800|LLSA|01|9500|6350||||||||||||||||||||||Coinamatic|COI001|COI001|1E980633B41949BAAADA|301 Matheson Blvd West|||MISSISSAUGA|ON|L5R 3G3|CA|||(250) 344-2381||||||||||||||||N|101045318RT0001||||||||||||||||||43.6400||0.0000||0.0000||0.0000|||||0.0000|0.0000||||||N|||||||||||||||||0.0000|43.6400|0.0000|||872.8800|0.0000||872.8800|
DETAIL|28141|Lake Louise Payment Request Policy|"Endeavor Design Inc.(EDI001)-INVC7-6191 | LLOU760-1"||"INVC7-6191 | LLOU760-1"|2022-10-14|2022-11-13|56608.8900|56608.8900|2022-11-09|2022-11-12|||||AE23006||||||||||||||||||LLSA|||||||||||LLSA|LLSA|Y|0.0000|0.0000|155192A2BA72496B8D87|2022-11-05|CLIENT|||||0.0000|Sage 300|Sage 300|9999|50962.4900|53510.6200|DR|CAD|124|Canada, Dollar|9999|LLSA|01|3100|1350|AE23006||||||||||||||||181764|||||168|Christiane.Morel@skilouise.com|Christiane||Morel|Christiane.Morel@skilouise.com|||||||||||||||||||||||||||LLSA/LLI|LLSA||||2|1|Subtotal|2249   |DEFAULT|1.0000|50962.4900|50962.4900|LLSA|01|3100|1350||||||||||||||||||||||Endeavor Design Inc.|EDI001|EDI001|EB6AA8003B374774AF7E|1737 West 3rd Avenue, Unit 110|||VANCOUVER|BC|V6J 1K7|CA|||0||||||||||||||||N|||||||||||||||||||2548.1300||0.0000||0.0000||0.0000|||||0.0000|0.0000||||||N|||||||||||||||||0.0000|2695.6700|0.0000|||53913.2200|0.0000||50962.4900|
DETAIL|28141|Lake Louise Payment Request Policy|"Endeavor Design Inc.(EDI001)-INVC7-6191 | LLOU760-1"||"INVC7-6191 | LLOU760-1"|2022-10-14|2022-11-13|56608.8900|56608.8900|2022-11-09|2022-11-12|||||AE23006||||||||||||||||||LLSA|||||||||||LLSA|LLSA|Y|0.0000|0.0000|155192A2BA72496B8D87|2022-11-05|CLIENT|||||0.0000|Sage 300|Sage 300|9999|2950.7300|3098.2700|DR|CAD|124|Canada, Dollar|9999|LLSA|01|3100|6515|AE23006||||||||||||||||181765|||||168|Christiane.Morel@skilouise.com|Christiane||Morel|Christiane.Morel@skilouise.com|||||||||||||||||||||||||||LLSA/LLI|LLSA||||2|2|Freight|2249   |DEFAULT|1.0000|2950.7300|2950.7300|LLSA|01|3100|6515||||||||||||||||||||||Endeavor Design Inc.|EDI001|EDI001|EB6AA8003B374774AF7E|1737 West 3rd Avenue, Unit 110|||VANCOUVER|BC|V6J 1K7|CA|||0||||||||||||||||N|||||||||||||||||||147.5400||0.0000||0.0000||0.0000|||||0.0000|0.0000||||||N|||||||||||||||||0.0000|2695.6700|0.0000|||53913.2200|0.0000||2950.7300|

Code:

Invoice <- fread("invoice.txt", sep="|", quote="\"", headers=FALSE, fill=TRUE, col_types=c(V6="character"))

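This works well in every case except when a separator happens to appear inside a quoted string.
The Endeavor Design invoice number has a pipe inside the quotes: the original invoice number is "INVC7-6191 | LLOU760-1", and it should be kept exactly as is.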

Answer 1, by xmjla07d:

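data.table::fread has no headers= or col_types= arguments; the correct names are header= and colClasses=. Maybe you are using an old version, I'm not sure.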
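To check which argument names the installed version actually accepts, one can list the formal arguments of `fread`. This is a minimal sketch, assuming data.table is installed; since `fread` takes no `...` argument, a misspelled name such as `headers=` should be rejected with an "unused argument" error rather than silently ignored.

```r
library(data.table)

# Names of the arguments fread() accepts in the installed version
fread_args <- names(formals(fread))

# header= and colClasses= are real fread arguments
print(c("header", "colClasses") %in% fread_args)  # [1] TRUE TRUE

# headers= and col_types= (the latter is readr's spelling) are not
print(c("headers", "col_types") %in% fread_args)  # [1] FALSE FALSE

# Report the version being checked
print(packageVersion("data.table"))
```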

r <- data.table::fread('invoice.txt',  sep="|", quote="\"", header=FALSE, fill=TRUE, 
                       colClasses=c(V6="character"))

r[, 'V6'] |> unlist() |> grep(pattern='INV', value=TRUE)
#                      V64                      V65 
# "INVC7-6191 | LLOU760-1" "INVC7-6191 | LLOU760-1"
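The same behaviour can be reproduced without the original file. In this sketch the two rows are hypothetical, cut down to four columns (the real rows have far more), and passed through `fread`'s `text=` argument instead of reading `invoice.txt`. With `quote="\""` (which is also fread's default), a field that starts and ends with a double quote is read as a single value, so the pipes inside it are not treated as separators.

```r
library(data.table)

# Hypothetical, shortened rows modelled on invoice.txt: 4 fields per row.
# In row 2 the 3rd and 4th fields are quoted and contain the separator.
sample_rows <- c(
  'DETAIL|29117|Centex Petroleum(CEP001)-39105|39105',
  'DETAIL|28141|"Endeavor Design Inc.(EDI001)-INVC7-6191 | LLOU760-1"|"INVC7-6191 | LLOU760-1"'
)

# Same settings as the answer, but reading from text= instead of a file
dt <- fread(text = sample_rows, sep = "|", quote = "\"", header = FALSE,
            fill = TRUE, colClasses = c(V4 = "character"))

print(dim(dt))  # 2 rows, 4 columns: the quoted pipes did not create new fields
print(dt$V4)    # invoice numbers, with the embedded pipe kept intact
```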
