I've used the LumenWorks CSV reader a few times before and have never run into this problem. I have a simple CSV file:
"1","001333","Test Company","","123 Test St","","Eland","NS","58601","USA","","","","","Company 1","","123 Destination St","","Schefield","ND","58601","USA","","","Standard No Options","Label 001","2","1","5","","","","","0","0","0","0","0","0","0","0","0","0","0","05/02/2023","001333","0"
"1","001333","Test Company","","123 Test St","","Eland","NS","58601","USA","","","","","Company 1","","123 Destination St","","Schefield","ND","58601","USA","","","Standard No Options","Label 001","2","2","125","","","","","0","0","0","0","0","0","0","0","0","0","0","05/02/2023","001333","0"
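I read it by creating a new stream that has the header definition written at the top, followed by a copy of the file stream, so that I end up reading a CSV with a header.
Here is a very simple excerpt of the code: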
using StreamReader filestream = new StreamReader(csvfilepath);
using var finalcsvstream = await PrependHeaderToStream(filestream);
using var csv = new CsvReader(finalcsvstream, true);
while (csv.ReadNextRecord())
{
    var fileversionnumber = csv["FileVersionNumber"];
    var field2 = csv["field2"];
    // etc
}
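FileVersionNumber is the first column, and it's the only column I'm having trouble with. In the file it's clearly the number 1, but when I read it I get a string with escaped double quotes: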
\"1\"
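After more than an hour of tinkering and searching Google, I've tried specifying the delimiter and the quote character and playing with the various trim options, all to no avail. I looked at the source code of a fork of the library, and it doesn't look like this should happen. For now I need this to work, so I've come up with a specific workaround that trims this column.
Any idea what's going wrong? Should I build a complete working toy example to see whether the problem persists?
I should also mention that I tried the header both with and without double quotes to see whether that made any difference.
Edit: The source of my CSV file is a base64-encoded string. You'd think that would be safe from any crazy character shenanigans. When I open the file in Notepad it looks fine, but if I write the base64 string to disk and run the following: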
base64 -d < /tmp/b64.txt | hexdump -c
I see the following:
0000000 357 273 277 " 1 " , " 0 0 1 3 3 3 " ,
0000010 " T e s t C o m p a n y " , "
Any suggestions on how to trim this from the source before opening it in LumenWorks?
1 Answer
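The hex dump shows that your data contains a UTF-8 byte order mark (BOM). (hexdump -c shows the constituent bytes in octal: 357 273 277 is EF BB BF in hex.) A byte order mark belongs to the start of the stream, not to the first line of data. The header line can't go in front of the BOM; if you try, that special byte sequence no longer meets the definition of a byte order mark and instead shows up as part of the content, which breaks the test for whether the field starts with a quote.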
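To make this concrete, here is a minimal C# sketch that works on in-memory bytes only (no LumenWorks involved). It assumes, hypothetically, that the header is prepended at the byte level in front of the file's raw bytes; the class name BomDemo, its methods and the sample row are illustrative and not taken from the question's code. It converts the octal triplet from the hexdump and shows what StreamReader produces when the BOM ends up after the header instead of at offset 0.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Parses octal byte values such as the "357 273 277" printed by hexdump -c.
    public static byte[] FromOctal(params string[] octal) =>
        octal.Select(o => Convert.ToByte(o, 8)).ToArray();

    // Hypothetical byte-level prepend: header bytes first, then the file's
    // raw bytes (BOM included) copied after them, decoded with StreamReader.
    public static string ReadWithHeaderInFront(string header, byte[] fileBytes)
    {
        byte[] combined = Encoding.UTF8.GetBytes(header + "\n").Concat(fileBytes).ToArray();
        // StreamReader only recognises a BOM at the very start of the stream.
        // Here the stream starts with the header, so EF BB BF is decoded as
        // the ordinary character U+FEFF at the start of the first data line.
        using var reader = new StreamReader(new MemoryStream(combined), Encoding.UTF8, true);
        return reader.ReadToEnd();
    }

    // For comparison: the same file bytes read on their own. The BOM is at
    // offset 0, so StreamReader detects it and strips it.
    public static string ReadAlone(byte[] fileBytes)
    {
        using var reader = new StreamReader(new MemoryStream(fileBytes), Encoding.UTF8, true);
        return reader.ReadToEnd();
    }
}
```

For example, FromOctal("357", "273", "277") gives the bytes EF BB BF. If you feed those bytes followed by "1","001333" into ReadWithHeaderInFront, the second line of the result begins with U+FEFF instead of a double quote. That matches the symptom in the question: the first field no longer starts with a quote, so its quotes are kept as literal text.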
There are two options for inserting the header line:
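When doing this, you can rely on the fact that there are only a few possible BOMs: UTF-8, UTF-16 BE and UTF-16 LE. The latter two would both require an Encoding argument to StreamReader, so for the code shown you really only need to worry about the UTF-8 one: look for that exact set of three bytes and trim it.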
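A minimal sketch of the trimming approach, assuming the whole file fits in memory as a byte[] (for example, the result of Convert.FromBase64String on the base64 source mentioned in the question). HeaderPrepender, DetectBom and PrependHeader are hypothetical names. The sketch recognises the three BOMs listed above, strips a UTF-8 one before adding the header, and refuses UTF-16 input rather than guessing an Encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderPrepender
{
    // UTF-8 BOM: EF BB BF (shown as 357 273 277 by hexdump -c).
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Reports which of the three common BOMs, if any, the data starts with.
    public static string DetectBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == Utf8Bom[0] && data[1] == Utf8Bom[1] && data[2] == Utf8Bom[2])
            return "UTF-8";
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return "UTF-16 BE";
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return "UTF-16 LE";
        return "none";
    }

    // Builds "header + newline + file text" with any UTF-8 BOM trimmed first,
    // so the BOM bytes can never land in front of the first data field.
    public static TextReader PrependHeader(byte[] fileBytes, string header)
    {
        string bom = DetectBom(fileBytes);
        if (bom.StartsWith("UTF-16", StringComparison.Ordinal))
            throw new NotSupportedException($"{bom} input needs an explicit Encoding; not handled here.");

        int skip = bom == "UTF-8" ? Utf8Bom.Length : 0;
        // Decode only the bytes that follow the (optional) BOM.
        string body = Encoding.UTF8.GetString(fileBytes, skip, fileBytes.Length - skip);
        return new StringReader(header + "\n" + body);
    }
}
```

The returned TextReader can then be passed to the reader as in the question, e.g. new CsvReader(HeaderPrepender.PrependHeader(bytes, header), true), since LumenWorks' CsvReader accepts a TextReader plus a hasHeaders flag.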