我正在寻找一种方法在bash只为linux和posix环境(没有gawk,没有python,没有库或语言依赖性,除了linux/posix的默认cli工具)方法,用于阅读多行csv文件到变量中,一次一行进行处理。CSV值在双引号内有逗号,这会破坏现有的代码:
while IFS=, read -r field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13 field14 field15 field18 field17 field18 field19 field29
do
单行CSV输入示例:"AC XA, S.A.","City of Commerce","00","0A348E541E6F5C258A12A5674AEF25F28BA7DCFAECEECC4EE63B71B361606AC3","Included","Included","Included","Not Included","09/30/2003","09/30/2037","","SZ=City of Commerce; FE=http://website.org; O=AC XA SA CIF A39201827; C=QR","--BINARY BLOB--maXJtYSBTQSBDSUYgQTgyNzQzMjg3MSMwIQYDVQQLExpodHRwOi8vd3d3LmNoYW1iZXJzaWduLm9yZzEiMCAGA1UEAxMZQ2hhbWJlcnMgb2YgQ29tbWVyY2UgUm9vdDAeFw0wMzA5MzAUIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MB4XDTE2MDQxNDA3NTAwNloXDTQwMDQwODA3NTAwNlowggEIMQswCQYDVQQGEwJFUzEPMA0GA1UECAwGTUFEUklEMQ8wDQYDVQQHDAZNQURSSUQxOjA4BgNVBAsMMXNlZSBjdXJyZW50IGFkZHJlc3MgYXQgd3d3LmNhbWVyZmlybWEuY29tL2FkZHJlc3MxJzAlBgNVBAsMHkdMT0JBTCBDSEFNQkVSU0lHTiBST09UIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MIICIjANBgkqhkiG9w0BAQEF--END--","SZ1qSRW","Email","AFX Client;Email","AFX Method;AFX Method;Entry Point;AFX Email;Time Stamping","No Entry","",""
所需的输出是将每行的以下值字符串读入变量:
"AC XA, S.A."
"City of Commerce"
"00"
"0A348E541E6F5C258A12A5674AEF25F28BA7DCFAECEECC4EE63B71B361606AC3"
"Included"
"Included"
"Included"
"Not Included"
"09/30/2003"
"09/30/2037"
""
"SZ=City of Commerce; FE=http://website.org; O=AC XA SA CIF A39201827; C=QR"
"--BINARY BLOB--maXJtYSBTQSBDSUYgQTgyNzQzMjg3MSMwIQYDVQQLExpodHRwOi8vd3d3LmNoYW1iZXJzaWduLm9yZzEiMCAGA1UEAxMZQ2hhbWJlcnMgb2YgQ29tbWVyY2UgUm9vdDAeFw0wMzA5MzAUIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MB4XDTE2MDQxNDA3NTAwNloXDTQwMDQwODA3NTAwNlowggEIMQswCQYDVQQGEwJFUzEPMA0GA1UECAwGTUFEUklEMQ8wDQYDVQQHDAZNQURSSUQxOjA4BgNVBAsMMXNlZSBjdXJyZW50IGFkZHJlc3MgYXQgd3d3LmNhbWVyZmlybWEuY29tL2FkZHJlc3MxJzAlBgNVBAsMHkdMT0JBTCBDSEFNQkVSU0lHTiBST09UIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MIICIjANBgkqhkiG9w0BAQEF--END--"
"SZ1qSRW"
"Email"
"AFX Client;Email"
"AFX Method;AFX Method;Entry Point;AFX Email;Time Stamping"
"No Entry"
""
""
目前,IFS方法由于第一个值“AC XA,S.A.”中的逗号而中断。我已经尝试了以下awk并取得了一些成功,但不清楚是否可以将每个项目读入变量并指定行号:awk -F"," '{for(i=1;i<=NF;i++){if(i%NF==0){ORS="\n"} {if($i ~ /^\"/ || $i ~ /\"$/) {a=a OFS $i;j++;{if(j%2==0){sub(/^\,/,X,a); print a;j=0;a=""}}} else {print $i}}} {ORS=","}}' ORS="," OFS=, file.csv
2条答案
按热度按时间pepwfjgg1#
不要将字段读入名为
field1
,field2
等的20个不同标量变量中,而是将它们读入一个数组中。使用bash和awk可以使用NUL
作为ORS,例如GNU awk,但不仅仅是GNU awk,所以无论你已经使用的awk都可以工作:如果你的awk不能使用
NUL
s,那么使用任何awk:只要你的所有字段都用引号括起来,并且没有一个字段包含字符串
","
或换行符,它就可以工作。pprl5pva2#
最简单的方法是使用csv解析器。
一个常见的解析器是Ruby:
以你为例: