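DataX TxtFileReader reads a comma-delimited CSV file, but some field values also contain commas, and those commas are lost when the data is saved to the database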

Posted by 2hh7jdfx on 2021-11-29 in Java


CSV file:
202101,"cc,dd.",,,

In the database this became:

202101
cc dd.
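To see why the quoted comma matters, here is a minimal Python sketch (not DataX code) that compares a plain split on every comma with a quote-aware CSV parser on the sample line. It only illustrates the CSV quoting rule; it makes no claim about what TxtFileReader does internally.

```python
import csv
import io

line = '202101,"cc,dd.",,,'

# Naive approach: split on every comma and ignore the double quotes,
# so the quoted value is torn into two pieces.
naive = line.split(",")

# Quote-aware approach: the csv module treats "..." as a single field,
# so the comma inside the quotes stays part of the value.
quote_aware = next(csv.reader(io.StringIO(line)))

print(naive)
print(quote_aware)
```

The naive split yields six pieces with the quotes still attached, while the quote-aware parse yields five fields with `cc,dd.` intact.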

Answer #1 by c9qzyr3d

You could try Addax, my fork of DataX.

The CSV file contains:

cat /tmp/out/test.csv
1,"aa,bb",100
2,"cc,dd",200
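The file follows the usual CSV convention (RFC 4180): a field that contains the delimiter is wrapped in double quotes. As a sketch, Python's standard `csv` module writes exactly these two lines for the same values; the in-memory buffer here merely stands in for `/tmp/out/test.csv`.

```python
import csv
import io

rows = [(1, "aa,bb", 100), (2, "cc,dd", 200)]

buf = io.StringIO()
# The default quoting mode (QUOTE_MINIMAL) wraps a field in double quotes
# only when it contains the delimiter, the quote character, or a line break.
writer = csv.writer(buf, lineterminator="\n")
writer.writerows(rows)

print(buf.getvalue(), end="")
```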

The corresponding MySQL table definition:

create table csv2tbl(id int, name varchar(20), cnt int);

Running the job:

bin/addax.sh job/txtfile2stream.json

  ___      _     _
 / _ \    | |   | |
/ /_\ \ __| | __| | __ ___  __
|  _  |/ _` |/ _` |/ _` \ \/ /
| | | | (_| | (_| | (_| |>  <
\_| |_/\__,_|\__,_|\__,_/_/\_\

:: Addax version ::    (v4.0.6-SNAPSHOT)

2021-10-29 19:45:47.878 [        main] INFO  VMInfo               - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2021-10-29 19:45:47.901 [        main] INFO  Engine               -
{
  "content":[
    {
      "reader":{
        "parameter":{
          "path":[
            "/tmp/out"
          ],
          "column":[
            {
              "index":0,
              "type":"long"
            },
            {
              "index":1,
              "type":"string"
            },
            {
              "index":2,
              "type":"long"
            }
          ],
          "skipHeader":false,
          "encoding":"UTF-8",
          "fieldDelimiter":","
        },
        "name":"txtfilereader"
      },
      "writer":{
        "parameter":{
          "password":"*****",
          "column":[
            "*"
          ],
          "connection":[
            {
              "jdbcUrl":"jdbc:mysql://127.0.0.1:3306/test",
              "table":[
                "csv2tbl"
              ]
            }
          ],
          "username":"root",
          "preSql":[
            "truncate table @table"
          ]
        },
        "name":"mysqlwriter"
      }
    }
  ],
  "setting":{
    "speed":{
      "bytes":-1,
      "channel":1
    }
  }
}

2021-10-29 19:45:47.919 [        main] INFO  PerfTrace            - PerfTrace traceId=job_-1, isEnable=false, priority=0
2021-10-29 19:45:47.919 [        main] INFO  JobContainer         - Addax jobContainer starts job.
2021-10-29 19:45:47.921 [        main] INFO  JobContainer         - Set jobId = 0
2021-10-29 19:45:49.167 [       job-0] INFO  OriginalConfPretreatmentUtil - table:[csv2tbl] all columns:[id,name,cnt].
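2021-10-29 19:45:49.168 [       job-0] WARN  OriginalConfPretreatmentUtil - The column configuration in your config file is risky. Because you configured the target table columns as *, changes to the number or types of the table's columns may affect the job's correctness or even make it fail. Please check and revise your configuration.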
2021-10-29 19:45:49.170 [       job-0] INFO  OriginalConfPretreatmentUtil - Write data [INSERT INTO %s ( id,name,cnt) VALUES ( ?,?,? )], which jdbcUrl [jdbc:mysql://127.0.0.1:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2021-10-29 19:45:49.171 [       job-0] INFO  JobContainer         - Addax Reader.Job [txtfilereader] do prepare work .
2021-10-29 19:45:49.173 [       job-0] INFO  TxtFileReader$Job    - add file [/tmp/out/test.csv] as a candidate to be read.
2021-10-29 19:45:49.173 [       job-0] INFO  TxtFileReader$Job    - The number of files to read is: [1]
2021-10-29 19:45:49.174 [       job-0] INFO  JobContainer         - Addax Writer.Job [mysqlwriter] do prepare work .
2021-10-29 19:45:49.229 [       job-0] INFO  CommonRdbmsWriter$Job - Begin to execute preSqls:[truncate table csv2tbl]. context info:jdbc:mysql://127.0.0.1:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false.
2021-10-29 19:45:49.246 [       job-0] INFO  JobContainer         - Job set Channel-Number to 1 channels.
2021-10-29 19:45:49.246 [       job-0] INFO  JobContainer         - Addax Reader.Job [txtfilereader] splits to [1] tasks.
2021-10-29 19:45:49.247 [       job-0] INFO  JobContainer         - Addax Writer.Job [mysqlwriter] splits to [1] tasks.
2021-10-29 19:45:49.262 [       job-0] INFO  JobContainer         - Scheduler starts [1] taskGroups.
2021-10-29 19:45:49.273 [ taskGroup-0] INFO  TaskGroupContainer   - taskGroupId=[0] start [1] channels for [1] tasks.
2021-10-29 19:45:49.283 [ taskGroup-0] INFO  Channel              - Channel set byte_speed_limit to -1, No bps activated.
2021-10-29 19:45:49.284 [ taskGroup-0] INFO  Channel              - Channel set record_speed_limit to -1, No tps activated.
2021-10-29 19:45:49.291 [0-0-0-reader] INFO  TxtFileReader$Task   - reading file : [/tmp/out/test.csv]
2021-10-29 19:45:49.320 [0-0-0-reader] INFO  StorageReaderUtil    - The configure item [{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}] is illegal, use default CsvReader [null]
2021-10-29 19:45:52.289 [       job-0] INFO  AbstractScheduler    - Scheduler accomplished all tasks.
2021-10-29 19:45:52.290 [       job-0] INFO  JobContainer         - Addax Writer.Job [mysqlwriter] do post work.
2021-10-29 19:45:52.290 [       job-0] INFO  JobContainer         - Addax Reader.Job [txtfilereader] do post work.
2021-10-29 19:45:52.292 [       job-0] INFO  JobContainer         - PerfTrace not enable!
2021-10-29 19:45:52.293 [       job-0] INFO  StandAloneJobContainerCommunicator - Total 2 records, 18 bytes | Speed 6B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-10-29 19:45:52.294 [       job-0] INFO  JobContainer         -
Job start time                  : 2021-10-29 19:45:47
Job end time                    : 2021-10-29 19:45:52
Total elapsed time              :                  4s
Average throughput              :                6B/s
Record write speed              :              0rec/s
Total records read              :                   2
Total read/write failures       :                   0
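The `StorageReaderUtil` line in the log prints a set of CsvReader settings that include delimiter `","`, textQualifier `"\""` and `useTextQualifier` `true`. The sketch below uses Python's `csv` module as a stand-in (it is not the CsvReader library, and `use_text_qualifier` is a hypothetical flag named after that setting) to show what the same file looks like with and without quote handling. Without it, each quoted name is split into two fields, which no longer lines up with a three-column reader configuration.

```python
import csv
import io

data = '1,"aa,bb",100\n2,"cc,dd",200\n'

def parse(text, use_text_qualifier):
    """Parse comma-delimited text, optionally honouring double quotes.

    use_text_qualifier is a hypothetical flag that loosely mirrors the
    CsvReader option of the same name; here it only switches Python's csv
    module between normal quote handling and QUOTE_NONE.
    """
    if use_text_qualifier:
        reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    else:
        # QUOTE_NONE: quote characters get no special treatment and stay in the data.
        reader = csv.reader(io.StringIO(text), delimiter=",", quoting=csv.QUOTE_NONE)
    # Drop empty records, similar in spirit to skipEmptyRecords=true.
    return [row for row in reader if row]

with_quotes = parse(data, True)
without_quotes = parse(data, False)

print(with_quotes)
print(without_quotes)
```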

The resulting table contents:

mysql> select * from csv2tbl;
+------+-------+------+
| id   | name  | cnt  |
+------+-------+------+
|    1 | aa,bb |  100 |
|    2 | cc,dd |  200 |
+------+-------+------+
2 rows in set (0.00 sec)
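To cross-check these rows against the file, the following Python sketch applies the reader's `column` list from the job log (index 0 as `long`, index 1 as `string`, index 2 as `long`) to a quote-aware parse of `test.csv`. The `CONVERTERS` mapping and the `to_records` helper are hypothetical illustrations, not Addax internals.

```python
import csv
import io

# Reader column list as echoed in the job log: field index plus Addax type.
COLUMNS = [
    {"index": 0, "type": "long"},
    {"index": 1, "type": "string"},
    {"index": 2, "type": "long"},
]

# Hypothetical Python converters for the two Addax types used here.
CONVERTERS = {"long": int, "string": str}

def to_records(text):
    """Parse quote-aware CSV, then pick and convert fields per the column list."""
    records = []
    for fields in csv.reader(io.StringIO(text)):
        records.append(
            tuple(CONVERTERS[col["type"]](fields[col["index"]]) for col in COLUMNS)
        )
    return records

rows = to_records('1,"aa,bb",100\n2,"cc,dd",200\n')
for row in rows:
    print(row)
```

Each tuple corresponds to one row of the `SELECT` output, with the comma preserved in `name`.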
