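TxtFileReader reads a CSV file with a comma as the field delimiter, but some field values also contain commas. After the data is saved to the database, those commas are gone.
CSV file:
202101,"cc,dd.",,,
In the database it becomes:
202101cc dd.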
1 answer
c9qzyr3d1#
You can try Addax, my fork.
The CSV file contains:
cat /tmp/out/test.csv
1,"aa,bb",100
2,"cc,dd",200
The corresponding MySQL CREATE TABLE statement is:
create table csv2tbl(id int, name varchar(20), cnt int);
The job run looks like this:
bin/addax.sh job/txtfile2stream.json

  ___      _     _
 / _ \    | |   | |
/ /_\ \ __| | __| | __ ___  __
|  _  |/ _` |/ _` |/ _` \ \/ /
| | | | (_| | (_| | (_| |>  <
\_| |_/\__,_|\__,_|\__,_/_/\_\

:: Addax version :: (v4.0.6-SNAPSHOT)

2021-10-29 19:45:47.878 [ main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2021-10-29 19:45:47.901 [ main] INFO Engine -
{
  "content":[
    {
      "reader":{
        "parameter":{
          "path":[
            "/tmp/out"
          ],
          "column":[
            { "index":0, "type":"long" },
            { "index":1, "type":"string" },
            { "index":2, "type":"long" }
          ],
          "skipHeader":false,
          "encoding":"UTF-8",
          "fieldDelimiter":","
        },
        "name":"txtfilereader"
      },
      "writer":{
        "parameter":{
          "password":"*****",
          "column":[
            "*"
          ],
          "connection":[
            {
              "jdbcUrl":"jdbc:mysql://127.0.0.1:3306/test",
              "table":[
                "csv2tbl"
              ]
            }
          ],
          "username":"root",
          "preSql":[
            "truncate table @table"
          ]
        },
        "name":"mysqlwriter"
      }
    }
  ],
  "setting":{
    "speed":{
      "bytes":-1,
      "channel":1
    }
  }
}
2021-10-29 19:45:47.919 [ main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2021-10-29 19:45:47.919 [ main] INFO JobContainer - Addax jobContainer starts job.
2021-10-29 19:45:47.921 [ main] INFO JobContainer - Set jobId = 0
2021-10-29 19:45:49.167 [ job-0] INFO OriginalConfPretreatmentUtil - table:[csv2tbl] all columns:[id,name,cnt].
2021-10-29 19:45:49.168 [ job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2021-10-29 19:45:49.170 [ job-0] INFO OriginalConfPretreatmentUtil - Write data [INSERT INTO %s ( id,name,cnt) VALUES ( ?,?,? )], which jdbcUrl [jdbc:mysql://127.0.0.1:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2021-10-29 19:45:49.171 [ job-0] INFO JobContainer - Addax Reader.Job [txtfilereader] do prepare work .
2021-10-29 19:45:49.173 [ job-0] INFO TxtFileReader$Job - add file [/tmp/out/test.csv] as a candidate to be read.
2021-10-29 19:45:49.173 [ job-0] INFO TxtFileReader$Job - The number of files to read is: [1]
2021-10-29 19:45:49.174 [ job-0] INFO JobContainer - Addax Writer.Job [mysqlwriter] do prepare work .
2021-10-29 19:45:49.229 [ job-0] INFO CommonRdbmsWriter$Job - Begin to execute preSqls:[truncate table csv2tbl]. context info:jdbc:mysql://127.0.0.1:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false.
2021-10-29 19:45:49.246 [ job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2021-10-29 19:45:49.246 [ job-0] INFO JobContainer - Addax Reader.Job [txtfilereader] splits to [1] tasks.
2021-10-29 19:45:49.247 [ job-0] INFO JobContainer - Addax Writer.Job [mysqlwriter] splits to [1] tasks.
2021-10-29 19:45:49.262 [ job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2021-10-29 19:45:49.273 [ taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2021-10-29 19:45:49.283 [ taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2021-10-29 19:45:49.284 [ taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2021-10-29 19:45:49.291 [0-0-0-reader] INFO TxtFileReader$Task - reading file : [/tmp/out/test.csv]
2021-10-29 19:45:49.320 [0-0-0-reader] INFO StorageReaderUtil - The configure item [{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}] is illegal, use default CsvReader [null]
2021-10-29 19:45:52.289 [ job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2021-10-29 19:45:52.290 [ job-0] INFO JobContainer - Addax Writer.Job [mysqlwriter] do post work.
2021-10-29 19:45:52.290 [ job-0] INFO JobContainer - Addax Reader.Job [txtfilereader] do post work.
2021-10-29 19:45:52.292 [ job-0] INFO JobContainer - PerfTrace not enable!
2021-10-29 19:45:52.293 [ job-0] INFO StandAloneJobContainerCommunicator - Total 2 records, 18 bytes | Speed 6B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-10-29 19:45:52.294 [ job-0] INFO JobContainer -
任务启动时刻 : 2021-10-29 19:45:47
任务结束时刻 : 2021-10-29 19:45:52
任务总计耗时 : 4s
任务平均流量 : 6B/s
记录写入速度 : 0rec/s
读出记录总数 : 2
读写失败总数 : 0
The resulting table contents:
mysql> select * from csv2tbl;
+------+-------+------+
| id   | name  | cnt  |
+------+-------+------+
|    1 | aa,bb |  100 |
|    2 | cc,dd |  200 |
+------+-------+------+
2 rows in set (0.00 sec)
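To illustrate why the quoted fields survive in the run above, here is a minimal Python sketch of quote-aware CSV parsing. It is not Addax's implementation; it only uses Python's standard `csv` module to show the general behavior: with a text qualifier of `"`, a delimiter inside quotes is treated as data. The helper names `parse_quoted_csv` and `naive_split` are made up for this example.

```python
import csv
import io

# Same sample rows as /tmp/out/test.csv in the answer.
SAMPLE = '1,"aa,bb",100\n2,"cc,dd",200\n'


def parse_quoted_csv(text, delimiter=",", quotechar='"'):
    """Parse CSV text; delimiters inside quoted fields are kept as data."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    return [row for row in reader]


def naive_split(text, delimiter=","):
    """Split each line on every delimiter, ignoring quotes (for comparison)."""
    return [line.split(delimiter) for line in text.splitlines()]


if __name__ == "__main__":
    for row in parse_quoted_csv(SAMPLE):
        print(row)  # e.g. ['1', 'aa,bb', '100']
    # The naive split breaks the quoted field into two pieces.
    print(naive_split(SAMPLE)[0])  # ['1', '"aa', 'bb"', '100']
```

Applied to the row from the question, `parse_quoted_csv('202101,"cc,dd.",,,\n')` yields five fields, the second being `cc,dd.` with its comma intact.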