DataX: syncing from SQL Server to Elasticsearch is slow

6ioyuze2  posted 4 months ago  in  Other

{
  "job": {
    "setting": {
      "speed": { "channel": 1 },
      "errorLimit": { "record": 0, "percentage": 0.02 }
    },
    "content": [
      {
        "reader": {
          "name": "sqlserverreader",
          "parameter": {
            "username": "xx",
            "password": "xxx",
            "fetchSize": 2000,
            "connection": [
              {
                "querySql": [
                  "select id,apply_date,apply_no,datepart(yy,apply_date) as apply_year,apply_country,classify,fmr,last_second_legal_status,last_simple_legal_status,oss_pic_url,publish_date,publish_no,datepart(yy,publish_date) as publish_year,sqr,sqr_city,sqr_province,summary,title,zlqr,org_name,loc_type,ipc_type,claim,instructions,background_technology,txt_content,implement_details,fmr_first,org_agent,sqr_address,ipc_main_type,priority_no,pct_international_apply_no,pct_international_publish_no,grant_date,priority_date,e_priority_date,pct_to_country_date,expire_date,legal_status_date from dt_zl_main;"
                ],
                "jdbcUrl": [ "jdbc:sqlserver://xxx:xxxx;DatabaseName=xxx;" ]
              }
            ]
          }
        },
        "writer": {
          "name": "elasticsearchwriter",
          "parameter": {
            "endpoint": "http://xxx:xxx",
            "index": "xxx",
            "accessKey": "xxx",
            "accessId": "xxx",
            "type": "_doc",
            "cleanup": true,
            "discovery": false,
            "dynamic": true,
            "batchSize": 1000,
            "ignoreWriteError": true,
            "ignoreParseError": true,
            "splitter": ";",
            "column": [
              { "name": "id", "type": "id" },
              { "name": "apply_date", "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
              { "name": "apply_no", "type": "keyword" },
              { "name": "apply_year", "type": "keyword" },
              { "name": "apply_country", "type": "keyword" },
              { "name": "classify", "type": "keyword" },
              { "name": "fmr", "type": "keyword", "array": true },
              { "name": "last_second_legal_status", "type": "keyword" },
              { "name": "last_simple_legal_status", "type": "keyword" },
              { "name": "oss_pic_url", "type": "keyword", "index": false },
              { "name": "publish_date", "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
              { "name": "publish_no", "type": "keyword" },
              { "name": "publish_year", "type": "keyword" },
              { "name": "sqr", "type": "text" },
              { "name": "sqr_city", "type": "keyword" },
              { "name": "sqr_province", "type": "keyword" },
              { "name": "summary", "type": "text" },
              { "name": "title", "type": "text" },
              { "name": "zlqr", "type": "text" },
              { "name": "org_name", "type": "text" },
              { "name": "loc_type", "type": "keyword" },
              { "name": "ipc_type", "type": "keyword", "array": true },
              { "name": "claim", "type": "text" },
              { "name": "instructions", "type": "text" },
              { "name": "background_technology", "type": "text" },
              { "name": "txt_content", "type": "text" },
              { "name": "implement_details", "type": "text" },
              { "name": "fmr_first", "type": "keyword" },
              { "name": "org_agent", "type": "keyword", "array": true },
              { "name": "sqr_address", "type": "text" },
              { "name": "ipc_main_type", "type": "keyword" },
              { "name": "priority_no", "type": "keyword" },
              { "name": "pct_international_apply_no", "type": "keyword" },
              { "name": "pct_international_publish_no", "type": "keyword" },
              { "name": "grant_date", "type": "date" },
              { "name": "priority_date", "type": "date", "array": true },
              { "name": "e_priority_date", "type": "date" },
              { "name": "pct_to_country_date", "type": "date" },
              { "name": "expire_date", "type": "date" },
              { "name": "legal_status_date", "type": "date" }
            ]
          }
        }
      }
    ]
  }
}

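I'm using querySql with a single channel to sync into an ES index with 12 shards. The job has to sync 50 million records (5 TB), and it keeps getting slower: about 400 records/s at the start, dropping to about 200 records/s later. I don't know what is causing this. Is there any way to optimize it?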

xdnvmnnf  #1

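I also tried splitPk, pointing it at an auto-increment column (not the primary key) with channel set to 12, but it didn't seem much faster either. I wonder whether that is because the rows are text data with fairly large content.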
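For reference, a rough sketch of a splitPk reader block, built in Python so it can be dumped to JSON. It assumes table mode (`table` plus `column`) rather than `querySql`: the DataX reader documentation, as best recalled here, says that configuring `querySql` makes the reader ignore `table`, `column` and `where`, and that splitPk only takes effect in table mode, so verify this against your DataX version. The helper name `table_mode_reader`, the shortened column list and the use of `id` as the split column are placeholders, not taken from the original job.

```python
import json

def table_mode_reader(table, columns, split_pk, jdbc_url):
    """Build a sqlserverreader block in table mode with splitPk set."""
    return {
        "name": "sqlserverreader",
        "parameter": {
            "username": "xx",
            "password": "xxx",
            "fetchSize": 2000,
            "column": columns,
            "splitPk": split_pk,
            # Table mode: no querySql key here (assumed requirement for splitPk).
            "connection": [{"table": [table], "jdbcUrl": [jdbc_url]}],
        },
    }

reader = table_mode_reader(
    table="dt_zl_main",
    columns=["id", "apply_date", "apply_no", "title"],  # shortened for the example
    split_pk="id",  # placeholder split column
    jdbc_url="jdbc:sqlserver://xxx:xxxx;DatabaseName=xxx;",
)
print(json.dumps(reader, indent=2))
```

If splitPk was combined with `querySql` in the original attempt, that alone could explain why 12 channels made little difference.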

gtlvzcf8  #2

After 50 minutes of syncing, about 50 GB, or 600,000 records, have been transferred.
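These figures can be turned into rough rates. A back-of-the-envelope sketch in Python, assuming decimal units (1 GB = 10^9 bytes; the posts don't say which convention is meant) and that the current rate stays constant:

```python
# Figures quoted in the thread; the byte convention is an assumption.
elapsed_s = 50 * 60            # 50 minutes
synced_bytes = 50 * 10**9      # 50 GB, assumed decimal gigabytes
synced_records = 600_000
total_records = 50_000_000

records_per_s = synced_records / elapsed_s
bytes_per_record = synced_bytes / synced_records
mb_per_s = synced_bytes / elapsed_s / 10**6
remaining_hours = (total_records - synced_records) / records_per_s / 3600

print(f"{records_per_s:.0f} records/s")
print(f"{bytes_per_record / 1000:.0f} KB per record")
print(f"{mb_per_s:.1f} MB/s")
print(f"{remaining_hours:.0f} hours left at this rate")
```

On these assumptions the average record is on the order of 80 KB, so the large text fields are a plausible contributor to the low record rate, and the remaining data would take roughly three days at the current speed.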

ttvkxqim  #3

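That is about as fast as a single channel gets. The Channel class in the core module enforces a limit; the default for a single channel is 3Mb/s.

There are two speed limits: a byte-rate limit (Speed) and a record-count limit (Record).

Both of them can be changed, though.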
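A sketch of where these two limits are usually configured, written as Python that emits JSON. The key names `core.transport.channel.speed.byte`, `core.transport.channel.speed.record` and `job.setting.speed.channel` follow DataX's conf/core.json and job files as best recalled here; the helper `speed_overrides` is hypothetical, the values are arbitrary examples, and defaults differ between DataX versions, so check your own installation.

```python
import json

def speed_overrides(channels, channel_bytes_per_s, channel_records_per_s):
    """Return core/job fragments that change the per-channel limits.

    Assumption: a value of -1 disables the corresponding limit.
    """
    return {
        "core": {
            "transport": {
                "channel": {
                    "speed": {
                        "byte": channel_bytes_per_s,      # byte-rate limit
                        "record": channel_records_per_s,  # record-rate limit
                    }
                }
            }
        },
        "job": {"setting": {"speed": {"channel": channels}}},
    }

overrides = speed_overrides(
    channels=12, channel_bytes_per_s=-1, channel_records_per_s=-1
)
print(json.dumps(overrides, indent=2))
```

Raising the channel count only helps if the reader can actually be split into that many tasks.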

oxcyiej7  #4

OK, I'll take a look.

nnt7mjpx  #5


Is it possible to use WHERE clauses to run several queries concurrently?
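One way this might be sketched: split the id range into contiguous WHERE ranges and list one statement per range under `querySql`. The understanding here is that DataX turns each `querySql` entry into its own task, so with `channel` greater than 1 the ranges can run in parallel, but treat that as an assumption to verify. The helper `range_queries`, the id bounds and the assumption that `id` is numeric are all hypothetical, and the column list is abbreviated.

```python
def range_queries(base_select, key, lo, hi, parts):
    """Split the half-open range [lo, hi) into `parts` contiguous WHERE ranges."""
    step = -(-(hi - lo) // parts)  # ceiling division
    queries = []
    start = lo
    while start < hi:
        end = min(start + step, hi)
        queries.append(f"{base_select} where {key} >= {start} and {key} < {end}")
        start = end
    return queries

queries = range_queries(
    "select id, apply_date, title from dt_zl_main",  # abbreviated column list
    key="id", lo=1, hi=50_000_001, parts=12,
)
for q in queries:
    print(q)
```

The resulting list would replace the single statement in `querySql`. Ranges of equal width only balance the load if the ids are spread fairly evenly.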

cigdeys3  #6

Does "type": "id" mean that the column is treated as the primary key?
