Search before asking
- I had searched in the issues and found no similar issues.
Version
From 0.13 to the latest.
What's Wrong?
The bucket id is incorrect when the distribution keys contain string-type columns
in Spark Load ETL.
What You Expected?
The bucket id should be computed correctly when the distribution keys contain string-type columns
in Spark Load ETL.
How to Reproduce?
First, create a table whose distribution keys are all of string type:
CREATE TABLE `table_destribute_by_string` (
`dt` int(11) NULL COMMENT "date partition column, formatted as a datekey (yyyymmdd)",
`phone_hash` varchar(512) NULL COMMENT "phone number",
`file_name` varchar(512) NULL COMMENT "file name",
`stripe_index` varchar(10) NULL COMMENT "stripe index",
`row_index` varchar(10) NULL COMMENT "row index"
) ENGINE=OLAP
DUPLICATE KEY(`dt`, `phone_hash`)
COMMENT "regression test"
PARTITION BY RANGE(`dt`)
(PARTITION p20220412 VALUES [("19700101"), ("20220412")),
PARTITION p20220413 VALUES [("20220412"), ("20220413")))
DISTRIBUTED BY HASH(`phone_hash`) BUCKETS 100
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "V2"
);
Then, run a Spark Load job against the created table.
Finally, you will find that there is only one task running to write the HDFS files.
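The symptom above is consistent with every row being assigned the same bucket id: with hash distribution, one bucket maps to one write task, so a degenerate hash collapses all 100 buckets into a single task. The sketch below illustrates the expected behavior, assuming bucket assignment works roughly as `hash(key) % num_buckets`; the function name, the CRC32 choice, and the keys are illustrative, not the actual Doris/Spark Load implementation.

```python
# Hypothetical sketch of bucket assignment for a hash-distributed table.
# Assumption: bucket_id = hash(key bytes) % num_buckets. A buggy ETL that
# hashes something other than the string's contents (e.g. a constant or an
# object reference) would return the same bucket for every row, so only one
# task ends up writing output files.
import zlib

NUM_BUCKETS = 100  # matches BUCKETS 100 in the DDL above

def bucket_id(key: str) -> int:
    # Hash the string's bytes so distinct keys spread across buckets.
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

keys = ["13800000001", "13800000002", "13800000003"]
print([bucket_id(k) for k in keys])
```

With a correct content-based hash, distinct keys land in multiple buckets and Spark Load can parallelize the write; the reported bug is the opposite: all string keys resolve to one bucket.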
Anything Else?
- No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct