incubator-doris [Bug] getHashValue of string type is always zero in spark load

ercv8c1e posted on 2022-04-22 in Java

Search before asking

  • I searched the issues and found no similar issues.

Version

From 0.13 to the latest.

What's Wrong?

The bucket id is incorrect in Spark Load ETL when the distribution keys include a string-type column. As the title says, getHashValue always returns zero for string values.

What You Expected?

The bucket id should be correct in Spark Load ETL when the distribution keys include a string-type column.

How to Reproduce?

First, create a table whose distribution keys are all of string type.

CREATE TABLE `table_destribute_by_string` (
  `dt` int(11) NULL COMMENT "Date partition field, in datekey format (yyyymmdd)",
  `phone_hash` varchar(512) NULL COMMENT "Phone",
  `file_name` varchar(512) NULL COMMENT "File name",
  `stripe_index` varchar(10) NULL COMMENT "Stripe index",
  `row_index` varchar(10) NULL COMMENT "Row index"
) ENGINE=OLAP
DUPLICATE KEY(`dt`, `phone_hash`)
COMMENT "Regression test"
PARTITION BY RANGE(`dt`)
(PARTITION p20220412 VALUES [("19700101"), ("20220412")),
PARTITION p20220413 VALUES [("20220412"), ("20220413")))
DISTRIBUTED BY HASH(`phone_hash`) BUCKETS 100
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "V2"
);

Then, start a Spark Load job for the created table.
Finally, you will find that only one task is running to write the HDFS file.
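
A minimal sketch of why a constant hash would produce this symptom. It is not the Doris implementation. It assumes the bucket id is a hash of the distribution key taken modulo the bucket count, and it uses `java.util.zip.CRC32` as a stand-in hash. The class `BucketSketch` and its methods are hypothetical names introduced only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.CRC32;

// Hypothetical illustration, not code from Doris.
class BucketSketch {
    // Assumed healthy behaviour: hash the UTF-8 bytes of the key, then take the modulo.
    static int bucketOf(String key, int numBuckets) {
        CRC32 crc = new CRC32();
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        crc.update(bytes, 0, bytes.length);
        // CRC32.getValue() is a non-negative long, so the modulo is non-negative too.
        return (int) (crc.getValue() % numBuckets);
    }

    // Models the reported symptom: the hash value of a string key is always zero.
    static int bucketOfWithZeroHash(String key, int numBuckets) {
        long hash = 0L; // constant, regardless of the key
        return (int) (hash % numBuckets);
    }

    // Collects the distinct bucket ids that a set of keys maps to.
    static Set<Integer> distinctBuckets(String[] keys, int numBuckets, boolean zeroHash) {
        Set<Integer> buckets = new TreeSet<>();
        for (String key : keys) {
            buckets.add(zeroHash ? bucketOfWithZeroHash(key, numBuckets)
                                 : bucketOf(key, numBuckets));
        }
        return buckets;
    }

    public static void main(String[] args) {
        String[] keys = new String[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "phone-" + i; // made-up sample keys
        }
        int numBuckets = 100; // matches BUCKETS 100 in the table above
        System.out.println("buckets used, real hash: "
                + distinctBuckets(keys, numBuckets, false).size());
        System.out.println("buckets used, zero hash: "
                + distinctBuckets(keys, numBuckets, true).size());
    }
}
```

Under these assumptions, a zero hash sends every row to bucket 0, which would be consistent with a single task writing all the data.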

Anything Else?

  • No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
