org.apache.hadoop.hbase.RegionTooBusyException

jjjwad0x · posted 2021-05-29 in Hadoop

I am trying to load 3 billion records (ORC files) from Hive into HBase using the Hive-HBase integration.

Hive CREATE TABLE DDL:

CREATE EXTERNAL TABLE cs.account_dim_hbase(
  `account_number` string,
  `encrypted_account_number` string,
  `affiliate_code` string,
  `alternate_party_name` string,
  `mod_account_number` string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account_dim:encrypted_account_number,account_dim:affiliate_code,account_dim:alternate_party_name,account_dim:mod_account_number")
TBLPROPERTIES ("hbase.table.name" = "default:account_dim");

Hive insert query into HBase: I am running 128 insert commands similar to the example below.

insert into table cs.account_dim_hbase select account_number, encrypted_account_number, affiliate_code, alternate_party_name, mod_account_number from cds.account_dim where mod_account_number=1;

When I try to run all 128 inserts at the same time, I get the error below.

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 438 actions: org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=2.0G, regionName=jhgjhsdgfjgsdjf, server=cldf0007.com

Please help me fix this, and let me know if I am doing something wrong. I am using HDP 3.

olqngx59 · Answer 1

Load the data from Hive using an MD5 hash on the rowkey field, and create the HBase table pre-split into regions. Each partition now loads in about 5 minutes (it used to take 20 minutes, but that is fixed now). The split definition is below, and a sketch of the corresponding Hive insert follows it.

create 'users', 'usercf', SPLITS =>
['10000000000000000000000000000000',
'20000000000000000000000000000000',
'30000000000000000000000000000000',
'40000000000000000000000000000000',
'50000000000000000000000000000000',
'60000000000000000000000000000000',
'70000000000000000000000000000000',
'80000000000000000000000000000000',
'90000000000000000000000000000000',
'a0000000000000000000000000000000',
'b0000000000000000000000000000000',
'c0000000000000000000000000000000',
'd0000000000000000000000000000000',
'e0000000000000000000000000000000',
'f0000000000000000000000000000000']
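
On the Hive side, a minimal sketch of what one of the 128 inserts could look like with the hashed rowkey, reusing the table names from the question (this is an assumption based on the answer, not code from the original post; Hive's built-in md5() UDF, available since Hive 1.3.0, returns a 32-character hex string, so keys distribute across the 16 split points above):

-- hypothetical sketch: hash the rowkey so writes spread over the pre-split regions
insert into table cs.account_dim_hbase
select md5(account_number),        -- 32-char hex rowkey, lands in one of the 0-f regions
       encrypted_account_number,
       affiliate_code,
       alternate_party_name,
       mod_account_number
from cds.account_dim
where mod_account_number = 1;

Note that hashing the rowkey gives up the natural ordering of account_number, so range scans on the original key are no longer possible and point lookups must recompute the MD5 hash first.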
