Apache Kylin: cube build fails at step 5 - KeyValue size too large

Posted by bgibtngc on 2021-06-10 in HBase

I have started using Apache Kylin (version 1.5.3). While building a cube, I get an error at step 5, "Save Cuboid Statistics". The log says:

java.lang.IllegalArgumentException: KeyValue size too large
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1521)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:147)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:134)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1038)
at org.apache.kylin.storage.hbase.HBaseResourceStore.putResourceImpl(HBaseResourceStore.java:242)
at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:208)
at org.apache.kylin.engine.mr.steps.SaveStatisticsStep.doWork(SaveStatisticsStep.java:113)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:127)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

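First, I tried building the same cube with fewer dimensions, and that worked. Building another cube from the dimensions I had left out also worked. But when I try to build a cube that contains all 13 dimensions, it fails. I also tried setting hbase.client.keyvalue.maxsize to 0 to disable the check. Still the same error.
Does anyone know what the problem is and how I can fix it?
By the way, I am running Kylin on the HDP 2.4 sandbox.
Thanks in advance for your help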
Søren

rhfm7lfc1#

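We hit the key limit as well at Splice Machine...
Also keep in mind that, per the KeyValue specification, the key length has to fit into a short. See KeyValue#getRowOffset()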
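
The two limits mentioned in this thread can be made concrete with a small model. This is only a sketch under assumptions, not the HBase source: it assumes the client-side check compares each cell's serialized size against hbase.client.keyvalue.maxsize and skips the comparison when that limit is 0 or negative (which is how the question describes disabling it), and that a row key may be at most 32767 bytes (the largest Java short). The function names `validate_put` and `row_key_fits` are hypothetical.

```python
SHORT_MAX = 32767  # largest value of a signed 16-bit Java short


def validate_put(cell_sizes, max_keyvalue_size):
    """Simplified model of a client-side KeyValue size check.

    cell_sizes: serialized sizes (in bytes) of the cells in one put.
    max_keyvalue_size: the configured limit; assumed to disable the
    check when it is 0 or negative.
    """
    if max_keyvalue_size > 0:
        for size in cell_sizes:
            if size > max_keyvalue_size:
                # Same message as in the stack trace of the question.
                raise ValueError("KeyValue size too large")


def row_key_fits(row_key: bytes) -> bool:
    """True if the row key length can be stored in a short."""
    return len(row_key) <= SHORT_MAX
```

Under this model a 2 MB cell passes when the limit is 0 and fails when the limit is 1048576. If the error persists after setting the HBase value to 0, one possible explanation is that the process doing the write is reading a different effective limit than the one that was edited.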

0yg35tkg2#

@尼钦克尼尔
I cannot find kylin.hbase.client.keyvalue.maxsize in kylin.properties. My kylin.properties looks like this:

> [root@sandbox conf]# cat kylin.properties

# 

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements.  See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License.  You may obtain a copy of the License at

# 

# http://www.apache.org/licenses/LICENSE-2.0

# 

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# 

# kylin server's mode

kylin.server.mode=all

# optional information for the owner of kylin platform, it can be your team's email

# currently it will be attached to each kylin's htable attribute

kylin.owner=whoami@kylin.apache.org

# List of web servers in use, this enables one web server instance to sync up with other servers.

kylin.rest.servers=localhost:7070

# The metadata store in hbase

kylin.metadata.url=kylin_metadata@hbase

# The storage for final cube file in hbase

kylin.storage.url=hbase

# Temp folder in hdfs, make sure user has the right access to the hdfs directory

kylin.hdfs.working.dir=/kylin

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020

# leave empty if hbase running on same cluster with hive and mapreduce

kylin.hbase.cluster.fs=

kylin.job.mapreduce.default.reduce.input.mb=500

# max job retry on error, default 0: no retry

kylin.job.retry=0

# If true, job engine will not assume that hadoop CLI reside on the same server as it self

# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password

# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine

# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands)

kylin.job.run.as.remote.cmd=false

# Only necessary when kylin.job.run.as.remote.cmd=true

kylin.job.remote.cli.hostname=

# Only necessary when kylin.job.run.as.remote.cmd=true

kylin.job.remote.cli.username=

# Only necessary when kylin.job.run.as.remote.cmd=true

kylin.job.remote.cli.password=

# Used by test cases to prepare synthetic data for sample cube

kylin.job.remote.cli.working.dir=/tmp/kylin

# Max count of concurrent jobs running

kylin.job.concurrent.max.limit=10

# Time interval to check hadoop job status

kylin.job.yarn.app.rest.check.interval.seconds=10

# Hive database name for putting the intermediate flat tables

kylin.job.hive.database.for.intermediatetable=default

# default compression codec for htable,snappy,lzo,gzip,lz4

kylin.hbase.default.compression.codec=snappy

# the percentage of the sampling, default 100%

kylin.job.cubing.inmem.sampling.percent=100

# The cut size for hbase region, in GB.

kylin.hbase.region.cut=5

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster

# set 0 to disable this optimization

kylin.hbase.hfile.size.gb=2

# Enable/disable ACL check for cube query

kylin.query.security.enabled=true

# whether get job status from resource manager with kerberos authentication

kylin.job.status.with.kerberos=false

## kylin security configurations

# spring security profile, options: testing, ldap, saml

# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login

kylin.security.profile=testing

# default roles and admin roles in LDAP, for ldap and saml

acl.defaultRole=ROLE_ANALYST,ROLE_MODELER
acl.adminRole=ROLE_ADMIN

# LDAP authentication configuration

ldap.server=ldap://ldap_server:389
ldap.username=
ldap.password=

# LDAP user account directory;

ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=

# LDAP service account directory

ldap.service.searchBase=
ldap.service.searchPattern=
ldap.service.groupSearchBase=

# SAML configurations for SSO

# SAML IDP metadata file location

saml.metadata.file=classpath:sso_metadata.xml
saml.metadata.entityBaseURL=https://hostname/kylin
saml.context.scheme=https
saml.context.serverName=hostname
saml.context.serverPort=443
saml.context.contextPath=/kylin

ganglia.group=
ganglia.port=8664

## Config for mail service

# If true, will send email notification;

mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=

########################### config info for web#######################

# help info ,format{name|displayName|link} ,optional

kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|

# guide user how to build streaming cube

kylin.web.streaming.guide=http://kylin.apache.org/

# hadoop url link ,optional

kylin.web.hadoop=

# job diagnostic url link ,optional

kylin.web.diagnostic=

# contact mail on web page ,optional

kylin.web.contact_mail=

########################### config info for front#######################

# env DEV|QA|PROD

deploy.env=QA

########################### deprecated configs#######################

kylin.sandbox=true
kylin.web.hive.limit=20

# The cut size for hbase region,

# in GB.

# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default

kylin.hbase.region.cut.small=5
kylin.hbase.region.cut.medium=10
kylin.hbase.region.cut.large=50

wydwbb8l3#

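Make sure that kylin.hbase.client.keyvalue.maxsize (which lives in the Kylin configuration file, conf/kylin.properties) and hbase.client.keyvalue.maxsize (which lives in the HBase configuration file) have the same value. The "KeyValue size too large" error usually occurs when the value of kylin.hbase.client.keyvalue.maxsize is greater than that of hbase.client.keyvalue.maxsize.
Please find a sample kylin.properties below.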


# kylin server's mode

kylin.server.mode=all

# optional information for the owner of kylin platform, it can be your team's email

# currently it will be attached to each kylin's htable attribute

kylin.owner=whoami@kylin.apache.org

# List of web servers in use, this enables one web server instance to sync up with other servers.

kylin.rest.servers=localhost:7070

# The metadata store in hbase

kylin.metadata.url=kylin_metadata@hbase

# The storage for final cube file in hbase

kylin.storage.url=hbase

# Temp folder in hdfs, make sure user has the right access to the hdfs directory

kylin.hdfs.working.dir=/kylin

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020

# leave empty if hbase running on same cluster with hive and mapreduce

kylin.hbase.cluster.fs=

kylin.job.mapreduce.default.reduce.input.mb=500

# max job retry on error, default 0: no retry

kylin.job.retry=0

# If true, job engine will not assume that hadoop CLI reside on the same server as it self

# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password

# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine

# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands)

kylin.job.run.as.remote.cmd=false

# Only necessary when kylin.job.run.as.remote.cmd=true

kylin.job.remote.cli.hostname=

# Only necessary when kylin.job.run.as.remote.cmd=true

kylin.job.remote.cli.username=

# Only necessary when kylin.job.run.as.remote.cmd=true

kylin.job.remote.cli.password=

# Used by test cases to prepare synthetic data for sample cube

kylin.job.remote.cli.working.dir=/tmp/kylin

# Max count of concurrent jobs running

kylin.job.concurrent.max.limit=10

# Time interval to check hadoop job status

kylin.job.yarn.app.rest.check.interval.seconds=10

# Hive database name for putting the intermediate flat tables

kylin.job.hive.database.for.intermediatetable=default

# default compression codec for htable,snappy,lzo,gzip,lz4

kylin.hbase.default.compression.codec=snappy

# the percentage of the sampling, default 100%

kylin.job.cubing.inmem.sampling.percent=100

# The cut size for hbase region, in GB.

kylin.hbase.region.cut=5

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster

# set 0 to disable this optimization

kylin.hbase.hfile.size.gb=2

# Enable/disable ACL check for cube query

kylin.query.security.enabled=true

# whether get job status from resource manager with kerberos authentication

kylin.job.status.with.kerberos=false

## kylin security configurations

# spring security profile, options: testing, ldap, saml

# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login

kylin.security.profile=testing

# default roles and admin roles in LDAP, for ldap and saml

acl.defaultRole=ROLE_ANALYST,ROLE_MODELER
acl.adminRole=ROLE_ADMIN

# LDAP authentication configuration

ldap.server=ldap://ldap_server:389
ldap.username=
ldap.password=

# LDAP user account directory;

ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=

# LDAP service account directory

ldap.service.searchBase=
ldap.service.searchPattern=
ldap.service.groupSearchBase=

# SAML configurations for SSO

# SAML IDP metadata file location

saml.metadata.file=classpath:sso_metadata.xml
saml.metadata.entityBaseURL=https://hostname/kylin
saml.context.scheme=https
saml.context.serverName=hostname
saml.context.serverPort=443
saml.context.contextPath=/kylin

ganglia.group=
ganglia.port=8664

## Config for mail service

# If true, will send email notification;

mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=

########################### config info for web#######################

# help info ,format{name|displayName|link} ,optional

kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|

# guide user how to build streaming cube

kylin.web.streaming.guide=http://kylin.apache.org/

# hadoop url link ,optional

kylin.web.hadoop=

# job diagnostic url link ,optional

kylin.web.diagnostic=

# contact mail on web page ,optional

kylin.web.contact_mail=

########################### config info for front#######################

# env DEV|QA|PROD

deploy.env=QA

########################### deprecated configs#######################

kylin.sandbox=true
kylin.web.hive.limit=20

# The cut size for hbase region,

# in GB.

# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default

kylin.hbase.region.cut.small=5
kylin.hbase.region.cut.medium=10
kylin.hbase.region.cut.large=50
kylin.hbase.client.keyvalue.maxsize=1048576

Inside it, the property is set: kylin.hbase.client.keyvalue.maxsize=1048576
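
The advice above (keep the two settings equal) can be checked with a short script. The following is a minimal sketch, assuming the Kylin side is a plain key=value properties file and the HBase side is a Hadoop-style XML file with `<property><name>...</name><value>...</value></property>` entries; the helper names `read_properties`, `read_hbase_site`, and `compare_maxsize` are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def read_hbase_site(xml_text):
    """Parse Hadoop-style XML configuration into a dict."""
    root = ET.fromstring(xml_text)
    result = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            result[name.strip()] = (prop.findtext("value") or "").strip()
    return result


def compare_maxsize(kylin_text, hbase_xml):
    """Return (kylin value, hbase value, whether both are set and equal)."""
    kylin_value = read_properties(kylin_text).get(
        "kylin.hbase.client.keyvalue.maxsize")
    hbase_value = read_hbase_site(hbase_xml).get(
        "hbase.client.keyvalue.maxsize")
    match = kylin_value is not None and kylin_value == hbase_value
    return kylin_value, hbase_value, match
```

To use it, read conf/kylin.properties and the HBase configuration file into strings and pass them to `compare_maxsize`. A `None` in the first position means the Kylin key is missing from the file, which is the situation described in answer #2.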
