Spark cluster-mode problem reading a Hive HBase table in a Kerberized environment

gywdnpxw · posted 2021-06-08 in HBase

Problem description

We cannot run our Spark job in YARN cluster or YARN client mode, although it works fine in local mode.
The problem appears when we try to read a Hive HBase table in a Kerberized cluster.

What we have tried so far

1. Passed all the HBase jars in the --jars argument of spark-submit:

   --jars /usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.5.3.16-1.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar

2. Passed hbase-site.xml, hive-site.xml and the keytab in the --files argument of spark-submit:

   --files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml,/home/pasusr/pasusr.keytab

3. Performed Kerberos authentication inside the application, passing the keytab explicitly in code:

   UserGroupInformation.setConfiguration(configuration)
   val ugi: UserGroupInformation =
     UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
   UserGroupInformation.setLoginUser(ugi)
   ConnectionFactory.createConnection(configuration)
   return ugi.doAs(new PrivilegedExceptionAction[Connection] {
     @throws[IOException]
     override def run(): Connection = {
       ConnectionFactory.createConnection(configuration)
     }
   })

4. Passed the keytab information in spark-submit.

5. Passed the HBase jars in spark.driver.extraClassPath and spark.executor.extraClassPath.
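For reference, the pieces above combine into a single spark-submit invocation roughly like the following. This is only a sketch: the Kerberos realm EXAMPLE.COM, the application jar my-app.jar and the main class com.example.Main are placeholders, and the jar list is abbreviated to the first entry.

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.Main \
  --principal pasusr@EXAMPLE.COM \
  --keytab /home/pasusr/pasusr.keytab \
  --files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml \
  --jars /usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.5.3.16-1.jar \
  my-app.jar
```

Note that when --principal and --keytab are given, Spark itself logs in from the keytab and renews tickets, which can make the explicit UserGroupInformation login in application code unnecessary.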

Error log

18/03/20 15:33:24 WARN TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection. This is usually due to directly creating an HTable, which is deprecated. Instead, you should create a Connection object and then request a Table instance from it. If you don't need the Table instance for your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
18/03/20 15:47:38 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 406, hadoopnode.server.name): java.lang.IllegalStateException: Error while configuring input job properties
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:444)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:342)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=50, exceptions:
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:679)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
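The "No valid Kerberos tgt" message can be sanity-checked on a cluster node before touching the classpath. The commands below are a sketch; the principal pasusr@EXAMPLE.COM is an assumed name, and klist -kt only reads the keytab locally without contacting the KDC.

```shell
# List the principals stored in the keytab (read-only, no KDC contact)
klist -kt /home/pasusr/pasusr.keytab

# Try to obtain a TGT with that keytab, then show the ticket cache
kinit -kt /home/pasusr/pasusr.keytab pasusr@EXAMPLE.COM
klist
```

If kinit fails here, the keytab or principal is the problem rather than the Spark configuration.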

Answer 1 (by 5lwkijsr)

I was able to resolve this by adding the following configuration in spark-env.sh:
   export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar
and by removing spark.driver.extraClassPath and spark.executor.extraClassPath, through which I had been passing the above jars in the spark-submit command.
