I am running a Spark application in local mode:
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql('show databases').show()
$ spark-submit --master local app.py
20/08/28 19:28:52 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:28:52 INFO HiveMetaStoreClient: HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
20/08/28 19:28:52 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/08/28 19:28:52 INFO HiveMetaStoreClient: Connected to metastore.
20/08/28 19:28:52 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=... (auth:KERBEROS) retries=1 delay=5 lifetime=0
However, if I try to use --proxy-user, the connection to the metastore fails:
$ spark-submit --master local --proxy-user otheruser app.py
20/08/28 19:32:17 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:32:17 INFO HiveMetaStoreClient: HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
20/08/28 19:32:17 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
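Interestingly, reading from and writing to HDFS (which is also kerberized) works without any problems, both with and without the proxy user. Also, spark-sql connects to the metastore successfully with --proxy-user: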
$ spark-sql --master local --proxy-user otheruser
20/08/28 19:35:26 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:35:26 INFO HiveMetaStoreClient: HMSC::open(): Found delegation token. Creating DIGEST-based thrift connection.
20/08/28 19:35:26 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/08/28 19:35:26 INFO HiveMetaStoreClient: Connected to metastore.
20/08/28 19:35:26 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=otheruser (auth:PROXY) via ... (auth:KERBEROS) retries=1 delay=5 lifetime=0
I see this problem on Spark 2.3.1, 2.4.5, and 3.0.0.
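For comparing the runs, the telling line in each log is the HMSC::open() message: it says KERBEROS in the two spark-submit runs and DIGEST (delegation token found) in the spark-sql run. Below is a minimal sketch for pulling that out of a captured driver log, assuming the log has been saved as text. The helper names `metastore_auth_mechanisms` and `sasl_failed` are hypothetical and only match the message formats shown above; they are not part of Spark or Hive.

```python
import re

# Matches the HMSC::open() lines shown in the logs above, for example
# "... HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection."
_CONNECTION_RE = re.compile(
    r"HMSC::open\(\): .*Creating (\w+)-based thrift connection"
)


def metastore_auth_mechanisms(log_text):
    """Return the connection mechanisms reported by HiveMetaStoreClient, in log order."""
    return _CONNECTION_RE.findall(log_text)


def sasl_failed(log_text):
    """Return True if the log contains the SASL negotiation failure seen with --proxy-user."""
    return "SASL negotiation failure" in log_text


if __name__ == "__main__":
    # Two lines copied from the failing spark-submit --proxy-user run above.
    sample = "\n".join([
        "20/08/28 19:32:17 INFO HiveMetaStoreClient: HMSC::open(): Could not find "
        "delegation token. Creating KERBEROS-based thrift connection.",
        "20/08/28 19:32:17 ERROR TSaslTransport: SASL negotiation failure",
    ])
    print(metastore_auth_mechanisms(sample), sasl_failed(sample))
```

Running this against the three logs makes the difference explicit: only the spark-sql run finds a delegation token, while the proxy-user spark-submit run falls back to a Kerberos connection and then fails SASL negotiation.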