I am running a Spark application in local mode. Its only task is to list the databases:
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql('show databases').show()
If I run the job with my current Kerberos ticket, everything works as expected:
$ spark-submit --master local app.py
(...)
20/08/28 19:28:52 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:28:52 INFO HiveMetaStoreClient: HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
20/08/28 19:28:52 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/08/28 19:28:52 INFO HiveMetaStoreClient: Connected to metastore.
20/08/28 19:28:52 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=... (auth:KERBEROS) retries=1 delay=5 lifetime=0
However, if I try to use --proxy-user, it fails:
$ spark-submit --master local --proxy-user otheruser app.py
(...)
20/08/28 19:32:17 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:32:17 INFO HiveMetaStoreClient: HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
20/08/28 19:32:17 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
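Interestingly, reading from and writing to HDFS (which is also kerberized) works without any problem, both with and without a proxy user. Also, spark-sql connects fine with the proxy user: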
$ spark-sql --master local --proxy-user otheruser
(...)
20/08/28 19:35:26 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:35:26 INFO HiveMetaStoreClient: HMSC::open(): Found delegation token. Creating DIGEST-based thrift connection.
20/08/28 19:35:26 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/08/28 19:35:26 INFO HiveMetaStoreClient: Connected to metastore.
20/08/28 19:35:26 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=otheruser (auth:PROXY) via ... (auth:KERBEROS) retries=1 delay=5 lifetime=0
That is probably because of a code snippet in spark-sql that obtains credentials before Spark starts (see SPARK-23639).
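To compare the three runs side by side, here is a minimal sketch of a helper that classifies a captured driver log by the metastore connection type. It only matches the HiveMetaStoreClient and TSaslTransport messages quoted above. The function name classify_metastore_auth is hypothetical, and the helper says nothing about why a token is missing; it is just a convenience for reading the logs.

```python
import re

# Patterns taken from the log lines quoted in this question.
_MODES = {
    "DIGEST": re.compile(
        r"Found delegation token\. Creating DIGEST-based thrift connection"
    ),
    "KERBEROS": re.compile(
        r"Could not find delegation token\. Creating KERBEROS-based thrift connection"
    ),
}
_SASL_FAILURE = re.compile(r"SASL negotiation failure")


def classify_metastore_auth(log_text):
    """Return (mode, failed) for a captured driver log (hypothetical helper).

    mode   -- "DIGEST", "KERBEROS", or None if neither message is present
    failed -- True if a SASL negotiation failure was logged
    """
    mode = None
    for name, pattern in _MODES.items():
        if pattern.search(log_text):
            mode = name
            break
    failed = bool(_SASL_FAILURE.search(log_text))
    return mode, failed
```

Applied to the runs above, the failing --proxy-user job would come out as a KERBEROS connection attempt with a SASL failure, while the spark-sql run would come out as DIGEST with no failure.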
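Do you know of any Spark option that would make the proxy user work in local mode? Or is this a problem with my environment, and the example above should work with local mode and a proxy user? I would appreciate any help!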
I see this problem on Spark 2.3.1, 2.4.5, and 3.0.0.