Problem description
I am running a Spark application in local mode whose only job is to list databases:
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql('show databases').show()
If I run the job with my current Kerberos ticket, everything works as expected:
$ spark-submit --master local app.py
(...)
20/08/28 19:28:52 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:28:52 INFO HiveMetaStoreClient: HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
20/08/28 19:28:52 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/08/28 19:28:52 INFO HiveMetaStoreClient: Connected to metastore.
20/08/28 19:28:52 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=... (auth:KERBEROS) retries=1 delay=5 lifetime=0
However, it fails if I try to use --proxy-user:
$ spark-submit --master local --proxy-user otheruser app.py
(...)
20/08/28 19:32:17 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:32:17 INFO HiveMetaStoreClient: HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
20/08/28 19:32:17 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Interestingly, reading from and writing to HDFS (which is also Kerberized) works without any problem, with or without the proxy user. Also, spark-sql connects fine with a proxy user:
$ spark-sql --master local --proxy-user otheruser
(...)
20/08/28 19:35:26 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://...:9083
20/08/28 19:35:26 INFO HiveMetaStoreClient: HMSC::open(): Found delegation token. Creating DIGEST-based thrift connection.
20/08/28 19:35:26 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/08/28 19:35:26 INFO HiveMetaStoreClient: Connected to metastore.
20/08/28 19:35:26 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=otheruser (auth:PROXY) via ... (auth:KERBEROS) retries=1 delay=5 lifetime=0
probably thanks to this snippet, which sets up credentials before Spark starts (see SPARK-23639).
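The spark-sql behavior referenced above (SPARK-23639) amounts to fetching a Hive metastore delegation token and attaching it to the current user before the first metastore call, which is why its log shows "Found delegation token. Creating DIGEST-based thrift connection." A rough, untested sketch of the same idea through py4j follows; the provider class and its package are assumptions here (it is an internal Spark API, located at org.apache.spark.deploy.security in Spark 2.x and relocated in 3.x), as is the obtainDelegationTokens signature:

```python
# Sketch only: obtain a Hive metastore delegation token up front, mirroring
# what spark-sql does after SPARK-23639. ASSUMPTIONS: the internal Spark
# class org.apache.spark.deploy.security.HiveDelegationTokenProvider (2.x
# package; it moved in 3.x) and its obtainDelegationTokens(hadoopConf,
# sparkConf, creds) signature.

def obtain_hive_delegation_token(spark):
    """Fetch a metastore token and add it to the current UGI so that later
    thrift connections can take the DIGEST path instead of KERBEROS."""
    sc = spark.sparkContext
    jvm = sc._jvm  # py4j gateway into the driver JVM

    provider = jvm.org.apache.spark.deploy.security.HiveDelegationTokenProvider()
    creds = jvm.org.apache.hadoop.security.Credentials()
    provider.obtainDelegationTokens(
        sc._jsc.hadoopConfiguration(),  # org.apache.hadoop.conf.Configuration
        sc._jsc.getConf(),              # org.apache.spark.SparkConf
        creds,
    )
    jvm.org.apache.hadoop.security.UserGroupInformation \
        .getCurrentUser().addCredentials(creds)
```

If this works at all, it would have to be called right after getOrCreate() and before the first spark.sql(...), since the metastore connection is only opened on first use.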
Do you know of any Spark option for local mode that would make the proxy user work? Or is this a problem with my environment, and the example above should work in local mode with a proxy user? Any help would be appreciated!
I have observed this issue on Spark 2.3.1, 2.4.5, and 3.0.0.