Spark HDFS_DELEGATION_TOKEN can't be found in cache

Problem description

I am running the simplest driver-only long-running job to reproduce this error.
Hadoop version          2.7.3.2.6.5.0-292
spark-core_2.11 version 2.3.0.2.6.5.0-292

Code:
// Called periodically from the driver; tmpPath points at an existing HDFS file.
FileSystem fs = tmpPath.getFileSystem(sc.hadoopConfiguration());
log.info("Path {} is {}", tmpPath, fs.exists(tmpPath));

Behavior: my job runs for about 17-18 hours without any problem. After that, new keys are issued as part of HadoopFSDelagationTokenProvider and the job keeps running with the newly issued delegation token, but at the next hourly delegation token renewal the job fails with a "token can't be found in cache" error. I also went ahead and programmatically generated my own delegation tokens via FileSystem.addDelegationTokens for the namenodes involved, and I saw the same behavior.
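To watch which token the driver is actually holding when the failure hits, here is a minimal sketch (reusing the logger from the snippet above; the sequence number it prints, e.g. 31615466, should match the one in the NameNode error):

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// Dump every token currently attached to the running user.
for (Token<?> t : UserGroupInformation.getCurrentUser().getCredentials().getAllTokens()) {
    log.info("Token kind={} service={} token={}", t.getKind(), t.getService(), t);
}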

Questions:

  1. What can cause a delegation token to be removed on the server side, and which properties control this?
  2. Which server-side logs would show that this token is about to be removed, or has been removed from the cache?

Relevant driver logs:
Path /test/abc.parquet is true
Path /test/abc.parquet is true
INFO Successfully logged into KDC
INFO getting token for DFS[DFSClient][clientName=DFSClient_NONMAPREDUCE_2324234_29,ugi=qa_user@ABC.com(auth:KERBEROS)](org.apache.spark.deploy.security.HadoopFSDelagationTokenProvider)
INFO Created HDFS_DELEGATION_TOKEN token 31615466 for qa_user on ha:hdfs:hacluster
INFO getting token for DFS[DFSClient][clientName=DFSClient_NONMAPREDUCE_2324234_29,ugi=qa_user@ABC.com(auth:KERBEROS)](org.apache.spark.deploy.security.HadoopFSDelagationTokenProvider)
INFO Created HDFS_DELEGATION_TOKEN token 31615467 for qa_user on ha:hdfs:hacluster
INFO writing out delegation tokens to hdfs://abc/user/qa/.sparkstaging/application_121212.....tmp
INFO delegation tokens written out successfully, renaming file to hdfs://.....
INFO delegation token file rename complete (org.apache.spark.deploy.yarn.security.AMCredentialRenewer)
Scheduling login from keytab in 64799125 millis
Path /test/abc.parquet is true
Path /test/abc.parquet is true

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 31615466 for qa_user) can't be found in cache
 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
 at org.apache.hadoop.ipc.Client.call(Client.java:1498)
 at org.apache.hadoop.ipc.Client.call(Client.java:1398)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
 at com.sun.proxy.$Proxy13.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:620)
 at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
FYI, submitted in yarn-cluster mode with:
--keytab /path/to/the/headless-keytab --principal principalNameAsPerTheKeytab
--conf spark.hadoop.fs.hdfs.impl.disable.cache=true
Note that the token renewer is issuing new keys and the new keys work too, but the token somehow gets revoked on the server side, and the AM logs don't have any clue about it.

Solution

Answering my own question:

There are a few very important points to take away here.

  1. Delegation tokens are kept as a single copy in UserGroupInformation.getCredentials().getAllTokens() and can be updated by any other thread running in the same JVM. My problem was that I had to set mapreduce.job.complete.cancel.delegation.tokens=false for all the other jobs running in the same context, especially the ones running in a MapReduce context, so that a job completing would not cancel the shared tokens (see the sketch after this list).
  2. HadoopFSDelagationTokenProvider should renew the keys every (fraction * renewal time), i.e. 0.75 * 24 hrs = 18 hrs by default, which matches the ~17-18 hours my job ran before the first renewal, provided --keytab and --principal were submitted.
  3. Make sure fs.hdfs.impl.disable.cache is set to true for the HDFS file system, i.e. a fresh FileSystem object is fetched on every call. It is costly, but it guarantees an fsObject carrying the new keys rather than one served from CACHE.get(fsname) (also covered in the sketch below).
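A minimal sketch of points 1 and 3 together (the property keys are the stock Hadoop/Spark names; where exactly you set them depends on how the other jobs in the same context are launched):

import org.apache.hadoop.conf.Configuration;

Configuration conf = sc.hadoopConfiguration();
// Point 1: stop jobs completing in the same context from cancelling
// the delegation tokens still shared by this long-running job.
conf.setBoolean("mapreduce.job.complete.cancel.delegation.tokens", false);
// Point 3: bypass the FileSystem cache so every getFileSystem() call
// returns a fresh object carrying the freshly issued tokens.
conf.setBoolean("fs.hdfs.impl.disable.cache", true);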

If nothing else works, you can create your own delegation tokens by calling https://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/api/org/apache/hadoop/fs/FileSystem.html#addDelegationTokens(java.lang.String,%20org.apache.hadoop.security.Credentials) with a new Credentials(), but the call must be made inside kerberosUGI.doAs({});
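A minimal sketch of that last resort (the principal, the keytab path, and the "yarn" renewer below are placeholders for your own values):

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

UserGroupInformation kerberosUGI = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "principalNameAsPerTheKeytab", "/path/to/the/headless-keytab");
kerberosUGI.doAs((PrivilegedExceptionAction<Void>) () -> {
    Credentials creds = new Credentials();
    FileSystem fs = FileSystem.get(sc.hadoopConfiguration());
    // Fetch fresh HDFS_DELEGATION_TOKENs for the involved namenodes into creds.
    fs.addDelegationTokens("yarn", creds);
    // Attach the new tokens to the current user so the rest of the JVM sees them.
    UserGroupInformation.getCurrentUser().addCredentials(creds);
    return null;
});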