Cannot get Spark job logs when running Spark on Kerberos

qacovj5a  posted on 2021-05-31 in Hadoop

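I have a Hadoop HDFS and YARN cluster on which I run Spark jobs. By default, I did not enable YARN log aggregation.
After enabling Kerberos, I configured HDFS and YARN to run as the Linux account "app01", with a Kerberos principal and keytab of the same name. I also generated user-specific principals and keytabs for end users, for example "user01" (there is no Linux account with that name).
When I run "kinit" to authenticate as "user01" and submit a job, I cannot view the job logs through the "Logs" link in the YARN web GUI. The page lists the log files, but clicking any of them raises this exception: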

Exception reading log file. Application submitted by 'user01' doesn't own requested log file : stderr
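
For readers who want to reproduce the submission step described above, here is a minimal sketch. The function name submit_as_user, the realm, the keytab path, and the jar path are all placeholders and not taken from my cluster; it also assumes the ticket is obtained from a keytab rather than by typing a password.

```shell
#!/usr/bin/env bash
# Sketch only: principal, keytab path, and jar path are placeholders.
submit_as_user() {
  local principal="$1" keytab="$2" app_jar="$3"
  # Obtain a ticket-granting ticket from the keytab (no password prompt).
  kinit -kt "$keytab" "$principal" || return 1
  # Print the ticket cache so the active principal can be confirmed.
  klist
  # Submit the bundled SparkPi example to YARN in cluster mode.
  spark-submit --master yarn --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi "$app_jar" 10
}

# Hypothetical usage:
# submit_as_user user01@EXAMPLE.COM /path/to/user01.keytab /path/to/spark-examples.jar
```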

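When I check on the worker nodes from Linux, I can see that the log files are created by the Linux account "app01". It looks like YARN expects the log files to be owned by "user01", which is impossible because no such Linux account exists.
Next, I tried a second approach: enabling YARN log aggregation. I added the configuration below to yarn-site.xml and restarted HDFS and YARN.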
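
The ownership mismatch can be checked by hand on a worker node. The following is a small sketch, assuming GNU coreutils `stat`; check_log_owner is a helper name made up for this post, not a Hadoop tool.

```shell
#!/usr/bin/env bash
# Sketch: compare a log file's owner with the user YARN expects.
# Assumes GNU coreutils stat; check_log_owner is a hypothetical helper.
check_log_owner() {
  local path="$1" expected="$2" actual
  # %U prints the owning user name of the file.
  actual="$(stat -c '%U' "$path")" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $path is owned by $actual"
  else
    echo "MISMATCH: $path is owned by $actual, expected $expected"
    return 1
  fi
}

# Example with the path from the NodeManager log in this post:
# check_log_owner /opt/disk1/data/yarn/userlogs/application_1588684192939_0001/container_e57_1588684192939_0001_01_000001/stdout user01
```

On my cluster this kind of check reports the owner as "app01", the account the NodeManager runs as.
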

<property>
     <name>yarn.log-aggregation-enable</name>
     <value>true</value>
 </property>

 <property>
     <name>yarn.log-aggregation.retain-seconds</name>
     <value>86400</value>
 </property>

 <property>
     <name>yarn.nodemanager.delete.debug-delay-sec</name>
     <value>600</value>
 </property>

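Once the services were back up, I ran kinit as "user01" and submitted another Spark job. After the job finished, I tried to view the logs through the "Logs" link in the YARN GUI and got the following exception: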

Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.

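I then tried to check the logs from the Spark UI. Drilling down through the "stdout" or "stderr" links there leads to the same page as above, with the same exception.
After that, I tried "yarn logs -applicationId x" to fetch the job logs and got this error: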

Can not find any log file matching the pattern: [ALL] for the application: application_1588684192939_0001
Can not find the logs for the application: application_1588684192939_0001 with the appOwner: user01
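
For reference, the command-line attempt looks roughly like the sketch below. fetch_app_logs is a wrapper name invented for this post. The -appOwner option is optional; when it is omitted, the yarn CLI assumes the current user, which matches the "appOwner: user01" in the message above.

```shell
#!/usr/bin/env bash
# Sketch: fetch aggregated logs for one application as a given owner.
# fetch_app_logs is a hypothetical wrapper around the yarn CLI.
fetch_app_logs() {
  local app_id="$1" owner="$2"
  yarn logs -applicationId "$app_id" -appOwner "$owner"
}

# Usage with the application ID from this post:
# fetch_app_logs application_1588684192939_0001 user01
```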

Then I looked at the YARN logs on the worker node and found several exceptions, all repetitions of the one below:

2020-05-05 14:10:31,450 ERROR org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat: Error aggregating log file. Log file : /opt/disk1/data/yarn/userlogs/application_1588684192939_0001/container_e57_1588684192939_0001_01_000001/stdout. Owner 'app01' for path /opt/disk1/data/yarn/userlogs/application_1588684192939_0001/container_e57_1588684192939_0001_01_000001/stdout did not match expected owner 'user01'
java.io.IOException: Owner 'app01' for path /opt/disk1/data/yarn/userlogs/application_1588684192939_0001/container_e57_1588684192939_0001_01_000001/stdout did not match expected owner 'user01'
    at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:284)
    at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenForRead(SecureIOUtils.java:218)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:203)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.secureOpenFile(AggregatedLogFormat.java:293)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.write(AggregatedLogFormat.java:245)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:544)
    at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.write(LogAggregationTFileController.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl$ContainerLogAggregator.doContainerLogAggregation(AppLogAggregatorImpl.java:581)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:323)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:459)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:415)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:265)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

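So the problem now is that, after enabling Kerberos, end users can no longer view Spark job logs, whether through the YARN GUI, the Spark UI, or the yarn command line. I can only read the logs from the Linux file system or from the HDFS log aggregation folder, neither of which end users are allowed to access.
Can anyone give me some ideas? Thanks!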
