Spark job submitted to YARN fails on the NodeManager with error: Client cannot authenticate via:[TOKEN, KERBEROS]

4smxwvx5 posted on 2021-05-29 in Hadoop

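I am running spark-submit in client mode. YARN is set up on a Kerberos-enabled HDP sandbox, which runs in a Docker container on a Mac host. When I run spark-submit from inside the sandbox's Docker container, the job succeeds, but when I run it from the host machine, it fails immediately after reaching the ACCEPTED state with this error: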

19/07/28 00:41:21 INFO yarn.Client: Application report for application_1564298049378_0008 (state: ACCEPTED)
19/07/28 00:41:22 INFO yarn.Client: Application report for application_1564298049378_0008 (state: ACCEPTED)
19/07/28 00:41:23 INFO yarn.Client: Application report for application_1564298049378_0008 (state: FAILED)
19/07/28 00:41:23 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1564298049378_0008 failed 2 times due to AM Container for appattempt_1564298049378_0008_000002 exited with  exitCode: -1000
Failing this attempt.Diagnostics: (
... 37 more
Caused by: Client cannot authenticate via:[TOKEN,  KERBEROS]
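When comparing the run from inside the container with the run from the host, it can help to reduce the client output to the last reported state and the distinct root-cause messages. Below is a minimal Python sketch, assuming the spark-submit output has been captured as text. The function name `summarize_client_log` and the sample string are illustrative only and are not part of Spark or YARN.

```python
import re

# Matches the state reported by yarn.Client, e.g. "(state: FAILED)".
STATE_RE = re.compile(r"\(state: (\w+)\)")


def summarize_client_log(text):
    """Return (last reported state, list of distinct 'Caused by' messages)."""
    last_state = None
    causes = []
    for line in text.splitlines():
        match = STATE_RE.search(line)
        if match:
            last_state = match.group(1)
        stripped = line.strip()
        if stripped.startswith("Caused by:"):
            cause = stripped[len("Caused by:"):].strip()
            if cause not in causes:  # keep each root cause once
                causes.append(cause)
    return last_state, causes


# A few lines copied from the client output above.
sample = """\
19/07/28 00:41:22 INFO yarn.Client: Application report for application_1564298049378_0008 (state: ACCEPTED)
19/07/28 00:41:23 INFO yarn.Client: Application report for application_1564298049378_0008 (state: FAILED)
Caused by: Client cannot authenticate via:[TOKEN,  KERBEROS]
"""

state, causes = summarize_client_log(sample)
print(state)
for cause in causes:
    print(cause)
```

Running the same function over the output of the successful in-container run and the failing host run gives two short summaries that are easier to compare than the raw logs.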


2019-07-28 22:39:04,654 INFO  resourcemanager.ClientRMService ( - Allocated new applicationId: 20
2019-07-28 22:39:10,982 INFO  capacity.CapacityScheduler ( - Application 'application_1564332457320_0020' is submitted without priority hence considering default queue/cluster priority: 0
2019-07-28 22:39:10,982 INFO  capacity.CapacityScheduler ( - Priority '0' is acceptable in queue : santosh for application: application_1564332457320_0020
2019-07-28 22:39:10,983 WARN  rmapp.RMAppImpl ( - The specific max attempts: 0 for application: 20 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2019-07-28 22:39:10,983 INFO  collector.TimelineCollectorManager ( - the collector for application_1564332457320_0020 was added
2019-07-28 22:39:10,984 INFO  resourcemanager.ClientRMService ( - Application with id 20 submitted by user santosh
2019-07-28 22:39:10,984 INFO  security.DelegationTokenRenewer ( - application_1564332457320_0020 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service:, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh@XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20)
2019-07-28 22:39:11,011 INFO  security.DelegationTokenRenewer ( - Renewed delegation-token= [Kind: HDFS_DELEGATION_TOKEN, Service:, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh@XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20);exp=1564439951007; apps=[application_1564332457320_0020]]
2019-07-28 22:39:11,011 INFO  security.DelegationTokenRenewer ( - Renew Kind: HDFS_DELEGATION_TOKEN, Service:, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh@XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20);exp=1564439951007; apps=[application_1564332457320_0020] in 86399996 ms, appId = [application_1564332457320_0020]
2019-07-28 22:39:11,011 INFO  rmapp.RMAppImpl ( - Storing application with id application_1564332457320_0020
2019-07-28 22:39:11,012 INFO  rmapp.RMAppImpl ( - application_1564332457320_0020 State change from NEW to NEW_SAVING on event = START
2019-07-28 22:39:11,012 INFO  recovery.RMStateStore ( - Storing info for app: application_1564332457320_0020
2019-07-28 22:39:11,022 INFO  rmapp.RMAppImpl ( - application_1564332457320_0020 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2019-07-28 22:39:11,022 INFO  capacity.ParentQueue ( - Application added - appId: application_1564332457320_0020 user: santosh leaf-queue of parent: root #applications: 1
2019-07-28 22:39:11,023 INFO  capacity.CapacityScheduler ( - Accepted application application_1564332457320_0020 from user: santosh, in queue: santosh
2019-07-28 22:39:11,023 INFO  rmapp.RMAppImpl ( - application_1564332457320_0020 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2019-07-28 22:39:11,023 INFO  resourcemanager.ApplicationMasterService ( - Registering app attempt : appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,024 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from NEW to SUBMITTED on event = START
2019-07-28 22:39:11,024 INFO  capacity.LeafQueue ( - Application application_1564332457320_0020 from user: santosh activated in queue: santosh
2019-07-28 22:39:11,025 INFO  capacity.LeafQueue ( - Application added - appId: application_1564332457320_0020 user: santosh, leaf-queue: santosh #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2019-07-28 22:39:11,025 INFO  capacity.CapacityScheduler ( - Added Application Attempt appattempt_1564332457320_0020_000001 to scheduler from user santosh in queue santosh
2019-07-28 22:39:11,028 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2019-07-28 22:39:11,033 INFO  allocator.AbstractContainerAllocator ( - assignedContainer application attempt=appattempt_1564332457320_0020_000001 container=null queue=santosh clusterResource= type=OFF_SWITCH requestedPartition=
2019-07-28 22:39:11,034 INFO  rmcontainer.RMContainerImpl ( - container_e20_1564332457320_0020_01_000001 Container Transitioned from NEW to ALLOCATED
2019-07-28 22:39:11,035 INFO  fica.FiCaSchedulerNode ( - Assigned container container_e20_1564332457320_0020_01_000001 of capacity  on host, which has 1 containers,  used and  available after allocation
2019-07-28 22:39:11,038 INFO  security.NMTokenSecretManagerInRM ( - Sending NMToken for nodeId : for container : container_e20_1564332457320_0020_01_000001
2019-07-28 22:39:11,043 INFO  rmcontainer.RMContainerImpl ( - container_e20_1564332457320_0020_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2019-07-28 22:39:11,043 INFO  security.NMTokenSecretManagerInRM ( - Clear node set for appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,044 INFO  capacity.ParentQueue ( - assignedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
2019-07-28 22:39:11,044 INFO  capacity.CapacityScheduler ( - Allocation proposal accepted
2019-07-28 22:39:11,044 INFO  attempt.RMAppAttemptImpl ( - Storing attempt: AppId: application_1564332457320_0020 AttemptId: appattempt_1564332457320_0020_000001 MasterContainer: Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId:, NodeHttpAddress:, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: }, ExecutionType: GUARANTEED, ]
2019-07-28 22:39:11,051 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2019-07-28 22:39:11,057 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2019-07-28 22:39:11,060 INFO  amlauncher.AMLauncher ( - Launching masterappattempt_1564332457320_0020_000001
2019-07-28 22:39:11,068 INFO  amlauncher.AMLauncher ( - Setting up container Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId:, NodeHttpAddress:, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,069 INFO  security.AMRMTokenSecretManager ( - Create AMRMToken for ApplicationAttempt: appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,069 INFO  security.AMRMTokenSecretManager ( - Creating password for appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,265 INFO  amlauncher.AMLauncher ( - Done launching container Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId:, NodeHttpAddress:, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,265 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2019-07-28 22:39:11,852 INFO  resourcemanager.ResourceTrackerService ( - Update collector information for application application_1564332457320_0020 with new address: timestamp: 1564332457320, 36
2019-07-28 22:39:11,854 INFO  rmcontainer.RMContainerImpl ( - container_e20_1564332457320_0020_01_000001 Container Transitioned from ACQUIRED to RUNNING
2019-07-28 22:39:12,833 INFO  provider.BaseAuditHandler ( - Audit Status Log: name=yarn.async.batch.hdfs, interval=01:11.979 minutes, events=162, succcessCount=162, totalEvents=17347, totalSuccessCount=17347
2019-07-28 22:39:12,834 INFO  destination.HDFSAuditDestination ( - Flushing HDFS audit. Event Size:1
2019-07-28 22:39:12,857 INFO  resourcemanager.ResourceTrackerService ( - Update collector information for application application_1564332457320_0020 with new address: timestamp: 1564332457320, 37
2019-07-28 22:39:14,054 INFO  rmcontainer.RMContainerImpl ( - container_e20_1564332457320_0020_01_000001 Container Transitioned from RUNNING to COMPLETED
2019-07-28 22:39:14,055 INFO  attempt.RMAppAttemptImpl ( - Updating application attempt appattempt_1564332457320_0020_000001 with final state: FAILED, and exit status: -1000
2019-07-28 22:39:14,055 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
2019-07-28 22:39:14,066 INFO  resourcemanager.ApplicationMasterService ( - Unregistering app attempt : appattempt_1564332457320_0020_000001
2019-07-28 22:39:14,066 INFO  security.AMRMTokenSecretManager ( - Application finished, removing password for appattempt_1564332457320_0020_000001
2019-07-28 22:39:14,066 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2019-07-28 22:39:14,067 INFO  rmapp.RMAppImpl ( - The number of failed attempts is 1. The max attempts is 2
2019-07-28 22:39:14,067 INFO  resourcemanager.ApplicationMasterService ( - Registering app attempt : appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,067 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000002 State change from NEW to SUBMITTED on event = START
2019-07-28 22:39:14,067 INFO  capacity.CapacityScheduler ( - Application Attempt appattempt_1564332457320_0020_000001 is done. finalState=FAILED
2019-07-28 22:39:14,067 INFO  scheduler.AppSchedulingInfo ( - Application application_1564332457320_0020 requests cleared
2019-07-28 22:39:14,067 INFO  capacity.LeafQueue ( - Application removed - appId: application_1564332457320_0020 user: santosh queue: santosh #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2019-07-28 22:39:14,068 INFO  capacity.LeafQueue ( - Application application_1564332457320_0020 from user: santosh activated in queue: santosh
2019-07-28 22:39:14,068 INFO  capacity.LeafQueue ( - Application added - appId: application_1564332457320_0020 user: santosh, leaf-queue: santosh #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2019-07-28 22:39:14,068 INFO  capacity.CapacityScheduler ( - Added Application Attempt appattempt_1564332457320_0020_000002 to scheduler from user santosh in queue santosh
2019-07-28 22:39:14,068 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000002 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2019-07-28 22:39:14,074 INFO  allocator.AbstractContainerAllocator ( - assignedContainer application attempt=appattempt_1564332457320_0020_000002 container=null queue=santosh clusterResource= type=OFF_SWITCH requestedPartition=
2019-07-28 22:39:14,074 INFO  rmcontainer.RMContainerImpl ( - container_e20_1564332457320_0020_02_000001 Container Transitioned from NEW to ALLOCATED
2019-07-28 22:39:14,075 INFO  fica.FiCaSchedulerNode ( - Assigned container container_e20_1564332457320_0020_02_000001 of capacity  on host, which has 1 containers,  used and  available after allocation
2019-07-28 22:39:14,075 INFO  security.NMTokenSecretManagerInRM ( - Sending NMToken for nodeId : for container : container_e20_1564332457320_0020_02_000001
2019-07-28 22:39:14,076 INFO  rmcontainer.RMContainerImpl ( - container_e20_1564332457320_0020_02_000001 Container Transitioned from ALLOCATED to ACQUIRED
2019-07-28 22:39:14,076 INFO  security.NMTokenSecretManagerInRM ( - Clear node set for appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,076 INFO  capacity.ParentQueue ( - assignedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
2019-07-28 22:39:14,076 INFO  capacity.CapacityScheduler ( - Allocation proposal accepted
2019-07-28 22:39:14,076 INFO  attempt.RMAppAttemptImpl ( - Storing attempt: AppId: application_1564332457320_0020 AttemptId: appattempt_1564332457320_0020_000002 MasterContainer: Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId:, NodeHttpAddress:, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: }, ExecutionType: GUARANTEED, ]
2019-07-28 22:39:14,077 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000002 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2019-07-28 22:39:14,088 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000002 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2019-07-28 22:39:14,089 INFO  amlauncher.AMLauncher ( - Launching masterappattempt_1564332457320_0020_000002
2019-07-28 22:39:14,091 INFO  amlauncher.AMLauncher ( - Setting up container Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId:, NodeHttpAddress:, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,092 INFO  security.AMRMTokenSecretManager ( - Create AMRMToken for ApplicationAttempt: appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,092 INFO  security.AMRMTokenSecretManager ( - Creating password for appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,110 INFO  amlauncher.AMLauncher ( - Done launching container Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId:, NodeHttpAddress:, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,110 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000002 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2019-07-28 22:39:15,056 INFO  rmcontainer.RMContainerImpl ( - container_e20_1564332457320_0020_02_000001 Container Transitioned from ACQUIRED to RUNNING
2019-07-28 22:39:16,752 INFO  rmcontainer.RMContainerImpl ( - container_e20_1564332457320_0020_02_000001 Container Transitioned from RUNNING to COMPLETED
2019-07-28 22:39:16,755 INFO  attempt.RMAppAttemptImpl ( - Updating application attempt appattempt_1564332457320_0020_000002 with final state: FAILED, and exit status: -1000
2019-07-28 22:39:16,755 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000002 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
2019-07-28 22:39:16,899 INFO  resourcemanager.ApplicationMasterService ( - Unregistering app attempt : appattempt_1564332457320_0020_000002
2019-07-28 22:39:16,900 INFO  security.AMRMTokenSecretManager ( - Application finished, removing password for appattempt_1564332457320_0020_000002
2019-07-28 22:39:16,900 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000002 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2019-07-28 22:39:16,900 INFO  rmapp.RMAppImpl ( - The number of failed attempts is 2. The max attempts is 2
2019-07-28 22:39:16,900 INFO  rmapp.RMAppImpl ( - Updating application application_1564332457320_0020 with final state: FAILED
2019-07-28 22:39:16,900 INFO  rmapp.RMAppImpl ( - application_1564332457320_0020 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
2019-07-28 22:39:16,900 INFO  recovery.RMStateStore ( - Updating info for app: application_1564332457320_0020
2019-07-28 22:39:16,900 INFO  capacity.CapacityScheduler ( - Application Attempt appattempt_1564332457320_0020_000002 is done. finalState=FAILED
2019-07-28 22:39:16,901 INFO  scheduler.AppSchedulingInfo ( - Application application_1564332457320_0020 requests cleared
2019-07-28 22:39:16,901 INFO  capacity.LeafQueue ( - Application removed - appId: application_1564332457320_0020 user: santosh queue: santosh #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2019-07-28 22:39:16,916 INFO  rmapp.RMAppImpl ( - Application application_1564332457320_0020 failed 2 times due to AM Container for appattempt_1564332457320_0020_000002 exited with  exitCode: -1000
Failing this attempt.Diagnostics: (
    ... 37 more
Caused by: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(
    at org.apache.hadoop.ipc.Client$Connection.access$2300(
    at org.apache.hadoop.ipc.Client$Connection$
    at org.apache.hadoop.ipc.Client$Connection$
    at Method)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(
    ... 40 more
Caused by: Client cannot authenticate via:[TOKEN, KERBEROS]
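The ResourceManager log above is easier to follow once it is reduced to the state transitions of each application attempt. Below is a minimal Python sketch, assuming the log lines keep the `State change from X to Y on event = Z` wording shown above. The helper name `attempt_timelines` is hypothetical, and the sample lines are copied from the log for attempt 000001.

```python
import re
from collections import defaultdict

# Matches RMAppAttemptImpl transition lines as they appear in the log above.
TRANSITION_RE = re.compile(
    r"(appattempt_\d+_\d+_\d+) State change from (\w+) to (\w+) on event = (\w+)"
)


def attempt_timelines(lines):
    """Map each attempt id to the ordered list of states it passed through."""
    timelines = defaultdict(list)
    for line in lines:
        match = TRANSITION_RE.search(line)
        if not match:
            continue  # application-level and unrelated lines are skipped
        attempt, old_state, new_state, _event = match.groups()
        if not timelines[attempt]:
            timelines[attempt].append(old_state)  # record the starting state once
        timelines[attempt].append(new_state)
    return dict(timelines)


sample = [
    "2019-07-28 22:39:11,024 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from NEW to SUBMITTED on event = START",
    "2019-07-28 22:39:11,028 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED",
    "2019-07-28 22:39:11,051 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED",
    "2019-07-28 22:39:11,057 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED",
    "2019-07-28 22:39:11,265 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED",
    "2019-07-28 22:39:14,055 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED",
    "2019-07-28 22:39:14,066 INFO  attempt.RMAppAttemptImpl ( - appattempt_1564332457320_0020_000001 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED",
]

for attempt, states in attempt_timelines(sample).items():
    print(attempt, " -> ".join(states))
```

In this log both attempts go from LAUNCHED to FINAL_SAVING on CONTAINER_FINISHED within a few seconds, so the AM container exits almost immediately after launch on both tries.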


