alexdongli0829

The fix captures the AM UGI when DAGClientAMProtocolBlockingPBServerImpl is created. This UGI carries all the tokens that the Tez AM container uses, so whenever the server needs to talk to HDFS, it uses the AM UGI where possible.

@alexdongli0829 changed the title from "[TEZ-4638]Fix the kerberos issue when there is big DAG plan using HDFS" to "TEZ-4638: Fix the kerberos issue when there is big DAG plan using HDFS" on Jul 7, 2025
@alexdongli0829
Author

Integration test on cluster

Prepare a DAG whose plan is large enough to be sent through YARN local resources rather than over IPC (adjust the IPC limit "ipc.maximum.data.length" if necessary).
Before the fix

2025-06-30T10:18:18,760 INFO  [ce4666f9-a278-4f15-be97-ae59b727e14b main([])]: client.TezClient (:()) - Send dag plan using YARN local resources since it's too large, dag plan size=385547, max dag plan size through IPC=128974848, max IPC message size= 134217728
2025-06-30T10:18:18,809 INFO  [ce4666f9-a278-4f15-be97-ae59b727e14b main([])]: exec.Task (:()) - Dag submit failed due to DestHost:destPort ip-172-31-93-189.ec2.internal:8020 , LocalHost:localPort ip-172-31-93-68.ec2.internal/172.31.93.68:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:964)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:939)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1679)
        at org.apache.hadoop.ipc.Client.call(Client.java:1620)
        at org.apache.hadoop.ipc.Client.call(Client.java:1517)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)

After the fix

2025-06-30T10:34:52,975 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: client.TezClient (:()) - Send dag plan using YARN local resources since it's too large, dag plan size=389516, max dag plan size through IPC=128974848, max IPC message size= 134217728
2025-06-30T10:34:53,171 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: client.FrameworkClient (:()) - Submitted dag to TezSession, sessionName=HIVE-8f4c8a93-f6a9-4d6d-a813-cb946649815a, applicationId=application_1751278111719_0008, dagId=dag_1751278111719_0008_1, dagName=select count(*) from drone_orders where ...0 (Stage-1)
2025-06-30T10:34:53,490 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Status: Running (Executing on YARN cluster with App id application_1751278111719_0008)

2025-06-30T10:34:53,505 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: -/-  Reducer 2: 0/1
2025-06-30T10:34:56,532 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: -/-  Reducer 2: 0/1
2025-06-30T10:34:57,541 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0/54 Reducer 2: 0/1
2025-06-30T10:35:00,565 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0/54 Reducer 2: 0/1
2025-06-30T10:35:01,070 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+3)/54     Reducer 2: 0/1
2025-06-30T10:35:02,084 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+5)/54     Reducer 2: 0/1
2025-06-30T10:35:02,589 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+7)/54     Reducer 2: 0/1
2025-06-30T10:35:03,598 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+9)/54     Reducer 2: 0/1
2025-06-30T10:35:04,103 INFO  [8f4c8a93-f6a9-4d6d-a813-cb946649815a main([])]: SessionState (:()) - Map 1: 0(+11)/54    Reducer 2: 0/1
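
A side note on the two limits that appear in the TezClient log lines above: "max IPC message size" and "max dag plan size through IPC" differ by exactly 5 MiB. The sketch below only redoes that arithmetic with the values copied from the log. The class and method names are hypothetical, and reading the gap as headroom reserved for the rest of the RPC payload is an assumption, not something the log itself states.

```java
public class IpcLimitCheck {
    // Values copied verbatim from the TezClient log lines in this comment.
    static final long MAX_IPC_MESSAGE_SIZE = 134_217_728L;       // "max IPC message size"
    static final long MAX_DAG_PLAN_THROUGH_IPC = 128_974_848L;   // "max dag plan size through IPC"

    // Gap between the two logged limits, in bytes.
    static long reservedBytes() {
        return MAX_IPC_MESSAGE_SIZE - MAX_DAG_PLAN_THROUGH_IPC;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("max IPC message size (MiB) = " + MAX_IPC_MESSAGE_SIZE / mib);
        System.out.println("gap between limits (bytes) = " + reservedBytes());
        System.out.println("gap between limits (MiB)   = " + reservedBytes() / mib);
    }
}
```

With the logged values, the IPC limit works out to 128 MiB and the gap to 5 MiB.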

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 14m 49s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 9m 25s | | master passed |
| +1 💚 | compile | 0m 24s | | master passed |
| +1 💚 | checkstyle | 0m 52s | | master passed |
| +1 💚 | javadoc | 0m 20s | | master passed |
| +0 🆗 | spotbugs | 1m 40s | | tez-dag in master has 785 extant spotbugs warnings. |
| -0 ⚠️ | patch | 1m 47s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 17s | | the patch passed |
| -1 ❌ | codespell | 0m 15s | /results-codespell.txt | The patch generated 7 new + 0 unchanged - 0 fixed = 7 total (was 0) |
| +1 💚 | compile | 0m 16s | | the patch passed |
| +1 💚 | javac | 0m 16s | | the patch passed |
| -1 ❌ | blanks | 0m 0s | /blanks-eol.txt | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply |
| -0 ⚠️ | checkstyle | 0m 9s | /results-checkstyle-tez-dag.txt | tez-dag: The patch generated 3 new + 4 unchanged - 0 fixed = 7 total (was 4) |
| +1 💚 | javadoc | 0m 6s | | the patch passed |
| +1 💚 | spotbugs | 1m 3s | | the patch passed |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 3m 53s | | tez-dag in the patch passed. |
| +1 💚 | asflicense | 0m 11s | | The patch does not generate ASF License warnings. |
| | | 34m 27s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-422/1/artifact/out/Dockerfile |
| GITHUB PR | #422 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs checkstyle codespell detsecrets compile |
| uname | Linux f931006b647c 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-422/src/.yetus/personality.sh |
| git revision | master / f28d7cb |
| Default Java | Ubuntu-21.0.7+6-Ubuntu-0ubuntu124.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-422/1/testReport/ |
| Max. process+thread count | 216 (vs. ulimit of 5500) |
| modules | C: tez-dag U: tez-dag |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-422/1/console |
| versions | git=2.43.0 maven=3.8.7 spotbugs=4.9.3 codespell=2.0.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 10s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 9m 37s | | master passed |
| +1 💚 | compile | 0m 27s | | master passed |
| +1 💚 | checkstyle | 0m 50s | | master passed |
| +1 💚 | javadoc | 0m 24s | | master passed |
| +0 🆗 | spotbugs | 1m 50s | | tez-dag in master has 785 extant spotbugs warnings. |
| -0 ⚠️ | patch | 1m 58s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 18s | | the patch passed |
| -1 ❌ | codespell | 0m 21s | /results-codespell.txt | The patch generated 7 new + 0 unchanged - 0 fixed = 7 total (was 0) |
| +1 💚 | compile | 0m 17s | | the patch passed |
| +1 💚 | javac | 0m 17s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 8s | /results-checkstyle-tez-dag.txt | tez-dag: The patch generated 3 new + 4 unchanged - 0 fixed = 7 total (was 4) |
| +1 💚 | javadoc | 0m 5s | | the patch passed |
| +1 💚 | spotbugs | 1m 12s | | the patch passed |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 4m 13s | | tez-dag in the patch passed. |
| +1 💚 | asflicense | 0m 11s | | The patch does not generate ASF License warnings. |
| | | 20m 59s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-422/2/artifact/out/Dockerfile |
| GITHUB PR | #422 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs checkstyle codespell detsecrets compile |
| uname | Linux 1e986561ba05 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-422/src/.yetus/personality.sh |
| git revision | master / a926b41 |
| Default Java | Ubuntu-21.0.7+6-Ubuntu-0ubuntu124.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-422/2/testReport/ |
| Max. process+thread count | 258 (vs. ulimit of 5500) |
| modules | C: tez-dag U: tez-dag |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-422/2/console |
| versions | git=2.43.0 maven=3.8.7 spotbugs=4.9.3 codespell=2.0.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.
