[SPARK-51956][K8S] Fix KerberosConfDriverFeatureStep to warn in case of failures #50758

Closed
wants to merge 1 commit into from

Member

@dongjoon-hyun dongjoon-hyun commented Apr 29, 2025

What changes were proposed in this pull request?

This PR aims to fix KerberosConfDriverFeatureStep to warn in case of failures and continue.

Why are the changes needed?

DelegationTokenProvider.obtainDelegationTokens functions are designed to warn in case of failures.

} catch {
  case NonFatal(e) =>
    logWarning(Utils.createFailedToGetTokenMessage(serviceName, e))
    None
  case e: NoClassDefFoundError =>
    logWarning(classNotFoundErrorStr)
    None

case NonFatal(e) =>
  logWarning(Utils.createFailedToGetTokenMessage(serviceName, e))
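
The excerpts above are fragments, so a self-contained sketch of the same warn-and-return-`None` pattern may help. It is only an illustration: `WarnOnFailure`, `safely`, the in-memory `warnings` buffer, and the message wording are hypothetical stand-ins and not Spark's API.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical helper illustrating the provider pattern; not part of Spark.
object WarnOnFailure {
  // Stand-in for logWarning: collects messages so the behavior can be inspected.
  val warnings: ArrayBuffer[String] = ArrayBuffer.empty[String]
  def logWarning(msg: String): Unit = warnings += msg

  // Runs `fetch`; a failure becomes a warning plus None instead of an exception.
  def safely(serviceName: String)(fetch: => Long): Option[Long] =
    try {
      Some(fetch)
    } catch {
      case NonFatal(e) =>
        logWarning(s"Failed to get token from service $serviceName: ${e.getMessage}")
        None
      // NonFatal does not match LinkageError subclasses, so this case is reachable.
      case _: NoClassDefFoundError =>
        logWarning(s"Classes for service $serviceName are not on the classpath")
        None
    }
}
```

Under these assumptions, `WarnOnFailure.safely("hive")(throw new RuntimeException("boom"))` yields `None` and records one warning, while a successful fetch is returned wrapped in `Some`.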

KerberosConfDriverFeatureStep should follow the same behavior when getting credentials and obtaining delegation tokens, instead of failing at job submission.

val creds = UserGroupInformation.getCurrentUser().getCredentials()
tokenManager.obtainDelegationTokens(creds)

Failed to request driver from scheduler backend. StackTrace:  ...
at ....KerberosConfDriverFeatureStep.delegationTokens$lzycompute (KerberosConfDriverFeatureStep.scala:94)
at ....KerberosConfDriverFeatureStep$$delegationTokens (KerberosConfDriverFeatureStep.scala:90)
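
A minimal sketch of the intended warn-and-continue behavior is shown here, assuming the token lookup is wrapped in a `NonFatal` handler. `DriverStepSketch`, `fetchTokens`, `configure`, and the warning text are hypothetical names for illustration; this is not the actual patch.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical sketch; `fetchTokens` stands in for getting credentials and
// obtaining delegation tokens, and is not part of Spark's API.
class DriverStepSketch(fetchTokens: () => Array[Byte]) {
  // Stand-in for logWarning: keeps messages so callers can inspect them.
  val warnings: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Evaluated lazily on first access. A non-fatal failure is recorded as a
  // warning and mapped to None instead of propagating to the submission path.
  lazy val delegationTokens: Option[Array[Byte]] =
    try {
      Option(fetchTokens())
    } catch {
      case NonFatal(e) =>
        warnings += s"Failed to obtain delegation tokens: ${e.getMessage}"
        None
    }

  // Later steps proceed whether or not tokens were obtained.
  def configure(): String =
    if (delegationTokens.isDefined) "driver configured with tokens"
    else "driver configured without tokens"
}
```

With a `fetchTokens` that throws, for example `new DriverStepSketch(() => throw new java.io.IOException("KDC unreachable"))`, `configure()` still returns and one warning is recorded, rather than the exception surfacing from the lazy initializer as in the stack trace above.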

Does this PR introduce any user-facing change?

Previously, KerberosConfDriverFeatureStep failed if an exception was thrown. Now, it warns and continues to the next steps.

How was this patch tested?

It's a little difficult to write a test case.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

Could you review this PR, @viirya ?

Member

@viirya viirya left a comment

Looks reasonable to me. If the token is required, then the submission will fail eventually.

@dongjoon-hyun
Member Author

Thank you, @viirya .

dongjoon-hyun added a commit that referenced this pull request Apr 30, 2025
…e of failures

### What changes were proposed in this pull request?

This PR aims to fix `KerberosConfDriverFeatureStep` to warn in case of failures and continue.

### Why are the changes needed?

`DelegationTokenProvider.obtainDelegationTokens` functions are designed to warn in case of failures.

https://github.com/apache/spark/blob/54eb1a2f863bd7d8706c5c9a568895adb026c78d/sql/hive/src/main/scala/org/apache/spark/sql/hive/security/HiveDelegationTokenProvider.scala#L115-L121

https://github.com/apache/spark/blob/54eb1a2f863bd7d8706c5c9a568895adb026c78d/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala#L100-L101

`KerberosConfDriverFeatureStep` should follow the same behavior when getting credentials and obtaining delegation tokens, instead of failing at job submission.

https://github.com/apache/spark/blob/54eb1a2f863bd7d8706c5c9a568895adb026c78d/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala#L94-L95

```
Failed to request driver from scheduler backend. StackTrace:  ...
at ....KerberosConfDriverFeatureStep.delegationTokens$lzycompute (KerberosConfDriverFeatureStep.scala:94)
at ....KerberosConfDriverFeatureStep$$delegationTokens (KerberosConfDriverFeatureStep.scala:90)
```

### Does this PR introduce _any_ user-facing change?

Previously, `KerberosConfDriverFeatureStep` failed if an exception was thrown. Now, it warns and continues to the next steps.

### How was this patch tested?

It's a little difficult to write a test case.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50758 from dongjoon-hyun/SPARK-51956.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 4445cd8)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member Author

Merged to master/4.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-51956 branch April 30, 2025 01:02
ericm-db pushed a commit to ericm-db/spark that referenced this pull request May 5, 2025
Kimahriman pushed a commit to Kimahriman/spark that referenced this pull request May 13, 2025
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025