-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-39320: [C++][FS][Azure] Add managed identity auth configuration #39321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-39320: [C++][FS][Azure] Add managed identity auth configuration #39321
Conversation
|
Looks like merging workload identity caused merge conflicts. I'll fix it tomorrow. |
e71b386 to
bdb0a00
Compare
cpp/src/arrow/filesystem/azurefs.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that account_name isn't used. Is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Sorry, I have not been paying enough attention 🤦♂️ . I think @felipecrv refactored this slightly. I need to check how the account name is configured now. I'll do it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
It seems that both of ConfigureWorkloadIdentityCredential() and ConfigureDefaultCredential() are the same situation too. Could you check them too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed now. I have added a check so that filesystem initialisation fails if account_name_.empty().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I messed up and forgot to set account_name_ on the Configure... functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should undo something I did and move account_name_ back to a public account_name field (the first in AzureOptions, but remove the account_name parameter from all the Configure... functions. That allows us to avoid the repetition of the account_name parameter and is more in line to how every credential constructor doesn't really need an account_name.
@Tom-Newton what you think and can you do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to make account_name_ public though? Personally I would just make account_name a required argument of the AzureOptions constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a good idea in normal classes, but these *Options classes tend to be designed like simple C structs that can be constructed from any language and easily serialized. The only exception being the secret stuff that we don't want to leak and lazily initialized fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think its as you recommended now.
bdb0a00 to
d4af840
Compare
cpp/src/arrow/filesystem/azurefs.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I messed up and forgot to set account_name_ on the Configure... functions.
cpp/src/arrow/filesystem/azurefs.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should undo something I did and move account_name_ back to a public account_name field (the first in AzureOptions, but remove the account_name parameter from all the Configure... functions. That allows us to avoid the repetition of the account_name parameter and is more in line to how every credential constructor doesn't really need an account_name.
@Tom-Newton what you think and can you do this?
| const { | ||
| if (account_name_.empty()) { | ||
| return Status::Invalid("AzureOptions doesn't contain a valid account name"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another simplification we can make:
- Rename
kAnonymoustokDefaultCredential - On the
kDefaultCredentialcase here, initializetoken_credential_tostd::make_shared<Azure::Identity::DefaultAzureCredential>()if it's not initialized already.
In the kTokenCredential case you can add a DCHECK(token_credential_) as documentation that when credential_kind_ is set to kTokenCredential we expect it to be set already (something that is guaranteed by credential_kind_ being private and set only by functions that also initialize token_credential_.
What you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its useful to support both anonymous and default credential as distinct things. Admittedly, its a niche use-case but storage accounts can be public, which is when anonymous is useful https://learn.microsoft.com/en-us/azure/storage/blobs/anonymous-read-access-configure?tabs=portal.
Its a good question though which should be the default. I know adlfs deafults to anonomous rather than default credential. In some respects it might be good to be consistent but also this choice of default has caused issues for me in the past e.g. flyteorg/flyte#3962
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Makes sense to keep kAnonymous then. But what would you return in AzureOptions::MakeBlobServiceClient() when credential_kind_ is anonymous? Lazy-init can you create the client without any credentials?token_credential_ with DefaultAzureCredential and create the client with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked and there is a constructor that doesn't take any credentials. I guess that's what we should call :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Do you mind if we do that in a separate PR to fix up anonymous and change the default to default credential. This was supposed to be a super simple PR until kou realised all the auths were broken 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem.
|
the MINGW failure is just |
|
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit d519544. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
…ion (apache#39321) ### Rationale for this change Workload identity is a useful Azure authentication method. Also I failed to set the account_name correctly for a bunch of auths (I think this got lost in a rebase then I copy pasted the broken code). ### What changes are included in this PR? - Make filesystem initialisation fail if `account_name_.empty()`. This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case. - Remove account name configuration on all auth configs, in favour of setting in separately from the auth configuration. - Implement `AzureOptions::ConfigureManagedIdentityCredential` ### Are these changes tested? Added a simple test initialising a filesystem using `ConfigureManagedIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for apache#39263. ### Are there any user-facing changes? Managed identity authentication is now supported. * Closes: apache#39320 Authored-by: Thomas Newton <[email protected]> Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
Rationale for this change
Workload identity is a useful Azure authentication method. Also I failed to set the account_name correctly for a bunch of auths (I think this got lost in a rebase then I copy pasted the broken code).
What changes are included in this PR?
account_name_.empty(). This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case.AzureOptions::ConfigureManagedIdentityCredentialAre these changes tested?
Added a simple test initialising a filesystem using
ConfigureManagedIdentityCredential. This is not the most comprehensive test but its the same as what we agreed on for #39263.Are there any user-facing changes?
Managed identity authentication is now supported.