Closed
Description
Describe the enhancement requested
Read support probably requires an Azure implementation of arrow::io::RandomAccessFile, which can then be used to implement the OpenInputStream and OpenInputFile methods of the AzureFileSystem.
#12914 implemented all of these features, so this will largely be a case of extracting the relevant parts from there. One modification I would suggest compared to that would be to avoid branching logic based on whether the Azure storage account has the hierarchical namespace enabled. Utilising features of the hierarchical namespace can make renames and listing tasks faster, but for just reading blobs it shouldn't make any difference.
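To make the shape of the read path concrete, here is a minimal, standard-library-only sketch of a random access file layered on ranged blob downloads. It is not the Arrow or Azure SDK API: `FakeBlobClient`, `DownloadRange` and `BlobRandomAccessFile` are hypothetical names, and the in-memory string stands in for a real blob. The method names only loosely mirror those of arrow::io::RandomAccessFile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for a blob client that supports ranged downloads.
// A real implementation would issue a ranged request against Azure Blob
// Storage instead of copying from an in-memory string.
class FakeBlobClient {
 public:
  explicit FakeBlobClient(std::string content) : content_(std::move(content)) {}

  int64_t Size() const { return static_cast<int64_t>(content_.size()); }

  // Copies up to nbytes starting at offset into out; returns bytes copied.
  int64_t DownloadRange(int64_t offset, int64_t nbytes, void* out) const {
    assert(offset >= 0 && nbytes >= 0);
    if (offset >= Size()) return 0;  // reading past the end yields nothing
    const int64_t n = std::min(nbytes, Size() - offset);
    std::memcpy(out, content_.data() + offset, static_cast<std::size_t>(n));
    return n;
  }

 private:
  std::string content_;
};

// Positional reads plus a cursor, with no branching on whether the
// account has the hierarchical namespace enabled.
class BlobRandomAccessFile {
 public:
  explicit BlobRandomAccessFile(FakeBlobClient client)
      : client_(std::move(client)) {}

  int64_t GetSize() const { return client_.Size(); }
  int64_t Tell() const { return pos_; }

  void Seek(int64_t position) {
    assert(position >= 0);
    pos_ = position;
  }

  // Positional read: does not move the cursor.
  int64_t ReadAt(int64_t position, int64_t nbytes, void* out) const {
    return client_.DownloadRange(position, nbytes, out);
  }

  // Sequential read: reads at the cursor, then advances it.
  int64_t Read(int64_t nbytes, void* out) {
    const int64_t n = ReadAt(pos_, nbytes, out);
    pos_ += n;
    return n;
  }

 private:
  FakeBlobClient client_;
  int64_t pos_ = 0;
};
```

With something of this shape in place, OpenInputFile could return the object directly and OpenInputStream could expose only the sequential Read side of it.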
If we want to use features of the hierarchical namespace, that adds some complexity:
- It makes things harder to test because it's not supported by Azurite: "Is there a plan to support AdlsGen2 (Datalake Storage) on top of blobstore emulator ?" Azure/Azurite#553
- It's a bit difficult to query the storage account to determine whether it supports the hierarchical namespace.
  ServiceClient::GetAccountInfo() requires Storage Account Contributor permissions (https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob-service-properties?tabs=azure-ad#authorization), which is quite significantly elevated. Hadoop solves this by essentially calling PathClient::GetAccessControlList() and, if it raises an exception, treating the hierarchical namespace as not supported: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java#L356-L385.
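For illustration, the Hadoop-style probe described above could look roughly like the following standard-library-only sketch. `HnsSupport`, `AclProbe` and `DetectHierarchicalNamespace` are hypothetical names; the probe callable is assumed to wrap something like PathClient::GetAccessControlList() and to throw on failure. Treating every exception as "not supported" is a simplification: a real implementation would probably need to distinguish that case from authentication or network errors.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical result type for the probe.
enum class HnsSupport { kEnabled, kDisabled };

// Hypothetical probe: a callable that performs the ACL request and throws
// if the request fails.
using AclProbe = std::function<void()>;

// Runs the probe once and maps success or failure to a support flag.
HnsSupport DetectHierarchicalNamespace(const AclProbe& probe) {
  try {
    probe();
    return HnsSupport::kEnabled;
  } catch (const std::exception&) {
    // Assumption: any failure is read as "hierarchical namespace not
    // supported". Real code should inspect the error before deciding.
    return HnsSupport::kDisabled;
  }
}
```

Since the answer does not change for a given storage account, the result could be cached after the first call so that the extra request is only made once.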
Related Issues:
- [C++] Filesystem implementation for Azure Blob Storage #18014 (is a child of)
Component(s)
C++