[C++] Implement file reads for Azure filesystem #37511

@Tom-Newton

Description

Describe the enhancement requested

Read support probably requires an Azure implementation of arrow::io::RandomAccessFile, which can then be used to implement the OpenInputStream and OpenInputFile methods of the AzureFileSystem.
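To make the shape of that work concrete, here is a minimal, self-contained sketch of a random-access reader layered on ranged downloads. Everything in it is hypothetical: `RangeDownloader` and `SketchRandomAccessFile` are illustrative names, and the plain return types stand in for the `arrow::Result`/`arrow::Status` types the real arrow::io::RandomAccessFile interface uses. A real implementation would issue ranged blob downloads through the Azure SDK instead of calling a callback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a ranged blob download (e.g. an HTTP Range request).
using RangeDownloader = std::function<std::string(int64_t offset, int64_t length)>;

// Illustrative only: mirrors the ReadAt/Read/Seek/Tell/GetSize shape of a
// random-access file, but is not Arrow's actual interface.
class SketchRandomAccessFile {
 public:
  SketchRandomAccessFile(RangeDownloader download, int64_t size)
      : download_(std::move(download)), size_(size) {
    assert(size_ >= 0);
  }

  int64_t GetSize() const { return size_; }
  int64_t Tell() const { return pos_; }

  void Seek(int64_t position) {
    if (position < 0 || position > size_) {
      throw std::out_of_range("seek outside of blob");
    }
    pos_ = position;
  }

  // Positional read: does not move the cursor and clamps at the end of the blob.
  std::string ReadAt(int64_t position, int64_t nbytes) const {
    if (position < 0 || nbytes < 0) {
      throw std::invalid_argument("negative position or length");
    }
    if (position >= size_) return std::string();
    const int64_t length = std::min(nbytes, size_ - position);
    if (length == 0) return std::string();
    return download_(position, length);
  }

  // Sequential read: a positional read at the cursor, followed by advancing it.
  std::string Read(int64_t nbytes) {
    std::string out = ReadAt(pos_, nbytes);
    pos_ += static_cast<int64_t>(out.size());
    return out;
  }

 private:
  RangeDownloader download_;
  int64_t size_;
  int64_t pos_ = 0;
};
```

With a class of this shape, OpenInputFile could return the object directly and OpenInputStream could return the same object used only through its sequential Read, which is one possible reason to start from the random-access implementation.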

#12914 implemented all of these features, so this will largely be a case of extracting the relevant parts from there. One modification I would suggest compared to that PR is to avoid branching logic based on whether the Azure storage account has the hierarchical namespace enabled. Utilising features of the hierarchical namespace can make renames and listing tasks faster, but for just reading blobs it shouldn't make any difference.

If we want to use features of the hierarchical namespace, that adds some complexities:

  1. It makes things harder to test because it's not supported by Azurite. See Azure/Azurite#553, "Is there a plan to support AdlsGen2 (Datalake Storage) on top of blobstore emulator ?".
  2. It's a bit difficult to query the storage account to determine whether it supports the hierarchical namespace. ServiceClient::GetAccountInfo() requires Storage Account Contributor permissions (https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob-service-properties?tabs=azure-ad#authorization), which is a significantly elevated role. Hadoop solves this by essentially calling PathClient::GetAccessControlList() and, if it raises an exception, treating the hierarchical namespace as not supported: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java#L356-L385.
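The Hadoop-style probe from point 2 could look roughly like the sketch below. It is self-contained and uses only stand-in types: `SketchStorageError` and `NamespaceProbe` are hypothetical names, and the injected callback represents a call such as PathClient::GetAccessControlList() on the container root. Treating only HTTP 400 as "hierarchical namespace not enabled" and rethrowing everything else is an assumption of this sketch, not something confirmed here for the Azure C++ SDK.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the SDK's storage exception type.
struct SketchStorageError : std::runtime_error {
  SketchStorageError(int status, const std::string& message)
      : std::runtime_error(message), status_code(status) {}
  int status_code;
};

// Probes once for hierarchical namespace support and caches the answer,
// so the extra request is paid at most once per filesystem instance.
class NamespaceProbe {
 public:
  // get_root_acl is a placeholder for an ACL query on the root path.
  explicit NamespaceProbe(std::function<void()> get_root_acl)
      : get_root_acl_(std::move(get_root_acl)) {}

  bool IsHierarchicalNamespaceEnabled() {
    if (state_ == State::kUnknown) {
      try {
        get_root_acl_();
        state_ = State::kEnabled;
      } catch (const SketchStorageError& e) {
        // Assumption: a 400 response means the account has a flat namespace.
        // Any other failure (auth, network, ...) is propagated to the caller.
        if (e.status_code != 400) throw;
        state_ = State::kDisabled;
      }
    }
    return state_ == State::kEnabled;
  }

 private:
  enum class State { kUnknown, kEnabled, kDisabled };
  std::function<void()> get_root_acl_;
  State state_ = State::kUnknown;
};
```

Rethrowing non-400 errors keeps a permissions or connectivity problem from being silently misread as "flat namespace", and failed probes are not cached, so a later call can retry.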

Related Issues:

Component(s)

C++
