-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-40037: [C++][FS][Azure] Make attempted reads and writes against directories fail fast #40119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40037: [C++][FS][Azure] Make attempted reads and writes against directories fail fast #40119
Conversation
|
@felipecrv I expect you will be interested in reviewing this. |
| DCHECK_GE(content_length_, 0); | ||
| pos_ = content_length_; | ||
| Status Init(const bool truncate, | ||
| std::function<Status()> ensure_not_flat_namespace_directory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can inject AzureFileSystem *azure_file_system here and not have to allocate a closure for this. You would call AzureFileSystem::Impl::EnsureNotFlatNamespaceDirectory(location) via azure_file_system->impl_ (accessible because the handles produced by the azure file system can be friends with the filesystem class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the extra info. I was planning to do this but I was struggling with the friends thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trick is to forward-declare the class as well.
diff --git a/cpp/src/arrow/filesystem/azurefs.h b/cpp/src/arrow/filesystem/azurefs.h
index 2a131e40c..d48ef9dd7 100644
--- a/cpp/src/arrow/filesystem/azurefs.h
+++ b/cpp/src/arrow/filesystem/azurefs.h
@@ -44,6 +44,7 @@ class DataLakeServiceClient;
namespace arrow::fs {
+class ObjectAppendStream;
class TestAzureFileSystem;
/// Options for the AzureFileSystem implementation.
@@ -180,6 +181,7 @@ class ARROW_EXPORT AzureFileSystem : public FileSystem {
explicit AzureFileSystem(std::unique_ptr<Impl>&& impl);
+ friend class ObjectAppendStream;
friend class TestAzureFileSystem;
void ForceCachedHierarchicalNamespaceSupport(int hns_support);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my main problem was that ObjectAppendStream is defined inside an anonymous namespace but I still haven't got it working as you describe.
Are you suggesting to use AzureFileSystem *azure_file_system or AzureFileSystem:Impl *azure_file_system as the argument to ObjectAppendStream::Impl. I don't know how I can get a AzureFileSystem pointer from inside AzureFileSystem::Impl and using AzureFileSystem::Impl as the argument leads to incomplete type errors which I don't think I can avoid.
Also if you wouldn't mind I would be interested to know what the disadvantage of a lambda function is compared to what you proposed.
Sorry about my lacking C++ knowledge here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if you wouldn't mind I would be interested to know what the disadvantage of a lambda function is compared to what you proposed.
To create the std::function, you heap allocate an object with copies of the values in the capture list and generate a lot more extra code in the binary:
class function {
T valuesfromthecpapturelist;
RetType operator()(ArgsType ...) {...};
}
When you think about an std::function this way (a pair of context data and a function), you realize the class you already serves that purpose.
But hey, this is becoming challenging, so I won't hold the PR anymore because of this. Moving to Init() was a big step in the right direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining
| DCHECK_GE(content_length_, 0); | ||
| pos_ = content_length_; | ||
| Status Init(const bool truncate, | ||
| std::function<Status()> ensure_not_flat_namespace_directory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if you wouldn't mind I would be interested to know what the disadvantage of a lambda function is compared to what you proposed.
To create the std::function, you heap allocate an object with copies of the values in the capture list and generate a lot more extra code in the binary:
class function {
T valuesfromthecpapturelist;
RetType operator()(ArgsType ...) {...};
}
When you think about an std::function this way (a pair of context data and a function), you realize the class you already serves that purpose.
But hey, this is becoming challenging, so I won't hold the PR anymore because of this. Moving to Init() was a big step in the right direction.
| TYPED_TEST(TestAzureFileSystemOnAllScenarios, | ||
| OpenOutputStreamWithMissingContainer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is causing the linter to fail @Tom-Newton. Please fix and I will merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
My apologies for the noise. Looking into this now. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8a62f30. There was 1 benchmark result with an error:
There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. |
|
That should be the last one. Sorry again about the noise! |
|
@austin3dickey no worries :) |
Rationale for this change
Prevent confusion if a user attempts to read or write a directory.
What changes are included in this PR?
ObjectAppendStream::Flusha noop ifObjectAppendStream::Inithas not run successfully. This avoids an unhandled error when the destructor calls flush.ObjectInputFileorObjectAppendStream.ObjectAppendStreamcallGetFileInfoif it is a flat namespace account.Are these changes tested?
Add new tests
DisallowReadingOrWritingDirectoryMarkersandDisallowCreatingFileAndDirectoryWithTheSameNameto cover the new fail fast behaviour.Also updated
WriteMetadatato ensure that my changes to Flush didn't break setting metadata without callingWriteon the stream.Are there any user-facing changes?
Yes. Invalid read and write operations will now fail fast and gracefully. Previously could get into a confusing state where there were files and directories at the same path and there were some un-graceful failures.