-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-37626: [Java][Cpp] support using existing java file system #37690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
|
I think what we'd want is to add a FileSystem parameter to FileSystemDatasetFactory::Make or FileSystemFactoryOptions, instead of a global. |
hmm, just the naming, it is a field in FileSystemFactoryOptions and then passed down not a global variable. |
|
Ah, sorry. In that case it should be named something less Java-specific and should probably be typed |
|
|
||
| /// when java context have a file system reference to be used | ||
| // then use this file system reference in the context | ||
| std::shared_ptr<void> file_system_java = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be Java-specific at all.
This should be shared_ptr<FileSystem>.
| std::string internal_path; | ||
| ARROW_ASSIGN_OR_RAISE(std::shared_ptr<fs::FileSystem> filesystem, | ||
| fs::FileSystemFromUri(uri, &internal_path)) | ||
| fs::FileSystemFromUriAndFs(uri, &internal_path, options.file_system_java)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you would do is, if a filesystem is provided, call FileSystem::PathFromUri to set internal_path. If it succeeds, then proceed, else raise an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds correct, will do later
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change (allowing a filesystem to be specified when running discovery) seems good. I agree with @lidavidm that this shouldn't be java-specific.
|
|
|
Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer. |
|
Closing this now that the Java implementation has moved to: https://github.com/apache/arrow-java |
Rationale for this change
when using the dataset api to repeatedly read chunks of file, FileSystem instances are created every time. causing performance degrade.
What changes are included in this PR?
to reuse existing filesystem passed from java context.
Are these changes tested?
tested on my local integration. as there seems no end to end dataset testing in the base
Are there any user-facing changes?
NO