chimeyrock999
Contributor

This PR adds support for hdfs:// URIs in to_remote_storage. Spark DataFrames can now be written to HDFS as Parquet files, and all files under the target directory are listed using Spark’s native HDFS integration.
This change leverages Spark’s built-in Hadoop FileSystem API, so no additional HDFS client is needed.
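
As a rough illustration of the listing step, here is a minimal sketch of how files under a directory can be enumerated through the Hadoop FileSystem API that PySpark exposes via py4j. The function name `list_hdfs_files` and the URI in the comments are hypothetical and not taken from the diff; the sketch also relies on PySpark's private `_jvm` and `_jsc` attributes, which may change between versions.

```python
from typing import Any, List


def list_hdfs_files(spark_session: Any, uri: str) -> List[str]:
    """Return the URIs of the plain files directly under `uri`.

    Hypothetical helper: it reaches the Hadoop FileSystem API through
    PySpark's private `_jvm` and `_jsc` handles (an assumption of this sketch).
    """
    jvm = spark_session._jvm
    hadoop_conf = spark_session._jsc.hadoopConfiguration()
    path = jvm.org.apache.hadoop.fs.Path(uri)
    # Resolve the FileSystem implementation from the URI scheme (e.g. hdfs://).
    fs = path.getFileSystem(hadoop_conf)
    # listStatus is non-recursive; keep files and skip subdirectories.
    return [
        status.getPath().toString()
        for status in fs.listStatus(path)
        if status.isFile()
    ]


# Possible usage with a live session (the URI is a placeholder):
#   df.write.parquet("hdfs://namenode:8020/tmp/staging", mode="overwrite")
#   files = list_hdfs_files(spark, "hdfs://namenode:8020/tmp/staging")
```

Spark typically also writes a `_SUCCESS` marker next to the Parquet part files, so a caller may want to filter the result by the `.parquet` suffix.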

chimeyrock999 requested a review from a team as a code owner on September 26, 2025 03:57
chimeyrock999 force-pushed the feature/spark_hdfs_staging branch from 583275f to 916788e on September 26, 2025 04:17
chimeyrock999 changed the title from "Support hdfs:// URIs in to_remote_storage" to "feat: support hdfs:// uris in to_remote_storage" on Sep 26, 2025
chimeyrock999 changed the title from "feat: support hdfs:// uris in to_remote_storage" to "feat: support hdfs:// uris in to_remote_storage for Spark offline store" on Sep 26, 2025
ntkathole changed the title from "feat: support hdfs:// uris in to_remote_storage for Spark offline store" to "feat: Support hdfs:// uris in to_remote_storage for Spark offline store" on Sep 26, 2025
Collaborator

HaoXuAI commented Sep 26, 2025

Thanks for your contribution! Would be great to add a test

chimeyrock999 force-pushed the feature/spark_hdfs_staging branch from 61fd065 to df17b70 on September 26, 2025 08:59
Contributor Author

chimeyrock999 commented Sep 26, 2025

Just added safety checks for spark_session._jvm and _jsc in _list_hdfs_files to fix mypy union-attr errors, mainly to pass lint.
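
For readers unfamiliar with this kind of lint fix, a guard of this sort might look like the sketch below. `SessionLike` and `require_jvm_handles` are hypothetical names used only for illustration, and the real check in `_list_hdfs_files` may differ. The idea is that an explicit `None` check narrows an `Optional` attribute, so mypy no longer reports `union-attr` when the value is used afterwards.

```python
from typing import Optional, Tuple


class SessionLike:
    """Hypothetical stand-in for a session whose JVM handles may be None."""

    def __init__(self, jvm: Optional[object], jsc: Optional[object]) -> None:
        self._jvm = jvm
        self._jsc = jsc


def require_jvm_handles(session: SessionLike) -> Tuple[object, object]:
    """Return (jvm, jsc), failing fast if either handle is missing."""
    jvm = session._jvm
    jsc = session._jsc
    # After this check, mypy treats both locals as non-None.
    if jvm is None or jsc is None:
        raise RuntimeError("Spark session has no initialized JVM handles")
    return jvm, jsc
```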

Collaborator

HaoXuAI left a comment

LGTM

HaoXuAI merged commit 5e4b9fd into feast-dev:master on Sep 27, 2025
1 of 2 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Sep 30, 2025
# [0.54.0](v0.53.0...v0.54.0) (2025-09-30)

### Bug Fixes

* Column quoting in query of `PostgreSQLOfflineStore.pull_all_from_table_or_query` ([#5621](#5621)) ([e8eae71](e8eae71))
* Correct column list polars materialization engine ([#5595](#5595)) ([39aeb0c](39aeb0c))
* Fix Go feature server entitykey serialization for version 3 ([#5622](#5622)) ([5ab18a6](5ab18a6))
* Fix hostname resolution for spark tests ([#5610](#5610)) ([8f0e22d](8f0e22d))
* Fixed filtering based on data_source for ODFVs ([#5593](#5593)) ([c3e6c56](c3e6c56))
* Fixed project_description to set in registry and UI ([#5602](#5602)) ([02c3006](02c3006))
* Fixed Registry Cache Refresh Issues ([#5604](#5604)) ([3c7a022](3c7a022))
* Fixed tls issue when running both grpc and rest servers ([#5617](#5617)) ([51c16b1](51c16b1))
* Fixed transaction handling with SQLite registry ([#5588](#5588)) ([0052754](0052754))
* Update the deprecated functions in Go feature server. ([#5632](#5632)) ([a24e06e](a24e06e))
* Updated python packages conflicting with kserve dependencies ([#5580](#5580)) ([d56baf4](d56baf4))

### Features

* Add 'featureView' in global search api result for features. ([#5626](#5626)) ([76590bf](76590bf))
* Add aggregation in OnDemandFeatureView ([#5629](#5629)) ([8715ae8](8715ae8))
* Added codeflare-sdk to requirements ([#5640](#5640)) ([51a0ee6](51a0ee6))
* Added RemoteDatasetProxy that executes Ray Data operations remotely ([7128024](7128024))
* Added support for image search ([#5577](#5577)) ([56c5910](56c5910))
* Enable ingestion without event timestamp ([#5625](#5625)) ([eb51f00](eb51f00))
* Feast dataframe phase1 ([#5611](#5611)) ([2ce4198](2ce4198))
* Feast dataframe phase2 ([#5612](#5612)) ([1d08786](1d08786))
* Feast Namespaces registry for client ConfigMaps availability ([#5599](#5599)) ([728589a](728589a))
* Support hdfs:// uris in to_remote_storage for Spark offline store ([#5635](#5635)) ([5e4b9fd](5e4b9fd))