The Ray Data community has been growing rapidly every month. Here is our roadmap for the upcoming quarter.
Feel free to leave requests or questions below, including anything you think is missing from this list.
Read/Write integrations
- Improved Iceberg Support
  - Upsert support:
  - Overwrite support:
  - Leveraging partition/metadata information:
  - Creating, altering, deleting tables
- Bounded Kafka Support:
- write_kafka support:
- Turbopuffer Support
- Land DataSourceV2
- Hive support:
Expressions
- Exposing core expression ops: [Data] - Ray Data Compute Expressions #58674
Query Planning
- Predicate pushdown: [Data] - Add Predicate Pushdown Rule #58150, [Data] - Predicate Pushdown - Push predicate exprs past eligible operators #58555
- Projection pushdown: [data] support shuffle projection pushdown #58151
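Both pushdown items are rewrite rules on the logical plan. As a rough illustration only, here is a toy rule in plain Python that moves a filter below a projection when the predicate reads only columns the projection passes through unchanged, so rows are discarded before any derived columns are computed. This is not Ray Data's optimizer; the `Scan`, `Project`, `Filter`, and `push_down_filters` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Union


@dataclass(frozen=True)
class Scan:
    """Leaf operator that reads rows from a named source (hypothetical)."""
    source: str


@dataclass(frozen=True)
class Project:
    """Copies `passthrough` columns unchanged and derives the `computed` columns."""
    passthrough: FrozenSet[str]
    computed: FrozenSet[str]
    input: "Plan"


@dataclass(frozen=True)
class Filter:
    """Keeps rows where `predicate` is true; `columns` are the columns it reads."""
    columns: FrozenSet[str]
    predicate: Callable[[dict], bool]
    input: "Plan"


Plan = Union[Scan, Project, Filter]


def push_down_filters(plan: Plan) -> Plan:
    """Recursively move each Filter below a Project whenever that is safe."""
    if isinstance(plan, Scan):
        return plan
    child = push_down_filters(plan.input)
    if isinstance(plan, Filter):
        # Safe only if the predicate reads columns the Project copies, not derives.
        if isinstance(child, Project) and plan.columns <= child.passthrough:
            pushed = push_down_filters(Filter(plan.columns, plan.predicate, child.input))
            return Project(child.passthrough, child.computed, pushed)
        return Filter(plan.columns, plan.predicate, child)
    return Project(plan.passthrough, plan.computed, child)
```

A filter on a derived column (here, `c`) stays above the projection, since it cannot be evaluated before that column exists.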
Improved streaming support
- map_groups to support iterator outputs: [data] support generator udf for map_groups #58039
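To illustrate why iterator outputs help streaming: a UDF that yields several smaller batches per group lets downstream consumers start on output before the whole group's result has been built. The sketch below is plain Python, with an in-memory sort standing in for the shuffle. The `map_groups_iter` and `split_group` names are hypothetical, and this is not the signature of Ray's `map_groups`.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def map_groups_iter(
    rows: Iterable[Row],
    key: str,
    fn: Callable[[List[Row]], Iterator[List[Row]]],
) -> Iterator[List[Row]]:
    """Group rows by `key` and stream every batch the generator UDF yields.

    Hypothetical helper: output batches are passed on one at a time instead
    of being concatenated into a single result per group.
    """
    for _, group in groupby(sorted(rows, key=itemgetter(key)), key=itemgetter(key)):
        yield from fn(list(group))


def split_group(group: List[Row], chunk: int = 2) -> Iterator[List[Row]]:
    # Example generator UDF: emit each group in fixed-size chunks.
    for i in range(0, len(group), chunk):
        yield group[i:i + chunk]
```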
Preprocessors
- Transform optimizations
- Migrate Preprocessors to Aggregate V2: [Data] Use Approximate Quantile for RobustScaler Preprocessor #58371
- Simplify API for custom preprocessors
- Save/Load framework for preprocessors: [Data] Add serialization framework for preprocessors #58321
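RobustScaler centers each column on its median and divides by the spread between a low and a high quantile (by default the interquartile range), so fitting it is a quantile aggregation. That is why swapping in an approximate quantile matters at scale. A minimal single-machine sketch using exact quantiles from Python's `statistics` module; the `robust_scale` name is hypothetical and this is not Ray's implementation:

```python
import statistics
from typing import List, Sequence


def robust_scale(values: Sequence[float]) -> List[float]:
    """Scale values as (x - median) / IQR using exact quantiles.

    Hypothetical helper: a distributed version would estimate these
    quantiles with an aggregation instead of sorting the column locally.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Constant-spread column: only center it to avoid dividing by zero.
        return [x - median for x in values]
    return [(x - median) / iqr for x in values]
```

For the values 1 through 9 this gives a median of 5 and an IQR of 4, so 1 maps to -1.0 and 9 maps to 1.0.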
Training Dataloading
- Improve shuffle flexibility (e.g., shuffling metadata at the row-group or row level)
- Collation off the training thread: [Data] Support Exact Batch Size Enforcement in map_batches to Enable Collate Functions in Ray Data Pipeline #58837
- Proper patterns for mixing dataloaders
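Moving collation off the training thread relies on batches having an exact size, even though upstream blocks arrive with arbitrary row counts. A minimal sketch of that re-chunking step, in plain Python over lists rather than Ray blocks; the `rebatch_exact` name and its `drop_last` flag are hypothetical illustrations, not the API added in the linked PR:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def rebatch_exact(
    blocks: Iterable[List[T]], batch_size: int, drop_last: bool = False
) -> Iterator[List[T]]:
    """Re-chunk variable-sized blocks into batches of exactly `batch_size` rows.

    Only the final batch may be smaller; it is dropped when `drop_last` is True.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[T] = []
    for block in blocks:
        buffer.extend(block)
        # Emit full batches as soon as enough rows are buffered.
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer and not drop_last:
        yield buffer
```

For example, blocks of 3, 1, and 5 rows with `batch_size=4` yield batches of 4, 4, and 1 rows, so a collate function downstream sees a fixed shape for every batch except possibly the last.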
Shuffle scalability envelope
We plan to publish a scalability envelope for Ray Data and a roadmap for improving scalability to make Ray Data competitive with other data processing frameworks.
LLM integrations
- Multimodal Data Processors: [data][llm] Introduce generic multimodal preparation #58260
- Stage-based configuration: [data][llm] Ray Data LLM Config Refactor #58298
- Tool-calling: [data][llm] Support tool calling in Ray Data LLM #58091
Preview for 2026
- External Shuffle Service: [Data] Support integration with Apache Celeborn #58687
- Unbounded data source support: [Data] Unbounded Data Sources #40513
- Windowed Expressions
- Integration with Ibis