Release Highlights
- Ray Data: This release features a new Delta Lake and Unity Catalog integration, plus performance improvements to various read/write operators.
- Ray Core: Enhanced GPU object support with intra-process communication and improved Autoscaler v2 functionality.
- Ray Train: Improved hardware metrics integration with Grafana and enhanced collective operations support.
- Ray Serve LLM: This release features an early proof of concept for prefill-decode disaggregated deployment and LLM-aware request routing, such as prefix-cache-aware routing.
- Ray Data LLM: Improved throughput and CPU memory utilization for Ray Data workers.
Ray Libraries
Ray Data
🎉 New Features:
- Add reading from Delta Lake tables and Unity Catalog integration (#53701)
- Added pin_memory support to iter_torch_batches (#53792) (see the sketch after this list)
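For illustration, a minimal sketch of the new pin_memory option; the parameter name comes from the release note above, while the dataset and batch size are placeholders:

```python
import ray

# Placeholder toy dataset; any Ray Dataset works here.
ds = ray.data.range(1_000)

# pin_memory=True (new per #53792) yields batches in pinned (page-locked) host
# memory, which can speed up subsequent host-to-GPU copies.
for batch in ds.iter_torch_batches(batch_size=256, pin_memory=True):
    pass  # e.g. move tensors to a CUDA device and run a training step
```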
💫 Enhancements:
- Re-enabled sorting in Ray Data tests with performance improvements (#54475)
- Enhanced handling of mismatched columns and pandas.NA values (#53861, #53859)
- Improved read_text trailing newline semantics (#53860)
- Optimized backpressure handling with policy-based resource management (#54376)
- Enhanced write_parquet with support for both partition_by and row limits (#53930)
- Prevent filename collisions on write operations (#53890)
- Improved execution performance for One Hot encoding in preprocessors (#54022)
🔨 Fixes:
- Fixed map_groups issues (#54462)
- Prevented Op fusion for streaming repartition to avoid performance degradation (#54469)
- Fixed ActorPool autoscaler scaling up logic (#53983)
- Resolved empty dataset repartitioning issues (#54107)
- Fixed PyArrow overflow handling in data processing (#53971, #54390)
- Fixed IcebergDatasink to properly generate individual file uuids (#52956)
- Avoid OOMs with read_json(..., lines=True) (#54436) (see the sketch after this list)
- Handle HuggingFace parquet dataset resolve URLs (#54146)
- Fixed BlockMetadata derivation for Read operator (#53908)
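A minimal sketch of the call referenced in the OOM fix above; the path is a placeholder, and lines=True follows the wording of #54436:

```python
import ray

# Read newline-delimited JSON (JSONL). With lines=True the input is parsed
# line by line rather than loading whole files at once, which is the scenario
# the OOM fix (#54436) addresses.
ds = ray.data.read_json("s3://example-bucket/logs/*.jsonl", lines=True)
print(ds.schema())
```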
📖 Documentation:
- Updated AggregateFnV2 documentation to clarify finalize method (#53835)
- Improved preprocessor and vectorizer API documentation
Ray Train
🎉 New Features:
- Added broadcast_from_rank_zero and barrier collective operations (#54066) (see the sketch after this list)
- Enhanced hardware metrics integration with Grafana dashboards (#53218)
- Added support for dynamically loading callbacks via environment variables (#54233)
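A minimal sketch of the new collectives inside a training function; the module path follows the ray.train.collective docs added in #54340, while the exact call signatures are assumptions:

```python
import ray.train
import ray.train.collective
import ray.train.torch


def train_func():
    rank = ray.train.get_context().get_world_rank()

    # Rank 0 builds a small Python object; broadcast_from_rank_zero is assumed
    # to return the broadcast value on every rank.
    payload = {"vocab_size": 32_000} if rank == 0 else None
    payload = ray.train.collective.broadcast_from_rank_zero(payload)

    # Block until all workers reach this point before continuing.
    ray.train.collective.barrier()
    # ... run the training loop using `payload` ...


trainer = ray.train.torch.TorchTrainer(
    train_func,
    scaling_config=ray.train.ScalingConfig(num_workers=2),
)
trainer.fit()
```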
💫 Enhancements:
- Improved checkpoint population from before_init_train_context (#54453)
- Enhanced controller state logging and metrics (#52805)
- Added structured logging environment variable support (#52952)
- Improved handling of Noop scaling decisions for smoother scaling logic (#53180)
- Logging of controller state transitions to aid in debugging and analysis (#53344)
🔨 Fixes:
- Fixed GPU tensor reporting in ray.train.report (#53725)
- Enhanced move_tensors_to_device utility for complex tensor structures (#53109)
- Improved worker health check error handling with trace information (#53626)
- Fixed GPU transfer support for non-contiguous tensors (#52548)
- Force abort on SIGINT spam and do not abort finished runs (#54188)
📖 Documentation:
- Updated beginner PyTorch example (#54124)
- Added documentation for ray.train.collective APIs (#54340)
- Added a note about PyTorch DataLoader's multiprocessing and forkserver usage (#52924)
- Fixed various docstring format and indentation issues (#52855, #52878)
- Noted the optional checkpoint_dir_name argument in the ray.train.report API docs (#54391)
🏗 Architecture refactoring:
- Removed subclass relationship between RunConfig and RunConfigV1 (#54293)
- Enhanced error handling for finished training runs (#54188)
- Deduplicated ML doctest runners in CI for efficiency (#53157)
- Converted isort configuration to Ruff for consistency (#52869)
Ray Tune
💫 Enhancements:
- Updated test_train_v2_integration to use the correct RunConfig (#52882)
🔨 Fixes:
- Fixed RayTaskError serialization logic (#54396)
- Improved experiment restore timeout handling (#53387)
📖 Documentation:
- Replaced session.report with tune.report and corrected import paths (#52801) (see the sketch after this list)
- Removed outdated graphics cards reference in docs (#52922)
- Fixed various docstring format issues (#52879)
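For reference, a minimal sketch of the updated reporting call; the objective and search space are placeholders, and the exact tune.report signature may vary by Ray version:

```python
from ray import tune


def objective(config):
    score = config["x"] ** 2
    # Report metrics with tune.report(...) instead of the older session.report(...).
    tune.report({"score": score})


tuner = tune.Tuner(objective, param_space={"x": tune.grid_search([1, 2, 3])})
results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").metrics)
```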
Ray Serve
🎉 New Features:
- Added RouterConfig field to DeploymentConfig for custom RequestRouter configuration (#53870)
- Added support for implementing custom request routing algorithms (#53251)
💫 Enhancements:
- Enhanced FastAPI ingress deployment validation for multiple deployments (#53647)
- Optimized get_live_deployments performance (#54454)
- Progress towards making ray.serve.llm compatible with vLLM serve frontend (#54481, #54443, #54440)
🔨 Fixes:
- Fixed deployment scheduler issues with component scheduling (#54479)
- Fixed runtime_env validation for py_modules (#53186)
- Added descriptive error message when deployment name is not found (#45181)
📖 Documentation:
- Added troubleshooting guide for DeepSeek/multi-node GPU deployment on KubeRay (#54229)
- Updated the guide on serving models with Triton Server in Ray Serve
- Added documentation for custom request routing algorithms (#53511)
🏗 Architecture refactoring:
- Remove indirection layers of node initialization (#54481)
- Incremental refactor of LLMEngine (#54443)
- Remove random v0 logic from serve endpoints (#54440)
- Remove usage of internal_api.memory_summary() (#54417)
- Remove usage of ray._private.state (#54140)
Ray Serve/Data LLM
🎉 New Features:
- Support separate deployment config for PDProxy in PrefixAwareReplicaSet (#53935)
- Support for prefix-aware request router (#52725)
💫 Enhancements:
- Log engine stats after each batch task completes (#54360)
- Decoupled max_tasks_in_flight from max_concurrent_batches (#54362) (see the sketch after this list)
- Make llm serve endpoints compatible with vLLM serve frontend, including streaming, tool_code, and health check support (#54440)
- Remove botocore dependency in Ray Serve LLM (#54156)
- Update vLLM version to 0.9.2 (#54407)
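For context, a minimal Ray Data LLM batch-inference sketch of the pipeline these changes apply to; the model name and parameter values are placeholders, and the exact config fields may differ by version:

```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Illustrative config only: batch_size controls rows per batch task and
# concurrency controls the number of vLLM engine replicas.
config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",
    batch_size=32,
    concurrency=1,
)

processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(max_tokens=64),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray?"}])
ds = processor(ds)
ds.show()
```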
🔨 Fixes:
- Fix health check in prefill disagg (#53937)
- Fixed docs to reflect that only int concurrency is supported (#54196)
- Fix vLLM batch test by changing to Pixtral (#53744)
- Fix pickle error with remote code models in vLLM Ray workloads (#53868)
- Adapted to the change in vllm.PoolingOutput (#54467)
📖 Documentation:
- Fixed the Ray Serve LoRA documentation (#53553)
- Added Ray Serve LLM documentation (#52832)
- Added a doc snippet describing behavioral differences between vLLM and Ray Serve LLM in some APIs, such as streaming, tool_code, and health checks (#54123)
- Added a troubleshooting guide for DeepSeek/multi-node GPU deployment on KubeRay (#54229)
🏗 Architecture refactoring:
- Make llm serve endpoints compatible with vLLM serve frontend, including streaming, tool_code, and health check support (#54490)
- Prefix-aware scheduler [2/N]: configured PrefixAwareReplicaSet to correctly handle the number of available GPUs for each worker and ensure efficient GPU utilization in vLLM (#53192)
- Organized spread-out utils.py code (#53722)
- Removed the ImageRetriever class and related tests from the LLM serving codebase (#54018)
- Return a batch of rows from the UDF instead of processing rows one by one (#54329)
RLlib
🎉 New Features:
- Implemented Offline Policy Evaluation (OPE) via Importance Sampling (#53702)
- Enhanced ConnectorV2 ObservationPreprocessor APIs with multi-agent support (#54209)
- Add GPU inference to offline evaluation (#52718)
💫 Enhancements:
- Enhanced MetricsLogger to handle tensors in state management (#53514)
- Improved env seeding in EnvRunners with deterministic training example rewrite (#54039)
- Cleanup of meta learning classes and examples (#52680)
🔨 Fixes:
- Fixed EnvRunner restoration when no local EnvRunner is available (#54091)
- Fixed shapes in explained_variance for recurrent policies (#54005)
- Resolved device check issues in Learner implementation (#53706)
- Enhanced numerical stability in MeanStdFilter (#53484)
- Fixed weight syncing in offline evaluation (#52757)
- Fixed bug in split_and_zero_pad utility function (#52818)
📖 Documentation:
- Reworked the examples for connector pipelines (#52604)
- Remove "new API stack" banner from all RLlib docs pages as it's now the default (#54282)
Ray Core
🎉 New Features:
- Enhanced GPU object support with intra-process communication (#53798)
- Integrated single-controller collective APIs with GPU objects (#53720)
- Added support for ray.get on driver process for GPU objects (#53902)
- Supported allreduce on a list of input nodes in compiled graphs (#51047)
- Add single-controller API for ray.util.collective and torch gloo backend (#53319)
💫 Enhancements:
- Improved Autoscaler v2 functionality by reusing cloud instance IDs (#54397)
- Enhanced the cluster task manager with better resource management, using std::move in its constructor (#54413)
- Upgraded OpenTelemetry SDK for better observability (#53745)
- Improved actor scheduling to prevent deadlocks in ordered actors (#54034)
- Enhanced get_max_resources_from_cluster_config functionality (#54455)
- Improve status messages and add comments about stale seq_no handling (#54470)
- uv run integration is now enabled by default (#53060)
🔨 Fixes:
- Fixed race conditions in object eviction and repinning for recovery (#53934)
- Resolved GCS crash issues on duplicate MarkJobFinished RPCs (#53951)
- Enhanced actor restart handling on node failures (#54088)
- Improved reference counting during worker graceful shutdown (#53002)
- Fix race condition when canceling task that hasn't started yet (#52703)
- Fixed an issue where a valid RestartActor RPC was ignored (#53330)
- Fixed "Check failed: it->second.num_retries_left == -1" error (#54116)
- Fix detached actor being unexpectedly killed (#53562)
📖 Documentation:
- Enhanced troubleshooting guides and API documentation
- Updated reStructuredText formatting on Resources page (#53882)
- Fixed code snippets so they work as written (#52748)
- Add doc for running KubeRay dashboard (#53830)
- Add antipattern for nested ray.get (#43184)
🏗 Architecture refactoring:
- Delete old skipped tests and unused code (#54427)
- Consolidate TaskManager interface (#54317)
- Move dependencies of NodeManager to main.cc for better testability (#53782)
- Use smart pointer in logging.cc (#54351)
- Delete event_label and unused environment variables (#54378, #54095)
- Remove actor task path in normal task submitter (#53996)
- Rename GcsFunctionManager and use fake in test (#53973)
Dashboard
🎉 New Features:
- Add dynolog for on-demand GPU profiling for Torch training (#53191)
💫 Enhancements:
- Added TPU usage metrics to reporter agent (#53678)
- Added configurability of the 'orgId' parameter for requesting Grafana dashboards (#53236)
🔨 Fixes:
- Fixed Grafana dashboard dropdowns for data and train dashboards (#52752)
- Resolved daylight savings time issues in dashboard (#52755)
- Fixed retrieving the IP address from the GPUProfilingManager on the dashboard agent (#53807)
Docs
🎉 New Features:
- New end-to-end examples
Breaking Changes
- Removed deprecated ray.workflow package (#53612)
- Removed deprecated storage parameter from ray.init (#53669)
- Removed deprecated ray start CLI options (#53675)
- Removed experimental "array" library (#54105)
- Removed dask from the byod 3.9 dependencies (#54521)
Dependencies & Build
- Added uv binary v0.7.19 for improved package management (#54437)
- Upgraded datasets in release tests (#54425)
- Enhanced wheel building process with single bazel call optimization (#54476)
- Fixed uv run parser for handling extra arguments (#54488)
- Upgraded h11, requests, starlette, jinja2, pyopenssl, and cryptography
- Generate multi-arch image indexes (#52816)
Thanks!
Thank you to everyone who contributed to this release!
@kouroshHakha, @davidwagnerkc, @MengjinYan, @minerharry, @simonsays1980, @Myasuka, @noemotiovon, @goutamvenkat-anyscale, @harshit-anyscale, @jugalshah291, @tianyi-ge, @sven1977, @crypdick, @JohnsonKuan, @lk-chen, @richardsliu, @alexeykudinkin, @EagleLo, @soffer-anyscale, @zcin, @AdrienVannson, @nilsmelchert, @raulchen, @jujipotle, @DrehanM, @vigneshka, @Ziy1-Tan, @Blaze-DSP, @ArthurBook, @GokuMohandas, @walkoss, @bveeramani, @edoakes, @omatthew98, @SeanQuant, @CheyuWu, @cszhu, @win5923, @kevin85421, @angelinalg, @iamjustinhsu, @eicherseiji, @kunling-anyscale, @vickytsang, @MortalHappiness, @aslonnie, @psr-ai, @sbhat98, @anyadontfly, @marwan116, @cristianjd, @2niuhe, @codope, @fscnick, @ryanaoleary, @srinathk10, @TimothySeah, @han-steve, @Future-Outlier, @Syulin7, @Qiaolin-Yu, @elliot-barn, @JoshKarpel, @dayshah, @can-anyscale, @ok-scale, @mattip, @SolitaryThinker, @owenowenisme, @nehiljain, @GeneDer, @rnkrtt, @israbbani, @DriverSong, @sinalallsite, @pcmoritz, @akyang-anyscale, @xinyuangui2, @nrghosh, @davidxia, @rueian, @stephanie-wang, @jjyao, @chris-ray-zhang, @czgdp1807, @justinvyu, @Daraan, @landscapepainter, @troychiu, @khluu, @hipudding, @ruisearch42, @robertnishihara, @ArturNiederfahrenhorst, @abrarsheikh, @alanwguo, @HollowMan6, @ran1995data, @matthewdeng