Releases: ray-project/ray
Ray-2.49.2
There is no difference between 2.49.2 and 2.49.1, though we needed a patch version for other out of band reasons. To fill the awkward blankness, here is a haiku about Ray:
Summit drawing near
Ray advances, step by step
Scaling without end
Ray-2.49.1
Ray-2.49.0
Release Highlights
Ray Data:
- We’ve implemented a variety of performance enhancements, including improved actor/node autoscaling with budget-aware decisions; faster/more accurate shuffle accounting; reduced Parquet metadata footprint; and out-of-order execution for higher throughput.
- We’ve also added anti/semi joins, a stratified train_test_split, and Snowflake connectors.
Ray Core:
- Performance and robustness cleanups around the GCS publish path and raylet internals; simpler OpenTelemetry flagging; a new user-facing API to wait for GPU tensors to be freed; plus assorted test and infra tidy-ups.
Ray Train:
- We’ve introduced a new JaxTrainer with SPMD support for TPUs.
Ray Serve:
- Custom Autoscaling per Deployment: Serve now supports user-defined autoscaling policies via AutoscalingContext and AutoscalingPolicy, enabling fine-grained scaling logic at the deployment level. This is part of a larger effort to support autoscaling based on custom metrics in Serve; see this RFC for more details.
- Async Inference (Initial Support): Ray Serve introduces asynchronous inference execution, laying the foundation for better throughput and latency in async workloads. Please see this RFC for more details.
- Major Performance Gains: This version of Ray Serve brings double-digit percentage improvements in both throughput and latency. See the detailed notes below for more details.
Ray Serve/Data LLM:
- We’ve refactored Ray Serve LLM to be fully compatible with the default vllm serve frontend, and it now supports vLLM 0.10.
- We’ve added a prefix cache-aware router with PrefixCacheAffinityRouter for optimized cache utilization; dynamic cache management via reset prefix cache remote methods; enhanced LMCacheConnectorV1 with kv_transfer_config support.
Ray Libraries
Ray Data
🎉 New Features:
- Wrapped batch indices in a BatchMetadata object to make per-batch metadata explicit. (#55643)
- Added support for Anti/Semi Join types. (#55272)
- Introduced an Issue Detection Framework. (#55155)
- Added an option to enable out-of-order execution for better performance. (#54504)
- Introduced a StreamingSplit logical operator for DAG rewrite. (#54994)
- Added a stratify parameter to train_test_split (see the sketch after this list). (#54624)
- Added Snowflake connectors. (#51429)
- Updated Hudi integration to support incremental query. (#54301)
- Added an Actor location tracker. (#54590)
- Added BundleQueue.has_next. (#54710)
- Made DEFAULT_OBJECT_STORE_MEMORY_LIMIT_FRACTION configurable. (#54873)
- Added Expression support & a with_columns API. (#54322)
- Allocate GPU resources in ResourceManager. (#54445)
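As a quick illustration of the stratified split added above, here is a minimal sketch; passing a column name to the stratify argument is an assumption based on the feature description, so check the train_test_split API reference.

```python
import ray

# Toy dataset with an imbalanced binary label.
ds = ray.data.from_items(
    [{"feature": i, "label": int(i % 10 == 0)} for i in range(1_000)]
)

# Stratify by the "label" column so both splits keep the original class
# balance (the column-name form of `stratify` is an assumption here).
train_ds, test_ds = ds.train_test_split(test_size=0.2, stratify="label")
print(train_ds.count(), test_ds.count())
```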
💫 Enhancements:
- Decoupled actor and node autoscaling; autoscaling now also considers budget. (#55673, #54902)
- Faster hash-shuffle resource usage calculation; more accurate shuffle progress totals. (#55503, #55543)
- Reduced Parquet metadata storage usage. (#54821)
- Export API improvements: refresh dataset/operator state, sanitize metadata, and truncate exported metadata. (#55355, #55379, #55216, #54623)
- Metrics & observability: task metric improvements, external-buffer block-count metric, row-based metrics, clearer operator names in logs, single debug log when aggregators are ready. (#55429, #55022, #54693, #52949, #54483)
- Dashboard: added “Max Bytes to Read” panel/budget, panels for blocks-per-task and bytes-per-block, and streaming executor duration. (#55024, #55020, #54614)
- Planner/execution & infra cleanups: ExecutionResources and StatsManager cleanup, planner interface refactor, node trackers init, removed ray.get in _MapWorker ctor, removed target_shuffle_max_block_size. (#54694, #55400, #55018, #54665, #54734, #55158)
- Behavior/interop tweaks: map_batches defaults to row_modification=False and avoids pushing past limit; limited operator pushdown; prefetch for PandasJSONDatasource; use cloudpickle for Arrow tensor extension ser/des; bumped Arrow to 21.0; schema warning tone change. (#54992, #54457, #54667, #54831, #55426, #54630)
- Removed randomize-blocks reorder rule for more stable behavior. (#55278)
🔨 Fixes:
- AutoscalingActorPool now properly downscales after execution. (#55565)
- StatsManager handles StatsActor loss on disconnect. (#55163)
- Handle missing chunks key when Databricks UC query returns zero rows. (#54526)
- Handle empty fragments in sampling when num_row_groups=0. (#54822)
- Restored handling of PyExtensionType to keep compatibility with previously written datasets. (#55498)
- Prevent negative resource budget when concurrency exceeds the global limit; fixed resource-manager log calculation. (#54986, #54878)
- Default write_parquet warning removed; handled unhashable types in OneHotEncoding. (#54864, #54863)
- Overwrite mode now maps to the correct Arrow behavior for parallel writes. (#55118)
- Added back from_daft Arrow-version checks. (#54907)
- Pandas chained in-place assignment warning resolved. (#54486)
- Test stability/infra: fixed flaky tests, adjusted bounds and sizes, added additional release tests/chaos variants for image workloads, increased join test size, adjusted sorting release test to produce 1 GB blocks. (#55485, #55489, #54806, #55120, #54716, #55402, #54971)
📖 Documentation:
- Added a user guide for aggregations. (#53568)
- Added a code snippet in docs for partitioned writes. (#55002)
- Updated links to Lance documentation. (#54836)
Ray Train
🎉 New Features:
- Introduced JaxTrainer with SPMD support on TPUs (#55207)
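A hypothetical sketch of how the new JaxTrainer might be wired up, assuming it follows the same train-function-plus-ScalingConfig pattern as the other Train v2 trainers; the import path and the TPU-related resource key below are assumptions, not confirmed API.

```python
import jax
import ray.train
from ray.train import ScalingConfig
from ray.train.v2.jax import JaxTrainer  # import path assumed

def train_func():
    # Each Train worker runs this SPMD body over its locally visible TPU chips.
    n_devices = jax.local_device_count()
    ray.train.report({"local_devices": n_devices})

trainer = JaxTrainer(
    train_func,
    scaling_config=ScalingConfig(
        num_workers=4,
        # Resource key assumed; adjust to however your TPU slices are labeled.
        resources_per_worker={"TPU": 4},
    ),
)
result = trainer.fit()
```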
💫 Enhancements:
- ray.train.get_dataset_shard now lazily configures dataset sharding for better startup behavior (#55230)
- Clearer worker error logging (#55222)
- Fail fast when placement group requirements can never be satisfied (#54402)
- New ControllerError surfaced and handled via failure policy for improved resiliency (#54801, #54833)
- TrainStateActor periodically checks controller health and aborts when necessary (#53818)
🔨 Fixes:
- Resolve circular import in ray.train.v2.lightning.lightning_utils (#55668)
- Fix XGBoost v2 callback behavior (#54787)
- Suppress a spurious type error (#50994)
- Reduce test flakiness: remove randomness and bump a data-integration test size (#55315, #55633)
📖 Documentation:
- New LightGBMTrainer user guide (#54492)
- Fix code-snippet syntax highlighting (#54909)
- Minor correction in experiment-tracking guide comment (#54605)
🏗 Architecture refactoring:
- Public Train APIs routed through TrainFnUtils for consistency (#55226)
- LoggingManager utility for Train logging (#55121)
- Convert DEFAULT variables from strings to bools (#55581)
Ray Tune
🎉 New Features:
- Add video FPS support to WandbLoggerCallback (#53638)
💫 Enhancements:
- Typing: reset_config now explicitly returns bool (#54581)
- CheckpointManager supports recording scoring metric only (#54642)
🔨 Fixes:
📖 Documentation:
Ray Serve
🎉 New Features:
- Async inference support in Ray Serve (initial phase). Provides basic asynchronous inference execution, with follow-up work planned for failed/unprocessed queues and additional tests. #54824
- Per-deployment custom autoscaling controls. Introduces AutoscalingContext and AutoscalingPolicy classes, enabling user-defined autoscaling strategies at the deployment level (see the sketch after this list). #55253
- Same event loop router. Adds option to run the Serve router in the same event loop as the proxy, yielding ~17% throughput improvement. #55030
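For the per-deployment custom autoscaling above, here is a deliberately hypothetical sketch of what a user-defined policy could look like; the AutoscalingContext fields and the way the policy is attached to the deployment are assumptions based on the feature description and RFC, not confirmed API.

```python
from ray import serve

def queue_based_policy(ctx) -> int:
    """Hypothetical policy: roughly one replica per 10 in-flight requests.

    `ctx` stands in for the new AutoscalingContext; the attribute names
    used here (total_requests, min_replicas, max_replicas) are assumptions.
    """
    target = max(ctx.min_replicas, ctx.total_requests // 10)
    return min(target, ctx.max_replicas)

@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 8,
        # Attaching the policy through the autoscaling config is an assumption.
        "policy": queue_based_policy,
    }
)
class Model:
    async def __call__(self, request) -> str:
        return "ok"

app = Model.bind()
```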
💫 Enhancements:
- Async get_current_servable_instance(). Converts the FastAPI dependency to async def, removing threadpool overhead and boosting performance: 35% higher RPS and reduced latency. #55457
- Access log optimization. Cached contexts in request path logging improved request throughput by ~16% with lower average latency. #55166
- Batching improvements. Default batch wait timeout increased from 0.0s to 0.01s (10ms) to enable meaningful batching (see the sketch after this list). #55126
- HTTP receive refactor. Cleaned up handling of replica-side HTTP receive tasks. #54543 / #54565
- Configurable replica router backoff. Added knobs for retry/backoff control when routing to replicas. #54723
- Autoscaling ergonomics. Marked per-deployment autoscaling metrics push interval config as deprecated for consistency. #55102
- Health check & env var safety. Introduced warnings for invalid/zero/negative environment variable values, with migration path planned for Ray 2.50.0. #55464, #54944
- Improved CLI UX. serve config now prints "No configuration was found." instead of an empty string. #54767
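To make the batching change from the enhancements list concrete, here is a minimal sketch using the existing @serve.batch decorator with the timeout pinned explicitly to the new 10 ms default rather than relying on it.

```python
from ray import serve

@serve.deployment
class BatchedModel:
    # Pin the batch window explicitly; 0.01 s is the new default in 2.49.
    @serve.batch(max_batch_size=32, batch_wait_timeout_s=0.01)
    async def handle_batch(self, inputs: list) -> list:
        # Called once per accumulated batch; return one output per input.
        return [x * 2 for x in inputs]

    async def __call__(self, request) -> int:
        return await self.handle_batch(2)

app = BatchedModel.bind()
```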
🔨 Fixes:
Ray-2.48.0
Release Highlights
- Ray Data: This release features a new Delta Lake and Unity Catalog integration and performance improvements to various reading/writing operators.
- Ray Core: Enhanced GPU object support with intra-process communication and improved Autoscaler v2 functionality
- Ray Train: Improved hardware metrics integration with Grafana and enhanced collective operations support
- Ray Serve LLM: This release features an early proof of concept for prefill-decode disaggregated deployment and LLM-aware request routing, such as prefix-cache-aware routing.
- Ray Data LLM: Improved throughput and CPU memory utilization for Ray Data workers.
Ray Libraries
Ray Data
🎉 New Features:
- Add reading from Delta Lake tables and Unity Catalog integration (#53701)
- Add pin_memory to iter_torch_batches (#53792)
💫 Enhancements:
- Re-enabled sorting in Ray Data tests with performance improvements (#54475)
- Enhanced handling of mismatched columns and pandas.NA values (#53861, #53859)
- Improved read_text trailing newline semantics (#53860)
- Optimized backpressure handling with policy-based resource management (#54376)
- Enhanced write_parquet with support for both partition_by and row limits (see the sketch after this list) (#53930)
- Prevent filename collisions on write operations (#53890)
- Improved execution performance for One Hot encoding in preprocessors (#54022)
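For the write_parquet enhancement above, here is a minimal sketch; the keyword names partition_cols and min_rows_per_file are assumptions based on the feature description, so verify them against the write_parquet reference.

```python
import ray

ds = ray.data.from_items(
    [{"country": "US" if i % 2 else "CA", "amount": float(i)} for i in range(1_000)]
)

# Partition the output by a column while also bounding rows per file
# (keyword names are assumptions based on the feature description).
ds.write_parquet(
    "/tmp/partitioned_sales",
    partition_cols=["country"],
    min_rows_per_file=250,
)
```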
🔨 Fixes:
- Fixed map_groups issues (#54462)
- Prevented Op fusion for streaming repartition to avoid performance degradation (#54469)
- Fixed ActorPool autoscaler scaling up logic (#53983)
- Resolved empty dataset repartitioning issues (#54107)
- Fixed PyArrow overflow handling in data processing (#53971, #54390)
- Fixed IcebergDatasink to properly generate individual file uuids (#52956)
- Avoid OOMs with read_json(..., lines=True) (#54436)
- Handle HuggingFace parquet dataset resolve URLs (#54146)
- Fixed BlockMetadata derivation for Read operator (#53908)
📖 Documentation:
- Updated AggregateFnV2 documentation to clarify finalize method (#53835)
- Improved preprocessor and vectorizer API documentation
Ray Train
🎉 New Features:
- Added broadcast_from_rank_zero and barrier collective operations (see the sketch after this list) (#54066)
- Enhanced hardware metrics integration with Grafana dashboards (#53218)
- Added support for dynamically loading callbacks via environment variables (#54233)
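A hedged sketch of using the new collectives from inside a training function; the import path follows the ray.train.collective docs entry below, but the exact call signatures are assumptions, so check the API reference.

```python
import ray.train
from ray.train.collective import barrier, broadcast_from_rank_zero  # path per docs; signatures assumed

def train_func():
    ctx = ray.train.get_context()

    # Rank 0 picks a value (e.g. a run ID) and shares it with all workers.
    run_id = "exp-001" if ctx.get_world_rank() == 0 else None
    run_id = broadcast_from_rank_zero(run_id)

    # Block until every worker reaches this point before training starts.
    barrier()
    ray.train.report({"world_rank": ctx.get_world_rank()})

# Pass `train_func` to a trainer (e.g. TorchTrainer) as usual.
```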
💫 Enhancements:
- Improved checkpoint population from before_init_train_context (#54453)
- Enhanced controller state logging and metrics (#52805)
- Added structured logging environment variable support (#52952)
- Improved handling of Noop scaling decisions for smoother scaling logic (#53180)
- Logging of controller state transitions to aid in debugging and analysis (#53344)
🔨 Fixes:
- Fixed GPU tensor reporting in ray.train.report (#53725)
- Enhanced move_tensors_to_device utility for complex tensor structures (#53109)
- Improved worker health check error handling with trace information (#53626)
- Fixed GPU transfer support for non-contiguous tensors (#52548)
- Force abort on SIGINT spam and do not abort finished runs (#54188)
📖 Documentation:
- Updated beginner PyTorch example (#54124)
- Added documentation for ray.train.collective APIs (#54340)
- Added a note about PyTorch DataLoader's multiprocessing and forkserver usage (#52924)
- Fixed various docstring format and indentation issues (#52855, #52878)
- Added note that ray.train.report API docs should mention optional checkpoint_dir_name (#54391)
🏗 Architecture refactoring:
- Removed subclass relationship between RunConfig and RunConfigV1 (#54293)
- Enhanced error handling for finished training runs (#54188)
- Deduplicated ML doctest runners in CI for efficiency (#53157)
- Converted isort configuration to Ruff for consistency (#52869)
Ray Tune
💫 Enhancements:
- Updated test_train_v2_integration to use the correct RunConfig (#52882)
🔨 Fixes:
- Fixed RayTaskError serialization logic (#54396)
- Improved experiment restore timeout handling (#53387)
📖 Documentation:
- Replaced session.report with tune.report and corrected import paths (#52801)
- Removed outdated graphics cards reference in docs (#52922)
- Fixed various docstring format issues (#52879)
Ray Serve
🎉 New Features:
- Added RouterConfig field to DeploymentConfig for custom RequestRouter configuration (#53870)
- Added support for implementing custom request routing algorithms (#53251)
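A deliberately hypothetical sketch of the custom request routing hook; the router interface, method signature, and the way it is referenced from the new RouterConfig field are all assumptions based on the two items above, so treat this as pseudo-structure rather than the real interface.

```python
import random
from ray import serve

class LeastLoadedRouter:  # interface/base-class name assumed
    """Hypothetical router: pick randomly among the two least-loaded replicas."""

    def choose_replicas(self, pending_request, candidate_replicas):
        # Attribute names on the replica objects are assumptions.
        ranked = sorted(candidate_replicas, key=lambda r: r.num_ongoing_requests)
        return [random.choice(ranked[:2])]

@serve.deployment(
    # Wiring the router in through the deployment config is an assumption;
    # see the RouterConfig field mentioned above (#53870).
    request_router_config={"request_router_class": LeastLoadedRouter},
)
class Model:
    async def __call__(self, request) -> str:
        return "ok"

app = Model.bind()
```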
💫 Enhancements:
- Enhanced FastAPI ingress deployment validation for multiple deployments (#53647)
- Optimized get_live_deployments performance (#54454)
- Progress towards making ray.serve.llm compatible with vLLM serve frontend (#54481, #54443, #54440)
🔨 Fixes:
- Fixed deployment scheduler issues with component scheduling (#54479)
- Fixed runtime_env validation for py_modules (#53186)
- Added descriptive error message when deployment name is not found (#45181)
📖 Documentation:
- Added troubleshooting guide for DeepSeek/multi-node GPU deployment on KubeRay (#54229)
- Updated the guide on serving models with Triton Server in Ray Serve
- Added documentation for custom request routing algorithms (#53511)
🏗 Architecture refactoring:
- Remove indirection layers of node initialization (#54481)
- Incremental refactor of LLMEngine (#54443)
- Remove random v0 logic from serve endpoints (#54440)
- Remove usage of internal_api.memory_summary() (#54417)
- Remove usage of ray._private.state (#54140)
Ray Serve/Data LLM
🎉 New Features
- Support separate deployment config for PDProxy in PrefixAwareReplicaSet (#53935)
- Support for prefix-aware request router (#52725)
💫 Enhancements
- Log engine stats after each batch task is done. (#54360)
- Decouple max_tasks_in_flight from max_concurrent_batches (#54362)
- Make llm serve endpoints compatible with vLLM serve frontend, including streaming, tool_code, and health check support (#54440)
- Remove botocore dependency in Ray Serve LLM (#54156)
- Update vLLM version to 0.9.2 (#54407)
🔨 Fixes
- Fix health check in prefill disagg (#53937)
- Fix doc to only support int concurrency (#54196)
- Fix vLLM batch test by changing to Pixtral (#53744)
- Fix pickle error with remote code models in vLLM Ray workloads (#53868)
- Adapted to the change in vllm.PoolingOutput (#54467)
📖 Documentation
- Ray serve/lora doc fix (#53553)
- Add Ray serve/LLM doc (#52832)
- Add a doc snippet to inform users about existing diffs between vLLM and Ray Serve LLM behavior in some APIs like streaming, tool_code, and health check (#54123)
- Troubleshooting DeepSeek/multi-node GPU deployment on KubeRay (#54229)
🏗 Architecture refactoring
- Make llm serve endpoints compatible with vLLM serve frontend, including streaming, tool_code, and health check support (#54490)
- Prefix-aware scheduler [2/N]: Configure PrefixAwareReplicaSet to correctly handle the number of available GPUs for each worker and to ensure efficient GPU utilization in vLLM (#53192)
- Organize spread out utils.py (#53722)
- Remove ImageRetriever class and related tests from the LLM serving codebase. (#54018)
- Return a batch of rows in the udf instead of row by row (#54329)
RLlib
🎉 New Features:
- Implemented Offline Policy Evaluation (OPE) via Importance Sampling (#53702)
- Enhanced ConnectorV2 ObservationPreprocessor APIs with multi-agent support (#54209)
- Add GPU inference to offline evaluation (#52718)
💫 Enhancements:
- Enhanced MetricsLogger to handle tensors in state management (#53514)
- Improved env seeding in EnvRunners with deterministic training example rewrite (#54039)
- Cleanup of meta learning classes and examples (#52680)
🔨 Fixes:
- Fixed EnvRunner restoration when no local EnvRunner is available (#54091)
- Fixed shapes in explained_variance for recurrent policies (#54005)
- Resolved device check issues in Learner implementation (#53706)
- Enhanced numerical stability in MeanStdFilter (#53484)
- Fixed weight synching in offline evaluation (#52757)
- Fixed bug in split_and_zero_pad utility function (#52818)
📖 Documentation:
- Do-over of examples for connector pipelines (#52604)
- Remove "new API stack" banner from all RLlib docs pages as it's now the default (#54282)
Ray Core
🎉 New Features:
- Enhanced GPU object support with intra-process communication (#53798)
- Integrated single-controller collective APIs with GPU objects (#53720)
- Added support for ray.get on driver process for GPU objects (#53902)
- Supporting allreduce on list of input nodes in compiled graphs (#51047)
- Add single-controller API for ray.util.collective and torch gloo backend (#53319)
💫 Enhancements:
- Improved autoscaler v2 functionality with cloud instance ID reusing (#54397)
- Enhanced cluster task manager with better resource management (#54413)
- Upgraded OpenTelemetry SDK for better observability (#53745)
- Improved actor scheduling to prevent deadlocks in ordered actors (#54034)
- Enhanced get_max_resources_from_cluster_config functionality (#54455)
- Use std::move in cluster task manager constructor (#54413)
- Improve status messages and add comments about stale seq_no handling (#54470)
- uv run integration is now enabled by default (#53060)
🔨 Fixes:
- Fixed race conditions in object eviction and repinning for recovery (#53934)
- Resolved GCS crash issues on duplicate MarkJobFinished RPCs (#53951)
- Enhanced actor restart handling on node failures (#54088)
- Improved reference counting during worker graceful shutdown (#53002)
- Fix race condition when canceling task that hasn't started yet (#52703)
- Fix the issue where a valid RestartActor rpc is ignored (#53330)
- Fixed "Check failed: it->second.num_retries_left == -1" error (#54116)
- Fix detached actor bei...
Ray-2.47.1
Ray 2.47.1 fixed an issue where Ray failed to start on Mac (#53807)
Ray-2.47.0
Release Highlights
- Initial support for prefill disaggregation has landed in Ray Serve LLM (#53092). This is critical for production LLM serving use cases.
- Ray Data features a variety of performance improvements (locality-based scheduling, non-blocking execution) as well as improvements to observability, preprocessors, and other stability fixes.
- Ray Serve now features custom request routing algorithms, which is critical for high throughput traffic for large model use cases.
Ray Libraries
Ray Data
🎉 New Features:
- Add save modes support to file data sinks (#52900)
- Added flattening capability to the Concatenator preprocessor to support output vectorization use cases (#53378)
💫 Enhancements:
- Re-enable Actor locality-based scheduling. This PR also improves algorithms for ranking the locations for the bundle. (#52861)
- Disable blocking pipeline by default until Actor Pool fully scales up to min actors (#52754)
- Progress bar and dashboard improvements to show the names of partial functions properly (#52280)
🔨 Fixes:
- Make Ray Data from_torch respect Dataset len (#52804)
- Fixing flaky aggregation test (#53383)
- Fix race condition bug in fault tolerance by disabling on_exit hook (#53249)
- Fix move_tensors_to_device utility for the list/tuple[tensor] case (#53109)
- Fix ActorPool scaling to avoid scaling down when the input queue is empty (#53009)
- Fix internal queues accounting for all Operators w/ an internal queue (#52806)
- Fix backpressure for FileBasedDatasource. This fixes potential OOMs for workloads using FileBasedDatasource (#52852)
📖 Documentation:
- Fix working code snippets (#52748)
- Improve AggregateFnV2 docstrings and examples (#52911)
- Improved documentation for vectorizers and API visibility in Data (#52456)
Ray Train
🎉 New Features:
- Added support for configuring Ray Train worker actor runtime environments. (#52421)
- Included Grafana panel data in Ray Train export for improved monitoring. (#53072)
- Introduced a structured logging environment variable to standardize log formats. (#52952)
- Added metrics for TrainControllerState to enhance observability. (#52805)
💫 Enhancements:
- Logging of controller state transitions to aid in debugging and analysis. (#53344)
- Improved handling of Noop scaling decisions for smoother scaling logic. (#53180)
🔨 Fixes:
- Improved move_tensors_to_device utility to correctly handle list/tuple of tensors. (#53109)
- Fixed GPU transfer support for non-contiguous tensors. (#52548)
- Increased timeout in test_torch_device_manager to reduce flakiness. (#52917)
📖 Documentation:
- Added a note about PyTorch DataLoader’s multiprocessing and forkserver usage. (#52924)
- Fixed various docstring format and indentation issues. (#52855, #52878)
- Removed unused "configuration-overview" documentation page. (#52912)
- General typo corrections. (#53048)
🏗 Architecture refactoring:
- Deduplicated ML doctest runners in CI for efficiency. (#53157)
- Converted isort configuration to Ruff for consistency. (#52869)
- Removed unused PARALLEL_CI blocks and combined imports. (#53087, #52742)
Ray Tune
💫 Enhancements:
- Updated test_train_v2_integration to use the correct RunConfig. (#52882)
📖 Documentation:
- Replaced session.report with tune.report and corrected import paths. (#52801)
- Removed outdated graphics cards reference in docs. (#52922)
- Fixed various docstring format issues. (#52879)
Ray Serve
🎉 New Features:
- Added support for implementing custom request routing algorithms. (#53251)
- Introduced an environment variable to prioritize custom resources during deployment scheduling. (#51978)
💫 Enhancements:
- The ingress API now accepts a builder function in addition to an ASGI app object. (#52892)
🔨 Fixes:
- Fixed runtime_env validation for py_modules. (#53186)
- Disallowed special characters in Serve deployment and application names. (#52702)
- Added a descriptive error message when a deployment name is not found. (#45181)
📖 Documentation:
- Updated the guide on serving models with Triton Server in Ray Serve.
- Added documentation for custom request routing algorithms.
Ray Serve/Data LLM
🎉 New Features:
- Added initial support for prefill decode disaggregation (#53092)
- Expose vLLM Metrics to serve.llm API (#52719)
- Embedding API (#52229)
💫 Enhancements:
- Allow setting name_prefix in build_llm_deployment (#53316)
- Minor bug fix for #53144: stop tokens cannot be null (#53288)
- Add missing repetition_penalty vLLM sampling parameter (#53222)
- Mitigate the serve.llm streaming overhead by properly batching stream chunks (#52766)
- Fix test_batch_vllm leaking resources by using larger wait_for_min_actors_s
🔨 Fixes:
- LLMRouter.check_health() should check LLMServer.check_health() (#53358)
- Fix runtime passthrough and auto-executor class selection (#53253)
- Update check_health return type (#53114)
- Bug fix for duplication of <bos> token (#52853)
- In stream batching, first part of the stream was always consumed and not streamed back from the router (#52848)
RLlib
🎉 New Features:
- Add GPU inference to offline evaluation. (#52718)
💫 Enhancements:
- Do-over of examples for connector pipelines. (#52604)
- Cleanup of meta learning classes and examples. (#52680)
🔨 Fixes:
- Fixed weight synching in offline evaluation. (#52757)
- Fixed bug in split_and_zero_pad utility function (related to complex structures vs simple values or np.arrays). (#52818)
Ray Core
💫 Enhancements:
- uv run integration is now enabled by default, so you don't need to set the RAY_RUNTIME_ENV_HOOK any more (#53060). If you rely on the previous behavior where uv run only runs the Ray driver but not the workers in the uv environment, you can switch back to the old behavior by setting the RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 environment variable.
- Record gcs process metrics (#53171)
🔨 Fixes:
- Improvements for using RuntimeEnv in the Job Submission API. (#52704)
- Close unused pipe file descriptor of child processes of Raylet (#52700)
- Fix race condition when canceling task that hasn't started yet (#52703)
- Implement a thread pool and call the CPython API on all threads within the same concurrency group (#52575)
- cgraph: Fix execution schedules with collective operations (#53007)
- cgraph: Fix scalar tensor serialization edge case with serialize_to_numpy_or_scalar (#53160)
- Fix the issue where a valid RestartActor rpc is ignored (#53330)
- Fix reference counter crashes during worker graceful shutdown (#53002)
Dashboard
🎉 New Features:
- train: Add dynolog for on-demand GPU profiling for Torch training (#53191)
💫 Enhancements:
- Add configurability of 'orgId' param for requesting Grafana dashboards (#53236)
🔨 Fixes:
- Fix Grafana dashboards dropdowns for data and train dashboard (#52752)
- Fix dashboard for daylight savings (#52755)
Ray Container Images
💫 Enhancements:
- Upgrade h11 (#53361), requests, starlette, jinja2 (#52951), pyopenssl and cryptography (#52941)
- Generate multi-arch image indexes (#52816)
Docs
🎉 New Features:
- End-to-end example: Entity recognition with LLMs (#52342)
- End-to-end example: xgboost tutorial (#52383)
- End-to-end tutorial for audio transcription and LLM as judge curation (#53189)
💫 Enhancements:
- Adds pydoclint to pre-commit (#52974)
Thanks!
Thank you to everyone who contributed to this release!
@NeilGirdhar, @ok-scale, @JiangJiaWei1103, @brandonscript, @eicherseiji, @ktyxx, @MichalPitr, @GeneDer, @rueian, @khluu, @bveeramani, @ArturNiederfahrenhorst, @c8ef, @lk-chen, @alanwguo, @simonsays1980, @codope, @ArthurBook, @kouroshHakha, @Yicheng-Lu-llll, @jujipotle, @aslonnie, @justinvyu, @machichima, @pcmoritz, @saihaj, @wingkitlee0, @omatthew98, @can-anyscale, @nadongjun, @chris-ray-zhang, @dizer-ti, @matthewdeng, @ryanaoleary, @janimo, @crypdick, @srinathk10, @cszhu, @TimothySeah, @iamjustinhsu, @mimiliaogo, @angelinalg, @gvspraveen, @kevin85421, @jjyao, @elliot-barn, @xingyu-long, @LeoLiao123, @thomasdesr, @ishaan-mehta, @noemotiovon, @hipudding, @davidxia, @omahs, @MengjinYan, @dengwxn, @MortalHappiness, @alhparsa, @emmanuel-ferdman, @alexeykudinkin, @KunWuLuan, @dev-goyal, @sven1977, @akyang-anyscale, @GokuMohandas, @raulchen, @abrarsheikh, @edoakes, @JoshKarpel, @bhmiller, @seanlaii, @ruisearch42, @dayshah, @Bye-legumes, @petern48, @richardliaw, @rclough, @israbbani, @jiwq
Ray-2.46.0
Release Highlights
The 2.46 Ray release comes with a couple core highlights:
- Ray Data now supports hash shuffling for repartition and aggregations, along with support for joins. This enables many new data processing workloads to be run on Ray Data. Please give it a try and let us know if you have any feedback!
- Ray Serve LLM now supports vLLM v1 to be forward-compatible with upcoming vLLM releases. This also opens up significant performance improvements that come with vLLM's v1 refactor.
- There is a new Train Grafana dashboard that provides in-depth metrics for better visibility into training workloads.
Ray Libraries
Ray Data
🎉 New Features:
- Adding support for hash-shuffle based repartitioning and aggregations (#52664)
- Added support for Joins (using hash-shuffle) (see the sketch after this list) (#52728)
- [LLM] vLLM support upgrades to 0.8.5 (#52344)
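A short sketch of the new join support on top of the hash-shuffle backend; the Dataset.join keyword names used here (join_type, num_partitions, on) are best-effort assumptions from the feature description, so double-check them against the API reference.

```python
import ray

users = ray.data.from_items([{"user_id": i, "name": f"user-{i}"} for i in range(100)])
orders = ray.data.from_items([{"user_id": i % 100, "amount": i} for i in range(500)])

# Hash-shuffle backed inner join on the shared key column
# (keyword names here are assumptions).
joined = users.join(orders, join_type="inner", num_partitions=8, on=("user_id",))
print(joined.count())
```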
💫 Enhancements:
- Add memory attribute to ExecutionResources (#51127)
- Support ray_remote_args for read_tfrecords (#52450)
- [data.dashboard] Skip reporting internal metrics (#52666)
- Add PhysicalOperator.min_max_resource_usage_bounds (#52502)
- Speed up printing the schema (#52612)
- [data.dashboard] Dataset logger for worker (#52706)
- Support new pyiceberg version (#51744)
- Support num_cpus, memory, concurrency, batch_size for preprocess (#52574)
🔨 Fixes:
- Handle Arrow Array null types in to_numpy (#52572)
- Fix S3 serialization wrapper compatibility with RetryingPyFileSystem (#52568)
- Fixing Optimizer to apply rules until the plan stabilizes (#52663)
- Fixing FuseOperators rule to properly handle the case of transformations drastically changing size of the dataset (#52570)
📖 Documentation:
- [LLM] Improve concurrency settings, improve prompt to achieve better throughput (#52634)
Ray Train
🎉 New Features:
- Add initial Train Grafana dashboard (#52709)
💫 Enhancements:
- Lazily import torch FSDP for ray.train.torch module to improve performance and reduce unnecessary dependencies (#52707)
- Deserialize the user-defined training function directly on workers, improving efficiency (#52684)
🔨 Fixes:
- Fixed error when no arguments are passed into TorchTrainer (#52693)
📖 Documentation:
- Added new XGBoostTrainer user guide (#52355)
🏗 Architecture refactoring:
- Re-enabled isort for python/ray/train to maintain code formatting consistency (#52717)
Ray Tune
📖 Documentation:
- Fixed typo in Ray Tune PyTorch Lightning docs (#52756)
Ray Serve
💫 Enhancements:
- [LLM] Refactor LLMServer and LLMEngine to not diverge too much from vllm chat formatting logic (#52597)
- Bump vllm from 0.8.2 to 0.8.5 in /python (#52344)
- [LLM] Add router replicas and batch size to llm config (#52655)
🔨 Fixes:
- Request cancellation not propagating correctly across deployments (#52591)
- BackpressureError not properly propagated in FastAPI ingress deployments (#52397)
- Hanging issue when awaiting deployment responses (#52561)
- [Serve.llm] made Ray Serve LLM compatible with vLLM v1 (#52668)
📖 Documentation:
- [Serve][LLM] Add doc for deploying DeepSeek (#52592)
RLlib
🎉 New Features:
- Offline Evaluation with loss function for Offline RL pipeline. Introduces three new callbacks: on_offline_evaluate_start, on_offline_evaluate_end, on_offline_eval_runners_recreated (#52308)
💫 Enhancements:
- New custom_data attribute for SingleAgentEpisode and MultiAgentEpisode to store custom metrics. Deprecates add|get_temporary_timestep_data() (#52603)
Ray Core
💫 Enhancements:
- Only get serialization context once for all .remote args (#52690)
- Add grpc server success and fail count metric (#52711)
🔨 Fixes:
- Fix open leak for plasma store memory (shm/fallback) by workers (#52622)
- Assure closing of unused pipe for dashboard subprocesses (#52678)
- Expand protection against dead processes in reporter agent (#52657)
- [cgraph] Separate metadata and data in cross-node shared memory transport (#52619)
- Fix JobID check for detached actor tasks (#52405)
- Fix potential log loss of tail_job_logs (#44709)
🏗 Architecture refactoring:
- Cancel tasks when an owner dies instead of checking if an owner is dead during scheduling (#52516)
- Unify GcsAioClient and GcsClient (#52735)
- Remove worker context dependency from the task receiver (#52740)
Dashboard
🎉 New Features:
- Ray Train Grafana Dashboard added with a few built-in metrics. More to come.
Thanks!
Thank you to everyone who contributed to this release!
@kevin85421, @edoakes, @wingkitlee0, @alexeykudinkin, @chris-ray-zhang, @sophie0730, @zcin, @raulchen, @matthewdeng, @abrarsheikh, @popojk, @Jay-ju, @ruisearch42, @eicherseiji, @lk-chen, @justinvyu, @dayshah, @kouroshHakha, @NeilGirdhar, @omatthew98, @ishaan-mehta, @davidxia, @ArthurBook, @GeneDer, @srinathk10, @dependabot[bot], @JoshKarpel, @aslonnie, @khluu, @can-anyscale, @israbbani, @saihaj, @MortalHappiness, @alanwguo, @bveeramani, @iamjustinhsu, @Ziy1-Tan, @xingyu-long, @simonsays1980, @fscnick, @chuang0221, @sven1977, @jjyao
Ray-2.45.0
Ray Core
💫 Enhancements
- Make Object Store Fallback Directory configurable (#51189).
- [cgraph] Support with_tensor_transport(transport='shm') (#51872).
- [cgraph] Support reduce scatter and all gather collective for GPU communicator in compiled graph (#50624).
🔨 Fixes
- Make sure KillActor RPC with force_kill=True can actually kill the threaded actor (#51414).
- [Autoscaler] Do not remove idle nodes for upcoming placement groups (#51122).
- Threaded actors get stuck forever if they receive two exit signals (#51582).
- [cgraph] Fix illegal memory access of cgraph when used in PP (#51734).
- Avoid resubmitted actor tasks from hanging indefinitely (#51904).
- Fix interleaved placement group creation process due to node failure (#52202).
- Flush task events in CoreWorker::Shutdown instead of CoreWorker::Disconnect (#52374).
🏗 Architecture refactoring
- Split dashboard single process into multiple processes to improve stability and avoid interference between different heads (#51282, #51489, #51555, #51507, #51587, #51553, #51676, #51733, #51809, #51877, #51876, #51980, #52114).
Ray Libraries
Ray Data
🎉 New Features
- New ClickHouse sink via Dataset.write_clickhouse() (#50377)
- Support ray_remote_args_fn in Dataset.groupby().map_groups() to set per-group runtime env and resource hints (#51236)
- Expose Dataset.name/set_name as a public API for easier lineage tracking (#51076)
- Allow async callable classes in Dataset.flat_map() (#51180)
- Introduce Ruleset abstraction for rule-based query optimisation (#51558)
- Add seamless conversion from Daft DataFrame to Ray Dataset (#51531) (see the sketch after this list)
- Improved support for line-delimited JSONL reading in read_json() (#52083)
- Provide Dataset.export_metadata() for schema & stats snapshots (#52227)
💫 Enhancements
- Improved performance of sorting and sort-shuffle based operations (by more than 5x in benchmarks) (#51943)
- Metrics: number of map-actor workers alive / pending / restarting (#51082)
- Continuous memory-usage polling per map task (#51324)
- Auto-tune map-task memory based on output size (#51536)
- More informative back-pressure progress bar (#51697)
- Faster RefBundle.get_cached_location() lookup (#52097)
- Speed-up for PandasBlock.size_bytes() (#52510)
- Expanded BlockColumnAccessor utilities and ops (#51326, #51571)
🔨 Fixes
- Correct MapTransformFn.__eq__ equality check (#51434)
- Persist unresolved wildcard paths in FileBasedDataSource (#51424)
- Repair Hugging Face dynamic-module loading on workers (#51488)
- Prevent HTTP URLs from being expanded by _expand_paths (#50178)
- Fix Databricks host-URL parsing in Delta datasource (#49926)
- Restore reproducibility of Dataset.random_sample() (#51401)
- Correct RandomAccessDataset.multiget() return values (#51421)
- Ensure executor shutdown after schema fetch to avoid leaked actors (#52379)
- Repair streaming shutdown regression (#52509)
- Honour minimum resource reservation in ResourceManager (#52226)
📖 Documentation
- Clarified shuffle-section wording (#51289)
- Documented concurrency semantics in API reference (#51963)
- Updated Ray Data guides for the 2.45 release (#52082)
Ray Train
🎉 New Features
- Fold v2.LightGBMTrainer API into the public trainer class as an alternate constructor (#51265).
💫 Enhancements
- Use the user-defined function name as the training thread name (#52514).
- Upgrade LightGBM to version 4.6.0 (#52410).
- Adjust test size further for better results (#52283).
- Log errors raised by workers during training (#52223).
- Add worker group setup finished log to track progress (#52120).
- Change test_telemetry to medium size (#52178).
- Improve dataset name observability for better tracking (#52059).
- Differentiate between train v1 and v2 export data for clarity (#51728).
- Include scheduling status detail to improve debugging (#51480).
- Move train library usage check to Trainer initialization (#50966).
🔨 Fixes
- Separate OutputSplitter._locality_hints from actor_locality_enabled and locality_with_output (#52005).
- Fix print redirection to handle new lines correctly (#51542).
- Mark RunAttempt workers as dead after completion to avoid stale states (#51540).
- Fix setup_wandb rank_zero_only logic (#52381).
📖 Documentation
- Add links to the Train v2 migration guide in the Train API pages (#51924).
🏗 Architecture refactoring
- Replace AMD device environment variable with HIP_VISIBLE_DEVICES (#51104).
- Remove unnecessary string literal splits (#47360).
Ray Tune
📖 Documentation
- Improve Tune documentation structure (#51684).
- Fix syntax errors in Ray Tune example pbt_ppo_example.ipynb (#51626).
Ray Serve
🎉 New Features
- Add request timeout sec for gRPC (#52276).
- [Serve.llm] ray.llm supports custom accelerators (#51359).
💫 Enhancements
- Improve Serve deploy ignore behavior (#49336).
- [Serve.llm] Telemetry GPU type fallback to cluster hardware when unspecified (#52003).
🔨 Fixes
- Fix multiplex fallback logic during burst requests (#51389).
- Don't stop retrying replicas when a deployment is scaling back up from zero (#51600).
- Remove RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE flag (#51649).
- Remove RAY_SERVE_EAGERLY_START_REPLACEMENT_REPLICAS flag (#51722).
- Unify request cancellation errors (#51768).
- Catch timeout error when checking if proxy is dead (#52002).
- Suppress cancelled errors in proxy (#52423).
- [Serve.llm] Fix loading model from remote storage and add docs (#51617).
- [Serve.llm] Fix ServeReplica deployment failure for DeepSeek (#51989).
- [Serve.llm] Check GPUType enum value rather than enum itself ([#52037]...
Ray-2.44.1
There is no difference between 2.44.1 and 2.44.0, though we needed a patch version for other out of band reasons. To fill the awkward blankness, here is a haiku about Ray:
Under screen-lit skies
A ray of bliss in each patch
Joy at any scale
Ray-2.44.0
Release Highlights
- This release features Ray Compiled Graph (beta). Ray Compiled Graph gives you a classic Ray Core-like API, but with (1) less than 50us system overhead for workloads that repeatedly execute the same task graph; and (2) native support for GPU-GPU communication via NCCL. Ray Compiled Graph APIs simplify high-performance multi-GPU workloads such as LLM inference and training. The beta release refines the API, enhances stability, and adds or improves features like visualization, profiling and experimental GPU compute/computation overlap. For more information, refer to Ray documentation: https://docs.ray.io/en/latest/ray-core/compiled-graph/ray-compiled-graph.html
- The experimental Ray Workflows library has been deprecated and will be removed in a future version of Ray. Ray Workflows has been marked experimental since its inception and hasn’t been maintained due to the Ray team focusing on other priorities. If you are using Ray Workflows, we recommend pinning your Ray version to 2.44.
Ray Libraries
Ray Data
🎉 New Features:
- Add Iceberg write support through pyiceberg (see the sketch after this list) (#50590)
- [LLM] Various feature enhancements to Ray Data LLM, including LoRA support #50804 and structured outputs #50901
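A hedged sketch of the new Iceberg write path; the keyword names below mirror the read_iceberg-style arguments and are assumptions, as is the SQLite-backed pyiceberg catalog used purely for illustration.

```python
import ray

ds = ray.data.from_items([{"id": i, "value": i * 2} for i in range(100)])

# Write through pyiceberg; `table_identifier` / `catalog_kwargs` are assumed
# to follow the read_iceberg-style arguments.
ds.write_iceberg(
    table_identifier="default.events",
    catalog_kwargs={
        "name": "demo",
        "type": "sql",
        "uri": "sqlite:///iceberg_catalog.db",
        "warehouse": "file:///tmp/iceberg_warehouse",
    },
)
```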
💫 Enhancements:
- Add dataset/operator state, progress, total metrics (#50770)
- Make chunk combination threshold configurable (#51200)
- Store average memory use per task in OpRuntimeMetrics (#51126)
- Avoid unnecessary conversion to Numpy when creating Arrow/Pandas blocks (#51238)
- Append-mode API for preprocessors -- #50848, #50847, #50642, #50856, #50584. Note that vectorizers and hashers now output a single column instead of one column per feature. In the near future, we will be graduating preprocessors to beta.
🔨 Fixes:
- Fixing Map Operators to avoid unconditionally overriding generator's back-pressure configuration (#50900)
- Fix filter expr equating negative numbers (#50932)
- Fix error message for override_num_blocks when reading from a HuggingFace Dataset (#50998)
- Make num_blocks in repartition optional (#50997)
- Always pin the seed when doing file-based random shuffle (#50924)
- Fix StandardScaler to handle NaN stats (#51281)
Ray Train
🎉 New Features:
💫 Enhancements:
- Folded v2.XGBoostTrainer API into the public trainer class as an alternate constructor (#50045)
- Created a default ScalingConfig if one is not provided to the trainer (#51093)
- Improved TrainingFailedError message (#51199)
- Utilize FailurePolicy factory (#51067)
🔨 Fixes:
- Fixed trainer import deserialization when captured within a Ray task (#50862)
- Fixed serialize import test for Python 3.12 (#50963)
- Fixed RunConfig deprecation message in Tune being emitted in trainer.fit usage (#51198)
📖 Documentation:
- [Train V2] Updated API references (#51222)
- [Train V2] Updated persistent storage guide (#51202)
- [Train V2] Updated user guides for metrics, checkpoints, results, and experiment tracking (#51204)
- [Train V2] Added updated Train + Tune user guide (#51048)
- [Train V2] Added updated fault tolerance user guide (#51083)
- Improved HF Transformers example (#50896)
- Improved Train DeepSpeed example (#50906)
- Use correct mean and standard deviation norm values in image tutorials (#50240)
🏗 Architecture refactoring:
- Deprecated Torch AMP wrapper utilities (#51066)
- Hid private functions of train context to avoid abuse (#50874)
- Removed ray storage dependency and deprecated RAY_STORAGE env var configuration option (#50872)
- Moved library usage tests out of core (#51161)
Ray Tune
📖 Documentation:
- Various improvements to Tune Pytorch CIFAR tutorial (#50316)
- Various improvements to the Ray Tune XGBoost tutorial (#50455)
- Various enhancements to Tune Keras example (#50581)
- Minor improvements to Hyperopt tutorial (#50697)
- Various improvements to LightGBM tutorial (#50704)
- Fixed non-runnable Optuna tutorial (#50404)
- Added documentation for Asynchronous HyperBand Example in Tune (#50708)
- Replaced reuse actors example with a fuller demonstration (#51234)
- Fixed broken PB2/RLlib example (#51219)
- Fixed typo and standardized equations across the two APIs (#51114)
- Improved PBT example (#50870)
- Removed broken links in documentation (#50995, #50996)
🏗 Architecture refactoring:
- Removed ray storage dependency and deprecated RAY_STORAGE env var configuration option (#50872)
- Moved library usage tests out of core (#51161)
Ray Serve
🎉 New Features:
💫 Enhancements:
- Clean up shutdown behavior of serve (#51009)
- Add additional_log_standard_attrs to serve logging config (#51144)
- [LLM] remove asyncache and cachetools from dependencies (#50806)
- [LLM] remove backoff dependency (#50822)
- [LLM] Remove asyncio_timeout from ray[llm] deps on python<3.11 (#50815)
- [LLM] Made JSON validator a singleton and jsonref packages lazy imported (#50821)
- [LLM] Reuse AutoscalingConfig and DeploymentConfig from Serve (#50871)
- [LLM] Use pyarrow FS for cloud remote storage interaction (#50820)
- [LLM] Add usage telemetry for serve.llm (#51221)
🔨 Fixes:
- Exclude redirects from request error count (#51130)
- [LLM] Fix the wrong device_capability issue in vllm on quantized models (#51007)
- [LLM] add gen-config related data file to the package (#51347)
📖 Documentation:
- [LLM] Fix quickstart serve LLM docs (#50910)
- [LLM] update build_openai_app to include yaml example (#51283)
- [LLM] remove old vllm+serve doc (#51311)
RLlib
💫 Enhancements:
- APPO/IMPALA acceleration:
  - Unify namings for actor managers' outstanding in-flight requests metrics. (#51159)
  - Add timers to env step, forward pass, and complete connector pipelines runs. (#51160)
🔨 Fixes:
📖 Documentation:
Ray Core and Ray Clusters
Ray Core
🎉 New Features:
- Enhanced uv support (#51233)
💫 Enhancements:
- Made infeasible task errors much more obvious (#45909)
- Log rotation for workers, runtime env agent, and dashboard agent (#50759, #50877, #50909)
- Support customizing gloo timeout (#50223)
- Support torch profiling in Compiled Graph (#51022)
- Change default tensor deserialization in Compiled Graph (#50778)
- Use current node id if no node is specified on ray drain-node (#51134)
🔨 Fixes:
- Fixed an issue where the raylet continued to have high CPU overhead after a job was terminated ([...