Releases: ray-project/ray
Ray-2.49.2
There is no difference between 2.49.2 and 2.49.1, though we needed a patch version for other out of band reasons. To fill the awkward blankness, here is a haiku about Ray:
Summit drawing near
Ray advances, step by step
Scaling without end
Ray-2.49.1
Ray-2.49.0
Release Highlights
Ray Data:
- We’ve implemented a variety of performance enhancements, including improved actor/node autoscaling with budget-aware decisions; faster/more accurate shuffle accounting; reduced Parquet metadata footprint; and out-of-order execution for higher throughput.
- We’ve also added anti/semi joins, a stratified train_test_split, and Snowflake connectors.
Ray Core:
- Performance and robustness cleanups around the GCS publish path and raylet internals; simpler OpenTelemetry flagging; a new user-facing API to wait for GPU tensors to be freed; plus assorted test and infra tidy-ups.
Ray Train:
- We’ve introduced a new JaxTrainer with SPMD support for TPUs.
Ray Serve:
- Custom Autoscaling per Deployment: Serve now supports user-defined autoscaling policies via AutoscalingContext and AutoscalingPolicy, enabling fine-grained scaling logic at the deployment level. This is part of a larger effort to support autoscaling based on custom metrics in Serve; see this RFC for more details.
- Async Inference (Initial Support): Ray Serve introduces asynchronous inference execution, laying the foundation for better throughput and latency in async workloads. Please see this RFC for more details.
- Major Performance Gains: This version of Ray Serve brings double-digit percentage improvements in both throughput and latency. See the detailed notes below for more details.
Ray Serve/Data LLM:
- We’ve refactored Ray Serve LLM to be fully compatible with the default vllm serve frontend, and it now supports vLLM 0.10.
- We’ve added a prefix cache-aware router with PrefixCacheAffinityRouter for optimized cache utilization; dynamic cache management via reset prefix cache remote methods; enhanced LMCacheConnectorV1 with kv_transfer_config support.
Ray Libraries
Ray Data
🎉 New Features:
- Wrapped batch indices in a BatchMetadata object to make per-batch metadata explicit. (#55643)
- Added support for Anti/Semi Join types. (#55272)
- Introduced an Issue Detection Framework. (#55155)
- Added an option to enable out-of-order execution for better performance. (#54504)
- Introduced a StreamingSplit logical operator for DAG rewrite. (#54994)
- Added a stratify parameter to train_test_split (see the sketch after this list). (#54624)
- Added Snowflake connectors. (#51429)
- Updated Hudi integration to support incremental query. (#54301)
- Added an Actor location tracker. (#54590)
- Added BundleQueue.has_next. (#54710)
- Made DEFAULT_OBJECT_STORE_MEMORY_LIMIT_FRACTION configurable. (#54873)
- Added Expression support & a with_columns API. (#54322)
- Allocate GPU resources in ResourceManager. (#54445)
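As a quick illustration of the stratified split added above, here is a minimal sketch; passing a column name to the stratify argument is an assumption based on the feature description, so check the train_test_split API reference.

```python
import ray

# Toy dataset with an imbalanced binary label.
ds = ray.data.from_items(
    [{"feature": i, "label": int(i % 10 == 0)} for i in range(1_000)]
)

# Stratify by the "label" column so both splits keep the original class
# balance (the column-name form of `stratify` is an assumption here).
train_ds, test_ds = ds.train_test_split(test_size=0.2, stratify="label")
print(train_ds.count(), test_ds.count())
```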
💫 Enhancements:
- Decoupled actor and node autoscaling; autoscaling now also considers budget. (#55673, #54902)
- Faster hash-shuffle resource usage calculation; more accurate shuffle progress totals. (#55503, #55543)
- Reduced Parquet metadata storage usage. (#54821)
- Export API improvements: refresh dataset/operator state, sanitize metadata, and truncate exported metadata. (#55355, #55379, #55216, #54623)
- Metrics & observability: task metric improvements, external-buffer block-count metric, row-based metrics, clearer operator names in logs, single debug log when aggregators are ready. (#55429, #55022, #54693, #52949, #54483)
- Dashboard: added “Max Bytes to Read” panel/budget, panels for blocks-per-task and bytes-per-block, and streaming executor duration. (#55024, #55020, #54614)
- Planner/execution & infra cleanups: ExecutionResources and StatsManager cleanup, planner interface refactor, node trackers init, removed ray.get in _MapWorker ctor, removed target_shuffle_max_block_size. (#54694, #55400, #55018, #54665, #54734, #55158)
- Behavior/interop tweaks: map_batches defaults to row_modification=False and avoids pushing past limit; limited operator pushdown; prefetch for PandasJSONDatasource; use cloudpickle for Arrow tensor extension ser/des; bumped Arrow to 21.0; schema warning tone change. (#54992, #54457, #54667, #54831, #55426, #54630)
- Removed randomize-blocks reorder rule for more stable behavior. (#55278)
🔨 Fixes:
- AutoscalingActorPool now properly downscales after execution. (#55565)
- StatsManager handles StatsActor loss on disconnect. (#55163)
- Handle missing chunks key when Databricks UC query returns zero rows. (#54526)
- Handle empty fragments in sampling when num_row_groups=0. (#54822)
- Restored handling of PyExtensionType to keep compatibility with previously written datasets. (#55498)
- Prevent negative resource budget when concurrency exceeds the global limit; fixed resource-manager log calculation. (#54986, #54878)
- Default write_parquet warning removed; handled unhashable types in OneHotEncoding. (#54864, #54863)
- Overwrite mode now maps to the correct Arrow behavior for parallel writes. (#55118)
- Added back from_daft Arrow-version checks. (#54907)
- Pandas chained in-place assignment warning resolved. (#54486)
- Test stability/infra: fixed flaky tests, adjusted bounds and sizes, added additional release tests/chaos variants for image workloads, increased join test size, adjusted sorting release test to produce 1 GB blocks. (#55485, #55489, #54806, #55120, #54716, #55402, #54971)
📖 Documentation:
- Added a user guide for aggregations. (#53568)
- Added a code snippet in docs for partitioned writes. (#55002)
- Updated links to Lance documentation. (#54836)
Ray Train
🎉 New Features:
- Introduced JaxTrainer with SPMD support on TPUs (#55207)
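A hypothetical sketch of how the new JaxTrainer might be wired up, assuming it follows the same train-function-plus-ScalingConfig pattern as the other Train v2 trainers; the import path and the TPU-related resource key below are assumptions, not confirmed API.

```python
import jax
import ray.train
from ray.train import ScalingConfig
from ray.train.v2.jax import JaxTrainer  # import path assumed

def train_func():
    # Each Train worker runs this SPMD body over its locally visible TPU chips.
    n_devices = jax.local_device_count()
    ray.train.report({"local_devices": n_devices})

trainer = JaxTrainer(
    train_func,
    scaling_config=ScalingConfig(
        num_workers=4,
        # Resource key assumed; adjust to however your TPU slices are labeled.
        resources_per_worker={"TPU": 4},
    ),
)
result = trainer.fit()
```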
💫 Enhancements:
- ray.train.get_dataset_shard now lazily configures dataset sharding for better startup behavior (#55230)
- Clearer worker error logging (#55222)
- Fail fast when placement group requirements can never be satisfied (#54402)
- New ControllerError surfaced and handled via failure policy for improved resiliency (#54801, #54833)
- TrainStateActor periodically checks controller health and aborts when necessary (#53818)
🔨 Fixes:
- Resolve circular import in ray.train.v2.lightning.lightning_utils (#55668)
- Fix XGBoost v2 callback behavior (#54787)
- Suppress a spurious type error (#50994)
- Reduce test flakiness: remove randomness and bump a data-integration test size (#55315, #55633)
📖 Documentation:
- New LightGBMTrainer user guide (#54492)
- Fix code-snippet syntax highlighting (#54909)
- Minor correction in experiment-tracking guide comment (#54605)
🏗 Architecture refactoring:
- Public Train APIs routed through TrainFnUtils for consistency (#55226)
- LoggingManager utility for Train logging (#55121)
- Convert DEFAULT variables from strings to bools (#55581)
Ray Tune
🎉 New Features:
- Add video FPS support to WandbLoggerCallback (#53638)
💫 Enhancements:
- Typing: reset_config now explicitly returns bool (#54581)
- CheckpointManager supports recording scoring metric only (#54642)
🔨 Fixes:
📖 Documentation:
Ray Serve
🎉 New Features:
- Async inference support in Ray Serve (initial phase). Provides basic asynchronous inference execution, with follow-up work planned for failed/unprocessed queues and additional tests. #54824
- Per-deployment custom autoscaling controls. Introduces AutoscalingContext and AutoscalingPolicy classes, enabling user-defined autoscaling strategies at the deployment level (see the sketch after this list). #55253
- Same event loop router. Adds option to run the Serve router in the same event loop as the proxy, yielding ~17% throughput improvement. #55030
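For the per-deployment custom autoscaling above, here is a deliberately hypothetical sketch of what a user-defined policy could look like; the AutoscalingContext fields and the way the policy is attached to the deployment are assumptions based on the feature description and RFC, not confirmed API.

```python
from ray import serve

def queue_based_policy(ctx) -> int:
    """Hypothetical policy: roughly one replica per 10 in-flight requests.

    `ctx` stands in for the new AutoscalingContext; the attribute names
    used here (total_requests, min_replicas, max_replicas) are assumptions.
    """
    target = max(ctx.min_replicas, ctx.total_requests // 10)
    return min(target, ctx.max_replicas)

@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 8,
        # Attaching the policy through the autoscaling config is an assumption.
        "policy": queue_based_policy,
    }
)
class Model:
    async def __call__(self, request) -> str:
        return "ok"

app = Model.bind()
```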
💫 Enhancements:
- Async get_current_servable_instance(). Converts the FastAPI dependency to async def, removing threadpool overhead and boosting performance: 35% higher RPS and reduced latency. #55457
- Access log optimization. Cached contexts in request path logging improved request throughput by ~16% with lower average latency. #55166
- Batching improvements. Default batch wait timeout increased from 0.0s to 0.01s (10ms) to enable meaningful batching (see the sketch after this list). #55126
- HTTP receive refactor. Cleaned up handling of replica-side HTTP receive tasks. #54543 / #54565
- Configurable replica router backoff. Added knobs for retry/backoff control when routing to replicas. #54723
- Autoscaling ergonomics. Marked per-deployment autoscaling metrics push interval config as deprecated for consistency. #55102
- Health check & env var safety. Introduced warnings for invalid/zero/negative environment variable values, with migration path planned for Ray 2.50.0. #55464, #54944
- Improved CLI UX. serve config now prints "No configuration was found." instead of an empty string. #54767
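To make the batching change from the enhancements list concrete, here is a minimal sketch using the existing @serve.batch decorator with the timeout pinned explicitly to the new 10 ms default rather than relying on it.

```python
from ray import serve

@serve.deployment
class BatchedModel:
    # Pin the batch window explicitly; 0.01 s is the new default in 2.49.
    @serve.batch(max_batch_size=32, batch_wait_timeout_s=0.01)
    async def handle_batch(self, inputs: list) -> list:
        # Called once per accumulated batch; return one output per input.
        return [x * 2 for x in inputs]

    async def __call__(self, request) -> int:
        return await self.handle_batch(2)

app = BatchedModel.bind()
```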
🔨 Fixes:
Ray-2.48.0
Release Highlights
- Ray Data: This release features a new Delta Lake and Unity Catalog integration and performance improvements to various reading/writing operators.
- Ray Core: Enhanced GPU object support with intra-process communication and improved Autoscaler v2 functionality
- Ray Train: Improved hardware metrics integration with Grafana and enhanced collective operations support
- Ray Serve LLM: This release features an early proof of concept for prefill-decode disaggregated deployment and LLM-aware request routing, such as prefix-cache-aware routing.
- Ray Data LLM: Improved throughput and CPU memory utilization for Ray Data workers.
Ray Libraries
Ray Data
🎉 New Features:
- Add reading from Delta Lake tables and Unity Catalog integration (#53701)
- Add pin_memory to iter_torch_batches (#53792)
💫 Enhancements:
- Re-enabled sorting in Ray Data tests with performance improvements (#54475)
- Enhanced handling of mismatched columns and pandas.NA values (#53861, #53859)
- Improved read_text trailing newline semantics (#53860)
- Optimized backpressure handling with policy-based resource management (#54376)
- Enhanced write_parquet with support for both partition_by and row limits (see the sketch after this list) (#53930)
- Prevent filename collisions on write operations (#53890)
- Improved execution performance for One Hot encoding in preprocessors (#54022)
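For the write_parquet enhancement above, here is a minimal sketch; the keyword names partition_cols and min_rows_per_file are assumptions based on the feature description, so verify them against the write_parquet reference.

```python
import ray

ds = ray.data.from_items(
    [{"country": "US" if i % 2 else "CA", "amount": float(i)} for i in range(1_000)]
)

# Partition the output by a column while also bounding rows per file
# (keyword names are assumptions based on the feature description).
ds.write_parquet(
    "/tmp/partitioned_sales",
    partition_cols=["country"],
    min_rows_per_file=250,
)
```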
🔨 Fixes:
- Fixed map_groups issues (#54462)
- Prevented Op fusion for streaming repartition to avoid performance degradation (#54469)
- Fixed ActorPool autoscaler scaling up logic (#53983)
- Resolved empty dataset repartitioning issues (#54107)
- Fixed PyArrow overflow handling in data processing (#53971, #54390)
- Fixed IcebergDatasink to properly generate individual file uuids (#52956)
- Avoid OOMs with read_json(..., lines=True) (#54436)
- Handle HuggingFace parquet dataset resolve URLs (#54146)
- Fixed BlockMetadata derivation for Read operator (#53908)
📖 Documentation:
- Updated AggregateFnV2 documentation to clarify finalize method (#53835)
- Improved preprocessor and vectorizer API documentation
Ray Train
🎉 New Features:
- Added broadcast_from_rank_zero and barrier collective operations (see the sketch after this list) (#54066)
- Enhanced hardware metrics integration with Grafana dashboards (#53218)
- Added support for dynamically loading callbacks via environment variables (#54233)
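A hedged sketch of using the new collectives from inside a training function; the import path follows the ray.train.collective docs entry below, but the exact call signatures are assumptions, so check the API reference.

```python
import ray.train
from ray.train.collective import barrier, broadcast_from_rank_zero  # path per docs; signatures assumed

def train_func():
    ctx = ray.train.get_context()

    # Rank 0 picks a value (e.g. a run ID) and shares it with all workers.
    run_id = "exp-001" if ctx.get_world_rank() == 0 else None
    run_id = broadcast_from_rank_zero(run_id)

    # Block until every worker reaches this point before training starts.
    barrier()
    ray.train.report({"world_rank": ctx.get_world_rank()})

# Pass `train_func` to a trainer (e.g. TorchTrainer) as usual.
```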
💫 Enhancements:
- Improved checkpoint population from before_init_train_context (#54453)
- Enhanced controller state logging and metrics (#52805)
- Added structured logging environment variable support (#52952)
- Improved handling of Noop scaling decisions for smoother scaling logic (#53180)
- Logging of controller state transitions to aid in debugging and analysis (#53344)
🔨 Fixes:
- Fixed GPU tensor reporting in ray.train.report (#53725)
- Enhanced move_tensors_to_device utility for complex tensor structures (#53109)
- Improved worker health check error handling with trace information (#53626)
- Fixed GPU transfer support for non-contiguous tensors (#52548)
- Force abort on SIGINT spam and do not abort finished runs (#54188)
📖 Documentation:
- Updated beginner PyTorch example (#54124)
- Added documentation for ray.train.collective APIs (#54340)
- Added a note about PyTorch DataLoader's multiprocessing and forkserver usage (#52924)
- Fixed various docstring format and indentation issues (#52855, #52878)
- Added note that ray.train.report API docs should mention optional checkpoint_dir_name (#54391)
🏗 Architecture refactoring:
- Removed subclass relationship between RunConfig and RunConfigV1 (#54293)
- Enhanced error handling for finished training runs (#54188)
- Deduplicated ML doctest runners in CI for efficiency (#53157)
- Converted isort configuration to Ruff for consistency (#52869)
Ray Tune
💫 Enhancements:
- Updated test_train_v2_integration to use the correct RunConfig (#52882)
🔨 Fixes:
- Fixed RayTaskError serialization logic (#54396)
- Improved experiment restore timeout handling (#53387)
📖 Documentation:
- Replaced session.report with tune.report and corrected import paths (#52801)
- Removed outdated graphics cards reference in docs (#52922)
- Fixed various docstring format issues (#52879)
Ray Serve
🎉 New Features:
- Added RouterConfig field to DeploymentConfig for custom RequestRouter configuration (#53870)
- Added support for implementing custom request routing algorithms (#53251)
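A deliberately hypothetical sketch of the custom request routing hook; the router interface, method signature, and the way it is referenced from the new RouterConfig field are all assumptions based on the two items above, so treat this as pseudo-structure rather than the real interface.

```python
import random
from ray import serve

class LeastLoadedRouter:  # interface/base-class name assumed
    """Hypothetical router: pick randomly among the two least-loaded replicas."""

    def choose_replicas(self, pending_request, candidate_replicas):
        # Attribute names on the replica objects are assumptions.
        ranked = sorted(candidate_replicas, key=lambda r: r.num_ongoing_requests)
        return [random.choice(ranked[:2])]

@serve.deployment(
    # Wiring the router in through the deployment config is an assumption;
    # see the RouterConfig field mentioned above (#53870).
    request_router_config={"request_router_class": LeastLoadedRouter},
)
class Model:
    async def __call__(self, request) -> str:
        return "ok"

app = Model.bind()
```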
💫 Enhancements:
- Enhanced FastAPI ingress deployment validation for multiple deployments (#53647)
- Optimized get_live_deployments performance (#54454)
- Progress towards making ray.serve.llm compatible with vLLM serve frontend (#54481, #54443, #54440)
🔨 Fixes:
- Fixed deployment scheduler issues with component scheduling (#54479)
- Fixed runtime_env validation for py_modules (#53186)
- Added descriptive error message when deployment name is not found (#45181)
📖 Documentation:
- Added troubleshooting guide for DeepSeek/multi-node GPU deployment on KubeRay (#54229)
- Updated the guide on serving models with Triton Server in Ray Serve
- Added documentation for custom request routing algorithms (#53511)
🏗 Architecture refactoring:
- Remove indirection layers of node initialization (#54481)
- Incremental refactor of LLMEngine (#54443)
- Remove random v0 logic from serve endpoints (#54440)
- Remove usage of internal_api.memory_summary() (#54417)
- Remove usage of ray._private.state (#54140)
Ray Serve/Data LLM
🎉 New Features
- Support separate deployment config for PDProxy in PrefixAwareReplicaSet (#53935)
- Support for prefix-aware request router (#52725)
💫 Enhancements
- Log engine stats after each batch task is done. (#54360)
- Decouple max_tasks_in_flight from max_concurrent_batches (#54362)
- Make llm serve endpoints compatible with vLLM serve frontend, including streaming, tool_code, and health check support (#54440)
- Remove botocore dependency in Ray Serve LLM (#54156)
- Update vLLM version to 0.9.2 (#54407)
🔨 Fixes
- Fix health check in prefill disagg (#53937)
- Fix doc to only support int concurrency (#54196)
- Fix vLLM batch test by changing to Pixtral (#53744)
- Fix pickle error with remote code models in vLLM Ray workloads (#53868)
- Adapted to the change in vllm.PoolingOutput (#54467)
📖 Documentation
- Ray serve/lora doc fix (#53553)
- Add Ray serve/LLM doc (#52832)
- Add a doc snippet to inform users about existing diffs between vLLM and Ray Serve LLM behavior in some APIs like streaming, tool_code, and health check (#54123)
- Troubleshooting DeepSeek/multi-node GPU deployment on KubeRay (#54229)
🏗 Architecture refactoring
- Make llm serve endpoints compatible with vLLM serve frontend, including streaming, tool_code, and health check support (#54490)
- Prefix-aware scheduler [2/N]: Configure PrefixAwareReplicaSet to correctly handle the number of available GPUs for each worker and to ensure efficient GPU utilization in vLLM (#53192)
- Organize spread out utils.py (#53722)
- Remove ImageRetriever class and related tests from the LLM serving codebase. (#54018)
- Return a batch of rows in the udf instead of row by row (#54329)
RLlib
🎉 New Features:
- Implemented Offline Policy Evaluation (OPE) via Importance Sampling (#53702)
- Enhanced ConnectorV2 ObservationPreprocessor APIs with multi-agent support (#54209)
- Add GPU inference to offline evaluation (#52718)
💫 Enhancements:
- Enhanced MetricsLogger to handle tensors in state management (#53514)
- Improved env seeding in EnvRunners with deterministic training example rewrite (#54039)
- Cleanup of meta learning classes and examples (#52680)
🔨 Fixes:
- Fixed EnvRunner restoration when no local EnvRunner is available (#54091)
- Fixed shapes in explained_variance for recurrent policies (#54005)
- Resolved device check issues in Learner implementation (#53706)
- Enhanced numerical stability in MeanStdFilter (#53484)
- Fixed weight synching in offline evaluation (#52757)
- Fixed bug in split_and_zero_pad utility function (#52818)
📖 Documentation:
- Do-over of examples for connector pipelines (#52604)
- Remove "new API stack" banner from all RLlib docs pages as it's now the default (#54282)
Ray Core
🎉 New Features:
- Enhanced GPU object support with intra-process communication (#53798)
- Integrated single-controller collective APIs with GPU objects (#53720)
- Added support for ray.get on driver process for GPU objects (#53902)
- Supporting allreduce on list of input nodes in compiled graphs (#51047)
- Add single-controller API for ray.util.collective and torch gloo backend (#53319)
💫 Enhancements:
- Improved autoscaler v2 functionality with cloud instance ID reusing (#54397)
- Enhanced cluster task manager with better resource management (#54413)
- Upgraded OpenTelemetry SDK for better observability (#53745)
- Improved actor scheduling to prevent deadlocks in ordered actors (#54034)
- Enhanced get_max_resources_from_cluster_config functionality (#54455)
- Use std::move in cluster task manager constructor (#54413)
- Improve status messages and add comments about stale seq_no handling (#54470)
- uv run integration is now enabled by default (#53060)
🔨 Fixes:
- Fixed race conditions in object eviction and repinning for recovery (#53934)
- Resolved GCS crash issues on duplicate MarkJobFinished RPCs (#53951)
- Enhanced actor restart handling on node failures (#54088)
- Improved reference counting during worker graceful shutdown (#53002)
- Fix race condition when canceling task that hasn't started yet (#52703)
- Fix the issue where a valid RestartActor rpc is ignored (#53330)
- Fixed "Check failed: it->second.num_retries_left == -1" error (#54116)
- Fix detached actor bei...
Ray-2.47.1
Ray 2.47.1 fixed an issue where Ray failed to start on Mac (#53807)
Ray-2.47.0
Release Highlights
- Initial support for prefill disaggregation has landed in Ray Serve LLM (#53092). This is critical for production LLM serving use cases.
- Ray Data features a variety of performance improvements (locality-based scheduling, non-blocking execution) as well as improvements to observability, preprocessors, and other stability fixes.
- Ray Serve now features custom request routing algorithms, which is critical for high throughput traffic for large model use cases.
Ray Libraries
Ray Data
🎉 New Features:
- Add save modes support to file data sinks (#52900)
- Added flattening capability to the Concatenator preprocessor to support output vectorization use cases (#53378)
💫 Enhancements:
- Re-enable Actor locality-based scheduling. This PR also improves algorithms for ranking the locations for the bundle. (#52861)
- Disable blocking pipeline by default until Actor Pool fully scales up to min actors (#52754)
- Progress bar and dashboard improvements to show the names of partial functions properly (#52280)
🔨 Fixes:
- Make Ray Data from_torch respect Dataset len (#52804)
- Fixing flaky aggregation test (#53383)
- Fix race condition bug in fault tolerance by disabling on_exit hook (#53249)
- Fix move_tensors_to_device utility for the list/tuple[tensor] case (#53109)
- Fix ActorPool scaling to avoid scaling down when the input queue is empty (#53009)
- Fix internal queues accounting for all Operators w/ an internal queue (#52806)
- Fix backpressure for FileBasedDatasource. This fixes potential OOMs for workloads using FileBasedDatasource (#52852)
📖 Documentation:
- Fix working code snippets (#52748)
- Improve AggregateFnV2 docstrings and examples (#52911)
- Improved documentation for vectorizers and API visibility in Data (#52456)
Ray Train
🎉 New Features:
- Added support for configuring Ray Train worker actor runtime environments. (#52421)
- Included Grafana panel data in Ray Train export for improved monitoring. (#53072)
- Introduced a structured logging environment variable to standardize log formats. (#52952)
- Added metrics for TrainControllerState to enhance observability. (#52805)
💫 Enhancements:
- Logging of controller state transitions to aid in debugging and analysis. (#53344)
- Improved handling of Noop scaling decisions for smoother scaling logic. (#53180)
🔨 Fixes:
- Improved move_tensors_to_device utility to correctly handle list/tuple of tensors. (#53109)
- Fixed GPU transfer support for non-contiguous tensors. (#52548)
- Increased timeout in test_torch_device_manager to reduce flakiness. (#52917)
📖 Documentation:
- Added a note about PyTorch DataLoader’s multiprocessing and forkserver usage. (#52924)
- Fixed various docstring format and indentation issues. (#52855, #52878)
- Removed unused "configuration-overview" documentation page. (#52912)
- General typo corrections. (#53048)
🏗 Architecture refactoring:
- Deduplicated ML doctest runners in CI for efficiency. (#53157)
- Converted isort configuration to Ruff for consistency. (#52869)
- Removed unused PARALLEL_CI blocks and combined imports. (#53087, #52742)
Ray Tune
💫 Enhancements:
- Updated test_train_v2_integration to use the correct RunConfig. (#52882)
📖 Documentation:
- Replaced session.report with tune.report and corrected import paths. (#52801)
- Removed outdated graphics cards reference in docs. (#52922)
- Fixed various docstring format issues. (#52879)
Ray Serve
🎉 New Features:
- Added support for implementing custom request routing algorithms. (#53251)
- Introduced an environment variable to prioritize custom resources during deployment scheduling. (#51978)
💫 Enhancements:
- The ingress API now accepts a builder function in addition to an ASGI app object. (#52892)
🔨 Fixes:
- Fixed runtime_env validation for py_modules. (#53186)
- Disallowed special characters in Serve deployment and application names. (#52702)
- Added a descriptive error message when a deployment name is not found. (#45181)
📖 Documentation:
- Updated the guide on serving models with Triton Server in Ray Serve.
- Added documentation for custom request routing algorithms.
Ray Serve/Data LLM
🎉 New Features:
- Added initial support for prefill decode disaggregation (#53092)
- Expose vLLM Metrics to serve.llm API (#52719)
- Embedding API (#52229)
💫 Enhancements:
- Allow setting name_prefix in build_llm_deployment (#53316)
- Minor bug fix for #53144: stop tokens cannot be null (#53288)
- Add missing repetition_penalty vLLM sampling parameter (#53222)
- Mitigate the serve.llm streaming overhead by properly batching stream chunks (#52766)
- Fix test_batch_vllm leaking resources by using larger wait_for_min_actors_s
🔨 Fixes:
- LLMRouter.check_health() should check LLMServer.check_health() (#53358)
- Fix runtime passthrough and auto-executor class selection (#53253)
- Update check_health return type (#53114)
- Bug fix for duplication of <bos> token (#52853)
- In stream batching, first part of the stream was always consumed and not streamed back from the router (#52848)
RLlib
🎉 New Features:
- Add GPU inference to offline evaluation. (#52718)
💫 Enhancements:
- Do-over of examples for connector pipelines. (#52604)
- Cleanup of meta learning classes and examples. (#52680)
🔨 Fixes:
- Fixed weight synching in offline evaluation. (#52757)
- Fixed bug in split_and_zero_pad utility function (related to complex structures vs simple values or np.arrays). (#52818)
Ray Core
💫 Enhancements:
- uv run integration is now enabled by default, so you don't need to set the RAY_RUNTIME_ENV_HOOK any more (#53060). If you rely on the previous behavior where uv run only runs the Ray driver but not the workers in the uv environment, you can switch back to the old behavior by setting the RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 environment variable.
- Record gcs process metrics (#53171)
🔨 Fixes:
- Improvements for using RuntimeEnv in the Job Submission API. (#52704)
- Close unused pipe file descriptor of child processes of Raylet (#52700)
- Fix race condition when canceling task that hasn't started yet (#52703)
- Implement a thread pool and call the CPython API on all threads within the same concurrency group (#52575)
- cgraph: Fix execution schedules with collective operations (#53007)
- cgraph: Fix scalar tensor serialization edge case with serialize_to_numpy_or_scalar (#53160)
- Fix the issue where a valid RestartActor rpc is ignored (#53330)
- Fix reference counter crashes during worker graceful shutdown (#53002)
Dashboard
🎉 New Features:
- train: Add dynolog for on-demand GPU profiling for Torch training (#53191)
💫 Enhancements:
- Add configurability of 'orgId' param for requesting Grafana dashboards (#53236)
🔨 Fixes:
- Fix Grafana dashboards dropdowns for data and train dashboard (#52752)
- Fix dashboard for daylight savings (#52755)
Ray Container Images
💫 Enhancements:
- Upgrade h11 (#53361), requests, starlette, jinja2 (#52951), pyopenssl and cryptography (#52941)
- Generate multi-arch image indexes (#52816)
Docs
🎉 New Features:
- End-to-end example: Entity recognition with LLMs (#52342)
- End-to-end example: xgboost tutorial (#52383)
- End-to-end tutorial for audio transcription and LLM as judge curation (#53189)
💫 Enhancements:
- Adds pydoclint to pre-commit (#52974)
Thanks!
Thank you to everyone who contributed to this release!
@NeilGirdhar, @ok-scale, @JiangJiaWei1103, @brandonscript, @eicherseiji, @ktyxx, @MichalPitr, @GeneDer, @rueian, @khluu, @bveeramani, @ArturNiederfahrenhorst, @c8ef, @lk-chen, @alanwguo, @simonsays1980, @codope, @ArthurBook, @kouroshHakha, @Yicheng-Lu-llll, @jujipotle, @aslonnie, @justinvyu, @machichima, @pcmoritz, @saihaj, @wingkitlee0, @omatthew98, @can-anyscale, @nadongjun, @chris-ray-zhang, @dizer-ti, @matthewdeng, @ryanaoleary, @janimo, @crypdick, @srinathk10, @cszhu, @TimothySeah, @iamjustinhsu, @mimiliaogo, @angelinalg, @gvspraveen, @kevin85421, @jjyao, @elliot-barn, @xingyu-long, @LeoLiao123, @thomasdesr, @ishaan-mehta, @noemotiovon, @hipudding, @davidxia, @omahs, @MengjinYan, @dengwxn, @MortalHappiness, @alhparsa, @emmanuel-ferdman, @alexeykudinkin, @KunWuLuan, @dev-goyal, @sven1977, @akyang-anyscale, @GokuMohandas, @raulchen, @abrarsheikh, @edoakes, @JoshKarpel, @bhmiller, @seanlaii, @ruisearch42, @dayshah, @Bye-legumes, @petern48, @richardliaw, @rclough, @israbbani, @jiwq
Ray-2.46.0
Release Highlights
The 2.46 Ray release comes with a couple core highlights:
- Ray Data now supports hash shuffling for repartition and aggregations, along with support for joins. This enables many new data processing workloads to be run on Ray Data. Please give it a try and let us know if you have any feedback!
- Ray Serve LLM now supports vLLM v1 to be forward-compatible with upcoming vLLM releases. This also opens up significant performance improvements that come with vLLM's v1 refactor.
- There is a new Train Grafana dashboard that provides in-depth metrics for better visibility into training workloads.
Ray Libraries
Ray Data
🎉 New Features:
- Adding support for hash-shuffle based repartitioning and aggregations (#52664)
- Added support for Joins (using hash-shuffle) (see the sketch after this list) (#52728)
- [LLM] vLLM support upgrades to 0.8.5 (#52344)
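A short sketch of the new join support on top of the hash-shuffle backend; the Dataset.join keyword names used here (join_type, num_partitions, on) are best-effort assumptions from the feature description, so double-check them against the API reference.

```python
import ray

users = ray.data.from_items([{"user_id": i, "name": f"user-{i}"} for i in range(100)])
orders = ray.data.from_items([{"user_id": i % 100, "amount": i} for i in range(500)])

# Hash-shuffle backed inner join on the shared key column
# (keyword names here are assumptions).
joined = users.join(orders, join_type="inner", num_partitions=8, on=("user_id",))
print(joined.count())
```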
💫 Enhancements:
- Add memory attribute to ExecutionResources (#51127)
- Support ray_remote_args for read_tfrecords (#52450)
- [data.dashboard] Skip reporting internal metrics (#52666)
- Add PhysicalOperator.min_max_resource_usage_bounds (#52502)
- Speed up printing the schema (#52612)
- [data.dashboard] Dataset logger for worker (#52706)
- Support new pyiceberg version (#51744)
- Support num_cpus, memory, concurrency, batch_size for preprocess (#52574)
🔨 Fixes:
- Handle Arrow Array null types in to_numpy (#52572)
- Fix S3 serialization wrapper compatibility with RetryingPyFileSystem (#52568)
- Fixing Optimizer to apply rules until the plan stabilizes (#52663)
- Fixing FuseOperators rule to properly handle the case of transformations drastically changing size of the dataset (#52570)
📖 Documentation:
- [LLM] Improve concurrency settings, improve prompt to achieve better throughput (#52634)
Ray Train
🎉 New Features:
- Add initial Train Grafana dashboard (#52709)
💫 Enhancements:
- Lazily import torch FSDP for ray.train.torch module to improve performance and reduce unnecessary dependencies (#52707)
- Deserialize the user-defined training function directly on workers, improving efficiency (#52684)
🔨 Fixes:
- Fixed error when no arguments are passed into TorchTrainer (#52693)
📖 Documentation:
- Added new XGBoostTrainer user guide (#52355)
🏗 Architecture refactoring:
- Re-enabled isort for python/ray/train to maintain code formatting consistency (#52717)
Ray Tune
📖 Documentation:
- Fixed typo in Ray Tune PyTorch Lightning docs (#52756)
Ray Serve
💫 Enhancements:
- [LLM] Refactor LLMServer and LLMEngine to not diverge too much from vllm chat formatting logic (#52597)
- Bump vllm from 0.8.2 to 0.8.5 in /python (#52344)
- [LLM] Add router replicas and batch size to llm config (#52655)
🔨 Fixes:
- Request cancellation not propagating correctly across deployments (#52591)
- BackpressureError not properly propagated in FastAPI ingress deployments (#52397)
- Hanging issue when awaiting deployment responses (#52561)
- [Serve.llm] made Ray Serve LLM compatible with vLLM v1 (#52668)
📖 Documentation:
- [Serve][LLM] Add doc for deploying DeepSeek (#52592)
RLlib
🎉 New Features:
- Offline Evaluation with loss function for Offline RL pipeline. Introduces three new callbacks: on_offline_evaluate_start, on_offline_evaluate_end, on_offline_eval_runners_recreated (#52308)
💫 Enhancements:
- New custom_data attribute for SingleAgentEpisode and MultiAgentEpisode to store custom metrics. Deprecates add|get_temporary_timestep_data() (#52603)
Ray Core
💫 Enhancements:
- Only get serialization context once for all .remote args (#52690)
- Add grpc server success and fail count metric (#52711)
🔨 Fixes:
- Fix open leak for plasma store memory (shm/fallback) by workers (#52622)
- Assure closing of unused pipe for dashboard subprocesses (#52678)
- Expand protection against dead processes in reporter agent (#52657)
- [cgraph] Separate metadata and data in cross-node shared memory transport (#52619)
- Fix JobID check for detached actor tasks (#52405)
- Fix potential log loss of tail_job_logs (#44709)
🏗 Architecture refactoring:
- Cancel tasks when an owner dies instead of checking if an owner is dead during scheduling (#52516)
- Unify GcsAioClient and GcsClient (#52735)
- Remove worker context dependency from the task receiver (#52740)
Dashboard
🎉 New Features:
- Ray Train Grafana Dashboard added with a few built-in metrics. More to come.
Thanks!
Thank you to everyone who contributed to this release!
@kevin85421, @edoakes, @wingkitlee0, @alexeykudinkin, @chris-ray-zhang, @sophie0730, @zcin, @raulchen, @matthewdeng, @abrarsheikh, @popojk, @Jay-ju, @ruisearch42, @eicherseiji, @lk-chen, @justinvyu, @dayshah, @kouroshHakha, @NeilGirdhar, @omatthew98, @ishaan-mehta, @davidxia, @ArthurBook, @GeneDer, @srinathk10, @dependabot[bot], @JoshKarpel, @aslonnie, @khluu, @can-anyscale, @israbbani, @saihaj, @MortalHappiness, @alanwguo, @bveeramani, @iamjustinhsu, @Ziy1-Tan, @xingyu-long, @simonsays1980, @fscnick, @chuang0221, @sven1977, @jjyao
Ray-2.45.0
Ray Core
💫 Enhancements
- Make Object Store Fallback Directory configurable (#51189).
- [cgraph] Support with_tensor_transport(transport='shm') (#51872).
- [cgraph] Support reduce scatter and all gather collective for GPU communicator in compiled graph (#50624).
🔨 Fixes
- Make sure KillActor RPC with force_kill=True can actually kill the threaded actor (#51414).
- [Autoscaler] Do not remove idle nodes for upcoming placement groups (#51122).
- Threaded actors get stuck forever if they receive two exit signals (#51582).
- [cgraph] Fix illegal memory access of cgraph when used in PP (#51734).
- Avoid resubmitted actor tasks from hanging indefinitely (#51904).
- Fix interleaved placement group creation process due to node failure (#52202).
- Flush task events in CoreWorker::Shutdown instead of CoreWorker::Disconnect (#52374).
🏗 Architecture refactoring
- Split dashboard single process into multiple processes to improve stability and avoid interference between different heads (#51282, #51489, #51555, #51507, #51587, #51553, #51676, #51733, #51809, #51877, #51876, #51980, #52114).
Ray Libraries
Ray Data
🎉 New Features
- New ClickHouse sink via Dataset.write_clickhouse() (#50377)
- Support ray_remote_args_fn in Dataset.groupby().map_groups() to set per-group runtime env and resource hints (#51236)
- Expose Dataset.name/set_name as a public API for easier lineage tracking (#51076)
- Allow async callable classes in Dataset.flat_map() (#51180)
- Introduce Ruleset abstraction for rule-based query optimisation (#51558)
- Add seamless conversion from Daft DataFrame to Ray Dataset (#51531) (see the sketch after this list)
- Improved support for line-delimited JSONL reading in read_json() (#52083)
- Provide Dataset.export_metadata() for schema & stats snapshots (#52227)
💫 Enhancements
- Improved performance of sorting and sort-shuffle based operations (by more than 5x in benchmarks) (#51943)
- Metrics: number of map-actor workers alive / pending / restarting (#51082)
- Continuous memory-usage polling per map task (#51324)
- Auto-tune map-task memory based on output size (#51536)
- More informative back-pressure progress bar (#51697)
- Faster RefBundle.get_cached_location() lookup (#52097)
- Speed-up for PandasBlock.size_bytes() (#52510)
- Expanded BlockColumnAccessor utilities and ops (#51326, #51571)
🔨 Fixes
- Correct MapTransformFn.__eq__ equality check (#51434)
- Persist unresolved wildcard paths in FileBasedDataSource (#51424)
- Repair Hugging Face dynamic-module loading on workers (#51488)
- Prevent HTTP URLs from being expanded by _expand_paths (#50178)
- Fix Databricks host-URL parsing in Delta datasource (#49926)
- Restore reproducibility of Dataset.random_sample() (#51401)
- Correct RandomAccessDataset.multiget() return values (#51421)
- Ensure executor shutdown after schema fetch to avoid leaked actors (#52379)
- Repair streaming shutdown regression (#52509)
- Honour minimum resource reservation in ResourceManager (#52226)
📖 Documentation
- Clarified shuffle-section wording (#51289)
- Documented concurrency semantics in API reference (#51963)
- Updated Ray Data guides for the 2.45 release (#52082)
Ray Train
🎉 New Features
- Fold v2.LightGBMTrainer API into the public trainer class as an alternate constructor (#51265).
💫 Enhancements
- Use the user-defined function name as the training thread name (#52514).
- Upgrade LightGBM to version 4.6.0 (#52410).
- Adjust test size further for better results (#52283).
- Log errors raised by workers during training (#52223).
- Add worker group setup finished log to track progress (#52120).
- Change test_telemetry to medium size (#52178).
- Improve dataset name observability for better tracking (#52059).
- Differentiate between train v1 and v2 export data for clarity (#51728).
- Include scheduling status detail to improve debugging (#51480).
- Move train library usage check to Trainer initialization (#50966).
🔨 Fixes
- Separate OutputSplitter._locality_hints from actor_locality_enabled and locality_with_output (#52005).
- Fix print redirection to handle new lines correctly (#51542).
- Mark RunAttempt workers as dead after completion to avoid stale states (#51540).
- Fix setup_wandb rank_zero_only logic (#52381).
📖 Documentation
- Add links to the Train v2 migration guide in the Train API pages (#51924).
🏗 Architecture refactoring
- Replace AMD device environment variable with HIP_VISIBLE_DEVICES (#51104).
- Remove unnecessary string literal splits (#47360).
Ray Tune
📖 Documentation
- Improve Tune documentation structure (#51684).
- Fix syntax errors in Ray Tune example pbt_ppo_example.ipynb (#51626).
Ray Serve
🎉 New Features
- Add request timeout sec for gRPC (#52276).
- [Serve.llm] ray.llm supports custom accelerators (#51359).
💫 Enhancements
- Improve Serve deploy ignore behavior (#49336).
- [Serve.llm] Telemetry GPU type fallback to cluster hardware when unspecified (#52003).
🔨 Fixes
- Fix multiplex fallback logic during burst requests (#51389).
- Don't stop retrying replicas when a deployment is scaling back up from zero (#51600).
- Remove RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE flag (#51649).
- Remove RAY_SERVE_EAGERLY_START_REPLACEMENT_REPLICAS flag (#51722).
- Unify request cancellation errors (#51768).
- Catch timeout error when checking if proxy is dead (#52002).
- Suppress cancelled errors in proxy (#52423).
- [Serve.llm] Fix loading model from remote storage and add docs (#51617).
- [Serve.llm] Fix ServeReplica deployment failure for DeepSeek (#51989).
- [Serve.llm] Check GPUType enum value rather than enum itself ([#52037]...
Ray-2.44.1
There is no difference between 2.44.1 and 2.44.0, though we needed a patch version for other out of band reasons. To fill the awkward blankness, here is a haiku about Ray:
Under screen-lit skies
A ray of bliss in each patch
Joy at any scale
Ray-2.44.0
Release Highlights
- This release features Ray Compiled Graph (beta). Ray Compiled Graph gives you a classic Ray Core-like API, but with (1) less than 50us system overhead for workloads that repeatedly execute the same task graph; and (2) native support for GPU-GPU communication via NCCL. Ray Compiled Graph APIs simplify high-performance multi-GPU workloads such as LLM inference and training. The beta release refines the API, enhances stability, and adds or improves features like visualization, profiling and experimental GPU compute/computation overlap. For more information, refer to Ray documentation: https://docs.ray.io/en/latest/ray-core/compiled-graph/ray-compiled-graph.html
- The experimental Ray Workflows library has been deprecated and will be removed in a future version of Ray. Ray Workflows has been marked experimental since its inception and hasn’t been maintained due to the Ray team focusing on other priorities. If you are using Ray Workflows, we recommend pinning your Ray version to 2.44.
Ray Libraries
Ray Data
🎉 New Features:
- Add Iceberg write support through pyiceberg (see the sketch after this list) (#50590)
- [LLM] Various feature enhancements to Ray Data LLM, including LoRA support #50804 and structured outputs #50901
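A hedged sketch of the new Iceberg write path; the keyword names below mirror the read_iceberg-style arguments and are assumptions, as is the SQLite-backed pyiceberg catalog used purely for illustration.

```python
import ray

ds = ray.data.from_items([{"id": i, "value": i * 2} for i in range(100)])

# Write through pyiceberg; `table_identifier` / `catalog_kwargs` are assumed
# to follow the read_iceberg-style arguments.
ds.write_iceberg(
    table_identifier="default.events",
    catalog_kwargs={
        "name": "demo",
        "type": "sql",
        "uri": "sqlite:///iceberg_catalog.db",
        "warehouse": "file:///tmp/iceberg_warehouse",
    },
)
```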
💫 Enhancements:
- Add dataset/operator state, progress, total metrics (#50770)
- Make chunk combination threshold configurable (#51200)
- Store average memory use per task in OpRuntimeMetrics (#51126)
- Avoid unnecessary conversion to Numpy when creating Arrow/Pandas blocks (#51238)
- Append-mode API for preprocessors -- #50848, #50847, #50642, #50856, #50584. Note that vectorizers and hashers now output a single column instead of one column per feature. In the near future, we will be graduating preprocessors to beta.
🔨 Fixes:
- Fixing Map Operators to avoid unconditionally overriding generator's back-pressure configuration (#50900)
- Fix filter expr equating negative numbers (#50932)
- Fix error message for override_num_blocks when reading from a HuggingFace Dataset (#50998)
- Make num_blocks in repartition optional (#50997)
- Always pin the seed when doing file-based random shuffle (#50924)
- Fix StandardScaler to handle NaN stats (#51281)
Ray Train
🎉 New Features:
💫 Enhancements:
- Folded v2.XGBoostTrainer API into the public trainer class as an alternate constructor (#50045)
- Created a default ScalingConfig if one is not provided to the trainer (#51093)
- Improved TrainingFailedError message (#51199)
- Utilize FailurePolicy factory (#51067)
🔨 Fixes:
- Fixed trainer import deserialization when captured within a Ray task (#50862)
- Fixed serialize import test for Python 3.12 (#50963)
- Fixed RunConfig deprecation message in Tune being emitted in trainer.fit usage (#51198)
📖 Documentation:
- [Train V2] Updated API references (#51222)
- [Train V2] Updated persistent storage guide (#51202)
- [Train V2] Updated user guides for metrics, checkpoints, results, and experiment tracking (#51204)
- [Train V2] Added updated Train + Tune user guide (#51048)
- [Train V2] Added updated fault tolerance user guide (#51083)
- Improved HF Transformers example (#50896)
- Improved Train DeepSpeed example (#50906)
- Use correct mean and standard deviation norm values in image tutorials (#50240)
🏗 Architecture refactoring:
- Deprecated Torch AMP wrapper utilities (#51066)
- Hid private functions of train context to avoid abuse (#50874)
- Removed ray storage dependency and deprecated RAY_STORAGE env var configuration option (#50872)
- Moved library usage tests out of core (#51161)
Ray Tune
📖 Documentation:
- Various improvements to Tune Pytorch CIFAR tutorial (#50316)
- Various improvements to the Ray Tune XGBoost tutorial (#50455)
- Various enhancements to Tune Keras example (#50581)
- Minor improvements to Hyperopt tutorial (#50697)
- Various improvements to LightGBM tutorial (#50704)
- Fixed non-runnable Optuna tutorial (#50404)
- Added documentation for Asynchronous HyperBand Example in Tune (#50708)
- Replaced reuse actors example with a fuller demonstration (#51234)
- Fixed broken PB2/RLlib example (#51219)
- Fixed typo and standardized equations across the two APIs (#51114)
- Improved PBT example (#50870)
- Removed broken links in documentation (#50995, #50996)
🏗 Architecture refactoring:
- Removed ray storage dependency and deprecated RAY_STORAGE env var configuration option (#50872)
- Moved library usage tests out of core (#51161)
Ray Serve
🎉 New Features:
💫 Enhancements:
- Clean up shutdown behavior of serve (#51009)
- Add additional_log_standard_attrs to serve logging config (#51144)
- [LLM] remove asyncache and cachetools from dependencies (#50806)
- [LLM] remove backoff dependency (#50822)
- [LLM] Remove asyncio_timeout from ray[llm] deps on python<3.11 (#50815)
- [LLM] Made JSON validator a singleton and jsonref packages lazy imported (#50821)
- [LLM] Reuse AutoscalingConfig and DeploymentConfig from Serve (#50871)
- [LLM] Use pyarrow FS for cloud remote storage interaction (#50820)
- [LLM] Add usage telemetry for serve.llm (#51221)
🔨 Fixes:
- Exclude redirects from request error count (#51130)
- [LLM] Fix the wrong device_capability issue in vllm on quantized models (#51007)
- [LLM] add gen-config related data file to the package (#51347)
📖 Documentation:
- [LLM] Fix quickstart serve LLM docs (#50910)
- [LLM] update build_openai_app to include yaml example (#51283)
- [LLM] remove old vllm+serve doc (#51311)
RLlib
💫 Enhancements:
- APPO/IMPALA acceleration:
  - Unify namings for actor managers' outstanding in-flight requests metrics. (#51159)
  - Add timers to env step, forward pass, and complete connector pipelines runs. (#51160)
🔨 Fixes:
📖 Documentation:
Ray Core and Ray Clusters
Ray Core
🎉 New Features:
- Enhanced uv support (#51233)
💫 Enhancements:
- Made infeasible task errors much more obvious (#45909)
- Log rotation for workers, runtime env agent, and dashboard agent (#50759, #50877, #50909)
- Support customizing gloo timeout (#50223)
- Support torch profiling in Compiled Graph (#51022)
- Change default tensor deserialization in Compiled Graph (#50778)
- Use current node id if no node is specified on ray drain-node (#51134)
🔨 Fixes:
- Fixed an issue where the raylet continued to have high CPU overhead after a job was terminated ([...