Ray-2.48.0

Released by @khluu on 18 Jul 22:27 (commit 0349122)

Release Highlights

  • Ray Data: New Delta Lake and Unity Catalog integration, plus performance improvements to various reading and writing operators.
  • Ray Core: Enhanced GPU object support with intra-process communication and improved Autoscaler v2 functionality.
  • Ray Train: Improved hardware metrics integration with Grafana and enhanced collective operations support.
  • Ray Serve LLM: An early proof of concept for prefill-decode disaggregated deployment and LLM-aware request routing, such as prefix-cache-aware routing.
  • Ray Data LLM: Improved throughput and CPU memory utilization for Ray Data workers.

Ray Libraries

Ray Data

🎉 New Features:

  • Added reading from Delta Lake tables and Unity Catalog integration (#53701)
  • Added pin_memory support to iter_torch_batches (#53792); see the sketch after this list
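
As an illustration, here is a minimal sketch combining the two features. The reader name read_delta is an assumption based on the feature description, so check the Ray Data API reference for the exact entry point:

```python
import ray

# Assumed entry point for the new Delta Lake integration (#53701).
ds = ray.data.read_delta("s3://bucket/path/to/delta-table")

# pin_memory (#53792) stages each batch in page-locked host memory,
# which can speed up host-to-GPU copies.
for batch in ds.iter_torch_batches(batch_size=256, pin_memory=True):
    ...  # feed the batch to a model
```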

💫 Enhancements:

  • Re-enabled sorting in Ray Data tests with performance improvements (#54475)
  • Enhanced handling of mismatched columns and pandas.NA values (#53861, #53859)
  • Improved read_text trailing newline semantics (#53860)
  • Optimized backpressure handling with policy-based resource management (#54376)
  • Enhanced write_parquet with support for both partition_by and row limits (#53930); see the sketch after this list
  • Prevent filename collisions on write operations (#53890)
  • Improved execution performance for One Hot encoding in preprocessors (#54022)
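
A hedged sketch of the partitioned write with a per-file row limit. The parameter names partition_cols and min_rows_per_file are assumptions based on the public write_parquet API and may differ in your Ray version:

```python
import ray

ds = ray.data.from_items(
    [{"country": c, "value": i} for i, c in enumerate(["US", "DE", "US"])]
)

# Hive-style partitioning combined with per-file row limits (#53930).
# Parameter names assumed; verify against the Ray Data API reference.
ds.write_parquet(
    "/tmp/out",
    partition_cols=["country"],   # one subdirectory per country value
    min_rows_per_file=1_000_000,  # shape output file sizes
)
```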

🔨 Fixes:

  • Fixed map_groups issues (#54462)
  • Prevented Op fusion for streaming repartition to avoid performance degradation (#54469)
  • Fixed ActorPool autoscaler scaling up logic (#53983)
  • Resolved empty dataset repartitioning issues (#54107)
  • Fixed PyArrow overflow handling in data processing (#53971, #54390)
  • Fixed IcebergDatasink to properly generate individual file uuids (#52956)
  • Avoid OOMs with read_json(..., lines=True) (#54436)
  • Handle HuggingFace parquet dataset resolve URLs (#54146)
  • Fixed BlockMetadata derivation for Read operator (#53908)

📖 Documentation:

  • Updated AggregateFnV2 documentation to clarify finalize method (#53835)
  • Improved preprocessor and vectorizer API documentation

Ray Train

🎉 New Features:

  • Added broadcast_from_rank_zero and barrier collective operations (#54066); a sketch follows this list
  • Enhanced hardware metrics integration with Grafana dashboards (#53218)
  • Added support for dynamically loading callbacks via environment variables (#54233)
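
A minimal sketch of the new collectives, assuming broadcast_from_rank_zero takes the object from rank 0 and returns a copy on every worker (the ray.train.collective module path comes from the API docs entry below; these APIs may require the Train V2 code path):

```python
import ray.train
from ray.train import ScalingConfig
from ray.train.collective import barrier, broadcast_from_rank_zero
from ray.train.torch import TorchTrainer


def train_loop_per_worker():
    # Rank 0 decides a value once; all other workers receive a copy.
    rank = ray.train.get_context().get_world_rank()
    hparams = broadcast_from_rank_zero({"lr": 1e-3} if rank == 0 else None)

    barrier()  # block until every worker reaches this point
    ...        # training loop using hparams


trainer = TorchTrainer(
    train_loop_per_worker, scaling_config=ScalingConfig(num_workers=4)
)
trainer.fit()
```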

💫 Enhancements:

  • Improved checkpoint population from before_init_train_context (#54453)
  • Enhanced controller state logging and metrics (#52805)
  • Added structured logging environment variable support (#52952)
  • Improved handling of Noop scaling decisions for smoother scaling logic (#53180)
  • Added logging of controller state transitions to aid debugging and analysis (#53344)

🔨 Fixes:

  • Fixed GPU tensor reporting in ray.train.report (#53725)
  • Enhanced move_tensors_to_device utility for complex tensor structures (#53109)
  • Improved worker health check error handling with trace information (#53626)
  • Fixed GPU transfer support for non-contiguous tensors (#52548)
  • Force abort on SIGINT spam and do not abort finished runs (#54188)

📖 Documentation:

  • Updated beginner PyTorch example (#54124)
  • Added documentation for ray.train.collective APIs (#54340)
  • Added a note about PyTorch DataLoader's multiprocessing and forkserver usage (#52924)
  • Fixed various docstring format and indentation issues (#52855, #52878)
  • Noted the optional checkpoint_dir_name argument in the ray.train.report API docs (#54391)

🏗 Architecture refactoring:

  • Removed subclass relationship between RunConfig and RunConfigV1 (#54293)
  • Enhanced error handling for finished training runs (#54188)
  • Deduplicated ML doctest runners in CI for efficiency (#53157)
  • Converted isort configuration to Ruff for consistency (#52869)

Ray Tune

💫 Enhancements:

  • Updated test_train_v2_integration to use the correct RunConfig (#52882)

🔨 Fixes:

  • Fixed RayTaskError serialization logic (#54396)
  • Improved experiment restore timeout handling (#53387)

📖 Documentation:

  • Replaced session.report with tune.report and corrected import paths (#52801); see the sketch after this list
  • Removed outdated graphics cards reference in docs (#52922)
  • Fixed various docstring format issues (#52879)
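
The corrected pattern looks roughly like this (a minimal sketch of the renamed API from #52801):

```python
from ray import tune


def objective(config):
    loss = (config["x"] - 2) ** 2
    # Previously: from ray.air import session; session.report({...})
    tune.report({"loss": loss})


tuner = tune.Tuner(objective, param_space={"x": tune.grid_search([1, 2, 3])})
results = tuner.fit()
print(results.get_best_result(metric="loss", mode="min").config)
```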

Ray Serve

🎉 New Features:

  • Added RouterConfig field to DeploymentConfig for custom RequestRouter configuration (#53870)
  • Added support for implementing custom request routing algorithms (#53251)
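
As a rough sketch only: the base-class module path, method name, and return convention below are assumptions inferred from the feature descriptions in #53251 and #53870, not confirmed API, so consult the Ray Serve docs before relying on them:

```python
from ray import serve
# Assumed import path for the custom-routing API (#53251).
from ray.serve.request_router import RequestRouter


class FirstReplicaRouter(RequestRouter):
    """Toy policy: always prefer the first candidate replica."""

    async def choose_replicas(self, candidate_replicas, pending_request):
        # Assumed convention: return replicas in preference order so
        # Serve can fall back when the top choice is saturated.
        return candidate_replicas


@serve.deployment  # attach the router via the new RouterConfig field (#53870)
class Model:
    def __call__(self, request) -> str:
        return "ok"
```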

💫 Enhancements:

  • Enhanced FastAPI ingress deployment validation for multiple deployments (#53647)
  • Optimized get_live_deployments performance (#54454)
  • Progress towards making ray.serve.llm compatible with vLLM serve frontend (#54481, #54443, #54440)

🔨 Fixes:

  • Fixed deployment scheduler issues with component scheduling (#54479)
  • Fixed runtime_env validation for py_modules (#53186)
  • Added descriptive error message when deployment name is not found (#45181)

📖 Documentation:

  • Added troubleshooting guide for DeepSeek/multi-node GPU deployment on KubeRay (#54229)
  • Updated the guide on serving models with Triton Server in Ray Serve
  • Added documentation for custom request routing algorithms (#53511)

🏗 Architecture refactoring:

  • Remove indirection layers of node initialization (#54481)
  • Incremental refactor of LLMEngine (#54443)
  • Remove random v0 logic from serve endpoints (#54440)
  • Remove usage of internal_api.memory_summary() (#54417)
  • Remove usage of ray._private.state (#54140)

Ray Serve/Data LLM

🎉 New Features:

  • Support separate deployment config for PDProxy in PrefixAwareReplicaSet (#53935)
  • Support for prefix-aware request router (#52725)

💫 Enhancements:

  • Log engine stats after each batch task is done (#54360)
  • Decouple max_tasks_in_flight from max_concurrent_batches (#54362); see the sketch after this list
  • Make llm serve endpoints compatible with vLLM serve frontend, including streaming, tool_code, and health check support (#54440)
  • Remove botocore dependency in Ray Serve LLM (#54156)
  • Update vLLM version to 0.9.2 (#54407)
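
For context, batch inference is configured through the vLLM processor config. A hedged sketch, where the max_concurrent_batches field name is taken from the release note and the remaining arguments follow the documented ray.data.llm API:

```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",
    batch_size=64,
    max_concurrent_batches=4,  # now decoupled from max_tasks_in_flight (#54362)
)
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(max_tokens=64),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)
ds = processor(ray.data.from_items([{"prompt": "What is Ray?"}]))
```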

🔨 Fixes:

  • Fix health check in prefill disagg (#53937)
  • Fix docs to reflect that only int concurrency is supported (#54196)
  • Fix vLLM batch test by changing to Pixtral (#53744)
  • Fix pickle error with remote code models in vLLM Ray workloads (#53868)
  • Adapt to the change in vllm.PoolingOutput (#54467)

📖 Documentation:

  • Fix Ray Serve LoRA docs (#53553)
  • Add Ray Serve LLM docs (#52832)
  • Add a doc snippet to inform users about existing diffs between vLLM and Ray Serve LLM behavior in some APIs like streaming, tool_code, and health check (#54123)
  • Troubleshooting DeepSeek/multi-node GPU deployment on KubeRay (#54229)

🏗 Architecture refactoring:

  • Make llm serve endpoints compatible with vLLM serve frontend, including streaming, tool_code, and health check support (#54490)
  • Configure PrefixAwareReplicaSet to correctly handle the number of available GPUs for each worker and to ensure efficient GPU utilization in vLLM (#53192)
  • Organize spread out utils.py (#53722)
  • Remove the ImageRetriever class and related tests from the LLM serving codebase (#54018)
  • Return a batch of rows from the UDF instead of row by row (#54329)

RLlib

🎉 New Features:

  • Implemented Offline Policy Evaluation (OPE) via Importance Sampling (#53702); see the note after this list
  • Enhanced ConnectorV2 ObservationPreprocessor APIs with multi-agent support (#54209)
  • Added GPU inference to offline evaluation (#52718)
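
For background, ordinary importance sampling estimates the value of a target policy π_e from trajectories collected under a behavior policy π_b by reweighting each trajectory's return (this is the standard OPE formulation, not RLlib-specific notation):

```math
\hat{V}^{\pi_e}_{\mathrm{IS}} \;=\; \frac{1}{n} \sum_{i=1}^{n}
\left( \prod_{t=0}^{T_i - 1} \frac{\pi_e(a_t^i \mid s_t^i)}{\pi_b(a_t^i \mid s_t^i)} \right) G_i
```

where G_i is the return of the i-th logged trajectory.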

💫 Enhancements:

  • Enhanced MetricsLogger to handle tensors in state management (#53514)
  • Improved env seeding in EnvRunners with deterministic training example rewrite (#54039)
  • Cleanup of meta learning classes and examples (#52680)

🔨 Fixes:

  • Fixed EnvRunner restoration when no local EnvRunner is available (#54091)
  • Fixed shapes in explained_variance for recurrent policies (#54005)
  • Resolved device check issues in Learner implementation (#53706)
  • Enhanced numerical stability in MeanStdFilter (#53484)
  • Fixed weight synching in offline evaluation (#52757)
  • Fixed bug in split_and_zero_pad utility function (#52818)

📖 Documentation:

  • Do-over of examples for connector pipelines (#52604)
  • Remove "new API stack" banner from all RLlib docs pages as it's now the default (#54282)

Ray Core

🎉 New Features:

  • Enhanced GPU object support with intra-process communication (#53798)
  • Integrated single-controller collective APIs with GPU objects (#53720)
  • Added support for ray.get on driver process for GPU objects (#53902)
  • Added support for allreduce on a list of input nodes in compiled graphs (#51047)
  • Added a single-controller API for ray.util.collective with a torch gloo backend (#53319); a sketch follows this list
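
A minimal sketch of the single-controller style, assuming create_collective_group keeps its documented driver-side signature; the backend string "torch_gloo" is an assumption based on the PR title:

```python
import ray
import ray.util.collective as col
import torch


@ray.remote
class Worker:
    def __init__(self):
        self.t = torch.ones(4)

    def allreduce(self):
        # Group membership was declared once by the driver below.
        col.allreduce(self.t, group_name="default")
        return self.t


workers = [Worker.remote() for _ in range(2)]
# Single-controller setup (#53319): the driver declares the group;
# the workers never call an init function themselves.
col.create_collective_group(
    workers, world_size=2, ranks=[0, 1], backend="torch_gloo"
)
print(ray.get([w.allreduce.remote() for w in workers]))  # tensors of 2s
```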

💫 Enhancements:

  • Improved autoscaler v2 functionality with cloud instance ID reusing (#54397)
  • Upgraded OpenTelemetry SDK for better observability (#53745)
  • Improved actor scheduling to prevent deadlocks in ordered actors (#54034)
  • Enhanced get_max_resources_from_cluster_config functionality (#54455)
  • Use std::move in cluster task manager constructor (#54413)
  • Improve status messages and add comments about stale seq_no handling (#54470)
  • uv run integration is now enabled by default (#53060)

🔨 Fixes:

  • Fixed race conditions in object eviction and repinning for recovery (#53934)
  • Resolved GCS crash issues on duplicate MarkJobFinished RPCs (#53951)
  • Enhanced actor restart handling on node failures (#54088)
  • Improved reference counting during worker graceful shutdown (#53002)
  • Fixed a race condition when canceling a task that hasn't started yet (#52703)
  • Fixed an issue where a valid RestartActor RPC was ignored (#53330)
  • Fixed "Check failed: it->second.num_retries_left == -1" error (#54116)
  • Fixed detached actors being unexpectedly killed (#53562)

📖 Documentation:

  • Enhanced troubleshooting guides and API documentation
  • Updated reStructuredText formatting on Resources page (#53882)
  • Fix working code snippets (#52748)
  • Add doc for running KubeRay dashboard (#53830)
  • Add antipattern for nested ray.get (#43184)

🏗 Architecture refactoring:

  • Delete old skipped tests and unused code (#54427)
  • Consolidate TaskManager interface (#54317)
  • Move dependencies of NodeManager to main.cc for better testability (#53782)
  • Use smart pointer in logging.cc (#54351)
  • Delete event_label and unused environment variables (#54378, #54095)
  • Remove actor task path in normal task submitter (#53996)
  • Rename GcsFunctionManager and use fake in test (#53973)

Dashboard

🎉 New Features:

  • Add dynolog for on-demand GPU profiling for Torch training (#53191)

💫 Enhancements:

  • Added TPU usage metrics to reporter agent (#53678)
  • Added configurability of the 'orgId' param for requesting Grafana dashboards (#53236)

🔨 Fixes:

  • Fixed Grafana dashboard dropdowns for data and train dashboards (#52752)
  • Resolved daylight savings time issues in dashboard (#52755)
  • Fix retrieving IP address from the GPUProfilingManager on the dashboard agent (#53807)

Docs

🎉 New Features:

  • New end-to-end examples:
    • Multi-modal AI pipeline (#52342)
    • XGBoost tutorial (#52383)
    • Audio transcription and LLM as judge curation (#53189)
    • LLM training and inference (#53415)
    • Scalable video processing (#50965)

💫 Enhancements:

  • Add pydoclint to pre-commit (#52974)
  • Add vale to pre-commit (#53564)

Breaking Changes

  • Removed deprecated ray.workflow package (#53612)
  • Removed deprecated storage parameter from ray.init (#53669)
  • Removed deprecated ray start CLI options (#53675)
  • Removed experimental "array" library (#54105)
  • Remove dask from byod 3.9 deps (#54521)

Dependencies & Build

  • Added uv binary v0.7.19 for improved package management (#54437)
  • Upgraded datasets in release tests (#54425)
  • Enhanced wheel building process with single bazel call optimization (#54476)
  • Fixed uv run parser for handling extra arguments (#54488)
  • Upgraded h11, requests, starlette, jinja2, pyopenssl, and cryptography
  • Generated multi-arch image indexes (#52816)

Thanks!

Thank you to everyone who contributed to this release!
@kouroshHakha, @davidwagnerkc, @MengjinYan, @minerharry, @simonsays1980, @Myasuka, @noemotiovon, @goutamvenkat-anyscale, @harshit-anyscale, @jugalshah291, @tianyi-ge, @sven1977, @crypdick, @JohnsonKuan, @lk-chen, @richardsliu, @alexeykudinkin, @EagleLo, @soffer-anyscale, @zcin, @AdrienVannson, @nilsmelchert, @raulchen, @jujipotle, @DrehanM, @vigneshka, @Ziy1-Tan, @Blaze-DSP, @ArthurBook, @GokuMohandas, @walkoss, @bveeramani, @edoakes, @omatthew98, @SeanQuant, @CheyuWu, @cszhu, @win5923, @kevin85421, @angelinalg, @iamjustinhsu, @eicherseiji, @kunling-anyscale, @vickytsang, @MortalHappiness, @aslonnie, @psr-ai, @sbhat98, @anyadontfly, @marwan116, @cristianjd, @2niuhe, @codope, @fscnick, @ryanaoleary, @srinathk10, @TimothySeah, @han-steve, @Future-Outlier, @Syulin7, @Qiaolin-Yu, @elliot-barn, @JoshKarpel, @dayshah, @can-anyscale, @ok-scale, @mattip, @SolitaryThinker, @owenowenisme, @nehiljain, @GeneDer, @rnkrtt, @israbbani, @DriverSong, @sinalallsite, @pcmoritz, @akyang-anyscale, @xinyuangui2, @nrghosh, @davidxia, @rueian, @stephanie-wang, @jjyao, @chris-ray-zhang, @czgdp1807, @justinvyu, @Daraan, @landscapepainter, @troychiu, @khluu, @hipudding, @ruisearch42, @robertnishihara, @ArturNiederfahrenhorst, @abrarsheikh, @alanwguo, @HollowMan6, @ran1995data, @matthewdeng