@@ -9,10 +9,11 @@ The Ray offline store is a data I/O implementation that leverages [Ray](https://

 The Ray offline store provides:
 - Ray-based data reading from file sources (Parquet, CSV, etc.)
-- Support for both local and distributed Ray clusters
+- Support for local, remote, and KubeRay (Kubernetes-managed) clusters
 - Integration with various storage backends (local files, S3, GCS, HDFS)
 - Efficient data filtering and column selection
 - Timestamp-based data processing with timezone awareness
+- Enterprise-ready KubeRay cluster support via CodeFlare SDK


 ## Functionality Matrix
@@ -59,9 +60,15 @@ For complex feature processing, historical feature retrieval, and distributed jo

 ## Configuration

-The Ray offline store can be configured in your `feature_store.yaml` file. Below are two main configuration patterns:
+The Ray offline store can be configured in your `feature_store.yaml` file. It supports **three execution modes**:

-### Basic Ray Offline Store
+1. **LOCAL**: Ray runs locally on the same machine (default)
+2. **REMOTE**: Connects to a remote Ray cluster via `ray_address`
+3. **KUBERAY**: Connects to Ray clusters on Kubernetes via CodeFlare SDK
+
+### Execution Modes
+
+#### Local Mode (Default)

 For simple data I/O operations without distributed processing:

@@ -72,7 +79,44 @@ provider: local
 offline_store:
   type: ray
   storage_path: data/ray_storage  # Optional: Path for storing datasets
-  ray_address: localhost:10001  # Optional: Ray cluster address
+```
+
+#### Remote Ray Cluster
+
+Connect to an existing Ray cluster:
+
+```yaml
+offline_store:
+  type: ray
+  storage_path: s3://my-bucket/feast-data
+  ray_address: "ray://my-cluster.example.com:10001"
+```
+
+#### KubeRay Cluster (Kubernetes)
+
+Connect to Ray clusters on Kubernetes using CodeFlare SDK:
+
+```yaml
+offline_store:
+  type: ray
+  storage_path: s3://my-bucket/feast-data
+  use_kuberay: true
+  kuberay_conf:
+    cluster_name: "feast-ray-cluster"
+    namespace: "feast-system"
+    auth_token: "${RAY_AUTH_TOKEN}"
+    auth_server: "https://api.openshift.com:6443"
+    skip_tls: false
+  enable_ray_logging: false
+```
+
+**Environment Variables** (alternative to config file):
+```bash
+export FEAST_RAY_USE_KUBERAY=true
+export FEAST_RAY_CLUSTER_NAME=feast-ray-cluster
+export FEAST_RAY_AUTH_TOKEN=your-token
+export FEAST_RAY_AUTH_SERVER=https://api.openshift.com:6443
+export FEAST_RAY_NAMESPACE=feast-system
 ```
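
The `FEAST_RAY_*` variables map one-to-one onto the `kuberay_conf` keys shown in the YAML example. The Python sketch below illustrates that mapping; it is not the code Feast itself runs. The helper name `kuberay_conf_from_env` and the truthy-string parsing are assumptions made for illustration, and `FEAST_RAY_SKIP_TLS` is taken from the environment-variable list in the KubeRay cluster section of this document.

```python
import os
from typing import Mapping, Optional


def kuberay_conf_from_env(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Build a kuberay_conf-style dict from FEAST_RAY_* variables (hypothetical helper).

    Returns None when FEAST_RAY_USE_KUBERAY is not set to a truthy value.
    """

    def as_bool(value: Optional[str]) -> bool:
        # Assumption: which strings count as "true" is not specified in the docs.
        return (value or "").strip().lower() in {"1", "true", "yes"}

    if not as_bool(env.get("FEAST_RAY_USE_KUBERAY")):
        return None

    cluster_name = env.get("FEAST_RAY_CLUSTER_NAME")
    if not cluster_name:
        # cluster_name is the only key the options table marks as required.
        raise ValueError("FEAST_RAY_CLUSTER_NAME is required in KubeRay mode")

    return {
        "cluster_name": cluster_name,
        "namespace": env.get("FEAST_RAY_NAMESPACE", "default"),
        "auth_token": env.get("FEAST_RAY_AUTH_TOKEN"),
        "auth_server": env.get("FEAST_RAY_AUTH_SERVER"),
        "skip_tls": as_bool(env.get("FEAST_RAY_SKIP_TLS")),
    }


example_env = {
    "FEAST_RAY_USE_KUBERAY": "true",
    "FEAST_RAY_CLUSTER_NAME": "feast-ray-cluster",
    "FEAST_RAY_AUTH_TOKEN": "your-token",
    "FEAST_RAY_AUTH_SERVER": "https://api.openshift.com:6443",
    "FEAST_RAY_NAMESPACE": "feast-system",
}
conf = kuberay_conf_from_env(example_env)
print(conf)
```

Passing the environment as an explicit mapping keeps the helper easy to test without touching the real process environment.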

 ### Ray Offline Store + Compute Engine
@@ -175,8 +219,29 @@ batch_engine:
 |--------|------|---------|-------------|
 | `type` | string | Required | Must be `feast.offline_stores.contrib.ray_offline_store.ray.RayOfflineStore` or `ray` |
 | `storage_path` | string | None | Path for storing temporary files and datasets |
-| `ray_address` | string | None | Address of the Ray cluster (e.g., "localhost:10001") |
+| `ray_address` | string | None | Ray cluster address (triggers REMOTE mode, e.g., "ray://host:10001") |
+| `use_kuberay` | boolean | None | Enable KubeRay mode (overrides `ray_address`) |
+| `kuberay_conf` | dict | None | **KubeRay configuration dict** with keys: `cluster_name` (required), `namespace` (default: "default"), `auth_token`, `auth_server`, `skip_tls` (default: false) |
+| `enable_ray_logging` | boolean | false | Enable Ray progress bars and verbose logging |
 | `ray_conf` | dict | None | Ray initialization parameters for resource management (e.g., memory, CPU limits) |
+| `broadcast_join_threshold_mb` | int | 100 | Size threshold for broadcast joins (MB) |
+| `enable_distributed_joins` | boolean | true | Enable distributed joins for large datasets |
+| `max_parallelism_multiplier` | int | 2 | Parallelism as multiple of CPU cores |
+| `target_partition_size_mb` | int | 64 | Target partition size (MB) |
+| `window_size_for_joins` | string | "1H" | Time window for distributed joins |
+
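
The table gives defaults for the join and partitioning options but does not say how they interact. The Python sketch below shows one plausible reading, using only the table's descriptions: a broadcast join when the smaller input fits under the threshold, parallelism as a multiple of CPU cores, and a partition count derived from the target partition size. The function `plan_join` and the strategy labels are hypothetical and are not Feast's actual planner.

```python
import math
from typing import Optional

# Defaults copied from the options table.
DEFAULTS = {
    "broadcast_join_threshold_mb": 100,
    "enable_distributed_joins": True,
    "max_parallelism_multiplier": 2,
    "target_partition_size_mb": 64,
}


def plan_join(left_mb: float, right_mb: float, cpu_cores: int,
              conf: Optional[dict] = None) -> dict:
    """Illustrative join plan derived from the tuning options (assumed logic)."""
    opts = {**DEFAULTS, **(conf or {})}
    smaller, larger = sorted((left_mb, right_mb))

    if smaller <= opts["broadcast_join_threshold_mb"]:
        strategy = "broadcast"      # small side can be shipped to every worker
    elif opts["enable_distributed_joins"]:
        strategy = "distributed"    # both sides are large, so partition both
    else:
        strategy = "local"          # hypothetical fallback label

    return {
        "strategy": strategy,
        "max_parallelism": cpu_cores * opts["max_parallelism_multiplier"],
        "partitions_by_size": max(1, math.ceil(larger / opts["target_partition_size_mb"])),
    }


small_side = plan_join(left_mb=2048, right_mb=40, cpu_cores=8)
both_large = plan_join(left_mb=2048, right_mb=512, cpu_cores=8)
print(small_side)
print(both_large)
```
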
+#### Mode Detection Precedence
+
+The Ray offline store automatically detects the execution mode using the following precedence:
+
+1. **Environment Variables** (highest priority)
+   - `FEAST_RAY_USE_KUBERAY`, `FEAST_RAY_CLUSTER_NAME`, etc.
+2. **Config `kuberay_conf`**
+   - If present → KubeRay mode
+3. **Config `ray_address`**
+   - If present → Remote mode
+4. **Default**
+   - Local mode (lowest priority)
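
The precedence list can be expressed as a short decision function. The Python sketch below is an illustration, not Feast's implementation. The name `detect_mode` is hypothetical, and two details are assumptions: that `FEAST_RAY_USE_KUBERAY=true` alone selects KubeRay mode, and that `use_kuberay: true` in the config counts the same as a present `kuberay_conf`.

```python
from typing import Mapping


def detect_mode(config: Mapping[str, object], env: Mapping[str, str]) -> str:
    """Return "KUBERAY", "REMOTE", or "LOCAL" following the documented precedence."""
    # 1. Environment variables have the highest priority.
    if env.get("FEAST_RAY_USE_KUBERAY", "").strip().lower() == "true":
        return "KUBERAY"
    # 2. KubeRay settings in the config file (use_kuberay is an assumed trigger).
    if config.get("use_kuberay") or config.get("kuberay_conf"):
        return "KUBERAY"
    # 3. An explicit Ray address selects remote mode.
    if config.get("ray_address"):
        return "REMOTE"
    # 4. Nothing configured: run Ray locally.
    return "LOCAL"


remote_cfg = {"ray_address": "ray://my-cluster.example.com:10001"}
kuberay_cfg = {**remote_cfg, "kuberay_conf": {"cluster_name": "feast-ray-cluster"}}

print(detect_mode({}, {}))
print(detect_mode(remote_cfg, {}))
print(detect_mode(kuberay_cfg, {}))
print(detect_mode(remote_cfg, {"FEAST_RAY_USE_KUBERAY": "true"}))
```

The last call shows an environment variable overriding a config file that on its own would select remote mode.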

 #### Ray Compute Engine Options

@@ -385,6 +450,8 @@ job.persist(hdfs_storage, allow_overwrite=True)

 ### Using Ray Cluster

+#### Standard Ray Cluster
+
 To use Ray in cluster mode for distributed data access:

 1. Start a Ray cluster:
@@ -406,6 +473,53 @@ offline_store:
 ray start --address='head-node-ip:10001'
 ```

+#### KubeRay Cluster (Kubernetes)
+
+To use Feast with Ray clusters on Kubernetes via CodeFlare SDK:
+
+**Prerequisites:**
+- KubeRay cluster deployed on Kubernetes
+- CodeFlare SDK installed: `pip install codeflare-sdk`
+- Access credentials for the Kubernetes cluster
+
+**Configuration:**
+
+1. Using configuration file:
+```yaml
+offline_store:
+  type: ray
+  use_kuberay: true
+  storage_path: s3://my-bucket/feast-data
+  kuberay_conf:
+    cluster_name: "feast-ray-cluster"
+    namespace: "feast-system"
+    auth_token: "${RAY_AUTH_TOKEN}"
+    auth_server: "https://api.openshift.com:6443"
+    skip_tls: false
+  enable_ray_logging: false
+```
+
+2. Using environment variables:
+```bash
+export FEAST_RAY_USE_KUBERAY=true
+export FEAST_RAY_CLUSTER_NAME=feast-ray-cluster
+export FEAST_RAY_AUTH_TOKEN=your-k8s-token
+export FEAST_RAY_AUTH_SERVER=https://api.openshift.com:6443
+export FEAST_RAY_NAMESPACE=feast-system
+export FEAST_RAY_SKIP_TLS=false
+
+# Then use standard Feast code
+python your_feast_script.py
+```
+
+**Features:**
+- The CodeFlare SDK handles cluster connection and authentication
+- Automatic TLS certificate management
+- Authentication with Kubernetes clusters
+- Namespace isolation
+- Secure communication between client and Ray cluster
+- Automatic cluster discovery
+
 ### Data Source Validation

 The Ray offline store validates data sources to ensure compatibility: