
Commit 08fb675

docs: Adding blog post for Ray
Signed-off-by: ntkathole <[email protected]>
1 parent 4563e80 commit 08fb675

9 files changed: +317 additions, -23 deletions

docs/blog/README.md

Lines changed: 4 additions & 0 deletions
@@ -19,3 +19,7 @@ Welcome to the Feast blog! Here you'll find articles about feature store develop
{% content-ref url="rbac-role-based-access-controls.md" %}
[rbac-role-based-access-controls.md](rbac-role-based-access-controls.md)
{% endcontent-ref %}

{% content-ref url="feast-ray-distributed-processing.md" %}
[feast-ray-distributed-processing.md](feast-ray-distributed-processing.md)
{% endcontent-ref %}

docs/getting-started/components/compute-engine.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ engines.
| SnowflakeComputeEngine | Runs on Snowflake, designed for scalable feature generation using Snowflake SQL. | ✅ | |
| LambdaComputeEngine | Runs on AWS Lambda, designed for serverless feature generation. | ✅ | |
| FlinkComputeEngine | Runs on Apache Flink, designed for stream processing and real-time feature generation. | ❌ | |
-| RayComputeEngine | Runs on Ray, designed for distributed feature generation and machine learning workloads. | | |
+| RayComputeEngine | Runs on Ray, designed for distributed feature generation and machine learning workloads. | | |
```

### Batch Engine

docs/reference/compute-engine/ray.md

Lines changed: 43 additions & 0 deletions
@@ -2,6 +2,24 @@

The Ray compute engine is a distributed compute implementation that leverages [Ray](https://www.ray.io/) for executing feature pipelines including transformations, aggregations, joins, and materializations. It provides scalable and efficient distributed processing for both `materialize()` and `get_historical_features()` operations.

## Quick Start with Ray Template

### Ray RAG Template - Batch Embedding at Scale

For RAG (Retrieval-Augmented Generation) applications with distributed embedding generation:

```bash
feast init -t ray_rag my_rag_project
cd my_rag_project/feature_repo
```

The Ray RAG template demonstrates:
- **Parallel Embedding Generation**: Uses the Ray compute engine to generate embeddings across multiple workers
- **Vector Search Integration**: Works with Milvus for semantic similarity search
- **Complete RAG Pipeline**: Data → Embeddings → Search workflow

The Ray compute engine automatically distributes the embedding generation across available workers, making it ideal for processing large datasets efficiently.
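To make this concrete, a project created from the template wires the Ray compute engine in through `feature_store.yaml`. The sketch below is illustrative only: the `batch_engine` type string and the online store settings are assumptions and should be checked against the generated template and the configuration examples later on this page.

```yaml
offline_store:
  type: ray                 # Ray offline store handles data I/O
  storage_path: data/ray_storage
batch_engine:
  type: ray.engine          # assumed type string for the Ray compute engine
online_store:
  type: milvus              # the RAG template pairs Ray with a vector store (connection settings omitted)
```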

## Overview

The Ray compute engine provides:

@@ -365,6 +383,8 @@ batch_engine:

### With Feature Transformations

#### On-Demand Transformations

```python
from feast import FeatureView, Field
from feast.types import Float64

@@ -385,4 +405,27 @@ features = store.get_historical_features(
)
```

#### Ray Native Transformations

For distributed transformations that leverage Ray's dataset and parallel processing capabilities, use `mode="ray"` in your `BatchFeatureView`:

```python
# Feature view with Ray transformation mode
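# Assumes `document` (an Entity), `movies_source` (a batch data source), and
# `generate_embeddings_ray_native` (a UDF operating on a Ray Dataset) are
# defined elsewhere in the feature repo.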
document_embeddings_view = BatchFeatureView(
    name="document_embeddings",
    entities=[document],
    mode="ray",  # Enable Ray native transformation
    ttl=timedelta(days=365),
    schema=[
        Field(name="document_id", dtype=String),
        Field(name="embedding", dtype=Array(Float32), vector_index=True),
        Field(name="movie_name", dtype=String),
        Field(name="movie_director", dtype=String),
    ],
    source=movies_source,
    udf=generate_embeddings_ray_native,
    online=True,
)
```

For more information, see the [Ray documentation](https://docs.ray.io/en/latest/) and [Ray Data guide](https://docs.ray.io/en/latest/data/getting-started.html).

docs/reference/offline-stores/README.md

Lines changed: 4 additions & 0 deletions
@@ -45,3 +45,7 @@ Please see [Offline Store](../../getting-started/components/offline-store.md) fo
{% content-ref url="mssql.md" %}
[mssql.md](mssql.md)
{% endcontent-ref %}

{% content-ref url="ray.md" %}
[ray.md](ray.md)
{% endcontent-ref %}

docs/reference/offline-stores/overview.md

Lines changed: 22 additions & 22 deletions
@@ -26,33 +26,33 @@ The first three of these methods all return a `RetrievalJob` specific to an offl
## Functionality Matrix

There are currently four core offline store implementations: `DaskOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
-There are several additional implementations contributed by the Feast community (`PostgreSQLOfflineStore`, `SparkOfflineStore`, and `TrinoOfflineStore`), which are not guaranteed to be stable or to match the functionality of the core implementations.
+There are several additional implementations contributed by the Feast community (`PostgreSQLOfflineStore`, `SparkOfflineStore`, `TrinoOfflineStore`, and `RayOfflineStore`), which are not guaranteed to be stable or to match the functionality of the core implementations.
Details for each specific offline store, such as how to configure it in a `feature_store.yaml`, can be found [here](README.md).

Below is a matrix indicating which offline stores support which methods.

-| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | Couchbase |
-| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
-| `get_historical_features` | yes | yes | yes | yes | yes | yes | yes | yes |
-| `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes | yes |
-| `pull_all_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes | yes |
-| `offline_write_batch` | yes | yes | yes | yes | no | no | no | no |
-| `write_logged_features` | yes | yes | yes | yes | no | no | no | no |
+| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | Couchbase | Ray |
+| :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
+| `get_historical_features` | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| `pull_all_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| `offline_write_batch` | yes | yes | yes | yes | no | no | no | no | yes |
+| `write_logged_features` | yes | yes | yes | yes | no | no | no | no | yes |

Below is a matrix indicating which `RetrievalJob`s support what functionality.

-| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB | Couchbase |
-| --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| export to arrow batches | no | no | no | yes | no | no | no | no | no |
-| export to SQL | no | yes | yes | yes | yes | no | yes | no | yes |
-| export to data lake (S3, GCS, etc.) | no | no | yes | no | yes | no | no | no | yes |
-| export to data warehouse | no | yes | yes | yes | yes | no | no | no | yes |
-| export as Spark dataframe | no | no | yes | no | no | yes | no | no | no |
-| local execution of Python-based on-demand transforms | yes | yes | yes | yes | yes | no | yes | yes | yes |
-| remote execution of Python-based on-demand transforms | no | no | no | no | no | no | no | no | no |
-| persist results in the offline store | yes | yes | yes | yes | yes | yes | no | yes | yes |
-| preview the query plan before execution | yes | yes | yes | yes | yes | yes | yes | no | yes |
-| read partitioned data | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB | Couchbase | Ray |
+| --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| export to arrow batches | no | no | no | yes | no | no | no | no | no | no |
+| export to SQL | no | yes | yes | yes | yes | no | yes | no | yes | no |
+| export to data lake (S3, GCS, etc.) | no | no | yes | no | yes | no | no | no | yes | yes |
+| export to data warehouse | no | yes | yes | yes | yes | no | no | no | yes | no |
+| export as Spark dataframe | no | no | yes | no | no | yes | no | no | no | no |
+| local execution of Python-based on-demand transforms | yes | yes | yes | yes | yes | no | yes | yes | yes | yes |
+| remote execution of Python-based on-demand transforms | no | no | no | no | no | no | no | no | no | no |
+| persist results in the offline store | yes | yes | yes | yes | yes | yes | no | yes | yes | yes |
+| preview the query plan before execution | yes | yes | yes | yes | yes | yes | yes | no | yes | yes |
+| read partitioned data | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |

docs/reference/offline-stores/ray.md

Lines changed: 17 additions & 0 deletions
@@ -5,6 +5,23 @@

The Ray offline store is a data I/O implementation that leverages [Ray](https://www.ray.io/) for reading and writing data from various sources. It focuses on efficient data access operations, while complex feature computation is handled by the [Ray Compute Engine](../compute-engine/ray.md).

## Quick Start with Ray Template

The easiest way to get started with the Ray offline store is to use the built-in Ray template:

```bash
feast init -t ray my_ray_project
cd my_ray_project/feature_repo
```

This template includes:
- Pre-configured Ray offline store and compute engine setup
- Sample feature definitions optimized for Ray processing
- Demo workflow showcasing Ray capabilities
- Resource settings for local development

The template provides a complete working example with sample datasets and demonstrates both Ray offline store data I/O operations and Ray compute engine distributed processing.
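As a rough sketch, the `feature_store.yaml` generated by the template looks something like the following; treat the exact keys and values as illustrative, since the template defaults may change between Feast versions:

```yaml
project: my_ray_project
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: ray
  storage_path: data/ray_storage   # local scratch space for Ray data
```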

## Overview

The Ray offline store provides:

docs/blog/feast-ray-distributed-processing.md

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
---
title: "Scaling ML with Feast and Ray: Distributed Processing for Modern AI Applications"
description: "Learn how Feast's integration with Ray enables distributed processing for both traditional feature engineering and modern RAG applications, with support for Kubernetes deployment through KubeRay."
date: 2025-10-29
authors: ["Nikhil Kathole"]
---

<div class="hero-image">
  <img src="/images/blog/feast_ray_architecture.png" alt="Feast + Ray Architecture for Distributed Processing" loading="lazy">
</div>

In today's data-driven world, organizations are increasingly turning to distributed computing to handle large-scale machine learning workloads. When it comes to feature engineering and retrieval-augmented generation (RAG) systems, the combination of **Feast** and **Ray** provides a powerful solution for building scalable, production-ready pipelines.

This blog post explores how Feast's integration with Ray enables distributed processing for both traditional feature engineering and modern RAG applications, with support for Kubernetes deployment through KubeRay.

## The Scaling Challenge

Modern ML teams face critical scaling challenges:

- **Massive Datasets**: Processing millions of documents for embedding generation
- **Complex Transformations**: CPU-intensive operations like text processing and feature engineering
- **Real-time Requirements**: Low-latency retrieval for RAG applications
- **Resource Efficiency**: Optimal utilization of compute resources across clusters

## Building Scalable Feature Pipelines and RAG Systems with Distributed Computing

Feast's integration with Ray addresses these challenges head-on, providing a unified platform where distributed processing is the default, not an afterthought. The magic happens when you realize that embedding generation, one of the most computationally intensive tasks in modern AI, can be treated as just another transformation in your feature pipeline.

### The Ray RAG Revolution

Consider the Ray RAG template, which demonstrates this new approach in action:

```bash
# Built-in RAG template with distributed embedding generation
feast init -t ray_rag my_rag_project
cd my_rag_project/feature_repo
```

This single `feast init` command gives you a complete system that can process thousands of documents in parallel, generate embeddings using distributed computing, and serve them through a vector database.

The Ray RAG template demonstrates:

- **Parallel Embedding Generation**: Distribute embedding computation across workers
- **Vector Search Integration**: Seamless integration with vector databases for similarity search
- **Complete RAG Pipeline**: Data → Embeddings → Search in one workflow

## Embedding Generation as a Feast Transformation

Feast's Ray integration makes embedding generation a first-class transformation operation. When you define a transformation in Feast, Ray handles the complexity of distributed processing: it partitions your data, distributes the computation across available workers, and manages the orchestration, all transparently to the developer. Here's how it works in practice:

### Distributed Embedding Processing

```python
from feast import BatchFeatureView, Entity, Field, FileSource
from feast.types import Array, Float32, String
from datetime import timedelta
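
# `document` and `movies_source` are referenced by the feature view below but
# defined elsewhere in the feature repo; a minimal, illustrative version
# (path and join key are assumptions) might look like:
document = Entity(name="document", join_keys=["document_id"])
movies_source = FileSource(
    path="data/movies.parquet",          # illustrative path
    timestamp_field="event_timestamp",
)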

# Embedding processor for distributed Ray processing
class EmbeddingProcessor:
    """Generate embeddings using SentenceTransformer model."""

    def __init__(self):
        import torch
        from sentence_transformers import SentenceTransformer

        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

    def __call__(self, batch):
        """Process batch and generate embeddings."""
        descriptions = batch["Description"].fillna("").tolist()
        embeddings = self.model.encode(
            descriptions,
            show_progress_bar=False,
            batch_size=128,
            normalize_embeddings=True,
            convert_to_numpy=True,
        )
        batch["embedding"] = embeddings.tolist()
        return batch


# Ray native UDF for distributed processing
def generate_embeddings_ray_native(ds):
    """Distributed embedding generation using Ray Data."""
    max_workers = 8
    batch_size = 2500

    # Optimize partitioning for available workers
    num_blocks = ds.num_blocks()
    if num_blocks < max_workers:
        ds = ds.repartition(max_workers)

    result = ds.map_batches(
        EmbeddingProcessor,
        batch_format="pandas",
        concurrency=max_workers,
        batch_size=batch_size,
    )
    return result


# Feature view with Ray transformation
document_embeddings_view = BatchFeatureView(
    name="document_embeddings",
    entities=[document],
    mode="ray",  # Native Ray Dataset mode
    ttl=timedelta(days=365 * 100),
    schema=[
        Field(name="document_id", dtype=String),
        Field(name="embedding", dtype=Array(Float32), vector_index=True),
        Field(name="movie_name", dtype=String),
        Field(name="movie_director", dtype=String),
    ],
    source=movies_source,
    udf=generate_embeddings_ray_native,
    online=True,
)
```

### RAG Query Example

```python
from feast import FeatureStore
from sentence_transformers import SentenceTransformer

# Initialize feature store
store = FeatureStore(repo_path=".")

# Generate query embedding
model = SentenceTransformer("all-MiniLM-L6-v2")
query_embedding = model.encode(["sci-fi movie about space"])[0].tolist()

# Retrieve similar documents
results = store.retrieve_online_documents_v2(
    features=[
        "document_embeddings:embedding",
        "document_embeddings:movie_name",
        "document_embeddings:movie_director",
    ],
    query=query_embedding,
    top_k=5,
).to_dict()

# Display results
for i in range(len(results["document_id_pk"])):
    print(f"{i+1}. {results['movie_name'][i]}")
    print(f"   Director: {results['movie_director'][i]}")
    print(f"   Distance: {results['distance'][i]:.3f}")
```

## Component Responsibilities

The Feast + Ray integration follows a clear separation of concerns:

- **Ray Compute Engine**: Executes distributed feature computations, transformations, and joins
- **Ray Offline Store**: Handles data I/O operations, reading from various sources (Parquet, CSV, etc.)

This architectural separation ensures that each component has a single responsibility, making the system more maintainable and allowing for independent optimization of the data access and computation layers.
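In practice, this separation is invisible at the API level: once the Ray offline store and compute engine are configured in `feature_store.yaml`, the standard Feast calls are all you write. The snippet below is a minimal sketch; the feature references and entity columns are assumptions carried over from the example above.

```python
from datetime import datetime, timedelta

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Materialization is executed by the configured Ray compute engine
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=7),
    end_date=datetime.utcnow(),
)

# Historical retrieval reads data through the Ray offline store
entity_df = pd.DataFrame(
    {
        "document_id": ["doc_1", "doc_2"],        # illustrative entity keys
        "event_timestamp": [datetime.utcnow()] * 2,
    }
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "document_embeddings:movie_name",
        "document_embeddings:movie_director",
    ],
).to_df()
```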

## Ray Integration Modes

Feast supports three execution modes for Ray integration:

### 1. Local Development
Perfect for experimentation and testing:

```yaml
offline_store:
  type: ray
  storage_path: data/ray_storage
  # Conservative settings for local development
  broadcast_join_threshold_mb: 25
  max_parallelism_multiplier: 1
  target_partition_size_mb: 16
```

### 2. Remote Ray Cluster
Connect to existing Ray infrastructure:

```yaml
offline_store:
  type: ray
  storage_path: s3://my-bucket/feast-data
  ray_address: "ray://my-cluster.example.com:10001"
```

### 3. Kubernetes with KubeRay
Enterprise-ready deployment:

```yaml
offline_store:
  type: ray
  storage_path: s3://my-bucket/feast-data
  use_kuberay: true
  kuberay_conf:
    cluster_name: "feast-ray-cluster"
    namespace: "feast-system"
```

## Getting Started

### Install Feast with Ray Support
```bash
pip install feast[ray]
```

### Initialize Ray RAG Template
```bash
# RAG applications with distributed embedding generation
feast init -t ray_rag my_rag_project
cd my_rag_project/feature_repo
```

### Deploy to Production
```bash
feast apply
feast materialize --disable-event-timestamp
python test_workflow.py
```

Whether you're building traditional feature pipelines or modern RAG systems, Feast + Ray offers the scalability and performance needed for production workloads. The integration supports everything from local development to large-scale Kubernetes deployments, making it an ideal choice for organizations looking to scale their ML infrastructure.

---

**Ready to build distributed RAG applications?** Get started with our [Ray RAG template](https://docs.feast.dev/reference/compute-engine/ray) and explore the [Feast + Ray documentation](https://docs.feast.dev/reference/offline-stores/ray) for distributed embedding generation.

*Learn more about Feast's distributed processing capabilities and join the community at [feast.dev](https://feast.dev).*
Binary image file: 402 KB (not rendered)
