---
title: "Scaling ML with Feast and Ray: Distributed Processing for Modern AI Applications"
description: "Learn how Feast's integration with Ray enables distributed processing for both traditional feature engineering and modern RAG applications, with support for Kubernetes deployment through KubeRay."
date: 2025-10-29
authors: ["Nikhil Kathole"]
---

<div class="hero-image">
  <img src="/images/blog/feast_ray_architecture.png" alt="Feast + Ray Architecture for Distributed Processing" loading="lazy">
</div>

In today's data-driven world, organizations are increasingly turning to distributed computing to handle large-scale machine learning workloads. For feature engineering and retrieval-augmented generation (RAG) systems, the combination of **Feast** and **Ray** provides a powerful foundation for building scalable, production-ready pipelines.

This post explores how Feast's integration with Ray enables distributed processing for both traditional feature engineering and modern RAG applications, with support for Kubernetes deployment through KubeRay.

## The Scaling Challenge

Modern ML teams face critical scaling challenges:

- **Massive Datasets**: Processing millions of documents for embedding generation
- **Complex Transformations**: CPU-intensive operations like text processing and feature engineering
- **Real-time Requirements**: Low-latency retrieval for RAG applications
- **Resource Efficiency**: Optimal utilization of compute resources across clusters

## Building Scalable Feature Pipelines and RAG Systems with Distributed Computing

Feast's integration with Ray addresses these challenges head-on, providing a unified platform where distributed processing is the default, not an afterthought. The key insight is that embedding generation, one of the most computationally intensive tasks in modern AI, can be treated as just another transformation in your feature pipeline.

### The Ray RAG Revolution

Consider the Ray RAG template, which demonstrates this approach in action:

```bash
# Built-in RAG template with distributed embedding generation
feast init -t ray_rag my_rag_project
cd my_rag_project/feature_repo
```

This single command gives you a complete system that can process thousands of documents in parallel, generate embeddings using distributed computing, and serve them through a vector database.

The Ray RAG template demonstrates:

- **Parallel Embedding Generation**: Distribute embedding computation across workers
- **Vector Search Integration**: Seamless integration with vector databases for similarity search
- **Complete RAG Pipeline**: Data → Embeddings → Search in one workflow

## Embedding Generation as a Feast Transformation

Feast's Ray integration makes embedding generation a first-class transformation operation. When you define a transformation in Feast, Ray handles the complexity of distributed processing: it partitions your data, distributes the computation across available workers, and manages the orchestration, all transparently to the developer. Here's how it works in practice:

### Distributed Embedding Processing

```python
from feast import BatchFeatureView, Entity, Field, FileSource
from feast.types import Array, Float32, String
from datetime import timedelta

# Entity and source definitions (the file path is illustrative)
document = Entity(name="document", join_keys=["document_id"])
movies_source = FileSource(
    path="data/movies.parquet",
    timestamp_field="event_timestamp",
)

# Embedding processor for distributed Ray processing
class EmbeddingProcessor:
    """Generate embeddings using a SentenceTransformer model."""

    def __init__(self):
        import torch
        from sentence_transformers import SentenceTransformer

        # Each Ray worker constructs its own copy of the model, on GPU if available
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

    def __call__(self, batch):
        """Process a batch of rows and generate embeddings."""
        descriptions = batch["Description"].fillna("").tolist()
        embeddings = self.model.encode(
            descriptions,
            show_progress_bar=False,
            batch_size=128,
            normalize_embeddings=True,
            convert_to_numpy=True,
        )
        batch["embedding"] = embeddings.tolist()
        return batch

# Ray-native UDF for distributed processing
def generate_embeddings_ray_native(ds):
    """Distributed embedding generation using Ray Data."""
    max_workers = 8
    batch_size = 2500

    # Repartition so every worker gets at least one block
    if ds.num_blocks() < max_workers:
        ds = ds.repartition(max_workers)

    return ds.map_batches(
        EmbeddingProcessor,
        batch_format="pandas",
        concurrency=max_workers,
        batch_size=batch_size,
    )

# Feature view with a Ray transformation
document_embeddings_view = BatchFeatureView(
    name="document_embeddings",
    entities=[document],
    mode="ray",  # Native Ray Dataset mode
    ttl=timedelta(days=365 * 100),
    schema=[
        Field(name="document_id", dtype=String),
        Field(name="embedding", dtype=Array(Float32), vector_index=True),
        Field(name="movie_name", dtype=String),
        Field(name="movie_director", dtype=String),
    ],
    source=movies_source,
    udf=generate_embeddings_ray_native,
    online=True,
)
```
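
Once the view is defined, registering and materializing it is what kicks off the distributed Ray job. Here is a minimal sketch, assuming the definitions above are importable and using an illustrative date range that should cover your data's timestamps:

```python
from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Register the entity and feature view defined above
store.apply([document, document_embeddings_view])

# Runs the Ray UDF across the dataset and writes embeddings to the online store
store.materialize(
    start_date=datetime(2020, 1, 1),
    end_date=datetime.now(),
)
```

Materialization is where the distribution pays off: Ray fans the UDF out over the dataset's blocks, so throughput scales with the number of available workers.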

### RAG Query Example

```python
from feast import FeatureStore
from sentence_transformers import SentenceTransformer

# Initialize the feature store
store = FeatureStore(repo_path=".")

# Generate the query embedding with the same model used at materialization
model = SentenceTransformer("all-MiniLM-L6-v2")
query_embedding = model.encode(["sci-fi movie about space"])[0].tolist()

# Retrieve similar documents via vector search
results = store.retrieve_online_documents_v2(
    features=[
        "document_embeddings:embedding",
        "document_embeddings:movie_name",
        "document_embeddings:movie_director",
    ],
    query=query_embedding,
    top_k=5,
).to_dict()

# Display results (keys like "document_id_pk" and "distance" come from the response)
for i in range(len(results["document_id_pk"])):
    print(f"{i+1}. {results['movie_name'][i]}")
    print(f"   Director: {results['movie_director'][i]}")
    print(f"   Distance: {results['distance'][i]:.3f}")
```
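
Feast handles the retrieval half of RAG; generation is up to you. As an illustrative, hypothetical helper, here is one way to fold the retrieved features into an LLM prompt:

```python
def build_prompt(query: str, results: dict) -> str:
    """Assemble a grounded prompt from Feast retrieval results."""
    context = "\n".join(
        f"- {name} (directed by {director})"
        for name, director in zip(results["movie_name"], results["movie_director"])
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_prompt("sci-fi movie about space", results)
# Send `prompt` to the LLM of your choice to complete the RAG loop.
```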

## Component Responsibilities

The Feast + Ray integration follows a clear separation of concerns:

- **Ray Compute Engine**: Executes distributed feature computations, transformations, and joins
- **Ray Offline Store**: Handles data I/O, reading from sources such as Parquet and CSV

This separation gives each component a single responsibility, making the system more maintainable and allowing the data access and computation layers to be optimized independently.
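
The two components are configured independently in `feature_store.yaml`. A sketch of what a combined configuration might look like; note that `ray.engine` here is an assumption following the `<name>.engine` naming convention Feast uses for other compute engines (such as `spark.engine`), so check the documentation for the exact key on your version:

```yaml
offline_store:
  type: ray                      # data I/O layer
  storage_path: s3://my-bucket/feast-data

batch_engine:
  type: ray.engine               # distributed computation layer
```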

## Ray Integration Modes

Feast supports three execution modes for Ray integration:

### 1. Local Development
Perfect for experimentation and testing:

```yaml
offline_store:
  type: ray
  storage_path: data/ray_storage
  # Conservative settings for local development
  broadcast_join_threshold_mb: 25
  max_parallelism_multiplier: 1
  target_partition_size_mb: 16
```
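
With this configuration, a plain `get_historical_features` call is enough to confirm that Feast is executing on a local Ray instance. A quick smoke test, with illustrative entity values and feature name:

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity rows to join features onto (IDs and timestamps are illustrative)
entity_df = pd.DataFrame(
    {
        "document_id": ["doc_1", "doc_2"],
        "event_timestamp": [datetime(2024, 1, 1), datetime(2024, 1, 1)],
    }
)

df = store.get_historical_features(
    entity_df=entity_df,
    features=["document_embeddings:movie_name"],
).to_df()
print(df.head())
```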

### 2. Remote Ray Cluster
Connect to existing Ray infrastructure:

```yaml
offline_store:
  type: ray
  storage_path: s3://my-bucket/feast-data
  ray_address: "ray://my-cluster.example.com:10001"
```

The `ray://` address uses the Ray Client protocol, so the Ray version installed locally should match the one running on the cluster.

### 3. Kubernetes with KubeRay
Enterprise-ready deployment. With `use_kuberay: true`, Feast runs its Ray workloads on a RayCluster managed by the KubeRay operator in the configured namespace:

```yaml
offline_store:
  type: ray
  storage_path: s3://my-bucket/feast-data
  use_kuberay: true
  kuberay_conf:
    cluster_name: "feast-ray-cluster"
    namespace: "feast-system"
```

## Getting Started

### Install Feast with Ray Support
```bash
pip install feast[ray]
```

### Initialize Ray RAG Template
```bash
# RAG applications with distributed embedding generation
feast init -t ray_rag my_rag_project
cd my_rag_project/feature_repo
```

### Deploy to Production
```bash
feast apply
feast materialize --disable-event-timestamp
python test_workflow.py
```

Whether you're building traditional feature pipelines or modern RAG systems, Feast + Ray offers the scalability and performance needed for production workloads. The integration supports everything from local development to large-scale Kubernetes deployments, making it an ideal choice for organizations looking to scale their ML infrastructure.

---

**Ready to build distributed RAG applications?** Get started with our [Ray RAG template](https://docs.feast.dev/reference/compute-engine/ray) and explore the [Feast + Ray documentation](https://docs.feast.dev/reference/offline-stores/ray) for distributed embedding generation.

*Learn more about Feast's distributed processing capabilities and join the community at [feast.dev](https://feast.dev).*