Commit a05c43f

Lukas Valatka authored
Merge branch 'master' into docs/add-castai-contributor-logo
2 parents 8b3f467 + 90b5086 commit a05c43f

149 files changed, with 4682 additions and 646 deletions

CHANGELOG.md

Lines changed: 28 additions & 0 deletions
@@ -1,5 +1,33 @@
11
# Changelog
22

3+
# [0.56.0](https://github.com/feast-dev/feast/compare/v0.55.0...v0.56.0) (2025-10-27)
4+
5+
6+
### Bug Fixes
7+
8+
* Add mode field to Transformation proto for proper serialization ([2390d2e](https://github.com/feast-dev/feast/commit/2390d2ea654e299fc74f697212542b755f3b4938))
9+
* Date wise remote offline store historical data retrieval ([#5686](https://github.com/feast-dev/feast/issues/5686)) ([949ba3d](https://github.com/feast-dev/feast/commit/949ba3dae420f82f493018113d1fd6de9e130a56))
10+
* Fix STRING type handling in on-demand feature views ([#5669](https://github.com/feast-dev/feast/issues/5669)) ([dfbb743](https://github.com/feast-dev/feast/commit/dfbb7433f059e6f0d1d4ef6c0ef65b63dac1c1ff))
11+
* Fixed torch install issue in CI ([366e5a8](https://github.com/feast-dev/feast/commit/366e5a8c8f8093eda840b667849c6d2e45fa56bb))
12+
* ODFV not getting counted in resource count ([1d640b6](https://github.com/feast-dev/feast/commit/1d640b6c8136c47a78887e4490a5b7ae677b7c99))
13+
* Skip tag updates if user do not have permissions ([#5673](https://github.com/feast-dev/feast/issues/5673)) ([0a951ce](https://github.com/feast-dev/feast/commit/0a951ce8d7f9b31490fa279339eacd444d2d2434))
14+
15+
16+
### Features
17+
18+
* Add document of Go feature server. ([#5697](https://github.com/feast-dev/feast/issues/5697)) ([cbd1dde](https://github.com/feast-dev/feast/commit/cbd1dde9a0a6e5a3ec7e3520b6613d3818bcd842))
19+
* Add flexible commandArgs support for complete Feast CLI control ([#5678](https://github.com/feast-dev/feast/issues/5678)) ([6414924](https://github.com/feast-dev/feast/commit/64149246c1925e9f3dcac60d9ab629225c232261))
20+
* Add HDFS as a feature registry ([#5655](https://github.com/feast-dev/feast/issues/5655)) ([4c65872](https://github.com/feast-dev/feast/commit/4c65872ee6cf7e14ed14c8a8a7e141126027e575))
21+
* Add nodeSelector to service config ([#5675](https://github.com/feast-dev/feast/issues/5675)) ([9728cde](https://github.com/feast-dev/feast/commit/9728cde4d3cf4d22a790d3a3af2eba705b7a56d3))
22+
* Add OTEL based observability to the Go Feature Server ([#5685](https://github.com/feast-dev/feast/issues/5685)) ([f4afdad](https://github.com/feast-dev/feast/commit/f4afdad27c7fe92e9778e29ad08e4b227a3c17a4))
23+
* Added health endpoint for the UI ([#5665](https://github.com/feast-dev/feast/issues/5665)) ([3aec5d5](https://github.com/feast-dev/feast/commit/3aec5d5fd24540d10f07e79c081c2658ca35678c))
24+
* Added kuberay support ([e0b698d](https://github.com/feast-dev/feast/commit/e0b698d7b8733c8177ca053bc89defb01ebeb538))
25+
* Added support for filtering multi-projects ([#5688](https://github.com/feast-dev/feast/issues/5688)) ([eb0a86e](https://github.com/feast-dev/feast/commit/eb0a86eb81defb5ccb2407a0d5f2b2425bcb61c1))
26+
* Batch Embedding at scale for RAG with Ray ([cc2a46d](https://github.com/feast-dev/feast/commit/cc2a46d54c413ed52a9bf568588dd06096592c1f))
27+
* Optimize SQL entity handling without creating temporary tables ([#5695](https://github.com/feast-dev/feast/issues/5695)) ([aa2c838](https://github.com/feast-dev/feast/commit/aa2c8386253181145c4f314187e0873b96b2be59))
28+
* Support aggregation in odfv ([#5666](https://github.com/feast-dev/feast/issues/5666)) ([564e965](https://github.com/feast-dev/feast/commit/564e9651dabea5458a77a8889920749cb1a6a5ed))
29+
* Support cache_mode for registries ([021e9ea](https://github.com/feast-dev/feast/commit/021e9ea759bfee0292c5f7c804119ed9a15d6a58))
30+
331
# [0.55.0](https://github.com/feast-dev/feast/compare/v0.54.0...v0.55.0) (2025-10-14)
432

533

docs/README.md

Lines changed: 5 additions & 0 deletions
@@ -52,6 +52,11 @@ Feast helps ML platform/MLOps teams with DevOps experience productionize real-ti
5252

5353
* *For AI Engineers*: Feast provides a platform designed to scale your AI applications by enabling seamless integration of richer data and facilitating fine-tuning. With Feast, you can optimize the performance of your AI models while ensuring a scalable and efficient data pipeline.
5454

55+
56+
![](assets/feast_persona_diagram.png)
57+
58+
59+
5560
## What Feast is not?
5661

5762
### Feast is not

docs/reference/beta-on-demand-feature-view.md

Lines changed: 32 additions & 2 deletions
@@ -35,10 +35,40 @@ When defining an ODFV, you can specify the transformation mode using the `mode`
3535

3636
### Singleton Transformations in Native Python Mode
3737

38-
Native Python mode supports transformations on singleton dictionaries by setting `singleton=True`. This allows you to
39-
write transformation functions that operate on a single row at a time, making the code more intuitive and aligning with
38+
Native Python mode supports transformations on singleton dictionaries by setting `singleton=True`. This allows you to
39+
write transformation functions that operate on a single row at a time, making the code more intuitive and aligning with
4040
how data scientists typically think about data transformations.
4141

42+
## Aggregations
43+
44+
On Demand Feature Views support aggregations that compute aggregate statistics over groups of rows. When using aggregations, data is grouped by entity columns (e.g., `driver_id`) and aggregated before being passed to the transformation function.
45+
46+
**Important**: Aggregations and transformations are mutually exclusive. When aggregations are specified, they replace the transformation function.
47+
48+
### Usage
49+
50+
```python
from datetime import timedelta

from feast import Aggregation, Field
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64, Int64

# `driver_hourly_stats_view` is assumed to be a feature view defined elsewhere in the repo.
@on_demand_feature_view(
    sources=[driver_hourly_stats_view],
    schema=[
        Field(name="total_trips", dtype=Int64),
        Field(name="avg_rating", dtype=Float64),
    ],
    aggregations=[
        Aggregation(column="trips", function="sum"),
        Aggregation(column="rating", function="mean"),
    ],
)
def driver_aggregated_stats(inputs):
    # No transformation function needed when using aggregations
    pass
```
69+
70+
Aggregated columns are automatically named using the pattern `{function}_{column}` (e.g., `sum_trips`, `mean_rating`).
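
The grouping and naming behaviour described here can be illustrated without Feast. The following is a minimal pure-Python sketch, not Feast's implementation: the helper `aggregate_rows` and the sample rows are hypothetical, and the sketch only mirrors the documented `{function}_{column}` naming for `sum` and `mean`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical helper for illustration only; Feast's own aggregation code may differ.
FUNCTIONS = {"sum": sum, "mean": mean}


def aggregate_rows(rows, entity_column, aggregations):
    """Group rows by entity and apply (column, function) aggregations."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[entity_column]].append(row)

    result = []
    for entity, members in groups.items():
        out = {entity_column: entity}
        for column, function in aggregations:
            values = [member[column] for member in members]
            # Output name follows the documented `{function}_{column}` pattern.
            out[f"{function}_{column}"] = FUNCTIONS[function](values)
        result.append(out)
    return result


# Hypothetical sample rows keyed by the `driver_id` entity column.
rows = [
    {"driver_id": 1001, "trips": 3, "rating": 4.0},
    {"driver_id": 1001, "trips": 5, "rating": 5.0},
    {"driver_id": 1002, "trips": 2, "rating": 3.0},
]
aggregated = aggregate_rows(rows, "driver_id", [("trips", "sum"), ("rating", "mean")])
print(aggregated)
```

Each output row carries the entity key plus one column per aggregation, so the two `driver_id` 1001 rows collapse into a single row with `sum_trips` and `mean_rating`.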
71+
4272
## Example
4373
See [https://github.com/feast-dev/on-demand-feature-views-demo](https://github.com/feast-dev/on-demand-feature-views-demo) for an example on how to use on demand feature views.
4474

docs/reference/compute-engine/ray.md

Lines changed: 14 additions & 3 deletions
@@ -62,11 +62,22 @@ batch_engine:
6262
| `max_parallelism_multiplier` | int | 2 | Parallelism as multiple of CPU cores |
6363
| `target_partition_size_mb` | int | 64 | Target partition size (MB) |
6464
| `window_size_for_joins` | string | "1H" | Time window for distributed joins |
65-
| `ray_address` | string | None | Ray cluster address (None = local Ray) |
65+
| `ray_address` | string | None | Ray cluster address (triggers REMOTE mode) |
66+
| `use_kuberay` | boolean | None | Enable KubeRay mode (overrides ray_address) |
67+
| `kuberay_conf` | dict | None | **KubeRay configuration dict** with keys: `cluster_name` (required), `namespace` (default: "default"), `auth_token`, `auth_server`, `skip_tls` (default: false) |
68+
| `enable_ray_logging` | boolean | false | Enable Ray progress bars and logging |
6669
| `enable_distributed_joins` | boolean | true | Enable distributed joins for large datasets |
6770
| `staging_location` | string | None | Remote path for batch materialization jobs |
68-
| `ray_conf` | dict | None | Ray configuration parameters |
69-
| `execution_timeout_seconds` | int | None | Timeout for job execution in seconds |
71+
| `ray_conf` | dict | None | Ray configuration parameters (memory, CPU limits) |
72+
73+
### Mode Detection Precedence
74+
75+
The Ray compute engine automatically detects the execution mode:
76+
77+
1. **Environment Variables** → KubeRay mode (if `FEAST_RAY_USE_KUBERAY=true`)
78+
2. **Config `kuberay_conf`** → KubeRay mode
79+
3. **Config `ray_address`** → Remote mode
80+
4. **Default** → Local mode
7081

7182
## Usage Examples
7283

docs/reference/data-sources/trino.md

Lines changed: 2 additions & 2 deletions
@@ -20,8 +20,8 @@ from feast.infra.offline_stores.contrib.trino_offline_store.trino_source import
2020
)
2121

2222
driver_hourly_stats = TrinoSource(
23-
event_timestamp_column="event_timestamp",
24-
table_ref="feast.driver_stats",
23+
timestamp_field="event_timestamp",
24+
table="feast.driver_stats",
2525
created_timestamp_column="created",
2626
)
2727
```
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
# Go feature server (Alpha)
2+
3+
## Overview
4+
The Go feature server is an HTTP/gRPC endpoint that serves features. It is written in Go.
5+
6+
## Configuration of `feature_store.yaml`
7+
The Go feature server currently depends on a Python-based feature transformation service. The following script is an example of starting one:
8+
```python
# -*- coding: utf-8 -*-
from feast.feature_store import FeatureStore


def main():
    # Init the Feature Store
    store = FeatureStore(repo_path="./feature_repo/")

    # Start the feature transformation server
    # default port is 6569
    store.serve_transformations(6569)


if __name__ == "__main__":
    main()
```
24+
In addition, configure `feature_store.yaml` as follows:
25+
26+
```yaml
...
entity_key_serialization_version: 3
feature_server:
  type: local
  transformation_service_endpoint: "localhost:6569"
...
```
34+
## Supported APIs
35+
Here is the list of supported APIs:
36+
| Method | API | Comment |
|:---: | :---: | :---: |
| POST | /get-online-features | Retrieve features of one or many entities |
| GET | /health | Status of the Go Feature Server |
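
As a rough illustration of calling the `/get-online-features` endpoint, the sketch below builds a POST request using only the Python standard library. Several details are assumptions rather than facts from this document: the address `http://localhost:8080`, the JSON body shape (a `features` list and an `entities` mapping, as used by the Python feature server), and the feature and entity names, which are hypothetical.

```python
import json
import urllib.request

# Assumed address of a locally running Go feature server; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def build_online_features_request(base_url, features, entities):
    """Build (but do not send) a POST request for /get-online-features."""
    body = json.dumps({"features": features, "entities": entities}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/get-online-features",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5.0):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_online_features_request(
    BASE_URL,
    features=["driver_hourly_stats:conv_rate"],  # hypothetical feature reference
    entities={"driver_id": [1001, 1002]},  # hypothetical entity values
)
# With a server running, the call would be: print(send(request))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.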
40+
41+
## OTEL based Observability
42+
The Go feature server supports [OTEL](https://opentelemetry.io/)-based observability.
43+
To enable it, set the environment variable `ENABLE_OTEL_TRACING` to the string `"true"` in the container or on your local OS.
44+
```bash
45+
export ENABLE_OTEL_TRACING='true'
46+
```
47+
An example OTEL infrastructure setup is available under the `/go/infra/docker/otel` folder.
48+
49+
## Demo
50+
See Reference [2] for a local demo of the Go feature server. For a real-world example of the Go feature server in production, see Reference [1].
51+
52+
## Reference
53+
1. [Expedia Group's Go Feature Server Implementation (in Production)](https://github.com/EXPEbdodla/feast)
54+
2. [A Go Feature server demo from Feast](https://github.com/feast-dev/feast-credit-score-local-tutorial)

docs/reference/offline-stores/ray.md

Lines changed: 119 additions & 5 deletions
@@ -9,10 +9,11 @@ The Ray offline store is a data I/O implementation that leverages [Ray](https://
99

1010
The Ray offline store provides:
1111
- Ray-based data reading from file sources (Parquet, CSV, etc.)
12-
- Support for both local and distributed Ray clusters
12+
- Support for local, remote, and KubeRay (Kubernetes-managed) clusters
1313
- Integration with various storage backends (local files, S3, GCS, HDFS)
1414
- Efficient data filtering and column selection
1515
- Timestamp-based data processing with timezone awareness
16+
- Enterprise-ready KubeRay cluster support via CodeFlare SDK
1617

1718

1819
## Functionality Matrix
@@ -59,9 +60,15 @@ For complex feature processing, historical feature retrieval, and distributed jo
5960

6061
## Configuration
6162

62-
The Ray offline store can be configured in your `feature_store.yaml` file. Below are two main configuration patterns:
63+
The Ray offline store can be configured in your `feature_store.yaml` file. It supports **three execution modes**:
6364

64-
### Basic Ray Offline Store
65+
1. **LOCAL**: Ray runs locally on the same machine (default)
66+
2. **REMOTE**: Connects to a remote Ray cluster via `ray_address`
67+
3. **KUBERAY**: Connects to Ray clusters on Kubernetes via CodeFlare SDK
68+
69+
### Execution Modes
70+
71+
#### Local Mode (Default)
6572

6673
For simple data I/O operations without distributed processing:
6774

@@ -72,7 +79,44 @@ provider: local
7279
offline_store:
7380
type: ray
7481
storage_path: data/ray_storage # Optional: Path for storing datasets
75-
ray_address: localhost:10001 # Optional: Ray cluster address
82+
```
83+
84+
#### Remote Ray Cluster
85+
86+
Connect to an existing Ray cluster:
87+
88+
```yaml
offline_store:
  type: ray
  storage_path: s3://my-bucket/feast-data
  ray_address: "ray://my-cluster.example.com:10001"
```
94+
95+
#### KubeRay Cluster (Kubernetes)
96+
97+
Connect to Ray clusters on Kubernetes using CodeFlare SDK:
98+
99+
```yaml
offline_store:
  type: ray
  storage_path: s3://my-bucket/feast-data
  use_kuberay: true
  kuberay_conf:
    cluster_name: "feast-ray-cluster"
    namespace: "feast-system"
    auth_token: "${RAY_AUTH_TOKEN}"
    auth_server: "https://api.openshift.com:6443"
    skip_tls: false
  enable_ray_logging: false
```
112+
113+
**Environment Variables** (alternative to config file):
114+
```bash
115+
export FEAST_RAY_USE_KUBERAY=true
116+
export FEAST_RAY_CLUSTER_NAME=feast-ray-cluster
117+
export FEAST_RAY_AUTH_TOKEN=your-token
118+
export FEAST_RAY_AUTH_SERVER=https://api.openshift.com:6443
119+
export FEAST_RAY_NAMESPACE=feast-system
76120
```
77121

78122
### Ray Offline Store + Compute Engine
@@ -175,8 +219,29 @@ batch_engine:
175219
|--------|------|---------|-------------|
176220
| `type` | string | Required | Must be `feast.offline_stores.contrib.ray_offline_store.ray.RayOfflineStore` or `ray` |
177221
| `storage_path` | string | None | Path for storing temporary files and datasets |
178-
| `ray_address` | string | None | Address of the Ray cluster (e.g., "localhost:10001") |
222+
| `ray_address` | string | None | Ray cluster address (triggers REMOTE mode, e.g., "ray://host:10001") |
223+
| `use_kuberay` | boolean | None | Enable KubeRay mode (overrides ray_address) |
224+
| `kuberay_conf` | dict | None | **KubeRay configuration dict** with keys: `cluster_name` (required), `namespace` (default: "default"), `auth_token`, `auth_server`, `skip_tls` (default: false) |
225+
| `enable_ray_logging` | boolean | false | Enable Ray progress bars and verbose logging |
179226
| `ray_conf` | dict | None | Ray initialization parameters for resource management (e.g., memory, CPU limits) |
227+
| `broadcast_join_threshold_mb` | int | 100 | Size threshold for broadcast joins (MB) |
228+
| `enable_distributed_joins` | boolean | true | Enable distributed joins for large datasets |
229+
| `max_parallelism_multiplier` | int | 2 | Parallelism as multiple of CPU cores |
230+
| `target_partition_size_mb` | int | 64 | Target partition size (MB) |
231+
| `window_size_for_joins` | string | "1H" | Time window for distributed joins |
232+
233+
#### Mode Detection Precedence
234+
235+
The Ray offline store automatically detects the execution mode using the following precedence:
236+
237+
1. **Environment Variables** (highest priority)
238+
- `FEAST_RAY_USE_KUBERAY`, `FEAST_RAY_CLUSTER_NAME`, etc.
239+
2. **Config `kuberay_conf`**
240+
- If present → KubeRay mode
241+
3. **Config `ray_address`**
242+
- If present → Remote mode
243+
4. **Default**
244+
- Local mode (lowest priority)
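
The precedence above can be sketched as a small resolver. This is an illustrative sketch under stated assumptions, not Feast's actual code: `RayMode` and `detect_mode` are hypothetical names, the case-insensitive comparison of `FEAST_RAY_USE_KUBERAY` is an assumption, and treating `use_kuberay` alongside `kuberay_conf` follows the options table, which says `use_kuberay` overrides `ray_address`.

```python
from enum import Enum


class RayMode(Enum):
    LOCAL = "local"
    REMOTE = "remote"
    KUBERAY = "kuberay"


def detect_mode(config, env):
    """Resolve the execution mode following the documented precedence."""
    # 1. Environment variable wins (case-insensitive match is an assumption).
    if env.get("FEAST_RAY_USE_KUBERAY", "").strip().lower() == "true":
        return RayMode.KUBERAY
    # 2. KubeRay settings in the config (use_kuberay overrides ray_address).
    if config.get("use_kuberay") or config.get("kuberay_conf"):
        return RayMode.KUBERAY
    # 3. An explicit cluster address selects remote mode.
    if config.get("ray_address"):
        return RayMode.REMOTE
    # 4. Nothing set: run Ray locally.
    return RayMode.LOCAL


# Hypothetical configurations showing each branch.
print(detect_mode({}, {}))
print(detect_mode({"ray_address": "ray://host:10001"}, {}))
print(detect_mode({"ray_address": "ray://host:10001"}, {"FEAST_RAY_USE_KUBERAY": "true"}))
```

In practice `env` would be `os.environ`; passing it as a parameter keeps the function easy to test.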
180245

181246
#### Ray Compute Engine Options
182247

@@ -385,6 +450,8 @@ job.persist(hdfs_storage, allow_overwrite=True)
385450

386451
### Using Ray Cluster
387452

453+
#### Standard Ray Cluster
454+
388455
To use Ray in cluster mode for distributed data access:
389456

390457
1. Start a Ray cluster:
@@ -406,6 +473,53 @@ offline_store:
406473
ray start --address='head-node-ip:10001'
407474
```
408475

476+
#### KubeRay Cluster (Kubernetes)
477+
478+
To use Feast with Ray clusters on Kubernetes via CodeFlare SDK:
479+
480+
**Prerequisites:**
481+
- KubeRay cluster deployed on Kubernetes
482+
- CodeFlare SDK installed: `pip install codeflare-sdk`
483+
- Access credentials for the Kubernetes cluster
484+
485+
**Configuration:**
486+
487+
1. Using configuration file:
488+
```yaml
offline_store:
  type: ray
  use_kuberay: true
  storage_path: s3://my-bucket/feast-data
  kuberay_conf:
    cluster_name: "feast-ray-cluster"
    namespace: "feast-system"
    auth_token: "${RAY_AUTH_TOKEN}"
    auth_server: "https://api.openshift.com:6443"
    skip_tls: false
  enable_ray_logging: false
```
501+
502+
2. Using environment variables:
503+
```bash
504+
export FEAST_RAY_USE_KUBERAY=true
505+
export FEAST_RAY_CLUSTER_NAME=feast-ray-cluster
506+
export FEAST_RAY_AUTH_TOKEN=your-k8s-token
507+
export FEAST_RAY_AUTH_SERVER=https://api.openshift.com:6443
508+
export FEAST_RAY_NAMESPACE=feast-system
509+
export FEAST_RAY_SKIP_TLS=false
510+
511+
# Then use standard Feast code
512+
python your_feast_script.py
513+
```
514+
515+
**Features:**
516+
- The CodeFlare SDK handles cluster connection and authentication
517+
- Automatic TLS certificate management
518+
- Authentication with Kubernetes clusters
519+
- Namespace isolation
520+
- Secure communication between client and Ray cluster
521+
- Automatic cluster discovery
522+
409523
### Data Source Validation
410524

411525
The Ray offline store validates data sources to ensure compatibility:
