Commit eb51f00

feat: Enable ingestion without event timestamp (#5625)
1 parent 76590bf commit eb51f00

18 files changed, +436 -179 lines changed

README.md

Lines changed: 15 additions & 0 deletions
@@ -109,11 +109,26 @@ print(training_df.head())
 ```
 
 ### 6. Load feature values into your online store
+
+**Option 1: Incremental materialization (recommended)**
 ```commandline
 CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
 feast materialize-incremental $CURRENT_TIME
 ```
 
+**Option 2: Full materialization with timestamps**
+```commandline
+CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
+feast materialize 2021-04-12T00:00:00 $CURRENT_TIME
+```
+
+**Option 3: Simple materialization without timestamps**
+```commandline
+feast materialize --disable-event-timestamp
+```
+
+The `--disable-event-timestamp` flag allows you to materialize all available feature data using the current datetime as the event timestamp, without needing to specify start and end timestamps. This is useful when your source data lacks proper event timestamp columns.
+
 ```commandline
 Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
 ```
Lines changed: 17 additions & 17 deletions
@@ -1,18 +1,18 @@
-# Online store
-
-Feast uses online stores to serve features at low latency.
-Feature values are loaded from data sources into the online store through _materialization_, which can be triggered through the `materialize` command.
-
-The storage schema of features within the online store mirrors that of the original data source.
-One key difference is that for each [entity key](../concepts/entity.md), only the latest feature values are stored.
-No historical values are stored.
-
-Here is an example batch data source:
-
-![](../../.gitbook/assets/image%20%286%29.png)
-
-Once the above data source is materialized into Feast (using `feast materialize`), the feature values will be stored as follows:
-
-![](../../.gitbook/assets/image%20%285%29.png)
-
+# Online store
+
+Feast uses online stores to serve features at low latency.
+Feature values are loaded from data sources into the online store through _materialization_, which can be triggered through the `materialize` command (either with specific timestamps or using `--disable-event-timestamp` to materialize all data with current timestamps).
+
+The storage schema of features within the online store mirrors that of the original data source.
+One key difference is that for each [entity key](../concepts/entity.md), only the latest feature values are stored.
+No historical values are stored.
+
+Here is an example batch data source:
+
+![](../../.gitbook/assets/image%20%286%29.png)
+
+Once the above data source is materialized into Feast (using `feast materialize` with timestamps or `feast materialize --disable-event-timestamp`), the feature values will be stored as follows:
+
+![](../../.gitbook/assets/image%20%285%29.png)
+
 Features can also be written directly to the online store via [push sources](../../reference/data-sources/push.md) .

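The "latest values only" behaviour described in the online store page can be pictured with a small sketch. This is a plain-Python illustration under stated assumptions, not Feast's implementation: the row layout and the `latest_per_entity` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical batch rows: (entity key, event timestamp, feature value).
rows = [
    ("driver_1001", datetime(2021, 4, 12, 10), 0.50),
    ("driver_1001", datetime(2021, 4, 12, 11), 0.75),
    ("driver_1002", datetime(2021, 4, 12, 9), 0.20),
]


def latest_per_entity(rows):
    """Keep only the most recent row for each entity key."""
    latest = {}
    for key, event_ts, value in rows:
        # Overwrite only when this row is newer than the one already stored.
        if key not in latest or event_ts > latest[key][0]:
            latest[key] = (event_ts, value)
    return latest


online_view = latest_per_entity(rows)
print(online_view["driver_1001"][1])
```

Each entity key ends up with exactly one entry, which is why historical values cannot be read back from the online store.
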
docs/getting-started/components/overview.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
* **Create Batch Features:** ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
88
* **Create Stream Features:** Stream features are created from streaming services such as Kafka or Kinesis, and can be pushed directly into Feast via the [Push API](../../reference/data-sources/push.md).
99
* **Feast Apply:** The user (or CI) publishes versioned controlled feature definitions using `feast apply`. This CLI command updates infrastructure and persists definitions in the object store registry.
10-
* **Feast Materialize:** The user (or scheduler) executes `feast materialize` which loads features from the offline store into the online store.
10+
* **Feast Materialize:** The user (or scheduler) executes `feast materialize` (with timestamps or `--disable-event-timestamp` to materialize all data with current timestamps) which loads features from the offline store into the online store.
1111
* **Model Training:** A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset that can be used for training models.
1212
* **Get Historical Features:** Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.
1313
* **Deploy Model:** The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.

docs/getting-started/concepts/data-ingestion.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,11 +64,17 @@ materialize_python = PythonOperator(
6464

6565
#### How to run this in the CLI
6666

67+
**With timestamps:**
6768
```bash
6869
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
6970
feast materialize-incremental $CURRENT_TIME
7071
```
7172

73+
**Simple materialization (for data without event timestamps):**
74+
```bash
75+
feast materialize --disable-event-timestamp
76+
```
77+
7278
#### How to run this on Airflow
7379

7480
```python

docs/getting-started/quickstart.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -499,13 +499,19 @@ print(training_df.head())
499499
We now serialize the latest values of features since the beginning of time to prepare for serving. Note, `materialize_incremental` serializes all new features since the last `materialize` call, or since the time provided minus the `ttl` timedelta. In this case, this will be `CURRENT_TIME - 1 day` (`ttl` was set on the `FeatureView` instances in [feature_repo/feature_repo/example_repo.py](feature_repo/feature_repo/example_repo.py)).
500500

501501
{% tabs %}
502-
{% tab title="Bash" %}
502+
{% tab title="Bash (with timestamp)" %}
503503
```bash
504504
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
505505

506506
feast materialize-incremental $CURRENT_TIME
507507
```
508508
{% endtab %}
509+
{% tab title="Bash (simple)" %}
510+
```bash
511+
# Alternative: Materialize all data using current timestamp (for data without event timestamps)
512+
feast materialize --disable-event-timestamp
513+
```
514+
{% endtab %}
509515
{% endtabs %}
510516

511517
{% tabs %}

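The quickstart text says an incremental run starts from the last `materialize` call, or from the provided time minus the `ttl`. A minimal sketch of that rule follows. It is a simplified model of the documented behaviour, not Feast's code: the `incremental_start` helper and its parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def incremental_start(
    end: datetime, ttl: timedelta, last_materialized_end: Optional[datetime]
) -> datetime:
    """Start of the window: the previous run's end if there is one, else end - ttl."""
    if last_materialized_end is not None:
        return last_materialized_end
    return end - ttl


current_time = datetime(2021, 4, 15, tzinfo=timezone.utc)

# First run: nothing has been materialized yet, so fall back to CURRENT_TIME - ttl.
first_run_start = incremental_start(current_time, timedelta(days=1), None)
print(first_run_start.isoformat())
```

With a one-day `ttl` and no earlier run, the window starts one day before `current_time`, which matches the `CURRENT_TIME - 1 day` case in the quickstart.
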
docs/reference/feast-cli-commands.md

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -152,18 +152,30 @@ feast init -t gcp my_feature_repo
152152

153153
## Materialize
154154

155-
Load data from feature views into the online store between two dates
155+
Load data from feature views into the online store.
156156

157+
**With timestamps:**
157158
```bash
158159
feast materialize 2020-01-01T00:00:00 2022-01-01T00:00:00
159160
```
160161

161-
Load data for specific feature views into the online store between two dates
162+
**Without timestamps (uses current datetime):**
163+
```bash
164+
feast materialize --disable-event-timestamp
165+
```
166+
167+
Load data for specific feature views:
162168

163169
```text
164170
feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2022-01-01T00:00:00
165171
```
166172

173+
```text
174+
feast materialize --disable-event-timestamp -v driver_hourly_stats
175+
```
176+
177+
The `--disable-event-timestamp` flag is useful when your source data lacks event timestamp columns, allowing you to materialize all available data using the current datetime as the event timestamp.
178+
167179
```text
168180
Materializing 1 feature views from 2020-01-01 to 2022-01-01
169181

docs/reference/feature-servers/python-feature-server.md

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -200,6 +200,52 @@ requests.post(
200200
data=json.dumps(push_data))
201201
```
202202

203+
### Materializing features
204+
205+
The Python feature server also exposes an endpoint for materializing features from the offline store to the online store.
206+
207+
**Standard materialization with timestamps:**
208+
```bash
209+
curl -X POST "http://localhost:6566/materialize" -d '{
210+
"start_ts": "2021-01-01T00:00:00",
211+
"end_ts": "2021-01-02T00:00:00",
212+
"feature_views": ["driver_hourly_stats"]
213+
}' | jq
214+
```
215+
216+
**Materialize all data without event timestamps:**
217+
```bash
218+
curl -X POST "http://localhost:6566/materialize" -d '{
219+
"feature_views": ["driver_hourly_stats"],
220+
"disable_event_timestamp": true
221+
}' | jq
222+
```
223+
224+
When `disable_event_timestamp` is set to `true`, the `start_ts` and `end_ts` parameters are not required, and all available data is materialized using the current datetime as the event timestamp. This is useful when your source data lacks proper event timestamp columns.
225+
226+
Or from Python:
227+
```python
228+
import json
229+
import requests
230+
231+
# Standard materialization
232+
materialize_data = {
233+
"start_ts": "2021-01-01T00:00:00",
234+
"end_ts": "2021-01-02T00:00:00",
235+
"feature_views": ["driver_hourly_stats"]
236+
}
237+
238+
# Materialize without event timestamps
239+
materialize_data_no_timestamps = {
240+
"feature_views": ["driver_hourly_stats"],
241+
"disable_event_timestamp": True
242+
}
243+
244+
requests.post(
245+
"http://localhost:6566/materialize",
246+
data=json.dumps(materialize_data))
247+
```
248+
203249
## Starting the feature server in TLS(SSL) mode
204250

205251
Enabling TLS mode ensures that data between the Feast client and server is transmitted securely. For an ideal production environment, it is recommended to start the feature server in TLS mode.

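The Python snippet added to this page defines a no-timestamp payload but only posts the standard one. A possible client-side sketch covering both request shapes is given here. It uses only the field names and URL shown in the diff; the helpers `build_materialize_payload` and `post_materialize` are hypothetical, and the JSON `Content-Type` header is an assumption about what the server accepts.

```python
import json
import urllib.request
from typing import List, Optional

# Endpoint taken from the curl examples; adjust host and port for your deployment.
MATERIALIZE_URL = "http://localhost:6566/materialize"


def build_materialize_payload(
    feature_views: List[str],
    start_ts: Optional[str] = None,
    end_ts: Optional[str] = None,
    disable_event_timestamp: bool = False,
) -> dict:
    """Build a request body using the field names shown in the docs."""
    if disable_event_timestamp:
        # Timestamps are not required in this mode, so they are left out.
        return {"feature_views": feature_views, "disable_event_timestamp": True}
    if not start_ts or not end_ts:
        raise ValueError("start_ts and end_ts are required unless disable_event_timestamp is set")
    return {"start_ts": start_ts, "end_ts": end_ts, "feature_views": feature_views}


def post_materialize(payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status (needs a running server)."""
    request = urllib.request.Request(
        MATERIALIZE_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the server accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


no_timestamps = build_materialize_payload(
    ["driver_hourly_stats"], disable_event_timestamp=True
)
print(json.dumps(no_timestamps))
```

Calling `post_materialize(no_timestamps)` would send the request; it is left uncalled so the sketch runs without a feature server.
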
infra/templates/README.md.jinja2

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,11 +107,26 @@ print(training_df.head())
107107
```
108108

109109
### 6. Load feature values into your online store
110+
111+
**Option 1: Incremental materialization (recommended)**
110112
```commandline
111113
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
112114
feast materialize-incremental $CURRENT_TIME
113115
```
114116

117+
**Option 2: Full materialization with timestamps**
118+
```commandline
119+
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
120+
feast materialize 2021-04-12T00:00:00 $CURRENT_TIME
121+
```
122+
123+
**Option 3: Simple materialization without timestamps**
124+
```commandline
125+
feast materialize --disable-event-timestamp
126+
```
127+
128+
The `--disable-event-timestamp` flag allows you to materialize all available feature data using the current datetime as the event timestamp, without needing to specify start and end timestamps. This is useful when your source data lacks proper event timestamp columns.
129+
115130
```commandline
116131
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
117132
```

sdk/python/feast/cli/cli.py

Lines changed: 36 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -303,17 +303,26 @@ def registry_dump_command(ctx: click.Context):
303303

304304

305305
@cli.command("materialize")
306-
@click.argument("start_ts")
307-
@click.argument("end_ts")
306+
@click.argument("start_ts", required=False)
307+
@click.argument("end_ts", required=False)
308308
@click.option(
309309
"--views",
310310
"-v",
311311
help="Feature views to materialize",
312312
multiple=True,
313313
)
314+
@click.option(
315+
"--disable-event-timestamp",
316+
is_flag=True,
317+
help="Materialize all available data using current datetime as event timestamp (useful when source data lacks event timestamps)",
318+
)
314319
@click.pass_context
315320
def materialize_command(
316-
ctx: click.Context, start_ts: str, end_ts: str, views: List[str]
321+
ctx: click.Context,
322+
start_ts: Optional[str],
323+
end_ts: Optional[str],
324+
views: List[str],
325+
disable_event_timestamp: bool,
317326
):
318327
"""
319328
Run a (non-incremental) materialization job to ingest data into the online store. Feast
@@ -322,13 +331,35 @@ def materialize_command(
322331
Views will be materialized.
323332
324333
START_TS and END_TS should be in ISO 8601 format, e.g. '2021-07-16T19:20:01'
334+
335+
If --disable-event-timestamp is used, timestamps are not required and all available data will be materialized using the current datetime as the event timestamp.
325336
"""
326337
store = create_feature_store(ctx)
327338

339+
if disable_event_timestamp:
340+
if start_ts or end_ts:
341+
raise click.UsageError(
342+
"Cannot specify START_TS or END_TS when --disable-event-timestamp is used"
343+
)
344+
now = datetime.now()
345+
# Query all available data and use current datetime as event timestamp
346+
start_date = datetime(
347+
1970, 1, 1
348+
) # Beginning of time to capture all historical data
349+
end_date = now
350+
else:
351+
if not start_ts or not end_ts:
352+
raise click.UsageError(
353+
"START_TS and END_TS are required unless --disable-event-timestamp is used"
354+
)
355+
start_date = utils.make_tzaware(parser.parse(start_ts))
356+
end_date = utils.make_tzaware(parser.parse(end_ts))
357+
328358
store.materialize(
329359
feature_views=None if not views else views,
330-
start_date=utils.make_tzaware(parser.parse(start_ts)),
331-
end_date=utils.make_tzaware(parser.parse(end_ts)),
360+
start_date=start_date,
361+
end_date=end_date,
362+
disable_event_timestamp=disable_event_timestamp,
332363
)
333364

334365

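The branching added to `materialize_command` can be isolated as a pure function, which makes the two modes easier to reason about and to test. This is a sketch under assumptions, not the committed code: `resolve_window` and `_parse_utc` are hypothetical names, the standard library stands in for `parser.parse` and `utils.make_tzaware`, and naive timestamps are assumed to mean UTC. Unlike the diff, which uses naive `datetime.now()` and `datetime(1970, 1, 1)` in the flag branch, the sketch returns UTC-aware values in both branches; whether that difference matters depends on how `store.materialize` normalizes its arguments, which the diff does not show.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def _parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 string; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(value)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def resolve_window(
    start_ts: Optional[str],
    end_ts: Optional[str],
    disable_event_timestamp: bool,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return (start_date, end_date) following the same branching as the diff."""
    if disable_event_timestamp:
        if start_ts or end_ts:
            raise ValueError("timestamps cannot be combined with disable_event_timestamp")
        end = now if now is not None else datetime.now(timezone.utc)
        # Start at the Unix epoch so that every available row falls inside the window.
        return datetime(1970, 1, 1, tzinfo=timezone.utc), end
    if not start_ts or not end_ts:
        raise ValueError("start_ts and end_ts are required")
    return _parse_utc(start_ts), _parse_utc(end_ts)


start, end = resolve_window("2021-04-12T00:00:00", "2021-04-15T00:00:00", False)
print(start.isoformat(), end.isoformat())
```

The optional `now` parameter exists only so the flag branch can be exercised with a fixed clock in tests.
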
sdk/python/feast/feature_server.py

Lines changed: 23 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
import time
66
import traceback
77
from contextlib import asynccontextmanager
8+
from datetime import datetime
89
from importlib import resources as importlib_resources
910
from typing import Any, Dict, List, Optional, Union
1011

@@ -73,9 +74,10 @@ class PushFeaturesRequest(BaseModel):
7374

7475

7576
class MaterializeRequest(BaseModel):
76-
start_ts: str
77-
end_ts: str
77+
start_ts: Optional[str] = None
78+
end_ts: Optional[str] = None
7879
feature_views: Optional[List[str]] = None
80+
disable_event_timestamp: bool = False
7981

8082

8183
class MaterializeIncrementalRequest(BaseModel):
@@ -432,10 +434,27 @@ def materialize(request: MaterializeRequest) -> None:
432434
resource=_get_feast_object(feature_view, True),
433435
actions=[AuthzedAction.WRITE_ONLINE],
434436
)
437+
438+
if request.disable_event_timestamp:
439+
# Query all available data and use current datetime as event timestamp
440+
now = datetime.now()
441+
start_date = datetime(
442+
1970, 1, 1
443+
) # Beginning of time to capture all historical data
444+
end_date = now
445+
else:
446+
if not request.start_ts or not request.end_ts:
447+
raise ValueError(
448+
"start_ts and end_ts are required when disable_event_timestamp is False"
449+
)
450+
start_date = utils.make_tzaware(parser.parse(request.start_ts))
451+
end_date = utils.make_tzaware(parser.parse(request.end_ts))
452+
435453
store.materialize(
436-
utils.make_tzaware(parser.parse(request.start_ts)),
437-
utils.make_tzaware(parser.parse(request.end_ts)),
454+
start_date,
455+
end_date,
438456
request.feature_views,
457+
disable_event_timestamp=request.disable_event_timestamp,
439458
)
440459

441460
@app.post("/materialize-incremental", dependencies=[Depends(inject_user_details)])
