chore: Update docs for offline and online stores #2946
Codecov Report
@@ Coverage Diff @@
## master #2946 +/- ##
==========================================
- Coverage 77.65% 77.61% -0.05%
==========================================
Files 186 186
Lines 16354 16354
==========================================
- Hits 12700 12693 -7
- Misses 3654 3661 +7
a194832
to
36fb89c
Compare
might call out that this method is optional since it is only used to work with SavedDatasets
I think you commented on the wrong method; you mean write_logged_features, right?
nope. pull_all_from_table_or_query right now in our logic is only exposed for SavedDatasets afaict
idk if I would use this as an example. It's a really large PR with also online store + registry store components
Would be better if we created some example PR (similar to what is in this guide)
Hm, I can link the Spark PR instead. I think the example guide for the Feast custom offline store is fine at a high level, but I don't see a clear way of implementing dummy procedures with more clarity than the Feast custom offline store already has. Actual real-world PRs to reference would help them work out the little kinks in implementation. I agree that the Postgres one is a little too large, but I think the Spark one is relevant and useful.
Could you convert the dummy example in the guide into an actual PR so they can see?
You can also just point users to the directory that has the offline / online store implementations too
Users who want to have their offline store support batch materialization (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to write distribute the reading and writing of offline store records to a distributed framework. If this functionality is not needed, the RetrievalJob will default to local materialization.
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to a distributed framework. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).
fixed.
Might be more clear if you left this as a comment in the example feature_store.yaml
Done.
"alpha functionality tags" is ambiguous imo.
Maybe say our standard is to print messages stating its alpha status? This is also where a sample simple PR we provide is useful
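For illustration, one way a contrib component could announce its alpha status is sketched below. This is a minimal sketch, not the project's actual convention: `MyContribOfflineStore` and `construct_and_capture` are hypothetical names, and the choice of `RuntimeWarning` is an assumption.

```python
import warnings


class MyContribOfflineStore:
    """Hypothetical contrib store used only for this sketch (not a Feast class)."""

    def __init__(self) -> None:
        # Warn on construction so users see the alpha status before relying on it.
        warnings.warn(
            "MyContribOfflineStore is in alpha and its API may change.",
            RuntimeWarning,
        )


def construct_and_capture() -> list:
    """Build the store and return the text of every warning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no filter hides the warning
        MyContribOfflineStore()
    return [str(w.message) for w in caught]
```

Calling `construct_and_capture()` returns the captured messages, which makes the warning easy to assert on in a test.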
Fixed.
nit: "add documentation" instead of update.
There's also additional work around adding python code docs (i.e. call `make build-sphinx`)
added.
again would not use this as an example because it's too large and hard to parse
addressed above.
we might additionally want to cover what the data model is
fixed.
go.sum
Outdated
unintentional?
fixed after rebase.
36fb89c
to
eb028f6
Compare
@adchia PTAL
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to a distributed framework. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to blob storage (such as S3). This may be used by a custom [Materialization Engine](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/materialization/batch_materialization_engine.py#L72) to parallelize the materialization of data by processing it in chunks. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).
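To make "processing it in chunks" concrete, here is a rough sketch of the general idea only. It does not use or mirror the Feast materialization engine API: `chunked`, `materialize_in_chunks`, and the `write_chunk` callback are hypothetical names, and plain dictionaries stand in for offline store records.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, Sequence

Record = Dict[str, object]


def chunked(records: Sequence[Record], chunk_size: int) -> Iterator[Sequence[Record]]:
    """Yield consecutive slices of at most chunk_size records."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def materialize_in_chunks(
    records: Sequence[Record],
    write_chunk: Callable[[Sequence[Record]], int],
    chunk_size: int = 2,
    max_workers: int = 4,
) -> int:
    """Write each chunk on a worker thread and return the total rows written.

    write_chunk is assumed to return the number of rows it wrote.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        written = list(pool.map(write_chunk, chunked(records, chunk_size)))
    return sum(written)
```

The contrast with local materialization is that no single worker needs to hold every record at once; each worker only sees its own chunk.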
fixed
In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `created_saved_dataset_destination`. `create_data_source` creates a datasource for testing from the dataframe given that will register the dataframe with your offline store and return a datasource pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a datawarehouse data source creator. **Saved datasets** are special datasets used for data validation and is Feast's way of snapshotting your data for future data validation.
In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `created_saved_dataset_destination`. `create_data_source` should create a datasource based on the dataframe passed in. It may be implemented by uploading the contents of the dataframe into the offline store and returning a datasource object pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a data source creator. **Saved datasets** are special datasets used for data validation and are Feast's way of snapshotting your data for future data validation.
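The "upload the contents, return an object pointing to that location" pattern can be sketched as follows. This is an assumption-heavy illustration rather than the Feast testing interface: `InMemoryDataSourceCreator`, `FakeDataSource`, and the parameter names are hypothetical, a dictionary stands in for the offline store, and lists of row dictionaries stand in for the dataframe to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class FakeDataSource:
    """Hypothetical handle that points at a table in the fake offline store."""

    table: str
    timestamp_field: str


class InMemoryDataSourceCreator:
    """Hypothetical creator: 'uploads' rows into a dict keyed by table name."""

    def __init__(self, project_name: str) -> None:
        self.project_name = project_name
        self.tables: Dict[str, List[Row]] = {}

    def create_data_source(
        self, rows: List[Row], destination_name: str, timestamp_field: str = "ts"
    ) -> FakeDataSource:
        # Namespace the table by project so parallel test runs do not collide.
        table = f"{self.project_name}_{destination_name}"
        self.tables[table] = list(rows)  # the "upload" step
        return FakeDataSource(table=table, timestamp_field=timestamp_field)

    def teardown(self) -> None:
        # Drop everything the tests created.
        self.tables.clear()
```

A real creator would replace the dictionary write with an upload to the actual store and return that store's own data source type.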
Fixed
This should run the unit tests and the unit tests should all pass. Please add unit tests for your data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.
This command runs the python unit tests. It's required that unit tests should all pass for contributed components.
Contributors should add unit tests for contributed data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.
> test out basic functionality of reading and writing to and from the datasource
This actually is not clear at all. Do you mean the test should write from say bigquery? How is that a unit test then?
Hm, I guess most datasources will not have unit tests then. I guess as long as the offline source passes the integration tests, we can trust the store.
This is not necessary right? i.e. the docs are generated automatically when changes are merged, we just need to make sure they are referenced in `sdk/python/docs/index.rst`.
Ok let me update.
Same change and questions as the offline store.
Removed this section entirely. Will basically point to universal tests only to validate the datasource
would follow the pattern of the other code snippets, which also include a "file title" that shows where this lives
See the example of using:
{% code title="feast_custom_offline_store/file.py" %}
... your code
{% endcode %}
that we did before
fixed
took a pass through the offline store only, but I imagine most of the comments also translate to the online store version
you didn't add the make command here?
Achal mentioned that that is no longer needed and to just update the rst file instead.
weird that you only have "write_logged_features" as optional when really pretty much everything is optional except for get_historical_features
Would have the "optional" methods towards the end.
I treated any method that is labeled abstract as required: even though those methods have a "pass" body, I still think the decorator necessitates implementation. I added "optional" to `offline_write_batch` though.
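For reference, this is how Python's `abc` module treats an abstract method whose body is just `pass`. It is a generic sketch with hypothetical class names (`BaseStore`, `IncompleteStore`, `CompleteStore`), not the Feast `OfflineStore` interface.

```python
from abc import ABC, abstractmethod


class BaseStore(ABC):
    """Hypothetical base class used only for this sketch."""

    @abstractmethod
    def required_method(self) -> str:
        # A "pass" body does not make the method optional.
        pass

    def optional_method(self) -> str:
        # Not abstract, so subclasses may inherit this default.
        return "default"


class IncompleteStore(BaseStore):
    """Does not override required_method."""


class CompleteStore(BaseStore):
    def required_method(self) -> str:
        return "implemented"


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if abc rejects it with TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

Note that the check only fires at instantiation. A class whose abstract methods are only ever called on the class itself, without creating an instance, is never rejected by `abc`.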
This is generally pretty hard to read imo. Can you make this simpler and more scannable?
e.g.
Contrib offline stores
New offline stores go in `sdk/python/feast/infra/offline_stores/contrib/`.
What is a contrib plugin?
- Not guaranteed to implement all interface methods
- Not guaranteed to be stable
- Should have warnings for users to indicate this is contrib
How do I make a contrib plugin an "official" plugin?
To move an offline store plugin out of contrib, you need:
- GitHub actions (i.e. `make test-python-integration`) is set up to run all tests against the offline store and pass.
- At least two contributors own the plugin (ideally tracked in an OWNERS / CODEOWNERS)
fixed
this seems to contradict the text you have below
fixed
why is offline in brackets? shouldn't it be <offline_store> ?
fixed
Remove Finally since this isn't the last step
done
Also, would remove the "if your offline store requires special packages" part. Every offline store will need to pull in some deps to support a new offline store
fixed
nit: Convert this to individual steps they can follow instead of many sentences
done
again, comments mostly translate to the online store guide too
#### How do I make a contrib plugin and "official" plugin?
#### How do I make a contrib plugin an "official" plugin?
fixed.
- Github actions (i.e `make test-python-integration`) is setup to run all tests against the offline store and pass.
- GitHub actions (i.e `make test-python-integration`) is setup to run all tests against the offline store and pass.
fixed
- At least two contributors own the plugin (ideally tracked in our OWNERS / CODEOWNERS file).
- At least two contributors own the plugin (ideally tracked in our `OWNERS` / `CODEOWNERS` file).
fixed
This is kind of confusing to me. I don't think users should try to "run the integration tests" since that is for official plugins.
It's useful context to understand the second bullet though. Would rephrase that Feast parametrizes integration tests using the FULL_REPO_CONFIGS variable, which you overwrite to enable testing for the contrib offline store
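As a rough picture of what "overwriting" a config list for parametrized tests could look like, here is a generic sketch. Only the variable name `FULL_REPO_CONFIGS` comes from this thread; the loading mechanism, `load_repo_configs`, `register_fake_config_module`, and the string config values are assumptions made for illustration and may not match how Feast's test harness actually resolves the override.

```python
import importlib
import sys
import types
from typing import List, Optional

# Stand-in for the default set of configurations the test suite runs against.
DEFAULT_REPO_CONFIGS: List[str] = ["local-file"]


def load_repo_configs(module_name: Optional[str]) -> List[str]:
    """Return FULL_REPO_CONFIGS from the named module, or the defaults."""
    if not module_name:
        return DEFAULT_REPO_CONFIGS
    module = importlib.import_module(module_name)
    return list(getattr(module, "FULL_REPO_CONFIGS"))


def register_fake_config_module(name: str, configs: List[str]) -> None:
    """Create an importable module exposing FULL_REPO_CONFIGS (for the demo)."""
    module = types.ModuleType(name)
    module.FULL_REPO_CONFIGS = configs
    sys.modules[name] = module
```

In this sketch a contrib store would ship its own module defining `FULL_REPO_CONFIGS`, and the test run would be pointed at that module instead of the defaults.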
fixed
rebase against master?
done
Signed-off-by: Kevin Zhang <[email protected]>
fa0e55d
to
977221f
Compare
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: adchia, kevjumba
Signed-off-by: Kevin Zhang [email protected]
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #