Skip to content

Conversation

kevjumba
Copy link
Collaborator

Signed-off-by: Kevin Zhang [email protected]

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

@codecov-commenter
Copy link

codecov-commenter commented Jul 15, 2022

Codecov Report

Merging #2946 (5bc74db) into master (aa2a86a) will decrease coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2946      +/-   ##
==========================================
- Coverage   77.65%   77.61%   -0.05%     
==========================================
  Files         186      186              
  Lines       16354    16354              
==========================================
- Hits        12700    12693       -7     
- Misses       3654     3661       +7     
Flag Coverage Δ
integrationtests 68.58% <ø> (-0.14%) ⬇️
unittests 58.51% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/tests/utils/online_read_write_test.py 93.54% <0.00%> (-6.46%) ⬇️
.../integration/online_store/test_online_retrieval.py 96.84% <0.00%> (-3.16%) ⬇️
sdk/python/feast/infra/online_stores/datastore.py 87.96% <0.00%> (-1.39%) ⬇️
...east/infra/materialization/lambda/lambda_engine.py 93.75% <0.00%> (+1.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa2a86a...5bc74db. Read the comment docs.

@kevjumba kevjumba force-pushed the update_offline_store_docs branch from a194832 to 36fb89c Compare July 19, 2022 23:01
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might call out that this method is optional since it is only used to work with SavedDatasets

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you commented on the wrong method , you mean write_logged_features right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. pull_all_from_table_or_query right now in our logic is only exposed for SavedDatasets afaict

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if I would use this as an example. It's a really large PR with also online store + registry store components

Would be better if we created some example PR (similar to what is in this guide)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I can link the spark pr instead. I think the example guide for the feast custom offline store is fine for high level but I dont' see a clear way of implementing some dummy procedures with. more clarity than the feast custom offline store. Actual implementations to reference would actually then be actual real world pr would help them work out the little kinks in implementation. I agree that the postgres one is a little bit too large but I think the spark one is relevant and useful

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you convert the dummy example in the guide into an actual PR so they can see?

You can also just point users to the directory that has the offline / online store implementations too

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Users who want to have their offline store support batch materialization (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to write distribute the reading and writing of offline store records to a distributed framework. If this functionality is not needed, the RetrievalJob will default to local materialization.
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to write distribute the reading and writing of offline store records to a distributed framework. If this is not implemented, Feast will default to local materialization (pulling all records in memory to materialize).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixe..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be more clear if you left this as a comment in the example feature_store.yaml

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alpha functionality tags is ambiguous imo.

Maybe say our standard is to print messages stating it's alpha status? This is also where a sample simple PR we provide is useful

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "add documentation" instead of update.

There's also additional work around adding python code docs (i.e. call make build-sphinx)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again would not use this as an example because it's too large and hard to parse

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might additionally want to cover what the data model is

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

go.sum Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unintentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed after rebase.

@adchia adchia self-assigned this Jul 20, 2022
@kevjumba kevjumba force-pushed the update_offline_store_docs branch from 36fb89c to eb028f6 Compare July 20, 2022 17:46
@kevjumba
Copy link
Collaborator Author

@adchia pTal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to a distributed framework. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to blob storage (such as S3). This may be used by a custom [Materialization Engine](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/materialization/batch_materialization_engine.py#L72) to parallelize the materialization of data by processing it in chunks. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `created_saved_dataset_destination`. `create_data_source` creates a datasource for testing from the dataframe given that will register the dataframe with your offline store and return a datasource pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a datawarehouse data source creator. **Saved datasets** are special datasets used for data validation and is Feast's way of snapshotting your data for future data validation.
In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `created_saved_dataset_destination`. `create_data_source` should create a datasource forbased on the dataframe passed in. It may be implemented by uploading the contents of the dataframe into the offline store and returning a datasource object pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a data source creator. **Saved datasets** are special datasets used for data validation and is Feast's way of snapshotting your data for future data validation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This should run the unit tests and the unit tests should all pass. Please add unit tests for your data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.
This command runs the python unit tests. It's required that unit tests should all pass for contributed components.
Contributors should add unit tests for contributed data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test out basic functionality of reading and writing to and from the datasource

This actually is not clear at all. Do you mean the test should write from say bigquery? How is that a unit test then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I guess most datasources will not have unit tests then. I guess as long as the offline source passes the integration tests, we can trust the store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary right? i.e. the docs are generated automatically when changes are merged, we just need to make sure they are referenced sdk/python/docs/index.rst.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let me update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same change and questions as the offline store.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this section entirely. Will basically point to universal tests only to validate the datasource

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would follow the pattern of the other code snippets, which also include a "file title" that shows where this lives

See the example of using:
{% code title="feast_custom_offline_store/file.py" %}
... your code
{% endcode %}

that we did before

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took a pass through the offline store only, but I imagine most of the comments also translate to the online store version

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't add the make command here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Achal mentioned that that is no longer needed and to just update the rst file instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird that you only have "write_logged_features" as optional when really pretty much everything is optional except for get_historical_features

Would have the "optional" methods towards the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just treated "optional" as any methods that are labeled abstract even though they have a "pass" I still think the decorator necessitates implementation. I added optional to offline_write batch though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally pretty hard to read imo. Can you make this simpler and more scannable?

e.g.

Contrib offline stores

New offline stores go in sdk/python/feast/infra/offline_stores/contrib/.

What is a contrib plugin?

  • Not guaranteed to implement all interface methods
  • Not guaranteed to be stable
  • Should have warnings for users to indicate this is contrib

How do I make a contrib plugin an "official" plugin?
To move an offline store plugin out of contrib, you need:

  • GitHub actions (i.e. make test-python-integration) is setup to run all tests against the offline store and pass.
  • At least two contributors own the plugin (ideally tracked in an OWNERS / CODEOWNERS)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to contradict the text you have below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is offline in brackets? shouldn't it be <offline_store> ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove Finally since this isn't the last step

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would remove the "if your offline store requires special packages" part. Every offline store will need to pull in some deps to support a new offline store

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Convert this to individual steps they can follow instead of many sentences

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, comments mostly translate to the online store guide too

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### How do I make a contrib plugin and "official" plugin?
#### How do I make a contrib plugin an "official" plugin?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Github actions (i.e `make test-python-integration`) is setup to run all tests against the offline store and pass.
- GitHub actions (i.e `make test-python-integration`) is setup to run all tests against the offline store and pass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- At least two contributors own the plugin (ideally tracked in our OWNERS / CODEOWNERS file).
- At least two contributors own the plugin (ideally tracked in our `OWNERS` / `CODEOWNERS` file).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of confusing to me. I don't think users should try to "run the integration tests" since that is for official plugins.

It's useful context to understand the second bullet though. Would rephrase that Feast parametrizes integration tests using the FULL_REPO_CONFIGS variable, which you overwrite to enable testing for the contrib offline store

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase against master?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

kevjumba added 9 commits July 22, 2022 10:27
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
@kevjumba kevjumba force-pushed the update_offline_store_docs branch from fa0e55d to 977221f Compare July 22, 2022 17:42
kevjumba added 6 commits July 22, 2022 10:46
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
kevjumba added 7 commits July 22, 2022 12:58
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
kevjumba added 10 commits July 22, 2022 13:31
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, kevjumba

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@adchia adchia enabled auto-merge (squash) July 22, 2022 21:41
@feast-ci-bot feast-ci-bot merged commit 1479519 into feast-dev:master Jul 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants