Closed
Labels
wontfix: This will not be worked on
Description
Expected Behavior
Materializing REPEATED features from a BigQuery table into the GCP Feast online store should complete without this error.
Current Behavior
One column of the BigQuery table is a REPEATED STRING. The number of values in this column varies per row from a few to many hundreds of string IDs. It appears that the total size of the repeated string causes the issue. Materialization fails after 97% ingestion:
97%|█████████████████████████████████████████████████████▍ | 49606/51056 [00:30<00:00, 1604.20it/s]
The actual stack trace is:
Traceback (most recent call last):
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable
return callable_(*args, **kwargs)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The value of property "content_30d" is longer than 1500 bytes."
debug_error_string = "{"created":"@1623177301.342896000","description":"Error received from peer ipv6:[2a00:1450:4009:81e::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"The value of property "content_30d" is longer than 1500 bytes.","grpc_status":3}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/bin/feast", line 8, in <module>
sys.exit(cli())
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/cli.py", line 243, in materialize_command
store.materialize(
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/telemetry.py", line 151, in exception_logging_wrapper
result = func(*args, **kwargs)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/feature_store.py", line 444, in materialize
provider.materialize_single_feature_view(
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 192, in materialize_single_feature_view
self.online_write_batch(
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 121, in online_write_batch
pool.map(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 122, in <lambda>
lambda b: _write_minibatch(client, project, table, b, progress),
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 270, in _write_minibatch
client.put_multi(entities)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 328, in __exit__
self.commit()
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/transaction.py", line 304, in commit
super(Transaction, self).commit(**kwargs)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 300, in commit
self._commit(retry=retry, timeout=timeout)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 257, in _commit
commit_response_pb = self._client._datastore_api.commit(
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore_v1/services/datastore/client.py", line 627, in commit
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 The value of property "content_30d" is longer than 1500 bytes.
The Feature View is:
propensity_model_data_view = FeatureView(
    name="propensity_model_data_stats",
    entities=["customer_id"],
    ttl=Duration(seconds=86400 * 10),
    features=[
        Feature(name="avg_duration_30d", dtype=ValueType.FLOAT),
        Feature(name="content_30d", dtype=ValueType.STRING),
        Feature(name="common_genre", dtype=ValueType.STRING),
        Feature(name="tenure", dtype=ValueType.FLOAT)
    ],
    online=True,
    input=propensity_model_data,
    tags={},
)
The BQ schema is:
Field name | Type | Mode |
---|---|---|
tenure | INTEGER | |
target | INTEGER | |
sub_timestamp | TIMESTAMP | |
customer_id | STRING | |
created_timestamp | TIMESTAMP | |
avg_duration_30d | FLOAT | |
common_genre | STRING | |
content_30d | STRING | REPEATED |
Steps to reproduce
Materialize a BigQuery REPEATED STRING column in which at least one row has a total byte count > 1500.
When the total size is reduced, materialization runs to completion.
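To help narrow down which rows trigger the failure, here is a rough sketch that flags rows whose repeated strings add up to more than the 1500 bytes quoted in the error message. The helper names (`utf8_size`, `find_oversized`) and the sample data are hypothetical and not part of Feast. The count is only an approximation: it sums the raw UTF-8 bytes, whereas the value actually written to the online store is presumably a serialized form that is somewhat larger.

```python
from typing import Dict, Iterable, List, Tuple

# Limit quoted in the error message ("longer than 1500 bytes").
PROPERTY_LIMIT_BYTES = 1500


def utf8_size(values: Iterable[str]) -> int:
    """Total UTF-8 byte count of all strings in one REPEATED cell."""
    return sum(len(v.encode("utf-8")) for v in values)


def find_oversized(
    rows: Dict[str, List[str]], limit: int = PROPERTY_LIMIT_BYTES
) -> List[Tuple[str, int]]:
    """Return (entity key, byte count) for rows above the limit, largest first."""
    sized = ((key, utf8_size(values)) for key, values in rows.items())
    oversized = [item for item in sized if item[1] > limit]
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample: customer_id -> content_30d values.
    sample = {
        "customer_a": ["id-001", "id-002"],
        "customer_b": ["content-%04d" % i for i in range(200)],
    }
    for key, size in find_oversized(sample):
        print(f"{key}: {size} bytes")
```

In practice the `rows` mapping would be built from the BigQuery result for the materialization window, keyed by `customer_id`. Rows reported here are the likely candidates for the `InvalidArgument` error.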
Specifications
- Version: 0.10.6
- Platform: GCP
- Subsystem: Firestore
Possible Solution
I raised this on Slack originally and Willem Pienaar thinks he knows the cause of the problem. The Slack thread is here:
https://tectonfeast.slack.com/archives/C01MSKCMB37/p1623155984116000