"The value of property is longer than 1500 bytes" error on BigQuery REPEATED STRING materialization #1633

@adriangay

Expected Behavior

Materializing REPEATED features from a BigQuery table into the GCP Feast online store should not raise this error.

Current Behavior

One column of the BigQuery table is a REPEATED STRING. The number of values in this column varies from a few to many hundreds of string IDs. The total size of the repeated string appears to cause the issue. Materialization fails after about 97% of the rows have been ingested:

97%|█████████████████████████████████████████████████████▍ | 49606/51056 [00:30<00:00, 1604.20it/s]

The actual stack trace is:

Traceback (most recent call last):
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "The value of property "content_30d" is longer than 1500 bytes."
        debug_error_string = "{"created":"@1623177301.342896000","description":"Error received from peer ipv6:[2a00:1450:4009:81e::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"The value of property "content_30d" is longer than 1500 bytes.","grpc_status":3}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/bin/feast", line 8, in <module>
    sys.exit(cli())
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/cli.py", line 243, in materialize_command
    store.materialize(
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/telemetry.py", line 151, in exception_logging_wrapper
    result = func(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/feature_store.py", line 444, in materialize
    provider.materialize_single_feature_view(
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 192, in materialize_single_feature_view
    self.online_write_batch(
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 121, in online_write_batch
    pool.map(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 122, in <lambda>
    lambda b: _write_minibatch(client, project, table, b, progress),
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 270, in _write_minibatch
    client.put_multi(entities)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 328, in __exit__
    self.commit()
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/transaction.py", line 304, in commit
    super(Transaction, self).commit(**kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 300, in commit
    self._commit(retry=retry, timeout=timeout)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 257, in _commit
    commit_response_pb = self._client._datastore_api.commit(
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore_v1/services/datastore/client.py", line 627, in commit
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 The value of property "content_30d" is longer than 1500 bytes.

The Feature View is:

propensity_model_data_view = FeatureView(
    name="propensity_model_data_stats",
    entities=["customer_id"],
    ttl=Duration(seconds=86400 * 10),
    features=[
        Feature(name="avg_duration_30d", dtype=ValueType.FLOAT),
        Feature(name="content_30d", dtype=ValueType.STRING),
        Feature(name="common_genre", dtype=ValueType.STRING),
        Feature(name="tenure", dtype=ValueType.FLOAT)
    ],
    online=True,
    input=propensity_model_data,
    tags={},
)

The BQ schema is:

Field name          Type        Mode
tenure              INTEGER
target              INTEGER
sub_timestamp       TIMESTAMP
customer_id         STRING
created_timestamp   TIMESTAMP
avg_duration_30d    FLOAT
common_genre        STRING
content_30d         STRING      REPEATED

Steps to reproduce

Materialize a BigQuery REPEATED STRING column whose total byte count exceeds 1500.

When the total size of the column is reduced, materialization runs to completion.
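
To find which rows are likely to trigger the error before running materialization, a rough pre-check along these lines may help. This is only a sketch: the helper names and the sample rows are hypothetical, and it sums raw UTF-8 bytes, whereas Feast serializes the value before writing it, so the stored size may differ from this estimate.

```python
from typing import Dict, List, Sequence

# Limit quoted in the error message returned by the online store.
MAX_PROPERTY_BYTES = 1500


def repeated_string_size(values: Sequence[str]) -> int:
    """Return the total UTF-8 byte count of all strings in one REPEATED value."""
    return sum(len(v.encode("utf-8")) for v in values)


def find_oversized_rows(
    rows: Sequence[Dict],
    column: str,
    key: str = "customer_id",
    limit: int = MAX_PROPERTY_BYTES,
) -> List[str]:
    """Return the keys of rows whose REPEATED STRING column exceeds `limit` bytes."""
    return [row[key] for row in rows if repeated_string_size(row[column]) > limit]


# Hypothetical sample rows shaped like the table in this issue.
rows = [
    {"customer_id": "c1", "content_30d": ["id-001", "id-002"]},
    {"customer_id": "c2", "content_30d": ["id-%04d" % i for i in range(300)]},
]

oversized = find_oversized_rows(rows, "content_30d")
print(oversized)
```

Rows reported by such a check are candidates for trimming or splitting until the underlying limit is addressed.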

Specifications

  • Version: 0.10.6
  • Platform: GCP
  • Subsystem: Firestore

Possible Solution

I raised this on Slack originally and Willem Pienaar thinks he knows the cause of the problem. The Slack thread is here:
https://tectonfeast.slack.com/archives/C01MSKCMB37/p1623155984116000

Labels: wontfix (This will not be worked on)
