ebolblga
Contributor

What this PR does / why we need it:

This PR fixes the error ArrowNotImplementedError: ("NumPyConverter doesn't implement <list<item: int64>> conversion. ", 'Conversion failed for column with type float64') raised when calling FeatureStore.get_historical_features().to_df().

Explanation: If I define a feature of type array(bigint) in a Trino source, then after some joins the column for that feature may contain None values. Pandas (NumPy) will then infer this column as float64, even though it still contains valid int64 arrays.

def _to_arrow_internal(self, timeout: Optional[int] = None) -> pyarrow.Table:
    """Return payrrow dataset as synchronously including on demand transforms"""
    return pyarrow.Table.from_pandas(
-       self._to_df_internal(timeout=timeout), schema=self.pyarrow_schema
+       self._to_df_internal(timeout=timeout)
    )

Since the schema is passed, PyArrow notices that I am trying to convert a float64 value to an int64 array and throws an error. Simply removing the forced schema solves this issue.

Removing the forced schema is admittedly questionable. However, I checked the same function in the Spark offline store (sdk\python\feast\infra\offline_stores\contrib\spark_offline_store\spark.py) and it does exactly that: it does not pass a schema. It's up to the maintainers to decide whether this is a valid fix, but if it works well for Spark, I don't see why Trino shouldn't do the same.

Which issue(s) this PR fixes:

N/A — this issue was found independently and not yet tracked.

Misc

PyArrow from_pandas() method

Note that I also added array support to the documentation for the Trino offline store. I can confirm that arrays work without problems with this small fix.

Release note: Fix Trino offline store to support array data types.

@ebolblga ebolblga requested a review from a team as a code owner May 27, 2025 18:14
-       return pyarrow.Table.from_pandas(
-           self._to_df_internal(timeout=timeout), schema=self.pyarrow_schema
-       )
+       return pyarrow.Table.from_pandas(self._to_df_internal(timeout=timeout))

why remove the explicit schema declaration?

Contributor Author

Let's say I have a pandas DataFrame like this:

| feature_x | feature_y |
| --- | --- |
| [1] | None |
| [3, 4] | [54, 38] |
| [75, 1, 12] | [40, 0] |

If I print the column dtypes, feature_x is still object (array(int)), while feature_y is converted to float by the underlying NumPy because of the null.

By default, pandas uses NumPy data types, which do not support missing values in integer arrays. If you create a Series or DataFrame column with integers and include a null value (e.g., None or np.nan), pandas will upcast the column to a floating-point type (float64) to accommodate the missing value.
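
The upcast can be seen directly in a short sketch. The values are made up for illustration; the nullable "Int64" extension dtype is shown only for contrast and is not something this PR uses.

```python
import pandas as pd

# Plain integers keep an integer dtype.
ints = pd.Series([1, 2, 3])

# A single missing value forces the NumPy-backed column to float64 (None -> NaN).
ints_with_null = pd.Series([1, 2, None])

# The nullable extension dtype keeps integers and stores the gap as pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(ints.dtype, ints_with_null.dtype, nullable.dtype)
```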

If I later pass this DataFrame with the forced schema, PyArrow sees that I want to cast float to array(int) and throws an error. If I don't pass the schema, it infers the type itself and works as expected.

@franciscojavierarceo
Member

looks like tests are actually failing now.

@ntkathole
Member

@ebolblga can you please rebase with master

@ebolblga ebolblga force-pushed the fix-trino-arrays branch from 7352e2b to c809bc7 Compare May 29, 2025 15:01
@ebolblga
Contributor Author

@ebolblga can you please rebase with master

Done, didn't help with that test though

@franciscojavierarceo
Member

we ended up finally fixing that issue. can you rebase one last time? So sorry here.

@ebolblga ebolblga force-pushed the fix-trino-arrays branch from c809bc7 to cd1f210 Compare June 3, 2025 21:16
@ebolblga
Contributor Author

ebolblga commented Jun 3, 2025

we ended up finally fixing that issue. can you rebase one last time? So sorry here.

No worries, finally passed all tests!

@franciscojavierarceo franciscojavierarceo merged commit 9ba9ded into feast-dev:master Jun 4, 2025
21 checks passed
j-wine pushed a commit to j-wine/feast that referenced this pull request Jun 7, 2025
devin-ai-integration bot pushed a commit to franciscojavierarceo/feast that referenced this pull request Jun 9, 2025
franciscojavierarceo pushed a commit that referenced this pull request Jun 30, 2025
# [0.50.0](v0.49.0...v0.50.0) (2025-06-30)

### Bug Fixes

* Add asyncio to integration test ([#5418](#5418)) ([6765515](6765515))
* Add clickhouse to OFFLINE_STORE_CLASS_FOR_TYPE map ([#5251](#5251)) ([9ed2ffa](9ed2ffa))
* Add missing conn.commit() in SnowflakeOnlineStore.online_write_batch ([#5432](#5432)) ([a83dd85](a83dd85))
* Add transformers in required dependencies ([8cde460](8cde460))
* Allow custom annotations on Operator installed objects ([#5339](#5339)) ([44c7a76](44c7a76))
* Dask pulling of latest data ([#5229](#5229)) ([571d81f](571d81f))
* **dask:** preserve remote URIs (e.g. s3://) in DaskOfflineStore path resolution ([2561cfc](2561cfc))
* Fix Event loop is closed error on dynamodb test ([#5480](#5480)) ([fe0f671](fe0f671))
* Fix lineage entity filtering ([#5321](#5321)) ([0d05701](0d05701))
* Fix list saved dataset api ([833696c](833696c))
* Fix NumPy - PyArrow array type mapping in Trino offline store ([#5393](#5393)) ([9ba9ded](9ba9ded))
* Fix pandas 2.x compatibility issue of Trino offline store caused by removed Series.iteritems() method ([#5345](#5345)) ([61e3e02](61e3e02))
* Fix polling mechanism for TestApplyAndMaterialize ([#5451](#5451)) ([b512a74](b512a74))
* Fix remote rbac integration tests ([#5473](#5473)) ([10879ec](10879ec))
* Fix Trino offline store SQL in Jinja template ([#5346](#5346)) ([648c53d](648c53d))
* Fixed CurlGeneratorTab github theme type ([#5425](#5425)) ([5f15329](5f15329))
* Increase the Operator Manager memory limits and requests ([#5441](#5441)) ([6c94dbf](6c94dbf))
* Method signature for push_async is out of date ([#5413](#5413)) ([28c3379](28c3379)), closes [#5410](#5410) [#006BB4](https://github.com/feast-dev/feast/issues/006BB4)
* Operator - support securityContext override at Pod level ([#5325](#5325)) ([33ea0f5](33ea0f5))
* Pybuild-deps throws errors w/ latest pip version ([#5311](#5311)) ([f2d6a67](f2d6a67))
* Reopen for integration test about add s3 storage-based registry store in Go feature server ([#5352](#5352)) ([ef75f61](ef75f61))
* resolve Python logger warnings ([#5361](#5361)) ([37d5c19](37d5c19))
* The ignore_paths not taking effect duration feast apply ([#5353](#5353)) ([e4917ca](e4917ca))
* Update generate_answer function to provide correct parameter format to retrieve function ([dc5b2af](dc5b2af))
* Update milvus connect function to work with remote instance ([#5382](#5382)) ([7e5e7d5](7e5e7d5))
* Updating milvus connect function to work with remote instance ([#5401](#5401)) ([b89fadd](b89fadd))
* Upperbound limit for protobuf generation ([#5309](#5309)) ([a114aae](a114aae))

### Features

* Add CLI, SDK, and API documentation page to Feast UI ([#5337](#5337)) ([203e888](203e888))
* Add dark mode toggle to Feast UI ([#5314](#5314)) ([ad02e46](ad02e46))
* Add data labeling tabs to UI ([#5410](#5410)) ([389ceb7](389ceb7)), closes [#006BB4](https://github.com/feast-dev/feast/issues/006BB4)
* Add Decimal to allowed python scalar types ([#5367](#5367)) ([4777c03](4777c03))
* Add feast rag retriver functionality ([#5405](#5405)) ([0173033](0173033))
* Add feature view curl generator ([#5415](#5415)) ([7a5b48f](7a5b48f))
* Add feature view lineage tab and filtering to home page lineage ([#5308](#5308)) ([308255d](308255d))
* Add feature view tags to dynamo tags ([#5291](#5291)) ([3a787ac](3a787ac))
* Add HybridOnlineStore for multi-backend online store routing ([#5423](#5423)) ([ebd67d1](ebd67d1))
* Add max_file_size to Snowflake config ([#5377](#5377)) ([e8cdf5d](e8cdf5d))
* Add MCP (Model Context Protocol) support for Feast feature server ([#5406](#5406)) ([de650de](de650de)), closes [#5398](#5398) [#5382](#5382) [#5389](#5389) [#5401](#5401)
* Add rag project to default dev UI ([#5323](#5323)) ([3b3e1c8](3b3e1c8))
* Add s3 storage-based registry store in Go feature server ([#5336](#5336)) ([abe18df](abe18df))
* Add support for data labeling in UI ([#5409](#5409)) ([d183c4b](d183c4b)), closes [#27](#27)
* Added Lineage APIs to get registry objects relationships ([#5472](#5472)) ([be004ef](be004ef))
* Added rest-apis serving option for registry server ([#5342](#5342)) ([9740fd1](9740fd1))
* Added torch.Tensor as option for online and offline retrieval ([#5381](#5381)) ([0b4ae95](0b4ae95))
* Adding feast delete to CLI ([#5344](#5344)) ([19fe3ac](19fe3ac))
* Adding permissions to UI and refactoring some things ([#5320](#5320)) ([6f1b0cc](6f1b0cc))
* Allow to set registry server rest/grpc mode in operator ([#5364](#5364)) ([99afd6d](99afd6d))
* Allow to use env variable FEAST_FS_YAML_FILE_PATH and FEATURE_REPO_DIR ([#5420](#5420)) ([6a1b33a](6a1b33a))
* Enable materialization for ODFV Transform on Write ([#5459](#5459)) ([3d17892](3d17892))
* Improve search results formatting ([#5326](#5326)) ([18cbd7f](18cbd7f))
* Improvements to Lambda materialization engine ([#5379](#5379)) ([b486f29](b486f29))
* Make batch_source optional in PushSource ([#5440](#5440)) ([#5454](#5454)) ([ae7e20e](ae7e20e))
* Refactor materialization engine ([#5354](#5354)) ([f5c5360](f5c5360))
* Remote Write to Online Store completes client / server architecture ([#5422](#5422)) ([2368f42](2368f42))
* Serialization version 2 and below removed ([#5435](#5435)) ([9e50e18](9e50e18))
* SQLite online retrieval. Add timezone info into timestamp. ([#5386](#5386)) ([6b05153](6b05153))
* Support dual-mode REST and gRPC for Feast Registry Server ([#5396](#5396)) ([fd1f448](fd1f448))
* Support DynamoDB as online store in Go feature server ([#5464](#5464)) ([40d25c6](40d25c6))
* Update Spark Compute read source node to be able to use other data sources ([#5445](#5445)) ([a93d300](a93d300))

### Reverts

* Feat: Add CLI, SDK, and API documentation page to Feast UI" ([#5341](#5341)) ([b492f14](b492f14)), closes [#5337](#5337)
* Revert "feat: Add s3 storage-based registry store in Go feature server" ([#5351](#5351)) ([d5d6766](d5d6766)), closes [#5336](#5336)
* Revert "fix: Update milvus connect function to work with remote instance" ([#5398](#5398)) ([434dd92](434dd92)), closes [#5382](#5382)
franciscojavierarceo pushed a commit that referenced this pull request Jul 1, 2025
# [0.50.0](v0.49.0...v0.50.0) (2025-07-01)

### Reverts

* Chore Release "chore(release): release 0.50.0" ([#5483](#5483)) ([0eef391](0eef391))
* Feat: Add CLI, SDK, and API documentation page to Feast UI" ([#5341](#5341)) ([b492f14](b492f14)), closes [#5337](#5337)
* Revert "feat: Add s3 storage-based registry store in Go feature server" ([#5351](#5351)) ([d5d6766](d5d6766)), closes [#5336](#5336)
* Revert "fix: Update milvus connect function to work with remote instance" ([#5398](#5398)) ([434dd92](434dd92)), closes [#5382](#5382)