@timsaucer
Member

@timsaucer timsaucer commented Jan 11, 2026

Which issue does this PR close?

None

Rationale for this change

Prepare for next release

What changes are included in this PR?

  • Upgrade to DataFusion 52
  • Add FFI extensions for TaskContextProvider and LogicalExtensionCodec to the SessionContext

Are there any user-facing changes?

Yes. The FFI signatures for catalog providers, schema providers, table providers, and table functions now all require access to the session context. Upgrade instructions are included in the online documentation as part of this PR. Examples have been updated to reflect the changes.

@timsaucer
Member Author

@nuno-faria @kosiew In my attempt to upgrade to DataFusion 52 (release candidate) I'm running into problems again with the test_arrow_c_stream_interrupted test. Now it's becoming a timeout issue - the 10 seconds isn't enough in the CI runner to pass. I don't want to continue arbitrarily bumping this timeout.

I'm also fairly confident that in the CI test we are not triggering the datafusion interrupt code in wait_for_future because the unit test is catching a KeyboardInterrupt and not the datafusion wrapped error.

I'm leaning towards marking this test as skip and opening an issue on it since manual testing shows that the stream is getting interrupted. It would be nice to deep dive into figuring out exactly what is happening in this code and to figure out how important it is to keep this test.

@nuno-faria
Contributor

@timsaucer I did a quick check and it appears now that the query does not block as it once did. So batches are continuously generated, meaning the sleep(INTERVAL_CHECK_SIGNALS) => ... branch of wait_for_future is never called.

One way to solve this would be to put the py.check_signals() code before the tokio::select! (in addition to the sleep(INTERVAL_CHECK_SIGNALS) branch). That way, we ensure that we check for the interrupt before a batch is produced. However, I'm not sure if this would negatively impact performance.
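
To make the proposal concrete, here is a minimal sketch of the idea using only the Rust standard library. It is not the actual `wait_for_future` code: the channel stands in for the batch stream, `recv_timeout` stands in for the `tokio::select!` with its `sleep(INTERVAL_CHECK_SIGNALS)` branch, and `check_signals`, `drain_with_signal_checks`, and the `check_before_wait` flag are hypothetical names that only approximate `py.check_signals()` and the surrounding loop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical stand-in for `py.check_signals()`:
/// returns an error when an interrupt is pending.
fn check_signals(interrupted: &AtomicBool) -> Result<(), String> {
    if interrupted.load(Ordering::SeqCst) {
        Err("KeyboardInterrupt".to_string())
    } else {
        Ok(())
    }
}

/// Receives batches until the sender disconnects, returning how many arrived.
/// The timeout arm mirrors the `sleep(INTERVAL_CHECK_SIGNALS)` branch: it only
/// runs when no batch is ready. With `check_before_wait` set, the interrupt is
/// also checked at the top of every iteration, before waiting for a batch.
fn drain_with_signal_checks(
    rx: &mpsc::Receiver<u32>,
    interrupted: &AtomicBool,
    check_before_wait: bool,
) -> Result<usize, String> {
    let mut received = 0;
    loop {
        if check_before_wait {
            check_signals(interrupted)?;
        }
        match rx.recv_timeout(Duration::from_millis(50)) {
            // A batch was ready, so the timeout arm did not run.
            Ok(_batch) => received += 1,
            // Nothing arrived within the interval: check for an interrupt.
            Err(RecvTimeoutError::Timeout) => check_signals(interrupted)?,
            // The producer is done.
            Err(RecvTimeoutError::Disconnected) => return Ok(received),
        }
    }
}
```

With five batches already queued and the interrupt flag set, calling this with `check_before_wait = false` drains all five batches without ever noticing the interrupt, while `true` returns the error before the first batch. In this sketch the extra cost is one atomic load per iteration; the cost of the real `py.check_signals()` call would need to be measured.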

@timsaucer timsaucer requested a review from Copilot January 16, 2026 17:52
Contributor

Copilot AI left a comment

Pull request overview

This PR upgrades the DataFusion dependency from version 51 to 52 and implements required FFI extensions to support the new TaskContextProvider and LogicalExtensionCodec interfaces. The upgrade introduces breaking changes to FFI signatures that now require session context access for catalog, schema, table providers, and table functions.

Changes:

  • Updated all DataFusion-related crate dependencies from version 51 to 52
  • Added FFI extensions for TaskContextProvider and LogicalExtensionCodec to SessionContext
  • Modified FFI provider signatures to remove intermediate wrapper types and use direct trait conversions
  • Added ResetVariable statement support in logical plan handling
  • Updated documentation with upgrade guide and FFI reference

Reviewed changes

Copilot reviewed 32 out of 34 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| Cargo.toml | Updated DataFusion dependencies from v51 to v52 |
| src/context.rs | Added FFI extension methods and changed SessionContext to Arc-based storage |
| src/utils.rs | Updated global context to return Arc and simplified table provider conversion |
| src/udaf.rs, src/udf.rs, src/udwf.rs, src/udtf.rs | Removed Foreign* wrapper types in favor of direct trait conversions |
| src/expr/statement.rs | Added PyResetVariable struct for new SQL statement support |
| src/expr/scalar_variable.rs | Changed from storing DataType to FieldRef |
| src/dataframe.rs | Modified ParquetWriterOptions to validate writer_version parameter |
| examples/datafusion-ffi-example/* | Updated examples to pass SessionContext and use new FFI codec interfaces |
| python/tests/test_dataframe.py | Updated test expectations for DataFusion 52 query plan output |
| docs/* | Added upgrade guide documentation |

Comment on lines 98 to 104
.. code-block: rust

use datafusion::catalog::MemTable;

Instead you can now write:

.. code-block: rust
Copilot AI Jan 16, 2026

Corrected spelling of 'code-block' directive. The correct reStructuredText syntax uses a double colon (::) not a single colon.

Suggested change
- .. code-block: rust
- use datafusion::catalog::MemTable;
- Instead you can now write:
- .. code-block: rust
+ .. code-block:: rust
+ use datafusion::catalog::MemTable;
+ Instead you can now write:
+ .. code-block:: rust

This version includes a major update to the :ref:`ffi` due to upgrades
to the `Foreign Function Interface <https://doc.rust-lang.org/nomicon/ffi.html>`_.
Users who contribute their own ``CatalogProvider``, ``SchemaProvider``,
``TableProvider`` or ``TableFunction``` via FFI must now provide access to a
Copilot AI Jan 16, 2026

Extra backtick at the end of 'TableFunction```'. Should be 'TableFunction``'.

Suggested change
- ``TableProvider`` or ``TableFunction``` via FFI must now provide access to a
+ ``TableProvider`` or ``TableFunction`` via FFI must now provide access to a

``FFI_LogicalExtensionCodec``, which can satisfy this new requirement.

A complete example can be found in the `FFI example <https://github.com/apache/datafusion-python/tree/main/examples/datafusion-ffi-example>`_.
The constructor for your provider needs to take an an input the ``SessionContext``
Copilot AI Jan 16, 2026

Corrected duplicate article 'an an' to 'as an'.

Suggested change
- The constructor for your provider needs to take an an input the ``SessionContext``
+ The constructor for your provider needs to take as an input the ``SessionContext``
