Skip to content

Conversation

lidavidm
Copy link

This adds the skeleton of a very minimal ADBC backend. Most tests
do not yet pass. The main thing is that ADBC is really a database
API abstraction akin to PEP 249, JDBC, or ODBC, so this backend
eventually needs to be able to let the user specify the SQL
dialect to compile to (or: to use Substrait instead).

Also, I haven't yet updated the dependencies, CI, etc. This is
mostly to prove that it (generally) is possible.

ADBC doesn't yet have packages/releases, so getting this up and
working takes some work. Basically:

  1. Add cython to shell.nix
  2. Get the ADBC C SQLite driver built (I did this outside of Nix): https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md#cc
  3. Inside the Nix dev environment, get the Python package built: https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md#python (this only worked if I made a virtualenv inside the Nix shell)
  4. export LD_LIBRARY_PATH=path/to/folder/with/libadbc_driver_sqlite.so
  5. export PYTHONPATH=$PYTHONPATH:path/to/folder/with/adbc_driver_manager/package
  6. python -m pytest -m adbc

@github-actions
Copy link
Contributor

github-actions bot commented Jul 22, 2022

Test Results

       35 files         35 suites   1h 9m 19s ⏱️
  8 873 tests   7 094 ✔️ 1 779 💤 0
32 504 runs  25 728 ✔️ 6 776 💤 0

Results for commit 5d301ee.

♻️ This comment has been updated with latest results.

@codecov
Copy link

codecov bot commented Jul 22, 2022

Codecov Report

Merging #4267 (5d301ee) into master (8d0cf80) will decrease coverage by 0.58%.
The diff coverage is 4.44%.

❗ Current head 5d301ee differs from pull request most recent head 02d8a7e. Consider uploading reports for the commit 02d8a7e to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4267      +/-   ##
==========================================
- Coverage   92.60%   92.02%   -0.59%     
==========================================
  Files         179      180       +1     
  Lines       20240    20372     +132     
  Branches     2893     2914      +21     
==========================================
+ Hits        18743    18747       +4     
- Misses       1132     1260     +128     
  Partials      365      365              
Impacted Files Coverage Δ
ibis/backends/adbc/__init__.py 0.00% <0.00%> (ø)
ibis/backends/pyarrow/datatypes.py 77.77% <ø> (ø)
ibis/backends/conftest.py 89.59% <62.50%> (-0.91%) ⬇️
ibis/backends/base/sql/alchemy/query_builder.py 92.46% <100.00%> (ø)
ibis/backends/impala/__init__.py 86.14% <0.00%> (+0.21%) ⬆️

@cpcloud cpcloud added the experimental Experimental features label Jul 23, 2022
@pep8speaks
Copy link

pep8speaks commented Aug 2, 2022

Hello @lidavidm! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-08-08 18:17:29 UTC

@cpcloud cpcloud added the feature Features or general enhancements label Aug 3, 2022
lidavidm and others added 4 commits August 3, 2022 06:23
This adds the skeleton of a very minimal ADBC backend. Most tests
do not yet pass. The main thing is that ADBC is really a database
API abstraction akin to PEP 249, JDBC, or ODBC, so this backend
eventually needs to be able to let the user specify the SQL
dialect to compile to (or: to use Substrait instead).
@lidavidm
Copy link
Author

lidavidm commented Aug 8, 2022

On Phillip's suggestion, now the ADBC package implements DBAPI, and this backend just defers to SQLAlchemy (except for fetching results, which gets overridden). I also added a bit of pytest magic here (thanks to Phillip again for explaining how it works).

Current status:

91 failed, 260 passed, 2 skipped, 8681 deselected, 192 xfailed in 50.52s

Most of the remaining failures are because of arithmetic UDFs, which need manual skips. The driver has no way of registering Python UDFs; perhaps the driver could optionally register all the ones we need here, though. Part of the pytest magic is to auto-skip anything already skipped for SQLite.

Next up, I'd like to try reintegrating the Substrait backend

@lidavidm
Copy link
Author

lidavidm commented Aug 8, 2022

What I ended up doing was futzing around so that the driver can also be distributed as a package: apache/arrow-adbc#57

Once that gets updated, then instead of having to build the driver and such here, we can just depend on the (prerelease) packages.

@kszucs
Copy link
Member

kszucs commented Aug 9, 2022

ADBC should be able to be used by various backends so it's not a backend on its own.

I think ibis will need another abstraction layer which is responsible for the interfacing between various databases.

@lidavidm
Copy link
Author

lidavidm commented Aug 9, 2022

I guess then we need to decide what specific backend(s) to target here? (Since this won't be as featureful as the existing SQLite driver, without being able to register UDFs.) Possibly Postgres and Acero?

@lidavidm
Copy link
Author

lidavidm commented Aug 9, 2022

In that case though, instead of trying to add all the pytest marks to get everything to pass, I'll focus on looking more into Substrait integration

@cpcloud cpcloud added this to the 4.0.0 milestone Aug 23, 2022
@cpcloud
Copy link
Member

cpcloud commented Oct 6, 2022

@lidavidm Should we keep this PR open?

@lidavidm
Copy link
Author

lidavidm commented Oct 6, 2022

Sorry, I've been letting this sit. I'll close it for now, and I'll follow up with a new PR once I get (nightly) wheel releases for the ADBC packages set up; that'll make development easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
experimental Experimental features feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants