feat: add ADBC backend #4267
Codecov Report
@@            Coverage Diff             @@
##           master    #4267      +/-   ##
==========================================
- Coverage   92.60%   92.02%   -0.59%
==========================================
  Files         179      180       +1
  Lines       20240    20372     +132
  Branches     2893     2914      +21
==========================================
+ Hits        18743    18747       +4
- Misses       1132     1260     +128
  Partials      365      365
This adds the skeleton of a very minimal ADBC backend. Most tests do not yet pass. The main thing is that ADBC is really a database API abstraction akin to PEP 249, JDBC, or ODBC, so this backend eventually needs to be able to let the user specify the SQL dialect to compile to (or: to use Substrait instead).
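To make the PEP 249 comparison concrete: a DBAPI-style driver only exposes connections and cursors, and the SQL string is passed through as-is, which is why the dialect has to be chosen separately from the driver. A minimal sketch follows, using the standard-library `sqlite3` module purely as a stand-in for an ADBC driver. The helper name `run_query` is hypothetical and is not part of ibis or ADBC.

```python
import sqlite3


def run_query(connect, sql, params=()):
    """Run `sql` through any PEP 249 style `connect` callable and return all rows."""
    conn = connect()
    try:
        cur = conn.cursor()
        # The driver does not care which dialect produced `sql`;
        # it simply forwards the string to the database.
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        conn.close()


# sqlite3 stands in for an ADBC driver here (assumption, for illustration only).
rows = run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1 AS two")
print(rows)
```

Because the helper takes the `connect` callable as an argument, swapping in a different DBAPI-compatible driver would not change the calling code, only the SQL that has to be generated for it.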
…he package build
On Phillip's suggestion, now the ADBC package implements DBAPI, and this backend just defers to SQLAlchemy (except for fetching results, which gets overridden). I also added a bit of pytest magic here (thanks to Phillip again for explaining how it works). Current status:
Most of the remaining failures are because of arithmetic UDFs, which need manual skips. The driver has no way of registering Python UDFs; perhaps the driver could optionally register all the ones we need here, though. Part of the pytest magic is to auto-skip anything already skipped for SQLite. Next up, I'd like to try reintegrating the Substrait backend.
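For readers wondering what the auto-skip could look like, here is a rough sketch of a `conftest.py` hook, not the code in this PR. It assumes tests are flagged as unsupported with marks named `notimpl`, `notyet`, or `never` whose first argument lists backend names, and that ADBC tests carry an `adbc` mark. The helper `skipped_for` is a hypothetical name.

```python
from types import SimpleNamespace

# Assumed names of the marks that flag a test as unsupported on some backends.
SKIP_MARKS = ("notimpl", "notyet", "never")


def skipped_for(markers, backend="sqlite"):
    """Return True if any mark in `markers` flags the test as unsupported on `backend`."""
    for mark in markers:
        if mark.name not in SKIP_MARKS or not mark.args:
            continue
        backends = mark.args[0]
        if isinstance(backends, str):
            backends = [backends]
        if backend in backends:
            return True
    return False


def pytest_collection_modifyitems(config, items):
    """conftest.py hook: skip ADBC tests already marked as unsupported on SQLite."""
    import pytest  # imported lazily so skipped_for() is usable without pytest

    for item in items:
        if item.get_closest_marker("adbc") and skipped_for(item.iter_markers()):
            item.add_marker(pytest.mark.skip(reason="already skipped for SQLite"))


# Stand-in objects with the same .name/.args shape as pytest marks.
marks = [SimpleNamespace(name="notimpl", args=(["sqlite", "mysql"],))]
print(skipped_for(marks))
```

Keeping the decision in a plain function means it can be checked without running a pytest session; the hook itself only wires that decision into collection.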
What I ended up doing was futzing around so that the driver can also be distributed as a package: apache/arrow-adbc#57. Once that gets updated, instead of having to build the driver and such here, we can just depend on the (prerelease) packages.
ADBC should be usable by various backends, so it's not a backend on its own. I think ibis will need another abstraction layer that is responsible for interfacing with the various databases.
I guess then we need to decide what specific backend(s) to target here? (Since this won't be as featureful as the existing SQLite driver, without being able to register UDFs.) Possibly Postgres and Acero?
In that case, though, instead of trying to add all the pytest marks to get everything to pass, I'll focus on looking more into Substrait integration.
@lidavidm Should we keep this PR open?
Sorry, I've been letting this sit. I'll close it for now, and I'll follow up with a new PR once I get (nightly) wheel releases for the ADBC packages set up; that'll make development easier.
This adds the skeleton of a very minimal ADBC backend. Most tests
do not yet pass. The main thing is that ADBC is really a database
API abstraction akin to PEP 249, JDBC, or ODBC, so this backend
eventually needs to be able to let the user specify the SQL
dialect to compile to (or: to use Substrait instead).
Also, I haven't yet updated the dependencies, CI, etc. This is
mostly to prove that it (generally) is possible.
ADBC doesn't yet have packages/releases, so getting this up and
running takes some manual setup. Basically:
export LD_LIBRARY_PATH=path/to/folder/with/libadbc_driver_sqlite.so
export PYTHONPATH=$PYTHONPATH:path/to/folder/with/adbc_driver_manager/package
python -m pytest -m adbc