Should Airflow keep SQLAlchemy connection pooling enabled when using PgBouncer (transaction pooling mode)? #56890
Replies: 2 comments 1 reply
-
|
This is a very good question and I think we have no good answer for now - i.e. no evidences not enough benchmarks to give definite answer. Purely teorethically - PGBouncer might not be needed any more and could be completely replaced by connection pooling in Airflow. PGBouncer used to be badly needed when each of the workers connnected directly to the DB, but since this is not happening in Airfow 3 - only scheduler, triggerer, api-server, dag processor - and you can limit a number of connections by using the SQLAlchemy pools you mentioned. So theorethically you could just skip PGBouncer and connect directly to the DB if you use SQLAlchemy pools. That would lkely give a slight boost over using PGBouncer. But I have no evidences for it. |
Beta Was this translation helpful? Give feedback.
-
|
@potiuk After trying your suggestion and removing PgBouncer, the main issue we encountered was that it became difficult to centrally manage the total number of database connections. When using PgBouncer, we only needed to monitor the PgBouncer pool count. However, without it, we now have to manage and coordinate multiple factors — such as the number of API replicas, each API server’s internal pool size, and the number of schedulers — to control overall connection usage. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi everyone,
We’re running Airflow 3.1.0 (CeleryExecutor) on Kubernetes, with an external PostgreSQL (RDS) metadata database sitting behind PgBouncer configured in transaction pooling mode.
The issue we’re seeing is that when triggering very large DAGs with many tasks, RDS Performance Insights shows high AAS (Average Active Sessions) and frequent Client:ClientWrite wait events.
Interestingly, when we reduced the PgBouncer pool size, the AAS spikes decreased, and the database became noticeably more stable.
This led us to wonder:
Since PgBouncer is already handling connection pooling,
does it still make sense for Airflow to maintain its own SQLAlchemy connection pool?
In other words:
When using PgBouncer in transaction pooling mode, is it better to disable Airflow’s internal SQLAlchemy pool (AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_ENABLED=False) and rely entirely on PgBouncer for connection management?
Or does keeping a small SQLAlchemy pool still provide any meaningful benefit?
Beta Was this translation helpful? Give feedback.
All reactions