How to Handle Large Data Exports (10M+ Rows) in Superset? #33530
Replies: 2 comments
-
To handle large data exports in Superset efficiently, especially for datasets over 10 million rows, consider strategies that avoid loading the full result set into memory at once. These can help manage memory usage and improve performance during large data exports in Superset.
-
I was recently investigating this, and it seems this is not something that's supported server-side; see more info in my comment here: #33243 (comment). As a workaround, I would build a client SDK with some logic to split large tables on known primary keys and download them piece by piece, e.g. something like the sketch below.
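This is an untested sketch of that workaround. The SQL Lab execute endpoint and its payload shape are assumptions that may differ by Superset version (check the OpenAPI docs at /swagger/v1 on your instance), and the base URL, credentials, database id, table name, integer primary key column `id`, and batch size are all placeholders:

```python
# Hypothetical client-side export helper: split a large table on a known
# integer primary key and download it batch by batch via Superset's API,
# appending each batch to a CSV file so the full result is never in memory.
import csv

import requests

SUPERSET_URL = "https://superset.example.com"  # placeholder
DATABASE_ID = 1                                # placeholder Superset database id
BATCH_SIZE = 50_000                            # rows fetched per request


def login(session: requests.Session) -> None:
    # Obtain a JWT and attach it to the session. Depending on configuration,
    # POST endpoints may also require an X-CSRFToken header fetched from
    # /api/v1/security/csrf_token/.
    resp = session.post(
        f"{SUPERSET_URL}/api/v1/security/login",
        json={"username": "admin", "password": "secret",
              "provider": "db", "refresh": True},
    )
    resp.raise_for_status()
    session.headers["Authorization"] = f"Bearer {resp.json()['access_token']}"


def run_sql(session: requests.Session, sql: str) -> list[dict]:
    # Assumed endpoint/payload for synchronous SQL Lab execution; verify
    # against your version's API docs before relying on it.
    resp = session.post(
        f"{SUPERSET_URL}/api/v1/sqllab/execute/",
        json={"database_id": DATABASE_ID, "sql": sql, "runAsync": False},
    )
    resp.raise_for_status()
    return resp.json().get("data", [])


def export_table(table: str, out_path: str) -> None:
    session = requests.Session()
    login(session)
    last_id = 0
    writer = None
    with open(out_path, "w", newline="") as f:
        while True:
            # Keyset pagination: filter on the primary key rather than using
            # OFFSET, so each batch stays cheap even deep into the table.
            rows = run_sql(
                session,
                f"SELECT * FROM {table} WHERE id > {last_id} "
                f"ORDER BY id LIMIT {BATCH_SIZE}",
            )
            if not rows:
                break
            if writer is None:
                writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
                writer.writeheader()
            writer.writerows(rows)
            last_id = rows[-1]["id"]


if __name__ == "__main__":
    export_table("my_big_table", "my_big_table.csv")
```

The keyset approach (`WHERE id > last_id ORDER BY id LIMIT n`) keeps each request small and avoids the deep-OFFSET penalty, and the same loop works against a direct database connection if you'd rather bypass Superset entirely.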
-
Currently using Superset v4.1.2 (deployed via Helm on Kubernetes).
We’re trying to export very large datasets (over 10 million rows) from Superset and have observed that Superset loads all of the data into memory, causing high memory usage that leads to slow performance or even crashes.
Are there any recommended strategies for handling this? For example, could batch exports (e.g., loading 200 rows at a time) for CSV downloads avoid overloading memory? Could background workers or streaming help manage this more efficiently, or could data be streamed directly to a file without loading everything into memory?
We want a scalable and reliable way to support large data exports without hurting performance; a rough sketch of the streaming behavior we have in mind is below.
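To illustrate the streaming option, here is a rough standalone sketch of the behavior we would hope the CSV export could adopt: fetch with a server-side cursor in fixed-size batches and write straight to disk. It uses SQLAlchemy directly against the backing database rather than going through Superset, and the engine URL, table name, and batch size are placeholders.

```python
# Rough sketch of a streaming CSV export that never holds the full result set
# in memory: a server-side cursor yields rows in fixed-size batches that are
# written straight to disk.
import csv

from sqlalchemy import create_engine, text

ENGINE_URL = "postgresql+psycopg2://user:pass@host:5432/db"  # placeholder
BATCH_SIZE = 10_000


def stream_export(table: str, out_path: str) -> None:
    engine = create_engine(ENGINE_URL)
    with engine.connect() as conn, open(out_path, "w", newline="") as f:
        # stream_results=True asks the driver for a streaming/server-side
        # cursor, so rows are fetched lazily instead of buffered client-side.
        result = conn.execution_options(stream_results=True).execute(
            text(f"SELECT * FROM {table}")
        )
        writer = csv.writer(f)
        writer.writerow(result.keys())
        while True:
            batch = result.fetchmany(BATCH_SIZE)
            if not batch:
                break
            writer.writerows(batch)


if __name__ == "__main__":
    stream_export("my_big_table", "export.csv")
```

With PostgreSQL drivers, for example, `stream_results` typically maps to a named server-side cursor, so the client never buffers the whole result; memory stays roughly proportional to `BATCH_SIZE` regardless of table size.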