How to Handle Large Data Exports (10M+ Rows) in Superset? #33530
Replies: 2 comments
-
To handle large data exports in Superset efficiently, especially for datasets over 10 million rows, consider strategies that avoid loading the full result set into memory at once. These can help manage memory usage and improve performance during large data exports in Superset.
-
I was recently investigating this, and it seems this is not something that's supported server-side; see more info in my comment here: #33243 (comment). As a workaround, I would build a client SDK with some logic to split large tables on known primary keys and download them piece by piece, e.g. something like the sketch below.
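This is an untested sketch of that workaround. The SQL Lab execute endpoint and its payload shape are assumptions that may differ by Superset version (check the OpenAPI docs at /swagger/v1 on your instance), and the base URL, credentials, database id, table name, integer primary key column `id`, and batch size are all placeholders:

```python
# Hypothetical client-side export helper: split a large table on a known
# integer primary key and download it batch by batch via Superset's API,
# appending each batch to a CSV file so the full result is never in memory.
import csv

import requests

SUPERSET_URL = "https://superset.example.com"  # placeholder
DATABASE_ID = 1                                # placeholder Superset database id
BATCH_SIZE = 50_000                            # rows fetched per request


def login(session: requests.Session) -> None:
    # Obtain a JWT and attach it to the session. Depending on configuration,
    # POST endpoints may also require an X-CSRFToken header fetched from
    # /api/v1/security/csrf_token/.
    resp = session.post(
        f"{SUPERSET_URL}/api/v1/security/login",
        json={"username": "admin", "password": "secret",
              "provider": "db", "refresh": True},
    )
    resp.raise_for_status()
    session.headers["Authorization"] = f"Bearer {resp.json()['access_token']}"


def run_sql(session: requests.Session, sql: str) -> list[dict]:
    # Assumed endpoint/payload for synchronous SQL Lab execution; verify
    # against your version's API docs before relying on it.
    resp = session.post(
        f"{SUPERSET_URL}/api/v1/sqllab/execute/",
        json={"database_id": DATABASE_ID, "sql": sql, "runAsync": False},
    )
    resp.raise_for_status()
    return resp.json().get("data", [])


def export_table(table: str, out_path: str) -> None:
    session = requests.Session()
    login(session)
    last_id = 0
    writer = None
    with open(out_path, "w", newline="") as f:
        while True:
            # Keyset pagination: filter on the primary key rather than using
            # OFFSET, so each batch stays cheap even deep into the table.
            rows = run_sql(
                session,
                f"SELECT * FROM {table} WHERE id > {last_id} "
                f"ORDER BY id LIMIT {BATCH_SIZE}",
            )
            if not rows:
                break
            if writer is None:
                writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
                writer.writeheader()
            writer.writerows(rows)
            last_id = rows[-1]["id"]


if __name__ == "__main__":
    export_table("my_big_table", "my_big_table.csv")
```

The keyset approach (`WHERE id > last_id ORDER BY id LIMIT n`) keeps each request small and avoids the deep-OFFSET penalty, and the same loop works against a direct database connection if you'd rather bypass Superset entirely.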
-
Currently using Superset v4.1.2 (deployed via Helm on Kubernetes).
We’re trying to export very large datasets (over 10 million rows) from Superset and have observed that Superset loads all of the data into memory, causing high memory usage that leads to slow performance or even crashes.
Are there any recommended strategies for handling this? For example, could batch exports (e.g., loading 200 rows at a time) for CSV downloads avoid overloading memory? Could background workers or streaming help manage this more efficiently, or could data be streamed directly to a file without loading everything into memory?
We want a scalable and reliable way to support large data exports without hurting performance; a rough sketch of the streaming behavior we have in mind is below.
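To illustrate the streaming option, here is a rough standalone sketch of the behavior we would hope the CSV export could adopt: fetch with a server-side cursor in fixed-size batches and write straight to disk. It uses SQLAlchemy directly against the backing database rather than going through Superset, and the engine URL, table name, and batch size are placeholders.

```python
# Rough sketch of a streaming CSV export that never holds the full result set
# in memory: a server-side cursor yields rows in fixed-size batches that are
# written straight to disk.
import csv

from sqlalchemy import create_engine, text

ENGINE_URL = "postgresql+psycopg2://user:pass@host:5432/db"  # placeholder
BATCH_SIZE = 10_000


def stream_export(table: str, out_path: str) -> None:
    engine = create_engine(ENGINE_URL)
    with engine.connect() as conn, open(out_path, "w", newline="") as f:
        # stream_results=True asks the driver for a streaming/server-side
        # cursor, so rows are fetched lazily instead of buffered client-side.
        result = conn.execution_options(stream_results=True).execute(
            text(f"SELECT * FROM {table}")
        )
        writer = csv.writer(f)
        writer.writerow(result.keys())
        while True:
            batch = result.fetchmany(BATCH_SIZE)
            if not batch:
                break
            writer.writerows(batch)


if __name__ == "__main__":
    stream_export("my_big_table", "export.csv")
```

With PostgreSQL drivers, for example, `stream_results` typically maps to a named server-side cursor, so the client never buffers the whole result; memory stays roughly proportional to `BATCH_SIZE` regardless of table size.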