Preventing Duplicate Queries During Asynchronous Chart Data Processing #34316
Replies: 2 comments 1 reply
-
Hi @kch8306! I'm Dosu and I'm helping the Superset team.

Superset 3.1 does not have built-in deduplication or distributed locking for async chart queries. When multiple users hit a cold cache, each worker executes the same query independently, causing the duplicate load you described.

Your approach of using a CacheKey table as a lock is a valid pattern for serializing work, and it would prevent duplicate queries at the application level. However, be aware of potential race conditions and deadlocks, especially if a worker crashes or times out before releasing the lock. You may want to add a TTL or cleanup mechanism for stale locks.

A more common approach in distributed systems is to use a distributed lock or mutex (for example, via Redis SETNX or Redlock) keyed by the cache key, which avoids the need for a database table and is generally more robust in high-concurrency environments. Superset already uses Redis for caching, so implementing a Redis-based lock could be more idiomatic and performant. If you stick with the DB table approach, make sure to handle edge cases like lock cleanup and error recovery.

There's no evidence of a built-in or recommended deduplication mechanism for async chart queries in Superset as of 3.1, so your solution would be a novel extension to the current architecture. For more on how caching and cache keys work in Superset, see this discussion and the async query code.
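As a rough illustration of the Redis-based lock idea above, here is a minimal sketch, not Superset code. It assumes a client exposing `set(name, value, nx=True, ex=ttl)`, `get`, and `delete` in the style of redis-py (with `decode_responses=True`, so `get` returns strings). The names `query_lock`, `load_once`, the `chart-lock:` key prefix, and the `InMemoryRedis` stand-in are all hypothetical and exist only so the example runs without a Redis server.

```python
import time
import uuid
from contextlib import contextmanager


class InMemoryRedis:
    """Tiny stand-in for a Redis client so the sketch runs without a server.

    It only mimics the subset of redis-py's set/get/delete used below.
    """

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def _live(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= time.monotonic():
            del self._data[key]  # expired: behave as if the key is gone
            return None
        return entry

    def set(self, name, value, nx=False, ex=None):
        if nx and self._live(name) is not None:
            return None  # redis-py returns None when the NX condition fails
        expires = time.monotonic() + ex if ex is not None else None
        self._data[name] = (value, expires)
        return True

    def get(self, name):
        entry = self._live(name)
        return entry[0] if entry else None

    def delete(self, name):
        return 1 if self._data.pop(name, None) is not None else 0


@contextmanager
def query_lock(client, cache_key, ttl_seconds=300):
    """Yield True if this worker won the lock for cache_key, else False."""
    lock_name = f"chart-lock:{cache_key}"  # hypothetical key prefix
    token = uuid.uuid4().hex  # identifies this particular holder
    # SET ... NX EX: succeeds only if the key is absent. The TTL means a
    # crashed worker cannot leave a stale lock behind forever.
    acquired = bool(client.set(lock_name, token, nx=True, ex=ttl_seconds))
    try:
        yield acquired
    finally:
        # Release only our own lock. Against real Redis this check-then-delete
        # should be one Lua script so that it is atomic.
        if acquired and client.get(lock_name) == token:
            client.delete(lock_name)


def load_once(client, cache_key, run_query):
    """Run run_query only if no other worker holds the lock for cache_key."""
    with query_lock(client, cache_key) as acquired:
        if not acquired:
            return None  # another worker is already computing this key
        return run_query()
```

In this sketch a worker that loses the race simply returns `None`; in practice it would more likely wait for the winner's result to appear in the cache. The TTL should be chosen longer than the slowest expected query, otherwise the lock can expire while the query is still running.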
-
Hello!
We are using Superset 3.1 with Redis as our cache database. To handle long-running queries, we have enabled GLOBAL_ASYNC_QUERIES. However, we’ve noticed an issue: when a dashboard is loaded and the charts are not yet cached, if multiple users access the same dashboard simultaneously, the same queries are sent repeatedly to the database. This is inefficient and increases the load.
To address this, I’m considering modifying the "load_chart_data_into_cache" function in "/var/local/idk2/heartset/superset/tasks/async_queries.py" (proposed changes below). I’d like to hear feedback from the community before proceeding.