Preventing Duplicate Queries During Asynchronous Chart Data Processing #34316

kch8306 · 2025-07-25T14:29:09Z

Hello!

We are using Superset 3.1 with Redis as our cache database. To handle long-running queries, we have enabled GLOBAL_ASYNC_QUERIES. However, we’ve noticed an issue: when a dashboard is loaded and the charts are not yet cached, if multiple users access the same dashboard simultaneously, the same queries are sent repeatedly to the database. This is inefficient and increases the load.

To address this, I’m considering modifying the "load_chart_data_into_cache" function in "/var/local/idk2/heartset/superset/tasks/async_queries.py" (proposed changes below). I’d like to hear feedback from the community before proceeding.

If you have a specific code diff or further context you want included, let me know and I can update the body!

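# This version also relies on module-level imports not shown here: `time`,
# plus Superset's `db` session handle and the `CacheKey` model.
import time

@celery_app.task(name="load_chart_data_into_cache", soft_time_limit=query_timeout)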
def load_chart_data_into_cache(
    job_metadata: dict[str, Any],
    form_data: dict[str, Any],
) -> None:
    # pylint: disable=import-outside-toplevel
    from superset.commands.chart.data.get_data_command import ChartDataCommand

    with override_user(_load_user_from_job_metadata(job_metadata), force=False):
        try:
            set_form_data(form_data)
            query_context = _create_query_context_from_form(form_data)
            t_cache_key = query_context._processor.cache_key().removeprefix("qc-")

            session = db.create_scoped_session()
            try:
                cache_entry = session.query(CacheKey).filter_by(cache_key=t_cache_key).first()
                if cache_entry is None:
                    new_cache_entry = CacheKey(cache_key=t_cache_key, datasource_uid=1)
                    session.add(new_cache_entry)
                    session.commit()
                else:
                    session.close()
                    while True:
                        check_session = db.create_scoped_session()
                        try:
                            cache_entry_cnt = check_session.query(CacheKey).filter_by(cache_key=t_cache_key).count()
                            if cache_entry_cnt == 0:
                                break
                        finally:
                            check_session.close()
                        logger.warning("=================wait=================:%s", t_cache_key)
                        time.sleep(5)
            except Exception as e:
                session.rollback()
                logger.error("Error in cache entry creation: %s", str(e))
                raise e
            finally:
                session.close()

            command = ChartDataCommand(query_context)
            result = command.run(cache=True)
            cache_key = result["cache_key"]
            result_url = f"/api/v1/chart/data/{cache_key}"

            delete_session = db.create_scoped_session()
            try:
                cache_entry = delete_session.query(CacheKey).filter_by(cache_key=t_cache_key).first()
                if cache_entry:
                    delete_session.delete(cache_entry)
                    delete_session.commit()
            except Exception as e:
                delete_session.rollback()
                logger.error("Error in cache entry deletion: %s", str(e))
                raise e
            finally:
                delete_session.close()

            async_query_manager.update_job(
                job_metadata,
                async_query_manager.STATUS_DONE,
                result_url=result_url,
            )

        except SoftTimeLimitExceeded as ex:
            logger.warning("A timeout occurred while loading chart data, error: %s", ex)
            raise ex
        except Exception as ex:
            # TODO: QueryContext should support SIP-40 style errors
            error = str(ex.message if hasattr(ex, "message") else ex)
            errors = [{"message": error}]
            async_query_manager.update_job(
                job_metadata, async_query_manager.STATUS_ERROR, errors=errors
            )
            raise ex

@kch8306 · 2025-07-25T14:32:47Z

dosubot[bot]
bot Jul 25, 2025

Hi @kch8306! I'm Dosu and I’m helping the superset team.

Superset 3.1 does not have built-in deduplication or distributed locking for async chart queries—when multiple users hit a cold cache, each worker will execute the same query independently, causing the duplicate load you described. Your approach of using a CacheKey table as a lock is a valid pattern for serializing work, and it would prevent duplicate queries at the application level. However, be aware of potential race conditions and deadlocks, especially if a worker crashes or times out before releasing the lock. You may want to add a TTL or cleanup mechanism for stale locks.
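
One way to keep a waiting task from spinning forever behind a stale lock row is to bound the wait with a deadline. The sketch below is only an illustration: `wait_until_released` and its `is_locked` callable are hypothetical names (in the proposal above, `is_locked` would wrap the `CacheKey` count query), not anything Superset provides.

```python
import time


def wait_until_released(is_locked, timeout_seconds, poll_seconds=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll is_locked() until it returns False or the timeout elapses.

    Returns True if the lock was released, False if the wait timed out.
    `is_locked` is a hypothetical callable supplied by the caller.
    """
    deadline = clock() + timeout_seconds
    while is_locked():
        if clock() >= deadline:
            # Give up instead of looping forever behind a crashed worker.
            return False
        sleep(poll_seconds)
    return True
```

A caller that gets `False` back can decide whether to run the query anyway or to report an error, rather than blocking a Celery worker indefinitely.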

A more common approach in distributed systems is to use a distributed lock or mutex (for example, via Redis SETNX or Redlock) keyed by the cache key, which avoids the need for a database table and is generally more robust in high-concurrency environments. Superset already uses Redis for caching, so implementing a Redis-based lock could be more idiomatic and performant.

If you stick with the DB table approach, make sure to handle edge cases like lock cleanup and error recovery. There’s no evidence of a built-in or recommended deduplication mechanism for async chart queries in Superset as of 3.1, so your solution would be a novel extension to the current architecture. For more on how caching and cache keys work in Superset, see this discussion and the async query code.

To reply, just mention @dosu.

kch8306 · 2025-07-27T11:08:32Z

@dosu

# Module-level imports this version adds to async_queries.py.
import os
import time

import redis

@celery_app.task(name="load_chart_data_into_cache", soft_time_limit=query_timeout)
def load_chart_data_into_cache(
    job_metadata: dict[str, Any],
    form_data: dict[str, Any],
) -> None:
    # pylint: disable=import-outside-toplevel
    from superset.commands.chart.data.get_data_command import ChartDataCommand

    redis_host = os.environ.get("REDIS_HOST", "172.20.0.1")
    redis_port = int(os.environ.get("REDIS_PORT", 26840))
    redis_db = int(os.environ.get("REDIS_DB", 9))
    redis_password = os.environ.get("REDIS_PASSWORD", None)

    redis_client = redis.StrictRedis(
        host=redis_host,
        port=redis_port,
        db=redis_db,
        password=redis_password,
        decode_responses=True
    )

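    # Initialize up front so the finally block never sees unbound names
    # if an exception is raised before the lock is attempted.
    lock_acquired_by_this_task = False
    t_cache_key = None
    lock_key = None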
    LOCK_KEY_PREFIX = "superset:chart_data_lock:"
    LOCK_EXPIRATION_SECONDS = 600
    INITIAL_WAIT_SECONDS = 1
    MAX_WAIT_TIME_SECONDS = 600
    MAX_WAIT_SECONDS = 30

    with override_user(_load_user_from_job_metadata(job_metadata), force=False):
        try:
            set_form_data(form_data)
            query_context = _create_query_context_from_form(form_data)
            t_cache_key = query_context._processor.cache_key().removeprefix("qc-")
            logger.warning("=================t_cache_key: \n%s", t_cache_key)
            lock_key = f"{LOCK_KEY_PREFIX}{t_cache_key}"
            logger.warning("=================lock_key: \n%s", lock_key)

            lock_acquired_by_this_task = redis_client.set(lock_key, "locked", nx=True, ex=LOCK_EXPIRATION_SECONDS)
            if not lock_acquired_by_this_task:
                logger.warning(
                    "=================Failed to acquire Redis lock for %s. Another task is likely processing. Waiting...",
                    t_cache_key
                )
                current_wait_time = INITIAL_WAIT_SECONDS
                total_waited_time = 0
                while redis_client.exists(lock_key) and total_waited_time < MAX_WAIT_TIME_SECONDS:
                    logger.info(
                        "Waiting for lock %s to be released. Sleeping for %s seconds. Total waited: %s/%s",
                        lock_key, current_wait_time, total_waited_time, MAX_WAIT_TIME_SECONDS
                    )
                    time.sleep(current_wait_time)
                    total_waited_time += current_wait_time
                    current_wait_time = min(MAX_WAIT_SECONDS, current_wait_time * 2)

                logger.warning("=================Finished waiting for Redis lock %s (released or wait timed out). Proceeding to execute ChartDataCommand.", lock_key)

            else:
                logger.warning("=================Successfully acquired Redis lock for %s.", t_cache_key)

            command = ChartDataCommand(query_context)
            result = command.run(cache=True)
            cache_key = result["cache_key"]
            logger.warning("=================cache_key: \n%s", cache_key)
            result_url = f"/api/v1/chart/data/{cache_key}"

            async_query_manager.update_job(
                job_metadata,
                async_query_manager.STATUS_DONE,
                result_url=result_url,
            )

        except SoftTimeLimitExceeded as ex:
            logger.warning("A timeout occurred while loading chart data, error: %s", ex)
            raise ex
        except Exception as ex:
            # TODO: QueryContext should support SIP-40 style errors
            error = str(ex.message if hasattr(ex, "message") else ex)
            errors = [{"message": error}]
            async_query_manager.update_job(
                job_metadata, async_query_manager.STATUS_ERROR, errors=errors
            )
            raise ex
        finally:
            if lock_acquired_by_this_task and t_cache_key:
                try:
                    redis_client.delete(lock_key)
                    logger.warning("=================Redis lock released for %s.", t_cache_key)
                except Exception as e:
                    logger.error("Error releasing Redis lock for %s: %s", t_cache_key, str(e))

1 reply

dosubot[bot] bot Jul 27, 2025

Your Redis-based locking approach is a solid way to prevent duplicate async chart queries in Superset, since the core system does not provide any built-in distributed locking or deduplication for this use case (source). Using SET with the NX and EX options (an atomic set-if-absent with an expiration) and polling with exponential backoff is a proven pattern.

A few things to watch for: if a worker crashes or is killed, the lock stays orphaned until it expires, and if the lock expires before the query finishes, waiting tasks will run the same query again, which is the duplicate load you set out to avoid. The lock TTL (LOCK_EXPIRATION_SECONDS) is critical: too short and you risk duplicate work, too long and waiters sit behind an abandoned lock. Also, a plain DEL does not check who holds the lock, so a task whose lock has already expired could delete a lock that another worker has since acquired. Consider storing a unique value (token) per lock and deleting only if the value still matches.
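
A minimal sketch of the token idea, assuming a redis-py style client that exposes `set(..., nx=True, ex=...)` and `eval(script, numkeys, *keys_and_args)`. The helper names `acquire_lock` and `release_lock` are made up for illustration. The compare-and-delete runs as a Lua script so that the check and the delete happen atomically on the Redis server.

```python
import uuid

# Lua runs atomically inside Redis: delete the key only if it still holds our token.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


def acquire_lock(client, lock_key, ttl_seconds):
    """Return a unique token if the lock was taken, else None."""
    token = uuid.uuid4().hex
    # SET with NX and EX: set only if absent, and expire after ttl_seconds.
    if client.set(lock_key, token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client, lock_key, token):
    """Delete the lock only if it is still owned by `token`."""
    return client.eval(RELEASE_SCRIPT, 1, lock_key, token) == 1
```

In the task above, the `finally` block would then call `release_lock(redis_client, lock_key, token)` instead of an unconditional `delete`, so a task that outlived its TTL cannot remove a lock that now belongs to someone else.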

For extra robustness, you might look into the Redlock algorithm or implement lock renewal if queries can run longer than your TTL. Monitoring for repeated lock acquisition failures or long waits can help you tune these values and catch issues early.
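
Lock renewal could look roughly like the sketch below, which re-arms the TTL from a background thread while the query runs. It assumes a redis-py style client with `expire(name, time)`. The `keep_lock_alive` helper is hypothetical, and for brevity it does not verify lock ownership before renewing, which a production version should do.

```python
import threading
from contextlib import contextmanager


@contextmanager
def keep_lock_alive(client, lock_key, ttl_seconds, interval_seconds):
    """Re-arm the TTL of lock_key every interval_seconds while the body runs."""
    stop = threading.Event()

    def _renew():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            # EXPIRE reports failure if the key is gone; stop renewing then.
            if not client.expire(lock_key, ttl_seconds):
                break

    thread = threading.Thread(target=_renew, daemon=True)
    thread.start()
    try:
        yield
    finally:
        # Stop the renewer and wait for it so no renewal happens after exit.
        stop.set()
        thread.join()
```

Wrapping `command.run(cache=True)` in `with keep_lock_alive(redis_client, lock_key, 60, 20):` would allow a much shorter TTL, so a crashed worker's lock disappears quickly while a healthy long-running query keeps it.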

Overall, your solution is a well-motivated extension to Superset’s async query flow and should help reduce DB load under concurrent dashboard access.
