Commit b25dc00

Document flags for async endpoints
Signed-off-by: David Gardner <[email protected]>
1 parent 6d7fd89 commit b25dc00

File tree

1 file changed (+12, -0 lines)

docs/source/reference/api-server-endpoints.md

Lines changed: 12 additions & 0 deletions
@@ -65,6 +65,18 @@ This endpoint is only available when the `async_endpoints` optional dependency e
Asynchronous jobs are managed using [Dask](https://docs.dask.org/en/stable/). By default, a local Dask cluster is created at start time; however, you can also configure the server to connect to an existing Dask scheduler by setting the `scheduler_address` configuration parameter. The Dask scheduler manages the execution of asynchronous jobs and can run on a single machine or across a cluster of machines. Job history and metadata are stored in a SQL database using [SQLAlchemy](https://www.sqlalchemy.org/). By default, a temporary SQLite database is created at start time; however, you can also configure the server to use a persistent database by setting the `db_url` configuration parameter. Refer to the [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls) for the format of the `db_url` parameter. Any database supported by [SQLAlchemy's Asynchronous I/O extension](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html) can be used; refer to [SQLAlchemy's Dialects](https://docs.sqlalchemy.org/en/20/dialects/index.html) for a complete list (many, but not all, of these support Asynchronous I/O).
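For illustration, here are two `db_url` values in SQLAlchemy's async-capable URL format. The file name, credentials, and host names are placeholders, not server defaults:

```
# SQLite via the aiosqlite async driver, stored in a local file
sqlite+aiosqlite:///jobs.db

# PostgreSQL via the asyncpg async driver
postgresql+asyncpg://user:password@db-host/jobs
```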
### Asynchronous-Specific CLI Flags
The following CLI flags are available to configure the asynchronous generate endpoint when using `nat serve`:

* `--scheduler_address`: The address of an existing Dask scheduler to connect to. If not set, a local Dask cluster is created.
* `--db_url`: The [SQLAlchemy database URL](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls) to use for storing job history and metadata. If not set, a temporary SQLite database is created.
* `--max_concurrent_jobs`: The maximum number of asynchronous jobs to run concurrently. This controls the number of Dask workers created when a local Dask cluster is used. Default is `10`. Only used when `scheduler_address` is not set.
* `--dask_workers`: The type of Dask workers to use: `threads` for threaded Dask workers or `processes` for process-based Dask workers. Default is `processes`. Only used when `scheduler_address` is not set.
* `--dask_log_level`: The logging level for Dask. Default is `WARNING`.
:::{note}
When process-based Dask workers are used, standard output and standard error from the workflow are not visible in the server logs; threaded Dask workers do make workflow output visible in the server logs. When multiple concurrent jobs run with threaded Dask workers, workflow output from different jobs may be interleaved in the server logs.
:::
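As a sketch of how these flags combine, the following invocation connects to an existing Dask scheduler and persists job history in PostgreSQL. The config file path, scheduler address, and database credentials are hypothetical placeholders:

```
nat serve --config_file configs/workflow.yml \
    --scheduler_address tcp://dask-scheduler:8786 \
    --db_url postgresql+asyncpg://user:password@db-host/jobs
```

When `--scheduler_address` is given, `--max_concurrent_jobs` and `--dask_workers` are ignored, since worker creation is handled by the external scheduler's cluster.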
- **Route:** `/generate/async`
- **Description:** A non-streaming transaction that submits a workflow to run in the background.
