Describe the Bug
I'm seeing a performance degradation between dynamo 0.6.1 and 0.7.0
I am using LMBenchmark with 30 users on its synthetic multi-round QA test, and I only test at 2 QPS.
I configured a /raid device and benchmarked it with fio.
Finally, I set it up for disk offload. In 0.6.1 I was getting a TTFT of 1.27 seconds. Now I'm seeing a TTFT of 8.23 seconds.
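For anyone reproducing the disk setup, an fio run along the following lines can be used to characterize the /raid device (the job parameters here are illustrative assumptions, not the exact job I ran):

# Illustrative fio job: sequential 1 MiB writes with direct I/O, 4 jobs, queue depth 32
fio --name=seqwrite --directory=/raid --rw=write --bs=1M --size=4G \
    --numjobs=4 --iodepth=32 --ioengine=libaio --direct=1 --group_reporting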
Steps to Reproduce
Pull from main
Start the Dynamo container using the ./container/run.sh script.
Once inside, start vLLM with Dynamo:
DYN_KVBM_CPU_CACHE_GB=50 \
DYN_KVBM_DISK_CACHE_GB=500 \
DYN_KVBM_DISK_CACHE_DIR=/raid \
DYN_KVBM_DISABLE_DISK_OFFLOAD_FILTER=1 \
DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=120 \
vllm serve --port 8000 \
  --block-size 128 \
  --gpu-memory-utilization 0.7 \
  --disable-log-requests \
  --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both", "kv_connector_module_path": "kvbm.vllm_integration.connector"}' \
  Qwen/Qwen3-8B
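To confirm that blocks are actually being offloaded to disk while the benchmark runs, the cache directory can be watched from another shell. This is plain shell, nothing Dynamo-specific, and assumes KVBM writes its disk cache under /raid as DYN_KVBM_DISK_CACHE_DIR suggests:

# Poll the disk cache directory every 5 seconds to see offloaded data accumulate
watch -n 5 'du -sh /raid && ls /raid | wc -l'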
I'm on an H100 machine, so the above command gives the following stats:
GPU Utilization: 0.7 (70%)
G1 (GPU): 38 GiB
G2 (CPU): 50 GB
G3 (Disk): 500 GB
Users: 30 => 96 GB
Rounds: 20
Model: Qwen3-8B
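For context on how these numbers interact (rough arithmetic under a few assumptions: an 80 GB H100, roughly 16 GB of BF16 weights for Qwen3-8B, and about 144 KiB of BF16 KV cache per token assuming Qwen3-8B's 36 layers, 8 KV heads, and head dim 128): 80 GB × 0.7 ≈ 56 GB is reserved by vLLM, and after ~16 GB of weights plus workspace overhead, roughly the 38 GiB reported above remains for the GPU KV cache (G1). G2 and G3 come directly from DYN_KVBM_CPU_CACHE_GB and DYN_KVBM_DISK_CACHE_GB. Thirty concurrent users with long multi-round contexts at ~144 KiB/token account for the ~96 GB figure, which exceeds G1 + G2 (≈ 90 GB), so the workload has to spill into the 500 GB disk tier, which is exactly what this test is meant to exercise.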
Clone LMBenchmark's tests: https://github.com/LMCache/LMBenchmark
In LMBenchmark/synthetic-multi-round-qa, run:
long_input_short_output_run.sh Qwen/Qwen3-8B http://localhost:8000 /tmp 2
Note: you must modify long_input_short_output_run.sh to use 30 users instead of 15; otherwise you will never offload to disk with the settings above.
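To find where the user count is configured (the exact variable name depends on the LMBenchmark version, so this is just a way to locate the setting rather than a specific edit):

# Locate the line(s) in the script that define the number of simulated users
grep -n -i "user" long_input_short_output_run.sh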
Expected Behavior
TTFT under 1.5 seconds
Actual Behavior
TTFT of 12.7 seconds
Environment
H100
Ubuntu 22.04.4 LTS
NV_LIBCUBLAS_VERSION=12.8.4.1-1
NVIDIA_VISIBLE_DEVICES=all
NV_NVTX_VERSION=12.8.90-1
NV_LIBCUSPARSE_VERSION=12.5.8.93-1
NV_LIBNPP_VERSION=12.3.3.100-1
NCCL_VERSION=2.25.1-1
PWD=/workspace
NIXL_LIB_DIR=/opt/nvidia/nvda_nixl/lib/x86_64-linux-gnu
NIXL_PREFIX=/opt/nvidia/nvda_nixl
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NV_LIBNPP_PACKAGE=libnpp-12-8=12.3.3.100-1
NVIDIA_PRODUCT_NAME=CUDA
NV_CUDA_CUDART_VERSION=12.8.90-1
HOME=/home/dynamo
VIRTUAL_ENV=/opt/dynamo/venv
CUDA_VERSION=12.8.1
NV_LIBCUBLAS_PACKAGE=libcublas-12-8=12.8.4.1-1
DYNAMO_COMMIT_SHA=f49d6873e417ef82090ed492ef00b6939bd5a8d0
NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib/x86_64-linux-gnu/plugins
NV_LIBCUBLAS_PACKAGE_NAME=libcublas-12-8
TERM=xterm
SHLVL=1
NV_CUDA_LIB_VERSION=12.8.1-1
NVARCH=x86_64
VIRTUAL_ENV_PROMPT=venv
NV_LIBNCCL_PACKAGE=libnccl2=2.25.1-1+cuda12.8
LD_LIBRARY_PATH=/opt/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_install/lib:/opt/nvidia/nvda_nixl/lib/x86_64-linux-gnu:/opt/nvidia/nvda_nixl/lib/x86_64-linux-gnu/plugins:/usr/local/ucx/lib:/usr/local/ucx/lib/ucx:/usr/local/cuda/lib64
PS1=[\e]0;\u@\h: \w\a]${debian_chroot:+(
PATH=/opt/dynamo/venv/bin:/usr/local/ucx/bin:/usr/local/bin/etcd/:/usr/local/cuda/nvvm/bin:/opt/dynamo/venv/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
NV_LIBNCCL_PACKAGE_NAME=libnccl2
DYNAMO_HOME=/opt/dynamo
NV_LIBNCCL_PACKAGE_VERSION=2.25.1-1
CPATH=/usr/local/cuda/include
Additional Context
No response
Screenshots
No response