Commit 2b70b50

Merge pull request #341 from charitarthchugh/charitarthchugh/vllm-defaults-speedup
Add chunked prefill and limit mm per prompt options
2 parents f4356de + fe425fd commit 2b70b50

1 file changed: +2 -0 lines changed

olmocr/pipeline.py

Lines changed: 2 additions & 0 deletions
@@ -636,6 +636,8 @@ async def vllm_server_task(model_name_or_path, args, semaphore, unknown_args=Non
         str(args.tensor_parallel_size),
         "--data-parallel-size",
         str(args.data_parallel_size),
+        "--enable-chunked-prefill",
+        "--limit-mm-per-prompt '{\"video\": 0}'"
     ]

     if args.gpu_memory_utilization is not None:
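
Review note: the second added line puts the flag and its shell-quoted value into a single list element. If this list is handed to an exec-style subprocess call without a shell (an assumption; the launch call is not visible in this hunk), the child process would receive that element as one argv token, single quotes included. The sketch below illustrates the difference with a stand-in `argparse` parser. `build_parser` and `demo-server` are hypothetical names for illustration, and vLLM's own parser may not behave identically.

```python
import argparse
import json
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser for illustration; not vLLM's real CLI definition.
    parser = argparse.ArgumentParser(prog="demo-server")
    parser.add_argument("--enable-chunked-prefill", action="store_true")
    parser.add_argument("--limit-mm-per-prompt", type=json.loads, default=None)
    return parser


# Form used in the diff: flag and shell-quoted value fused into one list element.
fused_args = ["--enable-chunked-prefill", "--limit-mm-per-prompt '{\"video\": 0}'"]

# Alternative: one list element per argv token, with no shell quoting.
split_args = ["--enable-chunked-prefill", "--limit-mm-per-prompt", json.dumps({"video": 0})]

# shlex.split shows how a shell would tokenize the same text.
shell_tokens = shlex.split("--enable-chunked-prefill --limit-mm-per-prompt '{\"video\": 0}'")

# parse_known_args returns unrecognized tokens instead of exiting on them.
fused_ns, fused_extras = build_parser().parse_known_args(fused_args)
split_ns, split_extras = build_parser().parse_known_args(split_args)

print(fused_ns.limit_mm_per_prompt, fused_extras)
print(split_ns.limit_mm_per_prompt, split_extras)
```

With the stand-in parser, the fused element is not recognized as the option and ends up among the leftover arguments, while the split form parses into a dictionary. Splitting the flag and value into separate list elements, matching the existing `"--data-parallel-size"` / `str(args.data_parallel_size)` pair in the context lines, avoids depending on shell quoting.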
