
Commit df64f98

Don't warm up by default
llama-server by default warms up the model with an empty run, for performance reasons. We can warm up ourselves with a real query instead. The default warmup was causing issues and delaying start time.

Signed-off-by: Eric Curtin <[email protected]>
1 parent f07a062 commit df64f98

File tree: 1 file changed (+1, −1)


ramalama/model.py

Lines changed: 1 addition & 1 deletion
@@ -524,7 +524,7 @@ def build_exec_args_serve(self, args, exec_model_path, chat_template_path="", mm
             draft_model = self.draft_model.get_model_path(args)
             draft_model_path = MNT_FILE_DRAFT if args.container or args.generate else draft_model
 
-        exec_args += ["llama-server", "--port", args.port, "--model", exec_model_path]
+        exec_args += ["llama-server", "--port", args.port, "--model", exec_model_path, "--no-warmup"]
         if mmproj_path:
             exec_args += ["--mmproj", mmproj_path]
         else:
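
The commit message suggests warming up with a real query instead of llama-server's built-in empty warmup run, which the diff disables with --no-warmup. A minimal sketch of what such a client-side warmup could look like, assuming the server is already listening on the given port; the warm_up helper, the prompt, and the choice of llama-server's OpenAI-compatible /v1/completions endpoint are illustrative, not part of this commit:

import json
import urllib.request


def warm_up(port: int, prompt: str = "Hello") -> None:
    """Send one small real completion request so the model's weights and
    caches get touched on a real code path, replacing llama-server's
    default empty warmup run (disabled above with --no-warmup)."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/completions",
        data=json.dumps({"prompt": prompt, "max_tokens": 1}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        response.read()  # discard the single generated token

Generating even a single token exercises the full inference path, so a warmup like this can hide the cold-start cost from the first user-facing request without the startup delay the default empty run was causing.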
