docs/ramalama-serve.1.md
+98 -1 (98 additions, 1 deletion)
@@ -167,7 +167,7 @@ llama.cpp explains this as:
 The higher the number is the more creative the response is, but more likely to hallucinate when set too high.

-Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories
+Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories

 #### **--threads**, **-t**
 Maximum number of cpu threads to use.
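
The temperature text in this hunk (lower values give more deterministic responses, higher values more creative ones) can be illustrated with a minimal sketch. It assumes the common softmax-with-temperature formulation; the function name `softmax_with_temperature` and the example logits are hypothetical and are not taken from llama.cpp or RamaLama.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens (illustrative only).
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.2)   # sharply peaked distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution

print("temp 0.2:", [round(p, 3) for p in low])
print("temp 2.0:", [round(p, 3) for p in high])
```

At the low temperature almost all probability mass sits on the top-scoring token, so sampling is close to deterministic. At the high temperature the mass spreads across the candidates, which matches the "more creative, but more likely to hallucinate" wording in the documentation.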
@@ -340,6 +340,103 @@ spec:
 name: dri
 ```

+### Generate a Llama Stack Kubernetes YAML file named MyLamaStack
 is "${lines[0]}" "Error: --nocontainer and --name options conflict. The --name option requires a container." "conflict between nocontainer and --name line"
+run_ramalama -q --dryrun serve ${model}
+assert "$output" =~ ".*--host 0.0.0.0" "Outside container sets host to 0.0.0.0"
+is "$output" ".*--cache-reuse 256" "should use cache"
+if is_darwin; then
+is "$output" ".*--flash-attn" "use flash-attn on Darwin metal"
 is "${lines[0]}" "Error: --nocontainer and --name options conflict. The --name option requires a container." "conflict between nocontainer and --name line"