README: fix model name and improve CUDA section

mbortoli · mbortoli · commit 3e8997525537 · 2025-07-14T08:36:18.000-03:00
- Corrected the model name in the bench and perplexity examples (README and man pages) from `granite-moe3` to `granite3-moe`; the previous name was not available in Ollama's registry.

- Added instructions to switch between CPU-only mode and using all available GPUs via CUDA_VISIBLE_DEVICES.

Signed-off-by: Mario Antonio Bortoli Filho <mario@bortoli.dev>
diff --git a/README.md b/README.md
@@ -224,7 +224,7 @@ $ cat /usr/share/ramalama/shortnames.conf
 	<br>
 
 	```
-	$ ramalama bench granite-moe3
+	$ ramalama bench granite3-moe
 	```
 </details>
 
@@ -831,7 +831,7 @@ $ cat /usr/share/ramalama/shortnames.conf
 
 	Perplexity measures how well the model can predict the next token with lower values being better
 	```
-	$ ramalama perplexity granite-moe3
+	$ ramalama perplexity granite3-moe
 	```
 </details>
 
diff --git a/docs/ramalama-bench.1.md b/docs/ramalama-bench.1.md
@@ -148,7 +148,7 @@ Benchmark specified AI Model.
 ## EXAMPLES
 
 ```
-ramalama bench granite-moe3
+ramalama bench granite3-moe
 ```
 
 ## SEE ALSO
diff --git a/docs/ramalama-cuda.7.md b/docs/ramalama-cuda.7.md
@@ -137,6 +137,19 @@ ramalama run granite
 
 This is particularly useful in multi-GPU systems where you want to dedicate specific GPUs to different workloads.
 
+If the `CUDA_VISIBLE_DEVICES` environment variable is set to an empty string, RamaLama will default to using the CPU.
+
+```bash
+export CUDA_VISIBLE_DEVICES=""  # Defaults to CPU
+ramalama run granite
+```
+
+To revert to using all available GPUs, unset the environment variable:
+
+```bash
+unset CUDA_VISIBLE_DEVICES
+```
+
 ## Troubleshooting
 
 ### CUDA Updates
diff --git a/docs/ramalama-perplexity.1.md b/docs/ramalama-perplexity.1.md
@@ -156,7 +156,7 @@ Calculate the perplexity of an AI Model. Perplexity measures how well the model
 ## EXAMPLES
 
 ```
-ramalama perplexity granite-moe3
+ramalama perplexity granite3-moe
 ```
 
 ## SEE ALSO