Issue Description
I'm running on an M2 Max with ramalama 0.11.1 and mlx 0.26.5. I have trouble running or serving a model from Hugging Face: even though I can pull the model just fine, I always get an error saying the model is not found.
Steps to reproduce the issue
- Pull the model
ramalama pull hf://mlx-community/Llama-3.2-1B-Instruct-4bit
Downloading hf://mlx-community/Llama-3.2-1B-Instruct-4bit ...
Trying to pull hf://mlx-community/Llama-3.2-1B-Instruct-4bit ...
Fetching 8 files: 0%| | 0/8 [00:00<?, ?it/s]
Downloading 'tokenizer.json' to '/var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.cache/huggingface/download/HgM_lKo9sdSCfRtVg7MMFS7EKqo=.6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b.incomplete'
Downloading 'model.safetensors.index.json' to '/var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.cache/huggingface/download/yVzAsSxRSINSz-tQbpx-TLpfkLU=.32101c2481caabb396a3b36c3fd8b219b0da9c2c.incomplete'
Downloading '.gitattributes' to '/var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.cache/huggingface/download/wPaCkH-WbT7GsmxMKKrNZTV4nSM=.52373fe24473b1aa44333d318f578ae6bf04b49b.incomplete'
Downloading 'tokenizer_config.json' to '/var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.cache/huggingface/download/vzaExXFZNBay89bvlQv-ZcI6BTg=.6568c91f9cdd35e8ac07b8ff0c201f7e835affc8.incomplete'
Downloading 'special_tokens_map.json' to '/var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.cache/huggingface/download/ahkChHUJFxEmOdq5GDFEmerRzCY=.02ee80b6196926a5ad790a004d9efd6ab1ba6542.incomplete'
model.safetensors.index.json: 26.2kB [00:00, 110MB/s]
Download complete. Moving file to /var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/model.safetensors.index.json
.gitattributes: 1.57kB [00:00, 19.1MB/s]
Download complete. Moving file to /var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.gitattributes
tokenizer_config.json: 54.6kB [00:00, 206MB/s]
Download complete. Moving file to /var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/tokenizer_config.json
special_tokens_map.json: 100%|██████████████████████████████████| 296/296 [00:00<00:00, 2.67MB/s]
Download complete. Moving file to /var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/special_tokens_map.json
Downloading 'config.json' to '/var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.cache/huggingface/download/8_PA_wEVGiVa2goH2H4KQOQpvVY=.25e549e5bb9b201031726870cec84fd9bef3d707.incomplete'
config.json: 1.12kB [00:00, 1.71MB/s]
Download complete. Moving file to /var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/config.json
Downloading 'model.safetensors' to '/var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.cache/huggingface/download/xGOKKLRSlIhH692hSVvI1-gpoa8=.35e396644bca888eec399f9c0f843ec7fa78b8f8c5e06841661be62b4edf96dd.incomplete'
Downloading 'README.md' to '/var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/.cache/huggingface/download/Xn7B-BWUGOee2Y6hCZtEhtFu4BE=.f8c048069e67e62805503ba050832af2e69a210b.incomplete'
README.md: 16.3kB [00:00, 12.0MB/s]
Download complete. Moving file to /var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/README.md
tokenizer.json: 100%|███████████████████████████████████████| 17.2M/17.2M [00:00<00:00, 20.2MB/s]
Download complete. Moving file to /var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/tokenizer.json
model.safetensors: 100%|███████████████████████████████████████| 695M/695M [00:03<00:00, 191MB/s]
Download complete. Moving file to /var/folders/7_/3zhq5rhd78d4vc04h30qzmz80000gp/T/tmpdi28_w_v/model.safetensors
Fetching 8 files: 100%|████████████████████████████████████████████| 8/8 [00:03<00:00, 2.07it/s]
- List the models
ramalama list
NAME MODIFIED SIZE
hf://mlx-community/Llama-3.2-1B-Instruct-4bit 55 years ago 0 B
hf://mlx-community/gemma-3-12b-it-qat-4bit 55 years ago 0 B
ollama://granite3-moe/granite3-moe:latest 2 weeks ago 783.77 MB
(note that the MODIFIED and SIZE values for the Hugging Face models look wrong; see the sketch after these steps)
- Run or serve the model with the MLX runtime
ramalama --runtime=mlx --nocontainer --debug --engine=docker run hf://mlx-community/Llama-3.2-1B-Instruct-4bit
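As an aside on the weird MODIFIED values: "55 years ago" is exactly what you get when a modification time is unset and defaults to 0, i.e. the Unix epoch (1970), which is about 55 years before 2025. A minimal sketch of that arithmetic (my assumption about how the MODIFIED column is derived, not ramalama's actual code):

from datetime import datetime, timezone

# Assumption: with no model files recorded, the store falls back to
# mtime 0 (the Unix epoch) and a size of 0 bytes.
mtime = 0
modified = datetime.fromtimestamp(mtime, tz=timezone.utc)
age_years = (datetime.now(timezone.utc) - modified).days // 365
print(f"{modified.date()} -> ~{age_years} years ago")  # 1970-01-01 -> ~55 years ago

This would be consistent with ramalama list seeing the ref file but finding no model files behind it, matching the 0 B sizes.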
Describe the results you received
I get this error when running the run or serve commands with the MLX runtime:
ramalama --runtime=mlx --nocontainer --debug --engine=docker run hf://mlx-community/Llama-3.2-1B-Instruct-4bit
2025-07-23 16:50:44 - DEBUG - run_cmd: npu-smi info
2025-07-23 16:50:44 - DEBUG - Working directory: None
2025-07-23 16:50:44 - DEBUG - Ignore stderr: False
2025-07-23 16:50:44 - DEBUG - Ignore all: False
2025-07-23 16:50:44 - DEBUG - run_cmd: mthreads-gmi
2025-07-23 16:50:44 - DEBUG - Working directory: None
2025-07-23 16:50:44 - DEBUG - Ignore stderr: False
2025-07-23 16:50:44 - DEBUG - Ignore all: False
2025-07-23 16:50:44 - DEBUG - Checking if 8080 is available
2025-07-23 16:50:44 - DEBUG - MLX server not ready, waiting... (attempt 1/10)
2025-07-23 16:50:44 - DEBUG - Checking if 8080 is available
Traceback (most recent call last):
File "/opt/homebrew/bin/ramalama", line 8, in <module>
sys.exit(main())
~~~~^^
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/cli.py", line 1248, in main
args.func(args)
~~~~~~~~~^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/cli.py", line 986, in run_cli
model.serve(args, quiet=True) if args.rag else model.run(args)
~~~~~~~~~^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/model.py", line 358, in run
self._start_server(args)
~~~~~~~~~~~~~~~~~~^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/model.py", line 369, in _start_server
self.serve(args, True)
~~~~~~~~~~^^^^^^^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/model.py", line 739, in serve
exec_args = self.build_exec_args_serve(args)
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/model.py", line 642, in build_exec_args_serve
exec_args = self.mlx_serve(args)
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/model.py", line 636, in mlx_serve
return self._build_mlx_exec_args("server", args, extra)
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/model.py", line 472, in _build_mlx_exec_args
shlex.quote(self._get_entry_model_path(args.container, args.generate, args.dryrun)),
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/ramalama/0.11.1/libexec/lib/python3.13/site-packages/ramalama/model.py", line 189, in _get_entry_model_path
raise NoRefFileFound(self.model)
ramalama.model.NoRefFileFound: No ref file or models found for 'mlx-community/Llama-3.2-1B-Instruct-4bit'. Please pull model.
Describe the results you expected
I should be able to run or serve the model with the MLX runtime without issue.
ramalama info output
{
"Accelerator": "none",
"Engine": {
"Name": null
},
"Image": "quay.io/ramalama/ramalama:latest",
"Runtime": "llama.cpp",
"Selinux": false,
"Shortnames": {
"Files": [
"/opt/homebrew/Cellar/ramalama/0.11.1/libexec/share/ramalama/shortnames.conf"
],
"Names": {
"cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
"deepseek": "ollama://deepseek-r1",
"dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
"gemma3": "hf://ggml-org/gemma-3-4b-it-GGUF",
"gemma3:12b": "hf://ggml-org/gemma-3-12b-it-GGUF",
"gemma3:1b": "hf://ggml-org/gemma-3-1b-it-GGUF/gemma-3-1b-it-Q4_K_M.gguf",
"gemma3:27b": "hf://ggml-org/gemma-3-27b-it-GGUF",
"gemma3:4b": "hf://ggml-org/gemma-3-4b-it-GGUF",
"gemma3n": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gemma3n:e2b": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
"gemma3n:e2b-it-f16": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-f16.gguf",
"gemma3n:e2b-it-q8_0": "hf://ggml-org/gemma-3n-E2B-it-GGUF/gemma-3n-E2B-it-Q8_0.gguf",
"gemma3n:e4b": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"gemma3n:e4b-it-f16": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-f16.gguf",
"gemma3n:e4b-it-q8_0": "hf://ggml-org/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q8_0.gguf",
"granite": "ollama://granite3.1-dense",
"granite-code": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf",
"granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf",
"granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:2b": "ollama://granite3.1-dense:2b",
"granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:8b": "ollama://granite3.1-dense:8b",
"hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"ibm/granite": "ollama://granite3.1-dense:8b",
"ibm/granite:2b": "ollama://granite3.1-dense:2b",
"ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"ibm/granite:8b": "ollama://granite3.1-dense:8b",
"merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"mistral": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral-small3.1": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
"mistral-small3.1:24b": "hf://bartowski/mistralai_Mistral-Small-3.1-24B-Instruct-2503-GGUF/mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ2_M.gguf",
"mistral:7b": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
"mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v3": "hf://lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
"mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
"openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
"phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
"qwen2.5vl": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
"qwen2.5vl:2b": "hf://ggml-org/Qwen2.5-VL-2B-Instruct-GGUF",
"qwen2.5vl:32b": "hf://ggml-org/Qwen2.5-VL-32B-Instruct-GGUF",
"qwen2.5vl:3b": "hf://ggml-org/Qwen2.5-VL-3B-Instruct-GGUF",
"qwen2.5vl:7b": "hf://ggml-org/Qwen2.5-VL-7B-Instruct-GGUF",
"smollm:135m": "ollama://smollm:135m",
"smolvlm": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
"smolvlm:256m": "hf://ggml-org/SmolVLM-256M-Instruct-GGUF",
"smolvlm:2b": "hf://ggml-org/SmolVLM-Instruct-GGUF",
"smolvlm:500m": "hf://ggml-org/SmolVLM-500M-Instruct-GGUF",
"tiny": "ollama://tinyllama"
}
},
"Store": "/Users/bobby/.local/share/ramalama",
"UseContainer": false,
"Version": "0.11.1"
}
Upstream Latest Release
Yes
Additional environment details
I have not changed the default store location or set any related environment variables.
Additional information
Here are the contents of Llama-3.2-1B-Instruct-4bit/refs/latest.json:
{
"files": [
{
"hash": "sha256:35e396644bca888eec399f9c0f843ec7fa78b8f8c5e06841661be62b4edf96dd",
"name": "model.safetensors",
"type": "other"
},
{
"hash": "sha256:6568c91f9cdd35e8ac07b8ff0c201f7e835affc8",
"name": "tokenizer_config.json",
"type": "other"
},
{
"hash": "sha256:02ee80b6196926a5ad790a004d9efd6ab1ba6542",
"name": "special_tokens_map.json",
"type": "other"
},
{
"hash": "sha256:25e549e5bb9b201031726870cec84fd9bef3d707",
"name": "config.json",
"type": "other"
},
{
"hash": "sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b",
"name": "tokenizer.json",
"type": "other"
},
{
"hash": "sha256:32101c2481caabb396a3b36c3fd8b219b0da9c2c",
"name": "model.safetensors.index.json",
"type": "other"
}
],
"hash": "sha256-f8c048069e67e62805503ba050832af2e69a210b",
"path": "/Users/bobby/.local/share/ramalama/store/huggingface/mlx-community/Llama-3.2-1B-Instruct-4bit/refs/latest.json",
"version": "v1.0"
}
I'm hypothesizing that ramalama thinks there are no model files because model_files() here returns an empty array:
ramalama/ramalama/model_store/reffile.py, line 123 in 95eeef9:

def model_files(self) -> list[StoreFile]:
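A minimal sketch of that hypothesis (assumed behavior, not the actual reffile.py implementation): every entry in the latest.json above is recorded with type "other", so if model_files() filters the entries by their recorded type, nothing matches and the list comes back empty, which would trigger NoRefFileFound in _get_entry_model_path():

from dataclasses import dataclass

@dataclass
class StoreFile:
    hash: str
    name: str
    type: str  # field names taken from the latest.json above

# Abbreviated from the ref file above; note every type is "other"
files = [
    StoreFile("sha256:35e3...", "model.safetensors", "other"),
    StoreFile("sha256:6b9e...", "tokenizer.json", "other"),
]

def model_files(files: list[StoreFile]) -> list[StoreFile]:
    # Hypothetical filter: keep only entries recorded as model files
    return [f for f in files if f.type == "model"]

print(model_files(files))  # [] -> nothing to serve, hence NoRefFileFound

If that's right, the bug would be in how the Hugging Face pull classifies safetensors files when writing the ref file, rather than in the MLX runtime itself.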