Releases · jorgealias/llama.cpp
b3913
flake.lock: Update (#9870)
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/bc947f541ae55e999ffdb4013441347d83b00feb?narHash=sha256-NOiTvBbRLIOe5F6RbHaAh6++BNjsb149fGZd1T4+KBg=' (2024-10-04)
→ 'github:NixOS/nixpkgs/5633bcff0c6162b9e4b5f1264264611e950c8ec7?narHash=sha256-9UTxR8eukdg+XZeHgxW5hQA9fIKHsKCdOIUycTryeVw=' (2024-10-09)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
b3780
[SYCL] set context default value to avoid a memory issue, update guide (…
b3772
ggml : move common CPU backend impl to new header (#9509)
b3755
ggml : ggml_type_name returns "NONE" for invalid values (#9458)
When running on Windows, the quantization utility attempts to print types that are not set, which leads to a crash.
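The fix boils down to a bounds check on the type-name lookup. A minimal sketch of the idea with hypothetical names (example_type, example_type_name; not the actual ggml source): out-of-range values map to "NONE" instead of indexing past the table.

    #include <stdio.h>

    enum example_type { TYPE_F32, TYPE_F16, TYPE_Q4_0, TYPE_COUNT };

    static const char * type_names[TYPE_COUNT] = { "f32", "f16", "q4_0" };

    /* Invalid values return "NONE" rather than reading past the table. */
    static const char * example_type_name(int type) {
        return (type >= 0 && type < TYPE_COUNT) ? type_names[type] : "NONE";
    }

    int main(void) {
        printf("%s\n", example_type_name(TYPE_F16)); /* "f16" */
        printf("%s\n", example_type_name(42));       /* "NONE", no crash */
        return 0;
    }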
b3687
llama.android : fix build (#9350)
b3620
CPU/CUDA: Gemma 2 FlashAttention support (#8542)
• CPU/CUDA: Gemma 2 FlashAttention support
• apply logit_softcap to scale in kernel (see the sketch below)
• disable logit softcapping tests on Metal
• remove Metal check
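For context: Gemma 2 soft-caps attention logits as cap · tanh(score / cap), and "apply logit_softcap to scale in kernel" refers to folding the 1/cap division into the existing attention scale so the kernel only adds a tanh. A minimal sketch of that formula, assuming a cap of 50.0 (Gemma 2's attention value) and a hypothetical softcap helper; the real CPU/CUDA kernels are more involved:

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical helper: squash a raw attention score into (-cap, cap). */
    static float softcap(float score, float cap) {
        return cap * tanhf(score / cap);
    }

    int main(void) {
        const float cap   = 50.0f;  /* assumed attention-logit cap for Gemma 2 */
        const float scale = 0.125f; /* e.g. 1/sqrt(head_dim) for head_dim = 64 */
        const float qk    = 123.4f; /* raw q.k dot product */

        /* Folding 1/cap into the scale: multiply by scale/cap once, apply
         * tanh, then multiply by cap; identical to softcap(qk*scale, cap). */
        float capped = cap * tanhf(qk * (scale / cap));
        printf("capped: %f vs helper: %f\n", capped, softcap(qk * scale, cap));
        return 0;
    }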
b3602
flake.lock: Update (#9068)
b3570
gguf-py : Numpy dequantization for most types (#8939)
• gguf-py : Numpy dequantization for most types
• gguf-py : Numpy dequantization for grid-based i-quants
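The change itself is NumPy code in gguf-py; the C sketch below only illustrates the per-block math such dequantization vectorizes, using a simplified Q4_0-style layout (32 weights per block, 4-bit quants offset by 8; assumption: the real format stores the scale as fp16, fp32 is used here for brevity).

    #include <stdint.h>
    #include <stdio.h>

    #define QK 32  /* weights per block */

    /* Simplified Q4_0-style block (real ggml stores d as fp16). */
    struct block_q4 {
        float   d;          /* per-block scale */
        uint8_t qs[QK / 2]; /* two 4-bit quants per byte */
    };

    /* Low nibbles fill y[0..15], high nibbles y[16..31]; quants are
     * stored with an offset of 8, so each decodes to [-8, 7] * d. */
    static void dequantize_block(const struct block_q4 *b, float *y) {
        for (int j = 0; j < QK / 2; ++j) {
            y[j]          = ((b->qs[j] & 0x0F) - 8) * b->d;
            y[j + QK / 2] = ((b->qs[j] >>   4) - 8) * b->d;
        }
    }

    int main(void) {
        struct block_q4 b = { .d = 0.5f };
        b.qs[0] = 0x18;  /* low nibble 8 -> 0.0, high nibble 1 -> -3.5 */
        float y[QK];
        dequantize_block(&b, y);
        printf("y[0]=%g y[16]=%g\n", y[0], y[16]);
        return 0;
    }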
b3511
Install curl in runtime layer (#8693)
b3488
ggml : bugfix: make handling of inactive elements agnostic in the RISC-V vector…