
Conversation

@oobabooga (Owner) commented Sep 23, 2023

This is an attempt at making the one-click installer more universal.

  • Added requirements_amd.txt with all the AMD wheels that I could find, replacing the manual install commands in one_click.py.
  • Renamed requirements_nowheels.txt to requirements_minimal.txt and added the AVX2 version of llama-cpp-python there.
  • Added requirements_minimal_noavx2.txt, identical to the previous one but using @jllllll's llama-cpp-python wheels built without AVX2.
  • Added requirements_mac.txt with the only wheel with "mac" in its name that I could find (for llama-cpp-python). I don't know if it's useful.
  • Added a function that detects whether the CPU supports AVX2 (see the sketch after this list).
  • Changed the one-click installer to install requirements_amd.txt when +rocm is present, requirements_minimal.txt when +cpu is present, and requirements.txt otherwise.
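
For reference, a minimal sketch of the AVX2 check, assuming the py-cpuinfo package is available (not necessarily the exact code in one_click.py):

import cpuinfo  # provided by the py-cpuinfo package

def cpu_has_avx2():
    # py-cpuinfo reports the CPU feature flags as a list of lowercase strings
    flags = cpuinfo.get_cpu_info().get('flags', [])
    return 'avx2' in flags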

An open question is whether it makes sense to have avx2 and no_avx2 versions of each requirements.txt. If so, there will be at least 6 to 8 requirements files in the end.

Also, one issue I found is that on Linux + CUDA, pip show torch currently returns the following output, which has no +cu in the Version line, causing is_cuda to be set to False.

Name: torch
Version: 2.0.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Requires: filelock, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions
Required-by: torchaudio, torchvision, triton
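
To illustrate the failure mode, this is roughly the kind of check the installer relies on (a hypothetical sketch, not the exact one_click.py code):

import subprocess

# Grab the "Version:" line from `pip show torch`
output = subprocess.run(["python", "-m", "pip", "show", "torch"],
                        capture_output=True, text=True).stdout
torver = next(line.split()[-1] for line in output.splitlines()
              if line.startswith("Version:"))

is_cuda = '+cu' in torver    # False here, because pip reports plain "2.0.1"
is_rocm = '+rocm' in torver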

@jllllll what do you think of these changes?

@jllllll (Contributor) commented Sep 24, 2023

The only thing that stands out to me is that the webui currently uses cuBLAS wheels for llama-cpp-python under the name llama-cpp-python-cuda.

All of the llama-cpp-python-cuda wheels that I've built can be easily browsed through here:
https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/textgen/

All of the typical llama-cpp-python wheels I have built are here: https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels
These also include CPU-only wheels, which additionally have macOS builds. There are 3 major macOS versions among them, though, and I'm not sure how to separate them in requirements.txt, or whether that is even necessary. The macOS wheels are built with Metal support for better performance, with separate wheels for Intel Macs and Apple Silicon.
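
If separating them does turn out to be needed, one option might be environment markers in requirements.txt, along these lines (illustrative file names, not real release URLs):

llama_cpp_python @ https://example.com/llama_cpp_python-0.2.6-cp310-cp310-macosx_11_0_x86_64.whl; platform_system == "Darwin" and platform_machine == "x86_64"
llama_cpp_python @ https://example.com/llama_cpp_python-0.2.6-cp310-cp310-macosx_11_0_arm64.whl; platform_system == "Darwin" and platform_machine == "arm64"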

I'm not sure what the performance differences between the AVX versions are and whether it is worth using the AVX wheels over the basic wheels.

As for the issue with the Linux torch versioning, I'm not sure what to do about it beyond using this command to install it:

python -m pip install torch==2.0.1+cu117 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

This works on my WSL install. The ==2.0.1+cu117 part is what matters most as that restricts pip to choosing the correct wheel to install. If needed, I can write some code to import torch and determine the version that way. Something like this:

import torch

is_cuda = torch.version.cuda is not None
is_rocm = torch.version.hip is not None
is_intel = 'cxx11.abi' in torch.__version__  # Intel wheels report e.g. 2.0.1a0+cxx11.abi
is_cpu = not (is_cuda or is_rocm or is_intel)

An alternative to the above is to set the torver variable like so:

from torch import __version__ as torver

I downloaded the new Linux CUDA wheel and verified that it does have __version__ set to 2.0.1+cu117.
I downloaded the Intel wheel as well, and it is set to 2.0.1a0+cxx11.abi, so this may be the most reliable solution.
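
Putting those two observations together, the backend checks could then just be substring tests on torver, roughly like this (a sketch, not the exact installer code):

from torch import __version__ as torver

is_cuda = '+cu' in torver          # e.g. 2.0.1+cu117
is_rocm = '+rocm' in torver        # e.g. 2.0.1+rocm5.4.2
is_intel = 'cxx11.abi' in torver   # e.g. 2.0.1a0+cxx11.abi
is_cpu = not (is_cuda or is_rocm or is_intel)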

@oobabooga (Owner, Author)

I tested AVX2 vs no AVX2 as you suggested, and to my surprise, AVX2 is consistently slower:

test 1
https://github.com/abetlen/llama-cpp-python/releases/download/v0.2.6/llama_cpp_python-0.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Output generated in 113.34 seconds (1.76 tokens/s, 199 tokens, context 239, seed 680155108)
Output generated in 112.89 seconds (1.76 tokens/s, 199 tokens, context 239, seed 1921410947)

test 2
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/basic/llama_cpp_python-0.2.6+cu117-cp310-cp310-manylinux_2_31_x86_64.whl

Output generated in 106.26 seconds (1.87 tokens/s, 199 tokens, context 239, seed 963229003)
Output generated in 107.04 seconds (1.86 tokens/s, 199 tokens, context 239, seed 1180646349)

test 3
pip install llama-cpp-python (compiled locally, ends up with avx2)

Output generated in 110.83 seconds (1.80 tokens/s, 199 tokens, context 239, seed 20789062)
Output generated in 110.76 seconds (1.80 tokens/s, 199 tokens, context 239, seed 1245230211)

So I just dropped AVX2 everywhere. I'll double-check this on another computer before merging to confirm.

For Mac I just arbitrarily picked the 11_0 wheels and will hope for the best (backward compatibility?).

The llama_cpp_cuda module name for a build without CUDA will indeed be a little awkward, but I have updated the llama.cpp imports and it should work fine.
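
For reference, the imports end up working roughly along these lines (a simplified sketch of the fallback, not the exact module code):

try:
    import llama_cpp
except ImportError:
    llama_cpp = None

try:
    import llama_cpp_cuda
except ImportError:
    llama_cpp_cuda = None

# Use the CUDA package when it is installed, otherwise fall back to the plain one
lib = llama_cpp_cuda if llama_cpp_cuda is not None else llama_cpp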

@oobabooga (Owner, Author)

An alternative to the above is to set the torver variable like so:

from torch import __version__ as torver

Nice! I have made this change, that's indeed much cleaner.

@oobabooga changed the title from "Create requirements_amd.txt with AMD wheels" to "Create requirements_amd.txt with AMD and Metal wheels" on Sep 24, 2023
@jllllll (Contributor) commented Sep 24, 2023

For llama-cpp-python built with cuBLAS, most of the intensive math operations take place on the GPU. In that case, AVX doesn't really matter much. It's mostly only relevant for CPU-only builds, though I haven't tested speeds myself.

Not sure how relevant it is, but the basic wheels I provide aren't just built without AVX; they are also built without FMA and F16C, both of which are enabled by default. My AVX (non-AVX2) wheels are also built without those two instruction sets for better CPU compatibility.
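
As a point of reference, building that "basic" configuration locally would look something like this (flag names as used by llama.cpp's CMake at the time; treat it as a sketch rather than the exact build command):

CMAKE_ARGS="-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir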

@oobabooga (Owner, Author)

Not sure how relevant it is, but the basic wheels I provide aren't just built without AVX, but also without FMA and F16C, both of which are enabled by default.

This may be related to the performance degradation that I found on my laptop (i5-10300H CPU):

test 1
https://github.com/abetlen/llama-cpp-python/releases/download/v0.2.6/llama_cpp_python-0.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Output generated in 223.67 seconds (0.89 tokens/s, 199 tokens, context 239, seed 1004419314)
Output generated in 225.85 seconds (0.88 tokens/s, 199 tokens, context 239, seed 634638385)

test 2
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/basic/llama_cpp_python-0.2.6+cu117-cp310-cp310-manylinux_2_31_x86_64.whl

Output generated in 738.82 seconds (0.27 tokens/s, 199 tokens, context 239, seed 877489395)

test 3
pip install llama-cpp-python (compiled locally, ends up with avx2)

Output generated in 152.65 seconds (1.30 tokens/s, 199 tokens, context 239, seed 1900209302)
Output generated in 231.62 seconds (0.86 tokens/s, 199 tokens, context 239, seed 2015739191)

I ended up just adding back the AVX2 check and creating _avx2 versions of all the requirements.txt files.

@oobabooga changed the title from "Create requirements_amd.txt with AMD and Metal wheels" to "Create alternative requirements.txt with AMD and Metal wheels" on Sep 24, 2023
@oobabooga merged commit 2e7b6b0 into main on Sep 24, 2023
@oobabooga deleted the gpu-requirements branch on October 22, 2023 at 16:47