
Conversation

@oobabooga (Owner) commented Sep 23, 2023

This is an attempt at making the one-click installer more universal.

  • Added requirements_amd.txt with all the AMD wheels that I could find, replacing the manual install commands in one_click.py.
  • Renamed requirements_nowheels.txt to requirements_minimal.txt and added the AVX2 version of llama-cpp-python there.
  • Added requirements_minimal_noavx2.txt, identical to the previous one but using @jllllll's llama-cpp-python wheels built without AVX2.
  • Added requirements_mac.txt with the only wheel with "mac" in its name that I could find (for llama-cpp-python). I don't know if it's useful.
  • Added a function that detects whether the CPU supports AVX2 (see the sketch after this list).
  • Changed the one-click installer to install requirements_amd.txt when +rocm is present, requirements_minimal.txt when +cpu is present, and requirements.txt otherwise.
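
For reference, a minimal sketch of the AVX2 check, assuming the py-cpuinfo package is available (not necessarily the exact code in one_click.py):

import cpuinfo  # provided by the py-cpuinfo package

def cpu_has_avx2():
    # py-cpuinfo reports the CPU feature flags as a list of lowercase strings
    flags = cpuinfo.get_cpu_info().get('flags', [])
    return 'avx2' in flags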

An open question is whether it makes sense to have avx2 and no_avx2 versions of each requirements.txt. If so, there will be at least 6 to 8 requirements files in the end.

Also, one issue I found is that on Linux + CUDA, pip show torch currently returns the following output, which has no +cu in the Version line, causing is_cuda to be set to False.

Name: torch
Version: 2.0.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Requires: filelock, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions
Required-by: torchaudio, torchvision, triton
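
To illustrate the failure mode, this is roughly the kind of check the installer relies on (a hypothetical sketch, not the exact one_click.py code):

import subprocess

# Grab the "Version:" line from `pip show torch`
output = subprocess.run(["python", "-m", "pip", "show", "torch"],
                        capture_output=True, text=True).stdout
torver = next(line.split()[-1] for line in output.splitlines()
              if line.startswith("Version:"))

is_cuda = '+cu' in torver    # False here, because pip reports plain "2.0.1"
is_rocm = '+rocm' in torver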

@jllllll what do you think of these changes?

@jllllll (Contributor) commented Sep 24, 2023

The only thing that stands out to me is that the webui currently uses cuBLAS wheels for llama-cpp-python under the name llama-cpp-python-cuda.

All of the llama-cpp-python-cuda wheels that I've built can be easily browsed through here:
https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/textgen/

All of the typical llama-cpp-python wheels I have built are here: https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels
These also include CPU-only wheels, which additionally have macOS builds. There are 3 major macOS versions among them, though, and I'm not sure how to separate them in requirements.txt, or whether that is even necessary. The macOS wheels are built with Metal support for better performance, with separate wheels for Intel Macs and Apple Silicon.
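
If separating them does turn out to be needed, one option might be environment markers in requirements.txt, along these lines (illustrative file names, not real release URLs):

llama_cpp_python @ https://example.com/llama_cpp_python-0.2.6-cp310-cp310-macosx_11_0_x86_64.whl; platform_system == "Darwin" and platform_machine == "x86_64"
llama_cpp_python @ https://example.com/llama_cpp_python-0.2.6-cp310-cp310-macosx_11_0_arm64.whl; platform_system == "Darwin" and platform_machine == "arm64"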

I'm not sure what the performance differences between the AVX versions are and whether it is worth using the AVX wheels over the basic wheels.

As for the issue with the Linux torch versioning, I'm not sure what to do about it beyond using this command to install it:

python -m pip install torch==2.0.1+cu117 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

This works on my WSL install. The ==2.0.1+cu117 part is what matters most as that restricts pip to choosing the correct wheel to install. If needed, I can write some code to import torch and determine the version that way. Something like this:

import torch

is_cuda = torch.version.cuda is not None
is_rocm = torch.version.hip is not None
is_intel = 'cxx11.abi' in torch.__version__  # Intel wheels report e.g. 2.0.1a0+cxx11.abi
is_cpu = not (is_cuda or is_rocm or is_intel)

An alternative to the above is to set the torver variable like so:

from torch import __version__ as torver

I downloaded the new Linux CUDA wheel and verified that it does have __version__ set to 2.0.1+cu117.
I downloaded the Intel wheel as well, and it is set to 2.0.1a0+cxx11.abi, so this may be the most reliable solution.
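
Putting those two observations together, the backend checks could then just be substring tests on torver, roughly like this (a sketch, not the exact installer code):

from torch import __version__ as torver

is_cuda = '+cu' in torver          # e.g. 2.0.1+cu117
is_rocm = '+rocm' in torver        # e.g. 2.0.1+rocm5.4.2
is_intel = 'cxx11.abi' in torver   # e.g. 2.0.1a0+cxx11.abi
is_cpu = not (is_cuda or is_rocm or is_intel)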

@oobabooga (Owner, Author)

I tested AVX2 vs no AVX2 as you suggested, and to my surprise, AVX2 is consistently slower:

test 1
https://github.com/abetlen/llama-cpp-python/releases/download/v0.2.6/llama_cpp_python-0.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Output generated in 113.34 seconds (1.76 tokens/s, 199 tokens, context 239, seed 680155108)
Output generated in 112.89 seconds (1.76 tokens/s, 199 tokens, context 239, seed 1921410947)

test 2
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/basic/llama_cpp_python-0.2.6+cu117-cp310-cp310-manylinux_2_31_x86_64.whl

Output generated in 106.26 seconds (1.87 tokens/s, 199 tokens, context 239, seed 963229003)
Output generated in 107.04 seconds (1.86 tokens/s, 199 tokens, context 239, seed 1180646349)

test 3
pip install llama-cpp-python (compiled locally, ends up with avx2)

Output generated in 110.83 seconds (1.80 tokens/s, 199 tokens, context 239, seed 20789062)
Output generated in 110.76 seconds (1.80 tokens/s, 199 tokens, context 239, seed 1245230211)

So I just dropped AVX2 everywhere. I'll double-check this on another computer before merging to confirm.

For Mac I just arbitrarily picked the 11_0 wheels and will hope for the best (backward compatibility?).

The llama_cpp_cuda module name for a build without CUDA will indeed be a little awkward, but I have updated the llama.cpp imports and it should work fine.
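
For reference, the imports end up working roughly along these lines (a simplified sketch of the fallback, not the exact module code):

try:
    import llama_cpp
except ImportError:
    llama_cpp = None

try:
    import llama_cpp_cuda
except ImportError:
    llama_cpp_cuda = None

# Use the CUDA package when it is installed, otherwise fall back to the plain one
lib = llama_cpp_cuda if llama_cpp_cuda is not None else llama_cpp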

@oobabooga (Owner, Author)

An alternative to the above is to set the torver variable like so:

from torch import __version__ as torver

Nice! I have made this change, that's indeed much cleaner.

@oobabooga changed the title from "Create requirements_amd.txt with AMD wheels" to "Create requirements_amd.txt with AMD and Metal wheels" on Sep 24, 2023
@jllllll (Contributor) commented Sep 24, 2023

For llama-cpp-python built with cuBLAS, most of the intensive math operations take place on the GPU. In that case, AVX doesn't really matter much. It's mostly only relevant for CPU-only builds, though I haven't tested speeds myself.

Not sure how relevant it is, but the basic wheels I provide aren't just built without AVX; they are also built without FMA and F16C, both of which are enabled by default. My AVX (non-AVX2) wheels are also built without those two instruction sets for better CPU compatibility.
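
As a point of reference, building that "basic" configuration locally would look something like this (flag names as used by llama.cpp's CMake at the time; treat it as a sketch rather than the exact build command):

CMAKE_ARGS="-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir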

@oobabooga (Owner, Author)

Not sure how relevant it is, but the basic wheels I provide aren't just built without AVX, but also without FMA and F16C, both of which are enabled by default.

This may be related to the performance degradation that I found on my laptop (i5-10300H CPU):

test 1
https://github.com/abetlen/llama-cpp-python/releases/download/v0.2.6/llama_cpp_python-0.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Output generated in 223.67 seconds (0.89 tokens/s, 199 tokens, context 239, seed 1004419314)
Output generated in 225.85 seconds (0.88 tokens/s, 199 tokens, context 239, seed 634638385)

test 2
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/basic/llama_cpp_python-0.2.6+cu117-cp310-cp310-manylinux_2_31_x86_64.whl

Output generated in 738.82 seconds (0.27 tokens/s, 199 tokens, context 239, seed 877489395)

test 3
pip install llama-cpp-python (compiled locally, ends up with avx2)

Output generated in 152.65 seconds (1.30 tokens/s, 199 tokens, context 239, seed 1900209302)
Output generated in 231.62 seconds (0.86 tokens/s, 199 tokens, context 239, seed 2015739191)

I ended up just adding back the AVX2 check and creating _avx2 versions of all the requirements.txt files.

@oobabooga changed the title from "Create requirements_amd.txt with AMD and Metal wheels" to "Create alternative requirements.txt with AMD and Metal wheels" on Sep 24, 2023
@oobabooga merged commit 2e7b6b0 into main on Sep 24, 2023
@oobabooga deleted the gpu-requirements branch on October 22, 2023 at 16:47