Description
Is there an existing issue for this?
- I have searched the existing issues and checked the recent builds/commits
What would your feature do?
Inference on the CPU can be quite slow.
Applying a few system-level optimizations borrowed from HuggingFace sped up inference by roughly 1.25x to 1.5x.
On my machines:
- Xeon E3 1265L v3 (16 GB, 4 cores): sped up from 10s/it to 8s/it
- Ryzen 9 7950X (32 GB, 16 cores): sped up from 2.54s/it to 1.7s/it
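As a quick check of the quoted range, the speedup is the ratio of seconds per iteration before and after the change. The sketch below computes it for both machines; `speedup` is a hypothetical helper name, not part of the webui.

```shell
# Hypothetical helper: ratio of seconds-per-iteration before and after.
speedup() {
    LC_ALL=C awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2fx\n", before / after }'
}

speedup 10 8       # Xeon E3 1265L v3, prints 1.25x
speedup 2.54 1.7   # Ryzen 9 7950X, prints 1.49x
```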
Proposed workflow
I added the following lines to the end of the webui-user.sh file:
export OMP_NUM_THREADS=16
export MKL_NUM_THREADS=16
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so:$LD_PRELOAD
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:60000,muzzy_decay_ms:60000"
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libiomp5.so:$LD_PRELOAD
Beforehand, I installed the required packages:
- sudo apt-get install -y libjemalloc-dev
- sudo apt-get install intel-mkl
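The exports above hard-code 16 threads and fixed library paths, which matches the Ryzen machine but not the 4-core Xeon. A more portable variant might look like the sketch below. It is an assumption-laden illustration, not the configuration that was benchmarked: `preload_if_present` and `CPU_THREADS` are hypothetical names, and `nproc` reports logical CPUs, which can exceed the physical core count on CPUs with SMT (set `CPU_THREADS` to override).

```shell
# Hypothetical helper: prepend a library to LD_PRELOAD only if the file exists,
# so a missing package produces a warning instead of a loader error.
preload_if_present() {
    lib="$1"
    if [ -f "$lib" ]; then
        export LD_PRELOAD="$lib${LD_PRELOAD:+:$LD_PRELOAD}"
    else
        echo "warning: $lib not found, skipping preload" >&2
    fi
}

# Assumption: default to the logical CPU count unless CPU_THREADS is set.
threads="${CPU_THREADS:-$(nproc)}"
export OMP_NUM_THREADS="$threads"
export MKL_NUM_THREADS="$threads"

# Same library paths as in the original exports (Ubuntu 22.04, x86_64).
preload_if_present /usr/lib/x86_64-linux-gnu/libjemalloc.so
preload_if_present /usr/lib/x86_64-linux-gnu/libiomp5.so
```

For example, `CPU_THREADS=16` in the environment before this snippet runs would reproduce the thread count used on the Ryzen 9 7950X.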
Additional information
Other system information:
COMMANDLINE_ARGS="--precision autocast --use-cpu all --no-half --opt-channelslast --skip-torch-cuda-test --enable-insecure-extension-access"
python: 3.10.6 • torch: 2.1.0.dev20230506+cpu • xformers: N/A • gradio: 3.28.1 • commit: 5ab7f213 • checkpoint: b4391b7978
OS: Ubuntu 22.04