
[Feature Request]: Add System-level optimization for CPU inference to wiki #10514

@LynxPDA

Description

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

Inference on the CPU can take quite a long time.
Using a few system-level optimizations borrowed from HuggingFace, I was able to speed up inference by 1.25x to 1.5x.

On my machines:

  • Xeon E3 1265L v3 (16 GB, 4 cores): from 10 s/it to 8 s/it
  • Ryzen 9 7950X (32 GB, 16 cores): from 2.54 s/it to 1.7 s/it

Proposed workflow

I added the following lines to the end of the webui-user.sh file:

# Thread counts set to the number of physical cores (16 on the Ryzen 9 7950X)
export OMP_NUM_THREADS=16
export MKL_NUM_THREADS=16
# Preload jemalloc and tune its allocator behaviour
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so:$LD_PRELOAD
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:60000,muzzy_decay_ms:60000"
# Preload Intel's OpenMP runtime (libiomp5)
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libiomp5.so:$LD_PRELOAD

Beforehand, I installed the required packages:

  • sudo apt-get install -y libjemalloc-dev
  • sudo apt-get install intel-mkl
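As a sanity check, one can confirm that the preloaded libraries exist and that PyTorch actually picks up the thread settings. This is a minimal sketch: the .so paths are Ubuntu's defaults and may differ on other distros, and the python call assumes the webui's virtual environment is active.

# The paths used in LD_PRELOAD above must exist on this system:
ls -l /usr/lib/x86_64-linux-gnu/libjemalloc.so /usr/lib/x86_64-linux-gnu/libiomp5.so

# PyTorch should report the exported thread counts (intra-op and inter-op):
python -c "import torch; print(torch.get_num_threads(), torch.get_num_interop_threads())"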

Additional information

Other system information:

COMMANDLINE_ARGS="--precision autocast --use-cpu all --no-half --opt-channelslast --skip-torch-cuda-test --enable-insecure-extension-access"

python: 3.10.6  •  torch: 2.1.0.dev20230506+cpu  •  xformers: N/A  •  gradio: 3.28.1  •  commit: 5ab7f213  •  checkpoint: b4391b7978

OS: Ubuntu 22.04
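For other CPUs, the value 16 used for OMP_NUM_THREADS/MKL_NUM_THREADS should match the machine's physical core count (16 on the Ryzen 9 7950X, 4 on the Xeon). A quick way to check it, assuming the standard util-linux tools:

# Count physical cores (unique core/socket pairs):
lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
# Logical CPUs, for comparison:
nproc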
