Skip to content

MaidReal/Head

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project Description

This project provides a wrapper for large language models (LLMs) and related audio processing tools. It is designed for Ubuntu and WSL2, supporting GPU acceleration and audio workflows for speech-to-text, text-to-speech, and model conversion.


Dependencies

  • Ubuntu 24 or WSL2 (Windows Subsystem for Linux)
  • Python (recommended: pyenv)
  • CUDA toolkit (for GPU support)
  • PulseAudio (for audio in WSL)
  • PyTorch
  • Required Python packages: lark-parser, numpy, empy, catkin_pkg, setuptools, wheel, pygame, pyyaml, sounddevice
  • FFmpeg
  • Terminator

Quick Setup

Option A: Automated (only tested with wsl)

Make setup.sh executable and run:

chmod +x setup.sh
./setup.sh

Option B: Manual

1. WSL & Python Environment

  • Install WSL in Windows Powershell:
    wsl --install
  • Install pyenv dependencies:
    sudo apt update
    sudo apt install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev git
  • Install pyenv:
    curl https://pyenv.run | bash
  • Add to ~/.bashrc:
    export PATH="$HOME/.pyenv/bin:$PATH"
    eval "$(pyenv init -)"
    eval "$(pyenv virtualenv-init -)"
    source ~/.bashrc
  • Install and activate Python:
    pyenv install <version>
    pyenv virtualenv <version> <env_name>
    pyenv activate <env_name>

Submodules setup

To grab all of the submodules just run:

git submodule update --init

2. Python Packages

Upgrade pip and install requirements:

python -m pip install --upgrade pip
pip install -r requirements.txt --upgrade
pip install --user -U nltk

python src/lip_sync/misc/download_nltk.py

3. Audio in WSL

  • For PulseAudio (WSLg):
    export PULSE_SERVER=unix:/mnt/wslg/PulseServer
    paplay /usr/share/sounds/alsa/Front_Center.wav
  • Install PulseAudio utilities:
    sudo apt install -y pulseaudio-utils

4. CUDA Toolkit (12.9 Example)

Follow official instructions or:

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda-repo-wsl-ubuntu-12-9-local_12.9.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-9-local_12.9.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-9-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-9
  • Add CUDA to PATH:
    nano ~/.bashrc
    export PATH=$PATH:/usr/local/cuda/bin
    source ~/.bashrc

5. PyTorch

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129

6. Additional Tools

  • Terminator: sudo apt install terminator (optional)
  • FFmpeg: sudo apt update && sudo apt install ffmpeg

Run files

Export workspace root for zonos:

export WORKSPACE_ROOT=/ # Add path to your ros2_ws here

in root directory (Head) run:

colcon build

source install:

source install/setup.bash

Gaze tracking

ros2 run gaze_tracking head_tracker

Lip Sync

ros2 run lip_sync lip_sync_sub

Brain

Server:

llama-server -m models/gpt-oss-20b.gguf --jinja -ngl 99 -fa --n-cpu-moe

Bridge:

ros2 launch brain llama_bridge.launch.py

Text to Speech

ros2 run text_to_speech service

Speech to Text

Listen through microphone:

ros2 run speech_to_text listener

Running the model:

ros2 run speech_to_text stt_model

Notes

  • For ROS2 setup, see official documentation and consider adding source /opt/ros/jazzy/setup.bash to your ~/.bashrc.
  • For troubleshooting audio, use parecord and paplay.
  • For model conversion and usage, refer to the scripts and documentation in the repository.

About

ROS2-based wrapper for LLMs and audio processing

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •