A universal desktop application for loading, managing, and interacting with AI models in all major formats (GGUF, safetensors, PyTorch bin, and Hugging Face models). Built with PySide6 and featuring intelligent backend selection, format detection, and a modular architecture with drag-and-drop addon support.
- Universal Model Support: Load GGUF, safetensors, PyTorch bin files, and Hugging Face models seamlessly
- Intelligent Backend Selection: Automatic backend routing based on model format and hardware capabilities
- LM Studio-like GPU Acceleration: Automatic GPU detection and seamless CPU fallback
- Format Detection: Automatic model format identification and validation
- Hugging Face Integration: Direct model loading from Hugging Face Hub with caching
- Clean and intuitive interface for universal model management
- Detailed model information display across all formats
- Enhanced error reporting with actionable solutions
- Memory usage optimization for large models across all backends
- Modular architecture with addon support
- Drag-and-drop addon installation
- Theme customization
- Smart Hardware Detection: Automatically uses NVIDIA CUDA, Apple Metal, or AMD ROCm
- Portable Deployment: Uses only system-installed drivers, no bundled dependencies
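As a rough illustration of the routing idea, backend selection can be pictured like this (a minimal sketch; the function and backend names below are hypothetical, not the app's actual API):

```python
# Illustrative sketch of format-based backend routing with seamless CPU fallback.
# BACKEND_BY_FORMAT and select_backend are hypothetical names, not the toolkit's API.

BACKEND_BY_FORMAT = {
    "gguf": "llama-cpp-python",
    "safetensors": "transformers",
    "pytorch_bin": "transformers",
    "huggingface": "transformers",
}

def select_backend(model_format: str, gpu_available: bool) -> dict:
    """Pick a backend for the detected format and a device for the hardware."""
    backend = BACKEND_BY_FORMAT.get(model_format)
    if backend is None:
        raise ValueError(f"Unsupported model format: {model_format}")
    device = "gpu" if gpu_available else "cpu"  # fall back to CPU when no GPU is found
    return {"backend": backend, "device": device}
```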
- Python 3.8 or higher
- Windows, macOS, or Linux operating system
- Sufficient RAM for loading AI models (varies by model size and format)
1. Clone this repository:

   ```bash
   git clone https://github.com/hussainnazary2/LLM-Toolkit.git
   cd LLM-Toolkit
   ```

2. Set up the virtual environment:

   Windows:

   ```bat
   setup_env.bat
   ```

   macOS/Linux:

   ```bash
   ./setup_env.sh
   ```
3. Optional - GPU Acceleration:

   For faster model inference, install GPU support.

   Windows:

   ```bat
   setup_gpu.bat
   ```

   macOS/Linux:

   ```bash
   ./setup_gpu.sh
   ```

   This will automatically detect your GPU and install the appropriate acceleration:

   - NVIDIA: CUDA acceleration
   - AMD: ROCm acceleration
   - Apple: Metal acceleration (macOS)
   - Intel: Vulkan acceleration

   Note: GPU acceleration requires appropriate drivers:

   - NVIDIA: CUDA 11.8+ or 12.x drivers
   - AMD: ROCm 5.4+ drivers
   - Apple: macOS 10.15+ (built-in)
   - Intel: Latest GPU drivers
4. Activate the virtual environment:

   Windows:

   ```bat
   venv\Scripts\activate
   ```

   macOS/Linux:

   ```bash
   source venv/bin/activate
   ```
After activating the virtual environment, run the application:
```bash
python main.py
```
To quickly evaluate this project:
1. Clone and setup:

   ```bash
   git clone <repository-url>
   cd llm-toolkit
   ```

2. Windows users:

   ```bat
   setup_env.bat
   venv\Scripts\activate
   python main.py
   ```

3. macOS/Linux users:

   ```bash
   ./setup_env.sh
   source venv/bin/activate
   python main.py
   ```
The application will launch with a GUI interface for loading and managing AI models. All dependencies are automatically installed during setup.
```
llm-toolkit/
├── main.py            # Application entry point
├── app/
│   ├── core/          # Core application logic (format detection, backend routing)
│   ├── ui/            # UI components
│   ├── models/        # Data models
│   ├── backends/      # Backend implementations (transformers, llama-cpp-python, etc.)
│   ├── services/      # Services (Hugging Face integration, model loading)
│   └── addons/        # Addon system
├── interfaces/        # Public interfaces
├── utils/             # Utility functions
├── resources/         # Application resources
└── addons/            # Directory for installed addons
```
To run the test suite:
```bash
python run_tests.py
```
Addons must implement the interfaces defined in the interfaces directory. See the documentation in the docs directory for more information on creating addons.
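As a rough sketch of what an addon might look like (the method names here are illustrative assumptions; the authoritative definition lives in the interfaces directory):

```python
from abc import ABC, abstractmethod

class IAddon(ABC):
    """Hypothetical shape of the addon interface; see interfaces/ for the real one."""

    @abstractmethod
    def name(self) -> str:
        """Unique addon name shown in the addon manager."""

    @abstractmethod
    def activate(self, app) -> None:
        """Called when the addon is enabled; receives the application object."""

    @abstractmethod
    def deactivate(self) -> None:
        """Called when the addon is disabled; release any resources here."""

class HelloAddon(IAddon):
    """Toy addon demonstrating the contract."""
    def name(self) -> str:
        return "hello"
    def activate(self, app) -> None:
        self.active = True
    def deactivate(self) -> None:
        self.active = False
```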
This project was developed with significant assistance from Kiro, an AI-powered development assistant. Kiro was instrumental in accelerating development, maintaining code quality, and implementing complex features across the entire stack.
- Modular Architecture: Kiro designed the clean separation between backends (`app/backends/`), UI components (`app/ui/`), and core logic (`app/core/`)
  - Example: Created the `BackendInterface` abstraction that allows seamless switching between llama-cpp-python, transformers, and other backends
  - Designed the plugin system in `app/addons/` with drag-and-drop installation support
- Interface Design: Generated all abstraction interfaces in the `interfaces/` directory
  - `IBackend`: Unified interface for all model backends
  - `IModelLoader`: Standard interface for loading different model formats
  - `IAddon`: Plugin interface for extensibility
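A hypothetical sketch of the `IBackend` shape (the real definition lives in `interfaces/`; the method names and the toy backend below are illustrative assumptions):

```python
from abc import ABC, abstractmethod

class IBackend(ABC):
    """Hypothetical backend contract; the actual interface is in interfaces/."""

    @classmethod
    @abstractmethod
    def supported_formats(cls) -> set:
        """Formats this backend can load, used by the router."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Load a model file from disk."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Run inference on a prompt."""

class EchoBackend(IBackend):
    """Toy stand-in showing how a concrete backend would plug in."""
    @classmethod
    def supported_formats(cls) -> set:
        return {"gguf"}
    def load(self, path: str) -> None:
        self.path = path
    def generate(self, prompt: str) -> str:
        return prompt.upper()
```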
- Format Detection System: Implemented intelligent model format detection in `app/core/format_detector.py`
  - Automatic identification of GGUF, safetensors, PyTorch bin, and Hugging Face models
  - Smart routing to appropriate backends based on file signatures and metadata
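The signature checks can be sketched roughly as follows. This is a simplified, hypothetical version of what `app/core/format_detector.py` does, based on the published formats: GGUF files start with the magic bytes `GGUF`, safetensors files with an 8-byte little-endian header length followed by a JSON header, and recent `torch.save` outputs with a zip signature.

```python
def sniff_format(path: str) -> str:
    """Simplified, illustrative signature check; the toolkit's real detector
    handles more cases (e.g. legacy pickle-based .bin files and HF repos)."""
    with open(path, "rb") as f:
        head = f.read(9)
    if head[:4] == b"GGUF":                # GGUF magic number
        return "gguf"
    if head[:4] == b"PK\x03\x04":          # zip container written by torch.save
        return "pytorch_bin"
    if len(head) == 9 and head[8:9] == b"{":
        # safetensors: 8-byte little-endian header length, then a JSON header
        return "safetensors"
    raise ValueError("Unknown model format")
```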
- `app/backends/llama_backend.py`: Complete GGUF model support with llama-cpp-python integration
- `app/backends/transformers_backend.py`: Hugging Face transformers backend with automatic device mapping
- `app/backends/safetensors_backend.py`: Native safetensors format support
- `app/backends/pytorch_backend.py`: PyTorch bin file loading and inference
- `app/ui/main_window.py`: Main application window with model management interface
- `app/ui/model_info_widget.py`: Detailed model information display supporting all formats
- `app/ui/addon_manager.py`: Drag-and-drop addon installation interface
- Theme System: Custom theming support with dark/light mode switching
- `app/services/model_loader.py`: Unified model loading service with format detection and backend routing
- `app/services/huggingface_service.py`: Direct Hugging Face Hub integration with caching
- `app/services/hardware_detector.py`: Automatic GPU detection (CUDA, Metal, ROCm, Vulkan)
- `setup_env.bat` / `setup_env.sh`: Automated virtual environment setup for Windows/Unix
- `setup_gpu.bat` / `setup_gpu.sh`: Intelligent GPU acceleration installation
  - Detects NVIDIA (CUDA), AMD (ROCm), Apple (Metal), or Intel (Vulkan)
  - Installs appropriate PyTorch and acceleration libraries
  - Handles driver version compatibility
- `tests/test_format_detection.py`: Comprehensive format detection tests
- `tests/test_backends.py`: Backend integration tests for all supported formats
- `tests/test_model_loading.py`: End-to-end model loading tests
- `run_tests.py`: Unified test runner with coverage reporting
- Error Handling: Implemented user-friendly error messages with actionable solutions throughout the application
Used Kiro's spec system to plan and implement major features:
- Model format detection specification
- Backend architecture design
- Addon system requirements
- GPU acceleration implementation plan
Leveraged Kiro's #File and #Folder context features to:
- Maintain consistency across related files
- Refactor code while preserving interfaces
- Update multiple backend implementations simultaneously
- Used Kiro's diagnostic tools to catch type errors, linting issues, and import problems
- Iteratively fixed cross-platform compatibility issues
- Optimized memory usage for large model loading
- Hardware Detection Logic: Kiro generated the complete GPU detection system that automatically identifies and configures CUDA, Metal, ROCm, or Vulkan acceleration based on available hardware
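In spirit, the probe looks something like this: a minimal sketch using PyTorch's public availability checks. The toolkit's actual detector in `app/services/hardware_detector.py` also covers ROCm and Vulkan; this sketch is not its real code.

```python
def detect_accelerator() -> str:
    """Best-effort GPU probe; returns "cuda", "mps", or "cpu".

    Illustrative only: covers just the PyTorch-visible checks and degrades
    gracefully when torch is not installed at all.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():  # NVIDIA CUDA (ROCm builds also report here)
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():  # Apple Metal
        return "mps"
    return "cpu"
```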
- Format Detection Algorithm: Implemented sophisticated file signature checking and metadata parsing to accurately identify model formats without user input
- Error Recovery: Created comprehensive error handling that provides users with specific, actionable error messages (e.g., "CUDA not available. Install CUDA 11.8+ drivers or run in CPU mode")
- Cross-Platform Scripts: Generated both Windows batch files and Unix shell scripts with identical functionality, handling path differences and platform-specific commands
- Memory Optimization: Implemented lazy loading and memory-mapped file access for handling models larger than available RAM
- Rapid Prototyping: Initial working prototype completed in hours instead of days
- Multi-Format Support: Adding each new model format (GGUF, safetensors, PyTorch) took minutes with Kiro's assistance
- Cross-Platform Testing: Kiro helped identify and fix platform-specific issues without requiring multiple test machines
- Documentation: Generated comprehensive README, API docs, and inline code comments automatically
This project demonstrates how AI-assisted development with Kiro enables:
- Faster iteration on complex features
- Consistent code quality and architecture
- Comprehensive error handling and user experience
- Rapid cross-platform compatibility
- Maintainable, well-documented codebases
The .kiro/ directory contains specs and development artifacts that showcase the AI-assisted development process, including requirements gathering, design decisions, and implementation planning.
Contributions are welcome! Please feel free to submit a Pull Request.