This C++ application runs machine learning inference tasks (e.g., object detection, classification, optical flow) against the NVIDIA Triton Inference Server. Triton manages multiple framework backends for streamlined model deployment.
- Project Structure
- Tested Models
- Build Client Libraries
- Dependencies
- Build and Compile
- Tasks
- Notes
- Deploying Models
- Running Inference
- Docker Support
- Demo
- References
- Feedback
tritonic/
├── src/ # Source code
│ ├── main/ # Main application
│ │ └── client.cpp # Main entry point
│ ├── triton/ # Triton client code
│ ├── tasks/ # Task implementations
│ └── utils/ # Utility classes
├── include/ # Header files
├── deploy/ # Model deployment scripts
│ └── classifier/ # Classification models
│ ├── tensorflow/ # TensorFlow deployments
│ ├── torchvision/ # Torchvision deployments
│ └── vit/ # Vision Transformer deployments
├── scripts/ # All scripts
│ ├── docker/ # Docker-related scripts
│ │ ├── docker_triton_run.sh # Run Triton server
│ │ ├── extract_triton_libs.sh # Extract client libraries
│ │ ├── run_client.sh # Run client application
│ │ ├── run_debug.sh # Run with debug mode
│ │ ├── run_optical_flow.sh # Run optical flow
│ │ └── run_tests.sh # Run unit tests
│ ├── setup/ # Setup scripts
│ └── tools/ # Utility scripts
├── config/ # Configuration files
│ └── environments/ # Environment configs
├── docs/ # Documentation
│ └── guides/ # User guides
├── labels/ # Label files
│ ├── coco.txt # COCO class labels
│ └── imagenet.txt # ImageNet class labels
├── data/ # Data files
│ ├── images/ # Test images
│ ├── videos/ # Test videos
│ └── models/ # Model files
└── tests/ # Test files
├── mocks/ # Mock objects
├── unit/ # Unit tests
└── integration/ # Integration tests
- YOLOv5
- YOLOv6
- YOLOv7
- YOLOv8
- YOLOv9
- YOLOv10
- YOLO11
- YOLOv12
- YOLO-NAS
- RT-DETR
- RT-DETRv2
- D-FINE
- DEIM
- RF-DETR
To build the client libraries yourself, refer to the official Triton Inference Server client libraries documentation.
For convenience, you can extract pre-built Triton client libraries from the official NVIDIA Triton Server SDK Docker image:
# Run the extraction script
./scripts/docker/extract_triton_libs.sh
This script will:
- Create a temporary Docker container from the `nvcr.io/nvidia/tritonserver:25.06-py3-sdk` image
- Extract the Triton client libraries from `/workspace/install`
- Copy additional Triton server headers and libraries if available
- Save everything to the `./triton_client_libs/` directory
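If you prefer not to rely on the script, a rough manual equivalent using plain Docker commands (the temporary container name below is arbitrary) is:
# Create a stopped container from the SDK image
docker create --name triton-sdk-tmp nvcr.io/nvidia/tritonserver:25.06-py3-sdk
# Copy the pre-built client libraries out of it
mkdir -p ./triton_client_libs
docker cp triton-sdk-tmp:/workspace/install ./triton_client_libs/install
# Remove the temporary container
docker rm triton-sdk-tmp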
After extraction, set the environment variable:
export TritonClientBuild_DIR=$(pwd)/triton_client_libs/install
The extracted directory structure will contain:
- `install/` - Triton client build artifacts
- `triton_server_include/` - Triton server headers
- `triton_server_lib/` - Triton server libraries
- `workspace/` - Additional workspace files
Ensure the following dependencies are installed:
- NVIDIA Triton Inference Server: `docker pull nvcr.io/nvidia/tritonserver:25.06-py3`
- Triton client libraries: tested on release r25.06
- Protobuf and gRPC++: versions compatible with Triton
- RapidJSON: `apt install rapidjson-dev`
- libcurl: `apt install libcurl4-openssl-dev`
- OpenCV 4: tested version 4.7.0
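For convenience, the apt-installable dependencies above can be pulled in with one command. The Protobuf/gRPC and OpenCV package names below are assumptions for a recent Ubuntu/Debian; your distribution may ship versions other than the ones tested, in which case building them from source may be necessary:
# Install development packages (package names are assumptions for Ubuntu/Debian)
sudo apt update
sudo apt install -y rapidjson-dev libcurl4-openssl-dev libopencv-dev \
    libprotobuf-dev protobuf-compiler protobuf-compiler-grpc libgrpc++-dev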
To maintain code quality and consistency, install pre-commit hooks:
# Run the setup script
./scripts/setup/pre_commit_setup.sh
# Or install manually
pip install pre-commit
pre-commit install
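Once installed, the hooks run automatically on every commit. To run them manually against the entire tree:
pre-commit run --all-files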
- Set the `TritonClientBuild_DIR` environment variable or update `CMakeLists.txt` with the path to your installed Triton client libraries.
- Create a build directory: `mkdir build`
- Navigate to the build directory: `cd build`
- Run CMake to configure the build: `cmake -DCMAKE_BUILD_TYPE=Release ..`
  Optional flags:
  - `-DSHOW_FRAME`: enable to display processed frames after inference
  - `-DWRITE_FRAME`: enable to write processed frames to disk
- Build the application: `cmake --build .`
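Putting the steps together, a typical release build might look like the following sketch; the library path is a placeholder, and enabling the optional flag with ON assumes it is a standard CMake option:
# Point CMake at the extracted client libraries (placeholder path)
export TritonClientBuild_DIR=/path/to/triton_client_libs/install
mkdir -p build && cd build
# Configure with the optional frame display enabled (ON/OFF is an assumption)
cmake -DCMAKE_BUILD_TYPE=Release -DSHOW_FRAME=ON ..
# Build using all available cores
cmake --build . -j "$(nproc)"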
Other tasks are on the TODO list.
Ensure the model export versions match those supported by your Triton release. Check Triton releases here.
To deploy models, set up a model repository following the Triton Model Repository schema. The `config.pbtxt` file is optional unless you're using the OpenVINO backend, implementing an Ensemble pipeline, or passing custom inference parameters.
<model_repository>/
<model_name>/
config.pbtxt
<model_version>/
<model_binary>
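For example, a minimal repository for a single ONNX detection model (the model and file names below are hypothetical; Triton's ONNX Runtime backend looks for model.onnx by default) can be created with:
# Create the model folder and version directory
mkdir -p model_repository/yolov8/1
# Place the exported model under the version directory
cp /path/to/exported/yolov8.onnx model_repository/yolov8/1/model.onnx
# config.pbtxt can be omitted here since this is not an OpenVINO or ensemble setup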
To start Triton Server:
docker run --gpus=1 --rm \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /full/path/to/model_repository:/models \
nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver \
--model-repository=/models
Omit the `--gpus` flag if using the CPU version.
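Once the server is up, you can confirm it is ready via the standard KServe v2 health endpoint that Triton exposes on the HTTP port:
# Expect HTTP 200 when the server and models are ready
curl -v localhost:8000/v2/health/ready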
./tritonic \
--source=/path/to/source.format \
--model_type=<model_type> \
--model=<model_name_folder_on_triton> \
--labelsFile=/path/to/labels/coco.names \
--protocol=<http or grpc> \
--serverAddress=<triton-ip> \
--port=<8000 for http, 8001 for grpc>
For dynamic input sizes:
--input_sizes="c,h,w"
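For instance, a hypothetical YOLOv8 detection run over gRPC against a local Triton instance (the source path and model folder name are placeholders) would look like:
./tritonic \
    --source=data/videos/sample.mp4 \
    --model_type=yolov8 \
    --model=yolov8_onnx \
    --labelsFile=labels/coco.txt \
    --protocol=grpc \
    --serverAddress=localhost \
    --port=8001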
Use the provided Docker scripts for quick testing:
# Run object detection
./scripts/docker/run_client.sh
# Run with debug mode
./scripts/docker/run_debug.sh
# Run optical flow
./scripts/docker/run_optical_flow.sh
# Run unit tests
./scripts/docker/run_tests.sh
Check `.vscode/launch.json` for additional configuration examples.
- `/path/to/source.format`: Path to the input video or image file; for optical flow, pass two images as a comma-separated list
- `<model_type>`: Model type (e.g., `yolov5`, `yolov8`, `yolo11`, `yoloseg`, `torchvision-classifier`, `tensorflow-classifier`, `vit-classifier`; see the Model Type Parameters table below)
- `<model_name_folder_on_triton>`: Name of the model folder on the Triton server
- `/path/to/labels/coco.names`: Path to the label file (e.g., COCO labels)
- `<http or grpc>`: Communication protocol (`http` or `grpc`)
- `<triton-ip>`: IP address of your Triton server
- `<8000 for http, 8001 for grpc>`: Port number
- `<batch or b>`: Batch size; currently only 1 is supported
- `<input_sizes or -is>`: Input sizes for dynamic axes, as a semicolon-separated list of CHW entries (e.g., '3,224,224' for a single input, '3,224,224;3,224,224' for two inputs, or '3,640,640;2' for rtdetr/dfine models); see the example after this list
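As a concrete illustration of the two-image source and input sizes parameters (file names, model folder, and tensor sizes below are placeholders), a RAFT optical flow run might be invoked as:
./tritonic \
    --source=data/images/frame1.png,data/images/frame2.png \
    --model_type=raft \
    --model=raft_large \
    --protocol=grpc \
    --serverAddress=localhost \
    --port=8001 \
    --input_sizes="3,520,960;3,520,960"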
To view all available parameters, run:
./tritonic --help
| Model | Model Type Parameter |
|---|---|
| YOLOv5 | yolov5 |
| YOLOv6 | yolov6 |
| YOLOv7 | yolov7 |
| YOLOv8 | yolov8 |
| YOLOv9 | yolov9 |
| YOLOv10 | yolov10 |
| YOLO11 | yolo11 |
| YOLOv12 | yolov12 |
| RT-DETR | rtdetr |
| RT-DETRv2 | rtdetrv2 |
| RT-DETR Ultralytics | rtdetrul |
| RF-DETR | rfdetr |
| D-FINE | dfine |
| DEIM | deim |
| Torchvision Classifier | torchvision-classifier |
| TensorFlow Classifier | tensorflow-classifier |
| ViT Classifier | vit-classifier |
| YOLOv5 Segmentation | yoloseg |
| YOLOv8 Segmentation | yoloseg |
| YOLO11 Segmentation | yoloseg |
| YOLO12 Segmentation | yoloseg |
| RAFT Optical Flow | raft |
For detailed instructions on installing Docker and the NVIDIA Container Toolkit, refer to the Docker Setup Document.
docker build --rm -t tritonic .
docker run --rm \
--network host \
-v /path/to/host/data:/app/data \
tritonic \
--source=<path_to_source_on_container> \
--model_type=<model_type> \
--model=<model_name_folder_on_triton> \
--labelsFile=<path_to_labels_on_container> \
--protocol=<http or grpc> \
--serverAddress=<triton-ip> \
--port=<8000 for http, 8001 for grpc>
Real-time inference test (GPU RTX 3060):
- YOLOv7-tiny exported to ONNX: Demo Video
- YOLO11s exported to ONNX: Demo Video
- RAFT Optical Flow Large (exported to traced TorchScript): Demo Video
- Triton Inference Server Client Example
- Triton User Guide
- Triton Tutorials
- ONNX Models
- Torchvision Models
- Tensorflow Model Garden
Any feedback is greatly appreciated. If you have suggestions, bug reports, or questions, don't hesitate to open an issue. Contributions and corrections are welcome to keep this repository relevant and useful.