End-to-end ML pipelines — from training to serving across multiple frameworks
This project aims to demonstrate progressive MLOps maturity, inspired by Google’s MLOps maturity model.
- GitHub as the source of truth: All code, configuration, and promotion logic lives in GitHub.
- MLflow is used to log experiments (see notebooks) and track metadata such as the following; a minimal logging sketch appears after this list:
  - Git commit hash
  - Model configuration
  - Artifacts for reproducible deployment
- GitHub Actions is being explored to automate model promotion across environments (e.g. from development to staging/production), triggered by PR merges.
- 🔁 Full CI/CD is not yet implemented; so far, only the promotion pipelines triggered on PR merge are in place.
- 🧪 Not full GitOps: there is no state reconciliation between deployed models and the registry.
- 🕓 Continuous Training (CT) is planned; it will enable periodic pipeline re-training on fresh data (e.g. on a schedule).
- 🧩 Feature Store support is not yet implemented (optional for this use case).
📝 A future enhancement will show how to use KitOps to manage model promotion without relying on a centralized model registry.
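For illustration, here is a minimal sketch of the kind of MLflow logging described above; the experiment name, parameter values, metric, and artifact paths are placeholders, not the project's actual code:

```python
import subprocess
import mlflow

# Capture the current Git commit so every run is traceable back to source.
commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

mlflow.set_experiment("iris-training")  # illustrative experiment name
with mlflow.start_run():
    mlflow.set_tag("git_commit", commit)
    # Model configuration (hyperparameters) logged for reproducibility.
    mlflow.log_params({"hidden_size": 16, "lr": 1e-3, "epochs": 50})
    # ... train and evaluate ...
    mlflow.log_metric("val_accuracy", 0.97)
    # Artifacts needed for reproducible deployment (assumes they were exported).
    mlflow.log_artifact("model.pt")
    mlflow.log_artifact("model.onnx")
```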
Explore fully functional ML deployments of models trained on the Iris dataset, using different tooling and frameworks:
- app/ — FastAPI + GPU inference with PyTorch or ONNX
- data/ — raw Iris dataset / data ingestion
- deployment/ — Dockerfiles, Kubernetes-ready configs
- notebooks/ — Jupyter notebooks for training and tuning
- src/ — core model definition and training scripts
- tests/ — unit/integration tests
- torch_serve/ — TorchServe `.mar` archives & handler configs
- triton_model_repository/ — Triton Inference Server model repo structure
- model.* — pre-trained model artifacts (`.pt`, `.onnx`, `.compiled.pt`)
- pyproject.toml, uv.lock — dependency management
- LICENSE, .gitignore, etc.
- **Install dependencies**

```bash
uv sync --frozen --no-editable --no-cache --extra torch --extra torchserve
```
- **Train & export models**

```bash
uv run src/train/train_iris.py
```

Outputs:
- `model.pt` — PyTorch weights
- `model.onnx` — export via ONNX
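For reference, a sketch of what the export step inside `src/train/train_iris.py` roughly looks like; the stand-in model and opset version here are assumptions, the real classifier definition lives in `src/`:

```python
import torch
import torch.nn as nn

# Stand-in for the project's Iris classifier (4 features -> 3 classes).
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()
example = torch.randn(1, 4)

# TorchScript export (model.pt): trace the model so it can be served without Python source.
torch.jit.trace(model, example).save("model.pt")

# ONNX export (model.onnx) with a dynamic batch dimension.
torch.onnx.export(
    model, example, "model.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```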
- **Build Docker inference containers**

🚀 FastAPI + PyTorch

```bash
docker build -f Dockerfile.torch -t fastapi-torch .
docker run --gpus all -p 8000:8000 fastapi-torch
```

Test:

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features":[[5.1,3.5,1.4,0.2]]}'
```
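The FastAPI service in `app/` follows roughly this shape; this is a simplified sketch, and the real request schema, model loading, and error handling may differ:

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the TorchScript artifact once at startup; fall back to CPU when no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.jit.load("model.pt", map_location=device).eval()

class PredictRequest(BaseModel):
    features: list[list[float]]  # e.g. [[5.1, 3.5, 1.4, 0.2]]

@app.post("/predict")
def predict(req: PredictRequest):
    x = torch.tensor(req.features, dtype=torch.float32, device=device)
    with torch.no_grad():
        logits = model(x)
    return {"predictions": logits.argmax(dim=1).tolist()}
```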
🚀 FastAPI + ONNX

```bash
docker build -f Dockerfile.onnx -t fastapi-onnx .
docker run --gpus all -p 8000:8000 fastapi-onnx
```
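In the ONNX variant, inference runs through ONNX Runtime instead of TorchScript; a rough sketch, assuming the input/output tensor names `input` and `output` from the export step:

```python
import numpy as np
import onnxruntime as ort

# Prefer the CUDA provider when available, otherwise fall back to CPU.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

features = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)
(logits,) = session.run(["output"], {"input": features})
print(logits.argmax(axis=1))  # predicted class index
```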
- **TorchServe Deployment**

```bash
uv run torch-model-archiver \
  --model-name iris \
  --version 1.0 \
  --serialized-file model.pt \
  --handler src/models/torch_handler.py \
  --export-path torch_serve \
  --extra-files "src/models/iris.py,src/models/torch_model_config.py,torch_serve/config.properties"

docker build -f deployment/Dockerfile.torchserve -t iris-serve .
docker run --gpus all -p 8080:8080 iris-serve

curl -X POST http://localhost:8080/predictions/iris \
  -H "Content-Type: application/json" \
  -d '[[5.1,3.5,1.4,0.2],[6.2,3.4,5.4,2.3]]'
```
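The custom handler passed to `torch-model-archiver` (`src/models/torch_handler.py`) builds on TorchServe's `BaseHandler`; the sketch below is simplified, and the real handler's preprocessing and outputs may differ:

```python
import json
import torch
from ts.torch_handler.base_handler import BaseHandler

class IrisHandler(BaseHandler):
    """Maps a JSON list of feature rows to class predictions."""

    def preprocess(self, data):
        # TorchServe wraps each request as a dict with a "body" or "data" key.
        body = data[0].get("body") or data[0].get("data")
        if isinstance(body, (bytes, bytearray)):
            body = json.loads(body)
        return torch.as_tensor(body, dtype=torch.float32, device=self.device)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # One response entry per request: a list of predicted class indices.
        return [outputs.argmax(dim=1).tolist()]
```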
- **Triton Inference Server**

```bash
docker run --rm --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $PWD/triton_model_repository:/models \
  nvcr.io/nvidia/tritonserver:25.06-py3 \
  tritonserver --model-repository=/models
```

Test:

```bash
curl -X POST http://localhost:8000/v2/models/iris/infer \
  -H "Content-Type: application/json" \
  --data-binary @- <<EOF
{
  "inputs": [{"name": "input", "shape": [1, 4], "datatype": "FP32", "data": [[5.1, 3.5, 1.4, 0.2]]}],
  "outputs": [{"name": "output"}]
}
EOF
```
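The same inference request can be sent from Python with the `tritonclient` package; the model, input, and output names below are assumed to match the layout in `triton_model_repository/`:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one row of Iris features, FP32, shape [1, 4].
features = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)
infer_input = httpclient.InferInput("input", list(features.shape), "FP32")
infer_input.set_data_from_numpy(features)

result = client.infer(
    model_name="iris",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output")],
)
print(result.as_numpy("output"))  # raw model output, e.g. class scores
```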
- Model training and tuning with Optuna + PyTorch (a small tuning sketch follows this list)
- Export to TorchScript and ONNX
- GPU-powered inference via FastAPI, TorchServe, and Triton
- Containerization workflows and best practices
- How to register and serve models in diverse environments
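As a hedged illustration of the Optuna + PyTorch tuning mentioned above (the actual search space, objective, and training loop live in the notebooks and `src/`):

```python
import optuna
import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
X_tr, y_tr = torch.tensor(X_tr, dtype=torch.float32), torch.tensor(y_tr, dtype=torch.long)
X_val, y_val = torch.tensor(X_val, dtype=torch.float32), torch.tensor(y_val, dtype=torch.long)

def objective(trial: optuna.Trial) -> float:
    # Hyperparameters to tune: hidden width and learning rate.
    hidden = trial.suggest_int("hidden_size", 8, 64)
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)

    model = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(100):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()

    # Validation accuracy is the objective to maximize.
    with torch.no_grad():
        return (model(X_val).argmax(dim=1) == y_val).float().mean().item()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```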
- Run unit tests (a hypothetical example test is sketched after this list):

  ```bash
  uv run pytest -q
  ```

- CI/CD integrations can easily be added via GitHub Actions or other platforms, using `uv sync`, `pytest`, and Docker builds.
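A hypothetical test in the spirit of `tests/`; the file name, input tensor name, and artifact path are illustrative only:

```python
# tests/test_predict_contract.py (hypothetical; the real tests live in tests/)
import numpy as np
import pytest

onnxruntime = pytest.importorskip("onnxruntime")

def test_onnx_model_predicts_three_classes():
    # Skip gracefully when the exported artifact is not present (e.g. a fresh clone).
    try:
        session = onnxruntime.InferenceSession(
            "model.onnx", providers=["CPUExecutionProvider"]
        )
    except Exception:
        pytest.skip("model.onnx not available; run the training step first")

    features = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)
    (output,) = session.run(None, {"input": features})
    assert output.shape == (1, 3)  # one row in, three Iris class scores out
```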
We welcome:
- Corrections or bug fixes
- New deployment workflows (e.g., KServe, TFKeras)
- Improvements in deployment docs or examples
Please open an issue or submit a pull request!
Apache 2.0 — see LICENSE for details.