This monorepository is for my home k3s clusters. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Ansible, Terraform, Kubernetes, Flux, Renovate, and GitHub Actions.
The purpose here is to learn Kubernetes while practicing GitOps.
There is a template over at onedr0p/flux-cluster-template if you want to try and follow along with some of the practices I use here.
My cluster runs k3s, provisioned on top of bare-metal Debian using the Ansible Galaxy role ansible-role-k3s. This is a semi-hyper-converged cluster: workloads and block storage share the same resources on my nodes, while a separate NAS server with ZFS provides NFS/SMB shares, bulk file storage, and backups.
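For context, a minimal sketch of how that role can be applied to Debian hosts; the Galaxy role name `xanmanning.k3s`, the inventory group, and the pinned version are assumptions, not copied from this repo:

```yaml
# playbook.yaml -- illustrative only; host group, role name, and version are assumptions
- hosts: k3s_cluster
  become: true
  roles:
    - role: xanmanning.k3s   # the "ansible-role-k3s" Galaxy role
      vars:
        k3s_release_version: v1.29.4+k3s1   # hypothetical pinned k3s version
```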
This repository uses Taskfile to automate Fly app management. Each Fly app task loads its environment variables from a per-app `.config.env` file, ensuring isolation and security for each app.
- Each Fly app (e.g., `gatus`) has its own directory under `infrastructure/flyio/<APP>` containing a `.config.env` file with the required environment variables for that app.
- The Taskfile tasks (e.g., `fly:app:create`, `fly:app:deploy`, `fly:volume:create`, etc.) use the `APP` variable to select which app to operate on.
- When you run a task, the Taskfile loads the environment from `infrastructure/flyio/<APP>/.config.env` using the `dotenv:` key at the task level, so all commands for that app use the correct environment (see the sketch below).
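For illustration, a minimal sketch of how such a task could be wired up, assuming the `APP` variable is templated into the `dotenv:` path; the task name, the `FLY_APP_NAME` variable, and the exact layout of this repo's Taskfile are assumptions:

```yaml
# Illustrative sketch; in this repo the tasks live under a fly: namespace via includes
version: "3"

tasks:
  app:deploy:
    desc: Deploy a Fly app using its per-app environment
    # Load variables from the app's own .config.env so each app stays isolated
    dotenv: ["infrastructure/flyio/{{.APP}}/.config.env"]
    requires:
      vars: [APP]
    cmds:
      # FLY_APP_NAME is a hypothetical variable defined in .config.env
      - flyctl deploy --config infrastructure/flyio/{{.APP}}/fly.toml --app "$FLY_APP_NAME"
```

With something like that in place, `task fly:app:deploy APP=gatus` picks up `infrastructure/flyio/gatus/.config.env` automatically.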
Suppose you want to deploy or manage the `gatus` Fly app. First, ensure you have a `.config.env` file at `infrastructure/flyio/gatus/.config.env` with all required variables.

To create a volume for `gatus`:

```sh
task fly:volume:create APP=gatus
```
To deploy the app:

```sh
task fly:app:deploy APP=gatus
```

To view logs:

```sh
task fly:app:logs APP=gatus
```

All of these commands automatically load environment variables from `infrastructure/flyio/gatus/.config.env`.
Note: you do not need to manually export environment variables or use a global `.config.env`. Each app is fully isolated.
The `.config.env` file contains environment variables needed to deploy the apps in this template. Each app has its own `.config.env` file as described above.

- Copy the `.config.sample.env` to `.config.env` in the appropriate app directory and fill out all the environment variables. All uncommented variables are required.
Fly.io setup
For some commands below, we use a task instead of `flyctl` because the task writes (on app creation) and reads (on subsequent commands) your app name from the config file. This is the only way to keep your app name hidden.
- Signup to Fly

  If you already have a Fly account, use `flyctl auth login` instead.

  ```sh
  flyctl auth signup
  ```

- Create a new fly app

  If this is your first app, you'll be asked to add credit card information, but don't worry, you won't be charged by this app.

  ```sh
  task fly:app:create APP=gatus
  ```

- Create a new volume

  This will show you a warning about individual volumes. It's ok to have a single volume because we're not concerned about downtime for our gatus instance.

  ```sh
  task fly:volume:create APP=gatus
  ```

- Deploy your app

  ```sh
  task fly:app:deploy APP=gatus
  ```

- Setup your custom domain

  After your app is deployed, follow the steps here to set up your custom domain.

- Open your new gatus website

  That's all! Now you can open your custom domain and gatus should work.
This template uses Renovate to scan dependencies and open PRs when they are out of date.
To enable this, open the Renovate GitHub app page, click the "Configure" button, then choose your repo. The template already provides Renovate configs, so no further action is needed.
If your deployment failed or you can't open the gatus web UI, you can see the logs with:

```sh
task fly:app:logs APP=gatus
```

If that command fails (e.g., if the machine is stopped), try opening your logs in the browser:

```sh
task fly:app:logs:web APP=gatus
```

You can also SSH into the machine with:

```sh
task fly:app:ssh
```

and check individual logs using overmind:

```sh
# Run this command inside your fly machine
overmind connect gatus
```

This will open a tmux window with the gatus logs. You can scroll the tmux window with `Ctrl-B [` and use `Ctrl-B D` to exit it.

Substitute `gatus` with `caddy` or `backup` to see logs for other apps.
After your first manual deploy to Fly.io, per the instructions above, you can set up continuous deployment via GitHub Actions.

- Install the GitHub CLI

  ```sh
  brew install gh
  ```

- Login to GitHub

  ```sh
  gh auth login
  ```

- Set Fly secrets in your GitHub repo

  ```sh
  task github:secrets:set
  ```

- Test your workflow deployment

  ```sh
  task github:workflow:deploy
  ```

That's all! Now, any changes to your `Dockerfile`, `fly.toml`, or `scripts`/`config` will trigger a fly deploy.
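As a rough sketch of what such a workflow could look like (file name, branch, and paths are assumptions based on the description above, not this repo's actual workflow):

```yaml
# .github/workflows/fly-deploy.yaml -- illustrative sketch only
name: Fly Deploy
on:
  push:
    branches: ["main"]
    paths:
      - "Dockerfile"
      - "fly.toml"
      - "scripts/**"
      - "config/**"
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      # FLY_API_TOKEN would be one of the secrets pushed by `task github:secrets:set`
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
```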
- Why does every `fly` command I run error with `Error: the config for your app is missing an app name`?

  For security reasons the app name is not saved in the fly.toml file. In that case, you have to add `-a your-app-name` to all `fly` commands. Your app name is found in your `.config.env` file.

  Example:

  ```sh
  fly secrets list -a your-app-name
  ```

  Or you can add:

  ```toml
  app = "your-app-name"
  ```

  to the beginning of your fly.toml file.
- How do I update the environment variables?

  After updating the `.config.env` file, you can update your environment variables in two different ways: `task fly:secrets:set APP=gatus` will read your `.config.env` file and import every defined variable into your fly app, or you can just do a new deployment with `task fly:app:deploy APP=gatus`, which will run the command above and do a new deployment afterwards.
- actions-runner-controller: self-hosted GitHub runners
- cilium: internal Kubernetes networking plugin
- cert-manager: creates SSL certificates for services in my cluster
- external-dns: automatically syncs DNS records from my cluster ingresses to a DNS provider
- external-secrets: manages Kubernetes secrets using Bitwarden
- ingress-nginx: ingress controller for Kubernetes using NGINX as a reverse proxy and load balancer
- sops: manages secrets for Kubernetes, Ansible, and Terraform, which are committed to Git
- tf-controller: additional Flux component used to run Terraform from within a Kubernetes cluster.
- volsync: backup and recovery of persistent volume claims
Flux watches the clusters in my kubernetes folder (see Directories below) and makes changes to my clusters based on the state of my Git repository.

The way Flux works for me here is that it recursively searches the `kubernetes/apps` folder until it finds the top-most `kustomization.yaml` per directory and then applies all the resources listed in it. That `kustomization.yaml` will generally only have a namespace resource and one or many Flux kustomizations. Those Flux kustomizations will generally have a `HelmRelease` or other resources related to the application underneath it, which will be applied.
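As a rough illustration of one of those Flux kustomizations (the app name, path, and source name below are hypothetical, not taken from this repo):

```yaml
# Hypothetical example of a Flux Kustomization for a single app
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: gatus
  namespace: flux-system
spec:
  interval: 30m
  path: ./kubernetes/apps/default/gatus/app   # folder holding the HelmRelease and related resources
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```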
Renovate watches my entire repository looking for dependency updates; when they are found, a PR is automatically created. When those PRs are merged, Flux applies the changes to my cluster.
This Git repository contains the following directories under `kubernetes`.

```
📁 kubernetes
├── 📁 apps       # applications
├── 📁 bootstrap  # bootstrap procedures
├── 📁 flux       # core flux configuration
└── 📁 templates  # re-useable components
```
This is a high-level look at how Flux deploys my applications with dependencies. Below there are 3 apps: `postgres`, `authentik`, and `weave-gitops`. `postgres` is the first app that needs to be running and healthy before `authentik` and `weave-gitops`. Once `postgres` is healthy, `authentik` will be deployed, and after that is healthy, `weave-gitops` will be deployed.
```mermaid
graph TD;
  id1>Kustomization: cluster] -->|Creates| id2>Kustomization: cluster-apps];
  id2>Kustomization: cluster-apps] -->|Creates| id3>Kustomization: postgres];
  id2>Kustomization: cluster-apps] -->|Creates| id6>Kustomization: authentik];
  id2>Kustomization: cluster-apps] -->|Creates| id8>Kustomization: weave-gitops];
  id2>Kustomization: cluster-apps] -->|Creates| id5>Kustomization: postgres-cluster];
  id3>Kustomization: postgres] -->|Creates| id4[HelmRelease: postgres];
  id5>Kustomization: postgres-cluster] -->|Depends on| id3>Kustomization: postgres];
  id5>Kustomization: postgres-cluster] -->|Creates| id10[Postgres Cluster];
  id6>Kustomization: authentik] -->|Creates| id7(HelmRelease: authentik);
  id6>Kustomization: authentik] -->|Depends on| id5>Kustomization: postgres-cluster];
  id8>Kustomization: weave-gitops] -->|Creates| id9(HelmRelease: weave-gitops);
  id8>Kustomization: weave-gitops] -->|Depends on| id5>Kustomization: postgres-cluster];
  id9(HelmRelease: weave-gitops) -->|Depends on| id7(HelmRelease: authentik);
```
While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about two things: (1) dealing with chicken-and-egg scenarios, and (2) services I critically need whether my cluster is online or not.

The alternative solution to these two problems would be to host a Kubernetes cluster in the cloud and deploy applications like HCVault and ntfy. However, maintaining another cluster and monitoring another group of workloads is a lot more time and effort than I am willing to put in.
| Service    | Use                                                                | Cost     |
|------------|--------------------------------------------------------------------|----------|
| Bitwarden  | Secrets with External Secrets                                      | ~$TBC/yr |
| Cloudflare | Domain and S3                                                      | ~$TBC/yr |
| GitHub     | Hosting this repository and continuous integration/deployments    | Free     |
| Gatus      | Monitoring internet connectivity and external facing applications | Free     |
|            | Total                                                              | ~$TBC/mo |
| Name | Device | CPU | OS Disk | Data Disk | RAM | OS | Purpose |
|------|--------|-----|---------|-----------|-----|----|---------|

First server (running NAS + compute home server)
| Component      | Specification |
|----------------|---------------|
| CPU            | Intel Xeon E-2246G 3.6GHz (base), 6 cores / 12 threads |
| RAM            | 64GB ECC Kingston |
| Motherboard    | Gigabyte C246-WU4 Workstation/Server |
| OS Disk        | 500GB Samsung Evo 850 SSD |
| NVMe           | 1TB Samsung 970 Evo Plus NVMe |
| NAS/Data Disks | Seagate IronWolf / EXOS, various (1TB / 2TB / 4TB / 6TB / 8TB / 10TB) x2 (7200RPM, 256MB, CMR) |
| Networking     | Intel WiFi-6 AX201 PCIe |
| Cooling        | Be Quiet! Dark Rock Slim |
| PSU            | Corsair AX760 |
| Case           | Lian-Li PC-7H Full Aluminum ATX |
Note: I am proposing to decouple the NAS and home server, but the change is budget constrained (3rd world country costs). This is a proposed architecture change; details and a migration plan are tracked elsewhere in the repo.
The rest of the homelab and network devices include:

| Device | Purpose / Notes |
|--------|-----------------|
| HORACO 2.5GbE Managed Switch (8-port 2.5GBASE-T, 10G SFP+ uplink) | Core LAN switching / aggregation |
| PiKVM-A3 (for Raspberry Pi) | Remote KVM management for hardware access |
| Intel N100 / Celeron N5105 fanless mini PC (AliExpress model) | Soft-router / pfSense / ESXi-capable device; 4x Intel i226/i225 2.5G LAN ports, HD/DP; used as edge firewall/router |
| TP-LINK EAP670 AX5400 Ceiling Mount WiFi 6 Access Point | Wireless coverage (ceiling mount) |
| PoE Cameras | reolink_rlc1212a_frontdoor, reolink_rlc520a_upstair, reolink_rlc811a_backdoor, reolink_rlc520a_dining |
| D-Link DES-F1010P-E (10-port PoE switch, 8 PoE + 2 uplink; 93W budget) | Powers PoE cameras / APs |
| TP-Link TL-SG108E (8-Port Easy Smart Switch) | Small L2 management switch for edge/office |
| Device | Purpose |
|--------|---------|
| Mini PC (AliExpress Intel N100 / N5105) | Edge router / firewall appliance |
| TP-LINK EAP670 | Primary wireless AP (ceiling mount) |
| HORACO 2.5GbE switch | Primary LAN switch (2.5G uplinks) |
| TP-Link TL-SG108E | Small managed L2 switch for office/VLANs |
| D-Link DES-F1010P-E | PoE switch for cameras / APs (93W power budget) |
| UPS | TBD (planned) |
Assumptions: these are rough, conservative estimates for typical (idle/average) and peak loads. Real consumption depends on workload, drive spin-up, PoE loads, and UPS efficiency. I assume a UPS/inverter efficiency of ~90% for mains draw calculations.
| Item | Typical (W) | Peak (W) | Notes |
|------|-------------|----------|-------|
| First server (E-2246G, 64GB, NVMe + SATA SSD + 2x HDD) | 120 | 230 | Typical under light/medium home-lab load; peak under full CPU + disk IO |
| HORACO 2.5GbE managed switch (8-port + SFP+) | 25 | 30 | Fanless small managed switch |
| PiKVM-A3 (Raspberry Pi) | 5 | 6 | Low-power remote KVM board |
| Intel N100 / N5105 mini PC (edge router) | 20 | 35 | Typical for a fanless mini PC; spikes under load/VPN/ESXi usage |
| TP-LINK EAP670 AP | 12 | 20 | PoE-powered AP; radios active under client load |
| D-Link DES-F1010P-E (PoE switch), switch chassis | 15 | 20 | Switch chassis overhead (PoE loads counted separately) |
| Cameras (4x Reolink, listed models), total | 32 | 40 | ~8W each typical, up to ~10W peak each |
| TP-Link TL-SG108E | 6 | 8 | Small L2 switch |
Totals:

- Total typical IT load: 235 W
- Total peak IT load: 389 W

Accounting for UPS/inverter inefficiency (~90%):

- Mains draw (typical) ≈ 235 W / 0.9 ≈ 261 W (0.261 kW)
- Mains draw (peak) ≈ 389 W / 0.9 ≈ 432 W (0.432 kW)

Estimated energy consumption:

- Daily (typical): 0.261 kW × 24 h ≈ 6.26 kWh/day
- Monthly (30 d, typical): ≈ 187.9 kWh/month
Notes & next steps:
- These are estimates to help size UPS, breakers, and monthly energy costs. If you want, I can: compute estimated monthly cost for your local electricity rate, produce a per-device YAML inventory with the wattages, or create a small PowerPoint/CSV for UPS sizing and redundancy.
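As a starting point for that per-device inventory, a minimal YAML sketch using the wattages from the table above (the file name and field names are arbitrary):

```yaml
# power-inventory.yaml -- illustrative; values copied from the estimates table above
ups_efficiency: 0.9
devices:
  - { name: first-server,        typical_w: 120, peak_w: 230 }
  - { name: horaco-2.5g-switch,  typical_w: 25,  peak_w: 30 }
  - { name: pikvm-a3,            typical_w: 5,   peak_w: 6 }
  - { name: n100-mini-pc-router, typical_w: 20,  peak_w: 35 }
  - { name: tplink-eap670-ap,    typical_w: 12,  peak_w: 20 }
  - { name: dlink-des-f1010p-e,  typical_w: 15,  peak_w: 20 }
  - { name: reolink-cameras-x4,  typical_w: 32,  peak_w: 40 }
  - { name: tplink-tl-sg108e,    typical_w: 6,   peak_w: 8 }
```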
This repository includes several Taskfile tasks to automate infrastructure provisioning and cluster configuration:

- init: initializes the Terraform working directory for the Proxmox infrastructure. Runs `terraform init` in `infrastructure/terraform/proxmox`.

  ```sh
  task talos:init
  ```

- apply: applies the Terraform configuration to provision/update the Proxmox infrastructure. Runs `terraform apply -auto-approve` in `infrastructure/terraform/proxmox`.

  ```sh
  task talos:apply
  ```

See `.taskfiles/talos/Taskfile.yaml` for more details and additional tasks.
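For reference, a minimal sketch of what those two tasks could look like, assuming a standard Task layout (the real `.taskfiles/talos/Taskfile.yaml` may differ):

```yaml
# Illustrative sketch of .taskfiles/talos/Taskfile.yaml -- not the exact contents
version: "3"

vars:
  TF_DIR: "{{.ROOT_DIR}}/infrastructure/terraform/proxmox"

tasks:
  init:
    desc: Initialize the Terraform working directory for Proxmox
    dir: "{{.TF_DIR}}"
    cmds:
      - terraform init

  apply:
    desc: Provision/update the Proxmox infrastructure
    dir: "{{.TF_DIR}}"
    cmds:
      - terraform apply -auto-approve
```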
Big shout out to the original flux-cluster-template and the Home Operations Discord community.
Be sure to check out kubesearch.dev for ideas on how to deploy applications or get ideas on what you may deploy.
See my awful commit history
The archives folder was removed on Aug 10 14:20:50.
See LICENSE