diff --git a/README.md b/README.md
index 322234d..70df9ab 100644
--- a/README.md
+++ b/README.md
@@ -1,81 +1,115 @@
 # Homelab Infrastructure

-A collection of self-hosted services running on Docker containers, orchestrated through Portainer and exposed via Traefik reverse proxy.
+Self-hosted services running on a single-node Talos Kubernetes cluster, provisioned via Terraform on Proxmox and managed through Flux CD GitOps.

 ## Architecture

-This homelab uses a stack-based approach where each service is containerized and deployed as a complete stack with its dependencies. All services integrate with a centralized Traefik instance for SSL termination and domain routing.
-
-### Stack Structure
 ```
-docker/stacks//
- - docker-compose.yaml # Service definition
- - stack.env # Environment template (tracked)
- - stack.env.real # Actual values with secrets (gitignored)
+Proxmox (hypervisor)
+└── Talos Linux VM (Kubernetes node)
+    └── Flux CD (GitOps)
+        ├── config → cluster-wide variables & secrets
+        ├── infrastructure → Traefik, cert-manager, Authelia, MetalLB, NFS, ...
+        └── apps → application workloads
+```
+
+### Repository Layout
+
+```
+homelab-v2/
+├── terraform/ # Proxmox VM + Talos cluster provisioning
+└── kubernetes/ # Flux CD manifests (Kustomize + Helm)
+    ├── config/
+    ├── flux-system/
+    ├── infrastructure/
+    │   ├── controllers/ # Traefik, cert-manager, Authelia, MetalLB, ...
+    │   └── configs/ # ClusterIssuer, MetalLB config
+    ├── app/
+    │   ├── archmirror/
+    │   ├── external/ # External service vars (e.g. Home Assistant)
+    │   ├── grocy/
+    │   ├── homepage/
+    │   ├── immich/
+    │   ├── jellyfin/
+    │   ├── lubelogger/
+    │   ├── media/
+    │   ├── paperless/
+    │   ├── pihole/
+    │   └── podsync/
+    └── docs/
+        └── k8s-service-spec.md
 ```

 ## Services

-| Service | Description | Purpose |
-|---------|-------------|---------|
-| **Immich** | Self-hosted photo and video management | Personal media library with ML features |
-| **Paperless-ngx** | Document management system with OCR | Digital document archive and search |
-| **Media Stack** | Sonarr, Radarr, Prowlarr, qBittorrent | Automated media acquisition and management |
-| **Pi-hole** | DNS sinkhole with ad blocking and dnscrypt-proxy | Network-wide ad blocking and encrypted DNS |
-| **Arch Mirror** | Local Arch Linux package repository mirror | Local package cache for faster updates |
+| Service | Description |
+|---------|-------------|
+| **Immich** | Photo and video management with face recognition |
+| **Jellyfin** | Media streaming with Intel GPU hardware transcoding |
+| **Media Stack** | Sonarr, Radarr, Prowlarr, qBittorrent — automated media acquisition |
+| **Paperless-ngx** | Document management with OCR |
+| **Pi-hole** | DNS sinkhole with ad blocking and encrypted DNS via dnscrypt-proxy |
+| **Grocy** | Pantry and grocery management |
+| **LubeLogger** | Vehicle maintenance tracker |
+| **Homepage** | Dashboard aggregator |
+| **Podsync** | Podcast downloader |
+| **Archmirror** | Local Arch Linux package repository mirror |
+
+## Infrastructure Stack
+
+| Component | Role |
+|-----------|------|
+| **Flux CD** | GitOps controller — reconciles this repo to the cluster |
+| **Traefik** | Ingress controller with Let's Encrypt TLS |
+| **cert-manager** | TLS certificate provisioning (Cloudflare DNS-01) |
+| **Authelia** | SSO / OIDC provider for protected services |
+| **MetalLB** | Bare-metal load balancer |
+| **NFS Provisioner** | Dynamic PVC provisioning backed by Synology NAS |
+| **Intel GPU Plugin** | Hardware transcoding device plugin (Jellyfin) |
+| **SOPS + age** | Secret encryption at rest |
+
+### Storage
+
+- **Synology NAS** — primary storage backend for all services
+  - Dynamic NFS PVCs via `nfs-synology-ssd` storage class
+  - Static NFS PVs for media library and document archives
+- **local-path-provisioner** — node-local storage for SQLite databases
+
+### Backups
+
+Unified strategy using **restic + resticprofile**:
+
+- **Primary**: Synology NAS via `rest-server` container (`${BACKUP_LOCAL_HOST}:8000`)
+- **Secondary**: Backblaze B2 (offsite), synced via `resticprofile copy`
+- PostgreSQL: pg_dump init container → restic
+- SQLite: online backup API → restic
+- Files/media: NFS mount → restic

 ## Deployment

-Services are deployed through **Portainer WebUI**:
+All changes are deployed by pushing to this repository. Flux CD reconciles on every commit.

-1. Access Portainer dashboard
-2. Navigate to Stacks section
-3. Create new stack or update existing
-4. Copy content from `docker-compose.yaml`
-5. Configure environment variables from `stack.env.real`
-6. Deploy stack
+```sh
+# Check reconciliation status
+flux get kustomizations

-### Environment Setup
+# Force reconciliation
+flux reconcile source git flux-system

-For each stack:
-```bash
-cd docker/stacks//
-cp stack.env stack.env.real
-# Edit stack.env.real with actual values
+# Check application status
+kubectl get helmreleases -A
+kubectl get pods -A
 ```

-## Common Operations
-
-### Stack Management
-- Stack status and logs monitored through Portainer WebUI dashboard
-- Updates performed by pulling new images and recreating containers
-
-### Backup Operations
-Each stack includes automated backup services:
-- **Database backups**: Hourly PostgreSQL dumps using postgres-backup-local
-- **File backups**: Scheduled Restic backups to AWS S3 backend
-
-## Network Architecture
-
-- **traefik** (external): Reverse proxy network for SSL termination and routing
-- **service-specific**: Internal networks for each stack (immich, paperless, sonarr, radarr)
-- Services primarily accessed through Traefik with minimal direct port exposure
+For initial cluster bootstrap, see [`kubernetes/README.md`](kubernetes/README.md).

 ## Security

-- All services behind Traefik reverse proxy with Let's Encrypt SSL certificates
-- Environment variables with secrets stored in `*.env.real` files (gitignored)
-- API endpoints protected with HTTP basic authentication where applicable
-- Internal service communication isolated over Docker networks
+- All ingress through Traefik with Let's Encrypt TLS
+- Secrets encrypted with SOPS + age (decrypted at runtime by Flux)
+- SSO via Authelia (OIDC) for user-facing services
+- Per-namespace NetworkPolicies with default-deny + explicit Traefik ingress allow

-## Requirements
+## Provisioning

-- Docker and Docker Compose
-- Portainer CE for stack management
-- Traefik reverse proxy (external dependency)
-- Valid domain names for SSL certificate generation
-
-## Notes
-
-- This repository contains infrastructure definitions only
-- Actual deployment and management handled through Portainer WebUI
+The cluster is provisioned with Terraform (Proxmox + Talos). See [`terraform/README.md`](terraform/README.md).
diff --git a/kubernetes/README.md b/kubernetes/README.md
new file mode 100644
index 0000000..74a7e55
--- /dev/null
+++ b/kubernetes/README.md
@@ -0,0 +1,112 @@
+# Kubernetes Cluster Bootstrap
+
+This covers **phase 2** of the full cluster setup. The two phases are:
+
+1. **Terraform** (`terraform/`) — provisions the Talos VM on Proxmox and bootstraps the Kubernetes control plane. Outputs `kubeconfig` and `talosconfig`.
+2. **Flux CD** (this file) — installs the GitOps controller into the running cluster and points it at this repository. From that point on, everything in `kubernetes/` is reconciled automatically.
+
+If you haven't run Terraform yet, start with [`terraform/README.md`](../terraform/README.md).
+
+## Prerequisites
+
+- `flux` CLI installed
+- AGE private key for SOPS decryption
+- `kubectl` configured with the cluster kubeconfig from Terraform:
+  ```sh
+  cd ../terraform
+  terraform output -json kubeconfig | jq -r '.homelab' > ~/.kube/config
+  ```
+
+## Bootstrap Steps
+
+### 1. Verify cluster access
+
+```sh
+kubectl get nodes
+```
+
+### 2. Bootstrap Flux CD
+
+```sh
+flux bootstrap github \
+  --owner=berezovskyi-oleksandr \
+  --repository=homelab \
+  --branch=homelab-v2 \
+  --path=./kubernetes \
+  --token-auth \
+  --personal
+```
+
+You will be prompted for a GitHub PAT, or set it beforehand:
+
+```sh
+export GITHUB_TOKEN=
+```
+
+Create a fine-grained PAT scoped to the `homelab` repository with:
+- **Contents**: Read and write
+- **Metadata**: Read-only (granted automatically)
+
+This installs the Flux controllers and creates the `flux-system` namespace.
+
+### 3. Create the SOPS AGE secret
+
+Flux needs the AGE private key to decrypt SOPS-encrypted secrets.
+
+```sh
+kubectl create secret generic sops-age \
+  --namespace=flux-system \
+  --from-file=age.agekey=
+```
+
+### 4. Verify Flux is reconciling
+
+```sh
+flux get kustomizations --watch
+```
+
+All kustomizations should eventually show as `Ready`.
+
+### 5. Troubleshooting
+
+Check Flux controller logs:
+
+```sh
+flux logs
+```
+
+Force a reconciliation:
+
+```sh
+flux reconcile source git flux-system
+flux reconcile kustomization flux-system
+```
+
+## Changing the Target Branch
+
+To point Flux at a different branch (e.g. after merging `homelab-v2` into `master`):
+
+1. Merge the branch as usual via a PR.
+2. Re-run `flux bootstrap` with the new `--branch` value:
+
+```sh
+flux bootstrap github \
+  --owner=berezovskyi-oleksandr \
+  --repository=homelab \
+  --branch=master \
+  --path=./kubernetes \
+  --token-auth \
+  --personal
+```
+
+This updates both the `GitRepository` resource in the cluster and the `flux-system/gotk-sync.yaml` file committed to the repo. No manual `kubectl patch` needed.
+
+## Reconciliation Order
+
+Flux applies resources in dependency order:
+
+1. **config** — Cluster-wide variables and encrypted secrets
+2. **infrastructure-controllers** — Traefik, cert-manager, Authelia, MetalLB, NFS provisioner, Intel GPU plugin (depends on config)
+3. **infrastructure-configs** — ClusterIssuer, MetalLB config (depends on infrastructure-controllers)
+4. **external-vars** — External service variables (e.g. Home Assistant)
+5. **apps** — All application workloads (depends on config + infrastructure-configs + external-vars)
diff --git a/terraform/README.md b/terraform/README.md
new file mode 100644
index 0000000..91464a8
--- /dev/null
+++ b/terraform/README.md
@@ -0,0 +1,68 @@
+# Terraform — Cluster Provisioning
+
+Provisions a Talos Linux VM on Proxmox and bootstraps the Kubernetes control plane.
+
+## What It Does
+
+1. Downloads the Talos ISO to Proxmox local storage
+2. Creates a VM per entry in `var.clusters` (UEFI, SCSI disk, host CPU passthrough)
+3. Generates Talos machine secrets and applies the machine configuration
+4. Bootstraps the Talos cluster and waits for health check
+5. Outputs `kubeconfig` and `talosconfig` for cluster access
+
+## Providers
+
+| Provider | Version |
+|----------|---------|
+| `bpg/proxmox` | 0.95.0 |
+| `siderolabs/talos` | 0.10.1 |
+
+## Variables
+
+Configured via `terraform.tfvars` (gitignored):
+
+| Variable | Description |
+|----------|-------------|
+| `proxmox_endpoint` | Proxmox API URL (e.g. `https://pve:8006`) |
+| `proxmox_api_token` | Proxmox API token (`user@realm!token=secret`) |
+| `clusters` | Map of cluster definitions (see below) |
+
+Each entry in `clusters`:
+
+```hcl
+clusters = {
+  homelab = {
+    cores        = 8
+    memory       = 16384
+    disk_size_gb = 100
+    hostname     = "talos.example.com"
+    mac_address  = "BC:24:11:xx:xx:xx"
+    ip_address   = "192.168.1.x"
+    datastore_id = "local-lvm"
+  }
+}
+```
+
+## Usage
+
+```sh
+terraform init
+terraform apply
+
+# Write kubeconfig
+terraform output -json kubeconfig | jq -r '.homelab' > ~/.kube/config
+
+# Write talosconfig
+terraform output -json talosconfig | jq -r '.homelab' > ~/.talos/config
+```
+
+## Notes
+
+- The Talos ISO resource has `prevent_destroy = true` to avoid accidental re-download
+- Control plane node has `allowSchedulingOnControlPlanes = true` (single-node cluster)
+- State files (`terraform.tfstate`, `terraform.tfstate.backup`, `terraform.tfvars`, `talosconfig`) are gitignored
+
+## Next Steps
+
+Once `terraform apply` completes and you have a working kubeconfig, proceed to
+[`kubernetes/README.md`](../kubernetes/README.md) to bootstrap Flux CD onto the cluster.
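
A possible refinement to the Usage commands in `terraform/README.md`, offered as a sketch and not as something this change contains: the redirect to `~/.talos/config` fails if that directory does not exist yet, and the redirect to `~/.kube/config` overwrites any kubeconfig already there. The helper name `write_private` and the `~/.kube/homelab.yaml` path below are assumptions made for illustration.

```sh
# Hypothetical helper (not part of the repository): write stdin to a file
# that only the current user can read, creating the parent directory first.
write_private() {
  target=$1
  mkdir -p "$(dirname "$target")"
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077 && cat > "$target" )
  # Tighten permissions even if the file already existed with looser ones.
  chmod 600 "$target"
}

# Assumed usage, with the `homelab` key taken from the Usage section:
#   terraform output -json kubeconfig  | jq -r '.homelab' | write_private "$HOME/.kube/homelab.yaml"
#   terraform output -json talosconfig | jq -r '.homelab' | write_private "$HOME/.talos/config"
#   export KUBECONFIG="$HOME/.kube/homelab.yaml"
```

Keeping the cluster credentials in a dedicated file and selecting it through `KUBECONFIG` leaves any other contexts in `~/.kube/config` untouched.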