
solanyn/home-ops


🚀 My Home Operations Repository 🚧

... managed with Flux, Renovate, and GitHub Actions 🤖


💡 Overview

This repository shows a production Kubernetes cluster running on bare metal with ML/MLOps tooling, data engineering platforms and automated operations.

  • Infrastructure as Code (IaC) with declarative configuration management
  • GitOps deployment workflows with automated reconciliation
  • Security through SOPS-encrypted and 1Password-synced secrets plus zero-trust networking with Cilium
  • Observability through Prometheus metrics, VictoriaLogs log aggregation and Alertmanager alerting
  • Automated operations including dependency management and backup orchestration

🌱 Architecture

The cluster runs on Talos Linux, a security-hardened operating system designed for Kubernetes:

  • Bare metal deployment with immutable infrastructure
  • Semi-hyper-converged architecture combining compute and storage resources
  • Distributed storage using Rook-Ceph for persistent volumes
  • Networking with Cilium CNI, BGP load balancing and Kubernetes Gateway API
  • AI gateway integration with Envoy proxy for model routing
  • GitOps automation with FluxCD and dependency management

There is a template at onedr0p/cluster-template if you want to follow along with some of the practices used here.

ML/MLOps Platform

  • kubeflow: Complete ML platform with pipelines, notebooks, model serving and experiment tracking
  • kserve: Production model serving with autoscaling, multi-framework support and Envoy AI Gateway integration
  • ray: Distributed computing for ML workloads and hyperparameter tuning
  • feast: Feature store for ML feature management and serving
  • label-studio: Data annotation platform for ML dataset preparation
  • mlflow: MLOps platform for experiment tracking, model registry and deployment
  • katib: Hyperparameter tuning and neural architecture search

Analytics & Data Engineering

  • spark: Big data processing engine for large-scale analytics
  • dask: Parallel computing library for scalable data science
  • trino: Distributed SQL engine for analytics across data sources
  • kafka: Event streaming platform for real-time data processing
  • flink: Stream processing for real-time analytics and ML inference
  • lakekeeper: Apache Iceberg REST catalog for data lakehouse operations

Infrastructure Components

  • envoy: API gateway with Kubernetes Gateway API implementation and AI model routing
  • istio: Service mesh for traffic management, security and observability
  • knative: Serverless workload deployment with scale-to-zero capabilities
  • rook-ceph: Cloud-native distributed storage with Ceph orchestration
  • openebs: Container-native storage with local PV provisioning
  • volsync: Automated backup orchestration with cross-cluster replication
  • kopia: Fast and secure backup/restore with encryption and deduplication
  • spegel: Cluster-local OCI registry mirror that lets nodes pull images from each other, speeding up image pulls
  • external-dns: Multi-zone DNS automation with split-horizon configuration

Security & Compliance

  • cert-manager: Automated TLS certificate lifecycle management
  • external-secrets: Centralised secret management with 1Password Connect integration
  • sops: Git-committed encrypted secrets for declarative secret management
  • cilium: Zero-trust networking with eBPF-based security policies
  • oauth2-proxy: Authentication proxy for securing applications with SSO
  • dex: Identity provider with OIDC and OAuth2 support
  • kyverno: Policy management for security and governance enforcement

DevOps & Development Infrastructure

  • actions-runner-controller: Self-hosted CI/CD runners for secure pipeline execution
  • cloudflared: Zero-trust access tunnels for secure ingress
  • headlamp: User-friendly Kubernetes web UI for cluster management
  • keda: Event-driven autoscaling for Kubernetes workloads

Data Storage & Databases

  • cloudnative-pg: PostgreSQL operator for production database workloads
  • percona-xtradb-cluster: Highly available MySQL cluster with synchronous replication
  • dragonfly: High-performance in-memory data store compatible with Redis and Memcached
  • minio: S3-compatible object storage for unstructured data

Observability & Monitoring

  • prometheus: Metrics collection and alerting via kube-prometheus-stack
  • grafana: Visualisation and dashboarding for metrics and logs
  • victoria-logs: High-performance log aggregation and search
  • fluent-bit: Lightweight log forwarding and processing
  • gatus: Health monitoring and status page generation
  • blackbox-exporter: External endpoint monitoring and probing
  • kromgo: Exposes Prometheus query results as badges for status displays

GitOps Implementation

Flux provides declarative cluster management through Git-based state reconciliation:

  • Hierarchical resource organisation with dependency-aware deployment ordering
  • Multi-tenant namespace isolation with RBAC boundary enforcement
  • Automated reconciliation with drift detection and self-healing capabilities
  • Release management through Git-based promotion workflows
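At its core, drift detection compares what Git declares with what the cluster is running. The sketch below is a toy model of that diff step only, not Flux's implementation (Flux itself uses Kubernetes server-side apply and records the objects it applied in the Kustomization's status inventory, pruning only when `spec.prune` is enabled). The object keys, the `default` namespace and the spec dicts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Actions a reconciler would take to make the cluster match Git."""
    create: list[str]
    update: list[str]
    prune: list[str]


def plan_reconcile(desired: dict[str, dict], live: dict[str, dict], prune: bool = True) -> Plan:
    """Diff desired state (rendered from Git) against live cluster state.

    Keys are hypothetical "Kind/namespace/name" strings. `live` should hold only
    objects this reconciler applied earlier, so unrelated resources are never pruned.
    """
    create = sorted(key for key in desired if key not in live)
    # Drift: the object exists, but its live spec no longer matches Git.
    update = sorted(key for key in desired if key in live and live[key] != desired[key])
    # Objects deleted from Git are removed only when pruning is enabled.
    stale = sorted(key for key in live if key not in desired) if prune else []
    return Plan(create=create, update=update, prune=stale)


desired = {
    "Deployment/default/atuin": {"replicas": 1},
    "Service/default/atuin": {"port": 8888},
}
live = {
    "Deployment/default/atuin": {"replicas": 3},  # scaled by hand: drift
    "ConfigMap/default/legacy": {},               # no longer in Git
}
plan = plan_reconcile(desired, live)
print(plan)
```

Self-healing then means applying `plan.create` and `plan.update` from Git on every interval, so manual changes such as the hand-scaled Deployment are reverted.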

Renovate automates dependency management across the entire repository: it opens pull requests for new versions, and once a pull request is merged, Flux rolls the update out to the cluster, so security patches are applied continuously.

Directories

The kubernetes directory of this repository is organised as follows.

πŸ“ kubernetes
β”œβ”€β”€ πŸ“ apps           # applications
β”œβ”€β”€ πŸ“ components     # commonly reused components e.g., status monitoring templates + volsync backed pvc
└── πŸ“ flux           # flux system configuration

Dependency Management

Applications deploy in dependency order based on infrastructure requirements, preventing race conditions.

graph TD
    A>Kustomization: rook-ceph] -->|Creates| B[HelmRelease: rook-ceph]
    A>Kustomization: rook-ceph] -->|Creates| C[HelmRelease: rook-ceph-cluster]
    C>HelmRelease: rook-ceph-cluster] -->|Depends on| B>HelmRelease: rook-ceph]
    D>Kustomization: atuin] -->|Creates| E(HelmRelease: atuin)
    E>HelmRelease: atuin] -->|Depends on| C>HelmRelease: rook-ceph-cluster]
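The ordering in the diagram above is a topological sort of the dependency graph. A minimal sketch using Python's standard `graphlib` module, assuming the three HelmReleases from the diagram are the whole graph (in Flux the edges are declared with `spec.dependsOn`), groups releases into "waves" that could reconcile in parallel:

```python
from graphlib import TopologicalSorter

# Each release maps to the releases it depends on, mirroring the diagram.
DEPENDS_ON: dict[str, set[str]] = {
    "rook-ceph": set(),
    "rook-ceph-cluster": {"rook-ceph"},
    "atuin": {"rook-ceph-cluster"},
}


def deployment_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group releases into waves; releases in one wave do not depend on
    each other and could therefore reconcile in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a loop
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(deployment_waves(DEPENDS_ON))
# [['rook-ceph'], ['rook-ceph-cluster'], ['atuin']]
```

A hypothetical second app that also depended on rook-ceph-cluster would land in the same wave as atuin, since neither waits on the other.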

😢 Hybrid Cloud Strategy

The setup maximises self-hosted infrastructure whilst using cloud services where appropriate.

| Service | Use | Cost (AUD) |
|---|---|---|
| 1Password | Secrets with External Secrets | ~$50/yr |
| Cloudflare | Domains and S3-compatible object storage | ~$30/yr |
| GitHub | Hosting this repository and continuous integration/deployments | Free |
| Pushover | Kubernetes alerts and application notifications | $5 one-time |
| healthchecks.io | Monitoring internet connectivity and external-facing applications | Free |

Total: ~$7/mo
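The monthly total follows from spreading the yearly subscriptions over twelve months. A small sketch, assuming the one-off Pushover fee is left out of the recurring figure (amortising it would barely change the result):

```python
# Recurring costs in AUD per year, copied from the table above.
# Pushover's $5 is a one-off payment, so it is not included here.
ANNUAL_COSTS_AUD = {"1Password": 50, "Cloudflare": 30}


def monthly_cost(annual_costs: dict[str, float]) -> float:
    """Spread yearly subscriptions evenly across 12 months."""
    return sum(annual_costs.values()) / 12


print(f"~${monthly_cost(ANNUAL_COSTS_AUD):.2f}/mo")  # ~$6.67/mo, i.e. roughly $7
```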

🌎 DNS Architecture

The cluster implements automated split-horizon DNS across multiple zones:

  • Internal zone management via UniFi controller integration using webhook providers
  • Public DNS automation with Cloudflare API integration
  • Traffic segmentation through ingress class-based routing (internal/external)
  • Zero-touch operations with automatic record lifecycle management

This pattern enables secure service exposure whilst maintaining internal network isolation.
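The routing rule behind the split can be sketched as a small function. This is a toy model, not external-dns's code or this repository's configuration: the class names `internal`/`external`, the zone keys, and the choice to also publish external hostnames internally (so LAN clients get an internal answer) are all assumptions, and the example.com hostnames are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    hostname: str
    route_class: str  # hypothetical values: "internal" or "external"


def plan_records(routes: list[Route]) -> dict[str, list[str]]:
    """Decide which DNS zone(s) publish each hostname under an assumed
    split-horizon policy: internal routes never reach public DNS."""
    zones: dict[str, list[str]] = {"unifi": [], "cloudflare": []}
    for route in routes:
        if route.route_class == "internal":
            zones["unifi"].append(route.hostname)
        elif route.route_class == "external":
            zones["cloudflare"].append(route.hostname)  # public answer
            zones["unifi"].append(route.hostname)       # LAN answer
        else:
            raise ValueError(f"unknown route class: {route.route_class!r}")
    return zones


zones = plan_records([
    Route("grafana.example.com", "internal"),
    Route("status.example.com", "external"),
])
print(zones)
# {'unifi': ['grafana.example.com', 'status.example.com'], 'cloudflare': ['status.example.com']}
```

Keying the decision on the route's class keeps the public zone limited to hostnames that were explicitly marked external.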


⚙ Hardware

| Device | OS Disk | Data Disk | Memory | OS | Function |
|---|---|---|---|---|---|
| Dell OptiPlex 7050 | Samsung PM991 256GB | Samsung PM863 960GB | 32GB | Talos | Kubernetes |
| Dell OptiPlex 7060 | Samsung PM991 256GB | Samsung PM863 960GB | 32GB | Talos | Kubernetes |
| Dell OptiPlex 7060 | Samsung PM991 256GB | Samsung PM863 960GB | 32GB | Talos | Kubernetes |
| NAS (repurposed PC) | 512GB | 1x 12TB (ZFS) | 16GB | TrueNAS SCALE | NFS + backup server |
| UniFi UCG Ultra | - | - | - | - | Router |
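The table also allows a back-of-the-envelope estimate of Ceph capacity. This sketch rests on assumptions the README does not state: that each node's 960GB PM863 serves as a Rook-Ceph OSD and that pools keep three replicas. The 0.85 fill ratio is Ceph's default nearfull warning threshold, and BlueStore metadata overhead is ignored, so treat the result as an upper bound.

```python
def usable_capacity_gb(osd_sizes_gb: list[float], replicas: int, fill_ratio: float = 0.85) -> float:
    """Estimate usable space in a replicated Ceph pool.

    Every object is stored `replicas` times, so raw capacity is divided by
    the replica count; `fill_ratio` keeps headroom below the nearfull warning.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0 < fill_ratio <= 1:
        raise ValueError("fill_ratio must be in (0, 1]")
    return sum(osd_sizes_gb) / replicas * fill_ratio


# Assumption: one 960GB data disk per Talos node acts as an OSD.
nodes = [960, 960, 960]
print(f"{usable_capacity_gb(nodes, replicas=3):.0f} GB")  # 816 GB
```

With three-way replication, adding disks raises capacity by a third of their raw size, which is the trade-off for tolerating the loss of a node.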

πŸ™ Gratitude and Thanks

Thanks to all the people who donate their time to the Home Operations Discord community. Be sure to check out kubesearch.dev for ideas on what to deploy and how to deploy it.
