tsmetrics

A comprehensive Tailscale Prometheus exporter that combines API metadata with live device metrics for complete network observability.


🚀 Features

  • Dual Data Sources: Combines Tailscale REST API metadata with live device client metrics
  • Comprehensive Metrics: Device status, network traffic, routing configuration, and health monitoring
  • tsnet Integration: Optional Tailscale network integration for secure internal access
  • Concurrent Scraping: Configurable parallel device metrics collection
  • Production Ready: Docker/Kubernetes deployments with proper health checks
  • Memory Efficient: Automatic cleanup of stale device metrics
  • Modern Go Architecture: Standard Go project structure with clear package boundaries

Installation

Binary Installation

  1. Download release binary:

    # Download latest release
    curl -L https://github.com/sbaerlocher/tsmetrics/releases/latest/download/tsmetrics-linux-amd64 -o tsmetrics
    chmod +x tsmetrics
    ./tsmetrics
  2. Build from source:

    git clone https://github.com/sbaerlocher/tsmetrics
    cd tsmetrics
    make build
    ./bin/tsmetrics
  3. Configure environment:

    cp .env.example .env
    # Edit .env with your Tailscale credentials
  4. Verify installation:

    curl http://localhost:9100/metrics
    curl http://localhost:9100/health

Docker Installation

Standalone mode:

docker run -d \
  --name tsmetrics \
  -e OAUTH_CLIENT_ID=your_client_id \
  -e OAUTH_CLIENT_SECRET=your_client_secret \
  -e TAILNET_NAME=your-company \
  -p 9100:9100 \
  ghcr.io/sbaerlocher/tsmetrics:latest

tsnet mode (recommended for production):

docker run -d \
  --name tsmetrics \
  -e USE_TSNET=true \
  -e TSNET_HOSTNAME=tsmetrics \
  -e TSNET_TAGS=exporter \
  -e OAUTH_CLIENT_ID=your_client_id \
  -e OAUTH_CLIENT_SECRET=your_client_secret \
  -e TAILNET_NAME=your-company \
  -v tsnet-state:/tmp/tsnet-state \
  ghcr.io/sbaerlocher/tsmetrics:latest

Quick Start

Prerequisites

  1. Tailscale account with API access
  2. OAuth2 client credentials from Tailscale Admin Console
  3. Target devices with client metrics enabled (tailscale set --metrics-listen-addr=0.0.0.0:5252)

Basic Setup

  1. Clone and build:

    git clone https://github.com/sbaerlocher/tsmetrics
    cd tsmetrics
    make build
  2. Configure environment:

    cp .env.example .env
    # Edit .env with your Tailscale credentials
  3. Run standalone:

    make run
  4. Verify metrics:

    curl http://localhost:9100/metrics
    curl http://localhost:9100/health

Project Structure

tsmetrics/
├── cmd/tsmetrics/          # Application entry point
│   └── main.go
├── internal/               # Private application packages
│   ├── api/               # Tailscale API client
│   ├── config/            # Configuration management
│   ├── errors/            # Error types and handling
│   ├── metrics/           # Metrics collection and definitions
│   └── server/            # HTTP server and handlers
├── pkg/device/            # Public device package
├── scripts/               # Build and development scripts
├── deploy/                # Deployment configurations
│   ├── docker-compose.yaml
│   ├── kubernetes.yaml
│   └── systemd.service
├── .env.example           # Environment configuration template
├── Makefile              # Build and development targets
├── Dockerfile            # Container build configuration
└── bin/                  # Compiled binaries

Package Overview

| Package | Description |
|---|---|
| cmd/tsmetrics | Application entry point and main function |
| internal/api | Tailscale API client with OAuth2 authentication |
| internal/config | Configuration loading and validation |
| internal/errors | Custom error types and error handling |
| internal/metrics | Prometheus metrics definitions and collection |
| internal/server | HTTP server, handlers, and tsnet integration |
| pkg/device | Public device data structures and utilities |


Configuration

Core Settings

| Environment Variable | Description | Default |
|---|---|---|
| OAUTH_CLIENT_ID | Tailscale OAuth2 Client ID | Required |
| OAUTH_CLIENT_SECRET | Tailscale OAuth2 Client Secret | Required |
| TAILNET_NAME | Tailnet name or "-" for default | Required |
| PORT | HTTP server port | 9100 |
| ENV | production/prod binds 0.0.0.0, otherwise 127.0.0.1 | development |
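
For reference, a minimal .env for standalone operation might look like the following sketch (all values are placeholders):

# Required Tailscale API credentials
OAUTH_CLIENT_ID=k123abc...
OAUTH_CLIENT_SECRET=tskey-client-...
TAILNET_NAME=company.ts.net

# Optional server settings
PORT=9100
ENV=production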

tsnet Configuration

| Environment Variable | Description | Default |
|---|---|---|
| USE_TSNET | Enable Tailscale tsnet integration | false |
| TSNET_HOSTNAME | Hostname in tailnet | tsmetrics |
| TSNET_STATE_DIR | Persistent state directory | /tmp/tsnet-tsmetrics |
| TSNET_TAGS | Comma-separated device tags | - |
| TS_AUTHKEY | Auth key for automatic device registration with tags | - |
| REQUIRE_EXPORTER_TAG | Enforce "exporter" tag requirement | false |

Note: To automatically assign tags to the tsnet device, create an auth key in the Tailscale admin console with the desired tags and set TS_AUTHKEY. The TSNET_TAGS variable is used for validation only.
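
For example, a tsnet container that registers itself with tags via an auth key might look like this sketch (the tskey-auth-... value is a placeholder for a key created in the admin console):

docker run -d \
  --name tsmetrics \
  -e USE_TSNET=true \
  -e TSNET_HOSTNAME=tsmetrics \
  -e TSNET_TAGS=exporter \
  -e TS_AUTHKEY=tskey-auth-... \
  -e OAUTH_CLIENT_ID=your_client_id \
  -e OAUTH_CLIENT_SECRET=your_client_secret \
  -e TAILNET_NAME=your-company \
  -v tsnet-state:/tmp/tsnet-state \
  ghcr.io/sbaerlocher/tsmetrics:latest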

Performance Tuning

| Environment Variable | Description | Default |
|---|---|---|
| CLIENT_METRICS_TIMEOUT | Device metrics timeout | 10s |
| MAX_CONCURRENT_SCRAPES | Parallel device scrapes | 10 |
| SCRAPE_INTERVAL | Device discovery interval | 30s |

Advanced Configuration

Logging

# Set log level (debug, info, warn, error)
LOG_LEVEL=info

# Set log format (json, text)
LOG_FORMAT=text

Security

# Enforce exporter tag requirement
REQUIRE_EXPORTER_TAG=true

# Custom metrics port for devices
CLIENT_METRICS_PORT=5252

Development/Testing

# Mock devices for testing
TEST_DEVICES=gateway-1,gateway-2,server-3

# Target specific devices only
TARGET_DEVICES=production-gateway,backup-server

Environment Variables Reference

Required Variables

| Variable | Description | Example |
|---|---|---|
| OAUTH_CLIENT_ID | Tailscale OAuth2 Client ID | k123abc... |
| OAUTH_CLIENT_SECRET | Tailscale OAuth2 Client Secret | tskey-client-... |
| TAILNET_NAME | Your tailnet name or "-" for personal | company.ts.net |

Optional Variables

| Variable | Default | Description |
|---|---|---|
| PORT | 9100 | HTTP server port |
| ENV | development | Environment (production/prod binds 0.0.0.0) |
| USE_TSNET | false | Enable tsnet integration |
| TSNET_HOSTNAME | tsmetrics | Hostname in tailnet |
| TSNET_STATE_DIR | /tmp/tsnet-tsmetrics | Persistent state directory |
| TSNET_TAGS | - | Comma-separated device tags |
| TS_AUTHKEY | - | Auth key for automatic registration |
| REQUIRE_EXPORTER_TAG | false | Enforce "exporter" tag requirement |
| LOG_LEVEL | info | Logging level |
| LOG_FORMAT | text | Log format (text or json) |

Metrics Reference

Device Management (from Tailscale API)

tailscale_device_count
tailscale_device_info{device_id, device_name, online, os, version}
tailscale_device_authorized{device_id, device_name}
tailscale_device_last_seen_timestamp{device_id, device_name}
tailscale_device_user{device_id, device_name, user_email}
tailscale_device_machine_key_expiry{device_id, device_name}
tailscale_device_update_available{device_id, device_name}
tailscale_device_created_timestamp{device_id, device_name}
tailscale_device_external{device_id, device_name}
tailscale_device_blocks_incoming_connections{device_id, device_name}
tailscale_device_ephemeral{device_id, device_name}
tailscale_device_multiple_connections{device_id, device_name}
tailscale_device_tailnet_lock_error{device_id, device_name}

Network Configuration (from Tailscale API)

tailscale_device_routes_advertised{device_id, device_name, route}
tailscale_device_routes_enabled{device_id, device_name, route}
tailscale_device_exit_node{device_id, device_name}
tailscale_device_subnet_router{device_id, device_name}

Network Performance (from device client metrics)

tailscaled_inbound_bytes_total{device_id, device_name, path}
tailscaled_outbound_bytes_total{device_id, device_name, path}
tailscaled_inbound_packets_total{device_id, device_name, path}
tailscaled_outbound_packets_total{device_id, device_name, path}
tailscaled_inbound_dropped_packets_total{device_id, device_name}
tailscaled_outbound_dropped_packets_total{device_id, device_name, reason}
tailscaled_health_messages{device_id, device_name, type}
tailscaled_advertised_routes{device_id, device_name}
tailscaled_approved_routes{device_id, device_name}

Connectivity & Performance (from Tailscale API)

tailscale_device_latency_ms{device_id, device_name, derp_region, preferred}
tailscale_device_endpoints_total{device_id, device_name}
tailscale_device_client_supports{device_id, device_name, feature}
tailscale_device_posture_serial_numbers_total{device_id, device_name}
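
As an illustration, a query along these lines (a sketch, assuming the preferred label carries "true"/"false") charts each device's latency to its preferred DERP region:

avg by (device_name, derp_region) (tailscale_device_latency_ms{preferred="true"})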

Grafana Dashboards

Available Dashboards

1. Tailscale / Overview (deploy/grafana/tsmetrics-overview.json)

  • UID: tsmetrics-overview
  • Network status and health metrics
  • Device count and online status
  • Performance KPIs (latency, availability, bandwidth)
  • Exit nodes and subnet routers
  • Traffic analysis and error rates
  • Service monitoring

2. Tailscale / Device Details (deploy/grafana/tsmetrics-device-details.json)

  • UID: tsmetrics-device-details
  • Individual device metrics and status
  • Connectivity analysis (direct vs DERP)
  • Device-specific performance data
  • Route advertisements and configurations
  • Per-device traffic patterns

Installation

Prerequisites:

  • Grafana instance with Prometheus data source
  • TSMetrics exporter running and configured in Prometheus

Import via Grafana UI:

  1. Go to + → Import
  2. Upload the JSON files from deploy/grafana/
  3. Select your Prometheus data source
  4. Click Import

Import via API:

# Set your Grafana details
GRAFANA_URL="http://your-grafana-instance"
GRAFANA_TOKEN="your-admin-token"

# Import Overview Dashboard
curl -X POST "${GRAFANA_URL}/api/dashboards/db" \
  -H "Authorization: Bearer ${GRAFANA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @deploy/grafana/tsmetrics-overview.json

# Import Device Details Dashboard
curl -X POST "${GRAFANA_URL}/api/dashboards/db" \
  -H "Authorization: Bearer ${GRAFANA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @deploy/grafana/tsmetrics-device-details.json

Dashboard Features

Navigation:

  • Cross-links between dashboards
  • Device filtering in device details dashboard
  • Auto-refresh every 30 seconds

Variables:

  • $datasource: Prometheus data source selector
  • Device and time range filtering

Visual Indicators:

  • Offline devices
  • High error rates
  • Performance degradation
  • Network connectivity issues

Sample Dashboard Queries

Device Count Overview:

sum(tailscale_device_count)

Online vs Offline Devices:

sum by (online) (tailscale_device_info)

Network Traffic by Device:

rate(tailscaled_inbound_bytes_total[5m])
rate(tailscaled_outbound_bytes_total[5m])

Device Health Status:

sum by (device_name, type) (tailscaled_health_messages)

Subnet Router Status:

sum by (device_name) (tailscale_device_subnet_router)

Dashboard Creation

  1. Import Prometheus data source in Grafana
  2. Create dashboard with panels for:
    • Device inventory and status
    • Network traffic heatmaps
    • Health monitoring alerts
    • Route advertisement status
  3. Set up alerts for offline devices or health issues

Monitoring and Alerting

Prometheus Alerting Rules

groups:
  - name: tailscale
    rules:
      - alert: TailscaleDeviceOffline
        expr: tailscale_device_info{online="false"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tailscale device {{ $labels.device_name }} is offline"

      - alert: TailscaleHighPacketLoss
        expr: rate(tailscaled_inbound_dropped_packets_total[5m]) > 100
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High packet loss on {{ $labels.device_name }}"

Deployment

tsmetrics supports modern Kubernetes deployment methods using industry-standard tools. Choose between Helm for template-based deployments or Kustomize for overlay-based configurations.

Helm Chart (Recommended for Production)

Template-based deployment with full lifecycle management:

# Install from OCI registry (recommended)
helm install tsmetrics oci://ghcr.io/sbaerlocher/charts/tsmetrics

# Or install from local chart
helm install tsmetrics deploy/helm

# Install with custom values
helm install tsmetrics oci://ghcr.io/sbaerlocher/charts/tsmetrics \
  --set tailscale.oauthClientId=your-client-id \
  --set tailscale.oauthClientSecret=your-client-secret \
  --set tailscale.tailnetName=your-company

# Or use a values file
helm install tsmetrics oci://ghcr.io/sbaerlocher/charts/tsmetrics -f my-values.yaml

Example values.yaml:

image:
  tag: "v1.0.0"

tailscale:
  oauthClientId: "k123abc..."
  oauthClientSecret: "tskey-client-..."
  tailnetName: "company.ts.net"
  tsnet:
    enabled: true
    hostname: "tsmetrics"
    tags: "exporter"

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi

persistence:
  enabled: true
  size: 2Gi

# External secrets integration
externalSecret:
  enabled: true
  secretName: "my-tailscale-secrets"

# ServiceMonitor for Prometheus Operator
serviceMonitor:
  enabled: true
  interval: 30s

Helm Features:

  • Configurable values via values.yaml
  • Secret management with external secrets support
  • Resource limits and requests
  • Health checks and liveness probes
  • Optional persistence for tsnet state
  • ServiceMonitor for Prometheus Operator
  • OCI registry support

Kustomize (Recommended for GitOps)

Environment-specific deployments with overlay management:

# Development deployment
kubectl apply -k deploy/kustomize/overlays/development

# Production deployment
kubectl apply -k deploy/kustomize/overlays/production

# Preview changes before applying
kubectl kustomize deploy/kustomize/overlays/production

Secret setup (required for Kustomize):

kubectl create secret generic tsmetrics-secrets \
  --from-literal=OAUTH_CLIENT_ID=your-client-id \
  --from-literal=OAUTH_CLIENT_SECRET=your-client-secret \
  --from-literal=TAILNET_NAME=your-company

Kustomize Structure:

  • deploy/kustomize/base/ - Base resources
  • deploy/kustomize/overlays/development/ - Development configuration
  • deploy/kustomize/overlays/production/ - Production configuration with HPA and ServiceMonitor
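
A production overlay of this shape typically carries a kustomization.yaml along the following lines (illustrative sketch only; the file names inside the overlay are hypothetical, the real ones ship in deploy/kustomize/overlays/production/):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base
  - hpa.yaml             # hypothetical: production adds an HPA
  - servicemonitor.yaml  # hypothetical: production adds a ServiceMonitor

patches:
  - path: deployment-patch.yaml  # hypothetical: production resource limits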

Deployment Comparison

| Method | Best For | Pros | Cons |
|---|---|---|---|
| Helm | Production, multi-env | OCI registry, lifecycle management, templating | Learning curve |
| Kustomize | GitOps, environment overlays | Native k8s, patches, no templating | Limited logic |

Deployment Features Comparison

| Feature | Helm | Kustomize Base | Kustomize Dev | Kustomize Prod |
|---|---|---|---|---|
| ServiceMonitor | Optional | ❌ | ❌ | ✅ |
| External Secrets | ✅ | ✅ | ✅ | ✅ |
| HPA | Optional | ❌ | ❌ | ✅ |
| Persistence | Optional | ❌ | ❌ | ✅ |
| Resource Limits | Configurable | Basic | Reduced | Production |

Available Commands

# Build and Test (CI/CD Pipeline Tasks)
make build                    # Build binary with GoReleaser
make test                     # Run test suite
make lint                     # Run Go linting (golangci-lint)

# Container Operations
docker build -t tsmetrics .   # Build container image
make container-test           # Run container structure tests

# Deployment Validation
helm lint deploy/helm                    # Validate Helm chart
helm template tsmetrics deploy/helm      # Test Helm templating
kubectl kustomize deploy/kustomize/overlays/production  # Test Kustomize

# Release Testing (Local)
goreleaser build --snapshot --clean  # Test multi-platform builds
goreleaser check                      # Validate .goreleaser.yaml

Prometheus Configuration

scrape_configs:
  - job_name: 'tailscale-metrics'
    static_configs:
      - targets: ['tsmetrics.tailnet.ts.net:9100']  # tsnet mode
    scrape_interval: 60s
    metrics_path: /metrics
    scrape_timeout: 30s
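
For standalone mode (without tsnet), the same job can instead point at the published port; the target hostname below is a placeholder:

scrape_configs:
  - job_name: 'tailscale-metrics'
    static_configs:
      - targets: ['tsmetrics-host:9100']  # standalone mode, published port 9100
    scrape_interval: 60s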

CI/CD Pipeline

tsmetrics uses a modern, automated CI/CD pipeline built with GitHub Actions for continuous integration, automated releases, and security scanning.

Pipeline Overview

The project uses a single, consolidated workflow (.github/workflows/main.yml) that handles:

  • Continuous Integration: Automated testing, linting, and security scanning
  • Container Registry: Multi-platform Docker builds with automatic pushes to GitHub Container Registry
  • Automated Releases: GoReleaser-powered releases with multi-platform binaries and checksums
  • Security: Vulnerability scanning with Trivy and dependency security checks
  • Quality Assurance: Go linting, container structure tests, and Helm chart validation

Workflow Triggers

# Automatic triggers
on:
  push:
    branches: [main]          # CI on main branch commits
    tags: ['v*']              # Releases on version tags
  pull_request:
    branches: [main]          # CI on pull requests
  schedule:
    - cron: '0 6 * * 1'       # Weekly security scans (Mondays 6 AM UTC)
  workflow_dispatch:          # Manual trigger support

Pipeline Stages

1. Code Quality & Testing

  • Go Linting: Uses golangci-lint with comprehensive rule set
  • Unit Tests: Runs complete test suite with coverage reporting
  • Security Scanning: SAST analysis with CodeQL and dependency scanning

2. Container Build & Security

  • Multi-Platform Builds: Linux AMD64/ARM64 using Docker Buildx
  • Registry Push: Automatic push to ghcr.io/sbaerlocher/tsmetrics
  • Container Security: Trivy vulnerability scanning
  • Structure Testing: Container structure validation with Google's container-structure-test

3. Release Automation

  • GoReleaser: Multi-platform binary builds (Linux, macOS, Windows)
  • Checksums: SHA256 checksums for all release artifacts
  • GitHub Releases: Automated release creation with changelogs
  • Container Tags: Semantic versioning with latest, vX.Y.Z, and vX.Y tags

4. Deployment Validation

  • Helm Linting: Chart validation with helm lint
  • Kustomize Testing: Kubernetes manifest validation
  • Template Rendering: Helm template generation testing

Container Registry

All container images are available from GitHub Container Registry:

# Latest release
docker pull ghcr.io/sbaerlocher/tsmetrics:latest

# Specific version
docker pull ghcr.io/sbaerlocher/tsmetrics:v1.0.0

# Development builds (from main branch)
docker pull ghcr.io/sbaerlocher/tsmetrics:main

Release Process

Releases are fully automated through GoReleaser:

  1. Tag Creation: Push a version tag (e.g., v1.0.0)

    git tag v1.0.0
    git push origin v1.0.0
  2. Automatic Build: Pipeline creates:

    • Multi-platform binaries (Linux/macOS/Windows, AMD64/ARM64)
    • Container images with proper tags
    • SHA256 checksums
    • GitHub release with auto-generated changelog
  3. Artifact Distribution:

    • Binaries available at GitHub Releases
    • Container images pushed to ghcr.io
    • Helm charts published to OCI registry

Security Features

  • Vulnerability Scanning: Weekly scheduled Trivy scans for container vulnerabilities
  • Dependency Updates: Automated security updates via Dependabot
  • SAST Analysis: CodeQL static analysis for Go code
  • Supply Chain Security: SLSA-compliant builds with provenance attestation
  • Secrets Management: No hardcoded secrets, environment-based configuration

Local Pipeline Testing

Test pipeline components locally before pushing:

# Test Go linting (same as CI)
golangci-lint run

# Test Go builds with GoReleaser
goreleaser build --snapshot --clean

# Test container build
docker build -t tsmetrics:test .

# Test container structure
container-structure-test test --image tsmetrics:test --config tests/structure/container-test.yml

# Test Helm chart
helm lint deploy/helm
helm template tsmetrics deploy/helm

# Test Kustomize
kubectl kustomize deploy/kustomize/overlays/production
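
The repository ships its own config at tests/structure/container-test.yml; as a sketch, a minimal container-structure-test config for this image could look like the following (the binary path is an assumption):

schemaVersion: 2.0.0

fileExistenceTests:
  - name: "tsmetrics binary present"
    path: "/tsmetrics"   # assumed install path inside the image
    shouldExist: true

metadataTest:
  exposedPorts: ["9100"]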

Performance & Efficiency

  • Caching: Aggressive Go module and Docker layer caching
  • Parallel Jobs: Independent jobs run concurrently
  • Conditional Execution: Smart job skipping based on changes
  • Optimized Builds: Multi-stage Docker builds with minimal final images

Monitoring & Observability

The pipeline includes comprehensive monitoring:

  • Build Metrics: Duration, success rates, artifact sizes
  • Security Metrics: Vulnerability counts, severity levels
  • Quality Metrics: Test coverage, linting issues
  • Performance Metrics: Build times, cache hit rates

Branch Protection

Main branch is protected with:

  • Required Status Checks: All CI jobs must pass
  • PR Reviews: Code review required before merge
  • No Force Push: History preservation enforced
  • Admin Enforcement: Rules apply to all contributors

For detailed pipeline documentation, see .github/workflows/README.md.

Development

Development Scripts

The scripts/ directory contains build and development scripts:

Script Overview:

  • setup-env.sh: Central environment variable configuration with build metadata
  • start-dev.sh: Development environment with live reload using air
  • build-app.sh: Production build with version metadata

Development Workflow:

# Start development environment (recommended)
make dev                    # Uses scripts/start-dev.sh with live reload

# Build application
make build                  # Uses scripts/build-app.sh

# Run directly
make run                    # Direct go run

# Load environment manually
source scripts/setup-env.sh

Environment Management:

All environment variables are centrally managed in setup-env.sh with:

  1. Default development values
  2. Override via .env file in project root
  3. Override via system environment variables
  4. Build metadata from Makefile variables

Prerequisites

  • Go 1.25+
  • Docker (optional)
  • air for live reload (optional)

Development Workflow

# Setup development environment
cp .env.example .env
# Edit .env with your Tailscale credentials
make dev-deps

# Start development server with live reload
# Environment variables are automatically loaded from .env via scripts/start-dev.sh
make dev

# Alternative development commands:
make dev-tsnet     # Same as dev (alias)
make dev-direct    # Direct go run (no live reload)

# Run tests
make test

# Build and run locally
make build
make run-tsnet

Environment Configuration

The development environment is driven by the scripts/start-dev.sh script, which:

  1. Loads .env file if present (automatically exports variables)
  2. Sets sensible defaults for all configuration options
  3. Ensures consistency between development runs
  4. Manages air installation and execution

You only need to:

  1. Copy the example: cp .env.example .env
  2. Configure credentials: Edit .env with your Tailscale OAuth details
  3. Run development: make dev

All environment variables are managed centrally through scripts/setup-env.sh, eliminating the need to maintain duplicated configurations.

Testing with Mock Devices

For development without real Tailscale credentials:

export TEST_DEVICES="gateway-1,gateway-2,server-3"
make run

Architecture

Operation Flow

tsmetrics operates in two phases:

  1. Device Discovery: Fetches device inventory from Tailscale REST API
  2. Metrics Collection: Concurrently scrapes client metrics from each online device with the "exporter" tag
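
Conceptually, the two phases can be pictured with the following simplified Go sketch (not the project's actual internal API; the device list, the /debug/metrics path, and the hard-coded limits stand in for the API response and the MAX_CONCURRENT_SCRAPES / CLIENT_METRICS_TIMEOUT settings documented above):

package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

type Device struct {
	Name   string
	Online bool
	Tags   []string
}

// fetchDevices stands in for phase 1, the Tailscale REST API inventory call.
func fetchDevices() []Device {
	return []Device{{Name: "gateway-1", Online: true, Tags: []string{"exporter"}}}
}

func hasExporterTag(d Device) bool {
	for _, tag := range d.Tags {
		if tag == "exporter" {
			return true
		}
	}
	return false
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second} // plays the role of CLIENT_METRICS_TIMEOUT
	sem := make(chan struct{}, 10)                    // plays the role of MAX_CONCURRENT_SCRAPES
	var wg sync.WaitGroup

	// Phase 2: concurrently scrape online devices carrying the "exporter" tag.
	for _, d := range fetchDevices() {
		if !d.Online || !hasExporterTag(d) {
			continue
		}
		wg.Add(1)
		sem <- struct{}{} // acquire a concurrency slot
		go func(d Device) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			resp, err := client.Get(fmt.Sprintf("http://%s:5252/debug/metrics", d.Name))
			if err != nil {
				fmt.Println("scrape failed:", err)
				return
			}
			resp.Body.Close()
			fmt.Println("scraped", d.Name)
		}(d)
	}
	wg.Wait()
}

Between cycles, metrics for devices that have dropped out of the inventory are cleaned up, which is what keeps memory usage bounded.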

Security Features

  • OAuth2 Flow: Uses client credentials for secure API access
  • Input Validation: Validates all hostnames to prevent injection attacks
  • Tag-Based Access: Only scrapes devices with the "exporter" tag
  • Rate Limiting: Configurable concurrent scraping limits
  • No Hardcoded Secrets: All credentials via environment variables

Performance Features

  • Connection Pooling: Reuses HTTP connections for efficiency
  • Concurrent Scraping: Parallel device metrics collection
  • Memory Management: Automatic cleanup of stale device metrics
  • Circuit Breaker: Protects against API failures (planned)

High Availability

Load Balancing

# Multiple tsmetrics instances with different hostnames
services:
  tsmetrics-1:
    image: ghcr.io/sbaerlocher/tsmetrics:latest
    environment:
      - TSNET_HOSTNAME=tsmetrics-1

  tsmetrics-2:
    image: ghcr.io/sbaerlocher/tsmetrics:latest
    environment:
      - TSNET_HOSTNAME=tsmetrics-2

Backup and Recovery

# Backup tsnet state
docker run --rm -v tsnet-state:/data -v $(pwd):/backup \
  alpine tar czf /backup/tsnet-backup.tar.gz -C /data .

# Restore tsnet state
docker run --rm -v tsnet-state:/data -v $(pwd):/backup \
  alpine tar xzf /backup/tsnet-backup.tar.gz -C /data

Troubleshooting

Common Issues

OAuth2 Authentication Failed

  • Verify OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET
  • Check that the OAuth client has appropriate scopes
  • Ensure TAILNET_NAME matches your tailnet exactly

No Devices Discovered

  • Confirm API credentials are correct
  • Check that devices are online in Tailscale admin console
  • Verify network connectivity to Tailscale API

Client Metrics Not Available

  • Enable metrics on target devices: tailscale set --metrics-listen-addr=0.0.0.0:5252
  • Ensure devices have the "exporter" tag
  • Check firewall rules allow HTTP access to port 5252

tsnet Authentication Issues

  • First run may require interactive authentication
  • Check tsnet state directory permissions
  • Verify TSNET_TAGS includes required tags

tsnet Startup Messages

  • Messages like "routerIP/FetchRIB: sysctl: cannot allocate memory" are normal internal tsnet logs during startup
  • These are not errors but informational messages from the Tailscale networking layer
  • Initial device scraping errors are expected until tsnet establishes connection
  • Connection typically stabilizes within 10-30 seconds

Debug Mode

Enable debug logging and access debug endpoint:

# Check application status
curl http://localhost:9100/debug

# View detailed logs
docker logs tsmetrics -f

# Enable debug logging
export LOG_LEVEL=debug
make run

# Test specific device
export TEST_DEVICES="specific-device-name"
make run

Performance Troubleshooting

High Memory Usage:

# Monitor memory usage
docker stats tsmetrics

# Reduce concurrent scrapes
export MAX_CONCURRENT_SCRAPES=5

# Increase cleanup frequency
export SCRAPE_INTERVAL=60s

Slow Device Discovery:

# Check API response time
curl -w "%{time_total}" https://api.tailscale.com/api/v2/tailnet/{tailnet}/devices

# Reduce timeout
export CLIENT_METRICS_TIMEOUT=5s

# Target specific devices only
export TARGET_DEVICES=critical-device-1,critical-device-2

Network Troubleshooting

Connection Issues:

# Test device connectivity
telnet device-ip 5252

# Check firewall rules
iptables -L | grep 5252

# Test metrics endpoint
curl http://device-ip:5252/debug/metrics

DNS Resolution:

# Test device hostname resolution
nslookup device-name.tailnet.ts.net

# Check tsnet connectivity
docker exec tsmetrics ping device-name

Container Troubleshooting

Permission Issues:

# Check container user
docker exec tsmetrics id

# Fix volume permissions
docker run --rm -v tsnet-state:/data alpine chown -R 65534:65534 /data

Resource Constraints:

# Increase container limits
docker run --memory=512m --cpus=1.0 ghcr.io/sbaerlocher/tsmetrics:latest

# Monitor resource usage
docker exec tsmetrics top

Migration Guide

Upgrading from v1.x

This project has been restructured to follow Go best practices. If you're upgrading from an older version:

Major Changes in v2.0

  1. Project Structure: Migrated from monolithic to modular structure

    • main.go → cmd/tsmetrics/main.go
    • Split into logical packages under internal/ and pkg/
  2. Package Organization:

    Old Structure → New Structure
    config.go     → internal/config/config.go
    api.go        → internal/api/client.go
    device.go     → pkg/device/device.go
    metrics.go    → internal/metrics/{definitions,collector,scraper,tracker}.go
    server.go     → internal/server/{server,handlers,tsnet}.go
    errors.go     → internal/errors/types.go
    
  3. Build Process: Now uses standard Go project layout

Migration Steps

  1. Backup your current setup:

    # Backup your environment configuration
    cp .env .env.backup
  2. Update to new version:

    git pull origin main
    make build
  3. Verify functionality:

    # Test with existing configuration
    make run
    curl http://localhost:9100/health
  4. Update deployment scripts (if custom):

    • Build commands: Use make build
    • Run commands: Use make run or ./bin/tsmetrics

Compatibility

  • ✅ Configuration: 100% compatible
  • ✅ Metrics: Same Prometheus metrics output
  • ✅ API: Same REST endpoints
  • ✅ Docker: Same container interface
  • ✅ Behavior: Identical runtime behavior

The new structure provides:

  • Better testability with isolated packages
  • Clearer dependencies and module boundaries
  • Improved maintainability
  • Enhanced IDE support
  • Standard Go project conventions

API Reference

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /metrics | GET | Prometheus metrics |
| /health | GET | Health check |
| /debug | GET | Debug information |

Health Check Response

{
  "status": "healthy",
  "timestamp": "2025-01-07T21:30:00Z",
  "version": "v1.0.0",
  "uptime": "2h15m30s",
  "devices_discovered": 15,
  "devices_scraped": 12,
  "last_scrape": "2025-01-07T21:29:45Z"
}

Debug Information

curl http://localhost:9100/debug
{
  "config": {
    "use_tsnet": true,
    "tsnet_hostname": "tsmetrics",
    "max_concurrent_scrapes": 10,
    "client_metrics_timeout": "10s"
  },
  "runtime": {
    "go_version": "go1.24.0",
    "num_goroutines": 25,
    "memory_usage": "45.2MB"
  },
  "metrics": {
    "devices_total": 15,
    "devices_online": 12,
    "scrape_errors": 0,
    "last_api_call": "2025-01-07T21:29:30Z"
  }
}

Advanced Usage

Custom Device Filtering

# Only monitor specific device types
export TARGET_DEVICES="gateway-*,router-*"

# Monitor by tag (requires API support)
export DEVICE_TAGS="production,critical"

# Exclude specific devices
export EXCLUDE_DEVICES="test-device,staging-*"

Integration Examples

Prometheus Configuration with Service Discovery

scrape_configs:
  - job_name: 'tailscale-metrics'
    kubernetes_sd_configs:
      - role: service
        namespaces:
          names:
            - monitoring
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

Grafana Alert Integration

{
  "alert": {
    "name": "Tailscale Device Offline",
    "message": "Device {{ $labels.device_name }} has been offline for > 5 minutes",
    "frequency": "30s",
    "conditions": [
      {
        "query": {
          "queryType": "",
          "refId": "A",
          "model": {
            "expr": "tailscale_device_info{online=\"false\"} == 1",
            "interval": "",
            "legendFormat": "",
            "refId": "A"
          }
        },
        "reducer": {
          "type": "last",
          "params": []
        },
        "evaluator": {
          "params": [1],
          "type": "gt"
        }
      }
    ]
  }
}

Contributing

Development Setup

  1. Fork the repository

  2. Clone your fork:

    git clone https://github.com/yourusername/tsmetrics
    cd tsmetrics
  3. Set up development environment:

    cp .env.example .env
    # Edit .env with your Tailscale credentials
    make dev-deps
  4. Start development server:

    make dev

Making Changes

  1. Create a feature branch:

    git checkout -b feature/your-feature-name
  2. Make your changes

  3. Add tests for new functionality

  4. Ensure all tests pass:

    make test
    make lint
    goreleaser check  # Validate release configuration
  5. Update documentation if needed

Submitting Changes

  1. Commit your changes:

    git add .
    git commit -m "feat: add your feature description"
  2. Push to your fork:

    git push origin feature/your-feature-name
  3. Create a pull request

Code Guidelines

  • Follow Go best practices and idioms
  • Add tests for new functionality
  • Update documentation for user-facing changes
  • Use conventional commit messages (feat:, fix:, docs:, etc.)
  • Ensure code passes all linters and security scans
  • Test changes locally with goreleaser build --snapshot
  • Validate container changes with structure tests

Testing

# Run all tests
make test

# Run with coverage
go test -v -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

# Run integration tests (planned)
make test-integration

License

MIT License - see LICENSE for details.

Changelog

v1.0.0 (2025-01-07)

  • Initial Release: Complete Tailscale Prometheus exporter
  • Modern Go Architecture: Standard Go project structure with clear package boundaries
  • Dual Data Sources: Combines Tailscale REST API metadata with live device client metrics
  • Production Ready: Docker/Kubernetes deployments with proper health checks
  • tsnet Integration: Optional Tailscale network integration for secure internal access
  • Concurrent Scraping: Configurable parallel device metrics collection
  • Security Features: OAuth2 authentication, tag-based access control, input validation
  • CI/CD Pipeline: GitHub Actions with automated releases via GoReleaser
  • Comprehensive Documentation: Complete setup, deployment, and troubleshooting guides

Disclaimer

Trademark Notice: Tailscale is a trademark of Tailscale Inc. This project is not affiliated with, endorsed by, or sponsored by Tailscale Inc.

Legal: This is an independent, community-developed tool that interfaces with Tailscale's public APIs. Use at your own risk.

Support: For Tailscale-related issues, please contact Tailscale Support. For issues specific to this exporter, please use the GitHub Issues.
