kubectl-reboot

A kubectl plugin that safely restarts Kubernetes nodes by draining pods, rebooting via SSH, verifying the reboot, and uncordoning the nodes.

Features

  • 🔄 Safe Node Restart: Automatically cordon, drain, reboot, and uncordon nodes
  • 🚀 SSH Integration: Reboot nodes via SSH with customizable commands
  • 🔍 Reboot Verification: Verify successful reboots by monitoring Boot ID changes
  • 🌐 Cluster-wide Operations: Restart all nodes or specific subsets
  • 🧪 Dry-run Mode: Preview operations without making changes
  • ⚡ Flexible Configuration: Extensive customization options
  • 📋 Rich Logging: Detailed, emoji-rich logging for better visibility

Installation

Option 1: Install via Krew (Recommended)

Krew is the plugin manager for the kubectl command-line tool.

If you haven't installed Krew yet, follow the official installation guide.

Once Krew is installed, install kubectl-reboot:

kubectl krew install reboot

Verify the installation:

kubectl reboot --help

Option 2: Manual Installation

Download Pre-built Binaries

Download the latest release for your platform from the releases page.

Linux/macOS:

# Download the archive for your platform (this example is for Linux on amd64)
curl -LO https://github.com/ayetkin/kubectl-reboot/releases/latest/download/kubectl-reboot-linux-amd64.tar.gz

# Extract
tar -xzf kubectl-reboot-linux-amd64.tar.gz

# Move to PATH
sudo mv kubectl-reboot /usr/local/bin/

# Make executable
sudo chmod +x /usr/local/bin/kubectl-reboot
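
The archive name above is specific to Linux on amd64. Below is a minimal sketch for deriving the name on other machines. It assumes, without confirmation from the project, that release assets follow the same kubectl-reboot-<os>-<arch>.tar.gz pattern. `asset_name` is a hypothetical helper, so check the releases page for the archives that actually exist.

```shell
# Hypothetical helper: map `uname -s` / `uname -m` output to a release archive name.
# ASSUMPTION: assets are named kubectl-reboot-<os>-<arch>.tar.gz, as in the
# linux-amd64 example; verify against the releases page before relying on it.
asset_name() {
  # Arguments are optional and default to the current machine's values.
  os=$(printf '%s' "${1:-$(uname -s)}" | tr '[:upper:]' '[:lower:]')
  case "${2:-$(uname -m)}" in
    x86_64 | amd64) arch=amd64 ;;
    aarch64 | arm64) arch=arm64 ;;
    *) echo "unsupported architecture: ${2:-$(uname -m)}" >&2; return 1 ;;
  esac
  printf 'kubectl-reboot-%s-%s.tar.gz\n' "$os" "$arch"
}
```

With the function defined, the download line could become curl -LO "https://github.com/ayetkin/kubectl-reboot/releases/latest/download/$(asset_name)".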

Build from Source

git clone https://github.com/ayetkin/kubectl-reboot.git
cd kubectl-reboot
make build
sudo cp bin/kubectl-reboot /usr/local/bin/

Usage

Basic Examples

# Restart a single node
kubectl reboot node1

# Restart multiple nodes
kubectl reboot node1 node2 node3

# Restart all worker nodes (excluding control plane)
kubectl reboot --all --exclude-control-plane

# Dry run to see what would happen
kubectl reboot --all --exclude-control-plane --dry-run

# Restart nodes from a file
kubectl reboot --file nodes.txt

# Exclude specific nodes
kubectl reboot --all --exclude-nodes node1,node2
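
The --file flag reads one node name per line. As a sketch of how such a file might be produced, the helper below writes the names of all nodes matching a label selector. `write_nodes_file` and the `pool=batch` label in the usage note are hypothetical and not part of the plugin.

```shell
# Hypothetical helper: write the names of nodes matching a label selector
# to a file, one per line, which is the format --file is documented to read.
write_nodes_file() {
  selector=$1
  outfile=$2
  # `-o name` prints entries such as "node/worker-1"; strip the "node/" prefix.
  kubectl get nodes -l "$selector" -o name | sed 's|^node/||' > "$outfile"
}
```

For example, write_nodes_file 'pool=batch' nodes.txt followed by kubectl reboot --file nodes.txt --dry-run previews a restart of just that pool.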

Advanced Examples

# Custom SSH configuration
kubectl reboot --ssh-user ubuntu --ssh-opts "-i ~/.ssh/my-key" node1

# Custom reboot command
kubectl reboot --reboot-cmd "sudo shutdown -r now" node1

# Custom timeouts
kubectl reboot --timeout-ready 300 --timeout-bootid 600 node1

# Custom SSH host template (useful for cloud providers)
kubectl reboot --ssh-host-template "%s.us-west-2.compute.internal" node1

# Allow uncordon without reboot verification
kubectl reboot --allow-uncordon-without-reboot node1

Configuration Options

| Flag | Short | Default | Description |
|------|-------|---------|-------------|
| --all | | false | Restart all nodes in the cluster |
| --exclude-control-plane | | false | Exclude control plane nodes when using --all |
| --exclude-nodes | | | Comma-separated node names to exclude |
| --file | -f | | Read node names from file (one per line) |
| --ssh-user | -u | root | SSH username |
| --ssh-opts | | See below | SSH connection options |
| --ssh-host-template | | %s | SSH host template (e.g., %s.example.com) |
| --reboot-cmd | | See below | Command to execute for reboot |
| --timeout-ready | | 180 | Timeout waiting for node to become ready (seconds) |
| --timeout-bootid | | 300 | Timeout waiting for boot ID change (seconds) |
| --poll-interval | | 10 | Polling interval (seconds) |
| --allow-uncordon-without-reboot | | false | Allow uncordon even if reboot verification fails |
| --dry-run | | false | Show what would be done without executing |
| --context | | | Kubeconfig context to use |
| --kubeconfig | | $KUBECONFIG | Path to kubeconfig file |
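
The timeout and polling flags describe a poll-until-deadline pattern: check a condition every --poll-interval seconds and give up once the timeout is reached. A generic sketch of that pattern follows. It is not the plugin's implementation, and `wait_until` is a hypothetical helper.

```shell
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds or until TIMEOUT
# seconds have been spent sleeping. Returns 0 on success, 1 on timeout.
# INTERVAL should be at least 1, otherwise a failing COMMAND loops forever.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

With the documented defaults, the boot ID wait would correspond to something like wait_until 300 10 followed by a command that succeeds once the boot ID has changed. A smaller interval notices the change sooner at the cost of more API requests.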

Default Values

  • SSH Options: -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=10
  • Reboot Command: sudo systemctl reboot || sudo reboot
  • Drain Arguments: --ignore-daemonsets --grace-period=30 --timeout=10m --delete-emptydir-data

How It Works

  1. Cordon: Mark the node as unschedulable to prevent new pods
  2. Drain: Evict pods from the node (DaemonSet-managed pods are left in place, per --ignore-daemonsets)
  3. Reboot: Execute reboot command via SSH
  4. Wait: Monitor Boot ID change to verify reboot completion
  5. Ready: Wait for the node to become ready
  6. Uncordon: Mark the node as schedulable again
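
For reference, the six steps can be approximated by hand for a single node. This is a rough sketch built from standard kubectl and ssh commands and the default values listed above, not the plugin's actual code. `boot_id`, `node_ready`, and `manual_reboot` are hypothetical names.

```shell
# Print the node's current boot ID as reported by the kubelet.
boot_id() {
  kubectl get node "$1" -o jsonpath='{.status.nodeInfo.bootID}'
}

# Succeed when the node's Ready condition is "True".
node_ready() {
  [ "$(kubectl get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" = "True" ]
}

# manual_reboot NODE SSH_TARGET [INTERVAL_SECONDS] [MAX_CHECKS]
manual_reboot() {
  node=$1
  ssh_target=$2
  interval=${3:-10}
  tries_left=${4:-30}

  kubectl cordon "$node" || return 1                          # 1. Cordon
  kubectl drain "$node" --ignore-daemonsets --grace-period=30 \
    --timeout=10m --delete-emptydir-data || return 1          # 2. Drain
  before=$(boot_id "$node")
  # 3. Reboot. The SSH session usually drops as the host goes down,
  #    so its exit status is ignored.
  ssh "$ssh_target" 'sudo systemctl reboot || sudo reboot' || true

  # 4 and 5. Wait for a new boot ID and for the node to report Ready.
  while :; do
    now=$(boot_id "$node")
    if [ -n "$now" ] && [ "$now" != "$before" ] && node_ready "$node"; then
      break
    fi
    tries_left=$((tries_left - 1))
    if [ "$tries_left" -le 0 ]; then
      echo "timed out waiting for $node; leaving it cordoned" >&2
      return 1
    fi
    sleep "$interval"
  done

  kubectl uncordon "$node"                                    # 6. Uncordon
}
```

A call such as manual_reboot node1 root@node1 walks one node through the sequence. On timeout the sketch leaves the node cordoned, which mirrors the default of not uncordoning when the reboot cannot be verified.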

Prerequisites

  • A running Kubernetes cluster
  • kubectl configured and authenticated
  • SSH access to target nodes (key-based authentication recommended)
  • Appropriate RBAC permissions for node operations
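
Before a first run, it can help to confirm the SSH prerequisite for every target node. The sketch below tries a non-interactive login to each node and reports the result. `ssh_preflight` is a hypothetical helper, and it leaves host key checking at your SSH defaults, so an unknown host key counts as a failure here.

```shell
# Hypothetical helper: report which nodes accept a non-interactive SSH login.
# Usage: ssh_preflight USER NODE [NODE...]
ssh_preflight() {
  user=$1
  shift
  failed=0
  for node in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # -n keeps ssh from reading stdin; `true` is a harmless remote command.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$user@$node" true; then
      echo "ok     $node"
    else
      echo "FAILED $node"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, ssh_preflight root node1 node2 node3 returns non-zero if any node is unreachable, so it can gate a later kubectl reboot call in a script.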

Required RBAC Permissions

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubectl-reboot
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "delete"]
- apiGroups: ["policy"]
  resources: ["poddisruptionbudgets"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["daemonsets", "replicasets"]
  verbs: ["get", "list"]
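
To check whether the current kubeconfig identity already holds each permission in this ClusterRole, the rules can be walked with kubectl auth can-i. This is a sketch, and `check_reboot_rbac` is a hypothetical helper. The verb and resource pairs are copied from the role above.

```shell
# Hypothetical helper: test every verb/resource pair from the ClusterRole above.
# Returns non-zero if at least one permission is missing.
check_reboot_rbac() {
  missing=0
  while read -r verb resource; do
    # `kubectl auth can-i` exits 0 when the action is allowed.
    if kubectl auth can-i "$verb" "$resource" </dev/null >/dev/null 2>&1; then
      echo "ok      $verb $resource"
    else
      echo "MISSING $verb $resource"
      missing=1
    fi
  done <<'EOF'
get nodes
list nodes
patch nodes
get pods
list pods
delete pods
get poddisruptionbudgets.policy
list poddisruptionbudgets.policy
get daemonsets.apps
list daemonsets.apps
get replicasets.apps
list replicasets.apps
EOF
  return "$missing"
}
```

Running check_reboot_rbac prints one line per permission, which makes it easy to spot what a custom role is missing.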

Cloud Provider Examples

AWS EKS

# Using private IPs with bastion host
kubectl reboot --ssh-user ec2-user \
  --ssh-opts "-i ~/.ssh/eks-key.pem -o ProxyCommand='ssh -i ~/.ssh/bastion.pem ec2-user@bastion-host -W %h:%p'" \
  ip-10-0-1-100

# Using private DNS names
kubectl reboot --ssh-user ec2-user \
  --ssh-host-template "%s.us-west-2.compute.internal" \
  ip-10-0-1-100

Google GKE

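# Reset the instance with gcloud instead of an in-guest reboot
# (the command runs on the node over SSH, so gcloud must be installed and authorized there)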
kubectl reboot --ssh-user $USER \
  --reboot-cmd "gcloud compute instances reset \$(hostname) --zone=us-central1-a" \
  gke-cluster-default-pool-12345678-abcd

Azure AKS

kubectl reboot --ssh-user azureuser \
  --ssh-host-template "%s.cloudapp.azure.com" \
  aks-nodepool1-12345678-vmss000000

Troubleshooting

Common Issues

  1. SSH Connection Failed

    # Test SSH connectivity first
    ssh -o StrictHostKeyChecking=no -o BatchMode=yes user@node

    # Check SSH key permissions
    chmod 600 ~/.ssh/your-key.pem

  2. Boot ID Not Changing

    # Allow the node to be uncordoned even if reboot verification fails
    kubectl reboot --allow-uncordon-without-reboot node1

  3. Pod Eviction Timeout

    # Check for PodDisruptionBudgets that might block eviction
    kubectl get pdb --all-namespaces

  4. RBAC Permission Denied

    # Check your permissions
    kubectl auth can-i get nodes
    kubectl auth can-i patch nodes
    kubectl auth can-i delete pods
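
For the eviction timeout case, the PodDisruptionBudget listing can be narrowed to budgets that currently allow no disruptions, since those are the ones that refuse evictions. This is a sketch using kubectl custom columns and awk. `blocking_pdbs` is a hypothetical helper.

```shell
# Hypothetical helper: list PodDisruptionBudgets that currently allow zero
# voluntary disruptions, printed as namespace/name.
blocking_pdbs() {
  kubectl get pdb --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed |
    awk '$3 == 0 { print $1 "/" $2 }'
}
```

An empty result suggests the timeout has another cause, such as pods that take longer than the grace period to terminate.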

Logs and Debugging

The plugin provides detailed logging with emojis for better visibility:

  • 🚀 Operation start
  • 📋 Configuration details
  • ✅ Successful operations
  • ⚠️ Warnings
  • ❌ Errors
  • 🧪 Dry-run operations

Development

Building

# Build for current platform
make build

# Build for all platforms
make release

# Run tests
make test

# Format and vet code
make check

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests if applicable
  5. Run make check to ensure code quality
  6. Commit your changes (git commit -am 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

Security Considerations

  • Use key-based SSH authentication instead of passwords
  • Limit SSH access to specific users and source IPs
  • Consider using SSH bastion hosts for additional security
  • Review and understand the reboot commands being executed
  • Test in non-production environments first

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • kubectl - The Kubernetes command-line tool
  • Krew - The kubectl plugin manager
  • Kubernetes - The container orchestration platform
