A reusable set of chaos tests for the Solo test network. It can be used to test the resilience and stability of the Solo network by introducing various types of failures and disruptions.
```text
solo-chaos/
├── chaos/                            # Chaos experiments and taskfiles
│   ├── Taskfile.yml                  # Main chaos taskfile (renamed for independence)
│   ├── Taskfile.chaos.network.yml    # Network chaos tasks
│   ├── Taskfile.chaos.pod.yml        # Pod chaos tasks
│   ├── network/                      # Network chaos experiment configs
│   │   ├── consensus-node-bandwidth.yml
│   │   ├── netem-800ms.yml
│   │   ├── netem-ap-melbourne.yml
│   │   ├── netem-eu-london.yml
│   │   └── netem-us-ohio.yml
│   └── pod/                          # Pod chaos experiment configs
│       ├── consensus-node-failure.yml
│       └── consensus-node-kill.yml
├── dev/                              # Development tools and configs
│   ├── taskfile/                     # Task configuration files
│   └── k8s/                          # Kubernetes manifests
└── cmd/                              # Go applications
    └── hammer/                       # Chaos testing tool
```
- Docker Desktop (macOS: ensure at least 32GB RAM and 8 CPU cores configured)
- Helm
- kubectl
- k9s
- Kind
- Task (install via Homebrew: `brew install go-task`)
- solo
- jq (install via Homebrew: `brew install jq`)
```shell
task setup
task deploy-network
task install-chaos-mesh
```
Run the chaos test to kill one of the nodes:
```shell
task chaos:pod:consensus-pod-kill NODE_NAMES=node5
```
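Under the hood this applies a Chaos Mesh `PodChaos` resource. A minimal sketch of what such a kill experiment could look like (the resource name, namespace, and pod label below are illustrative assumptions, not the repo's actual config):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: solo-chaos-pod-kill        # illustrative name
  namespace: chaos-mesh            # assumed namespace
spec:
  action: pod-kill                 # delete the matched pod(s)
  mode: all                        # act on every pod matched by the selector
  selector:
    namespaces:
      - solo                       # Solo network namespace
    labelSelectors:
      'app': 'network-node5'       # assumed label; match it to NODE_NAMES
```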
Run the chaos test to trigger pod failure for some of the nodes:
```shell
task chaos:pod:consensus-pod-failure NODE_NAMES=node5 DURATION=60s
```
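A pod-failure experiment differs from pod-kill in that the pod is made unavailable for a bounded duration rather than deleted. A hedged sketch of the underlying `PodChaos` resource (name and label are illustrative assumptions):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: solo-chaos-pod-failure     # illustrative name
  namespace: chaos-mesh            # assumed namespace
spec:
  action: pod-failure              # make the pod unavailable without deleting it
  mode: all
  duration: '60s'                  # corresponds to the DURATION parameter
  selector:
    namespaces:
      - solo
    labelSelectors:
      'app': 'network-node5'       # assumed label; match it to NODE_NAMES
```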
Run the chaos test to limit network bandwidth:
```shell
task chaos:network:consensus-network-bandwidth NODE_NAMES=node1 RATE=1gbps
```
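Bandwidth limiting uses a Chaos Mesh `NetworkChaos` resource with `action: bandwidth`. A rough sketch under the same assumptions as above (name, label, and the queue/buffer values are illustrative; see `chaos/network/consensus-node-bandwidth.yml` for the actual config):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: solo-chaos-network-bandwidth   # illustrative name
  namespace: chaos-mesh                # assumed namespace
spec:
  action: bandwidth
  mode: all
  selector:
    namespaces:
      - solo
    labelSelectors:
      'app': 'network-node1'           # assumed label; match it to NODE_NAMES
  bandwidth:
    rate: '1gbps'                      # corresponds to the RATE parameter
    limit: 20971520                    # illustrative queue limit (bytes)
    buffer: 10000                      # illustrative burst buffer (bytes)
```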
Run network emulation chaos tests to simulate realistic global network conditions:
```shell
task chaos:network:consensus-network-netem
```
This applies comprehensive network latency emulation with proper round-trip times (RTT) between global regions:
- us ↔ us: 20ms RTT (10ms one-way)
- us ↔ eu: 100ms RTT (50ms one-way)
- us ↔ ap: 200ms RTT (100ms one-way)
- eu ↔ eu: 20ms RTT (10ms one-way)
- eu ↔ ap: 300ms RTT (150ms one-way)
- ap ↔ ap: 20ms RTT (10ms one-way)
- High latency test: 800ms delay
Each regional configuration file contains multiple NetworkChaos resources using the `target` attribute:

- `netem-us-ohio.yml`: US intra-region + US→EU + US→AP latencies
- `netem-eu-london.yml`: EU intra-region + EU→US + EU→AP latencies
- `netem-ap-melbourne.yml`: AP intra-region + AP→US + AP→EU latencies
- `netem-800ms.yml`: High latency testing
The task creates 10 NetworkChaos resources with fixed names (no UUIDs):

US region resources:

- `solo-chaos-network-netem-us-to-us` (10ms)
- `solo-chaos-network-netem-us-to-eu` (50ms)
- `solo-chaos-network-netem-us-to-ap` (100ms)

EU region resources:

- `solo-chaos-network-netem-eu-to-eu` (10ms)
- `solo-chaos-network-netem-eu-to-us` (50ms)
- `solo-chaos-network-netem-eu-to-ap` (150ms)

AP region resources:

- `solo-chaos-network-netem-ap-to-ap` (10ms)
- `solo-chaos-network-netem-ap-to-us` (100ms)
- `solo-chaos-network-netem-ap-to-eu` (150ms)

Test resource:

- `solo-chaos-network-netem-800ms` (800ms)
```yaml
# Cross-region latency (us-to-eu)
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: solo-chaos-network-netem-us-to-eu
spec:
  action: delay
  mode: all
  selector:
    labelSelectors:
      'solo.hedera.com/region': 'us'
  target:
    mode: all
    selector:
      labelSelectors:
        'solo.hedera.com/region': 'eu'
  delay:
    latency: '50ms'  # One-way latency for 100ms RTT
```
Note: Using fixed resource names ensures that subsequent runs replace existing resources rather than creating duplicates, preventing latency accumulation.
Deploy diagnostic pods to test network connectivity and analyze chaos experiment effects:
```shell
# Deploy cluster diagnostics pod (defaults to 'us' region)
task chaos:network:deploy-cluster-diagnostics

# Deploy cluster diagnostics pod with a specific region
task chaos:network:deploy-cluster-diagnostics REGION=eu
task chaos:network:deploy-cluster-diagnostics REGION=ap
task chaos:network:deploy-cluster-diagnostics REGION=us

# Exec into the diagnostics pod for interactive testing
task chaos:network:exec-cluster-diagnostics

# Clean up the diagnostics pod when done
task chaos:network:cleanup-cluster-diagnostics
```
The diagnostics pod includes useful network tools:
- Connectivity testing: `ping`, `traceroute`, `netcat`
- Performance testing: `iperf3` for bandwidth/latency measurement
- Packet analysis: `tcpdump` for network debugging
- DNS testing: `dig`, `nslookup` for DNS resolution
- General utilities: `curl`, `jq` for API testing
Example usage inside the diagnostics pod:
```shell
# Test connectivity to consensus nodes (update service names based on your Solo setup)
ping network-node1.solo.svc.cluster.local

# Measure latency with iperf3
iperf3 -c network-node2.solo.svc.cluster.local

# Check active NetworkChaos experiments
kubectl get networkchaos -n chaos-mesh

# Test HTTP connectivity
curl -I http://network-node3.solo.svc.cluster.local:8080
```
Region Configuration: The cluster-diagnostics pod is deployed in the `solo` namespace with a configurable `solo.hedera.com/region` label (defaults to `us` if not specified). This lets you test network chaos effects from different regional perspectives by deploying the diagnostics pod with the region label that matches your testing scenario.
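As a rough sketch, a diagnostics pod matching the description above could look like the following (the pod name and image are assumptions for illustration; the actual manifest lives under `dev/k8s/` and may differ):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cluster-diagnostics          # illustrative name
  namespace: solo                    # per the docs, deployed in the solo namespace
  labels:
    'solo.hedera.com/region': 'eu'   # set from the REGION parameter (defaults to 'us')
spec:
  containers:
    - name: diagnostics
      image: nicolaka/netshoot       # assumed image bundling ping/iperf3/tcpdump/dig
      command: ['sleep', 'infinity'] # keep the pod alive for interactive exec
```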
Run network partition chaos tests to simulate network partitioning between nodes in different regions:
```shell
task chaos:network:network-partition-by-region SOURCE_REGION=us TARGET_REGION=eu
```
Examples of region-based partitioning:
```shell
# Partition between US and EU regions
task chaos:network:network-partition-by-region SOURCE_REGION=us TARGET_REGION=eu

# Partition between US and Asia-Pacific regions
task chaos:network:network-partition-by-region SOURCE_REGION=us TARGET_REGION=ap

# Partition between EU and Asia-Pacific regions
task chaos:network:network-partition-by-region SOURCE_REGION=eu TARGET_REGION=ap
```
This creates bidirectional network partitions between nodes based on their `solo.hedera.com/region` labels, simulating scenarios where network connectivity is lost between different geographical regions.
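A partition experiment maps to a Chaos Mesh `NetworkChaos` resource with `action: partition` and `direction: both`. A minimal sketch using the region labels described above (the resource name is an illustrative assumption):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: solo-chaos-network-partition-us-eu  # illustrative name
spec:
  action: partition
  mode: all
  direction: both                     # bidirectional partition
  selector:
    labelSelectors:
      'solo.hedera.com/region': 'us'  # SOURCE_REGION
  target:
    mode: all
    selector:
      labelSelectors:
        'solo.hedera.com/region': 'eu'  # TARGET_REGION
```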
To build the image, run:

```shell
task build:image
```
To deploy the solo-chaos-hammer job to your Kubernetes cluster, run:
```shell
task deploy-hammer-job
```
Introduce faults to the network while the hammer job is running. For example, you can kill a node pod (node5) by running:
```shell
task chaos:pod:consensus-pod-kill NODE_NAMES=node5
```
You can run chaos tests independently from the chaos directory. When you're in the `chaos/` directory, you'll only see chaos-specific tasks:
```shell
# Navigate to the chaos directory
cd chaos

# List available chaos tasks (shows only chaos tasks)
task --list

# Run specific chaos tests with simplified names
task pod:consensus-pod-kill NODE_NAMES=node5
task network:consensus-network-netem
task network:network-partition-by-region SOURCE_REGION=us TARGET_REGION=eu
task show-experiment-status NAME=<experiment-name> TYPE=<PodChaos|NetworkChaos>
```
Expected output when running `task --list` from the `chaos/` directory:
```text
* show-experiment-status:               Show the status of the pod chaos experiment
* network:consensus-network-bandwidth:  Run Network Chaos experiments (limited bandwidth)
* network:consensus-network-netem:      Run Network Chaos experiments (network emulation)
* network:network-partition-by-region:  Run Network Chaos partition experiments between regions
* pod:consensus-pod-failure:            Run Pod Chaos experiments (failure)
* pod:consensus-pod-kill:               Run Pod Chaos experiments (kill)
```
Run `task --list` to see all available tasks:
- `task setup` - Initialize the environment
- `task deploy-network` - Deploy an n-node Solo network
- `task destroy-network` - Destroy the Solo network
- `task install-chaos-mesh` - Install Chaos Mesh
- `task uninstall-chaos-mesh` - Uninstall Chaos Mesh

- `task chaos:pod:consensus-pod-kill` - Kill consensus pods
- `task chaos:pod:consensus-pod-failure` - Cause pod failures

- `task chaos:network:consensus-network-bandwidth` - Limit network bandwidth
- `task chaos:network:consensus-network-netem` - Apply network emulation for different latencies
- `task chaos:network:network-partition-by-region` - Create network partitions between regions
- `task chaos:network:deploy-cluster-diagnostics` - Deploy cluster diagnostics pod for network testing (supports `REGION` parameter)
- `task chaos:network:exec-cluster-diagnostics` - Exec into the cluster diagnostics pod
- `task chaos:network:cleanup-cluster-diagnostics` - Remove the cluster diagnostics pod

- `task chaos:show-experiment-status` - Show chaos experiment status
- `task deploy-hammer-job` - Deploy the chaos testing job
- `task destroy-hammer-job` - Remove the chaos testing job