In large-scale Puppet deployments with multiple Puppet Masters, nodes may be dynamically reassigned between masters. PuppetDB isn't optimized for per-request lookups and does not provide dynamic routing out of the box.
This project uses HAProxy as an API Gateway with runtime map reloads to avoid per-request database hits.
- Dynamic Puppet Master assignment: users move nodes between Puppet Masters (`puppet agent -t`) without coordination
- High lookup frequency: querying PuppetDB on every API call causes latency spikes and heavy DB load
- Static load balancers cannot handle per‑node master assignment
- HAProxy API Gateway: a single entry point for all requests, routing to the correct master based on a map file
- Map updater: a cron job that polls PuppetDB every minute, writes the new mapping file, and uses HAProxy's Runtime API to atomically reload the map (see the example map file and updater sketch below)
- Zero per-request DB hits: the in-memory map lookup is O(1)
- Runtime reloads: update mapping without process restarts
- Horizontal scalability: multiple HAProxy instances can share the same map file
- Built‑in monitoring: HAProxy provides dashboards and can be integrated with Prometheus
- No node-side changes: clients only need to set an HTTP header (e.g., `X-Node-Name`)
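For illustration, the map file is a plain key/value list mapping a node name to the backend server that should receive its requests (the assignments below are made-up examples):

```
# /usr/local/etc/haproxy/node-master.map
node1.example.com master1
node2.example.com master2
node3.example.com master1
node4.example.com master2
```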
- Map updates occur once per minute
- If a node moves to a different master, requests may still route to the old master until the next polling cycle updates the map (to be fixed with redispatch)
- Manual map reloads can be triggered via API if immediate consistency is required
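A minimal sketch of the Runtime API side of the updater is below; the admin socket path is an assumption, versioned map updates require HAProxy 2.0+, and the step that regenerates the map file from PuppetDB is omitted. Running the same commands by hand is what a manual reload amounts to.

```bash
#!/usr/bin/env bash
# Sketch: atomically replace HAProxy's in-memory copy of the map via the
# Runtime API, assuming the map file on disk has already been regenerated.
set -euo pipefail

MAP=/usr/local/etc/haproxy/node-master.map
SOCK=/var/run/haproxy/admin.sock   # needs "stats socket ... level admin" in haproxy.cfg

hap() { echo "$1" | socat stdio "${SOCK}"; }

# Open a new, empty version of the map, fill it, then commit it: lookups
# keep using the old version until the commit, so the swap is atomic.
VER=$(hap "prepare map ${MAP}" | grep -oE '[0-9]+')
while read -r node master; do
  hap "add map @${VER} ${MAP} ${node} ${master}"
done < "${MAP}"
hap "commit map @${VER} ${MAP}"
```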
- `maxconn` (global / frontend):
  - limits the number of concurrent connections HAProxy accepts and manages
  - each connection consumes a file descriptor and some memory
  - the hard ceiling is `ulimit -n`; additional clients wait in the kernel's TCP accept queue
- `maxconn` (per backend server):
  - limits concurrent connections to a backend server
  - once the limit is reached, requests are queued in that server's queue
  - the queue is an internal in-memory FIFO, one per server
  - returns HTTP 503 if the queue is full as well and no other server can take the request (or `timeout queue` expires)
  - HTTP requests waiting in the queue can be prioritized
  - Question: can this be configured at runtime? (see the note below)
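On the runtime question: connection limits can be changed through the Runtime API without a reload (the admin socket path below is an assumption); queue prioritization, by contrast, is set in `haproxy.cfg` via `http-request set-priority-class`.

```bash
# raise the frontend connection limit at runtime
echo "set maxconn frontend gateway 5000" | socat stdio /var/run/haproxy/admin.sock
# raise one backend server's connection limit at runtime
echo "set maxconn server puppet/master1 2000" | socat stdio /var/run/haproxy/admin.sock
```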
WIP / TODO: add redispatch to the config; the current version will fail if the map isn't up to date.
```
frontend gateway
    bind *:8080
    http-request set-var(txn.node) req.hdr(X-Node-Name)
    # map node name -> server name; fall back to master1 if the node isn't in the map
    http-request set-var(txn.master) var(txn.node),map(/usr/local/etc/haproxy/node-master.map,master1)
    default_backend puppet

backend puppet
    option redispatch
    use-server %[var(txn.master)] if TRUE
    server master1 master1:3001 check maxconn 1000 maxqueue 500
    server master2 master2:3002 check backup
```
To research: running HAProxy in multi-threaded / multi-process mode
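For reference while researching this: recent HAProxy releases removed `nbproc` and are multi-threaded by default; the thread count can also be pinned explicitly in the global section (the value below is only an example):

```
global
    # one process, four worker threads (example value)
    nbthread 4
```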
- Current implementation uses short polling (every minute) to update the map
- Long polling PuppetDB for changes to the node list could give near real-time updates (though it is probably better to use redispatch with short polling)
| Approach | Pros | Cons |
|---|---|---|
| Query PuppetDB per request | Always up-to-date | High latency, DB load, scaling issues |
| NGINX | Advanced HTTP features | Routing changes require a reload that spawns new worker processes; no built‑in per-server request queue |
| Service Registry (e.g., Consul, Eureka) | Always up-to-date | Additional infrastructure, complexity |
| Custom Gateway | Tailored | Significant development effort and maintenance overhead |
- Build:

  ```bash
  docker-compose up --build -d
  ```

- Verify mapping:

  ```bash
  docker exec haproxy-server cat /usr/local/etc/haproxy/node-master.map
  ```

- Test routing:

  ```bash
  curl -s -H "X-Node-Name: node1.example.com" http://localhost:8080/
  curl -s -H "X-Node-Name: node2.example.com" http://localhost:8080/
  curl -s -H "X-Node-Name: node3.example.com" http://localhost:8080/
  curl -s -H "X-Node-Name: node4.example.com" http://localhost:8080/
  ```

- Wait for the cron job to reload the map, or trigger a manual reload:

  ```bash
  docker exec map-updater /usr/local/bin/update-map.sh
  ```

- Verify the mapping and test routing again:

  ```bash
  docker exec haproxy-server cat /usr/local/etc/haproxy/node-master.map
  curl -s -H "X-Node-Name: node1.example.com" http://localhost:8080/
  curl -s -H "X-Node-Name: node2.example.com" http://localhost:8080/
  curl -s -H "X-Node-Name: node3.example.com" http://localhost:8080/
  curl -s -H "X-Node-Name: node4.example.com" http://localhost:8080/
  ```
Username: admin
Password: admin
- Add redispatch
- Add mock nodes
- Have a registration endpoint for nodes
- Have endpoint for command execution on a node
- Have endpoint to get all nodes
- Mock database should be updated by masters instead of setInterval
- Explore HTTP Rewrites
- Add Prometheus metrics
