Minimal eBPF-backed HTTP syscall and environment variable profiler written in Golang, plus a tiny test service and traffic generator.
It's a PoC in the service of this initiative.
- `cmd/server`: basic HTTP service on port 8080 with `/`, `/healthz`, `/echo`, `/slow`.
- `cmd/traffic`: small Go script that repeatedly hits the service.
- `cmd/profiler`: eBPF-powered profiler that attaches to socket syscalls and writes request/response metadata to a local file.
- `bpf/profiler.bpf.c`: BPF program (compiled via `bpf2go` during build).
- Sets up an eBPF profiler that listens for HTTP events and logs the originating PID, destination IP and port, source IP and port, method, payload, response code, etc.
- Classifies non-HTTP connections (databases, caches, message buses) using port-based heuristics and protocol fingerprinting
- For each PID found, pulls the environment variables assigned to the process
- Writes each kind of output to its own file
Please feel free to make suggestions, either here or in the initiative's Slack discussion.
This project currently only runs on Linux. If you want to run it on a Mac, you'll need a VM. I could not get it working in a Linux container, although that could have something to do with the corporate security profile installed on my machine.
Since it leverages eBPF, I have strong doubts about it working on Windows.
There are probably better/smarter/faster/cooler ways to run this, but the way I pulled it off was to run a Lima VM on my Mac. Note that I'm running on an ARM64 Mac, and I have not tested this on an x86 machine of any sort. Which means I also haven't tested it on a real Linux box.
That said, if you'd like to run this:
- If you're using a VM, SSH into it and clone this repo
  - Lima mounts your host machine's home directory as read-only
  - But! you need to generate the Go/C bindings for the eBPF functionality
  - So! don't rely on the mounted home directory if you've cloned this repo to your host machine
- Install the Go toolchain 1.25+
- Make sure you've got an OCI container runtime installed
- Install C libraries (I had to `sudo` on a Lima VM):

```bash
sudo apt-get update
# have not tested on x86,
# but I'd imagine you'll have fewer problems than I did
sudo apt-get install -y --no-install-recommends \
clang llvm make pkg-config libelf-dev zlib1g-dev linux-libc-dev libbpf-dev
sudo rm -rf /var/lib/apt/lists/*
```

- Set up the necessary symlink:

```bash
# me and Claude trying to be arch-agnostic
arch="$(uname -m)" && \
case "${arch}" in \
x86_64) multiarch="x86_64-linux-gnu" ;; \
aarch64|arm64) multiarch="aarch64-linux-gnu" ;; \
*) echo "Unsupported architecture: ${arch}" >&2; exit 1 ;; \
esac && \
ln -sf /usr/include/${multiarch}/asm /usr/include/asm
```

- Set environment variables:

```bash
export GOOS=linux
export GOARCH=arm64 # or, ya know, whatever
export CGO_ENABLED=1
```

- Build the profiler:

```bash
# Linux only; requires clang/llvm and kernel headers
go mod download
go generate ./pkg/profiler # builds the BPF object via bpf2go (emits files under pkg/profiler with tag ebpf_build)
go build -tags ebpf_build ./cmd/profiler # profiler binary (uses generated bindings)
```

These environment variables configure the profiler itself:
- `OUTPUT_PATH=/var/log/ebpf_http_profiler.log` - File path for HTTP event logs (JSON format)
- `ENV_OUTPUT_PATH=/var/log/ebpf_http_env.yaml` - File path for environment variable logs (YAML format)
- `SERVICE_MAP_PATH=""` - File path for the service integration map (YAML format). If not set, the service map is disabled.
- `ENV_PREFIX_LIST=""` - Comma-separated list of environment variable key prefixes to include (case-sensitive). If not set, all environment variables are collected.
- `ADI_PROFILE_ALLOWED=""` - Comma-separated list of `ADI_PROFILE` values to profile (see Opt-In Profiling below). If not set, all processes with `ADI_PROFILE` set (any value) will be profiled.
- `CONTAINERD_SOCKET=""` - Path to a containerd or Docker socket. If not set, container metadata enrichment is disabled.
  - For Docker: `/var/run/docker.sock` (probably?)
  - For rootless nerdctl: `/run/user/$UID/containerd/containerd.sock`
  - For rootful nerdctl/containerd: `/run/containerd/containerd.sock`
  - For rootless podman: `/run/user/$UID/podman/podman.sock`
  - For rootful podman: `/run/podman/podman.sock`
- `CONTAINERD_NAMESPACE=default` - Containerd namespace to use (nerdctl typically uses `default`)
The profiler uses an opt-in system to control which processes are profiled. Target processes (the ones being profiled) must set specific environment variables:
Target Process Environment Variables:
- `ADI_PROFILE=<environment>` - Required for profiling. Indicates the process opts into profiling. The value should indicate the environment (e.g., `local`, `dev`, `staging`, `prod`).
- `ADI_PROFILE_NAME=<name>` - Optional. A human-readable name for the service or process instance. This value will be included in the YAML output as `adi_profile_name` for easier identification.
- `ADI_PROFILE_DISABLED=1` - Override to disable profiling. If set, the process will not be profiled even if `ADI_PROFILE` is present.
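For example, here's a hypothetical Go wrapper (the binary path and instance name are made up) that launches a target service with the opt-in variables added to its environment:

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	// Hypothetical: start a service with the profiler opt-in flags set.
	cmd := exec.Command("/app/reviews-service") // made-up binary path
	cmd.Env = append(os.Environ(),
		"ADI_PROFILE=local",          // required opt-in; value names the environment
		"ADI_PROFILE_NAME=reviews-1", // optional human-readable instance name
	)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```

In a compose file, the same thing is just two entries under the service's `environment:` key.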
Profiling Logic:
A process is profiled only if:

- `ADI_PROFILE_DISABLED` is not set to `1`
- `ADI_PROFILE` is set
- If `ADI_PROFILE_ALLOWED` is set on the profiler, the `ADI_PROFILE` value must be in the allowed list

Default Behavior:

- If `ADI_PROFILE_ALLOWED` is not set: all processes with `ADI_PROFILE` set (any value) are profiled
- If `ADI_PROFILE_ALLOWED` is set: only processes whose `ADI_PROFILE` value matches one in the list are profiled
- Processes without `ADI_PROFILE` are never profiled
Example Scenarios:
| Target Process Has | Profiler Config | Result |
|---|---|---|
| `ADI_PROFILE=local` | `ADI_PROFILE_ALLOWED=""` (not set) | ✅ Profiled |
| `ADI_PROFILE=local` | `ADI_PROFILE_ALLOWED="local,dev"` | ✅ Profiled |
| `ADI_PROFILE=prod` | `ADI_PROFILE_ALLOWED="local,dev"` | ❌ Not profiled |
| `ADI_PROFILE=local`, `ADI_PROFILE_DISABLED=1` | `ADI_PROFILE_ALLOWED="local"` | ❌ Not profiled (override) |
| (no `ADI_PROFILE`) | `ADI_PROFILE_ALLOWED="local"` | ❌ Not profiled |
Step 1: Start the profiler
Profile all processes with `ADI_PROFILE` set (any value) and get the full environment variable map for each process:
```bash
sudo OUTPUT_PATH="/some/path/ebpf_http_profiler.log" \
ENV_OUTPUT_PATH="/some/path/ebpf_env_profiler.yaml" \
./profiler
```

Profile only specific processes that have `ADI_PROFILE=local` or `ADI_PROFILE=dev` set, still getting the full environment variable map for each process:
```bash
sudo OUTPUT_PATH="/some/path/ebpf_http_profiler.log" \
ENV_OUTPUT_PATH="/some/path/ebpf_env_profiler.yaml" \
ADI_PROFILE_ALLOWED="local,dev" \
./profiler
```

Profile with environment variable filtering:
```bash
sudo OUTPUT_PATH="/some/path/ebpf_http_profiler.log" \
ENV_OUTPUT_PATH="/some/path/ebpf_env_profiler.yaml" \
ENV_PREFIX_LIST="REVIEWS_,RATINGS_,MONGO_" \
ADI_PROFILE_ALLOWED="local,dev,staging" \
./profiler
```

Profile with container metadata enrichment (service-to-service mapping):
```bash
# using nerdctl rootless as an example for the container enrichment flags
sudo OUTPUT_PATH="/some/path/ebpf_http_profiler.log" \
ENV_OUTPUT_PATH="/some/path/ebpf_env_profiler.yaml" \
SERVICE_MAP_PATH="/some/path/ebpf_service_map.yaml" \
CONTAINERD_SOCKET="$XDG_RUNTIME_DIR/containerd/containerd.sock" \
CONTAINERD_NAMESPACE="default" \
ADI_PROFILE_ALLOWED="local,dev" \
./profiler
```

Step 2: Start services with opt-in flag
The included `docker-compose.yml` sets up a demo microservices architecture with the following profiled services (all have `ADI_PROFILE=local`):
- http-service: HTTP server that publishes request info to NATS
- request-logger: Subscribes to NATS and stores requests in Redis
- traffic-generator: Generates HTTP requests and database/cache operations
And supporting infrastructure (not profiled):
- postgres: PostgreSQL database
- redis: Redis cache
- nats-server: NATS message broker
Run them all with:

```bash
docker compose up -d
# podman compose up -d
# nerdctl compose up -d
```

What Gets Profiled:
The profiler captures all HTTP traffic and non-HTTP connections from processes that meet the opt-in criteria:
- HTTP endpoints (in the service map `endpoints` array):
  - `traffic-generator` → `http-service` (GET /, GET /healthz, POST /echo, GET /slow)
- Database/Cache/Message Bus connections (in the service map `connections` array):
  - `traffic-generator` → PostgreSQL (port 5432, category: database)
  - `traffic-generator` → Redis (port 6379, category: cache)
  - `http-service` → NATS (port 4222, category: message_bus)
  - `request-logger` → NATS (port 4222, category: message_bus)
  - `request-logger` → Redis (port 6379, category: cache)
The profiler also collects environment variables from each process making network calls. As soon as a new PID is observed, the profiler reads `/proc/<pid>/environ` and writes the results to a separate YAML file.
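A minimal sketch of that collection step in Go, assuming we already have the PID (names are illustrative; the real implementation also applies the `ENV_PREFIX_LIST` filter and opt-in checks):

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strings"
)

// readEnviron returns a running process's environment as a map.
// /proc/<pid>/environ is a single buffer of NUL-separated KEY=VALUE pairs.
func readEnviron(pid int) (map[string]string, error) {
	raw, err := os.ReadFile(fmt.Sprintf("/proc/%d/environ", pid))
	if err != nil {
		return nil, err // e.g. the process exited before we could read it
	}
	env := make(map[string]string)
	for _, kv := range bytes.Split(raw, []byte{0}) {
		if k, v, ok := strings.Cut(string(kv), "="); ok {
			env[k] = v
		}
	}
	return env, nil
}

func main() {
	env, err := readEnviron(os.Getpid()) // demo: read our own environment
	fmt.Println(len(env), err)
}
```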
You can run the profiler first, and it'll hang out waiting for any network traffic to arrive via syscall. Or, if you start it while processes are sending traffic, it will profile for as long as it's running.
I've got another repo that puts the Istio Bookinfo demo into a docker-compose file.
To profile the Bookinfo services:
Step 1: Start the profiler with environment-specific filtering and container metadata
```bash
sudo OUTPUT_PATH="/home/lima.linux/http-profiler/output/ebpf_http_profiler.log" \
ENV_OUTPUT_PATH="/home/lima.linux/http-profiler/output/ebpf_env_profiler.yaml" \
SERVICE_MAP_PATH="/home/lima.linux/http-profiler/output/ebpf_service_map.yaml" \
ENV_PREFIX_LIST="REVIEWS_,RATINGS_,MONGO_,DETAILS_" \
ADI_PROFILE_ALLOWED="local,dev" \
CONTAINERD_SOCKET="$XDG_RUNTIME_DIR/containerd/containerd.sock" \
CONTAINERD_NAMESPACE="default" \
./profiler
```

Step 2: Start the services and run traffic
From the other repo's project root:
```bash
docker compose up -d
# podman compose up -d
# nerdctl compose up -d
./scripts/run-traffic-gen.sh
```

The profiler will capture HTTP traffic and environment variables from all services that have `ADI_PROFILE=local` or `ADI_PROFILE=dev` set. Because you specified `ENV_PREFIX_LIST`, you'll only see the filtered environment variables (not the full OCI runtime environment). Try it without `ENV_PREFIX_LIST` to see the full environment variable firehose.
Note that the Node.js traffic generator app is not profiled at all, because it does not have `ADI_PROFILE` set.
JSON lines with syscall-derived metadata and parsed HTTP fields:

```json
{
  "timestamp": "2024-04-08T18:24:10.123456789Z",
  "pid": 1234,
  "comm": "traffic-generat",
  "cmdline": "/bin/traffic-generator",
  "direction": "send",
  "source_ip": "127.0.0.1",
  "source_port": 54321,
  "dest_ip": "127.0.0.1",
  "dest_port": 8080,
  "bytes": 89,
  "method": "GET",
  "url": "/echo",
  "body": "{\"message\":\"hello\"}",
  "headers": {
    "Host": "127.0.0.1:8080",
    "User-Agent": "Go-http-client/1.1",
    "Content-Type": "application/json"
  },
  "raw_payload": "GET /echo HTTP/1.1\r\nHost: ...",
  "source_container": {
    "service": "productpage",
    "image": "myorg/productpage:1.0.0",
    "container_id": "abc123def456...",
    "container_name": "productpage-1"
  },
  "destination_container": {
    "service": "reviews",
    "image": "myorg/reviews:2.1.0",
    "container_id": "789xyz012...",
    "container_name": "reviews-1"
  },
  "destination_type": "container"
}
```

Fields include the parsed HTTP method, URL, status code (for responses), headers, request/response bodies, plus the complete raw payload from the syscalls.
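If you want to post-process the log, here's a hedged Go sketch of a matching struct and a line-by-line reader (the field set is inferred from the sample above; the profiler's internal type may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"time"
)

// ContainerInfo mirrors the source_container / destination_container objects.
type ContainerInfo struct {
	Service       string `json:"service"`
	Image         string `json:"image"`
	ContainerID   string `json:"container_id"`
	ContainerName string `json:"container_name"`
}

// HTTPEvent mirrors one JSON line of profiler output.
type HTTPEvent struct {
	Timestamp            time.Time         `json:"timestamp"`
	PID                  int               `json:"pid"`
	Comm                 string            `json:"comm"`
	Cmdline              string            `json:"cmdline"`
	Direction            string            `json:"direction"` // "send" or "recv"
	SourceIP             string            `json:"source_ip"`
	SourcePort           int               `json:"source_port"`
	DestIP               string            `json:"dest_ip"`
	DestPort             int               `json:"dest_port"`
	Bytes                int               `json:"bytes"`
	Method               string            `json:"method"`
	URL                  string            `json:"url"`
	Body                 string            `json:"body"`
	Headers              map[string]string `json:"headers"`
	RawPayload           string            `json:"raw_payload"`
	SourceContainer      *ContainerInfo    `json:"source_container"` // nil when absent
	DestinationContainer *ContainerInfo    `json:"destination_container"`
	DestinationType      string            `json:"destination_type"`
}

func main() {
	f, err := os.Open("/var/log/ebpf_http_profiler.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	dec := json.NewDecoder(f) // JSON lines: each Decode consumes one object
	for {
		var ev HTTPEvent
		if err := dec.Decode(&ev); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		fmt.Printf("%s %s %s -> %s:%d\n", ev.Direction, ev.Method, ev.URL, ev.DestIP, ev.DestPort)
	}
}
```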
In addition to HTTP traffic, the profiler also classifies and logs non-HTTP connections to databases, caches, and message buses:

```json
{
  "timestamp": "2024-04-08T18:24:10.123456789Z",
  "event_type": "connection",
  "pid": 1234,
  "comm": "traffic-generat",
  "cmdline": "/bin/traffic-generator",
  "direction": "send",
  "source_ip": "10.4.2.60",
  "source_port": 38338,
  "dest_ip": "10.4.2.59",
  "dest_port": 5432,
  "protocol": "postgres",
  "category": "database",
  "confidence": 90,
  "detection_reason": "port 5432, valid Postgres startup/SSLRequest header"
}
```

Connection events include:
- `event_type`: Always `"connection"` for non-HTTP traffic
- `protocol`: Detected protocol (e.g., `postgres`, `mysql`, `redis`, `kafka`)
- `category`: High-level classification (`database`, `cache`, or `message_bus`)
- `confidence`: Detection confidence score (0-100)
- `detection_reason`: Explanation of how the protocol was identified
Supported Protocols and Ports:
| Category | Protocol | Default Ports |
|---|---|---|
| database | PostgreSQL | 5432 |
| database | MySQL/MariaDB | 3306 |
| database | MongoDB | 27017 |
| database | MSSQL (TDS) | 1433 |
| cache | Redis | 6379, 26379 |
| cache | Memcached | 11211 |
| message_bus | Kafka | 9092, 19092, 29092, 9093 |
| message_bus | AMQP/RabbitMQ | 5672, 5671 |
| message_bus | NATS | 4222, 6222 |
Classification uses a combination of port-based heuristics and protocol fingerprinting on the first bytes of payload. When payload inspection confirms the protocol, confidence is high (90). When only port matching is available (e.g., TLS connections), confidence is medium (60).
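A hedged Go sketch of that two-tier approach (the real fingerprint set covers every protocol in the table; only the Redis and Postgres checks mentioned in this README are shown, and the function shape is illustrative):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// classify guesses a protocol from the destination port plus the first
// payload bytes, returning a protocol name and a confidence score (0-100).
func classify(destPort uint16, payload []byte) (string, int) {
	switch destPort {
	case 6379, 26379: // Redis ports
		// RESP frames begin with one of + - : $ *
		if len(payload) > 0 && bytes.IndexByte([]byte("+-:$*"), payload[0]) >= 0 {
			return "redis", 90 // payload confirms the port hint
		}
		return "redis", 60 // port match only (e.g. a TLS connection)
	case 5432: // PostgreSQL
		// An SSLRequest is exactly 8 bytes: length 8, then magic 80877103.
		if len(payload) >= 8 &&
			binary.BigEndian.Uint32(payload[:4]) == 8 &&
			binary.BigEndian.Uint32(payload[4:8]) == 80877103 {
			return "postgres", 90
		}
		return "postgres", 60
	}
	return "", 0 // no port heuristic matched
}

func main() {
	proto, conf := classify(6379, []byte("*1\r\n$4\r\nPING\r\n"))
	fmt.Println(proto, conf) // redis 90
}
```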
When `CONTAINERD_SOCKET` is configured, HTTP events are enriched with container metadata:
- `source_container`: Metadata about the container that sent the request (the requester)
  - `service`: Docker Compose service name (from the `com.docker.compose.service` label)
  - `image`: Container image with tag
  - `container_id`: Full container ID
  - `container_name`: Human-readable container name
- `destination_container`: Metadata about the container that received the request (the responder)
  - Same fields as `source_container`
  - Will be `null` for external destinations
- `destination_type`: Either `"container"` (for container-to-container calls) or `"external"` (for calls outside the container runtime)
How it works:
- For `"send"` events: source is resolved from the PID (the sender), destination from the IP (the receiver)
- For `"recv"` events: source is resolved from the IP (the sender), destination from the PID (the receiver)
- This ensures `source_container` always means "who sent it" and `destination_container` always means "who received it"
Note: Some `recv` events may have `source_ip: "invalid IP"` when socket peer information isn't available. These events will only have `destination_container` populated. Full service-to-service mapping is captured in the corresponding `send` events.
When `SERVICE_MAP_PATH` is configured, a service map is maintained that tracks each profiled service's outbound HTTP endpoints and non-HTTP connections (databases, caches, message buses):

```yaml
generated_at: "2024-04-08T18:30:00.000000000Z"
services:
  - name: productpage
    image: docker.io/istio/examples-bookinfo-productpage-v1:1.20.1
    endpoints:
      - destination: reviews
        destination_type: container
        method: GET
        path: /reviews/0
        request_schema: null
        response_schema:
          clustername: string
          id: string
          podname: string
          reviews:
            - reviewer: string
              text: string
        first_seen: "2024-04-08T18:24:10.000000000Z"
        last_seen: "2024-04-08T18:30:00.000000000Z"
        count: 150
    connections:
      - destination: postgres
        destination_type: container
        protocol: postgres
        category: database
        port: 5432
        confidence: 90
        reason: port 5432, valid Postgres startup/SSLRequest header
        first_seen: "2024-04-08T18:24:10.000000000Z"
        last_seen: "2024-04-08T18:30:00.000000000Z"
        count: 1
      - destination: redis
        destination_type: container
        protocol: redis
        category: cache
        port: 6379
        confidence: 90
        reason: redis RESP array
        first_seen: "2024-04-08T18:24:10.000000000Z"
        last_seen: "2024-04-08T18:30:00.000000000Z"
        count: 1
    first_seen: "2024-04-08T18:24:10.000000000Z"
    last_seen: "2024-04-08T18:30:00.000000000Z"
  - name: reviews
    image: docker.io/istio/examples-bookinfo-reviews-v1:1.20.1
    endpoints:
      - destination: ratings
        destination_type: container
        method: GET
        path: /ratings/0
        request_schema: null
        response_schema:
          rating: number
        first_seen: "2024-04-08T18:25:00.000000000Z"
        last_seen: "2024-04-08T18:29:00.000000000Z"
        count: 10
    first_seen: "2024-04-08T18:25:00.000000000Z"
    last_seen: "2024-04-08T18:29:00.000000000Z"
```

Service Map Structure:
The map is organized by source service, with each service containing:
- `name`: Service name (from the Docker Compose label or `ADI_PROFILE_NAME`)
- `image`: Container image with tag (when container metadata is available)
- `endpoints`: HTTP endpoints called by this service
- `connections`: Non-HTTP connections (databases, caches, message buses) made by this service
- `first_seen` / `last_seen`: Timestamps of activity
Endpoint fields:
- `destination`: Target service name
- `destination_type`: `"container"` or `"external"`
- `method` / `path`: HTTP method and URL path
- `request_schema` / `response_schema`: JSON structure (keys and types) without values
- `count`: Number of times this endpoint was called
Connection fields:
- `destination`: Target service/host name
- `destination_type`: `"container"` or `"external"`
- `protocol`: Detected protocol (e.g., `postgres`, `redis`, `kafka`)
- `category`: `"database"`, `"cache"`, or `"message_bus"`
- `port`: Remote port number
- `confidence`: Detection confidence (0-100)
- `reason`: Explanation of protocol detection
- `count`: Number of connection events observed
How it works:
- Each unique `destination + method + path` combination is tracked as a separate endpoint
- Request/response correlation uses source port matching to pair requests with their responses
- JSON schemas are extracted showing structure (keys and types) without values (see the sketch below)
- Schema variants are preserved: if an endpoint returns different response shapes, each is tracked separately
- Non-JSON bodies are marked as `"non-json"`, empty bodies as `null`
- Non-HTTP connections are classified on first payload using port and protocol fingerprinting
- File is written with a 2-second debounce to coalesce rapid updates
- On SIGINT/SIGTERM, the map is flushed to disk before exit
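For the schema extraction step, here's a minimal Go sketch, assuming the body was already decoded with `encoding/json` (the function name is illustrative, not the repo's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaOf reduces a decoded JSON value to its structure: keys and type
// names are kept, concrete values are dropped.
func schemaOf(v any) any {
	switch t := v.(type) {
	case map[string]any: // object: recurse into each field
		out := make(map[string]any, len(t))
		for k, val := range t {
			out[k] = schemaOf(val)
		}
		return out
	case []any: // array: describe the first element's shape
		if len(t) == 0 {
			return []any{}
		}
		return []any{schemaOf(t[0])}
	case string:
		return "string"
	case float64: // encoding/json decodes every JSON number as float64
		return "number"
	case bool:
		return "boolean"
	default: // nil
		return nil
	}
}

func main() {
	var body any
	_ = json.Unmarshal([]byte(`{"rating": 4.5, "reviews": [{"text": "ok"}]}`), &body)
	fmt.Println(schemaOf(body)) // map[rating:number reviews:[map[text:string]]]
}
```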
Multi-document YAML with one document per PID:
```yaml
---
adi_profile_pid: 12345
adi_profile_match: "local"
adi_profile_name: "reviews"
adi_profile_cmdline: "/usr/bin/python3 /app/server.py --port 9080"
adi_profile_env:
  PATH: "/usr/local/bin:/usr/bin:/bin"
  HOME: "/home/user"
  REVIEWS_SERVICE_URL: "http://reviews:9080"
  RATINGS_HOSTNAME: "ratings"
---
adi_profile_pid: 67890
adi_profile_match: "staging"
adi_profile_cmdline: "./ratings-service"
adi_profile_env:
  MONGO_HOST: "mongodb://db:27017"
  MONGO_DATABASE: "bookinfo"
---
adi_profile_pid: 99999
adi_profile_match: "local"
adi_profile_cmdline: "/app/service --config /etc/config.yaml"
error: "open /proc/99999/environ: no such file or directory"
```

Each document includes:
- `adi_profile_pid`: The process ID
- `adi_profile_match`: The value of the `ADI_PROFILE` environment variable that qualified this process for profiling
- `adi_profile_name`: (Optional) The value of `ADI_PROFILE_NAME` if set on the target process. Useful for identifying specific service instances.
- `adi_profile_cmdline`: The command line used to start the process (from `/proc/<pid>/cmdline`)
- `adi_profile_env`: Key-value pairs of environment variables (filtered by `ENV_PREFIX_LIST` if specified)
- `error`: Error message if the process exits before the profiler can read its environment
Notes:
- When `ENV_PREFIX_LIST` is used, only matching environment variables are included in `adi_profile_env` (PIDs may have an empty `adi_profile_env: {}` if no variables match)
- `adi_profile_name` only appears if the target process has `ADI_PROFILE_NAME` set
- All documents are separated by `---` for proper YAML multi-document format
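If you want to consume this file programmatically, here's a sketch using `gopkg.in/yaml.v3` (an assumption; this repo may not depend on it), with the document struct inferred from the samples above:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"

	"gopkg.in/yaml.v3"
)

// EnvDoc is inferred from the sample documents above.
type EnvDoc struct {
	PID     int               `yaml:"adi_profile_pid"`
	Match   string            `yaml:"adi_profile_match"`
	Name    string            `yaml:"adi_profile_name"`
	Cmdline string            `yaml:"adi_profile_cmdline"`
	Env     map[string]string `yaml:"adi_profile_env"`
	Err     string            `yaml:"error"`
}

func main() {
	f, err := os.Open("/var/log/ebpf_http_env.yaml")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	dec := yaml.NewDecoder(f) // yaml.v3 decodes one document per Decode call
	for {
		var doc EnvDoc
		if err := dec.Decode(&doc); errors.Is(err, io.EOF) {
			break
		} else if err != nil {
			panic(err)
		}
		fmt.Printf("pid=%d match=%q env_keys=%d\n", doc.PID, doc.Match, len(doc.Env))
	}
}
```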