pve2otelcol is a program that monitors the VMs running on a Proxmox Virtual Environment (PVE) node, collecting their logs and sending them to an OpenTelemetry collector while being as unintrusive as possible.
This means that no agent is needed on individual VMs: logs are collected by a program running on the PVE node itself.
pve2otelcol can monitor the journald logs of the PVE node and of each running VM, periodically checking for VMs that have started or stopped.
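One way to picture the periodic start/stop detection is as a comparison between two snapshots of the running guests. The following is only a minimal sketch under that assumption, not the program's actual code: the function name `diffRunning` and the use of numeric IDs in a map are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// diffRunning compares two snapshots of running guest IDs (hypothetical
// representation: ID -> true when running) and reports which guests
// started and which stopped between the two polls.
func diffRunning(prev, cur map[int]bool) (started, stopped []int) {
	for id, running := range cur {
		if running && !prev[id] {
			started = append(started, id)
		}
	}
	for id, running := range prev {
		if running && !cur[id] {
			stopped = append(stopped, id)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Ints(started)
	sort.Ints(stopped)
	return started, stopped
}

func main() {
	prev := map[int]bool{100: true, 101: true}
	cur := map[int]bool{101: true, 102: true}
	started, stopped := diffRunning(prev, cur)
	fmt.Println("started:", started, "stopped:", stopped)
}
```

A monitor built this way would begin following the logs of each ID in `started` and stop following those in `stopped` after every poll.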
The logs, collected in JSON format, are parsed and sent to the OpenTelemetry collector, where they can be easily routed, parsed, filtered, inspected and visualized directly in Grafana.
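To give an idea of what parsing a JSON log entry involves, here is a small sketch that decodes one line in the format produced by `journalctl --output=json`. The `Entry` type, the `parseJournalLine` function and the choice of fields are assumptions made for illustration; the real program may extract different fields and handle them differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// syslogLevels maps journald PRIORITY values (syslog severities 0-7) to names.
var syslogLevels = [...]string{"emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"}

// Entry is a hypothetical, trimmed-down view of one journald record.
type Entry struct {
	Time     time.Time
	Severity string
	Unit     string
	Message  string
}

// parseJournalLine decodes a single JSON object as emitted by journald.
func parseJournalLine(line []byte) (Entry, error) {
	var raw map[string]any
	if err := json.Unmarshal(line, &raw); err != nil {
		return Entry{}, fmt.Errorf("decoding journal entry: %w", err)
	}
	// str returns a field only when it is a plain string; journald encodes
	// binary values as arrays, which this sketch simply ignores.
	str := func(key string) string {
		s, _ := raw[key].(string)
		return s
	}
	// Assumption: entries without a valid PRIORITY are treated as "info".
	e := Entry{Unit: str("_SYSTEMD_UNIT"), Message: str("MESSAGE"), Severity: "info"}
	if p, err := strconv.Atoi(str("PRIORITY")); err == nil && p >= 0 && p < len(syslogLevels) {
		e.Severity = syslogLevels[p]
	}
	// __REALTIME_TIMESTAMP is microseconds since the Unix epoch, as a string.
	if us, err := strconv.ParseInt(str("__REALTIME_TIMESTAMP"), 10, 64); err == nil {
		e.Time = time.UnixMicro(us).UTC()
	}
	return e, nil
}

func main() {
	line := []byte(`{"MESSAGE":"Started Session 42.","PRIORITY":"6","_SYSTEMD_UNIT":"init.scope","__REALTIME_TIMESTAMP":"1700000000000000"}`)
	e, err := parseJournalLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s %s: %s\n", e.Time.Format(time.RFC3339), e.Unit, e.Severity, e.Message)
}
```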
This software is in alpha state; ideas for improvements can be discussed on GitHub, and bug reports and pull requests are welcome.
At the moment it can't monitor Qemu/KVM virtual machines, since running qm exec VMID -- journalctl --follow produces no output to be parsed (it's not a stream like the pct exec VMID -- journalctl --follow that's used to monitor LXC containers).
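The streaming approach described above can be sketched as a function that starts a command and handles its output line by line for as long as it runs. This is an illustration only: `streamLines` and `collectLines` are hypothetical names, and the `--output=json` flag shown in the comment is an assumption about how JSON output would be requested.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"os/exec"
)

// streamLines starts a command and calls handle for every line it prints
// on standard output, until the command exits or ctx is cancelled.
func streamLines(ctx context.Context, handle func(line string), name string, args ...string) error {
	cmd := exec.CommandContext(ctx, name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return fmt.Errorf("creating stdout pipe: %w", err)
	}
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("starting %s: %w", name, err)
	}
	scanner := bufio.NewScanner(stdout)
	// Journal entries can be long; raise the default 64 KiB line limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		handle(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		_ = cmd.Process.Kill() // stop the child so Wait cannot block
		_ = cmd.Wait()
		return fmt.Errorf("reading output of %s: %w", name, err)
	}
	// Wait must only be called once all reads from the pipe are done.
	if err := cmd.Wait(); err != nil {
		return fmt.Errorf("%s exited: %w", name, err)
	}
	return nil
}

// collectLines is a convenience wrapper for commands that terminate.
func collectLines(name string, args ...string) ([]string, error) {
	var lines []string
	err := streamLines(context.Background(), func(l string) { lines = append(lines, l) }, name, args...)
	return lines, err
}

func main() {
	// On a PVE node the command would look something like:
	//   pct exec 101 -- journalctl --follow --output=json
	// (101 is a placeholder container ID). echo keeps the sketch runnable anywhere.
	lines, err := collectLines("echo", "hello from a container")
	if err != nil {
		panic(err)
	}
	fmt.Println(lines)
}
```

With a command that never exits, such as one using `--follow`, cancelling the context is what ends the loop.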
Just run:
go build .
Copy it to a PVE node and run:
./pve2otelcol --verbose --otlp-grpc-url http://collector.address:4317
where collector.address:4317 is the address and port of an OpenTelemetry gRPC collector.
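gRPC clients usually need a `host:port` endpoint plus a decision on whether to use TLS, so a URL such as the one above has to be split before use. The sketch below shows one possible way to do that; `splitCollectorURL` is a hypothetical helper, and treating `http` as plaintext and `https` as TLS is an assumption, not a description of how pve2otelcol interprets the option. Port 4317 is the conventional OTLP gRPC port.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// defaultOTLPGRPCPort is the conventional port for OTLP over gRPC.
const defaultOTLPGRPCPort = "4317"

// splitCollectorURL turns a collector URL into a host:port endpoint and
// a flag telling whether the connection should be plaintext (insecure).
func splitCollectorURL(raw string) (endpoint string, insecure bool, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false, fmt.Errorf("parsing %q: %w", raw, err)
	}
	switch u.Scheme {
	case "http":
		insecure = true
	case "https":
		insecure = false
	default:
		return "", false, fmt.Errorf("unsupported scheme %q in %q", u.Scheme, raw)
	}
	host := u.Hostname()
	if host == "" {
		return "", false, fmt.Errorf("missing host in %q", raw)
	}
	port := u.Port()
	if port == "" {
		port = defaultOTLPGRPCPort
	}
	return net.JoinHostPort(host, port), insecure, nil
}

func main() {
	endpoint, insecure, err := splitCollectorURL("http://collector.address:4317")
	if err != nil {
		panic(err)
	}
	fmt.Println("endpoint:", endpoint, "insecure:", insecure)
}
```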
A popular collector is Grafana Alloy, which is usually deployed along with Grafana Loki and the Grafana visualizer.
pve2otelcol has numerous other command line options; see ./pve2otelcol --help for more information. The defaults should be reasonable in most cases.
To better integrate it with your PVE node, you can use the provided systemd unit file.
A quick guide, to be run as root (do not forget to edit the pve2otelcol.service beforehand, to point it to your OpenTelemetry collector):
cp pve2otelcol /usr/local/bin/
chmod 755 /usr/local/bin/pve2otelcol
cp goodies/pve2otelcol.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable pve2otelcol.service
systemctl start pve2otelcol.service
While the setup of Alloy and Loki is well outside the scope of this document, here you can find a skeleton configuration file for both of them.
// Sample config for Alloy.
//
// For a full configuration reference, see https://grafana.com/docs/alloy
logging {
  level = "warn"
}
prometheus.exporter.unix "default" {
  include_exporter_metrics = true
  disable_collectors = ["mdadm"]
}
prometheus.scrape "default" {
  targets = array.concat(
    prometheus.exporter.unix.default.targets,
    [{
      // Self-collect metrics
      job = "alloy",
      __address__ = "127.0.0.1:12345",
    }],
  )
  forward_to = []
}
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
    keepalive {}
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    logs = [otelcol.processor.batch.default.input]
  }
}
otelcol.processor.batch "default" {
  output {
    logs = [otelcol.exporter.loki.default.input]
  }
}
otelcol.exporter.loki "default" {
  forward_to = [loki.write.local.receiver]
}
loki.write "local" {
  endpoint {
    url = "http://localhost:3100/loki/api/v1/push"
  }
}
# Sample config for Loki 3.3.
# For a full configuration reference, see https://grafana.com/docs/loki/latest/configure/
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn
  grpc_server_max_concurrent_streams: 1000
common:
  instance_addr: 127.0.0.1
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
query_range:
  results_cache:
    cache:
      embedded_cache:
        max_size_mb: 100
schema_config:
  configs:
    - from: 2020-10-20
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
pattern_ingester:
  enabled: true
  metric_aggregation:
    enabled: true
    loki_address: localhost:3100
ruler:
  alertmanager_url: http://localhost:9093
frontend:
  encoding: protobuf
limits_config:
  ingestion_rate_mb: 24
  ingestion_burst_size_mb: 36
After that, you can start pve2otelcol and point it to the Alloy collector and then add the Loki data source in Grafana and begin creating dashboards.
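Before building dashboards, it can be useful to check that logs are actually reaching Loki. As a rough sketch, the snippet below builds a URL for Loki's `/loki/api/v1/query_range` HTTP endpoint; the helper name `queryRangeURL` is hypothetical, and the address `http://localhost:3100` and the `{exporter="OTLP"}` selector are taken from the sample configuration and queries in this document, so adjust them to your setup.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// queryRangeURL builds a Loki range-query URL. startNs and endNs are Unix
// timestamps in nanoseconds; limit caps the number of returned log lines.
// Any path already present in base is replaced by the API path.
func queryRangeURL(base, logQL string, startNs, endNs int64, limit int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parsing %q: %w", base, err)
	}
	u.Path = "/loki/api/v1/query_range"
	q := url.Values{}
	q.Set("query", logQL)
	q.Set("start", strconv.FormatInt(startNs, 10))
	q.Set("end", strconv.FormatInt(endNs, 10))
	q.Set("limit", strconv.Itoa(limit))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	end := time.Now()
	start := end.Add(-time.Hour)
	u, err := queryRangeURL("http://localhost:3100", `{exporter="OTLP"}`, start.UnixNano(), end.UnixNano(), 10)
	if err != nil {
		panic(err)
	}
	// The printed URL can be fetched with curl or any HTTP client.
	fmt.Println(u)
}
```

If the response contains no streams, the problem is more likely in the pve2otelcol or Alloy side of the pipeline than in Grafana.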
Query: count by(level) (count_over_time({exporter="OTLP"} [1h]))
Complete panel JSON:
{
  "id": 10,
  "type": "timeseries",
  "title": "Logs per hour by level",
  "gridPos": {
    "x": 0,
    "y": 32,
    "h": 8,
    "w": 12
  },
  "fieldConfig": {
    "defaults": {
      "custom": {
        "drawStyle": "line",
        "lineInterpolation": "stepBefore",
        "barAlignment": 0,
        "barWidthFactor": 0.6,
        "lineWidth": 1,
        "fillOpacity": 25,
        "gradientMode": "none",
        "spanNulls": false,
        "insertNulls": false,
        "showPoints": "never",
        "pointSize": 5,
        "stacking": {
          "mode": "normal",
          "group": "A"
        },
        "axisPlacement": "auto",
        "axisLabel": "",
        "axisColorMode": "text",
        "axisBorderShow": false,
        "scaleDistribution": {
          "type": "linear"
        },
        "axisCenteredZero": false,
        "hideFrom": {
          "tooltip": false,
          "viz": false,
          "legend": false
        },
        "thresholdsStyle": {
          "mode": "off"
        }
      },
      "color": {
        "mode": "palette-classic"
      },
      "mappings": [],
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {
            "color": "green",
            "value": null
          },
          {
            "color": "red",
            "value": 80
          }
        ]
      },
      "fieldMinMax": false
    },
    "overrides": []
  },
  "pluginVersion": "11.3.0",
  "targets": [
    {
      "datasource": {
        "type": "loki",
        "uid": "ce0gjtocsolq8f"
      },
      "editorMode": "code",
      "expr": "count by(level) (count_over_time({exporter=\"OTLP\"} [1h]))",
      "legendFormat": "{{.level}}",
      "queryType": "range",
      "refId": "A",
      "step": ""
    }
  ],
  "datasource": {
    "default": false,
    "type": "loki",
    "uid": "ce0gjtocsolq8f"
  },
  "options": {
    "tooltip": {
      "mode": "multi",
      "sort": "desc"
    },
    "legend": {
      "showLegend": true,
      "displayMode": "list",
      "placement": "bottom",
      "calcs": []
    }
  }
}
Query: count by(job) (count_over_time({exporter="OTLP"} [1h]))
Complete panel JSON:
{
  "id": 9,
  "type": "timeseries",
  "title": "Logs per hour by job",
  "gridPos": {
    "x": 12,
    "y": 32,
    "h": 8,
    "w": 12
  },
  "fieldConfig": {
    "defaults": {
      "custom": {
        "drawStyle": "line",
        "lineInterpolation": "linear",
        "barAlignment": -1,
        "barWidthFactor": 0.6,
        "lineWidth": 1,
        "fillOpacity": 25,
        "gradientMode": "none",
        "spanNulls": false,
        "insertNulls": false,
        "showPoints": "auto",
        "pointSize": 5,
        "stacking": {
          "mode": "normal",
          "group": "A"
        },
        "axisPlacement": "auto",
        "axisLabel": "",
        "axisColorMode": "text",
        "axisBorderShow": false,
        "scaleDistribution": {
          "type": "linear"
        },
        "axisCenteredZero": false,
        "hideFrom": {
          "tooltip": false,
          "viz": false,
          "legend": false
        },
        "thresholdsStyle": {
          "mode": "off"
        },
        "lineStyle": {
          "fill": "solid"
        }
      },
      "color": {
        "mode": "palette-classic"
      },
      "mappings": [],
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {
            "color": "green",
            "value": null
          },
          {
            "color": "red",
            "value": 80
          }
        ]
      }
    },
    "overrides": []
  },
  "pluginVersion": "11.3.0",
  "targets": [
    {
      "datasource": {
        "type": "loki",
        "uid": "ce0gjtocsolq8f"
      },
      "editorMode": "builder",
      "expr": "count by(job) (count_over_time({exporter=\"OTLP\"} [1h]))",
      "legendFormat": "{{.job}}",
      "queryType": "range",
      "refId": "A"
    }
  ],
  "datasource": {
    "default": false,
    "type": "loki",
    "uid": "ce0gjtocsolq8f"
  },
  "options": {
    "tooltip": {
      "mode": "multi",
      "sort": "desc"
    },
    "legend": {
      "showLegend": true,
      "displayMode": "list",
      "placement": "bottom",
      "calcs": []
    }
  }
}
Query:
{exporter="OTLP"} | json | __error__=`` | line_format `{{.service_name}} {{.severity}}: {{.body_MESSAGE}}`
Complete panel JSON:
{
  "id": 11,
  "type": "logs",
  "title": "Last logs",
  "gridPos": {
    "x": 0,
    "y": 40,
    "h": 8,
    "w": 12
  },
  "fieldConfig": {
    "defaults": {},
    "overrides": []
  },
  "pluginVersion": "11.3.0",
  "targets": [
    {
      "datasource": {
        "type": "loki",
        "uid": "ce0gjtocsolq8f"
      },
      "editorMode": "builder",
      "expr": "{exporter=\"OTLP\"} | json | __error__=`` | line_format `{{.service_name}} {{.severity}}: {{.body_MESSAGE}}`",
      "maxLines": 100,
      "queryType": "range",
      "refId": "A"
    }
  ],
  "datasource": {
    "default": false,
    "type": "loki",
    "uid": "ce0gjtocsolq8f"
  },
  "options": {
    "showTime": true,
    "showLabels": false,
    "showCommonLabels": false,
    "wrapLogMessage": false,
    "prettifyLogMessage": false,
    "enableLogDetails": true,
    "dedupStrategy": "none",
    "sortOrder": "Descending"
  }
}
2024 Davide Alberani [email protected]
Released under the Apache 2 license.