Description
Component(s)
receiver/filelog
Describe the issue you're reporting
Problem Statement
The Collector's solution for collecting logs from Kubernetes is the Filelog Receiver and it can handle collection of Kubernetes Logs for most scenarios. But Filelog Receiver was created to be a generic solution and therefore does not take advantage of useful Kubernetes assumptions out-of-the-box.
At the moment, the recommended configuration for collecting logs with the Filelog receiver is:
receivers:
  filelog:
    exclude: []
    include:
      - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
      - id: get-format
        routes:
          - expr: body matches "^\\{"
            output: parser-docker
          - expr: body matches "^[^ Z]+ "
            output: parser-crio
          - expr: body matches "^[^ Z]+Z"
            output: parser-containerd
        type: router
      - id: parser-crio
        regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: 2006-01-02T15:04:05.999999999Z07:00
          layout_type: gotime
          parse_from: attributes.time
        type: regex_parser
      - combine_field: attributes.log
        combine_with: ""
        id: crio-recombine
        is_last_entry: attributes.logtag == 'F'
        max_log_size: 102400
        output: extract_metadata_from_filepath
        source_identifier: attributes["log.file.path"]
        type: recombine
      - id: parser-containerd
        regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: regex_parser
      - combine_field: attributes.log
        combine_with: ""
        id: containerd-recombine
        is_last_entry: attributes.logtag == 'F'
        max_log_size: 102400
        output: extract_metadata_from_filepath
        source_identifier: attributes["log.file.path"]
        type: recombine
      - id: parser-docker
        output: extract_metadata_from_filepath
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: json_parser
      - id: extract_metadata_from_filepath
        parse_from: attributes["log.file.path"]
        regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
        type: regex_parser
      - from: attributes.stream
        to: attributes["log.iostream"]
        type: move
      - from: attributes.container_name
        to: resource["k8s.container.name"]
        type: move
      - from: attributes.namespace
        to: resource["k8s.namespace.name"]
        type: move
      - from: attributes.pod_name
        to: resource["k8s.pod.name"]
        type: move
      - from: attributes.restart_count
        to: resource["k8s.container.restart_count"]
        type: move
      - from: attributes.uid
        to: resource["k8s.pod.uid"]
        type: move
      - from: attributes.log
        to: body
        type: move
    start_at: beginning
To a new user, that is a lot of scary configuration. It takes time to comprehend, they probably don't want to comprehend it, and yet it has to live in their configuration. In the Collector Helm chart we hide this complexity behind a preset, but the preset can't handle all situations.
Here are a couple of experiences I'd like to improve:
- "Multi-tenancy" support. The Filelog receiver is good at gathering all the logs at once and sending them down a pipeline, but it is not set up to collect logs for a specific namespace or pod and send them down a specific pipeline. To meet this requirement you must configure multiple instances of the Filelog receiver and duplicate all of the configuration, changing only the include section as needed.
  - I believe multiple instances of the receiver are needed, but it would be nice to reduce the amount of duplicate configuration. It would be nice to be able to quickly configure a k8s-specific Filelog receiver and add it to the appropriate pipeline.
- No support for label selectors. Although you can specify specific namespaces/pods/containers to collect by taking advantage of the log path, using label selectors to identify an object is a common practice in k8s.
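To make the duplication concrete, here is a rough sketch of what the multi-tenancy case looks like today. The namespace names (team-a, team-b), the exporter names, and the endpoints are hypothetical and only for illustration; the include globs assume the `<namespace>_<pod>_<uid>` directory layout that the extract_metadata_from_filepath regex above relies on.

```yaml
receivers:
  filelog/team-a:
    include:
      # Pod log directories are named <namespace>_<pod>_<uid>, so a
      # namespace prefix narrows collection to one tenant.
      - /var/log/pods/team-a_*/*/*.log
    include_file_path: true
    start_at: beginning
    operators: []  # placeholder: the full operator chain above has to be repeated here
  filelog/team-b:
    include:
      - /var/log/pods/team-b_*/*/*.log
    include_file_path: true
    start_at: beginning
    operators: []  # placeholder: and repeated again here, unchanged

exporters:
  otlp/team-a:
    endpoint: team-a-backend.example.com:4317  # hypothetical destination
  otlp/team-b:
    endpoint: team-b-backend.example.com:4317  # hypothetical destination

service:
  pipelines:
    logs/team-a:
      receivers: [filelog/team-a]
      exporters: [otlp/team-a]
    logs/team-b:
      receivers: [filelog/team-b]
      exporters: [otlp/team-b]
```

Only the include glob and the pipeline wiring differ between the two receivers; everything under operators is identical boilerplate, which is the part I'd like to stop repeating.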
Some of the solution might be in the Helm chart and some might be in the Filelog receiver itself. It is also possible this spawns a new k8s-specific receiver that uses stanza behind the scenes.
Ultimately, I am looking to improve the "easy path" solution for most users in Kubernetes. I want to make it easier for users to collect a specific subset of the logs in the cluster, and easier to configure multiple instances of the receiver to support different destinations. Packaging up all the Kubernetes assumptions into something like:
receivers:
  filelog:
    forKubernetes: true
    include:
      - /var/log/pods/my-namespace/*/*.log
or
receivers:
  filelog/api-server:
    forKubernetes: true
    labelSelectors:
      - component=kube-apiserver,tier=control-plane
would be great.