Improve Kubernetes Logs Collection Experience #25251

@TylerHelmuth

Description

Component(s)

receiver/filelog

Describe the issue you're reporting

Problem Statement

The Collector's solution for collecting logs from Kubernetes is the Filelog receiver, and it can handle Kubernetes log collection in most scenarios. But the Filelog receiver was created to be a generic solution, and therefore it does not take advantage of useful Kubernetes assumptions out of the box.

At the moment, the recommended configuration for collecting Kubernetes logs with the Filelog receiver is:

receivers:
  filelog:
    exclude: []
    include:
      - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
      - id: get-format
        routes:
          - expr: body matches "^\\{"
            output: parser-docker
          - expr: body matches "^[^ Z]+ "
            output: parser-crio
          - expr: body matches "^[^ Z]+Z"
            output: parser-containerd
        type: router
      - id: parser-crio
        regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: 2006-01-02T15:04:05.999999999Z07:00
          layout_type: gotime
          parse_from: attributes.time
        type: regex_parser
      - combine_field: attributes.log
        combine_with: ""
        id: crio-recombine
        is_last_entry: attributes.logtag == 'F'
        max_log_size: 102400
        output: extract_metadata_from_filepath
        source_identifier: attributes["log.file.path"]
        type: recombine
      - id: parser-containerd
        regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: regex_parser
      - combine_field: attributes.log
        combine_with: ""
        id: containerd-recombine
        is_last_entry: attributes.logtag == 'F'
        max_log_size: 102400
        output: extract_metadata_from_filepath
        source_identifier: attributes["log.file.path"]
        type: recombine
      - id: parser-docker
        output: extract_metadata_from_filepath
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: json_parser
      - id: extract_metadata_from_filepath
        parse_from: attributes["log.file.path"]
        regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
        type: regex_parser
      - from: attributes.stream
        to: attributes["log.iostream"]
        type: move
      - from: attributes.container_name
        to: resource["k8s.container.name"]
        type: move
      - from: attributes.namespace
        to: resource["k8s.namespace.name"]
        type: move
      - from: attributes.pod_name
        to: resource["k8s.pod.name"]
        type: move
      - from: attributes.restart_count
        to: resource["k8s.container.restart_count"]
        type: move
      - from: attributes.uid
        to: resource["k8s.pod.uid"]
        type: move
      - from: attributes.log
        to: body
        type: move
    start_at: beginning

To a new user this is a lot of intimidating configuration that takes time to comprehend, that they probably don't want to have to comprehend, and that nevertheless has to live in their config. In the Collector Helm chart we hide this complexity behind a preset, but the preset can't handle all situations.
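For reference, the Helm chart preset reduces the above to a single switch. A minimal values.yaml sketch, assuming the opentelemetry-collector Helm chart's logsCollection preset (exact option names may vary between chart versions):

mode: daemonset
presets:
  logsCollection:
    # Injects roughly the filelog configuration shown above into the rendered
    # Collector config; the full operator chain stays hidden from the user.
    enabled: true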

Here are a couple of experiences I'd like to improve:

  • "Multi-tenancy" support. The Filelog receiver is good at gathering all the logs at once and sending them down a pipeline, but it is not setup to collect logs for a specific namespace or pod and send that down a specific pipeline. To meet this requirement you must configure multiple instances of the filelogreceiver and duplicate all of the configuration, only changing the include section as needed.
    • I believe multiple instances of the receiver are needed, but it would be nice to reduce the amount of duplicate configuration. It would be nice to be able to quickly configure a k8s-specific filelogreceiver and add it to the appropriate pipeline
  • No support for label selectors. Although you can specify specific namespaces/pods/containers to collect by taking advantage of the log path, using label selectors to identify an object is a common practice in k8s.
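To illustrate the duplication problem, here is a trimmed sketch of what per-tenant routing looks like today. The team-a/team-b namespaces, exporter names, and endpoints are hypothetical, and each receiver would also have to repeat the full operators block from the configuration above:

receivers:
  filelog/team-a:
    include:
      # pod log directories are named <namespace>_<pod>_<uid>, so this glob
      # limits collection to the team-a namespace
      - /var/log/pods/team-a_*/*/*.log
    include_file_path: true
    # ...plus the full operators block shown above, duplicated verbatim
  filelog/team-b:
    include:
      - /var/log/pods/team-b_*/*/*.log
    include_file_path: true
    # ...and duplicated again here

exporters:
  otlp/team-a:
    endpoint: team-a-backend:4317
  otlp/team-b:
    endpoint: team-b-backend:4317

service:
  pipelines:
    logs/team-a:
      receivers: [filelog/team-a]
      exporters: [otlp/team-a]
    logs/team-b:
      receivers: [filelog/team-b]
      exporters: [otlp/team-b]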

Some of the solution might live in the Helm chart and some might live in the Filelog receiver itself. It is also possible this spawns a new k8s-specific receiver that uses stanza behind the scenes.

Ultimately, I am looking to improve the "easy path" solution for most users in Kubernetes. I want to make it easier for users to collect a specific subset of all the logs in the cluster, and easier to configure multiple instances of the receiver to support different destinations. Packaging up all the Kubernetes assumptions into something like:

receivers:
  filelog:
    forKubernetes: true
    include: 
      - /var/log/pods/my-namespace/*/*.log

or

receivers:
  filelog/api-server:
    forKubernetes: true
    labelSelectors:
      - component=kube-apiserver,tier=control-plane

would be great.
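For comparison, today the label-selector case has to be resolved by hand: the user figures out which pods match the selector and encodes the result as path globs. A sketch for the kube-apiserver example (the kube-system namespace and the kube-apiserver-* pod name pattern are assumptions based on how static control-plane pods are typically named):

receivers:
  filelog/api-server:
    include:
      # kube-apiserver static pods usually run in kube-system and are named
      # kube-apiserver-<node-name>; this glob has to be maintained by hand
      - /var/log/pods/kube-system_kube-apiserver-*/*/*.log
    include_file_path: true
    # ...plus the same operators block as above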
