Improve Kubernetes Logs Collection Experience #25251

### Component(s)

receiver/filelog

### Describe the issue you're reporting

## Problem Statement

The Collector's solution for collecting logs from Kubernetes is the [Filelog Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver), which can handle collection of Kubernetes logs for most scenarios. But the Filelog Receiver was created to be a generic solution and therefore does not take advantage of useful Kubernetes assumptions out of the box.

At the moment, the recommended configuration for collecting logs with the Filelog receiver is:

```yaml
receivers:
  filelog:
    exclude: []
    include:
      - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
      - id: get-format
        routes:
          - expr: body matches "^\\{"
            output: parser-docker
          - expr: body matches "^[^ Z]+ "
            output: parser-crio
          - expr: body matches "^[^ Z]+Z"
            output: parser-containerd
        type: router
      - id: parser-crio
        regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: 2006-01-02T15:04:05.999999999Z07:00
          layout_type: gotime
          parse_from: attributes.time
        type: regex_parser
      - combine_field: attributes.log
        combine_with: ""
        id: crio-recombine
        is_last_entry: attributes.logtag == 'F'
        max_log_size: 102400
        output: extract_metadata_from_filepath
        source_identifier: attributes["log.file.path"]
        type: recombine
      - id: parser-containerd
        regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: regex_parser
      - combine_field: attributes.log
        combine_with: ""
        id: containerd-recombine
        is_last_entry: attributes.logtag == 'F'
        max_log_size: 102400
        output: extract_metadata_from_filepath
        source_identifier: attributes["log.file.path"]
        type: recombine
      - id: parser-docker
        output: extract_metadata_from_filepath
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: json_parser
      - id: extract_metadata_from_filepath
        parse_from: attributes["log.file.path"]
        regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
        type: regex_parser
      - from: attributes.stream
        to: attributes["log.iostream"]
        type: move
      - from: attributes.container_name
        to: resource["k8s.container.name"]
        type: move
      - from: attributes.namespace
        to: resource["k8s.namespace.name"]
        type: move
      - from: attributes.pod_name
        to: resource["k8s.pod.name"]
        type: move
      - from: attributes.restart_count
        to: resource["k8s.container.restart_count"]
        type: move
      - from: attributes.uid
        to: resource["k8s.pod.uid"]
        type: move
      - from: attributes.log
        to: body
        type: move
    start_at: beginning
```

To a new user, that is a lot of intimidating configuration. It takes time to comprehend, they probably don't want to comprehend it, and yet it has to live in their configuration. In the Collector Helm chart we hide this complexity behind a preset, but the preset can't handle all situations.
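For context, enabling that preset from the chart's values looks roughly like the sketch below. It assumes the `opentelemetry-collector` Helm chart's `presets.logsCollection` option and a daemonset deployment; check the `values.yaml` of your chart version for the exact options it supports.

```yaml
# values.yaml for the opentelemetry-collector Helm chart (sketch, not exhaustive)
mode: daemonset            # one collector per node, so it can read that node's /var/log/pods
presets:
  logsCollection:
    enabled: true          # the chart generates a filelog receiver config like the one above
```

The preset keeps the operator chain out of the user's own configuration, but it produces a single receiver that collects everything on the node.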

Here are a couple of experiences I'd like to improve:
- "Multi-tenancy" support. The Filelog receiver is good at gathering all the logs at once and sending them down a pipeline, but it is not set up to collect logs for a specific namespace or pod and send them down a specific pipeline. To meet this requirement you must configure multiple instances of the Filelog receiver and duplicate all of the configuration, changing only the `include` section as needed.
    - I believe multiple instances of the receiver are needed, but it would be nice to reduce the amount of duplicate configuration and to be able to quickly configure a k8s-specific Filelog receiver and add it to the appropriate pipeline.
- No support for label selectors. Although you can select specific namespaces/pods/containers by taking advantage of the log path, using label selectors to identify an object is a common practice in k8s.
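To make the duplication concrete, here is a rough sketch of what the multi-tenancy case looks like today. The namespace names (`team-a`, `team-b`), the exporter names, and the endpoints are hypothetical placeholders. The `include` globs assume the pod log directory layout `<namespace>_<pod>_<uid>` that the `extract_metadata_from_filepath` regex above parses.

```yaml
receivers:
  filelog/team-a:
    include:
      - /var/log/pods/team-a_*/*/*.log   # only pods in the (hypothetical) team-a namespace
    include_file_path: true
    start_at: beginning
    # operators: the full router/parser/recombine/move chain shown above, repeated verbatim
  filelog/team-b:
    include:
      - /var/log/pods/team-b_*/*/*.log   # only pods in the (hypothetical) team-b namespace
    include_file_path: true
    start_at: beginning
    # operators: the same chain again, unchanged

exporters:
  otlp/team-a:
    endpoint: team-a-backend.example.com:4317   # hypothetical destination
  otlp/team-b:
    endpoint: team-b-backend.example.com:4317   # hypothetical destination

service:
  pipelines:
    logs/team-a:
      receivers: [filelog/team-a]
      exporters: [otlp/team-a]
    logs/team-b:
      receivers: [filelog/team-b]
      exporters: [otlp/team-b]
```

Only the `include` entries and the pipeline wiring differ between the two receivers; everything else is copied.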

Some of the solution might live in the Helm chart and some in the Filelog receiver itself. It is also possible this spawns a new k8s-specific receiver that uses stanza behind the scenes.

Ultimately, I am looking to improve the "easy path" solution for most users in Kubernetes. I want to make it easier for users to collect a specific subset of the logs in the cluster, and easier to configure multiple instances of the receiver to support different destinations. Packaging up all the Kubernetes assumptions into something like:

```yaml
receivers:
  filelog:
    forKubernetes: true
    include:
      - /var/log/pods/my-namespace/*/*.log
```

or

```yaml
receivers:
  filelog/api-server:
    forKubernetes: true
    labelSelectors:
      - component=kube-apiserver,tier=control-plane
```

would be great.
