Filelog receiver does not stop reading when exporter fails #40741

@gabrielSanchezCampello

Component(s)

receiver/filelog

What happened?

Description

The filelog documentation states the following:

  • retry_on_failure.enabled: If true, the receiver will pause reading a file and attempt to resend the current batch of logs if it encounters an error from downstream components.

To validate that this works as documented, I ran the following test:

I set up a source collector whose filelog receiver reads a simple file to which the current date is appended every 10 s. retry_on_failure is enabled in the filelog receiver and disabled in the exporter, so that exporter retries do not affect the test.
The source collector sends the logs over otlphttp to a bridge collector, which in turn forwards them to Elasticsearch.
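
For anyone who wants to recreate the input file, a minimal sketch of a generator follows. It is not the script used in the original test; the function name `append_timestamps` and the ISO 8601 line format are assumptions made for illustration. Only the path /tmp/prueba.log and the 10-second interval come from this report.

```python
import time
from datetime import datetime, timezone


def append_timestamps(path, count, interval_s=10.0):
    """Append `count` lines holding the current UTC time, one every `interval_s` seconds."""
    for i in range(count):
        # Reopen in append mode on each write so every line is flushed to disk
        # before the next sleep and the receiver can pick it up.
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(datetime.now(timezone.utc).isoformat() + "\n")
        if i < count - 1:
            time.sleep(interval_s)


if __name__ == "__main__":
    # /tmp/prueba.log is the path from the receiver configuration in this report.
    # 360 lines at 10 s each gives roughly one hour of input.
    append_timestamps("/tmp/prueba.log", count=360)
```

Because each line carries its own write time, any records missing at the backend can later be mapped back to the outage window.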

During an outage, the filelog receiver keeps accepting log records even though the exporter's sends are failing. Once the connection to the bridge collector is restored and logs arrive again, all the logs written during the outage turn out to be lost.

Therefore, the receiver does not appear to pause reading.

Steps to Reproduce

The setup can be reduced to a single collector that uses the filelog receiver and sends to any log backend with the exporter's retry_on_failure disabled.

A basic example would look like this:

  • Source collector:
    • Filelog receiver configuration with retry_on_failure.enabled=true:

      filelog/prueba:
        include:
          - /tmp/prueba.log
        storage: file_storage
        include_file_path: true
        include_file_name: true
        retry_on_failure:
          enabled: true
          max_elapsed_time: 0

    • Otlphttp exporter configuration with retry_on_failure.enabled=false, pointing to a backend:

      otlphttp/backend:
        endpoint: "http://ip:port"
        tls:
          insecure: true
        retry_on_failure:
          enabled: false

Then, in the middle of the test, force send errors, for example by cutting off communication between the collector and the backend.
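
If cutting the network is inconvenient, one alternative is a stand-in backend that can be switched into a failing state. The sketch below is an assumption-laden stub, not a real log backend: `ToggleBackend`, `Handler`, and `start_backend` are hypothetical names, and it assumes the exporter treats an empty 200 response as success and a 503 as a failed send. Note that a 503 is an HTTP-level error rather than a connection failure, so it approximates the outage described here rather than reproducing it exactly.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ToggleBackend(ThreadingHTTPServer):
    """HTTP server whose POST responses can be switched between 200 and 503."""

    failing = False  # set to True to simulate the outage
    accepted = 0     # number of requests answered with 200


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body so the connection stays usable.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        if self.server.failing:
            self.send_response(503)
        else:
            self.server.accepted += 1
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def start_backend(port=0):
    """Start the stub on 127.0.0.1 in a daemon thread and return the server object."""
    server = ToggleBackend(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To use it, point the exporter's endpoint at the address in `server.server_address`, set `server.failing = True` for the duration of the simulated outage, and set it back to `False` afterwards. The `accepted` counter gives a rough count of successful sends.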

Expected Result

The filelog receiver stops reading when the exporter returns errors and, once the exporter recovers, resumes reading the file where it left off.

Actual Result

The filelog receiver continues reading as if nothing had happened. When the exporter recovers, sending resumes from wherever the receiver currently is in the file, so all logs written during the outage are lost.
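
One way to confirm the loss is to collect the timestamps that reached the backend and look for holes wider than the 10-second write interval. The helper below is a minimal sketch; the name `find_gaps` and the 5-second tolerance are assumptions, and how the timestamps are retrieved from the backend is left out.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval_s=10.0, tolerance_s=5.0):
    """Return (previous, next) pairs whose spacing exceeds interval_s + tolerance_s."""
    ordered = sorted(timestamps)
    limit = timedelta(seconds=interval_s + tolerance_s)
    # Compare each timestamp with its successor and keep the pairs that are too far apart.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]
```

An empty result means no record spacing exceeded the limit. A returned pair brackets a window in which records are missing, which should line up with the outage if the behaviour described above is reproduced.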

Collector version

v0.114.0

Environment information

OS: Red Hat Enterprise Linux Server 7.5 (Maipo)

OpenTelemetry Collector configuration

processors:
  batch:

receivers:
  filelog/prueba:
    include:
      - /tmp/prueba.log
    storage: file_storage
    include_file_path: true
    include_file_name: true
    retry_on_failure:
      enabled: true
      max_elapsed_time: 0

exporters:
  otlphttp/backend:
    endpoint: "http://ip:port"
    tls:
      insecure: true
    retry_on_failure:
      enabled: false

extensions:
  file_storage:
    directory: /tmp/storage
    compaction:
      directory: /tmp/storage
      on_start: true
      on_rebound: true
      rebound_trigger_threshold_mib: 10
      rebound_needed_threshold_mib: 100
      max_transaction_size: 256
    timeout: 2s
    fsync: true

service:
  extensions: [file_storage]
  pipelines:
    logs/prueba:
      receivers: [filelog/prueba]
      processors: [batch]
      exporters: [otlphttp/backend]

Log output

Additional context

No response
