Component(s)
receiver/filelog
What happened?
Description
The filelog documentation states the following:
- retry_on_failure.enabled: If true, the receiver will pause reading a file and attempt to resend the current batch of logs if it encounters an error from downstream components.
I wanted to validate that it worked as expected, so I ran the following test:
I set up a source collector that uses the filelog receiver to read a basic file to which the current date is appended every 10s. In the filelog receiver I enabled retry_on_failure and disabled it in the exporter (so that the exporter's own retries do not affect the test). The logs are sent via otlphttp to a bridge collector, which in turn forwards them to Elasticsearch.
During an outage it is observed that the filelog receiver keeps accepting log_records even though the exporter's sends are failing, and once the connection to the bridge collector is restored and logs arrive again, all the logs corresponding to the outage turn out to be lost.
Therefore the receiver does not appear to pause reading.
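For reference, the writer used in the test is just a loop appending timestamps; a minimal sketch (the exact command is an assumption, any periodic writer works):

while true; do
  date >> /tmp/prueba.log   # one timestamp line every 10s
  sleep 10
done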
Steps to Reproduce
To reproduce this, the setup can be simplified to a single collector that uses the filelog receiver and sends to any log backend, with the exporter's retry_on_failure disabled.
A basic example would look like this:
- Source collector:
  - filelog receiver configuration with retry_on_failure.enabled=true:

    filelog/prueba:
      include:
        - /tmp/prueba.log
      storage: file_storage
      include_file_path: true
      include_file_name: true
      retry_on_failure:
        enabled: true
        max_elapsed_time: 0

  - otlphttp exporter configuration with retry_on_failure.enabled=false, pointing to a backend:

    otlphttp/backend:
      endpoint: "http://ip:port"
      tls:
        insecure: true
      retry_on_failure:
        enabled: false
Then, in the middle of the test, force export errors, for example by cutting off communication between the two collectors; one option is sketched below.
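One way to do this (an assumption; any method that breaks connectivity between collector and backend works) is to drop outgoing traffic to the backend's OTLP/HTTP port with iptables:

# Start the outage: drop traffic to the backend
# (4318 is the default OTLP/HTTP port; substitute your own).
iptables -A OUTPUT -p tcp --dport 4318 -j DROP

# Let a few write intervals pass, then end the outage:
iptables -D OUTPUT -p tcp --dport 4318 -j DROP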
Expected Result
The expected result is that the filelog receiver pauses reading while the exporter reports errors, and resumes reading the file where it left off once the errors clear.
Actual Result
The actual behaviour is that the filelog receiver continues reading as if nothing happened, and when the exporter recovers from the error it resumes sending from wherever the receiver currently is, thus losing all the logs related to the outage.
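Because every line written to /tmp/prueba.log is a timestamp, the loss is easy to quantify by comparing line counts; a sketch, assuming Elasticsearch is reachable at this address and the logs land in an index matching logs-* (both hypothetical, adjust to your setup):

# Lines written by the source
wc -l /tmp/prueba.log

# Records that actually reached the backend; the difference matches
# the lines produced during the outage window.
curl -s "http://elasticsearch:9200/logs-*/_count"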
Collector version
v0.114.0
Environment information
Environment
OS: Red Hat Enterprise Linux Server 7.5 (Maipo)
OpenTelemetry Collector configuration
receivers:
  filelog/prueba:
    include:
      - /tmp/prueba.log
    storage: file_storage
    include_file_path: true
    include_file_name: true
    retry_on_failure:
      enabled: true
      max_elapsed_time: 0
processors:
  batch:
exporters:
  otlphttp/backend:
    endpoint: "http://ip:port"
    tls:
      insecure: true
    retry_on_failure:
      enabled: false
extensions:
  file_storage:
    directory: /tmp/storage
    compaction:
      directory: /tmp/storage
      on_start: true
      on_rebound: true
      rebound_trigger_threshold_mib: 10
      rebound_needed_threshold_mib: 100
      max_transaction_size: 256
    timeout: 2s
    fsync: true
service:
  extensions: [file_storage]
  pipelines:
    logs/prueba:
      receivers: [filelog/prueba]
      processors: [batch]
      exporters: [otlphttp/backend]
Log output
No response
Additional context
No response