Commit ece52ed

Add alert when data too far in the future is ingested (grafana#5822)
* Add alert IngestedDataTooFarInTheFuture. Signed-off-by: Peter Štibraný <[email protected]>
1 parent 61a2bb9 commit ece52ed

7 files changed, +81 -1 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -122,6 +122,7 @@
 * [ENHANCEMENT] Dashboards: add native histogram active series and active buckets to "tenants" dashboard. #5543
 * [ENHANCEMENT] Dashboards: add panels to "Mimir / Writes" for requests rejected for per-instance limits. #5638
 * [ENHANCEMENT] Dashboards: rename "Blocks currently loaded" to "Blocks currently owned" in the "Mimir / Queries" dashboard. #5705
+* [ENHANCEMENT] Alerts: Add `MimirIngestedDataTooFarInTheFuture` warning alert that triggers when Mimir ingests a sample with a timestamp more than 1h in the future. #5822
 * [BUGFIX] Alerts: fix `MimirIngesterRestarts` to fire only when the ingester container is restarted, excluding the cases the pod is rescheduled. #5397
 * [BUGFIX] Dashboards: fix "unhealthy pods" panel on "rollout progress" dashboard showing only a number rather than the name of the workload and the number of unhealthy pods if only one workload has unhealthy pods. #5113 #5200
 * [BUGFIX] Alerts: fixed `MimirIngesterHasNotShippedBlocks` and `MimirIngesterHasNotShippedBlocksSinceStart` alerts. #5396

docs/sources/mimir/manage/mimir-runbooks/_index.md

Lines changed: 16 additions & 0 deletions
@@ -1178,6 +1178,22 @@ How to **investigate**:
 {name="rollout-operator",namespace="<namespace>"}
 ```
 
+### MimirIngestedDataTooFarInTheFuture
+
+This alert fires when a Mimir ingester accepts a sample with a timestamp that is too far in the future.
+This is typically the result of processing a corrupted message, and it can cause the rejection of other samples with timestamps close to "now" (real-world time).
+
+How it **works**:
+
+- The metric exported by the ingester reports the maximum timestamp across all TSDBs open in the ingester.
+- The alert checks this exported metric and fires if the maximum timestamp is more than 1h in the future.
+
+How to **investigate**:
+
+- Find the tenant with the bad sample on the ingester's tenants list, where the warning "TSDB Head max timestamp too far in the future" is displayed.
+- Flush the tenant's data to blocks storage.
+- Remove the tenant's directory from disk and restart the ingester.
+
 ## Errors catalog
 
 Mimir has some codified error IDs that you might see in HTTP responses or logs.
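
The "How it works" bullets in the runbook above can be sketched in Go as follows. This is an illustration only, not the ingester's code: the map of per-tenant timestamps, both function names, and the choice to report 0 when no data is present are assumptions made for the example. The 1h threshold and the "greater than zero" guard come from the alert expression in this commit.

```go
package main

import (
	"fmt"
	"time"
)

// futureThresholdSeconds mirrors the 60*60 threshold in the alert expression.
const futureThresholdSeconds = 60 * 60

// maxHeadTimestampSeconds returns the largest head timestamp across all
// tenants, converted from milliseconds to seconds. The map argument
// (tenant ID -> max timestamp in ms) is a hypothetical stand-in for the
// TSDBs open in an ingester. Returning 0 when there is no data is an
// assumption, chosen to line up with the "> 0" guard in the alert.
func maxHeadTimestampSeconds(headMaxMillis map[string]int64) float64 {
	var maxMillis int64
	for _, t := range headMaxMillis {
		if t > maxMillis {
			maxMillis = t
		}
	}
	return float64(maxMillis) / 1000
}

// tooFarInTheFuture reproduces the alert condition for one ingester:
// skip the "no data" value, then compare the gap with the threshold.
func tooFarInTheFuture(maxSeconds, nowSeconds float64) bool {
	return maxSeconds > 0 && maxSeconds-nowSeconds > futureThresholdSeconds
}

func main() {
	now := time.Now()
	heads := map[string]int64{
		"tenant-a": now.UnixMilli(),
		"tenant-b": now.Add(2 * time.Hour).UnixMilli(), // bad sample, 2h ahead
	}
	maxSeconds := maxHeadTimestampSeconds(heads)
	fmt.Println("alert condition met:", tooFarInTheFuture(maxSeconds, float64(now.Unix())))
}
```

In the real alert the condition must also hold for 5 minutes (`for: 5m`) before it fires; the sketch evaluates a single point in time.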

operations/helm/tests/metamonitoring-values-generated/mimir-distributed/templates/metamonitoring/mixin-alerts.yaml

Lines changed: 14 additions & 0 deletions
@@ -201,6 +201,20 @@ spec:
           for: 1h
           labels:
             severity: warning
+        - alert: MimirIngestedDataTooFarInTheFuture
+          annotations:
+            message: Mimir ingester {{ $labels.pod }} in {{ $labels.cluster }}/{{ $labels.namespace
+              }} has ingested samples with timestamps more than 1h in the future.
+            runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimiringesteddatatoofarinthefuture
+          expr: |
+            max by(cluster, namespace, pod) (
+                cortex_ingester_tsdb_head_max_timestamp_seconds - time()
+                and
+                cortex_ingester_tsdb_head_max_timestamp_seconds > 0
+            ) > 60*60
+          for: 5m
+          labels:
+            severity: warning
         - alert: MimirRingMembersMismatch
           annotations:
             message: |

operations/mimir-mixin-compiled-baremetal/alerts.yaml

Lines changed: 14 additions & 0 deletions
@@ -189,6 +189,20 @@ groups:
           for: 1h
           labels:
             severity: warning
+        - alert: MimirIngestedDataTooFarInTheFuture
+          annotations:
+            message: Mimir ingester {{ $labels.instance }} in {{ $labels.cluster }}/{{ $labels.namespace
+              }} has ingested samples with timestamps more than 1h in the future.
+            runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimiringesteddatatoofarinthefuture
+          expr: |
+            max by(cluster, namespace, instance) (
+                cortex_ingester_tsdb_head_max_timestamp_seconds - time()
+                and
+                cortex_ingester_tsdb_head_max_timestamp_seconds > 0
+            ) > 60*60
+          for: 5m
+          labels:
+            severity: warning
         - alert: MimirRingMembersMismatch
           annotations:
             message: |

operations/mimir-mixin-compiled/alerts.yaml

Lines changed: 14 additions & 0 deletions
@@ -189,6 +189,20 @@ groups:
           for: 1h
           labels:
             severity: warning
+        - alert: MimirIngestedDataTooFarInTheFuture
+          annotations:
+            message: Mimir ingester {{ $labels.pod }} in {{ $labels.cluster }}/{{ $labels.namespace
+              }} has ingested samples with timestamps more than 1h in the future.
+            runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimiringesteddatatoofarinthefuture
+          expr: |
+            max by(cluster, namespace, pod) (
+                cortex_ingester_tsdb_head_max_timestamp_seconds - time()
+                and
+                cortex_ingester_tsdb_head_max_timestamp_seconds > 0
+            ) > 60*60
+          for: 5m
+          labels:
+            severity: warning
         - alert: MimirRingMembersMismatch
           annotations:
             message: |

operations/mimir-mixin/alerts/alerts.libsonnet

Lines changed: 21 additions & 0 deletions
@@ -291,6 +291,27 @@ local utils = import 'mixin-utils/utils.libsonnet';
             message: '%(product)s ruler %(alert_instance_variable)s in %(alert_aggregation_variables)s has no rule groups assigned.' % $._config,
           },
         },
+        {
+          // Alert if an ingester has ingested samples with timestamps more than 1h in the future.
+          alert: $.alertName('IngestedDataTooFarInTheFuture'),
+          'for': '5m',
+          expr: |||
+            max by(%(alert_aggregation_labels)s, %(per_instance_label)s) (
+                cortex_ingester_tsdb_head_max_timestamp_seconds - time()
+                and
+                cortex_ingester_tsdb_head_max_timestamp_seconds > 0
+            ) > 60*60
+          ||| % {
+            alert_aggregation_labels: $._config.alert_aggregation_labels,
+            per_instance_label: $._config.per_instance_label,
+          },
+          labels: {
+            severity: 'warning',
+          },
+          annotations: {
+            message: '%(product)s ingester %(alert_instance_variable)s in %(alert_aggregation_variables)s has ingested samples with timestamps more than 1h in the future.' % $._config,
+          },
+        },
       ] + [
         {
           alert: $.alertName('RingMembersMismatch'),
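
The jsonnet template above produces the two compiled variants in this commit, which differ only in the per-instance label (`pod` in the default mixin, `instance` in the baremetal one). A rough Go analogue of that substitution is sketched below. It uses positional `%s` placeholders in place of jsonnet's named `%(...)s` fields, and `renderExpr` is a name made up for this illustration; the label values are copied from the compiled YAML files.

```go
package main

import "fmt"

// exprTemplate is the alert expression with two positional placeholders
// standing in for the jsonnet fields %(alert_aggregation_labels)s and
// %(per_instance_label)s.
const exprTemplate = `max by(%s, %s) (
    cortex_ingester_tsdb_head_max_timestamp_seconds - time()
    and
    cortex_ingester_tsdb_head_max_timestamp_seconds > 0
) > 60*60`

// renderExpr fills in the aggregation labels and the per-instance label.
// The function is hypothetical; the mixin does this with jsonnet formatting.
func renderExpr(aggregationLabels, perInstanceLabel string) string {
	return fmt.Sprintf(exprTemplate, aggregationLabels, perInstanceLabel)
}

func main() {
	// Default (Kubernetes) mixin: instances are identified by pod.
	fmt.Println(renderExpr("cluster, namespace", "pod"))
	fmt.Println()
	// Baremetal mixin: instances are identified by the instance label.
	fmt.Println(renderExpr("cluster, namespace", "instance"))
}
```

Keeping the label names in configuration means the alert logic is written once and each deployment flavor only changes how an ingester is identified.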

pkg/ingester/tenants_http.go

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ func (i *Ingester) TenantsHandler(w http.ResponseWriter, req *http.Request) {
 		s.MaxTime = formatMillisTime(maxMillis)
 
 		if maxMillis-nowMillis > i.limits.CreationGracePeriod(t).Milliseconds() {
-			s.Warning = "maxT too far in the future"
+			s.Warning = "TSDB Head max timestamp too far in the future"
 		}
 
 		tss = append(tss, s)
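
Note that the tenants page uses a different threshold from the alert: the warning appears when a tenant's head timestamp exceeds "now" by more than that tenant's creation grace period, whereas the alert uses a fixed 1h. A minimal standalone sketch of that comparison follows. The `tenantWarning` function and the 10-minute `exampleGracePeriod` are assumptions for illustration, not Mimir's actual code or defaults.

```go
package main

import (
	"fmt"
	"time"
)

// exampleGracePeriod is a hypothetical per-tenant grace period used only
// in this sketch; the real value comes from the tenant's limits.
const exampleGracePeriod = 10 * time.Minute

// tenantWarning returns the warning text when the head's maximum timestamp
// (in ms) is ahead of now (in ms) by more than the grace period, and an
// empty string otherwise. It mirrors the comparison in the hunk above.
func tenantWarning(maxMillis, nowMillis int64, gracePeriod time.Duration) string {
	if maxMillis-nowMillis > gracePeriod.Milliseconds() {
		return "TSDB Head max timestamp too far in the future"
	}
	return ""
}

func main() {
	now := time.Now()
	ahead := now.Add(30 * time.Minute) // 30m ahead, beyond the 10m grace period
	fmt.Printf("%q\n", tenantWarning(ahead.UnixMilli(), now.UnixMilli(), exampleGracePeriod))
	fmt.Printf("%q\n", tenantWarning(now.UnixMilli(), now.UnixMilli(), exampleGracePeriod))
}
```

Because the two thresholds differ, a tenant can show the warning on the tenants page without the alert firing if its grace period is shorter than 1h.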
