Read errors in prod-eu-west-0

Incident Report for Grafana Cloud

Resolved

Many ingesters were evicted from nodes in cortex-prod-01 at the same time, causing a read path outage. Once the ingesters were rescheduled, the read path recovered. The errors lasted about 10 minutes.
Posted Mar 06, 2025 - 16:49 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Mar 06, 2025 - 16:17 UTC

Investigating

At 15:35 UTC, we were alerted to a Mimir read path outage in prod-01-eu-west-0. Affected users may have encountered timeouts on Prometheus/metrics queries.

Service has recovered, but we are continuing to monitor.
Posted Mar 06, 2025 - 16:16 UTC
This incident affected: Grafana Cloud: Prometheus (GCP Belgium - prod-eu-west-0: Querying, GCP Belgium - prod-eu-west-0: Ingestion).