Metrics Write Outage in Multiple Cells

Incident Report for Grafana Cloud

Resolved

As of 23:07 UTC, we consider this incident resolved.

Mitigation efforts have restored normal write performance, and error rates have returned to expected levels.

We have confirmed stability across the affected areas and continue to monitor, but no further impact is expected.

If you continue to experience any issues, please reach out to support.
Posted Nov 17, 2025 - 23:08 UTC

Monitoring

We are seeing improvement on the metrics side, with write performance recovering.

We continue to investigate the remaining impact to Synthetic Monitoring and are working to determine the underlying cause.

Monitoring will continue as recovery progresses, and we’ll provide further updates as we learn more.
Posted Nov 17, 2025 - 21:22 UTC

Update

Our teams have been alerted that Synthetic Monitoring is also affected by this outage.

As a result, users may see gaps in their Synthetic Monitoring metrics and may miss alerts.

We continue to investigate and will provide further updates as they become available.
Posted Nov 17, 2025 - 20:50 UTC

Identified

We’ve re-evaluated the situation and this issue is still ongoing. Although we initially observed signs of recovery, write errors continue to occur in the affected cells.

Mitigation work is still in progress, and we’re treating the incident as identified again while we work toward a sustained resolution. We’ll provide further updates as we confirm stabilization.
Posted Nov 17, 2025 - 20:34 UTC

Monitoring

Mitigation has been applied and Mimir write performance is beginning to recover in the affected cells.

prod-us-central-0.cortex-prod-10 appears to have recovered as of 19:52 UTC, and prod-us-central-5.cortex-dedicated-06 is showing signs of recovery as of 20:00 UTC.

We are continuing to monitor both cells closely to ensure the mitigation is effective and that the systems remain stable.
Posted Nov 17, 2025 - 20:11 UTC

Investigating

We are investigating a partial write outage affecting multiple metrics cells, beginning around 19:30 UTC. Some customers may see intermittent write failures or delays. Most requests should succeed after retries, so recent metrics may appear late.

Querying previously ingested data remains unaffected. Engineering is continuing to investigate and will provide further updates as more information becomes available.
Posted Nov 17, 2025 - 20:02 UTC
This incident affected: Grafana Cloud: Prometheus (GCP US Central - prod-us-central-0: Ingestion, GCP US Central - prod-us-central-5: Ingestion) and Grafana Cloud: Synthetic Monitoring (GCP US Central - prod-us-central-0: Public Probes).