Issues with prod-us-central-0 Cluster

Incident Report for Grafana Cloud

Resolved

We have continued to observe stable workloads and the cluster remains healthy. Engineering has completed coordination and communication efforts with the third-party vendor and, as such, we are considering this issue resolved.

No further updates.

Posted Mar 08, 2023 - 13:19 UTC

Update

We're continuing to monitor our prod-us-central-0 cluster. The cluster is healthy and workloads are stable.

Posted Mar 07, 2023 - 00:18 UTC

Monitoring

Engineering has made a change to mitigate the issue. As of 16:22 UTC, pods in the prod-us-central-0 cluster have returned to a healthy state and are behaving normally. Instances in the cluster should no longer experience adverse behavior.

We will continue to monitor.

Posted Mar 03, 2023 - 16:30 UTC

Identified

As of 15:45 UTC, we have observed pods re-entering a poor state, only within the prod-us-central-0 cluster. The underlying issue stems from a third-party vendor, and Engineering is actively engaged to restore functionality.

Instances within the prod-us-central-0 cluster may be unable to initialize or start. We will continue to provide updates as we learn more and undertake mitigation efforts.

Posted Mar 03, 2023 - 16:01 UTC

Update

Engineering continues to engage with the third-party vendor while we await further details surrounding root cause.

We've continued to observe normal pod behavior and will continue to monitor.

Posted Mar 02, 2023 - 20:36 UTC

Update

We continue to observe normal pod behavior and instance functionality for the prod-us-central-0 cluster.

We will continue to monitor the state of this issue, pending further information from the third-party vendor about the poor cluster behavior.

Posted Mar 01, 2023 - 22:58 UTC

Monitoring

Engineering has made a change to mitigate the issue. As of 21:40 UTC, pods in the prod-us-central-0 cluster have returned to a healthy state and are behaving normally. Instances in the cluster should no longer experience adverse behavior.

We will continue to monitor the state of this issue and relay any additional updates.

Posted Mar 01, 2023 - 21:48 UTC

Identified

As of 20:44 UTC, we were alerted to a number of pods entering a poor state, only within the prod-us-central-0 cluster. The underlying issue stems from a third-party vendor, and Engineering is actively engaged to restore functionality.

Instances within the prod-us-central-0 cluster may be unable to initialize or start. We will continue to provide updates as we learn more.

Posted Mar 01, 2023 - 20:56 UTC

This incident affected: Grafana Cloud: Hosted Grafana (GCP US Central - prod-us-central-0).