Completed -
The scheduled maintenance has been completed.
Feb 19, 13:34 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 19, 13:00 UTC
Scheduled -
Possible user impact: Alert instances for the Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during the maintenance might resolve and fire again in the next evaluation.
Feb 19, 09:46 UTC
Resolved -
We experienced an issue impacting a cell within the Azure prod-us-central-7 region, which occurred between 14:26 and 14:36 UTC. Affected users may have noticed increased errors with rule evaluations, as well as some read/write errors. We have resolved this issue and will continue to monitor.
Feb 18, 14:00 UTC
Completed -
The scheduled maintenance has been completed.
Feb 18, 13:33 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 18, 13:00 UTC
Scheduled -
Alert instances for the Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during the maintenance might resolve and fire again in the next evaluation. Only the API is affected.
Estimated time window is 13:00–14:00 UTC.
Impacted clusters are:
prod-me-central-1, prod-us-east-1, prod-ap-northeast-0, prod-gb-south-0, prod-us-east-3, prod-eu-central-0, prod-ap-south-1, prod-sa-east-1
Feb 18, 10:04 UTC
Resolved -
This incident has been resolved.
Feb 17, 16:27 UTC
Monitoring -
Alert instances for the Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during this maintenance might resolve and fire again in the next evaluation. Only the API is affected.
Estimated time window is 15:00–16:00 UTC.
Impacted clusters are:
prod-eu-west-5, prod-us-east-4, prod-eu-west-6, prod-sa-east-0, prod-ap-south-0, prod-ap-southeast-0, prod-me-central-0, prod-au-southeast-0, prod-ap-southeast-2
Feb 17, 14:53 UTC
Resolved -
There was a service degradation today from ~12:09 UTC until ~12:35 UTC on the Calgary public probe for Synthetic Monitoring. Impact may include failed SM checks where this probe was used.
Feb 17, 12:47 UTC
Resolved -
This incident has been resolved.
Feb 13, 19:08 UTC
Investigating -
We are currently investigating an issue that is preventing users from signing up for self-serve Grafana. We will provide further updates as our investigation progresses.
Feb 13, 18:40 UTC
Resolved -
This incident has been resolved.
Feb 13, 17:03 UTC
Update -
We are continuing to work on a fix for this issue.
Feb 13, 07:56 UTC
Update -
A fix is being made to mitigate the issue. We will provide further updates accordingly.
Feb 13, 07:56 UTC
Identified -
As of 22:45 UTC, we have identified a serious bug affecting the delete endpoint for all Loki regions. As a precaution, the endpoint has been temporarily disabled.
Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Feb 12, 23:16 UTC
Resolved -
We have observed a sustained period of recovery. At this time, we are considering this issue resolved.
Feb 13, 07:29 UTC
Monitoring -
We have scaled up to handle the increased traffic and are seeing marked improvement. We will continue to monitor and provide updates.
Feb 13, 07:09 UTC
Investigating -
We have been alerted to an ongoing Loki writes outage in the prod-ca-east-0 region. Our Engineering team is actively investigating this.
Feb 13, 06:59 UTC
Resolved -
This incident has been resolved.
Feb 12, 19:07 UTC
Monitoring -
We are undergoing essential maintenance for Faro services. Users may experience a short service outage of less than one minute during this time. We expect this to be finished within an hour.
Feb 12, 16:09 UTC
Resolved -
We no longer observe any problems with our services; this incident has been resolved.
Feb 12, 14:30 UTC
Monitoring -
The fix has been implemented and services are back to normal. We're currently monitoring the health of the services before resolving this incident.
Feb 12, 12:50 UTC
Identified -
The issue has been identified and our team is currently working on a fix.
Feb 12, 12:40 UTC
Investigating -
Since 12:17 UTC, we have been observing increased latency for data ingestion and rule evaluation in Grafana Cloud Metrics in the prod-eu-west-2 region. We're currently investigating the issue.
Feb 12, 12:33 UTC
Resolved -
This incident has been resolved.
Feb 11, 21:47 UTC
Monitoring -
We are in the process of rolling out the fix.
Feb 11, 18:20 UTC
Identified -
We have identified the issue, and are working on a fix.
Feb 11, 16:22 UTC
Investigating -
We are aware of an issue that is preventing the installation of the Slack integration. We are currently investigating this, and will provide updates as they become available.
Feb 11, 14:21 UTC
Resolved -
We have observed a sustained period of recovery. At this time, we are considering this issue resolved.
Feb 10, 01:45 UTC
Investigating -
As of 00:10 UTC, we are experiencing write failures in a single cell affecting customers in prod-us-central-0. Impacted customers may see failed or dropped writes.
Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Feb 10, 00:39 UTC
Resolved -
This incident has been resolved.
Feb 9, 11:21 UTC
Update -
We are continuing to monitor for any further issues.
Feb 9, 10:36 UTC
Monitoring -
Between 09:47 and 10:14 UTC, Grafana Cloud Logs within a single cell residing in the prod-ap-southeast-1 region experienced an issue affecting write ingestion only. During this time, some log writes may have failed or been delayed. Log reads were not impacted and remained fully available throughout the incident.
Our engineering team quickly identified the cause of the issue and is monitoring the service. The service has been operating normally since 10:14 UTC.
Feb 9, 10:32 UTC
Resolved -
Between 18:32 and 18:46 UTC, Grafana Cloud Metrics within a single cell residing in the prod-us-west-0 region experienced an issue affecting write ingestion only. During this time, some metric writes may have failed or been delayed. Metric reads were not impacted and remained fully available throughout the incident.
Our engineering team quickly identified the cause of the issue and implemented mitigation steps to restore normal write ingestion. The service has been operating normally since 18:46 UTC.
Feb 5, 18:30 UTC
Resolved -
From 17:43 UTC to 18:05 UTC, a subset of customers experienced elevated latency and a peak error rate of approximately 22% for trace ingestion.
Feb 5, 18:00 UTC
Resolved -
This incident has been resolved.
Feb 5, 17:41 UTC
Monitoring -
Services have recovered and there is no longer an active issue. We're still monitoring the overall health.
Feb 5, 14:40 UTC
Investigating -
We're experiencing an issue in the us-central-0 region for the Hosted Metrics offering: the issue manifests as failing rule evaluations and the possibility of queries returning stale data. We're actively investigating the cause of the issue.
Feb 5, 14:14 UTC
Resolved -
This incident has been resolved.
Feb 5, 15:31 UTC
Monitoring -
The issue causing the incident has been identified, and the fix has been deployed. All new test runs are working consistently.
Feb 5, 09:36 UTC
Update -
We are continuing to work on a fix for this issue.
Feb 4, 19:57 UTC
Identified -
We encountered a subtle bug that caused our test-run finalization process to read stale threshold status because of a synchronization issue.
We have since fixed the bug, and new test runs will work properly. Impacted test runs will require further correction on our end. We will continue to provide updates on the progress of the fix for impacted test runs.
Feb 4, 17:20 UTC