Monitoring - A fix has been implemented and we are monitoring the results.
Apr 05, 2024 - 15:08 UTC
Update - Our team has initiated a process that we believe will effectively fix the issue. However, this process may take some time to complete.
Apr 05, 2024 - 12:57 UTC
Investigating - We're investigating reports of users unable to download Grafana-Agent packages from the APT repository. Our engineers are actively working to restore missing objects and ensure package availability. Further updates will follow as the investigation progresses.
Apr 05, 2024 - 10:50 UTC
Completed -
The scheduled maintenance has been completed.
Apr 17, 16:00 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 17, 14:30 UTC
Scheduled -
We are upgrading the probe infrastructure of our New York and San Francisco probes to improve their reliability and performance. Synthetic tests may experience higher latencies for a brief time as we migrate checks.
Apr 12, 16:25 UTC
Resolved -
This incident has been resolved.
Apr 12, 15:20 UTC
Update -
We are continuing to investigate this issue.
Apr 12, 12:02 UTC
Investigating -
We have identified an incident that is being investigated by our team; some read queries, including range queries, are returning 5xx errors, and there is partial impact on write queries as well.
Apr 12, 09:28 UTC
Completed -
The scheduled maintenance has been completed.
Apr 11, 17:30 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 11, 17:00 UTC
Scheduled -
Beginning at 17:00 UTC on April 11, we will be performing maintenance on our Support Ticketing secure file transfer system (SendSafely). During this time, users submitting support tickets to Grafana Support will be unable to attach a secure file.
We anticipate this period to last approximately 30 minutes, after which secure attachments may be uploaded.
Apr 11, 14:37 UTC
Resolved -
This incident has been resolved.
Apr 11, 12:32 UTC
Monitoring -
At approximately 15:00 UTC, we received reports that the pending periods and group intervals of alerts are being incorrectly displayed in the alert rule editing UI. We have determined that this issue only impacts the frontend presentation of the pending periods and group intervals; they are being correctly applied on the backend.
We have identified the cause and are working to implement a fix.
Apr 10, 17:20 UTC
Resolved -
Our team identified and investigated an incident in which all read queries, including range queries, were returning 5xx errors, with partial impact on write queries as well. This issue is now resolved and normal service functionality has been restored.
Apr 11, 10:00 UTC
Resolved -
As of 22:20 UTC, querying has been restored for both Graphite and Prometheus, and users should no longer be experiencing the 500 Internal Server Error.
Apr 10, 22:26 UTC
Monitoring -
As of 19:17 UTC, we have observed marked recovery and users in the `prod-us-central-0` cluster should now be able to query Graphite and Prometheus metrics without errors.
We will monitor this issue and provide updates as more information is shared.
Apr 10, 19:41 UTC
Investigating -
As of 18:30 UTC, we were alerted to a high volume of errors querying metrics in both Graphite and Prometheus within the prod-us-central-0 cluster only. Users encountering this issue may experience an error stating "500 Internal Server Error: the bucket index is too old" when querying metrics.
Engineering is actively engaged and assessing possible remediation paths. We will provide updates as more information is shared.
Apr 10, 19:05 UTC
Resolved -
This incident has been resolved.
Apr 10, 13:17 UTC
Investigating -
We have identified an incident that is currently being investigated by our team; metric range queries are returning 5xx errors, impacting the read, parallel-read, and write paths. We are actively working to resolve the issue and restore normal service functionality.
Apr 10, 12:04 UTC
Resolved -
Engineering has released a fix for all affected plugin versions and customers should no longer experience PDC certificate errors from this cause. At this time, we are considering this issue resolved. No further updates.
Apr 5, 15:02 UTC
Update -
Engineering has identified this behaviour affects all regions using the following plugin versions:
We are continuing to work to fix this issue and will provide updates as more information is shared.
Apr 5, 10:08 UTC
Identified -
Engineering has identified the issue and is currently exploring remediation options. At this time, users may continue to experience certificate errors on PDC in prod-au-southeast-1.
We will continue to provide updates as more information is shared.
Apr 5, 09:18 UTC
Investigating -
We are experiencing connectivity issues with Private Datasource Connect for instances running in Australia. Users experiencing this issue may encounter errors on their root CA certificate for PDC connections.
Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Apr 5, 09:10 UTC
Resolved -
This incident has been resolved.
Apr 5, 13:53 UTC
Monitoring -
A fix has been implemented and we are monitoring the results.
Apr 5, 13:08 UTC
Investigating -
We are currently experiencing an issue impacting both read and write operations in the US Central region for our Loki-prod service. Customers may experience delays or errors when accessing our services in the affected region. Our team is actively investigating the issue and working to restore normal service as quickly as possible.
Apr 5, 12:28 UTC