Investigating - We are currently investigating an issue that is causing the Loki read path to experience periodic panics on certain queries. Users may notice some queries failing. Our team is actively working to identify the root cause and a resolution.
Jul 25, 2024 - 14:59 UTC
There will be scheduled maintenance activities to enhance our database infrastructure. During this period, there may be brief disruptions to some of our services, particularly those involving Cloud Access Policies and token-based activities. However, frequently used tokens are expected to remain operational, thereby minimizing disruptions to data ingestion processes such as metrics and logs.
We appreciate your understanding and patience during this maintenance. Our team will work diligently to ensure any downtime is minimized.
Jul 15, 2024 - 09:01 UTC
Resolved -
There were no observed instances of this issue today. We're considering the incident resolved. Please contact our Support team if you believe you're impacted by this incident.
Jul 22, 21:46 UTC
Monitoring -
Systems are stable again and we are continuing to monitor.
We will provide another status update if there are any changes.
Jul 20, 00:25 UTC
Update -
If you are experiencing queries taking longer to complete than expected, please cancel and retry them.
We will continue to investigate and provide updates as more information becomes available.
Jul 19, 23:18 UTC
Update -
Around 17:00 UTC we observed degraded rule-evaluation performance, which we are currently investigating, in the following clusters: us-central-0/loki-prod, us-central-5/loki-prod-017, us-central-5/loki-prod3
We will continue to investigate and provide updates as more information becomes available.
Jul 19, 22:30 UTC
Investigating -
We are currently investigating an issue causing degraded rule-evaluation performance in us-central-0.loki. We will continue to investigate and provide updates as more information becomes available.
Jul 19, 22:06 UTC
Resolved -
We recently encountered an issue where newly created Grafana Cloud API keys were not authenticating correctly with backend gateway services (e.g., Mimir, Loki, Tempo). This was caused by a change implemented late last week. A hotfix has been applied and this issue is fully resolved.
Jul 22, 14:00 UTC
Resolved -
Engineering has released a fix and as of 16:07 UTC, customers should no longer experience read/write path degradation on us-central2 cloud metrics and traces. At this time, we are considering this issue resolved. No further updates.
Jul 19, 17:48 UTC
Monitoring -
A fix has been implemented and we are currently monitoring the results.
Jul 19, 15:07 UTC
Update -
We are continuing to investigate this issue.
Jul 19, 10:48 UTC
Update -
We are continuing to investigate this issue.
Jul 19, 10:46 UTC
Investigating -
Starting at 08:20 UTC we have seen degraded read/write path performance on us-central2 cloud metrics and traces. We are currently investigating the issue.
Jul 19, 08:20 UTC
Resolved -
Engineering has released a fix and as of 10:30 UTC, customers should no longer experience hosted services issues. At this time, we are considering this issue resolved. No further updates.
Jul 19, 10:30 UTC
Update -
As of 04:45 UTC, we have observed marked improvement in the Prometheus, Loki, and Tempo components. We are continuing to monitor for signs of improvement in other components as this fix rolls out across our cloud service provider's services.
Jul 19, 04:49 UTC
Identified -
Our cloud service provider has identified the cause of the outage and is applying mitigations. We expect to see signs of improvement as this fix rolls out across their services.
Jul 19, 02:22 UTC
Investigating -
We are currently investigating issues with hosted services in us-central2.
Jul 18, 23:13 UTC
Resolved -
The incident began at 01:30 UTC. We saw a 100% failure rate for multiHTTP checks in Sao Paulo, Seoul, and Mumbai. All other check types ran normally. The incident was resolved at 15:00 UTC.
Jul 19, 05:30 UTC
Resolved -
This incident has been resolved.
Jul 19, 01:45 UTC
Update -
The incident began at 16:17 UTC. We are seeing a 100% failure rate for multiHTTP checks in Bangalore and Newark. All other check types in Bangalore and Newark are running normally. We have identified the problem and are working to resolve it.
Jul 19, 00:51 UTC
Identified -
We are seeing failures on synthetic monitoring MultiHTTP checks running out of Bangalore and Newark. Test executions for all other check types are functioning normally.
Jul 19, 16:17 UTC
Completed -
The scheduled maintenance has been completed.
Jul 18, 12:00 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jul 18, 07:00 UTC
Scheduled -
Zendesk, our customer service provider, is performing scheduled database maintenance between 07:00 and 12:00 UTC on 2024-07-18. You might encounter issues submitting or replying to support tickets during this time. We expect any interruptions to be brief, but if you do experience any problems, please try again in 5-10 minutes.
Jul 18, 03:26 UTC
Resolved -
All clusters have been restarted and datasources should be fully operational.
Jul 17, 20:29 UTC
Identified -
At approximately 18:19 UTC, our team identified an issue with our datasource API servers, which required us to restart a number of clusters. This may have affected the following datasources:
Resolved -
Starting at ~21:11 UTC, Loki had a brief outage. Users may have experienced degraded write path performance for approximately 10 minutes, and a full write and read outage of 2-3 minutes. This has since been resolved.
Jul 16, 21:45 UTC
Resolved -
This incident has been resolved.
Jul 15, 22:28 UTC
Investigating -
The ConfluentCloud API is experiencing high error rates and latencies, affecting the Confluent integration in all regions. The issue was detected on Confluent's side and a fix is being implemented.