Background queues have degraded performance
Incident Report for Aviator
Postmortem

Symptom and cause

From 2024-05-10 9:40 PT to 10:15 PT, the background queue service was degraded. This was caused by the database used for the task queue had exceeded the capacity.

Mitigation and fix

This issue was mitigated by scaling up the database. We have reviewed the substantial fix options, and now the database usage is significantly below the maximum capacity. We have also reviewed and updated the monitoring for the same situation, and alerting rules are also configured now. We do not anticipate the same issue will happen in the near future.

Posted May 13, 2024 - 18:01 UTC

Resolved
This incident has resolved. We will continue monitoring the queues for rest of the day, and will share some postmortem notes.
Posted May 10, 2024 - 17:33 UTC
Monitoring
The background workers should be alive now. It should clear out the backlog in the next 20-30 mins.
Posted May 10, 2024 - 17:19 UTC
Investigating
We are investigating issue where the background queues may have degraded performance for queueing a PR
Posted May 10, 2024 - 16:54 UTC
This incident affected: Background queues.