Background queues have degraded performance

Incident Report for Aviator

Postmortem

Symptom and cause

From 2024-05-10 9:40 PT to 10:15 PT, the background queue service was degraded. This was caused by the database used for the task queue had exceeded the capacity.

Mitigation and fix

This issue was mitigated by scaling up the database. We have reviewed the substantial fix options, and now the database usage is significantly below the maximum capacity. We have also reviewed and updated the monitoring for the same situation, and alerting rules are also configured now. We do not anticipate the same issue will happen in the near future.

Posted 11 months ago. May 13, 2024 - 18:01 UTC

Resolved

This incident has resolved. We will continue monitoring the queues for rest of the day, and will share some postmortem notes.
Posted 11 months ago. May 10, 2024 - 17:33 UTC

Monitoring

The background workers should be alive now. It should clear out the backlog in the next 20-30 mins.
Posted 11 months ago. May 10, 2024 - 17:19 UTC

Investigating

We are investigating issue where the background queues may have degraded performance for queueing a PR
Posted 11 months ago. May 10, 2024 - 16:54 UTC
This incident affected: Background queues.