On the morning of May 4th, 2021, Interact servers experienced elevated latency when an abnormal increase in traffic arrived during routine scaling. The infrastructure could not scale quickly enough, so request queues flooded and did not recover in a timely fashion even once full capacity was reached. A similar situation occurred on the morning of May 5th, when traffic again overloaded servers during active scaling.
Resolution
In both incidents, our engineering team restarted the impacted servers, after which services returned to normal.
Remediation
Interact engineers are adjusting the infrastructure to serve a higher volume of traffic in two ways: pre-provisioning servers ahead of scaling, and ensuring that queues clean themselves when requests run long or the queues become overloaded.
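
As a rough illustration of what a self-cleaning queue could look like, the sketch below shows a bounded queue that rejects new requests when full and discards requests that have waited longer than a limit. It is not Interact's implementation: the class name `SelfCleaningQueue`, the parameters `max_size` and `max_wait_seconds`, and the choice to drop stale requests rather than retry them are all assumptions made for the example.

```python
import time
from collections import deque


class SelfCleaningQueue:
    """Hypothetical bounded FIFO queue that sheds requests which waited too long."""

    def __init__(self, max_size, max_wait_seconds, clock=time.monotonic):
        self.max_size = max_size                  # assumed cap on queued requests
        self.max_wait_seconds = max_wait_seconds  # assumed limit on time spent queued
        self.clock = clock                        # injectable so the queue can be tested
        self._items = deque()                     # entries are (enqueued_at, request)

    def _evict_stale(self):
        # The oldest entries sit at the left, so stop at the first fresh one.
        now = self.clock()
        dropped = 0
        while self._items and now - self._items[0][0] > self.max_wait_seconds:
            self._items.popleft()
            dropped += 1
        return dropped

    def put(self, request):
        # Clean first, so stale requests do not block fresh ones.
        self._evict_stale()
        if len(self._items) >= self.max_size:
            return False  # overloaded: reject instead of letting the backlog grow
        self._items.append((self.clock(), request))
        return True

    def get(self):
        # Return the oldest request that is still fresh, or None if there is none.
        self._evict_stale()
        if not self._items:
            return None
        return self._items.popleft()[1]

    def __len__(self):
        return len(self._items)
```

With a cap and a wait limit in place, a backlog built up during a traffic spike drains on its own once capacity catches up, rather than requiring a restart to clear it.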