Performance issues impacting a subset of EU Customers

Incident Report for Interact

Postmortem

On 29th August 2019 at 10:00am (UTC), Interact engineers identified increased latency within the EU pod, which left a subset of users unable to access their intranet.

Investigation and Root Cause

Interact’s monitoring alerted our infrastructure team to a spike in latency. Our web server infrastructure was investigated immediately, and logs showed that a sudden increase in load had caused service disruption for a subset of users.

Resolution and Mitigation Steps

The issue was traced to increased load on the NFS drive, which acts as a document store for Interact, following an unforeseen load spike. This increased latency in the EU region, and all requests that use data on the NFS drive were affected by the degraded performance. The service was fully restored by restarting the NFS service and rotating the web server instances to drain the requests and reset the internal flow of traffic. All services reported healthy and no errors were thrown during the incident. We have not experienced this issue since restarting the NFS service and rotating the web server instances, even under increased load. This suggests an intermittent hardware-level issue that was resolved by rotating the instances and restarting the NFS service.

Interact continues to monitor this closely, and our engineers rotate the instances on a regular basis to mitigate the risk of hardware-level issues or fluctuations on long-running instances.

Posted Oct 01, 2019 - 09:17 UTC

Resolved

This incident has been resolved. A full postmortem will be issued shortly.

Posted Aug 29, 2019 - 10:50 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted Aug 29, 2019 - 10:28 UTC

Investigating

Interact engineers are currently investigating higher-than-normal latency within the EU, which is causing slowness and service disruption for all EU customers.

Posted Aug 29, 2019 - 10:06 UTC

This incident affected: EMEA Public Cloud.