Service Disruption impacting our US customers

Incident Report for Interact

Postmortem

On the morning of Monday, June 14th, at around 10:30 AM ET, our cloud team was alerted to a higher-than-average rate of HTTP 500 errors. Engineers quickly traced the issue to Interact's Network File System (NFS), which is responsible for storing file-based content. The team then restarted NFS while checking for any unhealthy web instances. Any servers in an unhealthy state were detached and replaced after an initial IIS reset.

By 11:00 AM ET, Interact's status page had notified customers of the outage, and SREs were actively working on a fix. By this point, the team had already observed latency subsiding and web servers reporting as healthy. The team advised that systems were back and operational at 12:30 PM ET.

We have a project underway to fully replace our NFS solution with a more robust AWS S3 solution for file storage. We hope to complete this project in Q3 2021. Once it is in place, this type of issue should be minimized, if not eliminated.

Posted Jun 21, 2021 - 16:26 UTC

Resolved

The incident has been resolved.

Posted Jun 14, 2021 - 16:29 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Jun 14, 2021 - 15:31 UTC

Investigating

Interact engineers are currently investigating higher-than-normal latency in the US, which is causing slowness and service disruption for some of our US customers.

Engineers are working on a resolution as their highest priority and will continue to post updates on this page.

We apologise for the inconvenience.

Posted Jun 14, 2021 - 14:57 UTC

This incident affected: North America / HIPAA Public Cloud.