Increased Error Rates in US pod
Incident Report for Interact
Postmortem

Summary

On 3rd April 2019 at 06:40 (UTC), Interact engineers began rolling out a small update to the application across the US pod. Between 06:40 (UTC) and 08:10 (UTC), customers hosted in the US pod experienced failed requests for static assets, resulting in broken themes (missing CSS, JS and images).

Investigation and Root Cause

During the release process, a number of servers failed to come into rotation, and those that did come into rotation had failed to map to the underlying asset store. Assets including CSS, JS and images are stored on a shared file storage system that each web server maps to as part of its initialisation process.

Interact believes that an underlying hardware issue with the asset store itself caused the failure, and consequently rotated the asset store system as well. Once this was completed, all application servers were rotated again and the asset store was correctly mapped, restoring the service.

Resolution and Mitigation Steps

Following the incident, Interact has updated its deployment runbook, adding a check to ensure that connectivity to the asset store is active as expected.
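
As an illustration only, a connectivity check of this kind could be sketched as below. This is not Interact's actual runbook step: the function name `check_asset_store`, the mount path and the probe file names are all hypothetical, and the sketch assumes the asset store is exposed to each web server as a mounted directory.

```python
import os
from pathlib import Path
from typing import Callable, List


def check_asset_store(
    mount_path: str,
    probe_files: List[str],
    is_mount: Callable[[str], bool] = os.path.ismount,
) -> List[str]:
    """Return a list of problems found; an empty list means the check passed.

    mount_path and probe_files are hypothetical values supplied by the
    deployment script; they are not taken from the incident report.
    """
    root = Path(mount_path)
    if not root.is_dir():
        return [f"{mount_path} does not exist or is not a directory"]

    problems: List[str] = []

    # A plain local directory at the expected path would mean the
    # shared storage was never mapped on this server.
    if not is_mount(mount_path):
        problems.append(f"{mount_path} is not a mount point")

    # Reading a byte from known assets confirms the store is reachable,
    # not merely mounted.
    for rel in probe_files:
        target = root / rel
        try:
            with open(target, "rb") as fh:
                fh.read(1)
        except OSError as exc:
            problems.append(f"cannot read {target}: {exc}")

    return problems
```

A deployment script could call this on each server before bringing it into rotation and stop the rollout if the returned list is non-empty.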

Posted Apr 03, 2019 - 10:36 UTC

Resolved
This issue has now been resolved and service has been restored. We apologise for the inconvenience this has caused and will publish a full postmortem once the root cause analysis is complete.
Posted Apr 03, 2019 - 08:25 UTC
Identified
Engineers have identified the issue and are working on restoring the service. This appears to be an underlying hardware issue, and we are working with cloud providers to understand it.
Posted Apr 03, 2019 - 08:11 UTC
Investigating
Interact engineers are currently investigating issues in the US pod. Styling is not being correctly applied and search requests are failing.
Posted Apr 03, 2019 - 08:01 UTC
This incident affected: North America / HIPAA Public Cloud.