Our Technical Support team is fully operational on weekdays from 3am to 8pm ET. For Platinum customers, we provide 24x7 coverage for Urgent tickets via pager notifications. Our Cloud team monitors the production environment 24x7.
At 20:53 on 22 April 2020, a US Platinum Support customer submitted an Urgent ticket citing header, top menu and search problems in their system. This triggered a pager alert to our Technical Support team, which immediately started to look into the matter. Through troubleshooting in conjunction with our Cloud team, we discovered that the “System-Text” service was not working properly for a subset of US clients on a specific database server.
The System-Text service controls the feature that lets clients change field names in Analytics, titles of modules, etc., either for language translation or to name fields and modules in a company-specific manner. This functionality was recently separated from our Web services to improve its performance and to reduce its impact on the overall Web service.
We found that a couple of the servers within the System-Text service were unhealthy and not responding effectively. We also found that the alerts/alarms/actions that were supposed to trigger when this service became unhealthy did not fire. We restarted the unhealthy services at around 23:07, which corrected the problem. You were impacted by this issue because your Interact instance used the particular System-Text servers/services that were in an unhealthy state.
Since this incident, we have modified the alerts/alarms/actions for the System-Text process, hardening them in the same way as those for other services such as Web, SQL and Login. We are also continuing to monitor these endpoints directly so that we do not run into this situation again.
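For illustration only, the following is a minimal sketch of the kind of per-endpoint health check described above. It does not reflect the actual monitoring configuration: the class name, function name and failure threshold are all hypothetical. It assumes each server is probed individually and that an alert fires once a server has failed a set number of consecutive probes, so that a few unhealthy servers are not hidden behind healthy ones.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointMonitor:
    """Tracks consecutive failed health probes per endpoint (hypothetical sketch)."""

    failure_threshold: int = 3  # assumed value, for illustration only
    _failures: dict = field(default_factory=dict, init=False, repr=False)

    def record(self, endpoint: str, healthy: bool) -> bool:
        """Record one probe result; return True if an alert should fire now."""
        if healthy:
            # A successful probe resets the failure streak for this endpoint.
            self._failures[endpoint] = 0
            return False
        count = self._failures.get(endpoint, 0) + 1
        self._failures[endpoint] = count
        # Fire once, at the moment the threshold is first reached.
        return count == self.failure_threshold


def evaluate(monitor, probes):
    """Feed (endpoint, healthy) probe results; return endpoints that triggered an alert."""
    alerts = []
    for endpoint, healthy in probes:
        if monitor.record(endpoint, healthy):
            alerts.append(endpoint)
    return alerts
```

Because failures are counted per endpoint rather than across the service as a whole, a single unhealthy server raises an alert even while the others continue to respond normally.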