Post Mortem of Outage – 12/23/2019
Description of Outage:
At 12 PM ET on 12/23/2019 we identified that users on one of our Standard server pool servers had been unexpectedly disconnected from the file server. Anyone on that server was unable to access their files, and all work in progress was closed. The issue was isolated to users on that specific server, as we run small redundant pool servers for optimal performance.
Outage Review:
The outage affected users for approximately half an hour. After that, affected users were immediately able to log back in and access all files. However, any open work in progress had been closed. We determined that the shutdown was caused by an automatic reboot of that server.
Remediation:
We are continuing to work with Amazon engineers to determine whether the sudden reboot was a fluke or whether something specific caused it, so that we can ensure it does not happen again. We will update this post after that additional research.