The primary database server became unavailable, causing a site-wide outage for 12 minutes. The site was restored after a server reboot.
All times are EST.
Date/Time | Activity |
---|---|
2021-02-16 18:47 | Issues reported with site unavailability. |
2021-02-16 18:51 | Issue identified as an increased load on the primary database. |
2021-02-16 18:53 | A clean stop of database processes was attempted. |
2021-02-16 18:55 | Database process was unable to stop cleanly, and server was rebooted. |
2021-02-16 18:58 | Crash recovery process started. |
2021-02-16 18:59 | Site access was restored. Status was updated to “monitoring”. |
2021-02-16 19:02 | Celery workers restarted. |
The primary database became unresponsive and could not serve requests.
The primary database server was restarted to recover from a process crash.
The site was unavailable for 12 minutes.
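
The 12-minute figure follows from the timeline: issues were first reported at 18:47 and site access was restored at 18:59. A minimal Python sketch of that arithmetic, with variable names chosen only for illustration:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps copied from the timeline (EST); the variable names are illustrative.
first_report = datetime.strptime("2021-02-16 18:47", FMT)
access_restored = datetime.strptime("2021-02-16 18:59", FMT)

# Outage duration in whole minutes.
outage_minutes = int((access_restored - first_report).total_seconds() // 60)
print(outage_minutes)  # 12
```
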
The operating system version was downgraded, and an additional server was added to allow for clustering, which limits the impact of a future database failure.