On October 6, 2021, an error condition during a routine software deployment caused a partial outage of features on the DrChrono platform.
All times are US Eastern Time.
| Time | Event |
|---|---|
| 2021-10-06 16:30 | Production canary deployment and monitoring completed. |
| 2021-10-06 17:04 | Production deployment migrations completed. |
| 2021-10-06 17:06 | Initial issues with the DrChrono application reported. |
| 2021-10-06 17:08 | Issues identified as affecting a subset of features. |
| 2021-10-06 17:09 | Initial reports from customers received. |
| 2021-10-06 17:09 | Initial pages to Ops received. |
| 2021-10-06 17:11 | Issues identified as related to production database migrations. |
| 2021-10-06 17:13 | Production canary nodes confirmed as not experiencing the issue; accelerating the production deployment identified as the quickest and safest way to resolve the partial outage. |
| 2021-10-06 17:16 | Production deployment accelerated. |
| 2021-10-06 17:18 | Initial status page message published. |
| 2021-10-06 17:24 | Partial outage largely resolved. |
| 2021-10-06 17:28 | Partial production outage resolved. |
| 2021-10-06 17:31 | Status page updated to “Identified”. |
| 2021-10-06 17:32 | Partial outage confirmed resolved by Engineering. |
| 2021-10-06 17:39 | Status page updated to “Monitoring/Operational”. |
| 2021-10-06 17:40 | Fix verified by Customer Success and multiple other parties. |
During a routine software deployment, an errant change was made to a data model. A change of this kind is normally rolled out in two steps, but in this deployment it was made in a single step. DrChrono’s software is designed to protect the integrity of our data; it functioned as expected and raised an error when it detected the errant condition. That error caused a partial outage of features for our customers.
The new code release that was already part of the ongoing deployment was confirmed to resolve the error, and accelerating that deployment shortened the feature outage.
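The report does not say which data-model change was involved. As a hedged illustration only, the Python sketch below assumes a common case: a column removal during a rolling deployment, where nodes still running the previous release query a column that the migration has already dropped. That assumption fits the timeline (canary nodes on the new code were unaffected, and completing the rollout resolved the outage), but the report does not confirm it. The release names, the `legacy_flag` column, and every helper function here are hypothetical.

```python
# Hypothetical sketch: the incident report does not name the data-model
# change. This assumes a column removal ("legacy_flag" is an invented name)
# and models application nodes running two releases against one schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    name: str
    required_columns: frozenset  # columns this release's queries reference


# Hypothetical releases: the previous release still reads "legacy_flag";
# the new release no longer does.
OLD = Release("old", frozenset({"id", "name", "legacy_flag"}))
NEW = Release("new", frozenset({"id", "name"}))
FULL_SCHEMA = frozenset({"id", "name", "legacy_flag"})
REDUCED_SCHEMA = frozenset({"id", "name"})


def failing_releases(schema, running):
    """Names of running releases whose queries reference a missing column."""
    return sorted({r.name for r in running if not r.required_columns <= schema})


def single_step():
    """Column dropped in the same deploy, while old nodes are still serving."""
    mid_rollout = failing_releases(REDUCED_SCHEMA, [OLD, NEW])
    after_rollout = failing_releases(REDUCED_SCHEMA, [NEW])
    return mid_rollout, after_rollout


def two_step():
    """Step 1 ships code that stops using the column; step 2 drops it."""
    step1_mid_rollout = failing_releases(FULL_SCHEMA, [OLD, NEW])
    step2_after_drop = failing_releases(REDUCED_SCHEMA, [NEW])
    return step1_mid_rollout, step2_after_drop


if __name__ == "__main__":
    print("single step:", single_step())
    print("two step:", two_step())
```

Under these assumptions, `single_step()` returns `(['old'], [])`: nodes on the previous release fail mid-rollout and recover once every node runs the new code, which is why accelerating the deployment helped. `two_step()` returns `([], [])` because the column is dropped only after no running code references it.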
All customers experienced an outage of several features on the DrChrono platform for approximately 20 minutes.
DrChrono has completed an analysis of methodologies to better detect these conditions in the future. We are also developing methods to further shorten the deployment window, which would allow any recurrence of this issue to be resolved more quickly.