Site Outage
Incident Report for DrChrono
Postmortem

Root Cause Analysis: 10/06/2021 Deployment Fault

Summary

On October 6, 2021, an error condition during a routine software deployment caused a partial outage of features on the DrChrono platform.

Timeline

All times are US Eastern Time (EDT on this date), 24-hour clock.

Date/Time Activity
2021-10-06 16:30 Production canary deployment and monitoring completed.
2021-10-06 17:04 Production deployment migrations completed.
2021-10-06 17:06 Initial issues with DrChrono application reported.
2021-10-06 17:08 Issues identified to impact a subset of features.
2021-10-06 17:09 Initial reports from customers received.
2021-10-06 17:09 Initial pages to Ops received.
2021-10-06 17:11 Issues identified as related to production database migrations.
2021-10-06 17:13 Production canary nodes confirmed as not experiencing the issue; accelerated production deployment identified as the quickest and safest measure to resolve partial outage.
2021-10-06 17:16 Production deploy accelerated.
2021-10-06 17:18 Initial Status Page message published.
2021-10-06 17:24 Partial outage largely resolved.
2021-10-06 17:28 Partial production outage resolved.
2021-10-06 17:31 Status page updated to “Identified”.
2021-10-06 17:32 Partial outage confirmed resolved by Engineering.
2021-10-06 17:39 Status page updated to “Monitoring/Operational”.
2021-10-06 17:40 Fix verified by Customer Success / multiple parties.

Contributing Factor(s)

During a routine software deployment, an errant change was made to a data model. Changes of this kind are normally deployed in two steps, but in this deployment the two steps were combined into one. DrChrono’s software is designed to protect the integrity of our data and functioned as expected, raising an error when it detected the errant condition. This error caused a partial outage of features for our customers.
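The report does not describe the actual model change, so the following is only a minimal sketch of why a data model change is often split into two steps. It assumes a hypothetical `patient` table whose `phone` column is being renamed to `phone_number`, and uses an in-memory SQLite database purely for illustration. All table, column, and function names here are invented for the example and are not DrChrono's.

```python
import sqlite3


def fresh_db():
    # Hypothetical schema, used only for this illustration.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, phone TEXT)")
    db.execute("INSERT INTO patient (phone) VALUES ('555-0100')")
    return db


def old_code_read(db):
    # Query issued by nodes still running the previous release.
    return db.execute("SELECT phone FROM patient").fetchall()


def new_code_read(db):
    # Query issued by nodes running the new release.
    return db.execute("SELECT phone_number FROM patient").fetchall()


def single_step_migration(db):
    # One step: the old column disappears while old code may still be live.
    db.execute("ALTER TABLE patient RENAME COLUMN phone TO phone_number")


def expand_step(db):
    # Step 1: add the new column and copy data; the old column stays.
    db.execute("ALTER TABLE patient ADD COLUMN phone_number TEXT")
    db.execute("UPDATE patient SET phone_number = phone")


def contract_step(db):
    # Step 2: run only after every node is on the new release.
    # The table is rebuilt without the old column.
    db.executescript(
        """
        CREATE TABLE patient_new (id INTEGER PRIMARY KEY, phone_number TEXT);
        INSERT INTO patient_new (id, phone_number)
            SELECT id, phone_number FROM patient;
        DROP TABLE patient;
        ALTER TABLE patient_new RENAME TO patient;
        """
    )


def old_code_survives(migrate):
    # Apply a migration, then check whether the old release can still read.
    db = fresh_db()
    migrate(db)
    try:
        old_code_read(db)
        return True
    except sqlite3.OperationalError:
        return False


if __name__ == "__main__":
    print("single step, old code works:", old_code_survives(single_step_migration))
    print("expand step, old code works:", old_code_survives(expand_step))
```

In this sketch, the old release keeps working after the expand step, and the contract step is deferred until no node depends on the old column. Combining both into one migration leaves a window in which the old release runs against a schema it does not expect.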

Stabilization Steps

The error was confirmed to be resolved by the new code release that was part of the ongoing deployment. Accelerating the deployment process shortened the feature outage.

Impact

All customers experienced an outage of several features on the DrChrono platform for approximately 20 minutes.
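As a rough check of the 20-minute figure against the timeline entries, the short Python sketch below computes the elapsed minutes between two timestamps. The choice of endpoints (first reported issue at 17:06, outage resolved at 17:28, resolution confirmed at 17:32) is an assumption about how the duration was measured, not something the report states.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"


def minutes_between(start, end):
    # Whole minutes elapsed between two timeline timestamps.
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


first_report = "2021-10-06 17:06"  # initial issues reported
resolved = "2021-10-06 17:28"      # partial production outage resolved
confirmed = "2021-10-06 17:32"     # resolution confirmed by Engineering

print(minutes_between(first_report, resolved))   # 22
print(minutes_between(first_report, confirmed))  # 26
```

Under these assumed endpoints the outage lasted 22 to 26 minutes, which is consistent with "approximately 20 minutes".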

Corrective Actions

DrChrono has completed an analysis of methodologies to better detect these conditions in the future. We are developing methods to further shorten the deployment window time, which would more quickly resolve any occurrences of this issue in the future.

Posted Oct 21, 2021 - 07:12 PDT

Resolved
This incident has been resolved.
Posted Oct 06, 2021 - 14:44 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 06, 2021 - 14:39 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Oct 06, 2021 - 14:31 PDT
Update
We are continuing to investigate this issue.
Posted Oct 06, 2021 - 14:19 PDT
Investigating
We are currently investigating reports of the DrChrono platform being inaccessible. We will post an update with additional information as soon as possible.
Posted Oct 06, 2021 - 14:18 PDT
This incident affected: drchrono.com, drchrono iPad EHR, drchrono iPad Check-In Kiosk Application, DrChrono Telehealth Platform, onpatient.com, and onpatient iPhone PHR.