Disruption in Service

Incident Report for DrChrono

Postmortem

Incident Overview

On July 16th, 2025, the DrChrono application experienced a temporary service outage following a scheduled release the evening prior. The release itself was successful; however, during post-release monitoring on the morning of July 16th, we observed slightly elevated memory pressure across application servers. This memory pressure was not affecting end users, but was identified through increased observability following the deployment. In response, a proactive configuration change was made to improve memory usage. Unfortunately, this adjustment unintentionally restricted the system’s ability to allocate sufficient resources for application processes, resulting in a temporary outage. Because this occurred during core business hours, it took some time to restore enough resources to support application traffic, but services returned to normal operation once resources were restored.

How We Responded

The configuration change was reverted, and traffic was temporarily paused to allow the system to recover. Once the application was confirmed healthy, traffic was resumed and the system became fully available.

Corrective and Preventative Actions

To prevent recurrence, we are taking the following steps:

Standard Operating Procedure (SOP) Enhancements: We are updating and reinforcing our internal SOPs to emphasize gradual rollout and verification testing for infrastructure setting changes before they are applied to all traffic, even when a change is thought to be safe or simple.
Warm Resources on Standby: We have created and will continue to maintain a pool of separate warm servers so that we can restore previous configurations more quickly as well as spin up needed resources faster in cases of high traffic.
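
For readers who want a concrete picture of what a gradual rollout with verification can look like, here is a minimal, generic sketch. It is not DrChrono's actual tooling: the function name, the stage fractions, and the apply_change, revert_change, and is_healthy callbacks are all hypothetical and stand in for whatever deployment and monitoring systems are in use.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    servers: Sequence[str],
    apply_change: Callable[[str], None],   # hypothetical: applies the setting to one server
    revert_change: Callable[[str], None],  # hypothetical: restores the previous setting
    is_healthy: Callable[[str], bool],     # hypothetical: verification check for one server
    stages: Sequence[float] = (0.05, 0.25, 1.0),  # assumed cumulative fractions per stage
) -> bool:
    """Apply a change to growing fractions of servers, verifying after each stage.

    Returns True if every stage passed verification. If any stage fails,
    every server changed so far is reverted and False is returned.
    """
    changed: List[str] = []
    for fraction in stages:
        # Number of servers that should carry the change by the end of this stage.
        target = max(1, round(len(servers) * fraction))
        for server in servers[len(changed):target]:
            apply_change(server)
            changed.append(server)
        # Verification gate: do not widen the rollout unless every changed server is healthy.
        if not all(is_healthy(server) for server in changed):
            for server in reversed(changed):
                revert_change(server)
            return False
    return True
```

With these assumed stage fractions, a change that fails verification on an early stage is reverted while most servers still run the previous configuration, which limits how much capacity has to be restored.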

We know that many of you rely on DrChrono every day to support your operations. We sincerely apologize for this disruption and are committed to strengthening our systems to prevent it from happening again. Thank you for your patience and continued trust.

Posted Jul 18, 2025 - 14:12 PDT

Resolved

This incident has been resolved.

Posted Jul 16, 2025 - 10:20 PDT

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Jul 16, 2025 - 10:03 PDT

Update

We are continuing to work on a fix for this issue. The next update will be in 20 minutes.

Posted Jul 16, 2025 - 09:44 PDT

Update

We are continuing to work on a fix for this issue. The next update will be in 20 minutes.

Posted Jul 16, 2025 - 09:17 PDT

Update

We are continuing to work on a fix for this issue.

Posted Jul 16, 2025 - 08:47 PDT

Identified

Our team has identified reports of system slowness across various areas of the DrChrono application. We are working to resolve this issue.

Posted Jul 16, 2025 - 08:24 PDT

This incident affected: drchrono.com.