DrChrono Performance Issues

Incident Report for DrChrono

Postmortem

Issue Summary

Between 9 AM EST and 6:45 PM EST, customers experienced delays in items such as lab orders and referrals appearing in the message center. Between 3:30 PM EST and 6:45 PM EST, customers also experienced intermittent 503 errors when attempting to log in, as well as sluggish performance in the web application. During this later window, 5% of all requests returned 503 errors, and 22% of all requests failed to complete successfully (some because users canceled or refreshed the page).

How were customers impacted?

Customers experienced delays in receiving items in the message center, general application slowness, and, at times, trouble logging in.

Root Cause

It was determined that a production hotfix deployed on Friday evening at 9 PM EST generated a flood of messages that overwhelmed our background processing capacity for some application activities. Under the resulting resource pressure, our AWS-hosted queuing infrastructure failed and took longer than expected to fail over and recover. The flood did not occur until traffic reached Monday levels, so the impact was not seen before then.

Resolution

The hotfix code was reverted and the message backlog was drained.

Mitigation steps planned/taken

Move background processes that currently share a processing queue onto dedicated queues.
Evaluate and improve the scalability and resiliency of our queue infrastructure.
Identify and remove unnecessary coupling between the web app and the message broker.
Improve the efficiency of rollbacks/reverts.
Improve performance testing in lower environments to better simulate production message-queueing workloads.

Posted Sep 15, 2023 - 13:13 PDT

Resolved

This incident is resolved. Thank you for your patience as we continued to monitor system performance over the last day. A Root Cause Analysis (RCA) will be available via our status page soon.

Posted Sep 13, 2023 - 06:25 PDT

Update

DrChrono believes all related issues have been resolved, including intermittent service outages, slow loading, reports not arriving in the message center, referrals not loading, and lab order failures. If you are still not seeing the expected results when performing these actions, please reach out to support.

Posted Sep 11, 2023 - 16:21 PDT

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Sep 11, 2023 - 16:11 PDT

Update

Our engineering team continues to investigate today's ongoing performance issues including reports from some users that they are not able to log in to DrChrono. Thank you for your continued patience. We apologize for this inconvenience.

Posted Sep 11, 2023 - 13:56 PDT

Update

We are continuing to investigate this issue with high priority. Additional updates will continue to be posted on our status page.

Posted Sep 11, 2023 - 11:20 PDT

Investigating

We have received customer feedback that these issues have not been completely mitigated. Our engineering team is continuing to investigate.

Posted Sep 11, 2023 - 10:05 PDT

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Sep 11, 2023 - 09:05 PDT

Update

Our engineering team is continuing to work to implement a fix to resolve the performance issues. Thank you for your patience during this time.

Posted Sep 11, 2023 - 08:30 PDT

Identified

The issue has been identified and a fix is being implemented.

Posted Sep 11, 2023 - 07:48 PDT

Update

We are continuing to investigate this issue.

Posted Sep 11, 2023 - 07:23 PDT

Update

We are continuing to investigate this issue.

Posted Sep 11, 2023 - 07:20 PDT

Investigating

DrChrono is currently experiencing intermittent service outages, slow loading, reports not arriving in the message center, referrals not loading, lab order failures, and other related issues. Please be aware that we take this very seriously and our engineering team is actively investigating and working to resolve this situation as quickly as possible. We will provide another update via this status page as soon as possible.

Posted Sep 11, 2023 - 07:19 PDT

This incident affected: drchrono.com and drchrono iPad EHR.