Delay in claims submission and processing

Incident Report for DrChrono

Postmortem

Description

Starting Friday, 3/4/22, we saw a slowdown in 835 and 837 file processing. On Tuesday, 3/8/22, the RCM team also reported that ERA verifications were behind.

Timeline

All times are Pacific Time.

Date/Time	Activity
2022-03-04	A misconfigured user account on the server led to a large number of open files. The open-file limit was increased via configuration management tools.
2022-03-08 11:00:00	Our team investigated several aspects of our cron-job settings, including number of connections, total packets, ulimit settings, max connections, and Celery worker max memory and time limits, and checked Celery memory usage in New Relic. We freed up some memory by killing old Django shell instances.
2022-03-08 13:00:00	DevOps verified that no servers had been added or decommissioned since our last scheduled deployment (3/4/22); the only exception was the addition and subsequent removal of CloudWatch logging info in one of the cron jobs.
2022-03-08 14:00:00	The team began investigating jump errors first seen on 2/28/22. These were later determined to be unrelated to the issue.
2022-03-08 15:00:00	Debugged with additional New Relic traces but found no clear indicators of the cause of the slowdown.
2022-03-08 17:00:00	Identified that one of the 837 batch files was constrained.
2022-03-09 07:45:00	While reviewing the load of each process running on the server, we found that the batch_get_medical_reports process for 835s was running extremely slowly. We killed the process, and it restarted at its next scheduled run.
2022-03-09 07:48:00	Reviewed Emdeon file sizes after the RCM team noted that Emdeon processing had recently slowed down. We pulled out an unusually large file from 3/4 that was holding up other files, then restarted Emdeon processing.
2022-03-09 09:55:00	Checked current memory usage and identified Django shell processes to clean up.
2022-03-09 10:18:00	Reviewed the settings used to run the billing cron job and found it was set to the lowest priority. DevOps changed it to the highest priority.
2022-03-09 13:00:00	Investigated how much time Sentry read-timeout errors were taking and how much CPU they were consuming, and reviewed the Sentry configuration.
2022-03-11 15:00:00	The DevOps team set up the new AWS Practice cron server. The Practice team tested the daily_update_patient_last_appt_date cron on the new server.
2022-03-15 08:15:00	Status page updated with file processing progress.
2022-03-15 09:58:00	Status page updated to Identified status.
2022-03-15 10:00:00	Disabled the ClamAV antivirus daemon on cron-02, which was at ~202GB CPU. Afterward, we saw an increase in CPU usage for the ERA cron and an influx of claim submissions (up by 3K claims in 2 hours, from 10 am to 12 pm PT).
2022-03-15 14:45:00	The Practice cron server on the AWS instance was created.
2022-03-16 08:30:00	DevOps, Payments, and Practice team members met to sync on ERA processing and the status of the ERA and EHR cron servers. DevOps pointed out that we were not using all the memory and CPU available on production-cron-02. Because this parallelized process had worked adequately through March 4, 2022, before the significant slowdown began, the team agreed to deploy a hotfix adding New Relic traces for ERA/835, 277, and 837 processing.
2022-03-16 15:34:00	Status page updated with file processing progress.
2022-03-16 19:18:00	Hotfix was deployed to production.
2022-03-17 11:00:00	Identified why the cron job stopped reporting to New Relic on 3/4. The cause was an outdated copy of the Chef recipe that enabled the New Relic agent; the change that introduced it was made to address issues with hitting open-file limits on the cron-02 server.
2022-03-17 13:10:00	Status page updated with file processing progress.
2022-03-17 14:25:00	DevOps identified a connection issue with Redis caused by a missing firewall rule. The rule was added and communication was restored. Resource usage on `production-cron-02` then began to increase.
2022-03-17	Stale configuration management changes were identified as the primary cause of the loss of communication between the cron and Redis servers.
2022-03-18 14:19:00	Status page updated to Monitoring status with an update on file processing progress.
2022-03-21 11:57:00	Status page updated to Resolved status.

Contributing Factor(s)

Application of a stale configuration resulted in a missing firewall rule, blocking communication between the cron and Redis servers.
Cron jobs stopped reporting to New Relic on 3/4/22, which made it harder to identify the cause of the issue.
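The second factor is a monitoring gap: nothing flagged that cron jobs had stopped reporting. A minimal sketch, assuming each job's last report time is available as a timestamp, is a check that lists jobs whose most recent report is older than an allowed age. The `stale_jobs` function, the job names, and the one-hour threshold below are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_jobs(last_seen: dict[str, datetime], now: datetime, max_age: timedelta) -> list[str]:
    """Return the names of jobs whose last report is older than max_age, sorted."""
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age)


if __name__ == "__main__":
    now = datetime(2022, 3, 8, 12, 0, tzinfo=timezone.utc)
    # Hypothetical last-report times for two cron jobs.
    last_seen = {
        "era_835_import": datetime(2022, 3, 4, 9, 0, tzinfo=timezone.utc),
        "claims_837_submit": datetime(2022, 3, 8, 11, 45, tzinfo=timezone.utc),
    }
    print(stale_jobs(last_seen, now, max_age=timedelta(hours=1)))  # prints ['era_835_import']
```

A job that never reports at all would not appear in `last_seen`, so a fuller check would also compare against the list of jobs that are expected to run.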

Stabilization Steps

DevOps identified a connection issue with Redis caused by a missing firewall rule. The rule was added, and communication between the cron and Redis servers was restored.
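A missing firewall rule like this one typically shows up as TCP connections from the cron host to Redis timing out or being refused. As a hedged sketch (not the check DrChrono ran), the Python function below attempts a plain TCP connection. The host name is a placeholder, and port 6379 is only Redis's default, an assumption about this deployment.

```python
import socket


def can_reach(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures all raise OSError subclasses.
        return False


if __name__ == "__main__":
    # "redis.internal.example" is a placeholder host name.
    print(can_reach("redis.internal.example"))
```

Run from the cron server, a check like this would return False while the rule was missing and True once it was restored. It confirms only network reachability, not Redis authentication or health.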

Impact

ERA/835, 277, and 837 Claim Submissions: Processing of ERA/835 files, 277 acknowledgements, and 837 claim submissions was backed up by approximately 7 days.

Corrective Actions

The stale configuration is managed through our legacy configuration management implementation. Once all servers are managed with the updated Salt configuration utility, the risk of maintaining multiple configuration management repos will be significantly reduced.

Posted Mar 30, 2022 - 17:06 PDT

Resolved

Our team has confirmed that this issue has been resolved. All claim submissions and processing (835, 837, and 277 files) are back on our normal schedule, and the backlog of files has been fully processed. A post-mortem will be available via this status page incident within the week.

Please reach out to our support team if you are still observing any delays in processing.

Posted Mar 21, 2022 - 11:57 PDT

Monitoring

We have identified a connectivity issue with a service on one of our processing servers, which our team resolved on 3/17.

The connectivity issue was the root cause of the delay in claims processing. The fix has increased the volume of claims being sent out, so claim submissions should fully catch up by this weekend. We will continue to monitor the claims batching process closely to ensure the issue has been rectified. For specific details on where we are with processing, please see below:

For claim submission (837 files):
Processing of files is ongoing; the backlog is expected to be cleared over the weekend.

For ERA processing (835 files):
For Trizetto/Gateway - UP TO DATE as of March 18th
For Change Healthcare/Emdeon - UP TO DATE as of March 18th

For status updates (277 files):
For Trizetto/Gateway - UP TO DATE as of March 18th
For Change Healthcare/Emdeon - Processing of files is ongoing, including files from 3/9 and 3/11.

Should you have claims that remain unprocessed, please send the claim number(s) to support@drchrono.com.

We understand that this issue has caused some delays to your revenue processes, and we apologize for any stress and frustration you’ve felt as a result of this. If you have any further questions or issues, please do not hesitate to reach out.

Posted Mar 18, 2022 - 14:19 PDT

Update

For specific details on where we are with processing, please see below:

For claim submission (837 files):
Claims submitted since March 3, 2022 may be impacted. We are gathering more information about the percentage of claims impacted and are optimizing our calculation script to ensure accuracy.

For ERA processing (835 files):
For Trizetto/Gateway, last processed date - 3/17/2022 (UP FROM PREVIOUS DAY)
For Change Healthcare/Emdeon, last processed date - 3/17/2022 (UP FROM PREVIOUS DAY)
This means that any ERA files received prior to the last processed dates have been processed.

For status updates (277 files):
For Trizetto/Gateway, last processed date - 3/17/2022 (UP FROM PREVIOUS DAY)
For Change Healthcare/Emdeon, last processed date - 3/9/2022, 3/11/2022 (SAME AS PREVIOUS DAY)
For Emdeon, a file from 3/9 has been moved to its own session, and files from 3/11 are being processed.
This means that any 277 files received prior to the last processed dates have been processed.

We are working to provide more details outside these updates as soon as possible. Please continue to contact support if you have any examples outside these timeframes.

Posted Mar 17, 2022 - 15:10 PDT

Update

For specific details on where we are with processing, please see below:

For claim submission (837 files):
Claims submitted since March 3, 2022 may be impacted. We are gathering more information about the percentage of claims impacted and are optimizing our calculation script to ensure accuracy.

For ERA processing (835 files):
For Trizetto/Gateway, last processed date - 3/15/2022
For Change Healthcare/Emdeon, last processed date - 3/9/2022
For Emdeon, a file from 3/9 and files from 3/11 are being processed.
This means that any ERA files received prior to the last processed dates have been processed.

For status updates (277 files):
For Trizetto/Gateway, last processed date - 3/16/2022
For Change Healthcare/Emdeon, last processed date - 3/10/2022
This means that any 277 files received prior to the last processed dates have been processed.

Thank you for your patience as we continue to make changes to increase the processing power of our current servers. We appreciate the support tickets that have helped us identify the correct last processed dates. Please continue to contact support if you have any examples outside these timeframes.

Posted Mar 16, 2022 - 15:34 PDT

Identified

Our team has identified the issue and is working on expanding our server resources in order to speed up processing of the files currently in the backlog.

Posted Mar 15, 2022 - 09:52 PDT

Update

Thank you for your patience as we continue to work on speeding up our claims and ERA processing! Specifically, the engineering and DevOps teams are working on freeing up more processing power on our current servers. For specific details on where we are with processing, please see below:

For claim submission (837 files):
Claims submitted since March 3, 2022 may be impacted. We are working on getting more information around the percentage of claims impacted and will share that as it becomes available.

For ERA processing (835 files):
For Trizetto/Gateway, last processed date - 3/13/2022
For Change Healthcare/Emdeon, last processed date - 3/10/2022
This means that any ERA files received prior to the last processed dates have been processed.

For status updates (277 files):
For Trizetto/Gateway, last processed date - 3/14/2022
For Change Healthcare/Emdeon, last processed date - 3/14/2022
This means that any 277 files received prior to the last processed dates have been processed.

We will post additional updates here each day until the issue is fully resolved. You may also find general claim information in our support article: https://support.drchrono.com/hc/en-us/articles/360059952791-Lifecycle-of-a-Claim

Posted Mar 15, 2022 - 08:15 PDT

Update

Our team is looking into expanding our server's memory capacity to speed up ERA processing.

We'll post another update once we have additional information to share.

Posted Mar 09, 2022 - 14:56 PST

Investigating

Our team is currently investigating the possible cause of the delay in the submission and processing of claims for both the TriZetto and Emdeon clearinghouses.

We are looking into the cause and will provide more information as soon as it’s available.

Posted Mar 09, 2022 - 07:11 PST

This incident affected: drchrono.com.