Starting on 05/12 at 07:18 PST, customers were unable to create, modify, assign, or archive tasks.
All times are in PST.
Date/Time | Activity |
---|---|
2022-05-06 | A code change to the tasks module is included in the staging release and is deployed to the staging environment. |
2022-05-11 | During the weekly deployment meeting, this code change on the staging environment is noted as having failed testing. Since this code change is behind a feature flag, it is concluded that it should not impact production traffic. |
2022-05-12 03:00 | An engineering team member starts noticing elevated error rates in Sentry and begins investigating. |
2022-05-12 03:53 | A hotfix is created to fix the issue related to the tasks module. |
2022-05-12 07:18 | The support team begins to receive tickets and notifies the engineering team. |
2022-05-12 08:00 | Other engineering team members review and approve the hotfix. |
2022-05-12 08:11 | A member of the DevOps team starts deploying the hotfix to the staging environment. |
2022-05-12 08:21 | An incident is posted to the status page. |
2022-05-12 08:29 | Engineering leadership approves deploying the hotfix to the production environment, to proceed as soon as the hotfix is confirmed to fix the issue. |
2022-05-12 08:31 | The staging environment now has the hotfix deployed and is available for testing. |
2022-05-12 09:09 | The QA team and the mobile team confirm that the hotfix works as intended on staging. |
2022-05-12 09:13 | A member of the DevOps team starts deploying the hotfix to the production environment. The deployment is intentionally slowed down to prevent any disruption to production traffic. |
2022-05-12 11:22 | Production deployment is finalized. |
2022-05-12 11:43 | Status page updated to monitoring. |
2022-05-12 12:08 | Status page updated to resolved. |
A code change contained an issue that was not caught by testing or smoke tests, while a separate issue in the same code change failed testing but was behind a feature flag. In addition, the hotfix PRs (pull requests) took some time to be approved, which delayed the rollout of the fix.
The engineering team created a hotfix PR that fixed the root cause, and the hotfix was deployed to all servers.
Customers were unable to create, modify, assign or archive tasks.
The DevOps team will look more closely at failed tickets during deployment meetings, as they can be signs of further issues.
The engineering team will review failed tickets that involve feature flags to verify that all of the code changes are behind the feature flag. If they are not, the engineering team will review the changes with QA to verify that they were tested adequately. If they were not, the engineering team will revert the PR so that it is excluded from the deployment.
The QA team will add test cases for creating, assigning, modifying and archiving Tasks from iOS and DrChrono Web.
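
As an illustration of the last two action items, the following is a minimal sketch of a smoke test that exercises all four task operations with a feature flag both off and on. Everything in it is an assumption made for illustration: `TaskStore`, `Task`, the `new_task_flow` flag name, and the in-memory storage are hypothetical stand-ins and do not reflect the actual tasks module or its test suite.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class Task:
    id: int
    title: str
    assignee: Optional[str] = None
    archived: bool = False


class TaskStore:
    """Hypothetical in-memory stand-in for the tasks module."""

    def __init__(self, flags: Optional[Dict[str, bool]] = None):
        self.flags = flags or {}
        self._tasks: Dict[int, Task] = {}
        self._ids = count(1)

    def create(self, title: str) -> Task:
        task = Task(id=next(self._ids), title=title)
        self._tasks[task.id] = task
        return task

    def assign(self, task_id: int, user: str) -> Task:
        self._tasks[task_id].assignee = user
        return self._tasks[task_id]

    def modify(self, task_id: int, title: str) -> Task:
        # Hypothetical new behavior, reachable only when the flag is on.
        if self.flags.get("new_task_flow", False):
            title = title.strip()
        self._tasks[task_id].title = title
        return self._tasks[task_id]

    def archive(self, task_id: int) -> Task:
        self._tasks[task_id].archived = True
        return self._tasks[task_id]


def smoke_test_tasks(store: TaskStore) -> bool:
    """Create, assign, modify, and archive one task; raise on any failure."""
    task = store.create("Call patient")
    assert task.title == "Call patient"
    assert store.assign(task.id, "dr_smith").assignee == "dr_smith"
    assert store.modify(task.id, "Call patient back").title == "Call patient back"
    assert store.archive(task.id).archived is True
    return True


# Run the same checks with the flag off and on, so a regression that
# escapes the flag is caught even while the flag is disabled.
for flag_enabled in (False, True):
    assert smoke_test_tasks(TaskStore(flags={"new_task_flow": flag_enabled}))
```

Running the same checks with the flag disabled is the important part of the sketch: a change that is only partly behind its feature flag can still break production traffic while the flag is off.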