Site performance was briefly degraded because of a Redis node failure.
All times are PST.
Date/Time | Activity |
---|---|
2021-08-30 17:02:55 | Support team started receiving tickets related to site performance. |
2021-08-30 17:04:36 | DevOps team identified elevated database activity as a potential cause. |
2021-08-30 17:07:47 | Databases had started recovering from the increase in activity. |
2021-08-30 17:12:17 | An increase in Redis cache response times was noted and investigated further. |
2021-08-30 17:14:22 | Site had recovered while the investigation into the Redis cache continued. |
2021-08-30 17:23:29 | Redis node failure identified. |
2021-08-30 18:09:41 | Identified potential issue with code interacting with the Redis cache. |
A Redis node failure led to a decrease in performance. Additionally, several places in the code were identified that execute potentially unnecessary queries against the Redis cache.
The Redis node recovered on its own and site performance recovered with it.
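As an illustration of what an unnecessary cache query can look like, the sketch below shows one common way to avoid asking the cache for the same key several times within a single request: remember the value locally after the first lookup. This is not the code from the affected codebase. `CountingCache` and `RequestScopedCache` are hypothetical names, and `CountingCache` is an in-memory stand-in for a Redis client that only counts round trips.

```python
from typing import Dict, Optional


class CountingCache:
    """Hypothetical in-memory stand-in for a Redis client; counts round trips."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.calls = 0

    def get(self, key: str) -> Optional[str]:
        self.calls += 1  # each call represents one network round trip
        return self._data.get(key)


class RequestScopedCache:
    """Remembers values already fetched during one request (hypothetical helper)."""

    def __init__(self, backend: CountingCache) -> None:
        self._backend = backend
        self._local: Dict[str, Optional[str]] = {}

    def get(self, key: str) -> Optional[str]:
        # Only go to the backend the first time a key is requested;
        # misses (None) are remembered too, so they are not retried.
        if key not in self._local:
            self._local[key] = self._backend.get(key)
        return self._local[key]


backend = CountingCache({"user:1": "alice"})
request_cache = RequestScopedCache(backend)

# Five lookups of the same key within one request cost one round trip.
values = [request_cache.get("user:1") for _ in range(5)]
```

A new `RequestScopedCache` would be created per request, so values are never served stale across requests. When a cache node is slow or unavailable, each avoided round trip is one less call that can stall.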
Most of our customers would have experienced slowness caused by the increased database activity, since the database is slower than the Redis cache.
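The link between a cache node failure and increased database activity can be sketched with a generic cache-aside read path. This assumes the application falls back to the database when a cache read fails, which is a common pattern but is not confirmed by this report. All names (`read_through`, `CacheUnavailable`, `FakeDatabase`) are hypothetical.

```python
from typing import Callable, Dict, Optional


class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache node cannot be reached."""


class FakeDatabase:
    """Hypothetical stand-in for the primary database; counts queries."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = dict(rows)
        self.queries = 0

    def get(self, key: str) -> Optional[str]:
        self.queries += 1
        return self._rows.get(key)


def read_through(
    key: str,
    cache_get: Callable[[str], Optional[str]],
    db_get: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try the cache first; on a miss or a cache failure, query the database."""
    try:
        value = cache_get(key)
    except CacheUnavailable:
        value = None  # treat an unreachable cache like a miss
    if value is not None:
        return value
    return db_get(key)


def healthy_cache(key: str) -> Optional[str]:
    return {"page:home": "<html>"}.get(key)


def failed_cache(key: str) -> Optional[str]:
    raise CacheUnavailable(key)


db = FakeDatabase({"page:home": "<html>"})

# With a healthy cache, ten reads never touch the database.
for _ in range(10):
    read_through("page:home", healthy_cache, db.get)
queries_when_healthy = db.queries

# With the cache node down, every read becomes a database query.
for _ in range(10):
    read_through("page:home", failed_cache, db.get)
queries_when_failed = db.queries - queries_when_healthy
```

Under this assumption, every request that the cache would normally absorb lands on the database while the node is down, which matches the elevated database activity seen in the timeline.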
Node recovery is a “self-healing” corrective action that occurred as expected. Additionally, the engineering team has an ongoing effort to identify and resolve potential areas in the codebase that could be problematic in a similar scenario. Finally, we will continue monitoring the infrastructure to expand the system's fault tolerance as necessary.