Date: May 14th
Start Time: 09:15 AM
End Time: 10:40 AM
Impact: Webserver performance degradation, Authentication service instability, and user login problems.
Summary
On May 14th, an update to our security infrastructure involving the renewal of a SSL certificate on the frontend load balancer inadvertently triggered a rebalancing process across backend load balancers. This unexpected behavior led to an excessive load on our authentication service, rendering it unresponsive and preventing all users from accessing their work.
Timeline of Events:
- Renewal of SSL Certificate: The certificate was successfully renewed on the front-end load balancer.
- Unintended Consequences: Subsequent to the renewal, the backend load balancer initiated a rebalance of customer loads.
- Authentication Service Overload: The rebalance resulted in a heavy load on the authentication service, leading to a system-wide inability for user operations.
- Resolution: The issue was addressed by performing a restart of the authentication service.
- Service Restoration: Normal service functionality was restored at approximately 10:50 AM.
Corrective Measures:
To prevent future occurrences of this nature, we are:
- Implementing additional monitoring alerts for early detection of abnormal load patterns.
- Reviewing the change-management procedures to ensure better handling of critical infrastructure updates.
- Conducting a thorough investigation to understand the interdependencies between service components during maintenance tasks.
We apologize for any inconvenience caused and appreciate your understanding as we continuously strive to improve our services.