Performance issue in login service. Some users are unable to connect.
Incident Report for SuperOffice
Postmortem

Date: May 14, 2024

Start Time: 09:15 AM

End Time: 10:40 AM

Impact: Web server performance degradation, authentication service instability, and user login problems.

Summary

On May 14th, an update to our security infrastructure, the renewal of an SSL certificate on the frontend load balancer, inadvertently triggered a rebalancing process across the backend load balancers. This unexpected behavior placed an excessive load on our authentication service, rendering it unresponsive and preventing all users from accessing their work.

Timeline of Events:

  • Renewal of SSL Certificate: The certificate was successfully renewed on the frontend load balancer.
  • Unintended Consequences: Following the renewal, the backend load balancer initiated a rebalance of customer loads.
  • Authentication Service Overload: The rebalance placed a heavy load on the authentication service, leaving users system-wide unable to perform any operations.
  • Resolution: The issue was resolved by restarting the authentication service.
  • Service Restoration: Normal service functionality was restored at approximately 10:50 AM.

Corrective Measures:

To prevent future occurrences of this nature, we are:

  • Implementing additional monitoring alerts for early detection of abnormal load patterns.
  • Reviewing the change-management procedures to ensure better handling of critical infrastructure updates.
  • Conducting a thorough investigation to understand the interdependencies between service components during maintenance tasks.
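
As an illustration of the first measure, the following is a minimal sketch of how an alert for abnormal load patterns could work. It is not our production monitoring. The class name LoadAlert, the window size, the threshold, and the sample values are all hypothetical and chosen only for the example. The sketch compares each new load sample with a rolling baseline of recent normal samples and flags values that sit far above it.

```python
from collections import deque
from statistics import mean, stdev


class LoadAlert:
    """Hypothetical detector: flags a sample far above the recent rolling baseline."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.samples = deque(maxlen=window)  # recent samples considered normal
        self.threshold = threshold           # allowed distance above baseline, in spreads
        self.min_samples = min_samples       # samples needed before alerting starts

    def observe(self, value):
        """Return True if value is abnormal relative to the baseline seen so far."""
        abnormal = False
        if len(self.samples) >= self.min_samples:
            baseline = mean(self.samples)
            # Floor the spread so a nearly flat baseline does not make
            # tiny fluctuations look like anomalies.
            spread = max(stdev(self.samples), 0.05 * baseline, 1e-9)
            abnormal = (value - baseline) / spread > self.threshold
        if not abnormal:
            # Keep spikes out of the baseline so a sustained overload keeps alerting.
            self.samples.append(value)
        return abnormal


# Illustrative values only: requests per second on an authentication service.
alert = LoadAlert()
warmup = [alert.observe(v) for v in [100, 102, 98, 101, 99, 100]]
spike = alert.observe(400)
normal = alert.observe(103)
```

In this sketch a spike is not added to the baseline, so a load that stays high keeps raising the alert instead of gradually becoming the new normal.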

We apologize for any inconvenience caused and appreciate your understanding as we continuously strive to improve our services.

Posted May 29, 2024 - 16:43 CEST

Resolved
This incident has been resolved.
Posted May 14, 2024 - 10:15 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 14, 2024 - 09:58 CEST
Identified
The issue has been identified and a fix is being implemented.
Posted May 14, 2024 - 09:49 CEST
Investigating
We are currently investigating this issue.
Posted May 14, 2024 - 09:41 CEST