[Network] Increased internet network latencies and packet loss
All systems are back to a nominal state. The failed router hardware (line card) has been replaced.
Today we experienced an internet connectivity blackout of approximately 17 minutes in CH-DK-2. This blackout is related to the initial hardware failure we experienced during the previous night.
An unfortunate human error during the hardware replacement led to a total loss of internet connectivity in the zone.
During the replacement operation, a healthy line card was mistakenly removed from one of our edge routers instead of the failed one. This line card was carrying all of the redundant backup connectivity for the zone. Everything was plugged back in as soon as the technician realised the mistake. Unfortunately, re-inserting a line card into the router triggers mandatory automatic hardware setup and testing steps before the network ports can be brought back online. This process took several minutes to complete before connectivity was finally restored. While the line card was initialising, we took the opportunity to replace the failed line card in the other edge router, so that connectivity could be restored through whichever line card came back up first.
Despite all the measures taken, this mistake led to a complete loss of connectivity. We will review our operational procedures and introduce safety checks to prevent a similar scenario from happening again.
We are deeply sorry for the inconvenience this outage has caused.
Should you have any questions, feel free to get in touch with our support team.
The Exoscale Team
The situation is back to nominal; we are continuing to monitor it.
Mitigation has been applied; traffic is starting to recover.
The root cause has been identified; mitigation is in progress.
We are investigating a major connectivity issue in the zone.
The incident severity is being reduced to minor.
We expect to receive the part during the day on 11th September. Until then, our redundancy level will be N.
The crash is related to a hardware issue. We are working to get the required spare part on site. Internet connectivity is fully available.
One of our core internet edge routers experienced a crash. Impacted connectivity has automatically failed over to alternate available paths.
We are investigating increased internet network latencies and packet loss. We’ll post an update as soon as we have more information.