The incident has been resolved.
The root cause was a configuration change rolled out across our CH-GVA-2 hypervisor fleet, which unexpectedly impacted the hosts' private networking configuration.
Many of our components and services rely on the availability of private networking and were therefore impacted by this outage.
The extended recovery time is due to the fact that the incident cut off our access to some key systems, and we had to use alternate means to revert to a working configuration across the whole fleet.
We are sorry for the inconvenience caused by this incident.
Services are back to nominal. We are monitoring the situation.
The Compute API has been restarted. We are monitoring the situation.
We are performing a stop and start of the Compute API. All currently pending jobs will go into an error state.
We are still working on mitigating the latency on the Compute API.
We are still experiencing increased latency on the Compute API when processing jobs and are working to resolve it.
Services have been recovered; we are monitoring the situation.
Service recovery is still ongoing.
Services are currently recovering. Private networking should now be available again.
Mitigations are currently being applied. Service should be restored in a matter of minutes.
At least the following services are currently impacted:
- Compute API globally
- Compute private networking CH-GVA-2
- NLB CH-GVA-2
- SKS API CH-GVA-2
Compute private networking is also impacted by this outage.
Our Support ticketing system is therefore also impacted by this outage.
We are actively working on restoring the service as soon as possible.
The root cause has been identified. We are working on restoring the impacted services. No service recovery ETA is available at the moment.
We are investigating several services being unavailable. The list of impacted services has been extended.