Microsoft reports 30% Azure Front Door capacity loss due to Kubernetes error

Microsoft. Photo: Jean-Luc Ichard / iStockphoto

A disruption to Microsoft Azure services occurred on the morning of Thursday, October 9, 2025, affecting access to the Azure Portal and Microsoft Entra across multiple regions. The company identified the issue as a 30% capacity loss in Azure Front Door instances, caused by failures in Kubernetes dependencies. The incident began at 07:40 UTC, with effects concentrated in Europe, the Middle East, and Africa.

Microsoft engineers detected the anomaly through internal monitoring and began corrective action immediately. Regions such as Northern Europe, Western Europe, and South Africa reported connection delays and timeouts. Users had difficulty logging in to administrative portals and accessing Microsoft 365 resources.

The failure did not stem from a recent update but from underlying Kubernetes instances that crashed. Microsoft ruled out deployments as the trigger and prioritized manual restoration.

Regions most affected by capacity loss

Customers in Western Europe experienced prolonged timeouts when accessing the Azure Portal. Azure Front Door, the content delivery network responsible for routing global traffic, became unstable at local points of presence.

South Africa and the Middle East faced similar impacts, with about 30% of instances inoperative. Services like load balancing and web application acceleration were directly affected.

Microsoft stated that continuous monitoring allowed quick identification of the problematic Kubernetes dependency.

Kubernetes. Photo: Laylistique / Shutterstock.com

Engineering actions for restoration

Technical teams restarted the affected Kubernetes instances in coordinated phases. The process included checks to prevent failure propagation to other components.

  • Sequential node restarts to minimize downtime;
  • Real-time monitoring of capacity recovery;
  • Failover to redundant instances in adjacent regions.
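
The phased approach in the list above can be illustrated with a minimal Python sketch. It is not Microsoft's tooling: the `Node` class, the `rolling_restart` function, and the batch size are hypothetical names chosen for illustration, and the restart and health check are simulated.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one Kubernetes-backed edge instance."""
    name: str
    healthy: bool = False

    def restart(self) -> None:
        # Simulated restart; a real one would call an orchestration API.
        self.healthy = True


def healthy_fraction(nodes: list[Node]) -> float:
    """Share of nodes currently serving traffic."""
    return sum(n.healthy for n in nodes) / len(nodes)


def rolling_restart(nodes: list[Node], batch_size: int) -> list[float]:
    """Restart failed nodes in batches, recording capacity after each phase."""
    failed = [n for n in nodes if not n.healthy]
    history = []
    for start in range(0, len(failed), batch_size):
        batch = failed[start:start + batch_size]
        for node in batch:
            node.restart()
        # Verify the batch before moving on, so a bad restart cannot spread.
        if not all(node.healthy for node in batch):
            raise RuntimeError("batch failed health check; stopping rollout")
        history.append(healthy_fraction(nodes))
    return history


# Ten instances, three of them down: the 30% loss described in the article.
fleet = [Node(f"edge-{i}", healthy=(i >= 3)) for i in range(10)]
capacity_by_phase = rolling_restart(fleet, batch_size=1)
```

Restarting one instance at a time and checking it before continuing trades speed for safety: a restart that goes wrong halts the rollout instead of taking down more capacity.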

Most services returned to normal within hours, with 98% of capacity restored by noon UTC.

Kubernetes dependency in Azure Front Door

Azure Front Door uses Kubernetes to orchestrate the control-plane and data-plane components of its edge infrastructure. This setup enables global scalability but creates risk when several instances fail at the same time.

Kubernetes orchestrator failures can cascade to traffic routing and portal access. Microsoft noted that the design includes redundancies, but manual recovery was necessary in this case.
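
A back-of-the-envelope calculation shows why losing a share of instances can cascade. The sketch below assumes traffic is spread evenly over the surviving instances, which is a simplification and not a description of how Azure Front Door actually balances load; `load_multiplier` is a hypothetical helper.

```python
def load_multiplier(capacity_lost: float) -> float:
    """Factor by which per-instance load grows when traffic is spread
    evenly over the instances that remain (simplifying assumption)."""
    if not 0 <= capacity_lost < 1:
        raise ValueError("capacity_lost must be in [0, 1)")
    return 1 / (1 - capacity_lost)


# A 30% capacity loss, as reported for this incident.
multiplier = load_multiplier(0.30)
print(f"Each surviving instance carries about {multiplier:.2f}x its usual load")
```

Under that assumption, the remaining instances would need roughly 43% spare headroom to absorb the shifted traffic without delays or timeouts.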

Engineers validated post-restart stability, ensuring traffic flows without further interruptions. The service handles billions of daily requests, making resilience critical for business operations.

Impacts on Microsoft 365 services

Microsoft 365 users reported connection errors in administrative tools. The Microsoft 365 Portal experienced delays, affecting tasks like subscription management.

  • Entra ID sign-ins timed out during authentication;
  • Web applications such as Outlook and Teams slowed down intermittently;
  • Subscription cancellations, such as for Game Pass, were temporarily blocked.

The outage highlighted how closely the CDN is tied to Microsoft's productivity ecosystem.

Recovery and current service status

Microsoft executed a failover in the Microsoft 365 Portal to speed up restoration. By 12:33 UTC, only 4% of the customers initially impacted were still affected.

Engineers confirmed full recovery of most affected resources. Monitoring indicates stability, with full capacity in European regions.

The company plans an internal analysis to refine its automatic recovery mechanisms for future Kubernetes instance failures.

Lessons from past cloud failures

Similar incidents occurred in July 2025, with Azure Front Door issues affecting global routes. That failure involved network configurations and led to mitigations in availability zones.

In September, an Azure Kubernetes Service outage impacted cluster operations across multiple regions. These events underscore the need for rigorous testing in orchestrators.

  • Throttling adjustments to control intra-service call spikes;
  • Internal tools for backlog drainage during incidents;
  • Proactive alerts via Azure Advisor for deprecated APIs.
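
Throttling, the first item in the list above, is commonly implemented with a token bucket. The sketch below is a generic illustration of that technique, not the mechanism Microsoft uses; the `TokenBucket` class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Generic token-bucket limiter (illustrative; names are hypothetical)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a call may proceed, consuming one token."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A burst of 5 calls against a bucket that holds 3 tokens.
bucket = TokenBucket(rate=1.0, capacity=3)
results = [bucket.allow() for _ in range(5)]
```

A burst larger than the bucket's capacity is rejected once the tokens run out, and calls are admitted again as tokens refill at the configured rate. That caps how hard one service can hit another during a spike.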

The recurrence highlights the complexity of hybrid cloud architectures.