Azure AD - Service availability issues

Incident Report for Nutanix

Postmortem

Please refer to the summary of the incident and the preliminary root cause provided by Microsoft on their incident history page:
https://status.azure.com/en-us/status/history/

They have also indicated that a full Post Incident Report will be posted within 72 hours.

Summary of Impact to Frame Customers: Between approximately 21:25 UTC on Sep 28, 2020 and 00:23 UTC on Sep 29, 2020, a subset of customers using services in the Azure Public and Azure Government clouds may have encountered errors performing operations for a number of Microsoft or Azure services. For Frame customers using Azure AD for user authentication, this meant that some users may not have been able to log in and access applications and desktops. For a subset of customers using Azure infrastructure to host Frame workload VMs, operations that require authentication, such as provisioning and starting VMs, may have been impacted. The Frame platform automatically handled these Azure error conditions and retried affected operations until Azure services recovered. Once Azure services recovered, no manual intervention by Nutanix Frame operations teams was required to recover affected customers.
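
For readers who want a picture of what retrying affected operations until a dependency recovers can look like, here is a minimal Python sketch. It is an illustration only and is not Frame's actual implementation: the names TransientServiceError and retry_until_recovered, the attempt limit, and the exponential backoff delays are all assumptions made for this example.

```python
import time


class TransientServiceError(Exception):
    """Hypothetical error standing in for a temporary upstream failure (e.g. HTTP 503)."""


def retry_until_recovered(operation, max_attempts=5, base_delay=1.0,
                          max_delay=60.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Between attempts, wait base_delay * 2**(attempt - 1) seconds, capped at
    max_delay. Only TransientServiceError is retried; any other exception
    propagates immediately. All parameters are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientServiceError:
            if attempt == max_attempts:
                # Give up and surface the last failure to the caller.
                raise
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
```

A caller would wrap an operation that depends on authentication, such as a hypothetical start_vm function, as retry_until_recovered(start_vm). Capping the delay keeps retries from backing off indefinitely during a long outage, and limiting the attempts lets a persistent failure surface instead of looping forever.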

Posted Sep 29, 2020 - 11:24 PDT

Resolved

According to the https://portal.office.com/servicestatus page, all of Microsoft's services are operational. Microsoft's final update for the event is:
End time: Tuesday, September 29, 2020, at 2:25 AM UTC
Posted Sep 28, 2020 - 20:00 PDT

Update

We continue to monitor the situation and see normal operations returning for accounts using Azure services. Microsoft's status page also continues to indicate recovery. Note that other Microsoft services, including Office365/Outlook, have also been impacted. You can see the status of these services here: https://portal.office.com/servicestatus
Posted Sep 28, 2020 - 18:47 PDT

Monitoring

Microsoft has reported that: "Engineering teams have applied mitigation steps and customers in both the Azure Public and Azure Government clouds should see signs of recovery at this time." We have seen recovery across Frame accounts using Azure and we will continue to monitor the situation. For the latest updates, please continue to reference: https://status.azure.com/en-us/status
Posted Sep 28, 2020 - 17:54 PDT

Update

We are continuing to monitor the situation. Microsoft has expanded the scope of their status page to cover broader service impacts on Azure Public and Government clouds. However, we have seen improvements in Azure service response, with VMs now starting in most regions. We will continue to monitor the services and Azure's overall status.
Posted Sep 28, 2020 - 17:16 PDT

Update

Frame customers using Azure or Azure Government may experience issues booting VMs or starting new sessions. We have also received reports of other Microsoft services being impacted. Note that users already in a Frame session are not impacted. Also, customers running on AWS, GCP or AHV are not impacted.
Posted Sep 28, 2020 - 15:31 PDT

Update

We are seeing impacts to other Azure services as well. Please continue to monitor our status page and the Azure status page for updates.
Posted Sep 28, 2020 - 15:22 PDT

Update

This issue also affects customers using Xi Government Cloud with Azure Active Directory.
Posted Sep 28, 2020 - 15:14 PDT

Identified

Microsoft Azure has reported that customers using Azure Active Directory may experience HTTP 503 errors when accessing the Azure portal. More information can be found on the Azure status page at https://status.azure.com/en-us/status

If you are using Azure AD with Frame, please refer to the site above for updates.
Posted Sep 28, 2020 - 15:13 PDT