The World Health Organization has signalled that Europe is now the epicentre of the COVID-19 outbreak. These are challenging times for us all, and the resulting demands on our cloud services mean that service incidents, such as an outage, could occur. However, Microsoft has advanced business continuity plans in place to minimize disruption to our business operations and to those of our customers and partners. Microsoft’s global, all-hazards approach to its business continuity and crisis management planning includes a Pandemic Response Plan that connects to the enterprise response hierarchy.
How are you ensuring that business isn’t disrupted across your products and services?
Microsoft has a comprehensive service continuity plan to keep our services running smoothly during events such as COVID-19. The plan accounts for increased usage, the ability to remotely manage our services and a geographically diverse engineering workforce to support those services.
Does Microsoft have a Business Continuity Plan, and does it account for the impact of a pandemic?
Microsoft has a global, all-hazards approach to its business continuity and crisis management planning. This includes a Pandemic Response Plan that connects to the enterprise response hierarchy.
Is working remotely part of Microsoft’s response strategy?
Microsoft’s continuity plan comprises multiple impact scenarios and response strategies that provide remote capabilities.
Are contingency plans in place to ensure all committed support Service Level Agreements (SLAs) and recovery time objectives (RTOs) are met?
There is no expected impact to Azure services. At this time, all of our datacenters are operating normally. In the unlikely event of an outage, we have standard operating procedures for how we manage service impacting events. There is nothing preventing us from executing against these. We are monitoring the situation closely and taking extra precautions to protect the health of our employees and ensure business continuity.
If any cloud services experience a service incident such as an outage, we follow our standard, well-established communications practices to inform impacted customers and partners. We do this mainly through the Service Health experience in the Azure portal, the Microsoft 365 admin center, or both, depending on which services are impacted.
How can you assure Teams reliability at this time?
You and your team depend on our tools to stay connected and get work done. We take that responsibility seriously, and we have a plan in place to make sure services stay up and running during impactful events like this. Our business continuity plan anticipates three types of impacts to the core aspects of the service:
• Systems: When there’s a sudden increase in usage, like the surge we recently saw in China.
• Location: When there’s an unexpected event that is location-specific, such as an earthquake or a powerful storm.
• People: When there’s an event that may impact the team maintaining the system, like the COVID-19 outbreak in the Puget Sound area.
We recently tested service continuity during a usage spike in China. Since January 31, we’ve seen a 500 percent increase in Teams meetings, calling, and conferences there, and a 200 percent increase in Teams usage on mobile devices. Despite this increase, service has run smoothly there throughout the outbreak.
Our approach to delivering a highly available and resilient service centers on the following principles:
• Active/Active design: In Microsoft 365, we are driving towards having all services architected and operated in an active/active design, which increases resiliency. This means that there are always multiple instances of a service running that can respond to user requests, and that they are hosted in geographically dispersed datacenters.
• Reduce incident scope: We seek to avoid incidents in the first place, but when they do happen, we strive to limit the scope of all incidents by having multiple instances of each service partitioned off from each other.
• Fault isolation: Just as the services are designed and operated in an active/active fashion and are partitioned off from each other to prevent a failure in one from affecting another, the code base of the service is developed using similar partitioning principles called fault isolation.
For more information about our continuity plan, please visit the Microsoft 365 blog.