No company is spared the pain of outages. But their impact can be mitigated by how resiliently you build your business architecture, and who you choose to partner with can significantly determine how effective that architecture will be.
Even the biggest and the best fall victim to outages and downtime. Publicly available information from sites like StatusGator.com shows the number of incidents attributed to several major cloud security service providers, with high-severity incidents reported as outages.
Within a 30-day period spanning this past October and November, Imperva reported:
By comparison, other cloud security service providers reported:
While no one is exempt from outages, the way a provider's architecture is designed can bolster or undermine resilience.
Operational resiliency is no longer a pie-in-the-sky security ideal to be set aside while we focus on more pressing issues. It is the pressing issue.
The EU’s Digital Operational Resilience Act (DORA) was explicitly designed to improve the digital resilience of financial entities. To this end, it requires covered entities to perform regular digital operational resilience testing and adhere to strict guidelines, so they can “withstand, respond to, and recover from” operational disruptions like cyberattacks or system failures.
Even a minute of unavailable services can cause startling damage to a company; once an outage exceeds the Maximum Tolerable Outage (MTO), the damage gets even worse.
An Oxford Economics study estimated that the cost of sixty seconds of downtime is at least $9,000. That’s not including the added costs of damage control as PR campaigns haemorrhage money trying to restore good standing.
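To make that figure concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it takes the $9,000-per-minute lower bound cited above as its default, and the function names and the loss budget in the usage example are hypothetical. Substitute your own per-minute cost where you have one.

```python
# Lower-bound cost of one minute of downtime, per the estimate cited above.
COST_PER_MINUTE_USD = 9_000


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate the direct cost of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


def minutes_within_budget(loss_budget_usd: float,
                          cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """How long an outage can run before it exceeds a given loss budget."""
    if cost_per_minute <= 0:
        raise ValueError("cost_per_minute must be positive")
    return loss_budget_usd / cost_per_minute


if __name__ == "__main__":
    # A 45-minute outage at the lower-bound rate.
    print(downtime_cost(45))               # 405000
    # Hypothetical loss budget of $900,000.
    print(minutes_within_budget(900_000))  # 100.0
```

Note that this covers direct cost only; reputational and legal damage would come on top.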
Organizations need to know where to stop the bleeding by determining their own MTO: the point at which an outage will cause an unacceptable level of financial, stakeholder, or legal damage.
This can be evaluated by determining:
Understanding that outages, like breaches, are a matter of when, not if, means that planning takes on a practical aspect.
Run your teams through tabletop exercises to familiarize them with specific scenarios that may arise when systems go down. Start by examining the worst possible thing that could happen and work from there.
Finally, apart from the administration of the outage, which teams are responsible for getting key systems back online, and do they know how? Make sure they understand the emergency procedures to follow in the interim, and keep a copy of your plan, emergency hierarchy, and call tree offline, in a place where everyone can access it. Do not assume an online copy will be available in a crisis.
Just as you interview carefully for the people who will keep the ship running, you also need the right experts in your corner when things start to sink. Revisiting the idea of the call tree, make sure it includes technology partners like Thales for issues like these.
Thales helps teams proactively build resilient architectures that shift application and data security measures left, apply AI and behavioral detection to avoid attacks that lead to downtime, and develop formal Business Continuity and Disaster Recovery plans. Business Continuity Planning (BCP) is also a requirement of key compliance frameworks like NIST CSF and ISO 27001.
Lastly, have an alternative provider to fall back on in times of crisis. This second line of defense, just like the first, should be chosen wisely.
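As a rough illustration of the fallback idea, the sketch below walks an ordered list of providers and picks the first one whose health check passes. Everything here is hypothetical: the provider names, the `pick_provider` function, and the health checks (stand-in callables rather than real probes) are assumptions for the example, not any vendor's API.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, health_check) pair; the check returns True when healthy.
Provider = Tuple[str, Callable[[], bool]]


def pick_provider(providers: Sequence[Provider]) -> str:
    """Return the name of the first provider, in priority order, that is healthy."""
    for name, is_healthy in providers:
        try:
            if is_healthy():
                return name
        except Exception:
            # A health check that raises is treated the same as an unhealthy provider.
            continue
    raise RuntimeError("no healthy provider available")


if __name__ == "__main__":
    # Hypothetical scenario: the primary is down, the secondary is up.
    providers = [
        ("primary", lambda: False),
        ("secondary", lambda: True),
    ]
    print(pick_provider(providers))  # secondary
```

In practice the health checks would be real probes against each provider, and the switch itself (DNS, routing, or configuration) should be rehearsed in the same tabletop exercises as the rest of the plan.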
Some cloud security service providers have chosen to expand into cloud computing and edge services to compete with hyperscalers. While this broadens their offerings, it also complicates the architecture and operations required to stay resilient.
Others, like Imperva, have chosen to double down on quality and optimization, prioritizing maximum uptime by carefully managing things like availability and capacity.
Attacks and outages come with the territory. But with the right know-how, plan, and partners in place, they don’t have to have a long-term effect.