A data center outage is one of a business leader’s worst nightmares, but few take the time to adequately prepare for such emergency situations. Even with experienced, in-house IT teams monitoring processes and systems and optimizing hardware/software reliability, companies can still be caught unaware. Knowledge gaps, natural disasters, or malicious actors can all result in devastating consequences for business operations.
In this blog post, we share a worst-case scenario one of our clients experienced and how a proactive investment in high availability helped them avoid the damage the data center outage could have caused. The client partnered with Argano to upgrade their instance of Software AG’s webMethods and complete a four-environment build-out to achieve high availability.
Building a high availability environment to minimize disruption
High availability (HA) is the ability of a system to operate continuously, without failing, for a designated period of time. The goal of HA is to keep downtime to a minimum so that the system meets an agreed-upon level of operational performance.
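An agreed-upon availability level is usually expressed as a percentage of uptime over a period, which translates directly into a downtime budget. The short Python sketch below shows that arithmetic. The function names and the example targets are illustrative assumptions, not figures from this client engagement.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, for an availability target over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (1.0 - availability_pct / 100.0)


def measured_availability_pct(downtime_minutes: float, period_days: float = 365.0) -> float:
    """Return the availability percentage achieved given observed downtime in a period."""
    total_minutes = period_days * MINUTES_PER_DAY
    if not 0.0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must fit within the period")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


if __name__ == "__main__":
    # Hypothetical targets, shown only to illustrate the calculation.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over a year allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target over a 365-day year leaves a budget of roughly 525.6 minutes, or just under nine hours, which is why a single extended outage can consume a whole year of allowance.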
To achieve high availability for the client’s production environment, the same code had to be deployed on servers in two data centers located hundreds of miles apart. The client’s foresight and investment in high availability paid an almost immediate return. Three months after the upgrade, the client experienced an unplanned data center outage in which backup power also failed. Servers hung and network connections were unavailable. The HA-enabled infrastructure allowed the client to recover from the situation without significant consequences.
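The idea behind running the same code in two data centers can be sketched in a few lines of Python: traffic goes to the first site that reports healthy, so the loss of one site leaves the other to carry the load. This is a simplified illustration only. The `DataCenter` type, the `pick_active` function, and the site names are hypothetical, and the sketch does not represent how webMethods or the client’s infrastructure actually performs failover.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DataCenter:
    """Hypothetical record of a site and the result of its latest health check."""
    name: str
    healthy: bool


def pick_active(sites: Sequence[DataCenter]) -> Optional[DataCenter]:
    """Return the first healthy site in priority order, or None if every site is down."""
    for site in sites:
        if site.healthy:
            return site
    return None


if __name__ == "__main__":
    # Assumed scenario: the primary site has lost power, the secondary is still up.
    sites = [
        DataCenter(name="primary", healthy=False),
        DataCenter(name="secondary", healthy=True),
    ]
    active = pick_active(sites)
    print(active.name if active else "no site available")
```

With only one site in the list, an outage leaves `pick_active` with nothing to return, which is the single-data-center situation the client moved away from.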
The stakes are high
The recently upgraded instance of webMethods is a core system within the Identity and Access Management (IAM) ecosystem that consolidates and distributes identity information, connects duplicate identities, and facilitates single sign-on when multiple identities exist. The client’s leadership team knew that high availability was crucial for a system as significant as this. If IAM systems fail:
- Users are unable to authenticate and log in to the core systems
- Processing updates do not occur between data and directory services
- Registration and authentication interfaces are unreachable
- Specialized processing for third-party user onboarding stops
In other words, a data center outage is a major disruption for companies that rely on this kind of infrastructure to run their businesses. Before the investment in the second data center, an outage at the original data center would have carried serious liabilities, including:
- No access to business processes, feeds, and application programming interfaces (APIs)
- Inability of end users to authenticate and log in to core business systems
- Blocked access to connected, hardcoded systems
- A server team without sufficient knowledge to restore data
- Limited documentation of system architecture and set up
- Inefficient process for identifying and locating affected system components
Minimizing the risk of data and productivity loss
The client’s investment in high availability returned the expected benefits of the second data center. Having built a high availability environment and greater core IAM resilience, the client avoided major consequences:
- No API data was lost in the outage
- User productivity continued without any critical lag
- API processes were functional prior to full data center restoration
- API feeds and scheduled jobs were caught up within 45 minutes
This client’s upgrade and investment in high availability resulted in some unexpected benefits as well. When the servers were down, the Argano team’s deep knowledge and ready documentation of the client’s server architecture enabled the engineers to focus on preparing to rerun processes and catch up as quickly as possible. The data center outage revealed great collaboration between the client’s server team and the system engineers. Everyone knew the system, knew each other, and had access to critical documentation, which greatly reduced user downtime.
As this scenario shows, a data center outage is a risk that can be mitigated with the right technology strategy. If you are considering building a high availability environment for your company’s core systems, connect with us today.