Disaster Recovery (DR) restores infrastructure so that systems and applications can function again after an event that prevents access to mission-critical technology systems for an extended period. DR focuses on the IT infrastructure availability needed to support critical business functions. DR is part of a larger Business Continuity (BC) effort, which involves keeping all essential aspects of a business functioning following a significant disruption.
Alternate Data Center (ADC)
The Commonwealth Office of Technology (COT) has made significant facility and DR infrastructure investments to ensure Commonwealth of Kentucky business continuity in the event of a disaster. COT has (2,000) square feet of dedicated, secure, caged floor space at the CyrusOne data center in Florence, KY, housing (50) racks of server, storage, and network equipment. The ADC is a node on COT's 100Gb fiber ring, with (2) separate network paths back to the main Commonwealth Data Center (CDC) in Frankfort. For power and HVAC, CyrusOne guarantees a 150kW electrical commitment, with (2) generators, N+1 cooling, (2) grid connections, and battery transition. CyrusOne requires two-factor physical security (badge plus fingerprint biometric) for 24x7 facility entrance. From a security compliance perspective, the ADC provides:
- SOC 1, Type II and ISAE 3402 reports
- AT 101/SOC 2, Type II reports
- ISO27001 and ISO22301 certifications
- PCI DSS assessment as Level 1 Service Provider validating Section 9 and 12 controls
- HIPAA/HITECH assessed, as required of colocation providers, with regard to the physical infrastructure and controls that protect electronic protected health information (ePHI)
- Federal Information Security Management Act (FISMA) assessed to ensure compliance with the applicable controls from NIST 800-53
- Federal Financial Institutions Examination Council (FFIEC)
- CSA Security, Trust and Assurance Registry (CSA STAR)
- The Health Information Trust Alliance (HITRUST) Common Security Framework (CSF)
RPO/RTO
Recovery Point Objective (RPO) is measured in time, counting back from the failure, disaster, or comparable loss-causing event. RPO limits how far back in time recovery may roll, defining the maximum allowable amount of data loss, measured as the time between the last valid backup and the failure.
Recovery Time Objective (RTO) relates to downtime and represents the maximum time allowed from the incident until normal operations are available to users again. COT offers recovery targets of a 30-minute RPO and a 24-hour RTO.
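Both definitions reduce to simple time arithmetic. The sketch below, using Python's standard `datetime` module, checks one recovery scenario against the stated targets. The function names and timestamps are hypothetical; only the 30-minute RPO and 24-hour RTO come from this section. It contrasts a recovery point taken seconds before the failure with one taken from a backup the previous night.

```python
from datetime import datetime, timedelta

# Recovery targets stated in this section.
RPO = timedelta(minutes=30)
RTO = timedelta(hours=24)


def data_loss_window(failure: datetime, last_valid_copy: datetime) -> timedelta:
    """Time between the last valid recovery point and the failure (data that may be lost)."""
    return failure - last_valid_copy


def downtime(failure: datetime, service_restored: datetime) -> timedelta:
    """Time from the failure until normal operations are available to users again."""
    return service_restored - failure


def meets_targets(failure: datetime, last_valid_copy: datetime,
                  service_restored: datetime) -> tuple[bool, bool]:
    """Return (meets_rpo, meets_rto) for one hypothetical recovery scenario."""
    return (data_loss_window(failure, last_valid_copy) <= RPO,
            downtime(failure, service_restored) <= RTO)


# Hypothetical scenario: failure at 14:00, service restored 20 hours later.
failure = datetime(2023, 3, 1, 14, 0, 0)
restored = datetime(2023, 3, 2, 10, 0, 0)

# Recovery point replicated 5 seconds before the failure.
print(meets_targets(failure, datetime(2023, 3, 1, 13, 59, 55), restored))  # (True, True)

# Recovery point is a backup taken at 01:00, 13 hours before the failure.
print(meets_targets(failure, datetime(2023, 3, 1, 1, 0, 0), restored))  # (False, True)
```

In the second case the restore still finishes within the RTO, but 13 hours of data would be lost, which exceeds the 30-minute RPO.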
To Participate in COT DR
For new servers, DR participation is a question on the New Server Request form; please include the COT billing number for the DR server(s). For existing servers, an authorized Agency requestor submits a request to the Commonwealth Service Desk, providing the name(s) of the server(s) to be included in Disaster Recovery.
Agency Disaster Recovery Participation Requirements
To ensure DR platform functionality, COT needs to be aware of all updates affecting Agency systems/applications. Unreported server, software, network, or other infrastructure changes may prevent a successful failover to ADC platforms when it is needed. For an application to fail over successfully in the DR environment, every server it requires in the Production environment must participate in the DR program. COT invites and strongly encourages all application owners to participate in annual DR Test Exercises. There are no costs associated with DR testing. Currently, all servers participating in DR must be virtual.
Disaster Recovery Services
While COT DR is an optional service, COT highly recommends including all Production workloads. Backup data for all servers is stored offsite, but only servers participating in DR are guaranteed server resources.
Servers covered under DR host the mission-critical applications that Agencies must have quickly restored in the event of a declared disaster. DR servers and their data continuously replicate from the CDC to the ADC at 5-second intervals, and nightly backups also replicate to the ADC. ADC servers will quickly restore system/application functionality at regular Production environment performance levels.
DR costs include an additional server, RAM, and CPU charge identical to that of the Production resource. FY23 introduced DR-specific COT Rated Service categories: WN55, LX15, UX15, and SQ15 for servers, and WN65/75, LX35/25, UX35/45, and SQ35/25 for RAM/CPU.
Backup-Only DR is for Agency applications/systems that can tolerate significant downtime while infrastructure is restored. This scenario relies on nightly data backups (ST90) that can be restored in an alternate computing environment on request. Server/compute infrastructure must be procured, software installed, backup data loaded, and additional configuration completed. Application recovery time estimates range from 30 to 60 days from the time of the disaster until system recovery.
Additional charges for Backup-Only DR apply.
System/application downtime, revenue loss, and public safety and welfare are factors to weigh when deciding whether to cover servers under DR.
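Where downtime translates into a measurable loss, a rough comparison of the two recovery options can inform this decision. The Python sketch below is a minimal illustration, assuming a hypothetical hourly impact figure that the agency supplies; the 24-hour RTO and the 30 to 60 day Backup-Only estimate come from this section. The function names are hypothetical, and the DR figure treats the RTO as the worst case.

```python
# Recovery-time figures from this section; the hourly impact is a hypothetical agency estimate.
DR_RTO_HOURS = 24            # DR participants: 24-hour RTO
BACKUP_ONLY_DAYS = (30, 60)  # Backup-Only DR: 30 to 60 days to recover


def downtime_cost(hourly_impact: float, hours: float) -> float:
    """Estimated loss for an outage of the given length."""
    return hourly_impact * hours


def compare_options(hourly_impact: float) -> dict[str, tuple[float, float]]:
    """Return (low, high) downtime-cost estimates for each recovery option."""
    dr = downtime_cost(hourly_impact, DR_RTO_HOURS)
    low, high = (downtime_cost(hourly_impact, days * 24) for days in BACKUP_ONLY_DAYS)
    return {"DR": (dr, dr), "Backup-Only": (low, high)}


# Hypothetical application that loses $1,000 for every hour it is down.
for option, (low, high) in compare_options(1000.0).items():
    print(f"{option}: ${low:,.0f} to ${high:,.0f}")
```

With a $1,000 hourly impact this prints `DR: $24,000 to $24,000` and `Backup-Only: $720,000 to $1,440,000`. The estimate leaves out the DR service charges themselves, which would be weighed against it, and it cannot capture non-monetary factors such as public safety and welfare.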
Testing/DR Exercises
COT conducts (2) primary DR Test events per year: a spring test focusing on mainframe systems and their associated distributed systems, and a fall test for all other distributed systems. COT invites and encourages all DR participants to attend at no cost. For Backup-Only customers, additional charges apply if testing is requested and available. A month before testing, Agencies verify their servers and provide test plans for the application/system being tested. It is critical that agencies participate in DR testing, because testing confirms that infrastructure contingency plans will work should a failover event occur. Following testing, COT shares lessons learned with agencies to help correct any shortcomings in their DR plans.
COT also provides individualized DR testing for larger applications within State government, giving agencies the ability to focus on recovering specific systems. CHFS has used this approach to successfully test the Commonwealth's largest application in each of the last three years. Please contact COT's DR Coordinators for additional information.
Business Continuity Services
OpsPlanner
COT uses OpsPlanner Disaster Recovery and Business Continuity software. OpsPlanner's multi-tenant database allows direct Agency input. Disaster Recovery, Business Continuity, Business Impact Analysis, Risk Assessment, system dependencies, and other related items are contained within a single system. OpsPlanner helps COT meet DR audit requirements and makes it easier to collaborate on DR information with internal departments and with the agency as a whole.
Additional Resources
Agencies are responsible for developing BC plans for mission-critical functions in the event COT services are not available. Kentucky Emergency Management provides guidance, based on National Institute of Standards and Technology (NIST) recommendations, for developing Business Continuity Plans (BCP) as well as Continuity of Operations Plans (COOP):
Kentucky Emergency Management Planning Information and Resources
NIST Special Publication 800-34 Rev. 1