Disaster Recovery (DR) provides infrastructure recovery so
systems/applications can function again following circumstances that
prevent access to mission-critical technology systems for an extended period. DR
focuses on the IT infrastructure availability necessary to support critical
business functions. DR is part of a larger Business Continuity (BC) effort,
which involves keeping all essential aspects of a business functioning following
a disruptive event.
Alternate Data Center
The Commonwealth Office of Technology (COT) has made significant facility and DR infrastructure investments to ensure Commonwealth of Kentucky business continuity in the event of a disaster. COT has (2,000) square feet of dedicated, secure, caged floor space at the CyrusOne data center in Florence, KY, housing (50) racks of server, storage, and network equipment. This Alternate Data Center (ADC) is a node on COT's 100Gb fiber ring, with (2) separate network paths back to the main Commonwealth Data Center (CDC) in Frankfort. For power and HVAC, CyrusOne guarantees a 150kW electrical commitment, with (2) generators, N+1 cooling, (2) grid connections, and battery transition. CyrusOne requires two-factor physical security (badge plus fingerprint biometric) for 24x7 facility entrance. From a logical security compliance perspective, the ADC provides:
- SSAE 16 (SOC 1 Type II)
- PCI DSS (Requirements 9 and 12)
- ISO 27001
Recovery Point Objective (RPO) defines the maximum allowable amount of lost data, measured as the time between the last valid backup and the failure, disaster, or comparable loss-causing event. In other words, RPO limits how far back in time a recovery may have to roll.
Recovery Time Objective (RTO) relates to downtime and represents how long it takes from the incident until normal operations are again available to users.
COT offers recovery targets of a 30-minute RPO and a 24-hour RTO.
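As a rough illustration of how these two measurements relate to the stated targets, the following Python sketch computes the data loss window and the downtime for a single incident and compares them with the 30-minute RPO and 24-hour RTO. The function names and the incident timestamps are hypothetical and chosen only for this example; they do not describe any COT tooling or real event.

```python
from datetime import datetime, timedelta

# Targets quoted in the text above.
RPO_TARGET = timedelta(minutes=30)
RTO_TARGET = timedelta(hours=24)


def recovery_metrics(last_valid_copy, failure_time, service_restored):
    """Return (data_loss_window, downtime) for one incident.

    data_loss_window: time between the last valid copy and the failure (compared to RPO).
    downtime: time between the failure and restored service (compared to RTO).
    """
    data_loss_window = failure_time - last_valid_copy
    downtime = service_restored - failure_time
    return data_loss_window, downtime


def meets_targets(data_loss_window, downtime, rpo=RPO_TARGET, rto=RTO_TARGET):
    """True only when both the RPO and the RTO targets are satisfied."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical incident timeline (illustrative values only).
last_copy = datetime(2024, 3, 1, 9, 50)
failure = datetime(2024, 3, 1, 10, 0)
restored = datetime(2024, 3, 1, 22, 0)

loss, down = recovery_metrics(last_copy, failure, restored)
print(f"Data loss window: {loss}")           # 0:10:00
print(f"Downtime: {down}")                   # 12:00:00
print(f"Within targets: {meets_targets(loss, down)}")  # True
```

In this made-up timeline, 10 minutes of data would be lost and service would be down for 12 hours, so both targets are met. A last valid copy taken 31 minutes before the failure would miss the RPO even if the restoration time were unchanged.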
Participate in COT DR
For new servers, participation in DR is a question asked on the original New Server Request form. Please include the COT billing number for the DR server(s). For existing servers, an authorized Agency requestor submits a request to the Commonwealth Service Desk, providing the server name(s) to be included in Disaster Recovery.
Agency Disaster Recovery Participation Requirements
To ensure DR platform functionality, COT needs to be aware of all updates affecting Agency systems/applications. Unknown server, software, network, or other infrastructure changes may negatively affect successful failover to ADC platforms when necessary. For successful application failover in the DR environment, all servers required in the Production environment must participate in the DR program. COT invites and strongly encourages all application owners to participate in annual DR Test Exercises. There are no costs associated with DR testing. Currently, all servers participating in DR must be virtual.
Disaster Recovery Services
While COT DR is an optional service, it is highly recommended all Production workloads be included. Backup data for all servers is stored offsite, but only those participating in DR are guaranteed server resources.
- Disaster Recovery
Servers covered under DR provide the mission-critical applications Agencies must have quickly restored in the event of a declared disaster. DR servers and their data continuously replicate from the CDC to the ADC at 5-second intervals. Nightly backups also replicate to the ADC. ADC servers will quickly restore system/application functionality at regular Production environment performance levels.
DR-level costs include a quantity of (2) Production server charges (WN50, LX10, UX10, or SQ10), plus double the additional RAM and CPU charges associated with the server (WN60/70, LX20/30, UX30/40, SQ20/30).
- Backup-Only DR
Backup-Only DR is for Agency applications/systems that can tolerate significant downtime while waiting for infrastructure restoration. This scenario relies on nightly data backups (ST90) that are restored in an alternate computing environment when a request is made. Server/compute infrastructure must be procured, software installed, backup data loaded, and additional configuration completed. Application recovery time estimates range from 30 to 60 days from the time of disaster until system recovery.
Additional charges for Backup-Only DR apply.
System/application downtime, revenue loss, public safety and welfare are calculations to consider when deciding whether to cover servers under DR.
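To make the DR-level pricing rule described above easier to apply when weighing these factors, here is a minimal Python sketch of the arithmetic: a quantity of (2) Production server charges plus double the additional RAM and CPU charges. The function name and the dollar amounts are hypothetical; actual rates for the listed rate codes are not given in this document and must come from COT's rate schedule.

```python
def dr_level_charge(production_server_charge, additional_ram_cpu_charge=0.0):
    """Apply the DR-level rule from the text: two Production server charges
    plus double the additional RAM/CPU charges for the same server."""
    return 2 * production_server_charge + 2 * additional_ram_cpu_charge


# Hypothetical charges for one server (placeholder values, not real COT rates).
base_charge = 100.0     # stands in for a WN50, LX10, UX10, or SQ10 charge
ram_cpu_charge = 40.0   # stands in for the combined RAM and CPU add-on charges

print(dr_level_charge(base_charge, ram_cpu_charge))  # 280.0
```

A server with no additional RAM or CPU charges would simply come to twice its Production server charge under this reading of the rule.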
COT conducts (2) primary DR Test events per year: a spring test focusing on mainframe systems and their associated distributed systems, and a fall test for all other distributed systems. COT invites and encourages all DR participants to attend at no cost. For Backup-Only customers, additional charges apply should testing be requested and available. A month before testing, Agencies verify servers and provide test plans for the application/system being tested. It is critical that Agencies participate in DR testing, because it ensures infrastructure contingency planning efforts function properly should a failover event occur. Following testing, COT shares lessons learned with Agencies, and joint efforts are established to correct any DR plan shortcomings.
COT provides individualized DR testing for larger applications within State government, giving agencies the ability to focus on specific system recovery efforts. CHFS has successfully tested the Commonwealth's largest application the last two years in this scenario. Please contact COT and the DR Coordinators for additional information.
COT uses OpsPlanner Disaster Recovery and Business Continuity software. OpsPlanner's multi-tenant database allows direct Agency input. Disaster Recovery, Business Continuity, Business Impact Analysis, Risk Assessment, system dependencies, and other related items are contained within a single system. OpsPlanner helps Agencies meet DR audit requirements and makes it easier for them to share DR-specific information with their internal departments and with COT.
Agencies are responsible for developing BC plans for mission-critical functions in the event COT services are not available. Kentucky Emergency Management provides guidance, based on National Institute of Standards and Technology (NIST) publications, for developing Business Continuity Plans (BCP) as well as Continuity of Operations Plans (COOP):
Kentucky Emergency Management Planning Information and Resources
NIST Special Publication 800-34 Rev. 1