Disaster Recovery (DR) provides infrastructure recovery so that systems and
applications can function again after circumstances that prevent access to
mission-critical technology systems for an extended period. Unlike Business
Continuity (BC), DR focuses on IT and the associated systems that support
critical business functions. DR is part of the larger BC effort, which involves
keeping all essential aspects of a business functioning despite significant
disruption.
Alternate Data Center (ADC)
The Commonwealth Office of Technology (COT) has made significant facility and DR
infrastructure investments to ensure Commonwealth of Kentucky business
continuity in the event of a disaster. (2,000) square feet of secure, caged floor
space is dedicated to COT at the CyrusOne data center in Florence, KY, housing
(50) racks of server, storage, and network equipment. The ADC is a node on COT’s
100Gb fiber ring, with (2) separate network paths back to the main Commonwealth
Data Center (CDC) in Frankfort. For power and HVAC, CyrusOne guarantees a 150kW
electrical commitment, with (2) generators, N+1 cooling, (2) grid connections,
and battery transition. CyrusOne requires (2)-factor physical security for 24x7
facility entrance: badge plus biometric (fingerprint). From a logical security
compliance perspective, COT is provided the following:
· SSAE 16 (SOC 1 Type II)
· PCI DSS (sec 9 & 12)
· HIPAA
· ISO 27001
· FISMA
To Participate in COT DR
An authorized Agency requestor
should submit a request to the Commonwealth Service Desk asking for the
server(s) to be placed into one of the (3) DR categories listed below. For new
servers, this question should have been answered on the original New Server
Request form. Please include the COT billing number that the DR server(s) will
be billed to.
Agency Disaster Recovery Participation Requirements
COT must be made aware of all
architectural updates affecting Agency systems/applications. Any server,
software, network, or other infrastructure change may negatively impact the
Agency's ability to fail over successfully to ADC platforms when necessary. For
successful application failover in the DR environment, all servers required in
the Production environment must participate in the DR program. Application
owners are invited and encouraged to participate in annual DR Test Exercises. There
are no costs associated with DR testing. Currently, all servers participating
in DR must be virtual.
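
The participation rules above lend themselves to a simple check before a DR
request is submitted. The following is a minimal Python sketch, not a COT tool:
the `Server` record, its field names, and the `dr_readiness_issues` function are
hypothetical and introduced only for illustration. It flags Production servers
that are not enrolled in DR, and DR participants that are not virtual.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    in_production: bool
    enrolled_in_dr: bool
    is_virtual: bool


def dr_readiness_issues(servers: List[Server]) -> List[str]:
    """Return human-readable problems that could block application failover."""
    issues = []
    for s in servers:
        if not s.in_production:
            continue  # only servers required in Production must participate
        if not s.enrolled_in_dr:
            issues.append(f"{s.name}: required in Production but not enrolled in DR")
        elif not s.is_virtual:
            issues.append(f"{s.name}: participating in DR but not virtual")
    return issues


# Example inventory (made-up server names).
inventory = [
    Server("app01", in_production=True, enrolled_in_dr=True, is_virtual=True),
    Server("db01", in_production=True, enrolled_in_dr=False, is_virtual=True),
    Server("legacy01", in_production=True, enrolled_in_dr=True, is_virtual=False),
]
for issue in dr_readiness_issues(inventory):
    print(issue)
```

An empty result suggests that every Production server for the application is
enrolled and virtual; it does not replace notifying COT of architectural changes.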
Disaster Recovery Services
COT offers agencies (3) levels of DR services to choose from. The choice should
reflect how much system/application downtime the agency can tolerate before
restoration once Disaster Recovery procedures are initiated.
Hot DR is for mission-critical
applications that have top priority to be restored. Hot Production servers and
data storage are continuously replicated from the CDC to the ADC at 5-second
intervals. Nightly backups are also replicated to the ADC. ADC servers will
quickly restore system/application functionality at regular Production
environment performance levels.
Hot DR level costs include a quantity of (2) Production server charges (WN50,
LX10, UX10, or SQ10), plus double any additional RAM and CPU charges associated
with the server (WN60/70, LX20/30, UX30/40, SQ20/30).
Warm DR allows Agency systems/applications to be recovered, but does not
guarantee regular Production environment performance. Although a dedicated
server is allocated for DR, Warm DR does not dedicate Production environment RAM
and CPU allocations. Once recovered, systems/applications will run at degraded
performance levels, perhaps significantly degraded. Agencies will experience
slower system and end-user response times.
Warm DR level charges include a
quantity of (2) for the Production server. ADC RAM and CPU charges will be
billed only after Warm DR is initiated. COT may choose which Warm DR systems can
participate in annual DR Test Exercises.
Cold DR should only be used for
Agency applications/systems that can experience significant downtime waiting
for infrastructure restoration. This scenario relies on nightly data backups
(ST90) that will be restored in an alternate computing environment should the
request be made. Server infrastructure may have to be procured, software
installed, and backup data loaded, with additional configuration needed.
Application recovery time estimates could run as long as 60 to 90 days from the
time of disaster until system recovery.
Cold DR level charges will be assessed
at the time an agency requests recovery.
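
The charge rules for the three levels can be summarized as quantities applied to
a server's base charge and to its additional RAM/CPU charge. The sketch below is
one possible reading of the text above, not COT's billing logic. The function
names and the rates are hypothetical placeholders. It assumes the stated
quantities include the Production side, and it does not model Cold DR recovery
charges, since their amounts are not specified here.

```python
from decimal import Decimal


def charge_quantities(tier: str, dr_initiated: bool = False) -> tuple:
    """Return (server_charge_qty, ram_cpu_charge_qty) for one server.

    Assumption: quantities include the Production server itself.
    """
    tier = tier.lower()
    if tier == "hot":
        return (2, 2)  # two server charges, double RAM/CPU charges
    if tier == "warm":
        # ADC RAM/CPU charges are billed only after Warm DR is initiated.
        return (2, 2 if dr_initiated else 1)
    if tier == "cold":
        # Cold DR charges are assessed when recovery is requested; not modeled.
        return (1, 1)
    raise ValueError(f"unknown DR tier: {tier!r}")


def estimate(tier, server_rate, ram_cpu_rate, dr_initiated=False):
    """Combine the quantities with per-unit rates given as decimal strings."""
    servers, addons = charge_quantities(tier, dr_initiated)
    return servers * Decimal(server_rate) + addons * Decimal(ram_cpu_rate)


# Placeholder rates for illustration only; these are NOT actual COT rates.
print(estimate("hot", "100.00", "40.00"))   # 280.00
print(estimate("warm", "100.00", "40.00"))  # 240.00
```

Under these placeholder rates, Warm DR costs the same as Hot DR once it is
initiated, because the ADC RAM and CPU charges then apply.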
Testing/DR Exercises
COT conducts (2) primary DR Test events per year: a spring test focusing on
mainframe systems and their associated distributed systems, and a fall test for
all other distributed systems. All Hot and Warm DR participants are
automatically invited and strongly encouraged to participate at no charge. Hot
DR is given priority over other DR levels. Cold DR agencies will incur
additional charges if testing is requested and available. Prior to testing,
Agencies provide test plans and application/system testers. It is critical that
agencies participate in DR testing, which helps ensure that infrastructure
contingency plans function properly should a failover event occur. Following
testing, lessons learned are compiled and shared with agencies, and joint
efforts are established to correct any DR plan shortcomings.
COT also provides
individualized DR testing for some of the larger applications within State government.
Please contact COT and the DR Coordinators for additional information.
Business Continuity Services
Agencies are responsible for
developing BC plans for mission-critical functions in the event that COT
services are not available. Kentucky Emergency Management provides NIST-based
guidance for developing Business Continuity Plans (BCP) as well as Continuity
of Operations Plans (COOP).
2. NIST Special Publication 800-34 Rev. 1