UMUC
Abstract
Modern organizations increasingly
rely on information technology (IT) to conduct their daily activities. As a result, ensuring the resiliency of this
asset has become a critical component for most enterprises. From hurricanes and power outages to
cyberattacks, public agencies and private businesses alike face a myriad of
threats from both manmade and natural causes.
To mitigate these risks, it is imperative that organizations devise an
appropriate contingency plan that incorporates backups and safeguards for IT
infrastructure. The following paper
outlines the various planning steps, recovery operations and testing
requirements necessary to ensure a successful business continuity plan with a
24-month proposal to adequately test the preparations. Although maintaining a comprehensive
contingency plan requires a significant expenditure of personnel, equipment,
and production costs, not developing a backup often proves far more costly.
Introduction
According to a study conducted by McGladrey and Pullen LLP, 43% of companies that
experience a disruptive event lasting 10 days never reopen. Another 51% of firms continue to operate for up to
two years following a major data outage before closing, and only 6% of businesses survive
in the long term (Tittel & Korelc, 2013).
Given the necessity for business continuity as well as the increased
dependence on IT systems and services, ensuring the availability of these resources
has become paramount in the contingency planning cycle. While business continuity planning should be
tailored for each organization, a number of similarities exist throughout all
plans with the Disaster Recovery Institute International (DRII) identifying
these common tasks (Vacca, 2009). These
steps include planning activities such as conducting a business impact
analysis (BIA) and risk assessment.
Organizations must also determine recovery options by identifying
relevant risks, selecting appropriate strategies and developing a comprehensive
contingency plan. Finally, continuity
operations must also incorporate a verification component. This includes personnel training, periodic
testing, and maintenance of the plan as changes in the organizational mission
or structure occur. Each step of the
contingency planning process is important to the overall success of the
enterprise. Ensuring continuity
throughout a disaster requires appropriate resources allocated to critical
systems within an organization, which in turn necessitates a strong commitment by a
firm’s senior management.
Planning
The first step in
designing a relevant business contingency arrangement is planning. This stage requires an organization to weigh
the returns from any proposed safeguards.
The frequency and severity of an outage should be assessed when
determining the amount of resources that should be devoted to this process. Applying these considerations to IT resources
may be difficult for some firms. It is
often hard to assess the exact level of impact an intrusion can have on
a firm, since cyberattacks range from amateur denial-of-service (DoS) attacks
to advanced persistent threats (APTs) perpetrated by nation states. Moreover, the rate of occurrence
of cyberattacks is difficult to estimate for organizations that have never
experienced one. In these instances, it
is the responsibility of the cybersecurity professional to make a convincing
case for the incorporation of IT resources into the business continuity
plan. This often requires computer security
personnel to demonstrate the anticipated return on investment (ROI) that
adequate planning will provide an organization (UMUC, 2013).
According to the DRII,
this stage should include both a BIA and a risk assessment. The BIA assesses the potential toll an outage
can take on a critical business area.
Conducting this analysis requires stakeholders to identify key value
drivers within the firm. These are the
elements within an organization deemed most critical to long-term
operations. Examples of value drivers
include components such as intellectual property or data operations (Vacca,
2009). The amount of resources a firm
allocates to system restoration depends on the level of impact an outage is
anticipated to have on daily operations.
The BIA assists managers in designing a hierarchy to determine which
activities or areas should be reestablished first (Slater, 2012).
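The restoration hierarchy described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed BIA method: it assumes each business function has an estimated hourly cost of downtime and a maximum tolerable outage. The class name, field names, and figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    hourly_loss: float          # estimated cost per hour of outage (hypothetical)
    max_tolerable_hours: float  # longest outage the firm can absorb (hypothetical)


def restoration_order(functions):
    """Rank functions so the least tolerant, most costly outages come first."""
    return sorted(functions, key=lambda f: (f.max_tolerable_hours, -f.hourly_loss))


functions = [
    BusinessFunction("payroll", hourly_loss=500, max_tolerable_hours=72),
    BusinessFunction("order database", hourly_loss=12000, max_tolerable_hours=4),
    BusinessFunction("email", hourly_loss=2000, max_tolerable_hours=24),
]

for rank, function in enumerate(restoration_order(functions), start=1):
    print(rank, function.name)
```

Sorting first on tolerable downtime and then on hourly loss is only one possible ordering rule. A firm could just as reasonably weight regulatory or reputational factors that this sketch leaves out.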
The second component in
the planning stage is a risk assessment.
This step requires enterprises to perform an objective analysis of
probable and possible risks that could affect daily operations. It should cover the types of
disasters or outages historically encountered, taking into account the
anticipated frequency of occurrence as well as the impact each incident is
expected to have on the organization.
With this data, managers can then make an educated decision on how much
investment is required to mitigate the impact from potential outages.
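The frequency-and-impact reasoning above lends itself to a short worked example. The sketch below assumes that a risk can be summarized by an estimated number of incidents per year and an estimated cost per incident, and multiplies the two to get an expected yearly loss. The risks, figures, and function name are hypothetical and are not taken from the cited sources.

```python
def expected_annual_loss(annual_frequency, impact_per_incident):
    """Expected yearly cost of a risk: how often it occurs times what each occurrence costs."""
    return annual_frequency * impact_per_incident


# Hypothetical risks: (estimated incidents per year, estimated cost per incident)
risks = {
    "power outage": (2.0, 15_000),
    "hurricane": (0.1, 400_000),
    "denial-of-service attack": (0.5, 50_000),
}

# Rank risks from the largest expected yearly loss to the smallest.
ranked = sorted(
    ((name, expected_annual_loss(freq, impact)) for name, (freq, impact) in risks.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, loss in ranked:
    print(f"{name}: ${loss:,.0f} expected per year")
```

A ranking like this gives managers a rough ceiling for mitigation spending on each risk, since spending more per year than the expected loss is hard to justify on cost grounds alone.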
Recovery Operations
The second major component
of the contingency planning process that DRII identified deals with recovery
strategies. This includes identifying
continuity options based on various scenarios, selecting the strategy most
applicable to an organization’s needs and developing a continuity plan based on
this data (Vacca, 2009). Although the
details vary greatly depending on the incident, the general theme should always
focus on communication. Contingency plans
must include how organizations transmit information in the event of an emergency
as well as how employees will talk to each other if normal communication
channels are broken. Although some
companies depend chiefly on IT resources and others rely more heavily on supply
chain logistics, every contingency arrangement should be planned and
coordinated with business, security and IT managers working in conjunction to
ensure continuity of operations (Slater, 2012).
As IT resources become
increasingly important in the modern business community, the number and
types of disasters that an organization may encounter have risen
significantly. Where past disasters
included natural occurrences such as hurricanes or floods, enterprises today
must now also consider outages to their networks caused by manmade sources. This increase in the number of potential
outages has led to the creation of a variety of third-party service providers. Modern enterprises no longer have to create
contingency plans from scratch. A number
of companies offer specialized continuity planning software, while others
provide turnkey arrangements to facilitate backup operations. From data centers to mobile recovery services,
Gartner estimates this area represents a $3 billion to $4 billion industry
(Collett, 2007). Although organizations
that outsource this area have a number of options to weigh, the primary
consideration for IT resources “…requires that the company install backup and
recovery systems to override any type of crisis in support of physical and
digital security” (UMUC, 2013, p. 8).
Physical
The physical security aspect of a contingency plan includes ensuring that
an alternate offsite location is available in the event of an emergency. This includes not only physical office space
but also the IT resources necessary to continue operations during an
outage. From servers and networks to
data backups, this component provides a means of ensuring parallel
operations. Backup sites can range from
physical locations with minimal infrastructure to sites that fully imitate
current operations. From locations owned
and operated by the enterprise to reciprocal agreements with similar firms,
organizations have a number of recovery options available. Like many aspects of business continuity, the
level of physical preparedness is often dictated by financial considerations.
Physical backup locations generally fall into three main categories
ranging from basic to advanced: cold sites, warm sites, and hot sites (Swanson, Bowen, Phillips, &
Gallup, 2010). Cold sites are facilities
with the lowest level of preparation and accordingly are often the least
expensive to maintain. These locations
usually have minimal infrastructure in place beyond electricity and
environmental controls. As a result,
cold sites require the longest lead time to set up and become fully
operational. The next type of backup
facility is a warm site. These locations
have more preparations in place than cold sites and as such are also more
expensive to maintain. Warm sites are
usually partially furnished with some or all IT resources and telecommunication
equipment already in place. Accordingly,
these facilities require less time to activate than cold sites. The last category of physical backup locations
is a hot site. “Hot sites are facilities appropriately sized to support system
requirements and configured with the necessary system hardware, supporting
infrastructure, and support personnel” (Swanson et al., 2010, p. 22). These locations require the least amount of
time to become active, and some maintain a full-time staff. As a result, hot sites represent the most
expensive scenario for most organizations.
Digital
The second major component
in assessing recovery options revolves around digital security
considerations. Although infrastructure
and personnel are critical aspects in contingency planning, business continuity
must also take into account data backups.
Inherent in this process is a multitude of questions and technologies. Similar to physical security planning, this
area is also heavily influenced by cost considerations (UMUC, 2013).
Depending on mission requirements, enterprises may choose any number of
methods to back up digital media, databases, or proprietary data. Decisions on how often data is backed up and
to what extent should be guided by the critical nature of the information. Organizational policy should be clear in
dictating the frequency and scope of information archiving. Additional considerations should include the
location of media, frequency of data rotation and the data transmission method
to an offsite location. The National Institute of Standards
and Technology (NIST) issues the Federal Information Processing Standards
Publication (FIPS) 199, entitled Standards for Security Categorization of
Federal Information and Information Systems.
FIPS 199 categorizes systems by the level of impact (low, moderate, or high) an
outage is anticipated to have on an organization, and NIST's contingency planning
guidance recommends recovery strategies for each level. NIST recommends tape backups and a cold site
for low-impact systems. Outages
anticipated to have a moderate effect on daily operations should be mitigated
with optical backups and WAN/VLAN replications as well as a cold or warm site. Finally, NIST recommends a backup strategy
that includes mirrored systems and a hot site location for severe disruptions
to an organization’s most mission critical systems (Swanson et al., 2010).
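The pairings described above can be captured in a small lookup table. The sketch below is illustrative only: the keys follow the low, moderate, and high impact levels used by FIPS 199, the strategy values repeat what the text attributes to Swanson et al. (2010), and the dictionary layout and the recommend function are assumptions made for this example.

```python
# Strategy pairings as summarized in the text above (Swanson et al., 2010).
RECOVERY_STRATEGIES = {
    "low": {"backup": ["tape backup"], "site": ["cold site"]},
    "moderate": {
        "backup": ["optical backup", "WAN/VLAN replication"],
        "site": ["cold site", "warm site"],
    },
    "high": {"backup": ["mirrored systems"], "site": ["hot site"]},
}


def recommend(impact_level):
    """Return the backup methods and site types listed for a given impact level."""
    key = impact_level.strip().lower()
    if key not in RECOVERY_STRATEGIES:
        raise ValueError(f"unknown impact level: {impact_level!r}")
    return RECOVERY_STRATEGIES[key]


print(recommend("Moderate"))
```

Keeping the mapping in one table makes it easy for policy owners to review and to adjust if an organization adopts stricter pairings than the baseline.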
As more organizations choose to back up their critical data, the number of
companies providing data archiving has increased in
turn. From data centers providing
cloud storage to commercial vendors offering full service transportation and
restoration services, modern organizations have a number of alternatives to
choose from. Enterprises that retain
third-party providers should weigh a variety of criteria. Considerations such as geographic location
could become an issue if the vendor is close enough to the customer to also be
affected by an outage. Other deciding
factors should include the accessibility of the stored data, security of the
archived media, environmental considerations and of course, cost (Swanson et
al., 2010).
Testing Requirements
The third major
category the DRII associates with business continuity is the verification,
maintenance, and personnel training associated with a disaster recovery
plan. Testing contingency preparations
is an important component in this process.
Ensuring relevant personnel are adequately trained for their role during
an outage helps guarantee a smooth operation during an actual event. Additionally, a business continuity plan
should be thought of as a living document.
Enterprises should periodically reassess and update contingency plans as
mission requirements or organizational structure change. Finally, verifying the accuracy and
capability of a plan also provides an additional measure of preparedness prior
to an actual incident (Vacca, 2009).
Tabletop and Functional Exercises
According to NIST, the
two main evaluations are tabletop and functional exercises (Grance, Nolan,
Burke, Dudley, White, & Good, 2006).
Tabletop exercises are discussion-based activities where participants
role-play their responsibilities during a simulated emergency. These types of evaluations are usually
conducted in an informal classroom setting with personnel discussing their
roles and actions during an outage. A
facilitator guides participants through one or more scenarios in an attempt to
meet previously defined objectives.
Depending on the number of scenarios and the detail involved, tabletop
exercises can last anywhere from two to eight hours. This type of evaluation represents the most
cost-effective means of testing the viability of a business continuity
plan. Tabletop tests provide a forum for
team members to demonstrate their emergency knowledge as well as give managers
the ability to review contingency plans for errors, missing information or
inconsistencies (Kirvan, 2009).
The other commonly
used validation activity is a functional exercise. This evaluation is also scenario-driven, but
rather than relying on discussion, functional exercises employ a simulated
operational environment. These types of
evaluations are designed to test various aspects of an IT plan, including
personnel, procedures, or equipment.
Components to test can include recovery site operations, backup systems,
and any third-party continuity services (Kirvan, 2009). Functional or simulated exercises can vary in
size and scope and can cover a single component or a full-scale evaluation of
an enterprise. As a result, these tests
can last anywhere from several hours to several days and often represent the
most costly and time-consuming of the continuity evaluation tools (Grance et
al., 2006). Although they require a
significant expenditure of resources, functional exercises are also one
of the most effective methods of testing a disaster recovery plan prior to an
actual event.
Alternate Testing
Although tabletop and
functional exercises are the two most commonly used methods of evaluation,
the commercial vendor SearchDisasterRecovery also recommends a variety of
alternate tests, including plan reviews, orientation tests, and drills (Kirvan,
2009). In a plan review, participants
discuss the proposed business continuity plan in an informal setting. This step is similar to a tabletop exercise, albeit
without a scenario. Orientation tests
introduce participants to the contingency plan and help orient new staff to
the disaster recovery policies and procedures of an organization. Testing time for this evaluation can be as
little as an hour and should be considered as a component in the employee
training curriculum. Finally, drills
provide an impromptu method of testing staff on established emergency
procedures. These types of evaluations
provide training under realistic conditions and are routinely used for response
to natural disasters.
24-Month Testing Plan
Testing the validity of
a continuity plan encompasses a number of different exercises. With a variety of activities available to an
organization, the key is to incorporate annual testing into the overall
disaster recovery process. From drills
to full-scale events, each activity possesses both merits in the form of
preparation and drawbacks in the form of time and financial expenditures. Finding a balance between an adequate amount
of testing and a sufficient level of resource allocation is often the primary
difficulty for organizations. In
addition to the actual amount of time needed to conduct the exercise, a far
greater amount of time is necessary for “preparation and execution, funding,
careful planning and a structured process from pre-test through test and
post-test evaluation” (Kirvan, 2009).
Optimally, the financial considerations of any continuity plan should be
based on organizational needs, including the “…maximum tolerable period of
disruption and recovery time from which the specific measures will be based on”
(Pinta, 2011, p. 57). To determine the
amount of money that should be spent on contingency planning and preparations,
enterprises must consider factors such as the maximum tolerable downtime (MTD),
recovery time objective (RTO), and recovery point objective (RPO). For most organizations, the longer an outage lasts,
the more costly it becomes. As a
result, firms must balance the costs necessary to recover from an emergency
with the cost of disruption to daily operations. Plotting these two costs against outage duration allows
managers to visualize the cost balance point, which indicates the optimal amount that should be allocated
to business continuity planning (Swanson et al., 2010). In its Special Publication 800-53, NIST
requires federal agencies to test contingency plans on an annual basis at a minimum
(Grance et al., 2006). This provides a
solid starting point for the continuity planning cycle.
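One way to picture the cost balance point discussed above is as the outage duration at which the rising cost of disruption meets the falling cost of recovery. The sketch below assumes two invented cost curves. The formulas, dollar figures, and function names are hypothetical and serve only to illustrate the idea, not to reproduce any calculation in the cited guidance.

```python
def disruption_cost(hours):
    """Hypothetical: losses grow the longer systems stay down."""
    return 2_000 * hours


def recovery_cost(hours):
    """Hypothetical: recovering faster demands costlier solutions, such as a hot site."""
    return 96_000 / hours


def cost_balance_point(max_hours=72):
    """Return the first whole hour at which disruption cost meets or exceeds recovery cost."""
    for hours in range(1, max_hours + 1):
        if disruption_cost(hours) >= recovery_cost(hours):
            return hours
    return None  # the curves do not cross within the window examined


balance = cost_balance_point()
print(f"Curves cross near a {balance}-hour recovery time")
```

With these assumed curves the crossover falls at roughly seven hours. A real analysis would substitute figures from the BIA and would confirm that the chosen recovery time stays within the maximum tolerable downtime.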
Full-scale and Functional Testing
Full-scale tests, which
represent the most comprehensive assessment tool, also require the greatest
amount of testing and planning time.
These exercises typically last anywhere from two to eight hours, but
require a minimum of four months to plan.
Full-scale tests are also expensive and may be disruptive to daily
operational activities (Kirvan, 2009).
As a result, a comprehensive test of all IT systems should take place
every one to two years. The exercise
should encompass all aspects of a business continuity plan from evacuating the
primary site to activating the backup location.
All IT and communication resources should be evaluated during this
process, including “…settings of backup policy, data replication, high
availability systems, active and passive devices, local mirror of systems
and/or data and use of disk protection technology such as RAID technology”
(Pinta, 2011, p. 61). Due to the cost
and time necessary to execute this type of plan, organizations should also
consider smaller scale functional tests.
These events exercise only a portion of the continuity operation and as
such may be planned in as little as three months. The actual testing usually lasts two to four
hours and causes less disruption to an organization’s daily activities (Kirvan,
2009).
Drills, Orientation and Tabletop
Testing
In addition to
full-scale and functional exercises, organizations should also consider limited
training events that require less planning and can be executed frequently
throughout the year. Orientation tests
should be given to all new personnel to provide a solid foundation in
an organization’s continuity operations; they often require only a month to plan
and an hour to deliver. Drills on the
most likely emergency scenarios should be conducted quarterly. This includes exercises such as tornado or
earthquake tests, fire drills, and communication plans. Testing time for these events can be as
little as 10 minutes with a planning cycle of one month. Lastly, tabletop tests should be incorporated
into an organization’s contingency preparations to refine the overall
continuity plan. These events should be
conducted just prior to a functional or full-scale test every one to two
years. The planning cycle for these
events ranges from two to three months, and they can be executed in approximately
three hours depending on the size of the organization and the scope of the plan
(Kirvan, 2009). Integrating smaller
scale exercises into an enterprise’s planning process allows for more frequent
tests. This in turn gives managers more
opportunities to identify weaknesses in the continuity plan and
gives employees more opportunities to practice their assigned duties in the
event of an emergency.
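The testing cadence outlined in this section can be laid out as a simple 24-month calendar. The sketch below is one possible arrangement, not a mandated schedule. The planning lead times come from the figures cited above, while the placement of the functional test in month 12 and the full-scale test in month 24, along with all names in the code, are assumptions made for illustration. Orientation tests are omitted because they depend on when new personnel arrive.

```python
# Planning lead times in months, taken from the figures cited above (Kirvan, 2009).
# The tabletop value uses the upper end of the two-to-three-month range.
LEAD_TIME = {"drill": 1, "tabletop": 3, "functional": 3, "full-scale": 4}

MONTHS = 24


def build_schedule():
    """Return {month: [exercise, ...]} for one hypothetical 24-month cycle."""
    schedule = {month: [] for month in range(1, MONTHS + 1)}
    for month in range(3, MONTHS + 1, 3):  # quarterly drills
        schedule[month].append("drill")
    schedule[12].append("functional")      # assumed placement: end of year one
    schedule[24].append("full-scale")      # assumed placement: end of year two
    for month in (11, 23):                 # tabletop just before each larger test
        schedule[month].append("tabletop")
    return schedule


def planning_start(month, exercise):
    """Month in which planning should begin; zero or less means before the cycle starts."""
    return month - LEAD_TIME[exercise]


for month, exercises in build_schedule().items():
    for exercise in exercises:
        start = planning_start(month, exercise)
        print(f"month {month:2d}: {exercise} (begin planning in month {start})")
```

Printing the planning start next to each event makes overlapping preparation windows visible, which helps when deciding whether staff can realistically support the calendar.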
Conclusion
As organizations increasingly rely
on IT resources for daily operations, the number and variety of potential risks
have risen significantly. Modern enterprises
must consider the impact a network outage would have on their business as well
as the effects from traditional natural and manmade disasters. Perhaps now more than ever, companies and
agencies alike must ensure they have adequate disaster recovery and contingency
plans in place prior to an actual emergency.
A business continuity plan should be tailored to meet an organization’s
specific mission and requirements.
Threats and critical assets should be objectively identified utilizing
tools such as business impact analysis and risk assessments. These evaluations can then be used to develop
a contingency plan and the necessary training and testing requirements to
maintain the emergency preparations. Finally,
a business continuity plan will only succeed if adequate resources, personnel,
and time are allocated to the practice.
This requires receiving support from senior management throughout the
entire contingency planning process.
References
Collett,
S. (2007). Evaluating business continuity services. CSO Security and
Risk. Retrieved
Grance, T., Nolan, T., Burke, K., Dudley, R., White, G., & Good, T. (2006). Guide
to test, training, and exercise programs for IT plans and capabilities. NIST. Retrieved from http://csrc.nist.gov/publications/nistpubs/800-84/SP800-84.pdf
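Kirvan, P. (2009). Business continuity and disaster recovery testing templates: A free download and guide. SearchDisasterRecovery. Retrieved from http://searchdisasterrecovery.techtarget.com/feature/Business-continuity-and-disaster-recovery-testing-templates-A-free-download-and-guide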
Pinta,
J. J. (2011). Disaster recovery planning as part of business continuity
management. Agris Online Papers in
Economics & Informatics, 3(4), 55-61.
Slater, D. (2010). Business continuity and disaster recovery planning: The
basics. CSO Security and Risk.
Retrieved from http://www.csoonline.com/article/204450/Business_Continuity_and_Disaster_Recovery_Planning_The_Basics
Swanson, M., Bowen, P., Phillips, A. W., & Gallup, D. (2010). Contingency planning
for federal information systems. NIST. Retrieved from http://csrc.nist.gov/publications/nistpubs/800-34-rev1/sp800-34-rev1_errata-Nov11-2010.pdf
Tittel, E., & Korelc, J. (2013).
Understanding the need for business continuity management and
disaster recovery planning. AICPA. Retrieved from http://www.aicpa.org/interestareas/informationtechnology/resources/businesscontinuitymanagementanddisasterrecoveryplanning/downloadabledocuments/understanding_drp_bcm.pdf
University of Maryland University College
(UMUC). (2013). Module 11: Service restoration and
business continuity. CSEC 650: Cybercrime
Investigation and Digital Forensics. Retrieved from http://tychousa1.umuc.edu
Vacca, J. R. (2009). Computer and information security handbook. Burlington, MA:
Morgan Kaufmann Publishers.