Document Information
Version: v1
This Disaster Recovery Plan (DRP) captures, in a single repository, all of the information that describes XX’s ability to withstand a disaster as well as the processes that must be followed to achieve disaster recovery.
A disaster can be caused by man or nature and results in XX’s IT function not being able to perform all or some of its regular roles and responsibilities for a period of time. XX defines disasters as the following:
The following events can result in a disaster, requiring this Disaster Recovery document to be activated:
To ensure XX’s continuity of service and product to its clients, it is intended that the Business follow the steps in this DRP to restore services to business-as-usual as quickly as possible. This includes:
This DRP document will also detail how this document is to be maintained and tested.
The IT-DRP provides an IT risk and mitigation reference in addition to technical and operational information. Any processes and actions described are only applicable during a designated major incident, which is to be determined in conjunction with XX’s Business Continuity Plan.
This document is intended as a high-level approach and does not specifically itemise all of the steps required during response/recovery for any individual system.
This IT-DRP should consider all technologies used within XX including new and emerging ones. The top-level areas of consideration are:
Physical Infrastructure
5-Definition of Services & Systems
From an IT perspective, XX aims to operate as a 100% cloud-based organisation. This means that (wherever possible) applications, data & facilities are leveraged wholly through offsite mechanisms.
This approach enables an extremely light-weight and versatile infrastructure with no requirement for on-site servers or data centres, thus reducing maintenance and upgrade requirements. The implementation and operation of a zero-trust network further mitigates against certain types of risks.
Although XX does not currently operate servers or a data centre, there may be a point in the future where this becomes a necessity. For this reason, risks, mitigations and procedures should be mentioned and expanded upon when appropriate.
Telecommunications within XX are also provided through cloud services using IP connectivity.
Use of IT within XX is broadly attributable to two primary categories:
By operating as a 100% cloud-based organisation, XX is highly able to continue functioning and providing these primary IT services through many physical disaster scenarios, including total loss of buildings. The ability to utilise services from “any internet connection” provides a huge benefit in terms of both business continuity and educational delivery.
With this model, the emphasis for IT-DRP is therefore shifted to the appropriate protection of cloud services and internet provision.
6-Definition of Mitigation Measures
6.1 Secondary Service provision
6.2 Resilient diverse connectivity
6.3 Alternate Location (Inherent ability)
6.4 Intrusion Detection, Conditional Access, Active Monitoring & Appropriate permissions
6.5 Physical protective measures
6.6 Backup & Replication
6.7 Policy
6.8 API Integration
6.9 Encryption
6.10 Remote Wipe
6.11 Advanced Threat Protection / Anti-Virus
7-Definition of Risks
Infrastructure | Description | Risk | Mitigation Measures |
Internet Connectivity | Various fibre connections to buildings | Service Provider Failure; Damage to fibre Cabling; Local Hardware Failure (routers, switches etc); Natural Disaster / Loss of Building Integrity; Malicious attack (Denial of Service, Hacking etc) | Secondary service provision; Resilient diverse connectivity; Multiple device utilisation; Overcapacity / Redundancy; Alternate location / Inherent ability; Intrusion detection; Conditional Access; Appropriate permissions |
Data Centre and Servers | Virtual Machine Estate; PaaS provision | Service provider failure; VM failure or corruption | Backup & Replication; Secondary service provision; High availability |
Wi-Fi and LAN Estate | IaaS provision | Hardware failure; Interference; Overloading; Electrical failure | Multiple device utilisation; Overcapacity / Redundancy; Secondary service provision; Resilient diverse connectivity |
Email & communication; Primary Data Storage; Operational Applications; Faculty Applications | Various SaaS providers | Service provider failure; Communication failure; Corruption of data; Loss of data; Data Exposure / Compromise; Commercial exploitation; Sabotage / Internal attack; Erroneous or neglectful activity; Inability to operate / application failure | Backup & Replication; Litigation Hold Policy; Retention Policy; Secondary service provision; Conditional Access; API integration |
Organisational Devices; Personal Devices | Laptops; Tablets; Desktops; Phones; Wearable technology; BYOD; Home PC | Hardware failure; Data exposure / Password compromise; Loss / Theft; Inability to operate; External devices (USB stick); Virus / Malware / Attack | Overcapacity / Redundancy; Local drive encryption; Conditional Access; Password policy; Remote Wipe capability; Device restriction policy; Advanced threat protection; Active monitoring; Appropriate use policy |
Support Devices | Multi-Function Printers; Scanners; Desktop Printers | Hardware Failure | Overcapacity / Redundancy |
Business Devices | Dedicated Resources | Hardware Failure | Resilient diverse connectivity |
Security | Door Access; CCTV | Hardware failure; Vandalism / sabotage | Manual operation / override; Physical protective measures |
8- Recovery Time Objectives / Recovery Point Objectives
8.1 For each service or data source, there are two parameters: the Recovery Time Objective (RTO), which is the amount of time it takes to recover a service or data to the required operational state, and the Recovery Point Objective (RPO), which is the potential maximum amount of data that will be lost as a result of the incident.
8.2 These two time parameters are a property of the backup and replication processes that are undertaken. Therefore, in order to meet organisational requirements, more than one type of process may be defined and implemented.
8.3 The RPO for a service is concerned with the allowable amount of data loss in the event of a disaster. If the RPO is 24 hours, then all data produced by the service must be backed up (including the time taken for the backup) at least every 24 hours to ensure that this objective is met.
8.4 RPOs concern data only, and the processes that meet them are generally automated. There is usually a direct cost relationship.
8.5 The RTO for a service or data is concerned with how long a service or data can be unavailable before causing irreparable damage to the organisation. If the RTO is 24 hours, then the service must be restored within that time in order to be within requirement.
8.6 Since the RTO is generally associated with a whole operational capability rather than just the data, a more demanding RTO carries a higher cost. The process is nearly always manual and, because restore times can vary depending on the time of day and other business loads, it is important to ensure that an RTO is achievable. If an RTO is 2 hours and it takes 4 hours to restore a service at peak times, then the RTO cannot be relied upon.
8.7 The RTO and RPO parameters for each service are documented below.
8.7.1 Resiliency/Availability
8.7.2 Recovery/DR
8.8 Overriding security access & permissions are set according to job role within XX and practices are documented in the Security policy.
8.9 During a disaster recovery incident, the authority to action recovery of systems and data lies with the Head of IT/DR Team Leader. In certain severe cases where the impact to XX is high, it may be necessary to elevate authority to the Senior Leadership Team following assessment and evaluation of the costs and timescales involved in the recovery.
Prioritisation, Action & Communication
8.10 The CTO / CIO / Equivalent Person is responsible for setting the prioritisation and co-ordination of recovery tasks and has overriding control on the actions that are carried out on behalf of XX by any third parties.
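The RPO rule in 8.3 and the RTO rule in 8.6 can be expressed as two simple checks. The following is a minimal sketch only: the record type, its field names and the example figures are hypothetical, and the RPO check assumes that the worst-case data loss is one backup interval plus the time the backup itself takes.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ServiceObjectives:
    """Hypothetical record for one service; field names are illustrative."""
    name: str
    rpo: timedelta                # maximum tolerable data loss
    rto: timedelta                # maximum tolerable downtime
    backup_interval: timedelta    # time between backup starts
    backup_duration: timedelta    # time one backup takes to complete
    peak_restore_time: timedelta  # slowest restore time measured in testing


def rpo_is_met(s: ServiceObjectives) -> bool:
    # Per 8.3, the backup cycle, including the time taken for the backup,
    # must fit within the RPO (assumption: loss = interval + duration).
    return s.backup_interval + s.backup_duration <= s.rpo


def rto_is_achievable(s: ServiceObjectives) -> bool:
    # Per 8.6, the RTO is only achievable if the slowest (peak-time)
    # restore still completes within it.
    return s.peak_restore_time <= s.rto


# Illustrative figures only; they mirror the 24 hour and 2 hour / 4 hour
# examples used in 8.3 and 8.6.
example = ServiceObjectives(
    name="example-service",
    rpo=timedelta(hours=24),
    rto=timedelta(hours=2),
    backup_interval=timedelta(hours=22),
    backup_duration=timedelta(hours=1),
    peak_restore_time=timedelta(hours=4),
)

print(rpo_is_met(example))         # True: 22 h + 1 h fits within 24 h
print(rto_is_achievable(example))  # False: a 4 h restore cannot meet a 2 h RTO
```

Running such checks against measured backup and restore times during plan testing is one way to confirm that the documented objectives remain realistic.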
9-Validation of Recovery
10-Service Level Agreements / Insurance
SLAs are to be sought from all primary providers and are either referenced within the contact details section of this document or held with the contract to supply. Insurance policies held by both XX and service providers should be checked prior to the financial commitment of any large-scale recovery operation to ensure any requirements are not overlooked which would invalidate such insurance policies.
11-Disaster Recovery Teams & Responsibilities
In the event of a disaster, a Disaster Management Team (DMT) will be required to assist the IT department in its effort to restore normal functionality to the employees of XX.
12-Disaster Management Team
The Disaster Management Team will oversee the entire disaster recovery process. They will be the first team that will need to take action in the event of a disaster. This team will evaluate the disaster and will determine what steps need to be taken to get the organization back to business as usual.
This will be the team responsible for all communication during a disaster. Specifically, they will communicate with XX’s employees, clients, vendors and suppliers, banks, and even the media if required.
The Disaster Management Team will be led by the CTO / DR Lead – James Wilson.
13- Role & Responsibilities
14- Contact Information
14.1 Disaster Management Team in XX.
Name | Role/Title | Work Phone Number | Mobile Phone Number |
14.2 Senior Management Team
The Senior Management Team will make any business decisions that are out of scope for the Disaster Recovery Lead. Decisions such as constructing a new data center, relocating the primary site etc. should be made by the Senior Management Team. The Disaster Recovery Lead will ultimately report to this team.
Role & Responsibilities
Contact Information
Add or delete rows to reflect the size of the Management Team in your organization.
Name | Role/Title | Work Phone Number | Home Phone Number | Mobile Phone Number |
15-Disaster Recovery Call Tree
In a disaster recovery or business continuity emergency, time is of the essence so XX will make use of a Call Tree to ensure that appropriate individuals are contacted in a timely manner.
Add as many levels as you need for your organization.
Contact | Office | Mobile | Home |
DR Lead | 111-222-3333 | | |
DR Management Team Lead | | | |
DR Management Team 1 | | | |
DR Management Team 2 | | | |
DR Management Team 3 | | | |
Management Team Lead | | | |
Management Team 1 | | | |
A Disaster Recovery Call Tree Process Flow diagram will help clarify the call process in the event of an emergency.
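The call process can also be sketched in code as a level-by-level walk of the tree. This is an illustration only: the role names are taken from the call tree table, but the hierarchy shown (who calls whom) is an assumption and should be replaced with the organisation's actual tree.

```python
from collections import deque

# Hypothetical call tree: each contact calls the contacts listed under them.
# Role names follow the call tree table; the hierarchy itself is an assumption.
CALL_TREE = {
    "DR Lead": ["DR Management Team Lead", "Management Team Lead"],
    "DR Management Team Lead": [
        "DR Management Team 1",
        "DR Management Team 2",
        "DR Management Team 3",
    ],
    "Management Team Lead": ["Management Team 1"],
}


def call_order(tree, root):
    """Return (caller, callee) pairs level by level, starting at root."""
    calls = []
    queue = deque([root])
    seen = {root}
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee not in seen:  # guard against a contact listed twice
                seen.add(callee)
                calls.append((caller, callee))
                queue.append(callee)
    return calls


for caller, callee in call_order(CALL_TREE, "DR Lead"):
    print(f"{caller} calls {callee}")
```

A breadth-first order is used so that team leads are reached before their team members, which lets each lead start their own calls as early as possible.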
16-Recovery Facilities
XX has adopted a blended approach to working. As such, all employees have the necessary equipment, policies and processes to work from home. We do recognise, however, that a short-term office solution may be required. To ensure that XX is able to withstand a significant outage caused by a disaster, it would therefore provision separate dedicated standby facilities on a short-term basis, utilising the “we Space” model as it did in Stockport.
16.1 Description of Recovery Facilities (optional)
The Disaster Command and Control Center or Standby facility will be used after the Disaster Recovery Lead has declared that a disaster has occurred, and only if specifically deemed necessary and appropriate, for a defined period of time and predominantly to house the outward call teams. Any such location would be separate from the primary facility and is to be defined as and when required. All IT function employees currently work from home.
The standby facility will be used by the Disaster Recovery teams; it will function as a central location where all decisions during the disaster will be made. It will also function as a communications hub for XX.
The standby facility must always have the following resources available:
17-Data and Backups
17.1 Mandatory Data in Order of Criticality
The list below itemizes all of the data in our organisation in order of criticality.
Rank | Data | Data Type | Back-up Frequency | Backup Location(s) |
1 | <<Data Name or Group>> | <<Confidential, Public, Personally identifying information>> | <<Frequency that data is backed up>> | <<Where data is backed up to>> |
2 | ||||
3 | ||||
4 | ||||
5 | ||||
6 | ||||
7 | ||||
8 | ||||
9 | ||||
10 |
18-Azure Specific
XX uses Azure PaaS disaster recovery to secure our databases against catastrophic loss in the event of a major disaster. It provides a platform where our development teams can manage the applications without additional complexity or maintaining the infrastructure of an app.
XX utilises the Azure disaster recovery plan template to secure our PaaS database.
18.1 Azure Disaster Recovery Scenarios
There are several possible factors related to PaaS disaster that XX has considered applicable to its Business Operations. Our IT teams are aware of these so that, in the event of a disaster, data recovery can be done efficiently. Region-wide service interruptions are not the only cause of application-wide failures. Poor design and administrative errors can also be the cause of an outage.
By using the Azure disaster recovery plan template we have assured that all data can be recovered.
We have defined the minimum level of functionality required during the disaster and will implement the DR plan to minimise the risk. Azure PaaS disaster recovery is focused on recovering from a catastrophic loss of application functionality. For that, we have been required to plan to run the application and access the data in another region.
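The idea of running the application in another region can be illustrated with a generic failover selection. This is a sketch under stated assumptions and does not use any Azure SDK: the region names, the `status` table and the `choose_region` helper are all hypothetical, and a real health probe would query the deployed service.

```python
from typing import Callable, Optional, Sequence


def choose_region(regions: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy region in priority order, or None."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Illustrative health status only; region names are placeholders.
status = {"primary-region": False, "secondary-region": True}
active = choose_region(
    ["primary-region", "secondary-region"],
    lambda r: status.get(r, False),
)
print(active)  # secondary-region
```

Returning `None` when no region is healthy makes the total-outage case explicit, so the caller can escalate rather than silently continue.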
18.2 Utilising Azure therefore allows XX to adopt:
18.3 How Azure Recovery Services Work
At the moment of disaster we can access the data from our portal. Azure works remotely and continually monitors the servers for data centre failure. A security key establishes the connection between the cloud environment and on-premises systems.
19-Communicating During a Disaster
In the event of a disaster XX will need to communicate with various parties to inform them of the effects on the business, surrounding areas and timelines. The Communications Team will be responsible for contacting all of XX’s stakeholders.
19.1 Communicating with Employees
The Communications Team’s second priority will be to ensure that the entire company has been notified of the disaster. The best and/or most practical means of contacting all of the employees will be used with preference on the following methods (in order):
The employees will need to be informed of the following:
Employee Contacts
Add or delete rows to reflect the employees in your organization.
Name | Role/Title | Home Phone Number | Mobile Phone Number | Personal E-mail Address |
19.2 Communicating with Clients
After all of the organization’s employees have been informed of the disaster, the Communications Team will be responsible for informing clients of the disaster and the impact that it will have on the following:
19.3 Communicating with Vendors
After all of the organisation’s employees have been informed of the disaster, the Communications Team will be responsible for informing vendors of the disaster and the impact that it will have on the following:
Crucial vendors will be made aware of the disaster situation first. Crucial vendors will be E-mailed first then called after to ensure that the message has been delivered. All other vendors will be contacted only after all crucial vendors have been contacted.
Vendors encompass those organisations that provide everyday services to the enterprise, but also the hardware and software companies that supply the IT department. The Communications Team will act as a go-between between the DR Team leads and vendor contacts should additional IT infrastructure be required.
Company Name | Point of Contact | Phone Number | |
IT Provider | |||
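The vendor contact order described in 19.3 can be sketched as a simple plan builder. The `Vendor` record and its fields are hypothetical. Two assumptions are made that the plan does not state: all crucial vendors are e-mailed before the follow-up calls begin, and non-crucial vendors are contacted by e-mail.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    """Hypothetical vendor record; names and fields are illustrative."""
    name: str
    crucial: bool


def notification_plan(vendors):
    """Return (vendor name, channel) steps in the order set out in 19.3."""
    crucial = [v for v in vendors if v.crucial]
    others = [v for v in vendors if not v.crucial]
    plan = []
    # Crucial vendors: e-mail first, then a follow-up call to confirm delivery.
    for v in crucial:
        plan.append((v.name, "email"))
    for v in crucial:
        plan.append((v.name, "call"))
    # All other vendors are contacted only after every crucial vendor.
    # Assumption: the channel for these vendors is e-mail.
    for v in others:
        plan.append((v.name, "email"))
    return plan


vendors = [Vendor("Stationery Supplier", False), Vendor("IT Provider", True)]
for name, channel in notification_plan(vendors):
    print(f"{channel}: {name}")
```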
Relevant DR information will also be communicated throughout XX by the DR Team Leader and, in severe instances, passed to the Senior Leadership Team for appropriate communication to all employees and contractors. It is essential that accurate information regarding expectations and realistic recovery timescales is conveyed through proper channels.
In some events, such as that of personal data compromise or loss, the DPO must be informed in order to discharge the legal obligations of informing relevant authorities such as the Information Commissioner’s Office.
Press releases or external communication are strictly under the control of the Senior Leadership Team. Dissemination of information by other paths will be viewed as inappropriate and may well instigate disciplinary action.
20- Activating the Disaster Recovery Plan
If a disaster occurs in XX, the first priority is to ensure that all employees are safe and accounted for. After this, steps must be taken to mitigate any further damage to the facility and to reduce the impact of the disaster to the organisation.
Regardless of the category that the disaster falls into, dealing with a disaster can be broken down into the following steps:
20.1 DRP Activation
Once the Disaster Recovery Lead has formally declared that a disaster has occurred, s/he will initiate the activation of the DRP by triggering the Disaster Recovery Call Tree. The following information will be provided in the calls that the Disaster Recovery Lead makes and should be passed during subsequent calls:
If the Disaster Recovery Lead is unavailable to trigger the Disaster Recovery Call Tree, that responsibility shall fall to the Disaster Management Team Lead.
20.2 Assessment of Current and Prevention of Further Damage
Before any employees from XX can enter the primary facility after a disaster, appropriate authorities must first ensure that the premises are safe to enter.
The first team that will be allowed to examine the primary facilities once it has been deemed safe to do so will be the Facilities Team. Once the Facilities Team has completed an examination of the building and submitted its report to the Disaster Recovery Lead, the Disaster Management, Networks, Servers, and Operations Teams will be allowed to examine the building. All teams will be required to create an initial report on the damage and provide this to the Disaster Recovery Lead within 72 hours of the initial disaster.
During each team’s review of their relevant areas, they must assess any areas where further damage can be prevented and take the necessary means to protect XX’s assets. Any necessary repairs or preventative measures must be taken to protect the facilities; these costs must first be approved by the Disaster Recovery Team Lead.
21-Reporting & Incident Analysis
Full documentation of the incident must be completed in an appropriate timescale. The following must be included in the report:
22-Standby Facility Activation
The Standby Facility will be formally activated when the Disaster Recovery Lead determines that the nature of the disaster is such that the primary facility is no longer sufficiently functional or operational to sustain normal business operations.
Once this determination has been made, the Facilities Team will be commissioned to bring the Standby Facility to functional status after which the Disaster Recovery Lead will convene a meeting of the various Disaster Recovery Team Leads at the Standby Facility to assess next steps. These next steps will include:
During Standby Facility Operations, Networks, Servers, Applications, and Operations teams will need to ensure that their responsibilities, as described in the “Disaster Recovery Teams and Responsibilities” section of this document, are carried out quickly and efficiently so as not to negatively impact the other teams.
23-Restoring IT Functionality
Should a disaster actually occur and XX need to exercise this plan, this section will be referred to frequently as it will contain all of the information that describes the manner in which XX’s information system will be recovered.
This section will contain all of the information needed for the organisation to get back to its regular functionality after a disaster has occurred. It is important to include all Standard Operating Procedures documents, run-books, network diagrams, software format information etc. in this section.
24-Current System Architecture
In this section, include a detailed system architecture diagram. Ensure that all of the organization’s systems and their locations are clearly indicated.
<<System Architecture Diagram>>
Rank | IT System | System Components (In order of importance) |
1 | ||
2 | ||
3 | ||
4 | ||
5 | ||
6 | ||
7 | ||
8 | ||
9 |
24.2 Criticality Priority 1 System
This section ranks each system’s components in order of criticality, supplying the information that each system will require to bring it back online. First, vendor and model information, serial numbers and other component-specific information will be gathered. Each component’s runbooks or Standard Operating Procedure (SOP) documents are attached as appendices at the end of the document.
Each component must have a runbook or SOP document associated with it as below:
EXAMPLE:
System Name | <<State the name of the IT System here>> |
Component Name | <<State the name of the specific IT Component here>> |
Vendor Name | <<State the name of the IT Component’s vendor here>> |
Model Number | <<State the IT Component’s model number here>> |
Serial Number | <<State the IT Component’s serial number here>> |
Recovery Time Objective | <<State the IT Component’s Recovery Time Objective here>> |
Recovery Point Objective | <<State the IT Component’s Recovery Point Objective here>> |
Title: Standard Operating Procedures for <<Component Name>> |
Document No.: <<Number of the SOP document>> |
a) Purpose
This SOP outlines the steps required to restore operations of XX
b) Scope
This SOP applies to the following components of XX
c) Responsibilities
The following individuals are responsible for this SOP and for all aspects of the system to which this SOP pertains:
For details of the actual tasks associated with these responsibilities, refer to section h) of this SOP.
d) Definitions
This section defines acronyms and words not in common use:
e) Changes Since Last Revision
f) Documents/Resources Needed for this SOP
The following documents are required for this SOP:
g) Related Documents
The following documents are related to this SOP and may be useful in the event of an emergency. The documents below are hyperlinked to their original locations and copies are also attached in the appendix of this document:
h) Procedure
The following are the steps associated with bringing <<Component Name>> back online in the event of a disaster or system failure.
Security Level: << Public, Restricted, or Departmental (the specific department is named).>> | Effective Date: <<The date from which the SOP is to be implemented and followed>> | |
SOP Author/Owner: | SOP Approver: | Review Date: <<The date on which the SOP must be submitted for review and revision>> |
Step | Action | Responsibility |
1 | <<Step 1 Action>> | <<Person/group responsible>> |
2 | ||
3 | ||
4 | ||
5 | ||
6 | ||
7 |
24.3 Criticality Priority 2 System
Repeat as above for as many systems as the enterprise makes use of.
25-Plan Testing & Maintenance
While efforts will be made initially to construct this DRP in as complete and accurate a manner as possible, it is essentially impossible to address all possible problems at any one time. Additionally, over time the Disaster Recovery needs of the enterprise will change. As a result of these two factors, this plan will need to be tested on a periodic basis to discover errors and omissions and will need to be maintained to address them.
26-Maintenance
The DRP will be updated annually or any time a major system update or upgrade is performed, whichever is more often. The Disaster Recovery Lead will be responsible for updating the entire document, and so is permitted to request information and updates from other employees and departments within the organization in order to complete this task.
Maintenance of the plan will include (but is not limited to) the following:
During the Maintenance periods, any changes to the Disaster Recovery Teams must be accounted for. If any member of a Disaster Recovery Team no longer works with the company, it is the responsibility of the Disaster Recovery Lead to appoint a new team member.
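The update rule for the DRP (annually, or after any major system update or upgrade, whichever is more often) can be sketched as a single check. The function name and parameters are hypothetical, and 365 days is used as a simple approximation of "annually".

```python
from datetime import date
from typing import Iterable


def update_is_due(last_updated: date, today: date,
                  major_change_dates: Iterable[date]) -> bool:
    """True if the DRP needs updating under the maintenance rule."""
    # A major system update or upgrade since the last revision
    # triggers an update straight away.
    if any(d > last_updated for d in major_change_dates):
        return True
    # Otherwise the plan is due once a year has passed
    # (assumption: 365 days approximates "annually").
    return (today - last_updated).days >= 365


# Illustrative dates only.
print(update_is_due(date(2023, 9, 14), date(2024, 1, 1), []))
print(update_is_due(date(2023, 9, 14), date(2024, 1, 1), [date(2023, 12, 1)]))
```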
27-Testing
In order to be effective, the disaster recovery plan should be tested regularly. It is only during testing that parameters such as the RTO and RPO can be confirmed as appropriate and that the data backups can be shown to be wholly capable of providing the required level of recovery. It is not sufficient to assume that, just because data is being backed up, it will be recoverable. XX is committed to ensuring that this DRP is functional. The DRP should be tested annually in order to ensure that it is still effective. Testing the plan will be carried out as follows:
XX will employ the following to test the DR:
Any gaps in the DRP that are discovered during the testing phase will be addressed by the Disaster Recovery Lead, who will also identify any resources required to address them.
There may be certain elements that are difficult to fully test in this situation, but these will also only be identified during testing.
28-Call Tree Testing
The Call Tree is a major part of the DRP and XX requires that it is tested annually in order to ensure that it is functional. Tests will be performed as follows:
29-Policy and Procedure review
All policies including the DR Plan will be reviewed when there are changes in employment law that are relevant, where there is a change in the business need or when feedback from HR, line managers or Trade Unions suggest that the policy is either out of date or unfit for purpose.
30-Ownership and Revision
This Plan is owned by the Board of Directors of the Business, which has delegated this task to the Chief Information Security Officer or other designated person. This policy shall be revised once every two years by the CISO or other designated person, and whenever the Board of Directors of the Business decides to do so.
Version Control
Title | Disaster Recovery Plan | |||
Description | Policy and Process | |||
Created By | Xapads Media Pvt. Ltd, 5th Floor, Windsor IT Park, Tower B, Plot No, A1. | |||
Date Created | 14/09/2023 | |||
Maintained By | Xapads Media Pvt. Ltd, | |||
Version Number | Modified By | Modifications Made | Date Modified | Status |