An initial training document on implementing a recovery environment with VMware, vSphere, vConnect, and RPA, written for application DR Teams going through Application Recovery Certification, with links to additional materials.
DR Planning Project
Training Document
Prepared by:
Thomas Bronack, CBCP
(917) 673-6992
Thomas Bronack © Initial Training Class Phone: (917) 673-6992 / Email: [email protected]
What do we want to achieve?
• Fully converted Information Technology Environment.
• Savings through equipment, locations, and vendor contracts.
• Savings through better controls and efficiency.
• Continuity of Business achieved through Enterprise Resiliency.
• World-Wide Compliance achieved through Corporate Certification.
• Additional savings through integration with everyday functions.
• Improved Reputation and Higher Employee Morale.
• Better retention of staff and clients.
• More able to recruit new personnel and close client business.
• Costs go down and efficiency goes up.
• Improved Savings and Profitability.
Outsourcing Project Time Line of Events
The time line runs from Phase I through Phase VI and covers these stages: Bid, Inventory, Build Regionals, Transition, Disaster Recovery, Build Recovery, and Compliance.
Bid:
• RFP;
• Bid;
• SOW;
• Scope;
• Goals; and
• Timeframe.
Inventory:
• What they Have;
• Infrastructure;
• Equipment;
• Software;
• Applications;
• Locations;
• Computer Sites;
• Recovery Sites;
• Applications with Recovery Plans; and
• Applications that need Recovery Plans.
Build Regionals:
• Prod 1;
• Prod 2;
• Prod 3.
Transition:
• Move Applications to Regional Data Center;
• Test Successful Operation; and
• Use Virtualization.
Disaster Recovery:
• “Proof of Concept”;
• Infrastructure Readiness;
• Disaster Recovery;
• Application Recovery;
• Business Recovery;
• Workplace Safety and Violence Prevention;
• Emergency Management;
• Crisis Management;
• Protection, Salvage, and Restoration;
• Supply Chain Management;
• Insurance;
• Community Relations;
• Communications; and
• Use of Social Media.
Build Recovery:
• Recovery Site.
Compliance:
• Laws and Regulations;
• Requirements to Comply with;
• Present Compliance;
• Gaps & Exceptions;
• Obstacles;
• Domestic;
• International; and
• Cross Border Requirements.
Three Regional Data Centers and One Global Recovery Site
[Diagram: Prod 1 (Americas), Prod 2 (Europe), and Prod 3 (Asia Pacific) each serve their User Sites and connect through the Cloud or WAN to the Global Recovery Site. User Locations are connected to the Regional Data Centers and the Global Recovery Site.]
Phase V – Perform Application Recovery Certification
Phase V – Application Recovery Certification is performed, initially for selected applications, to validate that the Regional Sites can recover to the Global Recovery Site. The Recovery Site is built so that existing recovery sites and vendor contracts can be eliminated.
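Failover / Failback DR Process
[Diagram: User 1 through User n reach the Production Site (Old IP Address) over the Production Path and, after Failover, the Recovery Site (New IP Address) over the Disaster Recovery Path. Each path runs through the Cloud or WAN. Failover switches Users to the Recovery Site; Failback returns them to the Production Site.]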
1. Users stay at their site, while Production is switched to the Recovery Site.
2. Users move to a secondary site because the User site is lost, connect to the Regional Site, and Test Recovery.
3. Users move to the Recovery Site and Production is switched to Recovery.
• Declare Disaster;
• Failover to Recovery Site;
• Continue User Processing within RTO;
• Supplies are routed to Recovery Site;
• Original Site is Safeguarded, Salvaged, and Restored;
• Failback to Original Site;
• Use Existing Recovery Plan to Certify Application Recovery.
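The failover and failback steps above can be sketched as a small state model. This is only an illustration of the sequence, not the procedure of any specific product: the `ApplicationRoute` class, its method names, and the two addresses are hypothetical (the slides say only that users move from an old IP address to a new one).

```python
from enum import Enum


class Site(Enum):
    PRODUCTION = "Production Site"
    RECOVERY = "Recovery Site"


# Hypothetical documentation-range addresses, for illustration only.
SITE_ADDRESS = {
    Site.PRODUCTION: "192.0.2.10",
    Site.RECOVERY: "198.51.100.10",
}


class ApplicationRoute:
    """Tracks which site the users of one application are routed to."""

    def __init__(self):
        self.active_site = Site.PRODUCTION
        self.history = []

    @property
    def address(self):
        return SITE_ADDRESS[self.active_site]

    def failover(self, disaster_declared):
        # Failover only follows a declared disaster.
        if not disaster_declared:
            raise RuntimeError("Declare the disaster before failing over")
        if self.active_site is Site.RECOVERY:
            raise RuntimeError("Already running at the Recovery Site")
        self.active_site = Site.RECOVERY
        self.history.append("failover")

    def failback(self, original_site_restored):
        # Failback waits until the original site is safeguarded,
        # salvaged, and restored.
        if self.active_site is not Site.RECOVERY:
            raise RuntimeError("Not running at the Recovery Site")
        if not original_site_restored:
            raise RuntimeError("Original site is not restored yet")
        self.active_site = Site.PRODUCTION
        self.history.append("failback")


route = ApplicationRoute()
route.failover(disaster_declared=True)
print(route.active_site.value, route.address)
route.failback(original_site_restored=True)
print(route.active_site.value, route.address)
```

Keeping the two guards explicit mirrors the order of the list: a declaration precedes failover, and restoration of the original site precedes failback.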
Virtual Machines (VMs) are maintained by the VMware vSphere system, with a vCenter Server used for Site Recovery Management. Virtualization can be considered a Resource Manager that separates Real Equipment (Storage, Computer, Network, etc.) into Logical Equipment Sections. Each VM can represent a Real Server, and many VMs can reside on one Real Server, which frees up real servers presently in use for disposal and reduces cost. VMs save space and power and reduce environmental concerns, all of which affect the bottom line and reputation of the company. It also takes fewer people to manage a Virtual Environment than the number of people now required to manage a real environment. Servers are Rack Mounted as Blades to save floor space and infrastructure. When a disaster event occurs, Switches can re-route servers through the RecoverPoint Appliance (RPA) to the Recovery Site.
Logical DR Environment
[Diagram: DR Logical Architecture from Prod to Recovery, with Array Replication over the WAN to the Recovery Site.]
This diagram describes what the DR Environment will look like when completed. Remote sites are transformed and virtualized via the Avamar Virtual Environment, which will allow for the removal of remote equipment and support personnel. Windows, UNIX, and ESX Operating Systems will be housed in an EMC VNX Unified Storage facility. Network Backup Servers will protect communications, and Data Domains will protect Remote Users. A Tape Library is provided for long-term storage and electronic transfer to the Iron Mountain Tape Vault via encrypted communications. The Storage Area Network (SAN) and EMC Unified Storage Facility are connected to the Wide Area Network (WAN) through EMC RecoverPoint Appliances (RPAs) that can automatically switch a failing location to the Recovery Site to continue processing.
DR Environment Target
[Diagram: DR Environment Target, showing the Recovery Site.]
Lifecycle of a Disaster Event (Why we create Recovery Plans)
Primary Site stages:
• Disaster Event: Event; Analyze; Declare; Failover.
• Safeguard: Evacuate; Protect Site; First Responders.
• Salvage: Clean Facility; Repair; Resupply.
• Restoration: Restart; Test; Success; Failback.
• Resume: Reload Data; Restart; Continue.
Secondary Site:
• Failover Start Up: Failover Production Recovery Processing.
• Failback Shut Down: Failback from Secondary Site after Restoration.
• Repair Primary Site to Resume Production via Failback.
Availability levels:
• High Availability (HA) is an RTO / SLA based Switch.
• Continuous Availability (CA) is an immediate Switch.
“The goal of Enterprise Resiliency is to achieve ZERO DOWNTIME by implementing Application Recovery Certification for HA and Gold Standard Recovery Certification for CA Applications”
[Diagram labels: Point of Failure; RPO (Last Snapshot); RTO; Data Sync; Flip / Flop Switch Over between the CA and HA Production sites.]
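The RPO and RTO markers in this lifecycle can be made concrete with a little date arithmetic. The sketch below is an assumption-laden illustration: the `RecoveryEvent` class, the `meets_objectives` helper, the timestamps, and the target values are all hypothetical and are not taken from the training material.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryEvent:
    last_snapshot: datetime     # RPO reference point (Last Snapshot)
    point_of_failure: datetime  # when the disaster event occurred
    service_resumed: datetime   # when processing continued after failover

    @property
    def data_loss_window(self) -> timedelta:
        # Work done between the last snapshot and the failure is at risk.
        return self.point_of_failure - self.last_snapshot

    @property
    def downtime(self) -> timedelta:
        # Elapsed time from the failure until users can process again.
        return self.service_resumed - self.point_of_failure


def meets_objectives(event, rpo_target, rto_target):
    """True when both the RPO and RTO targets were met."""
    return (event.data_loss_window <= rpo_target
            and event.downtime <= rto_target)


# Hypothetical timestamps and targets.
event = RecoveryEvent(
    last_snapshot=datetime(2024, 1, 1, 8, 45),
    point_of_failure=datetime(2024, 1, 1, 9, 0),
    service_resumed=datetime(2024, 1, 1, 11, 30),
)
print(event.data_loss_window)  # 0:15:00
print(event.downtime)          # 2:30:00
print(meets_objectives(event, timedelta(minutes=30), timedelta(hours=4)))  # True
```

Under this reading, an HA application passes when its downtime stays within the RTO set by its SLA, while a CA application would be held to a target close to zero because the switch is immediate.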
Disaster Recovery Testing Process
The DR Testing process is illustrated here and includes:
1. Select Application for DR Testing;
2. Define DR Testing Goals and Objectives;
3. Define Production Site where application resides;
4. Complete Pre-Staging form to provide the DR team with the information needed to make the recovery site ready to perform DR Testing;
5. Complete DR Exercise Booklet for Application;
6. Conduct the Actual DR Exercise;
7. DR Coordinator receives Work Sheets and prepares a Report and Presentation of findings for the Post Mortem;
8. Implement recommendations for improvement.
Recovery Testing process
A. Develop Recovery Objectives and Testing Schedule:
1. Create a Recovery Site for DR Site Testing;
2. Validate Production Site to Recovery Site Connectivity;
3. Disaster Recovery Plans for interruptions to Information Technology;
4. Application Recovery Certification (CA, HA, Best Effort, Deferred);
5. Business Recovery for loss of a location;
6. Emergency Management for Incidents and Natural Disasters; etc.
B. DR Testing is conducted in Five Steps, which are:
1. DR Planning Meeting – to orient the Application DR Team;
2. Infrastructure Readiness – to prepare the Recovery Site and Obtain Data;
3. DR Pre-Test – to prepare the Recovery Site for the Application DR Test:
   a. Recovery Site establishes the recovery environment for a disaster event or test.
   b. Develop procedures for providing the Recovery Site with the information they need.
4. Actual DR Recovery / Test – to DR Test the Application:
   a. Follow the “Script of Actions” contained in the Recovery Plan.
   b. Record event times, comments, and encountered problems.
5. Post Mortem Meeting – Review of DR Test Results:
   a. To discuss recovery events and recommend improvements.
Where do we go from here
A. Develop Application Actual DR Testing process:
1. Application Actual DR Test Activities Sheet is completed with Estimated Times.
2. Production Servers are brought down in Production.
3. Recovery Servers are brought up in Recovery.
4. Application is connected to Recovery Facility.
5. Data is Synchronized to point just before failure.
6. Application resumes normal processing like in Production Mode.
7. Application connectivity and functionality is verified.
8. Recovery Servers are brought down.
9. Production Servers are brought up.
10. Application resumes processing at Production Site and is verified.
11. If Successful, Application receives Application Recovery Certification; otherwise Application DR problems are repaired and the Application goes through DR Testing again until Application Recovery Certification is achieved.
B. Develop Application Work Sheet:
1. Same as Activities Sheets, but is used to record Actual Times, Durations, Encountered Problems, and Comments.
C. Post Mortem Meeting is conducted to review results, go over “Lessons Learned” and make “Recommendations for Improvement”.
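The relationship between the Activities Sheet (Estimated Times) and the Work Sheet (Actual Times and Encountered Problems) can be illustrated with a short summary routine for the Post Mortem. This is a minimal sketch: the `WorkSheetRow` fields, the `review` function, the sample times, and the rule that any recorded problem blocks certification are assumptions for illustration, since the slides say only that a successful test earns certification.

```python
from dataclasses import dataclass, field


@dataclass
class WorkSheetRow:
    step: str
    estimated_minutes: int  # from the Activities Sheet
    actual_minutes: int     # recorded during the Actual DR Test
    problems: list = field(default_factory=list)


def review(rows):
    """Summarize a Work Sheet for the Post Mortem Meeting."""
    problems = [p for row in rows for p in row.problems]
    return {
        "estimated_total": sum(r.estimated_minutes for r in rows),
        "actual_total": sum(r.actual_minutes for r in rows),
        # Steps that ran longer than estimated.
        "overruns": [r.step for r in rows
                     if r.actual_minutes > r.estimated_minutes],
        "problems": problems,
        # Assumption: a test with no encountered problems is "Successful".
        "certified": not problems,
    }


# Hypothetical sample rows.
rows = [
    WorkSheetRow("Production Servers brought down", 15, 12),
    WorkSheetRow("Recovery Servers brought up", 30, 45,
                 ["Storage mount delayed"]),
    WorkSheetRow("Data synchronized to point before failure", 20, 20),
]
summary = review(rows)
print(summary["estimated_total"], summary["actual_total"])  # 65 77
print(summary["overruns"])   # ['Recovery Servers brought up']
print(summary["certified"])  # False
```

An application that is not certified would have its problems repaired and go through DR Testing again, as step 11 describes.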
Disaster Recovery Dashboard and Documents
1. DR Planning Guide
2. DR Management Dashboard
3. DR Exercise Booklet Template
4. Planning Meeting Agenda
What should be accomplished during the Planning Meetings
1. Infrastructure Readiness Information
2. Contact List
3. EMC Disaster Recovery and Business Continuity Solutions.
4. VMware vSphere, vCenter Prep
5. VMware Usage and Recovery