DESCRIPTION
In today's sophisticated IT Cloud world, how do I fuse multiple technologies, products, and clouds together to create a 2012 integrated High Availability, Disaster Recovery, Business Continuity IT solution? This session complements product-specific and overview HA/DR/BC sessions by providing a proven, product-agnostic methodology for architecting such a solution, including petabyte-level considerations. We provide a pragmatic, industry-proven, step-by-step methodology and toolset you can use directly with clients to a) crisply elicit and distill HA/DR/BC requirements, b) efficiently organize and map those requirements, c) design an integrated, multi-product, phased-approach IT HA/DR/BC solution that properly combines backup/restore software, tape, tape libraries, dedup, point-in-time and continuous disk replication, and storage virtualization products, and d) provide a template to clearly communicate the solution and gain consensus across multiple levels of operations and management. John Sing is the author of 3 IBM Redbooks, including SG24-6547-03 IBM System Storage Planning for Business Continuity. My only request when referencing this material in your work is that you give full credit to me, John Sing, and IBM, as the authors of this material, research, and methodology. That having been said, please spread the good word.
1
Architect’s Guide to Designing Integrated Multi-Product HA-DR-BC Solutions
John Sing, Executive Strategy, IBM
Session E10
2
John Sing • 31 years of experience with IBM in high end servers, storage, and software
– 2009 - Present: IBM Executive Strategy Consultant: IT Strategy and Planning, Enterprise Large Scale Storage, Internet Scale Workloads and Data Center Design, Big Data Analytics, HA/DR/BC
– 2002-2008: IBM IT Data Center Strategy, Large Scale Systems, Business Continuity, HA/DR/BC, IBM Storage
– 1998-2001: IBM Storage Subsystems Group - Enterprise Storage Server Marketing Manager, Planner for ESS Copy Services (FlashCopy, PPRC, XRC, Metro Mirror, Global Mirror)
– 1994-1998: IBM Hong Kong, IBM China Marketing Specialist for High-End Storage
– 1989-1994: IBM USA Systems Center Specialist for High-End S/390 processors
– 1982-1989: IBM USA Marketing Specialist for S/370, S/390 customers (including VSE and VSE/ESA)
• IBM colleagues may access my webpage:
– http://snjgsa.ibm.com/~singj/
• You may follow my daily IT research blog:
– http://www.delicious.com/atsf_arizona
3
Agenda
• Understand today’s challenges and best practices
– for IT High Availability and IT Business Continuity
• What has changed? What is the same?
• Strategies for:
– Requirements, design, implementation
• Step by step approach
– Essential role of automation
– Accommodating petabyte scale
– Exploiting Cloud
2012 Cloud deployment options
4
Agenda
1. Solving Today’s HA-DR-BC Challenges
2. Guiding HA-DR-BC Principles to mitigate chaos
3. Traditional Workloads vs. Internet Scale Workloads
4. Master Vision and Best Practices Methodology
5
Recovering today’s real-time massive streaming workflows is challenging
Chart in public domain: IEEE Massive File Storage presentation, author: Bill Kramer, NCSA: http://storageconference.org/2010/Presentations/MSST/1.Kramer.pdf
6
Today’s Data and Data Recovery Conundrum:
7
Many options, including many non-traditional alternatives for user deployments, workload hosting, and recovery models
Traditional alternatives:
• Other platforms
• Other vendors
• Non-traditional alternatives: – The Cloud, the Developing World
Illustrative Cloud examples only. No endorsement is implied or expressed.
Inter-Disciplinary
8
Finally, we have this ‘little’ problem regarding Mobile proliferation
• From IT standpoint, we are clearly seeing “consumerization of IT”
• Key is to recognize and exploit hyper-pace reality of BYOD’s associated data
• Not just the technology
• Also the recovery model (“cloud”), the business model, and the required ecosystem
Clayton Christensen, Harvard Business School
http://en.wikipedia.org/wiki/Disruptive_innovation
9
So how do we affordably architect HA / BC / DR in 2012?
10
What has remained the same?
Data Protection Service Management Storage Efficiency
(Continued good Guiding Principles that mitigate HA/DR/BC chaos)
11
[Diagram: Business, Application, and Infrastructure layers. Applications 1, 2, and 3 (WebSphere, MQseries, DB2/SQL, analytics, management reports, an http://xyz.xml decision point) support Business Processes A through G.]
1. An error occurs on a storage device that correspondingly corrupts a database
2. The error impacts the ability of two or more applications to share critical data
3. The loss of both applications affects two distinctly different business processes
IT Business Continuity must recover at the business process level
The Business Process is still the Recoverable Unit
12
[Diagram: the same Business / Application / Infrastructure stack as the previous slide, with the applications now hosted in the cloud.]
1. Data input to the cloud
2. Cloud provider outage
3. The loss of Cloud output affects two distinctly different business processes
Cloud is simply another deployment option
But doesn’t change HA/BC fundamental approach
Cloud does not change business process; still the recovery unit
13
When can Cloud recovery provide extremely fast time to project completion?
• Where entire business process recoverable units can be out-sourced to Cloud provider
– Production example: out-sourcing production, backup/restore, or an integrated, standalone application to a provider
– Cloud application-as-a-service (AaaS) example: Salesforce.com, etc.
[Diagram: the Business / Application / Technical stack, with an entire business-process recoverable unit outsourced to the Cloud provider.]
14
The trick to leveraging Cloud is:
Understanding that Cloud is simply another (albeit powerful) deployment choice
Good news:
Fundamental principles for HA/DR/BC haven’t changed
It’s only the deployment options that have changed
15
Still true: synergistic overlap of valid data protection techniques

IT Data Protection combines three overlapping disciplines:
1. High Availability – a fault-tolerant, failure-resistant, streamlined infrastructure with an affordable cost foundation, providing continuous availability of applications
2. Continuous Operations – non-disruptive backups and system maintenance coupled with continuous application availability
3. Disaster Recovery – protection against unplanned outages such as disasters through reliable, predictable recovery; operations continue after a disaster

Goals: protection of critical business data; costs are predictable and manageable; recovery is predictable and reliable.
16
Four Stages of Data Center Efficiency (prerequisites for HA/BC/DR)
http://public.dhe.ibm.com/common/ssi/ecm/en/rlw03007usen/RLW03007USEN.PDF http://www-935.ibm.com/services/us/igs/smarterdatacenter.html
April 2012
17
Still true: Timeline of an IT Recovery

[Timeline diagram: Outage! → assess RPO → recover physical facilities, telecom network, and management control (operations and network staff) → execute hardware, operating system, and data integrity recovery (operations staff) → application transaction-integrity recovery (applications staff) → "Now we're done!" and back to production.]

• Recovery Point Objective (RPO): how much data must be recreated, i.e. the gap between the last consistent copy and the outage
• Recovery Time Objective (RTO) of hardware data integrity, followed by the RTO of transaction integrity
• Telecom bandwidth is still the major delimiter for any fast recovery
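The RPO / RTO arithmetic behind this timeline can be sketched in a few lines. This is a hypothetical illustration only: the function names, phase names, and durations below are assumptions invented for the example, not part of the methodology.

```python
def rpo_minutes(outage_time_min, last_copy_time_min):
    """RPO: how much data must be recreated, i.e. the elapsed time
    between the last consistent copy and the outage."""
    return outage_time_min - last_copy_time_min

def rto_minutes(phase_durations_min):
    """RTO: the recovery phases on the timeline run sequentially, so the
    end-to-end recovery time is their sum (automation shrinks each phase)."""
    return sum(phase_durations_min)

# Example: last consistent replica taken 30 minutes before the outage,
# then three sequential recovery phases (durations in minutes).
phases = {"assess RPO": 60,
          "hardware / OS / data integrity recovery": 180,
          "application transaction-integrity recovery": 90}
data_loss_window = rpo_minutes(outage_time_min=0, last_copy_time_min=-30)
total_rto = rto_minutes(phases.values())
```

With these illustrative numbers the data-loss window is 30 minutes and the end-to-end RTO is 330 minutes; shortening any single phase, for example through automation, directly shortens the RTO.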
18
Still true: value of Automation for real-time failover

[Same recovery timeline, compressed: with end-to-end automation, the RPO assessment, hardware/data recovery, and transaction-recovery phases shrink dramatically, driving both RTO and RPO toward zero.]

Value of automation:
• Reliability
• Repeatability
• Scalability
• Frequent testing
19
Still true: Organize High Availability, Business Continuity technologies by balancing recovery time objective with cost / value

[Chart: Cost / Value vs. Recovery Time Objective, guidelines only: 15 min, 1-4 hr, 4-8 hr, 8-12 hr, 12-16 hr, 24 hr, days]

• BC Tier 7 – Add server or storage replication with end-to-end automated server recovery
• BC Tier 6 – Add real-time continuous data replication, server or storage
• BC Tier 5 – Add application/database integration to backup/restore
• BC Tier 4 – Add point-in-time replication to backup/restore
• BC Tier 3 – VTL, data de-dup, remote vault
• BC Tier 2 – Tape libraries + automation
• BC Tier 1 – Restore from tape

The higher tiers recover from a disk image; the lower tiers recover from a tape copy.
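As a sketch of how this chart gets used in practice, the tier guidelines can be encoded as a simple lookup that maps a target RTO to the lowest-cost tier still able to meet it. The hour boundaries and tier descriptions below are rough readings of the chart's guideline axis, used only for illustration:

```python
# Guideline (max_rto_hours, tier, technique) rows from the chart above,
# ordered lowest-cost tier first. Hour values are illustrative assumptions.
BC_TIERS = [
    (72,   1, "restore from tape"),
    (24,   2, "tape libraries + automation"),
    (16,   3, "VTL, data de-dup, remote vault"),
    (12,   4, "point-in-time disk replication added to backup/restore"),
    (8,    5, "application/database integration with backup/restore"),
    (4,    6, "real-time continuous data replication (server or storage)"),
    (0.25, 7, "server/storage replication, end-to-end automated recovery"),
]

def bc_tier_for_rto(target_rto_hours):
    """Pick the lowest-cost BC tier whose guideline recovery time still
    fits within the target RTO."""
    for max_rto_hours, tier, technique in BC_TIERS:
        if max_rto_hours <= target_rto_hours:
            return tier, technique
    # Sub-15-minute targets need the top tier.
    return BC_TIERS[-1][1], BC_TIERS[-1][2]
```

For example, a 6-hour target RTO lands on Tier 6, while a multi-day target stays on Tier 1; each step up the list trades cost for recovery speed.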
20
Still true: Replication Technology Drives RPO

[Chart: Recovery Point (weeks down to seconds) and Recovery Time (seconds up to weeks).] For example, from longest to shortest recovery point:
• Tape backup – RPO measured in days to weeks
• Periodic replication – RPO measured in hours to days
• Asynchronous replication – RPO measured in seconds to minutes
• Synchronous replication / HA – RPO approaching zero
21
Still true: Recovery Automation Drives Recovery Time

Recovery time includes:
– Fault detection
– Recovering data
– Bringing applications back online
– Network access

For example, from slowest to fastest recovery time:
• Manual tape restore – days to weeks
• Storage automation – minutes to hours
• End-to-end automated clustering – seconds to minutes
22
Still true: “ideal world” construct for IT High Availability and Business Continuity

Business processes drive strategies and they are integral to the Continuity of Business Operations. A company cannot be resilient without having strategies for alternate workspace, staff members, call centers and communications channels.

[Program chart: Business Prioritization → Strategy Design → Program Design → Implement → Resilience Program Management]
• Business Prioritization: risk assessment and business impact analysis, yielding risks, vulnerabilities and threats; impacts of outage; RTO/RPO
• Strategy Design: program assessment of current capability, with a maturity model, ROI measurement, and a roadmap for the program
• Implement: program validation against estimated recovery time
• Resilience Program Management: awareness, regular validation, change management, quarterly management briefings
• Strategies span the crisis team, business resumption, disaster recovery, and high availability, across: 1. People 2. Processes 3. Plans 4. Strategies 5. Networks 6. Platforms 7. Facilities
• High availability design: database and software design, high availability servers, storage and data replication

Source: IBM STG, IBM Global Services
23
The 2012 Bottom Line: (IT Business Continuity Planning Steps)

For today’s real-world environment, how do we streamline this “ideal” process (the same program chart as the previous slide)?
1. Collect information for prioritization
2. Vulnerability, risk assessment, scope
3. Define BC targets based on scope
4. Solution option design and evaluation
5. Recommend solutions and products
6. Recommend strategy and roadmap

We need a faster way than even this simplified 2007 version:
• 2012 key #1: need a basic Data Strategy
• 2012 key #2: Workload type
24
Streamlined BC Actions (2005 version)

1. Collect info for prioritization
   Input: business processes, key performance indicators, IT inventory
   Output: scope, resource business impact; component effect on business processes
2. Vulnerability / risk assessment
   Input: list of vulnerabilities
   Output: defined vulnerabilities
3. Define desired HA/BC targets based on scope
   Input: existing BC capability, KPIs, targets, and success rate
   Output: defined BC baseline targets, architecture, decision and success criteria
4. Solution design and evaluation
   Input: technologies and solution options
   Output: business process segments and solutions
5. Recommend solutions and products
   Input: generic solutions that meet criteria
   Output: recommended IBM solutions and benefits
6. Recommend strategy and roadmap
   Input: budget, major project milestones, resource availability, business process priority
   Output: baseline business continuity strategy, roadmap, benefits, challenges, financial implications and justification
25
Streamlined BC Actions (2012 version)

The same six steps, inputs, and outputs as the 2005 version, accelerated by the two 2012 keys:
• Do a basic HA/DR Data Strategy
• Exploit Workload Type
26
How do we get there in 2012?
Bottom line #1: have a basic Data Strategy
Bottom line #2: Exploit Workload type
Data Protection Service Management Storage Efficiency
27
i.e. #1: It’s all about the Data. Now, what do I mean by that?
28
What is a basic Data Strategy? Specify data usage over its lifespan.

[Chart: Frequency of Access and Use vs. Time. Applications create data; information and data management governs it; information is then archived, retained, or deleted.]
29
Data strategy = collecting information, prioritizing, vulnerability/risk, scope

[The same “ideal world” program chart as slide 22, with the Business Prioritization phase (risk assessment, business impact analysis) called out as the Data Strategy.]

Source: IBM STG, IBM Global Services
30
Data Strategy: relationship to Business, IT Strategies

[Diagram: Business Strategy (business scope, distinct competencies, business governance) aligns with IT Strategy (technology scope, system competencies, IT governance), which drives the IT infrastructure and processes (organization, process, skills, tools). The Data Strategy sits among the Business Strategies, IT Strategy, Enterprise IT Architecture, and IT Infrastructure, spanning people, process, structure, data, and technology.]
31
The role of the basic “Data Strategy” for HA / BC purposes

• Define major data types “good enough”
  – i.e. by major application, by business line
  – An ongoing journey
• For each data type:
  – Usage
  – Performance and measurement
  – Security
  – Availability
  – Criticality
  – Organizational role
  – Who manages it
  – What standards apply to this data
  – What type of storage it is deployed on, what database, what virtualization
• Be pragmatic
  – Create a basic, “good enough” data strategy for HA/BC purposes
• Acquire tools that help you know your data
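A “good enough” data-type catalog built from the bullets above can be as simple as one record per major data type. This is a hypothetical sketch; the field names and sample entries are invented for illustration, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class DataTypeRecord:
    """One catalog entry per major data type (by application / business line)."""
    name: str
    business_line: str
    usage: str              # how the data is used
    criticality: str        # e.g. "mission-critical", "important", "archival"
    availability_rto_hours: float
    owner: str              # organizational role: who manages it
    storage_class: str      # what type of storage it is deployed on
    database: str
    virtualized: bool

catalog = [
    DataTypeRecord("orders", "e-commerce", "transactional", "mission-critical",
                   0.25, "DBA team", "replicated SAN", "DB2", True),
    DataTypeRecord("clickstream", "marketing", "analytics", "archival",
                   24.0, "analytics team", "scale-out NAS", "HDFS", False),
]

# The catalog then drives HA/BC decisions, e.g. which data needs the top tier:
mission_critical = [r.name for r in catalog if r.criticality == "mission-critical"]
```

Even a flat list like this is enough to start mapping data types to protection tiers, which is all the HA/BC purpose requires.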
You have to know your data, and have a basic strategy for it.
32
Here’s the major difference for 2012: there are two major types of workloads:

• HA, Business Continuity, Disaster Recovery characteristics
  – Traditional IT: HA/DR/BC can be done “agnostic / after the fact” using replication
  – Internet Scale Workloads: HA/DR/BC must be “designed into the software stack from the beginning”
• Data Strategy
  – Traditional IT: use traditional tools and concepts to understand and know your data; storage/server virtualization and pooling
  – Internet Scale Workloads: proven Open Source toolset to implement failure tolerance and redundancy in the application stack
• Automation
  – Traditional IT: end-to-end automation of server / storage virtualization
  – Internet Scale Workloads: end-to-end automation of the application software stack, providing failure tolerance
• Commonality: both apply the master vision and lessons learned from internet-scale data centers
33
Choices for high availability and replication architectures

[Diagram: a production site (site load balancer, web server clusters, application/DB server clusters, server clusters, disk) paired with other site(s) behind geographic load balancers. Replication options between sites: local backup; application or database replication; server replication; storage replication; point-in-time image and tape backup; workload balancer.]
34
Comparing IT BC architectural methods

• Application / database / file system replication / workload balancer (file system, DB, application aware)
  – Typically requires the least bandwidth
  – May be required if the scale of storage is very large (i.e. internet scale)
  – Span of consistency is that application, database, or file system only
  – Well understood by database, application, and file system administrators
  – Can be a more complex implementation; must be implemented for each application
• Replication – server (traditional IT; file system, DB, application agnostic)
  – Well understood by operating system administrators
  – Storage and application independent; uses server cycles
  – Span of recovery limited to that server platform
• Replication – storage (traditional IT; file system, DB, application agnostic)
  – Can provide common recovery across multiple application stacks and multiple server platforms
  – Usually requires more bandwidth
  – Requires a storage replication skill set
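The bandwidth points above ("typically requires the least bandwidth" vs. "usually requires more bandwidth") come down to simple arithmetic: the replication link must sustain the rate of change being shipped. A hypothetical back-of-envelope sketch, with invented change rates:

```python
def required_mbps(changed_gb_per_day, window_hours=24.0):
    """Sustained link rate (megabits/s) needed to replicate a daily change
    volume within the given window (decimal units: 1 GB = 8000 megabits)."""
    megabits = changed_gb_per_day * 8000
    return megabits / (window_hours * 3600)

# Storage (block-level) replication ships every changed block, while
# database log shipping for the same workload moves only the logical logs.
storage_link = required_mbps(500)   # e.g. 500 GB/day of changed blocks
log_ship_link = required_mbps(50)   # e.g. 50 GB/day of database logs
```

With these illustrative rates, block-level replication needs roughly ten times the sustained bandwidth of log shipping, which is why telecom bandwidth so often decides which method is affordable.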
35
Principles for Internet Scale Workloads
36
Internet Scale Workload Characteristics - 1

• Embarrassingly parallel Internet workload
  – Immense data sets, but relatively independent records being processed
    • Example: billions of web pages; billions of log / cookie / click entries
  – Web requests from different users are essentially independent of each other
    • Creating natural units of data partitioning and concurrency
    • Lends itself well to cluster-level scheduling / load balancing
  – Independence = peak server performance is not important; there is very low inter-process communication
    • What matters is the aggregate throughput of 100,000s of servers
• Workload churn
  – Well-defined, stable high-level APIs (i.e. simple URLs)
  – Software release cycles on the order of every couple of weeks
    • Means Google’s entire core of search services was rewritten in 2 years
  – Great for rapid innovation
    • Expect significant software rewrites to fix problems on an ongoing basis
  – New products emerge hyper-frequently, often with workload-altering characteristics (example: YouTube)
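The "embarrassingly parallel" property can be illustrated with a toy log-analysis job: because records are independent, each partition is processed with no inter-process communication, and only the small per-partition results are merged. A hypothetical pure-Python sketch (in production, each partition would run on its own server):

```python
from collections import Counter

def count_statuses(log_lines):
    """Map step: count HTTP status codes within one partition.
    Needs no data from any other partition."""
    return Counter(line.split()[-1] for line in log_lines)

def merge_counts(partial_counters):
    """Reduce step: merge the small per-partition results."""
    total = Counter()
    for partial in partial_counters:
        total.update(partial)
    return total

logs = ["GET /a 200", "GET /b 404", "GET /c 200", "GET /d 500"]
partitions = [logs[:2], logs[2:]]    # natural units of data partitioning
partials = [count_statuses(p) for p in partitions]  # each would run on its own server
totals = merge_counts(partials)
```

Aggregate throughput scales with the number of partitions, not with the speed of any single server, which is exactly the point of the bullet above.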
37
Internet Scale Workload Characteristics - 2

• Platform homogeneity
  – A single company owns, has the technical capability for, and runs the entire platform end-to-end, including an ecosystem
  – Most Web applications are more homogeneous than traditional IT, with an immense number of independent worldwide users
• 1% - 2% of all Internet requests fail*; users can’t tell the difference between the Internet being down and your system being down; hence 99% is good enough
• Fault-free operation via application middleware
  – Some type of failure occurs every few hours, including software bugs
  – All hidden from users by fault-tolerant middleware
  – Means hardware and software don’t have to be perfect
• Immense scale
  – The workload can’t be held within one server, or within a maximum-size, tightly-clustered, memory-shared SMP
  – Requires clusters of 1000s to 10,000s of servers, with corresponding petabytes of storage, network, power, cooling, and software
  – This scale of compute power also makes possible apps such as Google Maps, Google Translate, Amazon Web Services EC2, Facebook, etc.

*The Data Center as a Computer: Introduction to Warehouse Scale Computing, p.81, Barroso, Holzle: http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006
38
IT architecture at internet scale
• Internet scale architectures’ fundamental assumptions:
  – Distributed aggregation of data
  – High availability and failure-tolerance functionality is in software on the server
  – Time to market is everything
    • Breakage = “OK” if I can insulate it from the user
  – Affordability is everything
  – Use open source software wherever possible
  – Expect that something somewhere in the infrastructure will always be broken
  – The infrastructure is designed top-to-bottom to address this
• All other criteria are driven off of these

Criteria: cost; extreme scale, parallelism, performance, real time, and time to market
39
For Internet Scale workloads: an Open Source based internet-scale software stack. The example shown is the 2003-2008 Google version:

[Diagram: server hardware running RHEL 2.6.x PAE in racks on an IPv6 interior network, with an exterior network above; stack layers: GFS / GFS II, BigTable, MapReduce, Chubby lock service, GWQ; Google App Engine (Python, Java, C++, Sawzall, other); Google apps: search, index, crawl, Gmail, etc.]

1. Google File System Architecture – GFS II
2. Google Database – Bigtable
3. Google Computation – MapReduce
4. Google Scheduling – GWQ

The OS or hardware doesn’t do any of the redundancy; reliability and redundancy are all in the “application stack”.
40
Internet-scale IT infrastructure

[Diagram: input from the Internet (your customers) flows into the data center; each red block is an inexpensive server with plenty of power for its portion of the workflow; HA/DR/BC for internet-scale workloads is provided across these servers by the software stack.]
41
Warehouse Scale Computer programmer productivity framework example

• Hadoop – overall name of the software stack
• HDFS – Hadoop Distributed File System
• MapReduce – software compute framework (Map = queries; Reduce = aggregates answers)
• Hive – Hadoop-based data warehouse
• Pig – Hadoop-based language
• HBase – non-relational database for fast lookups
• Flume – populates Hadoop with data
• Oozie – workflow processing system
• Whirr – libraries to spin up Hadoop on Amazon EC2, Rackspace, etc.
• Avro – data serialization
• Mahout – data mining
• Sqoop – connectivity to non-Hadoop data stores
• BigTop – packaging / interop of all Hadoop components

http://wikibon.org/wiki/v/Big_Data:_Hadoop%2C_Business_Analytics_and_Beyond
42
Summary: two major types of approaches, depending on workload type:

• HA, Business Continuity, Disaster Recovery characteristics
  – Traditional IT: HA/DR/BC can be done “agnostic / after the fact” using replication
  – Internet Scale Workloads: HA/DR/BC must be “designed into the software stack from the beginning”
• Data Strategy
  – Traditional IT: use traditional tools and concepts to understand and know your data; storage/server virtualization and pooling
  – Internet Scale Workloads: proven Open Source toolset to implement failure tolerance and redundancy in the application stack
• Automation
  – Traditional IT: end-to-end automation of server / storage virtualization
  – Internet Scale Workloads: end-to-end automation of the application software stack, providing failure tolerance
• Commonality: both apply the master vision and lessons learned from internet-scale data centers
43
Principles for Architecting IT HA / DR / Business Continuity
44
Key strategy: segment data into logical storage pools by appropriate data protection characteristics (animated chart)

• Continuous Availability (CA) – end-to-end automation enhances rapid data recovery
  – RTO = near continuous; RPO = as small as possible (Tier 7)
  – Priority = uptime, with high-value justification
• Rapid Data Recovery (RDR) – enhance backup/restore
  – For data that requires it
  – RTO = minutes up to an approximate range of 2 to 6 hours (BC Tiers 6, 4)
  – Balanced priorities = uptime and cost/value
• Backup/Restore (B/R) – assure an efficient foundation
  – Standardize the base backup/restore foundation
  – Provide universal 24-hour to 12-hour (approximate) recovery capability
  – Address requirements for archival, compliance, and green energy
  – Priority = cost

Know and categorize your data, from mission critical down to lower cost: this provides the foundation for affordable data protection, enabled by virtualization.
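The segmentation above can be sketched as a routing rule from a data set's RTO requirement to a protection pool. The thresholds follow the approximate ranges on this slide, and the data-set names and numbers are invented for illustration:

```python
def protection_pool(rto_hours):
    """Route a data set to a storage pool by its recovery-time requirement."""
    if rto_hours < 0.5:
        return "CA"    # Continuous Availability: near-continuous, Tier 7
    if rto_hours <= 6:
        return "RDR"   # Rapid Data Recovery: minutes to ~2-6 hours
    return "B/R"       # Backup/Restore: universal ~12-24 hour recovery

# Hypothetical data sets with their required RTOs in hours.
datasets = {"orders": 0.1, "reporting": 4.0, "archive": 24.0}
pools = {name: protection_pool(rto) for name, rto in datasets.items()}
```

Here `pools` maps "orders" to CA, "reporting" to RDR, and "archive" to B/R; the pool then determines which replication technology, and which cost level, each data set gets.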
45
Virtualization is fundamental to addressing today’s IT diversity
Virtualization
46
Virtualized IT infrastructure → Business Processes
Virtualized systems become the resource pools that enable the recoverability
Consolidated virtualized systems become the Recoverable Units for IT Business Continuity
Virtualization
47
High Availability, Business Continuity: a step-by-step virtualization journey, balancing recovery time objective with cost / value

[The same BC tier chart as slide 19 (Tiers 1-7 against RTO from 15 minutes to days), now annotated: the lower tiers form the Foundation, and virtualized storage pools build upward from it.]
48
Storage Pools: apply the appropriate server and storage technology

• Real-time replication (storage, server, or software) – add automated failover to the replicated storage
• Periodic point-in-time (PiT) replication – file system, point-in-time disk, or VTL-to-VTL with dedup; file, application, or disk-to-disk periodic replication
• Foundation backup/restore – removable media, physical or electronic transport
• Petabyte unstructured pools, due to usage and large scale, typically use application-level intelligent redundancy / failure-toleration design
49
Methodology: traditional IT HA / BC / DR in stages, from the bottom up (cost vs. recovery time objective)

• Foundation: standardized, automated tape backup (Tiers 1, 2)
• Foundation: electronic vaulting, automation, tape libraries (Tier 3)
• Add: point-in-time copy, disk-to-disk, tiered storage (Tier 4)

Example products:
• IBM FlashCopy, SnapShot; IBM XIV, SVC, DS, SONAS; IBM Tivoli Storage Productivity Center 5.1
• IBM ProtecTier; IBM Virtual Tape Library; IBM Tivoli Storage Manager backup/restore
• VTL, de-dup, remote replication at the tape level
50
Methodology: traditional IT HA / BC / DR in stages, from the bottom up (continued)

• On the same foundation (Tiers 1-4), automate applications and databases for replication and automation (Tier 5)
• Consolidate and implement real-time data availability via data replication (Tier 6)
• End-to-end automated site failover of servers, storage, and applications (Tier 7)

Example products:
• If storage: Metro Mirror, Global Mirror, Hitachi UR; XIV, SVC, DS, other storage; TPC 5.1
• VMware; PowerHA on p; Tivoli FlashCopy Manager; server virtualization
51
Technology Deployments in Cloud

1. Private Cloud – enterprise data center; client-managed cloud; internal or partner implementation services
2. Managed Private Cloud – enterprise data center, co-lo operated
3. Hosted Private Cloud – co-lo owned and operated; consumption models include client-owned and provider-owned assets; delivery options include client premise and hosted; strategic outsourcing clients with standardized services
4. Shared Cloud Services – multiple enterprises (A, B, C) on a standardized, multi-tenant service; pay-per-usage model with provider-owned assets
5. Public Cloud Services – many individual users; supports compute-centric workloads; finer granularity in the multi-tenancy model; provider-owned assets (compute cloud, persistent storage)
52
Cloud as remote site deployment options

[The same storage-pool chart as before: real-time replication (storage, server, or software); periodic PiT replication (file system, point-in-time disk, VTL-to-VTL with dedup); point-in-time copies with physical or electronic transport; petabyte-level storage typically uses intelligent file or application replication due to its large scale and usage patterns. Here the replication targets are production recovery in the Cloud.]
53
Virtualized storage data strategy with a remote cloud

[Same storage-pool chart: real-time replication; periodic PiT replication; point-in-time copies with physical or electronic transport; petabyte-level intelligent file or application replication. Legend: real-time replication; point in time; removable media; disk-to-disk replication; automated failover.]
54
Local Cloud deployment from the data standpoint

[Same storage-pool model deployed as a local cloud, including the petabyte unstructured pool.]
55
Cloud provider responsibility for HA and BC

[Same storage-pool chart: your production runs in the Cloud, and recovery is performed by the Cloud provider using the same replication options (real-time replication; periodic PiT replication; point-in-time copies; petabyte-level intelligent file or application replication).]
56
Today’s world: High Availability, Business Continuity is a step-by-step data strategy / workload journey, balancing recovery time objective with cost / value

[The same BC tier chart (Tiers 1-7 against RTO from 15 minutes to days), now approached through workload types and data strategy, with cloud deployment if needed.]
57
Step-by-step virtualization, High Availability, Business Continuity data strategy, balancing recovery time objective with cost / value

[The same BC tier chart, with the tiers grouped into the three storage-pool bands: Continuous Availability, Rapid Data Recovery, and Backup/Restore; driven by workload types and data strategy, with cloud deployment if needed.]
59
Summary

• Understand today’s best practices
  – for IT High Availability and IT Business Continuity
• What has changed? What is the same?
  – Principles for requirements = no change
• Data Strategy
  – Deployment for true internet-scale workloads: application-level redundancy
• Strategies for:
  – Requirements, design, implementation
  – In-house vs. out-sourcing
• Step-by-step approach
  – Automation and virtualization are essential
  – Segment workloads: traditional vs. petabyte scale
  – Exploiting Cloud
Data Strategy
Workload types
Cloud deployment options
60