Parmigiano, a Monastery, Love and Faith Simone Brunozzi Senior Technology Evangelist, Amazon Web Services @simon Technical lessons on how to do backup and disaster recovery in the cloud

Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

DESCRIPTION

Maintaining data integrity and guaranteeing business continuity are of utmost importance for any organization. However, the systems that provide them have grown in complexity and cost, while the business demands IT agility and lower costs. In this talk, we will explore how organizations should approach backup and disaster recovery, and how these two aspects can be implemented in the cloud to improve efficiency and flexibility. The talk starts with general concepts and then dives into technical details, culminating in real customer examples that showcase tips and tricks and the benefits of a cloud-based approach.

Page 1: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Parmigiano, a Monastery, Love and Faith

Simone Brunozzi Senior Technology Evangelist, Amazon Web Services

@simon

Technical lessons on how to do backup and disaster recovery in the cloud

Page 2: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

"The mind is not a vessel to be filled, but a fire to be ignited." - Plutarch

Page 3: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Agenda

I. Prologue: The story of Monte Cassino
II. Lessons: Backup
III. Customer Story: Shaw Media
IV. Earthquake: What happened to my Parmigiano?
V. Lessons: Disaster Recovery
VI. Conclusions: ... And a little surprise!

Page 4: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Prologue

Part I

Page 5: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Abbey of Monte Cassino

Page 6: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Why is Monte Cassino important?

Page 7: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

The Treasure of Monte Cassino

Page 8: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

The Treasure of Monte Cassino

• 800 papal documents
• 20,500 volumes in the Old Library
• 60,000 in the New Library
• 200 manuscripts on parchment
• 100,000 prints and paintings (including 11 Titians)
• 500 incunabula

Incunabulum: a book printed before 1501 C.E. (Gutenberg’s Bible was printed in 1455 C.E.)

Titian: one of the most influential painters ever

Page 9: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Business continuity continuum

High availability

Backup storage

Disaster recovery

Page 12: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Business continuity continuum

• High Availability: Keeping services alive.
• Backing up: Process of copying and archiving of data so it may be used to restore the original after a data loss event.
• Disaster recovery: Recovery of technology infrastructure critical to an organization after a natural or human-induced disaster.

Page 13: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Origin of Backup

Monastery: Brilliant, scalable, low-cost, highly durable backup system

Origin of Universities (Charlemagne, 814 C.E.):
• The Empire needs educated people
• Let’s ask the Church!
• Edict: Free education in cathedrals and monasteries
• Lots of books (and backups)

Page 14: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Origin

Monastery: Brilliant, scalable, low-cost, highly durable backup system.

Origin of Universities (Charlemagne, 814 C.E.)

Indoctrination: One of the first critical functions within an organization (the Catholic Church) that needed to continue after any natural or human-induced disaster. It needed backups of books (Bibles, etc.) in order to function.

Barbarians, pestilences, fires, invasions, wars, famines, revolts, etc.

Page 15: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Why is Monte Cassino important?

Page 16: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

World War II

Page 17: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

The Treasure of Monte Cassino

Dec 1942: Many “treasures” are transported from Rome and other places to Monte Cassino, for safety

Page 18: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lost in translation

Intercepted German message: “Ist der Abt noch im Kloster?” (“Is the Abt still in the monastery?”) “Ja.” (“Yes.”)

“Abt” means “Military Division” (abbreviated). It also means “Abbot”.

Page 19: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Abbey of Monte Cassino

Page 20: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

The Treasure of Monte Cassino

Feb 1944: Schlegel and Becker (Panzer-Division Hermann Göring) had the treasures transferred to the Vatican

Page 21: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Escape from Monte Cassino

Page 22: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Escape from Monte Cassino

• Lt. Col. Julius Schlegel (an Austrian Roman Catholic)
• Capt. Maximilian Becker (a Protestant surgeon)

Page 23: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

“Biggest bombing against a single target of all time”

Page 24: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Monte Cassino after bombing (1944)

Page 25: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Restoration in 1954

Page 26: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

The Abbey of Monte Cassino today

Page 27: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

End of Prologue

Page 28: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from Monte Cassino

Part II

Page 29: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

1. My backup should be accessible

a.k.a. the pain of physical data transfer

Page 30: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

1. My backup should be accessible

AWS:
• API
• AWS Direct Connect
• AWS Storage Gateway
• Customer owns the data
• Redundancy
• AWS Import/Export

Page 31: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

AWS Storage Gateway

GW-stored volumes

Page 32: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Page 33: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

GW-Cached volumes

GW-stored volumes

“Cool” storage

“Cold”

Page 34: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

VPN

Public / AWS Direct Connect

AWS Import/Export

Page 35: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

2. My backup should be able to scale

Page 36: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from Monte Cassino

2. My backup should be able to scale

• “Infinite” scale with Amazon S3 and Amazon Glacier
• Scale to multiple regions
• Seamless
• No need to provision
• Cost tiers (cheaper at scale)

Page 37: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Global AWS Infrastructure (as of Nov 27th, 2012)

Regions (8), GovCloud Regions (1)

Page 38: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Global AWS Infrastructure (as of Nov 27th, 2012)

Availability Zones (23)

Page 39: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Global AWS Infrastructure (as of Nov 27th, 2012)

Edge Locations (38): Dallas (2), St. Louis, Miami, Jacksonville, Los Angeles (2), Palo Alto, Seattle, Ashburn (2), Newark, New York (2), Dublin, London, Amsterdam (2), Stockholm, Frankfurt (2), Paris, Singapore (2), Hong Kong, Tokyo, São Paulo, South Bend, San Jose, Osaka, Milan, Sydney, Madrid

Page 40: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

3. My backup should be safe

Page 41: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from Monte Cassino

3. My backup should be safe

• SSL Endpoints (Amazon S3 and Amazon Glacier)
• Signed API calls
• Store encrypted files
• Server-side encryption
• Durability: multiple copies across different data centers
• Local/cloud with AWS Storage Gateway
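
To make the “signed API calls” bullet concrete, here is a minimal Python sketch of the general idea behind request signing: client and service share a secret key, and each request carries an HMAC over its contents so that tampering can be detected. This is a generic illustration and not the actual AWS signing procedure; the function names, the message layout, and the sample key, path, and timestamp are all hypothetical.

```python
import hashlib
import hmac


def sign_request(secret_key: bytes, method: str, path: str, timestamp: str) -> str:
    """Return a hex HMAC-SHA256 signature over the request fields.

    The newline-joined message layout is an assumption made for this sketch.
    """
    message = "\n".join([method, path, timestamp]).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify_request(secret_key: bytes, method: str, path: str,
                   timestamp: str, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(secret_key, method, path, timestamp)
    return hmac.compare_digest(expected, signature)


# Hypothetical values, for illustration only.
key = b"example-secret-key"
sig = sign_request(key, "PUT", "/backups/db-dump.bak", "2012-11-27T00:00:00Z")

# The untouched request verifies; a request with a modified path does not.
ok = verify_request(key, "PUT", "/backups/db-dump.bak", "2012-11-27T00:00:00Z", sig)
tampered = verify_request(key, "PUT", "/backups/other.bak", "2012-11-27T00:00:00Z", sig)
```

Because the secret key never travels with the request, someone who intercepts the request cannot forge a valid signature for a modified version of it.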

Page 42: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

3. My backup should be safe

Page 43: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

4. My backup should work with a DR policy

(I don’t want to wait 10 years… )

Page 44: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from Monte Cassino

4. My backup should work with a DR policy

• Easy to integrate within AWS or Hybrid
• AWS Storage Gateway: Run services on Amazon EC2 (DR)
• Clear costs
• Reduced costs
• I decide redundancy/availability in relation to costs

Page 45: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud
Page 46: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from Monte Cassino

5. Someone should care about it

• Clear ownership
• Permissions with IAM: users, groups, roles
• Logs
• AWS support

Page 47: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from Monte Cassino

1. My backup should be accessible

2. My backup should be able to scale

3. My backup should be safe

4. My backup should work with a DR policy

5. Someone should care about it

Page 48: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

A customer story

Part III

Page 49: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Augusto Rosa Manager, Server Operations – Shaw Media

augusto.rosa @ shawmedia.ca

Page 50: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Shaw Media

Page 51: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Who we are

• Shaw Media: Division of Shaw Communications Inc.
• It reaches almost 100% of Canadians; 18 specialty channels
• Global national newscast: 1+ million viewers every weekday
• Access to full episodes: 20 websites, 4 video-on-demand
• It engages with 25+ million Canadians per week

Page 52: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Before AWS

• Data centers in Winnipeg and Toronto
• Challenging to manage, frequent power outages, downtime
• Expensive hosting fees inherited from parent company
• Technology was old and in disarray (total revamp needed)

Page 53: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Mission Impossible?

Page 54: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Mission

• Implement a new CMS
• Empower the editorial team
• Business objectives
• Time frame of 9 months
• Be agile and cost effective

Page 55: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

AWS

Page 56: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Amazon SQS Amazon SNS Amazon SES AWS Marketplace Amazon FPS Amazon DevPay Amazon Mechanical Turk Amazon Route 53 Amazon VPC AWS Direct Connect Amazon S3 Amazon Glacier Amazon EBS AWS Import/Export AWS Storage Gateway AWS Support

Amazon EC2 Amazon EMR Auto Scaling Elastic Load Balancing Amazon CloudFront Amazon RDS Amazon DynamoDB Amazon SimpleDB Amazon ElastiCache AWS Identity and Access Management Amazon CloudWatch AWS Elastic Beanstalk AWS CloudFormation Amazon CloudSearch Amazon SWF Alexa WIS and Alexa Top Sites

Page 57: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Phase One

• Fast deployment of servers, network rules, load balancers
• First site under new CMS: Live in 4 weeks from scratch
• Full migration of 29 sites from a physical DC in 9 months

Page 58: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Phase Two

• Full migration of 6 other websites and web services
• From 2nd physical DC into AWS in 2 months
• Migration: Windows ’03/SQL ’05 to Windows ’08/SQL ’08
• Creating new web farms takes 1 to 5 days (versus months)
• It takes longer to procure licenses than the infrastructure
• Ability to scale and automate

Page 59: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Benefits of Using AWS

• Increased uptime from 98.8% to 99.99%
• Scale to success, quicker response to business needs
• $1M+ saved in capital and operational costs
• No physical investment, smaller teams
• Allowed the use of third-party service management companies
• Easy backup on AWS: 3-year retention (tax credits)
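
To put the uptime figures in perspective, the following Python sketch converts an uptime percentage into hours of downtime per year. It assumes a 365-day year; the function name is illustrative and not part of the original talk.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year a service is unavailable at the given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


before = downtime_hours_per_year(98.8)   # roughly 105 hours per year
after = downtime_hours_per_year(99.99)   # under one hour per year

print(f"Before: {before:.1f} h/year, after: {after:.2f} h/year")
```

Under this assumption, the move from 98.8% to 99.99% corresponds to going from more than four days of downtime per year to less than one hour.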

Page 60: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

AWS Architecture

Page 61: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Some Numbers

• 50+ EC2 instances (various sizes)
• 25+ TB traffic/month
• 40M+ Route 53 queries
• 10+ TB backup on Amazon S3

... And growing!

Page 62: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons Learned

• Architect with AWS in mind from the start
• Use all Availability Zones in the region you choose to host in; divide resources across all of them
• Plan for failures: Be crazy about it (things fail)
• Backup backup backup
• Monthly AMI
• Windows/SQL Server workarounds (failover cluster, AD, etc.)
• Engage with AWS Solutions Architects early

Page 63: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Disaster Recovery

• Learn from outages all the time
• Implement changes to prevent failures at cloud level
• Document how you recover from failures
• A single component may fail; the architecture shouldn’t

Page 64: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Backup

• Daily snapshots of all volumes, taken automatically
• VIP volumes: snapshots every 4 hours
• Keep the last 10 snapshots
• Dell Replay: backs up file system files every hour
• Volumes replicated to Amazon S3 (Oregon) every 2 hours
• SQL Server backup every 30 minutes
• SQL Server backup volumes moved to Amazon S3 every 2 hours
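
The “keep the last 10 snapshots” rule can be sketched in a few lines of Python. This is only an illustration of the retention logic and not Shaw Media's actual tooling: it works on plain timestamps, and the function name and sample schedule are hypothetical. Deleting the expired snapshots would be a separate step against whichever snapshot service is in use.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_by_retention(snapshot_times: List[datetime],
                       keep: int = 10) -> Tuple[List[datetime], List[datetime]]:
    """Return (kept, expired): the newest `keep` snapshots and all older ones."""
    newest_first = sorted(snapshot_times, reverse=True)
    return newest_first[:keep], newest_first[keep:]


# Hypothetical schedule: 12 snapshots of a VIP volume, one every 4 hours.
start = datetime(2012, 11, 27, 0, 0)
snapshots = [start + timedelta(hours=4 * i) for i in range(12)]

kept, expired = split_by_retention(snapshots, keep=10)
# `kept` holds the 10 most recent snapshots; `expired` holds the 2 oldest.
```

With snapshots every 4 hours and 10 retained, this policy covers roughly the last 40 hours of history for that volume.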

Page 65: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Future

• Move from public cloud to VPC
• Auto Scaling on Amazon EC2
• Amazon S3 as image repository for all sites
• Second cloud vendor as DR (instead of in-house)
• Amazon ElastiCache for central caching for ASP.net apps

Page 66: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Augusto Rosa Manager, Server Operations – Shaw Media

augusto.rosa @ shawmedia.ca

Page 67: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

The 2012 Emilia Earthquake

Part IV

Page 68: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

May 20th, 2012: Earthquake in Italy

Page 69: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud
Page 70: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Parmigiano warehouse (0.5B € damage)

Page 71: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

“Let’s do something NOW”

Page 72: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Buy 1 Kg of Parmigiano for 1 Euro

Page 73: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Everybody helped

Page 74: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from an Earthquake

Part V

Page 75: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from an Earthquake

1. You NEED a DR in place!

2. Testing your DR

3. Reducing costs

4. You can have different DR solutions

Page 76: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

1. You NEED a DR in place!

Page 77: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

DR with High Availability

Page 78: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

App DR with Standby

Page 79: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Business Impact Analysis (RTO, RPO)

Page 80: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from an Earthquake

Business Impact Analysis (RTO, RPO)

• RTO (Recovery Time Objective):
  1) Time spent trying to fix the problem
  2) The recovery itself
  3) Testing
  4) Telling users
• RPO (Recovery Point Objective): how much data I can lose
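
As a rough illustration of these two measures, the Python sketch below adds up the four RTO components listed above and treats the interval between backups as the worst-case data loss to compare against the RPO. The class, field names, and sample numbers are hypothetical, and the model assumes each backup completes instantly, which real systems only approximate.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    fix_attempt_hours: float      # 1) time spent trying to fix the problem
    recovery_hours: float         # 2) the recovery itself
    testing_hours: float          # 3) testing
    notify_users_hours: float     # 4) telling users
    backup_interval_hours: float  # time between two consecutive backups

    def recovery_time_hours(self) -> float:
        """Total time to recover: the sum of the four components."""
        return (self.fix_attempt_hours + self.recovery_hours
                + self.testing_hours + self.notify_users_hours)

    def worst_case_data_loss_hours(self) -> float:
        """Assumption: a failure just before the next backup loses one full interval."""
        return self.backup_interval_hours


def meets_objectives(est: RecoveryEstimate, rto_hours: float, rpo_hours: float) -> bool:
    """True if the estimate fits within both the RTO and the RPO."""
    return (est.recovery_time_hours() <= rto_hours
            and est.worst_case_data_loss_hours() <= rpo_hours)


# Hypothetical numbers, for illustration only.
estimate = RecoveryEstimate(1.0, 2.0, 0.5, 0.5, backup_interval_hours=0.5)
print(estimate.recovery_time_hours())                           # 4.0
print(meets_objectives(estimate, rto_hours=6.0, rpo_hours=1.0))  # True
```

Tightening either objective shows which lever has to move: a shorter RPO calls for more frequent backups, while a shorter RTO calls for a faster recovery path.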

Page 81: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from an Earthquake

Different Types of DR Architecture

1) Backup and Restore
2) “Pilot light” for quick recovery into AWS (cold standby)
3) Warm standby solution on AWS
4) Multi-site hybrid solution (AWS + on premises)

Page 82: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Service                Cost ($/GB/month)    Performance    Durability
Amazon S3              0.125
Amazon Glacier         0.01
AWS Storage Gateway    0.125 (+ 125/GW)
Amazon EBS             0.10
Amazon EBS (PIOPS)     0.125

Page 83: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

2. Testing your DR

Page 84: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from an Earthquake

2. Testing your DR

• Dev/test in the cloud is super easy
• Spin up capacity only for the test
• Regularly test your DR
• Cost is minimal
• What about data transfer speed?

Page 85: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

s3cmd ls --recursive s3://datasets.elasticmapreduce/ngrams/books/ |
  awk '{print $4; sub(/s3:\/\/datasets.elasticmapreduce/, "/array", $4); print $4}' |
  parallel -j0 -N2 --progress /usr/bin/s3cmd --no-progress get {1} {2}

Special thanks to Craig Carl, AWS Solutions Architect

Page 86: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

s3cmd ls --recursive s3://datasets.elasticmapreduce/ngrams/books/

Lists every object under the given bucket path

Page 87: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

awk '{print $4; sub(/s3:\/\/datasets.elasticmapreduce/, "/array", $4); print $4}'

Prints the path to the Amazon S3 object, then the local destination path

Page 88: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

parallel -j0 -N2 --progress

Runs GNU Parallel with as many jobs as possible; '-N2' tells parallel to take two arguments from stdin per job and assign them to {1} and {2}

Page 89: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

/usr/bin/s3cmd --no-progress get {1} {2}

This is the command that GNU Parallel will run: '{1}' is substituted with the Amazon S3 object path, '{2}' with the local destination path

Page 90: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Copying 2.4 TB went down from 48 hours to 9 hours (about 5x faster)
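
The awk step in the pipeline is the least obvious part, so here is a small Python sketch of the same mapping: for each Amazon S3 object path it produces the pair (source, local destination) that GNU Parallel then hands to `s3cmd get` as {1} and {2}. It mirrors the bucket name and the "/array" destination from the command above; the function name and the sample object key are hypothetical, and the sketch does no downloading itself.

```python
from typing import Iterable, List, Tuple

BUCKET_PREFIX = "s3://datasets.elasticmapreduce"  # from the command above
LOCAL_ROOT = "/array"                             # from the command above


def download_pairs(object_uris: Iterable[str]) -> List[Tuple[str, str]]:
    """Map each S3 URI to (source URI, local destination path)."""
    pairs = []
    for uri in object_uris:
        if not uri.startswith(BUCKET_PREFIX):
            raise ValueError(f"unexpected URI outside the bucket: {uri}")
        # Same effect as the awk sub(): swap the bucket prefix for the local root.
        local_path = LOCAL_ROOT + uri[len(BUCKET_PREFIX):]
        pairs.append((uri, local_path))
    return pairs


# Hypothetical object key, for illustration only.
example = download_pairs(["s3://datasets.elasticmapreduce/ngrams/books/sample.gz"])
# example == [("s3://datasets.elasticmapreduce/ngrams/books/sample.gz",
#              "/array/ngrams/books/sample.gz")]
```

Each pair is independent of the others, which is what lets the downloads run in parallel rather than one after another.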

Page 91: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

3. Reducing costs

Page 92: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from an Earthquake

3. Reducing costs

1) AWS cost reduction (e.g., S3 cost reduction on Nov 28)
2) Reduced redundancy (Amazon S3)
3) Retention policy
4) Hot/warm/cool/cold backup
5) Reserved capacity/tiers

Page 93: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Amazon S3 storage    Standard $/GB/month    Reduced Redundancy $/GB/month
0–1 TB               0.125                  0.093
1–50 TB              0.110                  0.083
50–500 TB            0.095                  0.073
500–1,000 TB         0.090                  0.063
1–5 PB               0.080                  0.053
5+ PB                0.055                  0.037
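
The tiers are applied band by band: the first terabyte is billed at the first rate, the next 49 TB at the second rate, and so on. The Python sketch below works through that arithmetic. It reads the Standard column as a descending series (0.125, 0.110, 0.095, 0.090, 0.080, 0.055), takes the Reduced Redundancy column as listed, and assumes 1 TB = 1,024 GB and 1 PB = 1,000 TB; these are assumptions of the sketch and the prices are the 2012 figures from the slide, not current ones.

```python
from typing import List, Tuple

GB_PER_TB = 1024  # assumption for this sketch
INF = float("inf")

# (upper bound of the band in TB, price in $/GB/month)
STANDARD_TIERS: List[Tuple[float, float]] = [
    (1, 0.125), (50, 0.110), (500, 0.095), (1000, 0.090), (5000, 0.080), (INF, 0.055),
]
REDUCED_TIERS: List[Tuple[float, float]] = [
    (1, 0.093), (50, 0.083), (500, 0.073), (1000, 0.063), (5000, 0.053), (INF, 0.037),
]


def monthly_cost(total_tb: float, tiers: List[Tuple[float, float]]) -> float:
    """Monthly storage cost in dollars, billing each band at its own rate."""
    cost = 0.0
    lower_tb = 0.0
    for upper_tb, price_per_gb in tiers:
        band_tb = min(total_tb, upper_tb) - lower_tb
        if band_tb <= 0:
            break  # all stored data has been billed
        cost += band_tb * GB_PER_TB * price_per_gb
        lower_tb = upper_tb
    return cost


# Example: a 10 TB backup set.
standard_10tb = monthly_cost(10, STANDARD_TIERS)
reduced_10tb = monthly_cost(10, REDUCED_TIERS)
print(f"Standard: ${standard_10tb:.2f}, Reduced Redundancy: ${reduced_10tb:.2f}")
```

For 10 TB, the first terabyte is charged at the top rate and the remaining 9 TB at the second-tier rate, so the blended price per gigabyte falls as the backup set grows.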

Page 94: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

4. You can have different DR solutions

Page 95: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from an Earthquake

4. You can have different DR solutions

• Easy to integrate existing vendors with DR on AWS
• Approach: One vendor/hybrid/multiple vendors
• One region/multi-regions (if you need geodiversity)

Page 96: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Lessons from an Earthquake

1. You NEED a DR in place!

2. Testing your DR

3. Reducing costs

4. You can have different DR solutions

Page 97: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Conclusions

Part VI

Page 98: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud
Page 99: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Backups Disaster Recovery

Action items

Agility Cost savings Control

Page 100: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud

Parmigiano, a Monastery, Love and Faith

Simone Brunozzi Senior Technology Evangelist, Amazon Web Services

@simon

Technical lessons on how to do Backup and Disaster Recovery in the Cloud