Active Archiving with Amazon S3 and Tiering to Glacier (Marc Trimuschat, AWS Storage Services)

Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AWS Online Tech Talks


Page 1

Active Archiving with Amazon S3 and Tiering to Glacier

Marc Trimuschat, AWS Storage Services

Page 2

Data has gravity: it is easier to move processing to the data

Process, Partner, 4K/8K, Genomics, Seismic, Financial, Logs, IoT

Page 3

Cloud Data Migration

• Direct Connect

• Snow* data transport family

• 3rd Party Connectors

• Transfer Acceleration

• Storage Gateway

• Kinesis Firehose

AWS Storage Platform and Solutions: The AWS Storage Portfolio

• Object: Amazon S3, Amazon Glacier

• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)

• File: Amazon EFS

Page 4

Audio Archives – SoundCloud

• World’s leading social sound platform

• Audio files transcoded and stored in multiple formats

• Stores PBs of data

• Transcoded files served from Amazon S3

• Originals moved to Amazon Glacier for long-term retention

Page 5

Satellite Image Archive

• DigitalGlobe takes satellite imagery of the Earth

• 100 PB image library = 6 billion square kilometers

• 1 PB of new images every year

• Images to be archived and retained for decades

Page 6

Patient Data – Philips Healthcare

• HealthSuite digital platform powered by AWS

• 15 petabytes of patient data

• Archived for decades (beyond the lifetime of patients)

• Uses AWS HIPAA-eligible services in the BAA

Page 7

Archive: Data retained for the long term, for compliance or potential future reference

Data archiving needs are growing everywhere

• Media assets, 4K, 8K

• Health care/life sciences

• Financial services

• Regulated industries

• Oil and gas/geospatial

• Digital preservation

• Long-term backups

• Logs

Page 8

AWS Storage Review

Page 9

Choice of storage classes

• Standard: active data

• Standard - Infrequent Access: infrequently accessed data

• Amazon Glacier: archive data

Page 10

Data Lifecycle Management

- Transition Standard to Standard-IA

- Transition Standard-IA to Amazon Glacier

- Expiration lifecycle policy

- Versioning support

- Prefix support

[Figure: data access frequency over time, from T to T + 365 days]
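The lifecycle features listed on this slide can be combined in a single rule. The following is a minimal sketch, assuming the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration`; the `videos/` prefix, the rule ID, and the day thresholds are illustrative assumptions and do not come from the talk.

```python
def build_lifecycle_config(prefix, ia_days=30, glacier_days=90, expire_days=365):
    """Return a lifecycle configuration with one prefix-scoped rule.

    The default thresholds are hypothetical values chosen for illustration.
    """
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": "tier-" + (prefix.strip("/") or "all"),
                "Filter": {"Prefix": prefix},            # prefix support
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},     # expiration policy
                # versioning support: also clean up old object versions
                "NoncurrentVersionExpiration": {"NoncurrentDays": expire_days},
            }
        ]
    }


config = build_lifecycle_config("videos/")
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Building the configuration as plain data keeps it easy to review and test before it is sent to a real bucket.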

Page 11

Amazon S3: What’s New

• Cross-Region Replication

• Lifecycle Policy

• Data Classification & Management

• Event Notifications

• CloudWatch Metrics

• S3 Inventory

• Audit with CloudTrail Data Events

• Storage Analytics

Storage classes: Standard, Standard - Infrequent Access, Amazon Glacier

Page 12

Data-driven storage management for S3

• Analyze storage usage to transition the right data to the right storage class

• Understand how storage usage changes as your S3 objects get older

• Discover how much of your storage is retrieved over time

Page 13

Manage your data: Data Classification and Management

Manage data based on what it is as opposed to where it’s located

• Easy data management

• Classify your data

• Tag your objects with key-value pairs

• Write policies once based on the type of data

Classification, Lifecycle Policy, Access Control
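Tag-based classification can be sketched as two small data structures: the tag set attached to an object, and a lifecycle rule that filters on one of those tags rather than on a key prefix. The shapes below follow the S3 tagging and lifecycle request formats used by boto3; the tag keys, values, and the 30-day threshold are hypothetical.

```python
def to_tagging(tags):
    """Convert a plain dict into the TagSet structure used by S3 tagging calls."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in sorted(tags.items())]}


def tag_scoped_rule(rule_id, tag_key, tag_value, glacier_days):
    """Lifecycle rule that applies to every object carrying the given tag,
    regardless of where in the bucket the object is stored."""
    return {
        "ID": rule_id,
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_days, "StorageClass": "GLACIER"}],
    }


# Hypothetical classification: raw footage is archived after 30 days.
tagging = to_tagging({"classification": "raw-footage", "project": "demo"})
rule = tag_scoped_rule("archive-raw-footage", "classification", "raw-footage", 30)
# tagging could be passed as Tagging=... to boto3's put_object_tagging, and
# rule placed in the "Rules" list of a lifecycle configuration.
```

Because the rule matches on the tag, the policy is written once per data type and keeps working when objects are reorganized under different prefixes.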

Page 14

Amazon Glacier

• Extremely low-cost archive storage service, starting at $0.004/GB/mo

• 3 retrieval options: Expedited (1-5 min), Standard (3-5 hrs), Bulk (5-12 hrs)

• 99.999999999% durability (5-6 orders of magnitude higher than 2 copies of tape)

• All data is encrypted at rest

• Features: compliance, data management, cost management, audit logging

Page 15

Glacier: Key Concepts

• Vaults – Container for archives, up to 1,000 vaults per account

• Archives – Basic unit, write-once, 40 TB max, unlimited archives

• Inventory – Cold index of archives refreshed every 24 hours

• Access – Three ways to access Glacier

• Uploads – Multi-part, lifecycle, cost optimizations, Snowball

• Data management – Vault Lock, tagging, audit logs

• Retrievals – Retrieval policies, range retrievals, new feature announcements

Page 16

Archive Consideration 1 – Total Archive Cost

Page 17

Traditional archiving approaches

• Tape libraries, robots, drives, media

• Onsite (online and offline)

• Offsite tape out/vaulting

• Specialized software and personnel

• Tape refresh every 3-5 years

Page 18

How can AWS help with your archival?

• Metered usage: pay as you go

• No capital investment

• No commitment

• No risky capacity planning

• Avoid risks of physical media handling

• Control your geographic locality for performance and compliance

Page 19

Consideration 2 – Durability

Page 20

Amazon S3 and Glacier Durability

4 9s durability

5 9s durability

S3 - IA, Glacier: 11 9s durability

Page 21

99.999999999% Durability

Durability for long-term preservation

Built-in Fixity Checking

Automatic recovery
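To make the durability figures concrete, the arithmetic below converts a number of nines into an expected annual object loss. It is a simplified sketch: it assumes the durability figure can be read as an independent per-object probability of surviving one year, and the object count of 10 million is a hypothetical example.

```python
def expected_annual_loss(object_count, nines):
    """Expected number of objects lost per year at `nines` nines of durability.

    Simplifying assumption: each object is lost independently with
    probability 10**-nines per year.
    """
    return object_count * 10.0 ** (-nines)


def years_per_lost_object(object_count, nines):
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_loss(object_count, nines)


objects = 10_000_000  # hypothetical archive size
loss_at_4_nines = expected_annual_loss(objects, 4)    # about 1,000 objects per year
loss_at_11_nines = expected_annual_loss(objects, 11)  # about 0.0001 objects per year
```

Under these assumptions, an archive of 10 million objects at 11 nines would expect to lose a single object roughly once every 10,000 years, compared with about a thousand objects per year at 4 nines.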

Page 22

Consideration 3 – Accessibility

Page 23

Amazon Glacier – Data Retrieval Tiers

Expedited Retrieval ($0.03/GB)

• Emergency access

• 1-5 minutes

• Last-minute play-out schedule swap

Standard Retrieval ($0.01/GB)

• Current model

• 3-5 hours

• Disaster Recovery

Bulk Retrieval ($0.0025/GB)

• Batch/Bulk access

• 5-12 hours

• PB-scale re-transcoding or video/image analysis

On-site tape replacement, off-site tape replacement
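The trade-off between the three tiers is wait time against price per GB. A minimal sketch of that choice follows, using the per-GB prices and upper-bound wait times quoted on this slide; the function names are hypothetical, and per-request fees are deliberately ignored.

```python
# Prices and worst-case wait times as quoted on the slide (March 2017).
TIERS = {
    "Expedited": {"usd_per_gb": 0.03, "max_hours": 5 / 60},
    "Standard": {"usd_per_gb": 0.01, "max_hours": 5},
    "Bulk": {"usd_per_gb": 0.0025, "max_hours": 12},
}


def cheapest_tier(deadline_hours):
    """Pick the cheapest tier whose worst-case wait fits within the deadline."""
    candidates = [
        (tier["usd_per_gb"], name)
        for name, tier in TIERS.items()
        if tier["max_hours"] <= deadline_hours
    ]
    if not candidates:
        raise ValueError("no retrieval tier is fast enough for this deadline")
    return min(candidates)[1]


def retrieval_cost_usd(gigabytes, tier):
    """Data retrieval charge only; request fees are not modeled."""
    return gigabytes * TIERS[tier]["usd_per_gb"]


tier = cheapest_tier(deadline_hours=24)   # an overnight re-transcoding job
cost = retrieval_cost_usd(1000, tier)     # 1 TB retrieved at that tier
```

The selected tier name matches the values accepted for the `Tier` field of an S3 restore request or a Glacier retrieval job.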

Page 24

Consideration 4 - Application & Data Management

Page 25

Accessing Glacier

1. S3 lifecycle integration

2. Direct Glacier API/SDK

3. Third-party tools and gateways (for example, FastGlacier)

Page 26

Use Glacier via S3 Lifecycle

• S3 Standard: active data, synchronous access, $0.023/GB/mo.

• S3 - Infrequent Access: infrequently accessed data, synchronous access, $0.0125/GB/mo.

• Amazon Glacier: archive data, async access, $0.004/GB/mo.
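The effect of tiering on the monthly bill can be estimated from the three per-GB prices on this slide. The sketch below is storage-only arithmetic: the 100 TB data set and the 10/20/70 split across classes are hypothetical, and request, retrieval, and minimum-storage-duration charges are not modeled.

```python
# Storage prices per GB-month as quoted on the slide (March 2017).
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.004,
}


def monthly_cost_usd(gb_by_class):
    """Sum storage cost over a mapping of storage class -> gigabytes stored."""
    return sum(PRICE_PER_GB_MONTH[cls] * gb for cls, gb in gb_by_class.items())


# Hypothetical 100 TB data set, first kept entirely in Standard ...
all_standard = monthly_cost_usd({"STANDARD": 100_000})
# ... then tiered: 10% active, 20% infrequently accessed, 70% archived.
tiered = monthly_cost_usd(
    {"STANDARD": 10_000, "STANDARD_IA": 20_000, "GLACIER": 70_000}
)
savings_fraction = 1 - tiered / all_standard
```

With these assumed proportions the tiered layout costs about $760 per month against about $2,300 for Standard alone, roughly a two-thirds reduction in storage charges.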

Page 27

Data lifecycle management

- Transition Standard to Standard-IA

- Transition Standard-IA to Amazon Glacier

- Transition based on object tags

- Expiration and versioning

[Figure: data access frequency over time, from T to T + 365 days]

Page 28

Transition older videos to Standard-IA

Page 29

Glacier Direct Upload – The Basics

1. Create vault

2. Configure access policies

ArchiveApp user policy:

Effect: Allow

Resource: arn:aws:glacier:<accountId>:vaults/Films

Action: glacier:UploadArchive

3. Upload archives

UploadArchive(data) -> Archive ID
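The three steps above can be sketched in Python. The calls `create_vault` and `upload_archive`, and the `archiveId` field in the response, follow the boto3 Glacier client; everything else is an assumption made for illustration, including the `FakeGlacier` stand-in (which only exists so the sketch runs without AWS access) and the helper names. The vault ARN is copied from the slide as written.

```python
import json


def upload_policy(vault_arn):
    """Step 2: IAM policy document allowing uploads to a single vault."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "glacier:UploadArchive",
                "Resource": vault_arn,
            }
        ],
    }


def archive_data(glacier, vault_name, data, description):
    """Steps 1 and 3: create the vault, upload one archive, return its ID."""
    glacier.create_vault(vaultName=vault_name)
    response = glacier.upload_archive(
        vaultName=vault_name, archiveDescription=description, body=data
    )
    return response["archiveId"]


class FakeGlacier:
    """Offline stand-in for boto3.client("glacier"); not part of any library."""

    def __init__(self):
        self.vaults = {}

    def create_vault(self, vaultName):
        self.vaults.setdefault(vaultName, [])
        return {"location": "/-/vaults/" + vaultName}

    def upload_archive(self, vaultName, archiveDescription, body):
        archives = self.vaults[vaultName]
        archives.append((archiveDescription, body))
        return {"archiveId": "archive-%d" % len(archives)}


policy_json = json.dumps(upload_policy("arn:aws:glacier:<accountId>:vaults/Films"))
client = FakeGlacier()  # swap in boto3.client("glacier") for real use
archive_id = archive_data(client, "Films", b"raw footage bytes", "reel 1")
```

The returned archive ID is the only handle to the uploaded data, so an application using Glacier directly would normally store it in its own database alongside any descriptive metadata.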

Page 30

Uploading Data: Internet or Sneakernet

• AWS Direct Connect: dedicated bandwidth between your site and AWS

• Internet: transfer data in a secure SSL tunnel over the public Internet

• AWS Import/Export Snowball: physical transfer of media into and out of AWS
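Whether to use the network or a physical device usually comes down to how long the transfer would take over the available link. The arithmetic below is a rough sketch: the 80% sustained utilization default and the 100 TB over 1 Gbps example are assumptions, and decimal units are used throughout (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second).

```python
def transfer_days(terabytes, link_mbps, utilization=0.8):
    """Days needed to move `terabytes` over a link of `link_mbps`,
    assuming a sustained fraction `utilization` of the nominal bandwidth."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


# Hypothetical example: 100 TB over a 1 Gbps connection.
days_over_1gbps = transfer_days(100, 1000)
# The same data set over a 100 Mbps connection takes ten times as long.
days_over_100mbps = transfer_days(100, 100)
```

Under these assumptions 100 TB needs about 11.6 days at 1 Gbps and about 116 days at 100 Mbps, which is the kind of gap that makes physical transfer worth considering.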

Page 31

AWS Snowball Edge

Petabyte-scale hybrid device with onboard compute and storage

• 100 TB local storage

• Local compute equivalent to an Amazon EC2 m4.4xlarge instance

• 10GBase-T, 10/25Gb SFP28, and 40Gb QSFP+ copper, and optical networking

• Ruggedized and rack-mountable

RE:INVENT 2016 LAUNCH

Page 32

Use cases: AWS Import/Export Snowball

• Cloud Migration

• Disaster Recovery

• Data Center Decommission

• Content Distribution

Page 33

AWS storage migration expansion: AWS Snowmobile

Page 34

Storage Gateway Enables Hybrid Storage Solutions

Use standard storage protocols to access AWS storage services

• Customer premises: application servers, backup servers, enterprise storage

• Protocols: NFS, iSCSI, VTL

• Gateway types: File, Volume, Tape

• Connectivity: Internet, Direct Connect, Amazon VPC

• AWS storage: Amazon S3, Amazon Glacier, Amazon EBS snapshots

• Related AWS services: AWS IAM, AWS KMS, AWS CloudTrail, Amazon CloudWatch

Page 35

Which option should I choose?

• Use S3 lifecycle managed Amazon Glacier if the S3 object keys are sufficient for index/search capability

• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database

• Use 3rd party tools or AWS Storage Gateway to minimize coding

Page 36

Media Archive Use Case

Page 37

Media Archive and Metadata (cloud transition)

• Corporate data center: Onsite Archive, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks, On-Premise Tape

• Offsite Tape Archive

Page 38

Media Archive (transition to the cloud)

• Corporate data center: Onsite Archive, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks, On-Premise Tape

• Offsite Tape Archive

• AWS Direct Connect

• AWS Region: Amazon Glacier, Cloud DAM (syncing metadata from on-prem)

Page 39

Media Archive (transition to the cloud)

• Corporate data center: Onsite Archive, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks, On-Premise Tape

• Offsite Tape Archive

• AWS Direct Connect

• AWS Region: Amazon Glacier, Amazon S3, Cloud DAM (syncing metadata from on-prem), Cloud Based Processing Tasks

Page 40

Media Archive (transition to the cloud)

• Corporate data center: Onsite Archive, Onsite Cache, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks, On-Premise Tape

• Offsite Tape Archive

• AWS Direct Connect

• AWS Region: Amazon Glacier, Amazon S3, Cloud DAM (syncing metadata from on-prem), Cloud Based Processing Tasks

Page 41

Media Solution: Sony DADC

On-demand cloud-based media supply chain and delivery solution

Problem Statement:

• Challenged by on-prem legacy infrastructure.

• Provide a performant, secure, economical media distribution solution.

• Decrease time to market for their customers’ finished content.

Use of AWS:

• EC2 content processing and SWF, SQS, SNS for media workflow automation

• S3 for storage, Glacier for content archive

• CloudFront for OTT.

Business Benefits:

• Workflow pipelines can be run in a highly parallelized fashion through AWS elastic scalability.

• Significantly shorten content delivery SLA with a new AWS-enabled target of 1 hr.

• Fully migrating away from on-prem infrastructure.

Page 42

Video archives

• Media distribution backbone (Ve.nue platform)

• Over-The-Top (OTT) broadcast service

• 20 PB of media assets, 1MM+ hours of high-res content

• Assets to be archived and retained for decades

Page 43

Comprehensive media lifecycle

@SonyDADCNMS

Page 44

“If physical deliveries can happen within one hour based on unpredictable requests, surely we are able to exceed such expectations digitally”

@SonyDADCNMS

Page 45

Sony Migration

The Challenge

• Seamlessly migrate a platform that enables content delivery across all devices and more than 1,200 distribution points worldwide

• Store 20 petabytes of motion picture and television content

• Equating to 1,000,000+ hours of content

• At a growth curve of ~1 petabyte every quarter

Desired Goals:

• One-hour delivery turnaround time

• Agile, scalable, predictable cost model & infrastructure

• Investing in innovation vs. hardware

@SonyDADCNMS

Page 46

On-Premise Asset Storage Workflow

@SonyDADCNMS

Page 47

AWS Cloud-based Asset Storage Workflow

@SonyDADCNMS

Page 48

Glacier vs. On-Prem Cost Comparison

@SonyDADCNMS

Page 49

Consideration 5 - Compliance and Retention

Page 50

Compliance storage with Vault Lock

Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy

• Time-based retention

• MFA authentication

• Controls govern all records in a vault

• Immutable policy

• Two-step locking

Page 51

Glacier Vault Lock

• Non-overwrite, non-erasable records

• Time-based retention with “ArchiveAgeInDays” control

• Policy lockdown (strong governance)

• Legal hold with vault-level tags

• Configure designated third-party access and grant temporary access

Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
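A time-based retention control of the kind described on this slide can be expressed as a Vault Lock policy that denies archive deletion until an archive reaches a given age. The sketch below builds such a policy document in Python; `glacier:ArchiveAgeInDays` is the condition key named on the slide, while the vault ARN, the statement ID, and the 365-day retention period are hypothetical placeholders.

```python
import json


def retention_lock_policy(vault_arn, retention_days):
    """Vault Lock policy denying DeleteArchive for archives younger than
    `retention_days`. Deny statements apply to every principal."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {
                        # policy condition values are written as strings
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            }
        ],
    }


# Placeholder ARN; substitute the region, account ID, and vault name in use.
EXAMPLE_VAULT_ARN = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
policy = retention_lock_policy(EXAMPLE_VAULT_ARN, retention_days=365)
policy_json = json.dumps(policy, indent=2)
```

Once a policy like this is locked onto a vault it can no longer be changed, so it is worth reviewing the generated JSON carefully before completing the lock.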

Page 52

Proofpoint

• Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk

• Founded 2002, public in 2012

• $350M annual revenue, $3B market cap

Page 53

Proofpoint SocialPatrol

• Policy controls and enforcement for social

• Combats fraudulent brand impersonation

• Moderates content at scale

• Ensures compliance in publishing

• Integrates with social APIs

• 150+ classifiers using NLP and ML

• Text, links, images, metadata

• Ingesting >1M social posts per day

• Built in AWS

Page 54

Proofpoint SocialPatrol Archive with Glacier

• SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock

PFPT in AWS: Social, Policy engine, MySQL/C*/Solr, Amazon Glacier & Vault Lock

Page 55

Proofpoint SocialPatrol Archive

• The customer specifies the retention period in Proofpoint Social:

Page 56

Proofpoint SocialPatrol Archive

• Via AWS API we create a vault for that customer:

Page 57

Proofpoint SocialPatrol Archive

• Via AWS API, we lock the vault, and specify policy to observe a legal hold via a tag.
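The locking step can be sketched as the two-step Vault Lock flow: initiate the lock with a policy, review it, then complete or abort. The call names `initiate_vault_lock`, `complete_vault_lock`, and `abort_vault_lock` follow the boto3 Glacier client. This is not Proofpoint's implementation: the `LegalHold` tag name, the vault names, the `approve` callback, and the `FakeGlacier` stand-in (used only so the sketch runs offline) are all assumptions.

```python
import json

LEGAL_HOLD_TAG = "LegalHold"  # hypothetical vault-level tag name


def legal_hold_policy(vault_arn):
    """Policy denying archive deletion while the vault carries the hold tag."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-under-legal-hold",
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": [vault_arn],
            "Condition": {
                "StringLike": {"glacier:ResourceTag/" + LEGAL_HOLD_TAG: ["true"]}
            },
        }],
    })


def lock_vault(glacier, vault_name, policy_json, approve):
    """Two-step locking: initiate, review, then complete or abort."""
    lock_id = glacier.initiate_vault_lock(
        vaultName=vault_name, policy={"Policy": policy_json}
    )["lockId"]
    if approve(policy_json):
        glacier.complete_vault_lock(vaultName=vault_name, lockId=lock_id)
        return True
    glacier.abort_vault_lock(vaultName=vault_name)
    return False


class FakeGlacier:
    """Offline stand-in for boto3.client("glacier"); not part of any library."""

    def __init__(self):
        self.state = {}

    def initiate_vault_lock(self, vaultName, policy):
        self.state[vaultName] = "InProgress"
        return {"lockId": "lock-" + vaultName}

    def complete_vault_lock(self, vaultName, lockId):
        assert lockId == "lock-" + vaultName
        self.state[vaultName] = "Locked"

    def abort_vault_lock(self, vaultName):
        self.state[vaultName] = "Aborted"


client = FakeGlacier()
policy_json = legal_hold_policy("arn:aws:glacier:us-west-2:123456789012:vaults/customer-a")
locked = lock_vault(client, "customer-a", policy_json, approve=lambda p: LEGAL_HOLD_TAG in p)
```

Keeping the review step between initiation and completion matters because a completed lock is immutable, whereas an in-progress lock can still be aborted and replaced.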

Page 58

Active-Archive Resources

• Amazon S3: https://aws.amazon.com/s3/

• Amazon S3 Deep Dive (re:Invent 2016): https://www.youtube.com/watch?v=bMhWWkhydFQ&t=249s

• Amazon Glacier: https://aws.amazon.com/glacier/

• Amazon Glacier Deep-Dive (re:Invent 2016): https://www.youtube.com/watch?v=dfr9mBcDJ-U

• WORM Compliance Assessment: https://aws.amazon.com/blogs/aws/glacier-cohasset-assessment/

• Sony Case Study: https://aws.amazon.com/solutions/case-studies/sony-dadc/

• Backup & Archive TCO Calculator: http://www.backuparchive.awstcocalculator.com/

Page 59

Thank You!