Active Archiving with Amazon S3 and Tiering to Glacier
Marc Trimuschat, AWS Storage Services
Data has gravity
…easier to move processing to the data
Process
Partner
4K/8K
Genomics
Seismic
Financial
Logs
IoT
Cloud Data Migration
Direct Connect
Snow* data transport family
3rd Party Connectors
Transfer Acceleration
Storage Gateway
Kinesis Firehose
AWS Storage Platform and Solutions
The AWS Storage Portfolio
Object
Amazon Glacier
Amazon S3
Block
Amazon EBS (persistent)
Amazon EC2 Instance Store (ephemeral)
File
Amazon EFS
Audio Archives – SoundCloud
• World’s leading social sound platform
• Audio files transcoded and stored in multiple formats
• Stores PBs of data
• Transcoded files served from Amazon S3
• Originals moved to Amazon Glacier for long-term retention
Satellite Image Archive
• DigitalGlobe captures satellite imagery of the Earth
• 100 PB image library = 6 billion square kilometers
• 1 PB of new imagery every year
• Images to be archived and retained for decades
Patient Data – Philips Healthcare
• HealthSuite digital platform powered by AWS
• 15 petabytes of patient data
• Archived for decades (beyond the lifetime of patients)
• Uses AWS HIPAA-eligible services in the BAA
Archive: Data retained for the long term,
for compliance or potential future reference
Data archiving needs are growing everywhere
• Media assets, 4K, 8K
• Health care/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs
AWS Storage Review
Choice of storage classes
• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data
- Transition Standard to Standard-IA
- Transition Standard-IA to Amazon Glacier
- Expiration lifecycle policy
- Versioning support
- Prefix support
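The lifecycle features listed above can be sketched as a single rule document. This is a minimal illustration, not the presenter's configuration: the helper name `build_lifecycle_rule`, the `videos/` prefix, and the day counts are all assumptions chosen for the example. The dictionary keys follow the shape that S3 lifecycle configurations use.

```python
import json


def build_lifecycle_rule(prefix, ia_days=30, glacier_days=90, expire_days=3650):
    """Build one lifecycle rule (hypothetical helper; day counts are assumed)."""
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        # Prefix support: the rule applies only to keys under this prefix.
        "Filter": {"Prefix": prefix},
        "Transitions": [
            # Standard -> Standard-IA, then Standard-IA -> Glacier.
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        # Expiration lifecycle policy.
        "Expiration": {"Days": expire_days},
        # Versioning support: noncurrent versions are tiered as well.
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": glacier_days, "StorageClass": "GLACIER"}
        ],
    }


config = {"Rules": [build_lifecycle_rule("videos/")]}
print(json.dumps(config, indent=2))
```

A document of this shape is what an SDK call such as boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument.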
Data Lifecycle Management
[Chart: data access frequency plotted from T to T + 365 days]
Data access frequency over time
Cross-Region Replication Lifecycle Policy
Data Classification & Management
Event Notifications
CloudWatch Metrics
S3 Inventory
Audit with CloudTrail Data Events
Storage Analytics
Standard
Standard - Infrequent Access
Amazon Glacier
Amazon S3: What’s New
Data-driven storage management for S3
• Analyze storage usage to transition the right data to the right storage class
• Understand how storage usage changes as your S3 objects get older
• Discover how much of your storage is retrieved over time
Manage your data
Data Classification and Management
Manage data based on what it is as opposed to where it's located
• Easy data management
• Classify your data
• Tag your objects with key-value pairs
• Write policies once based on the type of data
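As a rough sketch of "write policies once based on the type of data", the snippet below builds lifecycle rules that filter on an object tag instead of a key prefix. The tag key `datatype`, its values, the day counts, and both helper names are hypothetical; only the rule shape (a `Filter` containing a `Tag` with `Key` and `Value`) follows the S3 lifecycle configuration format. `matching_rules` is a local illustration of the matching idea, not how S3 evaluates rules internally.

```python
def tag_rule(key, value, glacier_days):
    """Lifecycle rule that applies to every object carrying key=value."""
    return {
        "ID": f"archive-{key}-{value}",
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": key, "Value": value}},
        "Transitions": [{"Days": glacier_days, "StorageClass": "GLACIER"}],
    }


def matching_rules(object_tags, rules):
    """Return the IDs of rules whose tag filter matches the object's tags."""
    matched = []
    for rule in rules:
        tag = rule["Filter"]["Tag"]
        if object_tags.get(tag["Key"]) == tag["Value"]:
            matched.append(rule["ID"])
    return matched


rules = [tag_rule("datatype", "raw-footage", 30), tag_rule("datatype", "logs", 90)]
print(matching_rules({"datatype": "logs", "project": "alpha"}, rules))
```

The point of the design is that the rule never mentions a bucket path, so objects can be reorganized without rewriting the policy.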
Classification
Lifecycle Policy
Access Control
Amazon Glacier
• Extremely low-cost archive storage service, starting at $0.004/GB/month
• 3 retrieval options: Expedited (1-5 minutes), Standard (3-5 hours), Bulk (5-12 hours)
• 99.999999999% durability (5-6 orders of magnitude higher than 2 copies of tape)
• All data is encrypted at rest
• Features: compliance, data management, cost management, audit logging
Glacier: Key Concepts
• Vaults – Container for archives, up to 1,000 vaults per account
• Archives – Basic unit, write-once, 40 TB max, unlimited archives
• Inventory – Cold index of archives refreshed every 24 hours
• Access – Three ways to access Glacier
• Uploads – Multi-part, lifecycle, cost optimizations, Snowball
• Data management – Vault Lock, tagging, audit logs
• Retrievals – Retrieval policies, range retrievals, new feature announcements
Archive Consideration 1 – Total Archive Cost
Traditional archiving approaches
• Tape libraries, robots, drives, media
• Onsite (online and offline)
• Offsite tape out/vaulting
• Specialized software and personnel
• Tape refresh every 3-5 years
How can AWS help with your archival?
Metered usage: pay as you go
No capital investment
No commitment
No risky capacity planning
Avoid risks of physical media handling
Control your geographic locality for performance and compliance
Consideration 2 – Durability
Amazon S3 and Glacier Durability
4 9s durability
5 9s durability
S3 - IA Glacier
11 9s durability
99.999999999% Durability
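To put eleven nines in perspective, here is a small worked calculation. It assumes the figure is read as the probability that a single object survives one year, and that losses are independent across objects; the function names are illustrative only. Exact fractions are used so that no floating-point rounding creeps in.

```python
from fractions import Fraction

# Eleven nines, read as the per-object probability of surviving one year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count):
    """Expected number of objects lost per year under the assumption above."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


print(expected_annual_losses(10_000_000))  # 1/10000
print(years_per_lost_object(10_000_000))   # 10000
```

Under these assumptions, storing 10 million objects corresponds to losing one object roughly every 10,000 years on average.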
Durability for long-term preservation
Built-in Fixity Checking
Automatic recovery
Consideration 3 – Accessibility
Amazon Glacier – Data Retrieval Tiers
Expedited Retrieval ($0.03/GB)
• Emergency access
• 1-5 minutes
• Last-minute play-out schedule swap
Standard Retrieval ($0.01/GB)
• Current model
• 3-5 hours
• Disaster recovery
Bulk Retrieval ($0.0025/GB)
• Batch/bulk access
• 5-12 hours
• PB-scale re-transcoding or video/image analysis
On-site tape replacement Off-site tape replacement
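The trade-off between retrieval speed and price can be sketched as a small calculator. It assumes the three per-GB prices on the slide belong to Expedited, Standard, and Bulk in that order, treats the upper end of each time range as a worst case (an assumption, not a guarantee), and ignores per-request fees. The names `RETRIEVAL_TIERS`, `retrieval_cost`, and `cheapest_tier_within` are hypothetical.

```python
from decimal import Decimal

# Per-GB prices and time ranges as quoted on the slide (2016-era figures).
RETRIEVAL_TIERS = {
    "expedited": {"usd_per_gb": Decimal("0.03"), "max_wait_minutes": 5},
    "standard": {"usd_per_gb": Decimal("0.01"), "max_wait_minutes": 5 * 60},
    "bulk": {"usd_per_gb": Decimal("0.0025"), "max_wait_minutes": 12 * 60},
}


def retrieval_cost(gigabytes, tier):
    """Data retrieval charge in USD, excluding request fees."""
    return Decimal(gigabytes) * RETRIEVAL_TIERS[tier]["usd_per_gb"]


def cheapest_tier_within(deadline_minutes):
    """Cheapest tier whose assumed worst-case wait fits the deadline, or None."""
    eligible = [
        (tier["usd_per_gb"], name)
        for name, tier in RETRIEVAL_TIERS.items()
        if tier["max_wait_minutes"] <= deadline_minutes
    ]
    return min(eligible)[1] if eligible else None


for name in RETRIEVAL_TIERS:
    print(name, retrieval_cost(1000, name))
print(cheapest_tier_within(6 * 60))
```

For 1,000 GB the three tiers come to $30, $10, and $2.50 respectively, which is why bulk retrieval suits large re-processing jobs that can wait overnight.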
Consideration 4 - Application & Data Management
Accessing Glacier
1. S3 lifecycle integration
2. Direct Glacier API/SDK
3. Third party tools and gateways
FastGlacier
Use Glacier via S3 Lifecycle
• S3 Standard: active data, synchronous access, $0.023/GB/mo.
• S3 - Infrequent Access: infrequently accessed data, synchronous access, $0.0125/GB/mo.
• Amazon Glacier: archive data, async access, $0.004/GB/mo.
- Transition Standard to Standard-IA
- Transition Standard-IA to Amazon Glacier
- Transition based on object tags
- Expiration and versioning
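A quick way to see what tiering is worth is to price the same data set with and without lifecycle transitions. The sketch below uses the three per-GB monthly prices quoted on the slide, assuming they belong to S3 Standard, S3 Standard-IA, and Glacier respectively; it ignores request, retrieval, and minimum-duration charges. The data volumes and the function name `monthly_cost` are made up for illustration.

```python
from decimal import Decimal

# Monthly storage prices per GB as quoted on the slide (2017-era figures).
USD_PER_GB_MONTH = {
    "STANDARD": Decimal("0.023"),
    "STANDARD_IA": Decimal("0.0125"),
    "GLACIER": Decimal("0.004"),
}


def monthly_cost(gb_by_class):
    """Total monthly storage cost in USD for a {storage_class: gigabytes} map."""
    return sum(
        (Decimal(gb) * USD_PER_GB_MONTH[cls] for cls, gb in gb_by_class.items()),
        Decimal(0),
    )


all_standard = {"STANDARD": 100_000}
tiered = {"STANDARD": 10_000, "STANDARD_IA": 20_000, "GLACIER": 70_000}
print(monthly_cost(all_standard))  # 2300.000
print(monthly_cost(tiered))        # 760.0000
```

In this hypothetical split, moving 70% of the data to Glacier and 20% to Standard-IA cuts the monthly storage bill from $2,300 to $760.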
Data lifecycle management
[Chart: data access frequency plotted from T to T + 365 days]
Data access frequency over time
Transition older videos to Standard-IA
Glacier Direct Upload – The Basics
1. Create vault
2. Configure access policies
   ArchiveApp user policy
   Effect: Allow
   Resource: arn:aws:glacier:<region>:<accountId>:vaults/Films
   Action: glacier:UploadArchive
3. Upload archives
   UploadArchive(data) -> Archive ID
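The three steps above might look roughly like this in Python. The sketch takes the Glacier client as a parameter so it runs without AWS credentials; it assumes that client behaves like `boto3.client("glacier")`, whose `create_vault` and `upload_archive` calls take the keyword arguments shown. The function names `upload_policy` and `archive_bytes`, the account ID, and the region are placeholders, not values from the talk.

```python
import json


def upload_policy(account_id, region, vault):
    """Step 2: policy document allowing uploads to one vault only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "glacier:UploadArchive",
                "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}",
            }
        ],
    }


def archive_bytes(glacier, vault, data, description=""):
    """Steps 1 and 3: create the vault, upload data, return the archive ID.

    `glacier` is assumed to be a boto3 Glacier client (or a stand-in with the
    same methods). accountId "-" means "the account that owns the credentials".
    """
    glacier.create_vault(accountId="-", vaultName=vault)
    response = glacier.upload_archive(
        accountId="-", vaultName=vault, archiveDescription=description, body=data
    )
    return response["archiveId"]


# Placeholder account and region, for illustration only.
print(json.dumps(upload_policy("123456789012", "us-east-1", "Films"), indent=2))
```

The archive ID is the only handle Glacier returns for the upload, so a direct-API application needs to record it somewhere, for example in its own database.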
Uploading Data: Internet or Sneakernet
AWS Direct Connect
Dedicated bandwidth between your site and AWS
Internet
Transfer data in a secure SSL tunnel over the public Internet
AWS Import/Export Snowball
Physical transfer of media into and out of AWS
AWS Snowball Edge
Petabyte-scale hybrid device with onboard compute and storage
• 100 TB local storage
• Local compute equivalent to an Amazon EC2 m4.4xlarge instance
• 10GBase-T, 10/25Gb SFP28, and 40Gb QSFP+ copper and optical networking
• Ruggedized and rack-mountable
RE:INVENT 2016 LAUNCH
Use cases: AWS Import/Export Snowball
Cloud Migration
Disaster Recovery
Data Center Decommission
Content Distribution
AWS storage migration expansion: AWS Snowmobile
Storage Gateway Enables Hybrid Storage Solutions
Use standard storage protocols to access AWS storage services
Customer Premises
File
Volume
Tape
Amazon EBS snapshots
Amazon S3
Amazon Glacier
AWS IAM
AWS KMS
AWS CloudTrail
Amazon CloudWatch
Internet
Direct Connect
Amazon VPC
NFS
Enterprise storage
Backup servers
Application servers
iSCSI
VTL
Which option should I choose?
• Use S3 lifecycle managed Amazon Glacier if the S3 object keys are sufficient for index/search capability
• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database
• Use 3rd party tools or AWS Storage Gateway to minimize coding
Media Archive Use Case
corporate data center
Media Archive and Metadata (cloud transition)
Onsite Archive Offsite Tape Archive
Hierarchical Storage Manager
Metadata (Asset Manager)
Processing Tasks
On-Premise Tape
Onsite Archive
Hierarchical Storage Manager
Metadata (Asset Manager)
Processing Tasks
corporate data center
AWS Region
Amazon Glacier
Cloud DAM (Syncing Metadata from on-prem)
AWS Direct Connect
Offsite Tape Archive
On-Premise Tape
Media Archive (transition to the cloud)
Onsite Archive
Hierarchical Storage Manager
Metadata (Asset Manager)
Processing Tasks
corporate data center
AWS Region
Amazon Glacier
Cloud DAM (Syncing Metadata from on-prem)
Amazon S3
Cloud Based Processing Tasks
AWS Direct Connect
On-Premise Tape Offsite Tape Archive
Media Archive (transition to the cloud)
Onsite Archive
Hierarchical Storage Manager
Metadata (Asset Manager)
Processing Tasks
corporate data center
AWS Region
Amazon Glacier
Cloud DAM (Syncing Metadata from on-prem)
Amazon S3
Cloud Based Processing Tasks
AWS Direct Connect
Onsite Cache
Offsite Tape Archive
On-Premise Tape
Media Archive (transition to the cloud)
Media Solution: Sony DADC
Problem Statement:
• Challenged by on-prem legacy infrastructure.
• Provide a performant, secure, economical media distribution solution.
• Decrease time to market for their customers' finished content.
Use of AWS:
• EC2 content processing and SWF, SQS, SNS for media workflow automation
• S3 for storage, Glacier for content archive
• CloudFront for OTT.
Business Benefits:
• Workflow pipelines can be run in a highly parallelized fashion through AWS elastic scalability.
• Significantly shorten content delivery SLA with a new AWS-enabled target of 1 hour.
• Fully migrating away from on-prem infrastructure.
On-demand cloud-based media supply chain and delivery solution
• Media distribution backbone (Ve.nue platform)
• Over-The-Top (OTT) broadcast service
• 20 PB of media assets, 1MM+ hours of high-res content
• Assets to be archived and retained for decades
Video archives
Comprehensive media lifecycle
@SonyDADCNMS
“If physical deliveries can happen within one hour based on unpredictable
requests, surely we are able to exceed such expectations digitally”
Sony Migration
The Challenge
• Seamlessly migrate a platform that enables content delivery across all devices and more than 1,200 distribution points worldwide
• Store 20 petabytes of motion picture and television content
• Equating to 1,000,000+ hours of content
• At a growth curve of ~1 petabyte every quarter
Desired Goals:
• One-hour delivery turnaround time
• Agile, scalable, predictable cost model & infrastructure
• Investing in innovation vs. hardware
On-Premise Asset Storage Workflow
AWS Cloud-based Asset Storage Workflow
Glacier vs. On-Prem Cost Comparison
Consideration 5 - Compliance and Retention
Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy
Time-based retention
MFA authentication
Controls govern all records in a vault
Immutable policy
Two-step locking
Compliance storage with Vault Lock
Glacier Vault Lock
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure designated third-party access and grant temporary access
Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
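A Vault Lock policy combining time-based retention with a tag-driven legal hold could be sketched as below. The two conditions (`glacier:ArchiveAgeInDays` and `glacier:ResourceTag/LegalHold`) follow the pattern of the examples in the Glacier documentation; the vault ARN, the 365-day retention period, the statement IDs, and the helper name `vault_lock_policy` are placeholders chosen for illustration.

```python
import json


def vault_lock_policy(vault_arn, retention_days):
    """Deny archive deletion while an archive is young or a legal hold is set."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Time-based retention: archives younger than the retention
                # period cannot be deleted by anyone.
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            },
            {
                # Legal hold: deletion is denied while the vault tag LegalHold
                # is "true" (or present with an empty value).
                "Sid": "deny-delete-under-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "StringLike": {"glacier:ResourceTag/LegalHold": ["true", ""]}
                },
            },
        ],
    }


# Placeholder ARN and retention period.
arn = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
print(json.dumps(vault_lock_policy(arn, 365), indent=2))
```

Because the legal hold is driven by a vault tag rather than by the locked policy text, it can be switched on and off after the policy itself has become immutable.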
Proofpoint
• Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk
• Founded 2002, public in 2012
• $350M annual revenue, $3B market cap
Proofpoint SocialPatrol
• Policy controls and enforcement for social
• Combats fraudulent brand impersonation
• Moderates content at scale
• Ensures compliance in publishing
• Integrates with social APIs
• 150+ classifiers using NLP and ML
  • Text, links, images, metadata
• Ingesting >1M social posts per day
• Built in AWS
Proofpoint SocialPatrol Archive with Glacier
• SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock
PFPT in AWS
Policy engine
MySQL/C*/Solr
Social
Amazon Glacier & Vault Lock
Proofpoint SocialPatrol Archive
• The customer specifies the retention period in Proofpoint Social.
• Via the AWS API, we create a vault for that customer.
• Via the AWS API, we lock the vault and specify a policy to observe a legal hold via a tag.
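The create-then-lock flow described above could be sketched as follows. This is not Proofpoint's code: the client is passed in and is assumed to behave like `boto3.client("glacier")`, whose `create_vault`, `initiate_vault_lock`, `complete_vault_lock`, `abort_vault_lock`, and `add_tags_to_vault` calls take the keyword arguments shown. The `confirm` callback, the `LegalHold` tag name, and both function names are hypothetical.

```python
import json


def lock_vault(glacier, vault, policy, confirm):
    """Create a vault, then apply a Vault Lock policy in two steps.

    Step 1 (initiate) attaches the policy in a test state and returns a lock ID.
    Step 2 (complete) makes the policy immutable; abort discards it instead.
    `confirm` is a caller-supplied check that decides whether to finalize.
    """
    glacier.create_vault(accountId="-", vaultName=vault)
    started = glacier.initiate_vault_lock(
        accountId="-", vaultName=vault, policy={"Policy": json.dumps(policy)}
    )
    lock_id = started["lockId"]
    if confirm(lock_id):
        glacier.complete_vault_lock(accountId="-", vaultName=vault, lockId=lock_id)
        return "Locked"
    glacier.abort_vault_lock(accountId="-", vaultName=vault)
    return "Aborted"


def set_legal_hold(glacier, vault, on):
    """Toggle a legal hold by tagging the vault (tag name is an assumption)."""
    glacier.add_tags_to_vault(
        accountId="-", vaultName=vault, Tags={"LegalHold": "true" if on else "false"}
    )
```

The two-step design gives operators a window to test the policy against real workloads before it becomes permanent, which matters because a completed lock cannot be changed.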
Active-Archive Resources
• Amazon S3: https://aws.amazon.com/s3/
• Amazon S3 Deep Dive (re:Invent 2016): https://www.youtube.com/watch?v=bMhWWkhydFQ&t=249s
• Amazon Glacier: https://aws.amazon.com/glacier/
• Amazon Glacier Deep-Dive (re:Invent 2016): https://www.youtube.com/watch?v=dfr9mBcDJ-U
• WORM Compliance Assessment: https://aws.amazon.com/blogs/aws/glacier-cohasset-assessment/
• Sony Case Study: https://aws.amazon.com/solutions/case-studies/sony-dadc/
• Backup & Archive TCO Calculator: http://www.backuparchive.awstcocalculator.com/
Thank You!