
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Build hybrid storage architectures with AWS Storage Gateway

STG305

Asa Kalavade

AWS Storage Gateway General Manager

Paul Reed

AWS Storage Gateway Principal Product Manager

Mohammad Shaikh

Director of Research Computing, Bristol-Myers Squibb

Oleg Moiseyenko

Sr. Cloud Architect, Bristol-Myers Squibb

Are you faced with these on-premises storage challenges?

• Growing backup infrastructure costs

• Storage capacity limits

• Limited access to in-cloud data

… then you’ve come to the right session.


Agenda

• Storage Gateway overview

• Use cases

• New features deep dive

• Customer case study: BMS

• Summary

AWS Storage Gateway

Provides on-premises access to virtually unlimited cloud storage, regardless of cloud adoption stage:

• Move on-premises backups to the cloud

• Provide low-latency access for on-premises applications to cloud data

• Shift on-premises storage to cloud-backed file shares

AWS Storage Gateway: managing rapidly growing customer datasets and serving more customers every day

• Tens of thousands of customers

• PBs ingested every day

• 100s of PBs managed in-cloud

• Average 96% reduction of on-premises storage

Some AWS Storage Gateway customers

AWS Storage Gateway

Configuration: VMware ESXi, Microsoft Hyper-V, Amazon Elastic Compute Cloud (Amazon EC2), or the Hardware Appliance

Integrated with AWS Identity and Access Management (IAM), AWS Key Management Service (AWS KMS), AWS CloudTrail, and Amazon CloudWatch

Diagram: on the customer premises, the gateway exposes Files (NFS/SMB), Volumes (iSCSI), and Tapes (iSCSI VTL). It connects over HTTPS to the Storage Gateway service in the AWS Cloud, which stores data in Amazon S3, Amazon S3 Glacier, Amazon S3 Glacier Deep Archive, and Amazon Elastic Block Store (Amazon EBS), with AWS Backup integration.

File Gateway: Store and access objects in Amazon S3 from file-based applications with local caching

Features

• Low-latency cached access to data in Amazon S3

• Support for NFS (POSIX) and SMB file shares (Windows ACLs)

• One-to-one mapping between files and objects in S3

Diagram: on premises, an application reaches the File Gateway over NFS & SMB; the gateway connects over HTTPS to the Storage Gateway service and an Amazon S3 bucket.
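The one-to-one mapping means each file on the share corresponds to a single S3 object whose key mirrors the file's path. A minimal sketch of that mapping, assuming the share is rooted either at the bucket root or at an optional key prefix; `object_key_for` and its `prefix` parameter are hypothetical names for illustration, not Storage Gateway API fields.

```python
from pathlib import PurePosixPath


def object_key_for(share_path: str, prefix: str = "") -> str:
    """Return the S3 object key that would back a file on a File Gateway share.

    Sketch under assumptions: one file maps to one object, the key is the
    file's path relative to the share root, and `prefix` (hypothetical) is an
    optional key prefix the share is rooted at.
    """
    # Treat the path as POSIX and drop any leading "/" so it is share-relative.
    rel = PurePosixPath("/", share_path).relative_to("/")
    if not rel.parts:
        raise ValueError("path must name a file")
    if ".." in rel.parts:
        raise ValueError("path must stay inside the share")
    key = str(rel)
    # Join the optional prefix with exactly one "/" separator.
    return f"{prefix.rstrip('/')}/{key}" if prefix else key
```

For example, `object_key_for("reports/q1.csv", "team-a/")` yields `team-a/reports/q1.csv`, which is why S3-native tools can read files written through the share without any proprietary format.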

Volume Gateway: Block storage on-premises backed by cloud storage

Features

• Presents block storage over iSCSI in cached mode (recently accessed data kept locally) or stored mode (full volume kept locally)

• Cost-efficient incremental Amazon EBS snapshots of volumes, managed through AWS Backup

• Compresses data between the gateway and the cloud to minimize storage charges

Diagram: on premises, an application reaches the Volume Gateway over iSCSI; the gateway connects over HTTPS to the Storage Gateway service, which keeps volume backups as Amazon EBS snapshots.
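Volume Gateway snapshots are incremental: after the first snapshot, only blocks that changed since the previous one are stored, which is what keeps them cost-efficient. The sketch below illustrates that idea generically with per-block hashes; it is not how Storage Gateway or Amazon EBS track changes internally, and the 4-byte block size is an arbitrary choice to keep the example readable.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # illustrative only; real volumes use far larger blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of a volume image."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: List[str], current: List[str]) -> List[int]:
    """Indices of blocks that differ from the previous snapshot or are new.

    An incremental snapshot only needs to store these blocks; unchanged
    blocks are referenced from the earlier snapshot.
    """
    return [
        i for i, h in enumerate(current)
        if i >= len(previous) or previous[i] != h
    ]
```

With `v1 = b"AAAABBBBCCCC"` and `v2 = b"AAAAXXXXCCCCDDDD"`, only blocks 1 (rewritten) and 3 (appended) would be stored in the second snapshot.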

Tape Gateway

Features

• Emulates a physical tape library through the iSCSI-VTL protocol

• Compatible with most major backup applications

• Archives virtual tapes in S3 Glacier Deep Archive (the lowest-cost cloud storage) or in S3 Glacier

Diagram: on premises, an application reaches the Tape Gateway over iSCSI VTL; the gateway connects over HTTPS to the Storage Gateway service. Tapes sit in the tape library (Amazon S3), and archived tapes move to the tape shelf (S3 Glacier Deep Archive or S3 Glacier).

Learn more … STG217 – Shift your tape backups to AWS to save time and money

Tuesday, Dec 3, 5:30 PM - 6:30 PM

What’s new since re:Invent 2018

• File Gateway, Volume Gateway, and Tape Gateway

• Hardware appliance

• Enterprise features

• Regions: currently available in 20 Regions, including China (Beijing) and AWS GovCloud (US-West)

Limited-time Cyber Monday incentive for the Hardware Appliance


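Move on-premises backups to the cloud

Maintain your backup workflows while reducing your backup infrastructure on-premises

Diagram: on premises, database and application servers write backups over NFS/SMB to a File Gateway or over iSCSI to a Volume Gateway, and a backup server writes over iSCSI VTL to a Tape Gateway. Each gateway connects over HTTPS to the Storage Gateway managed service in the AWS Cloud. File data lands in Amazon S3 and can be lifecycled to any S3 storage class; volume data is kept as Amazon EBS snapshots managed with AWS Backup; virtual tapes sit in a tape library (Amazon S3) and are ejected to a tape archive (S3 Glacier / S3 Glacier Deep Archive).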

File Gateway for on-premises backup: Move database and file backups into the cloud and free up on-premises storage capacity

Features

• NFS/SMB protocol support; mount shares directly on database and application servers

• Files stored durably in Amazon S3, with lifecycle policies to any S3 storage class

• Local cache for accessing recent backups

• Windows ACL support to control access to backup files

• Support for S3 Object Lock

• Bandwidth-optimized: only changes are transferred

Benefits

• Reduce on-premises storage for backups

• Easily integrates with SAP, SQL Server, Oracle, HDFS, and other applications

• Restore backups on-premises or in the cloud on Amazon EC2 or Amazon RDS

Diagram: on premises, an app/DB server writes over NFS/SMB to the File Gateway, which connects over HTTPS to Amazon S3 in the AWS Cloud; lifecycle rules move data to any S3 storage class.
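“Lifecycle to any S3 storage class” refers to S3 lifecycle rules on the bucket behind the file share. A sketch, assuming a backup prefix and transition ages chosen only for illustration, that builds a configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`; nothing is sent to AWS here, and `backup_lifecycle_config` is a hypothetical helper name. Keep in mind that objects moved to S3 Glacier or S3 Glacier Deep Archive generally have to be restored before the gateway can read them again.

```python
def backup_lifecycle_config(prefix: str, days_to_glacier: int, days_to_deep_archive: int) -> dict:
    """Build an S3 lifecycle configuration that tiers old backups down.

    Illustrative values only; check the S3 documentation for minimum
    storage durations and transition charges before using real numbers.
    """
    if not 0 < days_to_glacier < days_to_deep_archive:
        raise ValueError("expected 0 < days_to_glacier < days_to_deep_archive")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},  # only objects written under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"},
                    {"Days": days_to_deep_archive, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }
```

Usage (not executed here): `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-backup-bucket", LifecycleConfiguration=backup_lifecycle_config("sql-backups/", 30, 180))`, where the bucket name is a placeholder.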

Volume Gateway for on-premises backup: Enable faster application recovery in-cloud or on-premises

Features

• Present cloud-based iSCSI block storage volumes to on-premises applications

• On-premises cache of recently accessed data

• Back up volumes as EBS snapshots

• Integrates with AWS Backup to coordinate volume backup and retention

Benefits

• Store volume backups securely and reliably

• Restore backups on-premises or in the cloud as EBS volumes

Diagram: on premises, an application server connects over iSCSI to the Volume Gateway, which connects over HTTPS to Amazon S3, Amazon EBS, and AWS Backup in the AWS Cloud.

Tape Gateway for on-premises backup: Replace physical tape infrastructure with virtual tape workflows

Features

• iSCSI VTL interface compatible with leading backup applications

• Active tapes stored in Amazon S3

• Ejected tapes stored in S3 Glacier or S3 Glacier Deep Archive

• Automatic fixity checking

• Data compressed and encrypted, in transit and at rest

Benefits

• Drop-in replacement for tape libraries, tape media, and archiving services

• Maintain existing backup workflows

• Eliminate the hassles of physical tape

• Store archived tapes durably and reliably in Amazon S3 Glacier Deep Archive for $1/TB/month

Diagram: on premises, a backup server connects over iSCSI VTL to the Tape Gateway, which connects over HTTPS to the tape library (Amazon S3) in the AWS Cloud; ejected tapes move to the tape archive (S3 Glacier / S3 Glacier Deep Archive).
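To make the $1/TB/month figure concrete: archive storage cost grows with everything ejected so far, not just the latest month. A small worked sketch, assuming a steady amount of tape data ejected each month and the slide's flat price; real pricing varies by Region and this ignores retrieval, request, and early-deletion charges. `archive_cost_usd` is a hypothetical helper.

```python
def archive_cost_usd(tb_ejected_per_month: float, months: int,
                     price_per_tb_month: float = 1.0) -> float:
    """Cumulative archive storage cost when a fixed amount of tape data is
    ejected to the archive every month (storage charges only)."""
    total = 0.0
    stored_tb = 0.0
    for _ in range(months):
        stored_tb += tb_ejected_per_month      # this month's ejected tapes join the archive
        total += stored_tb * price_per_tb_month  # pay for everything archived so far
    return total
```

For example, ejecting 10 TB per month for a year gives 10 + 20 + … + 120 = 780 TB-months, so `archive_cost_usd(10, 12)` returns 780.0, about $780 of storage for the first year.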

Analog Devices is a world leader in the design, manufacture, and marketing of a broad portfolio of high-performance analog, mixed-signal, and digital signal processing (DSP) integrated circuits (ICs) used in virtually all types of electronic equipment.

Problem

• Backing up to physical tapes, sent off-site

• Lengthy, unreliable recovery of data from tapes

• No new backup budget approved

• Couldn’t disrupt their existing operations

Solution

• EMC NetWorker connected to Tape Gateway

• Backups stored in a Virtual Tape Library (VTL) on Amazon S3

• Archive to Amazon S3 Glacier

• No change in backup workflow

Outcome

• 50% cost reduction

• Ran backups in parallel for one year, then turned off physical tape

• Phased out off-site archive in 3 months

The world's leading cereal company, 2nd-largest producer of cookies, crackers, and savory snacks, and leading North American frozen foods company

Problem

• Migrating datacenters & applications to AWS

• Many on-premises databases and assets to migrate, back up, and archive

• High backup costs with commercial software

Solution

• Install File Gateways for backup of SAP on Oracle environments, hybrid backups, and archives of SQL databases, Hadoop clusters, and other applications

• Keep on-premises access to in-cloud data

Outcome

• ~90% reduction in backup costs, eliminating backup software

• With a few TB of storage on premises, get access to 100s of TB of storage and backups in the cloud

Shift on-premises storage to cloud-backed file shares: Access virtually unlimited, highly durable cloud storage using common file protocols

Features

• Supports NFS and SMB protocols, so no application changes are required

• Files stored durably in Amazon S3

• SMB shares integrate with Active Directory

• Amazon CloudWatch events for automated workflows

Benefits

• Reduce costs by moving storage to Amazon S3 while keeping on-premises access

• Virtually unlimited cloud storage: no more running out of capacity

• Eliminate expensive hardware refresh cycles

Diagram: on-premises NAS storage is shifted to a File Gateway; the application reaches the gateway over NFS/SMB, and the gateway connects over HTTPS to Amazon S3 in the AWS Cloud.

With more than 40,000 auto dealer clients across five continents, we strive to understand your needs by pairing our insights and research with your business goals, delivering inspired results to bridge the gap between consumers, manufacturers, dealers, and lenders at every stage of the automotive experience

Problem

• Stacks of disk arrays on-premises were expensive and required a lot of space

• Complex architecture and cache hierarchy

• Many readers via NFS

Solution

• AWS DataSync to transfer bulk data and active datasets to the cloud

• File Gateway for local access to cloud data

• Active/active multi-Region setup and versioning with lifecycle policies

Outcome

• $1M bandwidth cost savings

• Saved ~85% on storage, per location

• Storage engineers focused on high-value activities

Learn more … STG354 – Large-scale file migrations with AWS DataSync

Thursday, Dec 5, 3:15 PM - 4:15 PM

Low-latency access for on-premises applications to cloud data: Access files quickly from distributed locations and scale capacity as needed

Features

• Generate data in the cloud or ingest it from on-premises using AWS DataSync or AWS Snowball

• Up to 16 TB local cache per gateway

• Fully managed gateway cache provides low-latency access to data

• Refresh the cache at the bucket or prefix level

Benefits

• Access cloud storage from any on-premises location

• Process data in the cloud and refresh the gateway cache for up-to-date results

• Data stored cost-effectively and centrally in the cloud

Diagram: data is ingested into the AWS Cloud with AWS DataSync or AWS Snowball and processed in the cloud; several on-premises sites each run a File Gateway that serves applications over NFS/SMB and receives cache refreshes over HTTPS.
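Refreshing the cache at the bucket or prefix level corresponds to the Storage Gateway RefreshCache API, which takes a file share ARN, a list of folders, and a recursive flag. This sketch only builds those arguments so it runs offline; the prefix-to-folder conversion, the helper name `refresh_cache_request`, and the example ARN are assumptions for illustration. A typical pattern would be to call it after an in-cloud job writes new results, so on-premises readers see them.

```python
from typing import List, Optional


def refresh_cache_request(
    file_share_arn: str,
    prefixes: Optional[List[str]] = None,
    recursive: bool = True,
) -> dict:
    """Build keyword arguments for a File Gateway cache refresh.

    `prefixes` are S3 key prefixes relative to the share root, such as
    "results/run-42/"; None means refresh from the share root.
    """
    folders = []
    for prefix in prefixes or [""]:
        # Assumed folder format: share-relative path starting with "/"; root is "/".
        folders.append("/" + prefix.strip("/"))
    return {
        "FileShareARN": file_share_arn,
        "FolderList": folders,
        "Recursive": recursive,
    }


# Sending it (not executed here; needs credentials and a real file share):
#   import boto3
#   params = refresh_cache_request(share_arn, ["results/run-42/"])
#   boto3.client("storagegateway").refresh_cache(**params)
```

Refreshing only the prefix a job wrote to, rather than the whole share, keeps the listing work the gateway has to do proportional to what actually changed.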

The world's leading and most diverse derivatives marketplace

Problem

• Data stored on premises for regulatory and performance reasons

• Moved application data to Amazon S3, but developers still needed file-based access

• Required a high level of security, encryption, and scalability

Solution

• Deployed multiple file gateways to provide ready access to cloud data

• Use gateways for granular control over data stored in Amazon S3

Outcome

• Preserve developer access to frequently used data

• Use native tools with no proprietary formats

• No coding required: works with existing protocols and OS-level commands


New features deep dive

High availability on VMware: Feature overview and benefits

For VMware-based gateways running on premises or in VMware Cloud on AWS

What is it

• High availability for all gateway types running on VMware

• Gateway health checks integrated with VMware provide application-level monitoring, including:

  • NFS/SMB file share availability

  • iSCSI availability

  • Configuration errors, e.g., read-only root disks

• Gateway restarts on service interruption

What are its benefits

• Enterprise workloads operate uninterrupted

• VMware HA protects workloads against hardware, hypervisor, and network errors

• The gateway automatically recovers from most service interruptions in under 60 seconds and maintains its local cache

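How does it work: Gateway recovery for software, hardware, and datacenter failure scenarios

Diagram: three scenarios. Software failure: the gateway recovers on its VMware host. Hardware failure: the gateway fails over to another VMware host. Datacenter failure: the gateway fails over from a VMware host in the corporate datacenter to a VMware host in the DR datacenter.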

CloudWatch integration: For all environments

Monitor all of your gateways from the console, and trigger actions and notifications based on events and metrics.

• Real-time visibility into cache utilization, gateway access patterns, and throughput and I/O metrics through CloudWatch integration

• Administrators can monitor performance and cache metrics to tune resources based on application needs

• A high “Cache Percent Dirty” can prompt an increase in network allocation

• High “Cloud Traffic” can prompt an increase in cache size

Diagram: gateways in the corporate datacenter report to CloudWatch in the AWS Cloud.
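The two tuning rules above can be expressed as a small check over CloudWatch datapoints. A sketch, assuming “Cache Percent Dirty” is the `CachePercentDirty` metric and treating “Cloud Traffic” as `CloudBytesDownloaded` (our mapping, not stated on the slide); the thresholds and the helper name `tuning_hints` are illustrative, not AWS guidance. Fetching the datapoints (for example with CloudWatch `get_metric_statistics`) is left out so the sketch runs offline.

```python
from typing import List, Mapping

GIB = 1024 ** 3


def tuning_hints(
    metrics: Mapping[str, float],
    dirty_pct_threshold: float = 80.0,
    download_bytes_threshold: float = 100 * GIB,
) -> List[str]:
    """Turn two Storage Gateway metrics into tuning suggestions.

    `metrics` holds one value per metric name for the same period, e.g. the
    average CachePercentDirty (percent) and the sum of CloudBytesDownloaded
    (bytes). Thresholds are illustrative assumptions.
    """
    hints = []
    # Much unsent data in the cache: uploads are not keeping up with writes.
    if metrics.get("CachePercentDirty", 0.0) >= dirty_pct_threshold:
        hints.append("increase network bandwidth to the gateway")
    # Heavy reads from the cloud: the working set likely exceeds the cache.
    if metrics.get("CloudBytesDownloaded", 0.0) >= download_bytes_threshold:
        hints.append("increase the gateway cache size")
    return hints
```

Wiring a function like this to a CloudWatch alarm or scheduled job is one way to get the "trigger actions and notifications" behavior described above.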

Additional maintenance window options: For all environments

Gateway software updates are managed automatically for customers, with granular control over maintenance windows to meet the uptime requirements of enterprise-wide applications that need to operate without interruption:

• Day of the week: available now

• Day of the month: available now

• Day of the week of the month: coming soon

• A day every # weeks: coming soon
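To see how the two options available now behave, the sketch below computes the next window start for a "day of the week" or "day of the month" rule. It assumes the numbering we understand the UpdateMaintenanceStartTime API to use (DayOfWeek 0 to 6 with 0 = Sunday, DayOfMonth 1 to 28); verify against the API reference. `next_window` is a hypothetical helper, and time zones are ignored.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_window(now: datetime, hour: int, minute: int,
                day_of_week: Optional[int] = None,
                day_of_month: Optional[int] = None) -> datetime:
    """Next maintenance start strictly after `now` for a weekly or monthly rule.

    day_of_week: 0 = Sunday ... 6 = Saturday (assumed convention)
    day_of_month: 1..28 (assumed range, so every month has the day)
    Exactly one of the two must be given.
    """
    if (day_of_week is None) == (day_of_month is None):
        raise ValueError("give exactly one of day_of_week or day_of_month")
    if day_of_week is not None and not 0 <= day_of_week <= 6:
        raise ValueError("day_of_week must be 0..6")
    if day_of_month is not None and not 1 <= day_of_month <= 28:
        raise ValueError("day_of_month must be 1..28")
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    for _ in range(62):  # two months of days always contains a match
        if day_of_week is not None:
            # Python's weekday() has Monday = 0; shift so Sunday = 0.
            matches = (candidate.weekday() + 1) % 7 == day_of_week
        else:
            matches = candidate.day == day_of_month
        if matches and candidate > now:
            return candidate
        candidate += timedelta(days=1)
    raise RuntimeError("no matching day found")
```

For example, from Tuesday, Dec 3, 2019 at 10:00, a Sunday 02:00 rule next fires on Dec 8, 2019, while a "3rd of the month at 02:00" rule has already passed for December and next fires on Jan 3, 2020.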


Customer case study: Bristol-Myers Squibb

Storage Gateway applications in life sciences

Mohammad Shaikh

Director of Research Computing, Bristol-Myers Squibb

Oleg Moiseyenko

Sr. Cloud Architect, Bristol-Myers Squibb

Our mission

To discover, develop, and deliver innovative medicines that help patients prevail over serious diseases

Scientific Computing Services

It’s all about data, Big Data: from GB to PB scale

Major data sources

• Raw data from labs

• Scratch space

• Results data

• External collaborations

• Public & government agencies

• R&D

Scientific data sets

• NGS data

• Proteomics

• Flow cytometry

• Imaging data

• High-throughput screening

• Mass spectrometry

• Databases

Chart: exponential data growth from 2016 to 2020, reaching tens of PBs

Our data sources

High-velocity and continuous sources

• Illumina sequencers (genomics data)

• Nuclear Magnetic Resonance (NMR)

• Many others

High-volume sources

• High-resolution mass spectrometer (proteomics)

• AT2 tissue microscope (histology)

• High-content screening

Intermediate storage

• NAS drive, NFS-based metadata

• Only POSIX metadata captured

• Business metadata: relationships need to be enriched on S3

Hybrid file use cases: Data transfer, analytics, ML

Lab to Cloud (NMR, Histology, NGS)

• Instrument data

• Metadata catalog in the cloud

• Downstream analysis

Machine Learning (ML) analysis in cloud/visualization in Labs (Flow Cytometry)

• Instrument data to cloud

• ML-based analytics, unsupervised learning models

• Visualization of scientific data

Image management analysis in cloud

• Specialized scientific data formats

• Data enrichment

• Downstream analytics

Typical data flow diagram

1. Instruments write raw data to a File Gateway file share

2. File Gateway transfers the files to S3 buckets

3. The Data Management system scans the S3 buckets regularly

4. Applications request data through the Data Management system's metadata catalog

Diagram: BMS scientific instruments write to the File Gateway, which connects over AWS Direct Connect (10 Gb/s) to the S3 buckets; the Data Management System catalogs the buckets and serves the applications.
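Step 3, where the data management system scans the S3 buckets regularly, can be sketched as an incremental scan that catalogs only objects modified since the previous pass. The listing is passed in as plain records so the sketch runs offline; in practice it might come from paginated boto3 `list_objects_v2` calls, whose `Contents` entries carry `Key`, `Size`, and `LastModified`. The catalog layout and the `instrument` tag taken from the first key component are hypothetical conventions, not BMS's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ObjectInfo:
    """Minimal view of one S3 object, as a bucket listing would return it."""
    key: str
    size: int
    last_modified: datetime


def incremental_scan(
    listing: Iterable[ObjectInfo],
    catalog: Dict[str, dict],
    last_scan: datetime,
) -> Tuple[int, datetime]:
    """Catalog objects modified after `last_scan`.

    Returns (number of entries added or updated, new high-water mark).
    """
    updated = 0
    watermark = last_scan
    for obj in listing:
        if obj.last_modified <= last_scan:
            continue  # already cataloged by an earlier pass
        catalog[obj.key] = {
            "size": obj.size,
            # Hypothetical convention: the first key component names the instrument.
            "instrument": obj.key.split("/", 1)[0],
            "modified": obj.last_modified.isoformat(),
        }
        updated += 1
        watermark = max(watermark, obj.last_modified)
    return updated, watermark
```

Storing the returned watermark and passing it to the next scan keeps each pass proportional to new instrument output rather than to the whole bucket.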

AWS Storage Gateway in Image Discovery

Diagram: on premises, scientific instruments and scientists work with images on local servers (NFS), which form a local storage layer, alongside an image metadata database. A Storage Gateway hardware appliance (over AWS Direct Connect, 10 Gb/s) and AWS Snowball carry the images to the BMS AWS Cloud. There, the images land in an S3 object store (S3 buckets 1, 2, 3 and A, B through N, N+1); the Data Management System (metadata catalog) indexes them; image analysis tools and image transformation process them; and transformed images go to a separate S3 bucket. A collaborator's AWS Cloud is also connected to the flow.

Outcomes for BMS

Tech

• Integration across standard protocols

• Low latency

• Efficient data transfer

• Easy to deploy: virtual and hardware storage gateways

• Data replication

• Encryption in transit

Business

• Cost and elasticity

• Support for many old and new applications

• Overall simplicity

• Effective workflow automation

• Secure data sharing

Plan Storage Gateway deployment

Preparing for Storage Gateway

• S3 buckets

• Access policies

• File shares

• Mounting instructions

• Data transfers

Preparing for metadata catalog

• Collection names

• Directory names

• Data sources, daily volumes, formats

• Business data tags and rules

• Access requirements

• Shared directory needs

• Data scan frequency

• Access to metadata catalog

AWS Storage Gateway hardware appliance

Appliance details

The hardware appliance comes with AWS Storage Gateway software preinstalled on a validated configuration of a Dell EMC PowerEdge R640XL server:

• 2 x Intel Xeon Silver 4114 2.20 GHz processors with 10 cores each

• 128 GB DDR4 RAM

• 5 TB of usable enterprise SSD storage, with the option to add 7 TB of usable enterprise SSD storage for a total of 12 TB

• 4-port 10 Gigabit copper network card, with the option to purchase and use a 4-port 10 Gigabit fiber-optic network card

• 3 years of hardware support from Dell, accessed and coordinated through your normal AWS support channels

Hardware appliance facts:

• You own it!

• Secure local installation

• Low latency

• Data compression

• Suitable for legacy applications

• Provides local applications access to S3 storage

• Price range: $12K–$16K USD

Current limitations:

• One gateway type per appliance

• 5 TB usable storage (extendable up to 12 TB)

• Software RAID

• Intel X710 4-port 10 Gigabit fiber-optic network card

• AWS Direct Connect is recommended

• Local proxy servers

Lessons learned

• Optimizing AWS Storage Gateway: compute, storage, cache size

• Do not oversubscribe the CPUs of the host server (4-16-24 vCPUs)

• Don’t mix upload buffer disks and cache storage

• Use a high-performing RAID configuration for data store disks

• Cache disk configuration: proxy server vs. AWS Direct Connect

• IP addresses, ports, and firewall rules

• Live test from actual scientific instruments

• Caution while sharing same S3 bucket through different AWS Storage Gateways

• Software-based RAID (no hardware RAID option?)

• Direct Connect links

• Storage Gateway, data governance and reliability

• Support channels and security

Preventing multiple file shares from writing to the same S3 bucket

When you create a file share, we recommend that you configure your Amazon S3 bucket so that only one file share can write to it. If you configure your S3 bucket to be written to by multiple file shares, unpredictable results can occur.

To prevent this, create an S3 bucket policy that denies every role except the role used by the file share permission to put or delete objects in the bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyMultiWrite",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:DeleteObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::TestBucket/*",
      "Condition": {
        "StringNotLike": {
          "aws:userid": "TestUser:*"
        }
      }
    }
  ]
}
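When a file share's IAM role is assumed, the `aws:userid` condition key has the form `role-unique-id:session-name`, so in practice the pattern should use the role's unique ID (it starts with `AROA`) rather than the role name. The sketch below, assuming a hypothetical role ID `AROAEXAMPLEID` and a lowercase placeholder bucket name (S3 bucket names must be lowercase), generates the same Deny statement and includes a tiny local check of which callers it would block. The check only mimics this one statement with `fnmatch`-style wildcards; it is not IAM's policy evaluator.

```python
import json
from fnmatch import fnmatchcase


def deny_multi_write_policy(bucket: str, allowed_userid_pattern: str) -> dict:
    """Bucket policy denying PutObject/DeleteObject to everyone except callers
    whose aws:userid matches the file share role's pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyMultiWrite",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:DeleteObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {"StringNotLike": {"aws:userid": allowed_userid_pattern}},
        }],
    }


def is_write_denied(policy: dict, action: str, userid: str) -> bool:
    """Local approximation of the single Deny statement above.

    StringNotLike is case-sensitive with * and ? wildcards; fnmatchcase
    matches that for these patterns (it also treats [...] specially, IAM does not).
    """
    stmt = policy["Statement"][0]
    if action not in stmt["Action"]:
        return False  # the statement does not cover this action
    pattern = stmt["Condition"]["StringNotLike"]["aws:userid"]
    return not fnmatchcase(userid, pattern)
```

Usage (not executed here): `boto3.client("s3").put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(deny_multi_write_policy("example-bucket", "AROAEXAMPLEID:*")))`.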


Question Time

Asa Kalavade, AWS Storage Gateway General Manager, kalavade@amazon.com

Paul Reed, AWS Storage Gateway Principal Product Manager, paulreed@amazon.com

Mohammad Shaikh, Director of Research Computing, Bristol-Myers Squibb

Oleg Moiseyenko, Sr. Cloud Architect, Bristol-Myers Squibb

Take action

• Choose your gateway type: File (NFS/SMB), Volume (iSCSI), or Tape (iSCSI VTL)

• Deploy a Storage Gateway VM

• Try it out: start using cloud storage on-premises with Amazon S3, Amazon S3 Glacier, Amazon S3 Glacier Deep Archive, and Amazon EBS

Learn more … aws.amazon.com/storagegateway

Learn more about hybrid cloud storage in these sessions

• STG231 Lift and shift your tape-based backup workflows to AWS

• STG226 Hands-on with hybrid block storage using a Volume Gateway

• STG217 Shift your tape backups to AWS to save time and money

• STG213 Storage for hybrid cloud and edge computing: Bring AWS to you

• STG313 Hybrid architectures for database backups & file migrations

• STG336 Using hybrid cloud storage to close a data center and migrate

Thank you!
