Big Data on AWS: Services Overview

Bernie Nallamotu | Principal Solutions Architect



So what is it?

When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze and share them

Compute Storage Big Data

Challenges start at relatively small volumes

[Figure: data-volume scale from 100 GB to 1,000 PB]

Unconstrained data growth

[Figure: data-volume scale: GB, TB, PB, EB, ZB]

95% of the 1.2 zettabytes of data in the digital universe is unstructured.

70% of this is user-generated content.

Unstructured data growth is explosive, with the compound annual growth rate (CAGR) estimated at 62% from 2008 to 2012.

Source: IDC
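As a rough check on the growth figure, a 62% compound annual growth rate can be converted into a total multiple. This is a minimal sketch that assumes four compounding periods between 2008 and 2012; the function name is illustrative and not from the slides.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` once per year."""
    return (1.0 + cagr) ** years


# 62% is the IDC estimate quoted on the slide; four periods is an assumption.
multiple = growth_multiple(0.62, 2012 - 2008)
print(f"{multiple:.2f}x")  # prints 6.89x
```

Under that assumption, unstructured data would have grown to nearly seven times its 2008 volume by 2012.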

Where does it come from?

• Web sites: blogs, reviews, emails, pictures

• Social graphs: Facebook, LinkedIn, contacts

• Application server logs: web sites, games

• Sensor data: weather, water, smart grids

• Images/videos: traffic, security cameras

• Twitter: 50m tweets/day, 1,400% growth/year

Why AWS and big data?

Innovation across compute and storage: Amazon S3, Amazon DynamoDB, Amazon Redshift, Spot, HPC, EMR

AWS Worldwide Public Sector Team

Big Data Services

• Amazon EMR (Elastic MapReduce): hosted Hadoop framework

• AWS Data Pipeline: move data among AWS services and on-premises data sources

• Amazon Redshift: petabyte-scale data warehouse service

How do you get your slice of it?

• AWS Direct Connect: dedicated low-latency bandwidth

• Queuing: highly scalable event buffering

• AWS Storage Gateway: sync local storage to the cloud

• AWS Import/Export: physical media shipping

Where do you put your slice of it?

• Amazon Relational Database Service (RDS): fully managed database (MySQL, Oracle, MS SQL Server, PostgreSQL)

• Amazon DynamoDB: NoSQL, schema-less, provisioned-throughput database

• Amazon S3: object datastore, up to 5 TB per object, 99.999999999% durability

• Amazon SimpleDB: NoSQL, schema-less, for smaller datasets
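To put the 99.999999999% durability figure in perspective, the sketch below converts it into an expected time between object losses. It assumes the figure is an annual, per-object durability, and the count of 10,000,000 stored objects is a hypothetical chosen for illustration; the function name is likewise illustrative.

```python
from fractions import Fraction


def years_per_lost_object(durability_pct: str, object_count: int) -> Fraction:
    """Average years between single-object losses, given annual durability."""
    annual_loss_rate = 1 - Fraction(durability_pct) / 100
    expected_losses_per_year = annual_loss_rate * object_count
    return 1 / expected_losses_per_year


# Hypothetical store of ten million objects.
years = years_per_lost_object("99.999999999", 10_000_000)
print(years)  # prints 10000
```

Exact fractions are used rather than floats because the loss rate (one in 10^11) sits close to floating-point rounding error.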

Where do you put your slice of it?

• Amazon Glacier: long-term cold storage, from $0.01 per GB/month, 99.999999999% durability

How quick do you need to read it?

[Figure: Scale, Price, Performance]

• Amazon DynamoDB (single-digit ms): social-scale applications, provisioned throughput performance, flexible consistency models

• Amazon S3 (10s-100s ms): any object, any app, 99.999999999% durability, objects up to 5 TB in size

• Amazon Glacier (<5 hours): media & asset archives, extremely low cost, S3 levels of durability
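The read-latency bands on this slide can be treated as a simple selection rule. The sketch below is illustrative only: the numeric upper bounds are assumptions read off the slide's bands, the names `StoreTier` and `slowest_acceptable_store` are hypothetical, and it prefers the slowest tier that still meets the budget on the assumption that slower tiers are positioned as lower cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreTier:
    name: str
    worst_case_read_ms: float  # assumed upper end of the slide's latency band


# Assumed bounds: "single digit ms" -> 10, "10s-100s ms" -> 1000,
# "<5 hours" -> five hours expressed in milliseconds.
TIERS = [
    StoreTier("Amazon Glacier", 5 * 60 * 60 * 1000),
    StoreTier("Amazon S3", 1000),
    StoreTier("Amazon DynamoDB", 10),
]


def slowest_acceptable_store(max_read_ms: float) -> str:
    """Return the slowest tier whose worst-case read still meets the budget."""
    for tier in TIERS:  # ordered slowest first
        if tier.worst_case_read_ms <= max_read_ms:
            return tier.name
    raise ValueError("no tier on the slide meets this latency budget")


print(slowest_acceptable_store(2000))  # prints Amazon S3
```

A real choice would also weigh data model, access pattern and price, which the slide summarises as scale, price and performance.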

Operate at any scale

Unlimited data

[Figure: Scale, Price, Performance]

Data has gravity

…and inertia at volume…

…easier to move applications to the data

[Figure: Data, App, App]

Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/

Bring compute capacity to the data

Very large dataset seeks strong & consistent compute for short-term relationship, possibly longer

Flexible compute resources, on demand

Amazon Elastic Compute Cloud (EC2): basic unit of compute capacity

• Vertical scaling

• From $0.02/hr

• Range of CPU, memory & local disk options

• 27 instance types available, from micro through cluster compute to SSD-backed

Feature: Details

• Flexible: run Windows or Linux distributions

• Scalable: wide range of instance types from micro to cluster compute

• Machine images: configurations can be saved as machine images (AMIs) from which new instances can be created

• Full control: full root or administrator rights

• VM Import/Export: import and export VM images to transfer configurations in and out of EC2

• Monitoring: publishes metrics to CloudWatch

• Inexpensive: On-Demand, Reserved and Spot instance types

• Secure: full firewall control via security groups

Elastic capacity as you need it

Usage patterns: on and off, fast growth, variable peaks, predictable peaks

Without elastic capacity, these patterns produce WASTE and CUSTOMER DISSATISFACTION

Elastic capacity as you need it

[Figure: capacity over time, comparing elastic cloud capacity and traditional IT capacity against your IT needs]
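The gap between fixed capacity and actual need can be made concrete with a small calculation. This is a sketch under stated assumptions: the hourly demand trace is hypothetical (not data from the slides), the function name is illustrative, and the elastic case is idealised as capacity that exactly tracks demand.

```python
def capacity_report(demand, fixed_capacity):
    """Return (wasted, unmet) unit-hours when capacity is fixed.

    Under an idealised elastic model, capacity equals demand in every
    hour, so both numbers would be zero.
    """
    wasted = sum(max(fixed_capacity - d, 0) for d in demand)
    unmet = sum(max(d - fixed_capacity, 0) for d in demand)
    return wasted, unmet


# Hypothetical hourly demand with one variable peak.
demand = [10, 12, 15, 40, 80, 35, 14, 10]
wasted, unmet = capacity_report(demand, fixed_capacity=40)
print(wasted, unmet)  # prints 144 40
```

Raising the fixed capacity to cover the peak removes the unmet demand but increases the idle capacity paid for in every other hour.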


From one instance…

…to thousands
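One way to read the step from one instance to thousands is in instance-hours. The sketch below assumes per-hour billing at the "from $0.02/hr" price quoted earlier in the deck, a perfectly parallel job, and no startup overhead; the workload of 1,000 instance-hours and the function name are hypothetical.

```python
from fractions import Fraction

# Floor price quoted in the deck; real prices depend on the instance type.
PRICE_PER_INSTANCE_HOUR = Fraction("0.02")


def job_cost_and_hours(total_instance_hours: int, instances: int):
    """Cost and wall-clock hours for a perfectly parallel job (an idealisation)."""
    wall_clock_hours = Fraction(total_instance_hours, instances)
    cost = PRICE_PER_INSTANCE_HOUR * total_instance_hours
    return cost, wall_clock_hours


for n in (1, 1000):
    cost, hours = job_cost_and_hours(1000, n)
    print(f"{n} instance(s): ${float(cost):.2f}, {float(hours):g} hour(s)")
# 1 instance(s): $20.00, 1000 hour(s)
# 1000 instance(s): $20.00, 1 hour(s)
```

Under these assumptions the bill is the same either way; only the elapsed time changes.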



AWS EMR – Elastic MapReduce

Amazon Elastic MapReduce

• A key tool in the toolbox to help with ‘Big Data’ challenges

• Makes possible analytics processes that were previously not feasible

• Cost-effective when leveraged with the EC2 Spot market

• Broad ecosystem of tools to handle specific use cases

What is EMR?

• Hadoop-as-a-service

• MapReduce engine

• Integrated with tools

• Massively parallel

• Cost-effective AWS wrapper

• Integrated with AWS services

• HDFS: reliable storage

• MapReduce: data analysis

[Diagram: on one EC2 instance, an input file passes through map and reduce to an output file]

[Diagram: the same map and reduce flow on three EC2 instances, each with its own input file and output file]
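The fan-out across several instances can be imitated on one machine. In this rough sketch, worker threads stand in for EC2 instances and lists of strings stand in for input files; the word-count job, the function names, and the final merge of partial outputs are illustrative additions rather than anything shown on the slides.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_reduce_split(lines):
    """Map (emit one count per word) and reduce (sum per word) for one split."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            counts[word] += 1
    return counts


def run_job(splits):
    """Process each split on its own worker, then merge the partial outputs."""
    with ThreadPoolExecutor(max_workers=len(splits)) as pool:
        partials = list(pool.map(map_reduce_split, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


# Three hypothetical input splits, one per worker.
splits = [["big data on aws"], ["data has gravity"], ["big data services"]]
result = run_job(splits)
print(result["data"], result["big"])  # prints 3 2
```

Because each split is processed independently, adding workers shortens the elapsed time without changing the result.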

Map? Reduce?

Input records:

Person   Start     End
Bob      00:44:48  00:45:11
Charlie  02:16:02  02:16:18
Charlie  11:16:59  11:17:17
Charlie  11:17:24  11:17:38
Bob      11:23:10  11:23:25
Alice    16:26:46  16:26:54
David    17:20:28  17:20:45
Alice    18:16:53  18:17:00
Charlie  19:33:44  19:33:59
Bob      21:13:32  21:13:43
David    22:36:22  22:36:34
Alice    23:42:01  23:42:11

After map (duration in seconds per record):

Person   Duration
Bob      23
Charlie  16
Charlie  18
Charlie  14
Bob      15
Alice    8
David    17
Alice    7
Charlie  15
Bob      11
David    12
Alice    10

After reduce (total seconds per person):

Person   Total
Alice    25
Bob      49
Charlie  63
David    29
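The example on this slide can be reproduced in a few lines. This is a minimal single-process sketch, not how EMR or Hadoop runs the job: the map step turns each (person, start, end) record into a duration in seconds, and the reduce step sums the durations per person. The function names are illustrative.

```python
from collections import defaultdict

# Records copied from the slide: (person, start, end).
RECORDS = [
    ("Bob", "00:44:48", "00:45:11"),
    ("Charlie", "02:16:02", "02:16:18"),
    ("Charlie", "11:16:59", "11:17:17"),
    ("Charlie", "11:17:24", "11:17:38"),
    ("Bob", "11:23:10", "11:23:25"),
    ("Alice", "16:26:46", "16:26:54"),
    ("David", "17:20:28", "17:20:45"),
    ("Alice", "18:16:53", "18:17:00"),
    ("Charlie", "19:33:44", "19:33:59"),
    ("Bob", "21:13:32", "21:13:43"),
    ("David", "22:36:22", "22:36:34"),
    ("Alice", "23:42:01", "23:42:11"),
]


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def map_record(record):
    """Map: turn (person, start, end) into (person, duration in seconds)."""
    person, start, end = record
    return person, to_seconds(end) - to_seconds(start)


def reduce_durations(pairs):
    """Reduce: sum the durations for each person."""
    totals = defaultdict(int)
    for person, duration in pairs:
        totals[person] += duration
    return dict(totals)


totals = reduce_durations(map(map_record, RECORDS))
print(sorted(totals.items()))
# [('Alice', 25), ('Bob', 49), ('Charlie', 63), ('David', 29)]
```

The map step needs only one record at a time, which is what lets it be spread across many machines; the reduce step needs all values for a given person to arrive at the same place.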


AWS Elastic MapReduce Architecture

[Diagram, built up over six slides. Components, in the order they are introduced:]

• Amazon EMR with HDFS

• Amazon S3 and Amazon DynamoDB

• Data management

• Analytics languages: Pig

• Amazon RDS

• Amazon Redshift and AWS Data Pipeline


Useful Resources & Links

• AWS Big Data: http://aws.amazon.com/big-data

• AWS HPC: http://aws.amazon.com/hpc-applications

• Architecture Center: http://aws.amazon.com/architecture

• Documentation: http://aws.amazon.com/documentation

• Security Center: http://aws.amazon.com/security

• Whitepapers: http://aws.amazon.com/whitepapers

• Resources: http://aws.amazon.com/resources

• Case Studies: http://aws.amazon.com/solutions/case-studies

• Solution Providers: http://aws.amazon.com/solutions/global-solution-providers

• Calculator: http://calculator.s3.amazonaws.com/calc5.html

• TCO Calculator: http://aws.amazon.com/tco-calculator

• AWS Blog: http://aws.typepad.com

• The Power of 60: http://www.powerof60.com


Thank you!

Tim Bixler | Manager, Federal Solutions Architecture
