[Gaming on AWS] Big Data Analysis in the Cloud

Big Data Analysis in the Cloud - AWS Korea (정윤진, Solutions Architect)

In the next 30 minutes

1. What is big data

2. Big data on AWS

3. How customers are using AWS

Where is this data coming from?

Human generated

Tweet

Surf the internet

Buy and sell products

Upload images and videos

Play games

Check in at restaurants

Search for cafes

Find deals

Watch content online

Look for directions

Use social media

Machine generated

Networks and security devices

Mobile phones

Cell phone towers

Smart grids

Smart meters

Telematics from cars

Sensors on machines

Videos from traffic and security cameras

What is it used for?

Data for competitive advantage

Customer segmentation

Financial modeling

System analysis

Line-of-sight

Replacing human decisions

Business intelligence

Innovating new business and revenue models

Generation

Collect

Store

Collaboration & sharing

Analysis and Computation

Lower cost, increased throughput

Constraint

Very high barrier to turning data into information.

Infrastructure capacity

Technical Skills

Questions to ask

Cheap experimentation

Amazon Web Services Cloud

Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints

Remove constraints = More experimentation

More experimentation = More innovation

More innovation = Competitive edge

Amazon Web Services

Removes constraints

Focus on your data

Leave undifferentiated heavy lifting to us

HOW

Generation

Collect

Store

Collaboration & sharing

Analysis and Computation

How to move your data into AWS

[Diagram: a corporate data center connects to the AWS Cloud (Virtual Private Cloud; S3, EMR, Redshift) over the Internet, VPN, Direct Connect, Storage Gateway, and AWS Import/Export.]
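Choosing among these transfer paths often starts with a rough estimate of how long a network upload would take. The sketch below is a back-of-the-envelope calculation, not an AWS tool: the function name `transfer_days`, the 80% link utilization, and the example sizes and link speeds are all assumptions made for illustration.

```python
def transfer_days(size_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send size_tb terabytes over a network link.

    Decimal units are used: 1 TB = 10**12 bytes, 1 Mbps = 10**6 bits per second.
    utilization is the assumed fraction of the link that the transfer can use.
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid size, link speed, or utilization")
    total_bits = size_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / effective_bits_per_second / 86_400


if __name__ == "__main__":
    # Hypothetical data sizes (TB) and link speeds (Mbps).
    for size_tb, mbps in [(1, 100), (10, 100), (10, 1000), (100, 1000)]:
        days = transfer_days(size_tb, mbps)
        print(f"{size_tb:>4} TB over {mbps:>5} Mbps: {days:6.1f} days")
```

When the estimate runs to weeks, shipping disks through AWS Import/Export or provisioning a Direct Connect link may be the more practical route.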

[Diagram components: clickstream data from 500+ websites and VoD platform; corporate data center; AWS Import/Export; Amazon Simple Storage Service (S3); Amazon Elastic MapReduce; BI users.]

More than 25 Million Streaming Members

50 Billion Events Per Day

30 Million plays every day

2 billion hours of video in 3 months

4 million ratings per day

3 million searches

Device location, time, day, week, etc.

Social data

10 TB of streaming data per day
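To put these daily totals in perspective, they can be converted to average per-second rates. The snippet below is simple arithmetic on the figures above; it assumes a uniform load across the day and decimal terabytes (1 TB = 10**12 bytes), neither of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


events_per_sec = per_second(50e9)   # 50 billion events per day
plays_per_sec = per_second(30e6)    # 30 million plays per day
bytes_per_sec = per_second(10e12)   # 10 TB of streaming data per day

print(f"events: {events_per_sec:,.0f} per second")
print(f"plays:  {plays_per_sec:,.1f} per second")
print(f"data:   {bytes_per_sec / 1e6:,.1f} MB per second")
```

Real traffic peaks well above these averages, so a collection tier sized from them alone would likely be undersized.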

What is S3?

Highly scalable data storage

Access via APIs

Fast (850K requests per sec)

Highly available & durable (99.999999999% durability)

Economical ($0.095 per GB)*
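The durability and price figures are easier to appreciate with a worked calculation. The sketch below treats 99.999999999% as an annual per-object durability and treats $0.095 per GB as a monthly rate; the slide does not state the billing period, so the monthly reading is an assumption. The function names are illustrative only.

```python
from fractions import Fraction

# 99.999999999% durability leaves a loss probability of 10**-11 per object per year.
ANNUAL_LOSS_PROBABILITY = 1 - Fraction(99_999_999_999, 100_000_000_000)


def expected_objects_lost_per_year(object_count: int) -> Fraction:
    """Expected number of objects lost in one year at the stated durability."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def storage_cost_usd(size_gb: float, price_per_gb: float = 0.095) -> float:
    """Storage cost for size_gb at a flat price per GB (assumed to be per month)."""
    return size_gb * price_per_gb


if __name__ == "__main__":
    lost = expected_objects_lost_per_year(10_000_000)
    print(f"10 million objects: one expected loss every {1 / lost} years")
    print(f"10 TB (10,000 GB): ${storage_cost_usd(10_000):,.2f}")
```

Exact fractions are used for the durability figure because eleven nines sit close to the precision limit of floating-point subtraction.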

Web store

Velocity of data

Amazon DynamoDB

“Who buys video games?”

Per day:

3.5 billion records

13 TB of click stream logs

71 million unique cookies

Results:

500% return on ad spend

17,000% reduction in procurement time

What is EMR?

Hadoop-as-a-service

Map-Reduce engine

Massively parallel

Integrated with tools

Cost effective

AWS wrapper

Integrated with AWS services
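The map-reduce model that EMR runs at scale can be illustrated in a few lines of single-process Python. This is a minimal sketch of the programming model only, not EMR or Hadoop code: the tab-separated `cookie_id<TAB>url` log format and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_clicks(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (cookie_id, 1) for each well-formed 'cookie_id<TAB>url' line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) == 2 and parts[0]:
        yield parts[0], 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: total the counts collected for one cookie."""
    return key, sum(values)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: gather values by key
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    log = ["c1\t/games", "c2\t/news", "c1\t/games/rpg", "malformed line"]
    print(run_mapreduce(log, map_clicks, reduce_counts))
```

On a Hadoop cluster the map and reduce steps run on many nodes at once and the shuffle moves data between them; the logic of each step stays the same.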

Source: http://nerds.airbnb.com/redshift-performance-cost

Table size        Query type            Hive                    Redshift
3 billion rows    Simple range query    1680 seconds (28 min)   360 seconds (6 min)
1 million rows    2 complex joins       182 seconds             8 seconds

$13.60/hour on Redshift versus $57/hour on Hive
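The benchmark figures above combine into a speedup and a rough cost per query. The sketch below uses only the numbers from the slide; charging each query pro rata for its run time is a simplifying assumption for illustration, since a real cluster is billed for as long as it runs.

```python
# Query times in seconds, taken from the table above.
BENCHMARKS = {
    "simple range query, 3 billion rows": {"hive_s": 1680, "redshift_s": 360},
    "2 complex joins, 1 million rows": {"hive_s": 182, "redshift_s": 8},
}
HIVE_USD_PER_HOUR = 57.00
REDSHIFT_USD_PER_HOUR = 13.60


def speedup(hive_s: float, redshift_s: float) -> float:
    """How many times faster Redshift ran than Hive."""
    return hive_s / redshift_s


def query_cost_usd(seconds: float, usd_per_hour: float) -> float:
    """Cost of one query if the hourly rate were charged pro rata (assumption)."""
    return seconds / 3600 * usd_per_hour


if __name__ == "__main__":
    for name, t in BENCHMARKS.items():
        hive_cost = query_cost_usd(t["hive_s"], HIVE_USD_PER_HOUR)
        redshift_cost = query_cost_usd(t["redshift_s"], REDSHIFT_USD_PER_HOUR)
        print(f"{name}: {speedup(t['hive_s'], t['redshift_s']):.1f}x faster, "
              f"${hive_cost:.2f} on Hive vs ${redshift_cost:.2f} on Redshift")
```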

Every day is crucial and costly

Challenge: to run a virtual screen with a higher-accuracy algorithm and 21 million compounds

Metric                   Count
Compute hours of work    109,927 hours
Compute days of work     4,580 days
Compute years of work    12.55 years
Ligand count             ~21 million ligands

Using Cycle Computing and Amazon Web Services: 3 hours for $4,828.85/hr, instead of $20+ million in infrastructure
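The figures in this case study can be cross-checked with a few lines of arithmetic. The sketch below assumes that "3 hours for $4,828.85/hr" means three wall-clock hours billed at that hourly rate, and that a year has 365 days; the variable names are illustrative.

```python
COMPUTE_HOURS = 109_927      # total compute hours of work, from the table
WALL_CLOCK_HOURS = 3         # elapsed run time on AWS
USD_PER_HOUR = 4_828.85      # hourly cost of the cluster

compute_days = COMPUTE_HOURS / 24
compute_years = compute_days / 365
total_cost_usd = WALL_CLOCK_HOURS * USD_PER_HOUR
# Average number of compute hours being worked per wall-clock hour.
average_parallelism = COMPUTE_HOURS / WALL_CLOCK_HOURS

print(f"compute days:        {compute_days:,.0f}")
print(f"compute years:       {compute_years:.2f}")
print(f"total cost:          ${total_cost_usd:,.2f}")
print(f"average parallelism: {average_parallelism:,.0f}")
```

Under these assumptions the whole run costs under $15,000, which is the comparison the slide draws against $20+ million in infrastructure.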

Open web index.

3.4 billion records.

Available to all.

1000 Genomes project

Sample architecture

[Diagram components: users; game traffic; game instances; DB instances; proxy farms; analysis; Amazon EMR; Amazon Redshift; Amazon DynamoDB; Amazon Glacier.]

Thank you! aws.amazon.com/big-data

younjin@amazon.com

May 21st, COEX Intercontinental, Seoul

One-day free training

Walkthrough of services

http://aws.amazon.com/apac/awsday/seoul/