Opening Keynote at ZDNet Advanced Computing Conference by Abhishek Sinha (Business Development Manager APAC)
Cloud
What is big data
Data analysis Pipeline
How customers are using the pipeline
Big data: when your data sets become so large that you have to start innovating in how you collect, store, organize, analyze and share them
What does big data look like?
The 3 Vs: Volume, Velocity, Variety
Where is this data coming from?
Human generated:
Tweet
Surf the internet
Buy and sell products
Upload images and videos
Play games
Check in at restaurants
Search for cafes
Find deals
Watch content online
Look for directions
Use social media
Machine generated:
Networks and security devices
Mobile phones
Cell phone towers
Smart grids
Smart meters
Telematics from cars
Sensors on machines
Videos from traffic and security cameras
What is it used for?
Data for competitive advantage
Customer segmentation
Financial modeling
System analysis
Line-of-sight
Replacing human decisions
Business intelligence
Innovating new business and revenue models
Generation
Collect
Store
Collaboration & sharing
Analysis and Computation
Lower cost, increased throughput
constraint
Very high barrier to turning data into information.
Infrastructure capacity
Technical Skills
Questions to ask
Cheap experimentation
Amazon Web Services Cloud
Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
= Remove constraints
Remove constraints = More experimentation
More experimentation = More innovation
More innovation = Competitive edge
Amazon Web Services
Removes constraints
Focus on your data
Leave undifferentiated heavy lifting to us
HOW
[Diagram: Corporate data center, AWS Import/Export, Amazon Simple Storage Service (S3), Amazon Elastic MapReduce, BI users. Input: clickstream data from 500+ websites and VoD platform.]
More than 25 million streaming members
50 billion events per day
30 million plays every day
2 billion hours of video in 3 months
4 million ratings per day
3 million searches
Device location, time, day, week, etc.
Social data
10 TB of streaming data per day
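To give a rough sense of scale, the daily figures above can be converted into per-second averages. This is a minimal Python sketch, assuming load is spread evenly over a 24-hour day (real traffic peaks, so these are averages only) and that 1 TB means 10^12 bytes. The function and variable names are illustrative and not from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Average per-second rate for a quantity reported per day."""
    return daily_total / SECONDS_PER_DAY


# Daily figures taken from the slide; even distribution is an assumption.
events_per_sec = per_second(50e9)             # 50 billion events per day
plays_per_sec = per_second(30e6)              # 30 million plays per day
stream_mb_per_sec = per_second(10e12) / 1e6   # 10 TB per day, in MB/s

print(f"{events_per_sec:,.0f} events/s")
print(f"{plays_per_sec:,.0f} plays/s")
print(f"{stream_mb_per_sec:,.2f} MB/s of streaming data")
```

Under these assumptions the event stream averages several hundred thousand events per second, which is the "velocity" dimension of the 3 Vs.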
What is S3?
Highly scalable data storage
Access via APIs
Fast (850K requests per sec)
Highly available & durable (99.999999999% durability)
Economical ($0.095 per GB)*
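The S3 figures above can be turned into concrete numbers with a little arithmetic. This is a minimal Python sketch under stated assumptions: the durability figure is read as the yearly probability that a single object survives, the price is treated as a flat rate per GB per month (the footnote behind the asterisk is not in the slides), and 1 TB is taken as 1024 GB. The object count and data size are hypothetical.

```python
from fractions import Fraction

# 99.999999999% durability, kept exact with Fraction to avoid float rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
# $0.095 per GB; billing per month is an assumption, not stated on the slide.
PRICE_PER_GB = Fraction(95, 1000)


def expected_losses_per_year(num_objects: int) -> Fraction:
    """Expected number of objects lost per year at the stated durability."""
    return num_objects * (1 - DURABILITY)


def monthly_cost_usd(gigabytes: int) -> Fraction:
    """Storage cost for one month at the assumed flat rate."""
    return gigabytes * PRICE_PER_GB


# Hypothetical workload: 10 million objects, 10 TB stored.
losses = expected_losses_per_year(10_000_000)
years_per_loss = 1 / losses
cost_10tb = monthly_cost_usd(10 * 1024)

print(f"Expected losses per year: {float(losses)}")
print(f"One lost object every {int(years_per_loss):,} years on average")
print(f"10 TB for one month: ${float(cost_10tb):,.2f}")
```

With 10 million stored objects, eleven nines of durability works out to one expected loss roughly every 10,000 years.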
Web store
Data consumed in multiple ways
[Diagram: S3, EMR, Prod Cluster (EMR), Recommendation Engine, Ad-hoc Analysis, Personalization]
Velocity of data
Amazon DynamoDB
“Who buys video games?”
Per day:
3.5 billion records
13 TB of click stream logs
71 million unique cookies
Results:
500% return on ad spend
17,000% reduction in procurement time
“Who is using our service?”
Identified early mobile usage
Invested heavily in mobile development
Finding signal in the noise of logs
In January 2013, 9,432,061 unique mobile devices used the Yelp mobile app.
4 million+ calls. 5 million+ directions.
What is EMR?
MapReduce engine
Integrated with tools
Hadoop-as-a-service
Massively parallel
Cost-effective AWS wrapper
Integrated with AWS services
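The MapReduce model behind EMR can be illustrated in a few lines. This is a minimal in-process Python sketch of the map, shuffle and reduce phases. It does not use the EMR or Hadoop APIs, and the log format and function names are hypothetical, chosen only to echo the log-analysis examples in this talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical log lines in the form "<device_id> <action>".
LOG_LINES = [
    "device-1 call",
    "device-2 directions",
    "device-1 directions",
    "device-3 call",
    "device-2 call",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit one (action, 1) pair per log line."""
    for line in lines:
        _device, action = line.split()
        yield action, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle_phase(map_phase(LOG_LINES)))
print(counts)
```

In a real cluster the map and reduce functions run in parallel across many nodes and the shuffle moves data over the network; the structure of the computation is the same.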
Source: http://nerds.airbnb.com/redshift-performance-cost
Table size       Query type           Hive                    Redshift
3 billion rows   Simple range query   1680 seconds (28 min)   360 seconds (6 min)
1 million rows   2 complex joins      182 seconds             8 seconds

$13.60/hour on Redshift versus $57/hour on Hive
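The comparison above can be restated as speedups and per-query costs. This is a minimal Python sketch, assuming each query is billed for its runtime at the quoted hourly rate (actual billing granularity and cluster sizes are not given in the slides). The function names are illustrative.

```python
# Hourly rates and timings as quoted on the slide.
HIVE_RATE_USD_PER_HOUR = 57.00
REDSHIFT_RATE_USD_PER_HOUR = 13.60

# query description -> (Hive seconds, Redshift seconds)
QUERIES = {
    "simple range query, 3 billion rows": (1680, 360),
    "2 complex joins, 1 million rows": (182, 8),
}


def speedup(hive_seconds: float, redshift_seconds: float) -> float:
    """How many times faster the Redshift run was."""
    return hive_seconds / redshift_seconds


def cost_usd(seconds: float, hourly_rate: float) -> float:
    """Cost of one run, assuming pro-rata billing (an assumption)."""
    return seconds / 3600 * hourly_rate


for name, (hive_s, redshift_s) in QUERIES.items():
    print(
        f"{name}: {speedup(hive_s, redshift_s):.2f}x faster, "
        f"${cost_usd(hive_s, HIVE_RATE_USD_PER_HOUR):.2f} on Hive vs "
        f"${cost_usd(redshift_s, REDSHIFT_RATE_USD_PER_HOUR):.2f} on Redshift"
    )
```

Because both the runtime and the hourly rate are lower, the per-query cost gap is wider than either figure suggests on its own.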
Every day is crucial and costly
Challenge: run a virtual screen with a higher-accuracy algorithm and 21 million compounds

Metric                  Count
Compute hours of work   109,927 hours
Compute days of work    4,580 days
Compute years of work   12.55 years
Ligand count            ~21 million ligands
Using Cycle Computing and Amazon Web Services:
3 hours at $4,828.85/hr
instead of $20+ million in infrastructure
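The figures for this run are easy to cross-check. This is a minimal Python sketch, assuming a 365-day year and that the run was billed for exactly 3 hours at the quoted hourly rate. The "average parallelism" value is simply compute hours divided by wall-clock hours; it is an illustration, not a figure from the talk.

```python
COMPUTE_HOURS = 109_927          # total compute hours of work (from the slide)
WALL_CLOCK_HOURS = 3             # elapsed time of the run (from the slide)
RATE_USD_PER_HOUR = 4828.85      # quoted hourly cost (from the slide)

compute_days = COMPUTE_HOURS / 24
compute_years = compute_days / 365          # assumes a 365-day year
total_cost_usd = WALL_CLOCK_HOURS * RATE_USD_PER_HOUR
# Hypothetical derived metric: how many compute hours ran per elapsed hour.
avg_parallelism = COMPUTE_HOURS / WALL_CLOCK_HOURS

print(f"{compute_days:,.0f} compute days")
print(f"{compute_years:.2f} compute years")
print(f"${total_cost_usd:,.2f} total for the run")
print(f"about {avg_parallelism:,.0f} compute hours per wall-clock hour")
```

The derived days and years match the table, and the total for the run comes to under $15,000 under these assumptions.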
Open web index.
3.4 billion records.
Available to all.
1000 Genomes Project
Thank you! aws.amazon.com/big-data
sinhaar@amazon.com
May 21st, COEX Auditorium, Seoul
One-day free training
Walkthrough of services
http://aws.amazon.com/apac/awsday/seoul/