Big Data with Amazon Redshift and ATI, November 27th, 2013

Sound cloud - User & Partner Conference - AT Internet

HI, I’M OLE

SOUNDCLOUD IS THE WORLD’S LEADING AUDIO PLATFORM

Every minute, creators upload 12 hours of audio

reaching over 250 million people every month

8% of the internet

FOO FIGHTERS, SNOOP LION, MADONNA, MACKLEMORE, PRESIDENT OBAMA, JOHN OLIVER (DAILY SHOW/BUGLE), SKRILLEX

How’s the sales funnel performing in Brazil, and what’s the split between products?

• Avoid Silos

• Remove unnecessary restrictions

• Provide simple tools

• Teach people how to use data

DATA DEMOCRATIZATION

In one sentence:

DATA DEMOCRATIZATION

Deliver the right information to the right person at the right time.

DATA ANALYSIS AND REPORTING, 2010-2012

[Diagram; labels: PRODUCTION DB, ANALYTICS DB, AT Internet]

DATA ANALYSIS AND REPORTING

Listens, Sounds, Users, Comments, Favorites, Shares, Reposts

Impressions, Clicks, Conversions, Suggestions, Downloads, Taggings, Uploads

DATA ANALYSIS AND REPORTING

Listens

timestamp

duration

sound

owner

listener

API-key

(location)

country
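The listen fields on this slide could be modeled as one record type. This is a minimal sketch: the field names follow the slide, while the types, the millisecond unit, the optional listener, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ListenEvent:
    """One listen, using the fields named on the slide (types are assumed)."""
    timestamp: datetime          # when the listen happened
    duration_ms: int             # how long the sound was played (unit assumed)
    sound_id: int                # the sound that was played
    owner_id: int                # the creator who owns the sound
    listener_id: Optional[int]   # None for anonymous listeners (assumption)
    api_key: str                 # client application that issued the play
    country: Optional[str]       # derived from the location, when known


# Hypothetical example values.
example = ListenEvent(
    timestamp=datetime(2013, 11, 27, 12, 0, tzinfo=timezone.utc),
    duration_ms=183_000,
    sound_id=1,
    owner_id=2,
    listener_id=None,
    api_key="hypothetical-key",
    country="BR",
)
```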

DATA ANALYSIS AND REPORTING

additional metadata:

• location within sound

• context (location on site)

• segmentation

Listening creates >6000 events/s

BIG DATA
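To put the event rate in perspective, here is a back-of-the-envelope calculation. It assumes the rate of 6,000 events per second is sustained around the clock, which the slide does not state, so the results are rough orders of magnitude.

```python
EVENTS_PER_SECOND = 6_000        # lower bound from the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Assumes the rate is sustained for the whole day and year.
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
events_per_year = events_per_day * 365

print(f"{events_per_day:,} events/day")    # 518,400,000 events/day
print(f"{events_per_year:,} events/year")  # 189,216,000,000 events/year
```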

HADOOP TO THE RESCUE

2 data centers in AMS

200+ Nodes

HADOOP TO THE RESCUE

listen data

listen metadata

search data

recommender data

product testing data

backend production data

backend logs

HADOOP AND DATA DEMOCRATIZATION

Data is siloed on Hadoop

No data governance in place

Technical hurdles for access

Not real-time

Slow access

AMAZON REDSHIFT

Fast, fully managed data warehouse (DW) service

Optimized for datasets of a petabyte or more

Fast query and I/O performance

Columnar storage technology

2013 BI INFRASTRUCTURE

[Architecture diagram; labels: Source Systems, Hadoop, MySql (production db), External Systems, AT Internet, Pig/Ruby Scripts, Amazon EMR, Staging Area, COPY, ETL Scripts, DataWarehouse, Data Exploration, "Job execution powered by:"]

How’s the sales funnel performing in Brazil, and what’s the split between products?

ATI Data Query

Create query:

1. Filter on funnel pages
2. Select metrics and dimension
3. Add REST URL to ETL pipeline
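The three steps above amount to composing a REST URL from a filter, metrics, and dimensions, which the ETL pipeline can then fetch. The sketch below only illustrates that idea: the endpoint, the parameter names, and the metric and dimension names are all hypothetical placeholders, not AT Internet's actual API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a placeholder, not AT Internet's real API.
BASE_URL = "https://reporting.example.invalid/v1/data"


def build_query_url(page_filter, metrics, dimensions, country):
    """Compose a query URL from a page filter, metrics, and dimensions."""
    params = {
        "filter": f"page:{page_filter}",       # step 1: filter on funnel pages
        "metrics": ",".join(metrics),          # step 2: select metrics ...
        "dimensions": ",".join(dimensions),    # ... and dimension
        "country": country,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Step 3: this URL would be registered with the ETL pipeline.
funnel_url = build_query_url(
    page_filter="checkout",
    metrics=["visits", "conversions"],
    dimensions=["product"],
    country="BR",
)
print(funnel_url)
```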

[Architecture diagram, as on the 2013 BI INFRASTRUCTURE slide; labels: Source Systems, Hadoop, MySql (production db), External Systems, AT Internet, Pig/Ruby Scripts, Amazon EMR, Staging Area, COPY, ETL Scripts, DataWarehouse, Data Exploration, "Job execution powered by:"]

DATA EXPLORATION

Simple and fast access to data

More time for “deep dives” into data

Individualized Reporting

Allows interactivity between users

Integrated with Redshift

• Reports designed by end users

• Central repository for data analysis

• User interaction

• Data from one source only

• Scalable solution

• Data to the people!

DATA DEMOCRATIZATION

QUESTIONS?

THANK YOU!

P.S. WE’RE HIRING. SOUNDCLOUD.COM/JOBS

APPENDIX

First: Gather data from the various source systems into S3

Source systems: Hadoop, MySql (production db), External Systems

Full/Daily Imports

MapReduce for:
- Listens
- Plays
- Impressions
- Affiliations
- ...

IMPORT DATA FROM SOURCE SYSTEMS

Second: Rebuild staging area tables for full imports

IMPORT DATA FROM SOURCE SYSTEMS

Staging Area

tracks, users, client applications, ...

Based on configuration files

CREATE statements are generated

DISTKEYs and SORTKEYs are re-created

Full control over changes to the data model

YAML config files
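Generating CREATE statements from configuration could look roughly like the sketch below. A Python dict stands in for one parsed YAML file so the example needs no YAML library; the table name, columns, and key choices are hypothetical, and the deck does not show the real config schema.

```python
# Stand-in for one parsed YAML config file; all names here are hypothetical.
TRACKS_CONFIG = {
    "table": "staging.tracks",
    "columns": [
        ("id", "BIGINT NOT NULL"),
        ("user_id", "BIGINT"),
        ("title", "VARCHAR(255)"),
        ("created_at", "TIMESTAMP"),
    ],
    "distkey": "user_id",
    "sortkeys": ["created_at", "id"],
}


def rebuild_statements(config):
    """Return DROP and CREATE statements that rebuild one staging table."""
    table = config["table"]
    columns = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in config["columns"]
    )
    create = (
        f"CREATE TABLE {table} (\n{columns}\n)\n"
        f"DISTKEY ({config['distkey']})\n"
        f"SORTKEY ({', '.join(config['sortkeys'])});"
    )
    return [f"DROP TABLE IF EXISTS {table};", create]


for statement in rebuild_statements(TRACKS_CONFIG):
    print(statement)
```

Because the distribution and sort keys live in the config, changing them is a config edit followed by a rebuild, which is one way to read "full control over changes to the data model".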

Third: Import the data from S3 to Redshift

Staging Area

tracks, users, client applications, ...

Full import: TRUNCATE & COPY

Daily import: COPY

IMPORT DATA FROM SOURCE SYSTEMS
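The difference between the two import modes can be sketched as a small statement generator. The table name, S3 prefix, IAM role, and file format options (tab-delimited, gzipped) are assumptions for illustration; the deck does not say how the files were formatted or how the load was authorized.

```python
def import_statements(table, s3_prefix, iam_role, full_import):
    """Return the SQL for one staging-table import.

    Full import: empty the table, then load everything again.
    Daily import: append the new day's files only.
    """
    copy = (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "DELIMITER '\\t' GZIP;"   # file format is an assumption
    )
    if full_import:
        return [f"TRUNCATE {table};", copy]
    return [copy]


# Hypothetical bucket, prefix, and role.
ROLE = "arn:aws:iam::123456789012:role/hypothetical-load-role"
full = import_statements(
    "staging.users", "s3://example-bucket/users/", ROLE, full_import=True
)
daily = import_statements(
    "staging.listens", "s3://example-bucket/listens/2013-11-27/", ROLE,
    full_import=False,
)
for statement in full + daily:
    print(statement)
```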

ETL scripts divided into layers:

- Layer 1: Staging -> DW (dimensions)

- Layer 2: Staging -> DW (fact tables - raw data)

- Layer 3: DW -> DW (aggregated fact tables)

- Layer 4: DW -> Reporting Data Cubes (reporting data)

ETL AND DW DATAMODEL
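As an illustration of Layer 3 (DW -> DW), the sketch below derives an aggregated fact table from a raw one. SQLite stands in for Redshift so the example runs anywhere, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw fact table, as Layer 2 would have loaded it (hypothetical schema).
CREATE TABLE fact_listens (listen_date TEXT, country TEXT, sound_id INTEGER);
INSERT INTO fact_listens VALUES
    ('2013-11-26', 'BR', 1),
    ('2013-11-26', 'BR', 2),
    ('2013-11-26', 'DE', 1),
    ('2013-11-27', 'BR', 1);

-- Layer 3: aggregate the raw facts into a daily per-country table.
CREATE TABLE agg_listens_daily AS
SELECT listen_date, country, COUNT(*) AS listens
FROM fact_listens
GROUP BY listen_date, country;
""")

rows = conn.execute(
    "SELECT listen_date, country, listens FROM agg_listens_daily "
    "ORDER BY listen_date, country"
).fetchall()
print(rows)
```

Layer 4 would then read from tables like `agg_listens_daily` rather than from the raw facts, which keeps reporting queries small.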

ETL AND DW DATAMODEL

[Diagram; data flows from Staging Area to DataWarehouse to Data Exploration]

- ETL Layer 1 & 2: Data Cleaning, Data Transformation (Ruby/SQL Scripts)
- ETL Layer 3: Data Aggregation (Ruby/SQL Scripts)
- ETL Layer 4: Data Presentation (SQL)

JOB SCHEDULE AND EXECUTION

Job-scheduling tool developed internally

Set dependencies between jobs

Execution on multiple machines

Supports all the ETL layers
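The internal scheduling tool is not shown in the deck. As a rough sketch of the idea of dependencies between jobs, the example below uses Python's standard `graphlib` module (Python 3.9+) to group jobs into waves that could run in parallel on several machines. The job names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs; each job maps to the set of jobs it depends on.
JOBS = {
    "import_to_s3": set(),
    "rebuild_staging": {"import_to_s3"},
    "copy_to_redshift": {"rebuild_staging"},
    "layer1_dimensions": {"copy_to_redshift"},
    "layer2_facts": {"copy_to_redshift"},
    "layer3_aggregates": {"layer1_dimensions", "layer2_facts"},
    "layer4_cubes": {"layer3_aggregates"},
}


def execution_waves(jobs):
    """Group jobs into waves; jobs in one wave do not depend on each other."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose inputs are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(JOBS), start=1):
    print(number, wave)
```

In this sketch the two jobs in the fourth wave, `layer1_dimensions` and `layer2_facts`, are the ones that could be dispatched to different machines at the same time.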

TIMELINE

Weeks shown: Week 2, Week 4, Week 6, Week 8, Week 10, Week 12, Week 14, Week 16

Analysis Stage (Requirement Analysis, Design)

• Gap Analysis

• Business Exploration (requirements interviews)

• Information Mapping

• Solution Design (Draft)

Milestone: End of Analysis Stage

Design & Build

• Define Infrastructure

• Design Data Model

Milestone: Infrastructure Ready!

• Build ETL

• Build Data Cubes

• Design Reports/Dashboards (Presentation Layer)

Milestone: BI 1.0 is built!

Test & Deploy

• System/Integration Tests

• User Acceptance

Milestone: BI 1.0 is tested!

• User Workshops

• BI 1.0 Evaluation

Milestone: BI 1.0 is ready to use!