Watching Pigs Fly with the Netflix Hadoop Toolkit Hadoop Summit 2013 San Jose, CA



Page 1: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Watching Pigs Fly with the Netflix Hadoop Toolkit

Hadoop Summit 2013, San Jose, CA

Page 2: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Our Motivation

Data should be accessible, easy to discover, and easy to process for everyone.

Page 3: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Our Users

Analysts

Engineers

Page 4: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Hadoop Platform as a Service

Page 5: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Hadoop Platform as a Service

S3

Page 6: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Hadoop Platform as a Service

Data Platform

Page 7: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Data Platform as a Service

Franklin (Metadata API)

Sting (Ad hoc Visualization)

Forklift (Data Movement)

Looper (Backloading)

Ignite (A/B Test Analytics)

Spock (Data Auditing)

Genie (Hadoop PaaS)

Lipstick (Pig Workflow Visualization)

Event Service (Orchestration)

Hadoop

S3

Other Processing

Page 8: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Let’s solve a problem using the data!

Page 9: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Build a recommender.

Page 10: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

But what makes good recommendations?

Similarity

Personalization

Page 11: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

COLORS!

Page 12: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

COLORS!

Box art is colorful…

Page 13: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

We’re Sorry

COLORS!

Box art is colorful…

Page 14: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Where can I find the data?

Page 15: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Hadoop Platform as a Service

S3

Page 16: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Hadoop Platform as a Service

S3, Cassandra, Teradata, Redshift, RDS

Page 17: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Data Platform as a Service

Franklin (Metadata API)

S3, Cassandra, Teradata, Redshift, RDS

Page 18: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Data Platform as a Service

Franklin (Metadata API)

Page 19: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Create a dataset for box art and color.

Page 20: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Whether your dataset is large or small, being able to visualize it makes it easier to explain.

Page 21: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Data Platform as a Service

Franklin (Metadata API)

Sting (Ad hoc Visualization)

Page 22: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Sting

• Allows users to cache the results of a Genie job in memory.

• Sub-second response to OLAP-style operations (slicing, dicing, aggregations).

• Ad hoc or recurring schedule.

• Easy to use!
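The OLAP-style operations named above can be illustrated with a small sketch. This is not Sting's API, which the slides do not show; it only assumes that a job's result rows are already held in memory. The row layout, column names, and the helpers `slice_rows` and `aggregate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical cached result set standing in for the output of a finished job.
# Column names and values are invented for illustration only.
CACHED_ROWS = [
    {"title": "title_a", "hour": 20, "device": "tv", "views": 5},
    {"title": "title_b", "hour": 20, "device": "tv", "views": 3},
    {"title": "title_a", "hour": 21, "device": "tv", "views": 4},
    {"title": "title_a", "hour": 21, "device": "phone", "views": 2},
    {"title": "title_b", "hour": 21, "device": "phone", "views": 1},
]


def slice_rows(rows, **equals):
    """Slice or dice: keep rows whose columns equal every given value."""
    return [r for r in rows if all(r[k] == v for k, v in equals.items())]


def aggregate(rows, group_by, measure):
    """Aggregate: sum `measure` for each distinct value of `group_by`."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_by]] += r[measure]
    return dict(totals)


# Slice on one dimension, then aggregate along another.
tv_rows = slice_rows(CACHED_ROWS, device="tv")
views_by_hour = aggregate(tv_rows, "hour", "views")
views_by_title = aggregate(CACHED_ROWS, "title", "views")
print(views_by_hour)
print(views_by_title)
```

Because the rows stay in memory, each new slice or aggregation is a scan over the cached list rather than a new Hadoop job, which is the property the sub-second claim relies on.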

Page 23: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Hive Query

Schema

Page 24: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

% Content Consumed / Hour

Page 25: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Hemlock Grove

House of Cards

Arrested Development

Page 26: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Similarity

Page 27: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 28: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 29: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

House of Cards

Macbeth

Page 30: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 31: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 32: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Toddlers & Tiaras

Star Trek: Voyager

Page 33: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Personalization

Page 34: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

# of subscribers X # of titles = ???,000,…,000 (big data)

Big Data
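To make the scale of that product concrete, here is a rough sketch of the arithmetic. The slide leaves the real counts out, so the subscriber and title numbers below are round, made-up values, and the 4-byte score size is also an assumption.

```python
def pair_count(subscribers: int, titles: int) -> int:
    """Number of (subscriber, title) pairs a personalized score would cover."""
    return subscribers * titles


# Hypothetical round numbers, NOT figures from the talk.
subscribers = 1_000_000
titles = 10_000

pairs = pair_count(subscribers, titles)
bytes_needed = pairs * 4  # assumes one 4-byte score per pair

print(f"{pairs:,} pairs, about {bytes_needed / 1e9:.0f} GB at 4 bytes each")
```

Even with these modest placeholder counts, the product reaches ten billion pairs, which is why the slide treats personalization as a big data problem.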

Page 35: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Netflix Apache Pig

Page 36: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 37: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Lipstick

Data Platform as a Service

Franklin (Metadata API)

Sting (Ad hoc Visualization)

Page 38: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Lipstick

• Allows users to visualize their data flow

• Allows users to see common errors

• Allows users to easily monitor their jobs

• Empowers users to support themselves

• Facilitates communication between infrastructure team and users

Page 39: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Lipstick

Page 40: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Overall Job Progress

Page 41: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Logical Plan

Overall Job Progress

Page 42: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Logical Operator (reduce side)

Logical Operator (map side)

Map/Reduce Job

Intermediate Row Count

Records Loaded

Page 43: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Hadoop Counters

Page 44: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Common Problem #1

My job has stalled.

Page 45: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 46: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Unoptimized/Optimized Logical Plan Toggle

Dangling Operator
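The slide does not define "dangling operator". One plausible reading, assumed here, is an operator in the logical plan whose output never reaches a store, so no downstream step consumes it. Under that assumption, finding such operators is a reachability check on the plan graph. The sketch below is not Lipstick code; the operator names, the edge list, and the helper `dangling_operators` are hypothetical.

```python
def dangling_operators(edges, sinks):
    """Return operators that have no path to any sink (e.g. a store).

    `edges` is a list of (producer, consumer) pairs describing the plan.
    """
    parents = {}
    nodes = set(sinks)
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
        nodes.update((src, dst))

    # Walk backwards from the sinks to collect everything that feeds them.
    reachable = set()
    stack = list(sinks)
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(parents.get(node, []))

    return sorted(nodes - reachable)


# Hypothetical plan: one branch is stored, the other feeds nothing.
PLAN_EDGES = [
    ("load_titles", "filter_active"),
    ("filter_active", "store_out"),
    ("load_titles", "group_by_genre"),  # output is never consumed
]

dangling = dangling_operators(PLAN_EDGES, sinks=["store_out"])
print(dangling)
```

In this toy plan only `group_by_genre` is reported, since every other operator lies on the path to `store_out`.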

Page 47: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Common Problem #2

I didn’t get the data I was expecting.

Page 48: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 49: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 50: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Common Problem #3

I don’t understand why my job failed.

Page 51: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Failed Job (light red background)

Successful Job (light blue background)

Page 52: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit
Page 53: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Wrapping up

• Demos at the Netflix booth in the exhibit hall (see more Lipstick, Sting, and Genie).

• Lipstick is part of Netflix OSS.

• Clone it on GitHub at http://github.com/Netflix/Lipstick

• We welcome feedback and contributions!

Page 54: Watching Pigs Fly with the  Netflix  Hadoop  Toolkit

Charles Smith: [email protected]

Jeff Magnusson: [email protected]

Thank you!

Jobs: http://jobs.netflix.com

Netflix OSS: http://netflix.github.io

Tech Blog: http://techblog.netflix.com/