
Your Trusted Third Party in the Digital Age™

Scalding on Tez

Twitter HQ, July 14th, 2015

Copyright © 2015 Transparency Rights Management. All rights reserved.

Agenda

• Who’s this guy?
• How did we come to use Scalding?
• Scalding on Tez: the Mini-HOWTO
• In practice
• Tips and Tricks
• All aboard: how?
• Performance


WHO’S THIS GUY?


Who’s this guy?

• I’m 39
• My oldest computer is 33
• 8-bit Basic(s), Z80 assembly, Turbo Pascal, C++, Python, Java, ISO CNC, C#, Scala
• Still afraid of Shapeless

Images: Amos Evans / « Rama » / Marcin Wichary // Wikipedia


HOW DID WE COME TO SCALDING?


Why Scalding? Transparency Rights Management:

• A Trusted Third Party
– Data escrow, controlled execution
– Independent re-computation
– Privacy & Personal Data compliance assessment
• Big Data Services for Entertainment
– Metadata enrichment
– IP use certification
– Dataset analysis as a service


Why Scalding? « Big Data Services for Entertainment »: a use case

[Diagram: Digital Service Provider Report; Copyright Owners / Collective Management Organizations]


Why Scalding? « Big Data Services for Entertainment »: a use case

[Diagram: Digital Service Provider Report; Copyright Owners / Collective Management Organizations; Data Improvement; Automatic Data Feed (« in your format »); Independent Report; Conformance Report]


Why Scalding? Dataset analysis (from YouTube monthly reports)

• September 2013: SQL Server overheats
• October 2013: using Lingual
– 12 SQL steps + bash scripts
• September 2014: Cascading + Java
• September 28th: tried out Scalding
• November 2014: delivered first results on Scalding
• April 2015: first success on Scalding+Tez


Our system…

[Architecture diagram. Components shown: Jenkins (with a Jenkins slave), git, Artifactory, 4-way Non-Reg; Mesos with Chronos and Marathon; YARN 2.6.0 (YARN RM) and Myriad; HDFS 2.6.0; five Debian hosts; Ansible; APP (scalding, cascading); APP (WS) on Akka/Spray]


Our system… 7 machines, and still a lot of things to discover


SCALDING ON TEZ, THE MINI-HOWTO


Scalding on Tez, the mini-HOWTO

• Step 0: Prerequisites:
– A YARN cluster (2.6.0)
– Cascading 3.0
– Tez runtime lib in HDFS (0.6.2-SNAPSHOT)
– A version of scalding with fabric selection (0.13.1 + PR1220)


Scalding on Tez, the mini-HOWTO

• Step 1: build.sbt

https://github.com/cchepelov/wcplus/blob/master/build.sbt


Scalding on Tez, the mini-HOWTO

• Step 1: build.sbt (redux)

1. Regain control over which libraries are included
2. Exclude some « long transitive » dependencies that pull in junk
3. Put in the desired fabric, in a configurable way:

sbt -DCASCADING_FABRIC=hadoop clean assembly
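A minimal sketch of the « configurable fabric » idea, assuming the fabric name arrives as the CASCADING_FABRIC system property (as in the command above) and is turned into a Cascading artifact id. The object name, the list of fabrics and the hadoop2-tez default are our assumptions, not the contents of the wcplus build.sbt.

```scala
object FabricSelection {
  // Fabrics we expect to choose from (assumption: check which Cascading
  // artifacts your repository actually provides for your version).
  val knownFabrics: Set[String] = Set("local", "hadoop", "hadoop2-mr1", "hadoop2-tez")

  // Fabric used when no -DCASCADING_FABRIC=... is given (assumed default: Tez).
  val defaultFabric: String = "hadoop2-tez"

  // Reads the fabric name from a property map (e.g. sys.props.toMap in sbt).
  def fabricFrom(props: Map[String, String]): String =
    props.getOrElse("CASCADING_FABRIC", defaultFabric)

  // Maps a fabric name to its artifact id, failing fast on typos.
  def fabricArtifact(fabric: String): String = {
    require(knownFabrics.contains(fabric), s"unknown Cascading fabric: $fabric")
    s"cascading-$fabric"
  }
}
```

In an sbt 0.13 build, such an object could live under project/ and feed a dependency along the lines of "cascading" % FabricSelection.fabricArtifact(FabricSelection.fabricFrom(sys.props.toMap)) % cascadingVersion, where cascadingVersion is whatever version you pin. Failing fast on an unknown name avoids silently building a fat jar for the wrong fabric.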


Scalding on Tez, the mini-HOWTO

• Step 1bis: assembly.sbt

We’re using fat jars to simplify deployment.

Because of jar hell, we « need » a complicated assembly.sbt:

https://github.com/cchepelov/wcplus/blob/master/assembly.sbt
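For readers without access to the repository, a generic sketch of the kind of merge rules such an assembly.sbt tends to contain, using sbt-assembly's assemblyMergeStrategy key in sbt 0.13-era syntax. It is not the wcplus file: the individual rules are assumptions to adapt to your own jar conflicts, not a recipe.

```scala
// assembly.sbt (sketch, sbt 0.13 + sbt-assembly syntax)
assemblyMergeStrategy in assembly := {
  // Keep every ServiceLoader registration (Hadoop FileSystems, codecs...),
  // de-duplicating identical lines coming from several jars.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  // Drop the rest of META-INF: signatures and manifests from dependencies
  // would clash or invalidate the fat jar.
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // Typesafe-config style defaults must be concatenated, not picked.
  case "reference.conf" => MergeStrategy.concat
  // Everything else: fall back to the plugin's default behaviour.
  case other =>
    val fallback = (assemblyMergeStrategy in assembly).value
    fallback(other)
}
```

The ordering matters: the first matching case wins, so the services rule has to come before the broader META-INF rule.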


Scalding on Tez, the mini-HOWTO

• Step 2: a few job flags

https://github.com/cchepelov/wcplus/blob/master/src/main/scala/com/transparencyrights/demo/wcplus/CommonJob.scala


• Step 2: a few job flags (continued)

• tez.task.resource.memory.mb
– As large as you can afford to give, per CPU per node
– The more memory, the less Tez needs to spill intermediates to disk
• tez.container.max.java.heap.fraction
– The defaults (1024 MiB * 0.8) assume the JVM’s native memory requirements don’t exceed about 205 MiB
– Scalding + the Scala runtime + Cascading on top of Tez seem to require more. YARN kills offenders swiftly!
– The 460 MiB figure we’re using, (1024+512)*(1-0.7), may be a bit wasteful
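The arithmetic behind these two flags, as a small sketch (the object and helper names are ours): the JVM heap gets the container size times the fraction, and everything else the process needs has to fit in the remainder, or YARN kills the container.

```scala
object TezMemory {
  // JVM heap Tez gives a task: container size (tez.task.resource.memory.mb)
  // times tez.container.max.java.heap.fraction.
  def heapMb(containerMb: Int, heapFraction: Double): Double =
    containerMb * heapFraction

  // What is left inside the YARN container for everything that is not heap
  // (thread stacks, class metadata, direct buffers, JIT code...).
  def nativeHeadroomMb(containerMb: Int, heapFraction: Double): Double =
    containerMb * (1.0 - heapFraction)
}
```

With the defaults, nativeHeadroomMb(1024, 0.8) gives about 204.8 MiB; with the values behind the 460 MiB figure, nativeHeadroomMb(1024 + 512, 0.7) gives about 460.8 MiB, for a heap of about 1075 MiB. In a Scalding job, the two keys could then be set through the Job.config override, e.g. "tez.task.resource.memory.mb" -> "1536" and "tez.container.max.java.heap.fraction" -> "0.7"; treat these as the deck's starting point, not a general recommendation.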


THAT’S IT.

(ALMOST)


IN PRACTICE…


« A VERSION OF SCALDING WITH FABRIC SELECTION »

WAIT, WHAT?


In practice… « A version of scalding with fabric selection »: wait, what?

• Scalding’s traditional --local and --hdfs flags:
– Use either LocalFlowConnector or HadoopFlowConnector
– Types are hard-coded
• Cascading 2.5 introduced a new fabric concept. You can run either with cascading-hadoop or with cascading-hadoop2-mr1. But:
– Incompatible jars (can’t load both)
– The main types visible to Scalding are different


In practice… « A version of scalding with fabric selection »: wait, what?

• PR1220: no longer hardcodes « either Local or Hadoop 1.x ». It enables supplying any flow connector implementation, as long as the jar’s around.
• --hdfs to be deprecated as an alias to --hadoop1
• Still built against Cascading 2.6
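The idea of « any flow connector, as long as the jar’s around » can be sketched as picking the connector by class name and loading it reflectively, so that only the selected fabric’s jar has to be on the classpath. This is not PR1220’s code: the fabric keys and helper names are hypothetical, the class names are the Cascading connectors we believe exist for each fabric (check them against your jars), and we assume each one has a public constructor taking a java.util.Map of properties.

```scala
import scala.util.{Failure, Success, Try}

object FabricConnectors {
  // Hypothetical fabric keys mapped to Cascading flow connector classes.
  val connectorClassByFabric: Map[String, String] = Map(
    "local"       -> "cascading.flow.local.LocalFlowConnector",
    "hadoop1"     -> "cascading.flow.hadoop.HadoopFlowConnector",
    "hadoop2-mr1" -> "cascading.flow.hadoop2.Hadoop2MR1FlowConnector",
    "hadoop2-tez" -> "cascading.flow.tez.Hadoop2TezFlowConnector"
  )

  // Instantiates `className` through its java.util.Map constructor. Nothing is
  // linked at compile time, so a missing jar shows up as a Left, not a crash.
  def instantiate(className: String, props: java.util.Map[AnyRef, AnyRef]): Either[String, AnyRef] =
    Try {
      val cls = Class.forName(className)
      cls.getConstructor(classOf[java.util.Map[_, _]]).newInstance(props).asInstanceOf[AnyRef]
    } match {
      case Success(connector) => Right(connector)
      case Failure(e)         => Left(s"cannot create $className: $e")
    }

  // Looks up the fabric, then loads its connector.
  def connectorFor(fabric: String, props: java.util.Map[AnyRef, AnyRef]): Either[String, AnyRef] =
    connectorClassByFabric.get(fabric) match {
      case Some(className) => instantiate(className, props)
      case None            => Left(s"unknown fabric: $fabric")
    }
}
```

Keeping the connector behind AnyRef is what lets a single Scalding build avoid compiling against mutually incompatible fabric jars; the price is that type errors move from compile time to job start-up.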


« STILL BUILT ON CASCADING 2.6 »

WHY?


In practice… « Still built on Cascading 2.6 »

Cascading 3.0 has carefully updated some argument types to prepare for the future. In Java, this is source- and binary-compatible:

[Diagram (in Java): library consumer; Library V2; same consumer]

Scala enforces generic type safety, and the Cascading 3.0 upgrades are not legal with scalac. But they still are with the JVM…


In practice… Going to native Cascading 3.0?

Scalding will require some adjustments to become compatible with the Java-level source upgrades.

Can this happen without breaking Scalding application source code?


GUAVA



In practice… Guava.

• Guava is a nice library… of little use in Scala (?)
• In a Scalding/Cascading/Tez JVM, multiple versions of Guava are required, since each layer depends on its own version: just about every single version from 11.0 to 16.0.2
• There have been breaking changes (method renames & removals) in Guava 13
• These happen on really mundane objects (Closeables, Stopwatch), but they’re major troublemakers
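When several Guava versions compete for one classpath, it helps to check which one a JVM actually loaded, and whether it still carries a method some layer relies on. A small diagnostic sketch (our own helper, not part of Scalding, Cascading or Tez) using only JDK reflection:

```scala
object ClasspathProbe {
  // Loads a class without initializing it; None if it is absent or unloadable.
  private def load(className: String): Option[Class[_]] =
    try Some(Class.forName(className, false, getClass.getClassLoader))
    catch { case _: ClassNotFoundException | _: LinkageError => None }

  // Jar (or directory) the class came from. None if the class is missing or
  // comes from the bootstrap loader (JDK classes have no CodeSource).
  def sourceOf(className: String): Option[String] =
    load(className)
      .flatMap(cls => Option(cls.getProtectionDomain.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  // Whether the loaded version of the class has a public method of that name,
  // e.g. an overload that a newer release removed.
  def hasPublicMethod(className: String, methodName: String): Boolean =
    load(className).exists { cls =>
      try cls.getMethods.exists(_.getName == methodName)
      catch { case _: LinkageError => false }
    }
}
```

For example, logging ClasspathProbe.sourceOf("com.google.common.base.Stopwatch") from inside a task shows which Guava jar won the classpath race in that container, which is not necessarily the one your fat jar shipped.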


In practice… Guava hell: a temporary solution

• Asking Apache to quickly upgrade to Guava 18, or Google to re-introduce deprecated interfaces… probably not immediate
• Solution: Frankenguava.

[Diagram: Guava 18.0 JAR]



[Diagram: Guava 18.0 JAR; Stopwatch & Closeables; Stopwatch & Closeables including deprecated overloads]


In practice… Frankenguava: howto

• Step 1: Post-prepare the Tez runtime
– Build Tez from source
– Unpack the runtime jar from tez-dist
– Remove guava
– Put in frankenguava
– Repack
– Deploy on HDFS
• Step 2: Enforce the use of the appropriate guava
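A sketch of the « remove guava, put frankenguava, repack » part, assuming the runtime archive is in zip/jar format (a .tar.gz distribution would need a tar library for the same filtering). It only uses java.util.zip; the entry names, including frankenguava-18.0.jar, are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object Repack {
  // Copies a zip/jar archive, dropping every entry for which `drop` is true and
  // appending `extra` entries (name -> content). Replacement names must be
  // covered by `drop`, since ZipOutputStream rejects duplicate entry names.
  def repack(archive: Array[Byte], drop: String => Boolean,
             extra: Seq[(String, Array[Byte])]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val zout = new ZipOutputStream(out)
    val zin = new ZipInputStream(new ByteArrayInputStream(archive))
    val buf = new Array[Byte](8192)
    try {
      var entry = zin.getNextEntry
      while (entry != null) {
        if (!drop(entry.getName)) {
          zout.putNextEntry(new ZipEntry(entry.getName))
          var n = zin.read(buf)
          while (n != -1) { // -1 marks the end of the current entry
            zout.write(buf, 0, n)
            n = zin.read(buf)
          }
          zout.closeEntry()
        }
        entry = zin.getNextEntry
      }
      for ((name, bytes) <- extra) {
        zout.putNextEntry(new ZipEntry(name))
        zout.write(bytes)
        zout.closeEntry()
      }
    } finally {
      zin.close()
      zout.close() // also writes the central directory
    }
    out.toByteArray
  }

  // Entry names in archive order, to check the result.
  def entryNames(archive: Array[Byte]): List[String] = {
    val zin = new ZipInputStream(new ByteArrayInputStream(archive))
    try Iterator.continually(zin.getNextEntry).takeWhile(_ != null).map(_.getName).toList
    finally zin.close()
  }
}
```

Working on byte arrays keeps the sketch testable; for a real runtime archive you would read it with Files.readAllBytes, write the result back, and then upload it to the HDFS location your Tez configuration points at.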


CASCADING’S TEZ*REGISTRY


In practice… Cascading’s Tez*Registry

• Cascading 3.0 uses a set of mapping registries to convert Cascading patterns into the back-end API. The Tez registries are new, and distinct from the MR registries.
• The Tez registries are hardened against Concurrent’s extensive test library, which is built on years of MR experience. Tez has its own trouble spots. Beware of hash joins.
• It works fine now, but getting the Scalding test library on board will go a long way.


In practice… Cascading’s Tez*Registry

• It works mostly fine now, but getting the Scalding test library on board will go a long way.

Last-minute update: .filterWithValue / .mapWithValue currently crash the Cascading planner as of 3.0.1 (their implementation uses a HashJoin).


AN EXAMPLE



A small test: « wc plus »

Input: 70 books, 1.1M lines, 10M words, 56M bytes

Flow diagram (ComputeFrequencies); each output gives the relative frequency and the deviation from the median relative frequency:
• Words
• Two words
• Ten words
• All expressions (1-W to 10-W)

Ignoring things that are more frequent than 80% of the max word frequency.
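To make the flow concrete, here is the computation for one expression size, written against plain Scala collections rather than Scalding pipes, so it runs locally and never touches the planner. Assumptions of ours, not from the slide: tokens are lower-cased runs of \w characters, expressions do not cross line boundaries, relative frequency is a count over all n-grams of that size, the 80% cut-off is applied per size, and the « deviation » is a plain difference from the median.

```scala
object WcPlusSketch {
  // One output row: the expression, its count, its relative frequency and its
  // deviation from the median relative frequency of the kept expressions.
  final case class Stat(ngram: String, count: Long, relFreq: Double, deviation: Double)

  // Median of a non-empty sequence (mean of the two middle values for even sizes).
  private def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    val n = s.size
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  // n-word expressions of one tokenized line, joined with single spaces.
  def ngrams(words: Seq[String], n: Int): Seq[String] =
    words.sliding(n).filter(_.size == n).map(_.mkString(" ")).toSeq

  def stats(lines: Seq[String], n: Int, cutoff: Double = 0.8): Seq[Stat] = {
    require(n >= 1, "n must be at least 1")
    val grams = lines.flatMap(l => ngrams(l.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq, n))
    val counts: Map[String, Long] =
      grams.groupBy(identity).map { case (g, occurrences) => g -> occurrences.size.toLong }
    val total = grams.size.toDouble
    val maxCount = if (counts.isEmpty) 0L else counts.values.max
    // Ignore expressions more frequent than `cutoff` times the most frequent one.
    val kept = counts.filter { case (_, c) => c <= cutoff * maxCount }
    if (kept.isEmpty) Seq.empty
    else {
      val med = median(kept.values.map(_ / total).toSeq)
      kept.toSeq
        .map { case (g, c) => Stat(g, c, c / total, c / total - med) }
        .sortBy(s => (-s.count, s.ngram))
    }
  }
}
```

The slide’s outputs would then correspond to stats(lines, 1), stats(lines, 2) and stats(lines, 10), with « all expressions » concatenating sizes 1 to 10. In a Scalding job, the maximum count is exactly the kind of single value one would normally hand to .filterWithValue.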


No .filterWithValue / .mapWithValue for now.

[Diagram: the same flow, with a « count » marker on four of its branches]

Image: Roulex45 / Wikipedia



TIPS & TRICKS


Tips & Tricks: 0000-step-node-sub-graph.dot

Run your job with

-Dcascading.planner.plan.path=/tmp/path/to/plan.lst

The planner will output a lot of useful files. One of them is …/$(Job)/4-final-flow-steps/0000-step-node-sub-graph.dot

Run that file through Graphviz:

dot -O -Tpdf 0000-step-node-sub-graph.dot

or, if the PDF is illegible, render SVG instead (Firefox is great at zooming into SVG files):

dot -O -Tsvg 0000-step-node-sub-graph.dot
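After a few runs, the plan path can hold many of these files, one directory per job and possibly several steps. A small helper sketch that finds them and builds the matching Graphviz commands; the file-name suffix is inferred from the example name above, so other planner outputs may need a different filter, and the object name is ours.

```scala
import java.io.File

object PlanGraphs {
  // Recursively collects *-step-node-sub-graph.dot files under planDir,
  // sorted by path so the output is stable.
  def dotFiles(planDir: File): List[File] = {
    val children = Option(planDir.listFiles).map(_.toList).getOrElse(Nil)
    children.flatMap { f =>
      if (f.isDirectory) dotFiles(f)
      else if (f.getName.endsWith("-step-node-sub-graph.dot")) List(f)
      else Nil
    }.sortBy(_.getPath)
  }

  // One Graphviz command per file; -O derives the output name from the input.
  def dotCommands(planDir: File, format: String = "svg"): List[String] =
    dotFiles(planDir).map(f => s"dot -O -T$format ${f.getPath}")
}
```

Paths containing spaces would need quoting before pasting the commands into a shell.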


Tips & Tricks: 0000-step-node-sub-graph.dot

This is how Tez names our stuff!


Tips & Tricks: major differences between how a Cascading job gets mapped to MR and to Tez

MR
– One flow, many (MANY) independent steps
– One or more operators per step
– Step-to-step communications involve disk (HDFS)
– Each step is independent as far as MR is concerned
– Step scheduling managed from outside the cluster, by Cascading

TEZ
– One flow, one DAG. A DAG includes several nodes.
– One or more operators per node
– Node-to-node communications managed by Tez: memory, direct network or disk as necessary
– YARN sees one « Application » per flow
– Node scheduling managed by the Tez DAG AppMaster


Tips & Tricks: yarn-swimlanes.sh

• A tool included in the Tez source distribution, in tez-tools/swimlanes (bash + python)
• Requires YARN ATS to work: « yarn logs -applicationId application_1345431315_1511 » must work
• Reports, in a Gantt chart, the per-container occupation


Tips & Tricks: yarn-swimlanes.sh (2)

[Chart: application_1435150225179_0474.svg]


Tips & Tricks: yarn-swimlanes.sh (3)

[Chart axes: time; containers]


Tips & Tricks: consider using .forceToDisk to ensure work is balanced within the DAG

[Swimlane charts compared: 890 seconds vs 160 seconds]



Tips & Tricks: consider using .forceToDisk to ensure work is balanced within the DAG

• .forceToDisk really means « don’t merge those two Tez nodes », which implies « manage appropriate data transmission between these two nodes »
• TextFile & other FixedPathSource friends don’t seem to spread out work automatically as well as they used to (huh?)
• YMMV, WIP.


ALL ABOARD: HOW?


All aboard: how? Smoothing out the UX for us app developers

• A build of Scalding against Cascading 3.0.x
– Fabric-switching logic
– Get the test library to pass on Tez as well
– Some applications might still uncover new mapping issues: increased community test-case experience ???
• Getting the « guava mess » fixed
– Ideally, all of Apache moves to recent Guava versions
– Enforced shading of Guava across the whole stack?
– Failing that, an automated runtime patcher? (my « build stuff » partner makes me write: OSGi/Java 9) ???
• Except for that, Tez is really easy for a YARN shop. Drop it in, and it runs!


PERFORMANCE


Performance: MR vs TEZ


Performance: MR vs TEZ, to scale


Performance: MR vs TEZ, TO SCALE!!!

MR run time: 14:22 (wall), 12:49 (cluster time), 5:43:26 (total CPU)

TEZ run time: 4:03 (wall), 2:50 (cluster time), 1:25:35 (total CPU)
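The speed-ups implied by these figures, as a small worked sketch. It assumes two-part times are minutes:seconds and three-part times are hours:minutes:seconds; the object and helper names are ours.

```scala
object Speedup {
  // "m:ss" or "h:mm:ss" -> seconds.
  def seconds(t: String): Int =
    t.split(":").map(_.trim.toInt).foldLeft(0)((acc, part) => acc * 60 + part)

  // How many times less time `after` took than `before`.
  def ratio(before: String, after: String): Double =
    seconds(before).toDouble / seconds(after)
}
```

Applied to the figures above: wall clock 862 s vs 243 s (about 3.5× faster on Tez), cluster time 769 s vs 170 s (about 4.5×), total CPU 20,606 s vs 5,135 s (about 4.0×).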


Performance: output of tez-tool « yarn-swimlanes.sh »

• 1 « swimlane » per active container
• 1 colour per DAG vertex (the black dots are actually the vertex ID)
• Container occupation is pretty good while there is work to do
• (not demonstrated here) Containers die when they are idle.

This is good!


CONCLUSION


As a conclusion… A lot of effort so far…

…but worth it!

Images: Nicholas Babaian // Flickr. Marathon du Médoc 2008


THANKS!

For building that tech
For helping out
For your attention today