Parallel Webpage Layout. Leo Meyerovich, Chan Siu Man, Chan Siu On, Heidi Pan, Krste Asanovic, Rastislav Bodik, and many others from the UPCRC Berkeley project, UC Berkeley

Parallel Webpage Layout - University of California, Berkeley · parlab.eecs.berkeley.edu/sites/all/parlab/files/playout.pdf · 2010-10-05


Page 1: Parallel Webpage Layout

Leo Meyerovich, Chan Siu Man, Chan Siu On, Heidi Pan, Krste Asanovic, Rastislav Bodik

and many others from the UPCRC Berkeley project

UC Berkeley

Page 2: Par Lab Research Overview

[Figure: Par Lab research overview diagram. Box labels: Personal Health; Image Retrieval; Hearing, Music; Speech; Parallel Browser; Motifs; Sketching; Composition & Coordination Language (C&CL); C&CL Compiler/Interpreter; Parallel Libraries; Parallel Frameworks; Efficiency Languages; Efficiency Language Compilers; Autotuners; Legacy Code; Schedulers; Communication & Synch. Primitives; Type Systems; Static Verification; Dynamic Checking; Debugging with Replay; Directed Testing; Legacy OS; OS Libraries & Services; Hypervisor; Multicore/GPGPU; RAMP Manycore. Side labels: Correctness; Diagnosing Power/Performance.]

Page 3: Parallel Web Browser

Why the browser?
–  an important application platform
–  browser wars again: competing on performance (latency)
–  how important? handheld page load is tens of CPU seconds

Why a parallel browser?
–  soon in your phone? 4 cores x 2 threads x 8-wide SIMD = 64
–  parallelism is more energy efficient

Technical challenge
–  Parallelize the browser to run with 100-way parallelism

Page 4: This Talk: Parallelize Single Page Layout

• Page layout (HTML+CSS) is the LaTeX of the Web
–  LaTeX takes seconds to format a document
–  but page load should be 20-100 ms
–  page load is a bottleneck: 51% of CPU time on IE8

• Page layout is a challenging "desktop" application
–  not parallelized before
–  specifications: often ambiguous and sequential
–  low-latency: problems are short-running
–  less understood motif: tree computation

• Knuth: "Multiprocessors are no help to TeX"

Page 5: Our Contributions

1.  Analyzed browser performance
–  layout is a bottleneck; we identified its critical motifs

2.  Distilled essential CSS and wrote a declarative spec for it
–  crucial step for exposing parallelism hidden by today's spec

3.  Developed first parallel page layout algorithms
(1) matching: task parallel, 20x speedup, strongly scales to 16
(2) solving: task parallel, 4x speedup, strongly scales to 3 cores

4.  Future steps – components and algorithms

Page 6: Overall Page Layout Problem

Input: document tree + CSS rules
Output: sizes and positions of tree nodes
Steps: determine styling rules; solve constraints

CSS styling rules:
    p { width: 100% }
    img { width: 100px; float: left }
    p img { width: 10pt }

HTML:
    <body>hello<img src="http:..."><p><b>world</b> ok ok ok ok ok

[Figure: the CSS styling rules plus the HTML document tree (<body>; two <p> children; <img>, "hello", <b> "world", and "ok" text nodes) yield a styled tree with node annotations such as width=100%; width=100px, float=left; width=10px; align=right. Solving then turns unknown positions (x=?, y=?) into concrete ones (x=12, y=17).]

[Figure: chart "What the browser does", with the segment "Our page layout subproblem" and the labels 25% and 25%.]

Page 7: The layout spec is confusing

Example of spec:
–  "In general, the left edge of a line box touches the left edge of its containing block… However, floating boxes may come between [them]."

Hard to implement correctly, even sequentially.

[Figure: side-by-side screenshots labeled Safari and Firefox.]

Page 8: Flow: sequential layout in today's browsers

The simplest way to implement the spec seems to be to (mostly) flow the elements sequentially, in order.

Page 9: Flow is guided by a cursor

The cursor points to where the next element goes.

[Figure: the document tree (<body>; two <p> children; <img>, "hello", <b> "world", and "ok" text nodes) next to the rendered page ("hello"; "world ok ok" / "ok ok ok"). Cursor positions are marked with Δ after each placed element.]
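The cursor-guided flow described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a browser's algorithm: the function name `flow`, the fixed word widths, and the single-paragraph scope are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Placed:
    text: str
    x: int
    y: int


def flow(words, widths, line_width, line_height):
    """Place words left to right, wrapping when the cursor would overflow.

    The cursor (cx, cy) is the only state carried from one element to the
    next, and it is what makes this loop inherently sequential.
    """
    cx, cy = 0, 0
    placed = []
    for text in words:
        w = widths[text]
        # Wrap to the next line if the word does not fit on a non-empty line.
        if cx > 0 and cx + w > line_width:
            cx, cy = 0, cy + line_height
        placed.append(Placed(text, cx, cy))
        cx += w  # advance the cursor past the word just placed
    return placed, (cx, cy)


# Hypothetical widths: "world" is 30 px wide, "ok" is 10 px, the line is 50 px.
placed, cursor = flow(["world", "ok", "ok", "ok", "ok", "ok"],
                      {"world": 30, "ok": 10}, line_width=50, line_height=10)
for p in placed:
    print(p.text, p.x, p.y)
```

With these assumed widths the words break as "world ok ok" / "ok ok ok". Every placement depends on the cursor left behind by the previous one, which is the dependence the following slides set out to break.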

Page 10: Flow's dependences

Constraints are not specified if they are equalities (e.g., inherited) or intrinsic (e.g., default image size or aspect ratio).

[Figure: the document tree (<body>; two <p> children; <img>, "hello", <b> "world", and "ok" text nodes) annotated with per-node attributes such as w=100, fs=12; w=50, float=left; fs=50%; and computed values such as x=0, y=10, h=40. Edges between nodes are labeled with the attributes that flow along them: fs, Δ, w.]

Page 11: Dependencies prevent parallelism

[Figure: the annotated document tree from the previous slide, now with w=200, fs=12 at the top. Nodes carry attributes such as w=50, float=left; fs=50%; w=40, fs=6, x=0, y=0, h=10; w=100, fs=12, x=0, y=10. Dependence edges are labeled fs, Δ, w.]

Page 12: Enable parallelism by doing part of work

[Figure: the same annotated document tree (w=200, fs=12 at the top; attributes such as w=50, float=left; fs=50%; w=30, fs=12, x=50, y=10, h=10). Dependence edges are labeled fs, Δ, w.]

Page 13: Parallel Layout Solving: Five Phases

Extensive analysis led us to five phases. These enable parallelism.

1. font size, tentative widths

2. preferred widths: max, min

3. final widths: break cycles by over-specifying CSS

4. heights, relative x/y positions

5. absolute x/y positions
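The five phases above can be sketched as five tree traversals. The sketch below is a toy model under loud assumptions, not the authors' solver: block children stack vertically, there are no floats, the font metric `word_w` is invented, only y positions are tracked, and the traversal directions (top-down for phases 1, 3, 5; bottom-up for phases 2, 4) are an assumption the slide does not state. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    words: List[str] = field(default_factory=list)  # text-leaf content
    fs_pct: Optional[float] = None  # font size as a fraction of the parent's
    width: Optional[int] = None     # explicit width in px, if any
    # Computed attributes, filled in phase by phase.
    fs: int = 0
    min_w: int = 0
    max_w: int = 0
    w: int = 0
    h: int = 0
    rel_y: int = 0
    abs_y: int = 0


def word_w(word, fs):
    return len(word) * fs // 2  # toy font metric (assumption)


def phase1(n, parent_fs):  # top-down: font sizes
    n.fs = int(parent_fs * n.fs_pct) if n.fs_pct is not None else parent_fs
    for c in n.children:
        phase1(c, n.fs)


def phase2(n):  # bottom-up: preferred min/max widths
    for c in n.children:
        phase2(c)
    if n.words:
        ws = [word_w(x, n.fs) for x in n.words]
        n.min_w, n.max_w = max(ws), sum(ws)
    else:
        n.min_w = max((c.min_w for c in n.children), default=0)
        n.max_w = max((c.max_w for c in n.children), default=0)


def phase3(n, avail):  # top-down: final widths
    n.w = n.width if n.width is not None else avail
    n.w = max(n.w, n.min_w)  # never narrower than the widest word
    for c in n.children:
        phase3(c, n.w)


def phase4(n):  # bottom-up: heights and relative y positions
    for c in n.children:
        phase4(c)
    if n.words:
        lines, cx = 1, 0
        for x in n.words:
            w = word_w(x, n.fs)
            if cx > 0 and cx + w > n.w:
                lines, cx = lines + 1, 0
            cx += w
        n.h = lines * n.fs
    else:
        y = 0
        for c in n.children:
            c.rel_y, y = y, y + c.h
        n.h = y


def phase5(n, parent_abs_y):  # top-down: absolute y positions
    n.abs_y = parent_abs_y + n.rel_y
    for c in n.children:
        phase5(c, n.abs_y)


def layout(root, viewport_w, base_fs):
    phase1(root, base_fs)
    phase2(root)
    phase3(root, viewport_w)
    phase4(root)
    phase5(root, 0)
    return root


p1 = Node(words=["hello"])
p2 = Node(words=["world", "ok", "ok", "ok", "ok", "ok"], fs_pct=0.5, width=40)
body = layout(Node(children=[p1, p2]), viewport_w=100, base_fs=12)
print(p2.fs, p2.w, p2.h, p2.abs_y, body.h)  # prints: 6 40 12 12 24
```

Within a single phase, the recursive calls on sibling subtrees touch disjoint nodes, so they could run as independent tasks. `max_w` is computed but unused here; it would matter for shrink-to-fit boxes such as floats, which this sketch omits.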

Page 14: Each Phase Exhibits Tree Parallelism

Phase 1: font size, temporary width
Phase 2: preferred max & min width
Phase 3: width
Phase 4: height, relative x/y position
Phase 5: absolute x/y position

[Figure: two copies of the document tree (<body>; two <p> children; <img>, "hello", <b> "world", and "ok" text nodes). The first is annotated with inputs and phase 1 results (w=100, fs=12; float=left; fs=50%; per-node fs=12 or fs=6). The second is annotated with phase 2 results as wp/wm pairs (wp=40, wm=40; wp=50, wm=50; wp=30, wm=30; wp=10, wm=10; wp=80, wm=30; wp=80, wm=40).]

Page 15: Parallel Layout: Speculative Evaluation

• Did not break dependencies for floats
–  might stick out of paragraphs

[Figure: two renderings of "hello" beside the paragraph "world ok ok" / "ok ok ok".]

Page 16: Parallel Layout: Speculative Evaluation

• Did not break dependencies for floats
–  might stick out of paragraphs

• Speculate: assume no floats
• Check
• Patch up as needed

[Figure: two renderings of "hello" beside the paragraph, one with the text on a single line ("world ok ok ok ok ok") and one wrapped as "world ok ok" / "ok ok ok".]

Page 17: Parallel Layout: Speculative Evaluation

• Did not break dependencies for floats
–  might stick out of paragraph

• Speculate: assume no floats
• Check
• Patch up as needed
–  floats rare
–  We believe overflow is minimal

[Figure: two renderings of "hello" beside the paragraph "world ok ok" / "ok ok ok".]
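The speculate, check, patch-up pattern can be sketched as follows. This is a toy model, not the authors' implementation: paragraphs stack vertically, every word has the same width, every float has the same width, and a float taller than its paragraph sticks out into the paragraphs below. The constants and the names `Para`, `text_height`, and `layout` are hypothetical. A thread pool is used only to show which work is independent; the slides report using Cilk++ for the solver.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

WORD_W, LINE_H, PAGE_W, FLOAT_W = 10, 10, 50, 20  # toy metrics (assumptions)


@dataclass
class Para:
    n_words: int      # all words share one width in this toy model
    float_h: int = 0  # height of a float at the paragraph's top (0 = none)


def text_height(n_words, float_h=0):
    """Height of the text when lines above float_h px are narrowed by a float."""
    y, left = 0, n_words
    while left > 0:
        avail = PAGE_W - (FLOAT_W if y < float_h else 0)
        left -= avail // WORD_W  # number of words that fit on this line
        y += LINE_H
    return y


def layout(paras):
    # Speculate: lay out every paragraph independently, assuming that no
    # float from an earlier paragraph intrudes into it.
    with ThreadPoolExecutor() as pool:
        heights = list(pool.map(
            lambda p: text_height(p.n_words, p.float_h), paras))

    # Check and patch up: walk the paragraphs in order, tracking how far a
    # float sticks out below the paragraph it started in.
    patched, overhang = [], 0
    for i, p in enumerate(paras):
        if overhang > p.float_h:  # misspeculation: redo this paragraph only
            heights[i] = text_height(p.n_words, overhang)
            patched.append(i)
        overhang = max(0, max(p.float_h, overhang) - heights[i])
    return heights, patched


# A one-word paragraph with a tall float, followed by two plain paragraphs.
heights, patched = layout([Para(1, float_h=30), Para(7), Para(5)])
print(heights, patched)  # prints: [10, 30, 10] [1]
```

Only the second paragraph is redone here, because the float from the first one overhangs it by 20 px. When floats are rare, as the slide argues, the sequential check is cheap and most of the speculative work is kept.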

Page 18: Berkeley Style Sheet Layout Language

• Can compile essential CSS into it
• Refactored CSS to separate features
• Simplifies: correctness, parallelization, use

Page 19: Analysis

• Model: sequential speed ~= Firefox speed
• Cilk++: 4x speedup, scales to 3 cores
• Need to SIMDize leaves

[Figure: document tree (<body>; two <p> children; <img>, "hellooo", <b>, "ok" text nodes).]

[Figure: chart "Modeled Speedup w/ Cilk++". X axis: # Hardware Threads (1 to 16). Y axis: Average Speedup (0 to 4). Series: Eight socket x 4 core AMD Opteron 2356 Barcelona, Sun X4600; Dual socket x 4 core AMD Opteron 2356 Barcelona, Sun X2200; Preproduction 2 socket x 4 core x 2 thread Intel Xeon Nehalem.]

Page 20: Rule Matching: Problem Statement

• Matching
–  Tag path (img: <body><p><img>)
–  Rule selectors
–  For each tag path: which selectors are ~substrings?

• Rule resolution
–  Prioritize properties by rule order: lower overrides

selectors     p             img                        p img
properties    width=100%    width=100px, float=left    width=10px

[Figure: the document tree (<body>; two <p> children; <img>, "hello", <b> "world", and "ok" text nodes), with the <img> node annotated width=100px, float=left and the candidate values width=100px and width=10px.]
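The matching and resolution steps on this slide can be sketched with tag paths and selectors as lists of tag names. This is a simplified illustration: `matches` and `resolve` are hypothetical names, "~substrings" is read here as descendant-selector matching (the selector's tags appear in order along the tag path, ending at the node itself), and resolution uses rule order only, as on the slide. Real CSS also weighs selector specificity, which this sketch ignores.

```python
def matches(selector, tag_path):
    """True if `selector` (e.g. ["p", "img"]) matches the node at the end of
    `tag_path` (e.g. ["body", "p", "img"]).

    The last selector tag must be the node's own tag, and the earlier ones
    must appear, in order, among its ancestors.
    """
    if not selector or not tag_path or selector[-1] != tag_path[-1]:
        return False
    ancestors = iter(tag_path[:-1])
    # `tag in ancestors` consumes the iterator, which enforces the ordering.
    return all(tag in ancestors for tag in selector[:-1])


def resolve(rules, tag_path):
    """Collect the properties of all matching rules; later rules override."""
    props = {}
    for selector, declarations in rules:
        if matches(selector, tag_path):
            props.update(declarations)
    return props


# The three rules from the slide, in stylesheet order.
RULES = [
    (["p"], {"width": "100%"}),
    (["img"], {"width": "100px", "float": "left"}),
    (["p", "img"], {"width": "10px"}),
]

print(resolve(RULES, ["body", "p", "img"]))
# prints: {'width': '10px', 'float': 'left'}
```

For the <img> inside <p>, both `img` and `p img` match, and the lower rule's width=10px overrides width=100px while float=left survives.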

Page 21: Rule Matching: Parallelization

• ~600 nodes, 1000s of rules
• Assign nodes to cores
–  load balancing: random assignment
• SIMDizable?

selectors     p             img                        p img
properties    width=100%    width=100px, float=left    width=100px

[Figure: the document tree (<body>; two <p> children; <img>, "hello", <b> "world", and "ok" text nodes).]
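Random assignment of nodes to workers can be sketched as below. This is an illustration under assumptions, not the prototype from the talk: the matcher is a toy that handles single-tag selectors only, all function names are hypothetical, and a thread pool is used for portability. The slides report that the Python prototype used processes, so the executor class is a parameter and `concurrent.futures.ProcessPoolExecutor` could be passed instead.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_workers, seed=0):
    """Randomly assign items to n_workers buckets (simple load balancing)."""
    rng = random.Random(seed)
    buckets = [[] for _ in range(n_workers)]
    for index, item in enumerate(items):
        buckets[rng.randrange(n_workers)].append((index, item))
    return buckets


def match_node(tag_path, selectors):
    """Indices of the single-tag selectors matching the node (toy matcher)."""
    return [i for i, sel in enumerate(selectors) if sel == tag_path[-1]]


def match_bucket(bucket, selectors):
    """Match every node in one worker's bucket, keeping original indices."""
    return [(index, match_node(path, selectors)) for index, path in bucket]


def parallel_match(tag_paths, selectors, n_workers=4,
                   executor_cls=ThreadPoolExecutor):
    buckets = partition(tag_paths, n_workers)
    results = [None] * len(tag_paths)
    with executor_cls(max_workers=n_workers) as pool:
        # Each bucket is an independent task: nodes never read each other.
        for part in pool.map(match_bucket, buckets, [selectors] * n_workers):
            for index, matched in part:
                results[index] = matched
    return results


paths = [["body", "p"], ["body", "p", "img"], ["body", "p", "b"]]
print(parallel_match(paths, ["p", "img"]))  # prints: [[0], [1], []]
```

Because each node's match set depends only on its own tag path and the shared, read-only rule list, the result is the same for any assignment of nodes to workers. Random assignment is a cheap way to even out the load when some nodes match far more rules than others.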

Page 22: Analysis

• Results
–  perfect scaling: up to 10 cores
–  20x speedup on 32 cores
–  …but with python
    • interp. overhead (seq.)
    • procs., not threads

• Future
–  C++ implementation
–  SIMD rule matching

[Figure: chart "Speedup vs # Cores (w/ Python)". X axis: # Hardware Threads (1, 4, 8, 16, 32). Y axis: Average Speedup (0 to 32). Series: Slashdot, Rotten Tomatoes, Wikipedia, NY Times. Machine: 8 socket x 4 cores AMD Opteron 2356 Barcelona.]

Page 23: Takeaways

• Artifacts
–  BSS/CSS specification & dependency decomposition
–  4x solving speedup (untuned), 20x matching (in python)

• Lessons
–  4x << 100x: SIMDize low-level libraries (e.g., fonts)
–  motifs: low latency tree ops, vectors, pixel blending
–  attribute grammars helped

• Next steps
–  tune tasks, SIMD kernels, bigger scope of model
–  implications for concurrent scripts using layout?

Page 24: (questions?)