Framework for Profile-Analysis Data-Layout Optimizations. Shai Rubin (University of Wisconsin), Ras Bodik (University of Wisconsin), Trishul Chilimbi (Microsoft Research).

Page 1: Framework for Profile-Analysis Data-Layout Optimizations

Framework for Profile-Analysis Data-Layout Optimizations

Shai Rubin (University of Wisconsin), Ras Bodik (University of Wisconsin), Trishul Chilimbi (Microsoft Research)

Page 2: Framework for Profile-Analysis Data-Layout Optimizations

Data Layout Optimization (What)

• Memory hierarchy and access cost: CPU, Cache (1 cycle), Memory (10^2 cycles), Disk (10^6 cycles).

• Reference sequence: A.x, B, A.z

• DL optimization: increase spatial locality of data to prevent memory faults.

[Figure: two panels, "Original data layout" and "Modified data layout" (after DL Optimization). Each panel plots the reference sequence A.x, B, A.z over time against cache blocks (1 to 4) and memory pages (1 to 2), for objects A (with fields A.x and A.z) and B.]
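
The effect of spatial locality on the reference sequence A.x, B, A.z can be illustrated with a small sketch. The block size and the byte addresses below are hypothetical values chosen only for illustration; they do not come from the slides.

```python
# Hypothetical cache-block size and addresses, for illustration only.
BLOCK_SIZE = 32  # bytes per cache block (assumed)

def blocks_touched(trace, layout, block_size=BLOCK_SIZE):
    """Return the set of cache blocks touched by a symbolic reference trace."""
    return {layout[ref] // block_size for ref in trace}

trace = ["A.x", "B", "A.z"]

# Assumed original layout: A.x and A.z sit in different blocks, B is elsewhere.
original = {"A.x": 0, "A.z": 40, "B": 96}
# Assumed modified layout: the three referenced items are packed together.
modified = {"A.x": 0, "A.z": 8, "B": 16}

print(len(blocks_touched(trace, original)))  # 3 distinct blocks
print(len(blocks_touched(trace, modified)))  # 1 distinct block
```

Fewer distinct blocks for the same reference sequence means fewer opportunities for a miss, which is the intuition behind the optimization.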

Page 3: Framework for Profile-Analysis Data-Layout Optimizations

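Data Layout Optimization (How)

[Figure: optimization pipeline. Program → Reference Summary → Data Layout Optimizer (working over the Layout Space) → Data Layout → Enforce layout → Program′.]

• Scientific (array based) programs:
  – Reference summary: array dependence analysis (static).
  – Optimizer: optimal for simple loops; produces an optimal layout.
  – Enforce layout: compile time.

• General purpose (pointer based) programs:
  – Reference summary: reference trace (dynamic).
  – Optimizer: heuristic; produces a “good” layout.
  – Enforce layout: 1. compile time, 2. runtime.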

Page 4: Framework for Profile-Analysis Data-Layout Optimizations

Problems with Current Data-Layout Optimization

• Computationally hard to find the optimal layout [Petrank].

• Computationally hard to approximate the optimal layout [Petrank].

• Implication: heuristics are not robust.
  – They will not work for all programs.

• From our experience with heuristics:
  – Field Reordering [Chilimbi PLDI’99]: no improvement (on perl).
  – Custom Memory Allocator [Seidl ASPLOS’98]: degrades performance (on espresso).

• Our approach: replace heuristic with feedback-driven search.

Page 5: Framework for Profile-Analysis Data-Layout Optimizations

Searching For a Data Layout

• Problem: perform a search in the data layout space.

• Look for:
  – a “good” layout.
  – an “easy” to enforce layout.

• Search advantage:
  – Robust: for each program, finds a “good” layout.

[Figure: the data layout space, marking the current program data layout, the optimal data layout, the region of “good” layouts, and the region of “good” + “easy” to enforce layouts.]

Page 6: Framework for Profile-Analysis Data-Layout Optimizations

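Is Search Practical?

[Figure: search loop. A Reference Trace feeds an Optimizer (Heuristic) that picks a Data Layout from the possible layouts. The layout is enforced through the cycle Edit, Compile, Execute, Evaluate, Continue?, which either repeats or reaches End.]

• Not clear: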

Page 7: Framework for Profile-Analysis Data-Layout Optimizations

Outline

• Background and Problem Definition

• Search is a solution, but may not be practical
  – Making the search practical

• Applications

• Summary

Page 8: Framework for Profile-Analysis Data-Layout Optimizations

Making the Search Practical

[Figure: Framework for Data Layout Optimization. A Data Layout Search Engine takes the reference trace T and a search strategy (SS), shown next to the Edit, Compile, Execute, Evaluate, Continue? cycle. The layout space is drawn with the labels Class Splitting, Linearization, and Field Reordering.]

• Compress(T) → CST: compressed symbolic trace.

• Data Object Analysis, DOA(CST, LS) → NLS: narrows the layout space to “good” and enforceable layouts.

• Layout Selector, LS(NLS, B, CST, SS) → DL: selects a data layout from the narrowed space.

• Enforce Layout, AL(DL, CST) → NT: new trace.

• Evaluate, Simulate(NT) → B: benefit.

• Continue(B): repeat, or End.
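
The loop formed by the Layout Selector, Enforce Layout, Evaluate, and Continue stages can be sketched as a generic feedback-driven search. This is a minimal sketch under stated assumptions: `search_layout`, `evaluate`, and `budget` are hypothetical names, the enforce and simulate steps are collapsed into one callable, and the benefit function (negated count of touched 32-byte blocks) is a stand-in for a real cache or page simulation.

```python
def search_layout(candidates, evaluate, budget):
    """Try candidate layouts and keep the one with the best measured benefit."""
    best_layout, best_benefit = None, float("-inf")
    for step, layout in enumerate(candidates):
        if step >= budget:          # plays the role of Continue(B): stop on budget
            break
        benefit = evaluate(layout)  # enforce + simulate, collapsed (assumption)
        if benefit > best_benefit:
            best_layout, best_benefit = layout, benefit
    return best_layout, best_benefit

# Toy example: symbolic trace plus three hypothetical candidate layouts.
trace = ["A.x", "B", "A.z", "B"]
candidates = [
    {"A.x": 0, "A.z": 40, "B": 96},
    {"A.x": 0, "A.z": 8, "B": 96},
    {"A.x": 0, "A.z": 8, "B": 16},
]

def benefit(layout, block_size=32):
    """Stand-in benefit: fewer distinct blocks touched is better."""
    return -len({layout[ref] // block_size for ref in trace})

best, score = search_layout(candidates, benefit, budget=10)
print(best, score)
```

A real search strategy would generate candidates from the narrowed layout space and could use earlier benefits to steer later choices; the sketch only shows the feedback loop itself.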

Page 9: Framework for Profile-Analysis Data-Layout Optimizations

Trace Representation

• Problem: the reference trace cannot be easily manipulated since it is too large (>10GB, >100M references).

• Solution: compressed trace (using modified SEQUITUR).

• Example:
  - Trace: acbcbcbcbdbdbdbde
  - SEQUITUR representation:
      S → a c B B B A A e
      B → b c
      A → C C
      C → b d

• Representation advantage:
  - Compact; fits into main memory [Chilimbi PLDI’01].
  - Exposes repetitions (we use this later).
  - It produces a symbolic trace (i.e., a terminal is a data object).
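
To check that the grammar in the example really encodes the trace, the rules can be expanded back into a string. This is a small sketch that only decodes a given grammar; it does not implement SEQUITUR compression itself, and the dictionary encoding of the rules is an assumption made for illustration.

```python
# Grammar from the slide: S -> a c B B B A A e, B -> b c, A -> C C, C -> b d.
grammar = {
    "S": ["a", "c", "B", "B", "B", "A", "A", "e"],
    "B": ["b", "c"],
    "A": ["C", "C"],
    "C": ["b", "d"],
}

def expand(symbol, rules):
    """Recursively expand a symbol; anything without a rule is a terminal."""
    if symbol not in rules:
        return symbol
    return "".join(expand(child, rules) for child in rules[symbol])

decoded = expand("S", grammar)
print(decoded)       # the reconstructed reference trace
print(len(decoded))  # number of references in the trace
```

The grammar stores 16 right-hand-side symbols for a 17-reference trace; the saving grows with the amount of repetition in the trace.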

Page 10: Framework for Profile-Analysis Data-Layout Optimizations

Framework for Data-Layout Optimization

[Figure: the framework diagram repeated. Reference Trace T → Compress(T) → CST (compressed symbolic trace); Data Object Analysis DOA(CST, LS) → NLS (narrowed space of “good” and enforceable layouts; layout space labels: Class Splitting, Linearization, Field Reordering); Layout Selector LS(NLS, B, CST, SS) → DL; Enforce Layout EL(DL, CST) → CST′ (new trace); Evaluate Simulate(NT) → B (benefit); Continue(B) → repeat or End. Of the earlier Edit, Compile, Execute, Evaluate, Continue? cycle, only Compile and Continue? are still shown.]

Page 11: Framework for Profile-Analysis Data-Layout Optimizations

Avoid re-compilation

• Problem: data layout evaluation is expensive (edit + compilation + simulation).

• Solution: “pretend” that the program was edited and compiled.

• Symbolic trace + data layout → concrete address trace.

• Simple, but crucial for an efficient search.

[Figure: two routes from a single symbolic trace (A.x, B, A.z, B) to a new concrete trace. Layouts shown: A.x → 10, A.z → 14, B → 20 and A.x → 30, A.z → 34, B → 20. Route 1, User (Optimizer): Edit program, Compile, Run (simulate). Route 2: Enforce Layout applied directly to the symbolic trace, followed by Simulate. Both routes end in the concrete trace 30, 20, 34, 20.]
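
The "symbolic trace + data layout → concrete address trace" step amounts to a lookup per reference. A minimal sketch using the numbers from the slide follows; the function name `enforce_layout` and the dictionary encoding of a layout are assumptions made for illustration.

```python
def enforce_layout(symbolic_trace, layout):
    """Map each symbolic reference to its address under the given layout."""
    return [layout[obj] for obj in symbolic_trace]

symbolic_trace = ["A.x", "B", "A.z", "B"]

layout_a = {"A.x": 10, "A.z": 14, "B": 20}
layout_b = {"A.x": 30, "A.z": 34, "B": 20}

print(enforce_layout(symbolic_trace, layout_a))  # [10, 20, 14, 20]
print(enforce_layout(symbolic_trace, layout_b))  # [30, 20, 34, 20]
```

Because the symbolic trace is recorded once, each candidate layout costs only a remapping and a simulation, with no edit or compile step.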

Page 12: Framework for Profile-Analysis Data-Layout Optimizations

Framework for Data-Layout Optimization

[Figure: the framework diagram repeated, now evaluating the new symbolic trace directly. Reference Trace T → Compress(T) → CST (compressed symbolic trace); Data Object Analysis DOA(CST, LS) → NLS (narrowed space of “good” and enforceable layouts; layout space labels: Class Splitting, Linearization, Field Reordering); Layout Selector LS(NLS, B, CST, SS) → DL; Enforce Layout EL(DL, CST) → CST′ (new trace); Evaluate Simulate(CST′) → B (benefit); Continue(B) → repeat or End.]

Page 13: Framework for Profile-Analysis Data-Layout Optimizations

Memoization: Efficient Trace Simulation

• Evaluation using simulation: MissRate_T = Simulate(T);

• Problem: simulation of the whole trace (T) is too expensive.

• Solution: avoid re-simulation of repeated sub-traces.

• Memoization:
  1. Simulate each “low level” rule, compute its memoization value.
     – For cache simulation: memoization value = CacheState [CS].
  2. Recursively compose memoization values for “higher” rules.

• Example (T: bcbcbcbdbdbdbd):
  – SEQUITUR representation:
      S → B B B A A
      B → b c
      A → C C
      C → b d
  – CS_C = Simulate′(C)
  – CS_B = Simulate′(B)
  – CS_A = CS_C ∘ CS_C
  – CS_S = CS_B ∘ CS_B ∘ CS_B ∘ CS_A ∘ CS_A
  – MissRate_T = Misses(CS_S) / Length(T)
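
The idea of not re-simulating repeated sub-traces can be sketched as follows. The slides do not spell out Simulate′ or how cache states are composed, so this sketch substitutes a simpler scheme: it memoizes on the pair (rule, incoming cache state) for a tiny fully associative LRU cache. The cache model, the block mapping, and all names are assumptions, not the authors' implementation.

```python
GRAMMAR = {
    "S": ["B", "B", "B", "A", "A"],
    "B": ["b", "c"],
    "A": ["C", "C"],
    "C": ["b", "d"],
}
BLOCK_OF = {"b": 0, "c": 1, "d": 2}  # hypothetical object-to-block mapping
CAPACITY = 2                         # hypothetical cache size in blocks

def simulate(symbol, state, memo):
    """Return (misses, exit_state) for expanding `symbol` from cache `state`.

    `state` is a tuple of block ids in LRU order, least recently used first.
    """
    key = (symbol, state)
    if key in memo:                      # repeated sub-trace, same entry state
        return memo[key]
    if symbol not in GRAMMAR:            # terminal: a single reference
        block = BLOCK_OF[symbol]
        if block in state:
            misses = 0
            exit_state = tuple(b for b in state if b != block) + (block,)
        else:
            misses = 1
            exit_state = (state + (block,))[-CAPACITY:]
    else:                                # rule: thread the state through children
        misses, exit_state = 0, state
        for child in GRAMMAR[symbol]:
            child_misses, exit_state = simulate(child, exit_state, memo)
            misses += child_misses
    memo[key] = (misses, exit_state)
    return memo[key]

def trace_length(symbol):
    """Number of terminals produced by expanding `symbol`."""
    if symbol not in GRAMMAR:
        return 1
    return sum(trace_length(child) for child in GRAMMAR[symbol])

total_misses, _ = simulate("S", (), {})
miss_rate = total_misses / trace_length("S")
print(total_misses, trace_length("S"), miss_rate)
```

In this toy run the second and third B, and every C after the first, enter with a cache state that was already seen, so their results come from the memo table instead of being simulated again.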

Page 14: Framework for Profile-Analysis Data-Layout Optimizations

Outline

• Background and Problem Definition

• Search is a solution, but may not be feasible
  – Making the search practical:
    • Trace representation
    • Avoid recompilation
    • Efficient simulation

• Applications

• Summary

Page 15: Framework for Profile-Analysis Data-Layout Optimizations

Framework Application (1)

• Application: an implementation of the framework that searches in a sub-space of the layout space.

• Field Reordering:
  – Objective: reduce number of cache misses.
  – Sub-space: all possible (legal) orders of fields in (heap) objects.
  – Our search strategy: (almost) exhaustive search.
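
An exhaustive search over field orders can be sketched with a toy cost model. Everything concrete here is an assumption for illustration: the field names, the 8-byte field size, the 16-byte block size, a one-block cache as the miss model, and treating every permutation as legal. The slides describe an (almost) exhaustive search but do not give these details.

```python
from itertools import permutations

FIELD_SIZE = 8    # bytes per field (assumed)
BLOCK_SIZE = 16   # bytes per cache block (assumed)

def count_misses(trace, order):
    """Misses for a one-block cache: a miss whenever the block changes."""
    offsets = {field: i * FIELD_SIZE for i, field in enumerate(order)}
    last_block, misses = None, 0
    for field in trace:
        block = offsets[field] // BLOCK_SIZE
        if block != last_block:
            misses += 1
            last_block = block
    return misses

def best_field_order(fields, trace):
    """Evaluate every permutation of the fields and keep the cheapest."""
    return min(permutations(fields), key=lambda order: count_misses(trace, order))

fields = ["w", "x", "y", "z"]                       # declaration order
trace = ["x", "z", "x", "z", "w", "y", "w", "y"]    # hypothetical field accesses

best = best_field_order(fields, trace)
print(count_misses(trace, fields), count_misses(trace, best), best)
```

The search places fields that are accessed together in the same block. The number of permutations grows factorially with the field count, which is one reason a practical search would be only "almost" exhaustive.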

Page 16: Framework for Profile-Analysis Data-Layout Optimizations

Field Reordering: Exhaustive Search

• We compared:
  – Best field order found by our iterative search.
  – Field orders produced by existing heuristics:
    • Fields Temporal Affinity [Chilimbi PLDI’99].
    • Fields Access Frequency [Truong PACT’98].

[Figure: bar chart of Miss Rate Reduction (y-axis from -10% to 50%) for perl, twolf, and boxsim; series: iteration, affinity, frequency.]

Runtime improvement: 0%-4.5%.

Page 17: Framework for Profile-Analysis Data-Layout Optimizations

Custom Memory Allocator (CMA)

• Objective: reduce number of page faults.

• CMA can work well if it has a good placement function: a function that assigns dynamically allocated heap objects to memory pages (heaps).

[Figure: address versus time plots of objects A and B across Page 1 and Page 2 for the reference trace ABABA. Left panel: Allocator 1, poor locality. Right panel: Allocator 2, good locality.]

Page 18: Framework for Profile-Analysis Data-Layout Optimizations

CMA Placement Function (PF)

malloc(size s){ }

• PF: map objects to heaps. PF(heap object) → int

• How can we find a placement function using our framework?
  – A placement function defines a data layout.
  – Learn by measuring the benefits of its data layout.
  – How: use a learning algorithm.

[Figure: Profile(Heap objects) supplies profiling information (runtime attributes) to a Learner, which outputs PF(Attributes) → int; the framework is used to evaluate PF. Example decision tree on Size: size < 24 → heap 1, size ≥ 24 → heap 2.]
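
A placement function of the kind shown in the decision tree can be sketched as a single split on allocation size. The threshold of 24 and the heap numbers 1 and 2 are read from the slide, with the second branch assumed to be "size ≥ 24". The `ToyAllocator` class, its bump-pointer bookkeeping, and its return value are hypothetical and only show where a placement function would be consulted inside malloc.

```python
def placement_function(size):
    """One-split decision tree: small objects go to heap 1, the rest to heap 2."""
    return 1 if size < 24 else 2

class ToyAllocator:
    """Hypothetical allocator that keeps one bump pointer per heap."""

    def __init__(self):
        self.next_offset = {}  # heap id -> next free offset within that heap

    def malloc(self, size):
        heap_id = placement_function(size)      # the PF picks the heap
        offset = self.next_offset.get(heap_id, 0)
        self.next_offset[heap_id] = offset + size
        return heap_id, offset

allocator = ToyAllocator()
print(allocator.malloc(16))  # (1, 0)
print(allocator.malloc(32))  # (2, 0)
print(allocator.malloc(8))   # (1, 16)
```

In the framework described above, a learner would propose such a function from profiled attributes, and the framework would score it by the benefit of the layout it induces.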

Page 19: Framework for Profile-Analysis Data-Layout Optimizations

CMA Results

Program       Number of heaps
Espresso      2
Boxsim        8
Twolf         5
Perl          5
Ghostscript   10
Lp_solve      6

[Figure: two bar charts of WS Size Reduction¹ (y-axis: Reduction %; x-axis: Benchmark, with Espresso, Boxsim, Twolf, Perl, GhostScript, lp_solve). First chart: test input, y-axis 0 to 18. Second chart: train input and test input, y-axis 0 to 20.]

¹ Relative to original working set size.

Page 20: Framework for Profile-Analysis Data-Layout Optimizations

Contributions and Future Work

• Formulate data layout optimization as a search process.

• Build a framework for an efficient search process.

• Improve existing optimizations; enable new optimizations.

• Framework limitations:
  – Difficult to handle very large traces (>0.5B references).
  – Requires some guidance from the programmer (search strategy).

• Future work:
  – Advanced search strategies that combine several optimizations.
  – Other non-data-layout optimizations: prefetching.