
Page 1:

Commutativity Analysis: A New Analysis Framework for Parallelizing Compilers

Martin C. Rinard

Pedro C. Diniz

University of California, Santa Barbara

Santa Barbara, California 93106

{martin,pedro}@cs.ucsb.edu

http://www.cs.ucsb.edu/~{martin,pedro}

Page 2:

Goal

Develop a Parallelizing Compiler for Object-Oriented Computations

• Current Focus

• Irregular Computations

• Dynamic Data Structures

• Future

• Persistent Data

• Distributed Computations

• New Analysis Technique: Commutativity Analysis

Page 3:

Structure of Talk

• Model of Computation
• Example
• Commutativity Testing
• Steps To Practicality
• Experimental Results
• Conclusion

Page 4:

Model of Computation

[Figure: model of computation — objects and operations. An executing operation reads the initial object state, produces a new object state, and invokes further operations.]

Page 5:

Graph Traversal Example

class graph {

int val, sum;

graph *left, *right;

};

void graph::traverse(int v) {

sum += v;

if (left != NULL) left->traverse(val);

if (right != NULL) right->traverse(val);

}

[Figure: example graph with four nodes holding the values 10, 20, 30, 40]

Goal: Execute left and right traverse operations in parallel

Page 6:

Parallel Traversal

[Figure: successive snapshots of the node values as the parallel traversal executes]

Page 7:

Commuting Operations in Parallel Traversal

[Figure: the traverse operations commute — both execution orders of the operations lead to the same final node values]

Page 8:

Model of Computation

• Operations: Method Invocations

• In Example: Invocations of graph::traverse
  • left->traverse(3)
  • right->traverse(2)

• Objects: Instances of Classes

• In Example: Graph Nodes

• Instance Variables Implement Object State

• In Example: val, sum, left, right


Page 9:

Model of Computation

• Operations: Method Invocations

• In Example: Invocations of graph::traverse
  • left->traverse(3)
  • right->traverse(2)

• Objects: Instances of Classes

• In Example: Graph Nodes


Page 10:

Separable Operations

Each Operation Consists of Two Sections

Object Section: Only Accesses Receiver Object

Invocation Section: Only Invokes Operations

Both Sections Can Access Parameters
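
For instance, graph::traverse from the earlier example separates into these two sections (a sketch of the example code; the comments only label the sections):

void graph::traverse(int v) {
  // Object section: updates the receiver's instance variable sum
  sum += v;
  // Invocation section: invokes the traverse operations on the left and right nodes
  if (left != NULL) left->traverse(val);
  if (right != NULL) right->traverse(val);
}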

Page 11:

Basic Approach

• Compiler Chooses A Computation to Parallelize

• In Example: Entire graph::traverse Computation

• Compiler Computes Extent of the Computation

• Representation of all Operations in Computation

• Current Representation: Set of Methods

• In Example: { graph::traverse }

• Do All Pairs of Operations in Extent Commute?

• No - Generate Serial Code

• Yes - Generate Parallel Code

• In Example: All Pairs Commute

Page 12:

Code Generation: For Each Method in Parallel Computation

• Augments Class Declaration With Mutual Exclusion Lock

• Generates Driver Version of Method

• Invoked from Serial Code to Start Parallel Execution

• Invokes Parallel Version of Operation

• Waits for Entire Parallel Computation to Finish

• Generates Parallel Version of Method

Object Section

• Lock Acquired at Beginning

• Lock Released at End

• Ensure Atomic Execution

Invocation Section

• Invoked Operations Execute in Parallel

• Invokes Parallel Version

Page 13:

Class Declaration

class graph {

lock mutex;

int val, sum;

graph *left, *right;

};

Driver Version

void graph::traverse(int v) {

parallel_traverse(v);

wait();

}

Code Generation In Example

Page 14:

Parallel Version In Example

void graph::parallel_traverse(int v) {

mutex.acquire();

sum += v;

mutex.release();

if (left != NULL)

spawn(left->parallel_traverse(val));

if (right != NULL)

spawn(right->parallel_traverse(val));

}

Page 15:

Compiler Structure

[Flow diagram: compiler structure]

Computation Selection: Entire Computation of Each Method

Extent Computation: Traverse Call Graph to Extract Extent

Commutativity Testing: All Pairs of Operations in Extent

All Operations Commute: Generate Parallel Code

Operations May Not Commute: Generate Serial Code

Page 16:

Traditional Approach

• Data Dependence Analysis

• Analyzes Reads and Writes

• Independent Pieces of Code Execute in Parallel

• Demonstrated Success for Array-Based Programs

Page 17:

Data Dependence Analysis in Example

• For Data Dependence Analysis To Succeed in Example

• left and right traverse Must Be Independent

• left and right Subgraphs Must Be Disjoint

• Graph Must Be a Tree

• Depends on Global Topology of Data Structure

• Analyze Code that Builds Data Structure

• Extract and Propagate Topology Information

• Fails For Graphs

Page 18:

Properties of Commutativity Analysis

• Oblivious to Data Structure Topology

• Local Analysis

• Simple Analysis

• Wide Range of Computations

• Lists, Trees and Graphs

• Updates to Central Data Structure

• General Reductions

• Introduces Synchronization

• Relies on Commuting Operations

Page 19:

Commutativity Testing

Page 20:

Commutativity Testing Conditions

• Do Two Operations A and B Commute?

• Compiler Considers Two Execution Orders

• A;B - A executes before B

• B;A - B executes before A

• Compiler Must Check Two Conditions

Instance Variables: New values of instance variables are the same in both execution orders

Invoked Operations: A and B together directly invoke the same set of operations in both execution orders

Page 21:

Commutativity Testing Conditions

[Figure: the two execution orders applied to the example node produce the same final value]

Page 22:

Commutativity Testing Algorithm

• Symbolic Execution:

• Compiler Executes Operations

• Computes with Expressions not Values

• Compiler Symbolically Executes Operations In Both Execution Orders

• Expressions for New Values of Instance Variables

• Expressions for Multiset of Invoked Operations

Page 23:

Expression Simplification and Comparison

• Compiler Applies Rewrite Rules to Simplify Expressions
  • a*(b+c) → (a*b)+(a*c)
  • b+(a+c) → (a+b+c)
  • a+if(b<c,d,e) → if(b<c,a+d,a+e)

• Compiler Compares Corresponding Expressions

• If All Equal - Operations Commute

• If Not All Equal - Operations May Not Commute

Page 24:

Commutativity Testing Example

• Two Operations: r->traverse(v1) and r->traverse(v2)

• In Order r->traverse(v1);r->traverse(v2)

Instance Variables

New sum = (sum+v1)+v2

Invoked Operations

if(right!=NULL,right->traverse(val)),
if(left!=NULL,left->traverse(val)),
if(right!=NULL,right->traverse(val)),
if(left!=NULL,left->traverse(val))

• In Order r->traverse(v2);r->traverse(v1)

Instance Variables

New sum = (sum+v2)+v1

Invoked Operations

if(right!=NULL,right->traverse(val)),
if(left!=NULL,left->traverse(val)),
if(right!=NULL,right->traverse(val)),
if(left!=NULL,left->traverse(val))

Page 25:

Important Special Case

• Independent Operations Commute

• Analysis in Current Compiler

• Dependence Analysis

• Operations on Objects of Different Classes

• Independent Operations on Objects of Same Class

• Symbolic Commutativity Testing

• Dependent Operations on Objects of Same Class

• Future

• Integrate Pointer or Alias Analysis

• Integrate Array Data Dependence Analysis

Page 26:

Important Special Case

• Independent Operations Commute

• Conditions for Independence

• Operations Have Different Receivers

• Neither Operation Writes an Instance Variable that Other Operation Accesses

• Detecting Independent Operations

• In Type-Safe Languages

• Class Declarations

• Instance Variable Accesses

• Pointer or Alias Analysis
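
For example, in the traversal code two invocations whose receivers are known to be distinct graph nodes are independent (a sketch; n1 and n2 are hypothetical node pointers assumed to refer to different objects):

// n1 and n2 are assumed to point to two different graph objects
n1->traverse(3);   // writes only n1->sum; reads n1->val, n1->left, n1->right
n2->traverse(2);   // writes only n2->sum; accesses no instance variable of n1
// Different receivers, and neither operation writes an instance variable the
// other accesses, so the two operations are independent and therefore commute.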

Page 27:

Analysis in Current Compiler

• Dependence Analysis

• Operations on Objects of Different Classes

• Independent Operations on Objects of Same Class

• Symbolic Commutativity Testing

• Dependent Operations on Objects of Same Class

• Future

• Integrate Pointer or Alias Analysis

• Integrate Array Data Dependence Analysis

Page 28:

Steps to Practicality

Page 29:

Programming Model Extensions

• Extensions for Read-Only Data

• Allow Operations to Freely Access Read-Only Data

• Enhances Ability of Compiler to Represent Expressions

• Increases Set of Programs that Compiler can Analyze

• Analysis Granularity Extensions

• Integrate Operations into Callers for Analysis Purposes

• Coarsens Commutativity Testing Granularity

• Reduces Number of Pairs Tested for Commutativity

• Enhances Effectiveness of Commutativity Testing

Page 30:

Optimizations

• Synchronization Optimizations

• Eliminate Synchronization Constructs in Methods that Only Access Read-Only Data

• Reduce Number of Acquire and Release Constructs

• Parallel Loop Optimization

• Suppress Exploitation of Excess Concurrency

Page 31:

Extent Constants

Motivation: Allow Parallel Operations to Freely Access Read-Only Data

• Extent Constant Variable: Global variable or instance variable written by no operation in the extent

• Extent Constant Expression: Expression whose value depends only on extent constant variables or parameters

• Extent Constant Value: Value computed by an extent constant expression

• Extent Constant: Automatically generated opaque constant used to represent an extent constant value

• Requires: Interprocedural Data Usage Analysis

• Result Summarizes How Operations Access Instance Variables

• Interprocedural Pointer Analysis for Reference Parameters

Page 32:

Extent Constant Variables In Example

void graph::traverse(int v) {

sum += v;

if (left != NULL) left->traverse(val);

if (right != NULL) right->traverse(val);

}

Extent Constant Variable: val (written by no operation in the extent)

Page 33:

Advantages of Extent Constants

• Extent Constants Extend Programming Model

• Enable Direct Global Variable Access

• Enable Direct Access of Objects other than Receiver

• Extent Constants Make Compiler More Effective

• Enable Compact Representations of Large Expressions

• Enable Compiler to Represent Values Computed by Otherwise Unanalyzable Constructs

Page 34:

Auxiliary Operations

Motivation: Coarsen Granularity of Commutativity Testing

• An Operation is an Auxiliary Operation if its Entire Computation

• Only Computes Extent Constant Values

• Only Externally Visible Writes are to Local Variables of Caller

• Auxiliary Operations are Conceptually Part of Caller

• Analysis Integrates Auxiliary Operations into Caller

• Represents Computed Values using Extent Constants

• Requires:

• Interprocedural Data Usage Analysis

• Interprocedural Pointer Analysis for Reference Parameters

• Intraprocedural Reaching Definition Analysis

Page 35:

Auxiliary Operation Example

int graph::square_and_add(int v) {

return(val*val + v);

}

void graph::traverse(int v) {

sum += square_and_add(v);

if (left != NULL) left->traverse(val);

if (right != NULL) right->traverse(val);

}

Extent Constant Expression: val*val + v
(val is an Extent Constant Variable, v is a Parameter)

Page 36:

Advantages of Auxiliary Operations

• Coarsen Granularity of Commutativity Testing

• Reduces Number of Pairs Tested for Commutativity

• Enhances Effectiveness of Commutativity Testing Algorithm

• Support Modular Programming

Page 37:

Synchronization Optimizations

• Goal: Eliminate or Reduce Synchronization Overhead

• Synchronization Elimination

  If An Operation Only Computes Extent Constant Values, Then the Compiler Does Not Generate Lock Acquire and Release

• Lock Coarsening

  Data: Use One Lock for Multiple Objects

  Computation: Generate One Lock Acquire and Release for Multiple Operations on the Same Object

Page 38:

Data Lock Coarsening Example

Original Code:

class vector {
  lock mutex;
  double val[NDIM];
};
void vector::add(double *v) {
  mutex.acquire();
  for (int i = 0; i < NDIM; i++) val[i] += v[i];
  mutex.release();
}
class body {
  lock mutex;
  double phi;
  vector acc;
};
void body::gravsub(body *b) {
  double p, v[NDIM];
  mutex.acquire();
  p = computeInter(b,v);
  phi -= p;
  mutex.release();
  acc.add(v);
}

Optimized Code:

class vector {
  double val[NDIM];
};
void vector::add(double *v) {
  for (int i = 0; i < NDIM; i++) val[i] += v[i];
}
class body {
  lock mutex;
  double phi;
  vector acc;
};
void body::gravsub(body *b) {
  double p, v[NDIM];
  mutex.acquire();
  p = computeInter(b,v);
  phi -= p;
  acc.add(v);
  mutex.release();
}

Page 39:

Computation Lock Coarsening Example

Original Code:

class body {
  lock mutex;
  double phi;
  vector acc;
};
void body::gravsub(body *b) {
  double p, v[NDIM];
  mutex.acquire();
  p = computeInter(b,v);
  phi -= p;
  acc.add(v);
  mutex.release();
}
void body::loopsub(body *b) {
  int i;
  for (i = 0; i < N; i++) {
    this->gravsub(b+i);
  }
}

Optimized Code:

class body {
  lock mutex;
  double phi;
  vector acc;
};
void body::gravsub(body *b) {
  double p, v[NDIM];
  p = computeInter(b,v);
  phi -= p;
  acc.add(v);
}
void body::loopsub(body *b) {
  int i;
  mutex.acquire();
  for (i = 0; i < N; i++) {
    this->gravsub(b+i);
  }
  mutex.release();
}

Page 40:

Parallel Loops

Goal: Generate Efficient Code for Parallel Loops

If A Loop is in the Following Form

for (i = exp1; i < exp2; i += exp3) {

exp4->op(exp5,exp6, ...);

}

Where exp1, exp2, ... Are Extent Constant Expressions

Then Compiler Generates Parallel Loop Code
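
A minimal sketch of the kind of chunked code this could turn into, reusing the slide's exp1 ... exp6 placeholders; the chunk size, the run_chunk helper, its class C, and mutex_op are all hypothetical, not the compiler's actual output:

// Hypothetical helper on the receiver class C: runs one chunk of iterations
// serially, invoking the mutex version of op (see the later slide).
void C::run_chunk(int lo, int hi, int step, int a1, int a2) {
  for (int i = lo; i < hi; i += step)
    this->mutex_op(a1, a2);
}

// Emitted in place of the original loop: expose all chunks at once,
// then wait for the whole computation to finish.
int chunk = 16;                                    // assumed chunk size
for (int lo = exp1; lo < exp2; lo += chunk * exp3) {
  int hi = (lo + chunk * exp3 < exp2) ? (lo + chunk * exp3) : exp2;
  spawn(exp4->run_chunk(lo, hi, exp3, exp5, exp6));
}
wait();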

Page 41:

Parallel Loop Optimization

• Without Parallel Loop Optimization

• Each Loop Iteration Generates a Task

• Tasks are Created and Scheduled Sequentially

• Each Iteration Incurs Task Creation and Scheduling Overhead

• With Parallel Loop Optimization

• Generated Code Immediately Exposes All Iterations

• Scheduler Operates on Chunks of Loop Iterations

• Each Chunk of Iterations Incurs Scheduling Overhead

• Advantages

• Enables Compact Representation for Loop Computation

• Reduces Task Creation and Scheduling Overhead

• Parallelizes Overhead

Page 42:

Suppressing Excess Concurrency

Goal: Reduce Overhead of Exploiting Parallelism

• Goal Achieved by Generating Computations that

• Execute Operations Serially with No Parallelization Overhead

• Use Synchronization Required to Execute Safely in Parallel Context

• Mechanism: Mutex Versions of Methods

Object Section

• Acquires Lock at Beginning

• Releases Lock at End

Invocation Section

• Operations Execute Serially

• Invokes Mutex Version

• Current Policy:

• Each Parallel Loop Iteration Invokes Mutex Version of Operation

• Suppresses Parallel Execution Within Iterations of Parallel Loops
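
For the running example, the mutex version of graph::traverse might look like this (a sketch; the method name mutex_traverse is invented, and the lock placement and serial invocations follow the description above):

void graph::mutex_traverse(int v) {
  mutex.acquire();                                  // object section is atomic
  sum += v;
  mutex.release();
  if (left != NULL) left->mutex_traverse(val);      // invoked operations run
  if (right != NULL) right->mutex_traverse(val);    // serially, with no spawn
}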

Page 43:

Experimental Results

Page 44:

Methodology

• Built Prototype Compiler

• Built Run Time System

• Concurrency Generation and Task Management

• Dynamic Load Balancing

• Synchronization

• Acquired Two Complete Applications

• Barnes-Hut N-Body Solver

• Water Code

• Automatically Parallelized Applications

• Ran Applications on Stanford DASH Machine

• Compare Performance with Highly Tuned, Explicitly Parallel Versions from SPLASH-2 Benchmark Suite

Page 45:

Prototype Compiler

• Clean Subset of C++

• Sage++ is Front End

• Structured As a Source-To-Source Translator

• Analysis Finds Parallel Loops and Methods

• Compiler Generates Annotation File

• Identifies Parallel Loops and Methods

• Classes to Augment with Locks

• Code Generator Reads Annotation File

• Generates Parallel Versions of Methods

• Inserts Synchronization and Parallelization Code

• Parallelizes Unannotated Programs

Page 46:

Major Restrictions

Motivation: Simplify Implementation of Prototype

• No Virtual Methods

• No Operator or Method Overloading

• No Multiple Inheritance or Templates

• No typedef, struct, union or enum types

• Global Variables must be Class Types

• No Static Members or Pointers to Members

• No Default Arguments or Variable Numbers of Arguments

• No Operation Accesses a Variable Declared in a Class from which its Receiver Class Inherits

Page 47:

Run Time Library

Motivation: Provide Basic Concurrency Management

• Single Program, Multiple Data Execution Model

• Single Address Space

• Alternate Serial and Parallel Phases

• Library Provides

• Task Creation and Synchronization Primitives

• Dynamic Load Balancing

• Implemented

• Stanford DASH Shared-Memory Multiprocessor

• SGI Shared-Memory Multiprocessors

Page 48:

Applications

• Barnes-Hut

• O(NlgN) N-Body Solver

• Space Subdivision Tree

• 1500 Lines of C++ Code

• Water

• Simulates Liquid Water

• O(N^2) Algorithm

• 1850 Lines of C++ Code

Page 49:

Obtaining Serial C++ Version of Barnes-Hut

• Started with Explicitly Parallel Version (SPLASH-2)

• Removed Parallel Constructs to get Serial C

• Converted to Clean Object-Based C++

• Major Structural Changes

• Eliminated Scheduling Code and Data Structures

• Split a Loop in Force Computation Phase

• Introduced New Field into Particle Data Structure

Page 50:

Obtaining Serial C++ Version of Water

• Started with Serial C translated from FORTRAN

• Converted to Clean Object-Based C++

• Major Structural Change

• Auxiliary Objects for O(N^2) phases

Page 51:

Commutativity Statistics for Barnes-Hut

[Bar chart: pairs tested for commutativity (independent pairs vs. symbolically executed pairs) for each parallel extent: Position (3 Methods), Force (6 Methods), Velocity (3 Methods)]

Page 52:

Auxiliary Operation Statistics for Barnes-Hut

[Bar chart: auxiliary operation call sites vs. total call sites for each parallel extent: Position (3 Methods), Force (6 Methods), Velocity (3 Methods)]

Page 53:

Performance Results for Barnes-Hut

[Speedup graphs: Barnes-Hut on DASH with 8K-particle and 16K-particle data sets; speedup vs. number of processors (up to 32) for the Commutativity Analysis version and the SPLASH-2 version, against the ideal speedup line]

Page 54:

Performance Analysis

Motivation: Understand Behavior of Parallelized Program

• Instrumented Code to Measure Execution Time Breakdowns

Parallel Idle - Time Spent Idle in Parallel Section

Serial Idle - Time Spent Idle in a Serial Section

Blocked - Time Spent Waiting to Acquire a Lock Held by Another Processor

Parallel Compute - Time Spent Doing Useful Work in a Parallel Section

Serial Compute - Time Spent Doing Useful Work in a Serial Section

Page 55:

Performance Analysis for Barnes-Hut

[Graphs: cumulative total time (seconds) vs. number of processors for Barnes-Hut on DASH, 16K-particle and 8K-particle data sets, broken down into Serial Compute, Parallel Compute, Blocked, Serial Idle, and Parallel Idle]

Page 56:

Performance Results for Water

[Speedup graphs: Water on DASH with 343-molecule and 512-molecule data sets; speedup vs. number of processors for the Commutativity Analysis version and the SPLASH-2 version, against the ideal speedup line]

Page 57:

Performance Results for Computation Replication Version of Water

[Speedup graphs: computation replication version of Water on DASH with 512-molecule and 343-molecule data sets; speedup vs. number of processors for the Commutativity Analysis version and the SPLASH-2 version, against the ideal speedup line]

Page 58:

Commutativity Statistics for Water

[Bar chart: pairs tested for commutativity (independent pairs vs. symbolically executed pairs) for each parallel extent: Virtual (3 Methods), Forces (2 Methods), Loading (4 Methods), Momenta (2 Methods), Energy (5 Methods)]

Page 59:

Auxiliary Operation Statistics for Water

[Bar chart: auxiliary operation call sites vs. total call sites for each parallel extent: Virtual (3 Methods), Forces (2 Methods), Loading (4 Methods), Momenta (2 Methods), Energy (5 Methods)]

Page 60:

Performance Analysis for Water

[Graphs: cumulative total time (seconds) vs. number of processors for Water on DASH, 343-molecule and 512-molecule data sets, broken down into Serial Compute, Parallel Compute, Blocked, Serial Idle, and Parallel Idle]

Page 61:

Future Work

• Relative Commutativity

• Integrate Other Analysis Frameworks

• Pointer or Alias Analysis

• Array Data Dependence Analysis

• Analysis Problems

• Synchronization Optimizations

• Analysis Granularity Optimizations

• Generation of Self-Tuning Code

• Message Passing Implementation

Page 62:

Related Work

• Bernstein (IEEE Transactions on Computers 1966)

• Dependence Analysis for Pointer-Based Data Structures
  • Landi, Ryder and Zhang (PLDI 93)
  • Hendren, Hummel and Nicolau (PLDI 92)
  • Plevyak, Karamcheti and Chien (LCPC 93)
  • Chase, Wegman and Zadeck (PLDI 90)
  • Larus and Hilfinger (PLDI 88)
  • Ghiya and Hendren (POPL 96)
  • Ruf (PLDI 95)
  • Wilson and Lam (PLDI 95)
  • Deutsch (PLDI 94)
  • Choi, Burke and Carini (POPL 93)

• Reduction Analysis
  • Ghuloum and Fisher (PPOPP 95)
  • Pinter and Pinter (POPL 92)
  • Callahan (LCPC 91)

• Commuting Operations in Parallel Languages
  • Rinard and Lam (PPOPP 91)
  • Steele (POPL 90)
  • Barth, Nikhil and Arvind (FPCA 91)

Page 63:

Conclusions

Page 64:

Conclusion

• Commutativity Analysis

• New Analysis Framework for Parallelizing Compilers

• Basic Idea

• Recognize Commuting Operations

• Generate Parallel Code

• Current Focus

• Dynamic, Pointer-Based Data Structures

• Good Initial Results

• Future

• Persistent Data

• Distributed Computations

Page 65:

Latest Version of Paper

http://www.cs.ucsb.edu/~martin/paper/pldi96.ps

Page 66:

What if Operations Do Not Commute?

• Parallel Tree Traversal

• Example: Distance of Node from Root

class tree {

int distance;

tree *left;

tree *right;

};

void tree::set_distance(int d) {

distance = d;

if (left != NULL) left->set_distance(d+1);

if (right != NULL) right->set_distance(d+1);

}

Page 67:

Equivalent Computation with Commuting Operations

void tree::sum_distance(int d) {

distance = distance + d;

if (left != NULL) left->sum_distance(d+1);

if (right != NULL) right->sum_distance(d+1);

}

void tree::zero_distance() {

distance = 0;

if (left != NULL) left->zero_distance();

if (right != NULL) right->zero_distance();

}

void tree::set_distance(int d) {

zero_distance();

sum_distance(d);

}

Page 68:

Theoretical Result

• For Any Tree Traversal on Data With
  • A Commutative Operator (for example +) that has
  • A Zero Element (for example 0)

• There Exists a Program P such that
  • P Computes the Traversal
  • Commutativity Analysis Can Automatically Parallelize P

• Complexity Results:
  • Program P is Asymptotically Optimal if the Data Structure is a Perfectly Balanced Tree
  • Program P has Complexity O(N^2) if the Data Structure is a Linked List

Page 69:

Pure Object-Based Model of Computation

• Goal

• Obtain a Powerful, Clean Model of Computation

• Enable Compiler to Analyze Program

• Objects: Instances of Classes

• Implement State with Instance Variables

• Primitive Types from Underlying Language (int, ...)

• References to Other Objects

• Nested Objects

• Operations: Invocations of Methods

• Each Operation Has Single Receiver Object
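
A small illustration of this model (the classes and members below are invented purely for illustration):

class vec { public: double x, y, z; };      // helper class used as a nested object

class particle {
  double mass;               // instance variable of a primitive type (double)
  vec pos;                   // nested object implementing part of the state
  particle *next;            // reference to another object
public:
  void accumulate(double m) {               // operation: a method invocation
    mass += m;                              // whose single receiver is "this"
    if (next != NULL) next->accumulate(m);  // invoking a further operation
  }
};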