A Pattern Language for Parallel Programming Beverly Sanders University of Florida

A Pattern Language for Parallel Programming

Beverly Sanders

University of Florida

Overview of talk

History of pattern languages

Motivation for a pattern language for parallel programming

Pattern example: Reengineering for Parallelism

Tour through the entire pattern language via a programming example

History: ’60s and ’70s

Berkeley architecture professor Christopher Alexander

253 patterns for city planning, landscaping, and architecture

Attempted to capture principles for “living” design.

Example: Six Foot Balcony

Balconies less than six feet deep are hardly ever used

Discussion of what makes a good balcony

Therefore: Whenever you build a balcony or a porch, always make it at least six feet deep. If possible, recess at least a part of it into the building so that it is not cantilevered out and separated from the building by a simple line, and enclose it partially.

A new approach to design

Not just a collection of patterns, but a pattern language
– Patterns lead to other patterns
– Patterns are hierarchical and compositional
– Embodies design methodology and vocabulary

Small impact on architectural practice

Patterns in Object-oriented Programming

OOPSLA’87: Kent Beck and Ward Cunningham

1995: Design Patterns: Elements of Reusable Object-Oriented Software
Gang of Four (GOF): Gamma, Helm, Johnson, Vlissides
– Catalog of patterns
– Creational, structural, behavioral

PLoP Conferences

GOF Pattern Example

Behavioral Pattern: Visitor
– Separate the structure of an object collection from the operations performed on that collection.
– Example: Abstract syntax tree in a compiler
  Multiple node types (declaration, command, expression, etc.)
  Action during traversal depends on both type of node and compiler pass (type checking, code generation)
  Can add new functionality by implementing new visitor without modifying AST code.
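
The Visitor idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the GOF book: the node classes `Num` and `Add` and the visitors `Evaluator` and `Printer` are hypothetical names chosen for the example, and a real compiler AST would have many more node types.

```python
class Num:
    """AST leaf node holding a literal value (hypothetical node type)."""
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        # Double dispatch: the node picks the visitor method for its own type.
        return visitor.visit_num(self)


class Add:
    """AST interior node for addition (hypothetical node type)."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def accept(self, visitor):
        return visitor.visit_add(self)


class Evaluator:
    """One 'pass' over the tree: compute the numeric value."""
    def visit_num(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)


class Printer:
    """A second pass, added without touching Num or Add."""
    def visit_num(self, node):
        return str(node.value)

    def visit_add(self, node):
        return f"({node.left.accept(self)} + {node.right.accept(self)})"


tree = Add(Num(1), Add(Num(2), Num(3)))
print(tree.accept(Evaluator()))
print(tree.accept(Printer()))
```

Adding a new pass means writing one more visitor class; the node classes stay unchanged, which is the property the slide highlights.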

Impact of GOF book

Good solutions to frequently recurring problems

New vocabulary

Pattern catalog

Significant influence on object-oriented programming!

Design Pattern

High quality solution to frequently recurring problem in some domain

Each pattern has a name, providing a vocabulary for discussing the solutions

Written in prescribed format to allow the reader to quickly understand the solution and its context

A pattern format

Name

Also known as

Problem

Context

Forces

Solution

Examples and known uses

Related patterns

…

Pattern Language

Carefully structured collection of patterns

Structure embodies a design methodology and leads user through the language so that complex designs can be developed using patterns

Provides domain specific advice to the designer

Not a programming language

Parallel Programming

Parallel hardware is becoming increasingly mainstream and inexpensive
– Multicore CPUs in desktop PCs and servers
– Clusters

Software that fully exploits the hardware is currently rare (except in the specialized area of high performance computing)

Can a pattern language providing guidance for the entire development process make parallel programming easier?

Structure of the pattern language

– Reengineering for Parallelism pattern for dealing with legacy sequential code

4 design spaces
– Finding Concurrency
  Help designer expose exploitable concurrency: find high level task and data decomposition
– Algorithm Structure
  Help designer map tasks to processes or threads to best take advantage of the potential concurrency

Structure of the pattern language, continued

– Supporting Structures
  Code structuring patterns
  Distributed and thread-safe data structures
– Implementation Mechanisms
  Low level mechanisms used to write parallel programs
  3 categories of mechanisms
  – UE (process/thread) Management
  – Synchronization
  – Communication

Starting with legacy sequential application?

Reengineering for Parallelism pattern provides guidance to
– Manage the process
– Determine what to change

We’ll look at this as an example pattern

Reengineering for Parallelism

Problem: How can existing applications be parallelized using PLPP (the Pattern Language for Parallel Programming) to improve performance by making use of parallel hardware?

Context: We have legacy code that cannot be rewritten from scratch, need to improve performance…

Forces:
– User base has expectations for behavior
– Existing application may not be fully understood
– Amdahl’s law pushes programmer to avoid sequential bottlenecks at any cost, which may imply wholesale restructuring of the program
– Starting point is working code that embodies significant programming work, bug fixes, and knowledge. Minimizing changes is desirable. It is rarely feasible to make sweeping rewrites.
– Concurrency introduces new classes of errors that are hard to detect and make software difficult to validate.
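
The Amdahl’s law force listed above can be made concrete with a small calculation. The sketch below uses the standard form of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of run time that can be parallelized and n is the number of processors. The function name and the sample fractions are illustrative assumptions, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of sequential run time that can be parallelized (0..1)
    n_procs: number of processors applied to the parallel part
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


# Illustrative values only: even a 10% sequential remainder caps the speedup.
for p in (0.5, 0.9, 0.99):
    print(p, [round(amdahl_speedup(p, n), 2) for n in (2, 8, 64)])
```

With 90% of the run time parallelized, the bound can never exceed 10 regardless of processor count, which is why the remaining sequential part gets so much attention during reengineering.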

Solution: Preparation

Survey the landscape
– Pattern provides a list of questions to help assess existing code
– Many are the same as in any reengineering project
– Is program numerically well-behaved?

Define the scope and get users’ buy-in
– Required precision of results
– Input range
– Performance
– Feasibility (back of envelope calculations)

Define a testing protocol

Solution: Continued

Identify hot spots: where is most of the time spent?
– Look at code
– Use profiling tools

Parallelization
– Start with hot spots first
– As much as possible, make sequence of small changes, each followed by testing
– Use PLPP patterns (pattern provides guidance)

Reengineering for Parallelism Pattern, continued

Extended example

Discussion of related patterns
– Patterns for legacy code
– Patterns for parallel programming

Example: Molecular dynamics

Simulate motion in large molecular system

Example application: how protein interacts with drug

Forces
– Bonded forces within a molecule
– Long-range forces between molecules
  Not tractable: N² interactions
  Use cutoff method: only consider forces from neighbors that are “close enough”
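
The cutoff method can be sketched as building a list of atom pairs whose distance is within a chosen radius, so that the long-range force loop visits only those pairs. This is a minimal illustration under stated assumptions: `build_neighbor_list` and the sample coordinates are hypothetical, positions are plain 3-tuples, and the pair search here is itself a naive all-pairs scan (production codes typically use cell lists or similar to avoid that).

```python
import math


def build_neighbor_list(positions, cutoff):
    """Return index pairs (i, j), i < j, whose distance is at most cutoff.

    positions: list of (x, y, z) tuples, one per atom
    cutoff: distance beyond which long-range forces are ignored
    """
    pairs = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


# Illustrative coordinates: two close pairs separated by a large gap.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.5, 0.0, 0.0)]
print(build_neighbor_list(atoms, cutoff=1.5))
```

The resulting pair list plays the role of the `neighbors(2,M)` array in the pseudocode on the next slide: M pairs instead of all N(N-1)/2.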

Sequential molecular dynamics simulation

real atoms(3,N)
real force(3,N)
int neighbors(2,M)

loop over time steps
    Compute bonded forces
    Compute neighbors
    Compute long-range forces
    Update position …
end loop

Starting with legacy sequential code?

If so, start with the Reengineering for Parallelism pattern

Next: Finding Concurrency Design Space

Finding Concurrency Design Space

– Decomposition Patterns: Task Decomposition, Data Decomposition
– Dependency Analysis Patterns
– Design Evaluation

Molecular dynamics decomposition

Each function is a loop over atoms

Suggests task decomposition with each task corresponding to a loop iteration (update of an atom)
– tasks for bonded forces
– tasks for long-range forces
– tasks to update positions
– tasks to compute neighbor list

Data shared between the tasks

Finding Concurrency Design Space

– Decomposition Patterns
– Dependency Analysis Patterns: Group Tasks, Order Tasks, Data Sharing
– Design Evaluation

Molecular dynamics dependency analysis

[Figure: task groups within one time step (Bonded forces, Neighbor list, Long-range forces, Update position), followed by the next time step. The shared data atoms(3,N), forces(3,N), and neighbors are annotated by access type: Read, Write, Accumulate.]

Finding Concurrency Design Space

– Decomposition Patterns
– Dependency Analysis Patterns
– Design Evaluation
  Suitability for target platform
  Design quality (flexibility, efficiency, simplicity)
  Preparation for the next phase

Design evaluation for molecular dynamics

Target architecture for example: distributed memory cluster, message passing

Data sharing has enough special properties (read only, accumulate, temporal constraints) that we should be able to make it work in a distributed memory environment.

Design seems OK, move to next design space

Algorithm structure design space

Map tasks to Units of Execution (threads or processes)

Target platform properties
– number of UEs
– communication between UEs

Major organizing principle

Organize by tasks? Recursive?
– no: Task Parallelism
– yes: Divide and Conquer

Organize by data? Recursive?
– no: Geometric Decomposition
– yes: Recursive Data

Organize by ordering? Regular?
– yes: Pipeline
– no: Event-based Coordination

Algorithm structure for molecular dynamics

Organized by task

Task decomposition pattern
– Granularity: decide bonded forces not worth parallelizing now.
– Load balancing: static OK, partition iterations of original loop (over atoms) to UEs
– Termination: easy since number of UEs can be determined in advance
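
Static load balancing of the atom loop can be sketched as a block partition: each UE gets a contiguous range of loop iterations, with sizes differing by at most one. The helper name `block_range` is a hypothetical choice for this example; a cyclic assignment would be an equally valid static scheme.

```python
def block_range(my_id, nprocs, n):
    """Iterations of a loop over n atoms assigned to UE my_id (0-based).

    The first (n % nprocs) UEs receive one extra iteration so that
    block sizes differ by at most one.
    """
    base, extra = divmod(n, nprocs)
    start = my_id * base + min(my_id, extra)
    stop = start + base + (1 if my_id < extra else 0)
    return range(start, stop)


# Illustrative sizes: 10 atoms over 4 UEs.
for ue in range(4):
    print(ue, list(block_range(ue, 4, 10)))
```

Because the partition depends only on `my_id`, `nprocs`, and `n`, every UE can compute its own range without communication, which also makes termination trivial.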

Separable dependencies

Multiple tasks update force array concurrently by adding to its value.

This type of update is called accumulation

Allows dependencies to be separated from concurrent part of computation
– Each UE gets a local copy of data
– Updates local copy
– After local updates completed, reduce (combine results using associative operator)
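
The local-copy-then-reduce steps above can be sketched as follows. This is a simplified, one-dimensional illustration: the pair contributions are made-up numbers, the function names are hypothetical, and the "UEs" are just two separate lists of work processed one after the other rather than real threads or processes.

```python
from functools import reduce


def accumulate_locally(pair_forces, n_atoms):
    """One UE's work: add its pair contributions into a private force array."""
    local = [0.0] * n_atoms
    for i, j, f in pair_forces:
        local[i] += f   # force on atom i from atom j
        local[j] -= f   # equal and opposite force on atom j
    return local


def elementwise_add(a, b):
    """Associative combining operator used by the reduction."""
    return [x + y for x, y in zip(a, b)]


def reduce_forces(local_copies):
    """Combine all private force arrays into the final force array."""
    return reduce(elementwise_add, local_copies)


# Illustrative pair contributions (i, j, force), split between two UEs.
pairs = [(0, 1, 1.5), (1, 2, -0.5), (0, 2, 2.0), (2, 3, 1.0)]
locals_per_ue = [accumulate_locally(pairs[k::2], 4) for k in range(2)]
print(reduce_forces(locals_per_ue))
```

Each UE writes only to its own copy, so the accumulation phase needs no locking; the only coordination point is the final reduction.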

Supporting Structures Design Space

An intermediate stage between algorithm structures and implementation mechanisms (similar level to GOF)

Program structuring patterns
– SPMD, Fork/Join, Loop Parallelism, Master/Worker

Data structures
– Shared queue, Distributed array, Shared data

Choose SPMD Pattern

Single program multiple data.
– Each UE executes exactly the same program
– Uses process ID to determine behavior
– Issues: replicate or partition data, computation?

Replicate or partition data in MD?
– Replicate atoms, force.
– Partition neighbor list

Duplicate non-parallelized parts of computation, or designate one process to compute?
– Duplicate all computation except I/O.

Parallel Simulation

real atoms(3,N)
real force(3,N)
int neighbors(2,M)
myID = getProcessID
nprocs = getNumberProcesses

loop over time steps
    Compute bonded forces        //replicate computation
    Compute neighbors            //only for atoms assigned to myID
    Compute long-range forces
    globalSum(N, &force)         //reduction to combine all force arrays
    Update position …
end loop
if (myID == printID) printResults
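
The SPMD structure of the pseudocode above can be imitated in Python for experimentation. This sketch makes several assumptions that are not in the talk: Python threads stand in for the processes of a distributed-memory cluster, `global_sum` is a hypothetical stand-in for the reduction, the system is one-dimensional, and the "force" is a toy spring between consecutive atoms. Only the control structure (same program on every UE, behavior selected by `my_id`, replicated data, reduction each time step) mirrors the slide.

```python
import threading


def global_sum(local, slots, my_id, barrier):
    """Hypothetical all-reduce: every UE gets the element-wise sum of all copies."""
    slots[my_id] = local
    barrier.wait()                      # all contributions are now posted
    total = [sum(vals) for vals in zip(*slots)]
    barrier.wait()                      # all UEs have read before slots are reused
    return total


def spmd_program(my_id, nprocs, n_atoms, steps, slots, barrier, results):
    """The single program that every UE executes."""
    atoms = [0.1 * i * i for i in range(n_atoms)]      # replicated positions
    for _ in range(steps):
        force = [0.0] * n_atoms                        # private force copy
        # Toy long-range force: spring between atoms i and i+1,
        # computed only for the pairs assigned to this UE (cyclic assignment).
        for i in range(my_id, n_atoms - 1, nprocs):
            f = atoms[i + 1] - atoms[i] - 1.0
            force[i] += f
            force[i + 1] -= f
        force = global_sum(force, slots, my_id, barrier)
        atoms = [x + 0.1 * f for x, f in zip(atoms, force)]  # replicated update
    results[my_id] = atoms


def run(nprocs, n_atoms=8, steps=3):
    """Launch nprocs UEs running the same program and collect their final state."""
    slots = [None] * nprocs
    results = [None] * nprocs
    barrier = threading.Barrier(nprocs)
    threads = [
        threading.Thread(
            target=spmd_program,
            args=(r, nprocs, n_atoms, steps, slots, barrier, results),
        )
        for r in range(nprocs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run(4)[0])
```

Because positions are replicated and every UE applies the same reduced force array, all UEs finish with the same positions, and any one of them (the `printID` process in the pseudocode) can report the results.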

Implementation Mechanisms

Describes low level mechanisms used to write parallel programs

3 categories of mechanisms
– UE (process/thread) Management
– Synchronization
– Communication

Not in pattern format

We discuss OpenMP, MPI, and Java

Implement Simulation

For our target platform, MPI is the best choice

Need to add standard code for initialization and MPI specific reduction operator to previous solution

Pattern languages evolve

A pattern language should not be considered a static document:
– Evaluate and revise
– Extend with new patterns: new parallel programming models, specific application domains
– We added the Reengineering for Parallelism pattern as a result of feedback from readers

For more information

Mattson, Sanders, and Massingill. Patterns for Parallel Programming. Addison-Wesley Software Patterns Series. 2005

Reengineering for Parallelism, PLoP05

www.cise.ufl.edu/research/ParallelPatterns

Acknowledgements and collaborators

The pattern language is joint work with:
– Tim Mattson, Intel
– Berna Massingill, Trinity University

Supported by NSF and Intel