Bulk Synchronous Parallel Processing Model
Jamie Perkins
Overview
  Four W’s: Who, What, When and Why
  Goals for BSP
  BSP Design and Program
  Cost Functions
  Languages and Machines
A Bridge for Parallel Computation
  Von Neumann model
    Designed to insulate hardware and software
  BSP model (Bulk Synchronous Parallel)
    Proposed by Leslie Valiant of Harvard University in 1990
    Developed by W.F. McColl of Oxford
    Designed to be a “bridge” for parallel computation
Goals for BSP
  Scalability: performance of HW & SW must scale from a single processor to thousands of processors
  Portability: SW must run unchanged, with high performance, on any general-purpose parallel architecture
  Predictability: performance of SW on different architectures must be predictable in a straightforward way
BSP Design
  Three Components
    Node
      Processor and Local Memory
    Router or Communication Network
      Message Passing or Point-to-Point communication
    Barrier or Synchronization Mechanism
      Implemented in hardware
BSP Design
  Fixed memory architecture
    Hashing to allocate memory in “random” fashion
  Fast access to local memory
  Uniformly slow access to remote memory
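The hashing idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `home_node` and `placement_histogram`, the use of SHA-256 as the hash, and the 8-byte address encoding are all hypothetical choices, not part of any BSP specification. The point is that a regular access pattern (here, a fixed stride) is spread over the nodes in a pseudo-random way instead of repeatedly hitting the same node.

```python
import hashlib
from collections import Counter


def home_node(address: int, num_nodes: int) -> int:
    """Map a global address to the node that stores it.

    Hypothetical scheme: hash the address and reduce modulo the node count.
    """
    digest = hashlib.sha256(address.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:8], "little") % num_nodes


def placement_histogram(addresses, num_nodes: int) -> Counter:
    """Count how many of the given addresses land on each node."""
    return Counter(home_node(a, num_nodes) for a in addresses)


if __name__ == "__main__":
    # A strided pattern that a naive "address mod num_nodes" layout
    # would place entirely on node 0.
    hist = placement_histogram(range(0, 64 * 1024, 64), num_nodes=8)
    for node in sorted(hist):
        print(f"node {node}: {hist[node]} addresses")
```

With a plain modulo layout, every address in the strided range above would map to node 0; hashing trades that worst case for roughly even placement, at the price that almost every access becomes a remote one.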
Illustration of BSP Computer
  [Figure: three nodes, each a processor (P) with local memory (M), connected by a communication network and a barrier. Source: http://peace.snu.ac.kr/courses/parallelprocessing/]
BSP Program
  Composed of S supersteps
  A superstep consists of:
    A computation where each processor uses only locally held values
    A global message transmission from each processor to any subset of the others
    A barrier synchronization
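The three phases of a superstep can be simulated sequentially. The sketch below is not a real BSP library: `run_superstep`, `bsp_sum`, and the message format `(destination, payload)` are hypothetical names chosen for illustration. Each simulated processor computes on its local state and on messages delivered at the previous barrier; messages it sends become visible only after every processor has finished the step.

```python
from typing import Callable, List, Tuple

Message = Tuple[int, object]  # (destination processor id, payload)


def run_superstep(states: List[dict], inboxes: List[list],
                  compute: Callable[[int, dict, list], List[Message]]) -> List[list]:
    """Run one superstep and return the inboxes for the next one."""
    outgoing: List[Message] = []
    for pid, state in enumerate(states):
        # Computation: only local state and already-delivered messages are used.
        outgoing.extend(compute(pid, state, inboxes[pid]))
    # Barrier: messages are delivered only after all processors are done.
    new_inboxes: List[list] = [[] for _ in states]
    for dest, payload in outgoing:
        new_inboxes[dest].append(payload)
    return new_inboxes


def bsp_sum(chunks: List[List[int]]) -> int:
    """Sum distributed data in two supersteps (S = 2)."""
    states = [{"data": chunk, "result": None} for chunk in chunks]
    inboxes: List[list] = [[] for _ in chunks]

    def local_sum(pid, state, inbox):
        # Superstep 1: each processor sums its own chunk and sends it to processor 0.
        return [(0, sum(state["data"]))]

    def combine(pid, state, inbox):
        # Superstep 2: processor 0 adds up the partial sums it received.
        if pid == 0:
            state["result"] = sum(inbox)
        return []

    inboxes = run_superstep(states, inboxes, local_sum)
    run_superstep(states, inboxes, combine)
    return states[0]["result"]
```

For example, `bsp_sum([[1, 2], [3, 4], [5], [6, 7, 8]])` models four processors and returns 36 after two supersteps.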
Strategies for programming on BSP
  Balance the computation between processes
  Balance the communication between processes
  Minimize the number of supersteps
BSP Program
  [Figure: processors P1 to P4 executing Superstep 1 and Superstep 2, each with a computation phase and a communication phase, separated by a barrier. Source: http://peace.snu.ac.kr/courses/parallelprocessing/]
Advantages of BSP
  Eliminates need for programmers to manage memory, assign communication and perform low-level synchronization (w/ sufficient parallel slackness)
  Synchronization allows automatic optimization of the communication pattern
  BSP model provides a simple cost function for analyzing the complexity of algorithms
Cost Function
  g: “gap”, or bandwidth inefficiency
  L: “latency”, the minimum time needed for one superstep
  w: largest amount of work performed (per processor)
  h: largest number of packets sent or received
  Execution time for superstep i: $w_i + g h_i + L$
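A small worked example makes the cost function concrete. The machine parameters below (g = 4, L = 100) and the per-processor workloads are made-up numbers, and the helper names `superstep_cost` and `program_cost` are hypothetical. The sketch assumes the usual convention that the cost of a whole program is the sum of its superstep costs.

```python
def superstep_cost(work, sent, received, g, L):
    """Cost of one superstep: w + g*h + L.

    work, sent, received are per-processor lists; w and h are their maxima,
    as in the definitions above.
    """
    w = max(work)
    h = max(max(s, r) for s, r in zip(sent, received))
    return w + g * h + L


def program_cost(supersteps, g, L):
    """Total cost, assuming the program cost is the sum over its supersteps."""
    return sum(superstep_cost(work, sent, received, g, L)
               for work, sent, received in supersteps)


# Assumed machine parameters, for illustration only.
G, LATENCY = 4, 100

# Same total work (1000) and same total packets (40), distributed differently.
balanced = superstep_cost([250] * 4, [10] * 4, [10] * 4, G, LATENCY)
skewed = superstep_cost([700, 100, 100, 100], [40, 0, 0, 0], [0, 14, 13, 13], G, LATENCY)

print(balanced)  # 250 + 4*10 + 100 = 390
print(skewed)    # 700 + 4*40 + 100 = 960
```

Because w and h are maxima over the processors, the skewed superstep costs 960 against 390 for the balanced one, even though the total work and traffic are identical. Each additional superstep also adds another L, which is why the programming strategies call for balancing computation and communication and for minimizing the number of supersteps.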
Languages & Machines
  Languages
    BSP++
    C
    C++
    Fortran
    JBSP
    Opal
  Machines
    IBM SP1
    SGI Power Challenge (Shared Memory)
    Cray T3D
    Hitachi SR2001
    TCP/IP
Thank You
Any Questions?
References
  http://peace.snu.ac.kr/courses/parallelprocessing/
  http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad
  http://www.cs.mu.oz.au/677/notes/node41.html
  McColl, W.F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory, Dec. 1994.
  United States Patent 5083265.
  Valiant, L.G. A Bridging Model for Parallel Computation. Communications of the ACM 33, 8 (1990), 103-111.