24
Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour.

Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Embed Size (px)

Citation preview

Page 1: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Parallel Processing

I’ve gotta spend at least 10 hours studying for the

IT 344 final!I’m going to study with 9 friends… we’ll be done

in an hour.

Page 2: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Next up: TIPS

• Mega- = 106, Giga- = 109, Tera- = 1012, Peta- = 1015

• BOPS, anyone?

• Light travels about 1 ft / 10-9 secs in free space. •A Tera-Hertz uniprocessor could have no clock-to-clock path longer than 300 microns…

•We already know of problems that require greater than a TIP (Simulations of weather, weapons, brains)

Page 3: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Solution: Parallelism

• Pipelining – reasonable for a small number of stages (5-10), after that bypassing and stalls become unmanageable.

• Superscalar – replicate data paths and design control logic to discover parallelism in traditional programs.

• Explicit parallelism – must learn how to write programs that run on multiple CPUs.

Page 4: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Pipelining

Page 5: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Superscalar – How far can it go? Multiple functional units (ALUs, Addr, Floating point, etc.)

Instruction dispatch

Dynamic scheduling

Pipelines

Speculative execution

Page 6: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Explicit Parallelism

• Distributed– Transaction-oriented– Geographically dispersed locations– E.g. SETI@home

• Parallel– Single goal computing– Computing intense and/or data-intense– High-speed data exchange

• Often on custom hardware

– E.g. Geochemical surveys

Page 7: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Challenges

• For distributed processing, parallelism is given and usually cannot easily change. Programming is relatively easy.

• For parallel processing, the programmer defines parallelism by partitioning the serial program(s). Parallel programming in general is more difficult than transaction applications.

Page 8: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Other vocabulary

• Decomposition– The way that a program can be broken up for

parallel processing

• Course-grain– Breaks into big chunks (fewer processors)– SMP– Distributed (often)

• Fine-grain– Breaks into small chunks (more processors)– Image processing

Page 9: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Inter-processor communications

Loosely-coupled

Tightly-coupled

Distributed processors

Beowulf clusters

Custom supercomputers

Page 10: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

More Terminology

1. SIMD (Single Instruction Multiple Data)

2. MIMD (Multiple Instruction Multiple Data)

3. MISD (Pipeline)

Page 11: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

SIMD• Same instruction

executed in multiple units, on different data

• Examples: Vector processors, AltiVec

I

I

I

I

D1

D2

D3

D4

Page 12: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

MIMD

• Each unit does own instruction on own text

• Examples: Mercury, Beowulf, etc.

I1

I2

I3

I4

D1

D2

D3

D4

Page 13: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

MISD (pipeline)

I1 I2 I3 I4D1D2D3D4

Page 14: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Distributed Programming Tools

•C/C++ with TCP/IP

•Perl with TCP/IP

•Java

•Corba

•ASP

•.Net

Page 15: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Parallel Programming Tools

• PVM

• MPI

• Synergy

• Others (proprietary hardware)

Page 16: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Parallel Programming Difficulties

• Program partition and allocation

• Data partition and allocation

• Program(process) synchronization

• Data access mutual exclusion

• Dependencies

• Process(or) failures

• Scalability…

Page 17: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Software techniques

• Shared Memory Buffers — Areas of memory that any node can read or write

• Sockets — Provide full-duplex message passing between processes.

• Semaphores and Spinlocks — Provide locking and synchronization functions

• Mailbox Interrupts — Provide an interrupt-driven communication mechanism

• Direct Memory Access — Provides asynchronous shared memory bufferI/O.

Page 18: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Hardware configurations – Interconnects and Memory

Page 19: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Interconnects

Page 20: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Crossbar

Page 21: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Mesh

Page 22: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Interconnects

Page 23: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

What it really looks like

Note: this computer would rank well on www.top500.org

Page 24: Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour

Summary

• Prospects for future CPU architectures:– Pipelining - Well understood, but mined-out– Superscalar - Nearing its practical limits– SIMD - Limited use for special applications– VLIW - Returns controls to S/W. The future?

• Prospects for future Computer System architectures:– SMP - Limited scalability. Harder than it appears.– MIMD/message-passing - It’s been the future for over

20 years now. How to program?