-1- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Parallel Computer...

-1- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Parallel Computer Architectures

2nd weekReferencesFlynn’s Taxonomy Classification of Parallel Computers

Based on ArchitecturesTrend in Parallel Computer

Architectures

REFERENCES

Scalable Parallel Computing– Kai Hwang and Zhiwei Xu, McGraw-Hill

Chapter.1Parallel Computing: Theory and Practice

– Michael J. Quinn, McGraw-HillChapter 3

Parallel Processing Course– Yang-Suk Kee(yskee@iris.snu.ac.kr)

School of EECS, Seoul National University

FLYNN’S TAXONOMY

A way to classify parallel computers (Flynn,1972)

Based on notions of instruction and data streams– SISD (Single Instruction stream over a Single Data stream )– SIMD (Single Instruction stream over Multiple Data streams )– MISD (Multiple Instruction streams over a Single Data

stream)– MIMD (Multiple Instruction streams over Multiple Data

stream) Popularity

– MIMD > SIMD > MISD

SISD (Single Instruction Stream Over A Single Data

Stream )

CU PU MU

IS DSI/O

IS : Instruction Stream DS : Data StreamCU : Control Unit PU : Processing UnitMU : Memory Unit

SISD – Conventional sequential machines

SIMD (Single Instruction Stream Over Multiple Data

Streams )SIMD

– Vector computers– Special purpose computations

PE1 LM1

PEn LMn

Program loaded from host

Data sets loaded from host

SIMD architecture with distributed memory

PE : Processing Element LM : Local Memory

MISD – Processor arrays, systolic arrays– Special purpose computations

Is it different from pipeline computer?

MISD (Multiple Instruction Streams Over A Single Data

Streams)

Memory(Program,

Data) PU1 PU2 PUn

CU1 CU2 CUn

DS DS DSIS IS

MISD architecture (the systolic array)

MIMD (Multiple Instruction Streams Over Multiple Data

Stream)MIMD

– General purpose parallel computers

CU1 PU1

SharedMemory

IS DSI/O

CUn PUnIS DSI/O

MIMD architecture with shared memory

CLASSIFICATION BASED ON ARCHITECTURES

Pipelined ComputersDataflow ArchitecturesData Parallel SystemsMultiprocessorsMulticomputers

Instructions are divided into a number of steps (segments, stages)

At the same time, several instructions can be loaded in the machine and be executed in different steps

PIPELINED COMPUTERS

Instruction i IF ID EX WB

IF ID EX WB

Instruction i+1

Instruction i+2

Instruction i+3

Instruction i+4

Instruction # 1 2 3 4 5 6 7 8

Cycles

DATAFLOW ARCHITECTURES

Data-driven model– A program is represented as a directed acyclic graph in

which a node represents an instruction and an edge represents the data dependency relationship between the connected nodes

– Firing rule: A node can be scheduled for execution if and only if its input data become valid for consumption

Dataflow languages– Id, SISAL, Silage, ...– Single assignment, applicative(functional) language, no

side effect– Explicit parallelism

DATAFLOW GRAPH

z = (a + b) * c+

The dataflow representation of an arithmetic expression

DATA-FLOW COMPUTER

Execution of instructions is driven by data availability

Basic components– Data are directly held inside instructions– Data availability check unit– Token matching unit– Chain reaction of asynchronous instruction

executionsWhat is the difference between this and

normal (control flow) computers?

DATAFLOW COMPUTERS

Advantages– Very high potential for parallelism– High throughput – Free from side-effect

Disadvantages– Time lost waiting for unneeded arguments– High control overhead– Difficult in manipulating data structures

Dataflow Machine (Manchester Dataflow

Computer)

From host

To host

First actual hardware implementation

I/OSwitch

MatchingUnit

OverflowUnit

TokenQueue

InstructionStore

func1 funck

Network

Token<data, tag, dest, marker>

Match<tag, dest>

DATAFLOW REPRESENTATION

input d,e,f c0 = 0

for i from 1 to 4 do begin ai := di / ei

bi := ai * fi

ci := bi + ci-1

output a, b, c

c1 c2 c3

EXECUTION ON CONTROL FLOW MACHINES

a1 b1 c1 a2 b2 c2 a4 b4 c4

Sequential execution on a uniprocessor in 24 cycles

Assume all the external inputs are available before entering do loop+ : 1 cycle, * : 2 cycles, / : 3 cycles,

How long will it take to execute this program on a dataflow computer with 4 processors?

EXECUTION ON A DATAFLOW MACHINE

c1 c2 c3 c4a1

Data-driven execution on a 4-processor dataflow computer in 9 cycles

Can we further reduce the execution time of this program ?

PROBLEMS OF DATAFLOW COMPUTERS

Excessive copying of large data structures in dataflow operations– I-structure : a tagged memory unit for

overlapped usage by the producer and consumer

Retreat from pure dataflow approach (shared memory)

Handling complex data structures

Chain reaction control is difficult to implement– Complexity of matching store and memory

units Expose too much parallelism (?)

DATA PARALLEL SYSTEMS

Programming model – Operations performed in parallel

on each element of data structure– Logically single thread of control,

performs sequential or parallel steps

– Conceptually, a processor associated with each data element

DATA PARALLEL SYSTEMS (cont’d)

SIMD Architectural model– Array of many simple, cheap processors

with little memory eachProcessors don’t sequence through

instructions– Attached to a control processor that

issues instructions– Specialized and general communication,

cheap global synchronization

VECTOR PROCESSOR

Instruction set includes operation on vectors as well as scalars

2 types of vector computers– Processor arrays– Pipelined vector processors

PROCESSOR ARRAY

A sequential computer connected with a set of identical processing elements simultaneouls doing the same operation on different data. Eg CM-200

Data path

Processing element Data

memory

…Processor array

Instruction path

Program and Data Memory

I/O processor

Front-end computer

PIPELINED VECTOR PROCESSOR

Stream vector from memory to the CPU

Use pipelined arithmetic units to manipulate data

Eg: Cray-1, Cyber-205

MULTIPROCESSOR

Consists of many fully programmable processors each capable of executing its own program

Shared Address Space ArchitectureClassified into 2 types

– Uniform Memory Access (UMA) Multiprocessors

– Non-Uniform Memory Access (NUMA) Multiprocessors

SHARED ADDRESS SPACE ARCHITECTURE

Virtual address spaces for a collection of processes communicating via shared addresses

Machine physical address space

Shared portion of address space

Private portion of address space

Pn private

Common physical addresses

P2 private

P1 private

P0 private

Uses a central switching mechanism to reach a centralized shared memory

All processors have equal access time to global memory

Tightly coupled system Problem: cache consistency

UMA MULTIPROCESSOR

Switching mechanism

P2 … Pn

Memory banks

C1 C2 CnPi

Processor i

Cache i

UMA MULTIPROCESSOR (cont’d)

Crossbar switching mechanism

I/OCache

Shared- Bus switching mechanism

Mem Mem Mem Mem

I/OCache

Packet-switched network

Network

Mem Mem

NUMA MULTIPROCESSOR

Distributed shared memory combined by local memory of all processors

Memory access time depends on whether it is local to the processor

Caching shared (particularly nonlocal) data?

Network

Mem Cache

Distributed Memory

CURRENT TYPES OF MULTIPROCESSORS

PVP (Parallel Vector Processor)– A small number of proprietary vector processors

connected by a high-bandwidth crossbar switch SMP (Symmetric Multiprocessor)

– A small number of COTS microprocessors connected by a high-speed bus or crossbar switch

DSM (Distributed Shared Memory)– Similar to SMP– The memory is physically distributed among

nodes.

PVP (Parallel Vector Processor)

Crossbar Switch

SM SM SM

VP : Vector Processor SM : Shared Memory

SMP (Symmetric Multi-Processor)

P/C P/C

Bus or Crossbar Switch

SM SM SM

P/C : Microprocessor and Cache

DSM (Distributed Shared Memory)

Custom-Designed Network

DIR : Cache Directory

MULTICOMPUTERS

Consists of many processors with their own memory No shared memory Processors interact via message passing loosely coupled

system

Message-Passing Interconnection

Network

CURRENT TYPES OF MULTICOMPUTERS

MPP (Massively Parallel Processing)– Total number of processors > 1000

Cluster– Each node in system has less than 16

processors.Constellation

– Each node in system has more than 16 processors

MPP (Massively Parallel Processing)

Custom-Designed Network

MB : Memory Bus NIC : Network Interface Circuitry

Cluster

Commodity Network (Ethernet, ATM, Myrinet, VIA)

Bridge Bridge

IOB IOB

LD : Local Disk IOB : I/O Bus

CONSTELLATION

P/CP/C

SM SMNICLD

Custom or Commodity Network

P/CP/C

SM SMNICLD

IOC : I/O Controller

Trend in Parallel Computer Architectures

1997 1998 1999 2000 2001 2002 Years

MPPs Constellations Clusters SMPs

TOWARD ARCHITECTURAL CONVERGENCE

Evolution and role of software have blurred boundary– Send/recv supported on SAS machine via buffers– Can construct global address space on MP

Hardware organization converging too.– Tighter NI integration for MP (low latency, higher bandwidth)– At lower level, hardware SAS passes hardware messages.– Nodes connected by general network and communication

assists. Cluster of workstations/SMPs

– Emergence of fast SAN (System Area Networks)

Converged Architecture of Current

Supercomputers

Interconnection NetworkInterconnection Network

MemoryMemory

PP PP PP PP

MemoryMemory

PP PP PP PP

MemoryMemory

PP PP PP PP

MemoryMemory

PP PP PP PP

MultiprocessorsMultiprocessorsMultiprocessorsMultiprocessors

-1- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Parallel Computer...

Documents

Văn phong khoa học trong Y khoa - tailieuykhoamienphi.com

ParallelProcessing MT06 11 Basic Parallel Algorithmsthai/slides/ParallelProcessing_MT06_11_Basic...Khoa Khoa Học& K ỹThuậtMáyTính–Trường ĐạiHọcBchá KhoaTP. HCM -3-Introduction

Xin chào Bác sĩ - ia-ibaraki.or.jp · ----Khoa tiêu hóa (Khoa dạ dày đường ruột), khoa hô hấp Bị thương, vết đứt ----Ngoại khoa, khoa phẫu thuật thẩm

Tröôøng Ñaïi Hoïc Baùch Khoa Tp - cse.hcmut.edu.vnhungnq/courses/nap/CHUONG0-ReviewTCPIP.pdf · – Hãy suy nghĩđịachỉIP nh ư ... (24 bits dành cho địachỉmạng,

ufm.edu.vn xet tuyen vien chuc 2015.pdf · Khoa Marketing Khoa Kê toán - Kiêm toán Khoa Tài chính - N ân hàn Khoa Công nghê thông tin Khoa Du lich Khoa co bån Khoa Giáo

BOÄ GIAÙO DUÏC VAØ ÑAØO TAÏO TRÖÔØNG ÑAÏI HOÏC BAÙCH KHOA HAØ NOÄI KHOA COÂNG NGHEÄ THOÂNG TIN

MergedFile - hucfl.edu.vncao hoc chú Khoa tiên Anh Khoa TACN Khoa tiên Phá Khoa tiên N a Khoa tiên Trun NN&VHHàn uoc Khoa NN&VH Nhât Bån Khoa Viêt Nam hoc Khoa uoc tê hoc

KHOA KHOA H C MÁY TÍNH

cntt.dlu.edu.vncntt.dlu.edu.vn/Resources/Docs/SubDomain/cntt/Ke hoach giang day nam... · Khoa Toán tin Khoa NV&VHH TÊN GIÅNG VIÊN Khoa LLCT Khoa GDTC Khoa Ngoai ngfr Khoa Luât

1 Programming Languages Implementation of Data Structures Cao Hoaøng Truï Khoa Coâng Ngheä Thoâng Tin Ñaïi Hoïc Baùch Khoa TP. HCM

ufm.edu.vn...Khoa Quån tri kinh doanh Khoa Marketing Khoa Thâm dinh giá - Kinh doanh bât dQng sån Khoa Thucng mai Khoa Du lich Khoa Kê toán - Kiêm toán Khoa Tài chính -

MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

TRÖÔØNG ÑH BAÙCH KHOA TP.HCM - Khoa CÔ KHÍ ÑEÀ THI …

benhvienlaokhoa.vnbenhvienlaokhoa.vn/sites/bvlk_d7/files/q_696_quy...Khoa TK — Alzheimer Khoa Khám bênh a a a a a Khoa Câ cúu dêt u Khoa TMCT-N Khoa Un buóu Khoa Nêi chun

ParallelProcessing KSTN2002 12 BasicParallelAlgorithmscse.hcmut.edu.vn/~nam/PP_Undergraduate/PP_BasicParallelAlgorithms.pdf · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch

bvdkkvyenminh.vnbvdkkvyenminh.vn/uploads/laws/images-1.pdfPhó truðng PK Khoa Nhi Khoa Truyên nhiêm Khoa Truyên nhiêm Khoa Y hQC Cô truyên Khoa Y hQC Cô truyen ... -o thành

TRÖÔØNG ÑAÏI HOÏC BAÙCH KHOA PHOØNG ÑAØO TAÏO bong/162_DS CHINH THUC HBKK.pdf1 1513902 Ngô Thị ánh Tuyết 030497 CK15KHD 100.00 8.67 15 70 8.94 5,400,000 174543422

Khoa Luan Tot Nghiep LuaLai NHGL Khoa 2011

TRÖÔØNG ÑH BAÙCH KHOA TP.HCM - Khoa CÔ KHÍ ÑEÀ THI …nguyenminhhai.weebly.com/uploads/1/0/0/4/10047170/... · Moät thanh hình truï coù ñöôøng kính = d 2,5.mm ,

TRÖÔØNG ÑAÏI HOÏC BAÙCH KHOA THAØNH PHOÀ HOÀ CHÍ MINH