112/04/21 Part I 1
Models of Parallel Processing
• Parallel processors come in many different varieties.
• Thus, we often deal with abstract models of real machines.
Development of Early Models (1)
• Associative processing (AP) was perhaps the earliest form of parallel processing.
– Associative or content-addressable memories (AMs, CAMs) allow memory cells to be accessed based on their contents rather than their physical locations within the memory array.
– AM/AP architectures are essentially based on incorporating simple processing logic into the memory array, which removes the need to transfer large volumes of data through the limited-bandwidth interface between the memory and the processor (the von Neumann bottleneck).
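Content-based access can be illustrated with a minimal sketch. The function name `cam_search`, the use of a bit mask, and the sample cell values are assumptions made for illustration; in a real associative memory every cell performs the comparison at the same time, whereas the loop below only emulates that single parallel step.

```python
from typing import List


def cam_search(cells: List[int], key: int, mask: int) -> List[int]:
    """Return the indices of all cells whose masked bits equal the masked key.

    Hypothetical helper: hardware would compare all cells simultaneously;
    this list comprehension emulates that one parallel step sequentially.
    """
    return [i for i, word in enumerate(cells) if (word & mask) == (key & mask)]


# Sample memory contents (illustrative values only).
cells = [0b1010, 0b0110, 0b1011, 0b0010]

# Search on the two low-order bits only, looking for the pattern ..10.
matches = cam_search(cells, key=0b0010, mask=0b0011)
print(matches)
```

The search returns positions rather than requiring the caller to supply an address, which is the sense in which the memory is content-addressable.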
Development of Early Models (2)
• The AM/AP model has evolved through the incorporation of additional capabilities, so that it is in essence converging with SIMD-type array processors.
Development of Early Models (3)
• Neural networks
• Cellular automata
SIMD Vs. MIMD (1)
• Most early parallel machines had SIMD designs.
• Within the SIMD category, two fundamental design choices exist:
– Synchronous versus asynchronous SIMD
• Synchronous SIMD is inefficient for conditional computations; a possible cure is to use the asynchronous version of SIMD, known as SPMD (single program, multiple data).
– Custom- versus commodity-chip SIMD
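The difference between synchronous SIMD and SPMD on a conditional can be sketched with a simple timing model. This is an idealized illustration, not a description of any particular machine: the function names and the branch costs `t_then` and `t_else` are hypothetical, and the model assumes a synchronous SIMD machine broadcasts both branches in sequence (with processors idling during the branch they do not take), while each SPMD processor runs only its own branch.

```python
from typing import Sequence


def simd_time(conds: Sequence[bool], t_then: int, t_else: int) -> int:
    """Time for an if-then-else under the assumed synchronous SIMD model.

    Each branch is broadcast if at least one processor needs it, and all
    processors wait through both broadcasts.
    """
    time = 0
    if any(conds):
        time += t_then
    if not all(conds):
        time += t_else
    return time


def spmd_time(conds: Sequence[bool], t_then: int, t_else: int) -> int:
    """Time under the assumed SPMD model: each processor runs only its own
    branch, so the total is set by the slowest processor."""
    return max(t_then if c else t_else for c in conds)


conds = [True, False, True]          # per-processor outcome of the condition
print(simd_time(conds, t_then=5, t_else=3))
print(spmd_time(conds, t_then=5, t_else=3))
```

When the processors disagree on the condition, the SIMD model pays for both branches; when they all agree, the two models give the same time.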
SIMD Vs. MIMD (2)
• In the 1990s, the MIMD paradigm became more popular.
• MIMD machines are most effective for medium- to coarse-grain parallel applications, where the computation is divided into relatively large subcomputations or tasks whose executions are assigned to the various processors.
SIMD Vs. MIMD (3)
• Within the MIMD class, three fundamental issues or design choices are subjects of ongoing debates in the research community.
– MPP: massively or moderately parallel processor
• Is it more cost-effective to build a parallel processor out of a relatively small number of powerful processors or a massive number of very simple processors?
– Tightly versus loosely coupled MIMD
• Network of workstations (NOW), cluster computing, grid computing
– Explicit message passing versus virtual shared memory
Global Vs. Distributed Memory (1)
• Within the MIMD class of parallel processors, memory can be global or distributed.
• Global memory may be visualized as being in a central location where all processors can access it with equal ease.
• Memory latency-hiding techniques must be employed. An example of such methods is the use of multithreading.
Global Vs. Distributed Memory (2)
• Examples for both the processor-to-memory and processor-to-processor networks include:
• An abstract model of global-memory computers is known as PRAM.
• One approach to reducing the amount of data that must pass through the processor-to-memory interconnection network is to use a private cache memory (locality of data access, cache coherence problem).
Global Vs. Distributed Memory (3)
• Distributed-memory architectures can be conceptually viewed as in Fig. 4.5.
• In addition to the types of interconnection networks enumerated for shared-memory parallel processors, distributed-memory MIMD architectures can also be interconnected by a variety of direct networks. (as nonuniform memory access (NUMA) architectures)
PRAM Shared-Memory Model (1)
• The theoretical model used for conventional or sequential computers (SISD class) is known as the random-access machine (RAM)
• The parallel version of RAM (PRAM) constitutes an abstract model of the class of global-memory parallel processors. The abstraction consists of ignoring the details of the processor-to-memory interconnection network and taking the view that each processor can access any memory location in each machine cycle, independent of what other processors are doing.
PRAM Shared-Memory Model (2)
• In the formal PRAM model, a single processor is assumed to be active initially. In each computation step, each active processor can read from and write into the shared memory and can also activate another processor.
• Even though the global-memory architecture was introduced as a subclass of the MIMD class, the abstract PRAM model depicted in Fig. 4.6 can be SIMD or MIMD.
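The activation rule in the formal PRAM model (one processor active initially, each active processor able to activate another in a step) means the number of active processors can at most double per step. A minimal sketch of that count follows; the function name `activation_steps` is hypothetical, and the sketch assumes each active processor activates exactly one new processor per step until p are running.

```python
def activation_steps(p: int) -> int:
    """Steps needed to have p processors active, starting from one,
    assuming every active processor activates one other per step."""
    active, steps = 1, 0
    while active < p:
        active = min(2 * active, p)  # doubling, capped at p
        steps += 1
    return steps


for p in (1, 2, 8, 9, 1024):
    print(p, activation_steps(p))
```

Under this assumption the step count equals the ceiling of log2 p, so bringing all p processors into play is itself a logarithmic-time operation.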
PRAM Shared-Memory Model (3)
• This implies that each instruction cycle would have to consume Ω(log p) real time.
• The above point is important when we try to compare PRAM algorithms with those for distributed-memory models. An O(log p)-step PRAM algorithm may not be faster than an O(log² p)-step algorithm for a hypercube architecture.
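The comparison can be made concrete with a rough calculation. This is a sketch under assumptions not stated in the slides: the constants c_1, c_2, and c_3 are hypothetical, each PRAM step is taken to cost exactly c_2 log p real time (the Ω(log p) lower bound met with equality), and each hypercube step is taken to cost unit time.

```latex
\begin{align*}
T_{\text{PRAM}} &= \underbrace{c_1 \log p}_{\text{steps}} \times
                   \underbrace{c_2 \log p}_{\text{time per step}}
                 = c_1 c_2 \log^2 p \\
T_{\text{hypercube}} &= \underbrace{c_3 \log^2 p}_{\text{steps}} \times
                        \underbrace{1}_{\text{time per step}}
                      = c_3 \log^2 p
\end{align*}
```

Under these assumptions both running times grow as log² p, so which algorithm is faster depends on the constants rather than on the step counts alone.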
Distributed-Memory or Graph Models (1)
• Given the internal processor and memory structures in each node, a distributed-memory architecture is characterized primarily by the network used to interconnect the nodes.
• This network is usually represented as a graph.
• Important parameters of an interconnection network include:
– Network diameter: the longest of the shortest paths between various pairs of nodes
– Bisection (band)width: the smallest number (total capacity) of links that need to be cut in order to divide the network into two subnetworks of half the size
– Vertex or node degree: the number of communication ports required of each node
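Two of these parameters, diameter and node degree, can be computed directly from the graph representation. The sketch below is illustrative only: the helper names (`hypercube`, `ring`, `eccentricity`, `diameter`, `max_degree`) and the adjacency-list encoding are assumptions, and the ring builder expects at least three nodes. Bisection width is left out because computing it for an arbitrary graph is much harder than a breadth-first search.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # node -> list of neighboring nodes


def hypercube(d: int) -> Graph:
    """d-dimensional binary hypercube: neighbors differ in exactly one bit."""
    return {v: [v ^ (1 << k) for k in range(d)] for v in range(1 << d)}


def ring(n: int) -> Graph:
    """Ring of n >= 3 nodes, each linked to its two cyclic neighbors."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def eccentricity(g: Graph, src: int) -> int:
    """Longest shortest-path distance from src, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def diameter(g: Graph) -> int:
    """Network diameter: the longest of the shortest paths over all node pairs."""
    return max(eccentricity(g, v) for v in g)


def max_degree(g: Graph) -> int:
    """Node degree: the largest number of ports required of any node."""
    return max(len(nbrs) for nbrs in g.values())


for name, g in (("ring(8)", ring(8)), ("hypercube(3)", hypercube(3))):
    print(name, "diameter:", diameter(g), "degree:", max_degree(g))
```

With eight nodes in each case, the ring has the smaller degree but the larger diameter, which is the kind of trade-off these parameters are meant to expose.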
Distributed-Memory or Graph Models (2)
• Even though the distributed-memory architecture was introduced as a subclass of the MIMD class, machines based on networks of the type shown in Fig. 4.8 can be SIMD- or MIMD-type.
• Schemes of the type shown in Fig. 4.9 are available for reducing bus traffic by taking advantage of the locality of communication within small clusters of processors.