7/26/2019 UNIT 4 MPP
http://slidepdf.com/reader/full/unit-4-mpp 1/8
Distributed Computing Massively Parallel Processing
Page 1 of 8
DSU, M. Tech CS & IT
UNIT-4
MASSIVELY PARALLEL PROCESSING
What is parallel processing?
Processing of multiple tasks simultaneously on multiple processors is called parallel
processing.
A parallel program consists of multiple processes (tasks) that are active simultaneously.
A given task is divided into multiple subtasks using a divide-and-conquer technique, and
each subtask is processed on a different processor.
Programming a multiprocessor system using the divide-and-conquer technique is called
parallel programming.
Many applications today require more computing power than a traditional sequential
computer can offer. Parallel processing provides a cost-effective solution to this problem by
increasing the number of CPUs in a computer and by adding an efficient communication
system between them.
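The divide-and-conquer scheme described above can be sketched in Python. This is an illustrative sketch only: worker threads stand in for separate processors (a real parallel system would use processes, or separate machines coordinated over a network), and the function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def subtask_sum(chunk):
    # each worker handles one subtask independently
    return sum(chunk)

def parallel_sum(data, workers=4):
    # divide: split the task into equal-sized subtasks
    # (assumes len(data) is divisible by workers)
    size = len(data) // workers
    chunks = [data[i * size:(i + 1) * size] for i in range(workers)]
    # conquer: each subtask runs on its own worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(subtask_sum, chunks))
    # combine: merge the partial results
    return sum(partial)
```

Each worker sees only its own subtask, so the subtasks can proceed independently; only the final combining step needs all the partial results.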
Need for Parallel Processing?
Computational requirements are ever increasing in the areas of both scientific and business
computing. The technical computing problems that require high-speed computational power are
related to life sciences, aerospace, geographical information systems, mechanical design
and analysis, and the like.
Sequential architectures are reaching physical limitations, as they are constrained by the
speed of light and the laws of thermodynamics. The speed at which sequential CPUs can operate
is reaching saturation point, and hence an alternative way to get high computational speed is
to connect multiple processors working in coordination.
Vector processing works well for scientific kinds of problems.
Hardware improvements such as pipelining and superscalar execution are not scalable and
require sophisticated compiler technology.
Significant development in networking technology is paving the way for heterogeneous
computing.
Parallel processing is mature and can be exploited commercially.
Definition of MPP
MPP (massively parallel processing) is the coordinated processing of a program by multiple
processors that work on different parts of the program, with each processor using its own
operating system and memory.
FLYNN’S CLASSIFICATION
The core elements of parallel processing are CPUs. Based on the number of instruction and data
streams that can be processed simultaneously, computing systems are classified into the following
four categories:
Single-instruction, single-data (SISD) systems
Single-instruction, multiple-data (SIMD) systems
Multiple-instruction, single-data (MISD) systems
Multiple-instruction, multiple-data (MIMD) systems.
Single-instruction, single-data (SISD) systems
An SISD computing system is a uniprocessor machine capable of executing a single
instruction, which operates on a single data stream,
In SISD, machine instructions are processed sequentially; hence computers adopting
this model are popularly called sequential computers.
Most conventional computers are built using the SISD model. All the instructions
and data to be processed have to be stored in primary memory.
The speed of the processing element in the SISD model is limited by the rate at
which the computer can transfer information internally.
Dominant representative SISD systems are IBM PC, Macintosh, and workstations.
Single-instruction, multiple-data (SIMD) systems
An SIMD computing system is a multiprocessor machine capable of executing the same
instruction on all the CPUs but operating on different data streams.
Machines based on an SIMD model are well suited to scientific computing since they
involve lots of vector and matrix operations.
For instance, a statement such as Ci = Ai * Bi can be passed to all the processing elements
(PEs); the data elements of vectors A and B can be divided into multiple sets (N sets
for an N-PE system), and each PE can process one data set.
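The Ci = Ai * Bi statement above can be sketched in plain Python. The loop over PEs is a stand-in for hardware that would apply the same instruction to all slices at once; the function name and PE count are illustrative only.

```python
def simd_multiply(A, B, n_pes=4):
    """Apply the same instruction (elementwise multiply) to
    different data sets, one set per processing element (PE).
    Assumes len(A) == len(B) and is divisible by n_pes."""
    size = len(A) // n_pes
    C = []
    for pe in range(n_pes):                  # each PE gets one data set
        lo, hi = pe * size, (pe + 1) * size
        # same instruction on every PE, different data stream
        C.extend(a * b for a, b in zip(A[lo:hi], B[lo:hi]))
    return C
```

The key SIMD property is visible here: every PE executes the identical operation; only the data it touches differs.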
Multiple-instruction, single-data (MISD) systems
An MISD computing system is a multiprocessor machine capable of executing different
instructions on different PEs but all of them operating on the same data set.
For instance, a statement such as y = sin(x) + cos(x) + tan(x) performs different operations
on the same data set.
Machines built using the MISD model are not useful in most applications; a few
machines have been built, but none of them are available commercially.
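The y = sin(x) + cos(x) + tan(x) example above can be sketched as follows; the list of functions stands in for the different instructions that separate PEs would execute on the same data item (the function name is illustrative).

```python
import math

def misd_eval(x):
    """Different instructions (sin, cos, tan) operate on the
    same data item x, as in y = sin(x) + cos(x) + tan(x)."""
    instructions = [math.sin, math.cos, math.tan]  # one instruction per PE
    partials = [f(x) for f in instructions]        # same data, different ops
    return sum(partials)
```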
Multiple-instruction, multiple-data (MIMD) systems
An MIMD computing system is a multiprocessor machine capable of executing multiple
instructions on multiple data sets.
Each PE in the MIMD model has separate instruction and data streams; hence machines
built using this model are well suited to any kind of application.
Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.
MIMD machines are broadly categorized into shared-memory MIMD and distributed-
memory MIMD based on the way PEs are coupled to the main memory.
Shared memory MIMD machines
In the shared-memory MIMD model, all the PEs are connected to a single global memory
and they all have access to it. Systems based on this model are also called tightly coupled
multiprocessor systems.
The communication between PEs in this model takes place through the shared memory;
modification of the data stored in the global memory by one PE is visible to all other PEs.
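A minimal sketch of this model, with Python threads standing in for PEs and a shared list standing in for the single global memory (names and sizes are illustrative):

```python
import threading

def shared_memory_demo(n_pes=4):
    shared = [0] * n_pes              # single global memory, seen by all PEs
    lock = threading.Lock()
    total = [0]

    def pe(rank):
        shared[rank] = rank * rank    # a write here is visible to all other PEs
        with lock:                    # lock guards the concurrent update
            total[0] += shared[rank]

    threads = [threading.Thread(target=pe, args=(r,)) for r in range(n_pes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, total[0]
```

Because all PEs touch the same memory, communication is implicit, but shared updates (like `total`) must be synchronized.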
Distributed memory MIMD machines
In the distributed-memory MIMD model, all PEs have a local memory. Systems based
on this model are also called loosely coupled multiprocessor systems.
The communication between PEs in this model takes place through the interconnection
network.
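By contrast with the shared-memory case, communication here is explicit. A sketch, with a queue modeling the interconnection network and threads standing in for PEs with local memories (a real loosely coupled system would use separate processes or machines):

```python
import queue
import threading

def distributed_demo(n_pes=4):
    """Each PE computes in its own local variable; results travel
    over an interconnection network, modeled here by a queue."""
    network = queue.Queue()

    def pe(rank):
        local = rank * 10             # local memory, not visible to other PEs
        network.put((rank, local))    # explicit message passing over the network

    workers = [threading.Thread(target=pe, args=(r,)) for r in range(n_pes)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return dict(network.get() for _ in range(n_pes))
```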
MASSIVELY PARALLEL PROCESSING VERSUS MAP REDUCE
Massively parallel processing is not the only technology available to facilitate the
processing of large volumes of data.
MapReduce, a part of the Apache Hadoop Project, is another technology that accomplishes
the same things MPP does, but with some differences.
PERFORMANCE
The optimization and distribution components of MPP allow it to manage the distribution of
data among the separate nodes. This speeds up processing time considerably, making it a
better performance choice over MapReduce. Also, MPP databases conform to the ACID
properties (atomicity, consistency, isolation, durability).
SCALABILITY
The highly specialized hardware used by MPP systems makes scalability a difficult and
costly proposition.
MapReduce and Hadoop, however, can be deployed to inexpensive commodity servers,
allowing the clusters of nodes to grow as needed.
DEPLOYMENT AND MAINTENANCE
MPP is generally easy to deploy and maintain.
Hadoop and MapReduce, on the other hand, can turn out to be a major implementation
project requiring expensive and specialized expertise that may not be available in-house.
DATA RESTRICTIONS
Unlike MPP, MapReduce can handle unstructured data without the need to pre-process it.
LANGUAGE
The language behind MapReduce’s control mechanism is primarily Java. MPP uses SQL,
making it easier to use and more cost effective.
SQL-based tools are supported with MPP.
SCATTER AND GATHER
Gather and scatter are two fundamental data-parallel operations, in which a large number of
data items are read (gathered) from, or written (scattered) to, given locations.
Scatter and gather are fundamental operations in many scientific and
enterprise computing applications. These operations are implemented as native collective
operations in message passing interfaces (MPI) to define communication patterns across the
processors.
Gather and scatter are dual operations: a scatter performs indexed writes to an array, and a
gather performs indexed reads from an array. We define the two operations in Figure .
The array L for the scatter contains distinct write locations for each Rin tuple, and that for
the gather the read locations for each Rout tuple. Essentially, the scatter consists of
sequential reads and random writes, and the gather of random reads and sequential writes.
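The duality described above can be sketched directly, using a plain Python list for the location array L (function and variable names are illustrative):

```python
def scatter(rin, locations):
    """Indexed writes: rout[locations[i]] = rin[i].
    Sequential reads of rin, random writes to rout."""
    rout = [None] * len(rin)
    for i, loc in enumerate(locations):
        rout[loc] = rin[i]
    return rout

def gather(rin, locations):
    """Indexed reads: rout[i] = rin[locations[i]].
    Random reads of rin, sequential writes to rout."""
    return [rin[loc] for loc in locations]
```

Note that applying a gather with the same location array does not in general invert the scatter; the two are duals in their access pattern, not inverses.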
The Message Passing Interface (MPI) is a library of subroutines (in Fortran) or function
calls (in C) that can be used to implement a message-passing program.
MPI allows the coordination of a program running as multiple processes in a distributed-
memory environment, yet it is flexible enough to also be used in a shared-memory
environment. MPI programs can be used and compiled on a wide variety of single platforms
or (homogeneous or heterogeneous) clusters of computers over a network.
MPI provides the following routines for collective communication:
MPI_Bcast() – Broadcast (one to all)
MPI_Reduce() – Reduction (all to one)
MPI_Allreduce() – Reduction (all to all)
MPI_Scatter() – Distribute data (one to all)
MPI_Gather() – Collect data (all to one)
MPI_Alltoall() – Distribute data (all to all)
MPI_Allgather() – Collect data (all to all)
Example:
We illustrate the collective communication commands to scatter data and gather results.
Point-to-point communication happens via a send and a recv (receive) command.
Consider the addition of 100 numbers on a distributed memory 4-processor computer.
A parallel algorithm to sum 100 numbers proceeds in four stages:
o distribute the 100 numbers evenly among the 4 processors;
o every processor sums its 25 numbers;
o collect the 4 partial sums at the manager node; and
o add the 4 sums and print the result.
Scattering an array of 100 numbers over 4 processors and gathering the partial sums
at the 4 processors to the root is displayed in Figure 1.
The scatter and gather are of the collective communication type, as every process in the
universe participates in this operation.
The MPI commands to scatter and gather are, respectively, MPI_Scatter and MPI_Gather.
The specifications of the MPI command to scatter data from one member to all members of
a group are described in Table 1. Table 2 contains the specifications of the MPI command
to gather data from all members to one member in a group.
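The four stages of the summation example can be sketched in plain Python. This is not real MPI code (that would call MPI_Scatter and MPI_Gather from C or Fortran across actual processes); list slicing stands in for the scatter and a list of partial results for the gather, and the function name is illustrative.

```python
def mpi_style_sum(numbers, n_procs=4):
    """Four-stage parallel sum: scatter, local sum, gather, reduce.
    Assumes len(numbers) is divisible by n_procs."""
    chunk = len(numbers) // n_procs
    # stage 1: scatter the numbers evenly among the processors
    pieces = [numbers[r * chunk:(r + 1) * chunk] for r in range(n_procs)]
    # stage 2: every processor sums its own chunk
    partial = [sum(p) for p in pieces]
    # stage 3: gather the partial sums at the manager (root) node
    gathered = partial
    # stage 4: the root adds the partial sums to get the result
    return sum(gathered)
```

With 100 numbers and 4 processors, each "processor" sums 25 numbers, exactly as in the four stages listed above.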
Note: This material has been circulated at the reader's own risk. Nobody can be held responsible if anything in it is wrong,
or if improper or insufficient information is provided. Please send feedback to [email protected]