
August 23, 2006 Talk at SASTRA 1

By

P.S.Dhekne, BARC

[email protected]

Parallel, Cluster and Grid Computing

August 23, 2006 Talk at SASTRA 2

High Performance Computing

• Branch of computing that deals with extremely powerful computers and the applications that use them

• Supercomputers: Fastest computer at any given point of time

• HPC Applications: Applications that cannot be solved by conventional computers in a reasonable amount of time

August 23, 2006 Talk at SASTRA 3

Supercomputers

• Characterized by very high speed, very large memory

• Speed measured in terms of number of floating point operations per second (FLOPS)

• Fastest Computer in the world: “Earth Simulator” (NEC, Japan) – 35 TeraFLOPS

• Memory in the order of hundreds of gigabytes or terabytes

August 23, 2006 Talk at SASTRA 4

HPC Technologies

• Different approaches for building supercomputers
  – Traditional: build faster CPUs
    • Special semiconductor technology for increasing clock speed
    • Advanced CPU architecture: pipelining, vector processing, multiple functional units etc.
  – Parallel processing: harness a large number of ordinary CPUs and divide the job between them

August 23, 2006 Talk at SASTRA 5

Traditional Supercomputers

• E.g.: CRAY
• Very complex architecture
• Very high clock speed results in very high heat dissipation and advanced cooling techniques (liquid Freon / liquid nitrogen)
• Custom built or produced as per order
• Extremely expensive
• Advantage: program development is conventional and straightforward

August 23, 2006 Talk at SASTRA 6

Alternative to Supercomputer

• Parallel Computing: the use of multiple computers or processors working together on a single problem; harness a large number of ordinary CPUs and divide the job between them
  – each processor works on its section of the problem
  – processors are allowed to exchange information with other processors via a fast interconnect path

[Figure: a loop of 10,000 iterations run sequentially on one CPU vs. split across 4 CPUs – cpu 1: 1–2500, cpu 2: 2501–5000, cpu 3: 5001–7500, cpu 4: 7501–10000]

• Big advantages of parallel computers:
  1. total computing performance is a multiple of the number of processors used
  2. very large total memory, to fit very large programs
  3. much lower cost, and can be developed in India
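A minimal sketch in C of this kind of division, using standard MPI rather than the ANUPAM-specific libraries described later; compute() is a placeholder for the real per-iteration work:

#include <stdio.h>
#include <mpi.h>

/* Placeholder for the real per-iteration work. */
static double compute(int i) { return (double)i; }

int main(int argc, char **argv)
{
    int rank, size, i;
    const int N = 10000;              /* total iterations, as in the figure */
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process handles its own contiguous slice of 1..N. */
    int chunk = N / size;
    int start = rank * chunk + 1;
    int end   = (rank == size - 1) ? N : start + chunk - 1;

    for (i = start; i <= end; i++)
        local += compute(i);

    /* Combine the partial results on process 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}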

August 23, 2006 Talk at SASTRA 7

Types of Parallel Computers

• Parallel computers are classified as
  – shared memory
  – distributed memory

• Both shared and distributed memory systems have:
  1. processors: now generally commodity processors
  2. memory: now generally commodity DRAM/DDR
  3. network/interconnect: between the processors or memory

August 23, 2006 Talk at SASTRA 8

Interconnect Method

• There is no single way to connect a bunch of processors
• The manner in which the nodes are connected defines the network & topology
• The best choice would be a fully connected network (every processor to every other), but this is unfeasible for cost and scaling reasons; instead, processors are arranged in some variation of a grid, torus, tree, bus, mesh or hypercube

[Figure: 3-D hypercube, 2-D mesh and 2-D torus topologies]

August 23, 2006 Talk at SASTRA 9

Block Diagrams …

[Block diagram: A Shared Memory Parallel Computer – processors P1–P5 connected through an interconnection network to a single shared memory]

August 23, 2006 Talk at SASTRA 10

Block Diagrams …

[Block diagram: A Distributed Memory Parallel Computer – processors P1–P5, each with its own local memory M1–M5, connected through an interconnection network]

August 23, 2006 Talk at SASTRA 11

Performance Measurements

• Speed of a supercomputer is generally denoted in FLOPS (Floating Point Operations per Second)
  – MegaFLOPS (MFLOPS): million (10^6) FLOPS
  – GigaFLOPS (GFLOPS): billion (10^9) FLOPS
  – TeraFLOPS (TFLOPS): trillion (10^12) FLOPS
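As an illustrative back-of-the-envelope calculation (the CPU numbers here are assumptions for illustration, not figures from this talk): a processor clocked at 1 GHz that completes 2 floating point operations per cycle peaks at

\[ 1\times 10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 2\ \tfrac{\text{FLOP}}{\text{cycle}} = 2\times 10^{9}\ \text{FLOPS} = 2\ \text{GFLOPS}, \]

so a 35 TFLOPS machine delivers the equivalent of roughly \(35\times 10^{12} / (2\times 10^{9}) \approx 17{,}500\) such processors running at peak.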

August 23, 2006 Talk at SASTRA 12

Sequential vs. Parallel Programming

• Conventional programs are called sequential (or serial) programs since they run on one cpu only as in a conventional (or sequential) computer

• Parallel programs are written such that they get divided into multiple pieces, each running independently and concurrently on multiple cpus.

• Converting a sequential program to a parallel program is called parallelization.

August 23, 2006 Talk at SASTRA 13

Terms and Definitions

• Speedup of a parallel program = time taken on 1 CPU / time taken on 'n' CPUs

• Ideally Speedup should be ‘n’

August 23, 2006 Talk at SASTRA 14

Terms and Definitions

• Efficiency of a parallel program:= Speedup / No. of processors

• Ideally efficiency should be 1 (100 %)
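A hedged worked example combining the two definitions above (the timings are invented for illustration, not ANUPAM measurements):

\[ S(n) = \frac{T_1}{T_n}, \qquad E(n) = \frac{S(n)}{n}. \]

If a program takes \(T_1 = 100\) s on one CPU and \(T_{10} = 12.5\) s on 10 CPUs, then \(S(10) = 100/12.5 = 8\) and \(E(10) = 8/10 = 80\%\).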

August 23, 2006 Talk at SASTRA 15

Problem areas in parallel programs

• Practically, speedup is always less than ‘n’ and efficiency is always less than 100%

• Reason 1: Some portions of the program cannot be run in parallel (cannot be split)

• Reason 2: Data needs to be communicated among the cpus. This involves time for sending the data and time in waiting for the data

• The challenge in parallel programming is to suitably split the program into pieces such that speedup and efficiencies approach the maximum

August 23, 2006 Talk at SASTRA 16

Parallelism

• Property of an algorithm that makes it amenable to parallelization

• Parts of the program that have inherent parallelism can be parallelized (divided into multiple independent pieces that can execute concurrently)

August 23, 2006 Talk at SASTRA 17

Types of parallelism

• Control parallelism (algorithmic parallelism):
  – different portions (subroutines/functions) can execute independently and concurrently

• Data parallelism:
  – data can be split up into multiple chunks and processed independently and concurrently
  – most scientific applications exhibit data parallelism

August 23, 2006 Talk at SASTRA 18

Parallel Programming Models

• Different approaches are used in the development of parallel programs

• Shared Variable Model: Best suited for shared memory parallel computers

• Message Passing Model: Best suited for distributed memory parallel computers

August 23, 2006 Talk at SASTRA 19

Message Passing

• Most commonly used method of parallel programming

• Processes in a parallel program use messages to transfer data between themselves

• Also used to synchronize the activities of processes

• Typically consists of send/receive operations
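A minimal C sketch of the send/receive model using standard MPI calls (illustrative only; ANULIB, described later, has its own interface):

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    char msg[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Process 0 sends a message to process 1. */
        strcpy(msg, "hello from rank 0");
        MPI_Send(msg, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Process 1 blocks until the message arrives (synchronization). */
        MPI_Recv(msg, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}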

August 23, 2006 Talk at SASTRA 20

How we started

In the absence of any standardization, initial parallel machines were designed with varied architectures having different network topologies.

BARC started supercomputing development in 1990-91 to meet the computing demands of in-house users, with the aim of providing inexpensive high-end computing, and has since built several models.

August 23, 2006 Talk at SASTRA 21

Selection Of Main Components

• Architecture
  – Simple
  – Scalable
  – Processor independent

• Interconnecting network
  – Scalable bandwidth
  – Architecture independent
  – Cost effective

• Parallel software environment
  – User friendly
  – Portable
  – Comprehensive debugging tools

August 23, 2006 Talk at SASTRA 22

SINGLE CLUSTER OF ANUPAM

[Diagram: a single ANUPAM cluster – master Node 0 and slave Nodes 1–15 (Intel i860/XP at 50 MHz, 128 KB–512 KB cache, 64 MB–256 MB memory per node) on a MULTIBUS II backplane, with Ethernet, X.25, SCSI disks, terminals and other systems attached through the master]

August 23, 2006 Talk at SASTRA 23

64 NODE ANUPAM CONFIGURATION

[Diagram: four 16-node clusters (nodes 0–15 each) on MB II buses, interconnected in the X and Y directions over wide SCSI links]

August 23, 2006 Talk at SASTRA 24

8 NODE ANUPAM CONFIGURATION

August 23, 2006 Talk at SASTRA 25

ANUPAM APPLICATIONS

Finite Element Analysis

Pressure Contour in LCA Duct

64-Node ANUPAM

Protein Structures

3-D Plasma Simulations

August 23, 2006 Talk at SASTRA 26

[Plot: CPU time (hours, 0–14) vs. number of processors (0–20)]

Problem Specification:

Cylindrical geometry

Radius = 8 km

Height= 8 km

No. of mesh points 80,000

No. of Energy groups 42

SN order 16

Conclusions:

Use of 10 processors of the BARC computer system reduces the run time by 6 times.

Simulations on BARC Computer System

2-D Atmospheric Transport Problem

Estimation of Neutron-Gamma Dose up to 8 km from the source

August 23, 2006 Talk at SASTRA 27

OTHER APPLICATIONS OF ANUPAM SYSTEM

* Protein Structure Optimization
* Ab Initio Electronic Structure Calculations
* Neutron Transport Calculations
* Ab Initio Molecular Dynamics Simulations
* Computational Structure Analysis
* Computational Fluid Dynamics (ADA, LCA)
* Computational Turbulent Flow
* Simulation Studies in Gamma-Ray Astronomy
* Finite Element Analysis of Structures
* Weather Forecasting

August 23, 2006 Talk at SASTRA 28

Key Benefits

• Simple to use

• ANUPAM uses the user-familiar Unix environment with large memory & specially designed parallelizing tools

• No parallel language needed

• PSIM – parallel simulator runs on any Unix based system

• Scalable and processor independent

August 23, 2006 Talk at SASTRA 29

Bus based architecture
• Dynamic interconnection network providing full connectivity, high speed and TCP/IP support
• Simple and general purpose industry back-plane bus
• Easily available off-the-shelf, low cost
• MultiBus, VME Bus, Futurebus … many solutions

Disadvantages
• One communication at a time
• Limited scalability of applications in bus based systems
• Lengthy development cycle for specialized hardware
• i860 and Multibus-II were reaching end of line, so a radical change in architecture was needed

August 23, 2006 Talk at SASTRA 30

Typical Computing, Memory & Device Attachment

[Diagram: CPU and memory on a system bus, with additional memory and device cards on an input/output bus]

August 23, 2006 Talk at SASTRA 31

Memory Hierarchy

[Diagram: memory hierarchy – CPU, cache, local memory and remote memory, compared by speed, size and cost/bit]

August 23, 2006 Talk at SASTRA 32

Ethernet: The Unibus of the 80s (UART of the 90s)

[Diagram: an Ethernet segment (up to 2 km) connecting clients with compute, print, file and communication servers]

August 23, 2006 Talk at SASTRA 33

Ethernet: The Unibus of the 80s

• Ethernet was designed for
  – DEC: interconnecting VAXen and terminals
  – Xerox: enabling distributed computing (SUN Micro)
• Ethernet evolved into a hodge-podge of nets and boxes
• Distributed computing was very hard, evolving into
  – expensive, asymmetric, hard to maintain
  – client server for a VendorIX
  – apps bound to a configuration & VendorIX!
  – the network is NOT the computer
• The Internet model is less hierarchical, more democratic

August 23, 2006 Talk at SASTRA 34

Networks of workstations (NOW)

• New concept in parallel computing and parallel computers

• Nodes are full-fledged workstations having cpu, memory, disks, OS etc.

• Interconnection through commodity networks like Ethernet, ATM, FDDI etc.

• Reduced Development Cycle, mostly restricted to software

• Switched Network topology

[Diagram: Typical ANUPAM x86 cluster – nodes NODE-01 to NODE-16 and a file server connected to a Fast Ethernet switch over CAT-5 cable, with an uplink]

August 23, 2006 Talk at SASTRA 36

ANUPAM - Alpha

• Each node is a complete Alpha workstation with 21164 cpu, 256 MB memory, Digital UNIX OS etc.

• Interconnection through an ATM switch with fiber optic links at 155 Mbps

August 23, 2006 Talk at SASTRA 37

PC Clusters: Multiple PCs

• Over the last few years, the computing power of Intel PCs has gone up considerably (from 100 MHz to 3.2 GHz in 8 years), with fast, cheap networking and disks built in
• Intel processors are beating conventional RISC chips in performance
• PCs are freely available from several vendors
• Free Linux has emerged as a robust, efficient OS with plenty of applications
• Linux clusters (built from multiple PCs) are now rapidly gaining popularity in academic and research institutions because of their low cost, high performance and availability of source code

August 23, 2006 Talk at SASTRA 38

Trends in Clustering

Clustering is not a new idea, but it has now become affordable and can be built easily (plug & play). Even small colleges have clusters.

August 23, 2006 Talk at SASTRA 39

Cluster based Systems

• LB Cluster – network load distribution and load balancing
• HA Cluster – increase the availability of systems
• HPC Cluster (scientific cluster) – computation-intensive work
• Web farms – increase HTTP requests served per second
• Rendering cluster – increase graphics speed

HPC: High Performance Computing, HA: High Availability, LB: Load Balancing

Clustering is replacing all traditional computing platforms and can be configured depending on the method and applied areas.

August 23, 2006 Talk at SASTRA 40

Computing Trends

• It is fully expected that the substantial, exponential increases in IT performance will continue for the foreseeable future (at least the next 50 years) in terms of:
  – CPU power (2x every 18 months)
  – Memory capacity (2x every 18 months)
  – LAN/WAN speed (2x every 9 months)
  – Disk capacity (2x every 12 months)

• It is expected that all computing resources will continue to become cheaper and faster, though not necessarily faster than the computing problems we are trying to solve.
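As a rough worked illustration (a sketch assuming the doubling periods above hold exactly): growth over \(t\) months at a doubling period of \(d\) months is \(2^{t/d}\). Over 6 years (72 months), CPU power and memory grow by \(2^{72/18} = 16\times\), disk capacity by \(2^{72/12} = 64\times\), and LAN/WAN speed by \(2^{72/9} = 256\times\), which is one reason networks of commodity machines keep becoming more attractive.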

August 23, 2006 Talk at SASTRA 41

Processor Speed Comparison

Sr. No. | Processor                         | SPECint_base 2000 | SPECfp_base 2000
1       | Pentium-III, 550 MHz              | 231               | 191
2       | Pentium-IV, 1.7 GHz               | 574               | 591
3       | Pentium-IV, 2.4 GHz               | 852               | 840
4       | Alpha, 833 MHz (64 bit)           | 511               | 571
5       | Alpha, 1 GHz (64 bit)             | 621               | 776
6       | Intel Itanium-2, 900 MHz (64 bit) | 810               | 1356

August 23, 2006 Talk at SASTRA 42

Technology Gaps

• Sheer CPU speed is not enough

• Matching of processing speed, compiler performance, cache size and speed, memory size and speed, disk size and speed, and network size and speed, interconnect & topology is also important

• Poorly written application and middleware software also adds to performance degradation

August 23, 2006 Talk at SASTRA 43

Interconnect-Related Terms

• The most critical components of HPC still remain the interconnect technology and network topology

• Latency:
  – Networks: how long does it take to start sending a "message"? Measured in microseconds (startup time)
  – Processors: how long does it take to output the results of some operation, such as a floating point add or divide, which are pipelined?

• Bandwidth: what data rate can be sustained once the message is started? Measured in MBytes/sec or GBytes/sec
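These two terms are often combined in a simple first-order model (a sketch; the example numbers are taken from the Gigabit Ethernet row of the interconnect table later in this talk):

\[ T_{\text{message}} \approx \text{latency} + \frac{\text{message size}}{\text{bandwidth}}. \]

For a 1 MB message over Gigabit Ethernet with about 100 microseconds startup latency and 1000 Mbits/sec (about 125 MB/s) bandwidth, \(T \approx 0.1\ \text{ms} + 8\ \text{ms} \approx 8.1\ \text{ms}\); latency dominates for short messages, bandwidth for long ones.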

August 23, 2006 Talk at SASTRA 44

High Speed Networking

• Network bandwidth is improving
  – LANs now run at 10, 100, 1000, 10000 Mbps
  – WANs are based on ATM with 155, 622, 2500 Mbps

• With constant advances in information & communication technology
  – processors and networks are merging into one infrastructure
  – System Area or Storage Area Networks (Myrinet, Cray-link, Fibre Channel, SCI etc.): low latency, high bandwidth, scalable to large numbers of nodes

August 23, 2006 Talk at SASTRA 45

Processor Interconnect Technology

Sr. No. | Communication Technology | Bandwidth (Mbits/sec) | Latency (microseconds)
1       | Fast Ethernet            | 100                   | 60
2       | Gigabit Ethernet         | 1000                  | 100
3       | SCI based Woulfkit       | 2500                  | 1.5–14
4       | InfiniBand               | 10,000                | <400 nsec
5       | Quadric Switch           | 2500                  | 2–10
6       | 10-G Ethernet            | 10,000                | <100

August 23, 2006 Talk at SASTRA 46

Interconnect Comparison

Feature                                      | Fast Ethernet | Gigabit       | SCI
Latency (across machines)                    | 88.07 µs      | 44.93 µs      | 5.55 µs
Latency (two processes in a given machine)   | –             | 16.88 µs      | 1.61 µs
Bandwidth                                    | 11 MBytes/sec | 90 MBytes/sec | 250 MBytes/sec

August 23, 2006 Talk at SASTRA 47

Switched Network Topology

• Interconnection Networks such as ATM, Ethernet etc. are available as switched networks

• Switch implements a dynamic interconnection network providing all-to-all connectivity on demand

• Switch allows multiple independent communications simultaneously

• Full duplex mode of communication

• Disadvantages: single point of failure, finite capacity, limited scalability and cost at higher node counts

August 23, 2006 Talk at SASTRA 48

Scalable Coherent Interface (SCI)

• High bandwidth, low latency SAN interconnect for clusters of workstations (IEEE 1596)

• Standard for point-to-point links between computers

• Various topologies possible: ring, tree, switched rings, torus etc.

• Peak bandwidth: 667 MB/s, latency < 5 microseconds

August 23, 2006 Talk at SASTRA 49

ARUNA 2-D TORUS – LOGICAL SCHEMATIC

[Schematic: 64 nodes (Aruna01–Aruna64) arranged as an 8 × 8 grid with wrap-around links in both the X and Y directions, forming the 2-D torus]

 

Latency comparison (µs):
Gigabit Ethernet – total latency 44.93, latency within node 16.88
SCI – total latency 5.55, latency within node 1.61

[Chart: bandwidth (MB/s) vs. packet size (8 bytes to 2 MB) for Gigabit Ethernet with MTU 1500–9000 and for SCI]

[Chart: HPL performance vs. number of processors over Gigabit and SCI]

ANUPAM-Xeon Performance

August 23, 2006 Talk at SASTRA 50

ANUPAM P-Xeon Parallel Supercomputer

No. of nodes: 128
Compute node: dual Intel Pentium Xeon @ 2.4 GHz, 2 GB memory per node
File server: dual Intel based with RAID 5, 360 GB
Interconnection network: 64-bit Scalable Coherent Interface (2-D torus connectivity)
Software: Linux, MPI, PVM, ANULIB
Anulib tools: PSIM, FFLOW, SYN, PRE, S_TRACE
Benchmarked performance: 362 GFLOPS for High Performance Linpack

August 23, 2006 Talk at SASTRA 51

ANUPAM clusters

Year of introduction: 2001 – sustained speed on 84 P-III processors: 15 GFLOPS

Year of introduction: 2002 – sustained speed on 64 P-IV CPUs: 72 GFLOPS

Year of introduction: 2003 – sustained speed on 128 Xeon processors: 365 GFLOPS

(Photos: ANU64, ASHVA, ARUNA)

August 23, 2006 Talk at SASTRA 52

ANUPAM-Pentium

ANUPAM series of supercomputers after 1997

Date      | Model          | Node microprocessor   | Interconnect | MFLOPS
Oct/98    | 4-node PII     | Pentium II/266 MHz    | Ethernet/100 | 248
Mar/99    | 16-node PII    | Pentium II/333 MHz    | Ethernet/100 | 1,300
Mar/00    | 16-node PIII   | Pentium III/550 MHz   | Gigabit Eth. | 3,500
May/01    | 84-node PIII   | Pentium III/650 MHz   | Gigabit Eth. | 15,000
June/02   | 64-node PIV    | Pentium IV/1.7 GHz    | Giga & SCI   | 72,000
August/03 | 128-node Xeon  | Pentium Xeon/2.4 GHz  | Giga & SCI   | 362,000

August 23, 2006 Talk at SASTRA 53

Table of comparison (with precise, 64-bit computations)

Program name          | 1 + 4 Anupam Alpha | 1 + 8 Anupam Alpha | Cray XMP 216
T-80 (24 hr forecast) | 14 minutes         | 11 minutes         | 12.5 minutes

All timings are wall clock times.

August 23, 2006 Talk at SASTRA 54

BARC’s New Super Computing Facility

BARC's new supercomputing facility was inaugurated by the Honourable PM, Dr. Manmohan Singh, on 15th November 2005.

A 512-node ANUPAM-AMEYA supercomputer was developed with a speed of 1.7 TeraFLOPS on the HPC benchmark.

A 1024-node supercomputer (~5 TeraFLOPS) is planned during 2006-07.

Being used by in-house users.

(Photos: 512-node ANUPAM-AMEYA; external view of the new Super Computing Facility)

August 23, 2006 Talk at SASTRA 55

Support equipment

• Terminal servers
  – Connect serial consoles from 16 nodes onto a single ethernet link
  – Consoles of each node can be accessed using the terminal servers and the management network
• Power distribution units
  – Network controlled 8-outlet power distribution units
  – Facilities such as power sequencing and power cycling of each node
  – Current monitoring
• Racks
  – 14 racks of 42U height, 1000 mm depth, 600 mm width

August 23, 2006 Talk at SASTRA 56

Software components

• Operating system on each node of the cluster is Scientific Linux 4.1 for 64-bit architecture
  – Fully compatible with Red Hat Enterprise Linux
  – Kernel version 2.6

• ANUPRO parallel programming environment

• Load sharing and queuing system

• Cluster management

August 23, 2006 Talk at SASTRA 57

ANUPRO Programming Environment

• ANUPAM supports the following programming interfaces
  – MPI
  – PVM
  – Anulib
  – BSD sockets

• Compilers
  – Intel Fortran compiler
  – Portland Fortran compiler

• Numerical libraries (see the BLAS sketch after this list)
  – BLAS (ATLAS and MKL implementations)
  – LAPACK (Linear Algebra Package)
  – ScaLAPACK (parallel LAPACK)

• Program development tools
  – MPI performance monitoring tools (Upshot, Nupshot, Jumpshot)
  – ANUSOFT tool suite (FFLOW, S_TRACE, ANU2MPI, SYN)
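A small C sketch of calling BLAS through the CBLAS interface listed above (how the library is provided, ATLAS or MKL, and how it is linked are installation details; the matrices are made up for illustration):

#include <stdio.h>
#include <cblas.h>

int main(void)
{
    /* C = alpha * A * B + beta * C, with 2x2 row-major matrices. */
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,       /* M, N, K */
                1.0, A, 2,     /* alpha, A, lda */
                B, 2,          /* B, ldb */
                0.0, C, 2);    /* beta, C, ldc */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}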

August 23, 2006 Talk at SASTRA 58

Load Sharing and Queuing System

• Torque based system resource manager
• Keeps track of available nodes in the system
• Allots nodes to jobs
• Maintains job queues with job priorities and reservations
• User level commands to submit jobs, delete jobs, find out job status and find out the number of available nodes

• Administrator level commands to manage nodes, jobs and queues, priorities, reservations

August 23, 2006 Talk at SASTRA 59

ANUNETRA : Cluster Management System

• Management and Monitoring of one or more clusters from a single interface

• Monitoring functions:
  – Status of each node and different metrics (load, memory, disk space, processes, processors, traffic, temperature and so on)
  – Jobs running on the system
  – Alerts to the administrators in case of malfunctions or anomalies
  – Archival of monitored data for future use

• Management functions:
  – Manage each node or groups of nodes (reboots, power cycling, online/offline, queuing and so on)
  – Job management

August 23, 2006 Talk at SASTRA 60

Metric View on Ameya

Node View on Ameya

August 23, 2006 Talk at SASTRA 61

SMART: Self Monitoring And Rectifying Tool

• Service running on each node which keeps track of things happening in the system
  – Hanging jobs
  – Services terminated abnormally

• SMART takes corrective action to remedy the situation and improve availability of the system

August 23, 2006 Talk at SASTRA 62

Accounting System

• Maintains database entries for each and every job run on the system
  – Job ID, user name, number of nodes
  – Queue name, API (MPI, PVM, Anulib)
  – Submit time, start and end time
  – End status (finished, cancelled, terminated)

• Computes system utilization

• User wise, node wise statistics for different periods of time

August 23, 2006 Talk at SASTRA 63

Accounting system – Utilization plot

August 23, 2006 Talk at SASTRA 64

Other tools

• Console logger
  – Logs all console messages of each node into a database for diagnostic purposes

• Sync tool
  – Synchronizes important files across nodes

• Automated backup
  – Scripts for taking periodic backups of user areas onto the tape libraries

• Automated installation service
  – Non-interactive installation of a node and all required software

August 23, 2006 Talk at SASTRA 65

Parallel File System (PFS)

• PFS gives a different view of the I/O system with its unique architecture, and hence provides an alternative platform for the development of I/O intensive applications

• Data (file) striping in a distributed environment

• Supports collective I/O operations

• Interface as close to a standard Linux interface as possible

• Fast access to file data in a parallel environment, irrespective of how and where the file is distributed

• Parallel file scatter and gather operations
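PFS itself is BARC's in-house system; purely as an analogy for what collective I/O looks like to an application, here is a sketch using standard MPI-IO, in which every process writes its own block of one shared file in a single collective call (this is not the PFS interface):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, i;
    const int N = 1024;                 /* doubles written per process */
    double buf[1024];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < N; i++)
        buf[i] = rank + i * 0.001;      /* some per-process data */

    /* All processes open the same file and write their own block
       at a rank-dependent offset in one collective operation. */
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}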

August 23, 2006 Talk at SASTRA 66

Architecture of PFS

[Diagram: PFS architecture – a PFS manager (PFS daemon holding the MIT and DIT tables) on the server receives requests from the I/O manager and coordinates I/O daemons (each holding a LIT) on Node 1 … Node N, which move the data]

August 23, 2006 Talk at SASTRA 67

Complete solution to scientific problems by exploiting parallelism for:
• Processing (parallelization of computation)
• I/O (parallel file system)
• Visualization (parallelized graphics pipeline / tile display unit)

August 23, 2006 Talk at SASTRA 68

Domain-specific Automatic Parallelization (DAP)

• Domain: a class of applications, such as FEM applications

• Experts use domain-specific knowledge

• DAP is a combination of an expert system and a parallelizing compiler

• Key features: interactive process, experience-based heuristic techniques, and a visual environment

August 23, 2006 Talk at SASTRA 69

Operation and Management Tools

• Manual installation of all nodes with OS, compilers, libraries etc. is not only time consuming but also tedious and error prone

• Constant monitoring of hardware, networks and software is essential to report the health of the system while running 24/7 operation

• Debugging and communication measurement tools are needed

• Tools are also needed to measure load, free CPU, predict load, checkpoint restart, replace failed node etc.

We have developed all these tools to enrich ANUPAM software environment

August 23, 2006 Talk at SASTRA 70

Limitations of Parallel Computing

• Programming so many nodes concurrently remains a major barrier for most applications
  – Source code must be known & parallelizable
  – Scalable algorithm development is not an easy task
  – All resources are allotted to a single job
  – The user has to worry about message passing, synchronization and scheduling of his job
  – Only about 15% of users require these solutions; the rest can manage with normal PCs

• Fortunately a lot of free MPI codes and even parallel solvers are now available

• Still there is a large gap between technology & usage, as parallel tools are not so user friendly

August 23, 2006 Talk at SASTRA 71

Evolution in Hardware

• Compute nodes:
  – Intel i860

– Alpha 21x64

– Intel x86

• Interconnection network:
  – Bus: MultiBus-II, wide SCSI

– Switched Network: ATM, Fast Ethernet, Gigabit Ethernet

– SAN: Scalable Coherent Interface

August 23, 2006 Talk at SASTRA 72

Evolution in Software

• Parallel program development API:
  – ANULIB (proprietary) to MPI (standard)

• Runtime environment:
  – I/O restricted to master only (i860) to full I/O (Alpha and x86)
  – One program at a time (i860) to multiple programs to batch operations

• Applications:
  – In-house parallel codes to ready-made parallel applications
  – Commercially available parallel software

August 23, 2006 Talk at SASTRA 73

Issues in building large clusters

• Scalability of the interconnection network
• Scalability of software components
  – Communication libraries
  – I/O subsystem
  – Cluster management tools
  – Applications
• Installation and management procedures
• Troubleshooting procedures

August 23, 2006 Talk at SASTRA 74

Other Issues in operating large clusters

• Space management
  – Node form factor
  – Layout of the nodes
  – Cable routing and weight

• Power Management

• Cooling arrangements

August 23, 2006 Talk at SASTRA 75

The P2P Computing

• Computing based on a P2P architecture allows distributed resources to be shared, with or without the support of a server.
• How do you manage under-utilized resources?
  – Utilization of a desktop PC is typically <10%, and this percentage is decreasing even further as PCs become more powerful
  – Large organizations must have more than a thousand PCs, each delivering > 20 MFLOPS, and this power is growing with every passing day … the trick is to use them in cycle-stealing mode
  – Each PC now has tens of gigabytes of disk capacity: 80 GB × 1000 = 80 Terabytes of storage space is available; very large file storage
• How do you harness the power of so many PCs in a large organization?
  – The issue of the "ownership" hurdle has to be resolved
  – The latency & bandwidth of a LAN environment are quite adequate for P2P computing … space management is no problem; use the PCs wherever they are!!

August 23, 2006 Talk at SASTRA 76

INTERNET COMPUTING

• Today you can't run your jobs on the Internet
• Internet computing using idle PCs is becoming an important computing platform (SETI@home, Napster, Gnutella, Freenet, KaZaA)
  – The web is now a promising candidate for the core component of a wide area distributed computing environment
  – Efficient client/server models & protocols
  – Transparent networking, navigation & GUIs with multimedia access & dissemination for data visualization
  – Mechanisms for distributed computing such as CGI, Java
• With improved performance (price/performance) & the availability of Linux, Web Services (SOAP, WSDL, UDDI, WSFL) and COM technology, it is easy to develop loosely coupled distributed applications

August 23, 2006 Talk at SASTRA 77

Difficulties in present systems

– As technology is constantly changing, there is a need for regular upgrades and enhancements

– Clusters and servers are not fail safe and fault tolerant

– Many systems are dedicated to a single application, and thus idle when the application has no load

– Many clusters in the organization remain idle

– For operating a computer centre, 75% of the cost comes from environment upkeep, staffing, operation and maintenance

– Computers, networks, clusters, parallel machines and visual systems are not tightly coupled by software; it is difficult for users to use them

August 23, 2006 Talk at SASTRA 78

Computer Assisted Science & Engineering (CASE)

[Diagram: a problem solving environment (menu, templates, solver, pre & post processing, mesh) connecting PCs, SMPs and clusters, disks, RAID and a visual data server over a high speed network]

Analysis – a very general model. Can we tie all the components tightly together by software?

August 23, 2006 Talk at SASTRA 79

GRID CONCEPT

[Diagram: a user access point submits work to a resource broker, which dispatches it to grid resources and returns the result]

August 23, 2006 Talk at SASTRA 80

Are Grids a Solution?

Goals of grid computing:
• Reduce computing costs
• Increase computing resources
• Reduce job turnaround time
• Enable parametric analyses
• Reduce complexity for users
• Increase productivity

Technology issues:
• Clusters
• Internet infrastructure
• MPP solver adoption
• Administration of desktops
• Use middleware to automate
• Virtual computing centre

"Dependable, consistent, pervasive access to resources"

"Grid Computing" means different things to different people.

August 23, 2006 Talk at SASTRA 81

What is needed?

[Diagram: a client (RPC-like access from Matlab, Mathematica, C, Fortran, Java, Perl or a Java GUI) sends a request through a gatekeeper/ISP to a request broker, scheduler and database, which choose and reply from computational resources such as clusters, MPPs and workstations running MPI, PVM, Condor, …]

August 23, 2006 Talk at SASTRA 82

Why Migrate Processes?

• Load balancing
  – Reduce average response time
  – Speed up individual jobs
  – Gain higher throughput

• Move a process closer to its resources
  – Use resources effectively
  – Reduce network traffic

• Increase system reliability

• Move a process to a machine holding confidential data

August 23, 2006 Talk at SASTRA 83

[Diagram: processes of jobs PR-JOB1, PR-JOB2 and PR-JOB3 and of a parallel job PR-PARL distributed and migrated across workstations and file servers]

August 23, 2006 Talk at SASTRA 84

What does the Grid do for you?

• You submit your work

• And the Grid:
  – Finds convenient places for it to be run
  – Organises efficient access to your data (caching, migration, replication)
  – Deals with authentication to the different sites that you will be using
  – Interfaces to local site resource allocation mechanisms and policies
  – Runs your jobs, monitors progress, recovers from problems, and tells you when your work is complete

• If there is scope for parallelism, it can also decompose your work into convenient execution units based on the available resources and data distribution

August 23, 2006 Talk at SASTRA 85

Main components

User Interface (UI): the place where users log on to the Grid

Computing Element (CE): a batch queue on a site's computers where the user's job is executed

Storage Element (SE): provides (large-scale) storage for files

Resource Broker (RB): matches the user requirements with the available resources on the Grid

Information System: characteristics and status of CEs and SEs (uses the "GLUE schema")

August 23, 2006 Talk at SASTRA 86

Typical current grid

[Diagram: sites and resources connected over the INTERNET]

• Virtual organisations negotiate with sites to agree access to resources

• Grid middleware runs on each shared resource to provide
  – Data services
  – Computation services
  – Single sign-on

• Distributed services (both people and middleware) enable the grid

E-infrastructure is the key !!!

August 23, 2006 Talk at SASTRA 87

Biomedical applications

EGEE tutorial, Seoul 88

Earth sciences applications

• Earth observations by satellite
  – Ozone profiles

• Solid Earth physics
  – Fast determination of mechanisms of important earthquakes

• Hydrology
  – Management of water resources in the Mediterranean area (SWIMED)

• Geology
  – Geocluster: an R&D initiative of the Compagnie Générale de Géophysique

A large variety of applications is the key !!!

August 23, 2006 Talk at SASTRA 90

GARUDA

• The Department of Information Technology (DIT), Govt. of India, has funded C-DAC to deploy a computational grid named GARUDA as a proof-of-concept project.

• It will connect 45 institutes in 17 cities in the country at 10/100 Mbps bandwidth.

August 23, 2006 Talk at SASTRA 91

Other Grids in India

• EU-IndiaGrid (ERNET, C-DAC, BARC,TIFR,SINP,PUNE UNIV, NBCS)

• Coordination with Geant for Education Research

• DAE/DST/ERNET MOU for Tier II LHC Grid (10 Univ)

• BARC MOU with INFN, Italy to setup Grid research Hub

• C-DAC’s GARUDA Grid

• Talk about Bio-Grid and Weather-Grid

August 23, 2006 Talk at SASTRA 92

Summary

• There have been three generations of ANUPAM, all with different architectures, hardware and software
• Usage of ANUPAM has increased due to standardization in programming models and the availability of parallel software
• Parallel processing awareness has increased among users
• Building parallel computers is a learning experience
• Development of grid computing is equally challenging

August 23, 2006 Talk at SASTRA 93

THANK YOU