
Grid Computing and NetSolve - netlib.org


1

Grid Computing and NetSolve

Jack Dongarra

2

Between Now and the End

Today, March 30th: Grids and NetSolve

April 6th: Felix Wolf: More on Grid computing, peer-to-peer, grid infrastructure, etc.

April 13th: Performance Measurement with PAPI (Dan Terpstra)

April 20th: Debugging (Shirley Moore)

April 25th (Monday): Presentation of class projects

3

More on the In-Class Presentations

Start at 1:00 on Monday, 4/25/05
Roughly 20 minutes each
Use slides
Describe your project, perhaps motivate via application
Describe your method/approach
Provide comparison and results

See me about your topic

4

Distributed Computing

Concept has been around for two decades
Basic idea: run a scheduler across systems to run processes on the least-used systems first
• Maximize utilization
• Minimize turnaround time
Have to load executables and input files to the selected resource
• Shared file system
• File transfers upon resource selection
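
The "least-used systems first" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the host names, the load numbers, and the per-job cost estimates are hypothetical, and a real scheduler would also stage executables and input files as noted above.

```python
import heapq

def schedule(jobs, hosts):
    """Assign each job to the currently least-loaded host.

    jobs  -- list of (job_name, cost) pairs; cost is an assumed work estimate
    hosts -- dict mapping host name to its current load (hypothetical units)
    """
    # Min-heap keyed on load, so the least-used host is always on top.
    heap = [(load, name) for name, load in hosts.items()]
    heapq.heapify(heap)
    placement = {}
    for job, cost in jobs:
        load, name = heapq.heappop(heap)
        placement[job] = name
        # The chosen host becomes busier before the next decision is made.
        heapq.heappush(heap, (load + cost, name))
    return placement

placement = schedule(
    [("j1", 4.0), ("j2", 1.0), ("j3", 2.0)],
    {"hostA": 0.5, "hostB": 0.0},
)
print(placement)
```

The first job goes to the idle host; later jobs go wherever the accumulated load is lowest at that moment, which is what keeps utilization even.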

5

Examples of Distributed Computing

Workstation farms, Condor flocks, etc.
• Generally share file system
SETI@home project, United Devices, etc.
• Only one source code; copies correct binary code and input data to each system
Napster, Gnutella: file/data sharing
NetSolve
• Runs numerical kernel on any of multiple independent systems, much like a Grid solution

6

SETI@home: Global Distributed Computing

Running on 500,000 PCs, ~1000 CPU Years per Day
• 485,821 CPU Years so far
Sophisticated Data & Signal Processing Analysis
Distributes Datasets from Arecibo Radio Telescope


7

SETI@home

Use thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence.
Uses data collected with the Arecibo Radio Telescope in Puerto Rico.
When a participant's computer is idle, the software downloads a 300 kilobyte chunk of data for analysis.
The results of this analysis are sent back to the SETI team and combined with those of thousands of other participants.

Largest distributed computation project in existence
• ~400,000 machines
• Averaging 27 Tflop/s

Today many companies are trying this for profit.

8

Grid Computing - from ET to Anthrax

9

United Devices and Cancer Screening

10

Grid Computing

Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, trust relationships.
Resources (HPC systems, visualization systems & displays, storage systems, sensors, instruments, people) are integrated via ‘middleware’ to facilitate use of all resources.

11

Why Grids?

Resources have different functions, but multiple classes of resources are necessary for most interesting problems.
Power of any single resource is small compared to aggregations of resources
Network connectivity is increasing rapidly in bandwidth and availability
Large problems require teamwork and computation

12

Why Grids?

A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real time data, weather model, population data


13

Why Grids? (contd.)

A multidisciplinary analysis in aerospace couples code and data in four companies
A home user invokes architectural design functions at an application service provider
An application service provider purchases cycles from compute cycle providers
Scientists working for a multinational soap company design a new product
A community group pools members’ PCs to analyze alternative designs for a local road

14

The Grid Problem

Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
• From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”

Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of…
• central location,
• central control,
• omniscience,
• existing trust relationships.

15

Elements of the Problem

Resource sharing
• Computers, storage, sensors, networks, …
• Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving
• Beyond client-server: distributed data analysis, computation, collaboration, …
Dynamic, multi-institutional virtual organisations
• Community overlays on classic org structures
• Large or small, static or dynamic

16

Grid vs. Internet/Web Services?

We’ve had computers connected by networks for 20 years
We’ve had web services for 5-10 years
The Grid combines these things, and brings additional notions
• Virtual Organizations
• Infrastructure to enable computation to be carried out across these
  - Authentication, monitoring, information, resource discovery, status, coordination, etc.
Can I just plug my application into the Grid?
• No! Much work to do to get there!

17

What does this mean now for users and developers?

There is a grand vision of the future
• Collecting resources around the world into VOs
• Seamless access to them, with a single sign-on
• NEW applications to exploit them in unique ways!
• Today we want to help you prepare for this
There is a frustrating reality of the present
• These technologies are not yet fully mature
• Not fully deployed
• Not consistent across even single VOs
But centers and funding agencies worldwide are pushing this very, very hard
• Better get ready now
• You can help! Work with your centers to get this deployed

18

Network Bandwidth Growth

Network vs. computer performance
• Computer speed doubles every 18 months
• Network speed doubles every 9 months
• Difference = order of magnitude per 5 years
1986 to 2000
• Computers: x 500
• Networks: x 340,000
2001 to 2010
• Computers: x 60
• Networks: x 4000

Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers.
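
The "order of magnitude per 5 years" figure follows directly from the two doubling periods. A small check, assuming ideal exponential growth over exactly 60 months (the function name is just for illustration):

```python
def growth(months, doubling_period_months):
    """Growth factor after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)

computers = growth(60, 18)   # roughly 10x in five years
networks = growth(60, 9)     # roughly 100x in five years
gap = networks / computers   # roughly 10x: one order of magnitude

print(f"computers x{computers:.1f}, networks x{networks:.1f}, gap x{gap:.1f}")
```

Because the network doubling period is half the computer doubling period, the gap itself grows at the computer rate, which is why it also comes out near 10x.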


19

The Grid

To treat CPU cycles and software like commodities.
• Napster on steroids.
Enable the coordinated use of geographically distributed resources – in the absence of central control and existing trust relationships.
Computing power is produced much like utilities such as power and water are produced for consumers.
Users will have access to “power” on demand.

“When the Network is as fast as the computer’s internal links, the machine disintegrates across the Net into a set of special purpose appliances”
Gilder Technology Report, June 2000

20

Computational Grids and Electric Power Grids

Why the Computational Grid is like the Electric Power Grid

Electric power is ubiquitous
Don’t need to know the source of the power (transformer, generator) or the power company that serves it

Why the Computational Grid is different from the Electric Power Grid

Wider spectrum of performance
Wider spectrum of services
Access governed by more complicated issues
• Security
• Performance
• Socio-political factors

21

What is Grid Computing?

Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

[Figure: imaging instruments, computational resources, large-scale databases, data acquisition and analysis, advanced visualization]

22

The Computational Grid is…

…a distributed control infrastructure that allows applications to treat compute cycles as commodities.
Power Grid analogy
• Power producers: machines, software, networks, storage systems
• Power consumers: user applications
Applications draw power from the Grid the way appliances draw electricity from the power utility.
• Seamless
• High-performance
• Ubiquitous
• Dependable

23

Introduction

Grid systems can be classified depending on their usage:

Grid Systems
• Computational Grid
  - Distributed Supercomputing
  - High Throughput
• Data Grid
• Service Grid
  - On Demand
  - Collaborative
  - Multimedia

24

Introduction

Computational Grid:
• denotes a system that has a higher aggregate capacity than any of its constituent machines
• it can be further categorized based on how the overall capacity is used
Distributed Supercomputing Grid:
• executes the application in parallel on multiple machines to reduce the completion time of a job


25

Introduction

Grand challenge problems typically require a distributed supercomputing Grid – one of the motivating factors of early Grid research – still driving in some quarters
High throughput Grid:
• increases the completion rate of a stream of jobs arriving in real time
• ASIC or processor design verification tests would be run on a high throughput Grid

26

Introduction

Data Grid:
• systems that provide an infrastructure for synthesizing new information from data repositories such as digital libraries or data warehouses
• applications for these systems would be special purpose data mining that correlates information from multiple different high volume data sources

27

Introduction

Service Grid:
• systems that provide services that are not provided by any single machine
• subdivided based on the type of service they provide
Collaborative Grid:
• connects users and applications into collaborative workgroups -- enables real time interaction between humans and applications via a virtual workspace

28

Introduction

Multimedia Grid:
• provides an infrastructure for real time multimedia applications -- requires support for quality of service (QoS) across multiple different machines, whereas a multimedia application on a single dedicated machine can be deployed without QoS
• synchronization between network and end-point QoS

29

Introduction

On Demand Grid:
• this category dynamically aggregates different resources to provide new services
• a data visualization workbench that allows a scientist to dynamically increase the fidelity of a simulation by allocating more machines to it would be an example

30

The Grid


31

The Grid Architecture Picture

[Figure: Grid architecture picture. Labels recovered from the diagram:]
Resource Layer: high speed networks and routers; computers; data bases; online instruments; software
Service Layers: authentication; co-scheduling; naming & files; events; Grid access & info; resource discovery & allocation; fault tolerance
Portals: user portals; problem solving environments; application science portals

32

Atmospheric Sciences Grid

[Figure: coupled models and data sources: real time data, data fusion, general circulation model, regional weather model, photo-chemical pollution model, particle dispersion model, bushfire model, topography database, vegetation database, emissions inventory]

33

Standard Implementation

[Figure: the atmospheric sciences models and databases (real time data, data fusion, general circulation model, regional weather model, photo-chemical pollution model, particle dispersion model, bushfire model, topography database, vegetation database, emissions inventory) with links labeled MPI, GASS, and GASS/GridFTP/GRC; caption: Change Models]

34

Challenges in Grid Computing

Reliable performance
Trust relationships between multiple security domains
Deployment and maintenance of grid middleware across hundreds or thousands of nodes
Access to data across WANs
Access to state information of remote processes
Workflow / dependency management
Distributed software and license management
Accounting and billing

35

Basic Grid Architecture

Clusters and how grids are different than clusters
Departmental Grid Model
Enterprise Grid Model
Global Grid Model

36

Advantages of Grid Computing

Use Resources Scattered Across the World
• Access to more computing power
• Better access to data
• Utilize unused cycles
Computing at a new level of complexity and scale
Increased Collaboration across Virtual Organizations (VOs)
• groups of organizations that use the Grid to share resources


37

Examples of Grids

TeraGrid – NSF funded, linking 5 major research sites at 40 Gb/s (www.teragrid.org)
European Union Data Grid – grid for applications in high energy physics, environmental science, bioinformatics (www.eu-datagrid.org)
Access Grid – collaboration systems using commodity technologies (www.accessgrid.org)
Network for Earthquake Engineering Simulations Grid – grid for earthquake engineering (www.nees.org)

38

Teragrid Network

39 40

...and enabling broad based collaborations

Gloriad, NLR, UltraScience Net, TeraGrid

41

Grid Possibilities

A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real time data, weather model, population data

42

Some Grid Usage Models

Distributed computing: job scheduling on Grid resources with secure, automated data transfer
Workflow: synchronized scheduling and automated data transfer from one system to the next in a pipeline (e.g. compute-viz-storage)
Coupled codes, with pieces running on different systems simultaneously
Meta-applications: parallel apps spanning multiple systems


43

Grid Usage Models

Some models are similar to models already being used, but are much simpler due to:
• single sign-on
• automatic process scheduling
• automated data transfers
But Grids can encompass new resources like sensors and instruments, so new usage models will arise

44

Example Application Projects

Earth Systems Grid: environment (US DOE)
EU DataGrid: physics, environment, etc. (EU)
EuroGrid: various (EU)
Fusion Collaboratory (US DOE)
GridLab: astrophysics, etc. (EU)
Grid Physics Network (US NSF)
MetaNEOS: numerical optimization (US NSF)
NEESgrid: civil engineering (US NSF)
Particle Physics Data Grid (US DOE)

45

Some Grid Requirements –Systems/Deployment Perspective

Identity & authentication
Authorization & policy
Resource discovery
Resource characterization
Resource allocation
(Co-)reservation, workflow
Distributed algorithms
Remote data access
High-speed data transfer
Performance guarantees
Monitoring
Adaptation
Intrusion detection
Resource management
Accounting & payment
Fault management
System evolution
Etc.

46

Some Grid Requirements –User Perspective

Single allocation: if any at all
Single sign-on: authentication to any Grid resource authenticates for all others
Single compute space: one scheduler for all Grid resources
Single data space: can address files and data from any Grid resource
Single development environment: Grid tools and libraries that work on all Grid resources

47

Programming & Systems Challenges

The programming problem
• Facilitate development of sophisticated applications
• Facilitate code sharing
• Requires programming environments: APIs, SDKs, tools
The systems problem
• Facilitate coordinated use of diverse resources
• Facilitate infrastructure sharing: e.g., certificate authorities, info services
• Requires systems: protocols, services
  - E.g., port/service/protocol for accessing information, allocating resources

48

The Systems Challenges: Resource Sharing Mechanisms That…

Address security and policy concerns of resource owners and users
Are flexible enough to deal with many resource types and sharing modalities
Scale to large number of resources, many participants, many program components
Operate efficiently when dealing with large amounts of data & computation


49

The Security Problem

Resources being used may be extremely valuable & the problems being solved extremely sensitive
Resources are often located in distinct administrative domains
• Each resource may have own policies & procedures
The set of resources used by a single computation may be large, dynamic, and/or unpredictable
• Not just client/server
It must be broadly available & applicable
• Standard, well-tested, well-understood protocols
• Integration with wide variety of tools

50

The Resource Management Problem

Enabling secure, controlled remote access to computational resources and management of remote computation

• Authentication and authorization
• Resource discovery & characterization
• Reservation and allocation
• Computation monitoring and control

51

Grid Systems Technologies

Systems and security problems addressed by new protocols & services. E.g., Globus:

• Grid Security Infrastructure (GSI) for security
• Globus Metadata Directory Service (MDS) for discovery
• Globus Resource Allocation Manager (GRAM) protocol as a basic building block
  - Resource brokering & co-allocation services
• GridFTP, IBP for data movement

52

The Programming Problem

How does a user develop robust, secure, long-lived applications for dynamic, heterogeneous Grids?
Presumably need:
• Abstractions and models to add to speed/robustness/etc. of development
• Tools to ease application development and diagnose common problems
• Code/tool sharing to allow reuse of code components developed by others

53

Grid Programming Technologies

“Grid applications” are incredibly diverse (data, collaboration, computing, sensors, …)

Seems unlikely there is one solution

Most applications have been written “from scratch,” with or without Grid services
Application-specific libraries have been shown to provide significant benefits
No new language, programming model, etc., has yet emerged that transforms things

But certainly still quite possible

54

Examples of Grid Programming Technologies

MPICH-G2: Grid-enabled message passing
CoG Kits, GridPort: Portal construction, based on N-tier architectures
GDMP, Data Grid Tools, SRB: replica management, collection management
Condor-G: simple workflow management
Legion: object models for Grid computing
NetSolve: Network enabled solver
Cactus: Grid-aware numerical solver framework

Note tremendous variety, application focus


55

MPICH-G2: A Grid-Enabled MPI

A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments

Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)

Globus services for authentication, resource allocation, executable staging, output, etc.
Programs run in wide area without change
See also: MetaMPI, PACX, STAMPI, MAGPIE

www.globus.org/mpi

56

Grid Events

Global Grid Forum: working meeting
• Meets 3 times/year, alternates U.S.-Europe, with July meeting as major event
HPDC: major academic conference
• HPDC-11 in Scotland with GGF-8, July 2002
Other meetings include
• IPDPS, CCGrid, EuroGlobus, Globus Retreats

www.gridforum.org, www.hpdc.org

57

Useful References

Book (Morgan Kaufmann)
• www.mkp.com/grids
Perspective on Grids
• “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, IJSA, 2001
• www.globus.org/research/papers/anatomy.pdf
All URLs in this section of the presentation, especially:
• www.gridforum.org, www.grids-center.org, www.globus.org

58

Emergence of Grids

But Grids enable much more than apps running on multiple computers (which can be achieved with MPI alone)

• virtual operating system: provides global workspace/address space via a single login
• automatically manages files, data, accounts, and security issues
• connects other resources (archival data facilities, instruments, devices) and people (collaborative environments)

59

Grids Are Inevitable

Inevitable (at least in HPC):
• leverages computational power of all available systems
• manages resources as a single system -- easier for users
• provides most flexible resource selection and management, load sharing
• researchers’ desire to solve bigger problems will always outpace performance increases of single systems; just as multiple processors are needed, ‘multiple multiprocessors’ will be deemed so

60

Globus Grid Services

The Globus toolkit provides a range of basic Grid services

Security, information, fault detection, communication, resource management, ...

These services are simple and orthogonal
• Can be used independently, mix and match
• Programming model independent
For each there are well-defined APIs
Standards are used extensively
• E.g., LDAP, GSS-API, X.509, ...
You don’t program in Globus, it’s a set of tools like Unix


61

Basic Grid Building Blocks

NetSolve – solving computational problems remotely
[Figure: a client sends a request to an agent, which makes a choice among computational resources (clusters, MPP, workstations; MPI, Condor, ...); the reply returns to the client. The interaction is RPC-like.]

Condor – harnessing idle workstations for high-throughput computing
[Figure: Condor submission side (customer agent, application process, application agent; data & object files, checkpoint files) and execution side (owner agent, execution agent, application process; object files, remote I/O & checkpointing)]

IBP – Internet Backplane Protocol is middleware for managing and using remote storage.

62

Maturation of Grid Computing

Research focus moving from building of basic infrastructure and application demonstrations to
• Middleware
• Usable production environments
• Application performance
• Scalability
• Globalization
Development, research, and integration happening outside of the original infrastructure groups
Grids becoming a first-class tool for scientific communities
• GriPhyN (Physics), BIRN (Neuroscience), NVO (Astronomy), Cactus (Physics), …

63

Broad Acceptance of Grids as a Critical Platform for Computing

Widespread interest from government in developing computational Grid platforms
• NSF’s Cyberinfrastructure
• NASA’s Information Power Grid
• DOE’s Science Grid

64

Broad Acceptance of Grids as a Critical Platform for Computing

Widespread interest from industry in developing computational Grid platforms
• IBM, Sun, Entropia, Avaki, Platform, …

On August 2, 2001, IBM announced a new corporate initiative to support and exploit Grid computing. AP reported that IBM was investing $4 billion into building 50 computer server farms around the world.

65

Grids Form the Basis of a National Information Infrastructure

TeraGrid will provide in aggregate

• 13.6 trillion calculations per second
• Over 600 trillion bytes of immediately accessible data
• 40 gigabit per second network speed
• Provide a new paradigm for data-oriented computing
• Critical for disaster response, genomics, environmental modeling, etc.

August 9, 2001: NSF Awarded $53,000,000 to SDSC/NPACI and NCSA/Alliance for TeraGrid

66

Distributed and Parallel Systems

[Figure: spectrum from distributed (heterogeneous) systems to massively parallel (homogeneous) systems, with labels: Grid based computing, SETI@home (27 Tflop/s), Entropia, network of ws, Beowulf cluster, clusters w/ special interconnect, parallel dist mem, ASCI Tflops (1 30 Tflop/s)]

Distributed systems (heterogeneous):
• Gather (unused) resources
• Steal cycles
• System SW manages resources
• System SW adds value
• 10% - 20% overhead is OK
• Resources drive applications
• Time to completion is not critical
• Time-shared

Massively parallel systems (homogeneous):
• Bounded set of resources
• Apps grow to consume all cycles
• Application manages resources
• System SW gets in the way
• 5% overhead is maximum
• Apps drive purchase of equipment
• Real-time constraints
• Space-shared


67

Basic Usage Scenarios

Grid based numerical library routines
• User doesn’t have to have software library on their machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
Task farming applications
• “Pleasantly parallel” execution, e.g. parameter studies
Remote application execution
• Complete applications with user specifying input parameters and receiving output

“Blue Collar” Grid Based Computing
• Does not require deep knowledge of network programming
• Level of expressiveness right for many users
• User can set things up, no “su” required
• In use today, up to 200 servers in 9 countries
Can plug into Globus, Condor, NINF, …

68

NetSolve Network Enabled Server

NetSolve is an example of a grid based hardware/software server.
Ease of use is paramount
Based on an RPC model but with …
• resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
Other examples are NEOS from Argonne and NINF from Japan.
Use resources, not tie together geographically distributed resources, for a single application.

69

NetSolve: The Big Picture

[Figure: NetSolve components: clients (Matlab, Mathematica, C, Fortran, Web), agent(s) with schedule database, servers S1 to S4, IBP depot]

No knowledge of the grid required, RPC like.

70

NetSolve: The Big Picture

[Figure: NetSolve components: clients (Matlab, Mathematica, C, Fortran, Web), agent(s) with schedule database, servers S1 to S4, IBP depot; data A, B shown at the IBP depot]

No knowledge of the grid required, RPC like.

71

NetSolve: The Big Picture

[Figure: NetSolve components: clients (Matlab, Mathematica, C, Fortran, Web), agent(s) with schedule database, servers S1 to S4, IBP depot; the IBP depot hands a handle back]

No knowledge of the grid required, RPC like.

72

NetSolve: The Big Picture

[Figure: NetSolve components: clients (Matlab, Mathematica, C, Fortran, Web), agent(s) with schedule database, servers S1 to S4, IBP depot. Labels: Request; Op(C, A, B); S2 !; A, B; OP, handle; Answer (C)]

No knowledge of the grid required, RPC like.


73

NetSolve Agent

Name server for the NetSolve system.
Information Service
• client users and administrators can query the hardware and software services available.
Resource scheduler
• maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources

Agent

74

NetSolve Agent

Resource Scheduling (cont’d):
• CPU Performance (LINPACK).
• Network bandwidth, latency.
• Server workload.
• Problem size/algorithm complexity.
• Calculates a “Time to Compute” for each appropriate server.
• Notifies client of most appropriate server.

Agent

75

NetSolve - Load Balancing

NetSolve agent:
• predicts the execution times and sorts the servers

Prediction for a server based on:
• Its distance over the network
  - Latency and Bandwidth
  - Statistical Averaging
• Its performance (LINPACK benchmark)
• Its workload
• The problem size and the algorithm complexity
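
One way such a prediction could be combined into a single number is sketched below. The formula (network time plus operation count divided by the unloaded share of the server's rate) is an assumption for illustration, not the agent's actual model; the `Server` fields, the server names, and all figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    mflops: float         # LINPACK-style rate in Mflop/s (assumed metric)
    workload: float       # fraction of the CPU already busy, 0.0 to < 1.0
    latency_s: float      # network latency to the client, in seconds
    bandwidth_bps: float  # network bandwidth, in bytes per second

def time_to_compute(server, flops, bytes_moved):
    """Assumed estimate: transfer time plus compute time on the idle share."""
    network = server.latency_s + bytes_moved / server.bandwidth_bps
    available_rate = server.mflops * 1e6 * (1.0 - server.workload)
    return network + flops / available_rate

def rank_servers(servers, flops, bytes_moved):
    """Sort servers from shortest to longest predicted time."""
    return sorted(servers, key=lambda s: time_to_compute(s, flops, bytes_moved))

# Dense solve with N = 1000: about 2*N**3 operations, N*N + N doubles moved.
n = 1000
flops = 2.0 * n ** 3
bytes_moved = 8 * (n * n + n)

far = Server("far", mflops=2000, workload=0.5, latency_s=0.1, bandwidth_bps=1e6)
near = Server("near", mflops=500, workload=0.0, latency_s=0.001, bandwidth_bps=1e8)

ranked = rank_servers([far, near], flops, bytes_moved)
print([(s.name, round(time_to_compute(s, flops, bytes_moved), 3)) for s in ranked])
```

In this made-up case the slower but nearby, idle server wins, because moving 8 MB over the slow link costs more than the faster CPU saves.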

Cached data
Quick estimate
Workload out of date?

76

NetSolve Client

Function Based Interface.
Client program embeds call from NetSolve’s API to access additional resources.
Interface available to C, Fortran, Matlab, Mathematica, and Java.
Opaque networking interactions.
NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, …

Client

77

NetSolve Client

Intuitive and easy to use.
Matlab matrix multiply, e.g.:

A = matmul(B, C);

A = netsolve('matmul', B, C);

• Possible parallelisms hidden.

Client

78

NetSolve Client

i. Client makes request to agent.

ii. Agent returns list of servers.

iii. Client tries each one in turn until one executes successfully or the list is exhausted.
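
Step iii is a plain failover loop. A minimal sketch, assuming the agent's answer is an ordered list of server names and that a failed attempt surfaces as an exception; `call_with_failover` and `fake_execute` are hypothetical helpers, not part of the NetSolve client library.

```python
def call_with_failover(servers, request, execute):
    """Try each server in order; return the first successful result."""
    errors = []
    for server in servers:
        try:
            return server, execute(server, request)
        except Exception as exc:  # any failure moves on to the next server
            errors.append((server, exc))
    raise RuntimeError(f"all {len(servers)} servers failed: {errors}")

def fake_execute(server, request):
    """Stand-in for a remote call; the first server is made to fail."""
    if server == "s1":
        raise ConnectionError("s1 unreachable")
    return f"{request} done on {server}"

used, result = call_with_failover(["s1", "s2", "s3"], "matmul", fake_execute)
print(used, result)
```

Because the list arrives already sorted by the agent, trying servers in order means the client falls back to the next-best choice rather than a random one.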

Client


79

NetSolve - MATLAB Interface

>> define sparse matrix A
>> define rhs
>> [x, its] = netsolve('itmeth', 'petsc', A, rhs);
…
>> [x, its] = netsolve('itmeth', 'aztec', A, rhs);
>> [x, its] = netsolve('solve', 'superlu', A, rhs);
>> [x, its] = netsolve('solve', 'ma28', A, rhs);

Synchronous Call

Asynchronous Calls also available

80

NetSolve - FORTRAN Interface

parameter( MAX = 100)
double precision A(MAX,MAX), B(MAX)
integer IPIV(MAX), N, INFO, LWORK
integer NSINFO

call DGESV(N,1,A,MAX,IPIV,B,MAX,INFO)

Easy to ‘switch’ to NetSolve

call NETSL('DGESV()',NSINFO,N,1,A,MAX,IPIV,B,MAX,INFO)

81

Hiding the Parallel Processing

User may be unaware of parallel processing

NetSolve takes care of starting the message passing system, distributing the data, and returning the results.

82

Problem Description File

Problem Description File defines problem specification used to add functional modules to NetSolve server.
Wrapper to provide binding between the NetSolve client interface and server function being integrated.
Complex syntax defines input/output objects, calling sequences, libraries to link, etc…
Parsed by NetSolve to create “service” program.

83

Generating New Services in NetSolve

Add additional functionality
• Describe the interface (arguments)
• Generate wrapper
• Install into server

[Figure: Java GUI, NetSolve parser/compiler, and a server holding existing services plus the new service; caption: New Service Added!]

@PROBLEM degsv
@DESCRIPTION
This is a linear solver for dense matrices from the LAPACK
Library. Solves Ax=b.
@INPUT 2
@OBJECT MATRIX DOUBLE A
Double precision matrix
@OBJECT VECTOR DOUBLE b
Right hand side
@OUTPUT 1
@OBJECT VECTOR DOUBLE x
…

84

Problem Description Specification

Specifies the calling interface between GridSolve and the service routine

Original NetSolve problem description files
• Strange notation
• Difficult for users to understand
• Previous attempts to simplify involved GUI front-ends
In GridSolve, the format is totally re-designed
• Specified in a manner similar to normal function prototypes
• Similar to Ninf


85

GridSolve Problem Description (DGESV)

Original Fortran Subroutine:

SUBROUTINE DGESV(N,NRHS,A,LDA,IPIV,B,LDB,INFO)
INTEGER INFO, LDA, LDB, N, NRHS
INTEGER IPIV( * )
DOUBLE PRECISION A( LDA, * ), B( LDB, * )

GridSolve IDL Specification:

SUBROUTINE dgesv(IN int N, IN int NRHS,
    INOUT double A[LDA][N], IN int LDA,
    OUT int IPIV[N], INOUT double B[LDB][NRHS],
    IN int LDB, OUT int INFO)
"This solves Ax=b using LAPACK"
LANGUAGE = "FORTRAN"
LIBS = "$(LAPACK_LIBS) $(BLAS_LIBS)"
COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"
MAJOR="COLUMN"

86

GridSolve Interface Definition Language

Data types: int, char, float, double
Argument passing modes:
• IN -- input only; not modified
• INOUT -- input and output
• OUT -- output only
• VAROUT -- variable length output only data
• WORKSPACE -- server-side allocation of workspace; not passed as part of calling sequence
Argument size
• Specified as expression using scalar arguments, e.g. ddot(IN int n, IN double dx[n*incx], IN int incx, …
• All typical operators supported (+, -, *, /, etc).
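
To see why size expressions matter, here is a rough sketch of how a size such as `n*incx` could be turned into a byte count once the scalar arguments are known. This is not GridSolve's implementation: the `SIZEOF` table assumes typical C type sizes, `array_bytes` is a hypothetical helper, and the use of Python's `eval` with builtins disabled is only a shortcut for the illustration.

```python
# Assumed element sizes in bytes for the four IDL data types.
SIZEOF = {"int": 4, "char": 1, "float": 4, "double": 8}

def array_bytes(ctype, size_expr, scalars):
    """Evaluate a size expression over the scalar arguments, then scale by type size."""
    # Only the scalar arguments are visible to the expression.
    count = eval(size_expr, {"__builtins__": {}}, dict(scalars))
    return count * SIZEOF[ctype]

# ddot(IN int n, IN double dx[n*incx], IN int incx, ...) with n=100, incx=2
nbytes = array_bytes("double", "n*incx", {"n": 100, "incx": 2})
print(nbytes)
```

With the sizes computable from scalars alone, the client knows how much data to ship for each array before the call is made.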

87

Problem Attributes

MAJOR
• Row or column major; depends on the implementation of the service routine
LANGUAGE
• Language in which the service routine is implemented; currently C or Fortran
LIBS
• Additional libraries to be linked
COMPLEXITY
• Theoretical complexity of the service routine, specified in terms of the arguments

88

Building the Services

cd GridSolve/src/problem
make check

When building a new service, the server should be restarted, but thereafter it is not necessary.

For more detailed documentation, consult the manual:

cd GridSolve/doc/ug
make
ghostview ug.ps

89

NetSolve: How to Install Software

[Figure: computational modules, NetSolve problem description files, NetSolve server daemon, client stubs, Java applet]

• User can install new components
• Problem description files
• Java applet to generate them

90

NetSolve: How It Works

[Figure: computational modules, NetSolve problem description files, NetSolve server daemon, client stubs; interactions labeled Register, Query, Reply, Request]

• Problem description files
• Client downloads stubs at run-time
• Problem description files are portable
• Java applet to generate them


91

Task Farming - Multiple Requests to a Single Problem

A solution: many calls to netslnb( ); /* non-blocking */

Farming solution: single call to netsl_farm( );

Request iterates over an “array of input parameters.”

Adaptive scheduling algorithm.

Useful for parameter sweeping and independently parallel applications.
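The farming idea, one request applied across an array of input parameters, can be sketched locally in Python. This is only an analogy: `solve_one` and `farm` are hypothetical names, a thread pool stands in for the remote servers, and nothing here reproduces the netsl_farm( ) interface or its adaptive scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_one(param: float) -> float:
    """Stand-in for one remote request; here it just squares its input."""
    return param * param


def farm(func, params, max_workers: int = 4):
    """Apply func to every entry of params concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, params))


if __name__ == "__main__":
    print(farm(solve_one, [1.0, 2.0, 3.0]))
```

The pool hands each pending task to whichever worker is free next, which is a simple form of dynamic load balancing across independent tasks.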

92

Data Persistence

Chain together a sequence of NetSolve requests.
Analyze parameters to determine data dependencies. Essentially a DAG is created where nodes represent computational modules and arcs represent data flow.
Transmit superset of all input/output parameters and make persistent near server(s) for duration of sequence execution.
Schedule individual request modules for execution.
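The dependency analysis described above can be sketched in a few lines of Python. This is an illustrative assumption about how such an analysis might look, not NetSolve's implementation; `analyze_sequence` and the tuple layout are hypothetical. Each request lists its inputs and outputs, and the function reports which objects the client must send, which stay near the servers as intermediates, and which come back as results.

```python
def analyze_sequence(requests):
    """requests: list of (name, inputs, outputs) in program order.

    Returns (client_inputs, intermediates, final_outputs, edges), where edges
    are (producer, consumer, data_object) arcs of the dependency DAG.
    """
    producer = {}        # data object -> request that last wrote it
    client_inputs = []   # objects the client must transmit
    consumed = set()     # produced objects that a later request reads
    edges = []
    for name, inputs, outputs in requests:
        for obj in inputs:
            if obj in producer:
                edges.append((producer[obj], name, obj))
                consumed.add(obj)
            elif obj not in client_inputs:
                client_inputs.append(obj)
        for obj in outputs:
            producer[obj] = name
    intermediates = [o for o in producer if o in consumed]
    final_outputs = [o for o in producer if o not in consumed]
    return client_inputs, intermediates, final_outputs, edges


if __name__ == "__main__":
    seq = [
        ("command1", ["A", "B"], ["C"]),
        ("command2", ["A", "C"], ["D"]),
        ("command3", ["D", "E"], ["F"]),
    ]
    print(analyze_sequence(seq))
```

For the three-command chain in the example, the client sends A, B, and E once, C and D never travel back to the client, and only F is returned.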

93

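Data Persistence (cont’d)

Without a sequence:

netsl("command1", A, B, C);
netsl("command2", A, C, D);
netsl("command3", D, E, F);

[Diagram: three separate client-server round trips]
Client to server: command1(A, B); server to client: result C
Client to server: command2(A, C); server to client: result D
Client to server: command3(D, E); server to client: result F

With a sequence:

netsl_begin_sequence( );
netsl("command1", A, B, C);
netsl("command2", A, C, D);
netsl("command3", D, E, F);
netsl_end_sequence(C, D);

[Diagram: one request, with intermediate data passed on the server side]
Client to server: sequence(A, B, E)
Server to server: input A, intermediate output C
Server to server: intermediate output D, input E
Server to client: result F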

94

NetSolve Authentication with Kerberos

Kerberos used to maintain Access Control Lists and manage access to computational resources.
NetSolve properly handles authorized and non-authorized components together in the same system.

95

NetSolve Authentication with Kerberos

[Diagram: NetSolve client, NetSolve agent, NetSolve servers, Kerberos KDC]

Servers register their presence with the agent and KDC
Client issues problem request; agent responds with list of servers
Client sends work request to server; server replies requesting authentication credentials
Client requests ticket from KDC
Client sends ticket and input to server; server authenticates and returns the solution set
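The message order above (request, credential challenge, ticket, authenticated request) can be imitated with a toy Python model. This is deliberately not Kerberos: real tickets are encrypted and carry timestamps and session keys, whereas here an HMAC under a key shared by the KDC and the server stands in for the ticket. The classes `ToyKDC` and `ToyServer`, and the replies "AUTH_REQUIRED" and "DENIED", are hypothetical.

```python
import hashlib
import hmac


class ToyKDC:
    """Toy key distribution center: knows each registered server's secret key."""

    def __init__(self):
        self._server_keys = {}

    def register_server(self, server_name: str, key: bytes) -> None:
        self._server_keys[server_name] = key

    def issue_ticket(self, client: str, server_name: str) -> bytes:
        """Return a MAC over (client, server) under the server's key."""
        msg = f"{client}|{server_name}".encode()
        return hmac.new(self._server_keys[server_name], msg, hashlib.sha256).digest()


class ToyServer:
    def __init__(self, name: str, key: bytes):
        self.name = name
        self._key = key

    def handle(self, client: str, work, ticket: bytes = None):
        """Ask for credentials when no ticket is given; otherwise verify and run."""
        if ticket is None:
            return "AUTH_REQUIRED"
        msg = f"{client}|{self.name}".encode()
        expected = hmac.new(self._key, msg, hashlib.sha256).digest()
        if not hmac.compare_digest(ticket, expected):
            return "DENIED"
        return sum(work)  # placeholder for the real computation


if __name__ == "__main__":
    key = b"shared-secret"
    kdc = ToyKDC()
    kdc.register_server("server1", key)          # server registers with the KDC
    server = ToyServer("server1", key)
    print(server.handle("alice", [1, 2, 3]))     # server asks for credentials
    ticket = kdc.issue_ticket("alice", "server1")
    print(server.handle("alice", [1, 2, 3], ticket))
```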

96

Server Software Repository

Dynamic downloading of new software.
Enhance server capabilities without shutdown and restart.
Repository maintained independently of server.

[Diagram: NetSolve server, hardware and software]


97

NetSolve: A Plug into the Grid

[Diagram: NetSolve (C, Fortran) on top of Grid middleware proxies: Globus proxy, NetSolve proxy, Ninf proxy, Condor proxy]

Resource Discovery
System Management
Resource Scheduling
Fault Tolerance

98

NetSolve: A Plug into the Grid

[Diagram: NetSolve (C, Fortran) on top of Grid middleware proxies: Globus proxy, NetSolve proxy, Ninf proxy, Condor proxy]
[Grid back-ends: Globus, Ninf servers, NetSolve servers, Condor]

Resource Discovery
System Management
Resource Scheduling
Fault Tolerance

99

NetSolve: A Plug into the Grid

[Diagram: PSE front-ends (C, Fortran, Matlab, Mathematica, SCIRun, Custom) call NetSolve by remote procedure call]
[Grid middleware proxies: Globus proxy, NetSolve proxy, Ninf proxy, Condor proxy]
[Grid back-ends: Globus, Ninf servers, NetSolve servers, Condor]

Resource Discovery
System Management
Resource Scheduling
Fault Tolerance

100

NPACI Alpha Project - MCell: 3-D Monte-Carlo Simulation of Neuro-Transmitter Release in Between Cells

• UCSD (F. Berman, H. Casanova, M. Ellisman), Salk Institute (T. Bartol), CMU (J. Stiles), UTK (Dongarra, R. Wolski)
• Study how neurotransmitters diffuse and activate receptors in synapses
• Figure legend: blue unbounded, red singly bounded, green doubly bounded closed, yellow doubly bounded open

101

MCell: 3-D Monte-Carlo Simulation of Neuro-Transmitter Release in Between Cells

• Developed at: Salk Institute, CMU
• In the past, manually run on available workstations
• Transparent parallelism, load balancing, fault tolerance
• Fits the farming semantic and need for NetSolve
• Collaboration with AppLeS Project for scheduling tasks

[Diagram: script, list of seeds, many script instances]

102

NetSolve and SCIRun

SCIRun torso defibrillator application - Chris Johnson, U of Utah


103

IPARS

Integrated Parallel Accurate Reservoir Simulator.
TICAM, UT Austin; Director: Dr. Mary Wheeler.
Portable and modular reservoir simulator.
Models waterflood, black oil, compositional, well management, recovery process …

Reservoir and Environmental Simulation.
  Models black oil, waterflood, compositional
  3D transient flow of multiple phases
Integrates Existing Simulators.
Framework simplifies development
  Provides solvers, handling for wells, table lookup.
  Provides pre/postprocessor, visualization.
Full IPARS access without installation.
IPARS Interfaces:
  C, FORTRAN, Matlab, Mathematica, and Web.

104

Integrated Parallel Accurate Reservoir Simulator. Mary Wheeler’s group, UT-Austin

Reservoir and Environmental Simulation.
  Models black oil, waterflood, compositional
  3D transient flow of multiple phases
Integrates Existing Simulators.
Framework simplifies development
  Provides solvers, handling for wells, table lookup.
  Provides pre/postprocessor, visualization.
Full IPARS access without installation.
IPARS Interfaces:
  C, FORTRAN, Matlab, Mathematica, and Web.

[Diagram: Web interface, Web server, NetSolve client, IPARS-enabled servers]

105

[Diagram: Web interface, Web server, NetSolve client, IPARS-enabled servers]

NetSolve server post-processing for visualization.
Possible rendering of visualization via the Internet using web browsers.

106

University of Tennessee Deployment: Scalable Intracampus Research Grid (SInRG)

Federated ownership: CS, Chemical Engineering, Medical School, Computational Ecology, Electrical Engineering
Real applications, middleware development, logistical networking

The Knoxville campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, the National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, the Southeastern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX).

The UT campus network consists of a meshed ATM OC-12 backbone that is being migrated to switched Gigabit Ethernet by early 2002.

107

NetSolve Monitor

http://anaka.cs.utk.edu:8080/monitor/signed.html

108

Demo1 – Blocking Calls

This demo runs through three calls: sorting, solving a system of linear equations, and finding the eigenvalues of a matrix.

[c] = netsolve('dqsort',b);
[x,y,z,info]=netsolve('dgesv',a,b);
[a,wr,wi,vl,vr,info]=netsolve('dgeev','N','V',a);

This will invoke a quicksort algorithm and the LAPACK routines for Ax=b and Ax=λx.

It has one input, the size of the problem.


109

Demo2 – Non-Blocking Calls

This example shows a non-blocking call to NetSolve.

[rr1]=netsolve_nb('send','dgesv',a1,b1);
status1 = -1;  % assumed: 'probe' returns a negative status while the request is still running
while status1 < 0,
  [status1]=netsolve_nb('probe',rr1);
end
[aa1,ipiv1,x1,info]=netsolve_nb('wait',rr1);
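The same send / probe / wait pattern can be tried locally in Python, with a future standing in for the NetSolve request handle. The names `slow_solve` and `send_probe_wait` are hypothetical, and the "remote" work here is just a delayed scalar division; the point is only the control flow.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_solve(a: float, b: float) -> float:
    """Stand-in for a remote solve of the scalar equation a*x = b."""
    time.sleep(0.05)
    return b / a


def send_probe_wait(a: float, b: float, poll_interval: float = 0.01):
    """Submit the request, poll until it finishes, then collect the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        request = pool.submit(slow_solve, a, b)   # 'send'
        polls = 0
        while not request.done():                 # 'probe'
            polls += 1                            # client-side work could go here
            time.sleep(poll_interval)
        return request.result(), polls            # 'wait'


if __name__ == "__main__":
    x, polls = send_probe_wait(2.0, 6.0)
    print(x, polls)
```

The benefit of the non-blocking form is the loop body: the client is free to do other work between probes instead of idling until the result arrives.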

110

Demo7 – Sparse Matrix

This example solves a sparse matrix problem using SuperLU, MA28, PETSc, and AZTEC.

[x]=netsolve('sparse_direct_solve', 'SUPERLU',A,rhs,0.3,1);

[x]=netsolve('sparse_direct_solve', 'MA28',A,rhs,0.3,1);

[x,its]=netsolve('sparse_iterative_solve', 'PETSC',A,rhs,1.e-6,500);

[x,its]=netsolve('sparse_iterative_solve', 'AZTEC',A,rhs,1.e-6,500);

111

Demo3 – SuperLU

This example solves a sparse matrix problem using the SuperLU software from Sherry Li and Jim Demmel at Berkeley.

[x]=netsolve('sparse_direct_solve', 'SUPERLU',A,rhs,0.3,1);

112

Demo4 – MA28

This example solves a sparse matrix problem using the MA28 software from the Harwell-Rutherford Library.

[x]=netsolve('sparse_direct_solve', 'MA28',A,rhs,0.3,1);

113

Demo5 - PETSc

This example solves a sparse matrix problem using the PETSc software from Argonne National Lab.
Parallel processing is used in NetSolve to solve the problem.

[x,its]=netsolve('sparse_iterative_solve', 'PETSC',A,rhs,1.e-6,500);

114

Demo6 - AZTEC

This example solves a sparse matrix problem using the AZTEC software from Sandia National Lab.
Parallel processing is used in NetSolve to solve the problem.

[x,its]=netsolve('sparse_iterative_solve', 'AZTEC',A,rhs,1.e-6,500);


115

Things Not Touched On

Hierarchy of Agents
  More scalable configuration
Monitor NetSolve Network
  Track and monitor usage
Network status
  Network Weather Service
Internet Backplane Protocol
  Middleware for managing and using remote storage.
Fault Tolerance
  Volker Strumpen’s Porch
Local / Global Configurations
Automated Adaptive Algorithm Selection
  Dynamically determine the best algorithm based on system status and nature of user problem

116

Thanks

Fran Berman, Director, San Diego Supercomputer Center

Jay Boisseau, Director, Texas Advanced Computing Center