1
Grid Computing and NetSolve
Jack Dongarra
2
Between Now and the End
Today, March 30th: Grids and NetSolve
April 6th: Felix Wolf: more on Grid computing, peer-to-peer, grid infrastructure, etc.
April 13th: Performance Measurement with PAPI (Dan Terpstra)
April 20th: Debugging (Shirley Moore)
April 25th (Monday): Presentation of class projects
3
More on the In-Class Presentations
Start at 1:00 on Monday, 4/25/05
Roughly 20 minutes each
Use slides
Describe your project, perhaps motivate via application
Describe your method/approach
Provide comparison and results
See me about your topic
4
Distributed Computing
Concept has been around for two decades
Basic idea: run a scheduler across systems to run processes on the least-used systems first
Maximize utilization
Minimize turnaround time
Have to load executables and input files onto the selected resource
Shared file system
File transfers upon resource selection
5
Examples of Distributed Computing
Workstation farms, Condor flocks, etc.
Generally share a file system
SETI@home project, United Devices, etc.
Only one source code; copies the correct binary code and input data to each system
Napster, Gnutella: file/data sharing
NetSolve
Runs numerical kernels on any of multiple independent systems, much like a Grid solution
6
SETI@home: Global Distributed Computing
Running on 500,000 PCs, ~1000 CPU Years per Day
485,821 CPU Years so far
Sophisticated Data & Signal Processing Analysis
Distributes Datasets from Arecibo Radio Telescope
7
SETI@home
Use thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence.
Uses data collected with the Arecibo Radio Telescope in Puerto Rico.
When a participant's computer is idle, the software downloads a 300-kilobyte chunk of data for analysis. The results of this analysis are sent back to the SETI team and combined with those of thousands of other participants.
Largest distributed computation project in existence
~400,000 machines
Averaging 27 Tflop/s
Today many companies are trying this for profit.
8
Grid Computing: from ET to Anthrax
9
United Devices and Cancer Screening
10
Grid Computing
Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, and trust relationships.
Resources (HPC systems, visualization systems & displays, storage systems, sensors, instruments, people) are integrated via 'middleware' to facilitate use of all resources.
11
Why Grids?
Resources have different functions, but multiple classes of resources are necessary for most interesting problems
The power of any single resource is small compared to aggregations of resources
Network connectivity is increasing rapidly in bandwidth and availability
Large problems require teamwork and computation
12
Why Grids?
A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real time data, weather model, population data
13
Why Grids? (contd.)
A multidisciplinary analysis in aerospace couples code and data in four companies
A home user invokes architectural design functions at an application service provider
An application service provider purchases cycles from compute cycle providers
Scientists working for a multinational soap company design a new product
A community group pools members' PCs to analyze alternative designs for a local road
14
The Grid Problem
Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations"
Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, assuming the absence of…
central location,central control, omniscience, existing trust relationships.
15
Elements of the Problem
Resource sharing
Computers, storage, sensors, networks, …
Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving
Beyond client-server: distributed data analysis, computation, collaboration, …
Dynamic, multi-institutional virtual organizations
Community overlays on classic org structures
Large or small, static or dynamic
16
Grid vs. Internet/Web Services?
We've had computers connected by networks for 20 years
We've had web services for 5-10 years
The Grid combines these things, and brings additional notions
Virtual Organizations
Infrastructure to enable computation to be carried out across these
Authentication, monitoring, information, resource discovery, status, coordination, etc.
Can I just plug my application into the Grid? No! Much work to do to get there!
17
What does this mean now for users and developers?
There is a grand vision of the future
Collecting resources around the world into VOs
Seamless access to them, with a single sign-on
NEW applications to exploit them in unique ways!
Today we want to help you prepare for this
There is a frustrating reality of the present
These technologies are not yet fully mature
Not fully deployed
Not consistent across even single VOs
But centers and funding agencies worldwide are pushing this very, very hard
Better get ready now
You can help! Work with your centers to get this deployed
18
Network Bandwidth Growth
Network vs. computer performance
Computer speed doubles every 18 months
Network speed doubles every 9 months
Difference = order of magnitude per 5 years
1986 to 2000
Computers: x 500
Networks: x 340,000
2001 to 2010
Computers: x 60
Networks: x 4000
Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan 2001) by Cleo Vilett; source Vinod Khosla, Kleiner Perkins Caufield & Byers.
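The "order of magnitude per 5 years" claim follows directly from the two doubling times, as a quick check shows:

```python
# Computer speed doubles every 18 months; network speed every 9 months.
# Over 60 months the ratio of the two growth factors is
# 2**(60/9) / 2**(60/18) = 2**(60/18) ~ 10: an order of magnitude gap.
months = 60
computer_growth = 2 ** (months / 18)   # ~10x in 5 years
network_growth = 2 ** (months / 9)     # ~100x in 5 years
gap = network_growth / computer_growth
print(round(computer_growth, 1), round(network_growth, 1), round(gap, 1))
```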
19
The Grid
To treat CPU cycles and software like commodities.
Napster on steroids.
Enable the coordinated use of geographically distributed resources, in the absence of central control and existing trust relationships.
Computing power is produced much like utilities such as power and water are produced for consumers.
Users will have access to "power" on demand
"When the Network is as fast as the computer's internal links, the machine disintegrates across the Net into a set of special purpose appliances"
Gilder Technology Report June 2000
20
Computational Grids and Electric Power Grids
Why the Computational Grid is like the Electric Power Grid
Electric power is ubiquitous
Don't need to know the source of the power (transformer, generator) or the power company that serves it
Why the Computational Grid is different from the Electric Power Grid
Wider spectrum of performance
Wider spectrum of services
Access governed by more complicated issues
Security
Performance
Socio-political factors
21
What is Grid Computing? Resource sharing & coordinated problem
solving in dynamic, multi-institutional virtual organizations
IMAGING INSTRUMENTS
COMPUTATIONAL RESOURCES
LARGE-SCALE DATABASES
DATA ACQUISITION, ANALYSIS
ADVANCED VISUALIZATION
22
The Computational Grid is…
…a distributed control infrastructure that allows applications to treat compute cycles as commodities.
Power Grid analogy
Power producers: machines, software, networks, storage systems
Power consumers: user applications
Applications draw power from the Grid the way appliances draw electricity from the power utility.
Seamless
High-performance
Ubiquitous
Dependable
23
Introduction
Grid systems can be classified depending on their usage:
Grid Systems
Computational Grid
Distributed Supercomputing
High Throughput
Data Grid
Service Grid
On Demand
Collaborative
Multimedia
24
Introduction
Computational Grid: denotes a system that has a higher aggregate capacity than any of its constituent machines; it can be further categorized based on how the overall capacity is used
Distributed Supercomputing Grid: executes the application in parallel on multiple machines to reduce the completion time of a job
25
Introduction
Grand challenge problems typically require a distributed supercomputing Grid; this was one of the motivating factors of early Grid research, and is still driving in some quarters
High throughput Grid:
increases the completion rate of a stream of jobs arriving in real time
ASIC or processor design verification tests would be run on a high throughput Grid
26
Introduction
Data Grid: systems that provide an infrastructure for synthesizing new information from data repositories such as digital libraries or data warehouses
Applications for these systems would be special purpose data mining that correlates information from multiple different high volume data sources
27
Introduction
Service Grid: systems that provide services that are not provided by any single machine
Subdivided based on the type of service they provide
Collaborative Grid: connects users and applications into collaborative workgroups; enables real time interaction between humans and applications via a virtual workspace
28
Introduction
Multimedia Grid: provides an infrastructure for real time multimedia applications; requires support for quality of service across multiple different machines, whereas a multimedia application on a single dedicated machine can be deployed without QoS
Synchronization between network and end-point QoS
29
Introduction
On-demand Grid: this category dynamically aggregates different resources to provide new services
A data visualization workbench that allows a scientist to dynamically increase the fidelity of a simulation by allocating more machines to it would be an example
30
The Grid
31
The Grid Architecture Picture
Service Layers:
User Portals
Application Science Portals
Problem Solving Environments
Authentication
Naming & Files
Events
Grid Access & Info
Co-Scheduling
Resource Discovery & Allocation
Fault Tolerance
Resource Layer:
Computers
Databases
Online instruments
Software
High speed networks and routers
32
Atmospheric Sciences Grid
Real time data
Data Fusion
General Circulation model
Regional weather model
Photo-chemical pollution model
Particle dispersion model
Bushfire model
Topography Database
Vegetation Database
Emissions Inventory
33
Standard Implementation
[Same application mapped onto Grid components: MPI couples the General Circulation, Regional weather, Photo-chemical pollution, and Particle dispersion models; GASS feeds real time data, Data Fusion, and the Bushfire model; GASS/GridFTP/GRC move the Topography Database, Vegetation Database, and Emissions Inventory to the models. Models can be changed.]
34
Challenges in Grid Computing
Reliable performance
Trust relationships between multiple security domains
Deployment and maintenance of grid middleware across hundreds or thousands of nodes
Access to data across WANs
Access to state information of remote processes
Workflow / dependency management
Distributed software and license management
Accounting and billing
35
Basic Grid Architecture
Clusters and how grids are different from clusters
Departmental Grid Model
Enterprise Grid Model
Global Grid Model
36
Advantages of Grid Computing
Use resources scattered across the world
• Access to more computing power
• Better access to data
• Utilize unused cycles
Computing at a new level of complexity and scale
Increased collaboration across Virtual Organizations (VOs)
• groups of organizations that use the Grid to share resources
37
Examples of Grids
TeraGrid: NSF funded, linking 5 major research sites at 40 Gb/s (www.teragrid.org)
European Union Data Grid: grid for applications in high energy physics, environmental science, bioinformatics (www.eu-datagrid.org)
Access Grid: collaboration systems using commodity technologies (www.accessgrid.org)
Network for Earthquake Engineering Simulations Grid: grid for earthquake engineering (www.nees.org)
38
Teragrid Network
39
40
...and enabling broad based collaborations
Gloriad, NLR UltraScience Net, TeraGrid
41
Grid Possibilities
A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real time data, weather model, population data
42
Some Grid Usage Models
Distributed computing: job scheduling on Grid resources with secure, automated data transfer
Workflow: synchronized scheduling and automated data transfer from one system to the next in a pipeline (e.g. compute-viz-storage)
Coupled codes, with pieces running on different systems simultaneously
Meta-applications: parallel apps spanning multiple systems
43
Grid Usage Models
Some models are similar to models already being used, but are much simpler due to:
single sign-on
automatic process scheduling
automated data transfers
But Grids can encompass new resources like sensors and instruments, so new usage models will arise
44
Example Application Projects
Earth Systems Grid: environment (US DOE)
EU DataGrid: physics, environment, etc. (EU)
EuroGrid: various (EU)
Fusion Collaboratory (US DOE)
GridLab: astrophysics, etc. (EU)
Grid Physics Network (US NSF)
MetaNEOS: numerical optimization (US NSF)
NEESgrid: civil engineering (US NSF)
Particle Physics Data Grid (US DOE)
45
Some Grid Requirements –Systems/Deployment Perspective
Identity & authentication
Authorization & policy
Resource discovery
Resource characterization
Resource allocation
(Co-)reservation, workflow
Distributed algorithms
Remote data access
High-speed data transfer
Performance guarantees
Monitoring
Adaptation
Intrusion detection
Resource management
Accounting & payment
Fault management
System evolution
Etc.
46
Some Grid Requirements –User Perspective
Single allocation: if any at all
Single sign-on: authentication to any Grid resource authenticates for all others
Single compute space: one scheduler for all Grid resources
Single data space: can address files and data from any Grid resource
Single development environment: Grid tools and libraries that work on all Grid resources
47
Programming & Systems Challenges
The programming problem
Facilitate development of sophisticated applications
Facilitate code sharing
Requires programming environments: APIs, SDKs, tools
The systems problem
Facilitate coordinated use of diverse resources
Facilitate infrastructure sharing: e.g., certificate authorities, info services
Requires systems: protocols, services
E.g., port/service/protocol for accessing information, allocating resources
48
The Systems Challenges:Resource Sharing Mechanisms That…
Address security and policy concerns of resource owners and users
Are flexible enough to deal with many resource types and sharing modalities
Scale to large numbers of resources, many participants, many program components
Operate efficiently when dealing with large amounts of data & computation
49
The Security Problem
Resources being used may be extremely valuable & the problems being solved extremely sensitive
Resources are often located in distinct administrative domains
Each resource may have its own policies & procedures
The set of resources used by a single computation may be large, dynamic, and/or unpredictable
Not just client/server
It must be broadly available & applicable
Standard, well-tested, well-understood protocols
Integration with a wide variety of tools
50
The Resource Management Problem
Enabling secure, controlled remote access to computational resources and management of remote computation
Authentication and authorization
Resource discovery & characterization
Reservation and allocation
Computation monitoring and control
51
Grid Systems Technologies
Systems and security problems addressed by new protocols & services. E.g., Globus:
Grid Security Infrastructure (GSI) for security
Globus Metadata Directory Service (MDS) for discovery
Globus Resource Allocation Manager (GRAM) protocol as a basic building block
Resource brokering & co-allocation services
GridFTP, IBP for data movement
52
The Programming Problem
How does a user develop robust, secure, long-lived applications for dynamic, heterogeneous Grids?
Presumably need:
Abstractions and models to add to speed/robustness/etc. of development
Tools to ease application development and diagnose common problems
Code/tool sharing to allow reuse of code components developed by others
53
Grid Programming Technologies
“Grid applications” are incredibly diverse (data, collaboration, computing, sensors, …)
Seems unlikely there is one solution
Most applications have been written "from scratch," with or without Grid services
Application-specific libraries have been shown to provide significant benefits
No new language, programming model, etc., has yet emerged that transforms things
But certainly still quite possible
54
Examples of Grid Programming Technologies
MPICH-G2: Grid-enabled message passing
CoG Kits, GridPort: portal construction, based on N-tier architectures
GDMP, Data Grid Tools, SRB: replica management, collection management
Condor-G: simple workflow management
Legion: object models for Grid computing
NetSolve: network enabled solver
Cactus: Grid-aware numerical solver framework
Note tremendous variety, application focus
55
MPICH-G2: A Grid-Enabled MPI
A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
Globus services for authentication, resource allocation, executable staging, output, etc.
Programs run in the wide area without change
See also: MetaMPI, PACX, STAMPI, MAGPIE
www.globus.org/mpi
56
Grid Events
Global Grid Forum: working meeting
Meets 3 times/year, alternates U.S.-Europe, with the July meeting as major event
HPDC: major academic conference
HPDC-11 in Scotland with GGF-8, July 2002
Other meetings include
IPDPS, CCGrid, EuroGlobus, Globus Retreats
www.gridforum.org, www.hpdc.org
57
Useful References
Book (Morgan Kaufmann)
www.mkp.com/grids
Perspective on Grids
"The Anatomy of the Grid: Enabling Scalable Virtual Organizations", IJSA, 2001
www.globus.org/research/papers/anatomy.pdf
All URLs in this section of the presentation, especially:
www.gridforum.org, www.grids-center.org, www.globus.org
58
Emergence of Grids
But Grids enable much more than apps running on multiple computers (which can be achieved with MPI alone)
Virtual operating system: provides global workspace/address space via a single login
Automatically manages files, data, accounts, and security issues
Connects other resources (archival data facilities, instruments, devices) and people (collaborative environments)
59
Grids Are Inevitable
Inevitable (at least in HPC):
Leverages computational power of all available systems
Manages resources as a single system, easier for users
Provides most flexible resource selection and management, load sharing
Researchers' desire to solve bigger problems will always outpace performance increases of single systems; just as multiple processors are needed, 'multiple multiprocessors' will be deemed so
60
Globus Grid Services
The Globus toolkit provides a range of basic Grid services
Security, information, fault detection, communication, resource management, ...
These services are simple and orthogonal
Can be used independently, mix and match
Programming model independent
For each there are well-defined APIs
Standards are used extensively
E.g., LDAP, GSS-API, X.509, ...
You don't program in Globus; it's a set of tools, like Unix
61
Basic Grid Building Blocks
[Diagram: a Client sends a Request to an Agent; the Agent makes a Choice among Computational Resources (clusters, MPPs, workstations, running MPI, Condor, ...) and sends back a Reply. The interaction is RPC-like.]
NetSolve: solving computational problems remotely
Condor: harnessing idle workstations for high-throughput computing
[Condor diagram: on the submission side, a Customer Agent and Application Agent manage data & object files and checkpoint files; on the execution side, an Owner Agent and Execution Agent run the Application Process with remote I/O & checkpointing.]
IBP – Internet Backplane Protocol is middleware for managing and using remote storage.
62
Maturation of Grid Computing
Research focus moving from building of basic infrastructure and application demonstrations to
Middleware
Usable production environments
Application performance
Scalability
Globalization
Development, research, and integration happening outside of the original infrastructure groups
Grids becoming a first-class tool for scientific communities
GriPhyN (Physics), BIRN (Neuroscience), NVO (Astronomy), Cactus (Physics), …
63
Broad Acceptance of Grids as a Critical Platform for Computing
Widespread interest from government in developing computational Grid platforms
NSF's Cyberinfrastructure
NASA's Information Power Grid
DOE's Science Grid
64
Broad Acceptance of Grids as a Critical Platform for Computing
Widespread interest from industry in developing computational Grid platformsIBM, Sun, Entropia, Avaki, Platform, …
On August 2, 2001, IBM announced a new corporate initiative to support and exploit Grid computing. AP reported that IBM was investing $4 billion in building 50 computer server farms around the world.
AVAKI
65
Grids Form the Basis of a National Information Infrastructure
TeraGrid will provide in aggregate
• 13.6 trillion calculations per second
• Over 600 trillion bytes of immediately accessible data
• 40 gigabit per second network speed
• Provide a new paradigm for data-oriented computing
• Critical for disaster response, genomics, environmental modeling, etc.
August 9, 2001: NSF Awarded $53,000,000
to SDSC/NPACI and NCSA/Alliance
for TeraGrid
66
Distributed and Parallel Systems
[Figure: a spectrum from distributed systems (heterogeneous) to massively parallel systems (homogeneous): Grid based Computing, Entropia, SETI@home (27 Tflop/s), network of workstations, Beowulf cluster, clusters w/ special interconnect, parallel distributed memory, ASCI Tflops.]

Distributed systems (heterogeneous):
Gather (unused) resources
Steal cycles
System SW manages resources
System SW adds value
10% - 20% overhead is OK
Resources drive applications
Time to completion is not critical
Time-shared

Massively parallel systems (homogeneous):
Bounded set of resources
Apps grow to consume all cycles
Application manages resources
System SW gets in the way
5% overhead is maximum
Apps drive purchase of equipment
Real-time constraints
Space-shared
67
Basic Usage Scenarios
Grid based numerical library routines
User doesn't have to have the software library on their machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
Task farming applications
"Pleasantly parallel" execution, e.g. parameter studies
Remote application execution
Complete applications with user specifying input parameters and receiving output
"Blue Collar" Grid Based Computing
Does not require deep knowledge of network programming
Level of expressiveness right for many users
User can set things up, no "su" required
In use today, up to 200 servers in 9 countries
Can plug into Globus, Condor, NINF, …
68
NetSolve Network Enabled Server
NetSolve is an example of a grid based hardware/software server.
Ease-of-use paramount
Based on an RPC model but with …
resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
Other examples are NEOS from Argonne and Ninf from Japan.
Uses resources, rather than tying together geographically distributed resources, for a single application.
69
NetSolve: The Big Picture
[Diagram: a client (Matlab, Mathematica, C, Fortran, or Web interface) contacts the NetSolve Agent(s); servers S1-S4 register with the agent's schedule database; an IBP depot holds data. No knowledge of the grid is required; the interaction is RPC-like.]
70
NetSolve: The Big Picture
[Same diagram, next step: the client first ships the input data A, B to the IBP depot.]
71
NetSolve: The Big Picture
[Next step: the depot hands a storage handle back to the client.]
72
NetSolve: The Big Picture
[Final step: the client sends the request Op(C, A, B) with the handle to the agent; the agent replies "S2!"; server S2 fetches A, B from the depot, computes, and the answer (C) is returned to the client.]
73
NetSolve Agent
Name server for the NetSolve system.
Information Service
Client users and administrators can query the hardware and software services available.
Resource scheduler
Maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources
Agent
74
NetSolve Agent
Resource Scheduling (cont'd):
CPU performance (LINPACK).
Network bandwidth, latency.
Server workload.
Problem size / algorithm complexity.
Calculates a "time to compute" for each appropriate server.
Notifies client of most appropriate server.
Agent
75
NetSolve - Load Balancing
The NetSolve agent:
predicts the execution times and sorts the servers
Prediction for a server based on:
• Its distance over the network
- Latency and bandwidth
- Statistical averaging
• Its performance (LINPACK benchmark)
• Its workload
• The problem size and the algorithm complexity
Cached data: quick estimate
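The prediction combines exactly the ingredients listed: latency, bandwidth, LINPACK rating, workload, and problem complexity. The formula below is a hedged sketch of that idea, not NetSolve's actual code; the function and field names are hypothetical.

```python
# Sketch of an agent's "time to compute" estimate: data-movement time
# plus compute time on a server whose effective speed is reduced by
# its current workload. Not the actual NetSolve formula.
def estimate_time(bytes_moved, flops_needed, latency_s,
                  bandwidth_Bps, peak_flops, workload_frac):
    network = latency_s + bytes_moved / bandwidth_Bps
    compute = flops_needed / (peak_flops * (1.0 - workload_frac))
    return network + compute

def best_server(bytes_moved, flops_needed, servers):
    """Sort candidate servers by predicted time; return the fastest."""
    ranked = sorted(servers,
                    key=lambda s: estimate_time(bytes_moved, flops_needed,
                                                s["lat"], s["bw"],
                                                s["flops"], s["load"]))
    return ranked[0]["name"]

servers = [
    {"name": "near-slow", "lat": 0.01, "bw": 1e8, "flops": 1e9,  "load": 0.5},
    {"name": "far-fast",  "lat": 0.10, "bw": 1e7, "flops": 1e11, "load": 0.1},
]
# A big dense solve (N=2000): ~2/3 N^3 flops, ~3 N^2 * 8 bytes moved.
n = 2000
print(best_server(3 * n * n * 8, (2 / 3) * n ** 3, servers))  # -> far-fast
```

For small problems the nearby server wins despite its lower peak speed, which is why the estimate must weigh network cost against compute cost rather than ranking by LINPACK number alone.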
Workload out of date?
76
NetSolve Client
Function-based interface.
Client program embeds a call from NetSolve's API to access additional resources.
Interface available to C, Fortran, Matlab, Mathematica, and Java.
Opaque networking interactions.
NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, …
Client
77
NetSolve Client
Intuitive and easy to use.
Matlab matrix multiply, e.g.:
A = matmul(B, C);
A = netsolve(‘matmul’, B, C);
• Possible parallelisms hidden.
Client
78
NetSolve Client
i. Client makes request to agent.
ii. Agent returns list of servers.
iii. Client tries each one in turn until one executes successfully or the list is exhausted.
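The try-each-server-in-turn behavior in step iii is a plain failover loop, sketched here in illustrative Python (`submit` and the server names stand in for the real remote call):

```python
# Illustrative failover: walk the agent's ranked server list and
# return the first successful result; raise only when the list is
# exhausted. `submit` stands in for the real remote invocation.
def try_servers(servers, submit):
    errors = []
    for server in servers:          # list is ordered best-first by the agent
        try:
            return submit(server)   # success: stop here
        except RuntimeError as e:   # failure: remember it, try the next
            errors.append((server, str(e)))
    raise RuntimeError(f"all servers failed: {errors}")

def flaky_submit(server):
    """Hypothetical remote call where only server S2 is up."""
    if server != "S2":
        raise RuntimeError(f"{server} unavailable")
    return "answer"

print(try_servers(["S1", "S2", "S3"], flaky_submit))  # -> answer
```

This is what gives NetSolve its fault tolerance from the client's point of view: a dead server costs one failed attempt, not a failed computation.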
Client
79
NetSolve - MATLAB Interface
>> define sparse matrix A
>> define rhs
>> [x, its] = netsolve( 'itmeth', 'petsc', A, rhs );
…
>> [x, its] = netsolve( 'itmeth', 'aztec', A, rhs );
>> [x, its] = netsolve( 'solve', 'superlu', A, rhs );
>> [x, its] = netsolve( 'solve', 'ma28', A, rhs );
Synchronous Call
Asynchronous calls also available
80
NetSolve - FORTRAN Interface
parameter( MAX = 100 )
double precision A(MAX,MAX), B(MAX)
integer IPIV(MAX), N, INFO, LWORK
integer NSINFO
call DGESV(N,1,A,MAX,IPIV,B,MAX,INFO)
Easy to ‘switch’ to NetSolve
call NETSL(‘DGESV()’,NSINFO,N,1,A,MAX,IPIV,B,MAX,INFO)
81
Hiding the Parallel Processing
User may be unaware of parallel processing
NetSolve takes care of starting the message passing system, data distribution, and returning the results.
82
Problem Description File
Problem Description File defines the problem specification used to add functional modules to a NetSolve server.
Wrapper to provide binding between the NetSolve client interface and the server function being integrated.
Complex syntax defines input/output objects, calling sequences, libraries to link, etc.
Parsed by NetSolve to create a "service" program.
83
Generating New Services in NetSolve
Add additional functionality
Describe the interface (arguments)
Generate wrapper
Install into server
Java GUI
NetSolve Parser/Compiler
@PROBLEM dgesv
@DESCRIPTION
This is a linear solver for dense matrices from the LAPACK library. Solves Ax=b.
@INPUT 2
@OBJECT MATRIX DOUBLE A
Double precision matrix
@OBJECT VECTOR DOUBLE b
Right hand side
@OUTPUT 1
@OBJECT VECTOR DOUBLE x
…
[Diagram: a server hosting several Services; the New Service is compiled in and added to the server.]
New Service Added!
84
Problem Description Specification
Specifies the calling interface between GridSolve and the service routine
Original NetSolve problem description files
Strange notation
Difficult for users to understand
Previous attempts to simplify involved GUI front-ends
In GridSolve, the format is totally re-designed
Specified in a manner similar to normal function prototypes
Similar to Ninf
85
GridSolve Problem Description (DGESV)
Original Fortran subroutine:

SUBROUTINE DGESV(N,NRHS,A,LDA,IPIV,B,LDB,INFO)
INTEGER INFO, LDA, LDB, N, NRHS
INTEGER IPIV( * )
DOUBLE PRECISION A( LDA, * ), B( LDB, * )

GridSolve IDL specification:

SUBROUTINE dgesv(IN int N, IN int NRHS,
                 INOUT double A[LDA][N], IN int LDA,
                 OUT int IPIV[N], INOUT double B[LDB][NRHS],
                 IN int LDB, OUT int INFO)
"This solves Ax=b using LAPACK"
LANGUAGE = "FORTRAN"
LIBS = "$(LAPACK_LIBS) $(BLAS_LIBS)"
COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"
MAJOR = "COLUMN"
86
GridSolve Interface Definition Language
Data types: int, char, float, double
Argument passing modes:
IN -- input only; not modified
INOUT -- input and output
OUT -- output only
VAROUT -- variable length output-only data
WORKSPACE -- server-side allocation of workspace; not passed as part of the calling sequence
Argument size
Specified as an expression using scalar arguments, e.g. ddot(IN int n, IN double dx[n*incx], IN int incx, …
All typical operators supported (+, -, *, /, etc.).
87
Problem Attributes
MAJOR
Row or column major; depends on the implementation of the service routine
LANGUAGELanguage in which the service routine is implemented; currently C or Fortran
LIBSAdditional libraries to be linked
COMPLEXITYTheoretical complexity of the service routine, specified in terms of the arguments
88
Building the Services
cd GridSolve/src/problem
make check
When building a new service, the server should be restarted, but thereafter it is not necessary.
For more detailed documentation, consult the manual:
cd GridSolve/doc/ug
make
ghostview ug.ps
89
NetSolve: How to Install Software
Computational Modules
NetSolve problem description files
NetSolve server daemon
Client stubs
• User can install new components
• Problem description files
• Java applet to generate them
90
NetSolve: How It Works
Computational Modules
NetSolve problem description files
NetSolve server daemon
Client stubs
Register / Query / Reply / Request
• Problem description files
• Clients download stubs at run-time
• Problem description files are portable
• Java applet to generate them
91
Task Farming: Multiple Requests To a Single Problem
A solution:
Many calls to netslnb( ); /* non-blocking */
Farming solution:
Single call to netsl_farm( );
Request iterates over an "array of input parameters."
Adaptive scheduling algorithm.
Useful for parameter sweeps and independently parallel applications.
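The farming idea above, one call fanning a single problem out over an array of input parameters, can be sketched with a thread pool standing in for NetSolve's servers and scheduler (illustrative only; `farm` is a hypothetical name, not the netsl_farm API):

```python
# Sketch of task farming: run the same kernel on every parameter set,
# like many non-blocking netslnb() calls, and gather the results in
# input order. A thread pool plays the role of the server pool.
from concurrent.futures import ThreadPoolExecutor

def farm(problem, parameter_sets, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(problem, parameter_sets))

# Hypothetical parameter sweep: same kernel, many independent inputs.
results = farm(lambda p: p * p, range(8))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the tasks are independent, the scheduler is free to assign them adaptively to whichever servers are fastest, which is exactly what the slide's adaptive scheduling algorithm exploits.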
92
Data Persistence
Chain together a sequence of NetSolve requests.
Analyze parameters to determine data dependencies.
Essentially a DAG is created where nodes represent computational modules and arcs represent data flow.
Transmit the superset of all input/output parameters and make it persistent near the server(s) for the duration of the sequence execution.
Schedule individual request modules for execution.
93
netsl("command1", A, B, C);
netsl("command2", A, C, D);
netsl("command3", D, E, F);
Client Server
command1(A, B)
result C
Client Server
command2(A, C)
result D
Client Server
command3(D, E)
result F
netsl_begin_sequence( );
netsl("command1", A, B, C);
netsl("command2", A, C, D);
netsl("command3", D, E, F);
netsl_end_sequence(C, D);
Client Server
sequence(A, B, E)
Server
Client Serverresult F
input A,intermediate output C
intermediate output D,input E
Data Persistence (cont’d)
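The dependency analysis behind sequences can be sketched directly: each call names its inputs and outputs, and an arc is drawn whenever a later call reads a datum an earlier call produced. This is an illustrative reconstruction of the idea, not NetSolve's internal representation.

```python
# Build the data-flow DAG for a request sequence. calls is a list of
# (command, inputs, outputs); arcs are (producer, consumer, datum).
def build_dag(calls):
    producer = {}                      # datum -> command that last wrote it
    arcs = []
    for cmd, inputs, outputs in calls:
        for d in inputs:
            if d in producer:          # d was produced earlier: draw an arc
                arcs.append((producer[d], cmd, d))
        for d in outputs:
            producer[d] = cmd          # last writer wins
    return arcs

# The slide's example sequence: C and D never need to return home.
sequence = [("command1", ["A", "B"], ["C"]),
            ("command2", ["A", "C"], ["D"]),
            ("command3", ["D", "E"], ["F"])]
for arc in build_dag(sequence):
    print(arc)
# ('command1', 'command2', 'C')
# ('command2', 'command3', 'D')
```

The arcs show why only A, B, and E must be shipped in and only F shipped back: C and D are intermediate results that can stay persistent near the servers.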
94
NetSolve Authentication with Kerberos
Kerberos is used to maintain Access Control Lists and manage access to computational resources.
NetSolve properly handles authorized and non-authorized components together in the same system.
95
NetSolve Authentication with Kerberos
NetSolve client
NetSolve agent
NetSolve servers
KerberosKDC
Servers register their presence with the agent and KDC
Client issues problem request; agent responds with list of servers
Client sends work request to server; server replies requesting authentication credentials
Client requests ticket from KDC
Client sends ticket and input to server; server authenticates and returns the solution set
96
Server Software Repository
Dynamic downloading of new software.
Enhance server capabilities without shutdown and restart.
Repository maintained independently of the server.
NetSolve Server
97
NetSolve: A Plug into the Grid
[Diagram: NetSolve client APIs (C, Fortran) sit on proxies (Globus proxy, NetSolve proxy, Ninf proxy, Condor proxy) over Grid middleware providing Resource Discovery, System Management, Resource Scheduling, and Fault Tolerance.]
98
NetSolve: A Plug into the Grid
(Figure: the same architecture with Grid back-ends added: the Globus, NetSolve, Ninf, and Condor proxies each drive NetSolve or Ninf servers managed by that middleware.)
99
NetSolve: A Plug into the Grid
(Figure: the full picture: PSE front-ends, Matlab, Mathematica, SCIRun, and custom clients, reach NetSolve via remote procedure call, and NetSolve dispatches through the proxies to the Grid back-ends.)
100
NPACI Alpha Project - MCell: 3-D Monte Carlo Simulation of Neurotransmitter Release Between Cells
•UCSD (F. Berman, H. Casanova, M. Ellisman), Salk Institute (T. Bartol), CMU (J. Stiles), UTK (J. Dongarra, R. Wolski)
•Study how neurotransmitters diffuse and activate receptors in synapses
•In the visualization: blue = unbound, red = singly bound, green = doubly bound closed, yellow = doubly bound open
101
•Developed at the Salk Institute and CMU
•In the past, manually run on available workstations
•Transparent parallelism, load balancing, fault tolerance
•Fits the farming semantics and the need for NetSolve
•Collaboration with the AppLeS project for scheduling tasks
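The "farming" pattern MCell fits is simple to sketch: the same simulation script runs once per random seed, every run is independent, and the results are gathered at the end. The simulation body below is a stand-in for a real Monte Carlo trajectory; seed count and worker count are arbitrary.

```python
# Farming sketch: one independent run per seed, results gathered at the end.
import random
from concurrent.futures import ThreadPoolExecutor

def mcell_run(seed):
    """Stand-in for one Monte Carlo trajectory; deterministic per seed."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000)) / 1000.0

seeds = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(mcell_run, seeds))   # order matches seeds
mean = sum(results) / len(results)
```

NetSolve's role is to play the part of the executor here, transparently choosing machines, balancing load, and rerunning failed tasks.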
(Figure: a single list of seeds fans out to many independent copies of the MCell script, one run per seed.)
MCell: 3-D Monte Carlo Simulation of Neurotransmitter Release Between Cells
102
NetSolve and SCIRun
SCIRun torso defibrillator application (Chris Johnson, U. of Utah)
103
IPARS: Integrated Parallel Accurate Reservoir Simulator
Developed at TICAM, UT Austin; director Dr. Mary Wheeler
Portable and modular reservoir simulator
Models waterflood, black oil, compositional flow, well management, recovery processes …
Reservoir and environmental simulation
Models black oil, waterflood, and compositional flow; 3-D transient flow of multiple phases
Integrates existing simulators; the framework simplifies development
Provides solvers, well handling, and table lookup
Provides pre/postprocessing and visualization
Full IPARS access without installation
IPARS interfaces: C, Fortran, Matlab, Mathematica, and the Web
104
Integrated Parallel Accurate Reservoir Simulator. Mary Wheeler’s group, UT-Austin
(Figure: Web interface → Web server → NetSolve client → IPARS-enabled servers.)
105
(Figure: Web interface → Web server → NetSolve client → IPARS-enabled servers.)
NetSolve server post-processing for visualization.
Possible rendering of the visualization over the Internet using web browsers.
106
University of Tennessee Deployment: Scalable Intracampus Research Grid SInRG
Federated ownership: CS, Chemical Engineering, Medical School, Computational Ecology, Electrical Engineering
Real applications, middleware development, logistical networking
The Knoxville campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, the National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives, including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, the Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX).
The UT campus network consists of a meshed ATM OC-12 being migrated to switched Gigabit by early 2002.
107
NetSolve Monitor
http://anaka.cs.utk.edu:8080/monitor/signed.html
108
Demo1 – Blocking Calls
This demo runs through three blocking calls: sorting, solving a system of linear equations, and finding the eigenvalues of a matrix.

[c] = netsolve('dqsort', b);
[x, y, z, info] = netsolve('dgesv', a, b);
[a, wr, wi, vl, vr, info] = netsolve('dgeev', 'N', 'V', a);

This invokes a quick-sort algorithm and the LAPACK routines for Ax = b and Ax = λx.
It has one input, the size of the problem.
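To see what the three remote calls compute, here is a local, standard-library-only analogue for a tiny 2x2 case (the demo itself runs LAPACK's dgesv and dgeev on larger random problems; the numbers below are made up for illustration).

```python
# Local analogue of the demo's three computations on a tiny example.
import math

b = [3.0, 1.0, 2.0]
sorted_b = sorted(b)                      # dqsort analogue

# Solve Ax = rhs for a 2x2 system via Cramer's rule (dgesv analogue).
A = [[2.0, 1.0], [1.0, 3.0]]
rhs = [3.0, 5.0]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x = [(rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det,
     (A[0][0] * rhs[1] - rhs[0] * A[1][0]) / det]

# Eigenvalues of the 2x2 A (dgeev analogue): roots of
# lambda^2 - trace*lambda + det = 0.
tr = A[0][0] + A[1][1]
disc = math.sqrt(tr * tr - 4 * det)
eigs = sorted([(tr - disc) / 2, (tr + disc) / 2])
```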
109
Demo2 – Non-Blocking Calls
This example shows a non-blocking call to NetSolve: the client sends the request, periodically probes its status while free to do other work, then waits for the result.

[rr1] = netsolve_nb('send', 'dgesv', a1, b1);
status1 = -1;
while status1 < 0,
  [status1] = netsolve_nb('probe', rr1);
end
[aa1, ipiv1, x1, info] = netsolve_nb('wait', rr1);
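The same send/probe/wait pattern can be sketched with a standard-library future standing in for the NetSolve request handle; the solve itself is a stub, not a real dgesv.

```python
# send/probe/wait sketch: a future plays the role of the request handle.
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)

def dgesv_stub(a, b):
    time.sleep(0.05)          # stand-in for the remote solve
    return b / a

handle = pool.submit(dgesv_stub, 2.0, 8.0)   # 'send'
while not handle.done():                     # 'probe' loop
    time.sleep(0.01)                         # client may do other work here
x = handle.result()                          # 'wait'
pool.shutdown()
```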
110
Demo7 – Sparse Matrix
This example solves a sparse matrix problem four ways, using the SuperLU, MA28, PETSc, and AZTEC packages.
[x]=netsolve('sparse_direct_solve', 'SUPERLU',A,rhs,0.3,1);
[x]=netsolve('sparse_direct_solve', 'MA28',A,rhs,0.3,1);
[x,its]=netsolve('sparse_iterative_solve', 'PETSC',A,rhs,1.e-6,500);
[x,its]=netsolve('sparse_iterative_solve', 'AZTEC',A,rhs,1.e-6,500);
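The iterative calls above return both the solution and an iteration count. To show why, here is a plain-Python Jacobi iteration on a small sparse matrix stored as a dict of row dicts; it is a stand-in for the real PETSc/AZTEC solvers, and the matrix, tolerance, and iteration cap are made up.

```python
# Jacobi iteration on a sparse (dict-of-rows) matrix: returns x and the
# number of iterations, mirroring the (x, its) shape of the demo calls.

A = {0: {0: 4.0, 1: 1.0}, 1: {0: 1.0, 1: 3.0}}   # diagonally dominant
b = [9.0, 7.0]

def jacobi(A, b, tol=1e-10, maxit=500):
    n = len(b)
    x = [0.0] * n
    for its in range(1, maxit + 1):
        # Each component updated from the previous iterate only.
        new = [(b[i] - sum(v * x[j] for j, v in A[i].items() if j != i)) / A[i][i]
               for i in range(n)]
        if max(abs(new[i] - x[i]) for i in range(n)) < tol:
            return new, its
        x = new
    return x, maxit

x, its = jacobi(A, b)
# x should satisfy 4*x0 + x1 = 9 and x0 + 3*x1 = 7.
```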
111
Demo3 – SuperLU
This example solves a sparse matrix problem using the SuperLU software from Sherry Li and Jim Demmel at Berkeley.
[x]=netsolve('sparse_direct_solve', 'SUPERLU',A,rhs,0.3,1);
112
Demo4 – MA28
This example solves a sparse matrix problem using the MA28 software from the Harwell-Rutherford Library.
[x]=netsolve('sparse_direct_solve', 'MA28',A,rhs,0.3,1);
113
Demo5 - PETSc
This example solves a sparse matrix problem using the PETSc software from Argonne National Lab.
Parallel processing is used in NetSolve to solve the problem.
[x,its]=netsolve('sparse_iterative_solve', 'PETSC',A,rhs,1.e-6,500);
114
Demo6 - AZTEC
This example solves a sparse matrix problem using the AZTEC software from Sandia National Lab.
Parallel processing is used in NetSolve to solve the problem.
[x,its]=netsolve('sparse_iterative_solve', 'AZTEC',A,rhs,1.e-6,500);
115
Things Not Touched On
Hierarchy of agents: a more scalable configuration
Monitoring the NetSolve network: track and monitor usage
Network status: the Network Weather Service
Internet Backplane Protocol: middleware for managing and using remote storage
Fault tolerance: Volker Strumpen's Porch
Local/global configurations
Automated adaptive algorithm selection: dynamically determine the best algorithm based on system status and the nature of the user's problem
116
Thanks
Fran Berman, Director, San Diego Supercomputer Center
Jay Boisseau, Director, Texas Advanced Computing Center