HP CCN Meeting, Seattle, November 12, 2005
CCN Grid Collaboration
Lennart Johnsson
University of Houston (CS and TLC2)
and
Kungl. Tekniska Högskolan (NADA and PDC)
CCN Grid Collaboration: Objectives
• A SIG focused on exchange of experiences?
• A Collaboration focused on (rapidly) maturing and evolving Grid technologies?
• A Collaboration focused on demonstrating the utility of Grids?
CCN Grid
• What can CCN Grid collaboration provide that I do not already have?
• Is CCN Grid the best vehicle to get what I want and do not have?
• How can CCN Grid make “me” more competitive?
“My” Current Grid Activities
• CCN Grid
• GGF
• Globus Alliance
• VGrADS
• THEGrid
• TIGRE
• RENoH
• LCG - Alice
• EGEE
• NextGrid
• SweGrid
• Baltic Grid
• ICEAGE
• DEISA?
• OMII-Europe?
VGrADS is an NSF-funded Information Technology Research project
Keith Cooper, Ken Kennedy, Charles Koelbel, Richard Tapia, Linda Torczon, Rich Wolski, Fran Berman, Andrew Chien, Henri Casanova, Carl Kesselman, Lennart Johnsson, Dan Reed, Jack Dongarra
Plus many graduate students, postdocs, and technical staff!
The VGrADS Vision: Distributed Problem Solving
• Where We Want To Be
– Transparent Grid computing
• Submit job
• Find & schedule resources
• Execute efficiently
• Where We Are
– Low-level hand programming
– Programmer needs to manage
• Heterogeneous resources
• Computation and data movement scheduling
• Fault tolerance and performance adaptation
• What Do We Need?
– A more abstract view of the Grid
• Each developer sees a scalable “virtual grid”
– Simplified programming models built on the abstract view
• Permit the application developer to focus on the problem
[Figure: supercomputers and databases distributed across the Grid]
Virtual Grid Execution System (vgES)
[Figure: vgES architecture; applications call the vgES APIs (vgDL, vgFAB, vgMON, vgLAUNCH, vgAgent) on top of information services and resource managers. A vgDL description is matched against the Grid resource universe to yield successfully bound candidate Virtual Grids (VGs), launched via vgLAUNCH and DVCW.]
• A Virtual Grid (VG) takes
– Shared heterogeneous resources
– A scalable information service
• and provides
– A hierarchy of application-defined aggregations (e.g., ClusterOf) with constraints (e.g., processor type) and rankings
• The Virtual Grid Execution System (vgES) implements the VG
– VG Definition Language (vgDL)
– VG Find And Bind (vgFAB)
– VG Monitor (vgMON)
– VG Application Launch (vgLAUNCH + DVCW)
– VG Resource Info (vgAgent)
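As a rough illustration of the find-and-bind idea, the following sketch resolves a resource request with constraints and a ranking against a toy resource universe. The cluster names, attribute fields, and ranking function are invented for this example; the actual vgDL syntax and vgFAB implementation differ.

```python
# Illustrative sketch of the "find and bind" idea behind vgFAB.
# Resource universe, constraint fields, and ranking are invented here.

# A toy Grid resource universe: candidate clusters with attributes.
universe = [
    {"name": "clusterA", "cpu": "Opteron", "nodes": 64,  "mhz": 2200},
    {"name": "clusterB", "cpu": "Xeon",    "nodes": 128, "mhz": 3000},
    {"name": "clusterC", "cpu": "Xeon",    "nodes": 16,  "mhz": 2800},
]

def find_candidates(universe, cpu, min_nodes):
    """Filter by application-defined constraints (cf. vgDL constraints)."""
    return [r for r in universe if r["cpu"] == cpu and r["nodes"] >= min_nodes]

def rank(resource):
    """Application-defined ranking: prefer more, faster nodes."""
    return resource["nodes"] * resource["mhz"]

# "A cluster of Xeon nodes, at least 32 of them", ranked over the matches.
candidates = sorted(find_candidates(universe, "Xeon", 32), key=rank, reverse=True)
print([r["name"] for r in candidates])  # only clusterB satisfies the constraints
```

The application binds to the best-ranked candidate and falls back to the next one if binding fails, which is the essence of separating resource description from resource selection.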
• Scheduling of workflow computations
– Off-line look-ahead scheduling dramatically improves makespan (total time)
– Accurate performance models significantly affect quality of scheduling
– Queue wait prediction allows scheduling into batch queues
• Fault tolerance
– Diskless checkpointing for linear algebra computations (application-specific)
– Temporal reasoning for fault prediction
– Optimal checkpoint frequency for iterative applications
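A minimal sketch of why off-line look-ahead helps: scheduling independent tasks in arrival order (online) versus sorting them by descending work first (offline, LPT-style) on heterogeneous machines. The machine speeds and task sizes are invented data, and this toy greedy scheduler is not the VGrADS workflow scheduler.

```python
# Toy illustration of online vs. offline scheduling of independent tasks
# on heterogeneous machines (invented data; not the VGrADS scheduler).

def greedy_makespan(work_items, speeds):
    """Assign each task to the machine where it finishes earliest."""
    loads = [0.0] * len(speeds)
    for w in work_items:
        i = min(range(len(speeds)), key=lambda m: loads[m] + w / speeds[m])
        loads[i] += w / speeds[i]
    return max(loads)

speeds = [1, 2, 4]          # relative machine speeds
tasks = [5, 7, 3, 9, 2, 8]  # work per task

online = greedy_makespan(tasks, speeds)                         # arrival order
offline = greedy_makespan(sorted(tasks, reverse=True), speeds)  # look-ahead (LPT)

print(online, offline)  # prints 5.5 5.0: look-ahead yields the smaller makespan
```

Even this crude look-ahead (knowing all tasks up front) beats the online assignment; a real workflow scheduler additionally exploits task dependencies and performance models.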
[Chart: “Online vs. Offline, Heterogeneous Platform”; simulated makespan vs. compute factor (CF = 1, 10, 100) for online and offline scheduling strategies. Offline look-ahead scheduling gives markedly lower makespans.]
[Chart: “Performance Models and Schedulers, Heterogeneous Platforms”; time (min) for performance models (none, simple, accurate) under heuristic and random schedulers.]
[Figure: Diskless checkpointing; a parity processor P4 holds the checksum of the application processors P0–P3 (P4 = P0 + P1 + P2 + P3).]
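The diskless checkpointing idea can be sketched with bitwise parity: a spare processor keeps the XOR of all application checkpoints, so any one lost state is recoverable from the survivors. The states below are invented integer words; real schemes for linear algebra use checksum encodings suited to floating-point data.

```python
# Minimal sketch of diskless checkpointing via parity (invented data;
# real linear-algebra schemes use floating-point checksum encodings).
from functools import reduce

# Checkpoint state of four application "processors", as integer words.
states = {0: 0x1234, 1: 0xBEEF, 2: 0x0F0F, 3: 0x7A7A}

# The parity "processor" (P4) stores the XOR of all application states.
parity = reduce(lambda a, b: a ^ b, states.values())

# Suppose processor 2 fails: its checkpoint is lost.
lost = 2
survivors = [s for p, s in states.items() if p != lost]

# Recover the lost state from the survivors and the parity word.
recovered = reduce(lambda a, b: a ^ b, survivors, parity)
assert recovered == states[lost]  # lost checkpoint reconstructed
```

No disk is touched: the redundancy lives in memory on the parity processor, so checkpoints are fast, at the cost of tolerating only one simultaneous failure per parity group.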
VGrADS is studying a range of tools for grid programming tasks, including those above.
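The optimal checkpoint frequency for iterative applications is commonly approximated with Young's first-order formula, τ ≈ √(2·C·M) for checkpoint cost C and mean time between failures M. This standard model is assumed here for illustration rather than taken from the slides, and the example numbers are invented.

```python
# Young's first-order approximation for the optimal checkpoint interval:
# tau = sqrt(2 * C * MTBF), where C is the cost of one checkpoint and
# MTBF is the mean time between failures. Example numbers are invented.
import math

def optimal_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    """Return the checkpoint interval (seconds) minimizing expected overhead."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# A 60 s checkpoint and a 24 h MTBF suggest checkpointing roughly hourly.
tau = optimal_checkpoint_interval(60, 24 * 3600)
print(round(tau))  # prints 3220 (seconds)
```

The trade-off is intuitive: checkpoint too often and the checkpoint cost dominates; too rarely and too much work is lost per failure.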
VGrADS Application Collaborations
• EMAN: Electron Micrograph Analysis
• GridSAT: Boolean Satisfiability
• LEAD: Atmospheric Science
• Montage: Astronomy
[Figure: LEAD dynamic workflow; data arrives at an LDM service and flows through a BPEL workflow engine, GridFTP and WRF services, data mining, and a visualization service, with vgES, an information service, the Rice scheduler, an ensemble broker, and a resource broker coordinating the static and dynamic workflow from start to end.]
THEGrid
• Several universities
– UT, UH, Rice, TTU, TAMU, UTA, UTB, UTEP, SMU, UTD, etc.
• Many different research facilities used
– Fermi National Accelerator Laboratory
– CERN (Switzerland), DESY (Germany), and KEK (Japan)
– Jefferson Lab
– Brookhaven National Lab
– SLAC (CA) and Cornell
– Natural sources and underground labs
• Sizable community, variety of experiments and needs
• Very large data sets now! Even larger ones coming!!
High Performance Computing Across Texas (HiPCAT) — http://www.hipcat.net
TIGRE - Texas Internet Grid for Research and Education
Research areas of particular interest: biomedicine, energy and the environment, materials science, agriculture, and information technology
TIGRE Activities
• Assembling a software stack
– Start small, add by consensus
• Globus Toolkit 4.0, pre-Web Services and Web Services
• GSI OpenSSH
• UberFTP
• Condor-G
– Will make available to other HiPCAT institutions
• Amassing resources
– Allocated by TIGRE institutions
• Lonestar (UT Austin): 1024 Xeons + Infiniband + GigE
• Hrothgar (Texas Tech): 256 Xeons + Infiniband + GigE
• Cosmos (Texas A&M): 128 Itaniums + Numalink
• Rice Terascale Cluster: 128 Itaniums + GigE
• Eldorado (Houston): 124 Itaniums + GigE + SCI
• plus several smaller systems
– Will incorporate other institutions as appropriate to applications
RENoH
[Map: RENoH; LEARN fiber (24- and 12-strand segments) and NLR, with nodes at San Antonio, College Station, Dallas, El Paso, Los Angeles, Kansas City, Chicago, Baton Rouge, and Jacksonville.]
Wiltel Fiber in the West -- AT&T Fiber in the Southeast
[Map: fiber routes with nodes at Seattle, Sunnyvale, LA, San Diego, Denver, Phoenix, Albuquerque, El Paso - Las Cruces, San Antonio, Houston, Dallas, Tulsa, Kansas City, Chicago, Baton Rouge, Pensacola, Jacksonville, Atlanta, Raleigh, Washington DC, Pittsburgh, Cleveland, and New York.]
[Map: ESnet, Summer 2005. The ESnet IP core (packet over SONET optical ring and hubs: SEA, SNV, ELP, ALB, CHI, NYC, DC, ATL, SDSC) and the ESnet Science Data Network (SDN) core serve 42 end-user sites, including LBNL, SLAC, NERSC, JGI, LLNL, SNLL, LANL, SNLA, PNNL, LIGO, INEEL, ANL, FNAL, AMES, BNL, PPPL, MIT, ORNL, JLAB, SRS, OSTI, ORAU, NOAA, GA, PANTEX, KCP, and others. Link legend: international (high speed); 10 Gb/s SDN core; 10 Gb/s IP core; 2.5 Gb/s IP core; MAN rings (≥ 10 Gb/s); OC12 ATM (622 Mb/s); OC12/GigEthernet; OC3 (155 Mb/s); 45 Mb/s and less. Site sponsorship: Office of Science (22), NNSA (12), joint (3), other (NSF LIGO, NOAA), laboratory (6). Peerings include Abilene, GEANT (Germany, France, Italy, UK, etc.), CERN (LHCnet, partly DOE funded), SINet (Japan), Japan-Russia (BINP), CA*net4, GLORIAD, Kreonet2, MREN, Netherlands, StarTap, TANet2 (Taiwan, ASCC), Australia, Singaren, Starlight, MAE-E, PAIX-PA, Equinix, MAX GPoP, SoX GPoP, and PNW GPoP.]
Not Houston, but wait … there is LEARN
LHC Computing Grid (LCG)
Truly heterogeneous system: people, languages, time zones… A complex collaborative effort.
LCG prototype service (2003-05)
[Map: LCG prototype sites: Yerevan, Saclay, Lyon, Dubna, Cape Town (ZA), Birmingham, Cagliari, NIKHEF, Catania, Bologna, Torino, Padova, IRB, Kolkata (India), OSU/OSC, LBL/NERSC, Merida, Bari, Nantes, Houston, RAL, CERN, Krakow, GSI, Budapest, Karlsruhe.]
ALICE physics production
Grid in ATLAS DC1 (July 2002 – April 2003):
– US-ATLAS DC1: part of simulation; pile-up; reconstruction
– EDG DC1: several tests (1st test in August 2002)
– NorduGrid DC1: full production
EGEE
[Figure: EGEE operations hierarchy; Resource Centers (processors, disks; Grid server nodes) report to an Operations Center, with Regional Support Centers providing support for applications and local resources.]
Baltic Grid
• KTH
• EENet, Tartu
• NICPB, Tallinn
• IMCS University of Latvia, Riga
• RTU – Riga Technical University
• Vilnius University
• ITPA, Vilnius
• IFJPAN, Cracow
• PSNC, Poznan
• CERN
• Heterogeneous, IA32, IA64 (1537 CPUs, 29 clusters)
• EGEE/LCG-2/gLite, ARC
• SGAS – Grid accounting
Baltic Grid
• Education, training, dissemination and outreach – IFJPAN lead
• Application identification and support – VU lead
• Policy and standards – KTH lead
• Grid operations – EENet lead
• Network resource provisioning – IMCS UL lead
• SLAs and account management joint research – KTH lead
SweGrid
• Six 100-node, single-CPU clusters with GigE interconnect
• WAN: SUNET 10GE
• Middleware: EGEE, LCG-2, gLite, ARC
• SGAS – Grid accounting
[Figure: cluster sites connected at 10 GE and 2.5 GE.]
DEISA Sites
[Map: JSCC RAS, CSC, PDC, ECMWF, U Manchester, FZJ, SARA, EPCC, HLRS, CINECA, CASPUR, ENEA, IDRIS, BSC, RZG, LRZ, CINES, CSCS.]
DEISA Top500 Capacity, June 2005

Site        Location            Top500 Linpack (TF)
BSC         Barcelona, Spain    27.91
CINECA      Bologna, Italy      6.62
CSC         Helsinki, Finland   1.17
ECMWF       Reading, UK         18.48
EPCC/HPCx   Daresbury, UK       6.19
FZJ         Julich, Germany     10.28
HLRS        Stuttgart, Germany  8.92
IDRIS       Orsay, France       3.11
LRZ         Munich, Germany     1.65
RZG         Garching, Germany   2.74
SARA        Amsterdam, Holland  4.16
Total                           91.23
Total “Public” Europe           187.14
DEISA
• Dedicated network through GEANT
• Global file system
• Support of workflow applications
• Global data management
• Co-scheduling services
• Portals and Web services
• Extreme Computing Initiative (DECI)
CCN Grid Objectives
• Maturing Grid technologies
– Security
– Interoperability
– Accounting
– (Performance) monitoring
– ……..
• Demonstrating the utility of Grids
– Persistency/availability
– Scale
– Diversity/uniqueness of resources
– ……..
What to do?
• Each organization has its own policies for access and reporting
– Can we agree on a common application and reporting mechanism and format?
– What information about users can we share, to keep sponsors happy, or the “FBI” when required, or auditors, or …?
• Monitoring and accounting
• Software environments and tools
• MoUs – rules of engagement
• Need driving applications (data, collaboration, computing, …)
• Need goals and timelines (aligned with funded activities)