CSG Research Computing
Jim Pepin, USC
CTO/Director HPCC
HPCC
Provide common facilities and services for a large cross section of the university that requires leading edge computational and networking resources.
Leverage USC central resources with externally funded projects.
Overview
Sponsored by ISD (Information Services Division of USC) and ISI (Information Sciences Institute)
User community
  ISI
  LAS
  Engineering
  School of Medicine
  IMSC
  ICT
  Others
Current Resources
High Performance Computing Resources
  Linux Cluster (~1000 nodes/2000 CPUs, 2Gb/sec Myrinet)
    20TB shared disk, 18GB - 40GB local disk per node.
    Ranks in top 10 for academic clusters.
    Myrinet switch is 768 nodes.
    Adding nodes funded by USC research groups.
  Sun Core Servers (E15k shared memory)
    72 processors, 288GB memory, 30TB shared disk
  Mass Storage Facilities (Unitree)
    18,000 tape capacity
Funding Sources
ISD (University) Resources
  1.5M M/S and Equipment budget
    Software/Maintenance .4M
    Generic capital 1.0M
    Other .1M
  3 FTEs direct support
  2 FTEs system staff offset
Los Nettos/LAAP 2.0M
Condo Arrangements
  50k-250k one-off capital purchases
Cluster Power Usage Math
42 nodes/cabinet
200 watts/node
8.4kW/cabinet
1000 nodes
  24 cabinets
  1 control cabinet per 8 cabinets of compute servers
  8 control cabinets
  32 cabinets per 1000 nodes
268kW per 1000 nodes
100 tons of A/C per 1000 nodes
Roughly 400kW total power use for 1000 nodes
1500-2000 sq feet of space.
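The arithmetic on this slide can be reproduced with a short script. This is a minimal sketch, not HPCC's own sizing tool: the constant and function names are hypothetical, the control-cabinet count (8) is taken directly from the slide, every cabinet is assumed to be budgeted at the full 8.4 kW (which appears to be how the 268 kW figure arises), and the 3.517 kW-per-ton conversion is the standard refrigeration-ton definition rather than a number from the slide.

```python
import math

NODES_PER_CABINET = 42      # from the slide
WATTS_PER_NODE = 200        # from the slide
CONTROL_CABINETS = 8        # from the slide, for a 1000-node cluster
KW_PER_TON = 3.517          # standard: 1 ton of refrigeration = 12,000 BTU/h


def cluster_power(nodes, control_cabinets=CONTROL_CABINETS):
    """Return cabinet counts, electrical load in kW, and cooling tons for that load."""
    kw_per_cabinet = NODES_PER_CABINET * WATTS_PER_NODE / 1000
    compute_cabinets = math.ceil(nodes / NODES_PER_CABINET)
    total_cabinets = compute_cabinets + control_cabinets
    # Assumption: every cabinet, compute or control, is budgeted at the full per-cabinet load.
    load_kw = total_cabinets * kw_per_cabinet
    return {
        "kw_per_cabinet": kw_per_cabinet,
        "compute_cabinets": compute_cabinets,
        "total_cabinets": total_cabinets,
        "load_kw": load_kw,
        "cooling_tons": load_kw / KW_PER_TON,
    }


if __name__ == "__main__":
    for key, value in cluster_power(1000).items():
        print(f"{key}: {value:.1f}")
```

For 1000 nodes this gives 24 compute cabinets, 32 cabinets in total, and about 268.8 kW, which matches the slide. The heat load alone works out to roughly 76 tons, so the quoted 100 tons presumably includes headroom, and the ~400 kW total presumably includes the power drawn by the A/C plant itself.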
Current Software
Cluster software from IBM (xCAT) is core of facility.
  Stable production environment.
  MPI is basic message passing
Globus/NMI work is proceeding with Carl’s help in funding plus ISD resources.
  Leverages with campus need for global directory
  More later.
Solaris and Unitree are core for Mass Storage support.
  We need to look at other mass storage opportunities.
Issues
  We need to be able to support faculty/researchers with tools and consulting to help them effectively use large-scale resources.
  Many packages exist on HPCC resources, with no local support to help use them.
“Middleware”
Globus as base with NMI architecture for campus.
  GT2 moving to GT3
  SCEC/ISI
Condor as lightweight job manager in user rooms
PBS/Maui on Cluster and Computation side of E15k
Issues
  Kx509 bridge from Kerberos
  USC PKI lite CA is base.
    Only hosts and services.
    NMI based.
  Pubcookie (Kerberos back-end)
    Uses host certs from PKI lite CA
  Shib for some prototype library apps (scholar’s portal)
  Campus GDS/PR using NMI schemes (eduPerson etc.)
HPCC Governance
HPCC faculty advisory group
  Meets 4-5 times a year
  Provides guidance to DCIO and CTO
  “Final” decisions are in ISD (CIO/DCIO)
  Usual mode is agreement
Time allocation
  No recharge
  Large projects reviewed by faculty allocation group
  Some projects over 500k node hours
  Condo users get dedicated nodes and cost sharing
Research leverage
  Condo
  Cost sharing
  External funding
  Grid construction
  Next generation network
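To put the 500k node-hour figure in context, it can be compared with what the cluster could deliver in a year. This is a rough sketch, assuming the ~1000-node cluster from the Current Resources slide, 100% availability, and a 365-day year. The function names are hypothetical, and real allocations would see less than full availability.

```python
HOURS_PER_YEAR = 24 * 365


def annual_node_hours(nodes, availability=1.0):
    """Node-hours a cluster can deliver per year at the given availability (0..1)."""
    return nodes * HOURS_PER_YEAR * availability


def share_of_capacity(project_node_hours, nodes, availability=1.0):
    """Fraction of one year's cluster capacity consumed by a project."""
    return project_node_hours / annual_node_hours(nodes, availability)


if __name__ == "__main__":
    capacity = annual_node_hours(1000)
    share = share_of_capacity(500_000, 1000)
    print(f"capacity: {capacity:,.0f} node-hours/year")
    print(f"500k node-hour project: {share:.1%} of capacity")
```

Under these assumptions a 500k node-hour project uses a little under 6% of a year of the 1000-node cluster.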
CTO/HPCC Projects
Advanced Networking Projects
  Calren-2
    2xGb service today. 10Gb service in next 2 years.
  Fiber/wavelength services (CENIC/National Lambda Rail)
    Online for west coast.
    Look at L2 possibilities to build shared ‘spaces’.
    Look to leverage for projects like Optiputer ITR.
  1 Wilshire colo facilities
    See if we can use that space to facilitate ETF proposal.
  Optiputer ITR as way to help network expansion.
CTO/HPCC Projects
Leverage HPCC efforts at ISI with ISD Resources
Clusters
  Expand cluster to ~2000 nodes centrally owned.
  Expand cluster for other groups (condo model).
Mass Storage
  Look into large scale storage for groups like VHF project and other high end storage needs (fractional petabytes).
Globus/NMI
  Provide campus leadership for global directory services and identity management (authentication and authorization).
Networking Research
CTO/HPCC Projects
Fiber is a major part of the HPCC’s ability to service large scale computational needs. The following slides show what we have today and how it can be used.
Fiber Facilities
Lease dark fiber.
  Started with dark fiber 3 years ago.
  Pioneer in this area.
  DWP (Department of Water and Power)
  USC franchise area fiber for campus access.
  Leverage new players (NLR/CENIC).
  Use for USC, LAAP and Los Nettos projects.
Built-out today using low cost CWDM and 15540s.
  10Gbps Ethernet backbone in place Fall ‘02
  Built-out fiber to Caltech/JPL/VHF (Shoah) and other Los Nettos sites.
Fiber Facilities
Lease more dark fiber.
  Harvey Mudd.
  Build second path to USC for disaster recovery.
Install DWDM gear from CENIC deal with Cisco.
  1Gb wavelengths in first phase (fall 04)
  10Gb wavelengths in summer 04.
Use to enable projects like Optiputer and ETF.
Experiment with optical switching hardware as ‘fiber patch panel’ for development of shared ‘computer centers’.
Original USC Fiber Backbone
[Diagram labels: Downtown Clinic, UPC, HSC, ISI, ICT, 1 Wilshire; legend: original 4-strand SM DWP fiber]
External fiber plant
[Diagram labels: Caltech, 818 VHF, JPL, HSC, 1 Wilshire, UPC, ISI, ICT]
Today’s Fiber and Gigaman circuits
[Diagram labels: Tustin, HMC; legend: gigaman, fiber]
Colo Facilities
Acquired space in 1 Wilshire (original site).
  3 years ago.
  DWP fiber is core.
  Use to connect to exchanges and other ISPs.
Extend to potentially other ‘1 Wilshire’ buildings.
  Use new Campus Level 3 fiber as means.
House routers and L2 equipment.
Provide space on USC campus for partners.
Enables Pacific Wave Exchange Point.
Exchange Point/Research
[Diagram labels: Foundry BigIrons, 802.1q VLANs; sites 1 Wilshire, UPC, ISI, HSC, 818 7th; link labels 10Gb and Gb; Gb and 100Mb ports]
Experimental Networking
Networking research community
  California Institutes for Science and Innovation (CITRIS, CalIT2, Nano Systems, BioMedical)
  San Diego Super Computer Center
  CACR
  ISI
  Teragrid/Distributed Terascale Facility
  UCSB/Dan Blumenthal optical labs
Future Resource Goals
High Performance Computing Resources
  Linux Cluster (2048 nodes/4096 CPUs, 2Gb/sec Myrinet)
    60TB shared disk, 36GB - 72GB local disk per node.
    Rank in top 5 for academic clusters.
    Start 64 bit nodes in summer 04.
    Switch fabric will expand past 1024 nodes with ability to condo other users.
    Plan to add more nodes funded by USC research groups (condo). Goal would be 3000+ nodes total.
  Sun Core Servers (E15k shared memory)
    72 processors, 288GB memory, 300TB disk
    Use this system for high end data users (large scale databases) and video users.
  Mass Storage Facilities (Unitree today)
    18,000 tape capacity
    PB online as goal in 3 years.
3 Year Strategy
Next step after 32 bit Pentium.
  Need to determine what will replace Xeons. One answer is Opteron or IA64, but we need to start to develop clusters in this space and benchmark.
  Much of the code will need reworking at user level.
Find ways to cost share with local cluster purchasers.
  “Condo” housing of medium to large clusters will be important.
Build “Grid-U”
3 Year Strategy
As clusters expand into the 2-4k node space, power and A/C become significant issues (along with floor space).
We need to develop several major partners to allow HPCC to be the central piece of joint proposals from USC for such initiatives as ETF and future cyber infrastructure proposals.
Example is shared submission for Major Research Instrument grant.
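The scale of the power and space issue can be estimated by linearly extrapolating the per-1000-node figures from the Cluster Power Usage Math slide (roughly 400 kW, 100 tons of A/C, and 1500-2000 sq ft). This is a minimal sketch: linear scaling is an assumption, the names are hypothetical, and newer nodes may well draw different power per node.

```python
# Per-1000-node planning figures quoted earlier in the deck.
KW_PER_1000 = 400
AC_TONS_PER_1000 = 100
SQFT_PER_1000 = (1500, 2000)   # low and high estimates


def scale_facility(nodes):
    """Linearly extrapolate power, cooling, and floor space to a given node count."""
    factor = nodes / 1000
    return {
        "power_kw": KW_PER_1000 * factor,
        "ac_tons": AC_TONS_PER_1000 * factor,
        "sqft_low": SQFT_PER_1000[0] * factor,
        "sqft_high": SQFT_PER_1000[1] * factor,
    }


if __name__ == "__main__":
    for nodes in (2000, 3000, 4000):
        est = scale_facility(nodes)
        print(f"{nodes} nodes: {est['power_kw']:.0f} kW, {est['ac_tons']:.0f} tons A/C, "
              f"{est['sqft_low']:.0f}-{est['sqft_high']:.0f} sq ft")
```

On this basis a 2-4k node cluster would need on the order of 800-1600 kW and 3000-8000 sq ft, which is why facilities planning sits alongside the hardware roadmap.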
3 Year Strategy
Networking Futures
  Expand Exchange Point (R/E, Pacific Wave)
  10Gb at all sites
  Layer 1 facilities (Optiputer type connections)
  Re-design/RFP for campus network this month
    Design network with ‘enclaves’ for research or academic support
    Much higher internal bandwidth (10Gb core-core, at least 1Gb to all buildings, 10Gb to major research centers)
    How to provide comprehensive security without unacceptable friction.