June 2007 The State of TeraGrid: A National Production Cyberinfrastructure Facility Charlie Catlett, Chair, TeraGrid Forum University of Chicago and Argonne National Laboratory [email protected] Dane Skow, Director, TeraGrid GIG University of Chicago and Argonne National Laboratory [email protected] www.teragrid.org ©UNIVERSITY OF CHICAGO THESE SLIDES MAY BE FREELY USED PROVIDING THAT THE TERAGRID LOGO REMAINS ON THE SLIDES, AND THAT THE SCIENCE GROUPS ARE ACKNOWLEDGED IN CASES WHERE SCIENTIFIC IMAGES ARE USED. (SEE SLIDE NOTES FOR CONTACT INFORMATION)



  • Slide 1
  • June 2007 The State of TeraGrid: A National Production Cyberinfrastructure Facility Charlie Catlett, Chair, TeraGrid Forum University of Chicago and Argonne National Laboratory [email protected] Dane Skow, Director, TeraGrid GIG University of Chicago and Argonne National Laboratory [email protected] www.teragrid.org UNIVERSITY OF CHICAGO THESE SLIDES MAY BE FREELY USED PROVIDING THAT THE TERAGRID LOGO REMAINS ON THE SLIDES, AND THAT THE SCIENCE GROUPS ARE ACKNOWLEDGED IN CASES WHERE SCIENTIFIC IMAGES ARE USED. (SEE SLIDE NOTES FOR CONTACT INFORMATION)
  • Slide 2
  • 9 Resource Providers, One Facility. [Map.] Resource Providers (RPs): SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR. Software integration partners: Caltech, USC/ISI, UNC/RENCI, UW. Grid Infrastructure Group: UChicago.
  • Slide 3
  • TeraGrid Vision: TeraGrid will create integrated, persistent, and pioneering computational resources that will significantly improve our nation's ability and capacity to gain new insights into our most challenging research questions and societal problems. Our vision requires an integrated approach to the scientific workflow, including obtaining access, application development and execution, data analysis, collaboration, and data management.
  • Slide 4
  • TeraGrid Objectives
    DEEP Science (Enabling Petascale Science): make science more productive through an integrated set of very-high-capability resources; address key challenges prioritized by users.
    WIDE Impact (Empowering Communities): bring TeraGrid capabilities to the broad science community; partner with science community leaders via Science Gateways.
    OPEN Infrastructure, OPEN Partnership: provide a coordinated, general-purpose, reliable set of services and resources; partner with campuses and facilities.
  • Slide 5
  • How Are We Doing? From the Executive Summary of the NSF Site Review report, February 2007:
    Scientific impact: TeraGrid is enabling significant scientific advances spanning most of the NSF Directorates; the panel believes the TG is proceeding well to meet its main objectives.
    Our team: a careful evaluation of the governance of the TG led the panel to a better appreciation of the innovative methods being used to enhance collaboration and consensus building across the geographically dispersed participants in the project.
    Science Gateways Program: the panel was impressed by the potential for Science Gateways to enable new communities and to provide new integrated technologies to a broader scientific audience.
  • Slide 6
  • Where Can We Improve? From the NSF Site Review report, February 2007:
    Communication of impact: better publicize the transformative science that has been done using the TG grid services; consider a breakdown of the science nuggets, making it clear which of the projects use their integrated grid resources and services, and which ones could have been done using the resources of a single RP.
    Education, Outreach, Training, Inclusion: the panel feels that stronger outreach and education are integral to the successful integration of TG in the scientific research environment. Four recommendations: outreach to underrepresented groups; leverage EOT of other projects; expand EOT partnerships (with societies, etc.); harvest user-developed EOT materials.
    Understand growth and encourage new communities: more detailed tracking and follow-up of the DAC allocations; document actual use of each SG, including (user- and discipline-based) demographics; more widespread promotion of the SGs to potential user communities.
  • Slide 7
  • Critical Areas of Technical Work
    Security and authorization: the development and implementation of authentication and authorization models that enable transparent integration of current and future resources remains an important technical challenge that must be addressed promptly.
    Scheduling: make automated metascheduling a reality.
  • Slide 8
  • Pressing Forward: Organization. Our review indicates that the current organization provides us a strong platform for moving forward, and a strong platform for exploring optimization: more efficient movement from discussion to consensus, and translation of consensus to action; more effectively tapping strategic expertise across the project.
    Key TeraGrid organizational building blocks: persistent working groups; agile RATs; rich communications (weekly all-hands, quarterly management, etc.); the RP Forum as a representative body.
    Advisory groups need further optimization. GIG: Executive Steering Committee (ESC). Overall TeraGrid: Cyberinfrastructure User Advisory Committee (CUAC).
  • Slide 9
  • Next Steps on Organization
    Governance RAT: the RP Forum as a basis for consensus-based democracy; understand leadership roles in the RPF and GIG.
    Cyberinfrastructure User Advisory Committee: strong need for a science advisory board that could provide strategic guidance to TeraGrid; focus specifically on TeraGrid, with a clearly delineated mission for this panel that distinguishes its focus from those of other advisory groups associated with the TeraGrid.
    New GIG leadership!
  • Slide 10
  • On a Personal Note: Thank you
  • Slide 11
  • Where Are We Now? 2006 was a break-out year: growth by every metric; new science successes; first gateways into production; initial, strong adoption of new grid capabilities.
  • Slide 12
  • Allocations                                     FY05             FY06              % Change
    LRAC proposals awarded                          62 (13 new)      88 (22 new)       +42 (+69)
    MRAC proposals awarded                          70 (50 new)      160 (92 new)      +129 (+84)
    TeraGrid DAC proposals awarded                  123 (115 new)    229 (209 new)     +86 (+82)
    Active TeraGrid PIs                             361              1,019             +182

    Usage
    NUs requested (LRAC/MRAC/DAC)                   1.3 B            2.96 B            +130
    NUs awarded                                     844 M            1.92 B            +128
    NUs available (max)                             881 M            2.23 B            +153
    NUs delivered (% util)                          565 M (64%)      1.28 B (57%)      +129 (-11)
    NUs used by TG staff                            10.4 M           10.1 M
    Jobs run                                        594,756          1,686,686         +185

    Users (total)
    Users with active accounts during the year      1,712            4,190             +145
    Users charging jobs during the year             876              1,731             +98
    Users with active accounts on December 31       1,468            3,126             +113
    User home institutions (users charging jobs)    151              265               +76
    US states incl. DC/PR (users charging jobs)     37               47                +27

    Users by allocation size (# charging jobs)
    LRAC users                                      509 (238)        1,152 (496)       +126 (+108)
    MRAC users                                      542 (248)        1,087 (423)       +101 (+71)
    DAC users                                       661 (365)        1,948 (783)       +195 (+116)
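The year-over-year percentages above are easy to spot-check. A minimal sketch, with the FY05/FY06 counts transcribed from the slide and results rounded to the nearest percent:

```python
def pct_change(old, new):
    """Percent change from old to new, rounded to the nearest integer."""
    return round((new - old) / old * 100)

# FY05 -> FY06 counts from the allocations table.
rows = {
    "LRAC proposals awarded": (62, 88),    # slide reports +42
    "MRAC proposals awarded": (70, 160),   # slide reports +129
    "DAC proposals awarded": (123, 229),   # slide reports +86
    "Active TeraGrid PIs": (361, 1019),    # slide reports +182
}

for name, (fy05, fy06) in rows.items():
    print(f"{name}: {pct_change(fy05, fy06):+d}%")
```

Each computed value matches the slide's "% Change" column.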
  • Slide 13
  • TeraGrid Resource Change
    Compute additions: NCSA (Cobalt, Copper, Xeon Linux Supercluster, Condor Cluster); SDSC (DataStar p655, DataStar p690, BlueGene); Purdue (Condor+); TACC (LoneStar+); IU (BigRed); PSC (BigBen+).
    Storage additions: GPFS-WAN (+800 TB); IU (tape archive); SDSC (data collections, database service).
    Retirements: PSC (TCS1); IU (IA-32 & 64).
    Upcoming: TACC (Ranger, Jan 2008); NCAR (Frost, Dec 2007?) HPCOPS; NCSA (Abe, June 2007?); plus a possible second Track 2 machine ($30M, announcement Oct 2007) and the Track 1 machine ($200M, announcement Oct 2007).
  • Slide 14
  • TeraGrid Usage: 33% annual growth. [Chart: monthly normalized units (millions), specific vs. roaming allocations.] TeraGrid currently delivers an average of 400,000 cpu-hours per day, i.e. roughly 20,000 CPUs. Dave Hart ([email protected])
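As a sanity check on the slide's conversion: dividing sustained cpu-hours per day by 24 gives the number of processors kept busy around the clock, which lands near the "~20,000 CPUs" figure once sub-100% utilization is accounted for. A minimal sketch:

```python
# 400,000 cpu-hours delivered per day, per the slide.
cpu_hours_per_day = 400_000
hours_per_day = 24

# Number of CPUs that would have to stay busy 24x7 to deliver that rate.
busy_cpus = cpu_hours_per_day / hours_per_day
print(f"~{busy_cpus:,.0f} continuously busy CPUs")  # ~16,667
```

At the sub-100% utilization reported elsewhere in the deck (57-64%), the installed processor count needed to sustain this rate is above 16,667, consistent with the slide's rounder "~20,000 CPUs".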
  • Slide 15
  • TeraGrid User Community. Dave Hart ([email protected])
  • Slide 16
  • TeraGrid User Community: Gateways. [Chart shows growth target.] Dave Hart ([email protected])
  • Slide 17
  • TeraGrid is Like an Accelerator... [Diagram labels: Deep - dedicated experiments, unique machine; Wide - education center, computing center; Open - fishing, Buffalo.]
  • Slide 18
  • ...or more like many accelerators. Not even HEP has solved the challenge of integrating these!
  • Slide 19
  • Advanced Support for TeraGrid Applications (ASTA). Sergiu Sanielevici ([email protected])
  • Slide 20
  • Searching for New Crystal Structures. Deem (Rice): searching for new 3-D zeolite crystal structures. A database of 3.4M+ structures was created in one year (20,000x). "We're working with a major oil company to look at the structures in hopes of finding new catalysts for chemical and petrochemical applications," said Deem. "This project could not have been accomplished in a one- to three-year time frame without the TeraGrid." http://www.physorg.com/news85255507.html
  • Slide 21
  • Predicting Storms. Hurricanes and tornadoes cause massive loss of life and damage to property. The underlying physical systems involve highly non-linear dynamics and so are computationally intense. Data comes from multiple sources: real-time streams of data from sensors, and databases of past storms. Infrastructure challenges: data-mine instrument radar data for storms; allocate supercomputer resources automatically to run forecast simulations; monitor results and retarget instruments; log provenance and metadata about experiments for auditing. Slides courtesy Dennis Gannon and the LEAD collaboration.
  • Slide 22
  • Experience so far. The first release supported WxChallenge, the new collegiate weather forecast challenge. The goal: forecast the maximum and minimum temperatures, precipitation, and maximum sustained wind speeds for select U.S. cities, providing students with an opportunity to compete against their peers and faculty meteorologists at 64 institutions for honors as the top weather forecaster in the nation. 79 users ran 1,232 forecast workflows, generating 2.6 TBytes of data. Over 160 processors were reserved on Tungsten from 10am to 8pm EDT/EST, five days each week. National spring forecast: first use of user-initiated 2-km forecasts as part of that program; generated serious interest from the National Severe Storm Center.
  • Slide 23
  • Solve any Rubik's Cube in 26 moves? Rubik's Cube is perhaps the most famous combinatorial puzzle of all time, with more than 43 quintillion states (4.3x10^19). Gene Cooperman and Dan Kunkle of Northeastern Univ. just proved that any state can be solved in 26 moves. 7 TB of distributed storage on TeraGrid allowed them to develop the proof. http://www.physorg.com/news99843195.html
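The "43 quintillion" figure follows from the standard counting argument for the cube's reachable configurations; this derivation is textbook material, not from the slide:

```python
from math import factorial

# 8 corner pieces: 8! placements, 3^7 orientations (the last corner's twist is forced).
# 12 edge pieces: 12! placements, 2^11 flips (the last edge's flip is forced).
# Only half of the corner/edge permutation combinations are reachable (parity), hence //2.
states = factorial(8) * 3**7 * factorial(12) * 2**11 // 2
print(states)           # 43252003274489856000
print(f"{states:.2e}")  # 4.33e+19, the slide's "43 quintillion"
```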
  • Slide 24
  • TeraGrid is Like a CAMAC Crate: standard instrumentation infrastructure (backplane); a wide variety of components built on top of that infrastructure; can be easily partitioned, federated, and replicated. CTSS evolution: CTSS v1 (30+ packages), CTSS v2 (slightly smaller), CTSS v3 (adds web services, even smaller), CTSS v4 (6/07): small core plus optional kits.
  • Slide 25
  • Lower Integration Barriers, Improved Scaling; Lower User Barriers, Increased Security.
    Initial integration was implementation-based: Coordinated TeraGrid Software and Services (CTSS) provides software for heterogeneous systems and leverages specific implementations to achieve interoperation, with an evolving understanding of the minimum software set required by users.
    The emerging architecture is services-based. Core services, the capabilities that define a TeraGrid resource: authentication and authorization capability; information service; auditing/accounting/usage-reporting capability; verification and validation mechanism. This core is significantly smaller than the current set of required components and provides a foundation for value-added services. Each Resource Provider selects one or more added services, or kits; the core and individual kits can evolve incrementally, in parallel.
  • Slide 26
  • TeraGrid Usage Modes in CY2006

    Use Modality                                    Community size (est. projects)
    Batch computing on individual resources         850
    Exploratory and application porting             650
    Workflow, ensemble, and parameter sweep         160
    Science gateway access                          100
    Remote interactive steering and visualization   35
    Tightly-coupled distributed computation         10

    (The slide brackets the more distributed modes as "Grid-y Users.")
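Assuming the slide's "Grid-y Users" bracket covers the last four modalities (an interpretation of the chart label, not stated explicitly), the shares work out as follows:

```python
# Estimated projects per usage modality in CY2006, from the slide.
modes = {
    "Batch computing on individual resources": 850,
    "Exploratory and application porting": 650,
    "Workflow, ensemble, and parameter sweep": 160,
    "Science gateway access": 100,
    "Remote interactive steering and visualization": 35,
    "Tightly-coupled distributed computation": 10,
}

total = sum(modes.values())  # projects may appear under several modalities
# Assumption: the "Grid-y Users" brace spans the last four modalities.
gridy = sum(v for k, v in list(modes.items())[2:])
print(f"{total} project-modality entries; {gridy} ({gridy / total:.0%}) in grid-y modes")
```

Under that assumption, roughly one in six project-modality entries is "grid-y"; batch and porting use still dominate.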
  • Slide 27
  • Monthly Use of Selected Grid Capabilities, January 2005 through April 2007. [Chart series: MyCluster CPUs, MyCluster jobs, Globus GRAM jobs, QBETS queries, Globus GRAM users, synchronous cross-site jobs.]
  • Slide 28
  • DAC Roaming Behavior, 2006 data: 284 active DACs in 2006, roughly 10x growth.

    Resources used   TG DACs   Total TGSUs
    1                143       1,745,314
    2                60        919,461
    3                46        664,231
    4                16        351,340
    5                8         183,271
    6                5         153,083
    7                1         64,270
    8                1         3,878
    9                1         6,979
    10               2         25,121
    12               1         97,774
    Grand total      284       4,214,722

    Analysis and chart courtesy Dave Hart, SDSC.
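The roaming figures above can be cross-checked against the grand-total row; a minimal sketch with the row values transcribed from the slide:

```python
# (resources spanned, active DACs, total TGSUs) per row of the roaming table.
rows = [
    (1, 143, 1_745_314), (2, 60, 919_461), (3, 46, 664_231),
    (4, 16, 351_340),    (5, 8, 183_271),  (6, 5, 153_083),
    (7, 1, 64_270),      (8, 1, 3_878),    (9, 1, 6_979),
    (10, 2, 25_121),     (12, 1, 97_774),
]

total_dacs = sum(d for _, d, _ in rows)
total_sus = sum(s for _, _, s in rows)
roaming = sum(d for n, d, _ in rows if n >= 2)  # DACs that touched 2+ resources

print(total_dacs, total_sus)  # 284 4214722, matching the grand-total row
print(f"{roaming} of {total_dacs} DACs roamed across multiple resources")
```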
  • Slide 29
  • Information Services (MDS): Kit Registry
  • Slide 30
  • Drives This
  • Slide 31
  • And This: Google Earth GIN demo. http://www.physorg.com/news82811067.html
  • Slide 32
  • TeraGrid is a Social Network: Education and Outreach Engaging Thousands of People
    TeraGrid conference is going great! LRAC/MRAC liaisons. SGW community very successful: mailing list/phonecon/wiki, transitioning to a consulting model. CI Days (campus outreach): an OSG/Internet2/NLR/EDUCAUSE/MSI-CIEC partnership. HPC University: an OSG, Shodor, Krell, OSC, NCSI, MSI-CIEC partnership. CI-TEAM Workshop, July 9-11, for CI-TEAM awardees and aspiring grantees: apply now!
  • Slide 33
  • TeraGrid Science Gateways Initiative: Community Interface to Grids. Common web portal or application interfaces (database access, computation, workflow, etc.); back-end use of TeraGrid computation, information management, visualization, or other services. Four talks on cross-grid work at this conference; three in the science track.
  • Slide 34
  • HPC University Goals
    Advance researchers' HPC skills: search a catalog of live and self-paced training; schedule a series of training courses; gap analysis of materials to drive development.
    Work with educators to enhance the curriculum: search a catalog of HPC resources; schedule workshops for curricular development; leverage the good work of others.
    Offer student research experiences: enroll in HPC internship opportunities; offer student competitions.
    Publish science and education impact: promote via TeraGrid Science Highlights and iSGTW; publish education resources to NSDL-CSERD.
  • Slide 35
  • Workshop and Training Sites in 2007. [Map legend: TeraGrid RP; Minority Serving Institution; Research 1 Univ.; 2/4-yr. college; workshop; conference; tutorial; TeraGrid '07.]
  • Slide 36
  • The Commercial/Public World is Moving FAST! Two examples of communities: the search for Jim Gray, in which over 12,000 people helped search; and a Facebook group on a math love song (search for "Finite Simple Group" by the Klein Four on YouTube).
  • Slide 37
  • TeraGrid is: Operations. We have facilities/services on which users rely; we provide infrastructure on which other providers build. AND R&D: we're learning how to do distributed, collaborative science on a global, federated infrastructure, and we're learning how to run multi-institution shared infrastructure.
  • Slide 38
  • Looking to the Future: focus on operations and transparency; add more resources into the TeraGrid framework; documentation and training; data movement; scheduling and info services; federation (partner grids, campuses).
  • Slide 39
  • Backup Slides
  • Slide 40
  • A Walk Down Memory Lane
    1985: TCP/IP won the network dominance wars; a 1-MIPS analysis machine cost $750,000.
    1990: rise of the RISC workstation; a 65 MB hard drive cost $350.
    1995: web browsers (Mosaic) hit the scene, "the Internet begins"; the #1 machine on the Top 500 = 250 Gflops (Nov. 1996).
    2000: the Y2K bug and Napster raise IT awareness; triumph of Linux/farms in HEP.
    2005: search becomes king; crowdsourcing; TeraGrid begins operation: the era of the production grid begins.
    2010: ?? What will be the effects of multicore ??
  • Slide 41
  • TeraGrid Objectives (repeat of Slide 4)
  • Slide 42
  • Real-Time Usage Mashup (alpha version): 309 jobs running across 9,336 processors at 22:34 on 06/02/2007. Mashup tool by Maytal Dahan, Texas Advanced Computing Center ([email protected]).
  • Slide 43
  • Org Chart
  • Slide 44
  • Summary of Publications study
  • Slide 45
  • Networking. [Diagram: hub sites at LA, DEN, and CHI; SDSC, UC/ANL, PSC, TACC, ORNL, NCSA, NCAR, PU, IU (IPGrid), and Cornell connect via 1x10G, 2x10G, and 3x10G links; Abilene peering at 2x10G.]
  • Slide 46
  • TeraGrid User Community Growth. TeraGrid production services began October 2004; NCSA and SDSC core (PACI) systems and users were incorporated April 2006. Decommissioning of systems typically causes slight reductions in active users; e.g., the December 2006 dip is due to the decommissioning of Lemieux (PSC).

                                  FY05     FY06
    New user accounts             948      2,692
    Avg. new users per quarter    315      365*
    Active users                  1,350    3,228
    All users ever                1,799    4,491
    (*FY06 new users/qtr excludes Mar/Apr 2006)
  • Slide 47
  • TeraGrid Resources for Scientific Discovery
    Computing: over 250 TFlops and growing; common help desk and consulting requests; CTSS software environment; remote visualization servers and visualization software.
    Data management: over 20 petabytes of storage; over 100 scientific data collections.
    Broadening participation in TeraGrid: over 20 Science Gateways; Advanced Support for TeraGrid Applications; education and training events and resources.
    Access: common allocations mechanism (DAC, MRAC, and LRAC).
    Security: Shibboleth testbed underway for campus authentication.
  • Slide 48
  • TeraGrid Projects by Institution: 1000 projects, 3200 users. [Map legend: blue, 10 or more PIs; red, 5-9 PIs; yellow, 2-4 PIs; green, 1 PI.] TeraGrid allocations are available to researchers at any US educational institution by peer review. Exploratory allocations can be obtained through a biweekly review process. See www.teragrid.org.
  • Slide 49
  • Popular Resources for DAC Awards
  • Slide 50
  • Grid Service Usage (pre-WS GRAM). Daily Inca reporter (http://tinyurl.com/23ugbm), courtesy Kate Ericson, SDSC.
  • Slide 51
  • Daily GT4 WS Invocation Reports. Graph courtesy Tony Rimovsky, NCSA
  • Slide 52
  • User Portal Additions
  • Slide 53
  • Data as Resource: what can we say here about progress/status?