Research Computing and Cyberinfrastructure – The Sustainability Model at Penn State

Vijay K. Agarwala
Senior Director, Research Computing and Cyberinfrastructure
The Pennsylvania State University
University Park, PA 16802 USA
[email protected]

May 3rd – 5th, 2010 at Cornell University Center for Advanced Computing

1





Organizational Structure of Research Computing and CI at Penn State

[Organization chart. Roles and units shown on the slide:]
• University President
• Provost and Executive Vice President
• Senior Vice President for Finance and Business
• Senior Vice President for Research and Dean of the Graduate School
• Vice Provost for Information Technology and Chief Information Officer
• Associate Vice President, Office of Physical Plant
• Director, Telecommunication and Networking Services
• Director, Institute of Cyber Science
• Associate VP for Research Computing and Cyberinfrastructure / Executive Director
• RCC Advisory Committee
• RCC Executive Committee: up to seven faculty members and research administrators from the University Research Council

Research Computing and Cyberinfrastructure groups (4 staff members each):
• High Performance Computing Systems
• Domain Specific Consulting Support
• Visualization and Telecollaborative Systems
• Software Development and Programming Support

2


Research Computing and Cyberinfrastructure

• Provide systems services by researching current practices in operating system, file system, data storage, and job scheduling, as well as computational support related to compilers, parallel computation, libraries, and other software. Also supports visualization of large datasets by innovative means to gain better insight from the results of simulations.

• Enable large‐scale computations and data management by building and operating several state‐of‐the art computational clusters and machines with a variety of architectures. 

• Consolidate and thus significantly increase the research computing resources available to each faculty participant. Faculty members can frequently exceed their share of the machine to meet peak computing needs.

• Provide support and expertise for using programming languages, libraries, and specialized data and software for several disciplines. 

• Investigate emerging visual computing technologies and implement leading‐edge solutions in a cost‐effective manner to help faculty better integrate data visualization tools and immersive facilities in their research and instruction.

• Investigate emerging architectures for numerically-intensive computations and work with early-stage companies. For example: interconnects, networking, and graphics processors for computations.

• Help build inter- and intra-institutional research communities using cyberinfrastructure technologies.

• Maintain close contacts with NSF and DoE funded national centres, and help faculty members with porting and scaling of codes across different systems.

3


Programs, Libraries, and Application Codes in Support of Computational Research

• Compilers and Debuggers: DDT, GNU Compilers, Intel Compiler Suite, NVIDIA CUDA, PGI Compiler Suite, TotalView

• Computational Biology: BLAST, BLAST+, EEGLAB, FSL, MRIcro, MRICron, SPM5, SPM8, RepeatMasker, wuBlast

• Computational Chemistry and Material Science: Accelrys Materials Studio, Amber, CCP4, CHARMM, CPMD, GAMESS, Gaussian 03, GaussView, Gromacs, LAMMPS, NAMD, NWChem, Rosetta, Schrödinger Suite, TeraChem, Thermo-Calc, VASP, WIEN2k, WxDragon

• Finite Element Solvers: ABAQUS, LS‐DYNA, MD/Nastran and MD/Patran

• Fluid Dynamics: Fluent, GAMBIT, OpenFOAM, Pointwise

• Mathematical and Statistical Libraries and Applications: AMD ACML, ATLAS, BLAS, IMSL, LAPACK, GotoBLAS, Intel MKL, Mathematica, MATLAB, Distributed MATLAB, NAG, PETSc, R, SAS, WSMP

• Multiphysics: ANSYS, Comsol

• Optimization: AMPL, CPLEX, GAMS, Matlab Optimization Toolbox, Matgams, OPL

• Parallel Libraries: OpenMPI, Parallel IMSL, ScaLAPACK

• Visualization Software: Avizo, Grace, IDL, Tecplot, VisIt, VMD

All software installations are driven by faculty. The software stack on every system is customized and entirely shaped by faculty needs.

4


Compute Engines

Core Compute Engines
System         Nodes   Cores   Memory (GB)   Interconnect
Lion-XO          64      192       1280      SDR Infiniband
Lion-XB          16      128        512      SDR Infiniband
Lion-XC         140      560       1664      SDR Infiniband
Lion-XJ         144     1152       5120      DDR Infiniband
Lion-XI          64      512       4096      DDR Infiniband
Lion-XK          64      512       4096      DDR Infiniband
Lion-XH          48      384       2304      QDR Infiniband
Lion-CLSF         1*     128        768      QDR Infiniband
CyberStar       224     2048       7680      QDR Infiniband
Hammer            8       64       1024      GigE
Tesla             2**     16         32      GigE
NVIDIA S1070      1      960         16      GigE

* One virtual node using ScaleMP technology.   ** Servers.

Notes:
• A 48-port 10 GigE switch connects all compute engines and storage.
• A 1280-core system is coming online in Summer 2010.
• All compute engines together deliver 40 million core hours in 2010.
• Emerging technologies in the hands of the user community: ScaleMP and a GPU cluster.

Storage
System                          Capacity   Performance
IBM GPFS Parallel File System   550 TB     Can sustain 5+ GB/s
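As a sanity check, the per-system core counts in the table above can be totalled against the 40-million-core-hour figure. This is a rough estimate only; it ignores the 960 GPU cores and any systems added mid-year.

```python
# Aggregate CPU core count from the compute-engine table above.
cores = {
    "Lion-XO": 192, "Lion-XB": 128, "Lion-XC": 560, "Lion-XJ": 1152,
    "Lion-XI": 512, "Lion-XK": 512, "Lion-XH": 384, "Lion-CLSF": 128,
    "CyberStar": 2048, "Hammer": 64, "Tesla": 16,
}
total_cores = sum(cores.values())
max_core_hours = total_cores * 8760   # core hours if every core ran all year
print(total_cores, max_core_hours)    # -> 5696 49896960
print(round(40e6 / max_core_hours, 2))  # -> 0.8
```

Delivering 40 million core hours thus implies roughly 80% sustained utilization over the year, broadly consistent with the above-90% utilization cited later in the deck once mid-year additions are accounted for.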

5


Job Scheduling

Scheduling Tools
• Torque is the resource manager
• Moab is the job scheduler

Participating Faculty Partners
• Each partner group gets its own queue in Torque.
• Each partner queue is assigned a fair-share usage target in Moab equivalent to the group's percentage of ownership in the specific compute engine.
• A 30-day fair-share window is used to track partner group usage.
• When a partner group is below its target, the jobs in its queue are given a priority boost.
• When a partner group is at or above its target, the jobs in its queue have no priority boost but are also not penalized (they are treated on par with general users).
• Within a partner group queue there is a smaller fair-share priority between members of that group to keep anyone from monopolizing the group queue.
• Partner group queues do not limit the number of cores their members use.
• Partner group queues have by default a two-week limit on the run time of jobs, but this limit can be adjusted as necessary.

General Users
• General users have access to a public queue and a small fair-share priority.
• General users are limited in the number of processor cores they can have in use (32) and the length of their jobs (24 hours).
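The partner fair-share rule above can be sketched as a small function. This is an illustrative model only, not Penn State's actual Moab configuration; the function name and the boost value are invented for the example.

```python
def priority_boost(usage_30d, target_share, boost=1000):
    """Illustrative model of the partner fair-share rule described above.

    usage_30d    -- group's fraction of the machine used over the
                    30-day fair-share window (0.0 - 1.0)
    target_share -- group's ownership percentage, as a fraction
    boost        -- hypothetical priority bonus (units are arbitrary)
    """
    if usage_30d < target_share:
        return boost   # below target: jobs jump ahead in the queue
    return 0           # at or above target: no boost, but no penalty

print(priority_boost(0.10, 0.25))  # under-served partner -> 1000
print(priority_boost(0.30, 0.25))  # partner over its target -> 0
```

In Moab terms, the target corresponds to a per-group fair-share target and the 30-day window to the fair-share depth/interval; the exact configuration values are not given on the slide.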

6


Usage of the LION-XJ compute cluster between January 1, 2010 and April 22, 2010

[Chart: ratio of actual to target usage for 16 partner groups (group1 – group16), with a line marking maximum usage.]
• 12 groups exceeded their "target share" at some point during the period.
• Only four groups remained below the "target share" for this period.

7


Visualization Services

Staff members provide consulting, teach seminars, assist faculty and support facilities for visualization and VR.

Recent support areas include:

• Visualization and VR system design and deployment

• 3D Modeling applications: FormZ, Maya

• Data Visualization applications: OpenDX, VTK, VisIt

• 3D VR development and device libraries: VRPN, VRML, JAVA3D, OpenGL, OpenSG, CaveLib

• Domain specific visualization tools: Avizo (Earth, Fire, Green, Wind), VMD, SCIRun

• Telecollaborative tools and facilities: Access Grid, inSORS, VNC

• Parallel graphics for scalable visualization: Paraview, DCV, VisIt

• Programming support for graphics (e.g. C/C++, JAVA3D, Tcl/Tk, Qt)

8


Visualization Facilities

Our goal is to foster more effective use of visualization and VR techniques in research and teaching across colleges and disciplines via strategic deployment of facilities and related support. 

• Locating facilities strategically across campus for convenient access by targeted disciplines and user communities

• Leveraging existing applications and workflows so that visualization and VR can be natural extensions to existing work habits for the users being served

• Providing outreach seminars, end-user training, and ongoing staff support in use of the facilities

• Working on an ongoing basis with academic partners to develop and adapt these resources more precisely to fit their needs

• Helping to identify and pursue funding opportunities for further enhancement of these efforts as they take root

9


Visualization Facilities

• Immersive Environments Lab: in partnership with the School of Architecture and Landscape Architecture

• Immersive Construction Lab: in partnership with Architectural Engineering

• Visualization/VR Lab: in partnership with Materials Simulation Center

• Visualization/VR Lab: in partnership with Computer Science and Engineering

• Sports Medicine VR Lab: a partnership between Kinesiology, Athletics and Hershey Medical Center

10


Building a strong University-based Computing Center (UBCC): Top 10 practices

• Appeal to a broad constituency: science, engineering, medicine, business, humanities, liberal arts. Provision 25% of compute cycles with minimal requirements of proposal and review.

• Flexibility in system configuration and adaptability to faculty needs: work with partners on their queue requirements, accommodate temporary priority boost.

• Keep barriers to faculty participation low: as low as a $5,000 investment for a contributing partner with priority access. Try-before-you-buy program and guest status for prospective faculty partners.

• Maximize system utilization: consistently above 90%, even contributing partners don't have instantaneous access.

• Extensive software stack, and rapid turnaround in installation of new software: rich set of tools, compilers, libraries, community codes and ISV codes.

• Provide consulting with subject matter expertise: differentiate the UBCC from "flop shops" and cloud providers.

• Strong commitment to training: teaching in classes, seminars, workshops.
• Provide accurate and daily system utilization data.
• Build strong partnerships with hardware and software vendors.
• Make emerging technologies and test beds available to faculty and students.

11


Cloud Computing – How will it impact Research CI at Universities?

Cloud computing: a virtualized/remote computing resource from which users can purchase what they need, when they need it

A clearinghouse for buyers and sellers of Digital Analysis Computing (DAC) services

Computing resources for commercial and academic community

Offers HPC services to industry, government, and academia

12


Cloud Computing – How will it impact Research CI at Universities?

• Google+IBM cloud: a platform for computer science researchers to build cloud applications. A combination of Google machines, IBM BladeCenter and System x servers

• NSF awarded $5.5 million to fourteen universities through its Cluster Exploratory (CluE) program

• A global open-source test bed, Open Cirrus, for the advancement of cloud computing research

• University of Illinois, KIT (Germany), iDA (Singapore), ETRI (Korea), MIMOS (Malaysia), Russian Academy of Sciences

• Computing in the Cloud (CiC): NSF Solicitation 10-550 (~$5 million in FY 2010)

13


Cloud Computing – How will it impact Research CI at Universities?

• What impact will the buying of commercial cloud services have on the future development of campus, regional, and national CI?

• How tightly should the campus CI be integrated with that of private companies in a geographic region? Would it promote stronger partnerships between academia and industry?

• Would new policies and guidelines emerge from federal funding agencies on incorporating cloud services in the grants? Would cloud service providers have to be US‐based for data and security reasons?

Demand for cloud computing services by campus-based researchers will depend on how research-computation-related CI is funded and deployed in the future and, just as importantly, on how well university-based computing centers (UBCC) respond to it and stay competitive.

14


A model for a University-based Computing Center (UBCC)

Assumes capital expenditures have been incurred for a data center as well as hardware and software. For example, a $10 million investment today in a green data center with a PUE of 1.25 and lower levels of redundancy can create a facility with 3,200 sq. ft. of raised floor, 3,000 sq. ft. of office space, 1.5 – 2.0 megawatts of electrical power, and limited-capacity UPS and HVAC systems, all housed in a 12,000 sq. ft. building. A similar $10 million investment in hardware today can build aggregate peak computing capacity between 250 and 400 teraflops (25,000 – 40,000 cores) and 10 – 20 petabytes of storage.

Annual Operating Expenditure
Hardware:           $2.50 million   (replacing 25% of installed compute capacity each year)
Software:           $0.25 million   (annual licensing costs)
Utility cost:       $1.00 million   (80 racks, avg. 15 kW/rack)
Staff (16 people):  $2.00 million   (salary and benefits)
Other expenses:     $0.25 million
Total:              $6.00 million   annual investment in support of research, teaching & outreach

Once initial capital expenditures have been made, it is possible to deliver compute cycles, with a high level of system and computational staff support, at a cost between $0.025 and $0.04 per core hour (2.4 – 3.0 GHz cores).
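The quoted range can be reproduced from the figures above. The sketch below assumes 25,000 cores (the low end of the capacity range) and 90% sustained utilization; both are assumptions for illustration, not measured values from the slide.

```python
annual_opex = 6.00e6   # total annual operating expenditure, from above
cores = 25_000         # low end of the 25,000 - 40,000 core range (assumed)
utilization = 0.90     # assumed sustained utilization

delivered = cores * 8760 * utilization   # ~197 million core hours per year
cost = annual_opex / delivered
print(round(cost, 3))  # -> 0.03, inside the quoted $0.025 - $0.04 range
```

Larger installed capacity or higher utilization pushes the unit cost toward the bottom of the range; the upper figure allows for lower utilization and support overheads.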

Universities are more likely to commit such institutional funds if Federal funding agencies (NSF, NIH, DoE, NASA, DoC) provide incentives by way of targeted support for UBCC.

15


The case for UBCC: why Federal agencies need to provide direct support

• Incentive for institutional investment: research universities will leverage targeted federal funding and commit more funds to supporting computational and data-driven research and teaching.

• Efficient use of resources: UBCC, with its campus-level computing cloud, has proven to be highly optimal and cost-effective in meeting a large portion of computing needs across a range of disciplines. Breakthrough science is becoming possible on small-to-medium sized systems. The research community at campuses frequently relies on campus-based HPC systems and staff to speed up their development and discovery cycle. Campus-based HPC systems are designed, and their software stacks frequently fine-tuned, in response to the computational needs of the local research community, with the sole goal of increasing the computational productivity of faculty and students.

• Seed computational science education: UBCC is a critical enabler by providing hands-on advanced training to students and faculty. UBCC nurtures sustained interest in research computing and HPC technologies amongst faculty, graduate students, and more significantly, reaches the undergraduate student community.

• Workforce Development: There is a widening skills gap, i.e. a shortage of people with disciplinary knowledge, skills in scalable algorithm and code development for large systems, and the ability to think across disciplinary boundaries and integrate modeling and computational techniques from different areas. There is an equally pronounced shortage of skilled people in system administration, in building and deploying various parts of the software stack, in code scaling, and in the adoption of large-scale computing in disciplinary areas. Support for UBCC will go a long way in addressing the shortage of skilled personnel and help with wider and deeper adoption of HPC technologies in both academia and industry. UBCC have a critical role to play in developing the next generation of HPC professionals and also in moderating the outsourcing of engineering services when it is due to a shortage of skilled personnel in the US.

• Industrial outreach: UBCC is uniquely positioned and highly effective in forging partnerships between industry and academia to make regional businesses globally competitive. It can leverage its CI assets, couple them with what industry may have, and in effect build "local collaboratories" around which faculty, students, and industrial R&D personnel can collaborate. In doing so, UBCC will support the development of advanced modeling technologies, help overcome the barrier of "high perceived cost" that is preventing widespread adoption by small and medium-sized businesses, and drive the adoption of modeling and simulation as an integral part of their research and product development cycles. UBCC can provide large-scale computing services and the continuity of contact between faculty/students and industry personnel.

• Build a healthier HPC ecosystem: if more UBCCs are funded, it will increase the diversity and number of participants, the rate of innovation, and the number of companies in this important industry sector, where US companies are global leaders.

16