Campus Bridging in XSEDE,
OSG, and Beyond
Dan Fraser, OSG
Jim Ferguson, NICS
Andrew Grimshaw, UVA
Rich Knepper, IU
David Lifka, Cornell
Internet2 Spring Members’ Meeting, Arlington, VA, April 23, 2012
Overview
• Campus Bridging introduction
• Open Science Grid and software deployment
• XSEDE Campus Bridging Program
• Global Federated File System
• Campus Initiatives
• Red Cloud
• POD at IU
Campus Bridging: Formation
• NSF Advisory Committee for CyberInfrastructure
• Community input at multiple workshops
• Surveys
• ACCI Campus Bridging Task Force Report
• http://pti.iu.edu/campusbridging
Campus Bridging: Concepts
• Making it easier for users to transition from their
laptop to large-scale resources
• Gathering best practices to deploy resources in
a way that makes them familiar to users
• Providing training and documentation that
covers research computation at multiple scales
OSG Campus Bridging (Campus High Throughput Computing Infrastructures)
Dan Fraser
OSG Production Coordinator
Campus Infrastructure Lead
Internet2 Spring Members Meeting
Arlington, VA
April 23, 2012
The Open Science Grid
The Open Science Grid (OSG) has
focused on campuses from its inception.
All OSG computing power comes from
campuses. OSG has a footprint on over
100 campuses in the US and abroad.
http://display.grid.iu.edu
OSG Campus Bridging Focus
• Focus on the Researcher (…or Artist)
  – One step at a time
  – Simple interfaces are good
• Engaging with the Campus
  – Campuses each have their own “culture”: terminology, access patterns, security, operational styles, and processes (authN, authZ, monitoring, accounting, data management, …)
• The most fundamental issues are not technological in nature
• Campus Bridging = Cultural Bridging
Campus Bridging Direction
• Help the researcher use local resources
  – Run on a local cluster (on campus)
  – Run on several local clusters
  – Use/share resources with a collaborator on another campus
• Access the national cyberinfrastructure
  – OSG (and also XSEDE) resources
  – (BTW, OSG is also an XSEDE service provider)
Submit Locally, Run Globally
OSG Campus Bridging Today
[Diagram: a Bosco submit host on campus, using a local user credential, submits Condor jobs to local PBS/LSF clusters, to an external campus, and to the OSG cloud (could also submit to XSEDE)]
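A minimal sketch of this submit-locally, run-globally flow using Bosco; the cluster hostname, scheduler choice, and submit file are hypothetical, while bosco_start and bosco_cluster are Bosco's own commands:

bosco_start                                          # start the local Bosco (Condor) daemons
bosco_cluster --add [email protected] pbs   # register a remote PBS cluster over ssh
bosco_cluster --test [email protected]      # verify end-to-end job submission
condor_submit myjob.sub                              # jobs now route from this host to the cluster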
Summary
OSG is focused on the researcher/artist
Campus bridging = cultural bridging
A single submit model (Bosco) can be
useful
OSG is exploring how best to collaborate
with XSEDE on campus bridging
Introduction to XSEDE and its Campus
Bridging Program
Jim Ferguson, XSEDE TEOS team
Education, Outreach & Training Director, NICS
Acknowledgements
• Craig Stewart, Rich Knepper, and Therese Miller of Indiana University, and others on the XSEDE campus bridging team.
• John Towns, NCSA, XSEDE PI
XD Solicitation/XD Program
• eXtreme Digital Resources for Science and Engineering (NSF 08-571); proposals were due November 4, 2008
  – High-Performance Computing and Storage Services (aka Track 2 awardees)
  – High-Performance Remote Visualization and Data Analysis Services
    • 2 awards; 5 years; $3M/year
  – Integrating Services (5 years, $26M/year)
    • Coordination and Management Service (CMS) – 5 years; $12M/year
    • Technology Audit and Insertion Service (TAIS) – 5 years; $3M/year
    • Advanced User Support Service (AUSS) – 5 years; $8M/year
    • Training, Education and Outreach Service (TEOS) – 5 years; $3M/year
XSEDE Vision
The eXtreme Science and Engineering Discovery Environment (XSEDE) enhances the productivity of scientists and engineers by providing them with new and innovative capabilities, and thus facilitates scientific discovery while enabling transformational science/engineering and innovative educational programs.
Science requires diverse digital capabilities
• XSEDE is a comprehensive, expertly managed and evolving set of advanced heterogeneous high-end digital services, integrated into a general-purpose infrastructure.
• XSEDE is about increased user productivity
– increased productivity leads to more science
– increased productivity is sometimes the difference between a feasible project and an impractical one
XSEDE’s Distinguishing Characteristics – Governance
• World-class leadership
  – partnership led by NCSA, NICS, PSC, TACC and SDSC, CI centers with deep experience
  – partners who strongly complement these CI centers with expertise in science, engineering, technology and education
• Balanced governance model
  – strong central management provides rapid response to issues and opportunities
  – delegation and decentralization of decision-making authority
  – openness to genuine stakeholder participation (stakeholder engagement, advisory committees)
  – improved professional project management practices (formal risk management and change control)
What is campus bridging?
• Term originated by Ed Seidel as he charged six task forces of the NSF Advisory Committee for Cyberinfrastructure. Considerable info and final ACCI Task force report, online at pti.iu.edu/campusbridging
• The Taskforce definition:
Campus bridging is the seamlessly integrated use of cyberinfrastructure operated by a scientist or engineer with other cyberinfrastructure on the scientist’s campus, at other campuses, and at the regional, national, and international levels as if they were proximate to the scientist . . .
• Catchy name, great ideas … interest currently exceeds our ability to implement
• Vision:
  – Help XSEDE create the software, tools, and training that will allow excellent interoperation between XSEDE infrastructure and researchers’ local (campus) cyberinfrastructure
  – Enable excellent usability, from the researcher’s standpoint, for a variety of modalities and types of computing: HPC, HTC, and data-intensive computing
  – Promote better use of local, regional, and national CI resources
Campus Bridging Use Cases
– InCommon Authentication
– Economies of scale in training and usability
– Long term remote interactive graphic session
– Use of data resources from campus on XSEDE, or from XSEDE at a campus
– Support for distributed workflows spanning XSEDE and campus-based data, computational, and/or visualization resources
– Shared use of computational facilities mediated or facilitated by XSEDE
– Access to “____ as a Service” mediated or facilitated by XSEDE
Year 2 Strategic Plan
• Leveraging the XSEDE process of implementing new services via the systems engineering process (Architecture & Design => Software Development and Integration => Operations), begin deploying new services that deliver campus bridging capabilities
• Communicate effectively via Campus Champions, advocates for XSEDE now located at over 100 institutions
• Develop relationships and shared terminology with OSG, which has been bridging between institutions for several years

Year 2 Strategic Plan (continued)
• Complete planned pilot projects
  – GFFS Pilot Program
    • CUNY – PI: Paul Muzio
    • KU – PI: Thorbjorn Axelsson
    • Miami – PI: Joel Zysman
    • TAMU – PI: Guy Almes
• Begin delivering selected campus bridging-related tools
  – GFFS
  – Documentation
  – ROCKS Rolls
• Communicate “what is campus bridging”
Example: More consistency in CI setups
=> economies of scale for all
• In reality, the four cluster admins depicted in the cartoon as being in agreement are all right.*
• Experienced cluster admins learned their tools while those tools were still developing, so the tool each sysadmin knows best is the tool that lets that sysadmin do their work best.
• The only way to develop consistency is to provide installers that make their work easier.
• The XSEDE architecture group is developing installers for file management tools.
* A la Stephen Colbert, the “4 out of 5…” comment in the cartoon is not intended to be a factual statement.
Your Comments, Please!
• Do we have the right direction?
• What is Campus Bridging to You?
Thank You! [email protected]
Global Federated File System
GFFS
Andrew Grimshaw
• Basic idea and canonical use cases
• Accessing the GFFS
• Attaching (provisioning) data to the GFFS
• Deployment
[Diagram: sequence data (SEQ_1, SEQ_2, SEQ_3) and public databases (PDB, NCBI, EMBL) at a research institution and partner institutions (biology, biochemistry), applications (APP 1 … APP N), and processing clusters (Cluster 1 … Cluster N), all federated into one namespace]
Basic idea:
• Map resources into a global directory structure
• Map the global directory structure into the local file system
Canonical use cases
• Definitions
  – A resource is {compute | job | data | identity | …}
  – Access means create, read, update, delete
1. Access a center resource from a campus
2. Access a campus resource from a center
3. Access a campus resource from another campus
  – Sharing file system or instrument data
  – Sharing clusters
Accessing the GFFS
• Via a file system mount
  – Global directory structure mapped directly into the local operating system via a FUSE mount
• XSEDE resources, regardless of location, can be accessed via the file system
  – Files and directories can be accessed by programs and shell scripts as if they were local files
  – Jobs can be started by copying job descriptions into directories
  – One can see the jobs running or queued by doing an “ls”
  – One can “cd” into a running job and directly access the working directory where the job is running
mkdir XSEDE
nohup grid fuse --mount local:XSEDE &
E.g., Access a job’s running directory
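For example, with the grid mounted at ./XSEDE as above, a running job's directory can be inspected with ordinary shell commands. A minimal sketch; the queue and job paths here are hypothetical, not a documented layout:

cd XSEDE/queues/grid-queue/jobs/mine/running/job-0001
ls working-dir                 # the job's working directory, live
tail working-dir/output.log    # watch output as the job runs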
Accessing the GFFS
• Via command-line tools, e.g.,
– cp local:fred.txt /home/grimshaw/fred.txt
– rm /home/grimshaw/fred.txt
GUI Grid Client
• Typical folder-based tool
• Tools to define, run, and manage jobs
• Tools to manage “grid” queues
• Tools to “export” data
• Grid shell
  – Shell has tab completion, history, help, scripting, etc.
GUI Grid Client: View Access Control
• To view access control information: Browse to and highlight resource, then select Security tab
Exporting (mapping) data into the Grid
[Diagram: Sarah’s TACC workspace, her instrument in the lab, and her department file server are exported into the GFFS, where data clients on Linux, TACC, and Windows see them]
• Links directories and files from their source location to a GFFS directory and a user-specified name
• Presents a unified view of the data across platforms, locations, domains, etc.
• Sarah controls the authorization policy.
Exporting/sharing data
• Can also export
  – Windows shares
  – Directory structures via ssh (slow, like sshfs)

grid export /containers/Big-State-U/Sarah-server /development/sources /home/Sarah/dev

• User selects
  – The server that will perform the export
  – The directory path on that server
  – The path in GFFS to link it to
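A hedged usage sketch to make the flow concrete, reusing the paths from the example above and assuming the grid launcher accepts single commands as in the fuse example:

grid ls /home/Sarah/dev                             # browse the newly linked export
grid cp local:notes.txt /home/Sarah/dev/notes.txt   # copy a local file into it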
Deployment
• Sites that wish to export or share resources must run a Genesis II or UNICORE 6 container
• There will be an installer for the GFFS package for SPs, and a “Campus Bridging” package
• There is an installer for client side access
• There are training materials
– Used at TG 11
– In the process of being turned into videos
Red Cloud: On-Demand Research Computing
– Infrastructure as a Service –
– Software as a Service –
www.cac.cornell.edu/redcloud
Infrastructure as a Service (IaaS) Cloud
Red Cloud provides on-demand:
• Computing Cycles: Virtual Servers in Cloud “Instances”
• Storage: Virtual Disks in Elastic Block Storage (“EBS”) Volumes
[Diagram: virtual server users reach Red Cloud through its cloud management layer, which fronts virtual servers (“cloud instances”) and virtual disks (“Elastic Block Storage (EBS)”)]
[Diagram: Red Cloud hardware – head node; Dell C6100 compute nodes; Dell C410x GPU chassis with NVIDIA Tesla M2070s; DDN storage; GridFTP, MyProxy, web, and SQL servers]
Software as a Service (SaaS) Cloud: Red Cloud with MATLAB
Motivation
• Research computing means many different things…
– Scientific workflows have different requirements at each step
– Cloud is only part of the solution
– Connecting to and from other CI resources is important
• Nobody likes a bad surprise
– Transparency, no hidden costs
– Need a way to bound financial risk
• Economies of scale
– Sharing hardware and software where it makes sense
– Pay for what you need, when you need it
• Customized environments for various disciplines
– Collaboration tools
– Data storage & analysis tools
– Flexibility to support different computing models (e.g. Hadoop)
Red Cloud Provides
• Predictable, reproducible, reliable performance
  – We publish hardware specifications (CPU, RAM, network) and do not oversubscribe.
• Convenience
  – Need a system up and running yesterday?
  – Need a big, fast machine for only a few months, weeks, or days?
  – Need a small server to run continuously?
• No hidden costs
  – No cost for network traffic in or out of the cloud.
• Fast access to your data
  – Fast data transfers via 10Gb Ethernet in or out of the cloud at no additional charge.
  – Globus Online access.
• Economies of scale
  – IaaS: infrastructure; SaaS: software.
• Expert help
  – System, application, and programming consulting are available.
• Easy budgeting with subscriptions
  – No billing surprises!
• IaaS is Amazon API compatible
  – Migrate when your requirements outgrow Red Cloud.
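Because the IaaS layer speaks the Amazon EC2 API, standard EC2-style tooling should work against it. A minimal sketch with euca2ools; the endpoint URL, image ID, and keypair name are hypothetical:

export EC2_URL=https://redcloud.cac.cornell.edu:8773/services/Eucalyptus   # hypothetical endpoint
euca-describe-images                                      # list available machine images
euca-run-instances emi-12345678 -k my-key -t m1.large     # launch an instance from an image
euca-describe-instances                                   # check instance state and address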
Some Use Cases to Consider
• Support for Scientific Workflows
– Pre & post-processing of data and results
– Data analysis
– Globus Online for fast, reliable data transfer (see the transfer sketch after this list)
• https://www.globusonline.org/
• Collaboration
– Wiki hosting
– Customized data analysis & computational environments
• Web Portals
– Science Gateways
– Domain Specific Portals
– HUBzero
• http://hubzero.org/pressroom
• http://nanohub.org
• Event-Driven Science
– https://opensource.ncsa.illinois.edu/confluence/display/SGST/Semantic+Geostreaming+Toolkit
• Education, Outreach & Training
– Pre-configured systems & software tools providing consistent training platform
– Common laboratory computing environment
• Bursting
– Additional software and hardware on demand
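Where a workflow needs fast, reliable transfer (Globus Online, GridFTP), a command-line sketch using globus-url-copy from the Globus Toolkit; both endpoints and paths are hypothetical:

globus-url-copy -vb -p 4 \
  gsiftp://gridftp.campus.example.edu/data/results.tar \
  gsiftp://gridftp.cloud.example.edu/scratch/results.tar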
Subscription-based Recovery Model
                        Cornell University   Other Academic Institutions
Red Cloud               $500/core year*      $750/core year
Red Cloud with MATLAB   $750/core year       $1200/core year
*A core year is equal to 8585 hours.
Each subscription account includes 50GB of storage.
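As a quick budgeting check, the base subscription works out to roughly six cents per fully used core-hour:

echo "scale=4; 500 / 8585" | bc   # ≈ 0.0582 USD per core-hour at the Cornell rate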
What if ???
                              Consulting     Additional Storage
Cornell Users                 $59.90/hour    $0.91/GB/year
Other Academic Institutions   $85.47/hour    $1.45/GB/year
Penguin Computing / IU Partnership
HPC “cluster as a service” and
Cloud Services Internet2 Spring Members’ Meeting 2012
Rich Knepper ([email protected]) Manager, Campus Bridging Indiana University
What is POD
On-demand HPC system
> Compute, storage, low latency fabrics, GPU, non-virtualized
Robust software infrastructure
> Full automation
> User and administration space controls
> Secure and seamless job migration
> Extensible framework
> Complete billing infrastructure
Services
> Custom product design
> Site and workflow integration
> Managed services
> Application support
HPC support expertise
> Skilled HPC administrators
> Leverages 13 years serving the HPC market
[Diagram: POD architecture, with Internet connectivity at 150Mb, burstable to 1Gb]
Clouds look serene enough - But is ignorance bliss?
In the cloud, do you know:
> Where your data are?
> What laws prevail over the physical
location of your data?
> What license you really agreed to?
> What is the security (electronic /
physical) around your data?
> And how exactly do you get to that
cloud, or get things out of it?
> How secure your provider is
financially? (The fact that
something seems unimaginable, like
cloud provider such-and-such going
out of business abruptly, does not
mean it is impossible!)
Photo by mnsc: http://www.flickr.com/photos/mnsc/2768391365/ (CC BY 2.0: http://creativecommons.org/licenses/by/2.0/)
Penguin Computing & IU partner for “Cluster as a Service”
Just what it says: Cluster as a Service
Cluster physically located on IU’s campus, in IU’s Data Center
Available to anyone at a .edu or FFRDC (Federally Funded
Research and Development Center)
To use it:
> Go to podiu.penguincomputing.com
> Fill out registration form
> Verify via your email
> Get out your credit card
> Go computing
This builds on Penguin’s experience: it currently hosts Life
Technologies’ BioScope and LifeScope in the cloud
(http://lifescopecloud.com)
We know where the data are … and they are secure
An example of NET+ Services / Campus Bridging
"We are seeing the early emergence of a meta-university — a transcendent, accessible, empowering, dynamic, communally constructed framework of open materials and platforms on which much of higher education worldwide can be constructed or enhanced.” Charles Vest, president emeritus of MIT, 2006
NET+ Goal: achieve economy of scale and retain reasonable measure of control See: Brad Wheeler and Shelton Waggener. 2009. Above-Campus Services: Shaping the Promise of Cloud Computing for Higher Education. EDUCAUSE Review, vol. 44, no. 6 (November/December 2009): 52-67.
Campus Bridging goal – make it all feel like it’s just a peripheral to your laptop (see pti.iu.edu/campusbridging)
True On-Demand HPC for Internet2
Creative Public/Private model to address HPC shortfall
Turning lost EC2 dollars into central IT expansion
Tiered channel strategy expansion to EDU sector
Program and discipline-specific enhancements under way
Objective third party resource for collaboration
> EDU, Federal and Commercial
IU POD – Innovation Through Partnership
POD IU (Rockhopper) specifications
Server Information
  Architecture: Penguin Computing Altus 1804
  TFLOPS: 4.4
  Clock Speed: 2.1GHz
  Nodes: 11 compute; 2 login; 4 management; 3 servers
  CPUs: 4 x 2.1GHz 12-core AMD Opteron 6172 processors per compute node
  Memory Type: Distributed and Shared
  Total Memory: 1408 GB
  Memory per Node: 128GB 1333MHz DDR3 ECC
  Local Scratch Storage: 6TB locally attached SATA2
  Cluster Scratch: 100TB Lustre
Further Details
  OS: CentOS 5
  Network: QDR (40Gb/s) InfiniBand; 1Gb/s Ethernet
  Job Management Software: SGE
  Job Scheduling Software: SGE
  Job Scheduling Policy: Fair Share
  Access: key-based ssh login to head nodes; remote job control via Penguin’s PODShell
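A hedged sketch of remote job control with PODShell; the podsh subcommands follow Penguin's PODTools, but the script name and job ID here are hypothetical:

podsh submit myjob.sh   # submit a batch script from your workstation
podsh status 12345      # poll the state of the submitted job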
Available applications at POD IU (Rockhopper)
• COAMPS – Coupled Ocean/Atmosphere Mesoscale Prediction System
• Desmond – software developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters
• GAMESS – a program for ab initio molecular quantum chemistry
• Galaxy – an open, web-based platform for data-intensive biomedical research
• GROMACS – a versatile package to perform molecular dynamics, i.e., to simulate the Newtonian equations of motion for systems with hundreds to millions of particles
• HMMER – used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments
• Intel compilers and libraries
• LAMMPS – a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator
• MM5 – the PSU/NCAR mesoscale model, a limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation; supported by several pre- and post-processing programs referred to collectively as the MM5 modeling system
• mpiBLAST – a freely available, open-source, parallel implementation of NCBI BLAST
• NAMD – a parallel molecular dynamics code for large biomolecular systems
Questions?
Thank you!