How local, regional, and national cyberinfrastructure can be coordinated and linked to advance science and engineering, based on experiences and lessons from the Center for Computation & Technology at LSU (ideas, funding, implementation), plus some thoughts on what might be done differently if we were starting today. Presented at First Workshop - Center for Computational Engineering & Sciences, Unicamp, Campinas, Brazil 10 APR 2014
www.ci.anl.gov www.ci.uchicago.edu
Advancing Science through Coordinated Cyberinfrastructure
Daniel S. Katz [email protected]
Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory
Affiliate Faculty, Center for Computation & Technology, Louisiana State University
Adjunct Associate Professor, Electrical and Computer Engineering, LSU
2 Advancing Science through CI – [email protected]
Topics
• What we did in Louisiana from 2006-2010
• What I would do differently now
• A short video to highlight some additional issues that I hope the Center for Computational Engineering & Sciences will keep in mind
Louisiana
• Area: 134,382 km2 (33/51)
• Population: 4,533,000 (2010, 25/51)
• GDP: $208 billion (2009, 24/51)
• GDP/person: $45,700 (2009, 21/51)
• In Poverty: 17% (2009, 44/51)
• High School Degree: 82% (2009, 46/51)
• BS Degree: 21% (2009, 47/51)
• Advanced Degree: 7% (2009, 48/51)
State Goals: talented workforce, greater competitiveness, strong educational system, increased economic development
PITAC Report Summary:
• "Computational science -- the use of advanced computing capabilities to understand and solve complex problems -- is critical to scientific leadership, economic competitiveness, and national security. It is one of the most important technical fields of the 21st century because it is essential to advances throughout society."
• "Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science"
Complex problems: Innovations will occur at boundaries
Big Science and Infrastructure
• Higgs boson discovery announced at CERN July 4, 2012
• Instrument: Large Hadron Collider (LHC)
• Infrastructure
  – Computing Hardware: Worldwide LHC Computing Grid (WLCG): 235,000 cores across 36 countries, including Open Science Grid (OSG, US), European Grid Infrastructure (EGI, Europe), ...
  – Data: ~20 PB of data created in 2011-2012
  – Software: grid middleware, physics analysis applications, ...
  – Networks
  – Education & Training
• Data generated centrally, moved (~3 PB/week) across multi-tiered infrastructure to be computed upon
Big Science and Infrastructure
• Hurricanes affect humans
• Multi-physics: atmosphere, ocean, coast, vegetation, soil
  – Sensors and data as inputs
• Humans: what have they built, where are they, what will they do?
  – Data and models as inputs
• Infrastructure:
  – Urgent/scheduled processing, workflow systems
  – Software applications, workflows
  – Networks
  – Decision-support systems, visualization
  – Data storage, interoperability
Long-tail Science and Infrastructure
• Exploding data volumes & powerful simulation methods mean that more researchers need advanced infrastructure
• Such "long-tail" researchers cannot afford expensive expertise and unique infrastructure
• Challenge: Outsource and/or automate time-consuming common processes
  – Tools, e.g., Globus Online and data management
    o Note: much LHC data is moved by Globus GridFTP, e.g., May/June 2012: >20 PB, >20M files
  – Gateways, e.g., nanoHUB, CIPRES, access to scientific simulation software
[Figure: NSF grant size, 2007. ("Dark data in the long tail of science", B. Heidorn)]
Long-tail Science and Infrastructure
• CIPRES Science Gateway for Phylogenetics
  – Study of diversification of life and relationships among living things through time
• Highly used, as of mid 2013:
  – Cited in at least 400 publications, e.g., Nature, PNAS, Cell
  – More than 5000 unique users in 3 years
  – Used routinely in at least 68 undergraduate classes
  – 45% US (including most states), 55% from 70 other countries
• Infrastructure
  – Flexible web application
    o A science gateway; uses software and lessons from the XSEDE gateways team, e.g., identity management, HPC job control
  – Science software: tree inference and sequence alignment
    o Parallel versions of MrBayes, RAxML, GARLI, BEAST, MAFFT
    o PAUP*, POY, ClustalW, CONTRAlign, FSA, MUSCLE, ...
  – Data
    o Personal user space for storing results
    o Tools to transfer and view data
Credit: Mark Miller, SDSC
Infrastructure Challenges
• Science
  – Larger teams, more disciplines, more countries
• Data
  – Size, complexity, rates all increasing rapidly
  – Need for interoperability (systems and policies)
• Systems
  – More cores, more architectures (GPUs), more memory hierarchy
  – Changing balances (latency vs. bandwidth)
  – Changing limits (power, funds)
  – System architecture and business models changing (clouds)
  – Network capacity growing; increased networking -> increased security needs
• Software
  – Multiphysics algorithms, frameworks
  – Programming models and abstractions for science, data, and hardware
  – V&V, reproducibility, fault tolerance
• People
  – Education and training
  – Career paths
  – Credit and attribution
Cyberinfrastructure
"Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible." -- Craig Stewart
Computational & Data-enabled Science & Engineering (CDS&E)
• LIGO: Laser Interferometer Gravitational-Wave Observatory
• Ties together theory, computation, and experiment
  – Each drives the other two!
How We Started
• State commitment: $25M/year for Vision 20/20
  – $9M: LSU -> CCT (similarly, ULL -> LITE)
• University commitment to build new programs for the 21st century
• State and University willingness to make extraordinary investments
• Opportunity to build a new world-class program in interdisciplinary research and education, involving all of LSU
• Ed Seidel-led vision to instigate state-wide collaboration
Advancing Research
• Potentially requires advances in three areas, depending on existing strengths
CCT Organization
[Org chart: Director Office (Edward Seidel); HPC Partnership (McMahon); Cyberinfrastructure Development (Katz); Focus Areas (Allen); Corporate Relations. Units include: LONI Systems and Software; LSU HPC; Blue Waters, etc.; NSF TeraGrid; Performance Team; Core Comp. Sci.; Labs: ACAL, DSL, Viz, LCAT, ...; Focus areas: Coast to Cosmos, Material World, Cultural Computing, Visualization]
Cyberinfrastructure Development
• Vision: combine research and infrastructure
  – Research
    o Computer science
    o Applications
    o Tools
  – Infrastructure
    o Hardware
    o Operations
    o Policies
• Both together have squared growth of either alone
• CyD staff
  – PhDs in CS and applications who understand the whole picture and want to grow the ecosystem
Computing in Louisiana
[Diagram: National Lambda Rail connecting UNO, Tulane, UL-L, SUBR, LSU, LA Tech]
• LONI: 40 Gbps network
• LONI: ~100 TF IBM, Dell supercomputers
• CyberTools: tools and services
• LONI Institute: people and collaborations
• Connected to TeraGrid, OSG
LONI - Networking & Computing
[Map: network of partners and customers - sites at LSU, La Tech, LSU HSC (Shreveport and New Orleans), ULL, Tulane, SU, UNO, ULM, McNeese, NSU, SLU, Alex; legend: LONI node; multiple 10GE links; ~500-core Dell cluster & 112-processor IBM P5 cluster; ~4500-core Dell cluster]
LONI Computing Resources (2010)
• One central Dell cluster (Queen Bee)
  – 5500 IB-connected cores at ISB in Baton Rouge
  – Archival storage contracted through NCSA
  – 50% of allocations dedicated to TeraGrid from 2008
• Six distributed 512-core Dell clusters
• Five distributed 14-node (112-processor) IBM P5-575 clusters
• Distributed PetaShare storage
  – 32 TB disk @ each small Dell cluster
  – 8 TB disk on LSU & LaTech small Dell clusters - for LBRN
  – 8 TB at SC-S & HSC-NO - for LBRN
  – 250 TB tape
• All run by HPC@LSU, including user support/training
$12M NSF CyberTools Project: Enabler and Driver
Cactus
• Component-based HPC framework
  – Freely-available environment for collaborative application development
• Cutting-edge CS
  – Grid computing, petascale, accelerators, steering, remote viz
• Active user & developer communities
  – 10-year pedigree, >$10M support
  – Numerical Relativity, CFD, Coastal, Reservoir Engineering, ...
• Domain-specific toolkits, e.g., CFD toolkit
  – FD/FV/FE numerical methods
  – Structured, multi-block, unstructured
  – Uses PETSc, Trilinos, MUMPS, HYPRE
  – Used to build Black Oil Toolkit
PetaShare
• Main concept: data is managed (migrated, moved, replicated, cached, etc.) automatically
• Data-aware storage systems, data-aware schedulers, cross-domain metadata scheme
• Provides: 250 TB disk, 400 TB tape storage (and access to national storage facilities)
• Applications: coastal & environmental modeling, geospatial analysis, bioinformatics, medical imaging, fluid dynamics, petroleum engineering, numerical relativity, high energy physics
Credit: Tevfik Kosar
LONI Institute - a "CCT for Louisiana"
• $15M 5-year project
  – $7M BoR, $8M from LaTech, LSU, SUBR, Tulane, UNO, ULL
• Catalyzes new inter-institutional collaborations, ambitious projects, and top-level hires:
  – LONI network and computing
  – NSF projects: PetaShare, VizTangibles, TeraGrid, Blue Waters
  – EPSCoR: NSF CyberTools, DOE UCoMS, DoD
  – NIH: $17M LBRN
  – Promote collaborative research at interfaces for innovation
LONI Institute Vision
• LONI investments create world-leading infrastructure
• Create bold new inter-university superstructure
  – New faculty, staff, students; train others. Focus on CS, Bio, Materials, but all disciplines impacted
  – Promote research at interfaces for innovation
• Draw on, enhance strengths of all universities
  – Strong groups recently created; collectively world-class
  – Solve complex problems through collaboration & computation
  – Much stronger recruiting opportunities for all institutions
  – Statewide interdisciplinary education & research program
• Create University-Industry Research Centers (UIRCs)
  – Research Triangle, NCSA/UIUC, Bay Area, others
• Transform Louisiana
  – Such committed cooperation between sites is extraordinary
LONI Institute Hiring and Projects
• Two new faculty at each institution (12 total)
  – Six in CS, six in Comp. Bio/Materials
• Six Computational Scientists
  – Following the Bavarian KONWIHR project
  – Support 70-90 projects over five years; lead to external funding
• Graduate students
  – 36 new students funded, trained; two years each
• One Coordinator/economic development
• All hiring coordinated across state
• Leading faculty across state create multi-institutional seed projects
• Building on seeds, dozens of new projects selected, started
• Exploit common themes, computing environments, tools found in all areas
TeraGrid (XSEDE)
• TeraGrid: world's largest open scientific discovery infrastructure
• Leadership-class resources at eleven partner sites combined to create an integrated, persistent computational resource
  – High-performance networks
  – High-performance computers (>1 Pflops (~100,000 cores) -> 1.75 Pflops)
    o And a Condor pool (w/ ~13,000 CPUs)
  – Visualization systems
  – Data collections (>30 PB, >100 discipline-specific databases)
  – Science Gateways
  – User portal
  – User services - help desk, training, advanced app support
• Allocated to US researchers and their collaborators through a national peer-review process
  – Generally, review of computing, not science
• Mid 2011: TeraGrid -> XSEDE
Campus Champions
• A "Champion" is a staff or faculty member on a campus who provides information on XSEDE to his/her colleagues
• Currently ~160 institutions represented by champions
• Champions get:
  – Monthly training and updates
  – Start-up accounts
  – Forum for sharing and interactions
  – Access to information on usage by local users
  – Registrations for annual XSEDE conference waived
• Champions do:
  – Raise awareness locally
  – Provide training
  – Get users started with access quickly
  – Represent needs of local community
  – Provide feedback to improve services
  – Attend annual XSEDE conference
  – Share their training and education materials
  – Build community across campus, and among all Champions
• Campus Champion institutions (as of March 22, 2014):
  – Standard: 87
  – EPSCoR states: 51
  – Minority Serving Institutions: 12
  – EPSCoR states and Minority Serving Institutions: 8
  – Total Campus Champion institutions: 158
Credit: Kay Hunt
LONI and National Cyberinfrastructure
• TeraGrid
  – One of the 11 TeraGrid Resource Providers
  – Playing a role in TG-wide governance (TeraGrid Forum, Executive Steering Committee, various working groups, GIG Director of Science)
  – Contributed administrative software AmieGold (glue between TG account info and local info) and CS software (HARC, PetaShare, SAGA)
• OSG
  – Currently providing resources
• XSEDE
  – LONI not a partner in XSEDE, but a service provider
• Nationally
  – Bringing in new users from the southeast US
  – LONI Institute Computational Scientists -> Campus Champions
NSF Vision: Infrastructure Role & Lifecycle
• Create and maintain a CI ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
• Support the foundational research needed to continue to efficiently advance CI
• Enable transformative, interdisciplinary, collaborative science and engineering research and education through the use of advanced CI
• Transform practice through new policies for CI addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of authorship
• Develop a next-generation diverse workforce of scientists and engineers equipped with essential skills to use and develop CI, with CI used in both the research and education process
Relevant NSF Programs
• EPSCoR - targeted support for states that are less successful in NSF funding
• MRI - Major Research Instrumentation
• CIF21 (NSF's CI umbrella)
  – eXtreme Digital (XD)
  – Track 1 (Blue Waters)
  – Software Infrastructure for Sustained Innovation (SI2)
  – Campus Cyberinfrastructure - Network Infrastructure and Engineering (CC-NIE)
• Integrative Graduate Education and Research Traineeship Program (IGERT)
• General research programs
Recap (to 2010)
• Louisiana decides that science and technology can lead to a better future
• Builds a regional cyberinfrastructure (network, computing, software, ~data, people) that connects to national-scale infrastructure
  – Using a mix of national, state, and local funding
• Starts to change culture - infuse computation in academic departments, interdisciplinary hiring, large collaborative projects
• But... we didn't really think about data as much as we would have were we starting again today
• Swift is designed to compose large parallel workflows, from serial or parallel application programs, to run fast and efficiently on a variety of platforms
  – A parallel scripting system for grids and clusters for loosely-coupled applications - programs (executable, shell, Python, R, Octave, MATLAB, etc.) linked by exchanging files
  – Easy to write: a simple high-level C-like functional language allows small Swift scripts to do large-scale work
  – Easy to run: contains all services for running, in one Java application
    o Works on multicore workstations, HPC, grids (interfaces to schedulers, Globus, ssh)
  – A powerful, efficient, scalable and flexible execution engine
    o Scaling to O(10M) tasks - 0.5M in live science work, and growing
    o Collective data management being developed to optimize I/O
• Used in earth science, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, knowledge modeling, and more
• http://www.ci.uchicago.edu/swift
M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, I. Foster, "Swift: A language for distributed parallel scripting," Parallel Computing, v. 37(9), pp. 633-652, 2011.
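Swift's model of loosely-coupled programs linked by exchanging files can be sketched in plain Python. This is an illustrative stand-in, not Swift itself: the `run_app` function and file names are invented for the example, where a real Swift script would declare external executables as apps.

```python
import tempfile
from pathlib import Path

# Minimal sketch of file-linked composition: each "app" reads an input
# file and writes an output file, the way Swift links programs by files.
def run_app(in_file: Path, out_file: Path) -> None:
    # Hypothetical stand-in for an external executable; it simply
    # doubles the integer stored in in_file.
    value = int(in_file.read_text())
    out_file.write_text(str(value * 2))

workdir = Path(tempfile.mkdtemp())

# Stage inputs: one file per task, mirroring Swift's file-valued variables.
inputs = []
for i in range(4):
    f = workdir / f"in_{i}.txt"
    f.write_text(str(i))
    inputs.append(f)

# Compose a two-stage workflow: outputs of stage 1 feed stage 2.
stage1 = [workdir / f"mid_{i}.txt" for i in range(4)]
stage2 = [workdir / f"out_{i}.txt" for i in range(4)]
for src, mid, dst in zip(inputs, stage1, stage2):
    run_app(src, mid)  # stage 1 produces the intermediate file
    run_app(mid, dst)  # stage 2 can run only once its input file exists

results = [int(p.read_text()) for p in stage2]
print(results)  # each input i is doubled twice: [0, 4, 8, 12]
```

In Swift the file dependencies themselves drive scheduling, so independent chains like these would run concurrently without the explicit loop.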
Swift programming model: all execution driven by parallel data flow
• analyze1() and analyze2() are computed in parallel
• analyze() returns r when they are done
• This parallelism is automatic
• Works recursively throughout the program's call graph
  – E.g., can embed within a foreach loop, itself done in parallel
  – foreach loops can be nested

(int r) analyze(int i)
{
  j = analyze1(i);
  k = analyze2(i);
  r = 0.5*(j + k);
}
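A rough Python analogue of the dataflow above, using concurrent.futures. The bodies of analyze1/analyze2 are made up for illustration; the point is that where Swift parallelizes the two calls implicitly, here the parallelism must be requested explicitly with futures.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two independent analysis steps.
def analyze1(i: int) -> int:
    return i * 2

def analyze2(i: int) -> int:
    return i * 3

def analyze(i: int) -> float:
    # Swift runs analyze1(i) and analyze2(i) in parallel automatically;
    # with futures we spell that parallelism out.
    with ThreadPoolExecutor(max_workers=2) as pool:
        j = pool.submit(analyze1, i)
        k = pool.submit(analyze2, i)
        # r is produced only when both inputs are ready, as in Swift.
        return 0.5 * (j.result() + k.result())

print(analyze(4))  # 0.5 * (8 + 12) = 10.0
```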
Swift Environment
[Diagram: a Swift script runs from a submit host (login node, laptop, Linux server), with a data server and application programs, dispatching work to clouds (Amazon EC2, XSEDE Wispy, FutureGrid, ...) and other resources]
• The Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments
Globus
• Big data transfer and sharing...
  ...with Dropbox-like simplicity...
  ...directly from your own storage systems
• Run as a non-profit service to the non-profit research community
Globus Users
• “I need a good place to store or backup my (big) research data, at a reasonable price.”
• "I need to easily, quickly, and reliably move or mirror portions of my data to other places, including my campus HPC system, lab server, desktop, laptop, XSEDE, cloud, etc."
• "I need a way to easily and securely share my data with my colleagues at other institutions."
• "I want to publish my data so that it's available and discoverable long-term."
• "I want to archive my data in case it's needed sometime in the future."
Globus is SaaS
• Web, command line, and REST interfaces
• Reduced IT operational costs
• New features automatically available
• Consolidated support & troubleshooting
• Easy to add your laptop, server, cluster, supercomputer, etc. with Globus Connect
Globus Connected Resources on Campus
• Research computing center
• Department / lab storage
• Campus-wide home/project file system
• Mass storage systems
• Science instruments
• Desktops and laptops
• Custom web applications
• Amazon Web Services S3
Lessons
• The three triangle facets (infrastructure, computational, interdisciplinary) must be taken seriously at the highest levels, and seen as an important component of academic research
• Infrastructure needs to be integrated at all levels (laboratory, campus, regional, national, international) - users need to be able to easily move work and data to appropriate systems, and collaborate across locations
• Education and training of students and faculty is crucial - vast improvements are needed over the small numbers currently reached through HPC center tutorials; computation and computational thinking need to be part of new curricula across all disciplines
• Emphasis should be placed on broadening participation in computation: not just focusing on high-end systems where decreasing numbers of researchers can join in, but making tools much more easily usable and intuitive, freeing all researchers from the limitations of their personal workstations, and providing access to simple tools for large-scale parameter studies, data archiving, visualization and collaboration
• Vision needs to be consistent - cannot be just one person
• Funding needs to be stable (activities need to be sustainable)
Video
• Data Sharing - https://www.youtube.com/watch?v=N2zK3sAtr-4
Sources
• D. S. Katz et al., "Louisiana: A Model for Advancing Regional e-Science through Cyberinfrastructure," Philosophical Transactions of the Royal Society A, 367(1897), 2009.
  – Authors from Louisiana State University, Tulane University, University of Louisiana at Lafayette, Louisiana Tech University, Louisiana Community and Technical College System, Southern University, University of New Orleans
• G. Allen and D. S. Katz, "Computational science, infrastructure and interdisciplinary research on university campuses: experiences and lessons from the Center for Computation and Technology," NSF Workshop on Sustainable Funding and Business Models for Academic Cyberinfrastructure Facilities, Cornell University, 2010.
• D. S. Katz and D. Proctor, "A Framework for Discussing e-Research Infrastructure Sustainability," http://dx.doi.org/10.6084/m9.figshare.790767, submitted to the Workshop on Sustainable Software for Science: Practice and Experiences (http://wssspe.researchcomputing.org.uk) at SC13.
• Swift: Swift Team, led by Mike Wilde, http://www.ci.uchicago.edu/swift
• Globus: Globus Team, led by Ian Foster and Steve Tuecke, http://www.globus.org