Bridging Campuses to National Cyberinfrastructure:
Experience and Perspective from the NSF
Dr. Jennifer M. Schopf
National Science Foundation
Office of CyberInfrastructure
May 3, 2009
Outline
OCI mission and the CF21 Vision
How NSF supports campus work
Sustain, Advance, Experiment
Bridging from campus to national and back
Framing the Question
Science has been Revolutionized by CI
Modern science
Data- and compute-intensive
Integrative
Multiscale
Collaborations add complexity
Individuals, groups, teams, communities
Must transition NSF CI approach to address these issues
NSF Vision for Cyberinfrastructure
“National-level, integrated system of hardware, software, data resources & services... to enable new paradigms of science”
Virtual Organizations for Distributed Communities
High Performance Computing
Data & Visualization/Interaction
Learning & Work Force Needs & Opportunities
http://www.nsf.gov/pubs/2007/nsf0728/index.jsp
Office of Cyberinfrastructure (OCI)
Support collaborative computational and data science
Research and development of a comprehensive CI
Application of CI to solve complex problems in science, engineering, behavioral science, economics and education
Provide stewardship for computational science at NSF, in strong collaborations with other offices, directorates, and other agencies
Support the preparation and training of current and future generations of researchers and educators to use Cyberinfrastructure to further research and education goals
Cyberinfrastructure Framework for the 21st Century (CF21)
High-end computation, data, visualization for transformative science
Facilities/centers as hubs of innovation
MREFCs and collaborations, including large-scale NSF collaborative facilities, international partners
Software, tools, science applications, and VOs critical to science, integrally connected to instruments
Campuses fundamentally linked end-to-end; grids, clouds, loosely coupled campus services, policy to support
People: comprehensive approach to workforce development for 21st century science and engineering
What is Needed?
An ecosystem, not components…
NSF-wide CI Framework for 21st Century Science & Engineering
People, Sustainability, Innovation, Integration
CyberInfrastructure Ecosystem
Expertise: Research and Scholarship; Education; Learning and Workforce Development; Interoperability and ops; Cyberscience
Organizations: Universities, schools; Government labs, agencies; Research and Med Centers; Libraries, Museums; Virtual Organizations; Communities
Scientific Instruments: Large Facilities, MREFCs, telescopes; Colliders, shake tables; Sensor Arrays (ocean, env’t, weather, buildings, climate, etc.)
Computational Resources: Supercomputers; Clouds, Grids, Clusters; Visualization; Compute services; Data Centers
Data: Databases, Data reps, Collections and Libs; Data Access; stor., nav, mgmt, mining tools, curation
Networking: Campus, national, international networks; Research and exp networks; End-to-end throughput; Cybersecurity
Software: Applications, middleware; Software dev’t & support; Cybersecurity: access, authorization, authen.
Discovery, Collaboration, Education
Sustain, Advance, Experiment
What Does Sustainability Mean?
“Ability to maintain a certain process or state”
In a biological context
Resources must be used at a rate at which they can be replenished
In a CI context
Creating CI that can be used in broad contexts (reuse)
Adopting approaches to funding that encourage long-term support (beyond normal NSF grants)
We should fund and view CyberInfrastructure as Infrastructure
National level
Fund same as telescopes, colliders, shake tables
Line items in the directorate budgets
Constant or growing over time, reliably
Factor in “maintenance” and “replacement”
NSF supports the science that a campus can’t fund at a sustainable level
Campus level
Campus should fund CI the same way it does other infrastructure
• Libraries, phone system (clean rooms?)
Constant or growing over time, reliably
Note:
The answer is not more money from NSF
More money, even if we had it, which we don’t, won’t address the fundamental problems
We need to spend the money we have more wisely
We need to understand cost models and return on investments
What are the best practices we’re missing?
How can we leverage existing support?
Where could a small investment of funds have the most significant impact?
ACCI Task Forces
Task forces: Campus Bridging; Data (Viz); Software; Computing (Clouds, Grids); Grand Challenge VOs; Education, Workforce
People: Craig Stewart, Shenda Baker, Tony Hey, David Keyes, Thomas Zacharia, Alex Ramerez, Tinsley Oden
Timelines: 12-18 months
Advising NSF
Workshop(s)
Recommendations
Input to NSF informs CF21 programs, 2011-2 CI Vision Plan
Campus Bridging Task Force
Goal of Virtual Proximity – as though you are one with your resources (including people)
Collapse barrier of distance
Remove geographic location as an issue
All resources virtually present, accessible, secure
Leverages, informs, and depends upon the whole suite of CI elements
HPC, Vis, Data, Software, Expertise, VOs, etc.
Provides end-to-end connectivity
Deployment of leading edge networking infrastructure, cybersecurity to support CF21
Driving Forces
Need to support the efficient pursuit of S&E
Multi-domain, multi-disciplinary, multi-location
Leading edge CI network capabilities
Seamless integration
Need to connect Researcher to Resource
Access to major scientific resources and instruments
CI resource availability – at speed and in real-time
• (HPC, MREFC, Data Center, Vis center, Clouds, etc.)
Campus environment including intra-campus
State, regional, national and international network and infrastructure transparency
The Shift Towards Data: Implications
All science is becoming data-dominated
Experiment, computation, theory
Totally new methodologies
Algorithms, mathematics
All disciplines from science and engineering to arts and humanities
End-to-end networking becomes critical part of CI ecosystem
Campuses, please note!
How do we train “data-intensive” scientists?
Data policy becomes critical!
Preliminary Task Force (TF) Results
Computing TF Workshop Interim Report
Rec: Address sustainability, people, innovation
Software TF Interim Report
Rec: Address sustainability, create long term, multi-directorate, multi-level software program
GCC/VO TF Interim Report
Rec: Address sustainability, OCI to nurture computational science across NSF units
Software Sustainability WS (Campus Bridging)
Rec: Open source, use sw eng practices, reproducibility
Innovation vs Sustainability
Tension between:
Bleeding edge & tried and true
Novel and new & dependable
Might have a new way & method that always works
We need a spectrum of approaches
Allow broad scale innovation
Continue to advance approaches
Yet sustain scientific disciplines
Over-arching Approach For Upcoming Programs
Sustain
Large-scale “Institute”-style projects to promote long term approaches
Long term (5+ years), many PIs, and institutions
Highly multi-disciplinary, perhaps multi-agency
Advance
Medium-scale collaborative teams to harden and expand successful experiments
Collaborative teams, multi-year (3-5)
Experiment
Smaller scale, trials of new approaches
CF21 Software Infrastructure for Sustained Innovation (SI2)
Significant multiscale, long-term software program
Perhaps $200-300M over a decade
• $10M identified in FY10 ($4M OCI/$6M Dirs)
• $14M annual in OCI in future years
– Catalyze significant funds from Dirs
Sustain: Connected institutes, teams, investigators
Integrated into CF21 framework w/Dirs
3-6 centers, 5+5 years, for critical mass, sustainability
Advance: Numerous teams of scientists and computational and computer scientists with longer term grants
Experiment: Many individuals w/short term grants, funded by OCI and directorates
Software, continued
Ongoing discussions to build this program across NSF
Some of the institutes will be discipline specific
Some may be algorithm/tool themed (e.g., data, provenance, viz)
All should be fundamental to other programs (e.g., SEES)
Education, science applications, industrial partners linked deeply
MREFCs, other large facilities need to participate
iPlant, NEON, LSST, etc…
Scientific Software Innovation Institutes
Call for Exploratory Workshop Proposals
Scale and complexity beyond community experience
Many unknowns: models, modes, scales, ….
• domain, community specific aspects…
• crosscutting aspects and many links…
Must be grown bottom up in a coordinated way
Smaller groups evolving into community wide teams and institutes
Must leverage existing investments, expertise
Collaborations across communities, disciplines and directorates critical
Exploratory activities during the summer – Call for Exploratory Workshop Proposals
http://www.nsf.gov/pubs/2010/nsf10050/nsf10050.jsp?org=NSF
Goals of S2I2 Workshops
Inform NSF in its writing of the solicitation
Inform the community as it responds to the solicitation in FY11
Provide a forum for discussions about the SI2 vision, and S2I2 models and structures within and across communities
Software Infrastructure for Sustained Innovation (SI2): Metrics of Success
(Beyond Lines of Code)
Buy-in from the broader community
Demonstrated leverage and reuse
Emergence of successful models, processes, architectures, metrics for S&E software – empirically validated
Emergence of models and mechanisms for community sustainability of software institutes
Accepted research agenda by academic community
Open Source
Requirement for all current OCI programs
And many others across NSF
Strongly encourages reuse
Some people think simply open source is enough – it’s not
Necessary but not sufficient for sustainable software
Open Source software is free…
Free as in speech… free as in beer, or…
Open Source Software is Like a Free Puppy
Seems like a great bargain
Easy to access
Can catch your eye at a weak moment
…but sometimes more than you expected
Long term costs
Needs love and attention
May lose charm after growing up
Occasional clean-ups required
Many left abandoned by their owners
May not be quite what you think
Data Programs
DataNet: OCI Flagship Data Program
Focus on data-level interoperability and data preservation
Sustain: 5 Centers, $20M, 5 years (+5)
Advance: e.g. SDCI awards
~3-4 year, $1-2M, support of data tools for broad set of applications and disciplines
Experiment: e.g. InterOp awards
Smaller scale, innovative use of data for new communities
2008 DataNet Awards
DataNet Observation Network for Earth (PI: Michener)
Facilitates research on climate change and biodiversity, integrating earth observing networks
Emphasis on user community engagement, promote data deposition and re-use
Science question: What are the relationships among population density, atmospheric nitrogen, CO2, energy consumption and global temps?
Data Conservancy (PI: Choudhury)
Integrates observational data to enable scientists to identify causal and critical relationships in physical, biological, ecological, and social systems
User centered design paradigm, ethnographic studies
Science question: How do land and energy use in mega-cities impact the carbon cycle and climate change?
Planned CF21 HPC Program
Sustain: Petascale-to-Exascale
1-2 Large-scale sustainable facilities
Likely NSF-DOE cooperation
10 years (5+5)
UIUC Petascale Facility: $60M building!
Advance
4-5 hubs of Excellence/Innovation, people, expertise
Mixture of data and compute-intensive centers, supporting broader array of services
Experiment
Explore new architectures, couple with application/software dev
HPC Will Also Need
Discipline specific connections
MRI, Divisional, Directorate programs can be aligned to connect in to this NSF-wide structure
• Recommended common software, identity management, policy
• Data, software sharing
How does eXtreme Digital (XD), TeraGrid Phase 3 fit in?
Competition underway now
Foundation to build broader CF21 services in future at the national level
Outside of SW, Data, and HPC
Postdoc program: CITracs
Emphasis on helping computational scientists learn about CI or vice versa
http://www.nsf.gov/pubs/2010/nsf10553/nsf10553.htm
CI-TEAM: Training, Education, Advancement, and Mentoring for Our 21st Century WF
Prepare current and future generations of scientists, engineers, and educators
Design, develop, adopt and deploy cyber-based tools and environments for research and learning, both formal and informal
http://www.nsf.gov/pubs/2010/nsf10532/nsf10532.pdf
Cross Directorate: SW, Data and HPC interacting
Track 2, DataNet, SDCI, PetaApps, MRI
CF21 Strategy
Driven by science and engineering
Intense coupling of data, sensors, satellites, computing, visualization, grids, software, VOs; entire CI ecosystem
Better campus integration
Major Facilities CI planning
Task Forces and research community provide guidance and input
All NSF Directorates involved
Sustain, Advance, Experiment
ARRA Catalyzed OCI Transition
FY 09 Budget (Before ARRA)
Budget Initiatives: 1.5%
Virtual Organizations
Other: 1.79%
Software: 6.31%
Workforce Development: 4.06%
Networking: 3.97%
Data: 3.45%
HPC: 77.21%
Recovery Act Funds
Software: 51.68%
HPC: 21.25%
Workforce Development: 14.38%
Budget Initiatives: 7.69%
Virtual Organizations: 5.01%
Chart notes: Includes Viz; Includes PetaApps; Includes GRF, CAREER
ARRA Catalyzed OCI Transition
FY 09 Budget (After ARRA)
HPC: 61.19%
Software: 19.07%
Workforce Development: 4.55%
Data: 4.03%
Budget Initiatives: 3.50%
Networking: 2.84%
Virtual Organizations: 2.45%
Other: 2.37%
OCI BUDGET BREAKDOWN
Underestimations (and education)
Support costs are often underestimated
Grad student support is cheap (except when it isn’t)
Space – Cooling – Power triumvirate
People forget about data, networking, software (software licensing)
Duplication of services vs need for special architectures
Branscomb Pyramid: National to Campus
OCI Focus
MRI & others
Slide from Gary Crane, http://sura.org/programs/docs/CI_White_Paper_Final.pdf
“Beyond Branscomb”, Sept 2006
Broaden Awareness through CI Days
Work with campuses to develop leadership in promoting CI to accelerate scientific discovery
Catalyze campus-wide and regional discussions and planning
Collaboration of Open Science Grid, Internet2, National LambdaRail, EDUCAUSE, Minority Serving Institution Cyberinfrastructure Empowerment Coalition, TeraGrid, and local & regional organizations
Identify Campus Champions
https://wiki.internet2.edu/confluence/display/cidays
TG Campus Champions Program
Source of local, regional and national high performance computing and cyberinfrastructure information at home campus
Source of information about TeraGrid resources and services that will benefit their campus
Source of startup accounts to quickly get researchers and educators using their allocation of time on the TeraGrid resources
Direct access to TeraGrid staff
https://www.teragrid.org/web/eot/campus_champions
An Idea from EPSCoR
State-wide CI plans
State-wide CI proposals and funding
How to measure return on investment?
Must measure to improve
Must measure to justify additional funds at all levels
Would love to hear suggestions!
CyberInfrastructure Ecosystem
Expertise; Organizations; Scientific Instruments; Computational Resources; Data; Networking; Software
Discovery, Collaboration, Education
Sustain, Advance, Experiment
Conclusions
Campus HPC is more than just machines
Posit: Better central computing attracts more grants (and researchers)
Treat CI as infrastructure
NSF continues to fund national-scale CI
Campus-scale CI should be part of campus strategic plan
Ecosystem approach
Sustain, Advance, Experiment
Bridging is an urgent need
More Information
Jennifer M. Schopf [email protected]
Dear Colleague letter for CF21
http://www.nsf.gov/pubs/2010/nsf10015/nsf10015.jsp
Software infrastructure for sustained innovation
http://www.nsf.gov/pubs/2010/nsf10551/nsf10551.pdf
S2I2 workshop DCL
http://www.nsf.gov/pubs/2010/nsf10050/nsf10050.jsp