Bridging Campuses to National Cyberinfrastructure: Experience and Perspective from the NSF Dr. Jennifer M. Schopf National Science Foundation Office of CyberInfrastructure May 3, 2009


  • Bridging Campuses to National Cyberinfrastructure:

    Experience and Perspective from the NSF

    Dr. Jennifer M. Schopf
    National Science Foundation
    Office of CyberInfrastructure

    May 3, 2009

  • Outline

    OCI mission and the CF21 Vision

    How NSF supports campus work
    • Sustain, Advance, Experiment

    Bridging from campus to national and back

  • Framing the Question: Science has been Revolutionized by CI

    Modern science
    • Data- and compute-intensive
    • Integrative

    Multiscale collabs add’l complexity
    • Individuals, groups, teams, communities

    Must transition NSF CI approach to address these issues

  • NSF Vision for Cyberinfrastructure

    “National-level, integrated system of hardware, software, data resources & services... to enable new paradigms of science”

    Virtual Organizations for Distributed Communities

    High Performance Computing

    Data & Visualization/Interaction

    Learning & Work Force Needs & Opportunities

    http://www.nsf.gov/pubs/2007/nsf0728/index.jsp

  • Office of Cyberinfrastructure (OCI)

    Support collaborative computational and data science
    • Research and development of a comprehensive CI
    • Application of CI to solve complex problems in science, engineering, behavioral science, economics and education

    Provide stewardship for computational science at NSF, in strong collaborations with other offices, directorates, and other agencies

    Support the preparation and training of current and future generations of researchers and educators to use Cyberinfrastructure to further research and education goals

  • Cyberinfrastructure Framework for the 21st century (CF21)

    High-end computation, data, visualization for transformative science
    • Facilities/centers as hubs of innovation

    MREFCs and collaborations including large-scale NSF collaborative facilities, international partners

    Software, tools, science applications, and VOs critical to science, integrally connected to instruments

    Campuses fundamentally linked end-to-end; grids, clouds, loosely coupled campus services, policy to support

    People: comprehensive approach to workforce development for 21st century science and engineering

  • What is Needed? An ecosystem, not components…

    NSF-wide CI Framework for 21st Century Science & Engineering

    People, Sustainability, Innovation, Integration

  • CyberInfrastructure Ecosystem

    Expertise
    • Research and Scholarship
    • Education
    • Learning and Workforce Development
    • Interoperability and ops
    • Cyberscience

    Organizations
    • Universities, schools
    • Government labs, agencies
    • Research and Med Centers
    • Libraries, Museums
    • Virtual Organizations
    • Communities

    Scientific Instruments
    • Large Facilities, MREFCs, telescopes
    • Colliders, shake tables
    • Sensor arrays: ocean, env’t, weather, buildings, climate, etc.

    Computational Resources
    • Supercomputers
    • Clouds, Grids, Clusters
    • Visualization
    • Compute services
    • Data Centers

    Data
    • Databases, data reps, collections and libs
    • Data access; stor., nav mgmt, mining tools, curation

    Networking
    • Campus, national, international networks
    • Research and exp networks
    • End-to-end throughput
    • Cybersecurity

    Software
    • Applications, middleware
    • Software dev’t & support
    • Cybersecurity: access, authorization, authen.

    Discovery, Collaboration, Education

    Sustain, Advance, Experiment

  • What Does Sustainability Mean?

    “Ability to maintain a certain process or state”

    In a biological context
    • Resources must be used at a rate at which they can be replenished

    In a CI context
    • Creating CI that can be used in broad contexts (reuse)
    • Adopting approaches to funding that encourage long-term support (beyond normal NSF grants)

  • We should fund and view CyberInfrastructure as Infrastructure

    National level
    • Fund same as telescopes, colliders, shake tables
    • Line items in the directorate budgets
    • Constant or growing over time, reliably
    • Factor in “maintenance” and “replacement”
    • NSF supports the science that a campus can’t fund at a sustainable level

    Campus level
    • Campus should fund CI the same way it does other infrastructure
      – Libraries, phone system (clean rooms?)
    • Constant or growing over time, reliably

  • Note:

    The answer is not more money from NSF
    • More money, even if we had it, which we don’t, won’t address the fundamental problems

    We need to spend the money we have more wisely

    We need to understand cost models and return on investments

    What are the best practices we’re missing?

    How can we leverage existing support?

    Where could a small investment of funds have the most significant impact?

  • ACCI Task Forces

    Task Forces
    • Campus Bridging: Craig Stewart
    • Data (Viz): Shenda Baker, Tony Hey
    • Software: David Keyes
    • Computing (Clouds, Grids): Thomas Zacharia
    • Grand Challenge, VOs: Tinsley Oden
    • Education, Workforce: Alex Ramirez

    Timelines: 12-18 months
    • Advising NSF
    • Workshop(s)
    • Recommendations
    • Input to NSF informs CF21 programs, 2011-2 CI Vision Plan

  • Campus Bridging Task Force

    Goal of Virtual Proximity – as though you are one with your resources (including people)
    • Collapse barrier of distance
    • Remove geographic location as an issue
    • All resources virtually present, accessible, secure

    Leverages, informs, and depends upon the whole suite of CI elements
    • HPC, Vis, Data, Software, Expertise, VOs, etc.
    • Provides end-to-end connectivity

    Deployment of leading edge networking infrastructure, cybersecurity to support CF21

  • Driving Forces

    Need to support the efficient pursuit of S&E
    • Multi-domain, multi-disciplinary, multi-location
    • Leading edge CI network capabilities
    • Seamless integration

    Need to connect Researcher to Resource
    • Access to major scientific resources and instruments
    • CI resource availability – at speed and in real-time
      – (HPC, MREFC, Data Center, Vis center, Clouds, etc.)
    • Campus environment including intra-campus
    • State, regional, national and international network and infrastructure transparency

  • The Shift Towards Data: Implications

    All science is becoming data-dominated
    • Experiment, computation, theory

    Totally new methodologies
    • Algorithms, mathematics
    • All disciplines from science and engineering to arts and humanities

    End-to-end networking becomes critical part of CI ecosystem
    • Campuses, please note!

    How do we train “data-intensive” scientists?

    Data policy becomes critical!

  • Preliminary Task Force (TF) Results

    Computing TF Workshop Interim Report
    • Rec: Address sustainability, people, innovation

    Software TF Interim Report
    • Rec: Address sustainability, create long term, multi-directorate, multi-level software program

    GCC/VO TF Interim Report
    • Rec: Address sustainability, OCI to nurture computational science across NSF units

    Software Sustainability WS (Campus Bridging)
    • Rec: Open source, use sw eng practices, reproducibility

  • Innovation vs Sustainability

    Tension between:
    • Bleeding edge & tried and true
    • Novel and new & dependable
    • Might have a new way & method that always works

    We need a spectrum of approaches
    • Allow broad scale innovation
    • Continue to advance approaches
    • Yet sustain scientific disciplines

  • Over-arching Approach For Upcoming Programs

    Sustain
    • Large-scale “Institute”-style projects to promote long term approaches
    • Long term (5+ years), many PIs, and institutions
    • Highly multi-disciplinary, perhaps multi-agency

    Advance
    • Medium-scale collaborative teams to harden and expand successful experiments
    • Collaborative teams, multi-year (3-5)

    Experiment
    • Smaller scale, trials of new approaches


  • CF21 Software Infrastructure for Sustained Innovation (SI2)

    Significant multiscale, long-term software program
    • Perhaps $200-300M over a decade
      – $10M identified in FY10 ($4M OCI/$6M Dirs)
      – $14M annual in OCI in future years
        – Catalyze significant funds from Dirs

    Sustain: Connected institutes, teams, investigators
    • Integrated into CF21 framework w/Dirs
    • 3-6 centers, 5+5 years, for critical mass, sustainability

    Advance: Numerous teams of scientists and computational and computer scientists with longer term grants

    Experiment: Many individuals w/short term grants, funded by OCI and directorates

  • Software, continued

    Ongoing discussions to build this program across NSF
    • Some of the institutes will be discipline specific
    • Some may be algorithm/tool themed (e.g., data, provenance, viz)
    • All should be fundamental to other programs (e.g., SEES)
    • Education, science applications, industrial partners linked deeply

    MREFCs, other large facilities need to participate
    • iPlant, NEON, LSST, etc…

  • Scientific Software Innovation Institutes: Call for Exploratory Workshop Proposals

    Scale and complexity beyond community experience

    Many unknowns: models, modes, scales, ….
    • domain, community specific aspects…
    • crosscutting aspects and many links…

    Must be grown bottom up in a coordinated way
    • smaller group evolving into community wide teams and institutes

    Must leverage existing investments, expertise

    Collaborations across communities, disciplines and directorates critical

    Exploratory activities during the summer – Call for Exploratory Workshop Proposals
    http://www.nsf.gov/pubs/2010/nsf10050/nsf10050.jsp?org=NSF

  • Goals of S2I2 Workshops

    Inform NSF in its writing of the solicitation

    Inform the community as it responds to the solicitation in FY11

    Provide a forum for discussions about the SI2 vision, and S2I2 models and structures within and across communities.

  • Software Infrastructure for Sustained Innovation (SI2): Metrics of Success (Beyond Lines of Code)

    Buy-in from the broader community

    Demonstrated leverage and reuse

    Emergence of successful models, processes, architectures, metrics for S&E software – empirically validated

    Emergence of models and mechanisms for community sustainability of software institutes

    Accepted research agenda by academic community

  • Open Source

    Requirement for all current OCI programs
    • And many others across NSF

    Strongly encourages reuse

    Some people think simply open source is enough – it’s not

    Necessary but not sufficient for sustainable software

  • Open Source software is free…

    Free as in speech… free as in beer, or…

  • Open Source Software is Like a Free Puppy

    Seems like a great bargain

    Easy to access

    Can catch your eye at a weak moment

    …but sometimes more than you expected

  • Long term costs

    Needs love and attention

    May lose charm after growing up

    Occasional clean-ups required

    Many left abandoned by their owners

    May not be quite what you think

  • Data Programs

    DataNet: OCI Flagship Data Program
    • Focus on data-level interoperability and data preservation

    Sustain: 5 Centers, $20M, 5 years (+5)

    Advance: e.g., SDCI awards
    • ~3-4 year, $1-2M, support of data tools for broad set of applications and disciplines

    Experiment: e.g., InterOp awards
    • Smaller scale, innovative use of data for new communities

  • 2008 DataNet Awards

    DataNet Observation Network for Earth (PI: Michener)
    • Facilitates research on climate change and biodiversity, integrating earth observing networks
    • Emphasis on user community engagement, promote data deposition and re-use
    • Science question: What are the relationships among population density, atmospheric nitrogen, CO2, energy consumption and global temps?

    Data Conservancy (PI: Choudhury)
    • Integrates observational data to enable scientists to identify causal and critical relationships in physical, biological, ecological, and social systems
    • User centered design paradigm, ethnographic studies
    • Science question: How do land and energy use in mega-cities impact the carbon cycle and climate change?

  • Planned CF21 HPC Program

    Sustain: Petascale-to-Exascale
    • 1-2 large-scale sustainable facilities
    • Likely NSF-DOE cooperation
    • 10 years (5+5)

    Advance
    • 4-5 hubs of Excellence/Innovation, people, expertise
    • Mixture of data and compute-intensive centers, supporting broader array of services

    Experiment
    • Explore new architectures, couple with application/software dev

    [Image: UIUC Petascale Facility: $60M building!]

  • HPC Will Also Need

    Discipline specific connections
    • MRI, Divisional, Directorate programs can be aligned to connect in to this NSF-wide structure
      – Recommended common software, identity management, policy
      – Data, software sharing

    How does eXtreme Digital (XD), TeraGrid Phase 3 fit in?
    • Competition underway now
    • Foundation to build broader CF21 services in future at the national level

  • Outside of SW, Data, and HPC

    Postdoc program: CITracs
    • Emphasis on helping computational scientists learn about CI or vice versa
    • http://www.nsf.gov/pubs/2010/nsf10553/nsf10553.htm

    CI-TEAM: Training, Education, Advancement, and Mentoring for Our 21st Century WF
    • Prepare current and future generations of scientists, engineers, and educators
    • Design, develop, adopt and deploy cyber-based tools and environments for research and learning, both formal and informal
    • http://www.nsf.gov/pubs/2010/nsf10532/nsf10532.pdf


  • [Diagram: Cross-directorate programs, with SW, Data and HPC interacting. Program labels: Track 2, SDCI, DataNet, PetaApps, MRI. Overlay label: Sustain.]

  • CF21 Strategy

    Driven by science and engineering

    Intense coupling of data, sensors, satellites, computing, visualization, grids, software, VOs; entire CI ecosystem

    Better campus integration

    Major Facilities CI planning

    Task Forces and research community provide guidance and input

    All NSF Directorates involved

    Sustain, Advance, Experiment


  • ARRA Catalyzed OCI Transition

    FY 09 Budget (Before ARRA)
    • HPC: 77.21%
    • Software: 6.31%
    • Workforce Development: 4.06%
    • Networking: 3.97%
    • Data: 3.45%
    • Other: 1.79%
    • Budget Initiatives: 1.5%
    • Virtual Organizations: 1.5%

    Recovery Act Funds
    • Software: 51.68% (includes PetaApps)
    • HPC: 21.25%
    • Workforce Development: 14.38% (includes GRF, CAREER)
    • Budget Initiatives: 7.69% (includes Viz)
    • Virtual Organizations: 5.01%

  • ARRA Catalyzed OCI Transition

    FY 09 Budget (Before ARRA)
    • HPC: 77.21%
    • Software: 6.31%
    • Workforce Development: 4.06%
    • Networking: 3.97%
    • Data: 3.45%
    • Other: 1.79%
    • Budget Initiatives: 1.5%
    • Virtual Organizations: 1.5%

    FY 09 Budget (After ARRA)
    • HPC: 61.19%
    • Software: 19.07%
    • Workforce Development: 4.55%
    • Data: 4.03%
    • Budget Initiatives: 3.50%
    • Networking: 2.84%
    • Virtual Organizations: 2.45%
    • Other: 2.37%

  • OCI BUDGET BREAKDOWN


  • Underestimations (and education)

    Support costs are often underestimated
    • Grad student support is cheap (except when it isn’t)

    Space – Cooling – Power triumvirate

    People forget about data, networking, software (software licensing)

    Duplication of services vs need for special architectures

  • Branscomb Pyramid: National to Campus

    Slide from Gary Crane, http://sura.org/programs/docs/CI_White_Paper_Final.pdf

  • Branscomb Pyramid: National to Campus

    OCI Focus

    MRI & others

    Slide from Gary Crane, http://sura.org/programs/docs/CI_White_Paper_Final.pdf

  • “Beyond Branscomb”, Sept 2006


  • Broaden Awareness through CI Days

    Work with campuses to develop leadership in promoting CI to accelerate scientific discovery

    Catalyze campus-wide and regional discussions and planning

    Collaboration of Open Science Grid, Internet2, National LambdaRail, EDUCAUSE, Minority Serving Institution Cyberinfrastructure Empowerment Coalition, TeraGrid, and local & regional organizations

    Identify Campus Champions

    https://wiki.internet2.edu/confluence/display/cidays

  • TG Campus Champions Program

    Source of local, regional and national high performance computing and cyberinfrastructure information at home campus

    Source of information about TeraGrid resources and services that will benefit their campus

    Source of startup accounts to quickly get researchers and educators using their allocation of time on the TeraGrid resources

    Direct access to TeraGrid staff

    https://www.teragrid.org/web/eot/campus_champions


  • An Idea from EPSCoR

    State-wide CI plans

    State-wide CI proposals and funding

  • How to measure return on investment?

    Must measure to improve

    Must measure to justify additional funds at all levels

    Would love to hear suggestions!

  • CyberInfrastructure Ecosystem

    Expertise
    • Research and Scholarship
    • Education
    • Learning and Workforce Development
    • Interoperability and ops
    • Cyberscience

    Organizations
    • Universities, schools
    • Government labs, agencies
    • Research and Med Centers
    • Libraries, Museums
    • Virtual Organizations
    • Communities

    Scientific Instruments
    • Large Facilities, MREFCs, telescopes
    • Colliders, shake tables
    • Sensor arrays: ocean, env’t, weather, buildings, climate, etc.

    Computational Resources
    • Supercomputers
    • Clouds, Grids, Clusters
    • Visualization
    • Compute services
    • Data Centers

    Data
    • Databases, data reps, collections and libs
    • Data access; stor., nav mgmt, mining tools, curation

    Networking
    • Campus, national, international networks
    • Research and exp networks
    • End-to-end throughput
    • Cybersecurity

    Software
    • Applications, middleware
    • Software dev’t & support
    • Cybersecurity: access, authorization, authen.

    Discovery, Collaboration, Education

    Sustain, Advance, Experiment

  • Conclusions

    Campus HPC is more than just machines
    • Posit: Better central computing attracts more grants (and researchers)

    Treat CI as infrastructure
    • NSF continues to fund national-scale CI
    • Campus-scale CI should be part of campus strategic plan

    Ecosystem approach
    • Sustain, Advance, Experiment

    Bridging is an urgent need

  • More Information

    Jennifer M. Schopf
    [email protected] [email protected]

    Dear Colleague letter for CF21
    http://www.nsf.gov/pubs/2010/nsf10015/nsf10015.jsp

    Software infrastructure for sustained innovation
    http://www.nsf.gov/pubs/2010/nsf10551/nsf10551.pdf

    S2I2 workshop DCL
    http://www.nsf.gov/pubs/2010/nsf10050/nsf10050.jsp