“Toward A National Big Data Superhighway”
Closing Keynote
Internet2 Global Summit
Washington, DC
April 26, 2017
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
Abstract
Research in data-intensive fields is increasingly multi-investigator and multi-institutional,
depending on ever more rapid access to ultra-large heterogeneous and widely
distributed datasets. The Pacific Research Platform (PRP) is an NSF-funded research
project which extends NSF-funded campus Science DMZs to a regional model, built on
the CENIC/Pacific Wave backbone, establishing a science-driven high-capacity data-
centric "freeway system." The PRP spans all 10 campuses of the University of
California, as well as the major California private research universities, four
supercomputer centers, and several universities outside California. Fifteen multi-campus
data-intensive application teams, including particle physics, astronomy/astrophysics,
earth sciences, biomedicine, and scalable multimedia, act as drivers of the PRP,
providing feedback to the technical design staff over the project's five years. Over the next three
years, PRP will examine sustainable methods for expanding such regional networks to a
national scale.
Vision: Creating a West Coast “Big Data Freeway”
Connected by CENIC/Pacific Wave to Internet2 & GLIF
Use Lightpaths to Connect
Big Data Generators and Consumers,
Creating a “Big Data” Freeway
Integrated With High Performance Global Networks
“The Bisection Bandwidth of a Cluster Interconnect,
but Deployed on a 20-Campus Scale.”
This Vision Has Been Building for Over a Decade
NSF’s OptIPuter Project: Using Supernetworks
to Meet the Needs of Data-Intensive Researchers
OptIPortal–
Termination
Device
for the
OptIPuter
Global
Backplane
Calit2 (UCSD, UCI), SDSC, and UIC Leads; Larry Smarr PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
2003-2009
$13,500,000
In August 2003,
Jason Leigh and his
students used
RBUDP to blast
data from NCSA to
SDSC over the
TeraGrid DTFnet,
achieving 18 Gbps
file transfer out of
the available
20Gbps
LS Slide 2005
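RBUDP (Reliable Blast UDP) achieved this by sending the payload as a high-rate UDP blast and using a TCP control channel to learn which packets were lost and re-blast only those. Below is a minimal Python sketch of that idea; the chunk size, framing, and control-message format are illustrative assumptions, not the original implementation.

```python
# Minimal sketch of the Reliable Blast UDP (RBUDP) idea: blast fixed-size
# UDP datagrams, then use a reliable TCP control channel to learn which
# sequence numbers were lost and re-blast only those. The receiver (not
# shown) tracks arrivals in a bitmap and replies with the missing list.
import socket
import struct

CHUNK = 8192               # payload bytes per datagram (assumed)
SEQ = struct.Struct("!I")  # 4-byte big-endian sequence-number header

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a TCP socket."""
    buf = b""
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("control channel closed")
        buf += part
    return buf

def rbudp_send(data: bytes, udp_addr, ctrl: socket.socket) -> None:
    """Blast all chunks, then re-blast whatever the receiver reports missing."""
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    total = (len(data) + CHUNK - 1) // CHUNK
    missing = list(range(total))
    while missing:
        for seq in missing:
            udp.sendto(SEQ.pack(seq) + data[seq*CHUNK:(seq+1)*CHUNK], udp_addr)
        ctrl.sendall(b"BLAST_DONE")                 # end-of-blast marker
        count = SEQ.unpack(recv_exact(ctrl, 4))[0]  # how many chunks lost?
        raw = recv_exact(ctrl, 4 * count)
        missing = list(struct.unpack(f"!{count}I", raw)) if count else []
```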
DOE ESnet’s Science DMZ: A Scalable Network
Design Model for Optimizing Science Data Transfers
• A Science DMZ integrates 4 key concepts into a unified whole:
– A network architecture designed for high-performance applications,
with the science network distinct from the general-purpose network
– The use of dedicated systems as data transfer nodes (DTNs)
– Performance measurement and network testing systems that are
regularly used to characterize and troubleshoot the network (sketch below)
– Security policies and enforcement mechanisms that are tailored for
high performance science environments
http://fasterdata.es.net/science-dmz/
Science DMZ
Coined 2010
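The measurement concept can be made concrete with a small example. Below is a minimal sketch that runs an iperf3 throughput test to a peer DTN and reports Gbps; the peer hostname is a placeholder, and production Science DMZs typically rely on perfSONAR rather than ad hoc scripts.

```python
# Sketch of "regular performance measurement" on a Science DMZ: run an
# iperf3 test from this DTN to a peer (which must be running `iperf3 -s`)
# and report the achieved receive throughput in Gbps.
import json
import subprocess

def measure_throughput(peer: str, seconds: int = 10) -> float:
    """Run iperf3 against `peer` and return achieved throughput in Gbps."""
    out = subprocess.run(
        ["iperf3", "-c", peer, "-t", str(seconds), "-J"],  # -J = JSON output
        capture_output=True, text=True, check=True,
    ).stdout
    bps = json.loads(out)["end"]["sum_received"]["bits_per_second"]
    return bps / 1e9

if __name__ == "__main__":
    print(f"{measure_throughput('dtn.example.edu'):.1f} Gbps")  # placeholder host
```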
The DOE ESnet Science DMZ and the NSF “Campus Bridging” Taskforce Report Formed the Basis
for the NSF Campus Cyberinfrastructure Network Infrastructure and Engineering (CC-NIE) Program
Based on Community Input and on ESnet’s Science DMZ Concept,
NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways
Red 2012 CC-NIE Awardees
Yellow 2013 CC-NIE Awardees
Green 2014 CC*IIE Awardees
Blue 2015 CC*DNI Awardees
Purple Multiple Time Awardees
Source: NSF
I Believe, as Greg Bell Has Said,
We Should Engineer the Network as an Instrument of Discovery
It is all about the end users!
We Must Optimize The Instrument
For Multi-Campus Collaborating Application Teams
How CC-NIE Prism@UCSD Grant Transforms Big Data Microbiome Science:
Preparing for Knight/Smarr 1 Million Core-Hour Analysis
12 Cores/GPU
128 GB RAM
3.5 TB SSD
48TB Disk
10Gbps NIC
Knight Lab
FIONA
10Gbps
Gordon
Prism@UCSD
Data Oasis
7.5PB,
200GB/s
Knight 1024 Cluster
In SDSC Co-Lo
CHERuB
100Gbps
Emperor & Other Vis Tools
64Mpixel Data Analysis Wall
120Gbps
40Gbps
1.3Tbps
The Next Logical Step:
Build a Regional DMZ by Connecting West Coast Campus DMZs
• May 2014 LS Gives Invited Presentation to UC IT Leadership Council
– Strong Support from UC and UCOP CIOs
• July 2014 LS Gives Invited Talk to CENIC Annual Retreat
– CENIC/PW Agrees to Act as Backplane
– CIO Support Extends to CA Private Research Universities
• December 2014 UCOP CIO and VPRs Provide PRP “Momentum Money”
• January 2015 Kickoff of PRPv0 by Network Engineers
– Begins Conference Calls Every Two Weeks, Now Weekly
• March 2015 LS Invited “Blue Sky” Presentation to UC VCR/CIO Summit
– NSF PRP Proposal Submitted With Letters of Commitment From:
– 50 Researchers from 15 Campuses
– 32 IT/Network Organization Leaders
The Pacific Research Platform:
a Working End-to-End Science-Driven Regional DMZ-Connector
NSF CC*DNI Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2,
• Philip Papadopoulos, UCSD SDSC,
• Frank Wuerthwein, UCSD Physics and SDSC
PRP is Built on CENIC/Pacific Wave
Our Prototype System – Built for Scientists
Out of a Bunch of Independently Managed Networks
• Challenge:
– Campus DMZs, Regional (e.g., CENIC), National (Internet2), International
Networks (e.g., GLIF) are Individually-Architected Systems
• How Do They Work Together with Predictable Performance?
• PRP is Focused on Disk-to-Disk Data Movement
– From the Eyes of Domain Scientists
– End-to-End for Their Data is Their Only Real Metric of Concern (As it Should Be)
Source: Phil Papadopoulos
PRP Science DMZ Data Transfer Nodes (DTNs) -
Flash I/O Network Appliances (FIONAs)
UCSD Designed FIONAs
To Solve the Disk-to-Disk
Data Transfer Problem
at Full Speed
on 10G, 40G and 100G Networks
FIONAs: 10/40G, $8,000; FIONette: 1G, $1,000
Phil Papadopoulos, SDSC &
Tom DeFanti, Joe Keefe & John Graham, Calit2
More Than 30 PRP Installed FIONAs:
Customized to the Needs of Application Teams
• Data Transfer Nodes
– 1, 10, 40, and 100Gb/s NICs
• Storage Transfer Nodes
– Up to 160TB of Rotating Disks
– Nonvolatile Memory Disks (NVMe, ~10x Faster than SATA Flash)
– ½ PB Flash Disk (at SC15, on Loan From Vendor)
• Compute Transfer Nodes
– 12-48 Intel CPU Cores
– 1-8 GPUs (Delivers Up to 500,000 GPU Core Hours/Day)
• Visualization Transfer Nodes
– 3-45 Tiled displays (up to 180 Megapixels, 2D & 3D)
– 360-Megapixel SunCAVE Coming Soon
PRP Continues to Expand Rapidly While Increasing Connectivity:
1 1/2 Years of Progress – 12 Sites to 24 Sites
January 29, 2016
Connected 24 DMZ FIONAs
at 10G and 40G
April 24, 2017
Source: John Graham, Calit2
We Measure FIONA Disk-to-Disk Throughput with 10GB File Transfer
4 Times Per Day in Both Directions for All PRP Sites
See Time Lapse Movie Jan 2016 to Today
http://prp-maddash.calit2.optiputer.net/optiputer/optiputer.mp4
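A toy version of such a probe is sketched below, assuming a local 10 GB test file and using scp as a stand-in for the PRP's actual transfer tooling; the host and paths are hypothetical.

```python
# Toy version of the PRP mesh probe: time a disk-to-disk copy of a 10 GB
# test file to a remote FIONA and convert to Gb/s. The real mesh uses
# dedicated transfer tools and publishes results to a MaDDash dashboard.
import subprocess
import time

TEST_FILE = "/data/testfile-10GB"   # assumed local test file
SIZE_BITS = 10 * 1e9 * 8            # 10 GB expressed in bits

def disk_to_disk_gbps(remote: str) -> float:
    """Copy the test file to `remote` and return achieved Gb/s."""
    start = time.monotonic()
    subprocess.run(["scp", TEST_FILE, f"{remote}:/data/"], check=True)
    return SIZE_BITS / (time.monotonic() - start) / 1e9

# Probing all site pairs in both directions four times a day would wrap
# this in a scheduler (e.g., cron) across the full FIONA mesh.
```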
We Have Held a Number of
PRP Science Engagement Workshops
Source: Camille Crittenden, UC Berkeley
UC San Diego, UC Merced,
UC Davis, UC Berkeley
PRP’s First 1.5 Years:
Connecting Campus Application Teams and Devices
We Scale the Working PRP by Providing Multi-Campus Application Teams
With Disk-to-Disk Measurements
UIC
UCSD
UCI
U Hawaii
USC
NCAR
SDSU
LHC Researchers Look to PRP to Fix the Last-Mile Architecture in California:
Data and Compute Resources Can Both Be Shared
PRP Provides an Implementation of All This on a Single FIONA,
and Helps Integrate Local Resources into It.
login nodes
compute
scheduler
compute cluster
storage cluster
DTN
CTN
WAN
CTN = compute transfer node
DTN = data transfer node
Science DMZ
Source: Frank Wuerthwein, UCSD, SDSC
>360 California Scientists Are Researching
Particle Physics Big Data Analysis
• ATLAS
– UCB/LBNL (63)
– SLAC/Stanford (51)
– UCSC (30)
– UCI (32)
• Total of 176 members listed in the ATLAS HR database at CERN
• CMS (Members)
– Caltech (29)
– LLNL (3)
– UCD (41)
– UCLA (17)
– UCR (25)
– UCSD (36)
– UCSB (35)
• Total of 186 members listed in the CMS HR database at CERN
Source: Frank Wuerthwein, UCSD, SDSC
LHC Computing and Data Resources
10 Institutions
• ATLAS Institutions
– SLAC “T2”
– NERSC (used by both)
– UCSC T3
– UCI T3
• CMS Institutions
– Caltech T2
– UCSD T2
– SDSC (used by both)
– UCD T3
– UCR T3
– UCSB T3
Lots of Potential Network Traffic for LHC on PRP
Source: Frank Wuerthwein, UCSD, SDSC
100 Gbps FIONA at UCSC Connects the UCSC Hyades Cluster
to the NERSC Supercomputer at LBNL
Supporting UCSC Remote Access
to Large Data Subsets
of the Dark Energy Spectroscopic Instrument (DESI)
and AGORA Galaxy Simulation Data
Produced at NERSC.
250 images per night
800GB per night
Shawfeng Dong, UCSC Cyberengineer
UCSC Feb 7, 2017
40G FIONAs
20x40G PRP-connected
WAVE@UC San Diego
PRP Now Enables
Distributed Virtual Reality
PRP
WAVE @UC Merced
Transferring 5 CAVEcam Images from UCSD to UC Merced:
2 Gigabytes now takes 2 Seconds (8 Gb/sec)
PRP Will Link the Laboratories of
the Pacific Earthquake Engineering Research Center
http://peer.berkeley.edu/
PEER Labs: UC Berkeley, Caltech, Stanford,
UC Davis, UC San Diego, and UC Los Angeles
John Graham Installing FIONette at PEER Feb 10, 2017
Cancer Genomics Hub (UCSC) is Housed in SDSC:
Large Data Flows to End Users at UCSC, UCB, UCSF, …
Data Flow Growth: 1G → 8G → 15G (Jan 2016)
30,000 TB Per Year
Data Source: David Haussler, Brad Smith, UCSC
NIH’s Cancer Genomics Database Moved
So the PRP Deployed a FIONA to Chicago’s MREN
The Prototype PRP Has Attracted
New Application Drivers (More in the Next Talks by Larry and Scott)
Scott Sellars, Marty Ralph
Center for Western Weather and Water Extremes
Frank Vernon - Expansion of HPWREN
Tom Levy, Cultural Heritage
Cryo EM
GPU JupyterHub:
2 x 14-core CPUs
256GB RAM
1.2TB FLASH
3.8TB SSD
Nvidia K80 GPU
Dual 40GbE NICs
And a Trusted Platform
Module
GPU JupyterHub:
1 x 18-core CPU
128GB RAM
3.8TB SSD
Nvidia K80 GPU
Dual 40GbE NICs
And a Trusted Platform
Module
PRP UC-JupyterHub Backbone
UCB, UCSD
Next Step: Deploy Across PRP
Source: John Graham, Calit2
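For illustration, here is a minimal, hypothetical jupyterhub_config.py for a GPU-backed hub like those above; the image name and NVIDIA runtime setting are assumptions, not the PRP's actual configuration.

```python
# jupyterhub_config.py: a minimal, hypothetical sketch of a GPU JupyterHub
# using DockerSpawner with the NVIDIA container runtime so that user
# notebooks can reach the node's GPUs.
c = get_config()  # noqa: F821 (injected by JupyterHub at config-load time)

c.JupyterHub.bind_url = "http://:8000"
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "example/gpu-notebook:latest"      # hypothetical image
c.DockerSpawner.extra_host_config = {"runtime": "nvidia"}  # expose GPUs
```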
Atmospheric
Rivers
(fall and winter)
Southwest
Monsoon
(summer & fall)
Great Plains Convection
(spring and summer)
Front Range Upslope
(rain/snow)
Funded collaborations
CW3E Based at UCSD/Scripps Oceanography
CW3E-North at Sonoma County Water Agency
Key Phenomena Causing Extreme Precipitation in the Western U.S. (Ralph et al. 2014)
Director: F. Martin Ralph Website: cw3e.ucsd.edu
Data is at the heart of what we do!
• High resolution numerical models
• Satellite images
• Ground based weather stations
• Weather radar
• Historical climate data
Big Data Collaboration with:
Source: Scott Sellars, CW3E
Collaboration on Atmospheric Water
Between UC San Diego and UC Irvine
Director: Soroosh Sorooshian, UC Irvine; Website: http://chrs.web.uci.edu
Calit2’s FIONA
SDSC’s COMET
Calit2’s FIONA
Pacific Research Platform (10-100 Gb/s)
GPUs at Each Site
Complete Workflow Time: 20 Days → 20 Hours → 20 Minutes!
UC Irvine, UC San Diego
Improvement of Over 1000x With PRP
Cryo-electron Microscopy (cryo-EM)
Has Driven a “Resolution Revolution” in the Last Five Years
Exposure (every 60 seconds):
X & Y dimensions: 7420 x 7676 Pixels
Frames per Movie: 10 - 50
Size: 3 - 10 GB per Movie
Every 24 hours:
Number of Movies: ~1400
Data Size: ~5 TB
Typical Datasets:
Length of Time: 2 - 6 Days
Total size: 10 - 30 TB
Each Cryo-EM ‘Image’ is Actually a Movie
Source: Michael A. Cianfrocco,
Elizabeth Villa, & Andres Leschziner, UCSD
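A quick arithmetic check shows these figures are mutually consistent:

```python
# Sanity check of the quoted cryo-EM data rates: ~1400 movies/day at
# 3-10 GB each gives roughly 4-14 TB/day, bracketing the ~5 TB/day figure,
# and a 2-6 day run at ~5 TB/day lands in the quoted 10-30 TB range.
movies_per_day = 1400
gb_per_movie = (3, 10)

tb_per_day = tuple(movies_per_day * gb / 1000 for gb in gb_per_movie)
print(f"daily volume: {tb_per_day[0]:.1f}-{tb_per_day[1]:.1f} TB")  # 4.2-14.0 TB
```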
Using PRP to Connect Cryo-EM across California
With End Users and Computational Facilities
Long term:
‣Partner with Cryo-EM Facilities to Stream Data
Straight from Microscopes (over PRP) to SDSC
‣Perform All Cryo-EM Analysis (from Micrographs
to 3D Models) via Web Browser on SDSC
‣Expand Computing to Other XSEDE Resources
(e.g. Xstream) and DOE’s NERSC
Short term:
‣Provide 2D and 3D Analysis on Particle Stacks on
Comet at SDSC
Source: Michael A. Cianfrocco, UCSD
SDSC
NERSC
Xstream
3 Supercomputer Centers
cosmic-cryoem.org
~20 Microscopes in CA
UCLA
UC Davis
UC Santa Cruz
SF Bay
UC Berkeley, LBNL,
UCSF, Stanford
San Diego
UCSD, TSRI, Salk*
Linking Cultural Heritage and Archaeology Datasets
at UCB, UCLA, UCM and UCSD with CAVEkiosks
48 Megapixel CAVEkiosk
UCSD Library
48 Megapixel CAVEkiosk
UCB Library
24 Megapixel CAVEkiosk
UCM Library
PRP is the Platform Chosen for 2017 Expansion
of HPWREN, Connected to CENIC, into Orange and Riverside Counties
• PRP CENIC 100G Link
UCSD to SDSU
– DTN FIONAs Endpoints
– Data Redundancy
– Disaster Recovery
– High Availability
– Network Redundancy
• Anchor to CENIC at UCI
– PRP FIONA Connects to
CalREN-HPR Network
– Data Replication Site
• Potential Future UCR
CENIC Anchor
UCR
UCI
UCSD
SDSU
Source: Frank Vernon,
Greg Hidley, UCSD
Proposed Cognitive Hardware and Software Ecosystem
On the Pacific Research Platform
• Working With 30 CSE Machine Learning Researchers
– Goal is 320 Game GPUs in 32-40 FIONAs at 10 PRP Campuses
– PRP Couples FIONAs with GPUs into a Condor-Managed Cloud (sketch below)
• PRP Access to Emerging Processors
– IBM TrueNorth, KnuEdge, FPGA, and Qualcomm Snapdragon
• Software Including a Wide Range of Open ML Algorithms
• Metrics for Performance of Processors and Algorithms
Source: Tom DeFanti, Calit2
FIONA with 8 Game GPUs
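As a sketch of what "Condor-managed" could look like from a researcher's side, here is a hypothetical GPU job submission using the classic HTCondor Python bindings; the script name and resource requests are assumptions, not the actual PRP configuration.

```python
# Hypothetical GPU job submission to a Condor-managed pool of FIONAs,
# using the classic HTCondor Python bindings.
import htcondor

sub = htcondor.Submit({
    "executable": "train.sh",   # hypothetical ML training script
    "request_gpus": "1",        # ask the matchmaker for one game GPU
    "request_cpus": "4",
    "request_memory": "16GB",
    "output": "train.out",
    "error": "train.err",
    "log": "train.log",
})
schedd = htcondor.Schedd()
with schedd.transaction() as txn:
    cluster_id = sub.queue(txn)
print(f"submitted cluster {cluster_id}")
```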
We are Now Investigating
How the PRP Prototype Might Be Extended to National-Scale
From the text of the PRP cooperative agreement:
After approximately 18 (or TBD) months, a site visit and comprehensive review of
progress towards meeting project milestones and goals and overall performance and
management processes will take place, including user community relationships,
scientific impacts, and the status of the project as a model for potential future
national-scale, network-aware, data-focused cyberinfrastructure attributes,
approaches, and capabilities.
Expanding to National Research Platform and Global Research Platform
Via CENIC/Pacific Wave, Internet2, and International Links
PRP’s Current
International
Partners
Korea Shows Distance is Not the Barrier
to Above 5Gb/s Disk-to-Disk Performance
PRP Working on Connecting Guam
via the University of Oregon-Based Network Startup Resource Center
The PRP shipped a FIONette
to CENIC’s John Hess
to be Installed in Guam Mid-May
To support projects in:
• Geography
• Climate History
• Guam EPSCoR
• The UOG Marine Laboratory
“During the quarter century that this group has been helping to build internet infrastructure
around the world, there’s hardly a place on the planet that has not been touched
by the great work of the Network Startup Resource Center,” -- Larry Smarr.
PRP is Partnering with the Advanced CyberInfrastructure –
Research and Education Facilitators (ACI-REF) NSF Grant to Explore Extension
PRP Connected
ACI-REF has also spawned the 28-member Campus Research Computing consortium (CaRC), funded by the NSF as a Research Coordination Network (RCN).
CaRC is dedicated to sharing best practices, expertise, and resources, enabling the advancement of campus-based research computing activities around the nation.
Jim Bottum, Principal Investigator
ACI-REF
CaRC
Announcing the First National Research Platform
Workshop August 7-8, 2017
Co-Chairs:
Larry Smarr, Calit2
& Jim Bottum, Internet2
See pacificresearchplatform.org
for Registration Information
Toward a National Research Platform
PRP has 3 FTEs to Connect ~25 Campuses.
How Many are Needed to Expand to a NRP
Serving Researchers at 250 Campuses in Dozens of Fields?
What is the Path Forward?
As Internet2 Board of Trustees Member
John Evans Said to Me Last Night:
“We Are Near an Inflection Point.”
Our Support:
• US National Science Foundation (NSF) awards CNS-0821155, CNS-1338192,
CNS-1456638, ACI-1540112, and ACI-1541349
• University of California Office of the President CIO
• UCSD Chancellor’s Integrated Digital Infrastructure Program
• UCSD Next Generation Networking initiative
• Calit2 and Calit2 Qualcomm Institute
• CENIC, Pacific Wave, and StarLight
• DOE ESnet