
Building a Regional 100G Collaboration Infrastructure



Page 1: Building a Regional 100G Collaboration Infrastructure

“Building a Regional 100G Collaboration Infrastructure”

Keynote Presentation, CineGrid International Workshop 2015

Calit2’s Qualcomm Institute, University of California, San Diego

December 11, 2015

Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology

Harry E. Gruber Professor, Dept. of Computer Science and Engineering

Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net

Page 2: Building a Regional 100G Collaboration Infrastructure

Vision: Creating a West Coast “Big Data Freeway” Connected by CENIC/Pacific Wave to Internet2 & GLIF

Use Lightpaths to Connect All Data Generators and Consumers,

Creating a “Big Data” Freeway Integrated With High-Performance Global Networks

“The Bisection Bandwidth of a Cluster Interconnect, but Deployed on a 20-Campus Scale.”

This Vision Has Been Building for 25 Years

Page 3: Building a Regional 100G Collaboration Infrastructure

Interactive Supercomputing End-to-End Prototype: Using Analog Communications to Prototype the Fiber Optic Future

“We’re using satellite technology…to demo what it might be like to have high-speed fiber-optic links between advanced computers in two different geographic locations.” ― Al Gore, Senator

Chair, US Senate Subcommittee on Science, Technology and Space

Illinois

Boston

SIGGRAPH 1989: “What we really have to do is eliminate distance between individuals who want to interact with other people and with other computers.” ― Larry Smarr, Director, NCSA

Page 4: Building a Regional 100G Collaboration Infrastructure

NSF’s OptIPuter Project: Using Supernetworks to Meet the Needs of Data-Intensive Researchers

OptIPortal – Termination Device for the OptIPuter Global Backplane

Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr, PI. Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST

Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent

2003-2009 $13,500,000

In August 2003, Jason Leigh and his students used RBUDP to blast data from NCSA to SDSC over the TeraGrid DTFnet, achieving an 18Gbps file transfer out of the available 20Gbps.

LS Slide 2005
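For scale, here is a back-of-the-envelope illustration (not from the original slide) of what sustaining 18 of the available 20 Gbps means for moving large datasets:

```python
# Illustrative arithmetic only (not from the original slides):
# what sustained 18 Gbps of a 20 Gbps path means for bulk data movement.

def transfer_hours(dataset_tb: float, rate_gbps: float) -> float:
    """Hours needed to move dataset_tb terabytes at rate_gbps gigabits per second."""
    bits = dataset_tb * 1e12 * 8              # TB -> bits (decimal units)
    return bits / (rate_gbps * 1e9) / 3600

available, achieved = 20.0, 18.0              # Gbps, from the 2003 RBUDP demo
print(f"Path utilization: {achieved / available:.0%}")                      # 90%
print(f"10 TB at {achieved} Gbps: {transfer_hours(10, achieved):.1f} h")    # ~1.2 h
print(f"10 TB at 1 Gbps (a typical campus uplink of that era): "
      f"{transfer_hours(10, 1.0):.1f} h")                                   # ~22 h
```

At 18 Gbps the path runs at 90% of capacity, and a 10 TB dataset moves in roughly 1.2 hours rather than the ~22 hours a 1 Gbps link would need.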

Page 5: Building a Regional 100G Collaboration Infrastructure

Integrated “OptIPlatform” Cyberinfrastructure System: A 10Gbps Lightpath Cloud

National LambdaRail

Campus Optical Switch

Data Repositories & Clusters

HPC

HD/4k Video Images

HD/4k Video Cams

End User OptIPortal

10G Lightpath

HD/4k Telepresence

Instruments

LS 2009 Slide

Page 6: Building a Regional 100G Collaboration Infrastructure

So Why Don’t We Have a National Big Data Cyberinfrastructure?

“Research is being stalled by ‘information overload,’ Mr. Bement said, because data from digital instruments are piling up far faster than researchers can study. In particular, he said, campus networks need to be improved. High-speed data lines crossing the nation are the equivalent of six-lane superhighways, he said. But networks at colleges and universities are not so capable. “Those massive conduits are reduced to two-lane roads at most college and university campuses,” he said. Improving cyberinfrastructure, he said, “will transform the capabilities of campus-based scientists.”

-- Arden Bement, Director of the National Science Foundation, May 2005

Page 7: Building a Regional 100G Collaboration Infrastructure

DOE ESnet’s Science DMZ: A Scalable Network Design Model for Optimizing Science Data Transfers

• A Science DMZ integrates 4 key concepts into a unified whole:

– A network architecture designed for high-performance applications, with the science network distinct from the general-purpose network

– The use of dedicated systems for data transfer

– Performance measurement and network testing systems that are regularly used to characterize and troubleshoot the network (see the measurement sketch below)

– Security policies and enforcement mechanisms that are tailored for high-performance science environments

http://fasterdata.es.net/science-dmz/
“Science DMZ” Coined 2010
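The third concept above is concrete and scriptable: DTN-to-DTN paths are routinely exercised with tools such as perfSONAR or iperf3. Below is a minimal, hedged sketch (the host name is a placeholder, and an iperf3 server must already be listening on the far end) that drives iperf3 from Python and reports the achieved throughput; it illustrates the practice, not any specific ESnet tooling.

```python
# Minimal sketch (assumed example, not from the slides): measure memory-to-memory
# TCP throughput between two Science DMZ data transfer nodes with iperf3.
# The target host "dtn.example.edu" is hypothetical; an iperf3 server
# ("iperf3 -s") must already be running there.
import json
import subprocess

def measure_throughput(server: str, streams: int = 4, seconds: int = 30) -> float:
    """Return the achieved receive-side throughput in Gbps."""
    result = subprocess.run(
        ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    bps = report["end"]["sum_received"]["bits_per_second"]
    return bps / 1e9

if __name__ == "__main__":
    streams = 4
    gbps = measure_throughput("dtn.example.edu", streams=streams)
    print(f"Achieved {gbps:.1f} Gbps over {streams} parallel streams")
```

In a Science DMZ this kind of test is run regularly, so throughput regressions are caught before scientists notice them.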

The DOE ESnet Science DMZ and the NSF “Campus Bridging” Taskforce Report Formed the Basis for the NSF Campus Cyberinfrastructure Network Infrastructure and Engineering (CC-NIE) Program

Page 8: Building a Regional 100G Collaboration Infrastructure

Creating a “Big Data” Freeway on Campus: NSF-Funded CC-NIE Grants Prism@UCSD and CHERuB

Prism@UCSD: Phil Papadopoulos, SDSC, Calit2, PI (2013-15); CHERuB: Mike Norman, SDSC, PI

CHERuB

Page 9: Building a Regional 100G Collaboration Infrastructure

A UCSD Integrated Digital Infrastructure Project for Big Data Requirements of Rob Knight’s Lab – PRP Does This on a Sub-National Scale

[Diagram: a Knight Lab FIONA (12 Cores/GPU, 128 GB RAM, 3.5 TB SSD, 48 TB Disk, 10Gbps NIC) connects over Prism@UCSD to SDSC resources (Gordon; Data Oasis, 7.5 PB at 200 GB/s; the Knight 1024 Cluster in the SDSC Co-Lo), to CHERuB at 100Gbps, and to Emperor and other vis tools on a 64-Mpixel Data Analysis Wall; link speeds shown include 10Gbps, 40Gbps, 120Gbps, and 1.3Tbps]

Page 10: Building a Regional 100G Collaboration Infrastructure

Based on Community Input and on ESnet’s Science DMZ Concept,NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways

2012-2015 CC-NIE / CC*IIE / CC*DNI PROGRAMS

Legend: Red = 2012 CC-NIE Awardees; Yellow = 2013 CC-NIE Awardees; Green = 2014 CC*IIE Awardees; Blue = 2015 CC*DNI Awardees; Purple = Multiple-Time Awardees

Source: NSF

Page 11: Building a Regional 100G Collaboration Infrastructure

The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Freeway System”

NSF CC*DNI Grant: $5M, 10/2015-10/2020

PI: Larry Smarr, UC San Diego Calit2

Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS
• Tom DeFanti, UC San Diego Calit2
• Philip Papadopoulos, UC San Diego SDSC
• Frank Wuerthwein, UC San Diego Physics and SDSC

Page 12: Building a Regional 100G Collaboration Infrastructure

What About the Cloud?

• PRP Connects with the 2 NSF Experimental Cloud Grants
– Chameleon Through Chicago
– CloudLab Through Clemson

• CENIC/PW Has Multiple 10Gbps Connections into Amazon Web Services
– First 10Gbps Connection 5-10 Years Ago
– Today, Seven 10Gbps Paths Plus a 100Gbps Path
– Peak Usage is <10%
– Lots of Room for Experimenting with Big Data
– Interest from Microsoft and Google as Well

• Clouds Useful for Lots of Small Data
• No Business Model for Small Amounts of Really Big Data
• Also Very High Financial Barriers to Exit

Page 13: Building a Regional 100G Collaboration Infrastructure

PRP Allows for Multiple Secure Independent Cooperating Research Groups

• Any Particular Science Driver is Composed of Scientists and Resources at a Subset of Campuses and Resource Centers

• We Term These Science Teams with the Resources and Instruments they Access as Cooperating Research Groups (CRGs).

• Members of a Specific CRG Trust One Another, But They Do Not Necessarily Trust Other CRGs

Page 14: Building a Regional 100G Collaboration Infrastructure

FIONA – Flash I/O Network Appliance: Linux PCs Optimized for Big Data

UCOP Rack-Mount Build:

FIONAs Are Science DMZ Data Transfer Nodes & Optical Network Termination Devices

UCSD CC-NIE Prism Award & UCOP: Phil Papadopoulos & Tom DeFanti

Joe Keefe & John Graham

Cost:                    $8,000               |  $20,000
Intel Xeon Haswell CPU:  E5-1650 v3, 6-Core   |  2x E5-2697 v3, 14-Core
RAM:                     128 GB               |  256 GB
SSD:                     SATA 3.8 TB          |  SATA 3.8 TB
Network Interface:       10/40GbE Mellanox    |  2x 40GbE Chelsio + Mellanox
GPU:                     NVIDIA Tesla K80
RAID Drives:             0 to 112 TB (add ~$100/TB)

John Graham, Calit2’s QI
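One reason the FIONAs carry large memory and 40GbE interfaces is the TCP bandwidth-delay product: on long, fast paths a DTN must keep a great deal of data in flight to fill the pipe. A small illustrative calculation (the RTT values are assumed, not taken from the slides):

```python
# Illustrative sketch (assumed RTTs, not from the slides): the TCP
# bandwidth-delay product sets roughly how much data must be "in flight" --
# and therefore how much socket buffering a DTN needs -- to fill a path.

def bdp_megabytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in megabytes for a given rate and round-trip time."""
    return rate_gbps * 1e9 * (rtt_ms / 1000) / 8 / 1e6

for rate in (10, 40, 100):            # Gbps: typical FIONA/DTN NIC speeds
    for rtt in (1, 20, 80):           # ms: campus, regional, transcontinental (assumed)
        print(f"{rate:>3} Gbps x {rtt:>2} ms RTT -> "
              f"{bdp_megabytes(rate, rtt):8.1f} MB in flight")
```

At 100 Gbps across the country (~80 ms RTT), a single flow needs on the order of a gigabyte of buffering, which is why DTNs are provisioned with generous RAM and carefully tuned socket buffers.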

Page 15: Building a Regional 100G Collaboration Infrastructure

FIONAs as Uniform DTN End Points

Existing DTNs

As of October 2015

FIONA DTNs

UC FIONAs Funded by UCOP “Momentum” Grant

Page 16: Building a Regional 100G Collaboration Infrastructure

Ten Week Sprint to Demonstrate the West Coast Big Data Freeway System: PRPv0

Presented at CENIC 2015 March 9, 2015

FIONA DTNs Now Deployed to All UC Campuses and Most PRP Sites

Page 17: Building a Regional 100G Collaboration Infrastructure

PRP Timeline

• PRPv1 (Years 1 and 2)
– A Layer 3 System
– Completed in 2 Years
– Tested, Measured, Optimized, with Multi-Domain Science Data
– Bring Many of Our Science Teams Up
– Each Community Thus Will Have Its Own Certificate-Based Access to Its Specific Federated Data Infrastructure

• PRPv2 (Years 3 to 5)
– Advanced IPv6-Only Version with Robust Security Features, e.g. Trusted Platform Module Hardware and SDN/SDX Software
– Support Rates up to 100Gb/s in Bursts and Streams
– Develop Means to Operate a Shared Federation of Caches

Page 18: Building a Regional 100G Collaboration Infrastructure

Why is PRPv1 Layer 3 Instead of Layer 2 like PRPv0?

• In the OptIPuter Timeframe, with Rare Exceptions, Routers Could Not Route at 10Gbps, But Could Switch at 10Gbps. Hence for Performance, L2 was Preferred.

• Today Routers Can Route at 100Gbps Without Performance Degradation. Our Prism Arista Switch Routes at 40Gbps Without Dropping Packets or Impacting Performance.

• The Biggest Advantage of L3 is Scalability via Information Hiding. Details of the End-to-End Pathways are Not Needed, Simplifying the Workload of the Engineering Staff.

• Another Advantage of L3 is Engineered Path Redundancy within the Transport Network.

• Thus, a 100Gbps Routed Layer 3 Backbone Architecture Has Many Advantages:
– A Routed Layer 3 Architecture Allows the Backbone to Stay Simple - Big, Fast, and Clean.
– Campuses Can Use the Connection Without Significant Effort and Complexity on the End Hosts.
– Network Operators Do Not Need to Focus on Getting Layer 2 to Work and Later Diagnosing End-to-End Problems with Less Than Good Visibility.
– This Leaves Us Free to Focus on the Applications at the Edges and on the Science Outcomes, and Less on the Backbone Network Itself.

These points from Eli Dart, John Hess, Phil Papadopoulos, Ron Johnson, and others

Page 19: Building a Regional 100G Collaboration Infrastructure

Pacific Research Platform Multi-Campus Science Driver Teams

• Jupyter Hub

• Biomedical
– Cancer Genomics Hub/Browser
– Microbiome and Integrative ‘Omics
– Integrative Structural Biology

• Earth Sciences
– Data Analysis and Simulation for Earthquakes and Natural Disasters
– Climate Modeling: NCAR/UCAR
– California/Nevada Regional Climate Data Analysis
– CO2 Subsurface Modeling

• Particle Physics

• Astronomy and Astrophysics
– Telescope Surveys
– Galaxy Evolution
– Gravitational Wave Astronomy

• Scalable Visualization, Virtual Reality, and Ultra-Resolution Video

Page 20: Building a Regional 100G Collaboration Infrastructure

PRP First Application: Distributed IPython/Jupyter Notebooks: Cross-Platform, Browser-Based Application Interleaves Code, Text, & Images

IJulia, IHaskell, IFSharp, IRuby, IGo, IScala, IMathics, Ialdor, LuaJIT/Torch, Lua Kernel, IRKernel (for the R language), IErlang, IOCaml, IForth, IPerl, IPerl6, IOctave, Calico Project (kernels implemented in Mono, including Java, IronPython, Boo, Logo, BASIC, and many others), IScilab, IMatlab, ICSharp, Bash, Clojure Kernel, Hy Kernel, Redis Kernel, jove (a kernel for io.js), IJavascript, Calysto Scheme, Calysto Processing, idl_kernel, Mochi Kernel, Lua (used in Splash), Spark Kernel, Skulpt Python Kernel, MetaKernel Bash, MetaKernel Python, Brython Kernel, IVisual VPython Kernel

Source: John Graham, QI
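For readers who have not used the format, a notebook cell mixes executable code with inline output and figures. A generic illustrative cell (not taken from the PRP deployment) might look like this:

```python
# A single illustrative notebook cell (generic example, not from the PRP
# deployment): code, a computation, and an inline plot sit alongside the
# explanatory markdown cells around them.
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2 * np.pi, 200)
signal = np.sin(3 * t) * np.exp(-0.3 * t)     # a damped oscillation

plt.plot(t, signal)
plt.title("Damped oscillation")
plt.xlabel("t")
plt.ylabel("amplitude")
plt.show()                                     # rendered inline in the browser
```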

Page 21: Building a Regional 100G Collaboration Infrastructure

PRP Has Deployed Powerful FIONA Servers at UCSD and UC Berkeley to Create a UC-Jupyter Hub 40Gbps Backplane

FIONAs Have GPUs and Can Spawn Jobs to SDSC’s Comet

Using the InCommon CILogon Authenticator Module for Jupyter.

Deep Learning Libraries Have Been Installed and Run on Applications.

Source: John Graham, QI

Jupyter Hub FIONA: 2x 14-Core CPUs, 256 GB RAM, 1.2 TB FLASH, 3.8 TB SSD, NVIDIA K80 GPU, Dual 40GbE NICs, and a Trusted Platform Module
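The slide does not give configuration details; as a hedged sketch, connecting a JupyterHub to InCommon identities via CILogon and spawning notebook jobs onto a Slurm-scheduled resource such as Comet is typically a few lines of jupyterhub_config.py using the oauthenticator and batchspawner packages. Every identifier and value below is a placeholder, not the actual UC-Jupyter Hub setup:

```python
# jupyterhub_config.py -- illustrative sketch only; client IDs, URLs, and
# Slurm settings are placeholders, not the actual UC-Jupyter Hub configuration.

c = get_config()  # noqa: F821  (injected by JupyterHub when this file is loaded)

# Authenticate campus users through CILogon / InCommon federated identity.
c.JupyterHub.authenticator_class = "oauthenticator.CILogonOAuthenticator"
c.CILogonOAuthenticator.client_id = "cilogon:/client_id/EXAMPLE"
c.CILogonOAuthenticator.client_secret = "EXAMPLE_SECRET"
c.CILogonOAuthenticator.oauth_callback_url = "https://hub.example.edu/hub/oauth_callback"

# Spawn single-user notebook servers as Slurm batch jobs on an HPC resource.
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"
c.SlurmSpawner.req_partition = "compute"
c.SlurmSpawner.req_runtime = "02:00:00"
c.SlurmSpawner.req_memory = "16G"
```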

Page 22: Building a Regional 100G Collaboration Infrastructure

Cancer Genomics Hub (UCSC) is Housed in SDSC CoLo: Large Data Flows to End Users at UCSC, UCB, UCSF, …

[Chart: Cumulative TBs of CGH Files Downloaded (approaching 30 PB), with 1G, 8G, and 15G annotations]

Data Source: David Haussler, Brad Smith, UCSC

Page 23: Building a Regional 100G Collaboration Infrastructure

Large Hadron Collider Data Researchers Across Eight California Universities Benefit From Petascale Data & Compute Resources across PRP

• Aggregate Petabytes of Disk Space & Petaflops of Compute

• Transparently Compute on Data at Their Home Institutions & Systems at SLAC, NERSC, Caltech, UCSD, SDSC

[Diagram: Data & Compute Resources at SLAC, Caltech, and UCSD & SDSC serve LHC researchers at UCSB, UCSC, UCD, UCR, UCI, and CSU Fresno]

Source: Frank Wuerthwein, UCSD Physics and SDSC; Co-PI, PRP

PRP Builds on SDSC’s LHC-UC Project

Page 24: Building a Regional 100G Collaboration Infrastructure

Two Automated Telescope Surveys Creating Huge Datasets Will Drive PRP

One survey: 300 images per night, 100 MB per raw image, 30 GB per night; 120 GB per night when processed at NERSC.

The other survey: 250 images per night, 530 MB per raw image, 150 GB per night; 800 GB per night when processed at NERSC (increased by 4x).

Source: Peter Nugent, Division Deputy for Scientific Engagement, LBL; Professor of Astronomy, UC Berkeley

Precursors to LSST and NCSA

PRP Allows Researchers to Bring Datasets from NERSC to Their Local Clusters for In-Depth Science Analysis – see UCSC’s Brad Smith Talk
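A quick, illustrative arithmetic check (the survey labels are generic placeholders; the volumes are the ones quoted above) of how long each night's data takes to move over a sustained 10 Gbps path:

```python
# Illustrative arithmetic only (not an additional data source): how long each
# night's data haul from the two surveys takes over an assumed sustained 10 Gbps path.

NIGHTLY_GB = {
    "Survey 1, raw":        30,
    "Survey 1, processed": 120,
    "Survey 2, raw":       150,
    "Survey 2, processed": 800,
}
LINK_GBPS = 10  # an assumed sustained 10 Gbps path

for label, gigabytes in NIGHTLY_GB.items():
    seconds = gigabytes * 8 / LINK_GBPS        # GB -> gigabits, divided by Gbps
    print(f"{label:>21}: {gigabytes:>4} GB/night -> ~{seconds / 60:4.1f} min "
          f"at {LINK_GBPS} Gbps")
```

Even the largest nightly volume, 800 GB of processed data, moves in about 11 minutes at a sustained 10 Gbps, which is what makes the pull-datasets-to-local-clusters workflow described above practical.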

Page 25: Building a Regional 100G Collaboration Infrastructure

Dan Cayan, USGS Water Resources Discipline

Scripps Institution of Oceanography, UC San Diego

With much support from Mary Tyree, Mike Dettinger, Guido Franco, and other colleagues

Sponsors: California Energy Commission, NOAA RISA Program, California DWR, DOE, NSF

Planning for climate change in California: substantial shifts on top of already high climate variability

SIO Campus Climate Researchers Need to Download Results from NCAR Remote Supercomputer Simulations

to Make Regional Climate Change Forecasts

Page 26: Building a Regional 100G Collaboration Infrastructure

Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab Investigating the Use of Brain-Inspired Processors

“On the drawing board are collections of 64, 256, 1024, and 4096 chips. ‘It’s only limited by money, not imagination,’ Modha says.”

Source: Dr. Dharmendra Modha

Founding Director, IBM Cognitive Computing Group

August 8, 2014

Page 27: Building a Regional 100G Collaboration Infrastructure

UCSD ECE Professor Ken Kreutz-Delgado Brings the IBM TrueNorth Chip to Calit2’s Qualcomm Institute

September 16, 2015

Page 28: Building a Regional 100G Collaboration Infrastructure

A Brain-Inspired Cyberinstrument: Pattern Recognition Co-Processors Coupled to Today’s von Neumann Processors

“If we think of today’s von Neumann computers as akin to the ‘left brain’—fast, symbolic, number-crunching calculators, then IBM’s TrueNorth chip can be likened to the ‘right brain’—slow, sensory, pattern-recognizing machines.”

– Dr. Dharmendra Modha, IBM Cognitive Computing

www.research.ibm.com/articles/brain-chip.shtml

The Pattern Recognition Laboratory’s Cyberinstrument Will be a PRP Computational Resource

Exploring Realtime Pattern Recognition in Streaming Media & Discovering Patterns in Massive Datasets

Page 29: Building a Regional 100G Collaboration Infrastructure

Collaboration Between EVL’s CAVE2 and Calit2’s VROOM Over 10Gb Wavelength

EVL

Calit2

Source: NTT Sponsored ON*VECTOR Workshop at Calit2 March 6, 2013

Page 30: Building a Regional 100G Collaboration Infrastructure

Optical Fibers Link Australian and US Big Data Researchers – Also Korea, Japan, and the Netherlands

Page 31: Building a Regional 100G Collaboration Infrastructure

Next Step: Use AARnet/PRP to Set Up Planetary-Scale Shared Virtual Worlds

Digital Arena, UTS Sydney

CAVE2, Monash U, Melbourne

CAVE2, EVL, Chicago

Page 32: Building a Regional 100G Collaboration Infrastructure

The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Freeway System”

Opportunities for Collaboration with CineGrid Systems