31
Science Networks: How Data Gets There Eli Dart, Network Engineer ESnet Science Engagement Lawrence Berkeley Na=onal Laboratory ATPESC 2017 Chicago, IL August 4, 2017

Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ScienceNetworks:HowDataGetsThere

EliDart,NetworkEngineerESnetScienceEngagementLawrenceBerkeleyNa=onalLaboratory

ATPESC2017

Chicago,IL

August4,2017

Page 2: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

Outline

8/4/172

•  ScienceNetworks–structureandrela=onshiptotherestoftheInternet

•  DatatransferatHPCfacili=es

•  Dataportals–past,present,andfuture

Page 3: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

NCARRDADataPortal

•  Let’ssayIhaveanicecomputealloca=onattheALCF–climatescience

•  Let’ssayIneedsomedatafromNCARformyproject

•  hSps://rda.ucar.edu/

•  Datasets(therearemanymore,butthesearetwo):

•  hSps://rda.ucar.edu/datasets/ds199.1/(1.5TB)•  hSps://rda.ucar.edu/datasets/ds313.0/(430GB)

•  DownloadtoALCF(couldalsodoNCSAorNERSCorOLCF)

8/4/173

Page 4: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

WhatIsAScienceNetwork?•  Downloadingdatafromaportalhappensviathenetwork•  Whatdoes“viathenetwork”actuallymean?•  Whatis“thenetwork”anyway?

•  Mostofusarefamiliarwiththeno=onofanISP–  Internetaccessathome(Ne`lix,etc.)–  Dataforphones(Facebook,maps,Google,etc.)–  Thisis“theInternet”thatmostpeoplesee

•  Sciencenetworksinterconnectscien=ficsites–  HPCfacili=es–  Par=cleaccelerators(LHC,lightsources,…)–  Dataportals

•  SciencenetworksusethesameprotocolsastherestoftheInternet–  TheyarealsoconnectedtotherestoftheInternet

8/4/174

Page 5: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ThisisnotanISP.

Page 6: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

It’saDOEuserfacilityengineeredandopDmizedforBigDataScience

Wedothisbyofferinguniquecapabili=esandop=mizingthefacilityfordataacquisi=on,data

placement,datasharing,datamobility.

Page 7: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

TheInternet

•  TheInternetiscomposedofalargenumberofindividualnetworks–  Eachisrunbysomeen=tyforitsownreasons

•  Google•  USDepartmentofDefense•  FordMotorCompany•  USDepartmentofEnergy•  AT&T

–  Eachnetworkconnectstoothersforitsownreasons

•  Ingeneral,networksaremorevaluablewhenconnectedtoeachother–  Butremember–thisconnec=vityhappensforselfishreasons–  Notallnetworksarethesame–eachexistsforitsownreasons

8/4/177

Page 8: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

Selectednetworksandtheirmissions

8/4/178

Google(Search, YouTube, Gmail, Gdocs, etc.)

AT&T(Phones, home broadband, etc.)

ESnet(Big Science facilities,

DOE labs)

Insurance Company(Business

operations)

ISP(Internet connectivity for

customers, related services)

Internet2(Science network

connectivity for state and regional science

networks)

GEANT(Network connectivity for

European national science networks)

Regional Networks(Network connectivity

for universities, libraries, schools)

European NRENs(Network connectivity

for universities, libraries, schools)

Millions of phones, millions of homes

Giant data centers

National Labsand Facilities

Universities, schools, libraries

Universities, schools, libraries Many of these!

Many of these!

Car Company(Business

operations)

Many of these!

Page 9: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

Notesaboutdifferentnetworks

•  Thepreviousdiagramisadras=csimplifica=on–  hSp://www.caida.org/research/topology/as_core_network/2015/

•  Keypoints:–  Allnetworksexistforaspecificreason

•  Somenetworksprovideconnec=vitybetweennetworks•  Somenetworksprimarilyservetheirownusers•  Somenetworksprovideservicestouserswhoaccessthemviadifferentnetworks(e.g.

Google)–  Theselinesareblurry,butit’sausefulwaytothinkaboutit

•  Networkmissioninfluencesengineering,policy,reliability,etc.–  Notallnetworksarebuiltthesameway–  Notallnetworkscansupportallusemodels–  Sciencenetworkshaveadifferenttrafficprofilethancommercialnetworks

8/4/179

Page 10: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ElephantDatavs.MiceData

10

Page 11: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ElephantDatavs.MiceDataBehavior

11

Page 12: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

Physicalpipethatleakswateratrateof.0046%byvolume.

è è

Network‘pipe’thatdropspacketsatrateof.0046%.è è

Result100%ofdatatransferred,slowly,at<<5%op=malspeed.

ElephantFlowsPlaceGreatDemandsonNetworks

Result99.9954%ofwatertransferred,at“linerate.”

essen=allyfixed

determinedbyspeedoflight

Throughcarefulengineering,wecanminimizepacketloss.

Page 13: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ElephantflowsrequireessenDallylosslessnetworks

MetroArea

Local(LAN)

Regional

Con=nental

Interna=onal

Measured (TCP Reno) Measured (HTCP) Theoretical (TCP Reno) Measured (no loss)

13

.SeeEliDart,LaurenRotman,BrianTierney,MaryHester,andJasonZurawski.TheScienceDMZ:ANetworkDesignPaSernforData-

IntensiveScience.InProceedingsoftheIEEE/ACMAnnualSuperCompu=ngConference(SC13),DenverCO,2013.

Page 14: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

EmergingglobalconsensusaroundScienceDMZarchitecture.

•  Over 120 universi=es in the US have

deployedthisESnetarchitecture.

•  NSF has invested >>$80M to accelerateadop=on.

•  Australian, Canadian, Bri=sh, Brazilianuniversi=esfollowingsuit.

•  hQp://fasterdata.es.net/science-dmz/

1.  Fric=on-freenetworkpath

2.  Dedicateddatatransfernodes(DTNs)

3.  Performancemonitoring(perfSONAR)

Page 15: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ThePetascaleDTNProject•  BuiltontopoftheScienceDMZmodel•  EfforttoimprovedatatransferperformancebetweentheDOEASCRHPCfacili=esatANL,LBNL,andORNL,andalsoNCSA.–  Mul=plecurrentandfuturescienceprojectsneedtotransferdatabetweenHPCfacili=es

–  Performancegoalis15gigabitspersecond(equivalentto1PB/week)–  Realizeperformancegoalforrou=neGlobustransferswithoutspecialtuning

•  Referencedatasetis4.4TBofcosmologysimula=ondata

8/4/1715

Page 16: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

DTNClusterPerformance–HPCFaciliDes

16 – ESnet Science Engagement ([email protected]) - 8/4/17 ©2015,EnergySciencesNetwork

11.8 Gbps

20.2 Gbps

15.2 Gbps

15.1 Gbps

20.6 Gbps 19.7 Gbps

23.0 Gbps

25.7 Gbps

27.2 Gbps

22.9 Gbps

19.4 Gbps

21.2 Gbps

DTN

DTN

DTN

DTN

alcf#dtn_miraALCF

nersc#dtnNERSC

olcf#dtn_atlasOLCF

ncsa#BlueWatersNCSA

Data set: L380Files: 19260Directories: 211Other files: 0Total bytes: 4442781786482 (4.4T bytes)Smallest file: 0 bytes (0 bytes)Largest file: 11313896248 bytes (11G bytes)Size distribution:

1 - 10 bytes: 7 files10 - 100 bytes: 1 files100 - 1K bytes: 59 files1K - 10K bytes: 3170 files10K - 100K bytes: 1560 files100K - 1M bytes: 2817 files1M - 10M bytes: 3901 files10M - 100M bytes: 3800 files100M - 1G bytes: 2295 files1G - 10G bytes: 1647 files10G - 100G bytes: 3 files

June 2017L380 Data Set

Page 17: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ScienceDataPortals

•  Largerepositoriesofscien=ficdata–  Climatedata–  Skysurveys(astronomy,cosmology)–  Manyothers–  Datasearch,browsing,access

•  Manyscien=ficdataportalsweredesigned15+yearsago–  Single-web-serverdesign–  Databrowse/search,dataaccess,userawarenessallinasinglesystem–  Allthedatagoesthroughtheportalserver

•  Inmanycasesbydesign•  E.g.embargobeforepublica=on(enforceaccesscontrol)

8/4/1717

Page 18: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

LegacyPortalDesign

10GE

Border Router

WAN

Firewall

Enterprise

perfSONAR

perfSONAR

Filesystem(data store)

10GE

Portal Server

Browsing pathQuery pathData path

Portal server applications:· web server· search· database· authentication· data service

8/4/1718

•  Verydifficulttoimproveperformancewithoutarchitecturalchange–  Soxwarecomponentsalltangledtogether

–  DifficulttoputthewholeportalinaScienceDMZbecauseofsecurity

–  EvenifyoucouldputitinaDMZ,manycomponentsaren’tscalable

•  Whatdoesarchitecturalchangemean?

Page 19: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ExampleofArchitecturalChange–CDN

•  Let’slookatwhatContentDeliveryNetworksdidforwebapplica=ons

•  CDNsareawell-deployeddesignpaSern(e.g.AirBnB,OlympicGames,etc.)

• WhatdoesaCDNdo?–  Storesta=ccontentinaseparateloca=onfromdynamiccontent•  Complexityisn’tinthesta=ccontent–it’sintheapplica=ondynamics•  Webapplica=onsarecomplex,full-featured,andslow•  Dataserviceforsta=ccontentissimple–justmovethefile

–  Separa=onofapplica=onanddataserviceallowseachtobeop=mized

8/4/1719

Page 20: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ClassicalWebServerModel

8/4/1720

•  Webbrowserfetchespagesfromwebserver–  Allcontentstoredonthewebserver–  Webapplica=onsrunonthewebserver–  Webserversendsdatatoclientbrowseroverthenetwork

•  Perceivedclientperformancechangeswithnetworkcondi=ons–  Severalproblemsinthegeneralcase–  Latencyincreases=metopagerender–  Packetloss+latencycauseproblemsforlargesta=cobjects

HostingProvider

TransitNetwork

Residential BroadbandWEB

Long Distance / High Latency

Web Server

Browser

Page 21: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

SoluDon:PlaceLargeStaDcObjectsNearClient

HostingProvider

TransitNetwork

Residential BroadbandWEB

Long Distance / High Latency

CDN

DATA

Short Distance / Low Latency

Web Server

CDN Data Server

Browser

8/4/1721

•  CDNprovidessta=ccontent“close”toclient•  Webservers=llmanagescomplexbehavior

•  Latencygoesdown–  Timetopagerendergoesdown–  Sta=ccontentperformancegoesup

•  Loadonwebservergoesdown(noneedtoservesta=ccontent)

•  Significantwinforwebapplica=onperformance

Page 22: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

ArchitecturalExaminaDonofDataPortals

•  Commondataportalfunc=ons(mostportalshavethese)–  Search/query/discovery–  Datadownloadmethodfordataaccess–  GUIforbrowsingbyhumans–  APIformachineaccess–ideallyincorporatessearch/query+download

•  Performancepainisprimarilyinthedatahandlingpiece–  Rapidincreaseindatascaleeclipsedlegacysoxwarestackcapabili=es–  Portalserversoxenstuckinenterprisenetwork

•  Canwe“disassemble”theportalandputthepiecesbacktogetherbeSer?–  UseScienceDMZasapla`ormforthedatapiece–  AvoidplacingcomplexsoxwareintheScienceDMZ

8/4/1722

Page 23: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

LegacyPortalDesign

10GE

Border Router

WAN

Firewall

Enterprise

perfSONAR

perfSONAR

Filesystem(data store)

10GE

Portal Server

Browsing pathQuery pathData path

Portal server applications:· web server· search· database· authentication· data service

8/4/1723

Page 24: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

Next-GeneraDonPortalLeveragesScienceDMZ

10GE10GE

10GE

10GE

Border Router

WAN

Science DMZSwitch/Router

Firewall

Enterprise

perfSONAR

perfSONAR

10GE

10GE

10GE10GE

DTN

DTN

API DTNs(data access governed

by portal)

DTN

DTN

perfSONAR

Filesystem (data store)

10GE

Portal Server

Browsing pathQuery path

Portal server applications:· web server· search· database· authentication

Data Path

Data Transfer Path

Portal Query/Browse Path

8/4/1724

Page 25: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

PutTheDataOnDedicatedInfrastructure

•  Wehaveseparatedthedatahandlingfromtheportallogic•  Portaliss=llitsnormalself,butenhanced

–  PortalGUI,database,search,etc.allfunc=onastheydidbefore–  QueryreturnspointerstodataobjectsintheScienceDMZ–  Portalisnowfreedfrom=estothedataservers(runitonAmazonifyouwant!)

•  Datahandlingisseparate,andscalable–  High-performanceDTNsintheScienceDMZ–  Scaleasmuchasyouneedtowithoutmodifyingtheportalsoxware

•  Outsourcedatahandlingtocompu=ngcenters–  Compu=ngcentersaresetupforlarge-scaledata–  Letthemhandlethelarge-scaledata,andlettheportaldotheorchestra=onofdataplacement

8/4/1725

Page 26: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

DataPortalImplicaDons

•  Portalsholdalotofvaluabledata–  Observa=ons(skysurveys,satellitedata,genomes,etc.)–  Manyhavebeeninplaceforyears

•  Mostareinadequatetosupportlarge-scaleanalysis–  Legacysearch/queryinterfaces–  Legacyaccessprotocols/tools–  Thisisintheprocessofchanging

•  Thetechnologyexiststoradicallyimprovetheu=lityofdataportals–  Whatshouldtheperformanceexpecta=onbe?–  HPCfacili=escando1PB/week–ifdataportalscoulddothis…

8/4/1726

Page 27: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

NCARRDAPerformancetoDOEHPCFaciliDes

13.9 Gbps 16.6 Gbps 11.9 Gbps

DTN

nersc#dtnNERSC

DTN

olcf#dtn_atlasOLCF

DTN

alcf#dtn_miraALCF

DTN

NCAR RDArda#datashare

8/4/1727

•  1.5TBdataset

•  1121files

Page 28: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

Summary

•  Sciencenetworksareengineeredtosupportdata-intensivescience–  RelatedtoandconnectedtotherestoftheInternet,butdifferent

•  ScienceDMZmodeleffec=velyconnectsdatainfrastructuretonetworks–  Ifyouneedtosendyoursysadmintome,feelfree

•  GlobusatHPCfacili=esmakesterascaletopetascaledatatransferspossible–  (moreonGlobuslatertoday)

•  HugeopportunityinupgradingdataportalstouseScienceDMZ,DTNs,advancedtools(e.g.Globus)–  MakelargedatarepositoriesavailableforanalysisatHPCfacili=es

8/4/1728

Page 29: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

Inconclusion–ESnet’svision:

Scien=ficprogresswillbecompletelyunconstrainedbythephysicalloca=onofinstruments,people,computa=onal

resources,ordata.

29

Page 30: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

LinksandLists

–  ESnetfasterdataknowledgebase•  hSp://fasterdata.es.net/

–  ScienceDMZpaper•  hSp://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf

–  ScienceDMZemaillist•  [email protected]"subscribeesnet-sciencedmz”

–  perfSONAR•  hSp://fasterdata.es.net/performance-tes=ng/perfsonar/•  hSp://www.perfsonar.net

–  Globus•  hSps://www.globus.org/

30 – ESnet Science Engagement ([email protected]) - 8/4/17 ©2015,EnergySciencesNetwork

Page 31: Science Networks: How Data Gets Therepress3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · • Science networks are engineered to support data-intensive science – Related to and

Thanks!

EliDartEnergySciencesNetwork(ESnet)LawrenceBerkeleyNa=onalLaboratory

hSp://my.es.net/

hSp://www.es.net/

hSp://fasterdata.es.net/