Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Increasing Coherence Between Simulation and Data Analytics
Chesapeake Large Scale Data Analytics Conference
Annapolis, MD
October 25, 2016
Rob Leland
Vice President, Science & Technology
Chief Technology Officer
Sandia National Laboratories
SAND2016-10762 C
Outline
§ A tale of two visions
§ Some background
§ A charge from the National Strategic Computing Initiative
§ Answers to three key questions
  § Why is an increasing coherence between simulation and analytics important?
  § What is really meant by “increasing coherence” between the two?
  § How might coherence be furthered in practice?
§ A unifying vision
Vision 1: From a scientific perspective
From The Fourth Paradigm: Data-Intensive Scientific Discovery by Jim Gray
Data analysis complements theory, experiment, and computation
Graph matching example of data analytics
A key analytic primitive, used to find a specific instance of an abstract pattern of interest
From Coffman, Greenblatt, and Marcus, Graph-Based Technologies for Intelligence Analysis, Communications of the ACM, 47, March 2004.
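As a minimal sketch of the graph matching primitive named above, the following Python finds one instance of a small pattern graph inside a larger data graph by backtracking search. The function name `find_match` and both example graphs are illustrative assumptions, not taken from the cited article, and production systems use more elaborate algorithms.

```python
def find_match(pattern, data):
    """Return a dict mapping pattern nodes to data nodes, or None.

    Both graphs are undirected adjacency dicts: node -> set of neighbors.
    Every pattern edge must map to a data edge (non-induced subgraph match).
    """
    p_nodes = list(pattern)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        p = p_nodes[len(mapping)]
        for d in data:
            if d in mapping.values():
                continue  # each data node is used at most once
            # Every already-mapped neighbor of p must be adjacent to d.
            if all(mapping[q] in data[d] for q in pattern[p] if q in mapping):
                mapping[p] = d
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]  # backtrack and try the next candidate
        return None

    return extend({})


# Hypothetical abstract pattern: a triangle.
pattern = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
# Hypothetical data graph: path a-b-c-d plus edge b-d, so b, c, d form a triangle.
data = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c"},
}
match = find_match(pattern, data)
```

Here the triangle pattern is found on data nodes b, c, and d. The search is exponential in the worst case, which is one reason this primitive is demanding at large scale.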
Vision 2: From a national security perspective
Some background
§ Simulation
  § Computations to understand physical phenomena or conduct engineering
§ Large Scale Data Analytics (LSDA)
  § Data Analytics = Discovering meaningful patterns in data
  § Large Scale = Requiring leading-edge processing and storage capabilities
§ LSDA is increasing in importance
  § Pervasive
    § Commerce, finance, healthcare, science, engineering, national security, ...
  § Lasting societal significance
    § Internet search, genomics, climate modeling, Higgs particle, ...
§ LSDA is getting “harder”
  § Captured data growing exponentially with time
  § Individual analysis becoming more sophisticated
  § More people examining more data more frequently
  § Aggregate work growing much faster than Moore’s Law
The Economist:
National Strategic Computing Initiative (NSCI)
NSCI Strategic Objectives
§ (1) Accelerating delivery of a capable exascale computing system that integrates hardware and software capability to deliver approximately 100 times the performance of current 10 petaflop systems across a range of applications representing government needs.
§ (2) Increasing coherence between the technology base used for modeling and simulation and that used for data analytic computing.
§ (3) Establishing, over the next 15 years, a viable path forward for future HPC systems even after the limits of current semiconductor technology are reached (the "post-Moore's Law era").
§ (4) Increasing the capacity and capability of an enduring national HPC ecosystem by employing a holistic approach that addresses relevant factors such as networking technology, workflow, downward scaling, foundational algorithms and software, accessibility, and workforce development.
§ (5) Developing an enduring public-private collaboration to ensure that the benefits of the research and development advances are, to the greatest extent, shared between the United States Government and industrial and academic sectors.
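The arithmetic behind objective (1) can be written out as follows, taking one petaflop to mean 10^15 floating-point operations per second (the usual convention, stated here as an assumption since the slide does not define it):

```latex
100 \times 10\ \text{petaflops}
  = 100 \times 10 \times 10^{15}\ \text{FLOP/s}
  = 10^{18}\ \text{FLOP/s}
  = 1\ \text{exaflop}
```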
Q1: Why is increasing coherence between simulation and analytics important?
§ For simulation
  § HPC simulation must ride on some commodity curve
  § Larger market forces behind analytics
  § Can exploit commodity component technology from analytics
§ For analytics
  § Large Scale Data Analytics problems becoming ever more sophisticated
  § Requiring more coupled methods
  § Can exploit architectural lessons from HPC simulation
§ For both: Integration of simulation and analytics in the same workflow
  § Automation of analysis of data from simulation
  § Creation of synthetic data via simulation to augment analysis
  § Automated generation and testing of hypotheses
  § Exploration of new scientific and technical scenarios
  § ...
Mutual inspiration, technical synergy, and economies of scale in the creation, deployment, and use of HPC resources
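A toy sketch of the "same workflow" idea: a simulation step generates synthetic data, and an analytics step consumes it directly and tests a hypothesis automatically. The diffusion model, the function names, and the hypothesis are all illustrative assumptions chosen for brevity, not examples from the talk.

```python
def simulate_diffusion(n_cells=21, n_steps=50, alpha=0.25):
    """Toy simulation: explicit finite-difference heat diffusion on a 1-D rod.

    Yields the temperature field after each step; the ends are held at 0.
    alpha <= 0.5 keeps this explicit scheme stable.
    """
    u = [0.0] * n_cells
    u[n_cells // 2] = 1.0  # initial hot spot in the middle
    for _ in range(n_steps):
        u = [0.0] + [
            u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
            for i in range(1, n_cells - 1)
        ] + [0.0]
        yield u


def analyze(snapshots):
    """Toy analytics: track the peak temperature and test a hypothesis."""
    peaks = [max(u) for u in snapshots]
    # Hypothesis under test: the peak never increases as heat spreads out.
    peak_is_monotone = all(b <= a for a, b in zip(peaks, peaks[1:]))
    return peaks, peak_is_monotone


# The analytics step consumes simulation output directly, with no file hand-off.
peaks, hypothesis_holds = analyze(simulate_diffusion())
```

Because `simulate_diffusion` is a generator, each snapshot is analyzed as it is produced, which is the interleaving of the two kinds of computation in miniature.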
A challenge because simulation and analytics differ in many respects…
Data structures describing simulation and analytics differ
Graphs from simulations may be irregular, but have more locality than those derived from analytics
Computational Simulation of physical phenomena: Climate modeling; Car crash
Large Scale Data Analytics: Internet connectivity; Yeast protein interactions
Figures from Leland et al. courtesy of Yelick, LBNL.
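One rough way to make the locality contrast concrete is to compare how far apart, in node numbering, the endpoints of each edge are. The sketch below uses a regular 2-D mesh as a stand-in for a simulation graph and uniformly random edges as a crude stand-in for an analytics graph; both stand-ins, the metric, and all function names are assumptions for illustration only.

```python
import random


def grid_edges(rows, cols):
    """Edges of a 2-D mesh with nodes numbered row by row."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((node, node + 1))     # right neighbor
            if r + 1 < rows:
                edges.append((node, node + cols))  # lower neighbor
    return edges


def random_edges(n_nodes, n_edges, seed=0):
    """Edges between uniformly random pairs of distinct nodes."""
    rng = random.Random(seed)
    edges = []
    while len(edges) < n_edges:
        a, b = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if a != b:
            edges.append((a, b))
    return edges


def mean_index_distance(edges):
    """Average |i - j| over edges; small values mean neighbors are stored close together."""
    return sum(abs(a - b) for a, b in edges) / len(edges)


rows, cols = 32, 32
mesh = grid_edges(rows, cols)
rand = random_edges(rows * cols, len(mesh))
mesh_distance = mean_index_distance(mesh)  # (1 + cols) / 2 for a square mesh
rand_distance = mean_index_distance(rand)  # roughly a third of the node count
```

With the same number of nodes and edges, the mesh keeps neighbors within one row of each other, while the random graph scatters them across the whole index range.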
Computation and communication patterns differ
Black = time spent computing
Green = time spent communicating
White = time spent waiting for data to be communicated
The U.S. road map, which has spatial locality and is thus the most similar of the three in structure to computational patterns that would arise in typical physical simulations.
The Erdős-Rényi graph, a well-studied example in graph theory work.
A scale-free graph, an example more reflective of real-world networks.
Figure from Leland et al. courtesy of Johnson, PNNL.
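The two synthetic graph types named above can be sketched in a few lines. The code below builds an Erdős-Rényi graph (the fixed-edge-count variant) and a scale-free graph grown by preferential attachment, then compares their largest node degrees. The generator details, sizes, and seeds are assumptions for illustration and are not the graphs used in the figure.

```python
import random
from collections import Counter


def erdos_renyi(n, n_edges, seed=0):
    """Erdős-Rényi G(n, M): n_edges distinct edges chosen uniformly at random."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)


def preferential_attachment(n, m, seed=0):
    """Each new node links to m existing nodes, chosen with probability
    proportional to their current degree (Barabási-Albert-style growth)."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]  # seed clique
    # Each node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def max_degree(edges):
    """Largest number of edges incident to any single node."""
    degree = Counter(v for e in edges for v in e)
    return max(degree.values())


n, m = 2000, 2
scale_free = preferential_attachment(n, m)
uniform = erdos_renyi(n, len(scale_free))
hub_scale_free = max_degree(scale_free)
hub_uniform = max_degree(uniform)
```

With equal node and edge counts, the scale-free graph develops hub nodes of far higher degree than any node in the Erdős-Rényi graph. Such hubs are one feature that can make work and communication uneven across processors.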
Memory performance demands differ
A key differentiator in the performance of simulation and analytics
Figure labels: Simulation; Analytics
Area of the circle = relative data intensiveness (i.e., total amount of unique data accessed over a fixed interval of instructions)
Standard benchmarks include:
• LINPACK (smallest data intensiveness; barely visible on graph)
• STREAM
• SPECFP
• SpecInt
Figure from Murphy & Kogge, with adjustment to double the radius of the LINPACK data point to make it visible.
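The data intensiveness definition above (unique data accessed over a fixed interval) can be illustrated on a toy address trace. This sketch counts unique addresses per fixed-size window of accesses; treating accesses as a stand-in for instructions, and both example traces, are simplifying assumptions and not the methodology of Murphy & Kogge.

```python
def data_intensiveness(trace, window):
    """Average number of unique addresses touched per fixed-size window.

    `trace` is a list of memory addresses, one per access. Using accesses
    in place of instructions is a simplifying assumption.
    """
    counts = [
        len(set(trace[start:start + window]))
        for start in range(0, len(trace) - window + 1, window)
    ]
    return sum(counts) / len(counts)


# Hypothetical traces of 1,000 accesses each.
blocked = [i % 16 for i in range(1000)]                 # reuses a 16-word working set
scattered = [(i * 7919) % 100003 for i in range(1000)]  # never revisits an address

low = data_intensiveness(blocked, window=100)     # 16.0
high = data_intensiveness(scattered, window=100)  # 100.0
```

A code that keeps reusing a small working set scores low, while one that touches new data on every access scores as high as the window allows.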
Application code characteristics differ
Contrasting properties:
Application code property | Simulation | Analytics
Spatial locality | High | Low
Temporal locality | Moderate | Low
Memory footprint | Moderate | High
Computation type | May be floating-point dominated* | Integer intensive
Input-output orientation | Output dominated | Input dominated
*Increasingly, simulation work has become less floating-point dominated
Q2: So what do we really mean by “increasing coherence” between simulation and analytics?
§ NOT one system ostensibly optimized for both simulation and analytics
§ Greater commonality in underlying componentry and design principles
§ Greater interoperability, allowing interleaving of both types of computations
…A more common hardware and software roadmap between simulation and analytics
And yet, there is hope…
Simulation and analytics are evolving to become more similar in their architectural needs
§ Current challenges for the LSDA community (…similar to HPC simulation)
  § Data movement
  § Power consumption
  § Memory/interconnect bandwidth
  § Scaling efficiency
§ Instruction mix for Sandia’s HPC engineering codes (…similar to LSDA)
  § Memory operations 40%
  § Integer operations 40%
  § Floating point 10%
  § Other 10%
§ Common design impacts of energy cost trends
  § Increased concurrency (processing threads, cores, memory depth)
  § Increased complexity and burden on system software, languages, tools, runtime support, codes
Energy cost of moving data is becoming dominant
[Figure: Energy cost for various common operations. Axis labels: "Energy cost, in picojoules (pJ), per 64-bit floating-point operation"; "Cost estimates for technology year".]
From Dan McMorrow, Technical Challenges of Exascale Computing, JSR-12-310, JASON, MITRE Corporation, April 2013.
Emerging architectural and system software synergies
Similar needs:
Architectural characteristic | Simulation | Analytics
Computation | Memory address generation dominated | Same
Primary memory | Low power, high bandwidth, semi-random access | Same
Secondary memory | Emerging technologies may offset cost, allowing much more memory | …require extremely large memory spaces
Storage | Integration of another layer of memory hierarchy to support checkpoint/restart | …to support out-of-core data set access
Interconnect technology | High bisection bandwidth (for relatively coarse-grained access) | …(for fine-grained access)
System software (node-level) | Low dependence on system services, increasingly adaptive, resource management for structured parallelism | …highly adaptive, resource management for unstructured parallelism
System software (system-level) | Increasingly irregular workflows | Irregular workflows
Q3: How might coherence be furthered in practice?
§ Making it an element of national strategy
  § Check via the NSCI
§ Building this into exascale computing efforts
  § Also a component of the NSCI
§ Communicating with and enlisting the technical communities concerned
  § This forum and similar events
§ Further developing the vision
  § Today’s dialogue session!
Acknowledgements
Additional references
§ The Economist, “Data, Data, Everywhere,” Feb 25th, 2010.
§ R. C. Murphy and P. M. Kogge, “On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications,” IEEE Transactions on Computers 56(7), July 2007: 937–945.
§ R. Murphy, “Power Issues,” presentation to JASON 2012, June 2012.
§ Peter Kogge (editor) et al., ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems. DARPA, 2008.
§ Dan McMorrow, Technical Challenges of Exascale Computing, JSR-12-310, JASON, MITRE Corporation, April 2013.
§ Tony Hey, Stewart Tansley, and Kristin Tolle (editors), The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.
§ Jim Gray, The Fourth Paradigm: Data-Intensive Scientific Discovery