The Convergence of HPC and Big Data
Pete Beckman
Senior Scientist, Argonne National Laboratory
Co-Director, Northwestern/Argonne Institute for Science and Engineering (NAISE)
Senior Fellow, University of Chicago Computation Institute
Following the International Exascale Software Project (IESP), 2008-2012 → Big Data and Extreme Computing (BDEC) workshops
http://www.exascale.org/bdec/
Overarching goals:
1. Create an international collaborative process focused on the co-design of software infrastructure for extreme scale science, addressing the challenges of both extreme scale computing and big data, and supporting a broad spectrum of major research domains
2. Describe funding structures and strategies of public bodies with Exascale R&D goals worldwide
3. Establish and maintain a global network of expertise and funding bodies in the area of Exascale computing
Workshops:
1 – BDEC Workshop, Charleston, SC, USA, April 29-May 1, 2013
2 – BDEC Workshop, Fukuoka, Japan, February 26-28, 2014
3 – BDEC Workshop, Barcelona, Spain, January 28-30, 2015
4 – BDEC Workshop, Frankfurt, Germany, June 15-17, 2016
Europe-USA-Asia International Series of Workshops on Extreme Scale Scientific Computing
Why Converge? Independent paths: More Cost, Less Science
• $ multiple hardware and software infrastructures
• $ developing software for two communities
• $ learning two computing models
• $ smaller discovery community, fewer ideas
• Less science
ANL: Pete Beckman (PI), Marc Snir (Chief Scientist), Pavan Balaji, Rinku Gupta, Kamil Iskra, Franck Cappello, Rajeev Thakur, Kazutomo Yoshii
LLNL: Maya Gokhale, Edgar Leon, Barry Rountree, Martin Schulz, Brian Van Essen
PNNL: Sriram Krishnamoorthy, Roberto Gioiosa
UC: Henry Hoffmann
UIUC: Laxmikant Kale, Eric Bohm, Ramprasad Venkataraman
UO: Allen Malony, Sameer Shende, Kevin Huck
UTK: Jack Dongarra, George Bosilca, Thomas Herault
See http://www.argo-osr.org/ for more information
An Exascale Operating System and Runtime Software Research & Development Project
Developing vendor-neutral, open-source OS/R software
What OS/R Gaps Must We Address?
• Extreme in-node parallelism
  – Poor mechanisms for precise resource management (cores, power, memory, network)
  – Legacy threads/tasks implementations perform poorly at scale
• Dynamic variability of platform; power is constrained
  – Poor runtime mechanisms for managing dynamic overclocking, provisioning power, adjusting workloads
  – No mechanisms for managing power dynamically, globally, and in cooperation with user-level runtime layers
• Hierarchical memory
  – Poor interfaces/strategies for managing deepening memory
• New modes for HPC
  – No portable interfaces for easily building workflows, in-situ analysis, coupled physics, advanced I/O, application resilience
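As one illustration of the "managing power dynamically, globally" gap, the sketch below divides a global power cap among enclaves. It is not Argo's mechanism: the function name `split_power_budget`, the per-enclave floor, and the proportional-sharing rule are all assumptions made for this example.

```python
def split_power_budget(total_watts, demands, floor_watts):
    """Divide a global power cap among enclaves (hypothetical policy).

    Each enclave first receives min(demand, floor_watts). Whatever power
    remains is shared in proportion to each enclave's unmet demand, and
    no enclave ever receives more than it asked for.
    """
    # Guaranteed baseline for every enclave.
    base = {name: min(d, floor_watts) for name, d in demands.items()}
    committed = sum(base.values())
    if committed > total_watts:
        raise ValueError("global cap cannot cover the per-enclave floors")

    # Demand still unsatisfied after the baseline grant.
    unmet = {name: demands[name] - base[name] for name in demands}
    total_unmet = sum(unmet.values())
    if total_unmet == 0:
        return base

    # Scale factor is capped at 1.0 so allocations never exceed demand.
    scale = min(1.0, (total_watts - committed) / total_unmet)
    return {name: base[name] + scale * unmet[name] for name in demands}
```

A user-level runtime could call such a function each time measured demand changes, which is the kind of cooperation between a global manager and runtime layers that the slide says is missing.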
10/28/16 Argo OSR, Pete Beckman
Argo Explorations to Address Exascale Gaps
• Hierarchy of Enclaves connected via a Backplane
• Elastic intranode containers with resource knobs
• Lightweight thread/tasks designed for containers, messaging, and memory hierarchy
• Adaptive, learning, integrated control system
PM 2.5, 10, 100
A collaborative project: Argonne National Laboratory, the University of Chicago, and the City of Chicago
Supported by collaborating institutions and the U.S. National Science Foundation.
Industry In-Kind partners: AT&T, Cisco, Intel, Microsoft, Motorola Solutions, Schneider Electric, Zebra
Waggle: An Open Platform for Intelligent Sensors
Exploiting Disruptive Technology, Edge Computing, Resilient Design
• Machine Learning, Computer Vision
• Novel Sensors: Nano/MEMS
• Low Power CPUs: GPU/Smartphones
Powerful, Resilient & Hackable
[Diagram labels]
• Control Processor
• Relays
• Current sensors
• Heartbeat monitors
• Reset pins
• Real-time clock & internal sensors
• Multiple boot media (μSD/eMMC)
• 4-core ARM
• Node Control & Communications
• 4+4-core ARM, 8-core GPU
• In-Situ/Edge Processing
• “Deep Space Probe” design
• Linux development environment
Edge Computing: Analysis and Feature Recognition, Preserving Privacy…
• Parallel Computing
• Open Platform
• Deep Learning
Waggle Machine Learning & Edge Computing
• We are exploring Caffe & OpenCV
  – Convolutional Neural Networks
• Training will be done on systems at Argonne
• Classification on Waggle
Waggle: A Platform for Research
• Open Source / Open Platform
  – Reusable, extensible software communities
• Machine Learning: Computer Vision
  – Data must be reduced in-situ
• Novel Sensors: Nano/MEMS/μfluidics
  – Explosion of nano/MEMS & imaging tech
• Low-Power CPUs: GPU/Smartphones
  – Powerful, low-power, smartphone CPUs
Opportunity: Big Data + Predictive Models
Smart Sensors + Supercomputers/Cloud Computing = predictions and analysis
Why HPC Geeks Should Care
• New sensors are programmable parallel computers
  – Multicore + GPUs & OpenCL or OpenMP
  – New algorithms for in-situ data analysis, feature detection, compression, deep learning
  – Need a new programming model for “stackable” in-situ analysis (for sensors and HPC)
  – Need advanced OS/R resilience, cybersecurity, networking, over-the-air programming
• 1000s of nodes make a distributed computing “instrument”
  – New streaming programming model needed
  – New techniques for machine learning for scientific data required
    • Both within a “node” and collectively across time series
• How will HPC streaming analytics and simulation be connected to live data?
  – Can we trigger HPC simulations after first approximations? (weather, energy, transportation)
  – Unstructured database with provenance and metadata for QA/collaboration
• Use novel HPC hardware to solve the power issue?
  – Can we use neuromorphic hardware or FPGAs to reduce power for in-situ analysis & compression?
• We are trading precision & cost for greater spatial resolution: What is possible?
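To make "in-situ data analysis" and "trigger HPC simulations after first approximations" concrete, here is a minimal Python sketch of a node reducing a window of raw readings to a small summary and applying a trigger rule. The names `WindowSummary`, `summarize`, and `should_trigger`, and the threshold rule itself, are hypothetical and not part of Waggle.

```python
from dataclasses import dataclass
import math


@dataclass
class WindowSummary:
    """Compact record a node could transmit instead of raw samples."""
    count: int
    mean: float
    std: float
    minimum: float
    maximum: float


def summarize(samples):
    """Reduce one window of readings using Welford's one-pass algorithm."""
    count, mean, m2 = 0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # running sum of squared deviations
        lo, hi = min(lo, x), max(hi, x)
    if count == 0:
        raise ValueError("empty window")
    std = math.sqrt(m2 / count)  # population standard deviation
    return WindowSummary(count, mean, std, lo, hi)


def should_trigger(summary, threshold):
    """Assumed rule: request a follow-up simulation when the mean is high."""
    return summary.mean > threshold
```

Only the five summary fields leave the node, so the raw stream never has to be stored or transmitted; a real deployment would likely replace the threshold with a learned model.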
[Diagram labels]
• Cloud Database
• Near Real-Time HPC Simulations
• Data Aggregation, Multiple Sources
• Data Analysis and HPC Simulations
• Parallel Computation at the Edge
• New Edge Algorithms
RISC version of Convergence Story: Start by enabling… remove roadblocks (then everyone’s wish list follows)
• Applications (Science Drivers): Software needs & workflow patterns
• Operations
  – Support real-time and streaming from fast networks
  – Support node sharing, long-lived services, storage requests for years…
• Architecture
  – Mothball current parallel file systems, replace with persistent storage services (databases, KV, etc.)
  – Accelerate move of storage into compute infrastructure
• Software
  – Linux software development environment
  – Native support for low-level infrastructure: Docker, VMs, Mesos, etc.
  – New focus on QoS; Software Defined Storage, on-demand services, etc.
[Map labels]
Chicago, Pittsburgh, New York, Portland, Atlanta, Boston, Delhi, Chattanooga
2016-17 Phase 2 Pilots
Developing a pilot project strategy aimed at empowering partner universities and national laboratories to work with their local cities.
Chicago
2016 Phase 1 Pilots
Seattle, Bristol, Newcastle
Denver
Initial discussions