The Convergence of HPC and Big Data

Pete Beckman
Senior Scientist, Argonne National Laboratory
Co-Director, Northwestern / Argonne Institute for Science and Engineering (NAISE)
Senior Fellow, University of Chicago Computation Institute





Argonne National Laboratory

•  $675M/yr budget
•  3,200 employees
•  1,450 scientists/engineers
•  750 Ph.D.s


Argonne's Next Big Supercomputer: Aurora


Following the International Exascale Software Initiative (IESP, 2008-2012) → the Big Data and Extreme Computing (BDEC) workshops

http://www.exascale.org/bdec/

Overarching goals:
1. Create an international collaborative process focused on the co-design of software infrastructure for extreme-scale science, addressing the challenges of both extreme-scale computing and big data, and supporting a broad spectrum of major research domains.
2. Describe funding structures and strategies of public bodies with exascale R&D goals worldwide.
3. Establish and maintain a global network of expertise and funding bodies in the area of exascale computing.

Workshops to date:
1 – BDEC Workshop, Charleston, SC, USA, April 29 - May 1, 2013
2 – BDEC Workshop, Fukuoka, Japan, February 26-28, 2014
3 – BDEC Workshop, Barcelona, Spain, January 28-30, 2015
4 – BDEC Workshop, Frankfurt, Germany, June 15-17, 2016

A Europe-USA-Asia international series of workshops on extreme-scale scientific computing

[Slides courtesy of Mark Asch]

[Diagram: HPC and big data software stacks, with Applications, Shared components, and Hardware & OS layers]

Why Converge? Independent paths: more cost, less science.

•  $ multiple hardware/software infrastructures
•  $ developing software for two communities
•  $ learning two computing models
•  $ smaller discovery community, fewer ideas
•  Less science


ANL: Pete Beckman (PI), Marc Snir (Chief Scientist), Pavan Balaji, Rinku Gupta, Kamil Iskra, Franck Cappello, Rajeev Thakur, Kazutomo Yoshii

LLNL: Maya Gokhale, Edgar Leon, Barry Rountree, Martin Schulz, Brian Van Essen
PNNL: Sriram Krishnamoorthy, Roberto Gioiosa
UC: Henry Hoffmann
UIUC: Laxmikant Kale, Eric Bohm, Ramprasad Venkataraman
UO: Allen Malony, Sameer Shende, Kevin Huck
UTK: Jack Dongarra, George Bosilca, Thomas Herault

See http://www.argo-osr.org/ for more information

An Exascale Operating System and Runtime Software Research & Development Project

Developing vendor-neutral, open-source OS/R software


What OS/R Gaps Must We Address?

•  Extreme in-node parallelism
   –  Poor mechanisms for precise resource management (cores, power, memory, network)
   –  Legacy threads/tasks implementations perform poorly at scale
•  Dynamic variability of the platform; power is constrained (see the sketch after this list)
   –  Poor runtime mechanisms for managing dynamic overclocking, provisioning power, adjusting workloads
   –  No mechanisms for managing power dynamically, globally, and in cooperation with user-level runtime layers
•  Hierarchical memory
   –  Poor interfaces/strategies for managing deepening memory
•  New modes for HPC
   –  No portable interfaces for easily building workflows, in-situ analysis, coupled physics, advanced I/O, application resilience
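The power-management gap has a concrete shape. Below is a minimal sketch, assuming a Linux node that exposes Intel RAPL through the powercap sysfs interface, of the kind of package-power knob a global power manager would drive; the sysfs path and the 95 W cap are platform-dependent assumptions, and writing the limit requires root.

# Illustrative sketch: node-level power capping via Linux powercap/RAPL.
# The sysfs path and the wattage below are assumptions; requires root.
PKG = "/sys/class/powercap/intel-rapl:0"   # package-0 power domain

def read_power_limit_w() -> float:
    # constraint_0 is the long-term package power limit, in microwatts.
    with open(PKG + "/constraint_0_power_limit_uw") as f:
        return int(f.read()) / 1e6

def set_power_limit_w(watts: float) -> None:
    with open(PKG + "/constraint_0_power_limit_uw", "w") as f:
        f.write(str(int(watts * 1e6)))

print("current package limit:", read_power_limit_w(), "W")
set_power_limit_w(95.0)   # e.g., a cap handed down by a global power manager

A user-level runtime could poll such limits and shift work accordingly; the point of the gap list above is that today this coordination is ad hoc rather than a cooperative OS/R mechanism.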


Argo Explorations to Address Exascale Gaps

•  A hierarchy of enclaves connected via a backplane
•  Elastic intra-node containers with resource knobs (sketched below)
•  Lightweight threads/tasks designed for containers, messaging, and the memory hierarchy
•  An adaptive, learning, integrated control system
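To make "resource knobs" concrete, the sketch below shows the flavor of control involved, using plain Linux cgroup v2 as a stand-in; Argo's actual node OS container mechanism is its own design, and the "argo_demo" container name is hypothetical (root required).

# Illustrative only: elastic per-container CPU and memory knobs via
# Linux cgroup v2. The container name is hypothetical.
import os

CG = "/sys/fs/cgroup/argo_demo"
os.makedirs(CG, exist_ok=True)

def knob(name: str, value: str) -> None:
    # Each cgroup control file is one resource knob.
    with open(os.path.join(CG, name), "w") as f:
        f.write(value)

knob("cpu.max", "200000 100000")        # 200 ms runtime per 100 ms period = 2 CPUs
knob("memory.max", str(4 * 2**30))      # cap the container at 4 GiB
knob("cgroup.procs", str(os.getpid()))  # move this process into the container

An elastic container is then a matter of rewriting cpu.max and memory.max on the fly as the control system learns what the workload needs.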


Understanding Cities

Cities: 80% of GDP, 70% of energy, 70% of GHG emissions


Why measure cities?


Particulate matter: PM 2.5, 10, 100

A collaborative project: Argonne National Laboratory, the University of Chicago, and the City of Chicago

Supported by collaborating institutions and the U.S. National Science Foundation. Industry in-kind partners: AT&T, Cisco, Intel, Microsoft, Motorola Solutions, Schneider Electric, Zebra


Waggle: An Open Platform for Intelligent Sensors
Exploiting disruptive technology, edge computing, and resilient design

•  Machine learning / computer vision
•  Novel sensors: nano/MEMS
•  Low-power CPUs: GPUs/smartphones


[Waggle node architecture diagram]

•  Node control & communications: 4-core ARM control processor with a real-time clock and internal sensors, multiple boot media (μSD/eMMC), relays, and current sensors
•  In-situ/edge processing: 4+4-core ARM with an 8-core GPU
•  Resilience: heartbeat monitors and reset pins (a watchdog sketch follows); a "deep space probe" design
•  Powerful, resilient & hackable, with a Linux development environment
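The heartbeat/reset pairing is what lets an unattended node recover itself. As a hedged illustration, assuming the node exposes a standard Linux watchdog device at /dev/watchdog (the real Waggle heartbeat wiring is custom hardware), software keeps the node alive only while its own health checks pass:

# Sketch: feeding a hardware watchdog; if feeding stops, the board resets.
# /dev/watchdog and the health check are assumptions for illustration.
import time

def node_is_healthy() -> bool:
    # Placeholder: a real node would check sensors, storage, and network.
    return True

with open("/dev/watchdog", "w") as wd:
    while node_is_healthy():
        wd.write("\n")   # pet the watchdog
        wd.flush()
        time.sleep(10)
# Once feeding stops, the hardware timer expires and reboots the node.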


Waggle/AoT Robust Testing


Edge Computing: Analysis and Feature Recognition, Preserving Privacy…

•  Parallel computing
•  Open platform
•  Deep learning (a privacy-preserving sketch follows)
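One way to preserve privacy is to reduce images to features in-situ and never ship raw pixels. A minimal sketch, with OpenCV's stock HOG people detector standing in for Waggle's actual deep-learning pipeline:

# Hypothetical sketch: pedestrian counting at the edge. Frames are
# analyzed in memory and discarded; only counts leave the node.
import time
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture(0)   # camera attached to the sensor node
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rects, _ = hog.detectMultiScale(frame, winStride=(8, 8))
        # Report a feature (the count), never the image itself.
        print({"ts": time.time(), "pedestrian_count": len(rects)})
        time.sleep(1)
finally:
    cap.release()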


Waggle Machine Learning & Edge Computing

•  We are exploring Caffe & OpenCV
   –  Convolutional neural networks
•  Training will be done on systems at Argonne
•  Classification on Waggle (see the sketch below)
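The train-big/classify-small split can be sketched with OpenCV's dnn module, which loads Caffe-trained networks for inference on a small ARM board. The model, prototxt, class list, and preprocessing constants below are placeholders, not the project's actual artifacts:

# Sketch: run a Caffe-trained CNN on the node with OpenCV's dnn module.
# Filenames and preprocessing values are illustrative assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "trained.caffemodel")
classes = open("classes.txt").read().splitlines()

def classify(image_path: str) -> str:
    img = cv2.imread(image_path)
    # Pack the image into the 4-D blob the network expects.
    blob = cv2.dnn.blobFromImage(img, 1.0, (224, 224), (104, 117, 123))
    net.setInput(blob)
    scores = net.forward().flatten()
    return classes[int(np.argmax(scores))]

print(classify("frame.jpg"))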


The Data


[Example data: a PM2.5 alert; a power outage; Damen & Ashland]

Pete Beckman, Charlie Catlett, Rajesh Sankaran (ANL)


Waggle: A Platform for Research

•  Open source / open platform
   –  Reusable, extensible software communities
•  Machine learning: computer vision
   –  Data must be reduced in-situ
•  Novel sensors: nano/MEMS/μfluidics
   –  Explosion of nano/MEMS & imaging tech
•  Low-power CPUs: GPUs/smartphones
   –  Powerful, low-power smartphone CPUs

Opportunity: big data + predictive models. Smart sensors + supercomputers/cloud computing = predictions and analysis.


Why HPC Geeks Should Care

•  New sensors are programmable parallel computers
   –  Multicore + GPUs & OpenCL or OpenMP
   –  New algorithms for in-situ data analysis, feature detection, compression, deep learning
   –  Need a new programming model for "stackable" in-situ analysis (for sensors and HPC)
   –  Need advanced OS/R resilience, cybersecurity, networking, over-the-air programming
•  1000s of nodes make a distributed computing "instrument"
   –  A new streaming programming model is needed
   –  New techniques for machine learning on scientific data are required, both within a "node" and collectively across time series
•  How will HPC streaming analytics and simulation be connected to live data?
   –  Can we trigger HPC simulations after first approximations? (weather, energy, transportation; see the sketch after this list)
   –  An unstructured database with provenance and metadata for QA/collaboration
•  Use novel HPC hardware to solve the power issue?
   –  Can we use neuromorphic chips or FPGAs to reduce power for in-situ analysis & compression?
•  We are trading precision & cost for greater spatial resolution: what is possible?
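What "trigger HPC simulations after first approximations" might look like in the small, as a hedged sketch: a rolling mean over a sensor stream crosses a threshold and hands off to a batch scheduler. The threshold, window size, and sbatch script are invented for illustration:

# Sketch: live sensor stream -> threshold -> HPC job submission.
# WINDOW, THRESHOLD, and dispersion_model.sh are assumptions.
import collections
import subprocess

WINDOW = 60            # samples in the rolling window
THRESHOLD = 35.0       # hypothetical PM2.5 alert level (ug/m3)
window = collections.deque(maxlen=WINDOW)

def on_sample(pm25: float) -> None:
    """Feed one reading; launch a simulation when the rolling mean spikes."""
    window.append(pm25)
    if len(window) == WINDOW and sum(window) / WINDOW > THRESHOLD:
        # First approximation exceeded: kick off the full model on the cluster.
        subprocess.run(["sbatch", "dispersion_model.sh"], check=True)
        window.clear()   # avoid re-triggering on the same episode

# Usage: call on_sample(reading) for each value arriving from the stream.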


[Diagram: parallel computation at the edge; new edge algorithms; data aggregation from multiple sources; cloud database; data analysis and near-real-time HPC simulations]


The RISC version of the convergence story: start by enabling… remove roadblocks (then everyone's wish list follows)

•  Applications (science drivers): software needs & workflow patterns
•  Operations
   –  Support real-time and streaming ingest from fast networks
   –  Support node sharing, long-lived services, storage requests lasting years…
•  Architecture
   –  Mothball current parallel file systems; replace them with persistent storage services (databases, KV stores, etc.; sketched below)
   –  Accelerate the move of storage into the compute infrastructure
•  Software
   –  A Linux software development environment
   –  Native support for low-level infrastructure: Docker, VMs, Mesos, etc.
   –  A new focus on QoS; software-defined storage, on-demand services, etc.


BOF at SC 2016 (slide courtesy of Mark Asch)


[Map: pilot cities]

2016 Phase 1 pilots: Chicago, Seattle, Bristol, Newcastle
2016-17 Phase 2 pilots: Pittsburgh, New York, Portland, Atlanta, Boston, Delhi, Chattanooga
Initial discussions: Denver

Developing a pilot project strategy aimed at empowering partner universities and national laboratories to work with their local cities.


Questions?