Performance Tools and Holistic HPC Workflows
Karen L. Karavanic, Portland State University
Work Performed with:
Holistic HPC Workflows: David Montoya (LANL)
PSU Drought Project: Yasodha Suriyakumar (CS), Hongjiang Yan (CEE), PI: Hamid Moradkhani (CEE), co-PI: Dacian Daescu (Math)
PPerfG PSU Undergraduate Programmers: Jiaqi Luo, Le Tu
Slide 2
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA
UNCLASSIFIED - LA-UR-16-23542
What is an HPC Workflow? Holistic View
– One science effort across a period of time/campaign, or for 1 specific goal
– May include multiple platforms or labs
– Track resource utilization, performance, and progress, data movement
– Includes System Services – power, resource balance, scheduling, monitoring, data movement, etc.
– Includes Data Center – power, cooling, physical placement of data and jobs
– Informed by & Interfaces with the Application and Experiment Views
– Includes hardware, system software layers, application
Slide 3
UNCLASSIFIED - LA-UR-16-20222
Foundational Work: All Layers of Workflow and their Relationships
Layer 0 – Campaign
• Process through time of repeated Job Runs
• Changes to approach, physics and data needs as a campaign or project is completed - Working through phases
Layer 1 – Job Run
• Application to application that constitute a suite job run series
• May include closely coupled applications and decoupled ones that provide an end-to-end repeatable process with differing input parameters
• User and system interaction, to find an answer to a specific science question.
Layer 2 – Application
• One or more packages with differing computational and data requirements
• Interacts across memory hierarchy to archival targets
• The subcomponents of an application {P1..Pn} are meant to model various aspects of the physics
Layer 3 – Package
• The processing of kernels within a phase and associated interaction with various levels of memory, cache levels and the overall underlying platform
• The domain of the computer scientist
Slide 4
We described a layer above the application layer (2) that posed use cases that used the application in potentially different ways. This also allowed the entry of environment-based entities that impact a given workflow, and made it possible to capture the impact of scale and processing decisions. At this level we can describe time, volume and speed requirements.
Layer 1 – Ensemble of applications – Use Case – example template
Slide 5
Our Goal
Measurement infrastructure in support of Holistic HPC Workflow Performance Analysis and Validation
Goal #1: PPerfG
• Motivation: How can we automatically generate the workflow layer diagrams?
• Initial Focus:
  • Layer 2 (Application): One or more packages with differing computational and data requirements. Interacts across memory hierarchy to archival targets
• Approach:
  • Implement simple prototype using Python and Tkinter
  • Investigate data collection options
  • Evaluate with a case study
Karen L. Karavanic, 7/9/18
PPerfG
• PPerfG: A Visualization Tool for Holistic HPC Workflows for use in both performance diagnosis and procurement
• Captures the data movement behavior between storage layers, and between different stages of an application
• Challenges: Measurement and Data integration to generate the display
• Initial prototype developed with Python and Tkinter
PPerfG Prototype
PPerfG Prototype: simple JSON input file
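The file contents are not reproduced here, so the following is only a hypothetical sketch of what a simple JSON input for a Layer 2 diagram could look like: a list of application stages, a list of storage layers, and edges recording data movement between them. Every field name (`stages`, `storage`, `edges`, `src`, `dst`, `bytes`) and every byte count is an assumption made for illustration, not PPerfG's actual schema or measured data.

```python
import json

# Hypothetical input: field names and byte counts are illustrative assumptions,
# not PPerfG's real schema and not measured values.
EXAMPLE = """
{
  "stages": ["DroughtHPC (Python)", "VIC"],
  "storage": ["memory", "file system"],
  "edges": [
    {"src": "file system", "dst": "DroughtHPC (Python)", "bytes": 1000},
    {"src": "DroughtHPC (Python)", "dst": "VIC", "bytes": 400},
    {"src": "VIC", "dst": "file system", "bytes": 250}
  ]
}
"""


def load_workflow(text):
    """Parse the JSON text and check that every edge joins two declared nodes."""
    doc = json.loads(text)
    nodes = set(doc["stages"]) | set(doc["storage"])
    for edge in doc["edges"]:
        if edge["src"] not in nodes or edge["dst"] not in nodes:
            raise ValueError(f"edge refers to an undeclared node: {edge}")
    return doc


def bytes_into(doc, node):
    """Total bytes moved into one node, summed over its incoming edges."""
    return sum(e["bytes"] for e in doc["edges"] if e["dst"] == node)


workflow = load_workflow(EXAMPLE)
print(bytes_into(workflow, "VIC"))
```

Validating edges against the declared nodes at load time is one way to catch errors in a hand-written file before anything is drawn.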
Case Study: The DroughtHPC [1] Project Goals
• Develop a performant implementation of DroughtHPC, a novel approach to drought prediction developed at Portland State University
• Scale the application to do finer-grained simulations, and to simulate a larger geographical area
o DroughtHPC
  o improves prediction accuracy for a target geographical area
  o uses data assimilation techniques that integrate data from hydrologic models and satellite data
  o uses Monte Carlo methods to generate a number of samples per cell
  o inputs span a variety of data: soil conditions, snow accumulation, vegetation layers, canopy cover and meteorological data
  o uses Variable Infiltration Capacity (VIC) Macroscale Hydrologic Model [2]
[1] https://hamid.people.ua.edu/research.html
[2] Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges (1994), A simple hydrologically based model of land surface water and energy fluxes for general circulation models, J. Geophys. Res., 99(D7), 14415–14428, doi:10.1029/94JD00483
Case Study: DroughtHPC Code
• Application is written in Python, and uses two hydrologic models: VIC [2], written in C, and PRMS [3], written in FORTRAN and C
• The modeling codes are treated as “black boxes” by the domain scientists
• Land surface of the target geographical area is modeled as a grid of uniform cells, and the simulation divides it into jobs, with a group of 25 cells in each job
• Data is small by our standards. For a job that simulates 50 meteorological samples and a one-month time period:
  • input data size: 144.5 MB
  • satellite data: 132 MB
• Run time for 1 job (25 cells) on a single node is approximately two hours with the initial Python prototype
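The division of the grid into jobs of 25 cells can be sketched as follows. This is a minimal illustration rather than the project's code: the function name `partition_cells`, the integer cell ids, and the 100-cell grid are assumptions.

```python
def partition_cells(cell_ids, cells_per_job=25):
    """Split a list of cell ids into consecutive jobs of at most cells_per_job cells."""
    return [
        cell_ids[start:start + cells_per_job]
        for start in range(0, len(cell_ids), cells_per_job)
    ]


# Hypothetical 100-cell grid, with cells numbered 0..99.
jobs = partition_cells(list(range(100)))
print(len(jobs), len(jobs[0]))
```

When the number of cells is not a multiple of 25, the last job simply holds the remainder.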
Yan, H., C. M. DeChant, and H. Moradkhani (2015), Improving Soil Moisture Profile Prediction with the Particle Filter-Markov Chain Monte Carlo Method, IEEE Transactions on Geoscience and Remote Sensing, DOI: 10.1109/TGRS.2015.2432067
Initialization Overheads
• Mean of 30 runs, simulation of 24 hours (one-hour time steps)
• The Columbia River Basin (CRB) has 5359 cells in the VIC4 dataset, but 11280 cells in the VIC5 dataset. The meteorological forcing data differs between the two versions: VIC5 data includes precipitation, pressure, temperature, vapor pressure, and wind speed; VIC4 data specifies maximum temperature, minimum temperature, precipitation and wind speed.
Model                    Data                               Initialization (ms)  Work (ms)   Write Output (ms)  Total (ms)
VIC4 – ASCII text files  Sample – single cell               177.592 (99%)        0.241       0.144              177.977
VIC4 – ASCII text files  CRB – 25 cells                     4,079.126 (98%)      70.990      10.774             4,170.89
VIC5 – NetCDF files      Sample – Stehekin data – 20 cells  19,088.990 (99%)     196.116     29.065             19,314.171
VIC5 – NetCDF files      CRB – 11280 cells                  26,277.904 (47%)     29,001.285  80.398             55,359.587
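The share of time spent in initialization can be recomputed from the three timing columns of the table above. The snippet below is a small sketch of that arithmetic; the dictionary keys and the function name `init_share` are illustrative, and the printed values may differ slightly from the rounded percentages shown in the table.

```python
# Timings in milliseconds, copied from the table above:
# (initialization, work, write output).
TIMINGS_MS = {
    "VIC4 sample, single cell": (177.592, 0.241, 0.144),
    "VIC4 CRB, 25 cells": (4079.126, 70.990, 10.774),
    "VIC5 Stehekin, 20 cells": (19088.990, 196.116, 29.065),
    "VIC5 CRB, 11280 cells": (26277.904, 29001.285, 80.398),
}


def init_share(init_ms, work_ms, write_ms):
    """Fraction of the summed time that is spent in initialization."""
    return init_ms / (init_ms + work_ms + write_ms)


for name, timing in TIMINGS_MS.items():
    print(f"{name}: {100 * init_share(*timing):.1f}% initialization")
```

Only in the full VIC5 Columbia River Basin run does the work phase outweigh initialization; in the three smaller runs, initialization accounts for nearly all of the time.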
DroughtHPC / VIC calling patterns
• Initial DroughtHPC prototype code (Python) called VIC version 4 (“classic driver”):
  • For each grid cell
    • For each simulation time step
      • For each probabilistic sample
        • Call VIC
        • Use results to compute inputs for next time step
• VIC4 is Time-before-space
• New VIC5 “image driver” is Space-before-time, designed for call-once
  • Uses MPI, embarrassingly parallel model (each cell computation is independent)
  • Single call to VIC can now compute over all data, reducing call overhead
• Our solution: add extensibility to VIC, inject our code into the model
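To make the difference in call overhead concrete, the two calling patterns above can be sketched by counting model invocations. This is an illustration under stated assumptions: `run_vic_stub` is a hypothetical placeholder and not the real VIC interface, and the combination of 25 cells, 24 time steps and 50 samples is borrowed from figures quoted elsewhere in this talk purely as an example.

```python
def run_vic_stub(cell, step, sample):
    """Hypothetical stand-in for one model invocation; not the real VIC interface."""
    return (cell, step, sample)


def calls_time_before_space(n_cells, n_steps, n_samples):
    """VIC4-style pattern: one model call per (cell, time step, sample)."""
    calls = 0
    for cell in range(n_cells):
        for step in range(n_steps):
            for sample in range(n_samples):
                run_vic_stub(cell, step, sample)
                calls += 1
                # The result would be used to compute inputs for the next time step.
    return calls


def calls_space_before_time():
    """VIC5-style pattern: a single call, with the model looping over all data itself."""
    return 1


print(calls_time_before_space(25, 24, 50), calls_space_before_time())
```

Because the number of calls in the first pattern is the product of cells, time steps and samples, any fixed per-call cost such as initialization is paid that many times.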
PPerfG: Visualizing Data Patterns Across Separate Codes
drawing (not screenshot)
PPerfG: Illustrating the change in calling pattern
drawing (not screenshot)
PPerfG Data Collection
• Performance data was collected with a variety of performance tools
• No single performance tool provides all of the data we need
• No tool characterizes the calling pattern / interactions between Python and VIC
• PerfTrack performance database [1] was used to integrate the data postmortem, but some integration was done manually
• Interface over PostgreSQL relational database
• Multiple runs for different measurement tools
• JSON file was generated manually
[1] Karen L. Karavanic, John May, Kathryn Mohror, Brian Miller, Kevin Huck, Rashawn Knapp, Brian Pugh, "Integrating Database Technology with Comparison-based Parallel Performance Diagnosis: The PerfTrack Performance Experiment Management Tool," SC 2005.
PPerfG Future Work
• How to ease comparison of different versions with PPerfG?
• Slider to move forward over time from start to finish?
• Can we generate the JSON automatically from PerfTrack?
• How to integrate application/developer semantics with measurement data?
• How to link data structures in memory with files?
• How to label the phases?
• How to collect the loop information at the bottom?
• How to show scaling behaviors?
• Number of files per simulation day?
• Size of files per simulation cell?
• Traffic Map idea: use edge colors to show data congestion
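As one possible reading of the Traffic Map idea, an edge's data volume could be mapped to a color between green and red. The function below is a hypothetical sketch, not part of PPerfG: the name `edge_color`, the linear scale, and the choice of green for light traffic are all assumptions. Hex strings of this `#rrggbb` form are accepted as colors by Tkinter.

```python
def edge_color(bytes_moved, max_bytes):
    """Map a traffic volume to a hex color from green (light) to red (congested)."""
    if max_bytes <= 0:
        return "#00ff00"
    # Clamp the load to the range [0, 1] before scaling it to a color channel.
    load = min(max(bytes_moved / max_bytes, 0.0), 1.0)
    red = round(255 * load)
    green = 255 - red
    return f"#{red:02x}{green:02x}00"


print(edge_color(0, 100), edge_color(100, 100))
```

A logarithmic scale might suit traffic volumes that span several orders of magnitude; the linear scale is used here only to keep the sketch short.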
Conclusions and Future Work
• We propose a new performance metric: Workflow Critical Path
• WCP: What part of the Entire Workflow to focus on?
• DroughtHPC case study: pattern of file activity, calling pattern, overhead of VIC initialization
• We have designed PPerfG, a visualization for Workflow Layer 2: Application
• Workflow Layers: different perspectives of an HPC workflow used in Holistic HPC Performance Diagnosis
• Data Collection is challenging
• Need to integrate Layer 3 (Package) – drilling down into DroughtHPC and VIC
• PerfTrack is useful to gather and do some integration of data
• Currently the generation of JSON files is mostly manual
Acknowledgments
• PSU students Henry Cooney, Tu Le, Jiaqi Luo, and Kristina Frye contributed ideas and discussions and implemented software used in this project. Students in Karavanic’s Accelerated Computing and Introduction to Performance courses performed analysis and parallelization of the VIC modeling code.
• This material is based upon work supported by the National Science Foundation under Grant No. 1539605. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
• This work was supported in part by a generous gift from Intel Corp.
• Part of this work was conducted at the Ultrascale Systems Research Center (USRC) supported by Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396 with the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this information. This work was supported in part by Portland State University and by the New Mexico Consortium.
Contact:[email protected]