LDMS Version 3 Tutorial https://github.com/ovis-hpc/ovis Jim Brandt, Tom Tucker, Ann Gentile, Nichamon Nasksinehaboon, Narate Taerat Open Grid Computing, Inc. Sandia National Laboratories 04/2017 OGC | Open Grid Computing, Austin, TX Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. SAND2017-5153 O

LDMS Version 3 Tutorial //ovis.ca.sandia.gov/images/4/4d/LDMSV3Tutorial_SAND20175153.pdf · • Python • 2.6 with the argparsemodule • 2.7 • Swig • Doxygenfor documentation


Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2011-XXXXP


About this document

• This is a sub-selection of materials from an LDMS tutorial. The full tutorial includes VMs with an LDMS installation. The VM is not here; however, the run scripts from the exercises are included.

• If you install LDMS on your system, you can then use these scripts as models and work through the exercises.

Note: VMs are not in the release materials. Additional configuration scripts are in the associated tarball.


Resources

• Documentation (Building, Using)
  • https://github.com/ovis-hpc/ovis/wiki
• Source Code
  • https://github.com/ovis-hpc/ovis
  • git clone https://github.com/ovis-hpc/ovis.git
• Publications:
  • https://ovis.ca.sandia.gov


Tutorial Format

Overview of the Lightweight Distributed Metric Service (LDMS)
• Introduction to HPC monitoring
• Overview of the LDMS framework
• LDMS architecture description

Setup
• Environment setup description and verification
• Introduction to support programs and helper scripts for use in lab work

Hands-on labs
Instructor walkthrough and facilitated student exploration
• Lab 1: Samplers
  • Sampler startup and local and remote verification
• Lab 2: Aggregators
  • Aggregation startup and verification using sampler
  • Aggregation of all other attendees’ samplers
• Lab 3: Dynamic configurations and resilience
• Lab 4: Storing data in CSV stores
• Lab 5: Calculating derived data and saving to a CSV store
• Lab 6: Storing the data in an SOS database
• Lab 7: Exploring data in an SOS database
• Lab 8: Data analysis and visualization from an SOS database


Introduction to HPC Monitoring

• Canonical monitoring goal: real-time troubleshooting (e.g., nodes down, out of memory, resource congestion)
• HPC monitoring concerns:
  • Impact on running applications
  • How to aggregate data from different sources for analysis
    • Network, file system, CPU utilization, memory utilization
  • What analyses would be meaningful
    • e.g., what raw and derived data would indicate performance-impacting network congestion
  • How to process large amounts of data in real time
• As a result, canonical system monitoring:
  • Is typically performed at intervals of minutes
  • Relies on analyses that largely consist of detecting monitored values exceeding pre-defined thresholds
  • Yields data unsuitable for gaining significant insights into application performance problems


Monitoring Can Enable Resource-Aware Computing

Lightweight, high-frequency, continuous run-time monitoring, analysis, and feedback could enable:

• Faster problem detection, including component-specific issues based on a particular component’s known behaviors and environment (e.g., thermal variations)
• Insight into a large-scale application’s use of resources under production conditions, including contention from other applications
• Dynamic application-to-resource mapping based on application needs and system state
• Co-scheduling of applications based on contention for shared resources
• Dynamic system operations based on a data center’s power demands, temperature, etc.


LDMS Overview

• What is the Lightweight Distributed Metric Service (LDMS)?
  • Collect numeric data
  • Move and aggregate data
  • Store data
  • Analyze data
    • Troubleshooting
    • Optimization
    • Inform future designs
• Typical use case descriptions
• Supported technologies
  • Linux on all but IBM Blue Gene platforms
• Sources of code, information, and support


Lightweight Distributed Metric Service (LDMS) High Level Overview

* Only the current data is retained on-node


LDMS Plugin Architecture

Figure (labels only; diagram not preserved): LDMS Daemon holding Metric Sets in Memory; LDMS API (libldms); Sampler Plug-in Interface (Memory Sampler, HSN Sampler); Transport Driver Interface (RDMA Transport, Socket Transport); Storage Plug-in Interface (SOS, MySQL, CSV Store, Other Store); Storage.


Data Flow


Supported platforms and networks

• Platforms
  • RHEL 6 and 7
  • SLES 11 & 12
  • Ubuntu
  • Cray XE6, XK and XC
• Transports
  • Socket
  • Cray ugni
    • Aries
    • Gemini
  • RDMA
    • Infiniband
    • iWarp


Build dependencies

• Typical compute node environment
  • Autoconf >= 2.3, automake, autotool
  • Libevent2-devel >= 2.0.31
  • OpenSSH-devel
• End use hosts (monitor cluster, special aggregation hosts, etc.)
  • Python
    • 2.6 with the argparse module
    • 2.7
  • Swig
  • Doxygen for documentation


LDMS Installation methods

• Manually install using autoconf and automake
• Deployment using RPM

Note: For this demo, LDMS is pre-installed on student VMs in /opt/ovis.


Getting started: Log in and set up your environment

$ ssh -Y ovis_public@XXXXXXX
ovis_public@XXXXXX's password: *******
ovis_public@ovis-demo-login ~ [sshd:]
$ ssh -Y ovis_public@ovis-demo-01

Note: “/home/ovis_public/demo/ldmsd/env/ldms-env.sh” is used to set up the LDMS environment


VM directory structure

• VMs include source code, scripts and configuration files for every exercise, helper mini-applications for use in the exercises, and supporting visualization tools (e.g., gnuplot).
• Directory structure:
  • source-code/
    • ldms/ Source code of the latest LDMS release version
    • util/ Utility codes for use in the examples
  • data/ Pre-collected numeric data and log message data
    • ldms-data/ Released numeric data from NCSA Blue Waters
      • csv A subset of Blue Waters data in the CSV format
  • demo/
    • ldmsd/
      • conf/ Configuration files used in the LDMS demo
      • data/ Placeholders for the to-be-stored LDMS data
      • env/ Scripts to set up environment variables
      • scripts/ Helper scripts for deploying LDMS daemons


Getting started: Set up and verify your environment

• System environment variables
  • PATH=${OVIS_HOME}/bin/:${OVIS_HOME}/sbin/:${PATH}
  • LD_LIBRARY_PATH=${OVIS_HOME}/lib/:${LD_LIBRARY_PATH}
  • PYTHONPATH=${OVIS_HOME}/lib/python2.7/site-packages/:${PYTHONPATH}
• LDMS environment variables
  • ZAP_LIBPATH=${OVIS_HOME}/lib/ovis-lib
  • LDMSD_PLUGIN_LIBPATH=${OVIS_HOME}/lib/ovis-ldms
• LDMS authentication
  • LDMS_AUTH_FILE=<path to file with your shared secret>
    • Permissions 600
    • Format: secretword=<8 or more characters> (e.g. secretword=mylittlesecret)

NOTE: ${OVIS_HOME}=/opt/ovis in this example
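
The authentication-file rules above (permissions 600, one `secretword=` entry of 8 or more characters) can be sketched in Python. This is only an illustration: the helper name `write_ldms_auth_file` and the temporary demo path are assumptions, not part of LDMS.

```python
import os
import stat
import tempfile


def write_ldms_auth_file(path, secret):
    """Write 'secretword=<secret>' to path with permissions 600.

    Hypothetical helper: only the file format and the 600 mode come from
    the tutorial; the function itself is illustrative.
    """
    if len(secret) < 8:
        raise ValueError("secret must be 8 or more characters")
    # Create the file readable and writable by the owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write("secretword=" + secret + "\n")
    os.chmod(path, 0o600)  # enforce 600 even if the file already existed
    return path


# Demo in a throwaway directory (the file name is an assumption).
demo_dir = tempfile.mkdtemp()
auth_path = write_ldms_auth_file(os.path.join(demo_dir, "ldmsauth.conf"),
                                 "mylittlesecret")
auth_mode = stat.S_IMODE(os.stat(auth_path).st_mode)
```

On a real host, LDMS_AUTH_FILE would then be set to the path of the file written this way.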


Test code: memeater.c

• Memeater is a code which repeatedly allocates memory. Run it with LDMS to see changes in the memory utilization values reported in /proc/meminfo.
• Located at /home/ovis_public/source-code/util/memeater.c. Compile with cc.

./a.out

Annotations on the code listing (listing not preserved):
• Periodically increase memory allocated
• Sleep between allocations. Change this with respect to the sampling frequency.
• Sleep before releasing memory
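
Since the memeater.c listing itself is not reproduced here, the behavior described by the annotations can be approximated with a short Python analogue. This is a sketch, not the tutorial's C source; the function name `eat_memory` and all parameter names are assumptions.

```python
import time


def eat_memory(steps, chunk_bytes, alloc_sleep_s, hold_sleep_s):
    """Grow memory use in steps, hold it, then release it.

    Illustrative stand-in for memeater.c; returns the peak number of
    bytes held before release.
    """
    chunks = []
    for _ in range(steps):
        # Fill with a non-zero byte so the memory is actually written.
        chunks.append(b"\x01" * chunk_bytes)
        time.sleep(alloc_sleep_s)  # sleep between allocations
    peak_bytes = sum(len(c) for c in chunks)
    time.sleep(hold_sleep_s)       # sleep before releasing memory
    chunks.clear()                 # drop references so memory can be freed
    return peak_bytes


# Tiny demo values; a real run would use larger chunks and longer sleeps.
peak = eat_memory(3, 1024, 0.0, 0.0)
```

For the allocation steps to be visible in the sampled data, the sleep between allocations should be at least as long as the sampling interval.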


Lab Exercises


LAB 1: Samplers


Start and configure a LDMS daemon

Lab Goals:
• Basic LDMS daemon startup and configuration flags/args
  • Manual and run-time configuration options
• Output options
  • Log files and log levels
  • Debug information
• man pages
  • man /opt/ovis/share/man/man8/ldmsd.8 – opens ldmsd man pages
  • man /opt/ovis/share/man/man8/ldmsd_controller.8 – opens “ldmsd_controller” man pages
• Use of ldms_ls utility as a diagnostic tool
  • man pages
    • man /opt/ovis/share/man/man8/ldms_ls.8 – opens ldms_ls man pages


Start a LDMS daemon

• Start ldmsd
ldmsd -x sock:10001 -l samplerd.log -S samplerd.sock -r samplerd.pid -p 20001

• -x: Transport and listening port, as <transport>:<port>
• -l: Specify the log file path and name
• -S: Specify the Unix domain socket for communication with ldmsctl or ldmsd_controller
• -r: Specify where to write the pid file
• -p: Specify the listener port for remote configuration

Note: The log and Unix domain socket names are just strings. We use “samplerd” here to denote those being used by a ldmsd that will be running “samplers” as opposed to performing aggregation.
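
The flags listed above can be assembled programmatically, for example when launching several daemons from a script. The sketch below only builds the argument list; the helper name `ldmsd_sampler_argv` is an assumption, and only the flags shown on this slide are used.

```python
def ldmsd_sampler_argv(xprt, port, log, sock, pidfile, cfg_port):
    """Return an ldmsd argument list using the flags from the slide."""
    return [
        "ldmsd",
        "-x", "%s:%d" % (xprt, port),  # transport:listening port
        "-l", log,                     # log file path and name
        "-S", sock,                    # Unix domain socket for local config
        "-r", pidfile,                 # where to write the pid file
        "-p", str(cfg_port),           # listener port for remote config
    ]


argv = ldmsd_sampler_argv("sock", 10001, "samplerd.log",
                          "samplerd.sock", "samplerd.pid", 20001)
command_line = " ".join(argv)
```

On a host where ldmsd is installed, `argv` could be passed to `subprocess.Popen`; that step is left out so the sketch runs anywhere.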


Check to see if ldmsd is running

• Using ps
ps auxw | grep ldmsd | grep -v grep
  • Returns something like: “ovis_pu+ 3582 0.0 0.1 401604 2204 ? Ssl 12:51 0:00 ldmsd -x sock:10001 -S samplerd.sock” if running
  • Returns: blank line if not running
• Using ldms_ls
ldms_ls -h localhost -x sock -p 10001
  • Returns: “Connection failed/rejected.” if the ldmsd specified does not exist
  • Returns: blank line if the ldmsd specified exists but has no metric sets configured
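
The ps check above can also be scripted. The following sketch extracts ldmsd process IDs from `ps auxw`-style text; the helper name `ldmsd_pids` is an assumption, and the sample text is the ps line from the slide plus a made-up grep line for contrast.

```python
def ldmsd_pids(ps_output):
    """Return PIDs of ldmsd processes found in 'ps auxw'-style text."""
    pids = []
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        program = fields[10].split()[0]
        # Match 'ldmsd' or a full path ending in 'ldmsd', but not 'grep ldmsd'.
        if program.endswith("ldmsd"):
            pids.append(int(fields[1]))
    return pids


sample_ps = (
    "ovis_pu+ 3582 0.0 0.1 401604 2204 ? Ssl 12:51 0:00 "
    "ldmsd -x sock:10001 -S samplerd.sock\n"
    # Hypothetical extra line, standing in for the grep process itself.
    "ovis_pu+ 3600 0.0 0.0 112648 960 pts/0 S+ 12:52 0:00 grep ldmsd\n"
)
found = ldmsd_pids(sample_ps)
```

Filtering on the program name replaces the `grep -v grep` step of the shell pipeline.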


Exercise: Run ldmsd


Manually load and configure a sampler plugin

Lab Goals:
• Basic sampler plugin operation
  • Manual dynamic configuration using the “ldmsd_controller” utility
  • Static configuration using a configuration file
  • man pages
    • man /opt/ovis/share/man/man7/Plugin_meminfo.7 – opens meminfo plugin man pages
    • man /opt/ovis/share/man/man7/Plugin_vmstat.7 – opens vmstat plugin man pages
• Use of ldms_ls utility as a diagnostic tool
  • man pages
    • man /opt/ovis/share/man/man8/ldms_ls.8 – opens ldms_ls man pages


Configure LDMS daemon Sampler Plugin(s)

• Load the “meminfo” sampler plugin
• Configure loaded “meminfo” sampler plugin
  • Give the set name (instance)
  • Give the node name (producer)
  • Give the component ID
  • Plugin-specific arguments
• Start sampler plugin with a particular sampling interval and offset

(The slide marks part of this list as “optional”; the extracted text does not show which items.)


Connect ldmsd_controller to an ldmsd

• Set up “ldmsd_controller” connection to the sampler ldmsd over socket
$ ldmsd_controller --host localhost --port 20001 --auth_file ~/.ldmsauth.conf
Welcome to the LDMSD control processor
localhost:20001>


Exercise: Connect to ldmsd with ldmsd_controller


Interactive Configuration using the ldmsd_controller

• Load the “meminfo” sampler
localhost:20001> load name=meminfo

• Configure the “meminfo” sampler
localhost:20001> config name=meminfo producer=<$HOSTNAME> instance=<$HOSTNAME>/meminfo component_id=<host number>


Query current sets on an LDMS Daemon using “ldms_ls”

• Use ldms_ls to query the current sets available on an LDMS daemon
$ ldms_ls -h localhost -x sock -p 10001
ovis-demo-01/meminfo
$


Get the set information before starting the “meminfo” sampler

$ ldms_ls -h localhost -x sock -p 10001 -v ovis-demo-01/meminfo
ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]
METADATA --------
  Producer Name : ovis-demo-01
  Instance Name : ovis-demo-01/meminfo
    Schema Name : meminfo
           Size : 1904
   Metric Count : 45
             GN : 2
DATA ------------
      Timestamp : Wed Dec 31 18:00:00 1969 [0us]
       Duration : [0.000000s]
     Consistent : FALSE
           Size : 400
             GN : 1
-----------------


Query current metric values before starting the “meminfo” sampler

$ ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 0
D u64 MemFree 0
D u64 MemAvailable 0
D u64 Buffers 0
D u64 Cached 0
D u64 SwapCached 0
D u64 Active 0
D u64 Inactive 0


Start the “meminfo” sampler

• Start the “meminfo” sampler
localhost:20001> start name=meminfo interval=1000000 offset=0

• This starts the sampler updating the metric values every 1 second


Get the set information

$ ldms_ls -x sock -p 10001 -v ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:46:55 2017 [3486us]
METADATA --------
  Producer Name : ovis-demo-01
  Instance Name : ovis-demo-01/meminfo
    Schema Name : meminfo
           Size : 1904
   Metric Count : 45
             GN : 2
DATA ------------
      Timestamp : Fri Feb 10 12:46:55 2017 [3486us]
       Duration : [0.000068s]
     Consistent : TRUE
           Size : 400
             GN : 259
-----------------
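
The first line of the ldms_ls output tells whether a set has been sampled yet: before the sampler is started it reads "inconsistent" with a 1969 timestamp (the Unix epoch shown in the VM's local time zone), afterwards "consistent" with a current time. A small parser for that line might look like the sketch below; the name `parse_set_status` and the regular expression are assumptions based only on the two sample lines shown in this tutorial.

```python
import re

# Pattern inferred from the sample output lines; not an official format spec.
STATUS_RE = re.compile(
    r"^(?P<set>\S+): (?P<state>consistent|inconsistent), "
    r"last update: (?P<when>.+?)\s*\[(?P<usec>\d+)us\]$"
)


def parse_set_status(line):
    """Return a dict describing an ldms_ls status line, or None if no match."""
    m = STATUS_RE.match(line.strip())
    if m is None:
        return None
    return {
        "set": m.group("set"),
        "consistent": m.group("state") == "consistent",
        "when": m.group("when"),
        "usec": int(m.group("usec")),
    }


before = parse_set_status(
    "ovis-demo-01/meminfo: inconsistent, last update: "
    "Wed Dec 31 18:00:00 1969 [0us]")
after = parse_set_status(
    "ovis-demo-01/meminfo: consistent, last update: "
    "Fri Feb 10 12:46:55 2017 [3486us]")
```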


Query current metric values

$ ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 1884188
D u64 MemFree 828244
D u64 MemAvailable 1639232
D u64 Buffers 948
D u64 Cached 915992
D u64 SwapCached 0
D u64 Active 84336
D u64 Inactive 891196


Check source for reference

$ cat /proc/meminfo
MemTotal:        1884188 kB
MemFree:          828420 kB
MemAvailable:    1639912 kB
Buffers:             948 kB
Cached:           916396 kB
SwapCached:            0 kB
Active:            85144 kB
Inactive:         890212 kB
Active(anon):      58272 kB
Inactive(anon):     8372 kB
Active(file):      26872 kB
Inactive(file):   881840 kB
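
Comparing the sampled metric values with /proc/meminfo can be automated. The sketch below parses both text formats and reports the difference per field. The function names are assumptions, and the sample text is an excerpt of the two outputs shown in this tutorial; small differences are expected because the two were read at different times.

```python
def parse_ldms_ls(text):
    """Map metric name -> int from 'ldms_ls -l' lines such as 'D u64 MemFree 828244'."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] in ("M", "D") and parts[1] == "u64":
            metrics[parts[2]] = int(parts[3])
    return metrics


def parse_meminfo(text):
    """Map field name -> int (kB) from /proc/meminfo lines such as 'MemFree: 828420 kB'."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            values[name.strip()] = int(rest.split()[0])
    return values


ldms_text = """ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]
M u64 component_id 1
D u64 MemTotal 1884188
D u64 MemFree 828244
D u64 Cached 915992
"""
meminfo_text = """MemTotal: 1884188 kB
MemFree: 828420 kB
Cached: 916396 kB
"""

sampled = parse_ldms_ls(ldms_text)
source = parse_meminfo(meminfo_text)
# Difference (source minus sampled) for fields present in both.
drift = {k: source[k] - sampled[k] for k in source if k in sampled}
```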


Exercise: Manual sampler configuration


• Kill all of your ldmsd in preparation for the next section
$ pkill ldmsd

• Kill a particular ldmsd
$ ps auxw | grep ldmsd | grep -v grep
ovis_pu+ 3582 0.0 0.1 401604 2204 ? Ssl 12:51 0:00 ldmsd -x sock:10001 -S samplerd.sock
$ kill 3582

• Check to make sure it is dead
$ ps auxw | grep ldmsd | grep -v grep


Start ldmsd and sampler plugin using a configuration file

• ldmsd can be started using a configuration file
  • Syntax is identical to that used for manual configuration
  • Can be used to run and configure BOTH sampler and aggregator ldmsd
• Sample configuration file for meminfo example:
$ cat /home/ovis_public/demo/ldmsd/conf/simple_sampler.conf
load name=meminfo
config name=meminfo producer=<$HOSTNAME> instance=<$HOSTNAME>/meminfo component_id=<host number>
start name=meminfo interval=1000000

• Run ldmsd using this configuration file
$ ldmsd -x sock:10001 -l samplerd.log -S samplerd.sock -c /home/ovis_public/demo/ldmsd/conf/simple_sampler.conf
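
Because the placeholders <$HOSTNAME> and <host number> differ on every node, a sampler configuration file like the one above is a natural candidate for generation. The following sketch emits the same three commands; the function name `sampler_config` is an assumption, and only the load, config, and start syntax shown in this tutorial is used.

```python
def sampler_config(plugin, hostname, component_id, interval_us, offset_us=None):
    """Return sampler configuration text: load, config, start."""
    lines = [
        "load name=%s" % plugin,
        "config name=%s producer=%s instance=%s/%s component_id=%d"
        % (plugin, hostname, hostname, plugin, component_id),
    ]
    start = "start name=%s interval=%d" % (plugin, interval_us)
    if offset_us is not None:
        start += " offset=%d" % offset_us  # offset is omitted when not given
    lines.append(start)
    return "\n".join(lines) + "\n"


conf_text = sampler_config("meminfo", "ovis-demo-01", 1, 1000000)
```

Writing `conf_text` to a file and passing that file to ldmsd with -c would correspond to the static configuration step above.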


Query current metric values

$ ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 1884188
D u64 MemFree 828244
D u64 MemAvailable 1639232
D u64 Buffers 948
D u64 Cached 915992
D u64 SwapCached 0
D u64 Active 84336
D u64 Inactive 891196


Exercise: Static sampler configuration


Configuration Tools Summary

Dynamic/manual configuration (remote or local)
• ldmsd_controller – Python script that can connect to a ldmsd via a configured network socket or a local Unix Domain Socket

Static configuration (local)
• Configuration file – loaded at ldmsd runtime


Configuration options and tools

• Command-line configuration: -c
• ldmsctl
  • C interface to configure LDMSD.
  • Only for sampler daemon
• ldmsd_controller
  • Python interface to configure LDMSD.
  • Connect to an LDMSD using UNIX domain socket (local) or socket (remote).
  • Auto-completion
  • Command help
• More details can be found at https://www.opengridcomputing.com/wordpress/index.php/ovis-3-3-user-guide/#ldmsd-config


Start ldmsd_controller

• Connect with UNIX domain socket
ldmsd_controller --sockname samplerd.sock

• Connect with socket
ldmsd_controller --host localhost --port 20001 --auth_file ~/.ldmsauth.conf


ldmsd_controller: Get command list

samplerd.sock> help

Documented commands (type help <topic>):
========================================
EOF        prdcr_del          stop              udata            version
add        prdcr_start        store             udata_regex
config     prdcr_start_regex  strgp_add         updtr_add
env        prdcr_stop         strgp_del         updtr_del
help       prdcr_stop_regex   strgp_metric_add  updtr_match_add
include    quit               strgp_metric_del  updtr_match_del
info       say                strgp_prdcr_add   updtr_prdcr_add
load       shell              strgp_prdcr_del   updtr_prdcr_del
loglevel   source             strgp_start       updtr_start
logrotate  standby            strgp_stop        updtr_stop
prdcr_add  start              term              usage

Slide legend (highlighting of individual commands not preserved):
• Definitely use for samplerd
• Definitely use for aggregators
• Use to load and config plugin
• Get help and daemon status


ldmsd_controller: command help

samplerd.sock> help prdcr_add
Add an LDMS Producer to the Aggregator
Parameters:
    name=     A unique name for this Producer
    xprt=     The transport name [sock, rdma, ugni]
    host=     The hostname of the host
    port=     The port number on which the LDMS is listening
    type=     The connection type [active, passive]
    interval= The connection retry interval (us)


LAB 2: Aggregators


Configure a LDMS daemon (ldmsd) to Aggregate metric set(s)

Goals:
• Add list of connections to sampler ldmsd’s
• Start the connections
• Create an Update policy
  • How often to get a metric set’s update
  • From which sampler ldmsd’s to aggregate
• Start the Update policy


Start an ldmsd that will be used for aggregation

• Start LDMSD
ldmsd -x sock:10002 -m 10M -l aggd.log -S aggd.sock -p 20002

• -x: transport:listener port
• -m: Allocate set memory for aggregated metric sets (default: 512K)
• -l: Specify the log file path
• -S: Specify “Unix Domain Socket” name used for local configuration
• -p: Specify the listener port for remote configuration


Interactive aggregator configuration

• Set up “ldmsd_controller” connection to the aggregator over socket
$ ldmsd_controller --host localhost --port 20002 --auth_file ~/.ldmsauth.conf
Welcome to the LDMSD control processor
localhost:20002>


Simple Aggregator Configuration

• Configure the aggregator to aggregate the “meminfo” set from the sampler daemon
localhost:20002> prdcr_add name=bar host=$HOSTNAME port=10001 xprt=sock type=active interval=20000000
localhost:20002> prdcr_start name=bar

• name: policy tag
• host: hostname of the sampler daemon
• port: Listener port of the sampler daemon
• xprt: Transport the sampler daemon listens on
• type: Always “active”
• interval: Re-connect interval


Plugin status (on aggregator after starting prdcr but before updtr)

localhost:20002> status


Query current metric values on the aggregator

$ ldms_ls -h localhost -x sock -p 10002 -l
ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 0
D u64 MemFree 0
D u64 MemAvailable 0
D u64 Buffers 0
D u64 Cached 0
D u64 SwapCached 0
D u64 Active 0
D u64 Inactive 0


Simple Aggregator Configuration

• Configure the aggregator to update the “meminfo” set
localhost:20002> updtr_add name=foo interval=1000000 offset=200000
localhost:20002> updtr_prdcr_add name=foo regex=.*
localhost:20002> updtr_start name=foo

• name: policy tag
• interval: update interval (in usec)
  • Example: interval=1000000 means aggregate every 1 second
• offset: Target (in us) from <epoch sec>.000000
  • Example: offset=10000 means aggregate every <interval> seconds at 10 ms into the second.
• regex: regular expression to match the target producers’ tag(s)
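
The interplay of interval and offset can be made concrete with a little arithmetic. The sketch below assumes that update times are aligned to multiples of the interval since the epoch, shifted by the offset, which is how the description above reads; it is not taken from the ldmsd source, and the function name `next_update_us` is an assumption.

```python
def next_update_us(now_us, interval_us, offset_us):
    """Return the first aligned update time strictly after now_us.

    Assumed alignment rule: t = k * interval_us + offset_us for integer k.
    All values are microseconds since the epoch.
    """
    k = (now_us - offset_us) // interval_us + 1
    return k * interval_us + offset_us


# With interval=1000000 and offset=200000 (the updtr_add example above),
# a daemon at 5.3 s after some epoch second boundary next updates at 6.2 s.
t1 = next_update_us(5300000, 1000000, 200000)
# At 5.1 s, the 5.2 s slot has not passed yet.
t2 = next_update_us(5100000, 1000000, 200000)
```

An aggregator offset larger than the sampler offset, as in offset=200000 against the sampler's offset=0, gives the sampler time to finish writing before the aggregator reads.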


Plugin status (on aggregator after starting prdcr and updtr)

localhost:20002> status


Query current metric values on the aggregator

$ ldms_ls -h localhost -x sock -p 10002 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 1884188
D u64 MemFree 828244
D u64 MemAvailable 1639232
D u64 Buffers 948
D u64 Cached 915992
D u64 SwapCached 0
D u64 Active 84336
D u64 Inactive 891196


Exercise: Validate manual configuration and aggregation from sampler


Start ldmsd and aggregation using a configuration file

• ldmsd can be started using a configuration file
  • Syntax is identical to that used for manual configuration
  • Can be used to run and configure BOTH sampler and aggregator ldmsd
• Sample configuration file for meminfo example:
$ cat /home/ovis_public/demo/ldmsd/conf/simple_aggregator.conf
prdcr_add name=localhost host=$HOSTNAME port=10001 xprt=sock type=active interval=20000000
prdcr_start name=localhost
updtr_add name=foo interval=1000000 offset=200000
updtr_prdcr_add name=foo regex=.*
updtr_start name=foo

• Run ldmsd using this configuration file
$ ldmsd -x sock:10002 -l aggd.log -S aggd.sock -c /home/ovis_public/demo/ldmsd/conf/simple_aggregator.conf
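
When an aggregator collects from many sampler daemons, the prdcr_add and prdcr_start pairs become repetitive, so the file can be generated. The sketch below is one possible way to do that; the function name `aggregator_config` and its parameters are assumptions, and it emits only the commands and options shown in simple_aggregator.conf above.

```python
def aggregator_config(producers, updtr_name, interval_us, offset_us,
                      reconnect_us=20000000):
    """Return aggregator configuration text.

    producers: list of (name, host, port) tuples, one per sampler daemon.
    """
    lines = []
    for name, host, port in producers:
        lines.append(
            "prdcr_add name=%s host=%s port=%d xprt=sock type=active interval=%d"
            % (name, host, port, reconnect_us))
        lines.append("prdcr_start name=%s" % name)
    lines.append("updtr_add name=%s interval=%d offset=%d"
                 % (updtr_name, interval_us, offset_us))
    lines.append("updtr_prdcr_add name=%s regex=.*" % updtr_name)
    lines.append("updtr_start name=%s" % updtr_name)
    return "\n".join(lines) + "\n"


# Host names here are illustrative.
agg_text = aggregator_config(
    [("vm01", "ovis-demo-01", 10001), ("vm02", "ovis-demo-02", 10001)],
    "foo", 1000000, 200000)
```

The single updater with regex=.* picks up every producer, which matches the pattern used in the sample file.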

Query current metric values

$ ldms_ls -x sock -p 10002 -l ovis-demo-01/meminfo

ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]

M u64 component_id 1

D u64 job_id 0

D u64 MemTotal 1884188

D u64 MemFree 828244

D u64 MemAvailable 1639232

D u64 Buffers 948

D u64 Cached 915992

D u64 SwapCached 0

D u64 Active 84336

D u64 Inactive 891196

Exercise: Validate static aggregator configuration and aggregation from sampler

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

Aggregate from student VMs

• Kill aggregator ldmsd
• Restart ldmsd using “-c students_all_aggregator.conf”
• Kill aggregator ldmsd
• Restart ldmsd using “-c students_subset_aggregator.conf”

Plugin status (on aggregator from all students)

localhost:20002> status

Exercise: Validate static aggregator configuration and aggregation from sampler

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

LAB3: Dynamic Changes and Resilience

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

Dynamic Configuration Changes

• Dynamic configuration
  • Sampler daemons
    • stop sampler plugins
    • start with different intervals
  • Aggregator daemons
    • stop prdcr/updtr/strgp
    • remove prdcr/updtr/strgp
    • change interval

Dynamic Changes and Robustness

• On-the-fly additions of samplers will be discovered by the aggregating ldmsd
  • Exercise: one student will add the vmstat sampler via ldmsd_controller to his running ldmsd. All others will see it appear in their aggregators that are collecting from that sampler.
  • Exercise: one student will stop his meminfo sampler via ldmsd_controller in his running ldmsd. All others will see in the ldms_ls timestamp output that that student’s metric set ceases to update.
  • Exercise: the same student will restart his meminfo sampler via ldmsd_controller in his running ldmsd. All others will see in the ldms_ls timestamp output that that student’s metric set resumes updating.
• Samplers and Aggregators can be started in any order
• LDMS collection and transport topologies are robust to Samplers and Aggregators being killed and restarted
  • Exercise: one student will kill his ldmsd sampler. All other students will see in the ldms_ls timestamp output that that student’s metric set ceases to update.
  • Exercise: the same student will restart his ldmsd sampler. All other students will see in the ldms_ls timestamp output that that student’s metric set resumes updating.
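The exercises above rely on reading the "last update" timestamp in the ldms_ls header line by eye. The same check can be scripted. This is a minimal sketch, assuming the header format shown earlier in the tutorial; the helper names, the 1-second sampling interval, and the tolerance of three missed samples are illustrative assumptions, not LDMS defaults.

```python
from datetime import datetime, timedelta

# Header line in the form printed by ldms_ls -l earlier in the tutorial.
HEADER = "ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]"


def last_update(header_line):
    """Extract the 'last update' time from an ldms_ls -l header line."""
    stamp = header_line.split("last update: ", 1)[1].rsplit(" [", 1)[0]
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


def is_stale(header_line, now, sample_interval_s=1.0, tolerance=3):
    """Treat a set as stale once more than `tolerance` sample intervals have passed."""
    age = now - last_update(header_line)
    return age > timedelta(seconds=sample_interval_s * tolerance)


# Two seconds after the last update the set still counts as live;
# fifteen seconds after, it is reported as stale.
print(is_stale(HEADER, datetime(2017, 2, 10, 12, 50, 27)))
print(is_stale(HEADER, datetime(2017, 2, 10, 12, 50, 40)))
```

Passing `now` explicitly keeps the function easy to test; a monitoring script would pass `datetime.now()` instead.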

LAB4: Storing data in CSV stores

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

LDMS Plugin Architecture

[Figure: LDMS plugin architecture. Diagram labels: LDMS Daemon; LDMS API (libldms); Memory holding Metric Sets; Sampler Plug-in Interface (Memory Sampler, HSN Sampler); Transport Driver Interface (RDMA Transport, Socket Transport); Storage Plug-in Interface (SOS, MySQL, CSV Store, Other Store); Storage.]

Storing data to csv file(s)

• Goals:
  • Configure a csv store with ldmsd_controller
  • Configure a csv store with a configuration file
  • Store options
• Example output:

#Time,Time_usec,ProducerName,component_id,job_id,MemTotal,MemFree,MemAvailable,Buffers,Cached,SwapCached,Active,Inactive,Active(anon),Inactive(anon),Active(file),Inactive(file),Unevictable,Mlocked,SwapTotal,SwapFree,Dirty,Writeback,AnonPages,Mapped,Shmem,Slab,SReclaimable,SUnreclaim,KernelStack,PageTables,NFS_Unstable,Bounce,WritebackTmp,CommitLimit,Committed_AS,VmallocTotal,VmallocUsed,VmallocChunk,HardwareCorrupted,AnonHugePages,HugePages_Total,HugePages_Free,HugePages_Rsvd,HugePages_Surp,Hugepagesize,DirectMap4k,DirectMap2M
1487105964.002482,2482,ovis-demo-09,9,0,1884188,571028,1688632,0,1212004,6108,104536,1122496,8276,8580,96260,1113916,0,0,839676,793956,420,0,10552,24812,1796,52124,40104,12020,1792,3280,0,0,0,1781768,387984,34359738367,7216,34359728128,0,2048,0,0,0,0,2048,47040,2050048
1487105963.002583,2583,ovis-demo-02,2,0,1884188,1665280,1671132,948,107512,0,71540,80920,44128,8308,27412,72612,0,0,839676,839676,0,0,44000,22264,8436,35680,24304,11376,1600,2940,0,0,0,1781768,296444,34359738367,7216,34359728128,0,6144,0,0,0,0,2048,34752,2062336
1487105963.001964,1964,ovis-demo-08,8,0,1884188,1623168,1644996,948,129700,0,89312,101956,60788,8332,28524,93624,0,0,839676,839676,0,0,60620,23912,8500,36456,24608,11848,1872,4364,0,0,0,1781768,403252,34359738367,7216,34359728128,0,16384,0,0,0,0,2048,44992,2052096
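The example output is ordinary CSV except that its header line starts with `#`. As a minimal sketch, it can be loaded with Python's standard `csv` module once that marker is stripped. The sample below keeps only the first seven columns of the three rows shown above, and the function name `read_store_csv` is illustrative.

```python
import csv
import io

# First seven columns of the example output above (abbreviated for readability).
SAMPLE = """\
#Time,Time_usec,ProducerName,component_id,job_id,MemTotal,MemFree
1487105964.002482,2482,ovis-demo-09,9,0,1884188,571028
1487105963.002583,2583,ovis-demo-02,2,0,1884188,1665280
1487105963.001964,1964,ovis-demo-08,8,0,1884188,1623168
"""


def read_store_csv(stream):
    """Read CSV-store style text whose first line is a '#'-prefixed header."""
    header = stream.readline().lstrip("#").strip().split(",")
    return list(csv.DictReader(stream, fieldnames=header))


rows = read_store_csv(io.StringIO(SAMPLE))
free_by_host = {row["ProducerName"]: int(row["MemFree"]) for row in rows}
for host, free_kb in sorted(free_by_host.items()):
    print(host, free_kb)
```

For a real file, `open(path)` can replace the `io.StringIO` wrapper. If the header is written to a separate file, the column names would have to be read from that file instead.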

Aggregator Configuration to store metric set data using CSV store

• Configure the aggregator to store the “meminfo” set to a csv file using ldmsd_controller
  • Load the store_csv plugin
  • Configure the plugin

$ ldmsd_controller --host localhost --port 20002 --auth_file ~/.ldmsauth.conf
localhost:20002> load name=store_csv
localhost:20002> config name=store_csv path=/home/ovis_public/demo/ldmsd/data action=init buffer=0

• name: plugin name
• path: path to the base directory for the csv file container. This directory must already exist.
• action: ‘init’ to initialize the plugin (other actions will not be described in this tutorial)
• buffer: ‘0’ to disable buffering
• man page:
  • man /opt/ovis/share/man/man7/Plugin_store_csv.7 (opens the store_csv plugin man page)

Aggregator Configuration to store metric set data using CSV store

• Configure the aggregator to store the “meminfo” set to a csv file.

localhost:20002> strgp_add name=meminfo_store_csv plugin=store_csv container=csv schema=meminfo
localhost:20002> strgp_start name=meminfo_store_csv

• name: storage policy tag
• plugin: store plugin used for storing metric set data
• container: the storage backend container name. For csv, this is the directory where the output file will go. It will be created if needed.
• schema: metric set schema to be stored

Plugin Status (store info only)

localhost:20002> status

Examining the CSV file

• The data is saved in: /home/ovis_public/demo/ldmsd/data/csv/meminfo

1. Checking the csv file
$ tail -f /home/ovis_public/demo/ldmsd/data/csv/meminfo
• If aggregating from others’ VMs, you will see multiple hosts in the output

2. Data changes:
• Run the memeater executable
$ ./a.out
• Compare the live memeater output with the tail -f values

Exercise: Store CSV

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

Start csv store with a configuration file with advanced configuration options

• Aggregator configuration file at: /home/ovis_public/demo/ldmsd/conf/agg.conf

load name=store_csv
config name=store_csv path=/home/ovis_public/demo/ldmsd/data action=init buffer=0 rollover=120 rolltype=1 altheader=1
strgp_add name=meminfo_store_csv schema=meminfo plugin=store_csv container=csv
strgp_start name=meminfo_store_csv

• New configuration options:
  • Rollover by time or size:
    • rollover=120 rolltype=1: rolls over every 120 sec. The output file name is suffixed with the epoch timestamp (e.g., meminfo.12345)
  • Header in a separate file:
    • altheader=1
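With rollover enabled, one schema produces a series of files whose names end in an epoch timestamp, so a reader has to put them back in time order. The sketch below shows one way to do that. The file names in the example are hypothetical (only the `meminfo.<epoch>` pattern comes from the slide), and any name whose suffix is not purely numeric, such as a separate header file, is simply skipped.

```python
def rollover_epoch(filename, base="meminfo"):
    """Return the epoch suffix of a rolled-over file name, or None if it has none."""
    prefix = base + "."
    if not filename.startswith(prefix):
        return None
    suffix = filename[len(prefix):]
    return int(suffix) if suffix.isdigit() else None


def ordered_rollover_files(filenames, base="meminfo"):
    """Return the rolled-over data files for `base`, oldest first."""
    stamped = [(rollover_epoch(name, base), name) for name in filenames]
    return [name for epoch, name in sorted(item for item in stamped if item[0] is not None)]


# Hypothetical directory listing; the names are made up for illustration.
names = [
    "meminfo.1487106084",
    "meminfo.HEADER.1487105964",
    "meminfo.1487105964",
    "vmstat.1487105964",
]
print(ordered_rollover_files(names))
```

In a real directory the list of names could come from `os.listdir()` on the container directory.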

Start csv store with a configuration file with advanced configuration options

• Uncomment the lines for store_csv only (not store_function_csv)
• Kill the current aggregator (not the sampler) and restart the aggregator:

ldmsd -x sock:10002 -l agg.log -p 20002 \
    -c /home/ovis_public/demo/ldmsd/conf/agg.conf

• Note the file rollover and alternate header

Exercise: CSV store with a configuration file and advanced configuration options

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

LAB5: Calculating derived data and saving to a CSV store

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

Storing data to store function csv file(s)

Goals:
• Configure a function csv store with ldmsd_controller
• Configure a function csv store with a configuration file
• Function options

Example output:

#Time,Time_usec,DT,DT_usec,ProducerName,component_id,job_id,RAW_ACTIVE,RAW_ACTIVE.Flag,RAW_MEMTOTAL,RAW_MEMTOTAL.Flag,RATIO100,RATIO100.Flag,TimeFlag
1487107627.002486,2486,0.999712,999712,ovis-demo-i03,103,0,828068,0,1884188,0,43,0,0
1487107628.002425,2425,0.999939,999939,ovis-demo-i03,103,0,975536,0,1884188,0,51,0,0
1487107629.002402,2402,0.999977,999977,ovis-demo-i03,103,0,975528,0,1884188,0,51,0,0
1487107630.018970,18970,1.016568,16568,ovis-demo-i03,103,0,980228,0,1884188,0,52,0,0
1487107631.002405,2405,0.983435,983435,ovis-demo-i03,103,0,1122996,0,1884188,0,59,0,0

The Active/MemTotal ratio (RATIO100) increases while memeater runs.

Store_function_csv configuration file

Configuration file at /home/ovis_public/demo/ldmsd/conf/fct.conf:

#SCHEMA NEW_METRICNAME FUNCTION N_MET <METS_CSV> SCALE|THRESH WRITEOUT
meminfo RAW_ACTIVE RAW 1 Active 1 1
meminfo RAW_MEMTOTAL RAW 1 MemTotal 1 1
meminfo RATIO100 DIV_AB 2 RAW_ACTIVE,RAW_MEMTOTAL 100 1

• Functions: RAW (raw value), scalar and vector add/subtract/multiply/divide, threshold checks, min/max
• man page:
  • man /opt/ovis/share/man/man7/Plugin_store_function_csv.7 (opens the store_function_csv plugin man page)
• Chain variables for a complex computation
• V3 limitations (addressed in future versions):
  • u64 cast at all steps. Can use scale to keep precision.
  • Functions are only per instance of a metric set (e.g., cannot combine data from meminfo and vmstat, cannot combine info from different components)
• Output flags: an invalid flag for every computation and a flag for ageusec
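The u64 cast and the role of the scale factor are easiest to see with numbers. The sketch below recomputes RATIO100 from the RAW_ACTIVE and RAW_MEMTOTAL values in the LAB5 example output, assuming the scale is applied to the numerator before an integer division; that order is an assumption, chosen because it reproduces the RATIO100 column of that output. The function name and the divide-by-zero flag are illustrative, not the plugin's code.

```python
def div_ab_scaled(a, b, scale):
    """Integer A/B with a scale factor, returning (value, invalid_flag).

    Assumption: scale multiplies the numerator first, so the unsigned
    integer division keeps two digits of the ratio instead of truncating to 0.
    """
    if b == 0:
        return 0, 1  # illustrative: flag the result as invalid
    return (a * scale) // b, 0


# (RAW_ACTIVE, RAW_MEMTOTAL) pairs taken from the example output in this lab.
samples = [
    (828068, 1884188),
    (975536, 1884188),
    (975528, 1884188),
    (980228, 1884188),
    (1122996, 1884188),
]

ratios = [div_ab_scaled(active, total, 100)[0] for active, total in samples]
print(ratios)

# Without the scale factor, integer division loses all precision.
unscaled = [div_ab_scaled(active, total, 1)[0] for active, total in samples]
print(unscaled)
```

The second list shows why the configuration uses a scale of 100: Active is always smaller than MemTotal, so the unscaled integer ratio is 0 for every sample.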

Aggregator Configuration to store metric set data using store_function_csv

• Configure the aggregator to store derived data from the “meminfo” set to a csv file.

$ ldmsd_controller --host localhost --port 20002 --auth_file ~/.ldmsauth.conf
localhost:20002> load name=store_function_csv
localhost:20002> config name=store_function_csv path=/home/ovis_public/demo/ldmsd/data buffer=0 ageusec=2000000 derivedconf=/home/ovis_public/demo/ldmsd/conf/fct.conf

• action: ‘init’ to initialize the plugin
• derivedconf: derived configuration file (can take multiple files as a comma-separated list)
• ageusec: flag the data point when the DT between data points is greater than this value

Aggregator Configuration to store metric set data using store_function_csv

• Configure the aggregator to store derived data from the “meminfo” set to a csv file.

localhost:20002> strgp_add name=mem_f plugin=store_function_csv container=csv_fct schema=meminfo
localhost:20002> strgp_start name=mem_f

Plugin Status (store info only shown)

localhost:20002> status

Storing derived data to a function store CSV file

• The data is saved at /home/ovis_public/demo/ldmsd/data/csv_fct/meminfo
• Checking the csv_fct file:

tail -f /home/ovis_public/demo/ldmsd/data/csv_fct/meminfo

Exercise: Store_function_csv

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

Storing derived data to a function store CSV file using the ldmsd configuration file

• Uncomment the lines for store_function_csv (store_csv lines are still uncommented)
• Kill the current aggregator (not the sampler) and restart the aggregator:

ldmsd -x sock:10002 -l agg.log -p 20002 \
    -c /home/ovis_public/demo/ldmsd/conf/agg.conf

• Checking the csv_fct file:

tail -f /home/ovis_public/demo/ldmsd/data/csv_fct/meminfo

• Run the memeater code at the same time as storing data:

./a.out    # the memeater executable

• Compare the live memeater output with the tail -f values

Exercise: Store_function_csv with configuration file and memeater

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

LAB6: Storing the data in an SOS database

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

LDMS Plugin Architecture

[Figure: LDMS plugin architecture. Diagram labels: LDMS Daemon; LDMS API (libldms); Memory holding Metric Sets; Sampler Plug-in Interface (Memory Sampler, HSN Sampler); Transport Driver Interface (RDMA Transport, Socket Transport); Storage Plug-in Interface (SOS, MySQL, CSV Store, Other Store); Storage.]

Configure the aggregator’s SOS store plugin

• Steps:
  • Load the store_sos plugin
  • Configure the plugin

localhost:20002> load name=store_sos
localhost:20002> config name=store_sos path=/home/ovis_public/demo/ldmsd/data/sos

• name: plugin name
• path: path to the directory that will contain the SOS database

Add a storage policy to save the meminfo data to the SOS store

• Configure the aggregator to store the “meminfo” set to an SOS database.

localhost:20002> strgp_add name=meminfo_sos plugin=store_sos container=meminfo schema=meminfo
localhost:20002> strgp_start name=meminfo_sos

• name: storage policy tag
• plugin: store plugin used for storing metric set data
• container: the storage backend container name
• schema: metric set schema to be stored

Use a configuration file to configure the storage back-end

• Edit the configuration file at ~/demo/ldmsd/conf/agg.conf
  • Uncomment the store_sos configuration lines
• Kill the current aggregator (not the sampler)
• Restart the aggregator

ldmsd -x sock:10002 -l agg.log -p 20003 \
    -c ~/demo/ldmsd/conf/agg.conf

LAB7: Exploring data in an SOS database

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

Exercise: Use the SOS tools to explore the database

• sos_cmd
  • Create containers
  • Create and query schema
  • Import and query data
• lmq
  • Plot data stored in the SOS database
• Data visualization on Grafana

Query available schemas in your database

$ sos_cmd -C /home/ovis_public/demo/ldmsd/data/sos/meminfo/ -l

schema :
    name      : meminfo
    schema_sz : 4504
    obj_sz    : 408
    id        : 129
    -attribute : timestamp
        type    : TIMESTAMP
        idx     : 0
        indexed : 1
        offset  : 8
    . . .
    -attribute : MemTotal
        type    : UINT64
        idx     : 5
        indexed : 0
        offset  : 48
    -attribute : MemFree
        type    : UINT64
        idx     : 6
        indexed : 0
        offset  : 56

The last path component given to -C (“meminfo”) is the container name given at strgp_add.

Query data in the SOS database

sos_cmd -C /home/ovis_public/demo/ldmsd/data/sos/meminfo \
    -q -S meminfo -X comp_time -V timestamp -V component_id -V MemFree -V Active | less

timestamp                        component_id       MemFree            Active
-------------------------------- ------------------ ------------------ ------------------
1487100290.607418                0                  1636160            80120
1487100300.609416                0                  1636160            80120
1487100310.611474                0                  1642688            76016
. . .
1487114607.002163                103                1628516            90320
1487114608.002077                103                1628516            90320
-------------------------------- ------------------ ------------------ ------------------
Records 887636/887636.

-q  Query the database
-S  Schema name
-X  Index used to order the data
-V  Given once for each column in the output

Output the data as a CSV file

sos_cmd -C /home/ovis_public/demo/ldmsd/data/sos/meminfo \
    -q -S meminfo -X comp_time -V timestamp -V component_id -V MemFree -V Active -f csv | less

#timestamp,component_id,MemFree,Active
1487100290.607418,0,1636160,80120
1487100300.609416,0,1636160,80120
1487100310.611474,0,1642688,76016
. . .
1487114606.002196,103,1628548,90320
1487114607.002163,103,1628516,90320
1487114608.002077,103,1628516,90320
#Records 889483/889483.

-q      Query the database
-S      Schema name
-X      Index used to order the data
-V      Given once for each column in the output
-f csv  Format the output as CSV

Output the data as a JSON file

sos_cmd -C /home/ovis_public/demo/ldmsd/data/sos/meminfo \
    -q -S meminfo -X comp_time -V timestamp -V component_id -V MemFree -V Active -f json | less

{"data":[
{"timestamp":"1487100290.607418","component_id":"0","MemFree":"1636160","Active":"80120"},
{"timestamp":"1487100300.609416","component_id":"0","MemFree":"1636160","Active":"80120"},
{"timestamp":"1487100310.611474","component_id":"0","MemFree":"1642688","Active":"76016"},
{"timestamp":"1487100320.613736","component_id":"0","MemFree":"1641272","Active":"77292"},
. . .
{"timestamp":"1487114606.002196","component_id":"103","MemFree":"1628548","Active":"90320"},
{"timestamp":"1487114607.002163","component_id":"103","MemFree":"1628516","Active":"90320"},
{"timestamp":"1487114608.002077","component_id":"103","MemFree":"1628516","Active":"90320"}
],"totalRecords":890414,"recordCount":890414}

-q       Query the database
-S       Schema name
-X       Index used to order the data
-V       Given once for each column in the output
-f json  Format the output as JSON
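In the JSON output every value, including the numeric ones, is a quoted string, so a consumer has to convert types itself. The sketch below is a minimal illustration using Python's `json` module on a two-record excerpt of the output above; the record counts in the sample are adjusted to match the excerpt, and the function name `load_records` is illustrative.

```python
import json

# Two records from the JSON output above; the counts are adjusted to the excerpt.
SAMPLE = (
    '{"data":['
    '{"timestamp":"1487100290.607418","component_id":"0","MemFree":"1636160","Active":"80120"},'
    '{"timestamp":"1487114608.002077","component_id":"103","MemFree":"1628516","Active":"90320"}'
    '],"totalRecords":2,"recordCount":2}'
)


def load_records(text):
    """Parse the JSON text and convert the string fields to numbers."""
    doc = json.loads(text)
    records = []
    for raw in doc["data"]:
        records.append({
            "timestamp": float(raw["timestamp"]),
            "component_id": int(raw["component_id"]),
            "MemFree": int(raw["MemFree"]),
            "Active": int(raw["Active"]),
        })
    return records, doc["recordCount"]


records, count = load_records(SAMPLE)
print(count, records[-1]["component_id"], records[-1]["MemFree"])
```

Note that the literal `. . .` in the slide marks elided records and is not valid JSON; real command output would not contain it.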

LAB8: Data Analysis and Visualization from an SOS database

Note: VMs are not included in the release materials. Additional configuration scripts are in the associated tarball.

lmq: LDMS tool to plot time-series graphs

Query range of dates available in the database

lmq --path /home/ovis_public/demo/data/sos/meminfo \
    --query dates --schema meminfo

There are data available from 02/13/17 14:47:44 (1487018864.002345) through 02/15/17 21:12:21 (1487214741.002282)

--path    The path to the container
--query   What is being queried
--schema  The schema to query

Exercise: Plot time-series graph of a metric

$ lmq --path ~/demo/ldmsd/data/sos/meminfo --query data --schema meminfo \
    --metric_name MemFree --component_id 2

--path          The path to the container
--query         What is being queried
--schema        The schema to query
--metric_name   The metric data to plot
--component_id  The component data to plot

lmq plot of MemFree of component 2

Exercise: Plot a graph showing windowed average and running windowed variance

lmq --path ~/demo/ldmsd/data/sos/meminfo --query data --schema meminfo \
    --metric_name current_freemem --component_id 2 --bollinger

--path          The path to the container
--query         What is being queried
--schema        The schema to query
--metric_name   The metric data to plot
--component_id  The component data to plot
--bollinger     Plot Bollinger bands and outliers
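For readers unfamiliar with the term, Bollinger bands are a windowed mean plus and minus a multiple of the windowed standard deviation, and points falling outside the band are treated as outliers. The sketch below illustrates the idea in plain Python. It is not lmq's implementation: the window size, the multiplier `k`, the use of the preceding window for outlier detection, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def bollinger_bands(values, window=5, k=2.0):
    """Return (index, mean, lower, upper) for each full trailing window."""
    bands = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        mid = mean(chunk)
        spread = k * pstdev(chunk)
        bands.append((i, mid, mid - spread, mid + spread))
    return bands


def outliers(values, window=5, k=2.0):
    """Indices whose value falls outside the band built from the preceding window."""
    flagged = []
    for i in range(window, len(values)):
        chunk = values[i - window : i]
        if abs(values[i] - mean(chunk)) > k * pstdev(chunk):
            flagged.append(i)
    return flagged


# Made-up metric samples with one obvious spike at index 6.
series = [100, 102, 98, 101, 99, 100, 150, 101]
print(bollinger_bands(series)[0])
print(outliers(series))
```

Comparing each point against the window that precedes it keeps a spike from widening its own band, which is one common design choice among several.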

lmq plot of MemFree of component 2