LDMS Version 3 Tutorial //ovis.ca.sandia.gov/images/4/4d/LDMSV3Tutorial_SAND20175153.pdf · •...

Preview:

Citation preview

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2011-XXXXP

LDMSVersion3Tutorialhttps://github.com/ovis-hpc/ovis

JimBrandt,TomTucker,AnnGentile,Nichamon Nasksinehaboon,Narate TaeratOpenGridComputing,Inc.

SandiaNationalLaboratories04/2017

OGC | Open Grid Computing, Austin, TX

Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.

SAND2017-5153 O

Aboutthisdocument

• Thisisasub-selectionofmaterialsfromanLDMStutorial.ThefulltutorialincludesVM’swithanLDMSinstallation.TheVMisnothere,howevertherunscriptsfromtheexercisesareincluded.

• IfyouinstallLDMSonyoursystem,youcanthenusethesescriptsasmodelsandworkthroughtheexercises.

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

Resources

• Documentation(Building,Using)• https://github.com/ovis-hpc/ovis/wiki

• SourceCode• https://github.com/ovis-hpc/ovis• git clonehttps://github.com/ovis-hpc/ovis.git

• Publications:• https://ovis.ca.sandia.gov

TutorialFormatOverviewoftheLightweightDistributedMetricService(LDMS)

• IntroductiontoHPCmonitoring• OverviewoftheLDMSframework

• LDMSarchitecturedescriptionSetup

• Environmentsetupdescription andverification• Introductiontosupport programsandhelperscriptsforuseinlabwork

Hands-onlabsInstructorwalkthroughandfacilitatedstudentexploration• Lab1:Samplers

• Samplerstartupandlocalandremoteverification• Lab2:Aggregators

• Aggregationstartupandverificationusingsampler• Aggregationofallotherattendees’samplers

• Lab3:Dynamicconfigurationsandresilience• Lab4:StoringdatainCSVstores• Lab5:CalculatingderiveddataandsavingtoaCSVstore• Lab6:StoringthedatainanSOSdatabase• Lab7:ExploringdatainanSOSdatabase• Lab8:Dataanalysis andVisualizationfromanSOSdatabase

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

IntroductiontoHPCMonitoring• CanonicalMonitoringGoal:Real-timetroubleshooting(e.g.,nodesdown,outofmemory,resourcecongestion)

• HPCmonitoringconcerns:• Impactonrunningapplications• Howtoaggregatedatafromdifferentsourcesforanalysis.

• Network,filesystem,CPUutilization,memoryutilization• Whatanalyseswouldbemeaningful.

• e.g.,Whatrawandderiveddatawouldindicateperformance-impactingnetworkcongestion.• Howtoprocesslargeamountsofdatainreal-time

• Asaresult,canonicalsystemmonitoring:• Typicallyperformedatintervalsofminutes• Analyseslargelyconsistsofdetectingmonitoringvaluesexceedingpre-definedthresholds

• Dataisunsuitableforgainingsignificantinsightsintoapplicationperformanceproblems

MonitoringCanEnableResource-AwareComputingLightweighthigh-frequencycontinuousrun-timemonitoring,analysis,andfeedbackcouldenable:

• Fasterproblemdetection,includingcomponent-specificissuesbasedonaparticularcomponent’sknownbehaviorsandenvironment(e.g.,thermalvariations)

• Insightintoalarge-scaleapplication’suseofresourcesunderproductionconditions,includingcontentionfromotherapplications

• Dynamicapplication-to-resourcemappingbasedonapplicationneedsandsystemstate

• Co-schedulingofapplicationsbasedoncontentionforsharedresources• Dynamicsystemoperationsbasedonadatacenter’spowerdemands,temperatureetc.

LDMSOverview• WhatistheLightweightDistributedMetricSystem(LDMS)?

• Collectnumericdata• Moveandaggregatedata• Storedata• Analyzedata

• Troubleshooting• Optimization• Informfuturedesigns

• Typicalusecasedescriptions• Supportedtechnologies

• LinuxonallbutIBMBlueGeneplatforms• Sourcesofcode,information,andsupport

LightweightDistributedMetricService(LDMS)HighLevelOverview

*Onlythecurrentdataisretainedon-node

LDMSPluginArchitecture

Memory

LDMSAPI(libldms)

SamplerPlug-in Interface

TransportDriverInterface

MemorySampler

HSNSampler

RDMATransport

SocketTransport

LDMSDaem

on

MetricSet

MetricSet

MetricSet

MetricSet

StoragePlug-inInterface

LDMSAPI(libldms)

Storage

SOS

MySQL

CSVCSVStore

OtherStore

DataFlow

Supportedplatformsandnetworks

• Platforms• Rhel 6and7• SLES11&12• Ubuntu• CrayXE6,XKandXC

• Transports• Socket• Crayugni

• Aries• Gemini

• RDMA• Infiniband• iWarp

Builddependencies

• Typicalcomputenodeenvironment• Autoconf >=2.3,automake,autotool• Libevent2-devel>=2.0.31• OpenSSH-devel

• Endusehosts(monitorcluster,specialaggregationhosts,etc.)• Python

• 2.6withtheargparsemodule• 2.7

• Swig• Doxygen fordocumentation

LDMSInstallationmethods

• Manuallyinstallusingautoconf andautomake• DeploymentusingRPMNote:Forthisdemo,LDMSispre-installedonstudentVMsin/opt/ovis.

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

Gettingstarted:Loginandsetupyourenvironmentssh –Yovis_public@XXXXXXX

$ovis_public@XXXXXX's password:*******

ovis_public@ovis-demo-login ~[sshd:]

$ssh –Yovis_public@ovis-demo-01

Note:“/home/ovis_public/demo/ldmsd/env/ldms-env.sh” isusedtosetupLDMSenvironment

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

VMdirectorystructure• VMsincludesourcecode,scriptsandconfigurationfilesforeveryexercise,helpermini-applicationsforuseintheexercises,andsupportingvisualizationtools(e.g.,gnuplot).

• Directorystructure:• source-code/

• ldms/sourcecodeofLDMSlatestreleaseversion• util/utilitycodesforuseintheexamples

• data/Pre-collectednumericdataandlogmessagedata• ldms-data/ReleasednumericdatafromNCSABlueWaters

• csvAsubsetofBlueWatersdataintheCSVformat• demo/

• ldmsd/• conf/ConfigurationfilesusedintheLDMSdemo• data/Placeholdersfortheto-be-storedLDMSdata• env/Scriptstosetupenvironmentvariables• scripts/HelperscriptstodeployingLDMSdaemons

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

Gettingstarted:SetupandverifyyourEnvironment• Systemenv.var.

• PATH= ${OVIS_HOME}/bin/:${OVIS_HOME}/sbin/:${PATH}• LD_LIBRARY_PATH= ${OVIS_HOME}/lib/:${LD_LIBRARY_PATH}• PYTHONPATH= ${OVIS_HOME}/lib/python2.7/site-packages/:${PYTHONPATH}

• LDMSenv.var.• ZAP_LIBPATH= ${OVIS_HOME}/lib/ovis-lib• LDMSD_PLUGIN_LIBPATH= ${OVIS_HOME}/lib/ovis-ldms

• LDMSauthentication• LDMS_AUTH_FILE= <pathtofilewithyoursharedsecret>

• Permissions600• Format:secretword=<8ormorecharacters>(e.g.secretword=mylittlesecret)

NOTE:${OVIS_HOME}=/opt/ovis inthisexample

Note:VM’snotinthereleasematerials.Additional configurationscriptsintheassociatedtarball

Testcode:memeater.c• Memeater codewhichrepeatedlyallocs mem.RunwithLDMStoseechangesinmemoryutilizationvaluesreportedin/proc/meminfo.

• Locatedat/home/ovis_public/source-code/util/memeater.c.Compilewithcc.

./a.outPeriodicallyincreasememoryallocated

Sleepbetweenalloc.Changethiswrtsampling frequency.

Sleepbeforereleasingmemory

LabExercises

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

LAB1: Samplers

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

StartandconfigureaLDMSdaemonLabGoals:• BasicLDMSdaemonstartupandconfigurationflags/args

• Manualandrun-timeconfigurationoptions• Outputoptions

• Logfilesandloglevels• Debuginformation

• manpages• man/opt/ovis/share/man/man8/ldmsd.8– opensldmsdmanpages• man/opt/ovis/share/man/man8/ldmsd_controller.8– opens“ldmsd_controller”manpages

• Useofldms_ls utilityasadiagnostictool• manpages

• man/opt/ovis/share/man/man8/ldms_ls.8– opensldms_lsmanpages

LDMSPluginArchitecture

Memory

LDMSAPI(libldms)

SamplerPlug-in Interface

TransportDriverInterface

MemorySampler

HSNSampler

RDMATransport

SocketTransport

LDMSDaem

on

MetricSet

MetricSet

MetricSet

MetricSet

StoragePlug-inInterface

LDMSAPI(libldms)

Storage

SOS

MySQL

CSVCSVStore

OtherStore

StartaLDMSdaemon

• Startldmsdldmsd –x sock:10001 –l sampled.log –S sampled.sock –r sampled.pid –p 20001

• -x: Transport: listeningport• -l: Specifythelogfilepathandname• -S: SpecifytheUnixdomainsocketforcommunicationwithldmsctl orldmsd_controller• -r: Specifywheretowritethepid file• -p: SpecifythelistenerportforremoteconfigurationNote:ThelogandUnixdomainsocketnamesarejuststrings.Weuse“samplerd”heretodenotethosebeingusedbyaldmsd thatwillberunning“samplers”asopposedtoperformingaggregation.

Checktoseeifldmsd isrunning

• Usingpsps auxw |grepldmsd |grep–vgrep

• Returnssomethinglike:“ovis_pu+35820.00.14016042204?Ssl12:510:00ldmsd -xsock:10001-Ssamplerd.sock“if running

• Returns:blanklineifnotrunning• Usingldms_lsldms_ls –hlocalhost–xsock–p10001

• Returns:“Connectionfailed/rejected.”ifldmsd specifieddoesnotexist• Returns:blanklineiftheldmsd specifiedexistsbuthasnometricsetsconfigured

Exercise:Runldmsd

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

Manuallyloadandconfigureasamplerplugin

LabGoals:• Basicsamplerpluginoperation

• Manualdynamicconfigurationusingthe“ldmsd_controller”utility• Staticconfigurationusingaconfigurationfile• manpages

• man/opt/ovis/share/man/man7/Plugin_meminfo.7– opensmeminfo pluginmanpages• man/opt/ovis/share/man/man7/Plugin_vmstat.7– opensvmstat pluginmanpages

• Useofldms_ls utilityasadiagnostictool• manpages

• man/opt/ovis/share/man/man8/ldms_ls.8– opensldms_lsmanpages

ConfigureLDMSdaemonSamplerPlugin(s)

• Loadthe“meminfo”samplerplugin• Configureloaded“meminfo”samplerplugin

• Givethesetname(instance)• Givethenodename(producer)• GivethecomponentID• Plugin-specificarguments

• Startsamplerpluginwithaparticularsamplingintervalandoffset

optional

Connectldmsd_controller toanldmsd

• Setup“ldmsd_controller”connectiontotheaggregatoroversocket$ldmsd_controller --host localhost --port 20001--auth_file ~/.ldmsauth.conf

Welcome to the LDMSD control processor

localhost:20001>

Exercise:Connecttoldmsd withldmsd_controller

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

LDMSPluginArchitecture

Memory

LDMSAPI(libldms)

SamplerPlug-in Interface

TransportDriverInterface

MemorySampler

HSNSampler

RDMATransport

SocketTransport

LDMSDaem

on

MetricSet

MetricSet

MetricSet

MetricSet

StoragePlug-inInterface

LDMSAPI(libldms)

Storage

SOS

MySQL

CSVCSVStore

OtherStore

InteractiveConfigurationusingtheldmsd_controller

• Loadthe“meminfo”samplerlocalhost:20001> load name=meminfo

• Configurethe“meminfo”samplerlocalhost:20001> config name=meminfo

producer=<$HOSTNAME> instance=<$HOSTNAME>/meminfocomponent_id=<host number>

QuerycurrentsetsonanLDMSDaemonusing“ldms_ls”• Useldms_ls toquerythecurrentsetsavailableonanLDMSdaemon

$ ldms_ls –h localhost -x sock -p 10001

ovis-demo-01/meminfo

$

Getthesetinformationbeforestartingthe“meminfo”sampler

$ ldms_ls –h localhost -x sock -p 10001 –v ovis-demo-01/meminfo

ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]

METADATA --------

Producer Name : ovis-demo-01

Instance Name : ovis-demo-01/meminfo

Schema Name : meminfo

Size : 1904

Metric Count : 45

GN : 2

DATA ------------

Timestamp : Wed Dec 31 18:00:00 1969 [0us]

Duration : [0.000000s]

Consistent : FALSE

Size : 400

GN : 1

-----------------

Querycurrentmetricvaluesbeforestartingthe“meminfo”sampler$ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo

ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]

M u64 component_id 1

D u64 job_id 0

D u64 MemTotal 0

D u64 MemFree 0

D u64 MemAvailable 0

D u64 Buffers 0

D u64 Cached 0

D u64 SwapCached 0

D u64 Active 0

D u64 Inactive 0

Startthe“meminfo”sampler

• Startthe“meminfo”samplerlocalhost:20001> start name=meminfo interval=1000000 offset=0

• Thisstartsthesamplerupdatingthemetricvaluesevery1second

Getthesetinformation$ ldms_ls -x sock -p 10001 –v ovis-demo-01/meminfo

ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:46:55 2017 [3486us]

METADATA --------

Producer Name : ovis-demo-01

Instance Name : ovis-demo-01/meminfo

Schema Name : meminfo

Size : 1904

Metric Count : 45

GN : 2

DATA ------------

Timestamp : Fri Feb 10 12:46:55 2017 [3486us]

Duration : [0.000068s]

Consistent : TRUE

Size : 400

GN : 259

-----------------

Querycurrentmetricvalues

$ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo

ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017[4156us]

M u64 component_id 1

D u64 job_id 0

D u64 MemTotal 1884188

D u64 MemFree 828244

D u64 MemAvailable 1639232

D u64 Buffers 948

D u64 Cached 915992

D u64 SwapCached 0

D u64 Active 84336

D u64 Inactive 891196

Checksourceforreference$cat/proc/meminfoMemTotal:1884188kBMemFree:828420kBMemAvailable:1639912kBBuffers:948kBCached:916396kBSwapCached:0kBActive:85144kBInactive:890212kBActive(anon): 58272kBInactive(anon):8372kBActive(file): 26872kBInactive(file):881840kB

Exercise:Manualsamplerconfiguration

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

• Killallofyourldmsd inpreparationforthenextsection$pkill ldmsd

• Killaparticularldmsd• ps auxw |grepldmsd |grep–vgrepovis_pu+35820.00.14016042204?Ssl 12:510:00ldmsd -xsock:10001-Ssamplerd.sock• kill3582

• Checktomakesureitisdead$ps auxw |grepldmsd |grep–vgrep

Startldmsd andsamplerpluginusingaconfigurationfile• ldmsd canbestartedusingaconfigurationfile

• Syntaxisidenticaltothatusedformanualconfiguration• CanbeusedtorunandconfigureBOTHsamplerandaggregatorldmsd

• Sampleconfigurationfileformeminfo example:$cat /home/ovis_public/demo/ldmsd/conf/simple_sampler.conf

load name=meminfo

config name=meminfo producer=<$HOSTNAME> instance=<$HOSTNAME>/meminfocomponent_id=<host number>

start name=meminfo interval=1000000

• Runldmsd usingthisconfigurationfile$ldmsd -x sock:10001 -l samplerd.log -S samplerd.sock –c /home/ovis_public/demo/ldmsd/conf/simple_sampler.conf

Querycurrentmetricvalues

$ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo

ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]

M u64 component_id 1

D u64 job_id 0

D u64 MemTotal 1884188

D u64 MemFree 828244

D u64 MemAvailable 1639232

D u64 Buffers 948

D u64 Cached 915992

D u64 SwapCached 0

D u64 Active 84336

D u64 Inactive 891196

Exercise:Staticsamplerconfiguration

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

ConfigurationToolsSummary

Dynamic/manualconfiguration(remoteorlocal)• ldmsd_controller – Pythonscriptthatcanconnecttoaldmsd viaaconfigurednetworksocketor alocalUnixDomainSocket

Staticconfiguration(local)• Configurationfile– loadedatldmsd runtime

Configurationoptionandtool.

• CMDlineconfiguration–c• ldmsctl

• CinterfacetoconfigureLDMSD.• Onlyforsamplerdaemon

• ldmsd_controller• PythoninterfacetoconfigureLDMSD.• ConnecttoanLDMSDusingUNIXdomainsocket(local)orsocket(remote).• Auto-completion• Commandhelp

• Moredetailscanbefoundathttps://www.opengridcomputing.com/wordpress/index.php/ovis-3-3-user-guide/#ldmsd-config

Startldmsd_controller

• ConnectwithUNIXdomainsocketldmsd_controller --sockname samplerd.sock

• Connectwithsocketldmsd_controller --host localhost --port 20001 --auth_file ~/.ldmsauth.conf

ldmsd_controller:Getcommandlist

samplerd.sock> help

Documented commands (type help <topic>):

========================================

EOF prdcr_del stop udata version

add prdcr_start store udata_regex

config prdcr_start_regex strgp_add updtr_add

env prdcr_stop strgp_del updtr_del

help prdcr_stop_regex strgp_metric_add updtr_match_add

include quit strgp_metric_del updtr_match_del

info say strgp_prdcr_add updtr_prdcr_add

load shell strgp_prdcr_del updtr_prdcr_del

loglevel source strgp_start updtr_start

logrotate standby strgp_stop updtr_stop

prdcr_add start term usage

DefinitelyuseforsamplerdDefinitelyuseforaggregatorsUsetoloadandconfigpluginGethelpanddaemonstatus

ldmsd_controller:commandhelp

samplerd.sock> help prdcr_add

Add an LDMS Producer to the Aggregator

Parameters:

name= A unique name for this Producer

xprt= The transport name [sock, rdma, ugni]

host= The hostname of the host

port= The port number on which the LDMS is listening

type= The connection type [active, passive]

interval= The connection retry interval (us)

LAB2: Aggregators

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

LDMSPluginArchitecture

Memory

LDMSAPI(libldms)

SamplerPlug-in Interface

TransportDriverInterface

MemorySampler

HSNSampler

RDMATransport

SocketTransport

LDMSDaem

on

MetricSet

MetricSet

MetricSet

MetricSet

StoragePlug-inInterface

LDMSAPI(libldms)

Storage

SOS

MySQL

CSVCSVStore

OtherStore

ConfigureaLDMSdaemon(ldmsd)toAggregatemetricset(s)

Goals:• Addlistofconnectionstosamplerldmsd’s• Starttheconnections• CreateanUpdatepolicy

• Howoftentogetametricset’supdate• Fromwhichsamplerldmsd’s toaggregate

• StarttheUpdatepolicy

Startanldmsd thatwillbeusedforaggregation

• StartLDMSD

ldmsd -x sock:10002 –m 10M -l aggd.log –S aggd.sock –p 20002

• -x: transport:listenerport• -m: Allocatesetmemoryforaggregatedmetricsets(default:512K)• -l: Specifythelogfilepath• -S: Specify“UnixDomainSocket”nameusedforlocalconfiguration• -p: Specifythelistenerportforremoteconfiguration

Interactiveaggregatorconfiguration

• Setup“ldmsd_controller”connectiontotheaggregatoroversocket$ldmsd_controller --host localhost --port 20002--auth_file ~/.ldmsauth.conf

Welcome to the LDMSD control processor

localhost:20002>

SimpleAggregatorConfiguration

• Configuretheaggregatortoaggregatethe“meminfo”setfromthesamplerdaemonlocalhost:20002> prdcr_add name=bar host=$HOSTNAME port=10001 xprt=sock type=active interval=20000000

localhost:20002> prdcr_start name=bar

• name:policytag• host:hostnameofthesamplerdaemon• port:Listenerportofthesamplerdaemon• xprt:Transportthesamplerdaemonlistenson• type:Always“active”• interval:Re-connectinterval

Pluginstatus(onagg afterstartedprdcr butbeforeupdtr)localhost:20002> status

Querycurrentmetricvaluesontheaggregator

$ldms_ls –h localhost -x sock -p 10002 -l

ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]

M u64 component_id 1

D u64 job_id 0

D u64 MemTotal 0

D u64 MemFree 0

D u64 MemAvailable 0

D u64 Buffers 0

D u64 Cached 0

D u64 SwapCached 0

D u64 Active 0

D u64 Inactive 0

SimpleAggregatorConfiguration

• Configuretheaggregatortoupdate the“meminfo”setlocalhost:20002> updtr_add name=foo interval=1000000 offset=200000

localhost:20002> updtr_prdcr_add name=foo regex=.*

localhost:20002> updtr_start name=foo

• name: policy tag

• interval: update interval (in usec)• Example: interval=1000000 means aggregate every 1 seconds

• offset: Target (in us) from <epoc sec>.000000

• Example: offset=10000 means aggregate every <interval> seconds at 10ms into the second.

• regex: regular expression to match the target producers tag(s)

Pluginstatus(onaggregatorafterstartedprdcr andupdtr)

localhost:20002> status

Querycurrentmetricvaluesontheaggregator

$ldms_ls -h locahost -x sock -p 10002 -l ovis-demo-01/meminfo

ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]

M u64 component_id 1

D u64 job_id 0

D u64 MemTotal 1884188

D u64 MemFree 828244

D u64 MemAvailable 1639232

D u64 Buffers 948

D u64 Cached 915992

D u64 SwapCached 0

D u64 Active 84336

D u64 Inactive 891196

Exercise:Validatemanualconfigurationandaggregationfromsampler

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

Startldmsd andaggregationusingaconfigurationfile• ldmsd canbestartedusingaconfigurationfile

• Syntaxisidenticaltothatusedformanualconfiguration• CanbeusedtorunandconfigureBOTHsamplerandaggregatorldmsd

• Sampleconfigurationfileformeminfo example:$cat /home/ovis_public/demo/ldmsd/conf/simple_aggregator.conf

prdcr_add name=localhost host=$HOSTNAME port=10001 xprt=sock type=active interval=20000000

prdcr_start name=localhost

updtr_add name=foo interval=1000000 offset=200000

updtr_prdcr_add name=foo regex=.*

updtr_start name=foo

• Runldmsd usingthisconfigurationfile$ldmsd -x sock:10002 -l aggd.log -S aggd.sock –c /home/ovis_public/demo/ldmsd/conf/simple_aggregator.conf

Querycurrentmetricvalues

$ldms_ls -x sock -p 10002 -l ovis-demo-01/meminfo

ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]

M u64 component_id 1

D u64 job_id 0

D u64 MemTotal 1884188

D u64 MemFree 828244

D u64 MemAvailable 1639232

D u64 Buffers 948

D u64 Cached 915992

D u64 SwapCached 0

D u64 Active 84336

D u64 Inactive 891196

Exercise:Validatestaticaggregatorconfigurationandaggregationfromsampler

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

AggregatefromstudentVMs

• Killaggregatorldmsd• Restartldmsd using“-cstudents_all_aggregator.conf”• Killaggregatorldmsd• Restartldmsd using“-cstudents_subset_aggregator.conf”

Pluginstatus(onaggregator fromallstudents)

localhost:20002> status

Exercise:Validatestaticaggregatorconfigurationandaggregationfromsampler

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

LAB3: DynamicChangesandResilience

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

DynamicConfigurationChanges

• Dynamicconfiguration• Samplerdaemons

• stopsamplerplugins• startwithdifferentintervals

• Aggregatordaemons• stopprdcr/updtr/strgp• removeprdcr/updtr/strgp• changeinterval

DynamicChangesandRobustness• On-the-flyadditionsofsamplerswillbediscoveredbytheaggregatingldmsd

• Exercise– onestudentwilladdthevmstat samplervialdmsd_controllertohisrunningldmsd.Allotherswillseeitappearintheiraggregatorswhicharecollectingfromthatsampler.

• Exercise – onestudentwillstophismeminfo samplervialdmsd_controller inhisrunningldmsd.Allotherswillseeinldms_ls timestampoutputthatthatstudent’smetricsetceasestoupdate.

• Exercise – thesamestudentwillrestarthismeminfo samplervialdmsd_controller inhisrunningldmsd.Allotherswillseeinldms_ls timestampoutputthatthatstudent’smetricsetresumesupdating.

• SamplersandAggregatorscanbestartedinanyorder• LDMScollectionandtransporttopologiesarerobusttoSamplersandAggregatorsbeingkilledandrestarted

• Exercise – onestudentwillkillhisldmsd sampler.Allotherstudentswillseeinldms_ls timestampoutputthatthatstudent’smetricsetceasestoupdate

• Exercise – thesamestudentwillrestarthisldmsd sampler.Allotherstudentswillseeinldms_ls timestampoutputthatthatstudent’smetricsetresumesupdating.

LAB4: StoringdatainCSVstores

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

LDMSPluginArchitecture

Memory

LDMSAPI(libldms)

SamplerPlug-in Interface

TransportDriverInterface

MemorySampler

HSNSampler

RDMATransport

SocketTransport

LDMSDaem

on

MetricSet

MetricSet

MetricSet

MetricSet

StoragePlug-inInterface

LDMSAPI(libldms)

Storage

SOS

MySQL

CSVCSVStore

OtherStore

Storingdatatocsvfile(s)• Goals:

• Configureacsvstorewithldmsd_controller• Configureacsvstorewithconfiguration file• Storeoptions

• Exampleoutput:#Time,Time_usec,ProducerName,component_id,job_id,MemTotal,MemFree,MemAvailable,Buffers,Cached,SwapCached,Active,Inactive,Active(anon),Inactive(anon),Active(file),Inactive(file),Unevictable,Mlocked,SwapTotal,SwapFree,Dirty,Writeback,AnonPages,Mapped,Shmem,Slab,SReclaimable,SUnreclaim,KernelStack,PageTables,NFS_Unstable,Bounce,WritebackTmp,CommitLimit,Committed_AS,VmallocTotal,VmallocUsed,VmallocChunk,HardwareCorrupted,AnonHugePages,HugePages_Total,HugePages_Free,HugePages_Rsvd,HugePages_Surp,Hugepagesize,DirectMap4k,DirectMap2M1487105964.002482,2482,ovis-demo-09,9,0,1884188,571028,1688632,0,1212004,6108,104536,1122496,8276,8580,96260,1113916,0,0,839676,793956,420,0,10552,24812,1796,52124,40104,12020,1792,3280,0,0,0,1781768,387984,34359738367,7216,34359728128,0,2048,0,0,0,0,2048,47040,20500481487105963.002583,2583,ovis-demo-02,2,0,1884188,1665280,1671132,948,107512,0,71540,80920,44128,8308,27412,72612,0,0,839676,839676,0,0,44000,22264,8436,35680,24304,11376,1600,2940,0,0,0,1781768,296444,34359738367,7216,34359728128,0,6144,0,0,0,0,2048,34752,20623361487105963.001964,1964,ovis-demo-08,8,0,1884188,1623168,1644996,948,129700,0,89312,101956,60788,8332,28524,93624,0,0,839676,839676,0,0,60620,23912,8500,36456,24608,11848,1872,4364,0,0,0,1781768,403252,34359738367,7216,34359728128,0,16384,0,0,0,0,2048,44992,2052096

AggregatorConfigurationtostoremetricsetdatausingCSVstore

• Configuretheaggregatortostorethe“meminfo”settoacsvfile usingldmsd_controller

• Loadthestore_csv plugin• Configuretheplugin

$ldmsd_controller --host localhost --port 20002 --auth_file ~/.ldmsauth.conf

localhost:20002> load name=store_csv

localhost:20002> config name=store_csv path=/home/ovis_public/demo/ldmsd/data action=init buffer=0

• name:pluginname• path:Pathtothebasedirectoryforthecsvfilecontainer.Thisdirectorymustpre-exist.• action:‘init’toinitializetheplugin(otheractionswillnotbedescribedinthistutorial)• buffer:‘0’todisablebuffering• manpage:

• man/opt/ovis/share/man/man7/Plugin_store_csv.7– opensstore_csv pluginmanpages

AggregatorConfigurationtostoremetricsetdatausingCSVstore

• Configuretheaggregatortostorethe“meminfo”settoacsvfile.localhost:20002> strgp_add name=meminfo_store_csvplugin=store_csv container=csv schema=meminfolocalhost:20002> strgp_start name=meminfo_store_csv

• name:storagepolicytag• plugin:storepluginusedforstoringmetricsetdata• container:thestoragebackendcontainername.Forcsv,thisisthedirectorywheretheoutputfilewillgo.Thiswillbecreated.

• schema:metricsetschematobestored

PluginStatus(storeinfoonly)localhost:20002> status

ExaminingtheCSVfile• Thedataissavedin:/home/ovis_public/demo/ldmsd/data/csv/meminfo

1.Checkingthecsvfile$tail–f/home/ovis_public/demo/ldmsd/data/csv/meminfo• Ifaggregatingfromothers’vm’s,seemultiplehostsintheoutput

2.Datachanges:• Runthememeater executable$./a.out

• Comparethelivememeater outputwiththetail–fvalues

Exercise:StoreCSV

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

Startcsvstorewithaconfigurationfilewithadvancedconfigurationoptions

• Aggregatorconfigurationfileat:/home/ovis_public/demo/ldmsd/conf/agg.confloadname=store_csvconfig name=store_csvpath=/home/ovis_public/demo/ldmsd/dataaction=initbuffer=0

rollover=120rolltype=1altheader=1strgp_add name=meminfo_store_csv schema=meminfo plugin=store_csv container=csvstrgp_startname=meminfo_store_csv

• Newconfigurationoptions:• Rolloverbytimeorsize:

• rollover=120 rolltype=1 – rollsoverevery120sec.Outputfileispostpended withepochtimestamp(meminfo.12345)

• Headerinaseparatefile:• altheader=1

Startcsvstorewithaconfigurationfilewithadvancedconfigurationoptions

• Uncommentthelinesforstore_csv only(notstore_function_csv)• Killcurrentaggregator(notthesampler)andRestartaggregator:

ldmsd -x sock:10002 -l agg.log -p 20002

-c /home/ovis_public/demo/ldmsd/conf/agg.conf

• Notethefilerolloverandalternateheader

Exercise:CSVstorewithaconfigurationfileandadvancedconfigurationoptions

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

LAB5: CalculatingderiveddataandsavingtoaCSVstore

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

Storingdatatostorefunctioncsvfile(s)Goals:• Configureafunctioncsvstorewithldmsd_controller• Configureafunctioncsvstorewithaconfigurationfile• Functionoptions

Exampleoutput:#Time,Time_usec,DT,DT_usec,ProducerName,component_id,job_id,RAW_ACTIVE,RAW_ACTIVE.Flag,RAW_MEMTOTAL,RAW_MEMTOTAL.Flag,RATIO100,RATIO100.Flag, TimeFlag1487107627.002486,2486,0.999712,999712,ovis-demo-i03,103,0,828068,0,1884188,0,43,0,01487107628.002425,2425,0.999939,999939,ovis-demo-i03,103,0,975536,0,1884188,0,51,0,01487107629.002402,2402,0.999977,999977,ovis-demo-i03,103,0,975528,0,1884188,0,51,0,01487107630.018970,18970,1.016568,16568,ovis-demo-i03,103,0,980228,0,1884188,0,52,0,01487107631.002405,2405,0.983435,983435,ovis-demo-i03,103,0,1122996,0,1884188,0,59,0,0

Active/Memtotal ratioincreasingwhilememeater runs

Store_function_csv configurationfileConfigurationFileat/home/ovis_public/demo/ldmsd/conf/fct.conf#SCHEMANEW_METRICNAMEFUNCTIONN_MET<METS_CSV>SCALE|THRESHWRITEOUTmeminfo RAW_ACTIVERAW1Active11meminfo RAW_MEMTOTAL RAW1MemTotal 11meminfo RATIO100DIV_AB2RAW_ACTIVE,RAW_MEMTOTAL 100 1

• Functions:RAW(rawvalue),ScalarandVectoradd/subtract/multiply/divide,thresholdchecks,min/max

• manpage• man/opt/ovis/share/man/man7/Plugin_store_function_csv.7 – opensstore_function_csv pluginmanpages

• Chainvariablesforacomplexcomputation• V3Limitations(addressedinfutureversions):

• u64castatallsteps.Canusescaletokeepprecision.• Functionsareonlyperinstanceofametricset(e.g.,cannotcombinedatafrommeminfo andvmstat,cannotcombineinfofromdifferentcomponents)

• Outputflags:Flagforinvalidforeverycomputationandforageusec

AggregatorConfigurationtostoremetricsetdatausingstore_function_csv

• Configuretheaggregatortostorederiveddatafromthe“meminfo”settoacsvfile.

$ldmsd_controller --host localhost --port 20002 --auth_file ~/.ldmsauth.conf

localhost:20002> load name=store_function_csv

localhost:20002> config name=store_function_csv

path=/home/ovis_public/demo/ldmsd/data buffer=0 ageusec=2000000

derivedconf=/home/ovis_public/demo/ldmsd/conf/fct.conf

• action:‘init’toinitializetheplugin• derived_conf: derivedconfigurationfile(cantakemultiples:csv)• ageusec: flagwhentheDTbetweendatapointsisgreaterthanthisvalue

AggregatorConfigurationtostoremetricsetdatausingstore_function_csv

•Configuretheaggregatortostorederiveddatafromthe“meminfo”settoacsvfile.

localhost:20002> strgp_add name=mem_fplugin=store_function_csv container=csv_fctschema=meminfo

localhost:20002> strgp_start name=mem_f

PluginStatus(storeinfoonlyshown)localhost:20002> status

StoringderiveddatatoafunctionstoreCSVfile• Thedataissavedat/home/ovis_public/demo/ldmsd/data/csv_fct/meminfo

• Checkingthecsv_fct file:tail-f/home/ovis_public/demo/ldmsd/data/csv_fct/meminfo

Exercise:Store_function_csv

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

StoringderiveddatatoafunctionstoreCSVfileusingtheldmsd configurationfile

• Uncommentthelinesforstore_function_csv (store_csv linesarestilluncommented)

• Killcurrentaggregator(notthesampler)andRestartaggregator:ldmsd -x sock:10002 -l agg.log -p 20002

-c /home/ovis_public/demo/ldmsd/conf/agg.conf

• Checkingthecsv_fct filetail -f /home/ovis_public/demo/ldmsd/data/csv_fct/meminfo

• Runthememeater codeatsametimeasstoringdata:./a.out #thememeater executable

comparethelivememeater outputwiththetail–fvalues

Exercise:Store_function_csv withconfigurationfileandmemeater

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

LAB6: StoringthedatainanSOSdatabase

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

LDMSPluginArchitecture

Memory

LDMSAPI(libldms)

SamplerPlug-in Interface

TransportDriverInterface

MemorySampler

HSNSampler

RDMATransport

SocketTransport

LDMSDaem

on

MetricSet

MetricSet

MetricSet

MetricSet

StoragePlug-inInterface

LDMSAPI(libldms)

Storage

SOS

MySQL

CSVCSVStore

OtherStore

Configuretheaggregator’sSOSstoreplugin

• Steps:• Loadthestore_sos plugin• Configuretheplugin

localhost:20002> load name=store_soslocalhost:20002> config name=store_sospath=/home/ovis_public/demo/ldmsd/data/sos

• name:pluginname• path:PathtothedirectorytocontaintheSOSdatabase

AddastoragepolicytosavethememinfodatatotheSOSstore

•Configuretheaggregatortostorethe“meminfo”settoaSOSdatabase.

localhost:20002> strgp_add name=meminfo_sos plugin=store_sos container=meminfo schema=meminfolocalhost:20002> strgp_start name=meminfo_sos

• name:storagepolicytag• plugin:storepluginusedforstoringmetricsetdata• container:thestoragebackendcontainername• schema:metricsetschematobestored

Useaconfigurationfiletoconfigurethestorageback-end• Edittheconfigurationfileat~/demo/ldmsd/conf/agg.conf

• Uncommentthestore_sos configurationlines• Killcurrentaggregator(notthesampler)• Restarttheaggregator

ldmsd -x sock:10002 -l agg.log -p 20003 \–c ~/demo/ldmsd/conf/agg.conf

LAB7: ExploringdatainanSOSdatabase

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

Exercise:UsetheSOStoolstoexplorethedatabase• sos_cmd

• Createcontainers• Createandqueryschema• Importandquerydata

• lmq• PlotdatastoredintheSOSdatabase

• DatavisualizationonGrafana

Queryavailableschemasinyourdatabase

$ sos_cmd -C /home/ovis_public/demo/ldmsd/data/sos/meminfo/ -l

-attribute : MemTotaltype : UINT64idx : 5indexed : 0offset : 48

-attribute : MemFreetype : UINT64idx : 6indexed : 0offset : 56

Containernamegivenatstrgp_add

schema :name : meminfoschema_sz : 4504obj_sz : 408id : 129-attribute : timestamp

type : TIMESTAMPidx : 0indexed : 1offset : 8

QuerydataintheSOSdatabase

timestamp component_id MemFree Active-------------------------------- ------------------ ------------------ ------------------

1487100290.607418 0 1636160 80120 1487100300.609416 0 1636160 80120 1487100310.611474 0 1642688 76016

. . . 1487114607.002163 103 1628516 90320 1487114608.002077 103 1628516 90320

-------------------------------- ------------------ ------------------ ------------------Records 887636/887636.

sos_cmd -C/home/ovis_public/demo/ldmsd/data/sos/meminfo \-q-Smeminfo -Xcomp_time -Vtimestamp–Vcomponent_id -VMemFree -VActive|less

-qQuerythedatabase-SSchemaname-Xindexusedtoorderdata-Vonceforcolumnintheoutput

OutputthedataasaCSVfile

#timestamp,component_id,MemFree,Active1487100290.607418,0,1636160,801201487100300.609416,0,1636160,801201487100310.611474,0,1642688,76016. . . 1487114606.002196,103,1628548,903201487114607.002163,103,1628516,903201487114608.002077,103,1628516,90320#Records889483/889483.-------------------------------- ------------------ ------------------ ------------------Records 887636/887636.

sos_cmd -C/home/ovis_public/demo/ldmsd/data/sos/meminfo \-q-Smeminfo -Xcomp_time-Vtimestamp–Vcomponent_id -VMemFree -VActive -fcsv|less

-qQuerythedatabase-SSchemaname-Xindexusedtoorderdata-Vonceforcolumnintheoutput-fcsvformattheoutputasCSV

OutputthedataasaJSONfile

{"data":[{"timestamp":"1487100290.607418","component_id" :"0","MemFree":"1636160","Active":"80120"},{"timestamp":"1487100300.609416","component_id" :"0","MemFree":"1636160","Active":"80120"},{"timestamp":"1487100310.611474","component_id" :"0","MemFree":"1642688","Active":"76016"},{"timestamp":"1487100320.613736","component_id" :"0","MemFree":"1641272","Active":"77292"},. . .{"timestamp":"1487114606.002196","component_id" :"103","MemFree":"1628548","Active":"90320"},{"timestamp":"1487114607.002163","component_id" :"103","MemFree":"1628516","Active":"90320"},{"timestamp":"1487114608.002077","component_id" :"103","MemFree":"1628516","Active":"90320"}],"totalRecords":890414,"recordCount":890414}

sos_cmd -C/home/ovis_public/demo/ldmsd/data/sos/meminfo \-q-Smeminfo -Xcomp_time-Vtimestamp–Vcomponent_id -VMemFree -VActive -fjson |less

-qQuerythedatabase-SSchemaname-Xindexusedtoorderdata-Vonceforcolumnintheoutput-fcsvformattheoutputasJSon

LAB8: DataAnalysisandVisualizationfromanSOSdatabase

Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball

lmqLDMStooltoplottime-seriesgraphs

Queryrangeofdatesavailableinthedatabaselmq --path /home/ovis_public/demo/data/sos/meminfo \

--query dates --schema meminfo

There are data available from 02/13/17 14:47:44 (1487018864.002345) through 02/15/17 21:12:21 (1487214741.002282)

--path Thepathtothecontainer--query Whatisbeingqueried--schema Theschematoquery

Exercise:Plottime-seriesgraphofametric

$lmq --path ~/demo/ldmsd/data/sos/meminfo --query data --schema meminfo \

--metric_name MemFree --component_id 2

--path Thepathtothecontainer--query Whatisbeingqueried--schema Theschematoquery--metric_name Themetricdatatoplot--component_id Thecomponentdatatoplot

lmq plotofMemFee ofcomponent2

Exercise:Plotagraphshowingwindowedaverage,andrunningwindowedvariance

lmq --path~/demo/ldmsd/data/sos/meminfo --querydata--schemameminfo \--metric_name current_freemem --component_id2--bollinger

--path Thepathtothecontainer--query Whatisbeingqueried--schema Theschematoquery--metric_name Themetricdatatoplot--component_id Thecomponentdatatoplot--bollinger PlotBollingerbandsandoutliers

lmq plotofMemFree ofcomponent2

Recommended