MALT : MALloc Tracker, a memory profiling tool. 3/02/2019, Sébastien Valat

Page 1: MALT : MALloc Tracker - FOSDEM

MALT : MALloc Tracker

A memory profiling tool

3/02/2019 MALT, Sébastien Valat 1

Page 2: MALT : MALloc Tracker - FOSDEM

Questions

• We have good profiling tools for timing (e.g. Valgrind or VTune)

• But what about memory profiling?

• Memory can be an issue:

– Availability of the resource

– Performance

• Three main questions:

– How to reduce the memory footprint?

– How to reduce the overhead of memory management?

– How to improve memory usage?

Page 3: MALT : MALloc Tracker - FOSDEM

Some issue examples

• I wanted to point out:

– Where memory is allocated.

– Properties of allocated chunks.

– Bad allocation patterns for performance.

__thread int gblVar[SIZE];                        // Global variables and TLS

double * func(int size)
{
    child_func_with_allocs();                     // Indirect allocations
    void * ptr = new char[size];                  // Leak
    double * ret = new double[size*size*size];    // Might lead to swap for large size
    for (auto it : iter_Items) {                  // C++11 auto induced allocs
        double * buffer = new double[size];       // Short life allocations
        // short and quick do stuff
        delete [] buffer;
    }
    return ret;
}
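As a hedged sketch (not taken from the slides), two of the patterns in the example above, the short-lived buffer and the copy induced by `auto`, can be avoided by allocating the scratch buffer once and iterating by reference. The `Item` type and the `sumWeighted` function are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element type: copying it allocates because of the std::string.
struct Item {
    std::string name;
    double weight;
};

// Sums weight * index over a scratch buffer for every item.
double sumWeighted(const std::vector<Item>& items, std::size_t size)
{
    assert(size > 0);                     // the scratch buffer must not be empty
    std::vector<double> buffer(size);     // allocated once, reused by every iteration
    double total = 0.0;
    for (const auto& it : items) {        // reference: no per-iteration copy of Item
        for (std::size_t i = 0; i < size; ++i)
            buffer[i] = it.weight * static_cast<double>(i);
        for (double v : buffer)
            total += v;
    }
    return total;                         // buffer is released automatically: no leak
}
```

The same idea applies to the leaked `ptr`: an owning container or smart pointer releases the memory without an explicit `delete`.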

Page 4: MALT : MALloc Tracker - FOSDEM

What I want to provide

• Same approach as valgrind/kcachegrind

• Allocations mapped onto source lines and call stacks

• Using a web-based GUI

– I started with kcachegrind

– But I wanted more flexibility and time charts

Page 5: MALT : MALloc Tracker - FOSDEM

How it works

• Use LD_PRELOAD to intercept malloc/free/…, as the Google heap profiler does

• Map allocations onto call stacks

• Build & consolidate summary metrics

• Generate JSON output file
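The last three steps can be sketched as follows. This is a minimal illustration, not MALT's actual code: the `Profile` and `SiteStats` names, the choice of metrics and the JSON layout are all assumptions, and the LD_PRELOAD wrapper that would call `onAlloc` is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// A call stack is represented as a list of function names (assumption:
// a real tool would store return addresses and resolve symbols later).
using CallStack = std::vector<std::string>;

// Summary metrics consolidated for one call stack.
struct SiteStats {
    std::size_t count = 0;    // number of allocations
    std::size_t sum = 0;      // total requested bytes
    std::size_t maxSize = 0;  // largest single request
};

class Profile {
public:
    // Would be called by the malloc wrapper for each allocation.
    void onAlloc(const CallStack& stack, std::size_t size) {
        assert(!stack.empty());
        SiteStats& s = sites_[stack];
        s.count += 1;
        s.sum += size;
        if (size > s.maxSize) s.maxSize = size;
    }

    // Serializes the metrics as a JSON array, one object per call stack.
    // Function names are assumed not to need JSON escaping.
    std::string toJson() const {
        std::ostringstream out;
        out << "[";
        bool first = true;
        for (const auto& entry : sites_) {
            if (!first) out << ",";
            first = false;
            out << "{\"stack\":[";
            for (std::size_t i = 0; i < entry.first.size(); ++i) {
                if (i) out << ",";
                out << "\"" << entry.first[i] << "\"";
            }
            out << "],\"count\":" << entry.second.count
                << ",\"sum\":" << entry.second.sum
                << ",\"max\":" << entry.second.maxSize << "}";
        }
        out << "]";
        return out.str();
    }

private:
    std::map<CallStack, SiteStats> sites_;
};
```

Keying the map on the whole stack keeps two allocations from the same line but different callers apart, which is what lets a GUI show the call stacks reaching a site.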

Page 6: MALT : MALloc Tracker - FOSDEM

Source annotations

[Screenshot of the source annotation view. Labels: metric selector; inclusive/exclusive; symbols; details of symbol or line; call stacks reaching the selected site; per-line annotation.]

Web technology (NodeJS, D3JS, jQuery, AngularJS)

Page 7: MALT : MALloc Tracker - FOSDEM

Call tree view

Page 8: MALT : MALloc Tracker - FOSDEM

Per thread statistics

Page 9: MALT : MALloc Tracker - FOSDEM

Fragmentation issue

• Memory consumption over time

– Physical

– Virtual

– Requested (malloced)
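Of the three curves, requested memory is the one a malloc tracker can compute by itself; physical and virtual memory have to come from the operating system. A minimal sketch of that bookkeeping, using a hypothetical `RequestedMemoryTracker` class (an assumption for illustration, not MALT's implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Tracks the sum of the sizes of all live chunks, and its peak.
class RequestedMemoryTracker {
public:
    // Record a chunk returned by the allocator.
    void onAlloc(const void* ptr, std::size_t size) {
        assert(ptr != nullptr);
        live_[ptr] = size;
        current_ += size;
        if (current_ > peak_) peak_ = current_;
    }

    // Record a chunk given back to the allocator.
    void onFree(const void* ptr) {
        auto it = live_.find(ptr);
        if (it == live_.end()) return;  // chunk not seen at allocation: ignore
        current_ -= it->second;
        live_.erase(it);
    }

    std::size_t current() const { return current_; }
    std::size_t peak() const { return peak_; }

private:
    std::unordered_map<const void*, std::size_t> live_;
    std::size_t current_ = 0;
    std::size_t peak_ = 0;
};
```

Sampling `current()` at regular intervals gives the requested curve; a physical curve that stays well above it is the kind of gap a fragmentation analysis looks for.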

Page 10: MALT : MALloc Tracker - FOSDEM

Dynamics

Page 11: MALT : MALloc Tracker - FOSDEM

Example on AVBP init phase

• Issue with reallocation on init

• Detected with the allocation rate and the cumulated allocated memory.
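The slides do not say what the reallocation pattern in AVBP was. As a purely hypothetical illustration of why cumulated allocated memory exposes this kind of issue, the sketch below compares an array grown one element at a time with one whose capacity doubles. Both function names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Cumulated bytes requested when an array reaching n elements of elemSize
// bytes is reallocated to exactly one more element at each step.
std::size_t cumulatedGrowByOne(std::size_t n, std::size_t elemSize) {
    assert(n > 0);
    std::size_t total = 0;
    for (std::size_t len = 1; len <= n; ++len)
        total += len * elemSize;
    return total;
}

// Same, when the capacity doubles each time it is exhausted.
std::size_t cumulatedDoubling(std::size_t n, std::size_t elemSize) {
    assert(n > 0);
    std::size_t total = 0;
    for (std::size_t cap = 1; ; cap *= 2) {
        total += cap * elemSize;
        if (cap >= n) break;
    }
    return total;
}
```

For 1000 elements of 8 bytes, the first strategy requests 4004000 bytes in total and the second 16376, although the final array is 8000 bytes in both cases.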

Page 12: MALT : MALloc Tracker - FOSDEM

Usage

• Optionally recompile with debug flags:

    gcc -g …

• Run:

    malt [--config=file.ini] YOUR_PRGM [OPTIONS]

• Use the web view and open http://localhost:8080:

    malt-webview -i malt-{YOUR_PRGM}-{PID}.json

• If needed, there is a Qt wrapper embedding NodeJS + WebKit:

    malt-qt -i malt-{YOUR_PRGM}-{PID}.json

Page 13: MALT : MALloc Tracker - FOSDEM

Status

• Open source for one year at https://github.com/memtt

• Co-hosted with a similar tool: NUMAPROF, for Non-Uniform Memory Access profiling.

• My research on memory management for HPC : http://svalat.github.io/

Page 14: MALT : MALloc Tracker - FOSDEM

QUESTIONS ?

Thank you.

Page 15: MALT : MALloc Tracker - FOSDEM

BACKUP

Page 16: MALT : MALloc Tracker - FOSDEM

Possibly huge impact

• Memory management can have a huge impact on performance

• Extreme case on a 1.5-million-line C++ HPC simulation application on a 16-processor server

• A 10-15% improvement can be seen on MySQL by changing the allocator

[Bar chart: execution time (s), axis from 0 to 500, split into user, system and idle time; annotated "4x".]

Page 17: MALT : MALloc Tracker - FOSDEM

Output, first idea, kcachegrind

Callgrind compatibility

• Can use kcachegrind

• Might be useful for some users, but cannot provide all metrics.

Page 18: MALT : MALloc Tracker - FOSDEM

What is missing in kcachegrind

• Started with the kcachegrind GUI… but…

• Display human-readable units

– Do you prefer 15728640 or 15 MB?

– I want to compare to what I expect.

• Cannot handle metrics that are not cumulated by sum

– Inclusive costs rely only on the + operator

– Some memory metrics require max/min (e.g. lifetime)

• No way to express time charts

• No way to express parameter distributions (e.g. sizes).
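The first two points can be illustrated with a short sketch. It is an assumption-laden example, not MALT's code: `humanReadable`, `Metrics` and `merge` are hypothetical names, and 1 KB is taken as 1024 bytes so that 15728640 prints as 15 MB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a byte count with binary units (1 KB = 1024 B, an assumption).
std::string humanReadable(std::size_t bytes) {
    static const char* const units[] = {"B", "KB", "MB", "GB", "TB"};
    double value = static_cast<double>(bytes);
    int unit = 0;
    while (value >= 1024.0 && unit < 4) {
        value /= 1024.0;
        ++unit;
    }
    char buf[32];
    int written = std::snprintf(buf, sizeof(buf), "%.1f %s", value, units[unit]);
    assert(written > 0 && written < static_cast<int>(sizeof(buf)));
    (void)written;  // silence unused-variable warnings when asserts are disabled
    return std::string(buf);
}

// Metrics attached to a function or call site.
struct Metrics {
    std::size_t allocatedBytes = 0;  // cumulative: children are merged with +
    std::size_t maxLifetime = 0;     // not cumulative: children are merged with max
};

// Combines the metrics of two children into an inclusive cost.
Metrics merge(const Metrics& a, const Metrics& b) {
    Metrics r;
    r.allocatedBytes = a.allocatedBytes + b.allocatedBytes;
    r.maxLifetime = std::max(a.maxLifetime, b.maxLifetime);
    return r;
}
```

A viewer that only knows how to add costs would report the sum of the two lifetimes, which has no useful meaning; the merge operator has to be chosen per metric.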

Page 19: MALT : MALloc Tracker - FOSDEM

Ideas for improvement

• Add NUMA statistics

• Provide virtual/physical ratio

• Estimate page fault costs

• Exploit traces in the GUI for deeper analysis

– Live allocations at a given time

– Fragmentation analysis

– Time charts from call sites

– Usage over threads for call sites

Page 20: MALT : MALloc Tracker - FOSDEM

Global summary

• Show global program statistics

Page 21: MALT : MALloc Tracker - FOSDEM

Temporal metrics

Profile over time :

▪ Allocation rate

▪ Physical / Virtual / Requested memory

▪ Stack size for each thread (requires function instrumentation)

Example on YALES2 with gfortran :

Page 22: MALT : MALloc Tracker - FOSDEM

Chunk size distribution

Many really small allocations

Example from YALES2 with gfortran issue

Page 23: MALT : MALloc Tracker - FOSDEM

EXISTING TOOLS

Page 24: MALT : MALloc Tracker - FOSDEM

Existing tools

• Valgrind (massif)

– Memory over time (snapshots) & functions

– Memory per function at peak

– Has a simple GUI

• Valgrind (memcheck)

– Leaks

– No real GUI

• Google heap profiler (tcmalloc)

– Memory over time (snapshots)

– Faster than valgrind

– No GUI

Page 25: MALT : MALloc Tracker - FOSDEM

Existing tools / Google heap profiler

• Google heap profiler (tcmalloc):

– Small overhead.

– Metrics similar to massif

– Only provides snapshots of allocated memory per stack.

– Peak might not be captured.

– Lack of a real GUI to use it.

% pprof gfs_master profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize

Page 26: MALT : MALloc Tracker - FOSDEM

Existing tools

• TAU memory profiler

– Provides profiles

– Follows stacks

– Tracks leaks

– Parallel, designed for HPC/MPI

– Lacks easy matching with sources

• FOM

Page 27: MALT : MALloc Tracker - FOSDEM

Existing tools / Commercial

• IBM Purify++ / Parasoft Insure++

– Commercial

– Leak detection, access checking, memory debugging tools.

– Use binary or source instrumentation.

– Windows / Redhat

• Visual Studio Ultimate Edition memory profiler

– Nice, but Windows only and commercial

Page 28: MALT : MALloc Tracker - FOSDEM

Stack tracking

• Two approaches implemented: backtrace and instrumentation

• Backtrace (default):

– Works out of the box

– Handles all dynamic libraries

– Slow for a large number of calls (~>10M)

• Instrumentation:

– Needs source recompilation (available): -finstrument-functions

– Or tools for binary instrumentation: MAQAO / Pintool (experimental)

– Faster for a really large number of calls to malloc

– Only provides stacks for the instrumented binaries
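The backtrace approach can be sketched with glibc's `backtrace()` from `<execinfo.h>`. This is an illustration under that assumption; the slides do not say which unwinder MALT uses, and `captureStack` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <execinfo.h>  // glibc-specific: backtrace()
#include <vector>

// Captures up to maxDepth return addresses of the current call stack.
std::vector<void*> captureStack(int maxDepth) {
    assert(maxDepth > 0);
    std::vector<void*> frames(static_cast<std::size_t>(maxDepth));
    int depth = backtrace(frames.data(), maxDepth);  // number of frames written
    frames.resize(static_cast<std::size_t>(depth));
    return frames;
}
```

The stack is walked on every call, which fits the slowdown noted above for very large numbers of allocations. With `-finstrument-functions`, GCC instead inserts calls to `__cyg_profile_func_enter` and `__cyg_profile_func_exit` in each compiled function, so a tool can maintain the stack incrementally, but only for code built with that flag.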

Page 29: MALT : MALloc Tracker - FOSDEM

What is good in kcachegrind

• List of functions with exclusive/inclusive costs

• Nice call tree

• Annotated sources

Page 30: MALT : MALloc Tracker - FOSDEM

SOME VIEWS

Page 31: MALT : MALloc Tracker - FOSDEM

Global summary

• Provide a small summary

• Provide some warnings

Page 32: MALT : MALloc Tracker - FOSDEM

Global summary : top 5 functions

• Summarize top functions for some metrics

• Points to check

• Examples on YALES2

Page 33: MALT : MALloc Tracker - FOSDEM

Tracking stack memory

[Screenshot labels: largest stack displayed per thread ID; stack space used by functions at peak; stack size over time.]

Page 34: MALT : MALloc Tracker - FOSDEM

Chunk size distribution

Many really small allocations

Example from YALES2

Page 35: MALT : MALloc Tracker - FOSDEM

Global variables

Page 36: MALT : MALloc Tracker - FOSDEM

REAL CASES

Page 37: MALT : MALloc Tracker - FOSDEM

Performance

[Bar chart comparing valgrind-memcheck, valgrind-massif, gperf, igprof, malt and malt-finstr; axis from 0 to 100.]

Page 38: MALT : MALloc Tracker - FOSDEM

Allocatable arrays on YALES2

And mostly really small allocations!

A huge number of allocations for a line the programmer thinks does not allocate at all!

Search for allocation-intensive functions

• The issue only occurs with gfortran; ifort uses stack arrays.

Page 39: MALT : MALloc Tracker - FOSDEM

• Examples on YALES2, small allocations:

We can find allocations of 1 B!

Many codes produce allocations of 1 B. OK in moderation.

Search for the minimal chunk size.

Page 40: MALT : MALloc Tracker - FOSDEM

Fragmentation issue

• Example of fragmentation detection

• Using the time chart with physical, virtual and requested memory

• Solution: avoid interleaving allocations of chunks with different lifetimes.

• Looking at the source annotations: most of them can be avoided.
