Vam: A Locality-Improving Dynamic Memory Allocator
Presents Vam, a memory allocator that improves cache-level and virtual-memory locality. Vam is distributed with Heap Layers (www.heaplayers.org).
1. A Locality-Improving Dynamic Memory Allocator
Yi Feng & Emery Berger, University of Massachusetts Amherst
2. motivation
- Memory performance: bottleneck for many applications
- Heap data often dominates
- Dynamic allocators dictate spatial locality of heap objects
3. related work
- Previous work on dynamic allocation
  - Reducing fragmentation [survey: Wilson et al., Wilson & Johnstone]
  - Search inside allocator [Grunwald et al.]
  - Programmer-assisted [Chilimbi et al., Truong et al.]
  - Profile-based [Barrett & Zorn, Seidl & Zorn]
4. this work
- Replacement allocator called Vam
  - Improves allocator & application locality
  - Automatic and transparent
5. outline
- Virtual Memory Performance
6. Vam design
- Builds on previous allocator designs
  - Doug Lea's allocator (DLmalloc), basis of the default allocator in Linux/GNU libc
  - Poul-Henning Kamp's allocator (PHKmalloc), default allocator in FreeBSD
  - Reap [Berger et al. 2002]
7. DLmalloc
- coarse-grained, coalesced
- Object headers ease deallocation and coalescing
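To make the header point concrete, here is a minimal C++ sketch of headered allocation, assuming a single fixed arena, first-fit search, and a header holding only the chunk size and an in-use flag. `HeaderedArena` and its members are hypothetical names, not DLmalloc's code. Real DLmalloc also keeps the previous chunk's size (a boundary tag) so it can coalesce backward; this sketch only coalesces forward.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical per-object header, placed immediately before each object.
struct Header {
    std::size_t size;  // total chunk size in bytes, header included
    bool inUse;
};

class HeaderedArena {
public:
    explicit HeaderedArena(std::size_t bytes)
        : size_(roundUp(bytes)), buf_(new unsigned char[size_]) {
        new (buf_) Header{size_, false};  // start with one big free chunk
    }
    ~HeaderedArena() { delete[] buf_; }
    HeaderedArena(const HeaderedArena&) = delete;
    HeaderedArena& operator=(const HeaderedArena&) = delete;

    void* allocate(std::size_t n) {
        std::size_t need = roundUp(sizeof(Header) + n);
        // First fit: walk chunks in address order using the size stored in each header.
        for (std::size_t off = 0; off < size_; off += at(off)->size) {
            Header* h = at(off);
            if (h->inUse || h->size < need) continue;
            if (h->size - need >= kMinChunk) {  // split off the unused tail
                new (buf_ + off + need) Header{h->size - need, false};
                h->size = need;
            }
            h->inUse = true;
            return h + 1;  // object starts right after its header
        }
        return nullptr;
    }

    void deallocate(void* p) {
        Header* h = static_cast<Header*>(p) - 1;  // header found by pointer arithmetic
        assert(h->inUse);
        h->inUse = false;
        // Forward coalescing: absorb following free chunks (backward merge omitted).
        std::size_t off = static_cast<std::size_t>(reinterpret_cast<unsigned char*>(h) - buf_);
        while (off + h->size < size_ && !at(off + h->size)->inUse)
            h->size += at(off + h->size)->size;
    }

private:
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    static constexpr std::size_t kMinChunk = sizeof(Header) + kAlign;
    static std::size_t roundUp(std::size_t n) { return (n + kAlign - 1) / kAlign * kAlign; }
    Header* at(std::size_t off) { return reinterpret_cast<Header*>(buf_ + off); }

    std::size_t size_;
    unsigned char* buf_;
};
```

Because `deallocate` steps back one header from the pointer, it learns the object's size and neighbor without any lookup; the price is that every object, however small, carries that header.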
8. PHKmalloc
- Improve page-level locality
- Coarse size classes: 2^n or n * page size
- Page divided into equal-size chunks, bitmap for allocation
- Objects share headers at page start (BIBOP)
- Discards free pages via madvise
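The PHKmalloc ideas above, coarse power-of-two or page-multiple size classes plus a bitmap shared by the equal-size chunks of one page, can be sketched in C++ as follows. The 4096-byte page, the 16-byte minimum chunk, and the half-page cutoff between chunk and whole-page allocations are assumptions for illustration, and `coarseSizeClass` and `ChunkPage` are hypothetical names. For simplicity the shared header (chunk size plus bitmap) is a member stored next to the chunk storage rather than inside the page itself.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kPageSize = 4096;  // assumed page size
constexpr std::size_t kMinChunk = 16;    // assumed smallest size class

// Round a request to a coarse size class: 2^n up to half a page (assumed cutoff),
// otherwise a whole number of pages.
std::size_t coarseSizeClass(std::size_t n) {
    if (n > kPageSize / 2) return (n + kPageSize - 1) / kPageSize * kPageSize;
    std::size_t c = kMinChunk;
    while (c < n) c <<= 1;
    return c;
}

// One page split into equal-size chunks. A single shared header (chunk size +
// bitmap) describes every chunk, so objects carry no individual headers.
class ChunkPage {
public:
    explicit ChunkPage(std::size_t chunkSize)
        : chunkSize_(chunkSize), nChunks_(kPageSize / chunkSize) {
        assert(chunkSize >= kMinChunk && chunkSize <= kPageSize / 2);
        for (std::size_t i = 0; i < nChunks_; ++i) free_.set(i);  // all chunks start free
    }
    void* allocate() {
        for (std::size_t i = 0; i < nChunks_; ++i) {
            if (free_.test(i)) {
                free_.reset(i);
                return data_ + i * chunkSize_;
            }
        }
        return nullptr;  // page is full
    }
    void deallocate(void* p) {
        std::size_t i = static_cast<std::size_t>(static_cast<unsigned char*>(p) - data_) / chunkSize_;
        assert(i < nChunks_ && !free_.test(i));
        free_.set(i);
    }
    std::size_t freeCount() const { return free_.count(); }

private:
    std::size_t chunkSize_;
    std::size_t nChunks_;
    std::bitset<kPageSize / kMinChunk> free_;  // one bit per chunk: 1 = free
    alignas(16) unsigned char data_[kPageSize];
};
```

One chunk size and one bitmap serve every object on the page, which is the BIBOP ("big bag of pages") idea named on the slide.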
9. Reap
- Capture speed and locality advantages of region allocation while providing individual frees
- Pointer-bumping allocation
- Reclaims freed objects on an associated heap
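A minimal C++ sketch of the reap idea described above: objects come from a contiguous region by bumping a pointer, and individually freed objects are kept on a side structure, standing in for the associated heap, and reused before the pointer is bumped further. `BumpReap` is a hypothetical name. The sketch assumes the caller passes the object's size to `deallocate` and uses a `std::map` of per-size lists as the associated heap; both are simplifications, not Reap's actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

class BumpReap {
public:
    explicit BumpReap(std::size_t capacity) : buf_(capacity) {}

    void* allocate(std::size_t n) {
        n = roundUp(n == 0 ? 1 : n);
        // Reuse an individually freed object of the same rounded size first.
        auto it = freed_.find(n);
        if (it != freed_.end() && !it->second.empty()) {
            void* p = it->second.back();
            it->second.pop_back();
            return p;
        }
        if (top_ + n > buf_.size()) return nullptr;  // region exhausted
        void* p = buf_.data() + top_;
        top_ += n;  // pointer bump: consecutive objects are adjacent in memory
        return p;
    }

    // Individual free (size supplied by the caller in this sketch).
    void deallocate(void* p, std::size_t n) { freed_[roundUp(n == 0 ? 1 : n)].push_back(p); }

    // Region-style bulk free: discard everything at once.
    void freeAll() {
        top_ = 0;
        freed_.clear();
    }

    std::size_t bytesBumped() const { return top_; }

private:
    static std::size_t roundUp(std::size_t n) { return (n + 7) / 8 * 8; }

    std::vector<unsigned char> buf_;
    std::size_t top_ = 0;
    std::map<std::size_t, std::vector<void*>> freed_;  // stand-in "associated heap"
};
```

Bumping keeps objects allocated together physically adjacent (good locality), while the side lists keep memory from growing without bound when a program frees objects one at a time.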
10. Vam overview
- Improve application performance across a wide range of available RAM
- Fine-grained size classes
- No headers for small objects
- Implemented in Heap Layers using C++ templates [Berger et al. 2001]
11. page-based heap
- Virtual space divided into pages
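One way to picture a page-based heap is a table with one descriptor per page, reached from any address by subtracting the heap base and dividing by the page size. The C++ sketch below assumes 4096-byte pages, models addresses as plain integers so it runs without reserving memory, and uses hypothetical names (`PageHeap`, `PageState`, `claim`, `release`). The free and discarded states echo the labels on the page-based heap figure, but the policy shown is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPage = 4096;  // assumed page size

enum class PageState { Free, InUse, Discarded };

struct PageDescriptor {
    PageState state = PageState::Free;
    std::uint16_t sizeClass = 0;  // hypothetical: size class of the objects on this page
};

// Hypothetical page-based heap: one descriptor per page of a contiguous range.
class PageHeap {
public:
    PageHeap(std::uintptr_t base, std::size_t nPages) : base_(base), table_(nPages) {}

    std::size_t pageIndex(std::uintptr_t addr) const {
        assert(addr >= base_ && addr - base_ < table_.size() * kPage);
        return (addr - base_) / kPage;  // constant-time lookup from any interior address
    }
    PageDescriptor& descriptorFor(std::uintptr_t addr) { return table_[pageIndex(addr)]; }
    std::uintptr_t pageAddress(std::size_t i) const { return base_ + i * kPage; }

    // Claim the lowest-addressed page that is not in use (illustrative policy).
    std::size_t claim(std::uint16_t sizeClass) {
        for (std::size_t i = 0; i < table_.size(); ++i) {
            if (table_[i].state != PageState::InUse) {
                table_[i] = PageDescriptor{PageState::InUse, sizeClass};
                return i;
            }
        }
        return table_.size();  // no page available
    }

    // Mark page i unused; 'discard' records that its physical memory was given back.
    void release(std::size_t i, bool discard) {
        table_[i].state = discard ? PageState::Discarded : PageState::Free;
    }

private:
    std::uintptr_t base_;
    std::vector<PageDescriptor> table_;
};
```

In a real allocator, marking a page discarded would typically be paired with returning its physical memory to the OS (for example with `madvise`) while the virtual page stays reserved.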
12. page-based heap
[Figure: heap space and page descriptor table; pages marked free or discarded]
13. fine-grained size classes
- Small (8-128 bytes) and medium (136-496 bytes) sizes
  - Dedicated per-size page blocks (groups of pages)
  - Reap-like allocation inside each block
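The small and medium classes listed above are spaced 8 bytes apart (8, 16, ..., 128 and 136, ..., 496), so a request maps to its class with one add and one divide. The C++ sketch below shows that mapping plus one per-class block filled reap-style: recycle a freed object if there is one, otherwise bump through untouched space. The 4-page block size and the names `sizeClassIndex`, `classSize`, and `SizeClassBlock` are assumptions for illustration, not Vam's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kMaxMedium = 496;        // small: 8..128, medium: 136..496
constexpr std::size_t kBlockBytes = 4 * 4096;  // assumed block size: 4 pages

// Size classes 8 bytes apart: 8, 16, ..., 496 map to indices 0..61.
std::size_t sizeClassIndex(std::size_t n) {
    assert(n >= 1 && n <= kMaxMedium);
    return (n + 7) / 8 - 1;
}
std::size_t classSize(std::size_t idx) { return (idx + 1) * 8; }

// A block of pages dedicated to a single size class.
class SizeClassBlock {
public:
    explicit SizeClassBlock(std::size_t objSize) : objSize_(objSize), mem_(kBlockBytes) {
        assert(objSize_ % 8 == 0 && objSize_ >= sizeof(void*));
    }
    void* allocate() {
        if (freeList_ != nullptr) {  // recycle a freed object first
            void* p = freeList_;
            std::memcpy(&freeList_, p, sizeof freeList_);  // next link is stored in the object
            return p;
        }
        if (bump_ + objSize_ > mem_.size()) return nullptr;  // block is full
        void* p = mem_.data() + bump_;
        bump_ += objSize_;  // bump through untouched space
        return p;
    }
    void deallocate(void* p) {
        std::memcpy(p, &freeList_, sizeof freeList_);  // reuse the object's bytes as a link
        freeList_ = p;
    }
    bool full() const { return freeList_ == nullptr && bump_ + objSize_ > mem_.size(); }

private:
    std::size_t objSize_;
    std::vector<unsigned char> mem_;
    std::size_t bump_ = 0;
    void* freeList_ = nullptr;
};
```

Because every object in a block has the same size, neither allocation nor the free-list link needs a per-object header; the block itself records the size.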
[Figure: per-size page blocks marked available or full]
14. fine-grained size classes
- Large sizes (504 bytes to 32KB)
  - Also 8 bytes apart, best-fit
  - Collocated in contiguous pages
- Extremely large sizes (above 32KB)
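For the large sizes listed above, a free list table with one list per 8-byte size (504, 512, ..., 32KB) supports best fit by scanning upward from the request's own list to the first non-empty one. The C++ sketch below assumes offsets inside one contiguous area stand in for addresses, splits a chunk only when the leftover is itself a large size, and omits the coalescing of adjacent free chunks; `LargeObjectArea` and `Block` are hypothetical names, and a real allocator would likely use a faster search than a linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMinLarge = 504;
constexpr std::size_t kMaxLarge = 32 * 1024;

// A run of bytes inside the contiguous area (offsets stand in for addresses).
struct Block {
    std::size_t offset;
    std::size_t size;
};

std::size_t roundUp8(std::size_t n) { return (n + 7) / 8 * 8; }
// One free list per 8-byte size: 504 -> 0, 512 -> 1, ..., 32768 -> 4033.
std::size_t listIndex(std::size_t size) { return (size - kMinLarge) / 8; }

class LargeObjectArea {
public:
    LargeObjectArea() : lists_(listIndex(kMaxLarge) + 1) {}

    Block allocate(std::size_t n) {
        std::size_t size = roundUp8(n);
        assert(size >= kMinLarge && size <= kMaxLarge);
        // Best fit: the first non-empty list at or above the requested size.
        for (std::size_t i = listIndex(size); i < lists_.size(); ++i) {
            if (lists_[i].empty()) continue;
            Block b{lists_[i].back(), kMinLarge + i * 8};
            lists_[i].pop_back();
            if (b.size - size >= kMinLarge) {  // split only if the tail is still a large size
                deallocate(Block{b.offset + size, b.size - size});
                b.size = size;
            }
            return b;
        }
        Block b{end_, size};  // no fit: extend the contiguous area
        end_ += size;
        return b;
    }

    void deallocate(Block b) { lists_[listIndex(b.size)].push_back(b.offset); }
    std::size_t end() const { return end_; }

private:
    std::vector<std::vector<std::size_t>> lists_;  // the free list table
    std::size_t end_ = 0;
};
```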
[Figure: contiguous pages with free chunks coalesced into empty space; free list table indexed by size (504, 512, 520, ..., 560 bytes)]
15. header elimination
- Object headers simplify deallocation & coalescing but:
- Eliminated in Vam for small objects
[Figure: per-object header layout vs. per-page metadata]
16. header elimination
- Need to distinguish headered from headerless objects in free()
  - Heap address space partitioning
[Figure: address space divided into 16MB areas of homogeneous objects, indexed by a partition table]
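The partitioning idea above lets free() decide between headered and headerless objects from the address alone: if every 16MB (2^24-byte) area holds only one kind of object, a shift and one table load identify the kind. The C++ sketch below assumes a 32-bit address space (so the table has 256 entries) and uses hypothetical names (`PartitionTable`, `AreaKind`, `classifyFree`); it only classifies the pointer and does not perform the free itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kAreaShift = 24;  // 16MB = 2^24 bytes per area

enum class AreaKind : std::uint8_t { Unused, Headerless, Headered };

// One entry per 16MB area of an assumed 32-bit address space (256 entries).
class PartitionTable {
public:
    void assign(std::uint32_t areaBase, AreaKind kind) {
        assert(areaBase % (1u << kAreaShift) == 0);  // must be the start of an area
        table_[areaBase >> kAreaShift] = kind;
    }
    AreaKind kindOf(std::uint32_t addr) const { return table_[addr >> kAreaShift]; }

private:
    std::array<AreaKind, (1u << (32 - kAreaShift))> table_{};  // zero-initialized: all Unused
};

enum class FreePath { PerPageMetadata, ObjectHeader, Invalid };

// Decide how free() should find an object's size: headerless objects use
// per-page metadata, headered objects read the header before the object.
FreePath classifyFree(const PartitionTable& t, std::uint32_t addr) {
    switch (t.kindOf(addr)) {
        case AreaKind::Headerless: return FreePath::PerPageMetadata;
        case AreaKind::Headered:   return FreePath::ObjectHeader;
        default:                   return FreePath::Invalid;
    }
}
```

On a 64-bit address space a flat table like this would be far too large, so a sparse or multi-level table would be needed instead.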
17. outline
- Virtual memory performance
18. experimental setup
- 8KB L1 (data) cache, 512KB L2 cache, 64-byte cache lines
- Use perfctr patch and perfex tool to set Intel performance counters (instructions, caches, TLB)
19. benchmarks
- Memory-intensive SPEC CPU2000 benchmarks
  - custom allocators removed in gcc and parser
                             255.vortex   253.perlbmk   197.parser    176.gcc
Average Object Size          471 bytes    285 bytes     21 bytes      52 bytes
Alloc Interval (# of inst)   68K          21K           0.5K          4.4K
Alloc Rate (#/sec)           30K          129K          2813K         373K
Total Allocations            1.5M         5.4M          788M          9M
Max Live Size                45MB         90MB          10MB          110MB
VM Size                      65MB         120MB         15MB          130MB
Instructions                 102 billion  114 billion   424 billion   40 billion
Execution Time               62 sec       43 sec        275 sec       24 sec
20. space efficiency
- Fragmentation = max (physical) memory in use / max live data of the application
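The definition above divides two separate maxima, which need not occur at the same moment of the run; the backup slides also list an average-based variant. The C++ sketch below computes both from a trace of samples. The `Sample` trace format is hypothetical, and reading "average of mem in use / live data" as the mean of per-sample ratios is an interpretation, not something the slides spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One measurement taken during a run (hypothetical trace format).
struct Sample {
    double memInUse;  // bytes of physical memory the allocator holds
    double liveData;  // bytes the application still references
};

// Peak-based definition: max memory in use / max live data.
double peakFragmentation(const std::vector<Sample>& run) {
    assert(!run.empty());
    double maxMem = 0, maxLive = 0;
    for (const Sample& s : run) {
        maxMem = std::max(maxMem, s.memInUse);
        maxLive = std::max(maxLive, s.liveData);
    }
    return maxMem / maxLive;
}

// Average-based variant, read here as the mean of per-sample ratios.
double averageFragmentation(const std::vector<Sample>& run) {
    assert(!run.empty());
    double sum = 0;
    for (const Sample& s : run) sum += s.memInUse / s.liveData;
    return sum / run.size();
}
```

With a made-up trace of (100, 100), (150, 100), (140, 40) bytes, the peak-based value is 150 / 100 = 1.5, while the average of the ratios 1.0, 1.5, and 3.5 is 2.0: the average-based measure is sensitive to moments when live data is small but memory is still held.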
21. total execution time
22. total instructions
23. cache performance
- L2 cache misses closely correlated to run-time performance
24. VM performance
- Application performance degrades with reduced RAM
- Better page-level locality produces better paging performance and smoother degradation
26. Vam summary
- Outperforms other allocators both with enough RAM and under memory pressure
- Improves application locality
  - see paper for more analysis
27. the end
- http://www.heaplayers.org
28. backup slides
29. TLB performance
30. average fragmentation
- Fragmentation = average of memory in use / live data of the application