380C
• Where are we & where we are going
  – Managed languages
    • Dynamic compilation
    • Inlining
    • Garbage collection
    • What else can you do when you examine the heap a lot?
  – Why you need to care about workloads
  – Alias analysis
  – Dependence analysis
  – Loop transformations
  – EDGE architectures
380C lecture 18
• Garbage Collection
  – Why use garbage collection?
  – What is garbage?
    • Reachable vs live, stack maps, etc.
  – Allocators and their collection mechanisms
    • Semispace
    • Marksweep
    • Performance comparisons
    • Mark Region
  – Incremental age based collection
    • Write barriers: Friend or foe?
    • Generational
    • Beltway
Mark Region and Other Advances in Garbage Collection
Kathryn S. McKinley, University of Texas at Austin
Stephen M. Blackburn, Australian National University
PLDI’08: Immix: A Mark-Region Collector With Space Efficiency, Fast Collection, and Mutator Performance
Isn’t GC a bit retro?
“Languages without automated garbage collection are getting out of fashion. The chance of running into all kinds of memory problems is gradually outweighing the performance penalty you have to pay for garbage collection.”
Paul Jansen, managing director of TIOBE Software, in Dr Dobbs, April 2008
• Mark-Sweep: McCarthy, 1960
• Mark-Compact: Styger, 1967
• Semi-Space: Cheney, 1970
GC Fundamentals: The Time–Space Tradeoff
[Figure: time–space tradeoff plot, annotated with “Our Goal”]
GC Fundamentals: Algorithmic Components
• Allocation
  – Bump Allocation
  – Free List
• Identification
  – Tracing (implicit)
  – Reference Counting (explicit)
• Reclamation
  – Sweep-to-Free
  – Compact
  – Evacuate
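The two allocation mechanisms named above can be contrasted with a minimal sketch. It is illustrative only: the class names, the integer "addresses", and the size-class scheme are assumptions made for this example, not code from MMTk or Jikes RVM.

```python
class BumpAllocator:
    """Contiguous allocation: return the cursor, then advance it."""

    def __init__(self, start, size):
        self.cursor = start
        self.limit = start + size

    def alloc(self, nbytes):
        if self.cursor + nbytes > self.limit:
            return None  # out of space; a real system would trigger a collection
        addr = self.cursor
        self.cursor += nbytes
        return addr


class FreeListAllocator:
    """Size-segregated free lists: pop a cell from the smallest class that fits."""

    def __init__(self, size_classes):
        # Keys are inserted in ascending order, so iteration visits
        # the smallest size class first.
        self.free = {sc: [] for sc in sorted(size_classes)}

    def add_cell(self, addr, size_class):
        self.free[size_class].append(addr)

    def alloc(self, nbytes):
        for size_class, cells in self.free.items():
            if size_class >= nbytes and cells:
                return cells.pop()
        return None
```

Bump allocation places consecutively allocated objects next to each other, which is where the locality advantage comes from; the free list can reuse any reclaimed cell, at the cost of scattering objects across the heap.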
GC Fundamentals: Canonical Garbage Collectors
• Mark-Sweep [McCarthy 1960]: free-list + trace + sweep-to-free
• Mark-Compact [Styger 1967]: bump allocation + trace + compact
• Semi-Space [Cheney 1970]: bump allocation + trace + evacuate
Mark-Sweep: Free List Allocation + Trace + Sweep-to-Free
✓ Space efficient
✓ Simple, very fast collection
✗ Poor locality
Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo
Mark-Compact: Bump Allocation + Trace + Compact
✓ Space efficient
✓ Good locality
✗ Expensive multi-pass collection
Semi-Space: Bump Allocation + Trace + Evacuation
✓ Good locality
✗ Space inefficient
Mark-Region with Sweep-to-Region
• Reclamation: Sweep-to-Free, Compact, Evacuate, Sweep-to-Region
• Mark-Sweep: free-list + trace + sweep-to-free
• Mark-Compact: bump allocation + trace + compact
• Semi-Space: bump allocation + trace + evacuate
• Mark-Region: bump allocation + trace + sweep-to-region
Mark-Region: Bump Allocation + Trace + Sweep-to-Region
✓ Simple, very fast collection
✓ Space efficient
✓ Good locality
✓ Excellent performance
Naïve Mark-Region
• Contiguous allocation into regions
  – Excellent locality
  – For simplicity, objects cannot span regions
• Simple mark phase (like mark-sweep)
  – Mark objects and their containing region
• Unmarked regions can be freed
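The naïve scheme above can be sketched in a few lines. This is a toy model under stated assumptions: objects are plain names, `refs` maps each object to the objects it points to, and `region_of` maps each object to a region index. None of these names come from the Immix implementation.

```python
def collect(roots, refs, region_of, regions):
    """Trace from the roots, marking objects and their containing regions.

    Returns (marked_objects, free_regions). A region is freed only if
    no marked object lives in it.
    """
    marked_objects = set()
    marked_regions = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked_objects:
            continue
        marked_objects.add(obj)
        marked_regions.add(region_of[obj])  # mark the containing region too
        worklist.extend(refs.get(obj, ()))
    free_regions = [r for r in regions if r not in marked_regions]
    return marked_objects, free_regions
```

Note that a region holding even one live object is retained whole, dead neighbours included. That cost is what the choice of region size has to balance.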
Immix: Efficient Mark-Region Garbage Collection
Lines and Blocks
• Small regions
  ✗ Fragmentation (can’t fill blocks)
  ✗ Increased metadata o/h
  ✗ Constrained object sizes
• Large regions
  ✓ More contiguous allocation
  ✗ Fragmentation (false marking)
• Lines & blocks
  – Block: N pages; line: approx 1 cache line
  – Objects span lines: ✓ less fragmentation
  – Lines marked with objects: ✓ fast common case
  – Block > 4 X max object size: TLB locality, cache locality
[Figure labels: Free; Recyclable lines]
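A rough sketch of line marking within one block follows. The 128-byte `LINE_SIZE`, the list-of-booleans mark table, and the state names are assumptions chosen for illustration; the slides only say that a line is approximately one cache line.

```python
LINE_SIZE = 128  # bytes per line (assumed for this sketch)


def mark_lines(line_marks, offset, size):
    """Mark every line overlapped by an object at `offset` of `size` bytes."""
    first = offset // LINE_SIZE
    last = (offset + size - 1) // LINE_SIZE
    for line in range(first, last + 1):
        line_marks[line] = True


def find_holes(line_marks):
    """Return (start_line, length) for each run of unmarked lines."""
    holes = []
    start = None
    for i, marked in enumerate(line_marks):
        if not marked and start is None:
            start = i
        elif marked and start is not None:
            holes.append((start, i - start))
            start = None
    if start is not None:
        holes.append((start, len(line_marks) - start))
    return holes


def classify(line_marks):
    """A block is free, recyclable (some unmarked lines), or unavailable."""
    if not any(line_marks):
        return "free"
    if all(line_marks):
        return "unavailable"
    return "recyclable"
```

After a collection, the allocator can bump-allocate into each hole of a recyclable block, so reclamation happens at line granularity while allocation stays contiguous.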
Allocation Policy (Recycling)
• Recycle partially marked blocks first
  – Minimizes fragmentation
  – Maximizes sharing of freed blocks
• Recycle in address order
  – We explored other options
• Allocate into free blocks last
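The policy above amounts to an ordering over blocks, sketched below. The `(address, state)` tuples and the state strings are hypothetical, and ordering the free blocks by address is an assumption of this sketch; the slide only states address order for recycling.

```python
def allocation_order(blocks):
    """Return block addresses in the order an allocator would try them.

    `blocks` is a list of (address, state) pairs, where state is
    "recyclable", "free", or "unavailable".
    """
    recyclable = sorted(addr for addr, state in blocks if state == "recyclable")
    free = sorted(addr for addr, state in blocks if state == "free")
    # Partially marked blocks first, completely free blocks last.
    return recyclable + free
```

Using up recyclable blocks first leaves whole free blocks untouched for as long as possible.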
Opportunistic Defragmentation
• Identify source and target blocks
  – (see paper for heuristics)
• Evacuate objects in source blocks
  – Allocate into target blocks
• Opportunistic
  – Leave in place if no space, or object pinned
• Opportunistically evacuate fragmented blocks
  – Lightweight, uses same allocation mechanism
  – No cost in common case (specialized GC)
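The "leave in place" rule can be sketched as follows. The dictionary fields and the single `target_capacity` byte budget are simplifying assumptions for this example; selecting source and target blocks is left to the heuristics in the paper.

```python
def evacuate(objects, target_capacity):
    """Try to move each live object from a source block into target space.

    `objects` is a list of dicts with "name", "size" and "pinned" keys.
    Returns (moved, left_in_place) as lists of names.
    """
    moved, left_in_place = [], []
    remaining = target_capacity
    for obj in objects:
        if not obj["pinned"] and obj["size"] <= remaining:
            remaining -= obj["size"]  # same bump-style accounting as allocation
            moved.append(obj["name"])
        else:
            # Opportunistic: no space, or a pinned object, means the
            # object is simply marked where it is.
            left_in_place.append(obj["name"])
    return moved, left_in_place
```

Because falling back to in-place marking is always legal, the collector never needs to reserve copy space up front.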
Other Optimizations
• Implicit Marking
  – Small objects implicitly mark next line
    ✓ Most objects small
    ✓ V. fast common case
  – Large objects mark lines exactly
  [Figure legend: line mark; implicit line mark]
• Overflow Allocation
  – Multi-line objects may skip many small holes
  – Overflow allocation (used on failure)
    ✓ Large objects uncommon
    ✓ V. effective solution
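One way to read implicit marking is sketched below: a small object marks only the line it starts in, and the allocator treats the first line after any marked line as possibly occupied. The 128-byte `LINE_SIZE` and both function names are assumptions made for this sketch, not the Immix source.

```python
LINE_SIZE = 128  # bytes per line (assumed for this sketch)


def mark_object(line_marks, offset, size):
    """Small objects mark only their first line; large objects mark exactly."""
    first = offset // LINE_SIZE
    if size <= LINE_SIZE:
        # A small object can overlap at most the following line,
        # which is covered implicitly by usable_holes below.
        line_marks[first] = True
    else:
        last = (offset + size - 1) // LINE_SIZE
        for line in range(first, last + 1):
            line_marks[line] = True


def usable_holes(line_marks):
    """Runs of unmarked lines, skipping the implicitly marked first line."""
    holes = []
    i, n = 0, len(line_marks)
    while i < n:
        if line_marks[i]:
            i += 1
            continue
        start = i
        while i < n and not line_marks[i]:
            i += 1
        if start > 0:
            start += 1  # the line after a marked line is implicitly marked
        if i > start:
            holes.append((start, i - start))
    return holes
```

The common case (a small object) then costs a single mark, at the price of giving up one line per hole.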
Results
Complete data available at:
http://cs.anu.edu.au/~Steve.Blackburn/pubs
Evaluation
• 20 Benchmarks
  – DaCapo
  – SPECjvm98
  – SPEC jbb2000
• Methodology
  – MMTk
  – Jikes RVM 2.9.3 (Perf ≈ HotSpot 1.5)
  – Replay compiler
  – Discard outliers
  – Report 95th %ile
• Collectors
  – Full Heap: Immix, MarkSweep, MarkCompact, SemiSpace
  – Generational: GenIX, GenMS, GenCopy
  – Sticky: StickyIX, StickyMS
• Hardware
  – Core 2 Duo: 2.4GHz, 32KB L1, 4MB L2, 2GB RAM
  – AMD Athlon 3500+: 2.2GHz, 64KB L1, 512KB L2, 2GB RAM
  – PowerPC 970: 1.6GHz, 32KB L1, 512KB L2, 2GB RAM
Please see the paper for details.
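The result plots in this deck are labelled as geometric means over the benchmarks. For reference, a geometric mean of normalized times can be computed as in this minimal sketch; the function name is illustrative and the values in the usage note are made up, not measured results.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    if not values:
        raise ValueError("geomean of an empty sequence")
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

For example, `geomean([1.0, 4.0])` is 2.0 up to floating-point rounding. The geometric mean is the usual choice for ratios such as normalized run times, because a single benchmark with a large ratio moves it less than it would move an arithmetic mean.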
Mutator Time
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo
Minimum Heap
GC Time
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo
Total Performance
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo
Generational Performance
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo
Sticky Performance
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo
PseudoJBB 2000
On 2.4GHz Core 2 Duo
Prior Work
http://www.ibm.com/developerworks/ibm/library/i-garbage1/
• IBM product collector
  – Mark-Region not characterized
  – Collector not evaluated
  – Product and basis for other research
    • [Domani et al 2000] [Kermany & Petrank 2006]
Mark-Region Collection
• Reclamation: Sweep-to-Free, Compact, Evacuate, Sweep-to-Region
• Mark-Sweep: free-list + trace + sweep-to-free
• Mark-Compact: bump allocation + trace + compact
• Semi-Space: bump allocation + trace + evacuate
• Mark-Region: bump allocation + trace + sweep-to-region
Immix: Efficient Mark-Region Collection
✓ Simple, very fast collection
✓ Space efficient
✓ Good locality
✓ Excellent performance
Open Source
Code available in JikesRVM 2.9.3 onward.
http://www.jikesrvm.org
Complete data available at:
http://cs.anu.edu.au/~Steve.Blackburn/pubs
Research History
• PLDI 1998
  – Clinger & Hansen postulated the radioactive decay model for object lifetimes
• Genesis of Older-First
  – [Stefanovic, McKinley, Moss OOPSLA’99]
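Under a radioactive decay model, an object's chance of surviving the next interval does not depend on its age. The sketch below assumes an exponential survival function with a fixed half-life, which is the usual reading of "radioactive decay"; the function names and parameters are illustrative.

```python
def survival(age, half_life):
    """Fraction of objects still live after `age` units of allocation."""
    return 0.5 ** (age / half_life)


def conditional_survival(age, window, half_life):
    """Chance that an object already `age` old survives `window` more."""
    return survival(age + window, half_life) / survival(age, half_life)
```

With `half_life=10`, both `conditional_survival(0, 10, 10)` and `conditional_survival(100, 10, 10)` come out at about 0.5. Under this model young objects are no more likely to die than old ones, so collecting the youngest objects first gains nothing.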
Garbage Collection Hypotheses
• Generational hypothesis: younger objects die quickly, so collect them first
• Older-first hypothesis: the collector can collect less the longer it waits
[Figure: survival function s(v) for the object lifetime distribution, plotted over an age-ordered heap from younger (0) through 1/2V to older (V)]
Older-first Algorithm
Next Steps
• Beltway [BJMM PLDI’02]
  – Increments
  – Belts
  – Combines generational and older-first
• Ulterior Reference Counting [BM OOPSLA’03]
  – Reference count on a per-object basis
  – Responsiveness and throughput
• MMTk [BCM SIGMETRICS’04 ICSE’04]
  – Toolkit for building & understanding GC
  – Motivated today’s work
Garbage Collection is the Answer to All Your Problems
• Improves data and code locality
  – [Huang et al. OOPSLA’02 ISMM’04, VEE’04]
• Cooperative GC optimizations
  – Colocation [Guyer OOPSLA’05]
  – Free-me [Guyer et al. PLDI’06]
• Finds leaks
  – [Bond ASPLOS’06, Jump POPL’07]
• Tolerates leaks
  – [Bond OOPSLA’08]
• Helps with dynamic software updating!
  – [Subramaniam, Hicks ??’08]
• DaCapo Benchmarks
  – [Blackburn et al. OOPSLA’06 CACM’08]
380C
• Where are we & where we are going
  – Why you need to care about workloads
  – Managed languages
    • Dynamic compilation
    • Inlining
    • Garbage collection
      – Opportunity to improve data locality on-the-fly
      – Read: X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng, The Garbage Collection Advantage: Improving Program Locality, ACM Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 69-80, Vancouver, Canada, October 2004.
  – Alias analysis
  – Dependence analysis
  – Loop transformations
  – EDGE architectures