Avoiding Initialization Misses to the Heap
Jarrod Lewis, Bryan Black, and Mikko H. Lipasti
Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Intel Labs
http://www.ece.wisc.edu/~pharm
April 19, 2023
Avoiding Initialization Misses to the Heap – Mikko Lipasti 2
Motivation
Memory bandwidth is expensive
  Shouldn’t waste on useless traffic
  Can be put to better use
    Multithreading, prefetching, MLP, etc.
Search and destroy useless traffic
Focus of this talk: heap initialization
  Detect and optimize initialization of newly allocated memory
  23% of misses in 2MB cache are invalid
Dynamically Allocated Memory
[Figure: heap space state diagram. States: Unallocated/Invalid, Allocated/Invalid, Allocated/Valid. Transition labels: malloc(), initializing store, load or store, free()]
Invalid memory need not be transferred
Provide interface that expresses this directly?
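The three heap states on this slide can be sketched as a small state machine. This is an illustration only: the slide names the states and the events, but the exact edges in the table below, along with the names State, TRANSITIONS, step, and needs_fetch_on_miss, are assumptions.

```python
from enum import Enum


class State(Enum):
    UNALLOCATED_INVALID = "unallocated/invalid"
    ALLOCATED_INVALID = "allocated/invalid"
    ALLOCATED_VALID = "allocated/valid"


# Assumed edges: the slide lists states and events, not the full edge set.
TRANSITIONS = {
    (State.UNALLOCATED_INVALID, "malloc"): State.ALLOCATED_INVALID,
    (State.ALLOCATED_INVALID, "store"): State.ALLOCATED_VALID,  # initializing store
    (State.ALLOCATED_VALID, "load"): State.ALLOCATED_VALID,
    (State.ALLOCATED_VALID, "store"): State.ALLOCATED_VALID,
    (State.ALLOCATED_INVALID, "free"): State.UNALLOCATED_INVALID,
    (State.ALLOCATED_VALID, "free"): State.UNALLOCATED_INVALID,
}


def step(state, event):
    """Return the next state, or raise for an event this sketch does not model."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unmodeled transition: {event} in {state.value}")


def needs_fetch_on_miss(state, event):
    """A store miss to allocated/invalid memory need not fetch the old data."""
    return not (state is State.ALLOCATED_INVALID and event == "store")
```

In this model, the only access that can skip the memory fetch is the first store to freshly allocated memory.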
Talk Outline
  Motivation
  Analysis of Heap Behavior
  Detecting Initializing Writes
  Performance Analysis
  Conclusions
Allocation Analysis
Two main modes
  Single dominant allocation (up to 100MB), or
  Numerous moderate allocations
Initialization of allocations
  88% initialized with store miss
  Little temporal reuse of free’d memory
Phase behavior
  Start of program often dominates
  Even SPEC has counterexamples (gcc, vortex)
Cache Miss Behavior
Init stores cause up to 60% of misses (avg 23%)
  These are 35% of all compulsory misses
[Figure: breakdown of cache misses (0% to 100%) for bzip, gap, gcc, and vortex by category: Load, Non-heap Store, Modify Store, Init Store]
Detecting Initializing Writes
Annotate malloc()
  Record base, size in allocation range cache
Key questions
  What is working set?
  How are ranges represented?
    Valid bits? Not scalable for 100M allocation
    Base + bound
  How are ranges updated on writes?
    Split vs. truncate
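A minimal sketch of the allocation range cache idea: malloc() records a base and size, and a later store is flagged as initializing if it falls inside a tracked range. The class and method names are hypothetical, FIFO replacement follows the working-set slide, and range updates on writes are left out here. The helper valid_bit_bytes shows why per-block valid bits scale poorly, assuming one bit per 64-byte block (the block size on the machine model slide).

```python
from collections import deque


class AllocationRangeCache:
    """FIFO table of [base, bound) ranges that were allocated but not yet written."""

    def __init__(self, entries=8):
        # deque(maxlen=...) drops the oldest entry automatically: FIFO replacement.
        self.ranges = deque(maxlen=entries)

    def on_malloc(self, base, size):
        """Record the range returned by an annotated malloc()."""
        self.ranges.append((base, base + size))

    def is_initializing_store(self, addr):
        """True if addr falls inside any tracked range (base + bound check)."""
        return any(lo <= addr < hi for lo, hi in self.ranges)


def valid_bit_bytes(alloc_bytes, granule_bytes=64):
    """Storage for one valid bit per granule, in bytes (granule size is an assumption)."""
    return alloc_bytes // granule_bytes // 8


# One 100MB allocation: about 200KB of valid bits versus a single base/bound pair.
print(valid_bit_bytes(100 * 2**20), "bytes of valid bits")
```

With a base + bound representation, the cost per allocation is two addresses regardless of allocation size.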
Allocation Working Set
[Figure: percentage of all initializing store misses identified (0% to 100%) vs. number of allocation ranges tracked (FIFO; 1, 2, 4, 8, 16, 32, 64, 128, 256, >256), for bzip, craf, eon, gap, gcc, gzip, mcf, pars, perl, twol, vort, vpr]
4-8 entries sufficient, except parser needs 64
Sequential Initialization
[Figure: blocks A-F of an allocation. Initialization pattern 1, Sequential (A, B, C, D, ...), shown with tracking scheme 1, Forward Sweep. Legend: Allocated-Invalid, Initialized, Unknown]
Forward sweep captures 90%+ except bzip, gzip, perl
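A rough sketch of forward-sweep tracking, working in whole cache blocks. The talk does not spell out the update rule, so this assumes that any store inside the tracked range is treated as initializing and that the base is then moved just past the written block, which drops any skipped blocks from tracking. ForwardSweepRange and coverage are hypothetical names.

```python
class ForwardSweepRange:
    """Tracks one allocation as a half-open range [base, bound) of block numbers."""

    def __init__(self, base_block, num_blocks):
        self.base = base_block
        self.bound = base_block + num_blocks

    def store(self, block):
        """Return True if the store is identified as initializing."""
        if not (self.base <= block < self.bound):
            return False  # outside the tracked range: status unknown
        self.base = block + 1  # truncate: skipped blocks are no longer tracked
        return True


def coverage(pattern, num_blocks):
    """Fraction of stores in `pattern` identified as initializing."""
    tracked = ForwardSweepRange(0, num_blocks)
    hits = sum(tracked.store(b) for b in pattern)
    return hits / len(pattern)


print(coverage([0, 1, 2, 3, 4, 5], 6))  # sequential pattern
print(coverage([0, 5, 1, 4, 2, 3], 6))  # alternating pattern
```

Under these assumptions a sequential pattern is fully covered, while an alternating pattern loses most of the range after the first jump to the far end.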
Alternating Initialization
[Figure: blocks A-F of an allocation. Initialization pattern 2, Alternating (A, F, B, E, ...), shown with tracking scheme 2, Bidirectional Sweep. Legend: Allocated-Invalid, Initialized, Unknown]
Bidirectional captures 90%+ of perl
Doesn’t help bzip or gzip
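A sketch of bidirectional tracking under stated assumptions: a store to the highest tracked block pulls the bound down by one, and any other store inside the range advances the base past the written block. The exact rule used in the talk is not given, and BidirectionalSweepRange is a hypothetical name.

```python
class BidirectionalSweepRange:
    """Half-open range [base, bound) of block numbers, truncated from both ends."""

    def __init__(self, base_block, num_blocks):
        self.base = base_block
        self.bound = base_block + num_blocks

    def store(self, block):
        """Return True if the store is identified as initializing."""
        if not (self.base <= block < self.bound):
            return False  # outside the tracked range: status unknown
        if block == self.bound - 1:
            self.bound -= 1  # top block written: shrink from the high end
        else:
            self.base = block + 1  # otherwise sweep forward past the block
        return True


tracked = BidirectionalSweepRange(0, 6)
alternating = [0, 5, 1, 4, 2, 3]  # A, F, B, E, C, D
results = [tracked.store(b) for b in alternating]
print(results)
```

With both ends movable, the alternating pattern no longer discards the middle of the range.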
Striding Initialization
[Figure: blocks A-F of an allocation. Initialization pattern 3, Striding (A, C, E, ...), shown with tracking scheme 3, Interleaving. Legend: Allocated-Invalid, Initialized, Unknown]
Interleaving captures 90%+ of gzip
Still only 60% of bzip
  Bzip has a large allocation with random initialization
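One possible reading of interleaved tracking is a set of independent forward sweeps, one per residue class of the block index. That interpretation, the lane count of 2, and the name InterleavedRange are all assumptions made for illustration, not details from the talk.

```python
class InterleavedRange:
    """Forward sweep per lane, where lane = (block - base) % lanes."""

    def __init__(self, base_block, num_blocks, lanes=2):
        self.base_block = base_block
        self.bound = base_block + num_blocks
        self.lanes = lanes
        # next_block[l] is the lowest block of lane l that is still tracked.
        self.next_block = [base_block + lane for lane in range(lanes)]

    def store(self, block):
        """Return True if the store is identified as initializing."""
        if not (self.base_block <= block < self.bound):
            return False  # outside the allocation
        lane = (block - self.base_block) % self.lanes
        if block < self.next_block[lane]:
            return False  # this lane already swept past: status unknown
        self.next_block[lane] = block + self.lanes
        return True


tracked = InterleavedRange(0, 6, lanes=2)
striding = [0, 2, 4, 1, 3, 5]  # A, C, E, B, D, F
results = [tracked.store(b) for b in striding]
print(results)
```

A random initialization order, as described for bzip, would still skip blocks within each lane and lose coverage.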
PharmSim Overview
Device simulation, etc. from SimOS-PPC [IBM ARL]
PharmSim replaces functional simulators
  Full OOO core model, values in rename registers
  Supports priv. mode, MMU, TLB, exceptions, interrupts, barriers, flushes, etc.
  Lead developer: Trey Cain (thanks Trey!)
[Figure: block diagram with labels: Block, Simple, SimOS-PPC (AIX 4.3.1, disk driver, E’net driver), Ethernet, PharmSim (OOO core, Gigaplane)]
Operating System Effects
Widely accepted for SPECINT: safe to ignore O/S paths
Most popular tool (SimpleScalar)
  Intercepts system calls
  Emulates on host, updates “flat” memory
  Returns “magically” with cache contents intact
We have found that [CAECW2002]:
  Omitting system references leads to dramatic error (5.8x L2 miss rate, 100% IPC in worst case)
  Specifically, AIX page fault handler eliminates many initializing write misses
Had we not used PHARMsim?
  Dramatically overstated performance benefit
AIX Page Installation
Heap manager calls sbrk
Malloc returns block < 4KB
Program writes to block
  First reference causes page fault
  AIX installs entire page using dcbz
[Figure: data segment diagram with regions labeled Allocated/Valid and Unallocated]
Block vs. Page Installation
Page installation
  Practically free as part of page fault
Shortcomings of page installation
  Pollutes cache
  Not scalable to superpages (AIX v5.1)
  Does not work for heap reuse
Our short simulations don’t show this benefit
  I.e. high overlap between initializing writes and first reference to extended data segment
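The cache pollution point can be made concrete by counting how many cache blocks each mode installs for a small allocation. This is a back-of-the-envelope sketch: the 4KB page size is inferred from the previous slide, the 64-byte block size comes from the machine model slide, allocations are assumed block-aligned, and blocks_installed is a hypothetical helper.

```python
PAGE_BYTES = 4096  # assumed page size
BLOCK_BYTES = 64   # cache block size from the machine model slide


def blocks_installed(alloc_bytes, mode):
    """Cache blocks installed without a memory fetch for one allocation."""
    if mode == "page":
        pages = -(-alloc_bytes // PAGE_BYTES)  # ceiling division
        return pages * (PAGE_BYTES // BLOCK_BYTES)
    if mode == "block":
        return -(-alloc_bytes // BLOCK_BYTES)  # only the blocks actually allocated
    raise ValueError(f"unknown mode: {mode}")


for size in (256, 1024, 4096):
    print(size, blocks_installed(size, "page"), blocks_installed(size, "block"))
```

For a 256-byte allocation, page mode installs a full page of blocks while block mode installs only the blocks the allocation covers.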
Integrating ARC
Speedup
Very aggressive core model
  Still can’t tolerate all store miss latency
Block mode slightly better than page mode
  Cache pollution, less coverage
[Figure: bar chart of speedup (0% to 45%) for gap, mcf, and parser; series: block, page]
Program Phase Behavior
Only benefits initialization program phase
Some programs initialize throughout execution
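One way to reason about a benefit that is confined to the initialization phase is Amdahl's law. The sketch below is not from the talk: overall_speedup is a hypothetical helper, and the fractions and the 2x phase speedup are made-up inputs, not measurements.

```python
def overall_speedup(init_fraction, init_speedup):
    """Amdahl's law: only the initialization fraction of run time is sped up."""
    if not 0.0 <= init_fraction <= 1.0:
        raise ValueError("init_fraction must be in [0, 1]")
    if init_speedup <= 0:
        raise ValueError("init_speedup must be positive")
    return 1.0 / ((1.0 - init_fraction) + init_fraction / init_speedup)


# Hypothetical inputs, not results from the talk.
for frac in (0.1, 0.5, 0.9):
    print(f"init fraction {frac:.0%}: overall speedup {overall_speedup(frac, 2.0):.2f}x")
```

The larger the share of run time spent initializing, the closer the overall gain gets to the gain within the phase itself.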
Conclusions
Initializing writes
  Cause 23% of all misses in 2MB L2
  Avoid miss with block or page mode install
  Up to 41% performance improvement
    Subject to initialization:computation ratio
Tracking allocation ranges
  Working set very small (4-8, 64)
  Forward/bidirectional/interleaved sweep enables range truncation
Acknowledgments
Originated as course project:
  Gordie Bell, Trey Cain, Kevin Lepak
PHARMsim infrastructure
  Lead developer: Trey Cain
Financial and equipment support
  IBM and Intel Corp
  National Science Foundation
  University of Wisconsin
Questions?
Backup Slides
Invalid Memory Traffic
  Real data traffic that transfers invalid data
Initializing Store
  Initial write to a storage location that contains invalid data
[Figure: cache and main memory diagram showing MISS X, FETCH X, and WRITEBACK A. Legend: A - deallocated; X - allocated/uninitialized]
Allocation Analysis
Single dominant allocation vs. numerous moderate allocations
[Figure: distribution of allocations (0% to 100%) by count and by size for gap and gcc (gap-count, gap-size, gcc-count, gcc-size); size classes: <64B, <2KB, <256KB, <16MB, >=16MB]
Initialization of Heap
88% initialized by store miss
Relatively little temporal reuse of freed memory
[Figure: heap initialization breakdown (0% to 100%) for bzip, eon, gcc, and mcf; categories: Uninit, Hit-Init, Miss-Init]
PharmSim Pipeline
[Figure: pipeline diagram; stage labels: Fetch, Translate, Decode, Execute, Mem, Commit]
Substantially similar to IBM Power4
  Some instructions “cracked” (1:2 expansion)
  Others (e.g. lmw) microcode stream
Mem Stage
  Interface to 2-level cache model
  Sun Gigaplane XB snoopy MP coherence
  Caches contain values, must remain coherent
No cheating!
  No “flat” memory model for reference/redirect
Machine Model
Unrealistically aggressive model to devalue the impact of store misses.
  8-wide, 6-stage pipeline
  8K entry combining predictor
  128 RUU, 64 LSQ entries, 64 write buffers
  256KB 4-way associative L1D cache
  64KB 2-way associative L1I
  2MB 4-way associative L2 unified cache
  All cache blocks are 64 bytes
  L2 latency is 10 cycles
  Memory latency is 70 cycles
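The cache parameters above fix the number of sets in each cache through the usual relation sets = size / (ways x block size). The short sketch below works that out; num_sets and the dictionary layout are hypothetical, and only the sizes, associativities, and block size come from the slide.

```python
KB = 1024
MB = 1024 * KB


def num_sets(size_bytes, ways, block_bytes=64):
    """Number of sets in a set-associative cache: size / (ways * block size)."""
    sets, remainder = divmod(size_bytes, ways * block_bytes)
    if remainder:
        raise ValueError("cache size is not a multiple of ways * block size")
    return sets


# (size, associativity) pairs taken from the machine model slide.
caches = {"L1D": (256 * KB, 4), "L1I": (64 * KB, 2), "L2": (2 * MB, 4)}
for name, (size, ways) in caches.items():
    print(name, num_sets(size, ways), "sets")
```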