1
Sampling-based Program Locality Approximation
Yutao Zhong, Wentao Chang
Department of Computer Science, George Mason University
June 8th, 2008
2
Outline
• Background information
• Motivation
• Our sampling approach
• Experimental results
3
Reuse distance and reuse signature
a b c a a c b
• Reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element
• Reuse signature: a histogram of reuse distances demonstrating the distribution of reuse distances over different lengths
[Figure: two reuses on the trace above are marked from starting point to ending point, each labeled with reuse distance 2]
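The two definitions above can be made concrete with a minimal Python sketch. The function names `reuse_distances` and `reuse_signature` are illustrative and do not come from the slides, and the quadratic scan is chosen for readability, not speed.

```python
from collections import Counter

def reuse_distances(trace):
    """Return one entry per access: its reuse distance, or None for a first use."""
    last_index = {}   # element -> index of its most recent access
    distances = []
    for i, elem in enumerate(trace):
        if elem in last_index:
            # Distinct elements strictly between the two consecutive uses.
            between = set(trace[last_index[elem] + 1 : i])
            distances.append(len(between))
        else:
            distances.append(None)
        last_index[elem] = i
    return distances

def reuse_signature(trace):
    """Histogram of reuse distances (first uses are left out)."""
    return Counter(d for d in reuse_distances(trace) if d is not None)

trace = "a b c a a c b".split()
print(reuse_distances(trace))
print(sorted(reuse_signature(trace).items()))
```

On the slide's trace, the second use of a and the second use of b both have reuse distance 2, which matches the two labels in the figure.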
4
Reuse signature application
• Relationship to cache behavior:
  • Capacity miss ⇐ reuse distance ≥ cache size
  • Reduce reuse distance ⇒ improve cache effectiveness
• Current applications:
  • Predict cache miss rate [Zhong+03] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]
  • Reorganize data [Zhong+04]
  • Provide caching hint [Beyls & D’Hollander 02]
  • Evaluate program optimizations [Beyls & D’Hollander 01] [Ding 00]
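As a sketch of the miss-rate application, assume a fully associative LRU cache: an access misses when it is a first use or when its reuse distance is at least the cache capacity. The function name, the `cold` argument, and the example histogram are hypothetical; real caches with limited associativity follow this rule only approximately.

```python
def predicted_miss_rate(signature, cold, cache_size):
    """Estimate the miss rate of a fully associative LRU cache.

    signature:  dict mapping reuse distance -> number of reuses
    cold:       number of first uses (always misses)
    cache_size: capacity in data elements, the same unit as the distances
    """
    total = sum(signature.values()) + cold
    if total == 0:
        return 0.0
    capacity_misses = sum(n for d, n in signature.items() if d >= cache_size)
    return (capacity_misses + cold) / total

# Signature of the trace "a b c a a c b": distances 0, 1, 2, 2 plus three first uses.
sig = {0: 1, 1: 1, 2: 2}
print(predicted_miss_rate(sig, cold=3, cache_size=2))   # 5 of 7 accesses miss
print(predicted_miss_rate(sig, cold=3, cache_size=3))   # only the 3 first uses miss
```

Shrinking the reuse distances moves counts below the `cache_size` threshold, which is the sense in which reducing reuse distance improves cache effectiveness.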
5
Reuse distance measurement
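[Diagram: get the accessed memory address; search and update the access time table with the address to find the last access time; search the access trace and update the count to obtain the distance; record the distance in the distance histogram]
Data structures: access time table, access trace, distance histogram
Example trace: a c a b b a, with a reuse marked from starting point to ending point and labeled with distance 1
① Storing the trace requires large space, and counting memory accesses takes a long time
② Enormous overhead for memory-intensive programs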
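The measurement loop can be sketched as follows. This is a simplified stand-in, not the authors' implementation: a Python dict plays the access time table and a sorted list of last-access times plays the access trace, where a real tool would use a balanced or splay tree. The name `reuse_histogram` and the `"cold"` bucket are assumptions.

```python
import bisect
from collections import Counter

def reuse_histogram(trace):
    last_time = {}   # access time table: address -> time of its most recent access
    live = []        # sorted last-access times, one entry per distinct address seen
    hist = Counter() # distance histogram
    for now, addr in enumerate(trace):
        prev = last_time.get(addr)
        if prev is None:
            hist["cold"] += 1            # first use: no reuse distance
        else:
            pos = bisect.bisect_left(live, prev)
            # Entries after pos belong to distinct addresses touched since prev.
            hist[len(live) - pos - 1] += 1
            live.pop(pos)                # linear time here; a tree makes this logarithmic
        live.append(now)                 # now is the largest time, so live stays sorted
        last_time[addr] = now
    return hist

print(reuse_histogram("a c a b b a".split()))
```

Every access pays for a search and an update, and the structures grow with the number of distinct addresses, which is where the space and time costs listed above come from.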
6
Motivation
• Sampling is generally effective at reducing the overhead of program behavior profiling
• We aim to balance efficiency and accuracy
  • Sample only 1% of memory accesses
  • Improve measurement speed by 7.5 times on average
  • Achieve over 99% accuracy
7
Sampling algorithms
• Utilize the common structure of bursty tracing [Hirzel & Chilimbi 01]
• Sampling rate r = |IS| / (|IS| + |IH|), where IS is a sampling interval and IH is a hibernating interval
• Naïve sampling
  • Turn off profiling during hibernating intervals
  • No guarantee of accuracy
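A minimal sketch of the bursty schedule and of naïve sampling, under two assumptions not stated on the slide: interval lengths are counted in memory accesses, and each cycle starts with the hibernating interval. All function names are illustrative.

```python
from collections import Counter

def sampling_rate(sample_len, hibernate_len):
    """r = |IS| / (|IS| + |IH|)."""
    return sample_len / (sample_len + hibernate_len)

def in_sampling_interval(t, sample_len, hibernate_len):
    """True when access number t falls inside a sampling interval."""
    return t % (sample_len + hibernate_len) >= hibernate_len

def naive_sampling(trace, sample_len, hibernate_len):
    """Profile only the sampled accesses; hibernating accesses are never seen."""
    seen = [x for t, x in enumerate(trace)
            if in_sampling_interval(t, sample_len, hibernate_len)]
    last_index, hist = {}, Counter()
    for i, elem in enumerate(seen):
        if elem in last_index:
            hist[len(set(seen[last_index[elem] + 1 : i]))] += 1
        last_index[elem] = i
    return hist

print(sampling_rate(1, 99))                                  # 1% sampling rate
print(naive_sampling("a b c a b c d a".split(), 1, 3))       # reports distance 0
```

In the second call only the two sampled uses of a are seen, so the reuse is recorded with distance 0 although b, c and d were accessed in between (actual distance 3). This is one way the lack of an accuracy guarantee shows up.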
8
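Naïve sampling
Memory access trace:
. . c a b c a c a b c a c a b c d a . . . .
[Figure: the trace is divided into hibernating (IH) and sampling (IS) intervals; under naïve sampling, reuses ① to ⑤ are marked, with the labels "④ 1" and "⑤ 3" and the callout "Inaccurate measurement"]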
9
Biased sampling
• Ignore any datum that has been referenced within the current hibernating period
• Measured distance always larger than or equal to actual distance
• Probability of being sampled not uniform
10
Biased sampling
Memory access trace:
. . c a b c a f a b c a c a b f d a . . . .
[Figure: the trace is divided into hibernating (IH) and sampling (IS) intervals; reuses ① to ⑤ are marked under biased sampling]
11
History-preserved representative sampling
• Add an additional tag for each address in the access trace
• Mark references within a sampling period as sampled in the tag
• A reuse is sampled only when its starting point is marked as sampled
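The three rules above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the trace is fully maintained in both kinds of interval, interval lengths are counted in accesses, each cycle starts with the hibernating interval, and a sorted list stands in for the tree. The name `representative_sampling` is hypothetical.

```python
import bisect
from collections import Counter

def representative_sampling(trace, sample_len, hibernate_len):
    period = sample_len + hibernate_len
    last = {}    # address -> (time of last access, sampled tag)
    live = []    # sorted last-access times, one per distinct address seen so far
    hist = Counter()
    for now, addr in enumerate(trace):
        sampled_now = now % period >= hibernate_len
        if addr in last:
            prev, start_sampled = last[addr]
            pos = bisect.bisect_left(live, prev)
            if start_sampled:
                # Starting point was marked sampled: record the exact distance.
                hist[len(live) - pos - 1] += 1
            live.pop(pos)
        live.append(now)
        last[addr] = (now, sampled_now)   # tag this reference for future reuses
    return hist

trace = "a b c a a c b".split()
print(sorted(representative_sampling(trace, sample_len=2, hibernate_len=2).items()))
```

Because history is preserved through hibernation, every recorded distance is exact; sampling only decides which reuses are recorded, based on where they start.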
12
History-preserved representative sampling
Memory access trace:
. . c a b c a f a b c a c a b f d a . . . .
[Figure: the trace is divided into hibernating (IH) and sampling (IS) intervals; reuses ① to ⑤ are marked under history-preserved representative sampling]
13
Further improvements
• Simplifying maintenance in hibernating intervals
  • Reference trace implementation: splay tree [Ding & Zhong 03]
  • In a sampling period, full tree maintenance
  • In a hibernating period, instead of a new leaf node for each access, we construct a single node for each hibernating period with a counter of the number of distinct accesses
• Fast sample tag marking and checking
  • To save space, we fix the lengths of the sampling and hibernating periods, which avoids the additional tag
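One plausible reading of the tag-free check, offered as an assumption and not as the authors' code: with fixed period lengths, whether an earlier reference was sampled can be recomputed from its timestamp, so no tag needs to be stored. The function name and the choice to start each cycle with the hibernating period are hypothetical.

```python
def started_in_sampling(prev_time, sample_len, hibernate_len):
    """Recover the 'sampled' tag of an earlier access from its timestamp alone."""
    return prev_time % (sample_len + hibernate_len) >= hibernate_len

# With 99 hibernating accesses followed by 1 sampled access per cycle,
# only these timestamps count as sampled starting points.
print([t for t in range(300) if started_in_sampling(t, 1, 99)])
```

The check costs one modulo and one comparison per reuse and adds no per-address storage beyond the last access time that is already kept.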
14
Experiments
• Benchmarks from SPEC 2006, Olden, Chaos:
  • Floating point programs: CactusADM, Milc, Soplex, Apsi, MolDyn
  • Integer programs: Bzip2, Gcc, Libquantum, Perimeter, TSP
• Instrumentation tool: Valgrind 3.2.3
• Sampling rate: 1%
• We run each individual benchmark with 3 to 6 different inputs
• Repeat three times for each input
15
Experiments cont’d
• Comparison of accuracy and efficiency
  • Ding and Zhong’s approximation method [Ding & Zhong 03]
  • Time distance measurement [Shen+07]
• Implementation of four algorithms:
  • Naïve sampling, biased sampling, basic and optimized representative sampling
16
Accuracy
17
Efficiency
Sampling even outperforms the lower bound: time distance measurement
Generally, the speedup is smaller when the input size is small
18
Efficiency
• Speedup of basic representative sampling: around 4-5 times for most cases
• Speedup of optimized representative sampling:
  • around 7-10 times for most cases, up to 33 times
  • geometric mean is 7.5
• Sampling rate effect (TSP):
19
Related work
• Reuse signature collection
  • [Mattson+70] [Bennett & Kruskal 75] [Olken 81] [Kim+91] [Sugumar & Abraham 93] [Almasi+02] [Ding & Zhong 03] [Shen+07]
• Selective monitoring
  • Time sampling [Zagha+96] [Anderson+97] [Burrows+00] [Whaley 00] [Arnold & Sweeney 00] [Arnold & Ryder 01] [Hirzel & Chilimbi 01] [Chilimbi & Hirzel 02] [Itzkowitz+03] [Arnold & Grove 05]
  • Data sampling [Larus 90] [Ding & Zhong 02] [Zhao+07]
• Uses of efficient locality analysis [Huang & Shen 96] [Li+96] [Ding 00] [Beyls & D’Hollander 01] [Almasi+02] [Beyls & D’Hollander 02] [Zhong+04] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]
20
Future work
• Dynamically adjust sampling/hibernating lengths
• Store references in temporary buffer and then process them in batch
• Combine time sampling with data sampling
21
Thank you!
Questions?