Thread-Level Speculation on a CMP Can Be Energy Efficient

Jose Renau
University of California at Santa Cruz
http://masc.soe.ucsc.edu

Karin Strauss, Luis Ceze, Wei Liu, Smruti Sarangi, James Tuck, and Josep Torrellas
University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu

Chip Multiprocessor with Thread Level Speculation?

Challenges
  Wire delay: Not a single monolithic processor
  Power: Energy-efficient design (simple cores are efficient)
  Complexity: Very large block reuse

[Figure: Clock Reach]

The 19th ACM International Conference on Supercomputing

Thread Level Speculation (TLS)

Sequential:
    for (i = 0; i < n; i++) {
        X[Y[i]] = X[Z[i]]...
    }

Compilers cannot parallelize
TLS: Assume no dependences, hardware verifies

TLS Task A:
    for (i = 0; i < n/2; i++) {
        X[Y[i]] = X[Z[i]]...
    }

TLS Task B:
    for (i = n/2; i < n; i++) {
        X[Y[i]] = X[Z[i]]...
    }
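
The loop split on this slide can be sketched in plain C. This is only a software illustration under stated assumptions: `run_task` and `tasks_may_conflict` are hypothetical helper names, the element types are chosen for the example, and the conflict check is a conservative stand-in for what the slide says the hardware verifies at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One task: iterations [lo, hi) of the loop body X[Y[i]] = X[Z[i]]. */
static void run_task(int *X, const size_t *Y, const size_t *Z,
                     size_t lo, size_t hi)
{
    assert(X != NULL && Y != NULL && Z != NULL);
    for (size_t i = lo; i < hi; i++)
        X[Y[i]] = X[Z[i]];
}

/* Conservative check (hypothetical helper): does Task B (second half)
 * read any element of X that Task A (first half) writes? If it does,
 * running B early could read a stale value, which is the kind of
 * cross-task dependence a TLS system has to detect. */
static bool tasks_may_conflict(const size_t *Y, const size_t *Z, size_t n)
{
    size_t half = n / 2;
    for (size_t a = 0; a < half; a++)        /* writes of Task A */
        for (size_t b = half; b < n; b++)    /* reads of Task B  */
            if (Y[a] == Z[b])
                return true;
    return false;
}
```

Because `Y` and `Z` are only known at run time, a compiler cannot evaluate a check like this statically, which is why the slide has the hardware verify the no-dependence assumption instead.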

Thread Level Speculation (TLS)

TLS Hardware:
  Tracks data accesses at run-time
  Detects dependence violations
  Kills and restarts tasks

[Figure: execution timelines of tasks A and B for three cases: Sequential, TLS (no dep violations), TLS (dep violation)]

Contributions

Contrary to common wisdom, TLS CMP can be energy-efficient

Identify the sources of energy waste in TLS
Propose novel energy-centric optimizations
Design energy-efficient memory hierarchy for TLS CMP

Main Results

TLS: 27% faster and 28% more energy
6-issue: 23% faster and 52% more energy

[Figure: configurations compared: 1 CPU 3-issue; 1 CPU 6-issue; TLS CMP (4 CPUs 3-issue with TLS)]

Energy Cost of TLS

Source of energy waste                         Why?
Task squash                                    Dependence violation
Additional storage & logic in memory system    Need to version data
Additional traffic in memory system            TLS-aware cache coherence protocol
Additional instructions                        Compiler overhead

Additional Storage & Logic

Cache lines are associated with tasks
Cache line tags are extended with Version ID

    | Version ID | Tag | Data |

Messages between caches compare Version IDs
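
A minimal sketch of the extended tag described on this slide, assuming a 64-byte line and 32-bit version IDs (neither is given on the slides). The struct and function names are hypothetical; the point is only that every lookup now needs a version comparison in addition to the address-tag comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_LINE_BYTES 64  /* assumed line size; not given on the slides */

/* Cache line whose tag is extended with the ID of the owning task. */
struct tls_cache_line {
    uint32_t version_id;            /* task (version) this copy belongs to */
    uint64_t tag;                   /* ordinary address tag */
    bool     valid;
    uint8_t  data[TLS_LINE_BYTES];
};

/* A lookup matches only if the line is valid, the address tag matches,
 * and the version ID matches: one extra comparison per access. */
static bool tls_line_matches(const struct tls_cache_line *line,
                             uint64_t tag, uint32_t version_id)
{
    assert(line != NULL);
    return line->valid && line->tag == tag && line->version_id == version_id;
}
```

The extra field and comparator are small per line, but they are exercised on every access, which is consistent with the slides listing storage and logic as a separate source of energy waste.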

Additional Traffic

More cache misses:
  Cannot displace speculative data
Handling multiple versions:
  Example: Find the correct version on cache miss
Detect data dependence violations across tasks:
  Need extra checks
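
The "find the correct version on cache miss" example might look like the sketch below. It assumes that version IDs increase with task order and that a reader should see the newest copy produced by itself or by a predecessor task; both are assumptions for illustration, not details from the slides. `struct version_copy` and `find_correct_version` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One copy of a line held somewhere in the memory system. */
struct version_copy {
    uint32_t version_id;  /* task that produced this copy */
    int      value;       /* stand-in for the line's data */
};

/* Returns the index of the copy that task `reader` should observe:
 * the highest version ID that does not exceed the reader's own ID.
 * Returns -1 if no such copy exists (fall back to memory). */
static int find_correct_version(const struct version_copy *copies,
                                size_t n, uint32_t reader)
{
    assert(copies != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (copies[i].version_id > reader)
            continue;  /* produced by a successor task: not visible */
        if (best < 0 || copies[i].version_id > copies[best].version_id)
            best = (int)i;
    }
    return best;
}
```

Each candidate copy requires a message and a version comparison, which illustrates why handling multiple versions adds traffic beyond a conventional miss.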

Energy-Centric Optimizations

Substantial energy reduction
Overlooked in performance-centric designs

Source of energy waste        Energy-centric optimization
Task squash                   Stall after second restart; Energy-aware profiling
Additional storage & logic    Avoid walking the cache
Additional traffic            ---
Additional instructions       Energy-aware profiling
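
One possible reading of "stall after second restart" is sketched below: a task may restart speculatively twice, and a further squash stalls it instead of re-executing. The exact trigger, and the choice to release the stall once the task becomes non-speculative, are assumptions made for this example; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SPEC_RESTARTS 2u  /* "second restart" from the slide */

enum squash_action { ACTION_RESTART, ACTION_STALL };

struct spec_task {
    unsigned restarts;  /* speculative restarts granted so far */
    bool     stalled;
};

/* Called whenever the task is squashed by a dependence violation. */
static enum squash_action on_squash(struct spec_task *t)
{
    assert(t != NULL);
    if (t->restarts >= MAX_SPEC_RESTARTS) {
        t->stalled = true;      /* stop spending energy on likely-wasted work */
        return ACTION_STALL;
    }
    t->restarts++;
    return ACTION_RESTART;
}

/* Assumed release condition: the task can no longer be squashed. */
static void on_become_nonspeculative(struct spec_task *t)
{
    assert(t != NULL);
    t->stalled = false;
}
```

The design choice here is to trade some potential speedup for energy: a task that keeps getting squashed is unlikely to commit its speculative work, so re-executing it mostly wastes energy.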

Additional Instructions

Spawn & commit instructions inserted by compiler
Conventional compiler optimizations not so effective due to code partitioning into tasks
Live-ins spilling

~15% additional instructions

Design Philosophy

Reduce number of checks
Reduce cost of each check
Eliminate low-return work

Example: Energy-aware profiling
  Prune tasks that are expected to give minor speedups at high energy cost
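
The pruning rule in the energy-aware profiling example could be expressed as a simple filter over profiled tasks. The sketch below is illustrative only: the profile fields, the two thresholds, and the function names are hypothetical, and the slides do not say how speedup or energy cost are estimated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-task estimates gathered by a profiling run (hypothetical fields). */
struct task_profile {
    double est_speedup;       /* expected speedup from spawning the task */
    double est_energy_ratio;  /* expected energy relative to not spawning it */
};

/* Prune a task when it offers only a minor speedup at a high energy cost. */
static bool should_prune(const struct task_profile *p,
                         double min_speedup, double max_energy_ratio)
{
    assert(p != NULL);
    return p->est_speedup < min_speedup &&
           p->est_energy_ratio > max_energy_ratio;
}

/* Compacts the array in place, keeping surviving tasks in order.
 * Returns the number of tasks kept. */
static size_t prune_tasks(struct task_profile *tasks, size_t n,
                          double min_speedup, double max_energy_ratio)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (!should_prune(&tasks[i], min_speedup, max_energy_ratio))
            tasks[kept++] = tasks[i];
    return kept;
}
```

Requiring both conditions keeps tasks that are cheap even if they help little, and tasks that are expensive but deliver a real speedup; only the low-return, high-cost ones are dropped.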

Simulation Environment

[Figure: processor configurations Uni-4i, Uni-6i, and TLS4-3i, labeled 1 CPU 3-issue, 1 CPU 6-issue, and 4 CPUs 3-issue with TLS]

70nm @ 5GHz (same area approx.)
All processors have same pipeline depth
16KB L1 cache (1 cycle slower in TLS due to versioning)
1MB L2 cache on-chip

Performance
[Figure: performance results]

Power
[Figure: power results]

Cost of TLS
[Figures: six charts on the cost of TLS]

Conclusions: TLS Power is Promising

[Figure: Energy vs. Performance for 3issue, 6issue, and TLS]
6issue: 23% speedup, +87% power
TLS: 27% speedup, +59% power
TLS relative to 6issue: 3% speedup, -15% power

Results for single thread applications (SPECint2000)
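
The power and speedup figures on this slide can be related to the energy figures on the Main Results slide through energy = power x time. The helper below is a small sketch of that arithmetic; `relative_energy` is a hypothetical name, and it assumes the slide's percentages are ratios against the same baseline.

```c
#include <assert.h>

/* Energy = power x time. If execution time shrinks by `speedup`,
 * energy relative to the baseline is rel_power / speedup. */
static double relative_energy(double rel_power, double speedup)
{
    assert(speedup > 0.0);
    return rel_power / speedup;
}
```

For 6issue, `relative_energy(1.87, 1.23)` is about 1.52, matching the reported 52% more energy. For TLS, `relative_energy(1.59, 1.27)` is about 1.25, close to the reported 28%; the gap may come from the slide values being averages over applications, which is an assumption here. Dividing the TLS figures by the 6issue figures (1.27 / 1.23 and 1.59 / 1.87) gives roughly a 3% speedup at 15% lower power.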

Questions?

Backup Slides

Processor
[Figure: processor]