Upload
reginald-rodgers
View
233
Download
0
Embed Size (px)
Citation preview
Bitwidth Analysis with Application to Silicon
CompilationMark Stephenson
Jonathan Babb
Saman Amarasinghe
MIT Laboratory for Computer Science
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Goal
• For a program written in a high level language, automatically find the minimum number of bits needed to represent:– Each static variable in the program– Each operation in the program.
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Usefulness of Bitwidth Analysis
• Higher Language Abstraction
• Enables other compiler optimizations1. Synthesizing application-specific processors
2. Optimizing for power-aware processors
3. Extracting more parallelism for SIMD processors
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Bitwidth Opportunities
• Runtime profiling reveals plenty of bitwidth opportunities.
• For the SPECint95 benchmark suite,– Over 50% of operands use less than half the
number of bits specified by the programmer.
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Analysis Constraints
• Bitwidth results must maintain program correctness for all input data sets– Results are not runtime/data dependent
• A static analysis can do very well, even in light of this constraint
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Bitwidth Extraction
• Use abundant hints in the source language to discover bitwidths with near optimal precision.
• Caveats– Analysis limited to fixed-point variables.– We assume source program correctness.
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
The Hints
• Bitwidth refining constructs1. Arithmetic operations
2. Boolean operations
3. Bitmask operations
4. Loop induction variable bounding
5. Clamping operations
6. Type castings
7. Static array index bounding
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
1. Arithmetic Operations
• Exampleint a;
unsigned b;
a = random();
b = random();
a = a / 2;
b = b >> 4;
a: 32 bits b: 32 bits
a: 31 bits b: 32 bits
a: 31 bits b: 28 bits
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
2. Boolean Operations
• Example
int a;
a = (b != 15);
a: 32 bits
a: 1 bit
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
int a;
a = random() & 0xff;
3. Bitmask Operations
• Example
a: 32 bits
a: 8 bits
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
• Applicable to for loop induction variables.
• Example
int i;
for (i = 0; i < 6; i++) {
…
}
4. Loop Induction Variable Bounding
i: 32 bits
i: 3 bits
i: 3 bits
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
5. Clamping Optimization
• Multimedia codes often simulate saturating instructions.
• Exampleint valpred
if (valpred > 32767)
valpred = 32767
else if (valpred < -32768)
valpred = -32768
valpred: 32 bits
valpred: 16 bits
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
6. Type Casting (Part I)
• Example
int a;
char b;
a = b;
a: 32 bits b: 8 bits
a: 8 bits b: 8 bits
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
6. Type Casting (Part II)
• Example
int a;
char b;
b = a;
a: 32 bits b: 8 bits
a: 8 bits b: 8 bits
a: 8 bits b: 8 bits
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
7. Array Index Optimization
• An index into an array can be set based on the bounds of the array.
• Example int a, b;
int X[1024];
X[a] = X[4*b];
a: 32 bits b: 32 bits
a: 10 bits b: 8 bits
a: 10 bits b: 8 bits
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
• Data-flow analysis
• Three candidate lattices– Bitwidth– Vector of bits– Data-ranges
Propagating Data-Ranges
a = a + 1a: 4 bits
a: 5 bitsPropagating bitwidths
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
• Data-flow analysis
• Three candidate lattices– Bitwidth– Vector of bits– Data-ranges
Propagating Data-Ranges
a = a + 1a: 1X
a: XXXPropagating bit vectors
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
• Data-flow analysis
• Three candidate lattices– Bitwidth– Vector of bits– Data-ranges
Propagating Data-Ranges
a = a + 1a: <0,13>
a: <1,14>Propagating data-ranges
Four bits are required
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Propagating Data-Ranges
• Propagate data-ranges forward and backward over the control-flow graph using transfer functions described in the paper
• Use Static Single Assignment (SSA) form with extensions to:– Gracefully handle pointers and arrays.
– Extract data-range information from conditional statements.
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
a2 = a1:(a10)a3 = a2 + 1
Example of Data-Range Propagation
a0 = input()a1 = a0 + 1
a1 < 0
a4 = a1:(a10)c0 = a4
a5 = (a3,a4)b0 = array[a5]
Range-refinement functionstrue
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
a2 = a1:(a10)a3 = a2 + 1
Example of Data-Range Propagation
a0 = input()a1 = a0 + 1
a1 < 0
a4 = a1:(a10)c0 = a4
a5 = (a3,a4)b0 = array[a5]
<-128, 127>
<-127, -1>
<-126, 0>
<-127, 127>
<0, 127>
<0, 127>
<-126, 127> array’s bounds are [0:9]<0, 9>
<0, 9>
<-1, -1>
<0, 9>
<-1, 9><-2, 8>
<0, 9>
true
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
What to do with Loops?
• Finding the fixed-point around back edges will often saturate data-ranges.
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
What to do with Loops?
• Finding the fixed-point around back edges will often saturate data-ranges.
• Example
a0 = 0
y0 = 1
y1 = (y0, y2)
a1 = (a0, a3)
y1 < 100
a2 = a1 + 5
y2 = y1 + 1
a: 0..0
y: 1..1
y: 1..1
a: 0..0a: 0..5
y: 1..2
a: 0..10
y: 1..3
a: 0..20
y: 1..5
a: 0..25
y: 1..6
a: 0..
y: 1..
a = 0
for (y = 1; y < 100; y++)
a = a + 5;
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Our Loop Solution
• Find the closed-form solutions to commonly occurring sequences.– A sequence is a mutually dependent group of
instructions.
• Use the closed-form solutions to determine final ranges.
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Finding the Closed-Form Solution
a = 0
for i = 1 to 10
a = a + 1
for j = 1 to 10
a = a + 2
for k = 1 to 10
a = a + 3
...= a + 4
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Finding the Closed-Form Solution
a = 0
for i = 1 to 10
a = a + 1
for j = 1 to 10
a = a + 2
for k = 1 to 10
a = a + 3
...= a + 4
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
a = 0 <0,0>
for i = 1 to 10
a = a + 1 <1,460>
for j = 1 to 10
a = a + 2 <3,480>
for k = 1 to 10
a = a + 3 <24,510>
...= a + 4 <510,510>
• Non-trivial to find the exact ranges
Finding the Closed-Form Solution
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
a = 0 <0,0>
for i = 1 to 10
a = a + 1 <1,460>
for j = 1 to 10
a = a + 2 <3,480>
for k = 1 to 10
a = a + 3 <24,510>
...= a + 4 <510,510>
• Non-trivial to find the exact ranges
Finding the Closed-Form Solution
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
a = 0 <0,0>
for i = 1 to 10
a = a + 1 <1,460>
for j = 1 to 10
a = a + 2 <3,480>
for k = 1 to 10
a = a + 3 <24,510>
...= a + 4 <510,510>
• Can easily find conservative range of <0,510>
Finding the Closed-Form Solution
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
a = 0
for i = 1 to 10
a = a + 1
for j = 1 to 10
a = a + 2
for k = 1 to 10
a = a + 3
...= a + 4
• Figure out the iteration count of each loop.
Solving the Linear Sequence
<1,10>
<1,100>
<1,100>
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
a = 0
for i = 1 to 10
a = a + 1
for j = 1 to 10
a = a + 2
for k = 1 to 10
a = a + 3
...= a + 4 • Find out how much each instruction contributes to sequence using iteration
count.
Solving the Linear Sequence
<1,10>
<1,100>
<1,100>
<1,10>*<1,1>=<1,10>
<1,100>*<2,2>=<2,200>
<1,100>*<3,3>=<3,300>
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
a = 0
for i = 1 to 10
a = a + 1
for j = 1 to 10
a = a + 2
for k = 1 to 10
a = a + 3
...= a + 4 • Sum all the contributions together, and take the data-range union with the
initial value.
Solving the Linear Sequence
<1,10>
<1,100>
<1,100>
<1,10>*<1,1>=<1,10>
<1,100>*<2,2>=<2,200>
<1,100>*<3,3>=<3,300>
(<1,10>+<2,200>+<3,300>)<0,0>=<0,510>
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Results
• Standalone Bitwise compiler.– Bits cut from scalar variables– Bits cut from array variables
• With the DeepC silicon compiler.
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Percentage of Original Scalar Bits
0
20
40
60
80
100
soft
float
adpc
m
bubb
leso
rt life
intm
atm
ul
jaco
bi
med
ian
mpe
gcor
r
conv
olve
hist
ogra
m
intf
ir
pari
ty
pmat
ch sor
benchmark
perc
enta
ge o
f bit
s re
mai
ning
with Bitwise dynamic profile
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Percentage of Original Array Bits
0102030405060708090
100
softf
loat
adpc
m
bubb
leso
rt life
intm
atm
ul
jaco
bi
med
ian
mpe
gcor
r
conv
olve
hist
ogra
m
intfi
r
pari
ty
pmat
ch sor
benchmark
perc
enta
ge o
f bits
rem
aini
ng
with Bitwise dynamic profile
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
DeepC Compiler Targeted to FPGAs
Suif Frontend
C/Fortran program
Pointer alias and other high-level analyses
MachSuif Codegen
DeepC specialization
Raw parallelization
Traditional CAD optimizations
Physical Circuit
Bitwidth Analysis
Verilog
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
FPGA Area
0
200
400
600
800
1000
1200
1400
1600
1800
2000
adpc
m (
8)
bubb
leso
rt (
32)
conv
olve
(16
)
hist
ogra
m (
16)
intfi
r (3
2)
intm
atm
ul (
16)
jaco
bi (
8)
life
(1)
med
ian
(32)
mpe
gcor
r (1
6)
new
life
(1)
parit
y (3
2)
pmat
ch (
32)
sor
(32)
Are
a (C
LB
co
un
t)
Without bitwise With bitwise
Benchmark (main datapath width)
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
FPGA Clock Speed (50 MHz Target)
Without bitwise With bitwise
0
25
50
75
100
125
150
adp
cm
bub
bles
ort
conv
olve
hist
ogra
m
intf
ir
intm
atm
ul
jaco
bi life
med
ian
mpe
gcor
r
new
life
pari
ty
pmat
ch sor
XC
4000
-09
Clo
ck S
pee
d (
MH
Z)
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Power Savings
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
bubblesort histogram jacobi pmatch
Av
era
ge
Dy
na
mic
Po
we
r (m
W)
Without bitwidth analysis With bitwidth analysis
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Related Work
• Data-range propagation for branch prediction [Patterson]
• Symbolic data-range analysis [Rugina et al.]
• Bitwidth propagation [Ananian]
• Bit-vector propagation [Rahzdan, Budiu et al.]
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Summary
• Developed Bitwise: a scalable bitwidth analyzer– Standard data-flow analysis– Loop analysis– Incorporate pointer analysis
• Demonstrate savings when targeting silicon from high-level languages– 57% less area– up to 86% improvement in clock speed– less than 50% of the power
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Power Savings
• C ASIC– IBM SA27E process
• 0.15 micron drawn
– 200 MHz
• Methodology – C RTL– RTL simulation Register switching activity– Synthesis reports dynamic power
June 19th, 2000 www.cag.lcs.mit.edu/bitwise
Mismatched Bitwidths
• When operands of an instruction are of differing sizes– type conversion instructions are added,
converting both operands • to an integer of the widest of the two, and
• with the appropriate sign