44
Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

Embed Size (px)

Citation preview

Page 1: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

Bitwidth Analysis with Application to Silicon

CompilationMark Stephenson

Jonathan Babb

Saman Amarasinghe

MIT Laboratory for Computer Science

Page 2: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Goal

• For a program written in a high level language, automatically find the minimum number of bits needed to represent:– Each static variable in the program– Each operation in the program.

Page 3: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Usefulness of Bitwidth Analysis

• Higher Language Abstraction

• Enables other compiler optimizations1. Synthesizing application-specific processors

2. Optimizing for power-aware processors

3. Extracting more parallelism for SIMD processors

Page 4: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Bitwidth Opportunities

• Runtime profiling reveals plenty of bitwidth opportunities.

• For the SPECint95 benchmark suite,– Over 50% of operands use less than half the

number of bits specified by the programmer.

Page 5: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Analysis Constraints

• Bitwidth results must maintain program correctness for all input data sets– Results are not runtime/data dependent

• A static analysis can do very well, even in light of this constraint

Page 6: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Bitwidth Extraction

• Use abundant hints in the source language to discover bitwidths with near optimal precision.

• Caveats– Analysis limited to fixed-point variables.– We assume source program correctness.

Page 7: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

The Hints

• Bitwidth refining constructs1. Arithmetic operations

2. Boolean operations

3. Bitmask operations

4. Loop induction variable bounding

5. Clamping operations

6. Type castings

7. Static array index bounding

Page 8: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

1. Arithmetic Operations

• Exampleint a;

unsigned b;

a = random();

b = random();

a = a / 2;

b = b >> 4;

a: 32 bits b: 32 bits

a: 31 bits b: 32 bits

a: 31 bits b: 28 bits

Page 9: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

2. Boolean Operations

• Example

int a;

a = (b != 15);

a: 32 bits

a: 1 bit

Page 10: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

int a;

a = random() & 0xff;

3. Bitmask Operations

• Example

a: 32 bits

a: 8 bits

Page 11: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

• Applicable to for loop induction variables.

• Example

int i;

for (i = 0; i < 6; i++) {

}

4. Loop Induction Variable Bounding

i: 32 bits

i: 3 bits

i: 3 bits

Page 12: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

5. Clamping Optimization

• Multimedia codes often simulate saturating instructions.

• Exampleint valpred

if (valpred > 32767)

valpred = 32767

else if (valpred < -32768)

valpred = -32768

valpred: 32 bits

valpred: 16 bits

Page 13: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

6. Type Casting (Part I)

• Example

int a;

char b;

a = b;

a: 32 bits b: 8 bits

a: 8 bits b: 8 bits

Page 14: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

6. Type Casting (Part II)

• Example

int a;

char b;

b = a;

a: 32 bits b: 8 bits

a: 8 bits b: 8 bits

a: 8 bits b: 8 bits

Page 15: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

7. Array Index Optimization

• An index into an array can be set based on the bounds of the array.

• Example int a, b;

int X[1024];

X[a] = X[4*b];

a: 32 bits b: 32 bits

a: 10 bits b: 8 bits

a: 10 bits b: 8 bits

Page 16: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

• Data-flow analysis

• Three candidate lattices– Bitwidth– Vector of bits– Data-ranges

Propagating Data-Ranges

a = a + 1a: 4 bits

a: 5 bitsPropagating bitwidths

Page 17: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

• Data-flow analysis

• Three candidate lattices– Bitwidth– Vector of bits– Data-ranges

Propagating Data-Ranges

a = a + 1a: 1X

a: XXXPropagating bit vectors

Page 18: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

• Data-flow analysis

• Three candidate lattices– Bitwidth– Vector of bits– Data-ranges

Propagating Data-Ranges

a = a + 1a: <0,13>

a: <1,14>Propagating data-ranges

Four bits are required

Page 19: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Propagating Data-Ranges

• Propagate data-ranges forward and backward over the control-flow graph using transfer functions described in the paper

• Use Static Single Assignment (SSA) form with extensions to:– Gracefully handle pointers and arrays.

– Extract data-range information from conditional statements.

Page 20: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

a2 = a1:(a10)a3 = a2 + 1

Example of Data-Range Propagation

a0 = input()a1 = a0 + 1

a1 < 0

a4 = a1:(a10)c0 = a4

a5 = (a3,a4)b0 = array[a5]

Range-refinement functionstrue

Page 21: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

a2 = a1:(a10)a3 = a2 + 1

Example of Data-Range Propagation

a0 = input()a1 = a0 + 1

a1 < 0

a4 = a1:(a10)c0 = a4

a5 = (a3,a4)b0 = array[a5]

<-128, 127>

<-127, -1>

<-126, 0>

<-127, 127>

<0, 127>

<0, 127>

<-126, 127> array’s bounds are [0:9]<0, 9>

<0, 9>

<-1, -1>

<0, 9>

<-1, 9><-2, 8>

<0, 9>

true

Page 22: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

What to do with Loops?

• Finding the fixed-point around back edges will often saturate data-ranges.

Page 23: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

What to do with Loops?

• Finding the fixed-point around back edges will often saturate data-ranges.

• Example

a0 = 0

y0 = 1

y1 = (y0, y2)

a1 = (a0, a3)

y1 < 100

a2 = a1 + 5

y2 = y1 + 1

a: 0..0

y: 1..1

y: 1..1

a: 0..0a: 0..5

y: 1..2

a: 0..10

y: 1..3

a: 0..20

y: 1..5

a: 0..25

y: 1..6

a: 0..

y: 1..

a = 0

for (y = 1; y < 100; y++)

a = a + 5;

Page 24: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Our Loop Solution

• Find the closed-form solutions to commonly occurring sequences.– A sequence is a mutually dependent group of

instructions.

• Use the closed-form solutions to determine final ranges.

Page 25: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Finding the Closed-Form Solution

a = 0

for i = 1 to 10

a = a + 1

for j = 1 to 10

a = a + 2

for k = 1 to 10

a = a + 3

...= a + 4

Page 26: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Finding the Closed-Form Solution

a = 0

for i = 1 to 10

a = a + 1

for j = 1 to 10

a = a + 2

for k = 1 to 10

a = a + 3

...= a + 4

Page 27: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

a = 0 <0,0>

for i = 1 to 10

a = a + 1 <1,460>

for j = 1 to 10

a = a + 2 <3,480>

for k = 1 to 10

a = a + 3 <24,510>

...= a + 4 <510,510>

• Non-trivial to find the exact ranges

Finding the Closed-Form Solution

Page 28: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

a = 0 <0,0>

for i = 1 to 10

a = a + 1 <1,460>

for j = 1 to 10

a = a + 2 <3,480>

for k = 1 to 10

a = a + 3 <24,510>

...= a + 4 <510,510>

• Non-trivial to find the exact ranges

Finding the Closed-Form Solution

Page 29: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

a = 0 <0,0>

for i = 1 to 10

a = a + 1 <1,460>

for j = 1 to 10

a = a + 2 <3,480>

for k = 1 to 10

a = a + 3 <24,510>

...= a + 4 <510,510>

• Can easily find conservative range of <0,510>

Finding the Closed-Form Solution

Page 30: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

a = 0

for i = 1 to 10

a = a + 1

for j = 1 to 10

a = a + 2

for k = 1 to 10

a = a + 3

...= a + 4

• Figure out the iteration count of each loop.

Solving the Linear Sequence

<1,10>

<1,100>

<1,100>

Page 31: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

a = 0

for i = 1 to 10

a = a + 1

for j = 1 to 10

a = a + 2

for k = 1 to 10

a = a + 3

...= a + 4 • Find out how much each instruction contributes to sequence using iteration

count.

Solving the Linear Sequence

<1,10>

<1,100>

<1,100>

<1,10>*<1,1>=<1,10>

<1,100>*<2,2>=<2,200>

<1,100>*<3,3>=<3,300>

Page 32: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

a = 0

for i = 1 to 10

a = a + 1

for j = 1 to 10

a = a + 2

for k = 1 to 10

a = a + 3

...= a + 4 • Sum all the contributions together, and take the data-range union with the

initial value.

Solving the Linear Sequence

<1,10>

<1,100>

<1,100>

<1,10>*<1,1>=<1,10>

<1,100>*<2,2>=<2,200>

<1,100>*<3,3>=<3,300>

(<1,10>+<2,200>+<3,300>)<0,0>=<0,510>

Page 33: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Results

• Standalone Bitwise compiler.– Bits cut from scalar variables– Bits cut from array variables

• With the DeepC silicon compiler.

Page 34: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Percentage of Original Scalar Bits

0

20

40

60

80

100

soft

float

adpc

m

bubb

leso

rt life

intm

atm

ul

jaco

bi

med

ian

mpe

gcor

r

conv

olve

hist

ogra

m

intf

ir

pari

ty

pmat

ch sor

benchmark

perc

enta

ge o

f bit

s re

mai

ning

with Bitwise dynamic profile

Page 35: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Percentage of Original Array Bits

0102030405060708090

100

softf

loat

adpc

m

bubb

leso

rt life

intm

atm

ul

jaco

bi

med

ian

mpe

gcor

r

conv

olve

hist

ogra

m

intfi

r

pari

ty

pmat

ch sor

benchmark

perc

enta

ge o

f bits

rem

aini

ng

with Bitwise dynamic profile

Page 36: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

DeepC Compiler Targeted to FPGAs

Suif Frontend

C/Fortran program

Pointer alias and other high-level analyses

MachSuif Codegen

DeepC specialization

Raw parallelization

Traditional CAD optimizations

Physical Circuit

Bitwidth Analysis

Verilog

Page 37: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

FPGA Area

0

200

400

600

800

1000

1200

1400

1600

1800

2000

adpc

m (

8)

bubb

leso

rt (

32)

conv

olve

(16

)

hist

ogra

m (

16)

intfi

r (3

2)

intm

atm

ul (

16)

jaco

bi (

8)

life

(1)

med

ian

(32)

mpe

gcor

r (1

6)

new

life

(1)

parit

y (3

2)

pmat

ch (

32)

sor

(32)

Are

a (C

LB

co

un

t)

Without bitwise With bitwise

Benchmark (main datapath width)

Page 38: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

FPGA Clock Speed (50 MHz Target)

Without bitwise With bitwise

0

25

50

75

100

125

150

adp

cm

bub

bles

ort

conv

olve

hist

ogra

m

intf

ir

intm

atm

ul

jaco

bi life

med

ian

mpe

gcor

r

new

life

pari

ty

pmat

ch sor

XC

4000

-09

Clo

ck S

pee

d (

MH

Z)

Page 39: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Power Savings

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

bubblesort histogram jacobi pmatch

Av

era

ge

Dy

na

mic

Po

we

r (m

W)

Without bitwidth analysis With bitwidth analysis

Page 40: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Related Work

• Data-range propagation for branch prediction [Patterson]

• Symbolic data-range analysis [Rugina et al.]

• Bitwidth propagation [Ananian]

• Bit-vector propagation [Rahzdan, Budiu et al.]

Page 41: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Summary

• Developed Bitwise: a scalable bitwidth analyzer– Standard data-flow analysis– Loop analysis– Incorporate pointer analysis

• Demonstrate savings when targeting silicon from high-level languages– 57% less area– up to 86% improvement in clock speed– less than 50% of the power

Page 42: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Page 43: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Power Savings

• C ASIC– IBM SA27E process

• 0.15 micron drawn

– 200 MHz

• Methodology – C RTL– RTL simulation Register switching activity– Synthesis reports dynamic power

Page 44: Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

June 19th, 2000 www.cag.lcs.mit.edu/bitwise

Mismatched Bitwidths

• When operands of an instruction are of differing sizes– type conversion instructions are added,

converting both operands • to an integer of the widest of the two, and

• with the appropriate sign