1
Rapid Estimation of Power Consumption for Hybrid FPGAs
Chun Hok Ho1, Philip Leong2, Wayne Luk1, Steve Wilton3
1 Department of Computing, Imperial College London
2 Department of Computer Science and Engineering, Chinese University of Hong Kong
3 Department of Electrical and Computer Engineering, University of British Columbia
9 September 2008
2
Overview
1. Motivation
2. Contributions
3. Related Work
4. Rapid Power Estimation Flow
5. Technology Mapper
6. Evaluation
7. Future Work + Conclusion
3
Motivation
For a new hybrid FPGA architecture:
• How do we assess power dissipation rapidly?
• How do we map applications onto such an architecture effectively?
4
Contributions
• High-level power estimation flow: estimates power using various vendor toolchains and techniques
• Hybrid FPGA technology mapper: produces a netlist/bitstream from a dataflow graph (DFG)
5
Related work: Hybrid FPGA architecture [1]
[Figure: coarse-grained unit made of D blocks, U0 to U{D-1}: wb blocks (bit 0 to bit N-1), a floating point multiplier (fpmul) and a floating point adder/subtractor (fpadd), each with control and status ports. The blocks are connected by input buses (M), output buses (R) and feedback registers (F) through a feedback mux and an output mux, with control signal inputs and status flag outputs.]
D=9, M=4, R=3, F=3, 2 add, 2 mul: best density over benchmarks
[1] C. Ho et al., “Domain-Specific Hybrid FPGA: Architecture and Floating Point Applications”, FPL 2007
6
Related work: Virtual Embedded Blocks [1]
[Figure: an embedded block in an ASIC (dimensions W, L; delay tpd) and its equivalent VEB built from logic cells (LC) (dimensions W', L'; delay tpd'); VEBs distributed in a virtual FPGA.]
• Dummy blocks used to model coarse-grained block’s area and delay
• Timing analyzer can be used to determine hybrid’s performance (including fine-to-coarse routing and delays)
[1] C. Ho et al., “Virtual Embedded Blocks: A Methodology for Evaluating Embedded Elements in FPGAs”, FCCM 2006
7
Power estimation flow
• Different tool chains involved: VEB modelling flow, FPGA power spreadsheet model, ASIC power compiler flow
• Limitations:
  • Dynamic power consumption only (power loss due to switching activity)
  • A constant activity rate is assumed
  • Core only: no I/O power is assessed
  • First-order estimation; a simulation-based model is required for accurate results
8
Power estimation flow
Pall – total power dissipation
Pfgu – power dissipated in fine-grained unit (FGU)
Pcgu – power dissipated in coarse-grained unit (CGU)
Pr – power dissipated in routing between FGU and CGU
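The surviving slide text lists the four quantities but not how they combine. Presumably the total is the sum of the three components. This is an assumption drawn from the definitions above, not a formula recovered from the slide:

```latex
P_{\mathrm{all}} = P_{\mathrm{fgu}} + P_{\mathrm{cgu}} + P_{r}
```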
9
Power estimation flow (Pfgu)
1. Synthesise the circuit with the VEB flow
2. Measure the power of the circuit with the spreadsheet approach (P’)
   • A constant activity rate of 12.5% is applied
3. Measure the power of the VEB with the spreadsheet approach (Pveb)
4. Pfgu = P’ - Pveb
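The subtraction in step 4 can be sketched as below. The function name and the milliwatt readings are hypothetical placeholders, not values from the slides; in practice both inputs would be read from the vendor power spreadsheet.

```python
def fgu_power(p_total_mw: float, p_veb_mw: float) -> float:
    """Step 4: Pfgu = P' - Pveb.

    p_total_mw: spreadsheet power of the whole circuit, VEBs included (P').
    p_veb_mw:   spreadsheet power of the VEBs alone (Pveb).
    """
    if p_veb_mw > p_total_mw:
        raise ValueError("VEB power cannot exceed the power of the whole circuit")
    return p_total_mw - p_veb_mw


# Hypothetical spreadsheet readings, for illustration only.
p_fgu_mw = fgu_power(p_total_mw=120.0, p_veb_mw=45.0)
print(p_fgu_mw)
```

Subtracting Pveb removes the contribution of the dummy logic that only stands in for the coarse-grained unit, leaving the fine-grained part.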
10
Power estimation flow (Pcgu)
1. Synthesise the coarse-grained unit with the ASIC flow
2. Configure the ASIC netlist with the bitstream
3. Apply a constant activity rate to all nets
4. Estimate the dynamic power with the power compiler tool
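As a rough illustration of steps 3 and 4, the sketch below applies one activity rate to every net and sums a first-order dynamic power term per net. It assumes the common model P = 0.5 · α · C · Vdd² · f. The `Net` class, capacitances, supply voltage and clock frequency are all hypothetical, and a real power compiler tool uses far more detailed cell and wire models.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    capacitance_f: float  # switched capacitance in farads (hypothetical value)


def dynamic_power_w(nets, activity, vdd_v, freq_hz):
    """Sum 0.5 * activity * C * Vdd^2 * f over all nets (first-order model)."""
    return sum(0.5 * activity * n.capacitance_f * vdd_v ** 2 * freq_hz for n in nets)


# Two made-up nets. The 0.125 activity rate is borrowed from the 12.5% used
# for the fine-grained estimate; the slides do not state the rate used here.
nets = [Net("a", 10e-15), Net("b", 20e-15)]
p_cgu_w = dynamic_power_w(nets, activity=0.125, vdd_v=1.5, freq_hz=100e6)
print(p_cgu_w)
```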
11
Power estimation flow (Pr)
• Pr can be modelled by applying a suitable output load when estimating Pcgu
• The output load can be calibrated against an existing embedded block
• The embedded multiplier blocks in Virtex II are used for calibration
12
Power estimation flow (Pr)
1. Measure the power of the multiplier in the FPGA using the spreadsheet (Pem)
2. Implement a multiplier in the ASIC flow
3. Measure the power of the ASIC multiplier (Pam)
4. Adjust the loading capacitance (CL) such that Pam ≈ Pem
5. Apply CL when estimating Pcgu
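Step 4 amounts to a one-dimensional search for CL. A minimal sketch follows, assuming Pam increases monotonically with CL so that bisection applies; the slides do not say which search method was used. `calibrate_load` and `toy_pam` are hypothetical names, and `toy_pam` is only a stand-in for a run of the ASIC power tool.

```python
from typing import Callable


def calibrate_load(measure_pam: Callable[[float], float], p_em: float,
                   cl_lo: float, cl_hi: float,
                   rel_tol: float = 1e-3, max_iter: int = 60) -> float:
    """Bisection search for CL such that measure_pam(CL) ~= p_em.

    Assumes measure_pam grows monotonically with CL and that the target
    power is bracketed by [cl_lo, cl_hi].
    """
    if not (measure_pam(cl_lo) <= p_em <= measure_pam(cl_hi)):
        raise ValueError("target power is not bracketed by [cl_lo, cl_hi]")
    for _ in range(max_iter):
        cl_mid = 0.5 * (cl_lo + cl_hi)
        p_am = measure_pam(cl_mid)
        if abs(p_am - p_em) <= rel_tol * p_em:
            return cl_mid
        if p_am < p_em:
            cl_lo = cl_mid   # too little power: increase the load
        else:
            cl_hi = cl_mid   # too much power: decrease the load
    return 0.5 * (cl_lo + cl_hi)


def toy_pam(cl_f: float) -> float:
    """Stand-in for the ASIC power tool: internal power plus a load term.

    All constants are made up for illustration.
    """
    activity, vdd_v, freq_hz, outputs = 0.125, 1.5, 100e6, 36
    internal_w = 1.0e-3
    return internal_w + outputs * 0.5 * activity * cl_f * vdd_v ** 2 * freq_hz


cl = calibrate_load(toy_pam, p_em=2.0e-3, cl_lo=0.0, cl_hi=10e-12)
print(cl, toy_pam(cl))
```

The calibrated CL would then be reused as the output load when estimating Pcgu, as in step 5.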
13
Technology mapper
• A tool for producing a netlist/bitstream from a high-level description
• Reuses existing C-to-gate compilers: CHiMPS [1], Trident [2], fly [3]
• Only the backend (the technology mapper) is different
[1] A. Putnam et al., “CHiMPS: A C-Level Compilation Flow for Hybrid CPU-FPGA Architectures”, FPL 2008
[2] J. Tripp et al., “Trident: An FPGA Compiler Framework for Floating-Point Algorithms”, FPL 2005
[3] C. Ho et al., “Fly - A Modifiable Hardware Compiler”, FPL 2002
14
Technology mapper
[Figure: the technology mapper takes a dataflow graph (e.g. a + b) and an architectural description (e.g. 3 input buses, 4 output buses, 3 adders, 2 multipliers, ...) and produces the interconnect in HDL (U0: cgu port map(…); U1: cgu port map(…);) and the bitstreams (U0:001101011.. U1:110101011..).]
15
Technology mapper
• Greedy algorithm: not optimal but effective in most cases
• Pack as many operations as possible into a single coarse-grained unit
• No suitable block: use a soft core
• Coarse-grained units used up: use a soft core
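The greedy policy above can be sketched as follows. This is a simplified illustration, not the authors' mapper: it only counts free blocks per operation type and ignores input/output bus and feedback-register constraints. The names `CoarseGrainedUnit` and `greedy_map` and the `cgu<k>`/`soft` labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CoarseGrainedUnit:
    capacity: dict                       # e.g. {"fadd": 2, "fmul": 2}
    used: dict = field(default_factory=dict)

    def try_place(self, op: str) -> bool:
        """Claim a free block of type `op` if one is left in this unit."""
        if self.used.get(op, 0) < self.capacity.get(op, 0):
            self.used[op] = self.used.get(op, 0) + 1
            return True
        return False


def greedy_map(ops, capacity, max_units):
    """ops: list of (result, op) in dataflow order.

    Returns {result: "cgu<k>" or "soft"}.
    """
    units, placement = [], {}
    for result, op in ops:
        if capacity.get(op, 0) == 0:          # no suitable hard block
            placement[result] = "soft"
            continue
        # Pack into the first unit that still has a free block of this type.
        for k, unit in enumerate(units):
            if unit.try_place(op):
                placement[result] = f"cgu{k}"
                break
        else:
            if len(units) < max_units:        # open a new coarse-grained unit
                unit = CoarseGrainedUnit(dict(capacity))
                unit.try_place(op)
                units.append(unit)
                placement[result] = f"cgu{len(units) - 1}"
            else:                             # coarse-grained units used up
                placement[result] = "soft"
    return placement


ops = [("tmp1", "fadd"), ("tmp2", "fadd"), ("tmp3", "fmul"),
       ("tmp4", "fsqrt"), ("z", "fmul")]
placement = greedy_map(ops, {"fadd": 2, "fmul": 2}, max_units=1)
print(placement)
```

Because routing constraints are ignored, this sketch packs the final fmul into the same unit, whereas the mapping example in the slides instantiates a second coarse-grained unit for it.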
16
Mapping example
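fadd tmp1, a, b
fadd tmp2, c, d
fmul tmp3, tmp1, tmp2
fsqrt tmp4, tmp3
fmul z, tmp4, g
[Figure: dataflow graph of the code above. Inputs a, b, c, d, g; two additions produce tmp1 and tmp2, a multiplication produces tmp3, a square root produces tmp4, and a final multiplication with g produces z.]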
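To make the link between the instruction list and the dataflow graph concrete, the sketch below parses the five instructions into a dictionary keyed by result name and recovers the primary inputs. The textual format and the helper names `parse_dfg` and `primary_inputs` are assumptions for illustration; the slides do not specify the mapper's actual input format.

```python
PROGRAM = """
fadd tmp1, a, b
fadd tmp2, c, d
fmul tmp3, tmp1, tmp2
fsqrt tmp4, tmp3
fmul z, tmp4, g
"""


def parse_dfg(program: str) -> dict:
    """Map each result name to (operation, [source operands])."""
    nodes = {}
    for line in program.strip().splitlines():
        op, rest = line.split(None, 1)
        dest, *srcs = [s.strip() for s in rest.split(",")]
        nodes[dest] = (op, srcs)
    return nodes


def primary_inputs(nodes: dict) -> list:
    """Operands that no instruction produces, i.e. the inputs of the graph."""
    return sorted({s for _, srcs in nodes.values() for s in srcs if s not in nodes})


dfg = parse_dfg(PROGRAM)
print(dfg["tmp3"])
print(primary_inputs(dfg))
```

Each dictionary entry is one node of the graph, and its source list gives the incoming edges.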
17
Mapping example
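fadd tmp1, a, b
fadd tmp2, c, d
[Figure: one coarse-grained unit (two fadd blocks, two fmul blocks, feedback registers); inputs a, b and c, d feed the fadd blocks, which produce tmp1 and tmp2.]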
18
Mapping example
fmul tmp3, tmp1, tmp2
[Figure: the same coarse-grained unit; tmp1 and tmp2 feed an fmul block, which produces tmp3.]
19
Mapping example
fsqrt tmp4, tmp3
• No dedicated square root block, so a fine-grained unit is used
[Figure: the coarse-grained unit outputs tmp3 to a floating point square root implemented by the fine-grained unit.]
20
Mapping example
fmul z, tmp4, g
• Instantiate another coarse-grained unit and connect everything together
[Figure: final circuit. The first coarse-grained unit (inputs a, b, c, d; tmp1, tmp2) feeds the soft-core floating point square root implemented by the fine-grained unit; a second coarse-grained unit multiplies the result by g to produce z.]
21
Evaluation
• How effective is the technology mapper? Compare with the optimal mapping
• How much power/energy can be saved by introducing coarse-grained units? Compare with existing FPGA devices
22
Evaluation
• 8 benchmark circuits
  • DSP computation kernels, e.g. bfly
  • Linear algebra, e.g. mm3
  • Complete applications, e.g. bgm
  • Synthetic benchmarks, e.g. syn2
• Circuits are mapped to the hybrid FPGA using the technology mapper
• Synthesized to Xilinx Virtex II devices for comparison
23
Evaluation: Technology mapper
24
Evaluation: Power reduction
* syn7 is implemented on XC2V8000-5
25
Evaluation: Energy reduction
Energy reduced by 14 times on average
26
Future work
• Integration of the technology mapper into existing compilers (Trident, fly)
• Simulation-based power estimation flow for more accurate results
• Power estimation comparison with the HHVPR [1] flow
• Static power consumption?
[1] N. Choy et al., “Activity-Based Power Estimation and Characterization of DSP and Multiplier Blocks in FPGAs”, FPT 2006
27
Conclusion
• Rapid power estimation flow for hybrid FPGAs: VEB flow, FPGA power spreadsheet, ASIC power compiler
• Technology mapper for hybrid FPGAs
  • Targets different coarse-grained units
  • DFG input to work with existing compilers
  • Produces netlist and bitstream
• Assessment of hybrid FPGA power consumption
  • Power reduced by 4 times
  • Energy reduced by 14 times