Explicit Modeling of Control and Data for Improved NoC Router Estimation

Andrew B. Kahng+*, Bill Lin* and Siddhartha Nath+
UCSD CSE+ and ECE* Departments
{abk, billlin, sinath}@eng.ucsd.edu


Outline

• Motivation
• Our work: Overview
• Methodology
• Flit-level power estimation
• Summary


NoC Modeling So Far… (ORION)

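[Figure: router model with arbiter, crossbar (XBAR) and input buffers (BUFI, BUFE, BUFW, BUFN, BUFS), connected by links from SRC to SINK]

• ORION1.0 (2002): fixed logic template, 6NOR + 2INV + DFF
• ORION2.0 (2009): same 6NOR + 2INV + DFF template, plus leakage power and clock power models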


What Is The Problem?

• RTL code mismatch
• Logic transformation and technology mapping mismatch

[Figure: the same router model (arbiter, XBAR, input buffers, links from SRC to SINK), annotated with the 6NOR + 2INV + DFF template]


How Bad Is It?

Router RTL generators:
• Netmaker – Cambridge, UK
• Stanford NoC – Stanford

[Figure: instance count vs. # ports (5, 6, 8, 10) for ORION2.0, NetMaker and Stanford NoC; annotated discrepancy: 460%]
[Figure: instance count vs. flit width (16, 24, 32, 64 bits) for ORION2.0, NetMaker and Stanford NoC; annotated discrepancy: 89%]

Why such large errors?
• Assumed logic template inaccurate
• Control logic not modeled
• Implementation details missing


We Propose: Step 1
• Derive router component block parametric models from post-synthesis netlists
• Key idea: no assumed logic template; component models are derived from actual RTL synthesized with cell libraries

Notation: P = # ports, V = # VCs, B = # buffers, F = flit width

Varying P (instance count scales as ~$P^2$):
P    V   B   F    # Instances
10   2   8   32   3300
8    2   8   32   2112
5    2   8   32   825

Varying F (instance count scales as ~$F$):
P   V   B   F    # Instances
5   2   8   16   400
5   2   8   32   825
5   2   8   64   1673

Hence $\mathrm{XBAR} \sim P^2 F$
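The ~$P^2$ and ~$F$ trends in the two tables can be checked numerically. A minimal sketch, assuming a pure power law $y \approx c \cdot x^k$ and estimating the exponent $k$ as the least-squares slope in log-log space; the helper name `loglog_slope` is hypothetical and not part of the authors' flow.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of ln(y) vs. ln(x), i.e. the exponent k in y ~ x**k."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Crossbar instance counts from the two tables (V = 2, B = 8 throughout).
ports = [5, 8, 10]              # F = 32
insts_vs_p = [825, 2112, 3300]
flit_widths = [16, 32, 64]      # P = 5
insts_vs_f = [400, 825, 1673]

k_p = loglog_slope(ports, insts_vs_p)
k_f = loglog_slope(flit_widths, insts_vs_f)
print(f"exponent of P: {k_p:.2f}, exponent of F: {k_f:.2f}")
```

On these points the exponent of P comes out at 2.0 (825, 2112 and 3300 are exactly 33·P²) and the exponent of F at about 1.03, consistent with $\mathrm{XBAR} \sim P^2 F$.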


We Propose: Step 2

• Automatic fitting of models with post-P&R power and area

$\mathrm{XBAR} \sim P^2 F$

P   V   B   F    Area
5   2   8   16   1439.9
5   2   8   32   2916.0
5   2   8   64   5867.4
8   2   8   32   7465.1

$\mathrm{XBAR}^{LSQR}_{area} = a_1 \cdot P^2 F + a_0$

• Key idea: capture implementation details using automatic regression fit
• Characterization is performed only once and is reusable across multiple design space explorations
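To make the fit concrete, the sketch below estimates $a_1$ and $a_0$ from the four area rows in the table. It is a minimal illustration using closed-form ordinary least squares; the slides label the fit "LSQR" but do not specify the solver, and `fit_line` and `xbar_area` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a1 * xs + a0; returns (a1, a0)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a1 = sxy / sxx
    return a1, my - a1 * mx

# (P, F, post-P&R area) rows from the table; V = 2 and B = 8 for all rows.
rows = [(5, 16, 1439.9), (5, 32, 2916.0), (5, 64, 5867.4), (8, 32, 7465.1)]
x = [p * p * f for p, f, _ in rows]   # model term P^2 * F
y = [area for _, _, area in rows]
a1, a0 = fit_line(x, y)

def xbar_area(p, f):
    """Predicted crossbar area from the fitted model a1 * P^2 * F + a0."""
    return a1 * p * p * f + a0

print(f"a1 = {a1:.3f}, a0 = {a0:.1f}")
print(f"predicted area at P=6, F=32: {xbar_area(6, 32):.1f}")
```

On these four points the fit gives $a_1 \approx 3.66$ and $a_0 \approx -16$. Once the coefficients are characterized, evaluating a new (P, F) point is a single multiply-add, which is why the one-time characterization can be reused across design space explorations.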


Model Development

• Two RTL generators:
  – Netmaker (Cambridge, UK)
  – Stanford NoC
• SP&R tools:
  – Cadence RC & Synopsys DC for hierarchical synthesis to analyze each block
  – Cadence SOC Encounter for P&R

[Figure: model development flow. NoC router RTL generators, with implementation parameters (clock frequency) and µArch parameters (P, V, B, F) → synthesis and P&R (DC/RC, SOCE) → analysis of blocks (XBAR, SW & VC arbiter, input & output buffers) → new models for each component block]

Component models:
• XBAR: $P^2 F$
• SWVC: $9(P^2V^2 + P^2 + PV - P)$
• InBUF: $180PV + 2PVBF + 2P^2VB + 3PVB + 5P^2B + P^2 + PF + 15P$
• OutBUF: $25P + 80PV$
• CLKCTRL: $0.02(\mathrm{SWVC} + \mathrm{InBUF} + \mathrm{OutBUF})$
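A minimal sketch that evaluates the component models above for a given (P, V, B, F). The `RouterConfig` type and `component_models` function are hypothetical names; the outputs are raw model-term values that the regression steps then scale, not calibrated instance counts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RouterConfig:
    P: int  # number of ports
    V: int  # number of virtual channels
    B: int  # number of buffers
    F: int  # flit width in bits

def component_models(c: RouterConfig) -> dict:
    """Evaluate the per-block model expressions from the component model table."""
    P, V, B, F = c.P, c.V, c.B, c.F
    xbar = P**2 * F
    swvc = 9 * (P**2 * V**2 + P**2 + P * V - P)
    inbuf = (180 * P * V + 2 * P * V * B * F + 2 * P**2 * V * B
             + 3 * P * V * B + 5 * P**2 * B + P**2 + P * F + 15 * P)
    outbuf = 25 * P + 80 * P * V
    # Clock/control overhead is modeled as 2% of the other control and buffer blocks.
    clkctrl = 0.02 * (swvc + inbuf + outbuf)
    return {"XBAR": xbar, "SWVC": swvc, "InBUF": inbuf,
            "OutBUF": outbuf, "CLKCTRL": clkctrl}

models = component_models(RouterConfig(P=5, V=2, B=8, F=32))
for block, value in models.items():
    print(f"{block:8s} {value:10.1f}")
```

For P = 5, V = 2, B = 8, F = 32 the crossbar term evaluates to 800, close to the 825 instances listed for the same configuration in the Step 1 table; the per-block regression fit absorbs the remaining difference.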


Overall Methodology

• Manual
  – Quick and easy
  – Misses implementation details
• LSQR
  – Accurate (captures implementation details)
  – One-time overhead (generation of P&R training data points)

[Figure: overall methodology flow. Elements: manual estimates for gate count → ORION_NEW models; basic regression fit (LSQR); technology library (cell area, cell leakage, pin cap., internal energy); post-P&R data per block (std. cell count & area, leakage power, internal power, switching power); outputs: area and power (leakage, internal, switching)]


Results: Area And Power

[Figure: POWER estimation error (avg, max, min; 0 to 100%) of NEW vs. ORION2.0 at 45nm and 65nm for Stanford NoC and NetMaker; 6.5x reduction in worst-case error]
[Figure: AREA estimation error (avg, max, min; 0 to 100%) of NEW vs. ORION2.0 at 45nm and 65nm for Stanford NoC and NetMaker; 4x reduction in worst-case error]

• Methodology scales across technologies and router RTL generators


Flit-level Power Estimation
• Dynamic power estimation using flit-level bit encodings
• Integrated with a full-system NoC simulator (GARNET)

[Figure: flit-level power estimation flow. Post-P&R router netlist + testbench → gate-level simulation → VCD → power analysis → power report → regression fit (with ORION_NEW models) → flit-level power model → GARNET (gem5) → flit-level power estimates]
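The slides do not give the functional form of the flit-level power model. As a hedged illustration only, the sketch below assumes that dynamic energy per flit grows linearly with the number of bit toggles relative to the previous flit on the same port, with coefficients that would come from the regression fit. `flit_energy`, `e_toggle`, `e_flit` and the trace values are all hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions that differ between two flit encodings."""
    return bin(a ^ b).count("1")

def flit_energy(flits, e_toggle, e_flit, initial=0):
    """Estimate dynamic energy for a sequence of flits on one router port.

    Assumed (hypothetical) model: each flit costs a fixed e_flit plus
    e_toggle per bit that changes relative to the previous flit.
    """
    total = 0.0
    prev = initial
    for flit in flits:
        total += e_flit + e_toggle * hamming(prev, flit)
        prev = flit
    return total

# Hypothetical 8-bit flits and coefficients (arbitrary energy units).
trace = [0b00000000, 0b11111111, 0b10101010, 0b10101010]
print(flit_energy(trace, e_toggle=0.5, e_flit=1.0))
```

Counting toggles rather than treating every flit alike is what lets such a model reflect the bit encoding of the traffic; in a simulator integration, a function of this kind would be called per flit traversal.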


Results: Flit-level Power

• Accurate estimation of flit-level dynamic power

[Figure: flit-level dynamic power estimation error (avg, max, min; 0 to 80%) of the flit-level model vs. ORION2.0 for Stanford NoC and NetMaker; 3.6x reduction in worst-case error]


Summary
• New hybrid modeling methodology: relax the template mindset
  – Explicitly models control and data signals
  – Captures RTL and implementation details
• Using the proposed parametric regression methodology, worst-case estimation errors are reduced by a factor of
  – 6.5x from ORION2.0 for power
  – 4x from ORION2.0 for area
• We propose an application of our methodology to flit-level dynamic power modeling, with integration into GARNET
  – 3.6x worst-case error reduction in dynamic power estimation
• Ongoing: non-parametric modeling of post-P&R power and area


Thank You!


Backup


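Regression analysis approach
• Multi-step regression fit
  – Step 1: Fit instances of each router component with post-layout instance counts
    $a_1 \cdot \mathrm{Insts}^{model}_{\langle component \rangle} + a_0 = \mathrm{Insts}^{tool}_{\langle component \rangle}$
  – Step 2a: Fit area of each router component with post-layout area
    $b_1 \cdot \mathrm{InstsR}^{model}_{\langle component \rangle} + b_0 = \mathrm{Area}^{tool}_{\langle component \rangle}$, where $\mathrm{InstsR}^{model}_{\langle component \rangle} = a_1 \cdot \mathrm{Insts}^{model}_{\langle component \rangle} + a_0$
  – Step 2b: Fit power of each router component with post-layout power (leakage, internal and switching separately)
    $\{c_5, d_5, e_5\} \cdot \mathrm{InstsR}^{model}_{\mathrm{XBAR}} + \{c_4, d_4, e_4\} \cdot \mathrm{InstsR}^{model}_{\mathrm{SWVC}} + \{c_3, d_3, e_3\} \cdot \mathrm{InstsR}^{model}_{\mathrm{InBUF}} + \{c_2, d_2, e_2\} \cdot \mathrm{InstsR}^{model}_{\mathrm{OutBUF}} + \{c_1, d_1, e_1\} \cdot \mathrm{InstsR}^{model}_{\mathrm{CLKCTRL}} + \{c_0, d_0, e_0\} = \{P^{tool}_{leak}, P^{tool}_{int}, P^{tool}_{SW}\}$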
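Steps 1 and 2a chain two linear fits: the first calibrates the model term against tool instance counts, and the second maps the calibrated count to area. A minimal sketch for the crossbar, reusing the XBAR instance counts and areas quoted on the Step 1 and Step 2 slides as stand-in training data (the Step 1 counts there are post-synthesis, whereas this slide refers to post-layout counts). `fit_line`, `xbar_model` and `insts_r` are hypothetical helpers, and closed-form ordinary least squares stands in for whatever solver the authors used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys = k1 * xs + k0; returns (k1, k0)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    k1 = sxy / sxx
    return k1, my - k1 * mx

def xbar_model(p, f):
    """Crossbar model term Insts_model = P^2 * F."""
    return p * p * f

# Step 1: (P, F, XBAR instance count) rows from the Step 1 tables.
inst_rows = [(10, 32, 3300), (8, 32, 2112), (5, 32, 825), (5, 16, 400), (5, 64, 1673)]
a1, a0 = fit_line([xbar_model(p, f) for p, f, _ in inst_rows],
                  [count for _, _, count in inst_rows])

def insts_r(p, f):
    """Regressed instance count InstsR_model = a1 * Insts_model + a0."""
    return a1 * xbar_model(p, f) + a0

# Step 2a: (P, F, post-P&R XBAR area) rows from the Step 2 table.
area_rows = [(5, 16, 1439.9), (5, 32, 2916.0), (5, 64, 5867.4), (8, 32, 7465.1)]
b1, b0 = fit_line([insts_r(p, f) for p, f, _ in area_rows],
                  [area for _, _, area in area_rows])

print(f"Step 1:  a1 = {a1:.4f}, a0 = {a0:.1f}")
print(f"Step 2a: b1 = {b1:.4f}, b0 = {b0:.1f}")
```

Step 2b differs in that it regresses each power component on all five block terms at once, so it needs a multivariate least-squares solve (for example `numpy.linalg.lstsq`) rather than the two-parameter closed form used here.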


Related work

• Architecture templates
  – ORION2.0
• Gate-level analytical models
• Parametric regression
  – Pre- and post-layout power estimation
  – RTL simulations
• Non-parametric regression
  – MARS

[Figure: taxonomy of NoC modeling approaches: circuit models (arch templates, analytical) and regression models (parametric, non-parametric), with ORION_NEW + regression and flit-level modeling marked]

Significant departure: relax the “template” mindset


Results

[Figure: instance count vs. # ports (5, 6, 8, 10) for ORION2.0, NetMaker and Stanford NoC]
[Figure: instance count vs. # ports (5, 6, 8, 10) for NEW, NetMaker and Stanford NoC]
[Figure: instance count vs. flit width (16, 24, 32, 64 bits) for ORION2.0, NetMaker and Stanford NoC]
[Figure: instance count vs. flit width (16, 24, 32, 64 bits) for NEW, NetMaker and Stanford NoC]

• Avg. estimation error in # instances reduced from 109.5% to 8.8%
  – Avg. estimation error in area reduced to 9.8%
  – Avg. estimation error in power reduced to 4.58%