21
Circuit Modeling for Practical Many-core Architecture Design Exploration Redefining design abstractions Dean Truong Bevan Baas VLSI Computation Lab University of California, Davis

Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

  • Upload
    dothien

  • View
    227

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Circuit Modeling for Practical Many-core Architecture

Design Exploration

Redefining design abstractions

Dean TruongBevan Baas

VLSI Computation LabUniversity of California, Davis

Page 2: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Outline

Motivation Circuit Model of Two Cores DVFS Control and Results Improved Model Tomasulo’s Algorithm Conclusions

Page 3: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Motivation We have the hardware Many-core chip capable of per-core DVFS Inter-core communication through FIFOs

We need an accurate and intuitive modeling of the system to find an optimal DVFS scheme

Achieved through appropriate analogies Circuit analogy is useful because circuit analysis has

well developed CAD and mathematical tools Control analogy is useful because control systems

analysis is very mature Model many-core systems like circuits?

Page 4: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Outline

Motivation Circuit Model of Two Cores DVFS Control and Results Improved Model Tomasulo’s Algorithm Conclusions

Page 5: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Circuit Model of Two Cores ProducerConsumer application Charge (Q) = sample Current (I) = samples/sec;

data rate Conductance (G) = cycles/sec;

(core) frequency Voltage (V) = samples/cycle;

(CPI x IC)-1

CPI = cycles per instruction IC = instructions per sample

Ohms Law: I = V x G data rate = samples/cycle x

cycles/sec Capacitance (C) = Q/V = cycles Time = C/G = seconds!

Producer(fp)

Consumer(fc)

CFIFO

ProducerConductance

ProducerVoltage

ConsumerConductance

UFIFO iFIFO

iCiP

ConsumerVoltage

UFIFO

UConsumer

iFIFO

iP

iC

+

—+

++GC

GP

FIFOCs1

UProducer

Page 6: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Circuit Model (cont.) Want UFIFO = 0 UFIFO > 0: FIFO full UFIFO < 0: FIFO empty When UFIFO ≠ 0 core(s)

are stalling (UFIFO/UCore)% of the time

Continuous time model DVFS algorithm only acts

over long periods compared to core frequencies

Tcontrol >> 1/fCore

Page 7: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Outline

Motivation Circuit Model of Two Cores DVFS Control and Results Improved Model Tomasulo’s Algorithm Conclusions

Page 8: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

DVFS Control Strategy Varying the conductance

(frequency) makes control nonlinear Approximate: two linear

control systems selected by a mux

Controller selects High and Low states through the mux Voltage dithering!

Controller samples UFIFOevery dither period Controller is “discrete”

Modeled in MATLAB and implemented in software using processor DVFS instructions

Page 9: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Measured Results Worst case:

46.6 mW at VddHigh = 1.3 V; fH = 1.2 GHz

Ideal case: 12.0 mW at

VddHigh = 0.9 V; fH = 550 MHz

DVFS: 15.6 mW at

VddHigh = 1.3 V, VddLow = 0.8 V; fH = 1.2 GHz, fL = 300 MHz

Settling time Several hours!

PL = 74.4%, PH = 25.6%

Ideal: PL = 72.22%, PH = 27.78%

Page 10: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Outline

Motivation Circuit Model of Two Cores DVFS Control and Results Improved Model Tomasulo’s Algorithm Conclusions

Page 11: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Saturation Modeling

So far our circuit model does not saturate the data rate when UFIFO ≠ 0 If UFIFO < 0,

iP > UProducer • GP

If UFIFO > 0, iC > UConsumer • GC

Have to add current limiters

iCiP

CFIFO

GP

UProducer

GC

UFIFO

UConsumer

CFIFO

GP

UProducer

GC

UConsumer

Page 12: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Saturation Modeling (cont.) MOSFETs have linear and

saturation regions Linear = resistor Saturation = current limiter

PMOS: set Vth = 0, if UFIFO ≤ 0 iP = -IDS = (k/2)VGS

2 , saturation: VDS ≤ VGS VGS = -UP

Want: iP = UPGP k = 2GP/UP

Else linear: VDS > VGS iP = k(VGSVDS – (VDS

2)/2) = -(2GPVDS – (VDS

2)/(2UP)) Want: iP = (UP – UFIFO)GP = -VDSGP

UP

UFIFO

iPDS

G

I DS

VDS

Linea

rSaturation

Page 13: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Improved Model Make a new MOS model in SPICE to

accommodate modified linear region Can adjust conductance by changing W/L Switch between two MOSFETs—High and Low

iCiP

CFIFOUProducer

UFIFO

UConsumer

MCL

MCHMPH

MPL

DVFS ControllerPMOS (producer)MPH: (W/L)HighMPL: (W/L)Low

NMOS (consumer)MCH: (W/L)HighMCL: (W/L)Low

Page 14: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Outline

Motivation Circuit Model of Two Cores DVFS Control and Results Improved Model Tomasulo’s Algorithm Conclusions

Page 15: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Tomasulo’s Algorithm

Used in the IBM 360/91 floating-point unit to allow out-of-order execution

Tomasulo’s algorithm enables out-of-order execution through register renaming via reservation stations and load/store buffers

Let us see if we can model this algorithm… How much can be abstracted into a circuit

element or logic gate?

Page 16: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

InstructionQueue

Load/StoreAddressBuffers

Memory UnitMemory

Instruction Unit Register

File

Address Unit

Reservation Station

X

Reservation StationStore

Data Buffer

Common Data Bus (CDB)

Tomasulo’s Algorithm (cont.)

Page 17: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Tomasulo’s Circuit

CInstr

UQueue

Inst

ruct

ion_

Type

CAdd_Qk

CAdd_Qj

CMult_Qj

CMult_Qk

CLoad_Qj-A

CStore_Qj-Qk-A

ULoad_Qj-A

Issu

e Add

_Qk

Issu

e Mul

t_Q

j

Issu

e Mul

t_Q

k

Issu

e Loa

d_Q

j-A

Issu

e Sto

re_Q

j-Qk-

A

Issu

e Add

_Qj

UIssue

Inst

ruct

ion_

Rd

UMult_Qk

UStore_Qj-Qk-A

UMult_Qj

UAdd_Qk

UAdd_Qj

Full S

tore

Full L

oad

Full M

ult

Full A

dd

Instruction Decode

Pulse Generators

UBias_Half

RS_Update

InstructionUnit

IInstr

CDB Arbiter

CD

BW

rBk A

dd

CDBWrBkMult

CDBWrBkLoad

CDBWrBkStore

Page 18: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Tomasulo’s Circuit (cont.)

WrD

ata A

dd_V

j

WrD

ata A

dd_V

k

WrD

ata M

ult_

Vj

WrD

ata M

ult_

Vk

WrD

ata L

oad_

Vj

WrD

ata S

tore

_Vj-V

k

CAdd_Vk

CAdd_Vj

CMult_Vj

CMult_Vk

CLoad_Vj-A

CStore_Vj-Vk

ULoad_Vj-A

UMult_Vk

UStore_Vj-Vk

UMult_Vj

UAdd_Vk

UAdd_Vj

UReg/CDB

CD

BW

rBk A

dd

CDBWrBkMult

CDBWrBkLoad

CDBWrBkStore

Exe

cute

Mul

t

Exe

cute

Add

Exe

cute

Load

Exe

cute

Sto

re

MemoryUnit

WrBkMem

Reg

iste

r_ID Pulse Generators

Register Status

CDBWrBkRegister

RS_ValueUpdate

UBias_LSB

InstructionUnit

UAdd

WrBkAdd

UMult

WrBkMultLAdd_Pipeline

LMult_Pipeline

CDBWrBkForwarding

CDB Arbiter

Page 19: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Outline

Motivation Circuit Model of Two Cores DVFS Control and Results Improved Model Tomasulo’s Algorithm Conclusions

Page 20: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Conclusions System level modeling with circuit equivalents Reuse circuit design principles and CAD Easily translates into control theory or signals and

systems analysis Reuse mixed-signal simulation tools to develop

higher level systems Optimal DVFS requires a holistic approach

from the circuit layer to application layer Circuit modeling can ease translation between layers

The physical world can be modeled as circuits Streamlines development of cyber-physical systems

Page 21: Circuit Modeling for Practical Many-core Architecture ...vcl.ece.ucdavis.edu/pubs/2010.06.DAC/DVFS_DAC_v1.pdf · Circuit Modeling for Practical Many-core Architecture Design Exploration

Acknowledgements ST Microelectronics NSF Grant 430090, 0903549 and CAREER award

546907 Intel SRC GRC Grant 1598, 1971 and CSR Grant 1659 Intellasys UC Micro SEM J.-P. Schoellkopf, K. Torki, S. Dumont,

Y.-P. Cheng, R. Krishnamurthy and M. Anders