System-level Power Optimization

Contents

Low Power System Implementation Techniques
  Circuit level
    Clock gating
    MTCMOS
    Multiple voltage supply
  Architecture level
    Memory Optimization
    Bus Optimization
Dynamic Power Management in System Level
  Introduction to DPM
  Structure of DPM
  Component-level DPM scheme
  DPM Policy
  Dynamic Voltage Scaling

Circuit-Level Low-Power System Implementation Techniques

Clock gating
  Most popular method for reducing the power of clock signals
  Needs a circuit to generate the enable signal
    Increases complexity of control logic
    Timing is critical to avoid clock glitches at the AND gate output
    Additional gate delay on the clock signal -> clock skew

Circuit-Level Low-Power System Implementation Techniques

Power gating ; disconnecting the power source
  Applicable for each voltage island
  Long transient due to large capacitance
  Can generate noise due to a large inductive component
  Needs a good power switch

MTCMOS ; a kind of power gating
  Low VTH devices in logic to maintain performance when active
  High VTH current switch (header or footer) to cut off the leakage path when asleep
  The scheduling algorithm that controls the sleep signal is important.

[Figure: MTCMOS circuit. Labels: header, footer, sleep, Virtual VDD, Virtual GND, Input, Output]

Circuit-Level Low-Power System Implementation Techniques

Multiple Supply Voltages
  Slows down non-critical paths with a lower supply voltage
  Two or more power grids
  Needs high-efficiency voltage converters for dynamic voltage scaling ; down-conversion is cheaper than up-conversion.
  The dynamic power scheduling algorithm is important.

[Figure: logic supplied from two power grids. Labels: Low voltage supply, High voltage supply, Critical path: need high speed logic]

Architecture-Level Low-Power System Implementation Techniques

Memory Optimization
  Code density optimization
    Goal
      Minimize program memory occupation to reduce the bandwidth of processor-memory communication
    Approaches
      Employ custom instruction sets
      Object code compression

Memory Optimization

Custom instruction set
  Instructions are shorter than those of the regular instruction set
  Example : ARM Thumb code (16-bit instructions)
  Needs a specific architecture to support 16-bit instructions

[Figure: memory layout of Inst 1 to Inst 5 in two encodings]

In this case, 3/5 bandwidth reduction

Memory Optimization

Object code compression
  The size is the same for all instructions, but some or all instructions are encoded and saved in instruction memory.
  Available solution for embedded processors
    The same architecture can be used for different subsets of instructions
    Exploits the small subset of instructions used by firmware
  Approaches
    Full code compression
    Selective code compression

Memory Optimization

Full code compression
  Replace all instructions with binary patterns of minimum width, $\lceil \log_2 N \rceil$, where N is the number of instructions in the instruction set
  Advantage
    Memory bandwidth for instructions is decreased.
    Advantageous when $k > \log_2 N$, where k is the uncompressed instruction width in bits
  Disadvantage
    Size of the IDT may be very large because N is not small.
    $\log_2 N$ may not be a multiple of 8.

IDT : Instruction Decompression Table

[Figure: instruction fetch without compression (k bits) and with compression (log2 N bits). Labels: Core, Memory, Addr., Inst.]
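
The width trade-off above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original slides: the function names and the example values (k = 32, N = 1000) are assumptions chosen for illustration, and the IDT size assumes one uncompressed k-bit entry per instruction.

```python
def compressed_width(num_instructions: int) -> int:
    """Minimum code width in bits, ceil(log2 N), for N distinct instructions."""
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1 without float rounding.
    return (num_instructions - 1).bit_length()


def fetch_bandwidth_ratio(k_bits: int, num_instructions: int) -> float:
    """Compressed fetch width divided by the original k-bit width."""
    return compressed_width(num_instructions) / k_bits


def idt_size_bits(k_bits: int, num_instructions: int) -> int:
    """IDT storage, assuming one uncompressed k-bit entry per instruction."""
    return num_instructions * k_bits


# Hypothetical example: 32-bit instructions, 1000 distinct instructions.
print(compressed_width(1000))            # 10
print(fetch_bandwidth_ratio(32, 1000))   # 0.3125
print(idt_size_bits(32, 1000))           # 32000
```

With these assumed numbers the fetch width drops from 32 to 10 bits, but 10 is not a multiple of 8 and the table needs 32000 bits, which is the disadvantage the slide points out.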

Memory Optimization

Selective Code Compression
  Most program traces are covered by a small subset of instructions.
  Compress only that subset: the instructions that maximize program coverage
  The program is a mix of compressed and uncompressed instructions.

[Figure: selective decompression path with 8-bit codes. Labels: Core, Memory, Buffer, Controller]
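
One way to pick the subset to compress is to rank instructions by how often they occur in a program trace and keep as many as the table can hold. The sketch below assumes an 8-bit code (as in the figure label) gives a table of 2^8 entries; `build_idt`, the trace format, and the example trace are hypothetical, and any escape code needed to mark uncompressed instructions is not modeled.

```python
from collections import Counter


def build_idt(trace, index_bits=8):
    """Choose the most frequent instructions for a table of 2**index_bits entries.

    Returns the table (most frequent first) and the fraction of the trace it covers.
    """
    capacity = 2 ** index_bits
    counts = Counter(trace)
    table = [inst for inst, _ in counts.most_common(capacity)]
    covered = sum(counts[inst] for inst in table)
    coverage = covered / len(trace) if trace else 0.0
    return table, coverage


# Hypothetical trace; a 1-bit index keeps the example small (2 table entries).
trace = ["add"] * 6 + ["ldr"] * 3 + ["mul"]
table, coverage = build_idt(trace, index_bits=1)
print(table, coverage)
```

Here the two most frequent instructions cover 9 of the 10 trace entries, which illustrates why a small fixed-size table can still capture most fetches.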

Memory Optimization

Selective code compression (continued)
  Advantage
    Size of the IDT is fixed and limited.
    Instruction fetching/decompression logic has reduced complexity.
  Disadvantage
    Requires a controller to handle instruction fetching

Memory Optimization

Data density optimization
  Same principle as code density optimization
  Aims to reduce memory traffic and the dynamic size of the data set
  More complex than code compression, because both compression and decompression are required
  Hardware compression/decompression unit needed
    Design trade-off between speed and power

Architecture-Level Low-Power System Implementation Techniques

Bus power optimization
  A large amount of power is dissipated in data communication over heavily loaded on-chip or off-chip buses.
  Reduce switching activity on buses via signal encoding to save power
  Approaches
    Bus-invert coding
    Gray code addressing

$P_{Bus} = n \times C \times V_{DD}^{2} \times \mathit{freq} \times \mathit{activity}$, for an n-bit bus

Bus Optimization

Bus-invert coding
  Add a redundant line INV to the bus
    When INV = 0, the data is equal to the remaining bus lines
    When INV = 1, the data is the complement of the remaining bus lines
  At each cycle, decide whether sending the true or the complemented signal leads to fewer toggles

[Figure: bus-invert encoder and decoder. Labels: Source data, Decision logic, Polarity, INV signal, Data bus, Received data]
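
The per-cycle decision can be sketched in software. This is a simplified illustration under stated assumptions: the bus is assumed to start at all zeros, the decision compares only the data lines (toggles on the INV line itself are counted separately, not fed back into the decision), and all function names are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def bus_invert_encode(words, width=8):
    """Return (bus_value, inv) pairs; invert when more than half the lines would toggle."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the data lines
    encoded = []
    for word in words:
        word &= mask
        if 2 * popcount(prev ^ word) > width:
            sent, inv = (~word) & mask, 1   # send the complement, raise INV
        else:
            sent, inv = word, 0             # send the true value
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, inv) pairs."""
    mask = (1 << width) - 1
    return [(~sent) & mask if inv else sent for sent, inv in encoded]


def count_toggles(values, start=0):
    """Total bit transitions on a set of lines driven with `values` in order."""
    total, prev = 0, start
    for v in values:
        total += popcount(prev ^ v)
        prev = v
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
enc = bus_invert_encode(words)
assert bus_invert_decode(enc) == words
plain = count_toggles(words)
coded = count_toggles([s for s, _ in enc]) + count_toggles([i for _, i in enc])
print(plain, coded)  # 24 3
```

For this worst-case alternating pattern the data lines never toggle and only the INV line switches, so the activity term in the bus power expression falls sharply. Random data gives a much smaller gain.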

Bus Optimization

Gray code addressing
  Most instruction addresses are consecutive
    Use Gray code for addresses
  Word-oriented machines
    Increments by 4 (32-bit) or by 8 (64-bit)
    Modify the Gray code to switch 1 bit per increment
    Gray code adder needed for jumps

Dec   Gray(i=1)   Gray(i=4)   Gray(i=8)
0     0000        0000        0000
1     0001        0001        0001
2     0011        0011        0011
3     0010        0010        0010
4     0110        0100        0110
5     0111        0101        0111
6     0101        0111        0101
7     0100        0110        0100
8     1100        1100        1000

i : increment
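
The three code columns can be reproduced by Gray-coding the address bits above the increment and the bits below it separately. This is a sketch that matches the tabulated values for 0 to 8; whether the original design used exactly this construction is an assumption, and `gray_address` is a hypothetical name.

```python
def gray(n: int) -> int:
    """Standard reflected binary Gray code of n."""
    return n ^ (n >> 1)


def gray_address(addr: int, increment: int = 1) -> int:
    """Gray-style code in which stepping by `increment` flips exactly one bit.

    `increment` must be a power of two (1, 4, 8, ...). The bits above the
    increment and the bits below it are Gray-coded independently.
    """
    if increment < 1 or increment & (increment - 1):
        raise ValueError("increment must be a power of two")
    shift = increment.bit_length() - 1
    low = addr & (increment - 1)
    return (gray(addr >> shift) << shift) | gray(low)


for i in (1, 4, 8):
    print(i, [format(gray_address(n, i), "04b") for n in range(9)])

# Word-aligned addresses 0, 4, 8, ... differ in one bit per step when i = 4.
aligned = [gray_address(a, 4) for a in range(0, 64, 4)]
assert all(bin(x ^ y).count("1") == 1 for x, y in zip(aligned, aligned[1:]))
```

A jump to a non-consecutive address still needs a conversion from binary, which is the role of the Gray code adder mentioned above.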

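Introduction to DPM

Dynamic Power Management (DPM)
  DPM controls the power consumption of components based on their usage.
  Prediction of component usage is essential.
  Methods
    Shutdown (clock gating, power gating)
    Slowdown (frequency scaling, voltage scaling, VTH scaling)

Slowdown example: halve the frequency and scale the supply from VDD to 0.6 VDD

$P = C_L f V_{DD}^2$

$P' = C_L (f/2) (0.6 V_{DD})^2 = 0.18 P$

$E' = P' \cdot 2T = 0.36 E$, where $E = P T$ and the task takes twice as long at half the frequency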
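
The slowdown arithmetic can be verified numerically. The sketch below uses the dynamic power relation P = C_L f VDD^2 and a fixed cycle count per task; the load capacitance, frequency, voltage, and cycle count are arbitrary assumed values, and only the ratios matter.

```python
def dynamic_power(c_load: float, freq: float, vdd: float) -> float:
    """Dynamic power P = C_L * f * VDD^2 (watts for farads, hertz, volts)."""
    return c_load * freq * vdd ** 2


def task_energy(c_load: float, freq: float, vdd: float, cycles: float) -> float:
    """Energy of a task that needs a fixed number of cycles: E = P * (cycles / f)."""
    return dynamic_power(c_load, freq, vdd) * (cycles / freq)


# Assumed illustrative values.
C_LOAD, FREQ, VDD, CYCLES = 1e-9, 100e6, 1.0, 1e6

p_ratio = dynamic_power(C_LOAD, FREQ / 2, 0.6 * VDD) / dynamic_power(C_LOAD, FREQ, VDD)
e_ratio = task_energy(C_LOAD, FREQ / 2, 0.6 * VDD, CYCLES) / task_energy(C_LOAD, FREQ, VDD, CYCLES)
print(round(p_ratio, 2), round(e_ratio, 2))  # 0.18 0.36
```

Halving the frequency alone would leave the task energy unchanged, since the task runs twice as long. The saving comes from the lower supply voltage that the lower frequency permits.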

Structure of DPM

Levels of embodiment of DPM
  Component level
    Circuit, Block
    Power mode
  System level
    Policy
      The procedure that controls the power level of each module in a system

[Figure: a System containing a Policy and Block 1 to Block n, each block containing circuits. Labels on the connections: power mode, request]

Component-Level DPM Scheme

Circuit level
  Clock off by clock gating
  Power off by the footer/header of MTCMOS
  Multiple voltage supply

Block level
  Power off by shutting down the power supply to IPs
  When the power-off patterns of two blocks are similar, shut them down together.

[Figure: block-level power switches. Labels: VDD source, Virtual VDD, Virtual GND, GND source]

Component-Level DPM Scheme

Power mode
  Each state has a combination of enabled DPM techniques.
    ex) A system that uses clock gating and block shutdown
  Transitions between modes of operation have a cost.

Power mode   Clock gating   Block shutdown
Run          disabled       disabled
Idle         enabled        disabled
Sleep        enabled        enabled

[Figure: Power state machine for the StrongARM processor. States: Run (P=400mW), Idle (P=50mW), Sleep (P=0.16mW). Edge labels: 10μs, 90μs, Wait for interrupt, Wait for wake-up event]

SA-100 Microprocessor Technical Reference Manual, Intel, 1998
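
The mode table and the per-state power figures can be combined into a small energy estimate. This sketch assumes the mapping Run = 400 mW, Idle = 50 mW, Sleep = 0.16 mW. It deliberately ignores transition costs, because the extracted labels do not say which transitions the 10 μs and 90 μs figures belong to. The schedule format and function names are hypothetical.

```python
# Power per state in milliwatts (assumed mapping of the figure labels).
POWER_MW = {"run": 400.0, "idle": 50.0, "sleep": 0.16}

# (clock gating enabled, block shutdown enabled) per power mode.
MODE_TABLE = {
    "run": (False, False),
    "idle": (True, False),
    "sleep": (True, True),
}


def energy_uj(schedule):
    """Energy in microjoules for a list of (state, duration_us) entries.

    mW * us = nJ, so divide by 1000 to get uJ. Transition costs are not modeled.
    """
    return sum(POWER_MW[state] * duration_us / 1000.0 for state, duration_us in schedule)


busy_then_idle = [("run", 1000), ("idle", 1000)]
busy_then_sleep = [("run", 1000), ("sleep", 1000)]
print(energy_uj(busy_then_idle), energy_uj(busy_then_sleep))
```

A real policy has to weigh this saving against the latency and energy of entering and leaving the deeper state, which is why short idle periods may not justify Sleep.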

DPM Policy

Predictive technique
  Uses a regression equation based on previous “On” and “Off” times of the component to estimate the next “turn on” time.
  Limitation
    It cannot handle components with more than two power modes.

[Figure: two-state model with Running (R) and Sleep (S), connected by Wake-up and Go-to-sleep transitions]

[Figure: timelines of the predictive power management scheme and the pre-wakeup scheme. Legend: R: Running state, S: Sleep state, I: Idle state, E: Entering state, W: Waking up state]

M. Srivastava et al, “Predictive system shutdown and other architectural techniques for energy efficient programmable computation”, IEEE TVLSI, Vol. 4, No. 1, 1996

C.H. Hwang et al, “A predictive system shutdown method for energy saving of event-driven computation”, Proc. Int. Conf. on Computer Aided Design, pages 28-32, Nov. 1997
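
A predictive policy for a two-mode component can be sketched as follows. Note the substitution: instead of the regression equation mentioned above, this example uses an exponentially weighted average of past idle times, which is a simpler predictor chosen for illustration. The class name, the `alpha` weight, and the break-even threshold are all assumptions.

```python
class IdlePredictor:
    """Predict the next idle period as a weighted average of past idle periods."""

    def __init__(self, alpha: float = 0.5, initial: float = 0.0):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must be in [0, 1]")
        self.alpha = alpha          # weight of the most recent observation
        self.prediction = initial   # current estimate of the next idle period

    def update(self, observed_idle: float) -> float:
        """Fold in the idle period that just ended and return the new prediction."""
        self.prediction = self.alpha * observed_idle + (1.0 - self.alpha) * self.prediction
        return self.prediction


def should_sleep(predicted_idle: float, break_even: float) -> bool:
    """Go to sleep only if the predicted idle time exceeds the break-even time."""
    return predicted_idle > break_even


predictor = IdlePredictor(alpha=0.5)
for idle in (10.0, 20.0):
    predictor.update(idle)
print(predictor.prediction, should_sleep(predictor.prediction, break_even=8.0))
```

The single sleep/run decision reflects the limitation noted above: a predictor of this kind chooses between two modes and does not say which of several low-power states to enter.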

DPM Policy

Markov process
  A Markov process uses the previous state and pre-characterized probabilities to choose the next state.
  Power management optimization has been studied within the framework of Markov processes.
  When the system is modeled as Markov chains
    It can model the uncertainty in system power consumption and response times.
    It can model complex systems with many power states, buffers, and queues.
    It can compute power management policies that are globally optimum.

G.A. Paleologo et al, “Policy optimization for dynamic power management”, Proc. DAC, 1998
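
The Markov property itself, that the next state depends only on the current state and a fixed probability table, can be shown in a few lines. This is only a sketch of state sampling, not the policy optimization from the cited work. The state names and the transition probabilities are hypothetical.

```python
import random


def next_state(state, transitions, rng):
    """Sample the next state from transitions[state] = {next_state: probability}."""
    r = rng.random()
    cumulative = 0.0
    candidate = state
    for candidate, prob in transitions[state].items():
        cumulative += prob
        if r < cumulative:
            return candidate
    return candidate  # guards against rounding when probabilities sum to just under 1


def simulate(start, transitions, steps, seed=0):
    """Generate a state path of length steps + 1 with a seeded random generator."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], transitions, rng))
    return path


# Hypothetical service requestor: tends to stay idle, occasionally issues requests.
requestor = {
    "idle": {"idle": 0.9, "busy": 0.1},
    "busy": {"idle": 0.3, "busy": 0.7},
}
path = simulate("idle", requestor, steps=1000)
print(path.count("idle") / len(path))
```

In a stochastic DPM formulation, tables like this are characterized for each module, and the power manager's commands select which table governs the service provider.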

DPM Policy

[Figure: Structure of stochastic DPM. Labels: Power Manager, Service Requestor, queue, Service Provider, Request, Observation, Command, FSM of each module]

Dynamic Voltage Scaling

DVS
  Reducing VDD is the single most effective way to reduce power consumption.
  Reducing VDD is limited by the worst-case condition.
  The performance requirement varies with time.
  Solution
    Slowdown : perform the job with just-in-time performance
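
Just-in-time slowdown amounts to choosing the lowest clock frequency that still meets the deadline, so that the supply voltage can be lowered with it. The sketch below assumes a discrete set of operating frequencies and a known cycle count per job; the function name and the example numbers are hypothetical.

```python
def just_in_time_frequency(cycles: float, deadline_s: float, available_hz):
    """Return the lowest available frequency that finishes `cycles` within the deadline.

    Falls back to the highest available frequency when no setting is fast enough.
    """
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    required_hz = cycles / deadline_s
    feasible = [f for f in sorted(available_hz) if f >= required_hz]
    return feasible[0] if feasible else max(available_hz)


# Assumed operating points in hertz.
OPERATING_POINTS = [5e6, 20e6, 40e6, 80e6]

# One million cycles due in 40 ms needs at least 25 MHz.
print(just_in_time_frequency(1e6, 0.040, OPERATING_POINTS))  # 40000000.0
```

A full DVS controller would also account for the time and energy of each frequency and voltage transition before switching operating points.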

DVS Applied Processor

Transition overhead
  Max 70μs for a 5~80MHz transition
  Max 4μJ for a 5~80MHz transition

[Figure: DVS processor system. Labels: ARM Core, 16KB Cache, System Co-processor, Write Buffer, Regulator, Fdesired, VDD, VBat, System BUS, 64KB SRAM, I/O Chip]

T.D. Burd et al, “A dynamic voltage scaled microprocessor system”, IEEE JSSC, Nov. 2000

DPM using DVS on SoC

Divide the SoC into 4 power domains
  Persistent 3.3V : I/O drivers and receivers
  Persistent 1.0V : PLL
  Persistent 1.8V : RTC, sleep management
  DVS : 1.0V ~ 1.8V (10mV/μs)

K.J. Nowka et al, “A 32-bit PowerPC System-on-a-Chip with support for dynamic voltage scaling and dynamic frequency scaling”, IEEE JSSC, Nov. 2002
