Increasing Reliability of Performance-critical Pipeline structures

Niranjan Soundararajan

Advisors: Vijaykrishnan Narayanan, Anand Sivasubramaniam

Computer Systems Lab (CSL)
Microsystems Design Lab (MDL)
Computer Science and Engineering
The Pennsylvania State University

Reliability – Increasing Importance

Decreasing transistor size
More transistors
Power/Temperature Hotspots
Increasing Market Segments

HARDWARE RELIABILITY

Performance-critical pipeline structures

[Figure: pipeline block diagram. FRONT END: Fetch, Decode, BHT, BTB, Icache. BACK END: RAT, Alloc, Issue Queue, Load/Store Queue, Reorder Buffer, ALU, ARF, Dcache. Edge labels: Inst, Inst Retires]

Out-of-order entry activity
Back-to-back wakeup
Multi-width pipeline
Clock frequency increase

Transistor Failure

[Figure: Failure Rate vs. Time, with regions labeled Manufacturing Defects, Random Errors, and Wearout]

Solutions to reduce non-uniform aging due to NBTI, HCE on microprocessor structures
Solutions to address impact of Process Variations on Issue Queue
Soft Error impact of DVFS on vulnerability of GALS architectures
Bounding vulnerability of processor structures to provide reliability guarantees

Outline

Motivation
Contributions
Vulnerability bounding mechanisms
Other solutions
– Impact of DVFS on architectural vulnerability of GALS architectures
– Address process variations in issue queue
– Mitigate NBTI, HCE degradation in structures
Conclusion and Future work

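Introduction to Soft Errors

[Figure: particle strike on a transistor (n+ regions in a p substrate) creating electron-hole pairs; the stored value changes between 1 and 0, labeled Error]

Strike creates electron-hole pairs that can be absorbed by source/diffusion areas of the transistor to change the state of the device

Source: M. Tahoori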

Impact of Soft Errors

Severity
– In 2003, Fujitsu released SPARC64 with 80% of 200,000 latches covered by transient fault protection
Single Event Upset (SEU) model
Metrics
– MTBF: Mean Time Between Failures
– FIT: Failure in Time; 1 FIT = 1 failure in a billion hours
– FITeff = FITraw * AVF

[Figure: Severity of Soft Error Rates. Relative soft error rate increase (y-axis, 0 to 150) vs. chip feature size (x-axis: 180, 130, 90, 65, 45, 32, 22, 16). Source: Shekar Borkar, Intel 2004]
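
The two metrics and the relation FITeff = FITraw * AVF can be tried out with a few lines of Python. This is only a sketch: the function names and the numbers are hypothetical, and converting FIT to MTBF as 10^9 / FIT assumes a constant failure rate.

```python
HOURS_PER_BILLION = 1e9  # 1 FIT = 1 failure per 10^9 hours


def effective_fit(fit_raw: float, avf: float) -> float:
    """FITeff = FITraw * AVF, with AVF given as a fraction in [0, 1]."""
    if not 0.0 <= avf <= 1.0:
        raise ValueError("AVF must lie in [0, 1]")
    return fit_raw * avf


def mtbf_hours(fit: float) -> float:
    """MTBF in hours implied by a FIT rate (assumes a constant failure rate)."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_BILLION / fit


# Hypothetical numbers, for illustration only.
fit_eff = effective_fit(fit_raw=1000.0, avf=0.25)
print(fit_eff, mtbf_hours(fit_eff))
```

Lowering AVF lowers FITeff proportionally, which is why AVF is the quantity the later slides try to tune.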

Architectural Vulnerability Factor (AVF)

[Figure: instruction stream (LD A, BR, ST B, ST B, ADD). Wrong-path instructions and dead stores are marked as unACE instructions; instructions leading to user-visible output are marked as Architecturally Correct Execution (ACE) instructions]

AVF
- Fraction of bits in a structure vulnerable to soft errors
- ACE bits / (ACE bits + unACE bits)
- Fn (Size, Time)
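
One way to read "Fn (Size, Time)" is as an average of the ACE-bit fraction over a window of cycles. The sketch below takes that reading as an assumption; the function name and the sample trace are hypothetical.

```python
def avf(ace_bits_per_cycle: list[int], total_bits: int) -> float:
    """ACE bits / (ACE bits + unACE bits), averaged over the observed cycles.

    total_bits is the structure size (ACE + unACE bits in any one cycle).
    """
    if total_bits <= 0 or not ace_bits_per_cycle:
        raise ValueError("need a positive size and at least one cycle")
    if any(b < 0 or b > total_bits for b in ace_bits_per_cycle):
        raise ValueError("ACE bits per cycle must lie in [0, total_bits]")
    return sum(ace_bits_per_cycle) / (total_bits * len(ace_bits_per_cycle))


# Hypothetical 128-bit structure observed for four cycles.
print(avf([32, 64, 0, 32], total_bits=128))
```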

AVF: Why is it important to Micro-architects?

[Figure: design flow: System Specification, Architectural Design, Logic Synthesis, Circuit Design, Physical Design, Fabrication and Packaging; annotated with AVF, FITraw, and AVF per structure]

System Reliability = ∑ (FITraw * AVF)
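
The summation over structures can be sketched as follows. The structure names and all FITraw and AVF values are made up for illustration; none of them are measurements from this work.

```python
def system_fit(structures: dict[str, tuple[float, float]]) -> float:
    """Sum of FITraw * AVF over all structures.

    structures maps a structure name to (fit_raw, avf).
    """
    return sum(fit_raw * avf for fit_raw, avf in structures.values())


# Hypothetical per-structure values, for illustration only.
example = {
    "ROB": (400.0, 0.25),
    "ISQ": (200.0, 0.5),
    "RAT": (100.0, 0.125),
}
print(system_fit(example))
```

Because the sum is per structure, a designer can see which structure dominates the budget before deciding where to spend protection.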

State-of-Art

Microprocessor design: multi-dimensional problem involving Performance, Power and Reliability
Transient Fault Tolerance
– Simultaneous Redundant Threading (SRT)
– Lockstepping
Optimization techniques
– Parashar et al., ISCA’04
– Gomaa et al., ISCA’05
– Parashar et al., ASPLOS’06
– Reddy et al., ASPLOS’06
Performance Overhead
Single point in Performance-Reliability space

Micro-architectural Reliability Knob

[Figure: Reliability (y-axis) vs. Performance (x-axis) with a FITrequired level marked; one end of the curve is More Reliable, Less Performance, the other is Less Reliable, More Performance]

FITeff = FITraw * AVF, with FITraw and AVF being constants
Ideal solution: FITraw is inflexible, so tune AVF to meet specifications

“Challenge for computer architects is not to provide absolute guarantees in reliability, but rather how to provide the adequate amount of reliability at the lowest cost for the target market segment”
Architecture Design for Soft Errors – Shubu Mukherjee, Intel


Contributions

First work that provides micro-architectural knobs to satisfy processor reliability budgets for transient faults

Proactive and Reactive mechanisms to monitor and bound vulnerabilities of processor structures at cycle-level granularity

AVF Monitoring: Reorder Buffer/Physical Register File

[Figure: pipeline with an in-order front end (Fetch, Decode, RAT), an out-of-order section (Issue Queue, ALU), and in-order Commit; Reorder Buffer (PRF) and ARF]

Reorder Buffer (ROB)
1. Large pipeline structure holding a number of instructions
2. Each instruction spends a significant percentage of its lifetime in the ROB

AVF Monitoring Mechanism: Reorder Buffer (ROB)

[Figure: Reorder Buffer with Dispatch, Writeback, Commit, and Mis-speculation events; part of each entry is filled at Dispatch and the result field is filled at WB]

N entries, each entry B bits
Result: R bits
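
A software model of such a monitor might look like the sketch below. It is an assumption-laden illustration, not the hardware described in the talk: it treats every written bit as ACE, counts the non-result bits from dispatch and the R result bits from writeback, and frees the entry on commit or mis-speculation. The class and method names are hypothetical.

```python
class RobAvfMonitor:
    """Counts written (conservatively ACE) bits in an N-entry, B-bit ROB."""

    def __init__(self, n_entries: int, entry_bits: int, result_bits: int):
        if not 0 <= result_bits <= entry_bits:
            raise ValueError("result_bits must lie in [0, entry_bits]")
        self.capacity_bits = n_entries * entry_bits
        self.entry_bits = entry_bits
        self.result_bits = result_bits
        self.vulnerable_bits = 0

    def dispatch(self) -> None:
        # Everything except the result field is written at dispatch.
        self.vulnerable_bits += self.entry_bits - self.result_bits

    def writeback(self) -> None:
        # The R-bit result field is written at writeback.
        self.vulnerable_bits += self.result_bits

    def release(self, written_back: bool) -> None:
        # Commit or mis-speculation frees the entry and its counted bits.
        freed = self.entry_bits if written_back else self.entry_bits - self.result_bits
        self.vulnerable_bits -= freed

    def avf(self) -> float:
        # Per-cycle vulnerability: counted bits over total ROB bits.
        return self.vulnerable_bits / self.capacity_bits


monitor = RobAvfMonitor(n_entries=4, entry_bits=64, result_bits=32)
monitor.dispatch()
monitor.dispatch()
monitor.writeback()
print(monitor.avf())
```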

Vulnerability Control via Throttling (VCT)

[Figure: DISPATCH and WRITEBACK feeding an N-entry REORDER BUFFER]

Stall Dispatch and Writeback
Writeback cannot be stalled
Entire entry ACE at Dispatch
Size = Fn (AVF Bound)
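
If the entire entry is counted as ACE at dispatch, the AVF bound turns into a cap on ROB occupancy, and dispatch is throttled at that cap. The sketch below assumes the simplest possible mapping (allowed entries = floor(N * bound)); the talk only states that size is a function of the AVF bound, so the exact function and the names used here are assumptions.

```python
import math


def max_entries(n_entries: int, avf_bound: float) -> int:
    """Largest occupancy whose worst-case AVF stays within the bound."""
    if not 0.0 <= avf_bound <= 1.0:
        raise ValueError("AVF bound must lie in [0, 1]")
    return math.floor(n_entries * avf_bound)


def can_dispatch(occupied: int, n_entries: int, avf_bound: float) -> bool:
    """Dispatch stalls once the occupancy cap for this bound is reached."""
    return occupied < max_entries(n_entries, avf_bound)


# Hypothetical 128-entry ROB with a 50% AVF bound.
print(max_entries(128, 0.5), can_dispatch(63, 128, 0.5), can_dispatch(64, 128, 0.5))
```

A tighter bound shrinks the effective ROB, which is the performance cost plotted against AVF bounds on the following slide.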

VCT Performance

[Figure: average performance w.r.t. single thread (y-axis, 0 to 1) vs. AVF bounds (x-axis, 0% to 100%, from High Integrity to Low Integrity); series: VCT]

Advantages of a Reactive Bounding Mechanism

AVF bound exceeded: verify results
Early accounting of writebacks
Mis-speculated instructions

[Figure: Reorder Buffer]

Simultaneous Redundant Threading (SRT): Importance of Selective Redundancy

[Figure: pipeline (Fetch, Decode, RAT, ISQ, ALU, Reorder Buffer (PRF), ARF) with a second RAT and ARF; label: Redundant Thread After Primary Thread]

Redundant execution protects entire pipeline
AVF goes down
Result verification reduces AVF

Vulnerability Control via Selective Redundancy (VCSR) Infrastructure

[Figure: pipeline (Fetch, Decode, RAT, ISQ, ALU, Reorder Buffer (ROB), ARF) with a second RAT and ARF and a Result Buffer; labels: Greedy Heuristic, AVF Bound Exceeded]

VCSR Performance

[Figure: average performance w.r.t. single thread (y-axis, 0.4 to 1) vs. AVF bounds (x-axis, 0% to 100%, from High Integrity to Low Integrity); series: SRT, VCT, VCSR]

Optimizations: Primary Thread Out-of-Order Commit

[Figure: pipeline (Fetch, Decode, RAT, ISQ, ALU, Reorder Buffer (PRF), ARF) with a second RAT and ARF and a Result Buffer]

Writeback – Commit: ROB AVF affected
Secondary thread maintains architected state
Non-compacting Reorder Buffer reduces AVF
Performance boost since fewer instructions are re-executed

VCH with OOO Commit Performance

[Figure: average performance w.r.t. single thread (y-axis, 0.4 to 1) vs. AVF bounds (x-axis, 0% to 100%, from High Integrity to Low Integrity); series: SRT, VCT, VCSR, VCH(OOO)]

Impact of vulnerability bounding

Per-cycle vulnerability bounds, guaranteeing FIT rates are met
Future Work
– Looking at developing a system-level AVF monitoring and bounding infrastructure

Outline

Motivation
Contributions
Vulnerability bounding mechanisms
Summary of other works
– Impact of DVFS on architectural vulnerability of GALS architectures
– Address process variations in issue queue
– Mitigate NBTI, HCE degradation in structures
Conclusion and Future work

Need for vulnerability analysis in GALS Architectures

Multiple domains, each driven by individual clocks
– Need for global clock network avoided
GALS enables fine-grained VF scaling tuned to individual domains
– DVFS provides high performance per watt
DVFS algorithms for GALS architectures are studied w.r.t. IPC per watt
Voltage scaling affects FITraw, frequency scaling affects AVF
Reliability impact ignored
• Impact on AVF due to applying different DVFS algorithms
• Help designers choose DVFS algorithms meeting reliability requirements

AVF impact across algorithms

[Figure: normalized AVF of the Issue Queue (y-axis, 0.7 to 1.5) for the Threshold, AD, ModAD, PI, and Greedy algorithms; 38% variation; lower is better]

Significant AVF variations when applying different algorithms
Most DVFS algorithms lead to worse AVF than Non-DVFS


Process Variation (PV) - Introduction

Process Variation: variation in characteristics between two identically designed circuits

Process Variation
– Static
  • Systematic: sub-wavelength lithography, overlay
  • Random: dose, RDF
– Dynamic
  • Aging
  • Thermal effects

[Figure: mean number of dopant atoms (y-axis, log scale, 10 to 10000) vs. technology node in nm (x-axis: 1000, 500, 250, 130, 65, 32)]

[Figure: lithography wavelength (365nm, 248nm, 193nm, 13nm EUV) vs. technology generation (180nm, 130nm, 90nm, 65nm, 45nm, 32nm), 1980 to 2020, y-axis from 10nm to 1µm, with the gap between the two marked. J. Tschanz et al., DAC 2005]

• Performance and power impact is significant
• Lack of predictability in timing characteristics leads to loss of yield

Definite need to address PV at circuit and microarchitectural level

Contributions

Study the impact of PV on the Issue Queue of a microprocessor

PV-unaware design has about 21% performance degradation w.r.t Non-PV design

PV is a non-deterministic phenomenon. Design-time static partitioning not possible. Our solution enables the fast and slow entries to co-exist

Instruction steering and sub-component switching schemes to reduce the impact of PV

Performance loss is about 1.3% w.r.t Non-PV design

Issue Queue Entry

Entry fields: V | Opcode | R | Tag | Operand | R | Tag | Operand | Dest Tag

[Figure: issue queue entry with Dispatch Write, Forwarding Comparison (Tag 1 to Tag N), Select Logic, and Issue Read paths]

[Figure: timeline of ISQ activities over cycles t to t+3. Stages: ALLOC LOGIC, DISPATCH WRITE, FORWARDING, SELECT INST. READY, INSTRUCTION ISSUE. Annotations: ISQ Full (Alloc stalls Dispatch), Valid Bit Set, Operand Ready Bit Set, Instruction waits for ready operands, Valid Bit Reset]

Results

Stalls reduced w.r.t. specific activity
Operand- and port-switching further reduce stalls to a minimum

[Figure: IPC (y-axis, 1.2 to 1.45) for Non-PV, Shutdown, MCD, and PV-Aware; bars annotated 12%, 7.3%, and 1.3%]


Increasing impact of transistor wearout
Transistor lifetime decreasing with newer technologies
Conservative guardbands impact performance
System longevity affects revenue
– In more than 50% of organizations, machine age > 10 years (poll by Gartner Research; source: J. Blome, Micro 2007)

[Figure: Failure Rate vs. Time with Infant Mortality, Event Related (random), and Device Wear-out regions; useful life (years) marked, with an arrow labeled Decreasing Technology. Source: Intel]

Contributions

NBTI, HCE impact increasing in upcoming technologies
Conventional collapsing issue queues have unwanted instruction movement across entries
– Collapsing required for age-based selection
Round-Robin scheme to provide restricted collapsing
Restricted collapsing balances switching activity without losing much of the age-based selection

Implementation

[Figure: evaluation flow. SPEC2K benchmarks, 100M instructions, on the SimpleScalar architectural simulator [ISQ]; capture Rd / Wr / Sw / Data probabilities per cell; transistor-level degradation model; HSpice (32nm, 380K), 10-year degradation; output: Read Delay Degradation]

Typically, solutions look at worst-case probabilities that might rarely occur

Results

[Figure: Read Delay, degradation (%) (y-axis, 0 to 18), Conventional vs. Round Robin; 32% reduction]

[Figure: Performance, IPC (y-axis, 1.6 to 1.7), Conventional vs. Round Robin; 1% reduction]

Conclusion

Growing reliability concern
– “Pop culture of reliability has arrived” - Dr. Phil Emma, IBM [Architecture Design for Soft Errors]
Work looks at increasing the fault-tolerance in back-end
– Soft errors
– Process variation
– Wearout

Current Work

Multi-core designs have come to prominence
While caches have ECC, the multiple pipelines involve structures holding data, where ECC is hard
– Total vulnerability to soft errors increases
Study the impact on AVF of different structures in a multi-core environment

Future Work

Multi-core
– Cores increase, market segments increase
– ILP vs TLP vs Clock frequency increase
– Application/Hardware sense best configuration
Reconfigurable Hardware
– Defect Tolerance
– Verification time increasing
– “Firmware update” to control functionality


Backup slides

DVFS Algorithms

Threshold
– VF scaling uses fixed thresholds. Preset thresholds affect algorithm efficiency
Attack-Decay (AD)
– Based on utilization in adjacent intervals. Attack whenever there is a big utilization change; otherwise decay. Greedy nature affects efficiency
Modified Attack-Decay (ModAD)
– Attack phase modified to correspond to utilization change. Large VF swing can affect performance per watt
PI
– µk = µk-1 + KI (q’k – qref) + Kp (q’k – q’k-1)
– fk = µk / IPC
Greedy
– Sample and Hold phase. VF scaling based on ED² of past 2 intervals
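
The PI update on this slide can be stepped through in code. The sketch below implements only the two equations as written; reading q’ as the observed queue occupancy in an interval and qref as its target is an assumption, and the class name, gains, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PiController:
    k_i: float     # integral gain KI
    k_p: float     # proportional gain Kp
    q_ref: float   # reference value qref (assumed: target queue occupancy)
    mu: float      # µ from the previous interval
    q_prev: float  # q' from the previous interval

    def step(self, q: float, ipc: float) -> float:
        """Apply µk = µk-1 + KI (q'k – qref) + Kp (q'k – q'k-1); return fk = µk / IPC."""
        self.mu = self.mu + self.k_i * (q - self.q_ref) + self.k_p * (q - self.q_prev)
        self.q_prev = q
        return self.mu / ipc


# Hypothetical gains and measurements, for illustration only.
ctrl = PiController(k_i=0.5, k_p=0.25, q_ref=8.0, mu=2.0, q_prev=8.0)
print(ctrl.step(q=12.0, ipc=2.0))
print(ctrl.step(q=12.0, ipc=2.0))
```

With the occupancy held above the reference, the integral term keeps raising µ on every interval while the proportional term only reacts to the change between intervals.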

Vulnerability Efficiency

Non-DVFS has the best vulnerability efficiency
– Among the DVFS algorithms, AD and PI provide the best vulnerability efficiency on average

[Figure: vulnerability efficiency across algorithms; 40% variation; lower is better]

Round Robin scheme

[Figure: issue queue entries with per-entry Clk and Ctrl Bit signals (Ctrl Bit 0 to Ctrl Bit N); pointers: Head, PseudoHead (PH), Tail; New Inst enters the queue; labels: Later Entries, Collapse Control Vector (1 1 1 0 0)]

Reliability Issues of Importance

Solutions that are robust but overhead-aware as well

Contributions

[Figure: hardware failure taxonomy. Hardware Failure splits into Permanent and Temporary; Temporary into Transient and Intermittent; Transient into Radiation (Soft Errors) and Non-Radiation (Power supply); Wearout and Process variation also appear as categories. Source: ISCA 2005 tutorial]

• Bounding vulnerability of processor structures to provide reliability guarantees
• Study impact of DVFS on vulnerability of GALS architectures
• Solutions to address impact of process variations on issue queue
• Solutions to reduce non-uniform aging due to NBTI, HCE on microprocessor structures

Results

[Figure: average performance w.r.t. single thread (y-axis, 0.4 to 1) vs. AVF bounds (x-axis, 0% to 100%, from High Integrity to Low Integrity); series: SRT, Throttling (T), SR, SR with T(OOO)]

PV-aware steering - OptiSteer

[Figure: steering logic for a non-collapsing Issue Queue. Labels: RAT; Alloc (assigns ISQ entry); ISQ entry id; Op, STag1, STag2, DTag; Stall Optimization Table with Dest Tag entries; Slow Entry Bit; Source Tags (STag1, STag2); Demux; Decoder; Dest Tag; STALL]

Intra-Entry Variation schemes: Operand- and Port-Switching

Entry fields: V | Opcode | R | Tag | R | Tag | Operand | Dest Tag

[Figure: field layouts at Dispatch (Op, STag1, Operand1, STag2, DTag), after Operand Switch (Op, STag2, STag1, Operand1, DTag), and with Port Switch (Op, STag1, Operand1, STag2, DTag); paths labeled Dispatch Write and Issue Read Operand]

Timeline of ISQ activities

[Figure: timeline over cycles t to t+3. Stages: ALLOC LOGIC, DISPATCH WRITE, FORWARDING, SELECT INST. READY, INSTRUCTION ISSUE. Annotations: ISQ Full (Alloc stalls Dispatch), Valid Bit Set, Operand Ready Bit Set, Instruction waits for ready operands, Valid Bit Reset. PV effects and remedies overlaid: Slow Dispatch Write (Operand Switch, Port Switch), SOT Fill, SOT Value Required, Forwarding Stall (Port Switch), fewer instructions selected, Slow issue read]

Conventional Collapsing ISQ

[Figure: issue queue entries 0 to N with per-entry Clk and Ctrl Bit signals (Ctrl Bit 1 to Ctrl Bit N), Head and Tail pointers, Issue, and Collapsing Logic; labels: Collapse, Age-ordering for Instruction Selection]

Round Robin scheme

[Figure: issue queue with Clk and Ctrl Bit signals; pointers: Head, PseudoHead, Tail; New Inst enters the queue; two regions labeled Collapse]

NBTI/HCE

NBTI – Traps due to negative voltage at gate (input “0”)
– Dominant in PMOS transistor
– Increased when holding same data for long periods
HCE – Traps due to high electric field near the drain
– Dominant in NMOS transistor
– Increased when switching activity is high
Vth shift accumulates over time, affects timing

Contributions

Global solutions
– Body Biasing: frequency boost increases leakage. Non-ideal for Issue Queue
– Time-borrowing: absorbing clock jitter and skew becomes difficult
Structure-specific solutions
– Solutions for register file, and caches
Issue Queue is a performance-determining structure; its operation combines CAM and SRAM cells
• PV is a non-deterministic phenomenon. Our solution enables the fast and slow entries to co-exist
• Instruction steering and sub-component switching schemes are proposed to reduce the impact of PV

Results

[Figure: IPC (y-axis, 1 to 1.5) for NonPV, PV-unAware, SpeedSteer, and OptiSteer; value labels: 1.43, 1.14, 1.31, 1.36, plus 1.43 and 1.42]

Throughput comparison

[Figure: throughput comparison; 10.5% relative decrease]

Switching Activity

Wearout phenomena

[Figure: transistor cross-section (G, S, D, B terminals; oxide; N+ regions in a P-well; Igb) illustrating Hot Carrier Effects, Negative Bias Temperature Instability, Oxide Breakdown, and Electro-Migration. Source: J. Blome, Micro 2007]

• Factors: temperature, switching activity, data (gate bias), Vdd, current density
• NBTI, HCE impact increasing in upcoming technologies (A. Tiwari, Micro 2008; S. Sapatnekar, ISQED 2006)

Optimizations – Vulnerability Control Hybrid

[Figure: pipeline (Fetch, Decode, RAT, ISQ, ALU, Reorder Buffer (PRF), ARF) with a second RAT and ARF]

Dispatch bandwidth not effectively utilized
Reduces bottleneck in in-order units like Result Buffer

Microprocessor Design: Multi-Dimensional Problem

Microprocessor design: performance is not the single dimension
– Power
– Thermal effects
– Reliability
Dimension order driven by market
– Aircraft, Health-care: Reliability
– Embedded: Power, Thermal
– Desktops, Game Consoles: Performance
Data sensitivity – application dependent

[Table: INTEGRITY LEVEL of APPLICATION DOMAIN. Columns: Application, Data Integrity Requirement, Market Volume, Examples. Applications: Safety Critical, High Integrity, Moderate Integrity, Low Integrity. Data integrity requirement values: Low, Moderate, Very High, Very High. Market volume values: Moderate, Huge, Large, Small. Examples: Present-day Automotive, Enterprise Server, Flight Control, Consumer Electronics. Source: Mitigation of Transient Faults at the System Level – the TTA approach, Herman Kopetz, SELSE 2006]

GALS Architecture

Domains driven by individual clocks
– Domain is internally synchronous
Careful tuning of global clock distribution network is avoided
– Better frequency scaling
Different domains interact through FIFO buffers
DVFS: high performance per watt
GALS enables fine-grained VF scaling tuned to individual domains

[Figure: pipeline partitioned into clock domains 1 to 6. Blocks: Fetch, Decode, Rename; three parallel RegRead, ISQ (FP, Mem, Int), Exec, WriteBack paths; Retire; RegFile; D-cache]

Contributions

DVFS algorithms for GALS architectures are studied w.r.t. IPC per watt
Voltage scaling affects FITraw, frequency scaling affects AVF
Reliability impact ignored
• Impact on architectural vulnerability due to applying different DVFS algorithms
• Characterize the Vulnerability Efficiency (AVF*Watts/IPC) of DVFS algorithms
• Help designers choose DVFS algorithms meeting reliability requirements
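
The Vulnerability Efficiency metric (AVF*Watts/IPC, lower is better) is simple enough to compute directly when comparing algorithms. The sketch below uses made-up measurements and hypothetical algorithm labels purely to show the comparison; none of the numbers come from the study.

```python
def vulnerability_efficiency(avf: float, watts: float, ipc: float) -> float:
    """AVF * Watts / IPC; lower values are better."""
    if ipc <= 0:
        raise ValueError("IPC must be positive")
    return avf * watts / ipc


# Hypothetical (AVF, Watts, IPC) measurements per algorithm, for illustration only.
measurements = {
    "algo_a": (0.25, 20.0, 2.0),
    "algo_b": (0.5, 16.0, 2.0),
}
scores = {name: vulnerability_efficiency(*m) for name, m in measurements.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Folding power and IPC into one figure lets a designer rank algorithms that trade vulnerability against performance per watt in different ways.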