Exploring Wakeup-Free Instruction Scheduling
Jie S. Hu, N. Vijaykrishnan, and Mary Jane Irwin
Microsystems Design Lab, The Pennsylvania State University
Outline
- Motivation
- Case study: Cyclone
- Towards high-performance wakeup-free schedulers
  - A general model
  - Employing a pre-check scheme
  - A segmented issue queue
- Conclusions and future work
Superscalar Issue Queue

[Figure: wakeup logic. Each of the N issue-queue entries holds left/right operand tags (opd tagL, opd tagR) with ready bits (rdyL, rdyR); the IW broadcast tags tag1..tagIW are compared (==) against every entry, and the match results are ORed to set the ready bits.]

Wakeup Logic Delay = Ttagdrive + Ttagmatch + TmatchOR
Ttagdrive = c0 + (c1 + c2*IW)*N + (c3 + c4*IW + c5*IW^2)*N^2
Ttagmatch, TmatchOR = c0 + c1*IW + c2*IW^2

S. Palacharla et al., ISCA-24
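The delay model above can be evaluated numerically. A minimal sketch, assuming illustrative placeholder coefficients c0..c5 rather than the technology-derived constants of Palacharla et al.:

```python
# Sketch of the wakeup-delay model from the slide.
# The coefficient values are illustrative placeholders, NOT the
# technology-derived constants from the original paper.

def tag_drive_delay(N, IW, c=(0.1, 0.02, 0.01, 0.001, 0.0005, 0.0001)):
    """Ttagdrive = c0 + (c1 + c2*IW)*N + (c3 + c4*IW + c5*IW^2)*N^2."""
    c0, c1, c2, c3, c4, c5 = c
    return c0 + (c1 + c2 * IW) * N + (c3 + c4 * IW + c5 * IW ** 2) * N ** 2

def match_delay(IW, c=(0.1, 0.02, 0.005)):
    """Ttagmatch and TmatchOR share the form c0 + c1*IW + c2*IW^2."""
    c0, c1, c2 = c
    return c0 + c1 * IW + c2 * IW ** 2

def wakeup_delay(N, IW):
    """Total wakeup delay = Ttagdrive + Ttagmatch + TmatchOR."""
    return tag_drive_delay(N, IW) + match_delay(IW) + match_delay(IW)
```

The quadratic terms in both N (queue size) and IW (issue width) are what make broadcast wakeup a clock-frequency limiter as either parameter grows.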
Superscalar Issue Queue

[Figure: selection logic built as a tree of 4-input arbiter cells. Each cell takes req0..req3 and returns grant0..grant3, gated by an enable (enb) from its parent; leaf cells connect to issue-queue entries, and the root cell connects from/to the other subtrees.]

Selection Logic: Tselection = c0 + c1*log4(N)

S. Palacharla et al., ISCA-24
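The log4(N) term is the depth of the arbiter tree. A minimal sketch of single-grant selection over such a tree (the chunking and lowest-index-first priority policy are assumptions for illustration):

```python
import math

def arbitrate(requests):
    """Single-grant selection over a tree of 4-input arbiter cells.
    Each cell raises a request upward if any child requests; the
    enabled cell grants its lowest-numbered requesting child.
    Returns the index of the granted entry, or None."""
    n = len(requests)
    if n <= 4:                       # leaf arbiter cell
        for i, r in enumerate(requests):
            if r:
                return i
        return None
    chunk = math.ceil(n / 4)         # split entries among four subtrees
    for c in range(4):
        sub = requests[c * chunk:(c + 1) * chunk]
        if any(sub):                 # child raises req; parent enables it
            return c * chunk + arbitrate(sub)
    return None

def tree_depth(n):
    """Number of arbiter levels, matching the c1*log4(N) delay term."""
    d, span = 0, 1
    while span < n:
        span *= 4
        d += 1
    return max(d, 1)
```

Because each level handles four inputs, a 64-entry queue needs only three arbiter levels, which is why selection scales much better than broadcast wakeup.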
Challenges in Dynamic Instruction Scheduling
- Broadcast-based dynamic scheduler
  - Higher complexity, power hungry
  - A major limiter to clock frequency: growing issue queue size, issue width, and wire delay, with fewer logic levels per pipeline stage
- Complexity-effective issue
  - Speculative wakeup [Stark et al.]
  - Dependence-chain-based ordering [Canal/Gonzalez ICS'00/'01; Michaud/Seznec HPCA'01]
  - Segmented issue queue [Raasch et al., ISCA 2002]
  - Wakeup-free dynamic scheduler [Ernst et al., ISCA 2003]
- Wakeup-free schedulers offer lower complexity, lower power consumption, and better scalability, but trade away some performance
Our Goals
- Explore the predictability of instruction issue latency
- Identify the performance impediments in wakeup-free architectures
- Design high-performance wakeup-free schedulers
Cyclone: Conflict in the Main Queue

[Figure: conflict rates in the main queue for FP and Int benchmarks, with and without enforced ordering]

Enforce ordered placement to avoid conflicts between instructions with different latencies
Possible Structural Problems
- Instruction promotion/forwarding incurs conflicts along the path
- Very limited instruction pool for selection
  - Only entries in column 0 of the main queue can be issued
  - Ready instructions not in column 0 are delayed by conflicts
- The limited number of issue ports has less tolerance for mispredicted ready instructions
  - Wastes issue ports and prevents ready instructions from issuing
  - Replayed instructions compete with newly decoded instructions
A General Model: WF-Replay

[Figure: a pre-schedule stage after Rename, fed from the decoder, uses a Timing Table to assign each instruction a predicted latency (lat) in the wakeup-free issue queue; selection logic issues instructions to the FUs; register-file ready bits, updated from the FUs, determine whether a replay is needed.]

- Collapsing issue queue without promotion
- Conventional random selection logic, given a much wider issue width
- An instruction is removed if no replay is needed
- How do we relax the structural constraints?
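The model above can be sketched as a cycle-level loop. This is my own illustrative rendering, not the paper's implementation; the entry layout, the one-cycle replay penalty, and the immediate setting of the destination ready bit (eliding FU latency) are all simplifying assumptions:

```python
import random

def wf_replay_cycle(queue, ready_bits, issue_width):
    """One cycle of a WF-Replay-style scheduler (illustrative sketch).
    queue: list of dicts {'srcs': [regs], 'lat': int, 'dst': reg}.
    Entries whose predicted latency has counted down to 0 request
    issue; a random subset up to issue_width is selected; an entry
    is removed only if its sources are truly ready, otherwise it
    stays in the queue for replay."""
    for e in queue:
        if e['lat'] > 0:
            e['lat'] -= 1                  # count down predicted latency
    candidates = [e for e in queue if e['lat'] == 0]
    random.shuffle(candidates)             # conventional random selection
    issued = []
    for e in candidates[:issue_width]:
        if all(ready_bits.get(s, False) for s in e['srcs']):
            issued.append(e['dst'])        # truly ready: issue and remove
            queue.remove(e)
            ready_bits[e['dst']] = True    # FU latency elided for brevity
        else:
            e['lat'] = 1                   # latency mispredicted: replay
    return issued
```

Note that a mispredicted entry simply re-arms its counter and competes again next cycle, which is exactly the competition that the pre-check scheme later removes.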
Instruction Pre-scheduling

[Figure: in Rename/PSCHED0, instructions I0..I3 look up their source operands' ready times in the Timing Table through the Register Mapping Table; in PSCHED1, each instruction's issue time is the max of its operands' ready times, a dependence check (depcheck) against older instructions in the same group drives the MUX controls, and the operation latencies lat0..lat3 are added to produce the new ready times; a reschedule check flags mispredictions.]

Adapted from Cyclone, D. Ernst et al., ISCA'03
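The datapath above computes, per instruction, the max of its source operands' ready times, then adds its own operation latency to produce the destination's ready time. A minimal sketch assuming a dict-based timing table (the names are mine; processing the group in order stands in for the figure's depcheck/MUX path, since older same-group producers update the table before younger consumers read it):

```python
def preschedule(group, timing_table, op_latency):
    """Assign predicted issue times to a renamed instruction group.
    group: list of (dst, src1, src2) physical-register tuples.
    timing_table: dict reg -> cycle at which the reg becomes ready.
    op_latency: per-instruction operation latency (lat0..lat3).
    Returns dict inst_index -> predicted issue time."""
    issue_time = {}
    for i, (dst, src1, src2) in enumerate(group):
        t1 = timing_table.get(src1, 0)
        t2 = timing_table.get(src2, 0)
        t = max(t1, t2)               # issue when both operands are ready
        issue_time[i] = t
        timing_table[dst] = t + op_latency[i]   # new ready time of dst
    return issue_time
```

For example, if r2 is ready at cycle 2 and the first instruction produces r3 with latency 1, a dependent second instruction is predicted to issue at cycle 3.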
Latency Triggered Selection

[Figure: a wakeup-free issue queue in which each entry's predicted-latency (lat) counter drives a selection request; the requests feed a tree of 4-input arbiter cells (req0..req3 / grant0..grant3 with an enable) rooted at a root cell.]
WF-Replay IPC (F4-I8 vs F4-I4)

[Figure: IPC comparison at issue width 8 vs issue width 4]

WF-Replay loses 9.7% performance (IPC) relative to Base as the issue width is reduced to 4 instructions per cycle
Precheck to Avoid Competition
- Competition at the issue ports may delay (predicted-)ready instructions
- Delayed instructions may in turn compete with instructions that depend on them, causing more instructions to become falsely ready or to be delayed
- Wider issue ports can avoid the unnecessary competition, but at the cost of higher complexity
- Solution: prevent falsely ready instructions from entering selection by pre-checking the register ready bits
WF-Precheck Scheduler

[Figure: the WF-Replay datapath extended with a Register Ready Bit table updated from the FUs and memory; each issue-queue entry holds a ready (ry) bit alongside its predicted latency (lat), and the selection logic sees only filtered requests.]

- Pre-check the register ready bits when the predicted latency reaches 0
- Selection requests are filtered by the 'ry' bit
- Trade replay for pre-check: only truly ready instructions are issued
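The filtering step can be sketched as follows (the entry layout and names are illustrative assumptions): an entry raises a selection request only once its predicted latency has expired and the pre-check of its source registers' ready bits succeeds.

```python
def selection_requests(queue, ready_bits):
    """WF-Precheck-style request filtering (illustrative sketch).
    queue: list of dicts {'lat': int, 'srcs': [regs]}.
    An entry requests selection only if its predicted latency has
    counted down to 0 AND all source ready bits are set, so only
    truly ready instructions compete for the issue ports."""
    reqs = []
    for i, e in enumerate(queue):
        if e['lat'] > 0:
            e['lat'] -= 1              # still counting down
            continue
        if all(ready_bits.get(s, False) for s in e['srcs']):
            reqs.append(i)             # 'ry' set: request passes the filter
        # else: falsely ready; just wait for the ready bits (no replay)
    return reqs
```

Compared with WF-Replay, a latency misprediction here costs only a filtered request, not a replay through the pipeline.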
Complexity of Pre-checking

On average, 40.2% of instructions have both source operands ready and 45.4% have one source operand ready at the pre-schedule stage. Fewer than 2 pre-check requests are made per cycle.
How about Selection Logic?

[Figure: the same tree-structured selection logic as before: 4-input arbiter cells (req0..req3 / grant0..grant3 with an enable), with the root cell connecting from/to the other subtrees.]

Selection Logic: Tselection = c0 + c1*log4(N)

S. Palacharla et al., ISCA-24
WF-Segment Issue Queue

[Figure: after Rename/Pre-scheduling with the Timing Table, dispatch routing steers instructions from the decoder into issue-queue segments by predicted latency (0, 1-2, 3-4, >4); the segments carry register ready (ry) bits updated from the FUs and memory, a shared selection logic feeds 4 issue ports to the FUs, and a switchback path moves instructions between segments.]
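The dispatch routing can be sketched directly from the segment boundaries in the figure (0, 1-2, 3-4, >4); everything else here, including the function names, is illustrative:

```python
def route_segment(predicted_latency):
    """Map a predicted issue latency to a queue segment, using the
    boundaries from the figure: 0, 1-2, 3-4, >4."""
    if predicted_latency == 0:
        return 0
    if predicted_latency <= 2:
        return 1
    if predicted_latency <= 4:
        return 2
    return 3

def dispatch(insts):
    """Steer decoded (inst, predicted_latency) pairs into segments."""
    segments = [[], [], [], []]
    for inst, lat in insts:
        segments[route_segment(lat)].append(inst)
    return segments
```

Only the segment holding soon-to-be-ready instructions needs to feed the selection logic, which is what lets the selection tree stay small and fast.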
WF-Segment Issue Queue

On average, WF-Segment gives up 3% IPC relative to WF-Precheck and 5% relative to the Base in exchange for optimized selection logic.
Conclusions
- Explored and identified the performance impediments in wakeup-free scheduling
- High-performance wakeup-free dynamic schedulers:
  - WF-Replay: eliminates structural constraints
  - WF-Precheck: avoids unnecessary competition at the issue ports
  - WF-Segment: optimizes selection logic for high clock speed
Future Work
- Routing complexity analysis for the WF-Segment scheduler
- Power analysis for wakeup-free schedulers
- A more sophisticated pre-scheduler
Wire Delay Challenges
- Increasing pipeline depth for high performance: the clock period (in FO4) decreases dramatically [M. S. Hrishikesh et al., ISCA-29]
- Cross-chip wire delay will reach up to 10 cycles as technology shrinks [Stephen W. Keckler et al., ISSCC'03]