21
Dynamic Dynamic Pipelines Pipelines IF ID RD WB ALU MEM1 FP1 BR MEM2 FP2 FP3 EX Dispatch Buffer R eorder Buffer (in order) (outoforder) (outoforder) (in order)

Dynamic Pipelines

  • Upload
    peony

  • View
    34

  • Download
    0

Embed Size (px)

DESCRIPTION

Dynamic Pipelines. Interstage Buffers. Superscalar Pipeline Stages. In Program Order. Out of Order. In Program Order. Impediments to High IPC. Superscalar Pipeline Design. Instruction Fetching Issues Instruction Decoding Issues Instruction Dispatching Issues - PowerPoint PPT Presentation

Citation preview

Page 1: Dynamic Pipelines

Dynamic PipelinesDynamic Pipelines

• • •

• • •

• • •

• • •IF

ID

RD

WB

ALU MEM1 FP1 BR

MEM2 FP2

FP3

EX

DispatchBuffer

ReorderBuffer

( in order )

( out of order )

( out of order )

( in order )

Page 2: Dynamic Pipelines

Interstage BuffersInterstage Buffers

• • •

• • •

• • •

Stage i

Buffer (n)

Stage i +1

Stage i

Buffer (> n)

Stage i + 1

n

n

Stage i

Buffer (1)

Stage i + 1

(a) (b)

(c)

• • •

( in order )

( out of order )_

( in order )1

1

• • •

( in order )

Page 3: Dynamic Pipelines

Superscalar Pipeline StagesSuperscalar Pipeline Stages

Instruction Buffer

Fetch

Dispatch Buffer

Decode

Issuing Buffer

Dispatch

Completion Buffer

Execute

Store Buffer

Complete

Retire

In Program

Order

In Program

Order

Outof

Order

Page 4: Dynamic Pipelines

Impediments to High IPCImpediments to High IPC

I-cache

FETCH

DECODE

COMMIT

D-cache

BranchPredictor Instruction

Buffer

StoreQueue

ReorderBuffer

Integer Floating-point Media Memory

Instruction

RegisterData

MemoryData

Flow

EXECUTE

(ROB)

Flow

Flow

Page 5: Dynamic Pipelines

Superscalar Pipeline DesignSuperscalar Pipeline Design

Instruction Fetching IssuesInstruction Decoding IssuesInstruction Dispatching IssuesInstruction Execution IssuesInstruction Completion & Retiring Issues

Page 6: Dynamic Pipelines

Instruction FlowInstruction Flow

Challenges:– Branches: control dependences

– Branch target misalignment

– Instruction cache misses Solutions

– Code alignment (static vs.dynamic)

– Prediction/speculation

Instruction Memory

PC

3 instructions fetched

Objective: Fetch multiple instructions per cycle

Page 7: Dynamic Pipelines

I-Cache OrganizationI-Cache OrganizationR

ow

De

co

de

r

••

CacheLine

••

TAG

TAG

Address

1 cache line = 1 physical row

••

Cache

Line

••

TAG

TAG

Address

1 cache line = 2 physical rows

TAG

TAG Ro

w D

ec

od

er

Page 8: Dynamic Pipelines

Issues in DecodingIssues in Decoding

Primary Tasks– Identify individual instructions (!)– Determine instruction types– Determine dependences between instructions

Two important factors– Instruction set architecture– Pipeline width

Page 9: Dynamic Pipelines

Predecoding in the AMD K5Predecoding in the AMD K5

Page 10: Dynamic Pipelines

Instruction Dispatch and IssueInstruction Dispatch and Issue

Parallel pipeline– Centralized instruction fetch– Centralized instruction decode

Diversified pipeline– Distributed instruction execution

Page 11: Dynamic Pipelines

Necessity of Instruction DispatchNecessity of Instruction Dispatch

Page 12: Dynamic Pipelines

Centralized Reservation StationCentralized Reservation Station

Page 13: Dynamic Pipelines

Distributed Reservation StationDistributed Reservation Station

Page 14: Dynamic Pipelines

Issues in Instruction ExecutionIssues in Instruction Execution

Current trends– More parallelism bypassing very challenging– Deeper pipelines– More diversity

Functional unit types– Integer– Floating point– Load/store most difficult to make parallel– Branch– Specialized units (media)

Page 15: Dynamic Pipelines

Bypass NetworksBypass Networks

O(n2) interconnect from/to FU inputs and outputs Associative tag-match to find operands Solutions (hurt IPC, help cycle time)

– Use RF only (Power4) with no bypass network– Decompose into clusters (21264)

PCI-Cache

BR Scan

BR Predict

Fetch Q

Decode

Reorder BufferBR/CRIssue Q

CRUnit

BRUnit

FX/LD 1Issue Q

FX1Unit LD1

Unit

FX/LD 2Issue Q

LD2Unit

FX2Unit

FPIssue Q

FP1Unit

FP2Unit

StQ

D-Cache

Page 16: Dynamic Pipelines

Specialized unitsSpecialized units

Page 17: Dynamic Pipelines

New Instruction TypesNew Instruction Types

Subword parallel vector extensions– Media data (pixels, quantized datum)often 1-2 bytes

– Several operands packed in single 32/64b register{a,b,c,d} and {e,f,g,h} stored in two 32b registers

– Vector instructions operate on 4/8 operands in parallel

– New instructions, e.g. motion estimationme = |a – e| + |b – f| + |c – g| + |d – h|

Substantial throughput improvement– Usually requires hand-coding of critical loops

Page 18: Dynamic Pipelines

Issues in Completion/RetirementIssues in Completion/Retirement

Out-of-order execution– ALU instructions– Load/store instructions

In-order completion/retirement– Precise exceptions– Memory coherence and consistency

Solutions– Reorder buffer– Store buffer– Load queue snooping (later)

Page 19: Dynamic Pipelines

A Dynamic Superscalar ProcessorA Dynamic Superscalar Processor

Page 20: Dynamic Pipelines

Impediments to High IPCImpediments to High IPC

I-cache

FETCH

DECODE

COMMIT

D-cache

BranchPredictor Instruction

Buffer

StoreQueue

ReorderBuffer

Integer Floating-point Media Memory

Instruction

RegisterData

MemoryData

Flow

EXECUTE

(ROB)

Flow

Flow

Page 21: Dynamic Pipelines

Superscalar SummarySuperscalar Summary

Instruction flow– Branches, jumps, calls: predict target, direction– Fetch alignment– Instruction cache misses

Register data flow– Register renaming: RAW/WAR/WAW

Memory data flow– In-order stores: WAR/WAW– Store queue: RAW– Data cache misses