8. Microarchitecture of Superscalars (6) Register renaming Dezső Sima Fall 2006 D. Sima, 2006


Page 1: 8. Microarchitecture of Superscalars (6) Register renaming

8. Microarchitecture of Superscalars (6)Register renaming

Dezső Sima

Fall 2006

D. Sima, 2006

Page 2: 8. Microarchitecture of Superscalars (6) Register renaming

Overview

• 1 The Principle of register renaming
• 2 Design Space
    • 2.1 Overview
    • 2.2 Types of rename buffers
• 3 Operation of register renaming
• 4 Design parameters of register renaming
• 5 Implementation of renaming in superscalars
    • 5.1 The chronology of introducing register renaming
    • 5.2 Basic implementation schemes of register renaming
• 6 Examples

Page 3: 8. Microarchitecture of Superscalars (6) Register renaming

1. Principle of register renaming (1)

Aim:
• Eliminating false data dependencies to relieve the issue bottleneck

False data dependencies: WAR and WAW

Examples:

Write After Read, WAR (anti-dependency):

I1: mul r1, r2, r3
I2: add r2, r4, r5

Write After Write, WAW (output dependency):

I1: mul r1, r2, r3
I2: add r1, r4, r5
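
The two examples above can be checked mechanically. The following is a minimal sketch, assuming the operand order `op dest, src1, src2` implied by the examples; the names `Instr` and `false_dependencies` are illustrative and do not come from the slides.

```python
from typing import NamedTuple, Set


class Instr(NamedTuple):
    op: str
    dest: str   # destination register (assumed to be the first operand)
    src1: str
    src2: str


def false_dependencies(first: Instr, second: Instr) -> Set[str]:
    """Return the false dependencies of `second` on the earlier `first`."""
    found = set()
    # WAR: the later instruction overwrites a register the earlier one reads.
    if second.dest in (first.src1, first.src2):
        found.add("WAR")
    # WAW: both instructions write the same destination register.
    if second.dest == first.dest:
        found.add("WAW")
    return found


# The two instruction pairs from the slide.
war = false_dependencies(Instr("mul", "r1", "r2", "r3"),
                         Instr("add", "r2", "r4", "r5"))
waw = false_dependencies(Instr("mul", "r1", "r2", "r3"),
                         Instr("add", "r1", "r4", "r5"))
```

In both pairs I2 needs no value produced by I1, so the ordering constraint comes only from the reuse of a register name. That is the kind of dependency renaming removes.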

Page 4: 8. Microarchitecture of Superscalars (6) Register renaming

1. Principle of register renaming (2)

Basic principle to eliminate false data dependencies:

False data dependencies are eliminated by writing generated results temporarily to buffers, called rename buffers (RB), instead of to the referenced architectural registers (AR).

Then

- referenced source operands need to be fetched from the RB file if they are actually renamed, else from the AR file,
- during dispatching a new rename buffer needs to be allocated to each instruction whose destination register causes a false data dependency¹,
- during retirement buffered results need to be transferred from the RB file to the AR file.

¹ Usually, processors allocate a rename buffer to each dispatched instruction without checking for the existence of false data dependencies, to reduce logic complexity.

Figure 1.1: The principle of register renaming
[Figure labels: RB, AR, EU, EU, Source register numbers, Ops., Results, Retirement]

Page 5: 8. Microarchitecture of Superscalars (6) Register renaming

2. Design space of register renaming

2.1 Overview

Register renaming:
• Scope of register renaming
• Type of rename buffers
• Layout of the rename buffers
• Layout of the register mapping
• Rename rate

Page 6: 8. Microarchitecture of Superscalars (6) Register renaming

2.2 Types of rename buffers

Types of rename buffers:
• Rename reg. file

[Figure: rename register file (RR) next to the architectural register file (AR); labels: Reg. nrs., Ops., Res., Ret.]

Page 7: 8. Microarchitecture of Superscalars (6) Register renaming

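Rename reg. file

[Figure: rename register file (RR) next to the architectural register file (AR); labels: Reg. nrs., Ops., Res., Ret.]

States of a rename register:
• Available (entered when initialized)
• Allocated, not valid
• Allocated, valid

State transitions:
• Allocate, if instruction is dispatched: Available to Allocated, not valid
• Update, if instruction is finished: Allocated, not valid to Allocated, valid
• Reclaim, if instruction is retired: Allocated, valid to Available
• Reclaim, if instruction is canceled: an allocated register returns to Available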
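
The life cycle of a rename register can be written down as a small state machine. This is a sketch only: the event names (`dispatch`, `finish`, `retire`, `cancel`) are shorthand chosen here, and it is an assumption that cancellation is allowed from both allocated states.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    ALLOCATED_NOT_VALID = "allocated, not valid"
    ALLOCATED_VALID = "allocated, valid"


# (current state, event) -> next state
TRANSITIONS = {
    (State.AVAILABLE, "dispatch"): State.ALLOCATED_NOT_VALID,      # allocate
    (State.ALLOCATED_NOT_VALID, "finish"): State.ALLOCATED_VALID,  # update
    (State.ALLOCATED_VALID, "retire"): State.AVAILABLE,            # reclaim
    # Assumption: a canceled instruction frees its register in either state.
    (State.ALLOCATED_NOT_VALID, "cancel"): State.AVAILABLE,
    (State.ALLOCATED_VALID, "cancel"): State.AVAILABLE,
}


def step(state: State, event: str) -> State:
    """Apply one event; reject events that are not legal in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state.value!r}") from None


def run(events, state: State = State.AVAILABLE) -> State:
    """Apply a sequence of events, starting from the initialized state."""
    for event in events:
        state = step(state, event)
    return state
```

For example, `run(["dispatch", "finish", "retire"])` walks a register through one complete allocation and returns it to the available pool.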

Page 8: 8. Microarchitecture of Superscalars (6) Register renaming

2.2 Types of rename buffers

Types of rename buffers:
• Rename reg. file (RR next to the AR): PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), Power3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
• Future file (FF next to the AR)

[Figure labels: Reg. nrs., Ops., Res., Ret.]

Page 9: 8. Microarchitecture of Superscalars (6) Register renaming

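Future file

[Figure: future file (FF) next to the architectural register file (AR); labels: Reg. nrs., Ops., Res., Ret.]

States of a future file entry:
• Valid (entered when initialized)
• Not valid

State transitions:
• Invalidate by referring to the same register as destination: Valid to Not valid
• Update if instruction is finished: Not valid to Valid

The FF has as many entries as the AR and holds the most recent register values.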
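
A minimal sketch of the future file behaviour described above. Two details are assumptions that the slide does not state: each entry remembers the tag of the latest dispatched writer, so an older in-flight instruction cannot overwrite a newer value, and the result to be retired is handed in by the caller. The class and method names are illustrative.

```python
class FutureFile:
    def __init__(self, n_regs: int):
        self.ar = [0] * n_regs             # architectural (in-order) state
        self.ff = [0] * n_regs             # most recent register values
        self.valid = [True] * n_regs       # every entry is valid after init
        self.last_writer = [None] * n_regs # assumption: tag of latest writer

    def dispatch(self, tag: int, dest: int) -> None:
        """Naming `dest` as a destination invalidates its FF entry."""
        self.valid[dest] = False
        self.last_writer[dest] = tag

    def finish(self, tag: int, dest: int, result: int) -> None:
        """Update the FF entry, but only for the latest dispatched writer."""
        if self.last_writer[dest] == tag:
            self.ff[dest] = result
            self.valid[dest] = True

    def retire(self, dest: int, result: int) -> None:
        """At retirement the result is written to the AR in program order."""
        self.ar[dest] = result

    def read(self, reg: int):
        """Return the most recent value, or None while it is still pending."""
        return self.ff[reg] if self.valid[reg] else None


f = FutureFile(8)
f.dispatch(tag=1, dest=3)
f.dispatch(tag=2, dest=3)       # r3 renamed again before tag 1 finishes
f.finish(tag=1, dest=3, result=10)
pending = f.read(3)             # still pending: tag 2 is the latest writer
f.finish(tag=2, dest=3, result=20)
latest = f.read(3)
```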

Page 10: 8. Microarchitecture of Superscalars (6) Register renaming

2.2 Types of rename buffers

Types of rename buffers:
• Rename reg. file (RR next to the AR): PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), Power3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
• Future file (FF next to the AR): UltraSPARC III (1999), K7 (FX) (1999), K8 (FX) (2003)
• Merged arch. and rename register file (AR, RR)

[Figure labels: Reg. nrs., Ops., Res., Ret.]

Page 11: 8. Microarchitecture of Superscalars (6) Register renaming

Merged arch. and rename register file

[Figure: merged register file (AR, RR); labels: Reg. nrs., Ops., Res.]

States of a physical register:
• Available (entered when initialized)
• RB, not valid
• RB, valid
• AR

State transitions:
• Entry is allocated to a dispatched instruction: Available to RB, not valid
• Instruction is finished: RB, not valid to RB, valid
• Instruction is completed: RB, valid to AR
• Architectural register is reclaimed, if this architectural register becomes renamed anew: AR to Available
• Instruction is canceled: the entry returns to Available

It needs a large number of physical registers.

During completion no physical transfer is needed from the rename buffer to the referenced architectural register; instead the former rename buffer changes its state and becomes the referenced architectural register.
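
The point that completion is only a state change can be illustrated with a short sketch. It assumes that architectural register i initially lives in physical register i, and it simplifies reclaiming: the previous architectural copy is freed when the instruction that renamed the register completes. All names are illustrative.

```python
class MergedRegisterFile:
    def __init__(self, n_arch: int, n_phys: int):
        self.values = [0] * n_phys
        # Assumption: arch. register i initially lives in physical register i.
        self.state = ["AR"] * n_arch + ["Available"] * (n_phys - n_arch)
        self.arch_map = list(range(n_arch))  # committed arch -> phys mapping
        self.spec_map = list(range(n_arch))  # latest mapping, used to rename

    def dispatch(self, dest: int) -> int:
        """Allocate a free physical register as rename buffer for `dest`."""
        phys = self.state.index("Available")  # ValueError if none is free
        self.state[phys] = "RB, not valid"
        self.spec_map[dest] = phys
        return phys

    def finish(self, phys: int, result: int) -> None:
        """The result is written once, into the allocated physical register."""
        self.values[phys] = result
        self.state[phys] = "RB, valid"

    def complete(self, dest: int, phys: int) -> None:
        """No data is moved: the rename buffer becomes the arch. register."""
        old = self.arch_map[dest]
        self.state[phys] = "AR"
        self.arch_map[dest] = phys
        self.state[old] = "Available"  # simplification: reclaim the old copy


rf = MergedRegisterFile(n_arch=4, n_phys=8)
p = rf.dispatch(dest=2)
rf.finish(p, 99)
rf.complete(dest=2, phys=p)
```

After `complete`, the value 99 is still in the physical register that was allocated at dispatch; only the state labels and the mapping have changed.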

Page 12: 8. Microarchitecture of Superscalars (6) Register renaming

2.2 Types of rename buffers

Types of rename buffers:
• Rename reg. file (RR next to the AR): PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), Power3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
• Future file (FF next to the AR): UltraSPARC III (1999), K7 (FX) (1999), K8 (FX) (2003)
• Merged arch. and rename register file (AR, RR): Power1 (1990), Power2 (1993), R10000 (1996), R12000 (1999), Alpha 21264 (1998), Pentium 4 (FP) (2000), K7 (FP) (1999), K8 (FP) (2003)
• Holding renamed values in the ROB (ROB next to the AR)

[Figure labels: Reg. nrs., Ops., Res., Ret.]

Page 13: 8. Microarchitecture of Superscalars (6) Register renaming

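Holding renamed values in the ROB

[Figure: reorder buffer (ROB) next to the architectural register file (AR); labels: Reg. nrs., Ops., Res., Ret.]

States of a result field:
• Available (entered when initialized)
• Allocated, not valid
• Allocated, valid

State transitions:
• Allocate, if instruction is dispatched: Available to Allocated, not valid
• Update, if instruction is finished: Allocated, not valid to Allocated, valid
• Reclaim, if instruction is retired: Allocated, valid to Available
• Reclaim, if instruction is canceled: an allocated entry returns to Available

ROB entries are extended to hold results as well.

During dispatching a new ROB entry with its result field is allocated to each dispatched instruction. (The result field serves as the allocated rename buffer.)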
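
A minimal sketch of a ROB whose entries carry a result field, as described above. The class name, the dictionary layout of an entry, and the in-order `retire` loop are illustrative assumptions, not a description of any particular processor.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, size: int, n_regs: int):
        self.size = size
        self.entries = deque()   # program order: oldest entry on the left
        self.ar = [0] * n_regs   # architectural register file

    def dispatch(self, dest: int) -> dict:
        """Allocate a ROB entry; its result field is the rename buffer."""
        if len(self.entries) >= self.size:
            raise RuntimeError("ROB full, dispatch must stall")
        entry = {"dest": dest, "result": None, "valid": False}
        self.entries.append(entry)
        return entry

    def finish(self, entry: dict, result: int) -> None:
        """Write the result into the entry and mark it valid."""
        entry["result"] = result
        entry["valid"] = True

    def retire(self) -> int:
        """Retire finished entries in program order; return how many."""
        retired = 0
        while self.entries and self.entries[0]["valid"]:
            entry = self.entries.popleft()
            self.ar[entry["dest"]] = entry["result"]
            retired += 1
        return retired


rob = ReorderBuffer(size=4, n_regs=8)
a = rob.dispatch(dest=1)
b = rob.dispatch(dest=1)         # WAW on r1: each writer gets its own entry
rob.finish(b, 7)                 # the younger instruction finishes first
retired_early = rob.retire()     # nothing retires: the oldest is not finished
rob.finish(a, 5)
retired_late = rob.retire()      # both retire in order; r1 ends up as 7
```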

Page 14: 8. Microarchitecture of Superscalars (6) Register renaming

2.2 Types of rename buffers

Types of rename buffers:
• Rename reg. file (RR next to the AR): PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), Power3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
• Future file (FF next to the AR): UltraSPARC III (1999), K7 (FX) (1999), K8 (FX) (2003)
• Merged arch. and rename register file (AR, RR): Power1 (1990), Power2 (1993), R10000 (1996), R12000 (1999), Alpha 21264 (1998), Pentium 4 (FP) (2000), K7 (FP) (1999), K8 (FP) (2003)
• Holding renamed values in the ROB (ROB next to the AR): K5 (1995), K6 (1997), Pentium Pro (1995), Pentium II (1997), Pentium III (1999), Pentium 4 (FX) (2000), Pentium M (2003), Core (2006)

[Figure labels: Reg. nrs., Ops., Res., Ret.]

Page 15: 8. Microarchitecture of Superscalars (6) Register renaming

3. Operation of register renaming (1)

The actual rename process depends on both the rename technique implemented and the underlying microarchitecture.

Assumptions:

Rename technique: using rename registers and mapping tables

Page 16: 8. Microarchitecture of Superscalars (6) Register renaming

Rename registers:

Provide buffer space to temporarily hold instruction results.

[Figure: rename register file (RR); each entry has a Valid bit (V)]

During dispatching the Valid bit of the allocated rename register is reset (V ← 0).

When the instruction finishes, its result is written to the allocated rename buffer entry and the Valid bit is set (V ← 1) to indicate that the corresponding value is available.


Page 18: 8. Microarchitecture of Superscalars (6) Register renaming

Mapping table:

It includes an entry for each architectural register.

Each entry has an "Entry valid" bit that indicates whether or not the corresponding architectural register is renamed; if it is renamed, the entry also holds the index of the associated rename buffer (RB index).

A new entry is created while an instruction is dispatched
• by setting the "Entry valid" bit and
• writing the index of the allocated rename buffer ("RB index") to the entry that corresponds to the destination register of the dispatched instruction.

A valid mapping is updated by writing a new "RB index" into it when the architectural register belonging to that entry is renamed again.

An entry is invalidated when the instruction that currently belongs to that entry is retired.

In this way the mapping table continuously holds the latest allocations.

[Figure: mapping table with entries 0 to n-1, each holding an "Entry valid" bit and an "RB index". Entry 6: Entry valid = 0. Entry 7: Entry valid = 1, RB index = 12. Entry 8: Entry valid = 1, RB index = 14. A look-up for r7 returns "12" (RB index = 12).]
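
The mapping table rules can be sketched in a few lines. The class and method names are illustrative, and the check in `retire` (invalidate only if the retiring instruction's rename buffer is still the one recorded in the entry) is this sketch's reading of "the instruction that belongs to that entry".

```python
class MappingTable:
    def __init__(self, n_arch: int):
        self.entry_valid = [False] * n_arch  # is the arch. register renamed?
        self.rb_index = [None] * n_arch      # index of its rename buffer

    def rename_dest(self, reg: int, rb: int) -> None:
        """At dispatch: create the mapping, or overwrite an existing one."""
        self.entry_valid[reg] = True
        self.rb_index[reg] = rb

    def lookup(self, reg: int):
        """Return ("RB", index) if `reg` is renamed, else ("AR", reg)."""
        if self.entry_valid[reg]:
            return ("RB", self.rb_index[reg])
        return ("AR", reg)

    def retire(self, reg: int, rb: int) -> None:
        """Invalidate only if the retiring instruction still owns the entry."""
        if self.entry_valid[reg] and self.rb_index[reg] == rb:
            self.entry_valid[reg] = False


# Values taken from the figure: r7 -> RB 12, r8 -> RB 14, r6 not renamed.
mt = MappingTable(n_arch=16)
mt.rename_dest(7, 12)
mt.rename_dest(8, 14)
hit = mt.lookup(7)    # ("RB", 12)
miss = mt.lookup(6)   # ("AR", 6)
```

If r7 is renamed again (say to RB 15) before the first writer retires, retiring the first writer leaves the entry valid, so later instructions keep reading the newest allocation.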

Page 19: 8. Microarchitecture of Superscalars (6) Register renaming

3. Operation of register renaming (1)

The actual rename process depends on both the rename technique implemented and the underlying microarchitecture.

Assumptions:

Rename technique: using rename registers and mapping tables

Underlying microarchitecture:
• in-order dispatching
• dynamic instruction issue
• split FX and FP register files
• operand fetch policy: both alternatives are discussed

Page 20: 8. Microarchitecture of Superscalars (6) Register renaming

3. Operation of register renaming (2)

Considered part of the microarchitecture for both dispatch bound and issue bound operand fetching:

• it executes only FX-instructions,
• it consists of an architectural register file (AR) and a single execution unit (EU).

Page 21: 8. Microarchitecture of Superscalars (6) Register renaming

3. Operation of register renaming (3)

Figure 3.1: An FX-core assuming buffered issue and dispatch bound operand fetching

[Figure components: decoded instructions (OC, Rd, Rs1, Rs2), mapping table, rename register file (RR) with valid bits (V), architectural register file (AR), reservation station (RS) with entries (OC, Rd', Op1/Rs1', V1, Op2/Rs2', V2), execution unit (EU), bypassing.]

Steps annotated in the figure:
• Dispatch: renaming destination and source registers (Rd, Rs1, Rs2 to Rd', Rs1', Rs2'); fetching operands if valid, else tags.
• Issue: checking the valid bits; issuing an instruction when its operands are ready (OC, Rd', Op1, Op2 are sent to the EU).
• After the instruction is executed: updating the RS and the RR with (Result, Rd').
• When the instruction is retired: updating the AR.

Page 22: 8. Microarchitecture of Superscalars (6) Register renaming

3. Operation of register renaming (4)

Figure 3.2: An FX-core assuming buffered issue and issue bound operand fetching

[Figure components: decoded instructions (OC, Rd, Rs1, Rs2), mapping table, reservation station (RS) with entries (OC, Rd', Rs1', Rs2'), rename register file (RR) with valid bits (V), architectural register file (AR), execution unit (EU), bypassing.]

Steps annotated in the figure:
• Renaming destination and source registers.
• Dispatch: dispatching instructions into the RS.
• Issue: checking for availability of (Rs1'), (Rs2'); issuing an instruction when its operands are valid; fetching operands (Op1, Op2).
• Executing the instruction; updating the RR with (Result, Rd') when the instruction is finished.
• Updating the AR when the instruction retires.
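
The difference between the two operand fetch policies of Figures 3.1 and 3.2 can be sketched as follows. To keep the sketch short, it assumes that every source register is renamed, so a source tag indexes the rename register file directly; the dictionaries `rr_value` and `rr_valid` and all function names are illustrative.

```python
def dispatch_bound_entry(oc, rd_tag, sources, rr_value, rr_valid):
    """Dispatch bound: build the RS entry with operand values where they
    are already valid, otherwise with the tag of the pending rename register."""
    entry = {"oc": oc, "rd": rd_tag, "ops": []}
    for tag in sources:
        if rr_valid[tag]:
            entry["ops"].append({"value": rr_value[tag], "valid": True})
        else:
            entry["ops"].append({"tag": tag, "valid": False})
    return entry


def update_rs(entry, tag, result):
    """After execution, (result, tag) fills in operands waiting on `tag`."""
    for op in entry["ops"]:
        if not op["valid"] and op["tag"] == tag:
            op.update(value=result, valid=True)


def issue_bound_entry(oc, rd_tag, sources):
    """Issue bound: the RS entry holds only renamed register identifiers."""
    return {"oc": oc, "rd": rd_tag, "srcs": list(sources)}


def issue_bound_fetch(entry, rr_value, rr_valid):
    """At issue: return the operand values if all are valid, else None."""
    if all(rr_valid[tag] for tag in entry["srcs"]):
        return [rr_value[tag] for tag in entry["srcs"]]
    return None


rr_value = {1: 10, 2: None}
rr_valid = {1: True, 2: False}

d = dispatch_bound_entry("add", 3, [1, 2], rr_value, rr_valid)
i = issue_bound_entry("add", 3, [1, 2])
not_ready = issue_bound_fetch(i, rr_value, rr_valid)

# Rename register 2 receives its result.
rr_value[2], rr_valid[2] = 32, True
update_rs(d, tag=2, result=32)
ready = issue_bound_fetch(i, rr_value, rr_valid)
```

With dispatch bound fetching the reservation station stores operand values and must be updated when results arrive; with issue bound fetching it stores only register identifiers and the values are read at issue time.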

Page 23: 8. Microarchitecture of Superscalars (6) Register renaming

4. Design parameters of register renaming (1)

RISC processors

Processor type / year of volume shipment | Type of rename buffer | Number of rename buffers, FX | Number of rename buffers, FP | Dispatch rate | Width of the issue window (wdw) | Total number of rename buffers (nr) | Reorder width (nROB)
PowerPC 603 (1993) | ren. reg. file | na. | 4 | 3 | 3 | na. | 5
PowerPC 604 (1995) | ren. reg. file | 12 | 8 | 4 | 12 | 20 | 16
PowerPC 620 (1996) | ren. reg. file | 8 | 8 | 4 | 15 | 16 | 16
POWER3 (1998) | ren. reg. file | 16 | 24 | 4 | 23 | 40 | 32
POWER4 (2001) | merged | 80 | 72 | 5 | 78 | 152 | 20*5
POWER5 (2004) | merged | 120 | 120 | 5 | 82 | 240 | 20*5
R10000 (1996) | merged | 32 | 32 | 4 | 48 | 64 | 32
R12000 (1998) | merged | 32 | 32 | 4 | 48 | 64 | 48
Alpha 21264 (1998) | merged | 48 | 41 | 4 | 35 | 89 | 80
PA 8000 (1996) | ren. reg. file | 56 | 56 | 4 | 56 | 112 | 56
PA 8200 (1997) | ren. reg. file | 56 | 56 | 4 | 56 | 112 | 56
PA 8500 (1999) | ren. reg. file | 56 | 56 | 4 | 56 | 112 | 56
PM1 (1996) | merged | 38 | 24 | 4 | 36 | 62 | 62

Source: Sima, D. „Register Renaming Techniques”, Computer Engineering Handbook, CRC PRESS 2006

Page 24: 8. Microarchitecture of Superscalars (6) Register renaming

4. Design parameters of register renaming (2)

CISC (x86) processors

Processor type / year of volume shipment | Type of rename buffer | Number of rename buffers (FX / FP) | Dispatch rate | Width of the issue window (wdw) | Total number of rename buffers (nr) | Reorder width (nROB)
Pentium Pro (1995) | in the ROB | 40 | 3² | 20 | 40 | 40
Pentium II (1997) | in the ROB | 40 | 3² | 20 | 40 | 40
Pentium III (1999) | in the ROB | 40 | 3² | 20 | 40 | 40
Pentium 4 (2000) (Willamette) | merged | 128 | 3² | n.a. | 128 | 126
Pentium 4 (2002) Northwood | merged | 128 | 3 | n.a. | 256? | 2*126?
Pentium 4 (2004) Prescott | merged | 256 | 3 | n.a. | 512? | 4*128?
Pentium M (2003) | in the ROB | 40 | 3 | 24 | 40 | 40
Core (2006) | in the ROB | 96 | 4 | 32 | 96 | 96
K5 (1995) | in the ROB | 16 | 4² | 11(?) | 16 | 16
K6 (1996) | in the ROB | 24 | 3² | 24 | 24 | 24
K7 (1999) | in the ROB/merged | 72 / n.a. | 3² | 54 | 88 | 24*3
K8 (2003) | in the ROB/merged | 72 / 120 | 3² | 60 | 192 | 24*3

Source: Sima, D. „Register Renaming Techniques”, Computer Engineering Handbook, CRC PRESS 2006

Page 25: 8. Microarchitecture of Superscalars (6) Register renaming

5. Implementation of renaming in superscalars

5.1 The chronology of introducing register renaming

Figure 5.1: Chronology of introducing register renaming

[Figure: timeline from 1990 to 2000 of superscalar processor lines, grouped into RISC processors and CISC processors. The issue rate of each processor is given in parentheses, and markers distinguish partial renaming from full renaming.

Vendors and processor lines named in the figure: IBM (POWER, ES), PowerPC Alliance (PowerPC), Motorola (MC 88000, MC 68000), Compaq (Alpha), HP (PA), MIPS (R), Sun/Hal (SPARC), TRON (Gmicro), Intel (80x86), AMD (Nx/K), CYRIX (M).

Processors shown: POWER1 (RS/6000) (4), POWER2 (6/4)***, P2SC (6/4)***, POWER3 (4), PPC 601 (3)*, PPC 602 (2)*, PPC 603 (3)*, PPC 604 (4)*, PPC 620 (4)*, MC88110 (2), MC 68060 (3), Alpha 21064 (2), Alpha 21164 (4), Alpha 21264 (4), PA7100 (2), PA7200 (2), PA8000 (4), PA8200 (4), PA 8500 (4), R 8000 (4), R 10000 (4), R 12000 (4), SuperSPARC (3), UltraSPARC (4), UltraSPARC-2 (4), UltraSPARC-3 (4), PM1 (SPARC64) (4), Gmicro/500 (2), ES/9000 (2), Pentium (2), Pentium/MMX (2), PentiumPro (3), Pentium II (3), Pentium III (3), Pentium 4 (3), K5 (4), K6 (3), K7 (3), Nx586 (1/3)**, M1 (2), MII (2).]

* PPC designates PowerPC.
** The Nx586 has scalar issue for CISC instructions but a 3-way superscalar core for converted RISC instructions.
*** The dispatch rate of the POWER2 and P2SC is 6 along the sequential path while only 4 immediately after a branch.

Source: Sima, D. „Register Renaming Techniques”, Computer Engineering Handbook, CRC PRESS 2006

Page 26: 8. Microarchitecture of Superscalars (6) Register renaming

5.2 The basic implementation schemes of register renaming

[Figure: the basic implementation schemes, arranged by type of rename buffers (merged arch. and rename register file, rename reg. file, holding renamed values in the ROB, future file) and, within each type, by operand fetch policy (dispatch bound or issue bound), with rows for proposals and examples.]

Proposals: Keller (75); Smith, Pleszkun (85); Sohi, Vajapeyam (87); Johnson (87)

Examples: PM1 (SPARC 64) (95), ES/9000 (92), POWER1 (90), POWER2 (93), Nx586 (94), R10000 (96), P2SC (96), R12000 (99), Pentium 4 (00), POWER4 (01), POWER5 (04), K7 (FP) (99), K8 (FP) (03), PowerPC 603 (93), PowerPC 604 (95), PowerPC 620 (96), POWER3 (98), PA 8000 (96), PA 8200 (97), PA 8500 (99), Pentium Pro (95), Pentium II (97), Pentium III (99), Pentium M (03), Core (06), K7 (FX) (99), K8 (FX) (03), Am29000 (95), K5 (95), Lightning* (91), K6* (97), UltraSPARC III (99)

Page 27: 8. Microarchitecture of Superscalars (6) Register renaming

6. Examples (1)

Rename register file

Source: Song, P. „IBM’s Power3 to Replace P2SC”, Microprocessor Report, Nov. 17, 1997

Figure 6.1: The microarchitecture of the POWER3

Page 28: 8. Microarchitecture of Superscalars (6) Register renaming

6. Examples (2)

Future file

Source: Horel, T. „UltraSPARC-III”, IEEE MICRO, May-June 99, pp. 73-95

WARF: Working and Architectural Register File (Future file)

Figure 6.2: The microarchitecture of the UltraSPARC-III

Page 29: 8. Microarchitecture of Superscalars (6) Register renaming

6. Examples (3)

Merged architectural and rename reg.

Figure 6.3: The microarchitecture of the Alpha 21264

Source: Kessler, R.E. et al., „The Alpha 21264 Microprocessor Architecture”, h18002.www1.hp.com/alphaserver

Page 30: 8. Microarchitecture of Superscalars (6) Register renaming

6. Examples (4)

Holding renamed values in the ROB

Figure 6.4: The microarchitecture of the Core processor

Source: Kanter, D., „Intel’s next Generation Microarchitecture Unveiled”, Real World Tech., 2006 March 9.