Two-issue Super Scalar CPU

Page 1: Two-issue Super Scalar CPU

Two-issue Super Scalar CPU

Page 2: Two-issue Super Scalar CPU

CPU structure, what did we have to deal with:

- double clock generation
- double-port instruction cache
- double-port instruction fetch (bubble handling)
- decode stage (instruction handling, scoreboard implemented)
- execute stage (doubled execution unit, forwarding, branch resolving, write-back ports)
- load-store stage (memory access handling, doubled write-back signal)

Page 3: Two-issue Super Scalar CPU

Top level model

• Global 50 MHz clock connected to a DLL component, which doubles the clock frequency
• The doubled clock is needed to implement the 4-port Block RAM

[Diagram: top-level model. Blocks: CPU, chipset, performance counter, DLL, IO interface. Clock signals: CLK, CLK0, CLK2x]
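
The slide does not say how the doubled clock yields four ports. One common approach is to time-multiplex a dual-port RAM, serving two ports on each tick of the fast clock. The Python model below is a behavioural sketch of that idea only; the class names, the request format and the order in which ports are served are assumptions, not taken from the design.

```python
class DualPortRam:
    """Behavioural stand-in for a dual-port RAM: two accesses per fast-clock tick."""

    def __init__(self, size):
        self.cells = [0] * size

    def tick(self, port_a, port_b):
        """Serve one request per port.

        A request is None, ('r', addr) or ('w', addr, value).
        Simplification: port A is handled before port B within a tick.
        """
        results = []
        for request in (port_a, port_b):
            if request is None:
                results.append(None)
            elif request[0] == "r":
                results.append(self.cells[request[1]])
            else:
                self.cells[request[1]] = request[2]
                results.append(None)
        return results


class QuadPortRam:
    """Four logical ports built from one dual-port RAM clocked twice per CPU cycle."""

    def __init__(self, size):
        self.ram = DualPortRam(size)

    def cpu_cycle(self, requests):
        """requests: exactly four port requests for one CLK0 cycle."""
        if len(requests) != 4:
            raise ValueError("expected four port requests")
        first_half = self.ram.tick(requests[0], requests[1])   # first CLK2x tick
        second_half = self.ram.tick(requests[2], requests[3])  # second CLK2x tick
        return first_half + second_half


regs = QuadPortRam(32)
regs.cpu_cycle([("w", 1, 11), ("w", 2, 22), None, None])   # two write-backs
operands = regs.cpu_cycle([("r", 1), ("r", 2), ("r", 2), ("r", 1)])
print(operands)  # [11, 22, 22, 11]
```

In this model every CPU cycle costs two RAM ticks, which is why the RAM has to run from CLK2x while the rest of the CPU runs from CLK0.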

Page 4: Two-issue Super Scalar CPU

Instruction cache

• Block RAM extended to a two-port implementation
• Cache hit and miss tests for both ports
• One memory port
• The FSM responsible for memory access switches between the two requests from instruction fetch

[Diagram: first port, second port, Block RAM, FSM Memory Access]
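
To illustrate how a single memory port can serve two fetch ports, here is a minimal Python sketch of such an arbiter. The state names, the fixed miss latency, the "first port has priority" rule and the absence of eviction are all assumptions made for the example.

```python
class TwoPortICache:
    """Two fetch ports, one memory port, and a small FSM that arbitrates misses."""

    def __init__(self, memory, miss_latency=2):
        self.memory = memory            # backing store: list of instruction words
        self.cached = {}                # address -> word (no eviction in this sketch)
        self.miss_latency = miss_latency
        self.state = "IDLE"             # "IDLE", "FILL_FIRST" or "FILL_SECOND"
        self.fill_addr = None
        self.countdown = 0

    def _start_fill(self, state, addr):
        self.state = state
        self.fill_addr = addr
        self.countdown = self.miss_latency

    def cycle(self, addr_first, addr_second):
        """Advance one cycle; return (word or None, word or None) for the two ports."""
        # Progress the memory access that is already in flight, if any.
        if self.state != "IDLE":
            self.countdown -= 1
            if self.countdown == 0:
                self.cached[self.fill_addr] = self.memory[self.fill_addr]
                self.state = "IDLE"
        # Independent hit/miss test for each port.
        hit_first = addr_first in self.cached
        hit_second = addr_second in self.cached
        # Only one miss can use the memory port at a time; the first port goes first.
        if self.state == "IDLE":
            if not hit_first:
                self._start_fill("FILL_FIRST", addr_first)
            elif not hit_second:
                self._start_fill("FILL_SECOND", addr_second)
        return self.cached.get(addr_first), self.cached.get(addr_second)


cache = TwoPortICache(memory=[10, 11, 12, 13], miss_latency=2)
trace = [cache.cycle(0, 1) for _ in range(5)]
print(trace)
# [(None, None), (None, None), (10, None), (10, None), (10, 11)]
```

A `None` on a port corresponds to a cycle in which instruction fetch would have to insert a bubble for that stream.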

Page 5: Two-issue Super Scalar CPU

Instruction fetch

• Fetching two instructions from the cache
• Bubble insertion for each instruction stream
• Instructions passed to the output in order

[Diagram: Instruction Fetch block with two instruction cache ports, two decode stage ports, branch request, and the signals bubble1 and bubble2]

Page 6: Two-issue Super Scalar CPU

Decode stage

• Decoding two instructions
• Quad-port Block RAM inferred
• Taking advantage of the doubled clock: double write-back handling
• Scoreboard implemented: a set of conditions for checking data dependencies
• Bubble generation
• Instruction stream prepared for the load-store stage

[Diagram: decode stage with two instruction fetch ports, two execute stage ports, Scoreboard, Block RAM, Instruction decoding, two Write-back inputs, Previous Instr.]

Page 7: Two-issue Super Scalar CPU

Scoreboard

• Simplification of a full scoreboard unit
• Introduced as a set of conditions implemented in the decode stage
• Used for bubble insertion of both types (between concurrent and between consecutive instructions) and for separating memory access instructions
• Represented by an abstract instruction table consisting of two lines

Nr  Instruction  Idx_d Idx_a Idx_b  Executability
1   MUL          0 12               1
2   ST           21 -               0

In practice, the table corresponds to the outputs of instruction fetch.
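
The three conditions named above (dependency between concurrent instructions, dependency on a preceding load, and two memory accesses in one pair) can be sketched as a small Python function. This is an illustration under assumptions, not the actual decode logic: the `Instr` type, the treatment of `ST` as reading `idx_d`, the rule that the second slot never overtakes the first, and the `ADD`/`SUB` mnemonics are all hypothetical. Write-after-write hazards are ignored.

```python
from dataclasses import dataclass
from typing import Optional

MEMORY_OPS = {"LD", "ST"}


@dataclass
class Instr:
    op: str
    idx_d: Optional[int] = None
    idx_a: Optional[int] = None
    idx_b: Optional[int] = None

    def dest(self):
        # Assumption: a store writes no register; its idx_d names the data to store.
        return None if self.op == "ST" else self.idx_d

    def sources(self):
        regs = [self.idx_a, self.idx_b]
        if self.op == "ST":
            regs.append(self.idx_d)
        return {r for r in regs if r is not None}


def depends_on(consumer, producer):
    """True if `consumer` reads the register that `producer` writes."""
    return producer.dest() is not None and producer.dest() in consumer.sources()


def executability(first, second, previous=None):
    """Return (flag_first, flag_second); 0 means the slot gets a bubble this cycle."""
    load_before = previous is not None and previous.op == "LD"
    # A load issued just before: its result is not available yet.
    if load_before and depends_on(first, previous):
        return (0, 0)            # in-order issue: the second slot waits as well
    flag_second = 1
    if load_before and depends_on(second, previous):
        flag_second = 0
    # Dependency between the two concurrent instructions.
    if depends_on(second, first):
        flag_second = 0
    # Only one memory access instruction may go down the pipeline per cycle.
    if first.op in MEMORY_OPS and second.op in MEMORY_OPS:
        flag_second = 0
    return (1, flag_second)


add = Instr("ADD", 3, 4, 5)
print(executability(Instr("SUB", 0, 1, 2), add))                              # (1, 1)
print(executability(Instr("SUB", 0, 1, 2), Instr("ADD", 3, 0, 4)))            # (1, 0)
print(executability(Instr("ADD", 1, 0, 2), add, previous=Instr("LD", 0, 6)))  # (0, 0)
print(executability(Instr("LD", 0, 1), Instr("ST", 2, 1)))                    # (1, 0)
```

The four calls correspond to independent instructions, a concurrent dependency, a load followed by a dependent instruction, and two memory accesses in one pair.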

Page 8: Two-issue Super Scalar CPU

And a few examples:

First, normal operation without any bubble insertion: the two instructions are fully independent.

[Diagram: decode stage with two instruction fetch ports, two execute stage ports, Block RAM, Instruction decoding, Scoreboard, two Write-back inputs, Previous Instr.]

Page 9: Two-issue Super Scalar CPU

Bubble insertion caused by data dependencies between concurrent instructions

[Diagram: decode stage with two instruction fetch ports, two execute stage ports, Block RAM, Instruction decoding, Scoreboard, two Write-back inputs, Previous Instr.]

Page 10: Two-issue Super Scalar CPU

Bubble insertion caused by data dependencies between a load instruction and arbitrary subsequent instructions

[Diagram: decode stage with two execute stage ports, Block RAM, Instruction decoding, Scoreboard, two Write-back inputs, Previous Instr.; instruction slots shown: Instr, Instr $1,$0, LD $0, Instr]

Page 11: Two-issue Super Scalar CPU

Bubble insertion introduced to split two memory-access instructions

[Diagram: decode stage with two execute stage ports, Block RAM, Instruction decoding, Scoreboard, two Write-back inputs, Previous Instr.; instruction slots shown: LD, ST, ST, Instr]

Page 12: Two-issue Super Scalar CPU

Execute stage

• Doubled ALU
• Resolving of branch priority
• Forwarding from both instruction streams
• Write-back generation

[Diagram: execute stage with two decode stage ports, two load store stage ports, Data forwarding, two ALUs, Register, branch request]
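
A minimal Python sketch of the two selection problems on this slide, forwarding from both streams and choosing between two branch requests. The priority rules used here (the younger result wins when forwarding, the older stream wins for branches) are assumptions that follow from in-order semantics; the slides do not spell them out.

```python
def forward_operand(reg, regfile, recent_results):
    """Pick the newest value of `reg`.

    recent_results: (dest_reg, value) pairs still on their way to the register
    file, ordered oldest first, e.g. [result_of_stream_1, result_of_stream_2].
    """
    value = regfile[reg]
    for dest, result in recent_results:
        if dest == reg:
            value = result          # a younger result overrides an older one
    return value


def resolve_branch(target_first, target_second):
    """Each argument is a taken-branch target or None.

    Assumption: the first (older) stream has priority over the second.
    """
    if target_first is not None:
        return target_first
    return target_second


regfile = [0, 10, 20, 30]
in_flight = [(1, 111), (2, 222)]
print(forward_operand(1, regfile, in_flight))   # 111, forwarded from stream 1
print(forward_operand(3, regfile, in_flight))   # 30, read from the register file
print(resolve_branch(0x40, 0x80))               # 64, the older branch wins
```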

Page 13: Two-issue Super Scalar CPU

Load-store stage

• It is ensured that only one memory access instruction is passed to the load-store unit
• The memory access process is switched to the right instruction
• Write-back signals are generated

[Diagram: load-store stage with write back from execute, memory access, write back multiplexing, memory ports, write back signals]
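
The write-back multiplexing can be pictured as follows: each of the two slots either carries a result from the execute stage, which is passed through, or is the single memory access, whose result comes from memory. The slot format and the `NOP` bubble marker in this Python sketch are hypothetical.

```python
def load_store_stage(slots, memory):
    """Return one write-back (dest_reg, value) or None per slot.

    slots: two dicts with an "op" key plus "dest"/"value"/"addr" as needed.
    """
    mem_slots = [s for s in slots if s["op"] in ("LD", "ST")]
    if len(mem_slots) > 1:
        raise ValueError("decode should have split these two memory accesses")
    writebacks = []
    for slot in slots:
        if slot["op"] == "LD":
            # Write-back value comes from memory.
            writebacks.append((slot["dest"], memory[slot["addr"]]))
        elif slot["op"] == "ST":
            memory[slot["addr"]] = slot["value"]
            writebacks.append(None)
        elif slot["op"] == "NOP":
            writebacks.append(None)     # bubble: nothing to write back
        else:
            # Write-back value was already computed in the execute stage.
            writebacks.append((slot["dest"], slot["value"]))
    return writebacks


memory = {0x10: 7}
slots = [
    {"op": "ADD", "dest": 3, "value": 5},
    {"op": "LD", "dest": 4, "addr": 0x10},
]
print(load_store_stage(slots, memory))  # [(3, 5), (4, 7)]
```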

Page 14: Two-issue Super Scalar CPU

In action

Page 15: Two-issue Super Scalar CPU

Performance (1): blinking LEDs

• Additional parameters:
• Number of simulated cycles: 124988
• Execution frequency of memory access instructions relative to the number of all instructions:
  - Super Sc: 0,29
  - SIMD: 0,24
• ALU instructions:
  - Super Sc: 0,14
  - SIMD: 0,13

[Chart: Instruction/cycle for Super scalar and SIMD; values shown: 0,5 and 0,42]
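
For orientation, the chart values can be turned into rough instruction counts with instructions ≈ IPC × cycles. The Python lines below do only this arithmetic. They assume that the 124988 simulated cycles apply to both measurements and that the chart values are rounded, and they make no claim about which value belongs to which design.

```python
def estimated_instructions(ipc, cycles):
    """Rough instruction count implied by an instructions-per-cycle figure."""
    return ipc * cycles


def relative_difference(ipc_high, ipc_low):
    """How much larger the higher IPC figure is, as a fraction of the lower one."""
    return ipc_high / ipc_low - 1.0


CYCLES = 124988  # number of simulated cycles from the slide

for ipc in (0.5, 0.42):
    print(ipc, round(estimated_instructions(ipc, CYCLES)))

print(f"{relative_difference(0.5, 0.42):.1%}")
```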

Page 16: Two-issue Super Scalar CPU

Performance (2): apfel

• Additional parameters:
• Execution frequency of memory access instructions:
  - for both: 0,2
• ALU instructions:
  - both: 0,4
• The measured instruction execution frequencies are surprising, probably because many memory access instructions are executed at the beginning of the program (the longer the simulation runs, the better the results should get)

[Chart: Instruction/cycle for Super scalar and SIMD; values shown: 0,56 and 0,45]

Page 17: Two-issue Super Scalar CPU

Synthesis

• The last version seen working on the XCV300 was the 2-way SIMD (MUCH faster than the HaPra CPU!)
• The 4-way SIMD and Super Scalar versions are too big for the XCV300...
• ...and for unknown reasons don't work on the XCV800
• Probably severe timing issues; running at 25 MHz instead of 50 MHz doesn't help
• (but the 4-way SIMD should work anyway!)
• All we've got is a fully working simulation