5-Stage Pipelining

S1: Fetch Instruction (FI)
S2: Decode Instruction (DI)
S3: Fetch Operand (FO)
S4: Execute Instruction (EI)
S5: Write Operand (WO)

[Figure: space-time diagram of the pipeline. Rows are segments S1 to S5; the horizontal axis is time, in clock cycles 1 to 9.]

Five-Stage Instruction Pipeline

Fetch instruction
Decode instruction
Fetch operands
Execute instruction
Write result
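The overlap between the five stages can be sketched as a space-time table. The following is a minimal illustration, not part of the original slides: the function name `space_time` is hypothetical, and it assumes an ideal pipeline in which every stage takes one clock cycle and there are no stalls. Under those assumptions, n instructions pass through k stages in k + n - 1 cycles.

```python
def space_time(num_stages, num_instructions):
    """Return one row per stage; each row lists, per clock cycle,
    the instruction number occupying that stage (None if idle).

    Assumes an ideal pipeline: one cycle per stage, no stalls.
    """
    total_cycles = num_stages + num_instructions - 1
    rows = []
    for stage in range(num_stages):
        row = []
        for cycle in range(1, total_cycles + 1):
            # Instruction i enters stage s (0-based) at cycle i + s.
            instr = cycle - stage
            row.append(instr if 1 <= instr <= num_instructions else None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for stage, row in enumerate(space_time(5, 5), start=1):
        cells = " ".join("." if x is None else str(x) for x in row)
        print(f"S{stage}: {cells}")
```

Each row is shifted one cycle to the right of the row above it, which is the staircase pattern a space-time diagram shows.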

Two major difficulties

Data dependency
Branch difficulties

Solutions:
Prefetch target instruction
Delayed branch
Branch target buffer (BTB)
Branch prediction

Data Dependency

Use delayed load to solve it. Example:

LOAD:  R1 ← M[Addr1]
LOAD:  R2 ← M[Addr2]
ADD:   R3 ← R1 + R2
STORE: M[Addr3] ← R3
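One way to picture the delayed-load idea on the four-instruction example is a small scheduling pass that inserts a no-op whenever an instruction reads a register loaded by the instruction immediately before it. This is only a sketch under stated assumptions: the tuple encoding `(opcode, destination, sources)`, the name `insert_load_delays`, and the one-cycle load delay are illustrative choices, not taken from the slides.

```python
# Hypothetical encoding: (opcode, destination, tuple of source operands).
NOP = ("NOP", None, ())


def insert_load_delays(program):
    """Insert a NOP after a LOAD when the next instruction reads the
    register being loaded (assumes a load delay of exactly one cycle)."""
    scheduled = []
    for instr in program:
        if scheduled:
            prev_op, prev_dest, _ = scheduled[-1]
            if prev_op == "LOAD" and prev_dest in instr[2]:
                scheduled.append(NOP)
        scheduled.append(instr)
    return scheduled


program = [
    ("LOAD", "R1", ("M[Addr1]",)),
    ("LOAD", "R2", ("M[Addr2]",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M[Addr3]", ("R3",)),
]

if __name__ == "__main__":
    for opcode, dest, sources in insert_load_delays(program):
        print(opcode, dest, sources)
```

In this example the only conflict is between the second load and the add, so a single no-op lands between them.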

Delayed Load

Example

Five instructions need to be carried out:

Load from memory to R1
Increment R2
Add R3 to R4
Subtract R5 from R6
Branch to address X

Delayed Branch

Rearranging the Instructions

Delayed Branch

In this procedure, the compiler detects the branch instruction and rearranges the machine-language code sequence by inserting useful instructions that keep the pipeline operating without interruptions.
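The rearrangement can be sketched on the five-instruction example given earlier. This is a simplified illustration, not the compiler algorithm itself. It assumes two delay slots after the branch, an unconditional branch at the end of the sequence, and that the instructions just before the branch are independent of it and of each other, so they can safely move into the slots. The name `fill_delay_slots` is hypothetical.

```python
def fill_delay_slots(program, delay_slots=2):
    """Move the instructions preceding a final branch into its delay slots.

    Assumes the last entry is an unconditional branch and that the moved
    instructions do not affect it. Unfilled slots are padded with NOPs.
    """
    *body, branch = program
    movable = body[-delay_slots:] if delay_slots else []
    kept = body[:len(body) - len(movable)]
    padding = ["NOP"] * (delay_slots - len(movable))
    return kept + [branch] + movable + padding


program = [
    "Load from memory to R1",
    "Increment R2",
    "Add R3 to R4",
    "Subtract R5 from R6",
    "Branch to address X",
]

if __name__ == "__main__":
    for line in fill_delay_slots(program):
        print(line)
```

With useful instructions in both slots, no no-ops are needed; when too few instructions are available, the sketch falls back to padding with no-ops.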

Prefetch target instruction

Prefetch the target instruction in addition to the instruction following the branch

If the branch condition is successful, the pipeline continues from the branch target instruction

Branch target buffer (BTB)

The BTB is an associative memory.
Each entry in the BTB consists of the address of a previously executed branch instruction and the target instruction for that branch.
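The BTB's lookup behaviour can be modelled in software as a small table keyed by branch address. The sketch below is illustrative only: the class name, the capacity, and the oldest-entry eviction policy are assumptions, and a real BTB performs the associative search in hardware.

```python
from collections import OrderedDict


class BranchTargetBuffer:
    """Toy model: maps a branch instruction's address to its target
    instruction. Capacity and eviction policy are assumptions."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, branch_address):
        """Return the stored target instruction, or None on a miss."""
        return self.entries.get(branch_address)

    def record(self, branch_address, target_instruction):
        """Store the target of a branch that has just been executed."""
        if branch_address in self.entries:
            self.entries.pop(branch_address)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[branch_address] = target_instruction


if __name__ == "__main__":
    btb = BranchTargetBuffer(capacity=2)
    print(btb.lookup(100))           # miss: branch not seen yet
    btb.record(100, "ADD R3, R4")
    print(btb.lookup(100))           # hit: target instruction available
```

On a hit, the target instruction is available without waiting for a fetch from memory; on a miss, the entry is filled in once the branch has executed.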

Loop Buffer

Very fast memory
Maintained by the fetch stage of the pipeline
Check the buffer before fetching from memory
Very good for small loops or jumps

The loop buffer is similar (in principle) to a cache dedicated to instructions. The differences are that the loop buffer only retains instructions in sequence, and is much smaller in size (and lower in cost).

Branch Prediction

A pipeline with branch prediction uses some additional logic to guess the outcome of a conditional branch instruction before it is executed

Branch Prediction

Various techniques can be used to predict whether a branch will be taken or not:

Prediction never taken
Prediction always taken
Prediction by opcode
Branch history table

The first three approaches are static: they do not depend on the execution history up to the time of the conditional branch instruction. The last approach is dynamic: it depends on the execution history.
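To make the dynamic approach concrete, here is a minimal sketch of a branch history table. The slides do not say how each entry is encoded; the two-bit saturating counter used here is a common scheme and is an assumption, as are the class name, the table size, and the helper `count_mispredictions`.

```python
class BranchHistoryTable:
    """Toy branch history table using two-bit saturating counters.

    Counter values 0-1 predict "not taken", 2-3 predict "taken".
    The encoding and table size are assumptions for illustration.
    """

    def __init__(self, size=16):
        self.size = size
        self.counters = [1] * size  # start at "weakly not taken"

    def _index(self, address):
        return address % self.size

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(table, address, outcomes):
    """Replay a sequence of actual outcomes and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if table.predict(address) != taken:
            wrong += 1
        table.update(address, taken)
    return wrong


if __name__ == "__main__":
    # A loop-closing branch: taken nine times, then falls through once.
    outcomes = [True] * 9 + [False]
    print(count_mispredictions(BranchHistoryTable(), 0x40, outcomes))
```

For a loop-closing branch like the one in the usage example, the table mispredicts only while it warms up and again at loop exit, whereas a static "never taken" prediction would be wrong on every taken iteration.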

Floating Point Arithmetic Pipeline

Pipeline arithmetic units are usually found in very high-speed computers.

They are used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems.

Floating Point Arithmetic Pipeline

Example for floating-point addition and subtraction

Inputs are two normalized floating-point binary numbers:

X = A × 2^a
Y = B × 2^b

A and B are two fractions that represent the mantissas.
a and b are the exponents.

Try to design the segments used to perform the "add" operation.

Floating Point Arithmetic Pipeline

1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result

Floating Point Arithmetic Pipeline

X = 0.9504 × 10^3 and Y = 0.8200 × 10^2

The two exponents are subtracted in the first segment to obtain 3 - 2 = 1.
The larger exponent, 3, is chosen as the exponent of the result.
Segment 2 shifts the mantissa of Y to the right to obtain Y = 0.0820 × 10^3.
The mantissas are now aligned.
Segment 3 produces the sum Z = 1.0324 × 10^3.
Segment 4 normalizes the result by shifting the mantissa once to the right and incrementing the exponent by one to obtain Z = 0.10324 × 10^4.
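The four segments can be traced in software on the decimal example above. This is a sequential sketch of what each segment computes, not a model of the hardware overlap; the function name `fp_add` and the use of Python's `decimal` module (chosen to keep the decimal digits exact) are assumptions made for illustration.

```python
from decimal import Decimal


def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add two decimal floating-point numbers given as (mantissa, exponent),
    following the four pipeline segments one after another."""
    # Segment 1: compare the exponents and keep the larger one.
    diff = a_exp - b_exp
    exp = max(a_exp, b_exp)

    # Segment 2: align the mantissas by shifting the smaller operand right.
    if diff >= 0:
        b_mant = b_mant.scaleb(-diff)
    else:
        a_mant = a_mant.scaleb(diff)

    # Segment 3: add the aligned mantissas.
    mant = a_mant + b_mant

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1 (or mantissa is 0).
    while abs(mant) >= 1:
        mant = mant.scaleb(-1)
        exp += 1
    while mant != 0 and abs(mant) < Decimal("0.1"):
        mant = mant.scaleb(1)
        exp -= 1
    return mant, exp


if __name__ == "__main__":
    mant, exp = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
    print(mant, exp)
```

For the slide's inputs the result is numerically equal to 0.10324 × 10^4. In a hardware pipeline each segment would work on a different pair of operands in the same clock cycle.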