Instruction pipelining


2. What is pipelining?
Greater CPU performance is achieved by instruction pipelining.
The 8086 microprocessor has two blocks:
  • BIU (Bus Interface Unit)
  • EU (Execution Unit)
The BIU performs all bus operations, such as fetching instructions, reading and writing operands in memory, and calculating the addresses of memory operands. The instruction bytes are transferred to the instruction queue.
The EU executes instructions from the instruction byte queue.
Both units operate asynchronously, giving the 8086 an overlapping instruction fetch and execution mechanism, which is called pipelining.

3. INSTRUCTION PIPELINING
  • The first stage fetches an instruction and buffers it.
  • When the second stage is free, the first stage passes it the buffered instruction.
  • While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction.
  • This is called instruction prefetch or fetch overlap.

4. Inefficiency in two-stage instruction pipelining
There are two reasons:
  • The execution time will generally be longer than the fetch time. Thus the fetch stage may have to wait for some time before it can empty its buffer.
  • When a conditional branch occurs, the address of the next instruction to be fetched is unknown. The execute stage then has to wait while the next instruction is fetched.

5. Two-stage instruction pipelining
[Figure: simplified view and expanded view of the two-stage pipeline, with Fetch and Execute stages; labels: instruction, result, new address, wait, discard]

6. Decomposition of instruction processing
To gain further speedup, the pipeline must have more stages (6 stages):
  • Fetch instruction (FI)
  • Decode instruction (DI)
  • Calculate operands, i.e. effective addresses (CO)
  • Fetch operands (FO)
  • Execute instruction (EI)
  • Write operand (WO)

7. SIX STAGES OF INSTRUCTION PIPELINING
  • Fetch Instruction (FI): Read the next expected instruction into a buffer.
  • Decode Instruction (DI): Determine the opcode and the operand specifiers.
  • Calculate Operands (CO): Calculate the effective address of each source operand.
  • Fetch Operands (FO): Fetch each operand from memory. Operands in registers need not be fetched.
  • Execute Instruction (EI): Perform the indicated operation and store the result.
  • Write Operand (WO): Store the result in memory.

8. Timing diagram for instruction pipeline operation

9. High efficiency of instruction pipelining
The diagram assumes all of the following:
  • All stages are of equal duration.
  • Each instruction goes through all six stages of the pipeline.
  • All the stages can be performed in parallel.
  • There are no memory conflicts.
  • All the accesses occur simultaneously.
Under these assumptions, the instruction pipeline in the previous diagram works very efficiently and gives high performance.

10. Limits to performance enhancement
The factors affecting performance are:
  1. If the six stages are not of equal duration, there will be some waiting time at various stages.
  2. A conditional branch instruction can invalidate several instruction fetches.
  3. Interrupts are unpredictable events.
  4. Register and memory conflicts.
  5. The CO stage may depend on the contents of a register that could be altered by a previous instruction that is still in the pipeline.

11. Effect of conditional branch on instruction pipeline operation

12. Conditional branch instructions
  • Assume that instruction 3 is a conditional branch to instruction 15.
  • Until the instruction is executed, there is no way of knowing which instruction will come next.
  • The pipeline simply loads the next instruction in the sequence and executes it.
  • The branch is not determined until the end of time unit 7.
  • During time unit 8, instruction 15 enters the pipeline.
  • No instruction completes during time units 9 through 12.
  • This is the performance penalty incurred because the branch could not be anticipated.

13. Simple pattern for high performance
Two factors that frustrate this simple pattern for high performance are:
  1. At each stage of the pipeline, there is some overhead involved in moving data from buffer to buffer and in performing various preparation and delivery functions. This overhead lengthens the execution time of a single instruction. It is significant when sequential instructions are logically dependent, either through heavy use of branching or through memory access dependencies.
  2. The amount of control logic required to handle memory and register dependencies and to optimize the use of the pipeline increases with the number of stages.

14. Six-stage CPU instruction pipeline

15. Dealing with branches
A variety of approaches have been taken for dealing with conditional branches:
  • Multiple streams
  • Prefetch branch target
  • Loop buffer
  • Branch prediction
  • Delayed branch

16. Multiple streams
A simple pipeline must choose one of the two instructions to fetch next and may make the wrong choice.
Multiple streams allow the pipeline to fetch both instructions, making use of two streams.
Problems with this approach:
  • With multiple pipelines there are contention delays for access to the registers and to memory.
  • Additional branch instructions may enter the pipeline (in either stream) before the original branch decision is resolved. Each such instruction needs an additional stream.
Examples: IBM 370/168 and IBM 3033.

17. Prefetch Branch Target
  • When a conditional branch is recognized, the target of the branch is prefetched, in addition to the instruction following the branch.
  • This target is then saved until the branch instruction is executed.
  • If the branch is taken, the target has already been prefetched.
  • The IBM 360/91 uses this approach.

18. Loop buffer
  • A loop buffer is a small, very high-speed memory maintained by the instruction fetch stage.
  • It contains the n most recently fetched instructions, in sequence.
  • If a branch is to be taken, the hardware first checks whether the branch target is within the buffer.
  • If so, the next instruction is fetched from the buffer.

19. Benefits of loop buffer
  • Instructions fetched in sequence will be available without the usual memory access time.
  • If a branch occurs to a target just a few locations ahead of the address of the branch instruction, the target will already be in the buffer. This is useful for the rather common occurrence of IF-THEN and IF-THEN-ELSE sequences.
  • This is well suited to loops or iterations, hence the name loop buffer. If the loop buffer is large enough to contain all the instructions in a loop, then those instructions need to be fetched from memory only once, for the first iteration. For subsequent iterations, all the needed instructions are already in the buffer.

20. Cont..,
The loop buffer is similar to a cache. The least significant 8 bits of the branch address are used to index the buffer, and the remaining most significant bits are checked to determine whether the branch target is in the buffer.
[Figure: the branch address is split into 8 low-order bits, which index the 256-byte loop buffer, and the most significant address bits, which are compared to detect a hit; on a hit, the buffer supplies the instruction to be decoded]

21. Branch prediction
Various techniques are used to predict whether a branch will be taken. They are:
  1. Predict Never Taken (static)
  2. Predict Always Taken (static)
  3. Predict by Opcode (static)
  4. Taken/Not Taken Switch (dynamic)
  5. Branch History Table (dynamic)

22. Static branch strategies
STATIC (1, 2, 3): these do not depend on the execution history.
  • Predict Never Taken: Always assume that the branch will not be taken and continue to fetch instructions in sequence.
  • Predict Always Taken: Always assume that the branch will be taken and always fetch from the branch target.
  • Predict by Opcode: The decision is based on the opcode of the branch instruction. The processor assumes that the branch will be taken for certain branch opcodes and not for others.

23. Dynamic branch strategies
DYNAMIC (4, 5): these depend on the execution history.
They attempt to improve the accuracy of prediction by recording the history of conditional branch instructions in a program.
For example, one or more bits can be associated with each conditional branch instruction to reflect its recent history.
These bits are referred to as a taken/not taken switch.
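The taken/not taken switch can be illustrated with a minimal sketch, assuming the simplest case of a single history bit that records only the most recent outcome of one branch. The function name `count_mispredictions` and the example outcome sequence (a loop-closing branch taken four times, then not taken, with the loop run three times) are illustrative assumptions, not taken from the slides.

```python
def count_mispredictions(outcomes, initial_prediction=False):
    """Count wrong predictions for one branch using a single history bit.

    outcomes: iterable of booleans, True meaning the branch was taken.
    The predictor always predicts whatever the branch did last time.
    """
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
        # The single history bit simply remembers the latest outcome.
        prediction = taken
    return misses


# Hypothetical loop-closing branch: taken 4 times, then falls through.
loop_run = [True] * 4 + [False]

# The loop is executed three times in a row.
print(count_mispredictions(loop_run * 3))  # prints 6
```

In this sketch every run of the loop costs two mispredictions: one when the branch is first taken again, and one when the loop exits.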
These history bits are stored in temporary high-speed memory.
One possibility is to associate the bits with each conditional branch instruction and use them to make the decision.
Another possibility is to maintain a small table of recent history, with one or more bits in each entry.

24. Cont..,
With only one bit of history, a prediction error will occur twice for each use of the loop: once on entering the loop and once on exiting.
With two history bits, the decision process can be represented by a finite-state machine with four states.

25. Cont..,
  • If the last two branches of the given instruction have taken the same path, the prediction is to take the same path again.
  • If the prediction is wrong, it remains the same the next time.
  • But when the prediction is wrong again, the opposite path is selected.
  • Greater efficiency could be achieved if the instruction fetch could be initiated as soon as the branch decision is made.
  • For this purpose, information must be saved in what is known as a branch target buffer, or branch history table.

26. Branch history table
It is a small cache memory associated with the instruction fetch stage.
Each entry in the table consists of three elements:
  • The address of a branch instruction.
  • Some number of history bits.
  • Information about the target instruction.
The third field may contain the address of the target instruction or the target instruction itself.

27. Dealing with branches

28. Branching strategies
  • If a branch is taken, some logic in the processor detects that and instructs that the next instruction be fetched from the target address.
  • Each prefetch triggers a lookup in the branch history table.
  • If no match is found, the next sequential instruction address is used for the fetch.
  • If a match occurs, a prediction is made based on the state of the instruction.
  • When the branch instruction is executed, the execute stage signals the branch history table logic with the result.

29. Delayed branch
It is possible to improve pipeline performance by automatically rearranging instructions within the program, so that branch instructions occur later than actually desired.

30. Intel 80486 Pipelining
  • Fetch
      From cache or external memory
      Put in one of two 16-byte prefetch buffers
      Fill buffer with new data as soon as old data consumed
      Average 5 instructions fetched per load
      Independent of other stages to keep buffers full
  • Decode stage 1
      Opcode & address-mo
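The two-history-bit decision process described in slides 24 and 25 can be sketched roughly as follows. This is one possible reading of that description, not the circuitry of any particular processor: the class name `TwoBitPredictor`, the helper `count_misses`, the choice of starting state, and the example outcome sequence are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Four-state predictor: the prediction flips only after two
    consecutive wrong predictions."""

    def __init__(self, predict_taken=False):
        # First history bit: the current prediction.
        self.predict_taken = predict_taken
        # Second history bit: was the previous prediction wrong?
        self.missed_once = False

    def predict(self):
        return self.predict_taken

    def update(self, taken):
        """Record the actual outcome of the branch."""
        if taken == self.predict_taken:
            # Correct prediction: forget any earlier single miss.
            self.missed_once = False
        elif self.missed_once:
            # Second miss in a row: switch to the opposite path.
            self.predict_taken = taken
            self.missed_once = False
        else:
            # First miss: keep the same prediction for next time.
            self.missed_once = True


def count_misses(outcomes, predictor):
    """Run a predictor over a sequence of outcomes (True = taken)."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical loop-closing branch: taken 4 times, then falls through,
# with the loop executed three times in a row.
loop_run = [True] * 4 + [False]
print(count_misses(loop_run * 3, TwoBitPredictor()))  # prints 5
```

In this sketch the predictor starts out predicting "not taken", so it misses twice while warming up; after that, each run of the loop costs only the single miss at loop exit.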
