7/29/2019 CO_unit3
[ II - IT- II semester Computer Organization -- Unit-3 ]
R.Veeranjaneyulu M.Tech PACE Institute of Technology & Sciences, Ongole Page 1
COMPUTER ORGANIZATION
UNIT -3
Instruction Pipelining: Pipelining Hazards, Dealing with Branches. 8086 Processor Family. Reduced Instruction Set Computers: Instruction Execution Characteristics, Large Register Files, RISC Architecture.
Instruction pipelining:
Pipelining is a technique of decomposing a sequential process into sub-operations, with each sub-operation being executed in a dedicated segment that operates concurrently with all other segments.
Pipelining is an implementation technique whereby multiple instructions are overlapped in execution.
An instruction pipeline operates on a stream of instructions by overlapping the fetch, decode and execute phases of the instruction cycle.
The pipeline organization of a CPU is similar to an assembly line: the work to be done in an instruction is broken into smaller steps, each of which takes a fraction of the time needed to complete the entire instruction. Each of these steps is a pipe stage (or pipe segment).
Pipe stages are connected to form a pipe:
The time required to move an instruction from one stage to the next is a machine cycle (often this is one clock cycle).
The execution of one instruction takes several machine cycles as it passes through the pipeline.
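Two stage pipeline:
FI: fetch instruction
EI: execute instruction
Assume that each instruction takes execution time Tex, so each of the two stages takes Tex/2.
Execution time for 7 instructions without pipelining: 7*Tex
Execution time for the 7 instructions with pipelining: the first instruction needs 2 stage times and each of the remaining 6 completes one stage time later, giving 8 stage times in total:
(Tex/2)*8 = 4*Tex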
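The arithmetic for the two-stage example can be checked with a short Python sketch. It assumes an ideal pipeline: every stage takes the same time (Tex divided by the number of stages) and there are no stalls. The function names are illustrative only.

```python
def pipelined_time(n_instructions: int, n_stages: int, t_ex: float) -> float:
    """Total time for n instructions on an ideal k-stage pipeline.

    Assumes each stage takes t_ex / k and that no stalls occur.
    """
    stage_time = t_ex / n_stages
    # The first instruction needs k stage times; each later one adds one more.
    return (n_stages + n_instructions - 1) * stage_time


def sequential_time(n_instructions: int, t_ex: float) -> float:
    """Total time when instructions run one after another without overlap."""
    return n_instructions * t_ex


t_ex = 1.0  # one instruction time, in arbitrary units
print(pipelined_time(7, 2, t_ex))   # 4.0, i.e. 4 * Tex
print(sequential_time(7, t_ex))     # 7.0, i.e. 7 * Tex
```

Under these assumptions the same function also covers deeper pipelines, for example `pipelined_time(7, 6, t_ex)` for a six-stage organization.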
Working of Instructional Pipelining:
An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.
This causes the instruction fetch and execute phases to overlap and perform simultaneous operations.
When a branch instruction is encountered, the pipeline must be emptied and all the instructions that have been read from memory after the branch instruction must be discarded.
A greater number of stages allows more overlap and can provide better performance, although the gain is limited by overhead and hazards.
Six stage pipeline:
FI: fetch instruction
DI: decode instruction
CO: calculate operand address
FO: fetch operand
EI: execute instruction
WO: write operand
Branch in a Pipeline:
Pipeline performance:
Pipeline performance is measured in terms of the time taken to execute a program.
Consider a non-pipelined unit that performs the same task and takes a time equal to tn to complete each task.
The speedup of pipelined processing over equivalent non-pipelined processing is defined by the ratio:
$$S = \frac{n \, t_n}{(k + n - 1) \, t_p}$$
where
k = number of segments in the pipeline,
tp = time taken by each segment to process a sub-operation,
n = number of tasks.
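As a rough illustration of the speedup ratio, the sketch below evaluates n*tn / ((k + n - 1)*tp) in Python. The numeric values (4 segments, 20 ns per segment, 100 tasks, and tn = k*tp = 80 ns) are assumptions chosen for the example, not figures from these notes.

```python
def speedup(n_tasks: int, k_segments: int, t_p: float, t_n: float) -> float:
    """Speedup of a k-segment pipeline over a non-pipelined unit.

    n_tasks    -- number of tasks (n)
    k_segments -- number of pipeline segments (k)
    t_p        -- time per segment (tp)
    t_n        -- time per task on the non-pipelined unit (tn)
    """
    return (n_tasks * t_n) / ((k_segments + n_tasks - 1) * t_p)


# Assumed example values: k = 4, tp = 20 ns, tn = 80 ns, n = 100 tasks.
print(round(speedup(100, 4, 20.0, 80.0), 2))    # 3.88

# With tn = k * tp, the speedup approaches k as the number of tasks grows.
print(round(speedup(10**6, 4, 20.0, 80.0), 2))  # 4.0
```

The second call shows why the number of segments is the theoretical upper bound on speedup when the non-pipelined unit takes k*tp per task.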
Problems with Pipeline:
A greater number of stages increases the overhead of moving information between stages and of synchronizing the stages.
The complexity of the CPU grows with the number of stages.
It is difficult to keep a large pipeline running at its maximum rate because of pipeline hazards.
Pipelining Hazards:
Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. The instruction is said to be stalled. When an instruction is stalled, all instructions later in the pipeline than the stalled instruction are also stalled. Instructions earlier than the stalled one can continue. No new instructions are fetched during the stall.
Types of hazards:
1. Structural hazards
2. Data hazards
3. Control hazards
Structural Hazards:
Structural hazards occur when a certain resource (memory, functional unit) is requested by more than one instruction at the same time.
Example: Instruction ADD R4,X fetches operand X from memory in its FO stage. The memory doesn't accept another access during that cycle, so an instruction fetch that falls in the same cycle must wait.
Penalty: 1 cycle
Solutions: Certain resources are duplicated in order to avoid structural hazards.
Functional units (ALU, FP unit) can themselves be pipelined in order to support several instructions at a time.
A classical way to avoid hazards at memory access is to provide separate data and instruction caches.
Data Hazards:
This conflict arises when an instruction depends on the result of a previous instruction, but this result is not yet available.
Suppose we have two instructions, I1 and I2. In a pipeline, the execution of I2 can start before I1 has terminated. If, in a certain stage of the pipeline, I2 needs the result produced by I1 but this result has not yet been generated, we have a data hazard.
Example:
Before executing its FO stage, the ADD instruction is stalled until the MUL instruction has written the result into R2.
Penalty: 2 cycles
Solutions:
The problem of data dependency can be solved in the following ways.
1. Operand forwarding: The hardware avoids the conflict by routing the data through special paths between pipeline segments. After the EI stage of the MUL instruction the result is available by forwarding, and the penalty is reduced to one cycle.
2. Through compiler programs: The compiler inserts no-operation (NOP) instructions into the program.
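The data hazard and its two remedies can be sketched in Python as follows. This is a simplified model, not a description of any real pipeline: it only checks adjacent instructions, the `Instr` type and the operand lists for MUL and ADD are assumptions made for the example, and the penalties (2 cycles without forwarding, 1 with forwarding) are taken from the figures quoted above.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: opcode, destination, source registers."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


NOP = Instr("NOP", None, ())
PENALTY_NO_FORWARDING = 2  # cycles, as stated in the notes
PENALTY_FORWARDING = 1     # cycles, as stated in the notes


def stall_cycles(first: Instr, second: Instr, forwarding: bool) -> int:
    """Stall needed when `second` reads a register that `first` writes."""
    if first.dest not in second.srcs:
        return 0  # no read-after-write dependency
    return PENALTY_FORWARDING if forwarding else PENALTY_NO_FORWARDING


def pad_with_nops(program, forwarding=False):
    """Insert NOPs (bubbles) between adjacent dependent instructions."""
    padded = []
    prev = None
    for instr in program:
        if prev is not None:
            padded.extend([NOP] * stall_cycles(prev, instr, forwarding))
        padded.append(instr)
        prev = instr
    return padded


# Assumed operands: MUL writes R2, ADD reads R2.
program = [Instr("MUL", "R2", ("R2", "R3")), Instr("ADD", "R1", ("R1", "R2"))]
print([i.op for i in pad_with_nops(program)])                   # ['MUL', 'NOP', 'NOP', 'ADD']
print([i.op for i in pad_with_nops(program, forwarding=True)])  # ['MUL', 'NOP', 'ADD']
```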
Control Hazards:
Control hazards are produced as a consequence of branch instructions.
Unconditional branch:
BR TARGET
TARGET _______
After the FO stage of the branch instruction the address of the target is known and it can be fetched.
Conditional branch:
Handling branch difficulties: The methods used are
(i) Prefetch target instructions
(ii) Use of a branch target buffer
(iii) Use of a loop buffer
(iv) Branch prediction
(v) Delayed branch
Dealing with Branches:
A number of techniques can be used to minimize the impact of branch instructions, i.e. the branch penalty. These are:
Multiple streams
Prefetch branch target
Loop buffer
Branch prediction
Delayed branching
Multiple Streams:
Replicate the initial portions of the pipeline and fetch both possible next instructions.
Have two pipelines.
Prefetch each branch into a separate pipeline.
Use the appropriate pipeline.
Increases the chance of memory contention.
Must support multiple streams for each instruction in the pipeline.
Prefetch Branch Target:
The target of the branch is prefetched in addition to the instructions following the branch.
Keep the target until the branch is executed.
Used by IBM 360/91.
Loop buffer:
A loop buffer is a small, very high speed memory maintained by the instruction fetch stage of the pipeline and containing the n most recently fetched instructions in sequence.
It acts as a look-ahead, look-behind buffer.
If a branch is to be taken, the hardware first checks whether the branch target is within the buffer. If so, the next instruction is fetched from the buffer.
Benefits of Loop Buffer:
With the use of prefetching, instructions are fetched in sequence without the usual memory access time.
If a branch occurs to a target just a few locations ahead of the address of the branch instruction, the target is already in the buffer.
Very good for small loops or jumps.
If the buffer is big enough, an entire loop can be held in it, reducing the branch penalty (c.f. cache).
Used by CRAY-1.
Branch Prediction:
Make a good guess as to which instruction will be executed next and start that one down the pipeline.
If the guess turns out to be right, there is no loss of performance in the pipeline.
If the guess was wrong, empty the pipeline and restart with the correct instruction, suffering the full branch penalty.
Static guesses: make the guess without considering the runtime history of the program
  Predict never taken
  Predict always taken
  Predict based on the opcode
Dynamic guesses: track the history of conditional branches in the program
  Taken / not taken switch
  History table
Predict never taken:
Assume that the jump will not happen
Always fetch the next instruction
Used by 68020 and VAX 11/780
The VAX will not prefetch after a branch if a page fault would result (O/S v CPU design)
Predict always taken:
Assume that the jump will happen
Always fetch the target instruction
Predict by Opcode:
Some instructions are more likely to result in a jump than others
Can get up to 75% success
Taken/Not taken switch:
Based on previous history
Good for loops
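To make the idea of a history-based switch concrete, the Python sketch below compares two dynamic predictors on a loop-closing branch. The one-bit version simply repeats the last outcome. The two-bit version is a saturating counter, a common textbook scheme that is assumed here for illustration and may differ in detail from the state diagram these notes refer to. The branch pattern (taken four times, then not taken, repeated three times) is also an assumption.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """One-bit switch: predict whatever the branch did last time."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # remember only the most recent outcome
    return misses


def two_bit_mispredictions(outcomes, counter=3):
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step towards the observed outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# A loop branch: taken 4 times, then falls through, executed 3 times over.
loop_branch = ([True] * 4 + [False]) * 3
print(one_bit_mispredictions(loop_branch))  # 5
print(two_bit_mispredictions(loop_branch))  # 3
```

In this pattern the one-bit switch mispredicts twice per loop after the first pass (once on exit and once on re-entry), while the two-bit counter mispredicts only on each loop exit.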
Branch Prediction Flowchart:
Branch Prediction State Diagram
Dealing With Branches:
Delayed branch:
Minimize the branch penalty by finding valid instructions to execute in the pipeline while the branch address is being resolved.
The compiler is tasked with reordering the instruction sequence to find enough independent instructions (with respect to the conditional branch) to feed into the pipeline after the branch so that the branch penalty is reduced to zero.
Consider the sequence:
Instruction x
Instruction x+1
Instruction x+2
Conditional branch
Do not take the jump until you have to
Rearrange instructions
Implemented on many RISC architectures
8086 Processor Family:
8086 Register Organization:
The Intel 8086, introduced in 1978, was Intel's first 16-bit microprocessor.
The register organization described below is that of the later 32-bit members of the family and includes the following types of registers.
1. General Purpose:
There are eight 32-bit general-purpose registers.
They may be used for all types of x86 instructions.
They can also hold operands for address calculations.
String instructions use the contents of the ECX, ESI and EDI registers.
In 64-bit mode there are sixteen 64-bit general-purpose registers.
2. Segment:
The six 16-bit segment registers contain segment selectors, which index into segment tables.
The Code Segment (CS) register references the segment containing the instruction being executed.
The Stack Segment (SS) register references the segment containing a user-visible stack.
The remaining segment registers (DS, ES, FS, GS) enable the user to reference up to four separate data segments at a time.
3. FLAGS:
The 32-bit EFLAGS register contains the condition codes and various mode bits.
4. Instruction Pointer:
Contains the address of the current instruction.
5. Numeric:
Each register holds an extended-precision 80-bit floating-point number.
There are 8 registers that function as a stack, with push and pop operations available in the instruction set.
6. Control:
The 16-bit control register contains bits that control the operation of the floating-point unit.
It includes rounding, exception and precision controls.
7. Status:
The 16-bit status register contains bits that reflect the current state of the floating-point unit.
It includes a 3-bit pointer to the top of the stack.
Condition codes are reported here.
8. Tag word:
This 16-bit register contains a 2-bit tag for each floating-point numeric register, which indicates the nature of the contents of the corresponding register.
The four possible values are valid, zero, special and empty.
Tags enable programs to check the contents of a numeric register without performing complex decoding of the actual data in the register.
EFLAGS Registers:
There is a special register in the processor called EFLAGS. This register is 32 bits wide and most of its bits are used to track a variety of conditions in the processor. It includes the six condition codes (carry, parity, auxiliary carry, zero, sign, overflow), which report the results of integer operations.
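As an illustration of how the six condition codes report the result of an integer operation, the Python sketch below computes them for an 8-bit addition. The function name and the choice of an 8-bit operand size are assumptions made for the example; the flag definitions follow the usual x86 conventions (for instance, the parity flag is set when the low byte of the result has an even number of 1 bits).

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and derive the six condition codes."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": total > 0xFF,                            # carry out of bit 7
        "PF": bin(result).count("1") % 2 == 0,         # even number of 1 bits
        "AF": ((a & 0x0F) + (b & 0x0F)) > 0x0F,        # carry out of bit 3
        "ZF": result == 0,                             # result is zero
        "SF": bool(result & 0x80),                     # sign bit of the result
        # Overflow: operands share a sign that the result does not have.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
    }


print(add8_flags(0x7F, 0x01))  # signed overflow: 127 + 1 wraps to 0x80
print(add8_flags(0xFF, 0x01))  # unsigned carry: 255 + 1 wraps to 0x00
```

The first call sets SF, OF and AF but not CF; the second sets CF, ZF, PF and AF but not OF, which shows that carry and overflow describe different events.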
Trap Flag (TF): when set, causes an interrupt after the execution of each instruction. Used for debugging.
Interrupt Enable Flag (IF): when set, the processor will recognize external interrupts.
Direction Flag (DF): determines whether string processing instructions increment or decrement their index registers.
I/O Privilege Level (IOPL): used in protected mode to provide four levels of security.
Resume Flag (RF): enables you to turn off certain exceptions while debugging code.
Identification Flag (ID): if this bit can be set and cleared, then the processor supports the CPUID instruction, which provides information about vendor, family and model.
Nested Task Flag (NT): indicates that the current task is nested within another task in protected mode.
Virtual Mode (VM): allows the programmer to enable or disable virtual 8086 mode.
Virtual Interrupt Flag (VIF) and Virtual Interrupt Pending (VIP) are used in a multitasking environment.
Control Registers:
MMX Registers:
MMX uses several 64-bit data types
Uses 3-bit register address fields, giving 8 registers
No MMX-specific registers: the MMX registers are aliased to the lower 64 bits of the existing floating-point registers
Interrupt Processing:
Interrupt processing within a processor is a facility provided to support the operating system. It allows an application program to be suspended so that a variety of interrupt conditions can be serviced, and later resumed.
Interrupts & Exceptions:
An interrupt is generated by a signal from hardware, and it may occur at random times during the execution of a program.
An exception is generated from software and is provoked by the execution of an instruction.
There are two sources of interrupts and two sources of exceptions.
Interrupts:
Maskable: Received on the processor's INTR pin. The processor does not recognize a maskable interrupt unless the Interrupt Enable Flag (IF) is set.
Nonmaskable: Received on the processor's NMI pin. Recognition of such interrupts cannot be prevented.
Exceptions:
Processor detected: Results when the processor encounters an error while attempting to execute an instruction.
Programmed: These are instructions that generate an exception.
Interrupt vector table:
Each interrupt type is assigned a number
The number is used as an index into the vector table
256 * 32-bit interrupt vectors
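The indexing step can be sketched in Python. With 256 vectors of 32 bits (4 bytes) each, the entry for interrupt type N sits 4*N bytes from the start of the table. The sketch assumes the real-mode layout, in which the table starts at address 0; the function name and the default base are illustrative assumptions.

```python
VECTOR_COUNT = 256       # number of interrupt types
VECTOR_SIZE_BYTES = 4    # each vector is 32 bits wide


def vector_address(interrupt_type: int, table_base: int = 0) -> int:
    """Address of the vector table entry for a given interrupt type.

    table_base defaults to 0, which is an assumption matching real mode.
    """
    if not 0 <= interrupt_type < VECTOR_COUNT:
        raise ValueError("interrupt type must be in the range 0..255")
    return table_base + interrupt_type * VECTOR_SIZE_BYTES


print(hex(vector_address(0x21)))         # 0x84
print(VECTOR_COUNT * VECTOR_SIZE_BYTES)  # 1024 bytes for the whole table
```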
5 priority classes:
Class 1: Traps on the previous instruction
Class 2: External interrupts
Class 3: Faults from fetching the next instruction
Class 4: Faults from decoding the next instruction
Class 5: Faults on executing an instruction
RISC (Reduced Instruction Set Computers):
Major Advances in Computers:
The family concept
  IBM System/360 1964
  DEC PDP-8
  Separates architecture from implementation
Microprogrammed control unit
  Idea by Wilkes 1951
  Produced by IBM S/360 1964
Cache memory
  IBM S/360 model 85 1969
Solid State RAM
  (See memory notes)
Microprocessors
  Intel 4004 1971
Pipelining
  Introduces parallelism into fetch execute cycle
Multiple processors
Reduced Instruction Set Computer:
Key features
  Large number of general purpose registers, or use of compiler technology to optimize register use
  Limited and simple instruction set
  Emphasis on optimising the instruction pipeline
Instruction Execution Characteristics:
Driving force for CISC:
Software costs far exceed hardware costs
Increasingly complex high level languages
Semantic gap
Leads to:
  Large instruction sets
  More addressing modes
  Hardware implementations of HLL statements, e.g. CASE (switch) on VAX
Intention of CISC:
Ease compiler writing
Improve execution efficiency
  Complex operations in microcode
Support more complex HLLs
Execution Characteristics:
Operations performed
Operands used
Execution sequencing
Studies have been done based on programs written in HLLs
Dynamic studies are measured during the execution of the program
Operations:
Assignments
  Movement of data
Conditional statements (IF, LOOP)
  Sequence control
Procedure call-return is very time consuming
Some HLL instructions lead to many machine code operations
Operands:
Mainly local scalar variables
Optimisation should concentrate on accessing local variables
Procedure Calls:
Very time consuming
Depends on number of parameters passed
Depends on level of nesting
Most programs do not do a lot of calls followed by lots of returns
Most variables are local (c.f. locality of reference)
Implications:
Best support is given by optimising the most used and most time consuming features
Large number of registers
  Operand referencing
Careful design of pipelines
  Branch prediction etc.
Simplified (reduced) instruction set
Large Register File:
Software solution
  Require compiler to allocate registers
  Allocate based on most used variables in a given time
  Requires sophisticated program analysis
Hardware solution
  Have more registers
  Thus more variables will be in registers
Registers for Local Variables:
Store local scalar variables in registers
Reduces memory access
Every procedure (function) call changes locality
Parameters must be passed
Results must be returned
Variables from calling programs must be restored
Register Windows:
Only few parameters
Limited range of depth of call
Use multiple small sets of registers
Calls switch to a different set of registers
Returns switch back to a previously used set of registers
Three areas within a register set
  Parameter registers
  Local registers
  Temporary registers
Temporary registers from one set overlap parameter registers from the next
This allows parameter passing without moving data
Circular Buffer Organization of overlapped windows:
Operation of Circular Buffer:
When a call is made, a current window pointer is moved to show the currently active register window.
If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory.
A saved window pointer indicates where the next saved window should be restored to.
Global Variables:
Allocated by the compiler to memory
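Returning to the circular buffer of overlapped register windows, its call and return behaviour can be sketched with a small Python model. This is a simplification under stated assumptions: the class and attribute names are hypothetical, register contents are not modelled, and only the window bookkeeping (current window pointer, spills to memory on overflow, restores on underflow) is shown.

```python
class RegisterWindowFile:
    """Toy model of a circular buffer of register windows."""

    def __init__(self, num_windows: int):
        self.num_windows = num_windows
        self.cwp = 0        # current window pointer
        self.resident = 1   # windows currently held in registers
        self.saved = 0      # windows currently spilled to memory
        self.spills = 0     # total overflow events
        self.fills = 0      # total underflow events

    def call(self) -> None:
        if self.resident == self.num_windows:
            # All windows in use: the oldest one is saved to memory.
            self.saved += 1
            self.spills += 1
        else:
            self.resident += 1
        self.cwp = (self.cwp + 1) % self.num_windows

    def ret(self) -> None:
        if self.resident == 1:
            if self.saved == 0:
                raise RuntimeError("return without a matching call")
            # The caller's window was spilled earlier: restore it.
            self.saved -= 1
            self.fills += 1
        else:
            self.resident -= 1
        self.cwp = (self.cwp - 1) % self.num_windows


windows = RegisterWindowFile(num_windows=4)
for _ in range(5):          # nest five calls deep
    windows.call()
print(windows.cwp, windows.spills)   # 1 2
for _ in range(5):          # unwind all five calls
    windows.ret()
print(windows.cwp, windows.fills)    # 0 2
```

With four windows, only call depths beyond the fourth level cause memory traffic, which is why a limited range of call depth makes the scheme effective.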
Why CISC:
Compiler simplification?
  Disputed
  Complex machine instructions are harder to exploit
  Optimization is more difficult
Smaller programs?
  Program takes up less memory, but memory is now cheap
  May not occupy fewer bits, just look shorter in symbolic form
    More instructions require longer op-codes
    Register references require fewer bits
Faster programs?
  Bias towards use of simpler instructions
  More complex control unit
  Microprogram control store larger
  Thus simple instructions take longer to execute
It is far from clear that CISC is the appropriate solution
RISC Characteristics:
One instruction per cycle
Register to register operations
Few, simple addressing modes
Few, simple instruction formats
Hardwired design (no microcode)
Fixed instruction format
More compile time/effort
RISC VS CISC
RISC Pipelining:
Most instructions are register to register
Two phases of execution
  I: Instruction fetch
  E: Execute
    ALU operation with register input and output
For load and store
  I: Instruction fetch
  E: Execute
    Calculate memory address
  D: Memory
    Register to memory or memory to register operation
Effects of Pipelining:
Optimization of Pipelining:
Delayed branch
  Does not take effect until after execution of the following instruction
  This following instruction is the delay slot
Delayed Load
  Register to be target is locked by processor
  Continue execution of instruction stream until register required
  Idle until load complete
  Re-arranging instructions can allow useful work whilst loading
Loop Unrolling
Replicate the body of the loop a number of times
Iterate the loop fewer times
Reduces loop overhead
Increases instruction parallelism
Improves register, data cache or TLB locality
Example:
do i=2, n-1
    a[i] = a[i] + a[i-1] * a[i+1]
end do
Becomes
do i=2, n-2, 2
    a[i] = a[i] + a[i-1] * a[i+1]
    a[i+1] = a[i+1] + a[i] * a[i+2]
end do
if (mod(n-2,2) = 1) then
    a[n-1] = a[n-1] + a[n-2] * a[n]
end if
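The equivalence of the original and unrolled loops can be checked with a Python transcription. This is a sketch rather than the notes' own code: the function names are illustrative, the same index values as the pseudocode are used (element 0 of the list is simply unused), and the list must therefore have at least n+1 elements.

```python
def original_loop(a, n):
    """Rolled loop: i runs from 2 to n-1 inclusive."""
    a = list(a)  # work on a copy
    for i in range(2, n):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def unrolled_loop(a, n):
    """Loop unrolled by a factor of 2, with a clean-up step for odd counts."""
    a = list(a)  # work on a copy
    for i in range(2, n - 1, 2):  # i = 2, 4, ... up to n-2
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
    if (n - 2) % 2 == 1:
        # An odd number of iterations leaves the last one (i = n-1) over.
        a[n - 1] = a[n - 1] + a[n - 2] * a[n]
    return a


for n in range(3, 12):
    data = [x % 5 + 1 for x in range(n + 1)]  # arbitrary test values
    assert original_loop(data, n) == unrolled_loop(data, n)
print("unrolled loop matches the original for n = 3..11")
```

The unrolled version performs roughly half as many loop-control tests and branches, which is the reduction in loop overhead listed above.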
Use of Delayed Branch:
Assignment Questions
1. What is a pipeline register? What is its use? Explain in detail.
2. (a) Differentiate RISC and CISC computers.
(b) Explain RISC pipelining.
3. Explain vector processing.
4. (a) What is a pipeline? Explain.
(b) Explain arithmetic pipeline.