CO_unit3


  • 7/29/2019 CO_unit3


    [ II - IT- II semester Computer Organization -- Unit-3 ]

    R.Veeranjaneyulu M.Tech PACE Institute of Technology & Sciences, Ongole Page 1

    COMPUTER ORGANIZATION

    UNIT -3

    Instruction Pipelining: Pipelining Hazards, Dealing with Branches, 8086 Processor Family. Reduced Instruction Set Computers: Instruction Execution Characteristics, Large Register Files, RISC Architecture.


    Instruction pipelining:

    Pipelining is a technique of decomposing a sequential process into sub-operations, with each sub-operation being executed in a dedicated segment that operates concurrently with all other segments.

    Pipelining is an implementation technique whereby multiple instructions are overlapped in execution.

    An instruction pipeline operates on a stream of instructions by overlapping the fetch, decode and execute phases of the instruction cycle.

    The pipeline organization of a CPU is similar to an assembly line: the work to be done in an instruction is broken into smaller steps, each of which takes a fraction of the time needed to complete the entire instruction. Each of these steps is a pipe stage (or pipe segment).

    Pipe stages are connected to form a pipe.

    The time required to move an instruction from one stage to the next is a machine cycle (often one clock cycle).

    The execution of one instruction takes several machine cycles as it passes through the pipeline.

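    Two-stage pipeline:

    FI: fetch instruction
    EI: execute instruction

    Assume that each instruction takes execution time Tex, split equally between the two stages, so that each stage takes Tex/2. The first instruction needs two stage times and each of the remaining six instructions completes one stage time later, giving 2 + 6 = 8 stage times.

    Execution time for the 7 instructions, with pipelining:

    (Tex/2)*8 = 4*Tex

    Without pipelining the same 7 instructions would take 7*Tex.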
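The two-stage calculation above can be checked with a short sketch. It assumes an ideal pipeline with equal-length stages and no stalls; the function names are illustrative and do not come from the notes.

```python
from fractions import Fraction

def pipelined_time(n_instructions, n_stages, t_ex=Fraction(1)):
    """Total time, in units of Tex, for an ideal pipeline with equal stages."""
    stage_time = t_ex / n_stages
    # The first instruction needs n_stages stage times;
    # every later instruction finishes one stage time after the previous one.
    return (n_stages + n_instructions - 1) * stage_time

def sequential_time(n_instructions, t_ex=Fraction(1)):
    """Total time, in units of Tex, when instructions do not overlap."""
    return n_instructions * t_ex

print(pipelined_time(7, 2))   # 4  (that is, 4*Tex)
print(sequential_time(7))     # 7  (that is, 7*Tex)
```

Using Fraction keeps the result exact, so the output can be compared directly with the hand calculation.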


    Working of Instruction Pipelining:

    An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.

    This causes the instruction fetch and execute phases to overlap and operate simultaneously.

    When a branch instruction is encountered and the branch is taken, the pipeline must be emptied and all the instructions that have been read from memory after the branch instruction must be discarded.

    A greater number of stages can provide better performance, although not always (see Problems with Pipeline).

    Six-stage pipeline:

    FI: fetch instruction
    DI: decode instruction
    CO: calculate operand address
    FO: fetch operand
    EI: execute instruction
    WO: write operand
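As a rough illustration of how the six stages overlap, the sketch below prints a space-time diagram for a few instructions. It assumes the stage order FI, DI, CO, FO, EI, WO, one cycle per stage, and no stalls or branches; the function name is hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def space_time_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; column c holds the stage active in cycle c+1."""
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["  "] * total_cycles
        for s, name in enumerate(stages):
            # Instruction i enters the pipeline one cycle after instruction i-1.
            row[i + s] = name
        rows.append(row)
    return rows

for i, row in enumerate(space_time_diagram(4), start=1):
    print(f"I{i}: " + " ".join(row))
```

Each row is shifted one cycle to the right of the previous one, so four instructions finish in 6 + 4 - 1 = 9 cycles instead of 24.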

    Branch in a Pipeline:


    Pipeline performance:

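    Pipeline performance is measured in terms of the time taken to execute a program.

    Consider a non-pipelined unit that performs a given task in time tn, and an equivalent pipeline of k segments, each taking time tp. The pipeline completes the first task after k*tp and one further task every tp after that, so n tasks take (k + n - 1)*tp.

    The speedup of pipelined processing over equivalent non-pipelined processing is defined by the ratio:

    S = (n*tn) / ((k + n - 1)*tp)

    where
    k = number of segments in the pipeline,
    tp = time taken by each segment to process a sub-operation,
    tn = time taken by the non-pipelined unit to complete one task,
    n = number of tasks.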
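A small numeric sketch of the speedup ratio, using the symbols defined above: n tasks take n*tn without pipelining and (k + n - 1)*tp with it. The default tn = k*tp is an assumption made here for illustration (the non-pipelined unit is taken to need the same total time per task as the pipeline), and the numbers are arbitrary.

```python
def speedup(n_tasks, k_segments, t_p, t_n=None):
    """Speedup S = (n * tn) / ((k + n - 1) * tp)."""
    if t_n is None:
        # Assumption: the non-pipelined unit needs k * tp per task.
        t_n = k_segments * t_p
    return (n_tasks * t_n) / ((k_segments + n_tasks - 1) * t_p)

for n in (1, 10, 100, 10_000):
    print(n, round(speedup(n, k_segments=4, t_p=20), 3))
```

Under that assumption the speedup starts at 1 for a single task and approaches k (here 4) as the number of tasks grows, which is why pipelining pays off mainly on long instruction streams.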

    Problems with Pipeline:

    A greater number of stages increases the overhead of moving information between stages and of synchronizing the stages.

    The complexity of the CPU grows with the number of stages. It is difficult to keep a large pipeline running at its maximum rate because of pipeline hazards.

    Pipelining Hazards:

    Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. The instruction is said to be stalled. When an instruction is stalled, all instructions later in the pipeline than the stalled instruction are also stalled. Instructions earlier than the stalled one can continue. No new instructions are fetched during the stall.

    Types of hazards:

    1. Structural hazards

    2. Data hazards

    3. Control hazards

    Structural Hazards:

    Structural hazards occur when a resource (memory, functional unit) is requested by more than one instruction at the same time.

    Example: The instruction ADD R4,X fetches operand X from memory in its FO stage. The memory does not accept another access during that cycle, so an instruction that needs to be fetched in the same cycle must wait.

    Penalty: 1 cycle

    Solutions: Certain resources are duplicated in order to avoid structural hazards. Functional units (ALU, FP unit) can themselves be pipelined in order to support several instructions at a time. A classical way to avoid hazards at memory access is to provide separate data and instruction caches.


    Data Hazards:

    This conflict arises when an instruction depends on the result of a previous instruction, but this result is not yet available.

    Suppose we have two instructions, I1 and I2. In a pipeline the execution of I2 can start before I1 has terminated. If, in a certain stage of the pipeline, I2 needs the result produced by I1, but this result has not yet been generated, we have a data hazard.

    Example:

    Before executing its FO stage, the ADD instruction is stalled until the MUL instruction has written the result into R2.

    Penalty: 2 cycles

    Solutions:

    The problem of data dependency can be solved in the following ways.

    1. Operand forwarding: The hardware avoids the conflict by routing the data through special paths between pipeline segments. After the EI stage of the MUL instruction the result is available by forwarding, and the penalty is reduced to one cycle.

    2. Compiler support: The compiler inserts no-operation (NOP) instructions into the program to delay the dependent instruction.
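The stall penalties quoted above (2 cycles without forwarding, 1 cycle with it) can be put into a minimal sketch that scans a program for an instruction reading the register written by its immediate predecessor. The instruction encoding and the two-instruction program are hypothetical; they only mirror the MUL/ADD situation described in the notes.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dest: str          # register written
    sources: tuple     # registers read

def raw_stall_cycles(program, forwarding=False):
    """Total stall cycles from read-after-write hazards between adjacent instructions."""
    penalty = 1 if forwarding else 2   # penalties quoted in the notes
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.dest in cur.sources:
            stalls += penalty
    return stalls

# Hypothetical program: ADD needs the value of R2 produced by MUL.
program = [
    Instr("MUL", "R2", ("R2", "R3")),   # R2 <- R2 * R3
    Instr("ADD", "R1", ("R1", "R2")),   # R1 <- R1 + R2
]
print(raw_stall_cycles(program))                    # 2
print(raw_stall_cycles(program, forwarding=True))   # 1
```

A real pipeline would also have to consider dependences between instructions that are further apart; this sketch only looks at adjacent pairs.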

    Control Hazards:

    Control hazards arise as a consequence of branch instructions.

    Unconditional branch:

            BR TARGET

    TARGET  _______

    After the FO stage of the branch instruction the address of the target is known and the target instruction can be fetched.


    Conditional branch:

    Handling branch difficulties: The methods used are

    (i) Prefetch target instructions

    (ii) Use of branch target buffer

    (iii) Use of loop buffer

    (iv) Branch prediction

    (v) Delayed branch


    Dealing with Branches:

    A number of techniques can be used to minimize the impact of branch instructions, i.e. the branch penalty:

    Multiple streams
    Prefetch branch target
    Loop buffer
    Branch prediction
    Delayed branching

    Multiple Streams:

    Replicate the initial portions of the pipeline and fetch both possible next instructions.
    Have two pipelines.
    Prefetch each branch into a separate pipeline.
    Use the appropriate pipeline.
    Increases the chance of memory contention.
    Must support multiple streams for each instruction in the pipeline.

    Prefetch Branch Target:

    The target of the branch is prefetched in addition to the instructions following the branch.
    The target is kept until the branch is executed.
    Used by the IBM 360/91.

    Loop buffer:

    A loop buffer is a small, very high speed memory maintained by the instruction fetch stage of the pipeline and containing the n most recently fetched instructions, in sequence. It acts as a look-ahead, look-behind buffer. If a branch is to be taken, the hardware first checks whether the branch target is within the buffer. If so, the next instruction is fetched from the buffer.

    Benefits of Loop Buffer:

    With prefetching, instructions in sequence are fetched without the usual memory access time.
    If a branch occurs to a target just a few locations ahead of the address of the branch instruction, the target is already in the buffer.
    Very good for small loops or jumps.
    If the buffer is big enough, an entire loop can be held in it, reducing the branch penalty (c.f. cache).
    Used by the CRAY-1.
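The buffer check described above can be sketched in a few lines. This is a simplified model under stated assumptions: it tracks only instruction addresses, it models only the look-behind part (the n most recently fetched instructions), and the class and method names are hypothetical.

```python
from collections import deque

class LoopBuffer:
    """Remembers the addresses of the n most recently fetched instructions."""

    def __init__(self, size):
        self.addresses = deque(maxlen=size)   # oldest entry is dropped automatically

    def record_fetch(self, address):
        self.addresses.append(address)

    def hit(self, target):
        """True if a branch to `target` can be served from the buffer."""
        return target in self.addresses

buf = LoopBuffer(size=8)
for addr in range(100, 110):      # fetch instructions at addresses 100..109
    buf.record_fetch(addr)

print(buf.hit(104))   # True: a short backward branch stays inside the buffer
print(buf.hit(90))    # False: the target has to come from memory
```

With an 8-entry buffer only addresses 102..109 remain after the ten fetches, so a loop of up to eight instructions can run entirely from the buffer.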


    Branch Prediction:

    Make a good guess as to which instruction will be executed next and start that one down the pipeline.

    If the guess turns out to be right, there is no loss of performance in the pipeline.
    If the guess was wrong, empty the pipeline and restart with the correct instruction, suffering the full branch penalty.

    Static guesses: make the guess without considering the runtime history of the program.

    Predict never taken
    Predict always taken
    Predict based on the opcode

    Dynamic guesses: track the history of conditional branches in the program.

    Taken / not taken switch
    History table

    Predict never taken:

    Assume that the jump will not happen.
    Always fetch the next instruction.
    Used by the 68020 and VAX 11/780.
    The VAX will not prefetch after a branch if a page fault would result (O/S v CPU design).

    Predict always taken:

    Assume that the jump will happen.
    Always fetch the target instruction.

    Predict by Opcode:

    Some instructions are more likely to result in a jump than others.
    Can get up to 75% success.

    Taken/Not taken switch:

    Based on previous history.
    Good for loops.
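To make the taken/not taken switch concrete, the sketch below counts mispredictions for a loop-closing branch. The one-bit version simply repeats the last outcome. The two-bit saturating counter is a common textbook variant added here for comparison; it is an assumption of this sketch and is not necessarily the scheme shown in the state diagram of these notes.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """Predict that each branch repeats its previous outcome."""
    prediction, misses = initial, 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken
    return misses

def two_bit_mispredictions(outcomes, state=3):
    """Saturating counter 0..3; states 2 and 3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

# A loop-closing branch: taken 4 times, then not taken; the loop is run 3 times.
outcomes = ([True] * 4 + [False]) * 3
print(one_bit_mispredictions(outcomes))   # 5
print(two_bit_mispredictions(outcomes))   # 3
```

The one-bit switch mispredicts twice around every loop exit (once on leaving, once on re-entering), whereas the two-bit counter needs two wrong guesses in a row before it changes its prediction.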

    Branch Prediction Flowchart:


    Branch Prediction State Diagram

    Dealing With Branches:


    Delayed branch:

    Minimize the branch penalty by finding valid instructions to execute in the pipeline while the branch address is being resolved.

    The compiler is tasked with reordering the instruction sequence to find enough instructions that are independent of the conditional branch to feed into the pipeline after the branch, so that the branch penalty is reduced to zero.

    Consider the sequence:

    Instruction x
    Instruction x+1
    Instruction x+2
    Conditional branch

    Do not take the jump until you have to.
    Rearrange instructions.
    Implemented on many RISC architectures.
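The reordering described above can be sketched for the simplest case: a single delay slot, filled with the instruction immediately before the branch when the branch does not depend on it. The instruction representation and mnemonics are hypothetical, and the sketch assumes the branch itself writes no register.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    text: str
    dest: Optional[str]   # register written, if any
    sources: tuple        # registers read

NOP = Instr("NOP", None, ())

def fill_delay_slot(block):
    """block is a list of Instr ending with a branch; returns the reordered list."""
    *body, branch = block
    if body and body[-1].dest not in branch.sources:
        # The branch does not read what this instruction writes, so the
        # instruction can execute in the delay slot instead.
        return body[:-1] + [branch, body[-1]]
    # No independent instruction found: pad the slot with a NOP.
    return body + [branch, NOP]

block = [
    Instr("LOAD rA, X", "rA", ()),
    Instr("ADD rB, rB, 1", "rB", ("rB",)),
    Instr("BRZ rA, L", None, ("rA",)),
]
print([i.text for i in fill_delay_slot(block)])
# ['LOAD rA, X', 'BRZ rA, L', 'ADD rB, rB, 1']
```

When the last instruction does feed the branch condition, the function falls back to a NOP, which corresponds to paying the one-cycle penalty.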


    8086 Processor Family:

    8086 Register Organization:

    The Intel 8086, introduced in 1978, was Intel's first 16-bit microprocessor. The register organization described below is that of the later 32-bit members of the family, which extend the 16-bit registers of the 8086.

    The register organization includes the following types of registers.

    1. General Purpose:

    There are eight 32-bit general purpose registers.
    They may be used for all types of x86 instructions.
    They can also hold operands for address calculations.
    String instructions use the contents of the ECX, ESI and EDI registers.
    In 64-bit mode there are sixteen 64-bit general purpose registers.

    2. Segment:

    The six 16-bit segment registers contain segment selectors, which index into segment tables.
    The Code Segment (CS) register references the segment containing the instruction being executed.
    The Stack Segment (SS) register references the segment containing a user-visible stack.
    The remaining segment registers (DS, ES, FS, GS) enable the user to reference up to four separate data segments at a time.

    3. FLAGS: The 32-bit EFLAGS register contains the condition codes and various mode bits.

    4. Instruction Pointer: Contains the address of the current instruction.

    5. Numeric:

    Each register holds an extended-precision 80-bit floating point number.
    There are 8 registers that function as a stack, with push and pop operations available in the instruction set.

    6. Control:

    The 16-bit control register contains bits that control the operation of the floating point unit.
    It includes rounding, exception and precision controls.

    7. Status:

    The 16-bit status register contains bits that reflect the current state of the floating point unit.
    It includes a 3-bit pointer to the top of the stack.
    Condition codes are reported here.

    8. Tag word:

    This 16-bit register contains a 2-bit tag for each floating point numeric register, which indicates the nature of the contents of the corresponding register.
    The four possible values are valid, zero, special and empty.
    It enables programs to check the contents of a numeric register without performing complex decoding of the actual data in the register.
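A short sketch of how the 16-bit tag word splits into eight 2-bit tags. The bit encoding used here (00 valid, 01 zero, 10 special, 11 empty, with register 0 in the two lowest bits) is the x87 convention and is stated as an assumption, since the notes give only the four names; the function name is hypothetical.

```python
# Assumed x87 encoding of the 2-bit tags.
TAG_MEANING = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}

def decode_tag_word(tag_word):
    """Return the tag of each of the 8 numeric registers, register 0 first."""
    return [TAG_MEANING[(tag_word >> (2 * r)) & 0b11] for r in range(8)]

print(decode_tag_word(0xFFFF))   # all eight registers empty
print(decode_tag_word(0xFFFC))   # register 0 valid, the other seven empty
```

Reading the tags this way avoids inspecting the 80-bit contents of each register, which is the benefit the notes describe.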


    EFLAGS Registers:

    There is a special register in the processor called EFLAGS. This register is 32 bits wide and most of its bits are used to track a variety of conditions in the processor. It includes the six condition codes (carry, parity, auxiliary carry, zero, sign, overflow), which report the results of integer operations.


    Trap Flag (TF): When set, causes an interrupt after the execution of each instruction. Used for debugging.

    Interrupt Enable Flag (IF): When set, the processor will recognize external interrupts.

    Direction Flag (DF): Determines whether string processing instructions increment or decrement their index registers.

    I/O Privilege Level (IOPL): Used in protected mode; its two bits encode four privilege levels.

    Resume Flag (RF): Allows the programmer to turn off certain exceptions while debugging code.

    Identification Flag (ID): If this bit can be set and cleared, the processor supports the CPUID instruction, which provides information about vendor, family and model.

    Nested Task Flag (NT): Indicates that the current task is nested within another task in protected mode.

    Virtual Mode (VM): Allows the programmer to enable or disable virtual 8086 mode.

    Virtual Interrupt Flag (VIF) and Virtual Interrupt Pending (VIP): Used in a multitasking environment.
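The flags listed above occupy fixed bit positions in EFLAGS. The sketch below decodes a raw 32-bit value using the positions documented for IA-32 (for example CF in bit 0, ZF in bit 6, IF in bit 9, IOPL in bits 12 and 13); the positions are not given in these notes, and the function name is hypothetical.

```python
# Bit positions of the single-bit flags (IA-32 layout).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17,
    "VIF": 19, "VIP": 20, "ID": 21,
}

def decode_eflags(value):
    """Return (set of flag names that are set, IOPL as an integer 0..3)."""
    flags = {name for name, bit in EFLAGS_BITS.items() if (value >> bit) & 1}
    iopl = (value >> 12) & 0b11     # two-bit I/O privilege level
    return flags, iopl

flags, iopl = decode_eflags(0x00000246)
print(sorted(flags), iopl)   # ['IF', 'PF', 'ZF'] 0
```

Bit 1 of the example value is also set, but it is a reserved bit and therefore does not appear in the decoded set.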

    Control Registers:

    MMX Registers:

    MMX uses several 64-bit data types.

    Instructions use 3-bit register address fields, so 8 MMX registers are supported.

    There are no MMX-specific registers: the MMX registers are aliased to the lower 64 bits of the existing floating point registers.


    Interrupt Processing:

    Interrupt processing within a processor is a facility provided to support the operating system.
    It allows an application program to be suspended, so that a variety of interrupt conditions can be serviced, and later resumed.

    Interrupts & Exceptions:

    An interrupt is generated by a signal from hardware, and it may occur at random times during the execution of a program.

    An exception is generated from software, and it is provoked by the execution of an instruction.

    There are two sources of interrupts and two sources of exceptions.

    Interrupts:

    Maskable: Received on the processor's INTR pin. The processor does not recognize a maskable interrupt unless the Interrupt Enable Flag (IF) is set.

    Nonmaskable: Received on the processor's NMI pin. Recognition of such interrupts cannot be prevented.

    Exceptions:

    Processor detected: Results when the processor encounters an error while attempting to execute an instruction.

    Programmed: These are instructions that generate an exception.

    Interrupt vector table:

    Each interrupt type is assigned a number.
    The number is used as an index into the vector table.
    The table holds 256 interrupt vectors of 32 bits each.
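Indexing into the vector table can be sketched as follows. With 256 vectors of 32 bits each, vector number i starts at byte offset 4*i. The split of each vector into a 16-bit offset followed by a 16-bit segment is the real-mode layout and is an assumption of this sketch; the handler address written into the table is made up.

```python
import struct

VECTOR_SIZE = 4      # 32 bits per vector
NUM_VECTORS = 256

def vector_address(interrupt_number):
    """Byte offset of a vector inside the table."""
    if not 0 <= interrupt_number < NUM_VECTORS:
        raise ValueError("interrupt number must be in 0..255")
    return interrupt_number * VECTOR_SIZE

def read_vector(table, interrupt_number):
    """Return (segment, offset), assuming offset word first, little-endian."""
    offset, segment = struct.unpack_from("<HH", table, vector_address(interrupt_number))
    return segment, offset

table = bytearray(NUM_VECTORS * VECTOR_SIZE)          # 1024-byte table
# Hypothetical handler at segment 0xF000, offset 0x1234 for interrupt 0x21.
struct.pack_into("<HH", table, vector_address(0x21), 0x1234, 0xF000)

print(hex(vector_address(0x21)))   # 0x84
print(read_vector(table, 0x21))    # (61440, 4660)
```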

    5 priority classes:

    Class 1: Traps on the previous instruction

    Class 2: External interrupts

    Class 3: Faults from fetching the next instruction

    Class 4: Faults from decoding the next instruction

    Class 5: Faults on executing an instruction


    RISC (Reduced Instruction Set Computers):

    Major Advances in Computers:

    The family concept
        IBM System/360 (1964) and DEC PDP-8.
        Separates architecture from implementation.

    Microprogrammed control unit
        Idea by Wilkes (1951).
        Produced by IBM in the S/360 (1964).

    Cache memory
        IBM S/360 model 85 (1969).

    Solid state RAM
        (See memory notes.)

    Microprocessors
        Intel 4004 (1971).

    Pipelining
        Introduces parallelism into the fetch-execute cycle.

    Multiple processors

    Reduced Instruction Set Computer:

    Key features

    Large number of general purpose registers, or use of compiler technology to optimize register use.
    Limited and simple instruction set.
    Emphasis on optimising the instruction pipeline.

    Instruction Execution Characteristics:

    Driving force for CISC:

    Software costs far exceed hardware costs.
    Increasingly complex high level languages.
    Semantic gap.

    Leads to:

    Large instruction sets.
    More addressing modes.
    Hardware implementations of HLL statements, e.g. CASE (switch) on VAX.

    Intention of CISC:

    Ease compiler writing.
    Improve execution efficiency (complex operations in microcode).
    Support more complex HLLs.

    Execution Characteristics:

    Operations performed
    Operands used
    Execution sequencing


    Studies have been done based on programs written in HLLs.
    Dynamic studies are measured during the execution of the program.

    Operations:

    Assignments (movement of data).
    Conditional statements such as IF and LOOP (sequence control).
    Procedure call-return is very time consuming.
    Some HLL instructions lead to many machine code operations.

    Operands:

    Mainly local scalar variables.
    Optimisation should concentrate on accessing local variables.

    Procedure Calls:

    Very time consuming.
    Depends on the number of parameters passed.
    Depends on the level of nesting.
    Most programs do not do a lot of calls followed by lots of returns.
    Most variables are local (c.f. locality of reference).

    Implications:

    Best support is given by optimising the most used and most time consuming features.
    Large number of registers (operand referencing).
    Careful design of pipelines (branch prediction etc.).
    Simplified (reduced) instruction set.

    Large Register File:

    Software solution
        Require the compiler to allocate registers.
        Allocate based on the most used variables in a given time.
        Requires sophisticated program analysis.

    Hardware solution
        Have more registers.
        Thus more variables will be in registers.

    Registers for Local Variables:

    Store local scalar variables in registers.
    Reduces memory access.
    Every procedure (function) call changes locality.
    Parameters must be passed.
    Results must be returned.
    Variables from calling programs must be restored.

    Register Windows:

    Only few parameters.
    Limited range of depth of call.
    Use multiple small sets of registers.
    Calls switch to a different set of registers.


    Returns switch back to a previously used set of registers.

    Three areas within a register set:

    Parameter registers
    Local registers
    Temporary registers

    Temporary registers from one set overlap parameter registers from the next.
    This allows parameter passing without moving data.

    Circular Buffer Organization of overlapped windows:

    Operation of Circular Buffer :

    When a call is made, a current window pointer is moved to show the currently active register window.

    If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory.

    A saved window pointer indicates where the next saved window should be restored to.

    Global Variables:

    Allocated by the compiler to memory
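The operation of the circular buffer of register windows described earlier can be sketched as a small simulation. It is a simplified model, not any particular processor: it counts windows rather than storing register contents, and it assumes that, because adjacent windows overlap, a file of N windows holds only N - 1 procedure activations. All names are hypothetical.

```python
class RegisterWindows:
    """Circular buffer of register windows (illustrative sketch only)."""

    def __init__(self, n_windows):
        self.n = n_windows
        self.cwp = 0            # current window pointer
        self.resident = 1       # activations currently held in registers
        self.saved = 0          # windows saved to memory
        self.overflows = 0
        self.underflows = 0

    def call(self):
        # Assumption: only n - 1 activations fit in the register file at once.
        if self.resident == self.n - 1:
            self.overflows += 1         # interrupt: save the oldest window
            self.saved += 1
            self.resident -= 1
        self.cwp = (self.cwp + 1) % self.n
        self.resident += 1

    def ret(self):
        if self.resident == 1:
            if self.saved == 0:
                raise RuntimeError("return without a matching call")
            self.underflows += 1        # interrupt: restore a saved window
            self.saved -= 1
            self.resident += 1
        self.cwp = (self.cwp - 1) % self.n
        self.resident -= 1

rw = RegisterWindows(n_windows=4)
for _ in range(3):
    rw.call()
print(rw.cwp, rw.overflows)      # 3 1
for _ in range(3):
    rw.ret()
print(rw.cwp, rw.underflows)     # 0 1
```

Shallow call chains never touch memory; only the third nested call in this four-window example triggers a save, which matches the observation that most programs have a limited depth of call.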


    Why CISC:

    Compiler simplification?
        Disputed.
        Complex machine instructions are harder to exploit.
        Optimization is more difficult.

    Smaller programs?
        A program takes up less memory, but memory is now cheap.
        May not occupy fewer bits, just look shorter in symbolic form.
        More instructions require longer op-codes.
        Register references require fewer bits.

    Faster programs?
        Bias towards use of simpler instructions.
        More complex control unit.
        Microprogram control store is larger, thus simple instructions take longer to execute.

    It is far from clear that CISC is the appropriate solution.

    RISC Characteristics:

    One instruction per cycle.
    Register to register operations.
    Few, simple addressing modes.
    Few, simple instruction formats.
    Hardwired design (no microcode).
    Fixed instruction format.
    More compile time/effort.

    RISC VS CISC


    RISC Pipelining:

    Most instructions are register to register.

    Two phases of execution:

    I: Instruction fetch
    E: Execute (ALU operation with register input and output)

    For load and store:

    I: Instruction fetch
    E: Execute (calculate memory address)
    D: Memory (register to memory or memory to register operation)

    Effects of Pipelining:

    Optimization of Pipelining:

    Delayed branch
        Does not take effect until after execution of the following instruction.
        This following instruction is the delay slot.

    Delayed load
        The register to be the target is locked by the processor.
        Execution of the instruction stream continues until the register is required.
        The processor idles until the load is complete.
        Re-arranging instructions can allow useful work whilst loading.

    Loop Unrolling


    Replicate the body of the loop a number of times.
    Iterate the loop fewer times.
    Reduces loop overhead.
    Increases instruction parallelism.
    Improves register, data cache or TLB locality.

    Example:

    do i=2, n-1
        a[i] = a[i] + a[i-1] * a[i+1]
    end do

    Becomes

    do i=2, n-2, 2
        a[i] = a[i] + a[i-1] * a[i+1]
        a[i+1] = a[i+1] + a[i] * a[i+2]
    end do

    if (mod(n-2,2) = 1) then
        a[n-1] = a[n-1] + a[n-2] * a[n]
    end if
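As a quick check that unrolling by two preserves the result, the sketch below transcribes both loops into Python and compares them for several values of n. Index 0 of the list is left unused so that the indices match the 1-based listing above; the function names are illustrative.

```python
def original(a, n):
    a = a[:]                         # work on a copy; a[0] is unused
    for i in range(2, n):            # do i = 2, n-1
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a

def unrolled(a, n):
    a = a[:]
    for i in range(2, n - 1, 2):     # do i = 2, n-2, 2
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
    if (n - 2) % 2 == 1:             # one iteration left over when the trip count is odd
        a[n - 1] = a[n - 1] + a[n - 2] * a[n]
    return a

for n in range(3, 12):
    data = [0] + list(range(1, n + 1))     # a[1..n] = 1..n
    assert original(data, n) == unrolled(data, n)
print("unrolled loop matches the original for n = 3..11")
```

The unrolled version executes half as many loop-control instructions, and the final if handles the single leftover iteration when n - 2 is odd.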

    Use of Delayed Branch:


    Assignment Questions

    1. What is a pipeline register? What is its use? Explain in detail.

    2. (a) Differentiate RISC and CISC computers.

    (b) Explain RISC pipelining.

    3. Explain vector processing.

    4. (a) What is a pipeline? Explain.

    (b) Explain arithmetic pipeline.