
INSTRUCTION PIPELINING

What is pipelining?

• Instruction pipelining is used to achieve greater CPU performance.

• The 8086 microprocessor has two functional units:

BIU (Bus Interface Unit)

EU (Execution Unit)

• The BIU performs all bus operations, such as fetching instructions, reading and writing memory operands, and calculating the addresses of memory operands. The fetched instruction bytes are transferred to the instruction queue.

• The EU executes instructions from the instruction byte queue.

• The two units operate asynchronously, giving the 8086 an overlapping instruction fetch and execution mechanism. This overlap is called pipelining.

INSTRUCTION PIPELINING

The first stage fetches an instruction and buffers it.

When the second stage is free, the first stage passes it the buffered instruction.

While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction.

This is called instruction prefetch or fetch overlap.
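A rough sketch of why prefetch helps: the Python below compares total time with and without fetch overlap. It assumes fixed, hypothetical fetch and execute times per instruction, a buffer that holds exactly one instruction, and no branches; the function names are illustrative only.

```python
def sequential_time(n, fetch, execute):
    """No overlap: each instruction is fetched and then executed."""
    return n * (fetch + execute)


def overlapped_time(n, fetch, execute):
    """Two-stage fetch/execute overlap with a one-instruction buffer.

    The fetch stage may fetch instruction i only after the buffer has been
    emptied, i.e. once instruction i-1 has been passed to the execute stage.
    """
    if n == 0:
        return 0
    exec_start = fetch  # instruction 0 is fetched, then executed at once
    for _ in range(1, n):
        fetch_end = exec_start + fetch                     # buffer freed at exec_start
        exec_start = max(fetch_end, exec_start + execute)  # wait for both stages
    return exec_start + execute


print(sequential_time(4, 1, 1), overlapped_time(4, 1, 1))  # 8 5
print(sequential_time(4, 1, 2), overlapped_time(4, 1, 2))  # 12 9
```

With equal fetch and execute times the overlap nearly halves the total; when execution is slower, the fetch stage idles while it waits for its buffer to empty, and the gain shrinks.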

Inefficiency in two-stage instruction pipelining

The two-stage pipeline does not double the instruction rate, for two reasons:

• The execution time will generally be longer than the fetch time. Thus the fetch stage may have to wait for some time before it can empty its buffer.

• When a conditional branch occurs, the address of the next instruction to be fetched is unknown until the branch is executed. The execution stage must then wait while the next instruction is fetched.

Figure: Two-stage instruction pipelining, simplified view and expanded view (Fetch and Execute stages; labels: instruction, result, wait, new address, discard).

Decomposition of instruction processing

To gain further speedup, the pipeline must have more stages. Here instruction processing is decomposed into six stages:

Fetch instruction (FI)

Decode instruction (DI)

Calculate operands, i.e., effective addresses (CO)

Fetch operands (FO)

Execute instruction (EI)

Write operand (WO)

SIX STAGES OF INSTRUCTION PIPELINING

Fetch Instruction (FI)

Read the next expected instruction into a buffer.

Decode Instruction (DI)

Determine the opcode and the operand specifiers.

Calculate Operands (CO)

Calculate the effective address of each source operand.

Fetch Operands (FO)

Fetch each operand from memory. Operands in registers need not be fetched.

Execute Instruction (EI)

Perform the indicated operation and store the result, if any, in the specified destination operand location.

Write Operand (WO)

Store the result in memory.

Timing diagram for instruction pipeline operation

High efficiency of instruction pipelining

The diagram assumes all of the following:

• All stages are of equal duration.

• Each instruction goes through all six stages of the pipeline.

• All the stages can be performed in parallel.

• There are no memory conflicts.

• All the accesses occur simultaneously.

Under these assumptions, the pipeline in the previous diagram works very efficiently and gives high performance.
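Under these assumptions, instruction i occupies stage s during time unit i + s, so n instructions on a k-stage pipeline finish in k + n - 1 time units instead of n·k. The short Python sketch below prints such a diagram for nine instructions; the stage names come from the list above, and the function names are illustrative.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions, stages=STAGES):
    """Map each instruction (1-based) to {time unit: stage} for an ideal pipeline."""
    return {
        i: {i + s: name for s, name in enumerate(stages)}
        for i in range(1, n_instructions + 1)
    }


def total_time_units(n_instructions, k=len(STAGES)):
    """Time units needed by n instructions on an ideal k-stage pipeline."""
    return k + n_instructions - 1 if n_instructions > 0 else 0


n = 9
diagram = timing_diagram(n)
for i, row in diagram.items():
    # "--" marks time units in which the instruction is not in the pipeline
    cells = [row.get(t, "--") for t in range(1, total_time_units(n) + 1)]
    print(f"I{i}: " + " ".join(cells))
print("total:", total_time_units(n), "time units vs", n * len(STAGES), "unpipelined")
```

For nine instructions this gives 14 time units instead of 54, a speedup of 54/14, or about 3.9, which approaches 6 as the instruction count grows.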

Limits to performance enhancement

The factors affecting the performance are:

1. If the six stages are not of equal duration, there will be some waiting at various stages.

2. A conditional branch instruction can invalidate several instruction fetches.

3. An interrupt is an unpredictable event that can likewise invalidate instructions already in the pipeline.

4. Register and memory conflicts.

5. The CO stage may depend on the contents of a register that could be altered by a previous instruction that is still in the pipeline.

Effect of conditional branch on instruction pipeline operation

Conditional branch instructions

Assume that instruction 3 is a conditional branch to instruction 15.

Until the instruction is executed, there is no way of knowing which instruction will come next.

The pipeline simply loads the next instruction in sequence and proceeds.

The branch is not determined until the end of time unit 7, so the instructions that entered the pipeline after instruction 3 (instructions 4 through 7) are discarded.

During time unit 8, instruction 15 enters the pipeline.

No instructions complete during time units 9 through 12.

This is the performance penalty incurred because we could not anticipate the branch.
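The penalty can be counted directly. The Python sketch below assumes the ideal six-stage timing (instruction i enters FI in time unit i) and that the branch is resolved only at the end of its EI stage; the function name and parameters are hypothetical. It reproduces the timeline described above: instruction 3 completes in time unit 8, and instruction 15 not until time unit 13.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]
EI = STAGES.index("EI")  # 4: offset of the execute stage
WO = STAGES.index("WO")  # 5: offset of the write stage


def completion_times(branch, target, n_after):
    """Time unit in which each completed instruction leaves the pipeline.

    Assumes instruction i (1-based) enters FI in time unit i, the branch
    (instruction `branch`) is taken and resolved at the end of its EI stage,
    the instructions fetched behind it are discarded, and `n_after`
    instructions starting at `target` then flow through without stalls.
    """
    done = {i: i + WO for i in range(1, branch + 1)}
    resolved = branch + EI  # end of the branch's EI stage
    for j in range(n_after):
        done[target + j] = resolved + 1 + j + WO  # target enters FI next unit
    return done


done = completion_times(branch=3, target=15, n_after=3)
print(done)  # {1: 6, 2: 7, 3: 8, 15: 13, 16: 14, 17: 15}
idle = [t for t in range(done[3] + 1, done[15]) if t not in done.values()]
print("no completion in time units", idle)  # no completion in time units [9, 10, 11, 12]
```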

Simple pattern for high performance

Two factors frustrate this simple pattern for high performance:

1. At each stage of the pipeline, there is some overhead involved in moving data from buffer to buffer and in performing various preparation and delivery functions. This overhead lengthens the execution time of a single instruction. It is significant when sequential instructions are logically dependent, either through heavy use of branching or through memory access dependencies.

2. The amount of control logic required to handle memory and register dependencies and to optimize the use of the pipeline increases enormously with the number of stages.
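The first factor can be illustrated with a simple cost model. The sketch below is an assumption for illustration, not the lecture's own model: a non-pipelined instruction takes `work` time units, the pipeline splits that work evenly over k stages, and every stage adds a fixed buffer-to-buffer `overhead`. Without hazards, n instructions then take (k + n - 1)(work/k + overhead). The values 60 and 2 are arbitrary.

```python
def pipelined_time(n, k, work, overhead):
    """Time for n instructions on a k-stage pipeline with per-stage overhead.

    Assumes the work is divided evenly among the stages and that there are
    no branches or dependencies, so the pipeline never stalls.
    """
    cycle = work / k + overhead  # one stage plus its latch overhead
    return (k + n - 1) * cycle


def speedup(n, k, work, overhead):
    """Speedup over a non-pipelined unit taking `work` per instruction."""
    return (n * work) / pipelined_time(n, k, work, overhead)


for k in (1, 2, 6, 12, 24, 48):
    print(k, round(speedup(100, k, work=60.0, overhead=2.0), 2))

# A single instruction now takes longer than without pipelining:
print(pipelined_time(1, 6, 60.0, 2.0), "vs", 60.0)  # 72.0 vs 60.0
```

In this model, going from 24 to 48 stages raises the speedup only from about 10.8 to about 12.6, and the model ignores the second factor (control logic), which makes deep pipelines costlier still.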

Six-stage CPU instruction pipeline


8086 Pin Functions

By: Madhu Oruganti, SNIST

Pin Diagram

Pin Functions

Of the 40 pins, 32 have the same function in both minimum and maximum mode, and the remaining 8 pins have different functions in the two modes.

The following pins have the same function in both modes:

Symbol: AD15-AD0, Pin No.: 39, 2-16, Type: I/O

Address/Data bus: time-multiplexed memory/I/O address (T1) and data (T2, T3, TW, T4) bus.

These lines are active HIGH and float to the 3-state OFF condition during interrupt acknowledge and local bus "hold acknowledge".

Symbol: A19/S6, A18/S5, A17/S4, A16/S3, Pin No.: 35-38, Type: O

Address/Status lines

During T1 these lines carry the address; during T2, T3, TW and T4 they carry status.

S5 reflects the state of the interrupt enable flag (IF), and S6 is always LOW.

A17/S4 A16/S3 Characteristics

0 0 Alternate data

0 1 Stack

1 0 Code or none

1 1 Data

Symbol: BHE#/S7, Pin No.: 34, Type: O

Bus High Enable / Status:

BHE# A0 Characteristics

0 0 Whole word from even address

0 1 Upper byte from/to odd address

1 0 Lower byte from/to even address

1 1 None
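The BHE#/A0 table can be read as two independent bank selects. The Python sketch below assumes the usual 8086 memory organization, with the even (lower) byte bank on D7-D0 selected by A0 = 0 and the odd (upper) byte bank on D15-D8 selected by BHE# = 0; the function is illustrative.

```python
def bytes_transferred(bhe_n, a0):
    """Decode BHE# (active LOW) and A0 into the bytes a bus cycle moves."""
    lower = a0 == 0     # even bank (D7-D0) enabled when A0 = 0
    upper = bhe_n == 0  # odd bank (D15-D8) enabled when BHE# is LOW
    if lower and upper:
        return "whole word from even address"
    if upper:
        return "upper byte from/to odd address"
    if lower:
        return "lower byte from/to even address"
    return "none"


for bhe_n in (0, 1):
    for a0 in (0, 1):
        print(bhe_n, a0, bytes_transferred(bhe_n, a0))
```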

Symbol: RD#, Pin No.: 32, Type: O

Read: RD# is active LOW during T2, T3 and TW of a read cycle and indicates that the processor is performing a memory or I/O read.

Symbol: READY, Pin No.: 22, Type: I

Ready: signal received from memory or I/O devices to indicate the completion of a data transfer.

It is synchronized by the 8284 clock generator.

Symbol: INTR, Pin No.: 18, Type: I

Interrupt Request: level-triggered input received from an interrupting device.

It is sampled during the last clock cycle of each instruction.

If the interrupt enable flag (IF) is set, the processor vectors to an interrupt service routine through the interrupt vector table (IVT).

Symbol: TEST#, Pin No.: 23, Type: I

Test: input examined by the WAIT instruction. If TEST# is LOW, the processor continues execution; otherwise it waits in an idle state.

Symbol: NMI, Pin No.: 17, Type: I

Non-Maskable Interrupt: edge-triggered input that causes a type 2 interrupt.

It cannot be masked internally by software.

Symbol: RESET, Pin No.: 21, Type: I

Reset: input that causes the processor to immediately terminate its present activity.

It must be HIGH for at least 4 clock cycles.

Symbol: CLK, Pin No.: 19, Type: I

Clock: provides the basic timing for the processor and bus controller.

It is asymmetric, with a 33% duty cycle, to provide optimized internal timing.

Symbol: Vcc, Pin No.: 40

Vcc: +5 V power supply pin.

Symbol: GND, Pin No.: 1, 20

Ground.

Symbol: MN/MX#, Pin No.: 33, Type: I

Minimum/Maximum: indicates which mode the processor is to operate in.

HIGH indicates minimum mode (single-processor system).

LOW indicates maximum mode (multiprocessor system).

Pins having different functions in maximum mode

Pins 24 to 31 have different functions in maximum mode, as explained below.

Symbol: S2#, S1#, S0#, Pin No.: 26-28, Type: O

Status: active during T4, T1 and T2, and returned to the passive state (1, 1, 1) during T3, or during TW when READY is HIGH.

Used by the 8288 Bus Controller to generate all memory and I/O access control signals

S2 S1 S0 Characteristics

0 0 0 Interrupt Acknowledge

0 0 1 Read I/O Port

0 1 0 Write I/O Port

0 1 1 Halt

1 0 0 Code Access

1 0 1 Read Memory

1 1 0 Write Memory

1 1 1 Passive
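The 8288 bus controller essentially decodes this table. The Python sketch below maps each status code to its bus cycle and to the 8288 command output it activates; it is a lookup-table illustration only, and it ignores the advanced-write outputs (AMWC#, AIOWC#) and all timing.

```python
# (S2, S1, S0) -> (bus cycle, 8288 command output or None)
STATUS = {
    (0, 0, 0): ("Interrupt Acknowledge", "INTA#"),
    (0, 0, 1): ("Read I/O Port", "IORC#"),
    (0, 1, 0): ("Write I/O Port", "IOWC#"),
    (0, 1, 1): ("Halt", None),
    (1, 0, 0): ("Code Access", "MRDC#"),
    (1, 0, 1): ("Read Memory", "MRDC#"),
    (1, 1, 0): ("Write Memory", "MWTC#"),
    (1, 1, 1): ("Passive", None),
}


def decode_status(s2, s1, s0):
    """Return (bus cycle, command) for the status lines S2, S1, S0."""
    return STATUS[(s2, s1, s0)]


def is_memory_cycle(s2, s1, s0):
    """S2 = 1 marks memory cycles (code, read, write), except the passive state."""
    return s2 == 1 and (s2, s1, s0) != (1, 1, 1)
```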

Symbol: RQ/GT0#, RQ/GT1#, Pin No.: 31, 30, Type: I/O

Request/Grant: these pins are used by other local bus masters to force the processor to release the local bus at the end of the processor's current bus cycle.

RQ/GT0# has higher priority than RQ/GT1#.

Symbol: LOCK#, Pin No.: 29, Type: O

LOCK: output indicating that other system bus masters are not to gain control of the system bus while LOCK# is active LOW.

It is activated by the LOCK prefix instruction and remains active until the completion of the next instruction.

Symbol: QS1, QS0, Pin No.: 24, 25, Type: O

Queue Status: The queue status is valid during the CLK cycle after which the queue operation is performed.

QS1 QS0 Characteristics

0 0 No Operation

0 1 First Byte of Op Code from Queue

1 0 Empty the Queue

1 1 Subsequent Byte from Queue

Pins having different functions in minimum mode

Pins 24 to 31 have different functions in minimum mode, as explained below.

Symbol: M/IO#, Pin No.: 28, Type: O

Status line: used to distinguish a memory access from an I/O access.

HIGH for memory operations and LOW for I/O operations.

Symbol: WR#, Pin No.: 29, Type: O

Write: indicates that the processor is performing a write memory or write I/O cycle.

Symbol: INTA#, Pin No.: 24, Type: O

Interrupt Acknowledge: used as a read strobe for interrupt acknowledge cycles.

Active LOW during T2, T3 and TW of each interrupt acknowledge cycle.

Symbol: ALE, Pin No.: 25, Type: O

Address Latch Enable: a HIGH pulse active during T1 of any bus cycle.

Provided by the processor to latch the address into the 8282/8283 address latch.

Symbol: DT/R#, Pin No.: 27, Type: O

Data Transmit/Receive: used to control the direction of data flow through the transceiver.

Symbol: DEN#, Pin No.: 26, Type: O

Data Enable: provided as an output enable for the 8286/8287 in a minimum-mode system that uses the transceiver.

Symbol: HOLD, HLDA, Pin No.: 31, 30, Type: I, O

Hold: indicates that another master is requesting a local bus "hold".

The processor receiving the "hold" request issues HLDA (HIGH) as an acknowledgement.

Email: madhuoruganti@sreenidhi.edu.in

Combinational Circuits

Madhu Oruganti, SNIST

Outline

Boolean Algebra

Decoder

Encoder

MUX

History: Computer and the Rationalist

Modern research issues in AI are formed and evolve through a combination of historical, social and cultural pressures.

The rationalist tradition had an early proponent in Plato and was continued through the writings of Pascal, Descartes and Leibniz.

For the rationalist, the external world is reconstructed through the clear and distinct ideas of mathematics.

History: Development of Formal Logic

The goal of creating a formal language for thought also appears in the work of George Boole, another 19th-century mathematician whose work must be included in the roots of AI.

The importance of Boole’s accomplishment lies in the extraordinary power and simplicity of the system he devised: three operations.

Three Operations

The three basic Boolean operations can be defined arithmetically as follows:

x∧y=xy

x∨y=x + y − xy

¬x=1 − x
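These arithmetic definitions can be checked mechanically: for inputs restricted to 0 and 1, they reproduce the AND, OR and NOT truth tables (the -xy term keeps 1 + 1 from giving 2). A small Python sketch, with illustrative function names:

```python
from itertools import product


def b_and(x, y):
    """x AND y = xy"""
    return x * y


def b_or(x, y):
    """x OR y = x + y - xy"""
    return x + y - x * y


def b_not(x):
    """NOT x = 1 - x"""
    return 1 - x


# Check the arithmetic definitions against the ordinary truth tables.
for x, y in product((0, 1), repeat=2):
    assert b_and(x, y) == (x & y)
    assert b_or(x, y) == (x | y)
    print(x, y, "|", b_and(x, y), b_or(x, y), b_not(x))
```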

Boolean function and logic diagram

• Boolean algebra: Deals with binary variables and logic operations operating on those variables.

• Logic diagram: Composed of graphic symbols for logic gates. A simple circuit sketch that represents inputs and outputs of Boolean functions.

Basic Identities of Boolean Algebra

(1) x + 0 = x

(2) x · 0 = 0

(3) x + 1 = 1

(4) x · 1 = x

(5) x + x = x

(6) x · x = x

(7) x + x’ = 1

(8) x · x’ = 0

(9) x + y = y + x

(10) xy = yx

(11) x + ( y + z ) = ( x + y ) + z

(12) x (yz) = (xy) z

(13) x ( y + z ) = xy + xz

(14) x + yz = ( x + y )( x + z)

(15) ( x + y )’ = x’ y’

(16) ( xy )’ = x’ + y’

(17) (x’)’ = x
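Because each variable takes only the values 0 and 1, every identity in this list can be verified by perfect induction, that is, by trying all combinations. The Python sketch below does this with bitwise operators standing in for +, · and ’; a check like this also catches a mistyped identity such as x · 1 = 1. The helper names are illustrative.

```python
from itertools import product


def comp(a):
    """Complement: x' = 1 - x for x in {0, 1}."""
    return 1 - a


# Each identity returns (left side, right side) for given x, y, z.
IDENTITIES = {
    "(1)": lambda x, y, z: (x | 0, x),
    "(2)": lambda x, y, z: (x & 0, 0),
    "(3)": lambda x, y, z: (x | 1, 1),
    "(4)": lambda x, y, z: (x & 1, x),
    "(5)": lambda x, y, z: (x | x, x),
    "(6)": lambda x, y, z: (x & x, x),
    "(7)": lambda x, y, z: (x | comp(x), 1),
    "(8)": lambda x, y, z: (x & comp(x), 0),
    "(9)": lambda x, y, z: (x | y, y | x),
    "(10)": lambda x, y, z: (x & y, y & x),
    "(11)": lambda x, y, z: (x | (y | z), (x | y) | z),
    "(12)": lambda x, y, z: (x & (y & z), (x & y) & z),
    "(13)": lambda x, y, z: (x & (y | z), (x & y) | (x & z)),
    "(14)": lambda x, y, z: (x | (y & z), (x | y) & (x | z)),
    "(15)": lambda x, y, z: (comp(x | y), comp(x) & comp(y)),
    "(16)": lambda x, y, z: (comp(x & y), comp(x) | comp(y)),
    "(17)": lambda x, y, z: (comp(comp(x)), x),
}


def failing(identities):
    """Names of identities that fail for some assignment of x, y, z."""
    return [
        name
        for name, sides in identities.items()
        if any(sides(*v)[0] != sides(*v)[1] for v in product((0, 1), repeat=3))
    ]


print(failing(IDENTITIES))  # []
print(failing({"x * 1 = 1": lambda x, y, z: (x & 1, 1)}))  # ['x * 1 = 1']
```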

Gates

Gates are the hardware used to implement Boolean operators.

The most basic gates are

Boolean function and truth table


Decoder

Accepts a value and decodes it.

The active output corresponds to the value of the n inputs.

Consists of:

Inputs (n)

Outputs (2^n, numbered from 0 to 2^n - 1)

Selectors / Enable (active high or active low)

The truth table of 2-to-4 Decoder

2-to-4 Decoder


The truth table of 3-to-8 Decoder

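A2 A1 A0 D0 D1 D2 D3 D4 D5 D6 D7

0 0 0 1 0 0 0 0 0 0 0

0 0 1 0 1 0 0 0 0 0 0

0 1 0 0 0 1 0 0 0 0 0

0 1 1 0 0 0 1 0 0 0 0

1 0 0 0 0 0 0 1 0 0 0

1 0 1 0 0 0 0 0 1 0 0

1 1 0 0 0 0 0 0 0 1 0

1 1 1 0 0 0 0 0 0 0 1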
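Behaviorally, a decoder turns the binary value of its n inputs into a one-hot output. A Python sketch of the table above, extended with an enable input that is assumed here to be active high (the function name is illustrative):

```python
def decoder(inputs, enable=1):
    """n-to-2**n line decoder with an active-high enable.

    inputs lists the select bits most significant first, e.g. (A2, A1, A0).
    Returns the outputs [D0, ..., D(2**n - 1)]: exactly one output is 1 when
    enabled, and all outputs are 0 when disabled.
    """
    index = 0
    for bit in inputs:
        index = (index << 1) | bit  # binary value of the inputs
    outputs = [0] * (2 ** len(inputs))
    if enable:
        outputs[index] = 1
    return outputs


# Print the 3-to-8 truth table: A2 A1 A0 followed by D0..D7.
for a2 in (0, 1):
    for a1 in (0, 1):
        for a0 in (0, 1):
            print(a2, a1, a0, *decoder((a2, a1, a0)))
```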

3-to-8 Decoder

3-to-8 Decoder with Enable

Decoder Expansion

Decoder expansion

Combine two or more small decoders with enable inputs to form a larger decoder.

3-to-8-line decoder constructed from two 2-to-4-line decoders

The MSB is connected to the enable inputs.

If A2 = 0, the upper decoder is enabled; if A2 = 1, the lower decoder is enabled.

Decoder Expansion

Combining two 2-to-4 decoders to form one 3-to-8 decoder using the enable inputs

The highest bit is used for the enables.
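The expansion can be simulated in Python. This sketch assumes active-high enables, with an inverter feeding NOT A2 to the upper decoder so that it is enabled when A2 = 0; the enable polarity in the original figure is not recoverable, and the function names are illustrative.

```python
def decoder_2to4(a1, a0, enable):
    """2-to-4 decoder with active-high enable; returns [D0, D1, D2, D3]."""
    index = (a1 << 1) | a0
    return [1 if (enable and i == index) else 0 for i in range(4)]


def decoder_3to8(a2, a1, a0):
    """3-to-8 decoder from two 2-to-4 decoders; A2 drives the enables."""
    upper = decoder_2to4(a1, a0, enable=1 - a2)  # D0-D3, active when A2 = 0
    lower = decoder_2to4(a1, a0, enable=a2)      # D4-D7, active when A2 = 1
    return upper + lower


print(decoder_3to8(1, 1, 0))  # [0, 0, 0, 0, 0, 0, 1, 0]
```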

How about a 4-to-16 decoder?

How many 3-to-8 decoders would it use?

How many 2-to-4 decoders would it use?


Encoders

Perform the inverse operation of a decoder

2^n (or fewer) input lines and n output lines

Encoders

Encoders with OR gates
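With OR gates, each output bit is the OR of the inputs whose index has that bit set. The Python sketch below models an 8-to-3 (octal-to-binary) encoder under the usual assumption that exactly one input is 1 at a time; it has no priority logic, and an all-zero input gives the same 000 code as D0. The function name is illustrative.

```python
def encoder_8to3(d):
    """Octal-to-binary encoder built from three OR gates.

    d is [D0, ..., D7] with exactly one input equal to 1.
    Returns (A2, A1, A0).
    """
    a0 = d[1] | d[3] | d[5] | d[7]  # inputs whose index has bit 0 set
    a1 = d[2] | d[3] | d[6] | d[7]  # inputs whose index has bit 1 set
    a2 = d[4] | d[5] | d[6] | d[7]  # inputs whose index has bit 2 set
    return a2, a1, a0


print(encoder_8to3([0, 0, 0, 0, 0, 1, 0, 0]))  # (1, 0, 1)
```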



Multiplexer (MUX)

A selector chooses a single data input and passes it to the MUX output.

Only one input is passed to the output at a time.

A multiplexer can use addressing bits to select one of several input bits to be the output.

Function table with enable

4 to 1 line multiplexer

S1 S0 F

0 0 I0

0 1 I1

1 0 I2

1 1 I3

4 to 1 line multiplexer

A 2^n-to-1 MUX has n selection lines.

For this MUX, n = 2.

This means there are two selection lines, S0 and S1.
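In gate form, each select combination enables one AND gate and an OR gate combines the results, so F = S1’S0’I0 + S1’S0 I1 + S1 S0’I2 + S1 S0 I3. A Python sketch of that sum of products (the function name is illustrative):

```python
def mux4(i, s1, s0):
    """4-to-1 multiplexer as a sum of products over inputs i = [I0, I1, I2, I3]."""
    n1, n0 = 1 - s1, 1 - s0  # complemented select lines
    return ((n1 & n0 & i[0]) | (n1 & s0 & i[1])
            | (s1 & n0 & i[2]) | (s1 & s0 & i[3]))


data = [0, 1, 1, 0]
print([mux4(data, s1, s0) for s1 in (0, 1) for s0 in (0, 1)])  # [0, 1, 1, 0]
```

The four AND terms are exactly the outputs of a 2-to-4 decoder driven by S1 and S0, which is why a multiplexer can be viewed as a decoder plus AND gates and one OR gate.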

Multiplexer (MUX)

Consists of:

Inputs (multiple) = 2^n

Output (single)

Selectors (# depends on # of inputs) = n

Enable (active high or active low)

Multiplexers versus decoders

• A multiplexer uses n binary select bits to choose from a maximum of 2^n unique input lines.

• Decoders have 2^n output lines, while multiplexers have only one output line.

• The output of the multiplexer is the data input whose index is specified by the n-bit code.

Multiplexer Versus Decoder

Figure: 4-to-1 multiplexer (select inputs S0, S1; data inputs I0-I3; output X) beside a 2-to-4 decoder.

Note that the multiplexer has an extra OR gate. A1 and A0 are the two inputs of the decoder; the multiplexer has four data inputs plus two select inputs.

Cascading multiplexers

Using three 2-to-1 MUXes to make one 4-to-1 MUX

S1 S0 F

0 0 I0

0 1 I1

1 0 I2

1 1 I3
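One wiring that reproduces this table, assumed here because the figure is not recoverable: two first-level 2-to-1 MUXes controlled by S0 pick between I0/I1 and I2/I3, and a third one controlled by S1 picks between their outputs. A Python sketch with illustrative names:

```python
def mux2(i0, i1, s):
    """2-to-1 multiplexer: F = S'I0 + S I1."""
    return ((1 - s) & i0) | (s & i1)


def mux4_from_mux2(i, s1, s0):
    """4-to-1 MUX built from three 2-to-1 MUXes."""
    low = mux2(i[0], i[1], s0)   # I0 or I1
    high = mux2(i[2], i[3], s0)  # I2 or I3
    return mux2(low, high, s1)   # S1 picks the pair


print(mux4_from_mux2([0, 0, 1, 0], 1, 0))  # 1
```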

Example: Construct an 8-to-1 multiplexer using 2-to-1 multiplexers.

S2 S1 S0 F

0 0 0 I0

0 0 1 I1

0 1 0 I2

0 1 1 I3

1 0 0 I4

1 0 1 I5

1 1 0 I6

1 1 1 I7

Example: Construct an 8-to-1 multiplexer using one 2-to-1 multiplexer and two 4-to-1 multiplexers.

S2 S1 S0 X

0 0 0 I0

0 0 1 I1

0 1 0 I2

0 1 1 I3

1 0 0 I4

1 0 1 I5

1 1 0 I6

1 1 1 I7
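The same idea scales up. This Python sketch builds the 8-to-1 MUX from two 4-to-1 MUXes sharing S1 and S0 plus one 2-to-1 MUX on S2, and checks it against every row of the table; the 4-to-1 MUX is modeled behaviorally for brevity, and the names are illustrative. An all-2-to-1 version would replace each 4-to-1 MUX with three 2-to-1 MUXes, seven in total.

```python
def mux2(i0, i1, s):
    """2-to-1 MUX: F = S'I0 + S I1."""
    return ((1 - s) & i0) | (s & i1)


def mux4(i, s1, s0):
    """4-to-1 MUX, modeled behaviorally: pick the input numbered S1 S0."""
    return i[(s1 << 1) | s0]


def mux8(i, s2, s1, s0):
    """8-to-1 MUX from two 4-to-1 MUXes and one 2-to-1 MUX."""
    low = mux4(i[0:4], s1, s0)   # selects among I0-I3
    high = mux4(i[4:8], s1, s0)  # selects among I4-I7
    return mux2(low, high, s2)   # S2 chooses which half reaches X


# Check every row of the table: X must equal the input numbered S2 S1 S0.
for row in range(8):
    data = [1 if j == row else 0 for j in range(8)]
    s2, s1, s0 = (row >> 2) & 1, (row >> 1) & 1, row & 1
    assert mux8(data, s2, s1, s0) == 1
```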

Quadruple 2-to-1 Line Multiplexer

Used to select one of two 4-bit inputs and supply its four bits to the output.

Quadruple 2-to-1 Line Multiplexer

E (Enable) S (Select) Y (Output)

0 X All 0’s

1 0 A

1 1 B
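Bit by bit, each output is Yi = E·(S’·Ai + S·Bi). The Python sketch below applies that to 4-bit words held in integers, following the function table above (enable active high, as in the table); masking with 0xF to keep results to four bits is an illustrative choice, and the names are hypothetical.

```python
MASK = 0xF  # four bit positions


def quad_mux(a, b, s, e):
    """Quadruple 2-to-1 line MUX: Y = A if S = 0, B if S = 1, all 0s if E = 0."""
    pass_a = MASK if (e and not s) else 0  # AND gates fed from A
    pass_b = MASK if (e and s) else 0      # AND gates fed from B
    return (a & pass_a) | (b & pass_b)     # one OR gate per bit


print(format(quad_mux(0b1010, 0b0101, s=0, e=1), "04b"))  # 1010
print(format(quad_mux(0b1010, 0b0101, s=1, e=1), "04b"))  # 0101
print(format(quad_mux(0b1010, 0b0101, s=1, e=0), "04b"))  # 0000
```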

Sequential circuits

part 2: implementation, analysis & design


The SR flip flop is one of four basic flip flops common in computer design.

The others can all be constructed from SR; they are:

JK

D (data)

T (toggle)

JK flip flop

Resolves the undefined transition in SR: the J input acts like S (sets the device) and K acts like R (resets it).

When JK = 11, we have the toggle condition: the flip flop switches from one state to the other.

Implementation of JK flip flop

JK flip flop implementation

If JK = 00, then SR = 00 because of the AND gates, so the SR flip flop won’t change state when clocked.

JK flip flop implementation

If JK = 10, R must be 0:

if Q = 0, then Q’ = 1, so SR = 10, the set condition: the flip flop changes state (to Q = 1);

if Q = 1, then Q’ = 0, so SR = 00 (stable condition), and the flip flop stays in Q = 1.

JK flip flop implementation

If JK = 01, the final state is Q = 0 (analogous to JK = 10).

JK flip flop implementation

If JK = 11, Q connects directly to R and Q’ to S:

if Q = 0, SR = 10, so Q becomes 1;

if Q = 1, SR = 01, so Q becomes 0.
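The four cases can be checked in Python. This sketch models the clocked SR flip flop by its characteristic table and gates it as described, S = J·Q’ and R = K·Q; the result agrees with the standard JK characteristic equation Q(t+1) = J·Q’ + K’·Q. Function names are illustrative.

```python
from itertools import product


def sr_next(s, r, q):
    """Next state of a clocked SR flip flop; S = R = 1 is undefined."""
    if s and r:
        raise ValueError("S = R = 1 is undefined for an SR flip flop")
    if s:
        return 1
    if r:
        return 0
    return q


def jk_next(j, k, q):
    """JK flip flop built from SR: S = J AND Q', R = K AND Q."""
    s = j & (1 - q)
    r = k & q  # S and R are never both 1, since Q and Q' differ
    return sr_next(s, r, q)


for j, k, q in product((0, 1), repeat=3):
    assert jk_next(j, k, q) == (j & (1 - q)) | ((1 - k) & q)
    print(f"J={j} K={k} Q={q} -> Q(t+1)={jk_next(j, k, q)}")
```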

D flip flop

D: data; one input + CP

Q(t+1) independent of Q(t) – depends only on value of D at time t

D flip flop holds data until next pulse

Constructing registers

Can use D flip flops to construct individual bits of registers – one signal sent to each bit

Setting/resetting flip flop requires a 1 signal on exactly one of its input lines – CP restricts incoming signal to appropriate time so device remains in sync

The D line is split in two, with one half inverted, so one of the two lines is always 1 and the other 0.

Since CP is usually 0, both flip flop inputs are normally 0 (no change in the flip flop).

When the clock goes high, exactly one of the two lines (S or R) delivers a 1.

Device select signal

Used in combination with CP & D signals to determine if register should send or receive data

When one register is to send to another, three simultaneous signals are sent to each register: clock, device select, and send or receive.

All 3 ANDed together to indicate that specific register should send or receive at specific time

T flip flop

T stands for Toggle

like D, has one input + CP

acts like control line that specifies selective toggle

if T=0, flip flop doesn’t change; if T=1, toggles

Implementation of T flip flop

Identical to JK, with J and K tied together (J = K = T).
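Using the JK characteristic equation Q(t+1) = J·Q’ + K’·Q, tying J = K = T gives Q(t+1) = T·Q’ + T’·Q, that is, Q XOR T. The Python sketch below checks this and, as an extra illustration not taken from the slides, shows that J = D, K = D’ yields a D flip flop. Function names are illustrative.

```python
from itertools import product


def jk_next(j, k, q):
    """JK characteristic equation: Q(t+1) = J Q' + K' Q."""
    return (j & (1 - q)) | ((1 - k) & q)


def t_next(t, q):
    """T flip flop: a JK flip flop with J = K = T."""
    return jk_next(t, t, q)


def d_next(d, q):
    """D flip flop from JK: J = D, K = D', so Q(t+1) = D."""
    return jk_next(d, 1 - d, q)


for x, q in product((0, 1), repeat=2):
    assert t_next(x, q) == x ^ q  # T = 1 toggles, T = 0 holds
    assert d_next(x, q) == x      # next state depends only on D
```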

General sequential network

Sequential circuit: interconnection of gates & flip flops

All gates can be grouped conceptually as combinational network, all flip flops as group of state registers

Between clock pulses, combinational part produces output; amount of time needed depends on number of gates in net

General sequential network

Arrows: one or more connecting lines

I/O lines: connections to external environment

Arrow between boxes: input lines to flip flops

Clock line assumed but not shown

Hardware analysis vs. design

Analysis: determine output given input and sequential network

Design: input and output are known; need to determine makeup of sequential network

General approach:

construct state transition table and transition diagram

determine output stream for given input stream

Excitation table

The excitation table is a design tool for constructing circuits from a given type of flip-flop

Given the desired transition from Q(t) to Q(t +1), what inputs are necessary to make the transition happen?

Characteristic table vs. excitation table for SR flip flop

Characteristic table: tells what the next state is, given the current input and the current state.

Excitation table: tells what the current input must be to reach a desired next state, given the current state.
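An excitation table can be derived from a characteristic table by searching all inputs for those that produce each desired transition. The Python sketch below does this for the SR flip flop and writes the result with X for don't care; the helper names are hypothetical.

```python
from itertools import product


def sr_next(s, r, q):
    """SR characteristic table; None marks the forbidden input S = R = 1."""
    if s and r:
        return None
    return 1 if s else 0 if r else q


def excitation(q, q_next):
    """All (S, R) input pairs that take the flip flop from q to q_next."""
    return [(s, r) for s, r in product((0, 1), repeat=2)
            if sr_next(s, r, q) == q_next]


def compact(pairs):
    """Summarize the pairs as one row, writing X where either value works.

    Valid here because, for the SR flip flop, each set of pairs is the full
    product of its S values and its R values.
    """
    def column(values):
        return "X" if len(values) == 2 else str(next(iter(values)))
    return column({s for s, _ in pairs}), column({r for _, r in pairs})


for q, q_next in product((0, 1), repeat=2):
    s, r = compact(excitation(q, q_next))
    print(f"Q(t)={q} Q(t+1)={q_next}  S={s} R={r}")
```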

Sequential analysis

Step 1: List all possible combinations of current state and current input in an analysis table

Step 2: For each combination, compute the output and the current inputs to the state registers

Step 3: From the characteristic table, determine the next state and construct the state transition table and diagram

Example problem

State registers: FFA & FFB (T flip flops)

Combinational circuit

inputs:

X1 AND B (TA)

X2 OR A (TB)

TA & TB are inputs to FFA & FFB

output:

B’ AND X1 (Y)

Example problem

2 flip flops, so 4 possible states:

A B

0 0

0 1

1 0

1 1

• 2 inputs, so 4 possible input combinations:

X1 X2

0 0

0 1

1 0

1 1

Example problem

Given a state (AB) and an input (X1X2):

what is output?

what will be the state after CP?

16 possible answers, as shown on next slide

Analysis table for sample problem circuit

1st 4 columns list possible combinations of initial state & initial input

By the logic diagram, we know:

Y(t)=X1(t) AND B’(t)

TA(t)=X1(t) AND B(t)

TB(t)=X2(t) OR A(t)

Compute next 3 columns given above

Compute last 2 from:

characteristic table for T flip flop

initial state of flip flop

flip flop’s initial input
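Steps 1 to 3 can be carried out mechanically for the sample circuit. The Python sketch below enumerates the 16 combinations, applies Y = X1·B’, TA = X1·B and TB = X2 + A, and uses the T flip flop characteristic table (toggle when T = 1) for the next state; the function names are illustrative.

```python
from itertools import product


def t_next(q, t):
    """T flip flop characteristic table: toggle when T = 1, hold when T = 0."""
    return q ^ t


def analysis_table():
    """Rows of (A, B, X1, X2, Y, TA, TB, A(t+1), B(t+1))."""
    rows = []
    for a, b, x1, x2 in product((0, 1), repeat=4):
        y = x1 & (1 - b)  # Y  = X1 AND B'
        ta = x1 & b       # TA = X1 AND B
        tb = x2 | a       # TB = X2 OR A
        rows.append((a, b, x1, x2, y, ta, tb, t_next(a, ta), t_next(b, tb)))
    return rows


print("A B X1 X2 | Y TA TB | A+ B+")
for a, b, x1, x2, y, ta, tb, a_next, b_next in analysis_table():
    print(f"{a} {b} {x1}  {x2}  | {y} {ta}  {tb}  | {a_next}  {b_next}")
```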

State transition table

Table shows simple rearrangement of selected columns from table on previous slide

For a given initial state A(t)B(t) and input X1(t)X2(t), the table lists the next state A(t+1)B(t+1) and the initial output Y(t).

States listed as ordered pairs – next state followed by initial output

State transition diagram

Easier to visualize circuit behavior

Transitions listed as ordered pairs of input followed by initial output, with slash separator

Asynchronous inputs

An asynchronous input changes state of a flip-flop immediately without regard to CP

Preset sets Q to 1

Clear clears Q to 0

Used to initialize the state of a machine

Normal operation: both lines 0

Sequential design

Given the state transition diagram, the output, and the type of flip-flop to be used, design the combinational circuit

Any unused input combinations or unused states are don’t care conditions

2^n states are possible with n flip-flops.

Design steps

Step 1: In a design table, list the initial state, input, and output, and from the transition diagram list the next state

Step 2: Use the excitation table for the given type of flip-flop to determine the input required for the state registers

Step 3: Use Karnaugh maps to design a minimized two-level circuit for each flip-flop input

Sample problem

Design table for sample problem

Sequential design & K-maps

Each flip flop in the problem can be considered a function of four variables:

initial state (AB)

input (X1X2)

To design the combinational circuit we need a 4-variable K-map for each flip flop input

K-maps for sample problem

Figures a and b below show K-maps for S & R inputs to FFA

Row values are AB, columns are X1X2

X1X2 = 00 is a don’t care condition for both inputs, so first column of both tables is X

K-maps for sample problem

Figures c and d show inputs to FFB

Note that we can take advantage of don’t care conditions to minimize circuit

Resulting circuit with original spec

K-map & circuit for output Y

Another look at the register

Basic building block of instruction set architecture

array of D flip flops; each is bit in register

common clock line connected to all flip flops; # of flip flops doesn’t affect speed of load operation because all receive clock signal simultaneously

Memory

Conceptually, main memory is just a big array of registers

Input: address lines, control lines, data lines

Data lines are bidirectional (output also)

Control signals:

CS: Chip select, to enable or select the memory chip

WE: Write enable, to write or store a memory word to the chip

OE: Output enable, to enable the output buffer to read a word from the chip

Memory chips

The storage capacity of each is identical (512 bits); the left chip uses an 8-bit word and the right chip a 1-bit word.

Generally, a chip with 2^n words has n address lines.

Memory access

To store a word (memory write)

Select chip by setting CS to 1

Put data and address on the bus and set WE to 1

To retrieve a word (memory read)

Select chip by setting CS to 1

Put address on the bus, set OE to 1, and read the data on the bus

4 x 2 memory chip

2 address lines (A0, A1) & 2 data lines (D0, D1)

Stores 4 2-bit words

each bit is D flip flop

Address lines drive 2 x 4 decoder

One output is 1 and the other three are 0.

The line carrying the 1 selects the row of D flip flops that make up the word accessed by the chip.

Closer look

Diagram below shows implementation of “Read enable” box

Alphabet soup:

WE: write enable

CS: chip select

OE: output enable

MMV: monostable multivibrator (CP)

Read Enable

Three normal modes:

CS=0 (chip not selected)

CS=1, WE=1, OE=0

(chip selected for write)

CS=1, WE=0, OE=1

(chip selected for read)

WE & OE not permitted to be 1 at same time
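A behavioral sketch of the 4 x 2 chip in Python, assuming the three normal modes above and treating WE = OE = 1 as an error; the class and method names are hypothetical, and timing (the MMV pulse) is not modeled. A 2-to-4 decoder on A1 A0 picks the row; a write loads that row's two D flip flops, and a read returns them.

```python
class Memory4x2:
    """4 words x 2 bits; each bit models one D flip flop."""

    def __init__(self):
        self.rows = [(0, 0) for _ in range(4)]  # (D1, D0) per word

    @staticmethod
    def decode(a1, a0):
        """2-to-4 decoder: exactly one row-select line is 1."""
        index = (a1 << 1) | a0
        return [1 if i == index else 0 for i in range(4)]

    def access(self, cs, we, oe, a1, a0, data=None):
        """One memory cycle; returns (D1, D0) on a read, otherwise None."""
        if we and oe:
            raise ValueError("WE and OE must not both be 1")
        if not cs:
            return None                     # chip not selected
        row = self.decode(a1, a0).index(1)  # the line carrying the 1
        if we:
            self.rows[row] = tuple(data)    # write: load the D flip flops
            return None
        if oe:
            return self.rows[row]           # read: drive the data lines
        return None


mem = Memory4x2()
mem.access(cs=1, we=1, oe=0, a1=1, a0=0, data=(1, 0))  # write word 2
print(mem.access(cs=1, we=0, oe=1, a1=1, a0=0))        # (1, 0)
print(mem.access(cs=0, we=0, oe=1, a1=1, a0=0))        # None
```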

Memory types: volatile

SRAM: Static random access memory

most closely resembles model we’ve seen

advantage: fast

disadvantage: large – several transistors required for each bit cell

DRAM: Dynamic RAM

overcomes size problem of SRAM: one transistor, one capacitor per cell

advantage: high capacity

disadvantage: relatively slow because requires refresh operation

Memory types: non-volatile

ROM: Read-only memory

Simplest type, ROM, is prewritten to spec by manufacturer – can’t be overwritten

PROM: Programmable ROM: user can write once (by blowing embedded fuses) – can’t be overwritten

EPROM: Erasable PROM: can be wiped out & reprogrammed (requires removal from computer)

Memory types: non-volatile

EEPROM: Electrically erasable PROM

Like EPROM, but doesn’t require removal to reprogram

Can reprogram individual cell (doesn’t have to be whole chip)

Flash memory: A type of EEPROM

flash card is array of flash chips

flash drive has interface circuitry to mimic hard drive
