josephgordonftw.weebly.com

Derek Thompson and Joseph Gordon

Computer Architecture Spring Semester 2017

Project 4: Pipelined Control Unit

April 4th 2017

Introduction:

For this project we designed a processor capable of arithmetic, storing to and accessing memory, reading instructions, and interpreting data. The data moves through a pipelined system designed to move it quickly and efficiently. For the circuit representation, we used the CAD tool Logisim. The design is broken up into four stages, separated by state buffers that clearly show where each section starts and stops. First is the IF stage, which stands for instruction fetch. This stage is where we get the instruction needed to complete a command sent from a user. Next, data moves to the ID stage, which is responsible for decoding data and storing it in the appropriate GPRs. Then data goes to the EX stage, where ALU operations are executed and their results are sent to the next stage, MEM. In MEM, data can be written to or read from RAM's I-cache and D-cache.

The processor ran the fetch-decode-execute instruction cycle and was able to run 16 instructions from a 4-bit opcode. The processor could interpret three operand formats: 3-address, 2-address, and 1-address. The format was determined within the instruction register, and values were stored from there accordingly.
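As a rough illustration of why a 4-bit opcode yields 16 instructions, the sketch below pulls the opcode out of an instruction word. The 16-bit word width and the placement of the opcode in the top four bits are assumptions made for the example; the report does not state either.

```python
OPCODE_BITS = 4   # stated in the report
WORD_BITS = 16    # assumption: the instruction width is not given in the report


def opcode_of(instruction: int) -> int:
    """Return the opcode, assumed to sit in the top OPCODE_BITS of the word."""
    mask = (1 << OPCODE_BITS) - 1
    return (instruction >> (WORD_BITS - OPCODE_BITS)) & mask


# A 4-bit field can encode 2**4 = 16 distinct opcodes (0 through 15).
print(2 ** OPCODE_BITS)     # 16
print(opcode_of(0xE123))    # 14
```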

One factor that largely determined how processes would run was the P bit in the program status word (PSW) register. If the P bit was high, the processor was in privileged mode: all of the instructions could be accessed and there was no need to worry about the processor timing out. However, if the P bit was low, instructions 14 and 15 could not be run without a program check violation occurring, and if the processor timed out, the user was sent to the timeout trap. The P bit allowed many extra operations to be run and drastically changed the state machine.

This design consisted of both a control unit and a datapath. The control unit designated which enable bits to turn on and off, thus pushing data to and pulling data from the bus so that it could be manipulated. The control unit decoded the opcode from the instruction register and determined which instruction to run; it also ran the fetch cycle, the timeout trap, and the program check violation. The datapath was how all the components of the circuit connected through the bus: the ALU, general purpose registers, instruction register, memory, and more.

High Level Design

State Machine:

IF:

The Instruction Fetch stage is located at the beginning of the pipelined system. It contains the Program Counter, ROM, Decoder, Control Unit, Indirect Checker, and Program Check. The program counter provides the address that the ROM uses to look up the instruction to be decoded. The ROM then feeds the decoder, which splits the instruction into inputs for the control unit and data for the registers. The program check is fed by the control unit and compares the instruction to the P bit to see if a program check violation occurs. Before the program counter there is a multiplexer that accepts either an increment from the incrementer or a new address if there is a branch.

ID:

The ID stage, which is located between state buffers 1 and 2, takes in decoded information from the previous stage and moves that data into one or two of the seven GPRs. The correct register is chosen from inside the GPR block. Once the correct register is chosen and the GPRs are loaded, the data in those GPRs continues on to the next stage.

Note that the actual decoding happens in the previous stage (IF), not in this one.

EX:

The EX stage, located between state buffers 2 and 3, contains the ALU and the MUXs that control how data flows to and from it. Data coming into the ALU usually comes from the previous ID stage, but sometimes it comes from RAM or the PSW. The input MUX chooses the source based on the select signal from the Data Forwarding Unit. The MUX at the end selects whether to take the output from the ALU or to bypass the ALU by receiving data straight from the ID stage. Our design is set up so that both wires exiting this stage can get data from the ID stage by bypassing the ALU.

MEM:

The MEM stage is between buffers 3 and 4. This is where main memory (MM) is written to and read from. This stage contains RAM and accesses main memory through the D-cache. It contains lines that can be used for data forwarding from the ALU stage.

STORE:

The store stage is used to return data values to the general purpose

registers, PSW, timer, ALU, and PC. It contains lines that can be used for data forwarding

from the memory stage. The result it sends to the program counter can be accepted only if a

branch is occurring and that branch is taken.

Program Check Violation:

A program check occurs when instruction 14 or 15 is run. A circuit was designed so that when instruction 14 or 15 is called, it is compared against the P bit using a configuration similar to a demultiplexer. If the P bit is high, the instruction executes normally. If the P bit is low, the instruction is not executed and a program check violation occurs instead, which results in MM[0]=PSW, MM[2]=PC, PSW=MM[4], PC=MM[6]. The control signals for this can be seen below in the control signals section. After a program check violation occurs, the system returns to fetch.
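The check and the resulting register swap can be sketched in software as follows. This is only an illustration of the behavior described above, not the Logisim circuit; main memory is modeled as a Python dictionary, and the handler values stored at MM[4] and MM[6] are hypothetical.

```python
PRIVILEGED_OPCODES = {14, 15}  # instructions that require the P bit


def program_check(opcode: int, p_bit: int) -> bool:
    """Return True when the instruction must trap instead of executing."""
    return opcode in PRIVILEGED_OPCODES and p_bit == 0


def take_program_check(mm: dict, psw: int, pc: int):
    """Save the old PSW and PC, then return the new ones from memory."""
    mm[0] = psw           # MM[0] = PSW
    mm[2] = pc            # MM[2] = PC
    return mm[4], mm[6]   # PSW = MM[4], PC = MM[6]


mm = {4: 0x8000, 6: 0x0100}   # hypothetical handler PSW and PC
psw, pc = 0x0000, 0x0042      # hypothetical user-mode state (P bit low)
if program_check(opcode=14, p_bit=0):
    psw, pc = take_program_check(mm, psw, pc)
print(hex(psw), hex(pc))      # 0x8000 0x100
```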

Data Forwarding:

Within our pipeline design, we made use of a data forwarding unit. This was an optimization that we used to resolve data hazards without stalling. It works by taking the destination registers of the instructions that are currently past the ALU and MM stages and comparing them to the source registers of the instruction in the register fetch stage. Those comparators drive the control signals of the multiplexers before the ALU stage. If any of the comparators shows true, meaning the destination of one instruction is the same as the source of the other, the unit feeds back the result so the instruction can access it before it is written to its destination register. This allows for a much faster system by removing the need for stalls when data hazards occur.
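The comparator logic can be sketched as a small selection function. The names below are hypothetical, and giving the result after the ALU priority over the result after MM when both match is an assumption (it is the usual choice because the ALU result is the newer one); the report does not say how that case is handled.

```python
from typing import Optional


def forward_select(src_reg: int,
                   ex_dest: Optional[int],
                   mem_dest: Optional[int]) -> str:
    """Choose where one ALU operand should come from.

    ex_dest / mem_dest are the destination registers of the instructions
    currently after the ALU and after MM, or None if they write no register.
    """
    if ex_dest is not None and src_reg == ex_dest:
        return "EX"    # forward the value just produced by the ALU
    if mem_dest is not None and src_reg == mem_dest:
        return "MEM"   # forward the value leaving the memory stage
    return "GPR"       # no hazard: use the value read from the register file


print(forward_select(3, ex_dest=3, mem_dest=3))        # EX
print(forward_select(3, ex_dest=1, mem_dest=3))        # MEM
print(forward_select(3, ex_dest=None, mem_dest=None))  # GPR
```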

Branch Prediction:

Another optimization used in our system design was branch prediction, which keeps control hazards from stalling the pipeline. Originally we planned to implement a 2-bit dynamic predictor, but because of the lack of diversity in the opcode instructions, we found it more efficient in terms of hardware to use a static not-taken branch predictor instead. When a conditional branch instruction is run, we predict that the condition will not be met and continue incrementing the program counter normally. When the instruction before the branch resolves, we can retrieve the N or Z bit, depending on the branch being run, and determine whether our prediction was wrong. If the branch is not taken and our prediction is correct, the system is completely unaffected. However, if our prediction was incorrect and the branch is taken, we must load the new location into the program counter and flush the instructions being run throughout the circuit. Every time we incorrectly predict the branch, the outcome is like what would be achieved with a stall. When the prediction is correct, however, the branch does not slow the progress of the processor at all.

This is implemented in our circuit by having the PSW and the branch control signals go into the predictor. The predictor itself doesn't create the prediction; it checks whether we were correct. If the prediction was correct, the branch is not taken and the PC resumes normal operation. If it was incorrect and we need to take the branch, the predictor sends a control signal to the PCMUX to enable it, so that the PC accepts the branch address rather than the normal increment of the program counter. The predictor also forces a flush of the system.
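A minimal software sketch of this static not-taken scheme is shown below. The function names are hypothetical, and the PC increment of 1 is an assumption made for the example; the report does not give the step size.

```python
from typing import Tuple


def branch_taken(condition: str, n_bit: int, z_bit: int) -> bool:
    """Actual branch outcome once the N and Z bits of the PSW are known."""
    if condition == "N":
        return n_bit == 1
    if condition == "Z":
        return z_bit == 1
    raise ValueError(f"unknown condition: {condition}")


def predictor(pc: int, target: int, taken: bool) -> Tuple[int, bool]:
    """Return (next PC, flush). The prediction itself is always 'not taken'."""
    if taken:                  # misprediction
        return target, True    # PCMUX selects the target and the buffers flush
    return pc + 1, False       # prediction held: the pipeline is undisturbed


print(predictor(10, 40, branch_taken("Z", n_bit=0, z_bit=1)))  # (40, True)
print(predictor(10, 40, branch_taken("Z", n_bit=0, z_bit=0)))  # (11, False)
```

Only the first call pays a penalty: the instructions fetched after the branch are discarded, which costs about as much as stalling would have.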

Countdown Timer:

The countdown timer drives a trap that, when triggered, sends the PC and PSW values to main memory and loads main memory values into the PC and PSW. This trap can be avoided by having the privilege bit set or by making sure that new CLK values are loaded into the countdown timer so that it never reaches zero. The countdown timer starts at some preloaded value and decreases by one every clock cycle. The design for the countdown is an adder that always adds -1. The signals coming from the row of flip-flops on the left are decremented by one and then placed in the flip-flops on the right in one clock cycle. The flip-flops on the left are set from the flip-flops on the right in the same clock cycle, so the number is always decreasing by one. There are also MUXs included so that the countdown timer can receive a new value if called on by the control unit.
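The "adder that always adds -1" can be sketched as below. The 16-bit width is an assumption made for the example, and -1 is represented in two's complement as all ones. The sketch also assumes that the counter wraps around when stepped past zero; the report does not say whether the real circuit holds at zero.

```python
WIDTH = 16                  # assumption: the timer width is not given in the report
MASK = (1 << WIDTH) - 1
MINUS_ONE = MASK            # two's complement encoding of -1


def timer_step(value: int, load: bool = False, new_value: int = 0) -> int:
    """One clock cycle: the MUX picks a reload value or the adder output."""
    if load:                            # control unit supplies a new value
        return new_value & MASK
    return (value + MINUS_ONE) & MASK   # value + (-1), truncated to WIDTH bits


value = 3
for _ in range(3):
    value = timer_step(value)
print(value)                                       # 0
print(timer_step(5, load=True, new_value=100))     # 100
```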

Timeout Trap:

A timeout trap can only occur after execution of an instruction has completed and the cycle is in the process of returning to fetch. The timeout trap happens when the countdown timer has reached zero and the P bit is set to zero. If a timeout trap is called, it results in MM[8]=PSW, MM[10]=PC, PSW=MM[12], PC=MM[14]. The control signals for this can be seen in the control signal section below. After the timeout trap, the cycle returns to fetch.
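The trap condition and the register swap can be sketched as follows. As before, this is an illustration rather than the circuit itself: main memory is modeled as a dictionary, and the handler values at MM[12] and MM[14] are hypothetical.

```python
def timeout_trap_due(timer: int, p_bit: int) -> bool:
    """Checked between instructions: the trap fires only in unprivileged mode."""
    return timer == 0 and p_bit == 0


def take_timeout_trap(mm: dict, psw: int, pc: int):
    """Save the old PSW and PC, then return the new ones from memory."""
    mm[8] = psw             # MM[8] = PSW
    mm[10] = pc             # MM[10] = PC
    return mm[12], mm[14]   # PSW = MM[12], PC = MM[14]


mm = {12: 0x8000, 14: 0x0200}   # hypothetical handler PSW and PC
psw, pc = 0x0000, 0x0042        # hypothetical user-mode state (P bit low)
if timeout_trap_due(timer=0, p_bit=0):
    psw, pc = take_timeout_trap(mm, psw, pc)
print(hex(psw), hex(pc))        # 0x8000 0x200
```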

Control Signals:

*Buffers are automatic due to the clock; however, they are included in the control signals to help show the processes.

Fetch:

1. PCOUT, INCIN, INCOUT, MARIN, Read_MM, MDROUT, PCIN, PCHECKIN, ICHECKIN, Buffer1IN

Program Check Violation:

1. FLUSH, STALL UNTIL COMPLETE

2. PSWOUT, GPR(PSW)IN, GPR(0)OUT, Buffer2IN

3. Buffer2OUT, GPR(PSW)OUT, Buffer2IN, Buffer3IN

4. Buffer3OUT, Buffer2OUT, MARIN, Buffer3IN, Buffer4IN

5. Buffer3OUT, GPR(6)IN, GPR(6)OUT, Buffer2IN, MDRIN, Write_MM

6. Buffer2OUT, GPR(PC)OUT, Buffer2IN, ALU/INC, ALUMUXENABLE, Buffer3IN



9. Buffer2OUT, ALU/INC, ALUMUXENABLE, Buffer3IN

10. Buffer3OUT, MARIN, Buffer4IN

11. Buffer4OUT, GPR(6)IN, Read_MM, MDROUT, Buffer4IN, GPR(6)OUT, Buffer2IN

12. Buffer4OUT, Buffer2OUT, PSWIN, ALU/INC, ALUMUXENABLE, Buffer3IN

13. Buffer3OUT, MARIN, Read_MM, MDROUT, Buffer4IN

14. Buffer4OUT, GPR(PC)IN

Timeout Trap Violation:

1. FLUSH, STALL UNTIL COMPLETE

2. PSWOUT, GPR(PSW)IN, GPR(0)OUT, Buffer2IN


4. Buffer3OUT, Buffer4IN

5. Buffer4OUT, GPR(6)IN, GPR(6)OUT, GPR(6)OUT, Buffer2IN

6. Buffer2OUT, ALU/LEFTSHIFT, ALUMUXENABLE, Buffer3IN

7. Buffer3OUT, Buffer4IN

8. Buffer4OUT, GPR(6)IN, GPR(6)OUT, Buffer2IN

9. Buffer2OUT, GPR(PSW)OUT, Buffer2IN, Buffer3IN



12. Buffer2OUT, GPR(PC)OUT, Buffer2IN, ALU/INC, ALUMUXENABLE, Buffer3IN




16. Buffer3OUT, MARIN, Buffer4IN

17. Buffer4OUT, GPR(6)IN, Read_MM, MDROUT, Buffer4IN, GPR(6)OUT, Buffer2IN

18. Buffer4OUT, Buffer2OUT, PSWIN, ALU/INC, ALUMUXENABLE, Buffer3IN

19. Buffer3OUT, MARIN, Read_MM, MDROUT, Buffer4IN

20. Buffer4OUT, GPR(PC)IN

Instruction 1:

1. Buffer1IN

2. Buffer1OUT, GPRLOAD, GPR(RS1)OUT, GPR(RS2)OUT, Buffer2IN

3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN

4. Buffer3OUT, Buffer4IN

5. Buffer4OUT, GPR(RD)IN
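One way to sanity-check a step list like the one for Instruction 1 is to confirm that every buffer driven out in a cycle was latched in the previous cycle. The sketch below encodes the five steps above as data (with the buffer names written without spaces) and runs that check; the helper names are hypothetical.

```python
import re

# The five steps of Instruction 1, one list of control signals per cycle.
INSTRUCTION_1 = [
    ["Buffer1IN"],
    ["Buffer1OUT", "GPRLOAD", "GPR(RS1)OUT", "GPR(RS2)OUT", "Buffer2IN"],
    ["Buffer2OUT", "ALU/ADD", "ALUMUXENABLE", "Buffer3IN"],
    ["Buffer3OUT", "Buffer4IN"],
    ["Buffer4OUT", "GPR(RD)IN"],
]


def buffers(step, direction):
    """Return the buffer numbers latched ("IN") or driven ("OUT") in a step."""
    pattern = rf"Buffer(\d){direction}"
    return {int(m.group(1)) for s in step if (m := re.fullmatch(pattern, s))}


def handoff_ok(steps):
    """Every buffer read in a cycle must have been written one cycle earlier."""
    return all(buffers(cur, "OUT") <= buffers(prev, "IN")
               for prev, cur in zip(steps, steps[1:]))


print(handoff_ok(INSTRUCTION_1))  # True
```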

Instruction 2:

1. Buffer1IN


3. Buffer2OUT, ALU/SUB, ALUMUXENABLE, Buffer3IN



Instruction 3:

1. Buffer1IN


3. Buffer2OUT, ALU/AND, ALUMUXENABLE, Buffer3IN



Instruction 4:

1. Buffer1IN


3. Buffer2OUT, ALU/LEFTSHIFT, ALUMUXENABLE, Buffer3IN



Instruction 5:

1. Buffer1IN


3. Buffer2OUT, ALU/RIGHTSHIFT, ALUMUXENABLE, Buffer3IN



Instruction 6:

1. Buffer1IN, STALLFETCH

2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Short_Offset)OUT, Buffer2IN, STALLFETCH


4. Buffer3OUT, MARIN, MDROUT, MMMUXENABLE, Buffer4IN

5. Buffer4OUT, GPR(RD)IN, ALU/ADD, ALUMUXENABLE, Buffer3IN

6. Buffer3OUT, MARIN, Read_MM, MDROUT, MMMUXENABLE, Buffer4IN


Instruction 7:

1. Buffer1IN

2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Short_Offset)OUT, Buffer2IN




Instruction 8:

1. Buffer1IN

2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Short_Offset)OUT, GPR(RD)OUT, Buffer2IN


4. Buffer3OUT, MARIN, MDRIN, Write_MM, MDROUT, MMMUXENABLE, Buffer4IN


Instruction 9:

1. Buffer1IN

2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Long_Offset)OUT, Buffer2IN



5. Buffer4OUT, PredictorLoad,

If CC.N=1 then PCMUXENABLE then FLUSH_BUFFERS, PCIN

Instruction 10:

1. Buffer1IN




5. Buffer4OUT, PredictorLoad,

If CC.Z=1 then PCMUXENABLE then FLUSH_BUFFERS, PCIN

Instruction 11:

1. Buffer1IN




5. Buffer4OUT, PredictorLoad, PCMUXENABLE, FLUSH_BUFFERS, PCIN

Instruction 12:

1. Buffer1IN, STALLFETCH

2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, Buffer2IN

3. Buffer2OUT, Buffer3IN, GPR(PC)OUT, GPR(Short_Offset)OUT, Buffer2IN

4. Buffer3OUT, Buffer4IN, Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN

5. Buffer4OUT, GPR(RD)IN, Buffer3OUT, Buffer4IN

6. Buffer4OUT, GPR(PC)IN, PCMUXENABLE, PCIN

Instruction 13:

1. Buffer1IN

2. Buffer1OUT, GPRLOAD, GPR(RD)OUT, GPR(Short_Offset)OUT, Buffer2IN



5. Buffer4OUT, GPR(PC)IN, PCMUXENABLE, PCIN

Instruction 14:

1. Buffer1IN




5. Buffer4OUT, TimerIN

Instruction 15:

1. Buffer1IN




5. Buffer4OUT, PSWIN
