Derek Thompson and Joseph Gordon
Computer Architecture Spring Semester 2017
Project 4: Pipelined Control Unit
April 4th 2017
Introduction:
For this project a processor was designed which was capable of arithmetic, storing and
accessing memory, reading instructions, and interpreting data. The data is moved through a
pipelined system designed to move quickly and with maximum efficiency. For the circuit
representation, we used the CAD tool Logisim. The design is broken up into five stages, which
are separated by state buffers that clearly show where each stage starts and stops. First
is the IF stage, which stands for instruction fetch. This stage is where we get the
instruction needed to complete a command sent from a user. Next, data moves to the ID stage, which
is responsible for decoding data and storing it in the appropriate GPRs. Then data goes to the EX
stage, where ALU operations are executed and the results are sent to the next stage, MEM. In MEM,
data can be written to or read from RAM's I-cache and D-cache. Finally, the store stage
returns results to the registers.
The processor ran the fetch-decode-execute instruction cycle and was able to run 16
instructions from a 4-bit opcode. The processor could interpret three different instruction
formats: 3-address, 2-address, and 1-address. The format was determined within the
instruction register, and values were stored from there accordingly.
One factor that largely determined how the processes would be run was the P bit in the
program status word (PSW) register. If the P bit was high, the processor was in privileged mode
and could access all of the instructions without having to worry about timing out.
However, if the P bit was low, then instructions 14 and 15 could not be run without a program
check violation occurring, and if the processor timed out, the user would be sent to the
timeout trap. The P bit allowed for many extra operations to be run and drastically changed the
state machine.
This design consisted of both a control unit and a datapath. The control unit designated
which enable bits to turn on and off, thus pushing and pulling data to the bus so that it could be
manipulated. The control unit was responsible for decoding the opcode from the instruction register
and determining which instruction to run, as well as running the fetch cycle, the timeout trap, and
the program check violation. The datapath was how all the components of the circuit
connected through the bus, namely the ALU, general purpose registers, instruction register,
memory, and more.
High Level Design
State Machine:
IF:
The Instruction Fetch stage is located at the beginning of the pipelined system. It
contains the Program Counter, ROM, Decoder, Control Unit, Indirect Checker, and Program
Check. The program counter provides the address that the ROM uses to look up the instruction to be
decoded. The ROM then feeds the decoder, which splits the instruction into the control unit
signals and the data for the registers. The program check is fed by the control unit and compares the
instruction against the P bit to see if a program check violation occurs. Before the program counter
there is a multiplexer that accepts either an increment from the incrementer or a new
address if there is a branch.
ID:
The ID stage, which is located between state buffers 1 and 2, takes in decoded
information from the previous stage and moves that data into one or two of the seven GPRs. The
correct register is chosen from inside the GPR block. Once the correct register is chosen and the
GPRs are loaded, the data in those GPRs continues on to the next stage.
Note that the actual decoding happens in the previous (IF) stage.
EX:
The EX stage, which is located between state buffers 2 and 3, contains the ALU and the MUXs that
control how data flows to and from it. Data coming into the ALU will usually come from the previous ID
stage, but sometimes it will come from RAM or the PSW. The input MUX selects
where the data comes from based on the select signals from the Data Forwarding Unit. The MUX on the
end selects whether to take the output from the ALU or to bypass the ALU by receiving data
straight from the ID stage. Our design is set up so that both wires exiting this stage can get data
from the ID stage by bypassing the ALU.
MEM:
The MEM stage is between buffers 3 and 4. This is where the main memory (MM) is
written to and read from. This stage contains RAM and accesses the main memory through the
D-cache. It contains lines that can be used for data forwarding from the ALU stage.
STORE:
The store stage is used to return data values to the general purpose
registers, PSW, timer, ALU, and PC. It contains lines that can be used for data forwarding
from the memory stage. The result it sends to the program counter can be accepted only if a
branch is occurring and that branch is taken.
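As a rough illustration of how instructions advance through the stages described above, the following Python sketch shifts instructions through five slots, one stage per clock cycle. It assumes one instruction enters per cycle with no stalls or flushes; the function name `run_pipeline` and the instruction labels are hypothetical and are not part of the Logisim design.

```python
# Stage names taken from the report; everything else is illustrative.
STAGES = ["IF", "ID", "EX", "MEM", "STORE"]

def run_pipeline(program):
    """Return, for each clock cycle, which instruction occupies each stage."""
    slots = [None] * len(STAGES)  # slots[0] is IF, slots[-1] is STORE
    pending = list(program)
    history = []
    while pending or any(s is not None for s in slots):
        # On each clock edge every state buffer passes its contents forward
        # and a new instruction (if any) enters the IF stage.
        next_instr = pending.pop(0) if pending else None
        slots = [next_instr] + slots[:-1]
        if any(s is not None for s in slots):
            history.append(dict(zip(STAGES, slots)))
    return history

if __name__ == "__main__":
    for cycle, state in enumerate(run_pipeline(["I1", "I2", "I3"]), start=1):
        print(cycle, state)
```

With three instructions the sketch takes seven cycles (3 + 5 - 1), since the pipeline needs four extra cycles to drain after the last instruction is fetched.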
Program Check Violation:
A program check occurs when Instruction 14 or 15 is run. A circuit was designed so
that when Instruction 14 or 15 is called, it is compared against the P bit using a configuration
similar to a demultiplexer. If the P bit is set high, the instruction executes normally. If
the P bit is set low, the instruction is not executed and instead a program check violation
occurs, which results in MM[0]=PSW, MM[2]=PC, PSW=MM[4], PC=MM[6]. The control
signals for this can be seen below in the control signals section. After a program check violation
occurs, the system returns to fetch.
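The effect of the program check can be sketched in Python as follows. This is only a behavioral model under stated assumptions: the P bit is passed in as a separate argument (its position inside the PSW is not given in this report), main memory is modeled as a dictionary, and the function name `program_check` is hypothetical.

```python
def program_check(opcode, p_bit, psw, pc, mm):
    """Return (psw, pc, violated) after checking a privileged instruction.

    mm is a dict modeling main memory, indexed by address.
    """
    privileged = opcode in (14, 15)
    if privileged and p_bit == 0:
        # Program check violation: save the old state, load the handler state.
        mm[0] = psw
        mm[2] = pc
        return mm[4], mm[6], True
    # Either the instruction is unprivileged or P is high: execute normally.
    return psw, pc, False
```

Passing the P bit separately keeps the sketch independent of the PSW layout, which would have to be filled in from the actual circuit.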
Data Forwarding:
Within our pipeline system design, we made use of a data forwarding unit. This was
an optimization that we used to resolve data hazards without stalling. It works by taking
the destination registers of the instructions that are currently past the ALU and MM stages and
comparing those to the source registers of the instruction in the register fetch stage. The
comparators drive the control signals of the multiplexers before the ALU stage. If any of the
comparators is true, meaning the destination of one instruction is the same as a source of the other,
the result is fed back for the later instruction to use before it is written to its
destination register. This allows for a much faster system by removing the need for stalls when
data hazards occur.
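The comparator and multiplexer behavior described above can be sketched in Python. The function and parameter names are hypothetical, and the priority given to the post-ALU result over the post-MM result is an assumption of this sketch (it is the usual choice, because the post-ALU result is newer), not something stated in the design.

```python
def select_operand(src_reg, regfile_value,
                   ex_dest, ex_result,
                   mem_dest, mem_result):
    """Pick the value the ALU input multiplexer should pass for one source register.

    ex_dest/ex_result describe the instruction just past the ALU;
    mem_dest/mem_result describe the instruction just past MM.
    A destination of None means that instruction writes no register.
    """
    if ex_dest is not None and src_reg == ex_dest:
        return ex_result      # forward the newest result (assumed priority)
    if mem_dest is not None and src_reg == mem_dest:
        return mem_result     # forward the older in-flight result
    return regfile_value      # no hazard: use the value read from the GPRs
```

For example, if an add writing register 3 is immediately followed by an instruction reading register 3, the first comparator matches and the ALU result is used instead of the stale register value.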
Branch Prediction:
Another optimization that was used in our system design was branch prediction. Branch
prediction is used to reduce the cost of control hazards. Originally we planned to
implement a 2-bit dynamic predictor, but because of the lack of diversity in the opcode
instructions, we found it much more efficient, considering hardware cost, to use a
static not-taken branch predictor instead. The way the branch predictor works is that when a
conditional branch instruction is run, we predict that the condition will not be met and we
continue incrementing the program counter normally. When the instruction before the branch
resolves, we can retrieve the N or Z bit, depending on the branch being run. This bit is then
checked to determine whether our prediction was wrong. If the branch is not taken and our
prediction is correct, the system is completely unaffected. However, if our prediction
was incorrect and the branch is taken, we must load the new location into the program
counter and flush the instructions in flight throughout the circuit. Every time we incorrectly
predict the branch, the outcome is like what would be achieved with a stall. However, for the
times where the prediction is correct, we gain efficiency because the branch does not slow
the progress of the processor at all. This is implemented in our circuit by having the PSW and
the branch control signals go into the predictor. The predictor itself does not create the
prediction; it checks whether the prediction was correct. If it was correct, the branch is not
taken and the PC resumes normal operation. If it was incorrect and we need to take the branch, the
predictor sends a control signal to the PCMUX to enable it and accept the branch target rather
than the normal incrementation of the program counter. The predictor also forces a flush of
the system.
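A minimal Python sketch of the check the predictor performs is given below. The function name, the string labels for the branch condition, and the argument names are hypothetical; the N, Z, and unconditional cases correspond to Instructions 9, 10, and 11 in the control signals section.

```python
def resolve_branch(condition, n_bit, z_bit, incremented_pc, target_pc):
    """Check a static not-taken prediction once the condition bits are known.

    condition is "N", "Z", or "ALWAYS".
    Returns (next_pc, flush): flush is True when the prediction was wrong.
    """
    taken = (
        (condition == "N" and n_bit == 1)
        or (condition == "Z" and z_bit == 1)
        or condition == "ALWAYS"
    )
    if taken:
        # Misprediction: select the branch target through the PC multiplexer
        # and flush the instructions fetched after the branch.
        return target_pc, True
    # Prediction was correct: keep the normally incremented PC, no flush.
    return incremented_pc, False
```

Because the prediction is always "not taken", an unconditional branch is always a misprediction in this sketch and always causes a flush.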
Countdown Timer:
The countdown timer triggers a trap that, when called upon, sends the PC and PSW values to
main memory and loads main memory values into the PC and PSW. This trap can be avoided by
having the privilege bit set or by making sure that new CLK values are loaded into the
countdown timer so that it never reaches zero. The countdown timer starts at some preloaded
value and decreases by one every clock cycle. The design for the countdown is an adder that
always adds -1. The value coming from the line of flip-flops on the left has one subtracted from it
and is then placed in the flip-flops on the right in one clock cycle. The flip-flops on the left are set
from the flip-flops on the right in the same clock cycle, so the number is always decreasing by
one. There are also MUXs included so that the countdown timer can receive a new value if
called on by the control unit.
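The adder-based countdown can be modeled in a few lines of Python. The 16-bit width is an assumption made only for this sketch, as are the names `timer_step`, `load_enable`, and `load_value`. Adding -1 is modeled as adding the all-ones pattern in two's complement.

```python
WIDTH = 16                 # assumed register width, not stated in the report
MASK = (1 << WIDTH) - 1    # all ones, which is -1 in two's complement

def timer_step(current, load_enable=False, load_value=0):
    """Return the timer value after one clock cycle."""
    if load_enable:
        # The MUX selects the new value supplied by the control unit.
        return load_value & MASK
    # Otherwise the adder adds -1 to the current value.
    return (current + MASK) & MASK
```

In this sketch the value wraps around to all ones after reaching zero; what the hardware does after zero is not described here, since the timeout trap is expected to handle that case.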
Timeout Trap:
A timeout trap can only occur after execution of an instruction has completed and the
cycle is in the process of returning to fetch. The timeout trap happens when the countdown timer
has reached zero and the P bit is set to zero. If a timeout trap is called, it
results in MM[8]=PSW, MM[10]=PC, PSW=MM[12], PC=MM[14]. The control signals for this can
be seen in the control signals section below. After a timeout trap, the cycle returns to fetch.
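The check made at the end of each instruction can be sketched in Python. The names `take_trap`, `end_of_instruction`, and `TIMEOUT_BASE` are hypothetical, and main memory is modeled as a dictionary. The helper takes a base address because the timeout trap uses the same save and load layout as the program check violation, only starting at address 8 instead of 0.

```python
TIMEOUT_BASE = 8  # MM[8], MM[10], MM[12], MM[14] per the description above

def take_trap(base, psw, pc, mm):
    """Save PSW and PC at base and base+2, then load new ones from base+4 and base+6."""
    mm[base] = psw
    mm[base + 2] = pc
    return mm[base + 4], mm[base + 6]

def end_of_instruction(timer, p_bit, psw, pc, mm):
    """Return the (psw, pc) to use for the next fetch."""
    if timer == 0 and p_bit == 0:
        return take_trap(TIMEOUT_BASE, psw, pc, mm)
    # Timer still running, or privileged mode: continue normally.
    return psw, pc
```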
Control Signals:
*Buffers are automatic due to the clock; however, they are included in the control signals to
help show the processes.
Fetch:
1. PCOUT, INCIN, INCOUT, MARIN, Read_MM, MDROUT, PCIN, PCHECKIN, ICHECKIN, Buffer1IN
Program Check Violation:
1. FLUSH, STALL UNTIL COMPLETE
2. PSWOUT, GPR(PSW)IN, GPR(0)OUT, Buffer2IN
3. Buffer2OUT, GPR(PSW)OUT, Buffer2IN, Buffer3IN
4. Buffer3OUT, Buffer2OUT, MARIN, Buffer3IN, Buffer4IN
5. Buffer3OUT, GPR(6)IN, GPR(6)OUT, Buffer2IN, MDRIN, Write_MM
6. Buffer2OUT, GPR(PC)OUT, Buffer2IN, ALU/INC, ALUMUXENABLE, Buffer3IN
7. Buffer3OUT, Buffer2OUT, MARIN, Buffer3IN, Buffer4IN
8. Buffer3OUT, GPR(6)IN, GPR(6)OUT, Buffer2IN, MDRIN, Write_MM
9. Buffer2OUT, ALU/INC, ALUMUXENABLE, Buffer3IN
10. Buffer3OUT, MARIN, Buffer4IN
11. Buffer4OUT, GPR(6)IN, Read_MM, MDROUT, Buffer4IN, GPR(6)OUT, Buffer2IN
12. Buffer4OUT, Buffer2OUT, PSWIN, ALU/INC, ALUMUXENABLE, Buffer3IN
13. Buffer3OUT, MARIN, Read_MM, MDROUT, Buffer4IN
14. Buffer4OUT, GPR(PC)IN
Timeout Trap Violation:
1. FLUSH, STALL UNTIL COMPLETE
2. PSWOUT, GPR(PSW)IN, GPR(0)OUT, Buffer2IN
3. Buffer2OUT, ALU/INC, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, GPR(6)IN, GPR(6)OUT, Buffer2IN
6. Buffer2OUT, ALU/LEFTSHIFT, ALUMUXENABLE, Buffer3IN
7. Buffer3OUT, Buffer4IN
8. Buffer4OUT, GPR(6)IN, GPR(6)OUT, Buffer2IN
9. Buffer2OUT, GPR(PSW)OUT, Buffer2IN, Buffer3IN
10. Buffer3OUT, Buffer2OUT, MARIN, Buffer3IN, Buffer4IN
11. Buffer3OUT, GPR(6)IN, GPR(6)OUT, Buffer2IN, MDRIN, Write_MM
12. Buffer2OUT, GPR(PC)OUT, Buffer2IN, ALU/INC, ALUMUXENABLE, Buffer3IN
13. Buffer3OUT, Buffer2OUT, MARIN, Buffer3IN, Buffer4IN
14. Buffer3OUT, GPR(6)IN, GPR(6)OUT, Buffer2IN, MDRIN, Write_MM
15. Buffer2OUT, ALU/INC, ALUMUXENABLE, Buffer3IN
16. Buffer3OUT, MARIN, Buffer4IN
17. Buffer4OUT, GPR(6)IN, Read_MM, MDROUT, Buffer4IN, GPR(6)OUT, Buffer2IN
18. Buffer4OUT, Buffer2OUT, PSWIN, ALU/INC, ALUMUXENABLE, Buffer3IN
19. Buffer3OUT, MARIN, Read_MM, MDROUT, Buffer4IN
20. Buffer4OUT, GPR(PC)IN
Instruction 1:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(RS1)OUT, GPR(RS2)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, GPR(RD)IN
Instruction 2:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(RS1)OUT, GPR(RS2)OUT, Buffer2IN
3. Buffer2OUT, ALU/SUB, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, GPR(RD)IN
Instruction 3:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(RS1)OUT, GPR(RS2)OUT, Buffer2IN
3. Buffer2OUT, ALU/AND, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, GPR(RD)IN
Instruction 4:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(RS1)OUT, GPR(RS2)OUT, Buffer2IN
3. Buffer2OUT, ALU/LEFTSHIFT, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, GPR(RD)IN
Instruction 5:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(RS1)OUT, GPR(RS2)OUT, Buffer2IN
3. Buffer2OUT, ALU/RIGHTSHIFT, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, GPR(RD)IN
Instruction 6:
1. Buffer1IN, STALLFETCH
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Short_Offset)OUT, Buffer2IN, STALLFETCH
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, MARIN, MDROUT, MMMUXENABLE, Buffer4IN
5. Buffer4OUT, GPR(RD)IN, ALU/ADD, ALUMUXENABLE, Buffer3IN
6. Buffer3OUT, MARIN, Read_MM, MDROUT, MMMUXENABLE, Buffer4IN
7. Buffer4OUT, GPR(RD)IN
Instruction 7:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Short_Offset)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, MARIN, Read_MM, MDROUT, MMMUXENABLE, Buffer4IN
5. Buffer4OUT, GPR(RD)IN
Instruction 8:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Short_Offset)OUT, GPR(RD)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, MARIN, MDRIN, Write_MM, MDROUT, MMMUXENABLE, Buffer4IN
5. Buffer4OUT, GPR(RD)IN
Instruction 9:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Long_Offset)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, PredictorLoad,
If CC.N=1 then PCMUXENABLE then FLUSH_BUFFERS, PCIN
Instruction 10:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Long_Offset)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, PredictorLoad,
If CC.Z=1 then PCMUXENABLE then FLUSH_BUFFERS, PCIN
Instruction 11:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Long_Offset)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, PredictorLoad, PCMUXENABLE, FLUSH_BUFFERS, PCIN
Instruction 12:
1. Buffer1IN, STALLFETCH
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, Buffer2IN
3. Buffer2OUT, Buffer3IN, GPR(PC)OUT, GPR(Short_Offset)OUT, Buffer2IN
4. Buffer3OUT, Buffer4IN, Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
5. Buffer4OUT, GPR(RD)IN, Buffer3OUT, Buffer4IN
6. Buffer4OUT, GPR(PC)IN, PCMUXENABLE, PCIN
Instruction 13:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(RD)OUT, GPR(Short_Offset)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, Buffer4IN
5. Buffer4OUT, GPR(PC)IN, PCMUXENABLE, PCIN
Instruction 14:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Long_Offset)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, MARIN, Read_MM, MDROUT, MMMUXENABLE, Buffer4IN
5. Buffer4OUT, TimerIN
Instruction 15:
1. Buffer1IN
2. Buffer1OUT, GPRLOAD, GPR(PC)OUT, GPR(Long_Offset)OUT, Buffer2IN
3. Buffer2OUT, ALU/ADD, ALUMUXENABLE, Buffer3IN
4. Buffer3OUT, MARIN, Read_MM, MDROUT, MMMUXENABLE, Buffer4IN
5. Buffer4OUT, PSWIN
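Instructions 1 through 5 in the listing above share the same five-step sequence and differ only in the ALU operation selected in step 3. As an illustration, that pattern can be written as a small Python table generator. The function name `alu_instruction_steps` and the `ALU_OPS` dictionary are hypothetical; the signal names are copied from the listing.

```python
# ALU operation selected by each of Instructions 1 through 5.
ALU_OPS = {1: "ADD", 2: "SUB", 3: "AND", 4: "LEFTSHIFT", 5: "RIGHTSHIFT"}

def alu_instruction_steps(instruction):
    """Return the control signals asserted in each step of an ALU instruction."""
    op = ALU_OPS[instruction]
    return [
        ["Buffer1IN"],
        ["Buffer1OUT", "GPRLOAD", "GPR(RS1)OUT", "GPR(RS2)OUT", "Buffer2IN"],
        ["Buffer2OUT", f"ALU/{op}", "ALUMUXENABLE", "Buffer3IN"],
        ["Buffer3OUT", "Buffer4IN"],
        ["Buffer4OUT", "GPR(RD)IN"],
    ]

if __name__ == "__main__":
    for number, signals in enumerate(alu_instruction_steps(2), start=1):
        print(f"{number}. " + ", ".join(signals))
```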