
Technological Document

Due: 6/24/2008

CPU Design Report

Project Name: MIPS CPU Design

Author: Lin Yin

TA: Li Tingtao

School of Software, Shanghai Jiao Tong University

ABSTRACT

This report gives a brief description of the design and VHDL implementation of a MIPS CPU with pipeline and cache functionalities.

KEYWORDS

MIPS, CPU, Pipeline, Cache, Logic, VHDL, Digital Circuit

School of Software, Shanghai Jiao Tong University

Address: Dongchuan Road No. 800

Post Code: 200240

Contents

1. Executive Overview

2. Processor and Cache Design

2.1 Processor Design

2.1.1 Architectural Overview

2.1.2 Hazards Solving

2.1.3 The Controller

2.1.4 Detailed Design for Each Phase

2.2 Cache Design

2.2.1 Overview

2.2.2 Design Choices

2.2.3 State Machines

3. Processor Debugging

3.1 Debug Overview

3.2 Debugging Synchronous Problems

3.3 Debugging the Cache

4. Results

4.1 Testing Code

4.2 Single Cycle MIPS

4.3 Pipeline MIPS

4.4 Pipeline MIPS with Cache

5. Conclusions

6. Developing Environments

6.1 Hardware

6.2 Software

7. Appendices

7.1 ISA

7.2 Testing Codes

8. References


1. Executive Overview


In this CPU design practice, I implemented a 5-stage pipelined MIPS CPU with cache functionality. The practice is intended to deepen our understanding of computer architecture through hands-on CPU logic design. My design supports 15 instructions: 7 R-type, 7 I-type, and 1 J-type (see Appendix 7.1). It realizes all the basic features of a simple MIPS CPU, including instruction execution, pipelining, and caching. However, due to time limits, the bus arbitrator was not implemented, so separate memories are required for instructions and data. Through this practice, I learned how to make a single-cycle CPU run, how to turn a single-cycle CPU into a pipelined one by resolving hazards, and how to design a cache using a finite state machine model. As a by-product, I also learned the VHDL language and how to deal with synchronization problems. Above all, the practice achieved its goal: it gave me a better understanding of the architecture and organization of a computer.

The rest of this document is organized as follows. Section 2 gives the details of the processor and cache design, Section 3 describes the problems encountered and the solutions found during processor debugging, and Section 4 presents the results. Finally, Section 5 draws conclusions and Section 6 lists the development environment.

2. Processor and Cache Design

2.1 Processor Design

2.1.1 Architectural Overview

Figure 1 Top-level Circuit


As shown in Figure 1, the MIPS CPU is divided into five phases corresponding to the pipeline stages: IF (Instruction Fetch), ID (Instruction Decoding), EXE (Execution), MEM (Memory Access), and WB (Write Back). As soon as a phase finishes its work on one instruction, it moves on to the next instruction. Therefore, a set of backup registers (pipeline registers) is required between the phases to hold the results of each phase. The long bars in Figure 1 denote these backup registers.
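The way instructions advance through the phases can be illustrated with a small behavioral model. This is a Python sketch, not the author's VHDL; the names `STAGES`, `step` and `run` are hypothetical. Each list slot stands for one phase together with the backup register in front of it, and every clock edge shifts all slots by one.

```python
# Hypothetical behavioral model of the five-phase pipeline (not the VHDL design).
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]

def step(pipeline, next_instr):
    """One clock edge: each backup register takes the contents of the one
    before it, IF receives the next instruction, and the WB entry retires."""
    return [next_instr] + pipeline[:-1]

def run(program, cycles):
    """Return, for every cycle, which instruction occupies each phase."""
    pipeline = [None] * len(STAGES)      # empty pipeline after reset
    stream = iter(program)
    trace = []
    for _ in range(cycles):
        pipeline = step(pipeline, next(stream, None))
        trace.append(dict(zip(STAGES, pipeline)))
    return trace

for cycle, row in enumerate(run(["i0", "i1", "i2"], 5), start=1):
    print(cycle, row)
```

In cycle 5 the three instructions occupy EXE, MEM and WB at the same time, which is exactly why each phase needs its own backup register.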

2.1.2 Hazards Solving

Like any pipelined CPU, the design of pipelined MIPS involves three types of hazards: structural

hazards, data hazards, and control hazards.

Structural hazards occur when instructions compete for the same hardware resource. For example, both the IF phase and the MEM phase access memory, so they compete for the memory and for the data and address buses. In my design, this conflict is resolved by using a separate instruction cache and data cache, as well as separate instruction memory and data memory. Separating the memories would not be necessary if an arbitrator were available: whenever one of the caches misses, the arbitrator would lock the memory so that the other cache cannot access it until the current memory access finishes. Using separate caches to resolve structural hazards is an instance of the general strategy of adding hardware resources. Another general solution is to insert bubbles, where the CPU stops fetching instructions for one cycle so that the competing instructions access the resource one after another. This approach is not adopted in my design because it delays instruction execution and is less efficient.
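Since the arbitrator was not built, the following is only a hypothetical Python sketch of the locking behavior described above: the first cache to request the shared memory owns it until it releases it at the end of its line transfer. The class and method names are assumptions.

```python
class MemoryArbiter:
    """Hypothetical arbiter (not implemented in this design) for one shared
    memory: the cache that starts a miss owns the memory until it releases it."""

    def __init__(self):
        self.owner = None                    # "icache", "dcache" or None

    def request(self, who):
        """Return True if `who` may access memory in this cycle."""
        if self.owner is None:
            self.owner = who                 # lock the memory for this cache
        return self.owner == who

    def release(self, who):
        """Called when `who` has finished its line transfer."""
        if self.owner == who:
            self.owner = None

arb = MemoryArbiter()
print(arb.request("dcache"), arb.request("icache"))   # the i-cache must wait
```

A real arbiter would also need a fairness rule for simultaneous requests; this sketch simply serves whichever request it sees first.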

Data hazards occur when an instruction needs the result of an instruction one or several positions ahead of it. Normally, an instruction puts its result in a register or in memory, where the following instructions can fetch it. However, if each instruction waited until the final results of all previous instructions were available, the pipeline would stall frequently and lose most of its benefit. Thus, whenever such a data dependence occurs, we should deliver the partial result of the previous instruction to the dependent instruction as soon as possible. This technique is called forwarding. In MIPS, partial results have the following three sources:

ID phase: the result of this phase may come from the register file or the instruction word.

EXE phase: the result of this phase may come from the ALU.

MEM phase: the result of this phase may come from memory (cache).

Since my design does not support multiplication instructions, there are fewer sources of data hazards and fewer cases to consider. The following list summarizes how data hazards are resolved in my design, where I(n) denotes the instruction in the ID phase and I(n-k) the instruction k positions ahead of it.

When I(n) needs the result of I(n-3), no data hazard occurs: I(n-3) is in its WB phase and writes its result to the register file at the negative clock edge of the current cycle, before I(n) reads the register file.

When I(n) needs the result of I(n-2), I(n-2) is already in its MEM phase and its result is available by the negative clock edge of the current cycle, so forwarding is possible. Because the result has not yet been written into the register file, it is forwarded from the backup register of the MEM phase.

When I(n) needs the result of I(n-1), two cases must be considered. If the result of I(n-1) comes from the ID or EXE phase, it has already been calculated and can be forwarded. However, if the result of I(n-1) comes from the MEM phase (a load), it is not ready yet. The only solution is to insert a bubble, delaying instruction fetch for one cycle so that I(n-1) can obtain its result from memory.

Figure 2 Solution to Data Hazards by Forwarding

Figure 3 Solution to Data Hazards by Adding Bubbles
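The three cases can be condensed into one decision function. This is a Python sketch of the rules as stated, not the PCU logic itself; the function name and the string labels it returns are hypothetical.

```python
def resolve_hazard(distance, result_from_mem):
    """Action for I(n), in ID, when it reads a register written by I(n - distance).

    result_from_mem: True if the producer's result is produced in the MEM
    phase (a load), False if it is produced in ID or EXE.
    """
    if distance < 1:
        raise ValueError("the producer must be ahead of I(n)")
    if distance >= 3:
        return "none"            # I(n-3) writes the register file at the negative edge
    if distance == 2:
        return "forward_mem"     # take the value from the MEM-phase backup register
    # distance == 1
    return "bubble" if result_from_mem else "forward_alu"

print(resolve_hazard(1, True), resolve_hazard(1, False), resolve_hazard(2, True))
```

Only the load-then-use pair at distance 1 costs a cycle; every other dependence is covered by forwarding or by the write-then-read ordering of the register file.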


The last kind of hazard, the control hazard, occurs with branches. The processor faces a dilemma: should it execute the following instruction or take the branch? This decision cannot be made until the branch condition has been evaluated. Generally, three solutions are proposed for this hazard (see reference [1]). My design adopts the third one (the non-delay solution) because of its efficiency. The branch condition is evaluated in the ID phase. In addition, a delay slot follows each branch instruction to make fuller use of resources. Because of the delay slot, the instruction that directly follows a branch instruction is always executed, whether or not the branch is taken. The execution of a sequence of instructions involving a branch is illustrated in Figure 4.

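Phase (cycle number) of each instruction:

Branch: IF(1) ID(2) EXE(3) MEM(4) WB(5)

Delay slot: IF(2) ID(3) EXE(4) MEM(5) WB(6)

Successor instruction 1: IF(3) ID(4) EXE(5) MEM(6) WB(7)

Successor instruction 2: IF(4) ID(5) EXE(6) MEM(7) WB(8)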


Figure 4 Solution to Control Hazards

The instruction in the delay slot is always executed after the branch, but successor instructions 1 and 2 are executed only when the branch is not taken.
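The fetch order of Figure 4 can be sketched as follows, assuming byte addresses, standard MIPS branch encoding (a sign-extended 16-bit word offset relative to the delay-slot address), and a branch resolved in ID. The function name is hypothetical; the example uses the beq at word address 07 (byte 0x1C) of the test program in Appendix 7.2, whose offset 3 leads to finish at word 0B (byte 0x2C).

```python
def fetch_sequence(branch_pc, offset16, taken):
    """Byte addresses fetched in cycles 1-3 of Figure 4.

    branch_pc: address of the branch; offset16: its 16-bit offset field.
    The branch is resolved in ID during cycle 2, so the fetch in cycle 3
    already knows the outcome; the delay slot is fetched (and executed) in
    cycle 2 in either case.
    """
    offset16 &= 0xFFFF
    offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16  # sign-extend
    delay_slot = branch_pc + 4
    target = delay_slot + (offset << 2)
    return [branch_pc, delay_slot, target if taken else delay_slot + 4]

print([hex(a) for a in fetch_sequence(0x1C, 3, True)])
print([hex(a) for a in fetch_sequence(0x1C, 3, False)])
```

Because the outcome is known before the cycle-3 fetch, no cycle is wasted whether or not the branch is taken.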

2.1.3 The Controller

[Figure 5 symbol (pcu, instance inst7). Signals: instr[31..0], rseqrt, rseqz, rsltz, aludes[6..0], memdes[6..0], writepc, writeir, jump, branch, fwda[1..0], fwdb[1..0], rs[4..0], rt[4..0], rd[4..0], controlw[31..0].]

Figure 5 The PCU Module

The task of the controller is to decode MIPS instructions and generate control signals for every design element in the CPU. Generating control signals is the core issue in CPU design, since these signals coordinate all parts of the processor so that they work in step with each other. The hazard-resolution logic is also implemented in the controller. In my design, all controller functions, including decoding, are encapsulated in a PCU (Processor Control Unit).

The type of an instruction is recognized by decoding its opcode, since different kinds of instructions have different opcodes. For example, R-type instructions have an all-zero opcode in my design. For R-type instructions, the type of calculation is recognized by decoding the func-code. The opcodes and func-codes of all instructions are given in Appendix 7.1.

Once the concrete operation of an instruction has been recognized, the PCU starts generating the control word. The control word contains every piece of information needed to define the behavior of the processor in the current cycle, and provides enough information to generate the control word for the next cycle. Typically, this includes: the earliest phase at which the final result of the current instruction will be available, whether the result will be written to the register file or to memory, the data sources for ports A and B of the ALU, the func-code of the ALU, etc. In practice, the control word is stored in every backup register so that each phase can work correctly. The layout of the control word is given in Table 2-1.

Table 2-1 Control Word


Bits Name Usage

0-4 ALUCONTROL Func-code of ALU

5 ALUSRCA The data source of ALU data A

6 WRITEMEM Control write enable port of memory

7 MEMTOREG Whether the reg file should write the data from memory

8 WRITEREG Whether result will be written to reg file in the WB phase

9-13 ResltDes Control the register to be written in the WB phase

14 ALURESOK Whether EXE phase will use ALU

15 MEMRESOK Whether MEM phase will access memory

16-17 ALUSRCB The data source of ALU data B

18-31 Reserved
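The bit layout of Table 2-1 can be exercised with a small pack/unpack helper. This is a Python sketch, not the PCU's VHDL; the names `FIELDS`, `pack` and `unpack` are hypothetical, and the field values chosen for the lw example are illustrative assumptions (ALUCONTROL is left at 0 because the ALU func-code encoding is not listed in this report).

```python
# Field positions from Table 2-1: name -> (lowest bit, width in bits).
FIELDS = {
    "ALUCONTROL": (0, 5),
    "ALUSRCA": (5, 1),
    "WRITEMEM": (6, 1),
    "MEMTOREG": (7, 1),
    "WRITEREG": (8, 1),
    "ResltDes": (9, 5),
    "ALURESOK": (14, 1),
    "MEMRESOK": (15, 1),
    "ALUSRCB": (16, 2),
}

def pack(**values):
    """Build a control word; fields that are not given stay 0."""
    word = 0
    for name, value in values.items():
        low, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word

def unpack(word):
    """Split a control word back into its named fields."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, (low, width) in FIELDS.items()}

# Illustrative field values for "lw $4, 0($1)" (assumed, see above).
cw = pack(MEMTOREG=1, WRITEREG=1, ResltDes=4, MEMRESOK=1, ALUSRCB=1)
print(hex(cw), unpack(cw)["ResltDes"])
```

Keeping every field at a fixed position is what lets each phase read only the bits it needs from the control word stored in its backup register.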

Table 2-2 Other Control Signals

Name # of Bits Comments

fwda 2 Control the forwarding to ALU data A

fwdb 2 Control the forwarding to ALU data B

writepc 1 Control the write enable port of pc register

writeir 1 Control the write enable port of instruction register (IR)

branch 1 Whether the current instruction is a branch instruction.

jump 1 Whether the current instruction is a jump instruction.

rs 5 Index of the source register

rt 5 Index of the target register

rd 5 Index of the destination register

Table 2-3 Data Source for ALU Port A

ALUSRCA Data Source

0 RS in the register file

1 INST[10..6] in the instruction code

Table 2-4 Data Source for ALU Port B

ALUSRCB Data Source

0 RT in the register file

1 INST[15..0] in the instruction code


Table 2-5 Forwarding Source

FWDA/FWDB Forwarding Source

00 No forwarding at all.

01 Forward data from ALU.

10 Forward data from memory.


2.1.4 Detailed Design for Each Phase

2.1.4.1 IF Phase

Figure 6 IF Phase

The IF phase should finish the following operations:

(1) Calculate the next instruction address for program counter register (PC)

(2) Read the instruction from the memory.

The PC is updated as follows:

(1) In the normal case, 4 is added to the current PC.

(2) For a taken branch, the offset INST[15..0] is sign-extended, shifted left by two bits, and added to the current instruction address. Note that the "current instruction" here is the instruction in the IF phase, that is, the instruction in the delay slot. The offset INST[15..0] comes from the branch instruction itself, which was saved during the ID phase.

(3) For a jump instruction, the target field INST[25..0] is shifted left by two bits and replaces the low 28 bits of the current instruction address; the upper 4 bits are kept. Again, the "current instruction" is the instruction in the delay slot, and INST[25..0] comes from the jump instruction itself, which was saved during the ID phase. The j instructions in the test program (Appendix 7.2) rely on this absolute form; for example, 08000003 jumps to loop at address 03.

Instruction fetch reads the instruction at the (virtual) address held in the PC register. Since no MMU is provided in my simple MIPS CPU, this address is sent directly to the instruction memory (cache). The fetched instruction is saved in the instruction register.
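The three PC cases can be sketched in Python, assuming standard MIPS encodings (sign-extended branch offset, pseudo-direct jump target) and a PC that holds byte addresses, so MIF word address w corresponds to byte address 4w. The names `next_pc` and `sign_extend16` are hypothetical. The checks use the test program: beq at 07 reaches finish at 0B, and the j instructions at 09 and 0D reach loop (03) and here (0D).

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def next_pc(pc, branch_instr=None, taken=False, jump_instr=None):
    """Next PC value.

    pc: address of the instruction now in IF (the delay slot when a branch or
    jump is in ID). branch_instr / jump_instr: 32-bit word of the branch or
    jump currently in ID, if any.
    """
    if jump_instr is not None:
        # Keep PC[31..28] and replace the rest with INST[25..0] << 2.
        return (pc & 0xF0000000) | ((jump_instr & 0x03FFFFFF) << 2)
    if branch_instr is not None and taken:
        # Sign-extended INST[15..0] << 2, added to the delay-slot address.
        return (pc + (sign_extend16(branch_instr) << 2)) & MASK32
    return (pc + 4) & MASK32

print(hex(next_pc(0x20, branch_instr=0x10400003, taken=True)))   # beq -> finish
print(hex(next_pc(0x28, jump_instr=0x08000003)))                 # j loop
```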

2.1.4.2 ID Phase


Figure 7 ID Phase

The ID phase is responsible for the following tasks:

(1) Instruction decoding.

(2) Handle data hazards.

(3) Judge whether the condition of a branch instruction is met.

(4) Fetch the operands from register file.

With the control-word mechanism described in Section 2.1.3, decoding becomes largely a matter of writing logic equations: simple AND and OR operations are enough to realize the decoder. As mentioned in Section 2.1.2, data hazards are resolved by adding extra data paths that forward the data of previous instructions as soon as possible. The forwarding strategy is given in Table 2-6.

Table 2-6 Forwarding Strategy


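Instruction | Data Source | Condition | Strategy

EXE phase (1 instruction ahead) | EXE phase: ALUDES[5] | rssource AND (RSI==ALUDES) | fwda=01B

EXE phase (1 instruction ahead) | EXE phase: ALUDES[5] | rtsource AND (RTI==ALUDES) | fwdb=01B

EXE phase (1 instruction ahead) | MEM phase: ALUDES[6] | rssource AND (RSI==ALUDES) | bubble

EXE phase (1 instruction ahead) | MEM phase: ALUDES[6] | rtsource AND (RTI==ALUDES) | bubble

MEM phase (2 instructions ahead) | EXE phase: MEMDES[5] | rssource AND (RSI==MEMDES) | fwda=01B

MEM phase (2 instructions ahead) | EXE phase: MEMDES[5] | rtsource AND (RTI==MEMDES) | fwdb=01B

MEM phase (2 instructions ahead) | MEM phase: MEMDES[6] | rssource AND (RSI==MEMDES) | fwda=01B

MEM phase (2 instructions ahead) | MEM phase: MEMDES[6] | rtsource AND (RTI==MEMDES) | fwdb=01B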
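The selection logic of Table 2-6 for one ALU operand can be sketched in Python. Assumptions: ALUDES/MEMDES are modeled as a destination register number plus two flags for bits [5] and [6]; the function returns symbolic choices rather than the 2-bit fwda/fwdb codes; register $0 is never forwarded (standard MIPS convention, not stated in this report); and the instruction one ahead takes priority because it holds the newer value. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Dest:
    """Hypothetical model of ALUDES / MEMDES for an older instruction."""
    reg: int          # bits [4..0]: destination register
    from_exe: bool    # bit [5]: result is produced in the EXE phase
    from_mem: bool    # bit [6]: result is produced in the MEM phase (load)

def operand_source(src_reg, uses_src, exe: Dest, mem: Dest):
    """Choose where one ALU operand of the instruction in ID comes from.

    exe: the instruction 1 ahead (now in EXE); mem: 2 ahead (now in MEM).
    Returns "bubble", "from_exe", "from_mem_stage" or "register_file".
    """
    if not uses_src or src_reg == 0:          # assumption: $0 is never forwarded
        return "register_file"
    if exe.reg == src_reg and (exe.from_exe or exe.from_mem):
        # Newest producer wins; a load 1 ahead is not ready yet.
        return "bubble" if exe.from_mem else "from_exe"
    if mem.reg == src_reg and (mem.from_exe or mem.from_mem):
        return "from_mem_stage"               # value held in the MEM backup register
    return "register_file"

print(operand_source(3, True, Dest(3, True, False), Dest(5, False, True)))
```

The same function is evaluated once for the rs operand (driving fwda) and once for the rt operand (driving fwdb).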


2.1.4.3 EXE Phase

Figure 8 EXE Phase

The EXE phase follows the ID phase. Its main tasks are:

(1) Perform arithmetic/logic operations with the ALU.

(2) Save the result of the calculation to the backup register.

(3) Generate various control signals.

Before calculation starts, the operands must be selected. Each ALU input comes either from an immediate field of the instruction code or from RS/RT in the register file; the choice is determined by the signals ALUSRCA/ALUSRCB generated by the PCU (Tables 2-3 and 2-4).
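Operand selection according to Tables 2-3 and 2-4 can be sketched as below. Assumptions, not taken from the report: the immediate INST[15..0] is sign-extended to 32 bits, and ALUSRCB codes other than 1 are treated like 0 because Table 2-4 defines only those two. The function name is hypothetical; the example is addi $2, $2, -1 (word 2042FFFF) from the test program.

```python
MASK32 = 0xFFFFFFFF

def alu_operands(rs_val, rt_val, instr, alusrca, alusrcb):
    """Select the two ALU inputs (Tables 2-3 and 2-4).

    rs_val / rt_val are the (already forwarded) register values. Assumptions:
    the immediate INST[15..0] is sign-extended, and ALUSRCB codes other than 1
    are treated like 0 because Table 2-4 lists only those two.
    """
    a = (instr >> 6) & 0x1F if alusrca == 1 else rs_val     # INST[10..6]: shift amount
    imm = instr & 0xFFFF                                     # INST[15..0]
    if imm & 0x8000:
        imm -= 0x10000                                       # sign-extend to 32 bits
    b = imm & MASK32 if alusrcb == 1 else rt_val
    return a, b

a, b = alu_operands(4, 0, 0x2042FFFF, 0, 1)    # addi $2, $2, -1 with $2 = 4
print(a, hex(b), (a + b) & MASK32)
```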

2.1.4.4 MEM Phase

Figure 9 MEM Phase


The MEM phase has the following main tasks:

(1) Generate the memory access control signals and complete the memory access.

(2) Save the data read from memory to the backup register so that it can be used by the WB phase.

The following signals are needed to access memory.

(1) DATAO[31..0]: Output data of CPU

(2) DADDR[31..0]: Address for accessing the memory.

(3) WRITEMEM: Control the write enable of the memory.

2.1.4.5 WB Phase

Figure 10 WB Phase

The only task for the WB Phase is to write the result of calculation to the register file. To achieve this,

the following signals are needed.

(1) RESULT[31..0]: The final result of the execution of an instruction

(2) CONTROLW_MEM[13..9] (ResltDes in PCU signals): The index of the register to be written.

(3) CONTROLW_MEM[8] (WRITEREG in PCU signals): Control the write enable of the register file.
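The WB step can be sketched with a small register-file model that decodes WRITEREG (bit 8) and ResltDes (bits 13..9) from the control word, following Table 2-1. The class name is hypothetical, and hard-wiring $0 to zero is the standard MIPS convention rather than something this report states. Calling `write_back` before `read` within a cycle mirrors the negative-edge write that lets I(n) in ID see the result of I(n-3).

```python
class RegisterFile:
    """32 x 32-bit registers; $0 is hard-wired to zero (standard MIPS, assumed)."""

    def __init__(self):
        self.regs = [0] * 32

    def write_back(self, result, controlw):
        """WB phase: WRITEREG = bit 8, ResltDes = bits 13..9 (Table 2-1)."""
        if (controlw >> 8) & 1:
            dest = (controlw >> 9) & 0x1F
            if dest != 0:
                self.regs[dest] = result & 0xFFFFFFFF

    def read(self, index):
        return self.regs[index]

rf = RegisterFile()
rf.write_back(42, (1 << 8) | (3 << 9))   # WRITEREG=1, ResltDes=3
print(rf.read(3))
```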

2.2 Cache Design

2.2.1 Overview


[Figure block labels: CPU Core, Cache Controller, i-cache, d-cache, Main Memory]

As shown in Figure 11, the caches lie in the CPU core and are coordinated by a cache controller. All addresses issued by the CPU are sent to the cache controller first instead of directly to main memory. The cache controller decides whether the requested data are in the cache. If they are, no memory access is needed: the data are read directly from the cache and sent back to the CPU. If not, the controller fetches several consecutive words from main memory to fill the corresponding line in the cache. Since cache access is much faster than memory access, performance improves significantly whenever accesses hit the cache. A cache relies on the temporal and spatial locality of data accesses: a datum used now is likely to be used again soon, and an access to one datum is likely to be followed by accesses to adjacent data. Therefore, instead of fetching only the data currently needed, the controller also fetches the adjacent words; sequential access to memory is much faster than random access.

Since my design does not include an MMU module, the cache receives physical addresses from CPU

instead of virtual addresses. Lock functionality is not supported. Two state machines are used in

coordination in the cache controller to schedule data read and write.

Figure 11 Cache Overview

2.2.2 Design Choices

For simplicity, my design adopts a direct-mapped cache. The instruction cache (i-cache) and data cache (d-cache) are separated in order to avoid structural hazards. Each cache is 2 KB: 128 lines of 4 words each, with 4 bytes per word. Therefore, address bits DADDR[10..2] are used to address the cache: DADDR[10..4] select the line and DADDR[3..2] select the column (word) within the line. Each line also has 21 tag bits, 1 valid bit and 1 dirty bit. The tag bits record the high 21 bits (DADDR[31..11]) of the address, so that the cache controller can decide whether a read or write hits or misses. The valid bit indicates whether a line holds valid data; all valid bits are cleared when the machine restarts. The dirty bit indicates whether the line is dirty, that is, whether a word in the line has been overwritten without the data being written back to memory.
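The address split described above can be checked with a few lines of Python. The function name is hypothetical; the bit positions follow the 128-line, 4-word, 4-byte organization given in the text.

```python
def split_address(addr):
    """Split a 32-bit byte address for the 2 KB direct-mapped cache
    (128 lines x 4 words x 4 bytes). Returns (tag, line, word, byte)."""
    byte = addr & 0x3                # DADDR[1..0]: byte within the word
    word = (addr >> 2) & 0x3         # DADDR[3..2]: column (word in the line)
    line = (addr >> 4) & 0x7F        # DADDR[10..4]: line index
    tag = (addr >> 11) & 0x1FFFFF    # DADDR[31..11]: 21-bit tag
    return tag, line, word, byte

print(split_address(0x10))    # first word of line 1
print(split_address(0x80C))   # same line as 0x00C, but a different tag
```

Two addresses 2 KB apart share a line and differ only in the tag, which is exactly the case in which a dirty line must be written back before the new block is loaded.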

The choices involved in any cache design also had to be made in this practice. The strategies I adopted are listed below:

(1) Block placement/replacement and block identification. Since the cache is direct mapped, there is no choice for these two strategies. The line in which a block is placed (and which is therefore replaced) is determined by the index bits DADDR[10..4]. A block is identified by using the low 11 bits of the address to locate it, and the high 21 bits plus the valid bit to verify it.

(2) Write strategy. For efficiency, I adopt the write-back strategy: data are written only to the block in the cache, and a modified block is written to main memory only when it is replaced.

(3) Allocation strategy. I assume that data written to memory will soon be read, and therefore allocate on both reads and writes (write-allocate). Upon a cache miss, the cache controller allocates a cache line for either a read or a write: any load or store to an address that is not in the cache allocates a line.
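The three policies can be combined in a behavioral Python model. It is a sketch, not the author's cache controller: it ignores the FSM timing, and `memory` is a plain dict from word address to value that stands in for main memory. The class and method names are hypothetical.

```python
class DirectMappedCache:
    """Behavioral sketch: direct mapped, write back, allocate on read and
    write misses. memory: dict word_address -> value (word address = byte
    address >> 2), standing in for main memory."""
    LINES, WORDS = 128, 4

    def __init__(self, memory):
        self.mem = memory
        self.valid = [False] * self.LINES
        self.dirty = [False] * self.LINES
        self.tags = [0] * self.LINES
        self.data = [[0] * self.WORDS for _ in range(self.LINES)]
        self.writebacks = 0

    def _locate(self, addr):
        return addr >> 11, (addr >> 4) & 0x7F, (addr >> 2) & 0x3

    def _fill(self, tag, index):
        if self.valid[index] and self.dirty[index]:           # write back the victim first
            base = ((self.tags[index] << 7) | index) << 2     # word address of the old line
            for w in range(self.WORDS):
                self.mem[base + w] = self.data[index][w]
            self.writebacks += 1
        base = ((tag << 7) | index) << 2                      # word address of the new line
        self.data[index] = [self.mem.get(base + w, 0) for w in range(self.WORDS)]
        self.tags[index], self.valid[index], self.dirty[index] = tag, True, False

    def access(self, addr, value=None):
        """Read (value is None) or write one word; returns (hit, data)."""
        tag, index, word = self._locate(addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self._fill(tag, index)                  # allocate on read or write miss
        if value is not None:
            self.data[index][word] = value
            self.dirty[index] = True                # write back: memory not updated yet
        return hit, self.data[index][word]

memory = {}
cache = DirectMappedCache(memory)
print(cache.access(0x10, 99), cache.access(0x10), cache.access(0x810), memory.get(4))
```

The last access maps to the same line as 0x10 with a different tag, so the dirty line is written back to memory before the new line is loaded.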

Since the memory clock is much slower than the CPU clock, there is also a choice of how the cache controller learns that a memory read or write has completed. We can either send both the memory clock and the CPU clock to the cache controller and let it determine when the memory operation has finished, or add a memready signal between the cache controller and the memory so that the memory can indicate when the data are ready. In my design, I chose the former.

2.2.3 State Machines

2.2.3.1 Cache State Switching

Figure 12 and figure 13 demonstrate state switching of instruction cache and data cache, respectively.


[Figures 12 and 13 (state diagrams). Recoverable state labels include Invalid, Valid and Valid Clean; transitions are labelled Read and Write.]

Figure 12 State Switching for Instruction Cache

Figure 13 State Switching for Data Cache

2.2.3.2 Finite State Machine for I-cache and D-cache

The difference between the I-cache and the D-cache is that the I-cache does not allow writes. To keep this document concise, only the state machines for the D-cache are given here. The state machines for the I-cache are submachines of those for the D-cache, obtained by removing the edges involving writes and any states left isolated after those edges are removed.

A D-cache requires two state machines, which we call cachefsm and fillfsm. Cachefsm handles data access requests, judges hit or miss, and schedules fillfsm on a miss. After fillfsm has completed its task, cachefsm fetches the data from the cache and sends them to the output pins if necessary. Fillfsm is responsible for fetching data from memory to fill the corresponding cache line upon a miss; it handles everything from any necessary write-back to the memory read.

The cachefsm is shown in Figure 14; it is a 3-state finite state machine (FSM). It starts in the SRW state and judges whether the request is a read or a write. For a read, it switches to the SDRW state; for a write, it switches to the SDWW state; otherwise it remains in SRW. Depending on whether a miss occurs, it may also schedule fillfsm.
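A next-state function for cachefsm, as described above, can be sketched in Python. It uses the state names from the text (Figure 14's labels read SDRW and SDRS, so the diagram and the text do not fully agree). The waiting behavior is an assumption: in SDRW/SDWW the machine stays put while fillfsm reports cachebusy and returns to SRW once the access can complete. The function name is hypothetical.

```python
def cachefsm_next(state, read, write, cachebusy):
    """One transition of a simplified cachefsm (state names as in the text).

    Assumption: in SDRW/SDWW the FSM waits while fillfsm reports cachebusy
    and returns to SRW once the access can complete.
    """
    if state == "SRW":
        if read:
            return "SDRW"
        if write:
            return "SDWW"
        return "SRW"
    if state in ("SDRW", "SDWW"):
        return state if cachebusy else "SRW"
    raise ValueError(f"unknown state {state!r}")

print(cachefsm_next("SRW", True, False, False), cachefsm_next("SDRW", False, False, True))
```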


[Figure 14 diagram. cachefsm states: SRW, SDRW, SDRS. fillfsm states: SIDLE, SS, SWW0-SWW3, SRW0-SRW3. Transition labels include read, write, cachebusy, !cachebusy, write back, fillcache and !fillcache.]

The fillfsm is more complicated: it is a 10-state FSM. While idle, it remains in the SIDLE state; it moves to the SS state when scheduled by cachefsm. It then goes through a series of SWW and/or SRW states, depending on the type of request. Each SWW state writes one word to memory and each SRW state reads one word from memory, so a group of four states writes back or fills one cache line. The concrete state machine for fillfsm is shown in Figure 14.

In Figure 14, the dashed lines denote signals passed between the two state machines. Cachefsm schedules fillfsm by asserting the writeback or fillcache signal, and fillfsm tells cachefsm whether it is busy through the cachebusy signal.
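The fillfsm can be sketched as a next-state function over its 10 states. The exact transition conditions are not fully recoverable from Figure 14, so this Python sketch assumes that cachefsm asserts fillcache to schedule a miss and writeback when the victim line is dirty, and that the write-back (SWW0-SWW3) precedes the fill (SRW0-SRW3). All function names are hypothetical.

```python
ORDER = ["SS", "SWW0", "SWW1", "SWW2", "SWW3", "SRW0", "SRW1", "SRW2", "SRW3"]

def fillfsm_next(state, fillcache, writeback):
    """One transition of a hypothetical fillfsm (SIDLE, SS, SWW0-SWW3, SRW0-SRW3).
    Each SWW state writes one word of the old line, each SRW state reads one."""
    if state == "SIDLE":
        return "SS" if fillcache else "SIDLE"
    if state == "SS":
        return "SWW0" if writeback else "SRW0"
    if state == "SRW3":
        return "SIDLE"                      # line filled: cachebusy drops
    return ORDER[ORDER.index(state) + 1]    # SWW3 falls through to SRW0

def cachebusy(state):
    return state != "SIDLE"

def run_fill(writeback):
    """States visited for one miss, starting from SIDLE with fillcache asserted."""
    state, visited = fillfsm_next("SIDLE", True, writeback), []
    while state != "SIDLE":
        visited.append(state)
        state = fillfsm_next(state, False, writeback)
    return visited

print(run_fill(True))
print(run_fill(False))
```

A miss on a dirty line therefore keeps cachebusy asserted for nine states, against five for a clean line, which is the write-back penalty visible in the results.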


Figure 14 State Machine for D-Cache

3. Processor Debugging

3.1 Debug Overview

To facilitate debugging, I used ModelSim for pre-synthesis simulation. ModelSim provides powerful facilities for writing testbenches, setting breakpoints, and watching waveforms. For each module in the design, I wrote a set of testbenches to test it in ModelSim, and only after it had passed all test cases did I integrate it into the system. The breakpoint function of ModelSim let me catch nearly every bug in my design. After the CPU design was finished, I compiled and synthesized it in Quartus II, loaded the test code and data into the memories, edited the input waveforms in the Quartus Waveform Editor, and then checked the results.

3.2 Debugging Synchronous Problems

The biggest headache in the design was that Quartus II does not support asynchronous memory reads and writes. That is, the memory output only changes at a clock edge, and data can be written to the memory only at a clock edge as well. This is quite different from the default memory in MAX+PLUS II. When the final output waveform did not match what I expected, I traced the discrepancy to a delayed memory read near the second cycle of my test program, which revealed this problem.

This problem can be solved by clocking the memory on the negative clock edge and the CPU on the positive one. The memory access then takes place in the middle of a CPU cycle, so the memory appears asynchronous to the CPU. However, this approach no longer works once caches are added: the cachefsm state machine needs a half-cycle delay to schedule the fillfsm state machine, so it also needs the negative clock edge, which again conflicts with the memory access. The solution to this second problem is to use separate clocks for the CPU and the memory rather than sending the inverted CPU clock to the memory. A memory cycle should be an integral multiple of a CPU cycle so that all modules in the design work in phase.
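Why the integral-multiple rule matters can be checked numerically. This Python sketch assumes that both clocks start with a rising edge at t = 0 and that only rising edges must line up; the function name and the example periods (in ns) are hypothetical.

```python
from fractions import Fraction

def edges_aligned(cpu_period, mem_period, horizon):
    """True if every memory-clock rising edge up to `horizon` coincides
    with a CPU-clock rising edge (both clocks start at t = 0)."""
    cpu, mem = Fraction(cpu_period), Fraction(mem_period)
    t = Fraction(0)
    while t <= horizon:
        if t % cpu != 0:
            return False
        t += mem
    return True

print(edges_aligned(10, 40, 1000))   # memory cycle = 4 CPU cycles
print(edges_aligned(10, 25, 1000))   # 2.5 CPU cycles: edges drift out of phase
```

With a non-integral ratio, some memory edges fall in the middle of a CPU cycle, which reintroduces the kind of half-cycle conflict described above.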

3.3 Debugging the Cache

When I added caches to my design, the machine again behaved strangely. The first problem I discovered was that PC and IR changed every cycle regardless of whether the cache hit or missed. I checked the modules that drive these signals and found that I had forgotten to write the logic that holds instruction fetch on a cache miss. After I modified the PC calculation logic, the first problem was solved.

But other problems remained: the final results of my test programs were all wrong. I therefore brought the states of the cache FSMs out as outputs and watched their waveforms. By doing this, I found several flaws in my state-switching logic and some conditions that had been neglected; for example, I had forgotten to check the dirty bit before filling the cache with new data. Once the FSM diagram is correct, debugging state switching becomes much easier: watch the states in the waveform and check whether they match your expectations. If not, trace the wrong transition back to the code and fix it.

4. Results


4.1 Testing Code

To test the correctness of each design, three test programs were used. To keep this document concise, only the program from reference [1] is listed, with a small modification at address 0C to show the correctness of cache write-back. The test code is attached in the appendix of this document.

4.2 Single Cycle MIPS


4.3 Pipeline MIPS


As the results show, the pipelined CPU takes about the same number of cycles as the single-cycle CPU, but only about 1/5 of the cycles of the multi-cycle CPU. Its total execution time is much lower than that of both the single-cycle and the multi-cycle CPU. It is much faster than the single-cycle CPU because the single-cycle CPU's clock period must be long enough for signals, including the many glitches at the beginning of each cycle, to propagate through the entire datapath and settle, whereas each pipeline stage covers only part of that path. It is much faster than the multi-cycle CPU simply because it needs fewer cycles, and because all modules in the CPU are now in full-time use. In theory, a CPU with a k-stage pipeline takes 1/k of the time of a multi-cycle CPU, assuming that all stages take equal time.
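The 1/k figure follows from counting cycles under simplifying assumptions: n instructions, k stages of equal duration τ, and a multi-cycle CPU that spends k cycles on every instruction (real multi-cycle CPUs use fewer cycles for some instructions, so this is an idealization). The pipeline needs k − 1 extra cycles to fill before one instruction completes per cycle.

```latex
T_{\text{multi}} = n k \tau, \qquad
T_{\text{pipe}} = (k + n - 1)\,\tau, \qquad
\frac{T_{\text{pipe}}}{T_{\text{multi}}} = \frac{k + n - 1}{n k}
\;\xrightarrow{\;n \to \infty\;}\; \frac{1}{k}.
```

For example, with k = 5 and n = 1000 the ratio is 1004/5000 = 0.2008, already very close to 1/5.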

4.4 Pipeline MIPS with Cache


As the results show, the cache significantly reduces the time cost when accesses hit the cache. The cycles from 3.5 ms to 4.3 ms show the whole process of a cache miss together with a write-back, demonstrating the penalty incurred on a cache miss.

5. Conclusions

In this practice, I successfully built a MIPS CPU with pipeline and cache functionalities. All three types of hazards are resolved and the memory is correctly synchronized. Two separate direct-mapped caches are designed for instruction and data access, so the storage hierarchy is fully demonstrated in this practice.

Through this practice, I have become more familiar with computer architecture and organization. I gained a deep understanding of the hazards involved in pipelining and how to resolve them, and I also learned different cache policies along with their advantages and disadvantages. As a by-product, I learned how to use VHDL to build digital circuits and how to debug them by writing testbenches and watching waveforms. Above all, it taught me the patience and care needed to do things in an orderly way. This practice has been a truly valuable lesson in my college study.

6. Developing Environments


6.1 Hardware

Processor: Intel Core Duo T2300 (1.66 GHz)

Memory: 1536MB

6.2 Software

OS: Windows XP + SP2

Design Software: Quartus II 7.2

Debug and Simulation: ModelSim SE 6.2b


7. Appendices

7.1 ISA

Table 7-7 Instruction Set

7.2 Testing Codes


WIDTH=32;
DEPTH=32;
ADDRESS_RADIX=HEX;
DATA_RADIX=HEX;
% Test program: sums the first four words of data memory into $3, %
% stores the sum at byte address 0x10, then loads it back.        %
CONTENT BEGIN
    00 : 00000820;  % add  $1, $0, $0       %
    01 : 20020004;  % addi $2, $0, 4        %
    02 : 00001820;  % add  $3, $0, $0       %
    03 : 8C240000;  % loop: lw $4, 0($1)    %
    04 : 20210004;  % addi $1, $1, 4        %
    05 : 00641820;  % add  $3, $3, $4       %
    06 : 2042FFFF;  % addi $2, $2, -1       %
    07 : 10400003;  % beq  $2, $0, finish   %
    08 : 00000000;  % nop (delay slot)      %
    09 : 08000003;  % j    loop             %
    0A : 00000000;  % nop (delay slot)      %
    0B : AC230000;  % finish: sw $3, 0($1)  %
    0C : 8C220000;  % lw   $2, 0($1)        %
    0D : 0800000D;  % here: j here          %
    [0E..1F] : 00000000;
END;

8. References

[1] Zhu Ziyu, Li Yamin. CPU Chip Logic Design. Tsinghua University Press, 2005.

[2] John L. Hennessy, David A. Patterson. Computer Architecture: A Quantitative Approach, Third Edition. Morgan Kaufmann Publishers, 2003.

[3] Purdue University ECE 437L MIPS Design Labs. http://cobweb.ecn.purdue.edu/~ece437l/materials
