[Figure 1 content] Memory address fields: TAG (4 bits) | LINE (2 bits) | WORD (2 bits)
IMPLEMENTATION OF "DIRECT MAPPED CACHE" IN BEHAVIORAL VERILOG
ECE254 ASSIGNMENT 1 NEERAJ DHOTRE (perm:5483615)
1. Introduction:
A memory hierarchy is essential in today's heavily pipelined, superscalar architectures. Because main memory
access is much slower than the other operations in the pipeline, data and instructions are kept close to the processor in
a small, comparatively fast memory called a cache. The main aspects of cache design are the cache size, the memory
mapping function, the write policy and the replacement algorithm. In a direct mapped cache, each block of memory
maps to exactly one line (row) of the cache. This mapping function is simple to implement, but because blocks that
map to the same line evict each other, a direct mapped cache does not give the best hit rate. Keeping the cache
synchronized with main memory and handling read and write misses make the direct mapped cache a good candidate
for this assignment, whose aim is to learn Verilog modeling and design simulation with ModelSim.
2. Cache Design:
2.1 Assumptions: The cache is designed with the following assumptions.
This cache lies between the Processor and the Main memory.
The cache and processor run on the same fast clock.
Main memory is a single-port synchronous DRAM running on a slower clock (4 times slower).
Processor sends physical address to the cache.
Processor sends/requests one word data (32 bits wide) at a time.
The cache implements a "write through" write policy with "no write allocate", i.e. on a write miss
data is written only to main memory.
2.2 Cache and Main Memory Size:
To keep the cache size, memory size, data width etc. flexible for any instance of the cache module, parameters are used.
Parameters define constants that can be overridden at instantiation. The main parameters are listed below with the values used for the simulations in this assignment.
parameter ADDR_SIZE = 8; meaning an 8-bit address and 2^8 = 256 main memory locations.
parameter DATA_SIZE = 32; meaning the processor is 32-bit and each main memory location is 32 bits wide, making it
a 256 x 4 byte = 1024 B memory.
parameter LINE_BITS = 2; meaning 4 lines in the cache.
parameter LINES = 1 << LINE_BITS;
parameter WORD_BITS = 2; meaning 4 words per line in the cache, making it a 4 x 4 x 4 byte = 64 B cache.
parameter WORDS = 1 << WORD_BITS;
With these sizes, the 8-bit address is broken into a 4-bit tag, a 2-bit line index and a 2-bit word offset, as shown in Figure 1, for direct mapping in the cache.
Figure 1. Address break up for direct mapping.
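The address break-up can be checked with a small stand-alone simulation. The following is a minimal sketch, assuming the parameter values listed above; the module name addr_split_demo and the signals tag_bits, line_idx and word_idx are illustrative and do not appear in the appendices.

```verilog
// Sketch only: splits an address into tag, line and word fields.
// addr_split_demo, tag_bits, line_idx and word_idx are hypothetical names.
module addr_split_demo;
  parameter ADDR_SIZE = 8;
  parameter LINE_BITS = 2;
  parameter WORD_BITS = 2;
  parameter TAG_SIZE  = ADDR_SIZE - LINE_BITS - WORD_BITS; // 4 bits

  reg  [ADDR_SIZE-1:0] addr;
  // Lowest bits select the word inside a line.
  wire [WORD_BITS-1:0] word_idx = addr[WORD_BITS-1:0];
  // Middle bits select the cache line.
  wire [LINE_BITS-1:0] line_idx = addr[WORD_BITS+LINE_BITS-1:WORD_BITS];
  // Remaining upper bits form the tag.
  wire [TAG_SIZE-1:0]  tag_bits = addr[ADDR_SIZE-1:WORD_BITS+LINE_BITS];

  initial begin
    addr = 8'd120; // 8'b0111_10_00
    #1 $display("addr=%0d tag=%0d line=%0d word=%0d", addr, tag_bits, line_idx, word_idx);
    addr = 8'd122; // 8'b0111_10_10
    #1 $display("addr=%0d tag=%0d line=%0d word=%0d", addr, tag_bits, line_idx, word_idx);
    $finish;
  end
endmodule
```

For address 120 this gives tag 7, line 2 and word 0; address 122 falls in the same line with word 2.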
[Figure 2 signal labels] Processor side: clk, reset, rd_en, addr (8 bits), data, busy, data_valid. Memory side: chip_select, mem_addr (8 bits), mem_wr, mem_data, rd_done, wr_done.
2.3 Block Diagram:
The block diagram is shown in Figure 2. Behavioral models are written for the cache and main memory blocks.
The signals from the processor are supplied as stimulus in the test bench. Table 1 lists the ports of the cache.
Figure 2. Block Diagram showing signal connections.
Port           | Direction     | Description
clk            | input         | Common clock between cache and processor
reset          | input         | Synchronous reset to the cache
rd_en          | input         | HIGH for read from cache, LOW for write to cache
data[31:0]     | bidirectional | Data from/to processor; direction determined by rd_en
addr[7:0]      | input         | Address from the processor
data_valid     | output        | Active-high signal indicating that the output data to the processor is valid
busy           | output        | Active-high signal indicating that the cache is busy; the processor will not send another request while the cache is busy
mem_addr[7:0]  | output        | Address bus to main memory
mem_wr         | output        | HIGH for write to main memory, LOW for read from main memory
chip_select    | output        | Signal to enable main memory access
mem_data[31:0] | bidirectional | Data from/to main memory
rd_done        | input         | Signal from main memory that the requested read operation is done
wr_done        | input         | Signal from main memory that the requested write operation is done
Table 1. Ports of cache with direction and descriptions.
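The bidirectional data port relies on tri-state drivers: each side drives the bus only when it is its turn and releases it to high impedance otherwise. The following is a minimal sketch of that idea, assuming rd_en alone selects the direction; the module name bidir_demo and the registers cpu_value and cache_value are illustrative and not taken from the appendices.

```verilog
// Sketch only: two agents sharing one bidirectional bus via tri-state drivers.
// bidir_demo, cpu_value and cache_value are hypothetical names.
module bidir_demo;
  parameter DATA_SIZE = 32;

  reg                  rd_en;       // 1: cache drives the bus, 0: processor drives it
  reg  [DATA_SIZE-1:0] cpu_value;   // value the processor wants to write
  reg  [DATA_SIZE-1:0] cache_value; // value the cache returns on a read
  wire [DATA_SIZE-1:0] data;        // shared bus

  // Processor-side driver: active on writes, high impedance on reads.
  assign data = (!rd_en) ? cpu_value   : {DATA_SIZE{1'bz}};
  // Cache-side driver: active on reads, high impedance on writes.
  assign data = ( rd_en) ? cache_value : {DATA_SIZE{1'bz}};

  initial begin
    cpu_value   = 32'd56;
    cache_value = 32'd58;
    rd_en = 1'b0;
    #1 $display("write: data=%0d", data);
    rd_en = 1'b1;
    #1 $display("read: data=%0d", data);
    $finish;
  end
endmodule
```

Because exactly one driver is active at a time, the bus carries 56 during the write and 58 during the read.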
[Figure 2 block labels] Processor, Cache, Main Memory. Each line of the cache memory holds: tag | word 1 | word 2 | word 3 | word 4.
Register                         | Size         | Description
cache_hit_reg                    | 1 bit        | Indicates a tag match, meaning the requested address is present in the cache
line                             | 2 bits       | Stores the line index into the cache, taken from the input address
tag                              | 4 bits       | Stores the location tag from the input address
count                            | 2 bits       | Keeps track of the number of main memory reads on a read miss
data_out                         | 32 bits      | Registered data out before it is driven onto the bidirectional data bus to the processor
mem_data_out                     | 32 bits      | Registered data out before it is driven onto the bidirectional data bus to memory
mem_data_reg_0 to mem_data_reg_3 | 32 bits each | Four registers that store the data words read in from main memory
Table 2. Internal registers used in the behavioral model.
3. Verilog Implementation
3.1 Verilog code
The Verilog code for the cache is given in Appendix A. The design is implemented in 6 always blocks that execute concurrently: 2 combinational blocks and 4 sequential blocks. Together these blocks perform
the following logical tasks and model the cache behavior.
3.1.1 Combinational Blocks:
I. Tag comparison: This block continuously checks whether the tag stored in the cache line selected by the
input address matches the tag field of that address. It sets cache_hit_reg on a tag match, irrespective
of whether the operation is a read or a write.
II. Memory select: This block controls the enabling of main memory. Main memory needs to
be enabled only when data has to be transferred to or from it. This gives better control over the
rd_done and wr_done signals driven by main memory.
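The tag comparison described above can be sketched in isolation. The following is a minimal sketch, assuming a tag-only storage array (the appendix code instead keeps the tag in the upper bits of each cache line); the module name tag_compare_demo and the names tag_store, addr_tag and hit are illustrative. Like the appendix code, it uses case equality (===), which evaluates to 0 rather than x when a line has never been written.

```verilog
// Sketch only: combinational tag comparison for a direct mapped cache.
// tag_compare_demo, tag_store, addr_tag and hit are hypothetical names.
module tag_compare_demo;
  parameter ADDR_SIZE = 8;
  parameter LINE_BITS = 2;
  parameter WORD_BITS = 2;
  parameter TAG_SIZE  = ADDR_SIZE - LINE_BITS - WORD_BITS;
  parameter LINES     = 1 << LINE_BITS;

  reg  [TAG_SIZE-1:0]  tag_store [LINES-1:0]; // one stored tag per cache line
  reg  [ADDR_SIZE-1:0] addr;
  wire [LINE_BITS-1:0] line_index = addr[WORD_BITS+LINE_BITS-1:WORD_BITS];
  wire [TAG_SIZE-1:0]  addr_tag   = addr[ADDR_SIZE-1:WORD_BITS+LINE_BITS];

  // Hit when the tag stored at the selected line equals the address tag.
  // Unwritten entries are x, and x === value is 0, so they count as a miss.
  wire hit = (tag_store[line_index] === addr_tag);

  initial begin
    tag_store[2] = 4'h7;  // pretend the block holding address 120 is cached in line 2
    addr = 8'd120;        // tag 7, line 2: same tag
    #1 $display("addr=%0d hit=%b", addr, hit);
    addr = 8'd56;         // tag 3, line 2: different tag
    #1 $display("addr=%0d hit=%b", addr, hit);
    addr = 8'd0;          // line 0 was never written
    #1 $display("addr=%0d hit=%b", addr, hit);
    $finish;
  end
endmodule
```

Only address 120 reports a hit; the other two accesses miss, one because of a tag mismatch and one because the line was never filled.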
3.1.2 Sequential Blocks:
I. Cache Hit: This block acts only if the tag comparison succeeds, and performs the required data
manipulation.
II. Cache Miss: This block acts only if the tag comparison fails, and performs the required data
manipulation.
III. Data synchronizing from memory: There are two blocks, one triggered on posedge clk and the other on posedge
rd_done. They are required to synchronize the reads from memory on a read miss, because the cache
and memory run on different, asynchronous clocks.
3.2 Test bench
The test bench code is given in Appendix C. The test bench runs 4 test cases to exercise the functionality of
the direct mapped cache. The clk signal has a period of 10 ns and mem_clk a period of 40 ns.
1) Write Miss: Initially there is nothing in the cache or in memory. The processor issues 4 writes to
consecutive memory locations, all of which result in a cache write miss. The data is written only to main
memory. As seen in the waveform in Figure 3, data 56, 57, 58 and 59 were written to memory locations 120, 121, 122 and
123 respectively. The cache_hit_reg signal stayed low throughout, indicating a cache miss, and a proper busy pulse
was given to the processor for every write.
Figure 3. Wave forms showing Cache write miss test case.
2) Read Miss: Now the test bench requests the data written in the previous step. This results
in a read miss, and the cache fetches the data from main memory. Because main memory holds only
one word at each location, the cache has to do 4 reads to get a block of data and replace a line. As
seen in the waveform in Figure 4, the processor requests the data at location 120, resulting in a read
miss. This triggers 4 reads from main memory. The requested data is given to the processor with
data_valid asserted, and cache line 2 is written with the 4 words (56, 57, 58, 59).
Figure 4. Wave forms showing Cache read miss test case
3) Read Hit: The processor again requests data from the same block. This time it is a cache hit, as the data
was brought into the cache in the previous step. The data requested was at location 122 and, as
seen in Figure 5, the value 58 was correctly returned.
4) Write Hit: Now the processor writes a word to the cache at the same address that
it read in the last step. This results in a cache hit and the data is written properly. The value 60 is
written to location 122. As seen in the waveform in Figure 5, 60 is correctly
written to the cache. In accordance with the write-through policy, this data is written to main memory too.
Figure 5. Wave forms showing Cache read and write hit test case
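The four main memory reads of the read miss case follow a simple address pattern: the cache starts at the requested address and advances only the two word bits, which wrap from 3 back to 0. The following is a minimal sketch of that sequence alone, assuming 4 words per line; it mirrors the case statement on mem_addr[1:0] in Appendix A but does not model rd_done timing. The module name fill_sequence_demo is illustrative.

```verilog
// Sketch only: order of main memory addresses during a cache line fill.
// fill_sequence_demo is a hypothetical name; handshake timing is not modeled.
module fill_sequence_demo;
  reg [7:0] mem_addr;
  integer   i;

  initial begin
    mem_addr = 8'd122; // requested word is word 2 of its block
    for (i = 0; i < 4; i = i + 1) begin
      $display("read %0d: mem_addr=%0d", i, mem_addr);
      // Only the word bits change; 2-bit addition wraps 3 -> 0,
      // so the fill stays inside the same 4-word block.
      mem_addr[1:0] = mem_addr[1:0] + 2'b01;
    end
    $finish;
  end
endmodule
```

Starting from 122 the reads go to 122, 123, 120 and 121. In the test case above the request is for 120, so the sequence is simply 120 to 123.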
4. APPENDIX A CACHE Verilog code.
/*######################################################################
---------------------------
SIMPLE DIRECT MAPPED CACHE
---------------------------
Input address is broken like this [----TAG----| -----LINE------|---WORD----]
TAG ---> cache tag
LINE --> index for the line in cache. 2^line = number of lines in cache
WORD --> bits to address word in cache line. 2^word = number of data words in cache line.
######################################################################*/
module cache (
    clk,          //clock. same as cpu clock.
    addr,         //address from cpu.
    rd_en,        //HIGH for read from cache. LOW for write to cache.
    data,         //bidir data from/to CPU
    mem_addr,     //address to main Memory.
    mem_wr,       //HIGH for write to Memory. LOW for read from Memory
    mem_data,     //bidir data to/from main Memory
    rd_done,      //read done signal from main Memory
    wr_done,      //write done signal from main Memory
    data_valid,   //Signal telling CPU data is valid to read.
    busy,         //tells CPU that the cache is busy.
    reset,        //reset to cache.
    chip_select   //select signal to enable memory.
);
parameter ADDR_SIZE = 8;
parameter LINE_BITS = 2;
parameter LINES = 1 << LINE_BITS;
parameter WORD_BITS = 2;
parameter WORDS = 1 << WORD_BITS;
parameter DATA_SIZE = 32;
parameter TAG_SIZE = ADDR_SIZE - LINE_BITS - WORD_BITS;
parameter LINE_WIDTH = DATA_SIZE * 4 + TAG_SIZE;
parameter TAG_INDEX_1 = ADDR_SIZE - TAG_SIZE;
parameter TAG_INDEX_2 = LINE_WIDTH - TAG_SIZE;
input clk;
input reset;
input [ADDR_SIZE-1:0] addr;
input rd_en;
input wr_done;
input rd_done;
inout [DATA_SIZE-1:0] data;
inout [DATA_SIZE-1:0] mem_data;
output chip_select;
output [ADDR_SIZE-1:0] mem_addr;
output mem_wr;
output data_valid;
output busy;
reg chip_select;
reg data_valid;
reg busy;
reg [1:0] count;
reg [ADDR_SIZE-1:0] mem_addr;
reg mem_wr;
reg [DATA_SIZE-1:0] data_out;
reg [DATA_SIZE-1:0] mem_data_out;
reg [DATA_SIZE-1:0] mem_data_reg_0;
reg [DATA_SIZE-1:0] mem_data_reg_1;
reg [DATA_SIZE-1:0] mem_data_reg_2;
reg [DATA_SIZE-1:0] mem_data_reg_3;
reg cache_hit_reg;
reg [LINE_WIDTH-1:0] memory [LINES-1:0];
reg [LINE_WIDTH-1:0] line;
reg [LINE_WIDTH-1:0] tag;
wire [LINE_BITS-1:0] line_index;
assign line_index = addr[(WORD_BITS+LINE_BITS)-1:WORD_BITS];
assign data = (rd_en) ? data_out:{DATA_SIZE{1'bz}};
assign #5 mem_data = (!rd_en && chip_select) ? mem_data_out:{DATA_SIZE{1'bz}};
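//memory select: enable main memory only while a transfer is in progress.
always @ (rd_done or wr_done or count )
begin
    //parentheses make the operator precedence explicit (&& binds tighter than ||).
    if (rd_done || (wr_done && count == 2'b00))
    begin
        chip_select = 1'b0;
        busy = 1'b0;
    end
    else if (!rd_done && !wr_done && (count != 2'b00) )
    begin
        chip_select = 1'b1;
        busy = 1'b1;
    end
end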
//storing data read from main memory.
always @ (posedge rd_done)
begin
    if (count == 2'b11)
    begin
        count <= 2'b00;
    end
    else
        count <= count + 1;
    //store the word just read and advance the word address; the low bits wrap from 3 to 0.
    case (mem_addr[1:0])
        2'b00: begin mem_data_reg_0 <= mem_data; mem_addr[1:0] <= 2'b01; end
        2'b01: begin mem_data_reg_1 <= mem_data; mem_addr[1:0] <= 2'b10; end
        2'b10: begin mem_data_reg_2 <= mem_data; mem_addr[1:0] <= 2'b11; end
        2'b11: begin mem_data_reg_3 <= mem_data; mem_addr[1:0] <= 2'b00; end
        default: mem_data_reg_0 <= mem_data;
    endcase
end
//BLOCK to handle data in cache miss
always @ (posedge clk)
begin
    //if read miss get data from main Memory.
    if ( rd_en && !cache_hit_reg && count == 2'b01 )
    begin
        if(rd_done)
        begin
            data_out <= mem_data;
            data_valid <= 1'b1;
        end
        else
            data_valid <= 1'b0;
    end
    //write the four words read from main memory into the cache line.
    if ( rd_en && !cache_hit_reg && (count == 2'b00))
    begin
        memory[line_index] <= { addr[ADDR_SIZE-1:TAG_INDEX_1], mem_data_reg_0, mem_data_reg_1,
                                mem_data_reg_2, mem_data_reg_3 };
        busy <= 1'b0;
    end
    //if write miss, write data to memory. After data written tell cpu
    //that CACHE not busy
    if ( wr_done && !rd_en && !cache_hit_reg)
    begin
        //busy <= 1'b0;
        mem_wr <= 1'b0;
        chip_select <=1'b0;
    end
end
// BLOCK FOR TAG MATCH. CACHE HIT
//note: reset is level-sensitive in this list, so the block also runs whenever reset changes.
always @ (posedge clk or reset)
begin
    if (reset)
    begin
        line <= {LINE_WIDTH{1'b0}};
        tag <= {LINE_WIDTH{1'b0}};
        busy <= 1'b0;
        count <= 2'b00;
        chip_select <= 1'b0;
    end
    else
    begin
        //check tag every cycle
        tag <= memory [line_index];
        //line layout is {tag, word 0, word 1, word 2, word 3}; the word 0 slice
        //stops below the tag bits so that a write to word 0 cannot overwrite the tag.
        //read request cache hit. return data.
        if( rd_en && cache_hit_reg)
        begin
            line <= memory [line_index];
            case (addr[1:0])
                0:data_out <= line[(DATA_SIZE*4)-1:(DATA_SIZE*3)];
                1:data_out <= line[(DATA_SIZE*3)-1:(DATA_SIZE * 2)];
                2:data_out <= line[(DATA_SIZE * 2)-1:DATA_SIZE];
                3:data_out <= line[DATA_SIZE-1:0];
                default:data_out <= {DATA_SIZE{1'b0}};
            endcase
            data_valid <= 1'b1;
        end
        //write request cache hit write data to cache and memory
        else if ( !rd_en && cache_hit_reg)
        begin
            line <= memory [line_index];
            case (addr[1:0])
                0:line[(DATA_SIZE*4)-1:(DATA_SIZE*3)] <= data;
                1:line[(DATA_SIZE*3)-1:(DATA_SIZE * 2)] <= data;
                2:line[(DATA_SIZE * 2)-1:DATA_SIZE] <= data;
                3:line[DATA_SIZE-1:0] <= data;
                default:line <= {DATA_SIZE{1'b0}};
            endcase
            memory[line_index] <= {addr[ADDR_SIZE-1:TAG_INDEX_1] , line};
            mem_addr <= addr;
            mem_wr <= 1'b1;
            chip_select <= 1'b1;
            mem_data_out <= data;
        end
    end
end
// BLOCK FOR TAG MISMATCH. CACHE MISS
always @ (posedge clk )
begin
    //Read miss, read from main memory. Send data to CPU later.
    if( !rd_done && rd_en && !cache_hit_reg && !busy && count == 2'b00)
    begin
        chip_select <= 1'b1;
        mem_addr <= addr;
        mem_wr <= 1'b0;
        busy <= 1'b1;
    end
    //Write miss, write to main memory.
    if( !wr_done && !rd_en && !cache_hit_reg && !busy )
    begin
        chip_select <= 1'b1;
        mem_addr <= addr;
        mem_wr <= 1'b1;
        mem_data_out <= data;
        busy <= 1'b1;
    end
end
//tag comparison.
always @ ( addr or tag or count or rd_done or wr_done)
begin
    if ( (addr[ADDR_SIZE-1:TAG_INDEX_1] === tag[LINE_WIDTH-1:TAG_INDEX_2])
         && count == 2'b00 && !busy && !rd_done && !wr_done )
    begin
        cache_hit_reg = 1'b1;
    end
    else
    begin
        cache_hit_reg = 1'b0;
    end
end
endmodule
5. APPENDIX B Memory Verilog code.
//single port memory.
//simple behavioral description.
module dram (
    clk,
    addr,
    data,
    wr_en,        //high for write,low for read
    rd_done,
    wr_done,
    chip_select
);
//busy signal?
parameter ADDR_SIZE = 8;
parameter DATA_SIZE = 32;
parameter MEM_SIZE = 1 << ADDR_SIZE;
input clk;
input wr_en;
input [ADDR_SIZE-1:0] addr;
input chip_select;
inout [DATA_SIZE-1:0] data;
output rd_done;
output wr_done;
wire [1:0] word_loc;
wire [ADDR_SIZE-1:2] line_index;
reg [DATA_SIZE-1:0] data_out;
reg [DATA_SIZE-1:0] memory [MEM_SIZE-1:0];
reg rd_done;
reg wr_done;
assign data = (!wr_en && rd_done)? data_out : {DATA_SIZE{1'bz}};
always @ ( posedge clk )
begin
    if( wr_en && chip_select )
    begin
        memory[addr] <= data;
        wr_done <= 1'b1;
    end
    else
        wr_done <= 1'b0;
end
always @ ( posedge clk )
begin
    if ( !wr_en && chip_select)
    begin
        data_out <= memory[addr];
        rd_done <= 1'b1;
    end
    else
        rd_done <= 1'b0;
end
endmodule
6. APPENDIX C Test bench
`include "memory.v"
`include "cache.v"
module cache_tb;
parameter ADDR_SIZE = 8;
parameter DATA_SIZE = 32;
reg clk;
reg mem_clk;
reg reset;
reg [ADDR_SIZE-1:0] addr;
reg rd_en;
reg [DATA_SIZE-1:0] data_in;
wire chip_select;
wire rd_done;
wire wr_done;
wire [DATA_SIZE-1:0] data;
wire [DATA_SIZE-1:0] mem_data;
wire [ADDR_SIZE-1:0] mem_addr;
wire mem_wr;
wire data_valid;
wire busy;
initial
begin
    clk = 0;
    mem_clk = 0;
    reset = 1;
    #20 reset =0;
    //WRITE MISS
    //write 4 times to addresses 120 to 123.
    addr = 120;
    #2 rd_en = 0;
    data_in = 56;
    #60 addr = 121;
    data_in = 57;
    #60 addr = 122;
    data_in = 58;
    #60 addr = 123;
    data_in = 59;
    //READ MISS
    #175 rd_en = 1'b1;
    //#25 rd_en = 1'b1;
    addr = 120;
    //READ HIT
    #280 addr = 122;
    //WRITE HIT
    #50 rd_en = 1'b0;
    data_in = 60;
    #400 rd_en = 1'bz;
    //$readmemh("dram.list",
end
assign data = (!rd_en)? data_in:{DATA_SIZE{1'bz}};
always
    #5 clk = !clk;
always
    #20 mem_clk = !mem_clk;
cache direct_mapped(
    .clk(clk),
    .reset(reset),
    .addr(addr),
    .rd_en(rd_en),
    .data(data),
    .mem_addr(mem_addr),
    .mem_wr(mem_wr),
    .mem_data(mem_data),
    .rd_done(rd_done),
    .wr_done(wr_done),
    .data_valid(data_valid),
    .busy(busy),
    .chip_select(chip_select)
);
dram main_memory(
    .clk(mem_clk),
    .addr(mem_addr),
    .data(mem_data),
    .wr_en(mem_wr),
    .rd_done(rd_done),
    .wr_done(wr_done),
    .chip_select(chip_select)
);
//initial
//#100 $finish;
endmodule