Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Modular Multi-ported SRAM-based Memories
Ameer M.S. Abdelhadi
Guy G.F. Lemieux
Multi-ported Memories:A Keystone for Parallel Computation!
1
• Enhance ILP for processors and accelerators, e.g.– VLIW Processors
– CMPs
– Vector Processors
– CGRAs
• DSPs
✘Major FPGA vendors provide dual-ported RAM only!
✘ASIC RAM compilers provide limited ports!
Multi-ported SRAM Cell
Word
Bit Bit
Word1..n
Bit1..n
Bit1..n
2
✘ASICs / custom design only!
✘Increasing ports incurs higher delays and area consumption
RAM Multi-pumping
✘Performance degradation
✘Data dependencies
1 Write/1 Read Dual-port RAM
RData1
RDatanR-1
RData0
RAddr1
RAddrnR-1
RAddr0
RAddr
RData
WAddr
WData
Re
ad-p
ort
Wri
te-p
ort
Mod nRCounter ÷nRMod nW
Counter÷nW
WData1
WDatanW-1
WData0
end
d
d
de
cod
er
WAddr1
WAddrnW-1
WAddr0
en
en
ü Resources sharing
ü Low area
3
Multi-banking
• Divide memory into smaller banks• Distribute data using fixed hashing scheme• Access to same bank is resolved by multiple request• The Pentium (P5) has 8-way two port interleaved cache*
• Area efficient• Long arbitration delays• Variable access latency
RAM Bank m
Hashing; Arbitration; Conflict resolver
RAM Bank 2RAM Bank 1
Port 1 2 n
*[Alpert & Avnon, IEEE Micro, June 1993] 4
Multi-read by Bank Replication
• Example: Alpha 21264*
– Each integer cluster has a replicated 80-entry register file– The 72-entry floating-point cluster register file is
duplicated– number of read ports is doubled– Support two concurrent units each
1W/1R Dual-port RAM Array
Read Addr-Port Data
Addr WriteData -Port
ReadPort 1
WritePort
Read Addr-Port Data
Addr WriteData -Port
ReadPort 2
Read Addr-Port Data
Addr WriteData -Port
ReadPort n
Read- AddrPort 1 DataRead- AddrPort 2 Data
Read- AddrPort n Data
Write-Port
Data
Addr
1W/nR Multi-read RAM
*[Ditlow et al., IEEE ISSCC , Feb. 2011] 5
Register-based Multi-ported RAM
Infeasible on Altera’s high-end Stratix V with our smallest test-case!
Register-based Array
Width (w)
RData1
RAddr1
RDatanR-1
RAddrnR-1
RData0
RAddr0
D Q
En
D Q
En
D Q
En
D Q
En
WData1
WDatanW-1
WData0
Dep
th (d)
WAddr1
WAddrnW-1
WAddr0
Ad
dre
ss D
eco
de
rs
✘ High resources consumption for deep memories (scaling)
ü High performance for small caches (<1k lines)
6
LVT-based Approach
• Stores the ID of latest written bank• LVT is a multi-ported RAM for banks IDs– Implemented with registers– Still has scaling issues: infeasible for deep memories!
WAddr0
WData0
WAddr1
WData1
WAddrnW-1
WDatanW-1
RAddr0
RData0
RAddr1
RData1
RAddrnR-1
RDatanR-1
Bank0ID
Bank1ID
BanknW-1ID
nW wrire/nR read RAM
RAddrWAddr
BankSelnW wrire/nR read LVT
RAM 01 Write/nR Read
WA
dd
r
RA
dd
r
RData0
RData1
RDatanR-1
WData0
WData1
WDatanW-1
RAM 11 Write/nR Read
RAM nW-11 Write/nR Read
Register-based LVT
BankSel
7
LVT-based Multi-ported RAM Example (1)
WAddr0
RAddr0
RData0
RData1
WData0
WData1
Data Banks 1: 1W / 2R
WAddr
WData
RAddr0
RData0
RAddr1
RData1
0:1:2:3: ...
WAddr
WData
RAddr0
RData0
RAddr1
RData1
0:1:2:3: ...
Data Banks 0: 1W / 2R
WAddr0
WData1
RAddr0
RData0
RAddr1
RData1
0:1:2:3: ...
LVT Bank: 2W / 2R
WData0
WAddr1
0
1
WAddr1
RAddr1
@3
0x8C
@1
0x24
8
LVT-based Multi-ported RAM Example (2)
WAddr0
RAddr0
RData0
RData1
WData0
WData1
Data Banks 1: 1W / 2R
WAddr
WData
RAddr0
RData0
RAddr1
RData1
0:1:2:3: 0x8C...
WAddr
WData
RAddr0
RData0
RAddr1
RData1
0:1: 0x242:3: ...
Data Banks 0: 1W / 2R
WAddr0
WData1
RAddr0
RData0
RAddr1
RData1
0:1: 12:3: 0...
LVT Bank: 2W / 2R
WData0
WAddr1
0
1
WAddr1
RAddr1
@3
0x8C
@1
0x24
@3
@1
8
LVT-based Multi-ported RAM Example (3)
WAddr0
RAddr0
RData0
RData1
WData0
WData1
Data Banks 1: 1W / 2R
WAddr
WData
RAddr0
RData0
RAddr1
RData1
0:1:2:3: 0x8C...
WAddr
WData
RAddr0
RData0
RAddr1
RData1
0:1: 0x242:3: ...
Data Banks 0: 1W / 2R
WAddr0
WData1
RAddr0
RData0
RAddr1
RData1
0:1: 12:3: 0...
LVT Bank: 2W / 2R
WData0
WAddr1
0
1
WAddr1
RAddr1@3
0x8C
@1
0x24
1
0
@3
@1
8
XOR-based Multi-ported RAM*
• SRAM-based• XOR is used to embed and extract data back:
Embed: DATA=OLD⊕NEWExtract: DATA⊕OLD=OLD⊕NEW⊕OLD=NEW
BRAM0,0
BRAMnW-2,0
BRAM0,0
BRAM1,0
BRAMnR-1,0
BRAM0,1
BRAMnW-2,1
BRAM0,1
BRAM1,1
BRAMnR-1,1
BRAM0,nW-1
BRAMnW-2,nW-1
BRAM0,nW-1
BRAM1,nW-1
BRAMnR-1,nW-1
WData0
WData1
WDatanW-1
RData0
RData1
RDatanR-1
Replicas of read-portXOR all
other rows
1 Write / 1 Read RAM Arrays
9*[Laforest et al. ACM/SIGDA FPGA, Feb. 2012]
XOR-based Multi-ported RAM Example (1)
WAddr0
RData0
RData1
WData0
WData1
Data Banks 1: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr2
RData2
0: I1,0
1: I1,1
2: I1,2
3: I1,3...
WAddr1 RAddr1
Data Banks 0: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr1
RData1
0: I0,0
1: I0,1
2: I0,2
3: I0,3...
RAddr2
RData2
RAddr1
RData1
RAddr0
D3
@3
D1
@1
10
XOR-based Multi-ported RAM Example (2)
10
XOR-based Multi-ported RAM Example (3)
10
XOR-based Multi-ported RAM Example (4)
10
Motivation
#Registers #BRAMs
Register-based LVT û üXOR-based ü ûProposedI-LVT ü ü
11
Motivation
#Registers #BRAMs
Register-based LVT û üXOR-based ü ûProposedI-LVT ü ü
11
Motivation
#Registers #BRAMs
Register-based LVT û üXOR-based ü ûProposedI-LVT ü ü
11
Method
• Based on LVT approach• The LVT is a multi-ported RAM
with constant inputs (bank IDs)• SRAM-based LVT– Can be implemented with XOR-
based multi-ported RAM– Is generalized by the proposed I-
LVT approach– Two special cases are provided:
• Binary-coded I-LVT• One-hot-coded I-LVT
12
RAM 01 Write/nR Read
WA
ddr
RA
ddr
RAM 11 Write/nR Read
RAM nW-11 Write/nR Read
Register-based LVT
BankSel
Method
• Based on LVT approach• The LVT is a multi-ported RAM
with constant inputs (bank IDs)• SRAM-based LVT– Can be implemented with XOR-
based multi-ported RAM– Is generalized by the proposed I-
LVT approach– Two special cases are provided:
• Binary-coded I-LVT• One-hot-coded I-LVT
12
RAM 01 Write/nR Read
WA
ddr
RA
ddr
RAM 11 Write/nR Read
RAM nW-11 Write/nR Read
Register-based LVT
BankSel
WAddr0
WData0
WAddr1
WData1
WAddrnW-1
WDatanW-1
RAddr0
RData0
RAddr1
RData1
RAddrnR-1
RDatanR-1
Bank0ID
Bank1ID
BanknW-1ID
nW wrire/nR read RAMRAddrWAddr
BankSelnW wrire/nR read LVT
Method
• Based on LVT approach• The LVT is a multi-ported RAM
with constant inputs (bank IDs)• SRAM-based LVT– Can be implemented with XOR-
based multi-ported RAM– Is generalized by the proposed I-
LVT approach– Two special cases are provided:
• Binary-coded I-LVT• One-hot-coded I-LVT
12
RAM 01 Write/nR Read
WA
ddr
RA
ddr
RAM 11 Write/nR Read
RAM nW-11 Write/nR Read
Register-based LVT
BankSel
WAddr0
WData0
WAddr1
WData1
WAddrnW-1
WDatanW-1
RAddr0
RData0
RAddr1
RData1
RAddrnR-1
RDatanR-1
Bank0ID
Bank1ID
BanknW-1ID
nW wrire/nR read RAMRAddrWAddr
BankSelnW wrire/nR read LVT
SRAM
Method
• Based on LVT approach• The LVT is a multi-ported RAM
with constant inputs (bank IDs)• SRAM-based LVT– Can be implemented with XOR-
based multi-ported RAM– Is generalized by the proposed I-
LVT approach– Two special cases are provided:
• Binary-coded I-LVT• One-hot-coded I-LVT
12
RAM 01 Write/nR Read
WA
ddr
RA
ddr
RAM 11 Write/nR Read
RAM nW-11 Write/nR Read
Register-based LVT
BankSel
WAddr0
WData0
WAddr1
WData1
WAddrnW-1
WDatanW-1
RAddr0
RData0
RAddr1
RData1
RAddrnR-1
RDatanR-1
Bank0ID
Bank1ID
BanknW-1ID
nW wrire/nR read RAMRAddrWAddr
BankSelnW wrire/nR read LVT
SRAM
XO
R-b
ase
d
Method
• Based on LVT approach• The LVT is a multi-ported RAM
with constant inputs (bank IDs)• SRAM-based LVT– Can be implemented with XOR-
based multi-ported RAM– Is generalized by the proposed I-
LVT approach– Two special cases are provided:
• Binary-coded I-LVT• One-hot-coded I-LVT
12
RAM 01 Write/nR Read
WA
ddr
RA
ddr
RAM 11 Write/nR Read
RAM nW-11 Write/nR Read
Register-based LVT
BankSel
WAddr0
WAddr1
WAddrnW-1
RAddr0
RData0
RAddr1
RData1
RAddrnR-1
RDatanR-1
RAddrWAddr
BankSelnW wrire/nR read LVT
SRAM
I-LV
T
Invalidation Table Approach (I-LVT)
• A bank for each write• A single write to a
specific bank invalidates all the other banks
• Feedbacks are received from all other banks
• ffb generates a new data that contradicts all the other banks
• fout detects the non-contradicting bank ID
Bank i
ü
Bank 0
Bank n-1
û
û
...
...
13
Invalidation Table Approach (I-LVT)
• A bank for each write• A single write to a
specific bank invalidates all the other banks
• Feedbacks are received from all other banks
• ffb generates a new data that contradicts all the other banks
• fout detects the non-contradicting bank ID
Bank i
ü
Bank 0
Bank n-1
...
...
...
û
û
13
Invalidation Table Approach (I-LVT)
• A bank for each write• A single write to a
specific bank invalidates all the other banks
• Feedbacks are received from all other banks
• ffb generates a new data that contradicts all the other banks
• fout detects the uncontradicted bank ID
Bank i
ü
Bank 0
Bank n-1
...
...
...
û
û
ffb
13
Invalidation Table Approach (I-LVT)
• A bank for each write• A single write to a
specific bank invalidates all the other banks
• Feedbacks are received from all other banks
• ffb generates a new data that contradicts all the other banks
• fout detects the uncontradicted bank ID
Bank i
Bank 0
Bank n-1
...
...
...i
...foutffb ü
û
û
13
Invalidation Table Approach (I-LVT)
• A bank for each write• A single write to a
specific bank invalidates all the other banks
• Feedbacks are received from all other banks
• ffb generates a new data that contradicts all the other banks
• fout detects the uncontradicted bank ID
Write-Port
Write-Port
Waddr0
Waddr1
WaddrnW-1
Raddr0Raddr1
RaddrnR-1
RBankSelnR-1
Feedback Read-Ports (Address,Data) Pairs
f fb,nW-1
f fb,0
Out Read- AddrPort nR-1 Data
Out Read- AddrPort 0 DataOut Read- AddrPort 1 Data
FB Read- AddrPort 0 Data
FB read- Addrport nW-2 Data
1 Write/nW+nR-1 Read RAM Bank 0
Write-Port
Data
Addr
Out Read- AddrPort nR-1 Data
Out Read- AddrPort 0 DataOut Read- AddrPort 1 Data
FB Read- AddrPort 0 Data
FB read- Addrport nW-2 Data
1 Write/nW+nR-1 Read RAM Bank 1
Data
Addr
Out Read- AddrPort nR-1 Data
Out Read- AddrPort 0 DataOut Read- AddrPort 1 Data
FB Read- AddrPort 0 Data
FB read- Addrport nW-2 Data
1 Write/nW+nR-1 Read RAM Bank nW-1
Data
Addr
f fb,1
0 nW-2
0
Port#
Bank#
0 nW-2
0
0 nW-2
nW-1
f out
f out
f out
RBankSel1
RBankSel1
13
Bank ID Embedding: Binary-coded Bank Selectors
Feedback function for bank k:
Output function (all banks):
14
Mutual-exclusive Conditions: One-hot-coded Bank Selectors
Feedback function for bank k:
Output function (check if condition match):
15
Mutual-exclusive Conditions Examples
• Each lines pair has a negated conditions
• One and only one line is logically true
�� = 2: ����,�: ����� 0 ← ����� 0
���,�: ����� 0 ← ����� 0
�� = 3:
���,�: ���k� 1: 0 ← ����� 0 , ����� 0
���,�: ����� 1: 0 ← ����� 1 , ����� 0
���,�: ����� 1: 0 ← ����� 1 , ����� 1
16
One-hot/Binary Coded 2W/2R Example (1)
WAddr0
RBankSel0
RBankSel1
Data Banks 1: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr2
RData2
0: 01: 02: 03: 0...
WAddr1 RAddr1
Data Banks 0: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr1
RData1
0: 01: 02: 03: 0...
RAddr2
RData2
RAddr1
RData1
RAddr01 bit
1 bit
@3
@1
17
Condition:����� = �����
Condition:����� ≠ �����
One-hot/Binary Coded 2W/2R Example (2)
WAddr0
RBankSel0
RBankSel1
Data Banks 1: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr2
RData2
0: 01: 02: 03: 0...
WAddr1 RAddr1
Data Banks 0: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr1
RData1
0: 01: 02: 03: 0...
RAddr2
RData2
RAddr1
RData1
RAddr01 bit
1 bit
@3
@1
@1
@3
0
1
17
Condition:����� = �����
Condition:����� ≠ �����
One-hot/Binary Coded 2W/2R Example (3)
WAddr0
RBankSel0
RBankSel1
Data Banks 1: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr2
RData2
0: 01: 12: 03: 0...
WAddr1 RAddr1
Data Banks 0: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr1
RData1
0: 01: 02: 03: 0...
RAddr2
RData2
RAddr1
RData1
RAddr01 bit
1 bit
@3
@1
@1
@3
0
1
17
Condition:����� = �����
Condition:����� ≠ �����
One-hot/Binary Coded 2W/2R Example (4)
WAddr0
RBankSel0
RBankSel1
Data Banks 1: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr2
RData2
0: 01: 12: 03: 0...
WAddr1 RAddr1
Data Banks 0: 1W / 3RWAddr
WData
RAddr0
RData0
RAddr1
RData1
0: 01: 02: 03: 0...
RAddr2
RData2
RAddr1
RData1
RAddr01 bit
1 bit
@1
@3
@1@1
0
0
@3
1
0
@3
1
0
17
Condition:����� = �����
Condition:����� ≠ �����
3W/2R I-LVT Implementation
Binary-coded I-LVT
Waddr0
Raddr0Raddr1
RBankSel1
Feedback Read-Ports
Write-Port
Data
Addr
1 Write/4 Read RAM Bank 0 - 2 Bits Width
‘00’
Port#
Bank#
0 1 0
1
1 0 1
20
{{{{{{{ { {
FB Read Addr-Port 1 Data‘10’
‘01’
Waddr1
Waddr2
FB Read Addr-Port 0 Data
FB Read Addr-Port 1 Data
FB Read Addr-Port 0 Data
FB Read Addr-Port 1 Data
FB Read Addr-Port 0 Data
RBankSel0
Write-Port
Data
Addr
1 Write/4 Read RAM Bank 0 - 2 Bits Width
Write-Port
Data
Addr
1 Write/4 Read RAM Bank 0 - 2 Bits Width
OutRead Addr-Port 1 Data
Out Read Addr-Port 0 Data
Out Read Addr-Port 1 Data
Out Read Addr-Port 0 Data
Out Read Addr-Port 1 Data
Out Read Addr-Port 0 Data
Waddr0
Raddr0Raddr1
Feedback Read-Ports
Write-Port
Data
Addr
1 Write/4 Read RAM Bank 0 - 2 Bits Width
Port#
Bank#
0 1 0
1
1 0 1
20
{{{{{{{ { {
Waddr1
Waddr2
FB Read Addr-Port 1 Data
FB Read Addr-Port 0 Data
[1]
[1]
[0]
[1]
[0]
[0]
a
b
c
RBankSel1[0]
RBankSel[0] ← a={b[0],c[0]} ?
RBankSel[1] ← b={a[0],c[1]} ?
RBankSel[2] ← c={a[1],b[1]} ?
Write-Port
Data
Addr
1 Write/4 Read RAM Bank 1 - 2 Bits Width
Write-Port
Data
Addr
1 Write/4 Read RAM Bank 2 - 2 Bits Width
[0]
[1]
[0]
[1]
[0]
[1]
RBankSel1[1]
RBankSel1[2]
a
b
c
RBankSel0[0]
[0]
[1]
[0]
[1]
[0]
[1]
RBankSel0[1]
RBankSel0[2]
FB Read Addr-Port 1 Data
FB Read Addr-Port 0 Data
Out Read Addr-Port 0 Data
Out Read Addr-Port 0 Data
FB Read Addr-Port 1 Data
FB Read Addr-Port 0 Data
Out Read Addr-Port 0 Data
Out Read Addr-Port 0 Data
Out Read Addr-Port 0 Data
Out Read Addr-Port 0 Data
One-hot-coded I-LVT
18
SRAM Consumption
• XOR-based consumes fewer SRAM cells if:
� < ��� �� + 1,���� �� ⋅ �������
����(Unlikely!!)
• Otherwise, one-hot consumes fewer SRAM cells than binary-coded if:
1 < �� ≤ 3�� �� <�� − 1 ⋅ ���� �� − 1
�� − 1 − ���� �������
19
Register-based LVT � ⋅ � ⋅ �� ⋅ ��
XOR-based � ⋅ � ⋅ �� ⋅ �� + � ⋅ � ⋅ �� ⋅ �� − 1
Binary-coded I-LVT � ⋅ � ⋅ �� ⋅ �� + � ⋅ log� �� ⋅ �� ⋅ (�� − 1) + � ⋅ log� �� ⋅ �� ⋅ ��
One-hot-coded I-LVT � ⋅ � ⋅ �� ⋅ �� + � ⋅ �� + 1 ⋅ �� ⋅ (�� − 1)
Usage Guideline
20
I-LVTX
OR
-bas
ed
Register-based LVT
Register-based RAMwidth
depth
Experimental Environment
• Different ~1k designs have been synthesized with various parameters sweep– Altera’s Quartus II with Altera’s Stratix V device
• Verified with Altera’s ModelSim– Over Million RAM cycles for each configuration
• Bypassing capability:– New data read-after-write (same as Altera’s M20K)– New data read-during-write (same as a single register)
• Parameterized Verilog and simulation/synthesis run-in-batch manager are available online:
21
https://code.google.com/p/multiported-ram/https://code.google.com/p/multiported-ram/
Experimental ResultsBRAM Consumption
• Compared to XOR-based approach:Average of 19%; up to 44% BRAM reduction
• #BRAM compared to 32bit wide register-based LVT:– Up to 200 % in XOR-based – Up to 12.5% in I-LVT-based
22
Experimental ResultsFmax
• Compared to XOR-based approach:Average of 38%; up to 76% Fmax increase
• One-hot-coded I-LVT exhibits the highest Fmax– Due to fast feedback paths – BRAM consumption still within 6% of the minimal
23
Conclusions
Modular multi-ported SRAM-based memories for embedded systems• Based on dual-ported BRAMs• Dramatically lower resources consumption and
higher performance than previous approachesClose to register-based LVT BRAM consumption;
No further significant improvement can be done
• Additional features e.g. bypassing and initializing• Ready to use open source parameterized Verilog
and a run-in-batch manager are available online
24
https://code.google.com/p/multiported-ram/https://code.google.com/p/multiported-ram/