manoj-kumar-naga
8/8/2019 Thesis Documentation Lit Survey
http://slidepdf.com/reader/full/thesis-documentation-lit-survey 1/12
Thesis Documentation -1
Literature Survey
Title: Low-Power Multi-Port Register File for D.S.P.s
Introduction:
In recent years, the demand for portable operation of all types of electronic systems has
become clear. A major factor in the weight and size of portable devices is the size of the
batteries, which depends directly on the power dissipated by the electronic circuits. In addition,
the cost of supplying power and the associated cooling has created significant interest in power
reduction even in non-portable applications that have access to a power source.
In particular, digital signal processors (D.S.P.s) are widely used in mobile applications such as
mobile phones, and power consumption is critical in these computation-intensive
processors. Almost all D.S.P. algorithms demand the following:
1. Fast computational blocks such as MAC (multiply-and-accumulate) units.
2. Simultaneous access to multiple sets of data.
3. A large number of registers for implementing delays and accumulation.
Hence, a large multi-ported register file is indispensable for D.S.P.s. These register files are
so large that they often occupy nearly 40% of the total area and 40-50% of the data-path power. So,
a large multi-ported register file that is also energy efficient is a necessity.
In surveying the literature related to this area, the first work to mention is the article in
ref [1], which guides our thinking toward the low-power design of any electronic system. It
investigates all aspects of system design with the goal of reducing power
consumption. It assumes that the application to be implemented is known,
which is partially true in our case, since we consider register files for D.S.P.s. CMOS
circuits dissipate power whenever capacitances are switched. Power can be reduced by
minimizing the switched capacitance through operation reduction, choice of number representation,
exploitation of signal correlation, resynchronization to minimize glitching, and logic, circuit
and physical design. We take the gist of their work, confining ourselves to signal
processing.
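The capacitance-oriented techniques above rest on the standard first-order expression for the dynamic (switching) power of a CMOS node, reproduced here for reference:

```latex
P_{\mathrm{dyn}} = \alpha_{0\rightarrow 1}\, C_L\, V_{DD}^{2}\, f_{\mathrm{clk}}
```

Here α_{0→1} is the probability that the node makes a 0→1 transition in a clock cycle, C_L is its load capacitance, V_DD is the supply voltage and f_clk is the clock frequency. The product α_{0→1}·C_L is the effective switched capacitance that operation reduction, number representation and the other techniques try to shrink, while the quadratic dependence on V_DD is what makes voltage scaling attractive.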
Maintaining a given level of computation or throughput is a common requirement in signal-processing
applications: there is no advantage in performing the computation faster
than the required rate, since the processor simply has to wait until further processing is
required. This enables an architecture-driven voltage scaling strategy, in which the
operating voltage is reduced to lower the energy consumption, and the resulting loss in logic
speed is compensated for by parallel architectures. The authors examine various aspects of design
and show their effect on power dissipation, such as:
1. Capacitive voltage transitions
a. Logic function choice: e.g., NOR and NAND gates yield different output switching
probabilities for the same input statistics.
b. Logic style choice: dynamic and pseudo-NMOS logic dissipate more power than static
logic.
c. Input signal characteristics.
d. Circuit topology: the manner in which logic gates are interconnected can have a strong
influence on switching activity.
2. Leakage component of power
a. Reverse-biased diode leakage.
b. Sub-threshold conduction.
3. Short-circuit current component of power
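To make item 1a concrete, the sketch below computes the 0→1 output transition probability of 2-input NAND and NOR gates under a simple model in which the two inputs are independent, each is 1 with probability p, and consecutive cycles are uncorrelated (these are assumptions of the sketch, not statements from ref [1]). A rising output needs a 0 in one cycle followed by a 1 in the next, so P(0→1) = P(out=0)·P(out=1).

```python
"""0->1 output transition probability of 2-input gates (first-order model).

Assumes independent inputs, each equal to 1 with probability p, and no
correlation between successive clock cycles.
"""


def p_one_nand(p: float) -> float:
    # NAND output is 0 only when both inputs are 1
    return 1.0 - p * p


def p_one_nor(p: float) -> float:
    # NOR output is 1 only when both inputs are 0
    return (1.0 - p) ** 2


def p_rise(p_one: float) -> float:
    # Output must be 0 in cycle k and 1 in cycle k+1
    return (1.0 - p_one) * p_one


for p in (0.5, 0.2):
    print(f"p={p}: NAND {p_rise(p_one_nand(p)):.4f}, NOR {p_rise(p_one_nor(p)):.4f}")
```

With equiprobable inputs (p = 0.5) both gates give 3/16, but with p = 0.2 the NOR output rises six times as often as the NAND output, which is the kind of difference item 1a refers to.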
They also discuss physical-, circuit- and logic-level aspects of design from a power point of
view.
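The architecture-driven voltage scaling idea of ref [1] can be quantified with a short first-order model. The sketch below assumes a long-channel delay model t_d ∝ V/(V - V_t)², a 5 V reference supply, V_t = 0.8 V and a parallel datapath whose switched capacitance is 2.15 times that of the reference; all of these numbers are illustrative assumptions. Running N copies at f/N leaves N times more time per operation, so the supply can be lowered until gates are N times slower, and the power ratio becomes (C_par/C_ref)·(V_par/V_ref)²/N.

```python
"""Architecture-driven voltage scaling: first-order power estimate.

Assumptions (illustrative): gate delay follows t_d ~ V / (V - Vt)**2, and the
N-way parallel datapath has a switched capacitance of cap_overhead * C_ref.
"""


def delay(v: float, vt: float) -> float:
    # Relative gate delay; only ratios of this quantity are used
    return v / (v - vt) ** 2


def scaled_voltage(v_ref: float, vt: float, n: int) -> float:
    """Lowest supply at which gates are n times slower than at v_ref (bisection)."""
    target = n * delay(v_ref, vt)
    lo, hi = vt + 1e-6, v_ref  # delay falls monotonically as v rises above vt
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if delay(mid, vt) > target:
            lo = mid  # still too slow: the supply must be higher
        else:
            hi = mid
    return hi


def power_ratio(v_ref: float, vt: float, n: int, cap_overhead: float) -> float:
    # P_par / P_ref = (C_par / C_ref) * (V_par / V_ref)**2 * (f_par / f_ref)
    v_par = scaled_voltage(v_ref, vt, n)
    return cap_overhead * (v_par / v_ref) ** 2 / n


v_par = scaled_voltage(5.0, 0.8, 2)
print(f"V_par = {v_par:.2f} V, P_par/P_ref = {power_ratio(5.0, 0.8, 2, 2.15):.2f}")
```

Under these assumptions the supply drops to roughly 3.2 V and the power falls to a little over 40% of the reference; the exact figures depend entirely on the delay model and capacitance overhead chosen.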
The next work to mention is ref [2], which takes us closer to our core task of constructing
register files. This work, together with the next two references, provides examples
illustrating the energy-efficient design of register files.
Consider the basic block diagram of a typical register file, which is similar to an SRAM except
that the registers are more delay-optimized and occupy much more area per bit than SRAM cells.
Figure: Block diagram of a typical register file
Source: ref[2]
The write decoder and read decoder select the word line to be written and read, respectively.
The storage cell contains two inverters connected back to back, just as in an SRAM
cell. All the bit lines are precharged in the first half of the clock cycle, and in the second half they
either remain at 1 or discharge, depending on the value stored in the cell. Two kinds of cell structure
arise from the sensing method: 1. single-ended and 2. differential. The names are
self-explanatory. The figure above uses differential sensing, so a sense amplifier (S.A.) is used to
sense the bit line quickly from the voltage difference between bit and bit_bar. Both kinds are
treated in detail later.
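The precharge/evaluate read and the role of the sense amplifier can be illustrated with a purely behavioral sketch. It assumes, for illustration only, that a bit line discharges when the cell stores 0, that the single-ended output stage switches at VDD/2, and that the sense amplifier can resolve any difference above a hypothetical 50 mV offset. The point it shows is that differential sensing can decide after a small swing, whereas single-ended sensing must wait for a large one.

```python
"""Behavioral sketch of a precharged bit-line read (not a circuit model).

Assumption: a read discharges the bit line when the cell stores 0.
"""

VDD = 1.0


def single_ended_read(stored: int, swing: float, trip: float = 0.5 * VDD) -> int:
    bit = VDD                      # phase 1: precharge
    if stored == 0:
        bit -= swing               # phase 2: evaluate (partial discharge so far)
    return 1 if bit > trip else 0  # inverter-like threshold at the output


def differential_read(stored: int, swing: float, offset: float = 0.05) -> int:
    bit, bit_bar = VDD, VDD        # precharge both lines
    if stored == 0:
        bit -= swing
    else:
        bit_bar -= swing           # exactly one of the pair discharges per read
    if abs(bit - bit_bar) < offset:
        raise ValueError("difference below the sense-amplifier offset")
    return 1 if bit > bit_bar else 0
```

For example, with only a 0.1 V swing, `differential_read(0, 0.1)` already returns 0, while `single_ended_read(0, 0.1)` still returns 1 (a misread) and needs a swing such as 0.8 V to read correctly.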
The figure above shows a register file with a single read port and a single write port. A multi-port
register file is similar, except that it has additional sets of decoders and output stages (one
per port) and some added control circuitry, as shown below.
Source: ref[2]
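A behavioral model makes the multi-port organization concrete: each port has its own decoder that asserts exactly one word line, and each read port acts as a one-hot multiplexer onto its output. The class and method names are hypothetical, concurrent writes to the same entry are simply rejected, and reads return the values held before the cycle's writes; real designs choose this ordering explicitly (ref [4], discussed below, performs writes before reads).

```python
"""Behavioral model of an N-entry register file with R read and W write ports.

Hypothetical names; one decoder per port, write conflicts rejected.
"""
from typing import List, Sequence, Tuple


class MultiPortRegFile:
    def __init__(self, entries: int, read_ports: int, write_ports: int) -> None:
        self.cells = [0] * entries
        self.read_ports = read_ports
        self.write_ports = write_ports

    def decode(self, addr: int) -> List[int]:
        # Per-port decoder: exactly one word line is asserted
        return [1 if i == addr else 0 for i in range(len(self.cells))]

    def cycle(self, reads: Sequence[int], writes: Sequence[Tuple[int, int]]) -> List[int]:
        if len(reads) > self.read_ports or len(writes) > self.write_ports:
            raise ValueError("more accesses than ports")
        addrs = [a for a, _ in writes]
        if len(set(addrs)) != len(addrs):
            raise ValueError("two write ports target the same entry")
        # Each read port is a one-hot multiplexer over the cells (old values)
        out = [sum(wl * c for wl, c in zip(self.decode(a), self.cells)) for a in reads]
        for addr, data in writes:
            wl = self.decode(addr)
            self.cells = [data if sel else c for sel, c in zip(wl, self.cells)]
        return out


rf = MultiPortRegFile(8, read_ports=2, write_ports=1)
rf.cycle(reads=[], writes=[(3, 42)])
print(rf.cycle(reads=[3, 0], writes=[]))
```

Adding a port in this model means adding one more decoder call and one more output, which mirrors the extra decoder and output stage per port in hardware.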
Looking at our register file, we can divide its total power consumption among the
decoders, the control circuitry and the bank. The output stage, including the buffers, counts as bank
power. The authors report the simulated power of these three parts, shown below.
Figure: Power dissipated by the different components of a register file
Source: ref[2]
The graph is self-explanatory: in all cases, the bank contributes much more to the power
consumption than the other parts. We therefore take a detailed look at
the structure of the register cell. As seen earlier, there are two basic kinds of structure:
1. Differential-ended structure:
The following figure shows a traditional 1-bit, N-entry, 2R1W (two-read, one-write) differential-ended
register file circuit. WWL is the write word-line control signal. The gray region is the storage cell. In a write
operation, the write decoder decodes the address of the Nth entry and activates the Nth WWL line. M1
and M2 then turn ON, allowing the input data to be written into the storage cell. The read decoder
and the RWL signal perform the same function during a read operation.
Source: ref[2]
2. Single-ended structure:
The following figure shows a typical 1-bit, N-entry single-ended register file.
Source: ref[2]
Its operation differs somewhat from that of the differential-ended
structure: an inverter is present at the output to drive the bit line.
Comparison of both circuits:
• For each read cycle in the differential-ended circuit:
– bit or bit_bar is active => more power
– more transistors => more leakage power
– but the access time is low
– suitable for registers with a large number of entries
• For the single-ended circuit:
– the access time is high
– not suitable for registers with a large number of entries (due to the large bit-line capacitance)
Source: ref[2]
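The first bullet can be quantified with a simple count. The sketch assumes full-swing bit lines of equal capacitance, so each discharged line costs C·VDD² to recharge, and assumes a single-ended line discharges only when a 0 is read; both are simplifications (a small-swing differential design would draw less per line).

```python
"""Expected bit-line discharges per read: differential vs. single-ended.

Simplified model: equal bit-line capacitance, full-swing discharge, and a
single-ended line discharges only when the bit read is 0.
"""


def discharges_per_read(width: int, p_zero: float, differential: bool) -> float:
    if differential:
        return float(width)   # bit or bit_bar discharges for every bit
    return width * p_zero     # only bits that read 0 discharge


def read_energy(width: int, p_zero: float, differential: bool,
                c_bl: float, vdd: float) -> float:
    # Recharging a fully discharged line draws C * Vdd**2 from the supply
    return discharges_per_read(width, p_zero, differential) * c_bl * vdd ** 2


print(discharges_per_read(32, 0.5, True), discharges_per_read(32, 0.5, False))
```

For a 32-bit read with equally likely 0s and 1s, the differential cells discharge 32 lines per read against an expected 16 for single-ended cells.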
They build a register file with many entries by combining smaller register files,
which can be designed using single-ended cells, and connect their bit lines through AND gates.
They built a 128-entry register file from four 32-entry register files, each with a separate precharge
circuit, and 32 four-input AND gates are required to combine them. In this way
they reduce the bit-line capacitance, since each bank of the smaller register file has its own
precharge circuit and its bit lines are attached to fewer cells.
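The banked organization can be checked with a behavioral sketch of the 128-entry file built from four 32-entry banks. It assumes 32-bit words (one 4-input AND gate per bit, matching the 32 AND gates above), that each local bit line is precharged to 1 and pulled to 0 only by a stored 0 in the selected row, and that unselected banks leave their bit lines high, so the per-bit AND returns the selected value.

```python
"""Banked single-ended read: four 32-entry banks combined by AND gates.

Assumed polarity: precharged local bit lines, discharged only by a stored 0
in the selected bank; unselected banks stay at 1.
"""
from typing import List

BANKS, BANK_ENTRIES, WIDTH = 4, 32, 32


def read_word(banks: List[List[int]], addr: int) -> int:
    bank_sel, row = divmod(addr, BANK_ENTRIES)
    word = 0
    for bit in range(WIDTH):
        local_bls = []
        for b in range(BANKS):
            bl = 1                                    # precharged
            if b == bank_sel and not (banks[b][row] >> bit) & 1:
                bl = 0                                # discharged by a stored 0
            local_bls.append(bl)
        word |= int(all(local_bls)) << bit            # 4-input AND for this bit
    return word


def cells_per_bitline(banked: bool) -> int:
    # Bit-line capacitance grows roughly with the number of attached cells
    return BANK_ENTRIES if banked else BANKS * BANK_ENTRIES


banks = [[0] * BANK_ENTRIES for _ in range(BANKS)]
banks[2][5] = 0xDEADBEEF
print(hex(read_word(banks, 2 * BANK_ENTRIES + 5)))
```

The `cells_per_bitline` helper shows where the saving comes from: each local bit line carries 32 cells instead of 128.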
The next work to mention is ref [3]. It is an example of exploiting the kind of
operations performed by a D.S.P. The authors consider a conventional register-file architecture, called
“word-level parallel architecture”, for implementing multi-channel data-processing hardware. It offers
a higher degree of random access than this particular application requires. So, they
propose a new architecture, called “word-level serial architecture”, which has a limited
degree of random access that is sufficient for this application. The following figure helps in
understanding this.
Figures: (a) Word-level parallel and (b) word-level serial architectures
Source: ref[3]
Registers storing state values of the same channel are colored identically, and the different
state registers of a channel are numbered 1, 2, 3, ..., r. If there are n channels, then n*r registers are
required. In the parallel architecture, all the registers are accessible to the ALU. In the serial
architecture, only the registers of one channel are accessible to the ALU.
This reduces the number of registers seen by the bus while writing or reading data,
thereby reducing the capacitance seen by the ports.
Power dissipation is reduced in this architecture because of:
1. Reduced port capacitance.
2. Smaller multiplexers.
3. Fewer clock-gating cells: r instead of n*r.
Here n is the number of rows (channels) and r is the number of columns (states) in the word-level
serial architecture, so n*r is the total number of register entries.
The first two reasons are evident from Figures 2 and 3. The third reason is explained by the
following figure:
Source: ref[3]
Usually, when writing into a register entry, the data is made available to all the entries, and
the enable signals from the write decoder drive the clock-gating cells so that only the
required entry is written. Hence, fewer clock-gating cells (r instead of n*r) are required in the
word-level serial architecture.
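The three reasons can be summarized in a counting sketch for n channels with r state registers each. The model is a simplification chosen for illustration: one read-multiplexer input and one clock-gating cell per register that is individually visible to the ALU, so the parallel architecture needs n*r of each and the serial architecture r. The `gated_write` function shows the broadcast write described above, where only the enabled entry is updated.

```python
"""Resource counts for word-level parallel vs. word-level serial register
files (n channels, r states per channel), plus a clock-gated write.

Simplified count model: one mux input and one clock-gating cell per register
that the ALU can address individually.
"""
from typing import Dict, List


def resources(n: int, r: int, serial: bool) -> Dict[str, int]:
    visible = r if serial else n * r     # registers seen by the ALU ports
    return {"mux_inputs": visible, "clock_gating_cells": visible}


def gated_write(regs: List[int], data: int, enables: List[int]) -> List[int]:
    # Data is broadcast to every register; only the register whose
    # clock-gating cell is enabled by the write decoder captures it
    return [data if en else old for old, en in zip(regs, enables)]


print(resources(8, 4, serial=False))
print(resources(8, 4, serial=True))
```

With 8 channels of 4 states each, both counts drop from 32 to 4 in this model.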
At the end of the work, the same idea is extended to single-channel processing by
partitioning the computation into two parts and computing them with half the number of
computational units. The following figure makes this clearer.
The next work, ref [4], provides more practical guidelines for our design. Although
it does not emphasize low-power design, it gives a basic idea of how to design multi-ported
register files. The authors design a 16-port (10-read, 6-write) register file that provides
synchronous reads and asynchronous writes. They ensure that in every clock cycle all the reads
happen after the write requests have been fulfilled. This is done by generating two out-of-phase local
clocks, one for the read circuitry and one for the write circuitry, from the global clock distributed
throughout the processor. Every storage cell is connected to all 16 ports, which requires a
large cell to accommodate these lines. So, they choose a 7-transistor single-ended cell design,
because a differential design would double the number of bit lines in each cell,
increasing the cell area and hence the length of the lines, which slows
down the register file operation.
Figure: structure of cell
Source: ref[4]
As the figure shows, the 10 read ports and 6 write ports are connected to the back-to-back
connected inverters. An additional buffer drives the large load of the ports. The line ‘Neq’ is the
only other addition; it reduces the delay of the read ports by providing a lower-resistance path when
discharging the bit line.
The floor plan of the register file is shown in the following figure. The sixteen 32-bit
registers are placed so that 16 bits of each entry sit on each side of the decoders. The
sixteen decoders sit in the middle and drive the word lines on both sides; the two biggest blocks are
the memory-cell arrays. Each read port has an output latch to buffer the read value and drive it
onto the bus. Each write port contains an input register to latch the value to be written from the
bus. In addition, there is one address register per port for storing the address to be written
or read. Clock control circuitry generates the local clocks from the global clock.
Figure: Floor plan of the register file in ref[4]
At the positive edge of the system clock, the addresses, access-enable
signals and data are latched into the input registers. Then each write decoder whose enable
signal is high translates its address into word lines: only one word line is pulled up and the
others stay low. In practice, the data is driven at the same time, and the write bit line (WBL)
arrives earlier than the write word line (WWL). So when the selected word line goes to logic 1, the
data is already available and can be stored into the memory cells quickly. The read operation is
similar to the write operation, except that the read bit line (RBL) is precharged until the word line
goes high and is then discharged through the read cell, depending on the stored value. As soon as
the RBL is stable, the output latch amplifies and outputs the data. The timing diagram is shown below.
Figure: Timing diagram of register file
Source: ref [4]
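The cycle described above, including the write-before-read ordering obtained from the two out-of-phase local clocks, can be summarized behaviorally. The sketch models the two clock phases as sequential steps and uses the 16 entries, 6 write ports and 10 read ports given in the text; the function names are hypothetical and no circuit timing is modeled.

```python
"""Single-cycle behavioral sketch of the ordering described for ref [4]:
writes complete in the first local-clock phase and reads happen in the
second, so a same-cycle read returns the newly written data.
"""
from typing import List, Tuple

ENTRIES, READ_PORTS, WRITE_PORTS = 16, 10, 6


def decode(addr: int) -> List[int]:
    # Decoder: only the selected word line is pulled high
    return [int(i == addr) for i in range(ENTRIES)]


def clock_cycle(regs: List[int], writes: List[Tuple[int, int]],
                reads: List[int]) -> List[int]:
    if len(writes) > WRITE_PORTS or len(reads) > READ_PORTS:
        raise ValueError("more accesses than ports")
    # Phase 1 (write clock): data on WBL is already valid when WWL rises
    for addr, data in writes:
        for i, wwl in enumerate(decode(addr)):
            if wwl:
                regs[i] = data
    # Phase 2 (read clock): RBL precharged, evaluated, value latched
    return [regs[decode(a).index(1)] for a in reads]


regs = [0] * ENTRIES
print(clock_cycle(regs, writes=[(7, 0x1234)], reads=[7, 3]))
```

A read of entry 7 in the same cycle as a write to it returns the new value, 0x1234.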
The next work, ref [5], is again a design example, but it places some emphasis on low-power
design. The authors choose single-ended read ports and double-ended (differential) write ports
in order to achieve a quicker write. They design a 9-write, 17-read register file. The read
operation is the same as in the previous example. Here, differential current-mode writing is
employed to meet the performance requirements. The following figure shows the structural
difference in the cell with respect to the previous example.
Figure: Structure of the register file in ref[5]
The work in ref [6] introduces a novel technique, conditional charge sharing, which
detects bit-line flips and avoids unnecessary precharging of the bit lines when there is no
bit-line flip. This reduces the power dissipated during writes that do not require bit-line flips.
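A simplified counting model shows why avoiding unnecessary precharges helps. It assumes, for illustration, that with unconditional precharging every write recharges one line per bit, while with conditional precharging only bits whose value differs from the previous write (a flip) cause a recharge. This captures the idea stated above but not the charge-sharing circuit of ref [6].

```python
"""Counting bit-line charge events for a sequence of register writes.

Simplified model (not the circuit of ref [6]): unconditional precharge costs
one recharge per bit per write; conditional precharge recharges only the
bit lines whose value flips relative to the previous write.
"""
from typing import Iterable, Optional


def charge_events(writes: Iterable[int], width: int, conditional: bool) -> int:
    mask = (1 << width) - 1
    prev: Optional[int] = None
    events = 0
    for data in writes:
        data &= mask
        if not conditional or prev is None:
            events += width                        # every bit recharges a line
        else:
            events += bin(prev ^ data).count("1")  # only flipped bits recharge
        prev = data
    return events


trace = [0x00FF, 0x00FF, 0x00FE, 0xFF00]
print(charge_events(trace, 16, conditional=False), charge_events(trace, 16, conditional=True))
```

For the four 16-bit writes in `trace`, the count falls from 64 to 32 in this model; repeated or similar data benefits most.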
REFERENCES
1. A. P. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in
digital CMOS circuits," Proceedings of the IEEE, vol. 83, no. 4, pp. 498-523, Apr.
1995.
2. T.-S. Jau, W.-B. Yang, and C.-Y. Chang, "Analysis and design of
high performance, low power multiple ports register files," in IEEE Asia Pacific
Conference on Circuits and Systems (APCCAS), Singapore, Dec. 4-7, 2006, pp. 1453-1456.
3. M. Mueller, A. Wortmann, S. Simon, S. Woke, S. Buch, M. Wroblewski, and
J. A. Nossek, "Low power register file architecture for application specific DSPs," in
IEEE International Symposium on Circuits and Systems (ISCAS), Arizona, May 26-29, 2002,
pp. VI-89 to VI-92.
4. Yu Qian, Wang Dong-hui, Zhang Tie-jun, and Hou Chao-huan, "A design of
500MHz 10-read 6-write register file," in 6th International Conference on ASIC (ASICON 2005),
vol. 1, pp. 311-315, Oct. 2005.
5. S. Li, Z. Li, and F. Wang, "Design of a high-speed low-power
multiport register file," in Asia Pacific Conference on Postgraduate Research in
Microelectronics & Electronics (PrimeAsia 2009), Jan. 19-21, 2009, pp. 408-411.
6. K. Patel, W. Lee, and M. Pedram, "Minimizing power dissipation during write operation
to register files," in ACM/IEEE International Symposium on Low Power Electronics and Design
(ISLPED), Aug. 27-29, 2007, pp. 183-188.
7. A. Pavlov and M. Sachdev, "SRAM circuit design and operation," in CMOS SRAM Circuit
Design and Parametric Test in Nano-Scaled Technologies, vol. 40, pp. 13-38, 2008.
8. M. Kondo and H. Nakamura, "A small, fast and low-power register file by bit-partitioning,"
in 11th International Symposium on High-Performance Computer Architecture (HPCA-11),
San Francisco, Feb. 12-16, 2005, pp. 40-49.
9. K. K. Parhi, VLSI Digital Signal Processing Systems, Wiley-Interscience,
1999.
10. J. H.-C. Tseng, "Energy efficient register file design," M.S. thesis,
Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology,
Dec. 1999.
11. D. Suvakovic and C. A. T. Salama, "Guidelines for use of registers and multiplexers in low
power low voltage DSP systems," in Proceedings of the 8th Great Lakes Symposium on VLSI,
Feb. 19-21, 1998, pp. 26-29.