A Power Efficient Architecture for 2-D Discrete Wavelet Transform

Rahul Jain, CoWare India

Preeti Ranjan Panda, IIT-Delhi

10 August 2006 10th IEEE VLSI Design And Test Symposium, 2006

Agenda

• Memory Power Optimization
• Existing Z-Scan based Schemes
• Low Power Z-Scan (Proposed Architecture)
• Results
• Conclusion

Memory Power Optimization

• Importance of Optimizing Memory System Energy
  • Many emerging applications like JPEG2000 are data intensive
  • Memory system can contribute up to 90% of the energy
• Concurrently Optimizing Memory Architecture and Accesses
  • Algorithm Level
    • Reduce memory requirement
    • Improve regularity of accesses
  • Build optimized memory architecture
    • Memory Partitioning
    • Custom Circuits

Z-Scan based Schemes [Chiu-SIPS’03]

• Suspending a DWT line computation
  • Store 4 intermediate values
• Z-Scan
  • Column processing starts early
  • On-chip buffer required = 4*M, where M = image tile height
• Optimal Z-Scan
  • EBCOT code-block size (CW*CH) considered
  • On-chip buffer required = 4*M + 4*2*CW
  • Usually CW = CH = 64 (values used in the experiments)

Low-Power Z-Scan (1)

• Generalize the Z-Scan
  • Compute r elements in a row
  • For Z-Scan, r = 2
  • For Optimal Z-Scan, r = 2*CW
  • On-chip buffer required = 4*M + 4*r
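
The buffer-size formula above can be evaluated for the two special cases it generalizes. This is a minimal sketch; the function name and the example tile height M = 512 are illustrative choices, not values fixed by the slides at this point.

```python
def on_chip_buffer_entries(M: int, r: int) -> int:
    """On-chip buffer size from the slide: 4*M (row part) + 4*r (column part)."""
    return 4 * M + 4 * r


M = 512   # image tile height (example value)
CW = 64   # EBCOT code-block width used in the experiments

z_scan = on_chip_buffer_entries(M, r=2)               # Z-Scan: r = 2
optimal_z_scan = on_chip_buffer_entries(M, r=2 * CW)  # Optimal Z-Scan: r = 2*CW

print("Z-Scan buffer entries:        ", z_scan)
print("Optimal Z-Scan buffer entries:", optimal_z_scan)
```

Intermediate values of r trade a larger column part (4*r) against longer idle periods of the row part, which is the trade-off the following slides quantify.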

Low-Power Z-Scan (2)

• r is chosen as an integer submultiple of 2*CW
  • This takes the code-block size into account
• 2 separate buffers used
  • Row Buffer (RB) = 4*M
  • Column Buffer (CB) = 4*r
• How to decide the value of r?
  • Size of CB ∝ r
  • RB sleep time ∝ r

[Figure labels: CB: r locations; RB in Low Power Mode; RB access]

Memory Power Analysis (1)

• Assume that each element is computed in unit time, so energy and power can be used interchangeably
• For a memory of size 2^n, let
  • Pa(2^n): memory access power
  • Ps(2^n): sleep mode / data retention mode power
  • Pw(2^n): wakeup power for each state transition from sleep mode to active mode
• Let Ps(2^n) = s * Pa(2^n) and Pw(2^n) = w * Pa(2^n)
  • s = 0.1, w = 0.33 (assumed for the experiments)

• Buffer Accesses
  • Read at resumption
  • Write at suspension
• Row Buffer Power
  • 2 accesses per r elements
  • RB is in sleep mode during the computation of r-2 elements
  • Wake up the RB once per row
  • Power per r-element computation:
    Prow_buffer(r, M) = 2 * Pa(M) + (r-2) * Ps(M) + Pw(M)

[Figure labels: RB in Low Power Mode; Row Computation Suspends; Row Computation Resumes; Wakeup]
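
Dividing the row-buffer expression by r and substituting Ps = s * Pa and Pw = w * Pa shows how the per-element cost falls as r grows. The following is a worked rearrangement under the stated values s = 0.1 and w = 0.33; it is a sketch of the algebra, not an additional result from the slides.

```latex
\begin{align*}
\frac{P_{\text{row\_buffer}}(r, M)}{r}
  &= \frac{2\,P_a(M) + (r-2)\,P_s(M) + P_w(M)}{r} \\
  &= P_a(M)\left[\frac{2 + w - 2s}{r} + s\right] \\
  &= P_a(M)\left[\frac{2.13}{r} + 0.1\right]
  \qquad (s = 0.1,\ w = 0.33)
\end{align*}
```

For large r the per-element row-buffer cost approaches the sleep-power floor s * Pa(M), so further increases in r give diminishing returns on the row-buffer side.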

• Column Buffer Power
  • 1 access per element
  • Power consumption per element computation:
    Pcol_buffer(r) = Pa(r)
• Power per 2-D DWT element computation:
    Prow_buffer(r, M) / r + Pcol_buffer(r)

[Figure labels: Col Computation Suspends; Col Computation Resumes]
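
The per-element expression can be swept over candidate values of r to see the trade-off between row-buffer and column-buffer power. This is a minimal sketch: the access-power function `pa` below is a hypothetical placeholder (square-root growth with size), not the empirical memory model used for the results, so the location of the minimum it reports is illustrative only.

```python
import math

S = 0.1   # Ps = s * Pa (value assumed in the slides)
W = 0.33  # Pw = w * Pa (value assumed in the slides)


def pa(size: float) -> float:
    """Hypothetical access-power model (assumption): grows as sqrt(size)."""
    return math.sqrt(size)


def row_buffer_power(r: int, M: int) -> float:
    """Prow_buffer(r, M) = 2*Pa(M) + (r-2)*Ps(M) + Pw(M)."""
    return 2 * pa(M) + (r - 2) * S * pa(M) + W * pa(M)


def power_per_element(r: int, M: int) -> float:
    """Prow_buffer(r, M)/r + Pcol_buffer(r), with Pcol_buffer(r) = Pa(r)."""
    return row_buffer_power(r, M) / r + pa(r)


M = 512
candidates = [2, 4, 8, 16, 32, 64, 128]
best_r = min(candidates, key=lambda r: power_per_element(r, M))

for r in candidates:
    print(f"r={r:3d}  power per element = {power_per_element(r, M):.3f}")
print("best r under this toy model:", best_r)
```

With any access-power model that grows with memory size, the first term shrinks with r while the second grows, which is why an intermediate r can minimize the total.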

[Figure: Variation of Power with r. x-axis: Value of r (2, 4, 8, 16, 32, 64, 128); y-axis: Energy (J), from 0.00E+00 to 6.00E-10]

Power Implications of Banking (1)

• Banked Buffer
  • Increases the average idleness of each buffer
  • Lower access power
  • Predictable state changes, no timing overheads
• Let there be b RB banks and c CB banks
• Average RB power per element:
    Prow = [Power of bank in use * M/b + Sleep power * (M - M/b)] / M
         = [{Prow_buffer(r, M/b) / r} * M/b + Ps(M/b) * (M - M/b)] / M
• Each bank is woken up once per M*r elements
  • Additional row buffer wakeups per element = b/(M*r)

Power Implications of Banking (2)

• Average column-buffer power per element:
    Pcol = [Pcol_buffer(r/c) * r/c + Ps(r/c) * (r - r/c)] / r
• Number of column buffer wakeups per element = c/r
• Additional wakeup power:
    Pwakeups = [Pw(M/b) * b/(M*r)] + [Pw(r/c) * c/r]
• MUX power considered
• Total power per element:
    Prow + Pcol + Pwakeups + Pmux
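
The banked-buffer expressions can be combined into a single function of r, M, b and c. This is a minimal sketch under stated assumptions: `pa` is a hypothetical square-root access-power model rather than the empirical model used for the results, and `p_mux` is left as a caller-supplied parameter (default 0) because the slides do not give its form.

```python
import math

S, W = 0.1, 0.33  # s and w as assumed in the slides


def pa(size: float) -> float:
    """Hypothetical access-power model (assumption): grows as sqrt(size)."""
    return math.sqrt(size)


def ps(size: float) -> float:
    return S * pa(size)  # sleep power


def pw(size: float) -> float:
    return W * pa(size)  # wakeup power


def row_buffer_power(r: float, size: float) -> float:
    """Prow_buffer(r, size) = 2*Pa + (r-2)*Ps + Pw."""
    return 2 * pa(size) + (r - 2) * ps(size) + pw(size)


def banked_power(r: int, M: int, b: int, c: int, p_mux: float = 0.0) -> float:
    """Total power per element: Prow + Pcol + Pwakeups + Pmux."""
    rb_bank = M / b  # size of one row-buffer bank
    cb_bank = r / c  # size of one column-buffer bank
    p_row = ((row_buffer_power(r, rb_bank) / r) * rb_bank
             + ps(rb_bank) * (M - rb_bank)) / M
    p_col = (pa(cb_bank) * cb_bank + ps(cb_bank) * (r - cb_bank)) / r
    p_wake = pw(rb_bank) * b / (M * r) + pw(cb_bank) * c / r
    return p_row + p_col + p_wake + p_mux


# Example sweep over bank counts for M = 512, r = 64 (toy model only).
for b in (1, 2, 4, 8):
    for c in (1, 2, 4):
        print(f"b={b} c={c}  total = {banked_power(64, 512, b, c):.3f}")
```

Setting b = c = 1 reduces the row and column terms to the unbanked expressions, which is a quick sanity check on the formulas.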

[Figure: r vs Power (Banked Case, M=512). Minimum power with r=64, c=4, b=8]

Energy Consumption Comparison

   M    Z-scan       Optimal Z-scan   Low-Power Z-scan    r    c    b   Improvement over
        (10^-11 J)   (10^-11 J)       (10^-11 J)                        Optimal Z-scan (%)
  32    23.4         29.1             8.08               32    4    4   72.2
  64    25.5         29.3             8.13               64    4    4   72.3
 128    29.9         29.7             8.18               64    4    8   72.5
 256    38.5         30.6             8.29               64    4    8   72.9
 512    55.8         32.3             8.49               64    4    8   73.7
1024    90.3         35.8             8.89               64    4    8   75.2

Up to 90% and 75% improvement over Z-Scan and Optimal Z-Scan respectively
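
The headline percentages follow directly from the energy figures listed above. The short check below recomputes the relative improvement of Low-Power Z-scan for each M; the helper name `improvement` is illustrative.

```python
# (M, Z-scan, Optimal Z-scan, Low-Power Z-scan), energies in units of 1e-11 J
rows = [
    (32, 23.4, 29.1, 8.08),
    (64, 25.5, 29.3, 8.13),
    (128, 29.9, 29.7, 8.18),
    (256, 38.5, 30.6, 8.29),
    (512, 55.8, 32.3, 8.49),
    (1024, 90.3, 35.8, 8.89),
]


def improvement(baseline: float, proposed: float) -> float:
    """Relative energy saving of `proposed` over `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


for M, z, opt, low in rows:
    print(f"M={M:4d}  vs Z-scan: {improvement(z, low):5.1f}%  "
          f"vs Optimal Z-scan: {improvement(opt, low):5.1f}%")
```

The improvement over Optimal Z-scan reproduces the last column of the data (72.2 to 75.2), and the largest saving over Z-scan occurs at M = 1024.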

Energy Modelling

• Sequential Access Memory [Moon-CICC’02]
  • Configured as a circular buffer
  • Address sequencing logic and decoders replaced with a row sequencer for low power and high speed
  • Banked implementation used for large memories
• Energy Modelling [Coumeri-TVLSI’00]
  • Empirical equations for modelling the energy of on-chip SRAM
  • Model parameters are size, bit width, and access mode
  • Individual equations for different memory components
  • To model the SAM, the row decoder, column decoder, and buffers are not considered

Conclusion

• A methodology for arriving at a low-power DWT architecture is proposed
• Memory architecture and access pattern are co-optimized
• Up to 90% energy saving achieved
• The derived architecture depends on the target memory technology
  • This would lead to different architectures for ASIC and FPGA implementations

References:

• [Chiu-SIPS’03]: Mu-Yu Chiu et al. (2003). Optimal data transfer and buffering schemes for JPEG2000 encode. IEEE Workshop on SIPS, Aug. 2003, pp. 177-182.
• [Moon-CICC’02]: Joong-Seok Moon et al. (2002). Low-power sequential access memory design. Custom Integrated Circuits Conference, 2002, pp. 111-114.
• [Coumeri-TVLSI’00]: Coumeri, S. L. et al. (2000). Memory modelling for System Synthesis. IEEE Trans. VLSI Systems, June 2000, pp. 327-334.

Thank You

Questions!

Backup Slides

Discrete Wavelet Transform

• 2D wavelet transform:
  • 1st: 1D wavelet transform to all rows
  • 2nd: 1D wavelet transform to all columns
• Each row/column can be computed independently
• Store 4 values at line computation suspension

[Figure: data-flow diagram with rows labelled Z(2i+1), Z(2i) and Y(2i+1), over sample indices 0, 2, 4, 6, 8 and 1, 3, 5, 7. Colored arrows show multiplication by constants a, b, c, d defined in the JPEG2000 standard]
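
One common way to realize such a 1-D transform is the four-step lifting scheme of the 9/7 filter. The sketch below assumes that the constants a, b, c, d in the figure correspond to the four 9/7 lifting coefficients as they are commonly quoted; that mapping, the symmetric boundary extension, and the omission of the final scaling step are assumptions of this example and are not confirmed by the slides.

```python
# Commonly quoted 9/7 lifting coefficients (assumed to be the slide's a, b, c, d).
A = -1.586134342
B = -0.052980118
C = 0.882911075
D = 0.443506852


def lift_97_forward(x):
    """One level of 1-D 9/7 lifting without final scaling; len(x) must be even."""
    n = len(x)
    if n < 2 or n % 2 != 0:
        raise ValueError("input length must be even and at least 2")
    s = list(x[0::2])  # even samples, become the low-pass output
    d = list(x[1::2])  # odd samples, become the high-pass output
    h = n // 2
    # Each step uses two neighbours; indices are clamped, which matches
    # whole-sample symmetric extension for an even-length signal.
    for i in range(h):  # step 1: update odd samples with a
        d[i] += A * (s[i] + s[min(i + 1, h - 1)])
    for i in range(h):  # step 2: update even samples with b
        s[i] += B * (d[max(i - 1, 0)] + d[i])
    for i in range(h):  # step 3: update odd samples with c
        d[i] += C * (s[i] + s[min(i + 1, h - 1)])
    for i in range(h):  # step 4: update even samples with d
        s[i] += D * (d[max(i - 1, 0)] + d[i])
    return s, d


low, high = lift_97_forward([5.0] * 8)
print("low-pass: ", low)
print("high-pass:", high)  # close to zero for a constant input
```

Each of the four steps reaches back to one value produced for the previous sample pair, which is one way to read the four intermediate values that are stored when a line computation is suspended.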

Buffer Structure

• The buffers are always full
• They are accessed like a circular FIFO
• General memory row decoder not required
  • Use a counter
  • Use a shift register loaded with a 1 initially
• Every write signal
  • Increments the counter
  • Shifts the register
• Store all 4 intermediate values in one column
  • No need for the column decoder
• This would be similar to Sequential Access Memory (SAM) [Moon-CICC’02]
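
The decoder-free addressing described above can be modelled in software with a one-hot shift register that selects the active row. This is a behavioural sketch only: the class and method names are hypothetical, and the read and the write of one location are merged into a single `exchange` call for brevity, whereas the slides place the read at resumption and the write at suspension.

```python
class SequentialBuffer:
    """Always-full circular FIFO whose row is selected by a one-hot register."""

    def __init__(self, initial_rows):
        self.rows = list(initial_rows)
        # Shift register loaded with a single 1: it selects row 0 initially.
        self.select = [1] + [0] * (len(self.rows) - 1)

    def exchange(self, new_row):
        """Return the selected row, overwrite it, then shift the register."""
        idx = self.select.index(1)
        old_row = self.rows[idx]
        self.rows[idx] = new_row
        # A write shifts the register by one position, wrapping around.
        self.select = [self.select[-1]] + self.select[:-1]
        return old_row


# Each row holds the 4 intermediate values of one suspended line computation.
buf = SequentialBuffer([(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2)])
print(buf.exchange((9, 9, 9, 9)))  # oldest row comes out first
print(buf.exchange((8, 8, 8, 8)))
```

Because exactly one select bit is set at any time and it only ever moves to the next row, no address decoding is needed, which is the property the slide attributes to a SAM-style organization.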
