Lifting Scheme Cores for Wavelet Transform

David Barina (supervised by Pavel Zemcik)


DWT in image processing

The discrete wavelet transform (DWT) is used in many image-processing tasks:

- analysis (edge detection, feature extraction, multiscale representation),
- compression (JPEG 2000, Dirac),
- watermarking, edge sharpening, contrast enhancement, tone mapping, denoising, fusion, etc.


Filter bank

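S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation” (1989)

[Figure: two-channel filter bank. Analysis: filters H̃(z^{-1}) and G̃(z^{-1}), each followed by downsampling ↓ 2, produce the approximation a and the detail d. Synthesis: upsampling ↑ 2, filters H(z) and G(z), and a sum (+) of both branches.]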

decomposition: two complementary filters; high number of operations (every kept coefficient requires a full filter evaluation)
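A minimal Python sketch of one analysis level, only to make the operation count concrete: each branch correlates the input with its filter (convolution with the time-reversed filter, hence the z^{-1}) and keeps every second result. The function name, the periodic border extension, and the unnormalized Haar-like filters in the example are assumptions for illustration.

```python
# Illustrative sketch; the function name and periodic borders are assumptions.
def analysis_filter_bank(x, lo, hi):
    """One level of a two-channel analysis filter bank.

    Each branch correlates x with its filter (i.e. convolves with the
    time-reversed filter, matching H~(z^-1) and G~(z^-1)) and keeps every
    second result (downsampling by 2). Returns (a, d).
    """
    n = len(x)
    if n % 2:
        raise ValueError("input length must be even")
    a, d = [], []
    for k in range(0, n, 2):              # only positions that survive downsampling
        acc_a = sum(lo[t] * x[(k + t) % n] for t in range(len(lo)))
        acc_d = sum(hi[t] * x[(k + t) % n] for t in range(len(hi)))
        a.append(acc_a)                   # each output costs a full filter evaluation
        d.append(acc_d)
    return a, d


# Unnormalized Haar-like filters: pairwise average and half-difference
a, d = analysis_filter_bank([4.0, 2.0, 6.0, 8.0], lo=[0.5, 0.5], hi=[0.5, -0.5])
print(a, d)  # [3.0, 7.0] [1.0, -1.0]
```

Every kept coefficient costs one full filter evaluation per branch; the lifting scheme reduces this cost.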


Lifting scheme

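I. Daubechies, W. Sweldens, “Factoring wavelet transforms into lifting steps” (1998)

[Figure: lifting scheme. The input is split into even and odd samples and transformed by P̃(z^{-1})^T into a and d; P(z) followed by a merge reconstructs the signal.]

$$P(z) = \prod_{i=0}^{I-1} \left\{ \begin{bmatrix} 1 & S_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ T_i(z) & 1 \end{bmatrix} \right\} \begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix}$$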

decomposition: a sequence of simple filtering steps, which reduces the number of operations; split: even and odd samples
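To illustrate split, lifting steps, and merge, here is a Python sketch using the Haar wavelet, which needs only one predict step and one update step. The choice of Haar, the unnormalized output (K = 1, so the approximation is the pairwise mean), and the function names are assumptions. Each step updates one half of the samples from the other half, so the inverse simply runs the steps backwards with opposite signs.

```python
# Illustrative sketch; Haar wavelet, K = 1, hypothetical function names.
def haar_lifting_forward(x):
    """Forward Haar transform written as lifting steps."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    s = list(x[0::2])                 # split: even samples
    d = list(x[1::2])                 # split: odd samples
    for i in range(len(d)):
        d[i] -= s[i]                  # predict: odd sample from its even neighbour
    for i in range(len(s)):
        s[i] += d[i] / 2              # update: even sample becomes the pairwise mean
    return s, d


def haar_lifting_inverse(s, d):
    """Undo the steps in reverse order with opposite signs, then merge."""
    s = list(s)
    d = list(d)
    for i in range(len(s)):
        s[i] -= d[i] / 2
    for i in range(len(d)):
        d[i] += s[i]
    x = [0.0] * (2 * len(s))
    x[0::2] = s                       # merge: interleave even and odd samples
    x[1::2] = d
    return x


a, d = haar_lifting_forward([4.0, 2.0, 6.0, 8.0])
print(a, d)                            # [3.0, 7.0] [-2.0, 2.0]
print(haar_lifting_inverse(a, d))      # [4.0, 2.0, 6.0, 8.0]
```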


CDF 9/7 wavelet

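I. Daubechies, W. Sweldens, “Factoring wavelet transforms into lifting steps” (1998)

[Figure: lifting steps α, β, γ, δ applied alternately between the even and odd samples, from input to output.]

$$\tilde{P}(z) = \begin{bmatrix} 1 & \alpha(1+z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \beta(1+z) & 1 \end{bmatrix} \begin{bmatrix} 1 & \gamma(1+z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \delta(1+z) & 1 \end{bmatrix} \begin{bmatrix} \zeta & 0 \\ 0 & 1/\zeta \end{bmatrix}$$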

four two-tap symmetric filters
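In the time domain, the four steps and the scaling can be sketched in Python as follows. The coefficient values are those commonly quoted for CDF 9/7 (α ≈ −1.586134342, β ≈ −0.052980119, γ ≈ 0.882911076, δ ≈ 0.443506852, ζ ≈ 1.149604398). The neighbour convention (d_i uses s_i and s_{i+1}, s_i uses d_{i−1} and d_i), applying ζ to the approximation, and whole-sample symmetric border extension are assumptions; implementations differ in these details.

```python
# Illustrative sketch; conventions and helper names are assumptions.
# CDF 9/7 lifting coefficients as commonly quoted from Daubechies and Sweldens
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
ZETA = 1.149604398


def _predict(s, d, c):
    """d[i] += c * (s[i] + s[i+1]); at the right border s[L] mirrors to s[L-1]."""
    last = len(s) - 1
    for i in range(len(d)):
        d[i] += c * (s[i] + s[min(i + 1, last)])


def _update(s, d, c):
    """s[i] += c * (d[i-1] + d[i]); at the left border d[-1] mirrors to d[0]."""
    for i in range(len(s)):
        s[i] += c * (d[max(i - 1, 0)] + d[i])


def cdf97_forward(x):
    """One level of the 1-D CDF 9/7 transform; len(x) must be even."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("need an even number of samples")
    s, d = list(x[0::2]), list(x[1::2])   # split: even and odd samples
    _predict(s, d, ALPHA)                 # step 1 (alpha)
    _update(s, d, BETA)                   # step 2 (beta)
    _predict(s, d, GAMMA)                 # step 3 (gamma)
    _update(s, d, DELTA)                  # step 4 (delta)
    return [v * ZETA for v in s], [v / ZETA for v in d]   # scaling


def cdf97_inverse(a, d):
    """Undo the scaling, then the four steps in reverse order with negated coefficients."""
    s = [v / ZETA for v in a]
    d = [v * ZETA for v in d]
    _update(s, d, -DELTA)
    _predict(s, d, -GAMMA)
    _update(s, d, -BETA)
    _predict(s, d, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d               # merge
    return x


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
a, d = cdf97_forward(x)
err = max(abs(p - q) for p, q in zip(x, cdf97_inverse(a, d)))
print(err)  # floating-point round-off only
```

With this normalization, a constant input yields (numerically) zero details and approximations of about √2 times the input.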


2-D decomposition

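S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation” (1989)

[Figure: a horizontal and then a vertical pass split the image into the subbands a, h, v, d; at the next scale the a subband is split again.]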

image: a 2-D signal, decomposed by a series of 1-D transforms into four subbands; repeating this on a gives a multi-scale decomposition
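One level of the separable 2-D decomposition can be sketched in Python as a 1-D transform over every row followed by the same transform over every column, after which the quadrants hold a (top left), h (top right), v (bottom left), and d (bottom right). For brevity, the 1-D step is an unnormalized Haar average/half-difference rather than CDF 9/7; the function names and the list-of-rows image format are assumptions.

```python
# Illustrative sketch; Haar 1-D step and hypothetical function names.
def haar_1d(v):
    """Unnormalized Haar step: pairwise averages, then pairwise half-differences."""
    half = len(v) // 2
    lo = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    hi = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return lo + hi


def dwt2_one_level(img):
    """One separable 2-D level on a list-of-rows image with even width and height."""
    rows = [haar_1d(r) for r in img]                 # horizontal pass: low | high halves
    cols = [haar_1d(list(c)) for c in zip(*rows)]    # vertical pass over every column
    return [list(r) for r in zip(*cols)]             # transpose back to rows


def subbands(coeffs):
    """Split the coefficients into the a, h, v, d quadrants."""
    h2, w2 = len(coeffs) // 2, len(coeffs[0]) // 2
    a = [r[:w2] for r in coeffs[:h2]]
    h = [r[w2:] for r in coeffs[:h2]]
    v = [r[:w2] for r in coeffs[h2:]]
    d = [r[w2:] for r in coeffs[h2:]]
    return a, h, v, d


img = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]   # a 4 x 4 ramp
a, h, v, d = subbands(dwt2_one_level(img))
print(a)  # [[3.5, 5.5], [11.5, 13.5]]
print(d)  # [[0.0, 0.0], [0.0, 0.0]]
```

Calling `dwt2_one_level` again on the a quadrant gives the next scale. Each level reads and writes the whole image once per direction.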


Lenna

How can this be calculated as efficiently as possible?


Strategies and issues

R. Kutil, “A single-loop approach to SIMD parallelization of 2-D wavelet lifting” (2006)

[Figure: horizontal and vertical passes over the a, h, v, d subbands]

strategies: row-column, block-based, and line-based

cache issues: cache lines, limited size, set associativity, prefetching

techniques: padding, aggregation, memory layouts, loop interleaving, parallelization

the approaches have to visit samples repeatedly, and memory access is expensive ⇒ CPU cache, its limitations, existing techniques, the single-loop approach


Unsolved issues

[Figure: a 2 × 2 core swept over the image; the borders need separate prolog and epilog phases around the core loop]

- complicated border treatment (prolog/epilog phases)
- suspend/resume processing
- arbitrary processing order (scan order)
- interleaving the transform with subsequent processing
- multi-scale decomposition
- reorganization of the underlying scheme


Objectives of the thesis

Aims: improve image transform performance and resource consumption

Objectives: eliminate the shortcomings of existing methods (previous slide)

Evaluation: prove experimentally (performance, memory requirements)


Lifting core

D. Barina, P. Zemcik, “Vectorization and parallelization of 2-D wavelet lifting” (in press)

solution: a processing unit

- which continuously consumes input and produces output,
- which visits every image sample only once (cache friendly),
- which is aware of image coordinates (can handle the borders),
- whose configuration (state) can be saved/restored,
- which can be run in any direction,
- which can be SIMD vectorized,
- which can run in parallel (on independent parts of the image).

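$$y = C\,x, \qquad x \stackrel{\text{def}}{=} I_n \,\|\, B, \qquad y \stackrel{\text{def}}{=} O_n \,\|\, B$$

(I_n: input samples at position n, O_n: output samples, B: buffer holding the core state, ‖: concatenation)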
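The following Python sketch illustrates y = C x for the 1-D CDF 9/7 steps. It is a hypothetical reconstruction, not the authors' code: the buffer B holds one partially computed value per lifting step (four values), each call consumes one (even, odd) input pair and emits one finished (a, d) pair delayed by two pairs, and every input sample is read once. The coefficient values and the names `core_step` and `run_core` are assumptions, and border handling (the position-dependent C_n) is left out. Because the whole state lives in the buffer, copying it suffices to suspend and resume processing.

```python
# Hypothetical 1-D lifting core (illustrative sketch, borders not handled).
# CDF 9/7 coefficients as commonly quoted from Daubechies and Sweldens
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.05298011854, 0.8829110762, 0.4435068522
ZETA = 1.149604398


def core_step(s, d, buf):
    """y = C x with x = (s_k, d_k) || buf and y = (a_{k-2}, d_{k-2}) || buf.

    buf[i] holds the partial result of lifting step i that still waits for
    one more neighbour; it is the only state carried between calls.
    """
    d1 = buf[0] + ALPHA * s      # step alpha: finish d1_{k-1}
    buf[0] = d + ALPHA * s       #             start d1_k
    s1 = buf[1] + BETA * d1      # step beta:  finish s1_{k-1}
    buf[1] = s + BETA * d1       #             start s1_k
    d2 = buf[2] + GAMMA * s1     # step gamma: finish d2_{k-2}
    buf[2] = d1 + GAMMA * s1     #             start d2_{k-1}
    s2 = buf[3] + DELTA * d2     # step delta: finish s2_{k-2}
    buf[3] = s1 + DELTA * d2     #             start s2_{k-1}
    return ZETA * s2, d2 / ZETA  # scaling


def run_core(x, buf=None):
    """Stream x through the core one (even, odd) pair at a time."""
    if buf is None:
        buf = [0.0] * 4
    out = [core_step(x[i], x[i + 1], buf) for i in range(0, len(x) - 1, 2)]
    return out, buf


x = [float(i) for i in range(40)]
out, _ = run_core(x)                      # out[k] is the output pair k-2
head, state = run_core(x[:20])            # process the first half
tail, _ = run_core(x[20:], list(state))   # resume later from a saved copy of the buffer
print(head + tail == out)                 # True
```

Without border handling, the first four entries of `out` (nonexistent pairs −2 and −1, border pairs 0 and 1) are not valid, and the last two pairs would require a final border-aware flush.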


Core examples

D. Barina, P. Zemcik, “Vectorization and parallelization of 2-D wavelet lifting” (in press)

[Figure: example cores 1–4 built from the lifting steps α, β, γ, δ, placed at position (m, n)]

core inputs, outputs


Processing orders

D. Barina, P. Zemcik, “Vectorization and parallelization of 2-D wavelet lifting” (in press)

[Figure: processing orders. Top row: horizontal, horizontal strips, horizontal blocks. Bottom row: vertical, vertical strips, vertical blocks.]
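Because the core's state is just its buffers, the visiting order can be chosen freely within one constraint. The Python sketch below enumerates grid positions for the horizontal, horizontal-strip, and horizontal-block orders and derives the vertical ones by transposition. It assumes, for illustration, a 2-D core that receives its buffers from the left and top neighbouring positions, so a valid order must visit both of them first; the strip height, block size, and function names are hypothetical.

```python
# Illustrative sketch; hypothetical names, assumed left/top buffer dependency.
def horizontal(rows, cols):
    """Row-major scan over a rows x cols grid of core positions."""
    for r in range(rows):
        for c in range(cols):
            yield r, c


def horizontal_strips(rows, cols, strip=2):
    """Horizontal strips of `strip` rows; each strip is swept column by column."""
    for r0 in range(0, rows, strip):
        for c in range(cols):
            for r in range(r0, min(r0 + strip, rows)):
                yield r, c


def horizontal_blocks(rows, cols, bh=2, bw=2):
    """bh x bw blocks visited row-major; row-major order inside each block."""
    for r0 in range(0, rows, bh):
        for c0 in range(0, cols, bw):
            for r in range(r0, min(r0 + bh, rows)):
                for c in range(c0, min(c0 + bw, cols)):
                    yield r, c


def vertical(order, rows, cols, **kw):
    """Vertical variant of any order: run it on the transposed grid, swap coordinates."""
    for c, r in order(cols, rows, **kw):
        yield r, c


def respects_dependencies(positions, rows, cols):
    """Assumed constraint: each position once, after its left and top neighbours."""
    seen = set()
    for r, c in positions:
        if (r, c) in seen:
            return False
        if (c > 0 and (r, c - 1) not in seen) or (r > 0 and (r - 1, c) not in seen):
            return False
        seen.add((r, c))
    return len(seen) == rows * cols


print(list(horizontal_strips(4, 3))[:6])
# [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
print(respects_dependencies(vertical(horizontal_blocks, 5, 7, bh=3, bw=2), 5, 7))  # True
```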


Borders treatment

D. Barina, P. Zemcik, “Vectorization and parallelization of 2-D wavelet lifting” (in press)

[Figure: interleaved a/d samples of a signal with positions 0, 2, …, n, …, N − 2, N; the core applied at position n depends on n, including near both borders.]

$$y = C_n\, x$$

the cores treat the boundaries gracefully: the core C_n depends on the position n
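One common way to make a core aware of coordinates is to map out-of-range indices back into the signal by whole-sample symmetric extension (mirroring about the first and last samples without repeating them). The Python helpers below are hypothetical and only show this index mapping; how a position-dependent core C_n folds it into its coefficients is not shown.

```python
# Illustrative sketch; hypothetical helper names, whole-sample symmetric extension assumed.
def mirror(i, n):
    """Map any index i into [0, n) by whole-sample symmetric extension (n >= 2).

    x[-1] -> x[1], x[-2] -> x[2], x[n] -> x[n-2]; the pattern has period 2n - 2.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    period = 2 * n - 2
    i %= period
    return i if i < n else period - i


def extended(x, lo, hi):
    """Samples at positions lo to hi-1, with out-of-range positions mirrored into x."""
    return [x[mirror(i, len(x))] for i in range(lo, hi)]


print(extended([10, 11, 12, 13], -2, 6))  # [12, 11, 10, 11, 12, 13, 12, 11]
```

In the interior, `mirror` is the identity, so C_n coincides with the ordinary core there.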


Parallel cores and reorganization

M. Kula, D. Barina, et al., “Block-based Approach to 2-D Wavelet Transform on GPUs” (2016)

[Figure: lifting schemes reorganized for parallel cores: Sweldens1995 (steps 1–4), Iwahashi2007 (steps 1–3), proposed (steps 1–2)]


3-D core

D. Barina, P. Zemcik, “Real-Time 3-D Wavelet Lifting” (2015)

[Figure: 3-D core in x, y, z coordinates with buffer x, buffer y, and buffer z on its sides]

the core extended into more dimensions, with buffers on its sides


CPU implementation

D. Barina, P. Zemcik, “Vectorization and parallelization of 2-D wavelet lifting” (in press)

[Plot: time per pixel (0–50 ns) vs. image size (1k–100M pixels); series: separable approach, core approach]

an evaluation of approaches: implemented the separable, single-loop, and core approaches


3-D CPU implementation

D. Barina, P. Zemcik, “Real-Time 3-D Wavelet Lifting” (2015)

[Figure: 3-D core with buffers x, y, z. Plot: time per voxel (0–160 ns) vs. volume size (0–250M voxels); series: naive horizontal, naive vertical, core 42, core 23, core 43]

performance of 3-D transform: separable, 2-D core, 3-D core


GPU implementation

M. Kula, D. Barina, et al., “Block-based Approach to 2-D Wavelet Transform on GPUs” (2016)

[Left plot: throughput (GB/s) vs. image size (pixels); series: Kucis2014, Separable Block, Non-Separable Block. Right plot: throughput (GB/s) vs. image size (100 kpel–100 Mpel); series: Sweldens, Iwahashi*, Explosive*, Monolithic*, Polyphase*. Inset: the Monolithic* scheme.]

left: SotA in red, block methods in blue/green; right (reorganization): block methods, separable in black, ours in blue/green


FPGA implementation

D. Barina, et al., “Single-Loop Approach to 2-D Wavelet Lifting with JPEG 2000 Compatibility” (2015)

[Figure: FPGA design with Input and Transform stages, horizontal (H) and vertical (V) cores, and BRAM]

core | FF | LUT | BRAM
latency 4 | 441 (0.1 %) | 399 (0.18 %) | 6 (1.1 %)
latency 2 | 391 (< 0.1 %) | 592 (0.27 %) | 6 (1.1 %)

architecture | device | BRAM [bits] | clocks/pel | time [ms]
Dillen2003 | VirtexE1000-8 | 50K | 0.50 | 1.20
Descampe2004 | Virtex-II XC2V6000 | N/A | 0.60 | 1.75
Seo2007 | Altera Stratix | 128K | 2.64 | 6.02
Zhang2012 | Virtex-II Pro XC2VP30 | 6× 18K | 0.50 | 0.97
the cores | Zynq XC7Z045 | 1× 36K | 0.26 | 0.27


JPEG 2000 implementation

D. Barina, O. Klima, P. Zemcik, “Single-Loop Architecture for JPEG 2000” (2016)

[Figure: 2 × 2 cores (c_n, c_m) producing a_{j+1} and the h, v, d subbands from a_j, together with a code-block. Plot: time [ns] (0–140) vs. resolution (100k–1G pel); series: proposed, OpenJPEG, JasPer, FFmpeg]


Contributions of the thesis

Aims: improved image transform performance and resource consumption

Objectives: eliminated the shortcomings of existing methods

Evaluation: assessed experimentally (performance, memory requirements)

evaluation performed: 2-D on CPU, 3-D on CPU, 2-D on GPU, 2-D on FPGA, JPEG 2000 on CPU


Selected papers

- Barina, D.; Klima, O.; Zemcik, P.: Single-Loop Software Architecture for JPEG 2000. In Data Compression Conference (DCC), 2016
- Barina, D.; Musil, M.; Musil, P.; et al.: Single-Loop Approach to 2-D Wavelet Lifting with JPEG 2000 Compatibility. In Workshop on Applications for MultiCore Architectures (WAMCA), 2015
- Barina, D.; Zemcik, P.: Minimum Memory Vectorisation of Wavelet Lifting. In Advanced Concepts for Intelligent Vision Systems (ACIVS), 2013
- Barina, D.; Zemcik, P.: Wavelet Lifting on Application Specific Vector Processor. In GraphiCon, 2013
- Barina, D.; Zemcik, P.: Diagonal Vectorisation of 2-D Wavelet Lifting. In IEEE International Conference on Image Processing (ICIP), 2014
- Barina, D.; Zemcik, P.: Real-Time 3-D Wavelet Lifting. In International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), 2015
- Barina, D.; Zemcik, P.: Vectorization and parallelization of 2-D wavelet lifting. Journal of Real-Time Image Processing (JRTIP), in press
- Barina, D.; Klima, O.; Zemcik, P.: Single-Loop Architecture for JPEG 2000. In Image and Signal Processing (ICISP), 2016
- Kula, M.; Barina, D.; Zemcik, P.: Block-based Approach to 2-D Wavelet Transform on GPUs. In International Conference on Information Technology – New Generations (ITNG), 2016
- Kucis, M.; Barina, D.; Kula, M.; et al.: 2-D Discrete Wavelet Transform Using GPU. In Workshop on Application for Multi-Core Architectures (WAMCA), 2014


Summary

the core

- is a computing unit which processes the data in a single pass,
- can suspend/resume execution,
- can process the data in many different orders,
- can handle signal boundaries (is aware of coordinates),
- can be easily SIMD vectorized and parallelized,
- and has an underlying scheme that can be reorganized.
