
Lifting Scheme Cores for Wavelet Transform

  • Lifting Scheme Cores for Wavelet Transform

    David Barina (supervised by Pavel Zemcik)

    1 / 24

  • DWT in image processing

    The DWT can be found in many image-processing tasks:

    - analysis (edge detection, feature extraction, multiscale representation),

    - compression (JPEG 2000, Dirac),

    - watermarking, edge sharpening, contrast enhancement, tone mapping, denoising, fusion, etc.

    2 / 24

  • Filter bank

    S. Mallat, A theory for multiresolution signal decomposition: The wavelet representation (1989)

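    [Figure: two-channel filter bank. Analysis: filters $H(z^{-1})$ and $G(z^{-1})$, each followed by downsampling by 2, give a and d. Synthesis: upsampling by 2, filters $H(z)$ and $G(z)$, and a sum.]

    decomposition: two complementary filters, high number of operations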
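
The filter-bank decomposition can be sketched in a few lines. This is a minimal illustration that assumes the Haar filter pair (chosen only because it is the shortest one, not taken from the slides) and a hypothetical helper name `filter_bank_analysis`. It shows why the operation count is high: both filters are evaluated at every sample, and half of the results are then thrown away by the downsampling.

```python
import numpy as np

def filter_bank_analysis(x):
    """One level of a two-channel analysis filter bank (Haar pair, assumed).

    Both filters run over the whole signal; downsampling by 2 then
    discards every other output sample.
    """
    x = np.asarray(x, dtype=float)
    h = np.array([0.5, 0.5])    # low-pass filter (local mean)
    g = np.array([1.0, -1.0])   # high-pass filter (local difference)
    low = np.convolve(x, h)     # full convolution, len(x) + 1 outputs
    high = np.convolve(x, g)
    a = low[1::2]               # keep outputs aligned with pairs (x[2k], x[2k+1])
    d = high[1::2]
    return a, d

a, d = filter_bank_analysis([1.0, 3.0, 5.0, 7.0])
print(a, d)
```

Longer filters make the waste more visible, since each discarded output costs one multiply-add per filter tap.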

    3 / 24

  • Lifting scheme

    I. Daubechies, W. Sweldens, Factoring wavelet transforms into lifting steps (1998)

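    [Figure: lifting scheme. split, $P(z^{-1})^T$, subbands a and d, $P(z)$, merge]

    $$P(z) = \prod_{i=0}^{I-1} \left\{ \begin{bmatrix} 1 & S_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ T_i(z) & 1 \end{bmatrix} \right\} \begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix}$$

    decomposition: sequence of simple filtering steps, reduces the number of operations; split: even, odd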

    4 / 24

  • CDF 9/7 wavelet

    I. Daubechies, W. Sweldens, Factoring wavelet transforms into lifting steps (1998)

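    [Figure: lifting steps between input and output; even samples, odd samples]

    $$P(z) = \begin{bmatrix} 1 & \alpha(1 + z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \beta(1 + z) & 1 \end{bmatrix} \begin{bmatrix} 1 & \gamma(1 + z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \delta(1 + z) & 1 \end{bmatrix} \begin{bmatrix} \zeta & 0 \\ 0 & 1/\zeta \end{bmatrix}$$

    four two-tap symmetric filters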
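
The four two-tap steps followed by scaling can be written out directly. The sketch below is an illustration rather than the implementation from the thesis: the function names are hypothetical, the coefficient values are the ones published in the cited factorization, and the border handling (whole-sample symmetric extension) and the even signal length are assumptions.

```python
import numpy as np

# Lifting coefficients of CDF 9/7 from Daubechies and Sweldens (1998).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
ZETA = 1.149604398

def _right(s):
    """s[k+1] for every k, mirrored at the right border (assumed extension)."""
    return np.append(s[1:], s[-1])

def _left(d):
    """d[k-1] for every k, mirrored at the left border (assumed extension)."""
    return np.insert(d[:-1], 0, d[0])

def cdf97_forward(x):
    """Forward transform of an even-length signal: returns (a, d)."""
    x = np.asarray(x, dtype=float)
    s, d = x[0::2].copy(), x[1::2].copy()       # split: even, odd
    for predict, update in ((ALPHA, BETA), (GAMMA, DELTA)):
        d += predict * (s + _right(s))          # two-tap filter on even samples
        s += update * (_left(d) + d)            # two-tap filter on odd samples
    return s * ZETA, d / ZETA                   # scaling

def cdf97_inverse(a, d):
    """Undo cdf97_forward by running the steps backwards with flipped signs."""
    s = np.asarray(a, dtype=float) / ZETA
    d = np.asarray(d, dtype=float) * ZETA
    for predict, update in ((GAMMA, DELTA), (ALPHA, BETA)):
        s -= update * (_left(d) + d)
        d -= predict * (s + _right(s))
    x = np.empty(s.size + d.size)
    x[0::2], x[1::2] = s, d                     # merge
    return x
```

Each lifting step only adds a filtered copy of one half of the samples to the other half, so the inverse needs no separate filter design: it replays the same steps in reverse order with subtraction.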

    5 / 24

  • 2-D decomposition

    S. Mallat, A theory for multiresolution signal decomposition: The wavelet representation (1989)

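    [Figure: 2-D decomposition. A horizontal and a vertical 1-D transform split the image into subbands a, h, v, d; the a subband is split again at the next scale.]

    image: 2-D signal, by a series of 1-D transforms, four subbands, multi-scale decomposition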
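
The separable construction can be sketched as a horizontal pass followed by a vertical pass. The example below swaps in Haar lifting to keep the 1-D step short (the slides use CDF 9/7), and the names `haar_lift`, `dwt2_level`, and `dwt2` are hypothetical. The assignment of the labels h and v follows the quadrant layout on the slide (a and h on top, v and d below), which is an assumption about the naming convention. Image sides are assumed to be divisible by 2 at every level.

```python
import numpy as np

def haar_lift(x, axis):
    """One level of Haar lifting along `axis`; returns (low, high)."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    s, d = x[0::2], x[1::2]          # split: even, odd
    d = d - s                        # predict the odd sample from the even one
    s = s + d / 2                    # update: low band becomes the pair mean
    return np.moveaxis(s, 0, axis), np.moveaxis(d, 0, axis)

def dwt2_level(image):
    """One decomposition level: four subbands a, h, v, d."""
    low, high = haar_lift(image, axis=1)   # horizontal 1-D transform
    a, v = haar_lift(low, axis=0)          # vertical 1-D transform, left half
    h, d = haar_lift(high, axis=0)         # vertical 1-D transform, right half
    return a, h, v, d

def dwt2(image, levels):
    """Multi-scale decomposition: repeat the level on the a subband."""
    a = np.asarray(image, dtype=float)
    details = []
    for _ in range(levels):
        a, h, v, d = dwt2_level(a)
        details.append((h, v, d))
    return a, details
```

Note that this row-then-column order touches every sample twice per level, which is the memory-access cost the later slides set out to avoid.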

    6 / 24

  • Lenna

    how to calculate this as efficiently as possible

    7 / 24

  • Strategies and issues

    R. Kutil, A single-loop approach to SIMD parallelization of 2-D wavelet lifting (2006)

    [Figure: horizontal and vertical passes producing subbands a, h, v, d]

    strategies: row-column, block-based, and line-based

    cache issues: cache line, limited size, set associativity, prefetching

    techniques: padding, aggregation, memory layouts, interleaved loops, parallelization

    the approaches have to visit samples repeatedly, and memory access is expensive

    CPU cache, limitations, existing techniques, single-loop approach

    8 / 24

  • Unsolved issues

    [Figure: prolog, core, and epilog phases of the transform]

    - complicated border treatment (prolog/epilog phases)

    - suspend/resume processing

    - arbitrary processing order (scan order)

    - interleaving the transform with subsequent processing

    - multi-scale decomposition

    - reorganization of the underlying scheme

    9 / 24

  • Objectives of the thesis

    Aims: improve image transform performance and resource consumption

    Objectives: eliminate the shortcomings of existing methods (previous slide)

    Evaluation: prove experimentally (performance, memory requirements)

    10 / 24

  • Lifting core

    D. Barina, P. Zemcik, Vectorization and parallelization of 2-D wavelet lifting (in press)

    solution: a processing unit

    - continuously consumes an input and produces an output

    - which visits every image sample only once (cache friendly)

    - which is aware of image coordinates (can handle the borders)

    - whose configuration (state) can be saved/restored

    - which can be run in any direction

    - which can be SIMD vectorized

    - which can run in parallel (on independent parts of the image)

    $y = C\,x$, with $x$ defined from $I_n$ and $B$, and $y$ defined from $O_n$ and $B$
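
The properties listed above can be illustrated with a small 1-D sketch. It is not the core from the cited paper: it swaps in the shorter CDF 5/3 wavelet, the names `core53`, `flush53`, and `transform53` are hypothetical, and symmetric extension at the borders is an assumption. The core consumes one (even, odd) pair per call, emits one coefficient pair with a lag of one step, reads every sample once, and keeps all of its memory in a small state tuple that can be stored and passed back later.

```python
def core53(state, even, odd):
    """Consume one (even, odd) input pair; return (new_state, output or None).

    state is None at the left border, otherwise
    (previous even sample, previous odd sample, previous detail or None).
    """
    if state is None:                       # left border: nothing to emit yet
        return (even, odd, None), None
    prev_even, prev_odd, prev_detail = state
    detail = prev_odd - 0.5 * (prev_even + even)        # predict step
    if prev_detail is None:                 # assumed symmetric border: d[-1] = d[0]
        prev_detail = detail
    approx = prev_even + 0.25 * (prev_detail + detail)  # update step
    return (even, odd, detail), (approx, detail)

def flush53(state):
    """Right border: emit the lagging pair, mirroring the last even sample."""
    prev_even = state[0]
    _, output = core53(state, prev_even, 0.0)   # the odd argument is never read
    return output

def transform53(signal):
    """Single pass over an even-length signal using the core."""
    state, coefficients = None, []
    for k in range(0, len(signal), 2):
        state, pair = core53(state, signal[k], signal[k + 1])
        if pair is not None:
            coefficients.append(pair)
    coefficients.append(flush53(state))
    return coefficients

print(transform53([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]))
```

Because `core53` is a pure function of its state and input, processing can be suspended after any pair and resumed later from the saved tuple, and independent signal segments can be handled by separate calls.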

    11 / 24

  • Core examples

    D. Barina, P. Zemcik, Vectorization and parallelization of 2-D wavelet lifting (in press)

    [Figure: core examples, labelled m, n and 1 to 4; core inputs, outputs]

    12 / 24

  • Processing orders

    D. Barina, P. Zemcik, Vectorization and parallelization of 2-D wavelet lifting (in press)

    [Figure: processing orders. Panels: horizontal, horizontal strips, horizontal blocks; vertical, vertical strips, vertical blocks]

    13 / 24

  • Borders treatment

    D. Barina, P. Zemcik, Vectorization and parallelization of 2-D wavelet lifting (in press)

    [Figure: cores applied at positions n along signals of alternating d and a samples, including the borders]

    $y = C_n\,x$

    the cores treat the boundaries gracefully

    14 / 24

  • Parallel cores and reorganization

    M. Kula, D. Barina, et al., Block-based Approach to 2-D Wavelet Transform on GPUs (2016)

    [Figure: steps per scheme. Sweldens1995: 1 2 3 4; Iwahashi2007: 1 2 3; proposed: 1 2]

    15 / 24

  • 3-D core

    D. Barina, P. Zemcik, Real-Time 3-D Wavelet Lifting (2015)

    [Figure: 3-D core with axes x, y, z and buffer x, buffer y, buffer z]

    extended into more dimensions, buffers on the sides

    16 / 24

  • CPU implementation

    D. Barina, P. Zemcik, Vectorization and parallelization of 2-D wavelet lifting (in press)

    [Plot: time per pixel (0 to 50 ns) against image size (1 k to 100 M pixels); series: separable approach, core approach]

    an evaluation of approaches, implemented the separable, single-loop, and core

    17 / 24

  • 3-D CPU implementation

    D. Barina, P. Zemcik, Real-Time 3-D Wavelet Lifting (2015)

    [Figure: 3-D core with axes x, y, z and buffer x, buffer y, buffer z]

    [Plot: time per voxel (0 to 160 ns) against volume size (0 to 250 M voxels); series: naive horizontal, naive vertical, core 4^2, core 2^3, core 4^3]

    performance of 3-D transform: separable, 2-D core, 3-D core

    18 / 24

  • GPU implementation

    M. Kula, D. Barina, et al., Block-based Approach to 2-D Wavelet Transform on GPUs (2016)

    [Plot: throughput (80 to 260 GB/s) against image size (0 to 70 M pixels); series: Kucis2014, Separable Block, Non-Separable Block]

    [Plot: throughput (0 to 60 GB/s) against image size (100 kpel to 100 Mpel); series: Sweldens, Iwahashi*, Explosive*, Monolithic*, Polyphase*]

    [Figure: monolithic scheme]

    left: SotA is in red, block methods in blue/green, reorganization; right: block methods, separable in black, ours in blue/green

    19 / 24

  • FPGA implementation

    D. Barina, et al., Single-Loop Approach to 2-D Wavelet Lifting with JPEG 2000 Compatibility (2015)

    [Figure: block diagram; labels: Input, Transform, H, V, BRAM]

    core        FF              LUT            BRAM
    latency 4   441 (0.1 %)     399 (0.18 %)   6 (1.1 %)
    latency 2   391 (< 0.1 %)   592 (0.27 %)   6 (1.1 %)

    architecture   device                  BRAM [bits]   clocks/pel   time [ms]
    Dillen2003     VirtexE1000-8           50K           0.50         1.20
    Descampe2004   Virtex-II XC2V6000      N/A           0.60         1.75
    Seo2007        Altera Stratix          128K          2.64         6.02
    Zhang2012      Virtex-II Pro XC2VP30   6 × 18K       0.50         0.97
    the cores      Zynq XC7Z045            1 × 36K       0.26         0.27

    20 / 24

  • JPEG 2000 implementation

    D. Barina, O. Klima, P. Zemcik, Single-Loop Architecture for JPEG 2000 (2016)

    [Figure: core and codeblock; subbands $a_j$, $a_{j+1}$, h, v, d]

    [Plot: time (0 to 140 ns) against resolution (100 k to 1 G pel); series: proposed, OpenJPEG, JasPer, FFmpeg]

    21 / 24

  • Contributions of the thesis

    Aims: improved image transform performance and resource consumption

    Objectives: eliminated the shortcomings of existing methods

    Evaluation: assessed experimentally (performance, memory requirements)

    evaluation performed: 2-D on CPU, 3-D on CPU, 2-D on GPU, 2-D on FPGA, JPEG 2000 on CPU

    22 / 24

  • Selected papers

    - Barina, D.; Klima, O.; Zemcik, P.: Single-Loop Software Architecture for JPEG 2000. In Data Compression Conference (DCC), 2016

    - Barina, D.; Musil, M.; Musil, P.; et al.: Single-Loop Approach to 2-D Wavelet Lifting with JPEG 2000 Compatibility. In Workshop on Applications for Multi-Core Architectures (WAMCA), 2015

    - Barina, D.; Zemcik, P.: Minimum Memory Vectorisation of Wavelet Lifting. In Advanced Concepts for Intelligent Vision Systems (ACIVS), 2013

    - Barina, D.; Zemcik, P.: Wavelet Lifting on Application Specific Vector Processor. In GraphiCon, 2013

    - Barina, D.; Zemcik, P.: Diagonal Vectorisation of 2-D Wavelet Lifting. In IEEE International Conference on Image Processing (ICIP), 2014

    - Barina, D.; Zemcik, P.: Real-Time 3-D Wavelet Lifting. In International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), 2015

    - Barina, D.; Zemcik, P.: Vectorization and parallelization of 2-D wavelet lifting. Journal of Real-Time Image Processing (JRTIP), in press

    - Barina, D.; Klima, O.; Zemcik, P.: Single-Loop Architecture for JPEG 2000. In Image and Signal Processing (ICISP), 2016

    - Kula, M.; Barina, D.; Zemcik, P.: Block-based Approach to 2-D Wavelet Transform on GPUs. In International Conference on Information Technology New Generations (ITNG), 2016

    - Kucis, M.; Barina, D.; Kula, M.; et al.: 2-D Discrete Wavelet Transform Using GPU. In Workshop on Applications for Multi-Core Architectures (WAMCA), 2014

    23 / 24

  • Summary

    the core

    - computing unit which processes the data in a single pass,

    - can suspend/resume execution,

    - can process the data in many different orders,

    - can handle signal boundaries (is aware of coordinates),
