Wavelets and Wavelet Regression
Jeffrey S. Morris, Department of Biostatistics, The University of Texas MD Anderson Cancer Center
Rice University, 2/25/2008 (http://biostatistics.mdanderson.org/Morris)



  • 2/25/2008 http://biostatistics.mdanderson.org/Morris, Slide 1

    Wavelets and Wavelet Regression

    Jeffrey S. Morris, Department of Biostatistics
    The University of Texas MD Anderson Cancer Center
    Rice University, 2/25/2008

  • Slide 2: Nonparametric Regression

    • Given a response vector y = {y1, …, yN}′ with predictors t = {t1, …, tN}′, a nonparametric regression model for y on t is given by

        y_i = f(t_i) + e_i,   e_i ~ N(0, σ_e²)

      where the form of f is left unspecified.
    • Goal: estimate f
    • Various approaches:
      – Kernel methods
      – Spline-based methods
      – Wavelet-based methods

  • Slide 3: Nonparametric Regression: Kernel Estimators

    • The idea behind kernel estimation is that simple parametric estimators (mean, linear, quadratic) are applied locally, with data points weighted according to a kernel function (bounded or decaying to zero).
    • Nadaraya/Watson local mean estimator:

        f̂(t) = Σ_{i=1}^N w(t − t_i; h) y_i / Σ_{i=1}^N w(t − t_i; h)

    • w is a kernel function (e.g. a Gaussian pdf).
    • h is a bandwidth governing the trade-off between bias and variance (e.g. the Gaussian standard deviation).
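The estimator above can be sketched in a few lines of pure Python; this is a minimal illustration with a Gaussian kernel (function name and test data are invented for illustration, not from the slides):

```python
import math

def nw_smooth(t_grid, t, y, h):
    """Nadaraya-Watson local mean: a weighted average of the y_i, with
    Gaussian-kernel weights w(t0 - t_i; h) centered at each estimation point."""
    est = []
    for t0 in t_grid:
        w = [math.exp(-0.5 * ((t0 - ti) / h) ** 2) for ti in t]
        est.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return est

# Sanity check: for constant data the local mean reproduces the constant.
t = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [2.0] * 5
print(nw_smooth([0.1, 0.5, 0.9], t, y, h=0.2))
```

Larger h averages over more points (lower variance, higher bias); smaller h tracks the data more closely.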

  • Slide 4: Nonparametric Regression: N-W Local Mean Smoother

  • Slide 5: Nonparametric Regression: N-W Local Mean Smoother (Bandwidth)

    • Can estimate h from the data by cross-validation (or GCV).

  • Slide 6: Nonparametric Regression: Kernel Estimators

    • Can improve performance by replacing the local mean with local regression fits (line/parabola).
    • Fits a line or parabola locally, with weights determined by the kernel function.
    • Local linear smoother: f̂(t) is the intercept b̂0 from the weighted least-squares problem

        min_{b0,b1} Σ_{i=1}^N [y_i − {b0 + b1(t_i − t)}]² w(t − t_i; h)

    • Local quadratic smoother: f̂(t) is the intercept b̂0 from

        min_{b0,b1,b2} Σ_{i=1}^N [y_i − {b0 + b1(t_i − t) + b2(t_i − t)²}]² w(t − t_i; h)
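A hedged sketch of the local linear smoother: solve the 2x2 weighted least-squares normal equations in closed form and return the intercept, which is the estimate at t0. Names and data are illustrative:

```python
import math

def local_linear(t0, t, y, h):
    """Local linear estimate f_hat(t0): weighted least squares of y_i on
    (t_i - t0) with Gaussian kernel weights; the intercept b0 is f_hat(t0)."""
    w = [math.exp(-0.5 * ((ti - t0) / h) ** 2) for ti in t]
    x = [ti - t0 for ti in t]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    m0 = sum(wi * yi for wi, yi in zip(w, y))
    m1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1          # 2x2 normal-equations determinant
    return (s2 * m0 - s1 * m1) / det  # b0

# Exactly linear data is reproduced exactly (no boundary bias, unlike the local mean).
t = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1 + 2 * ti for ti in t]
print(round(local_linear(1.0, t, y, h=0.5), 6))  # -> 3.0
```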

  • Slide 7: Nonparametric Regression: Local Quadratic Smoother

  • Slide 8: Nonparametric Regression: Local Quadratic Smoother (Bandwidth)

  • Slide 9: Nonparametric Regression: Smoothing Splines

    • Another way to view the nonparametric regression problem is as penalized regression; minimize L:

        L = N⁻¹ Σ_{i=1}^N {y_i − f(t_i)}² + α ∫ f′′(t)² dt

    • It can be shown that this criterion is minimized by a cubic smoothing spline with knots at the values t_i:
      – Piecewise cubic polynomial between knots
      – Two continuous derivatives
      – Third derivative: a step function with change points at the knots
    • Smoothing parameter: α (can be estimated from the data)

  • Slide 10: Nonparametric Regression: Regression Splines

    • An easier-to-fit alternative to smoothing splines is regression splines, which fit polynomial regressions in the intervals between a chosen set of p knots κ = {κ1, κ2, …, κp}:

        minimize Σ_{i=1}^N (y_i − X_i b)²

    • Cubic regression spline design matrix:

        X = [ 1  t_1  t_1²  t_1³  (t_1 − κ1)³₊ … (t_1 − κp)³₊
              ⋮   ⋮    ⋮     ⋮        ⋮              ⋮
              1  t_N  t_N²  t_N³  (t_N − κ1)³₊ … (t_N − κp)³₊ ]

      where (t − κ)³₊ = (t − κ)³ if (t − κ) > 0, and 0 otherwise.
    • Knots are frequently chosen to be at quantiles of t.
      – It can be shown that 7-8 knots are sufficient for most smooth functions.
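The truncated-power-basis design matrix can be built directly; a small sketch (assuming the cubic basis 1, t, t², t³ plus the truncated cubes, as on the slide; the function name is invented):

```python
def cubic_spline_design(t, knots):
    """Design matrix X for a cubic regression spline:
    columns 1, t, t^2, t^3, (t-k1)^3_+, ..., (t-kp)^3_+."""
    def pos_cube(u):
        # truncated cubic basis function (u)^3_+
        return u ** 3 if u > 0 else 0.0
    return [[1.0, ti, ti ** 2, ti ** 3] + [pos_cube(ti - k) for k in knots]
            for ti in t]

X = cubic_spline_design([0.0, 0.5, 1.0], knots=[0.5])
for row in X:
    print(row)  # columns: 1, t, t^2, t^3, (t-0.5)^3_+
```

With X in hand the spline fit is just ordinary least squares of y on X.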

  • Slide 11: Nonparametric Regression: Penalized Regression Splines

    • Problem: wiggly fits, because there is no penalty (overfitting).
    • Solution: penalize Σ b_j² for the spline terms. P-splines (Eilers and Marx; Ruppert, Wand, and Carroll):

        minimize Σ_{i=1}^N (y_i − X_i b)² + λ² b′Db

      where D is a diagonal matrix with 1's corresponding to the "spline" terms and 0's to the "polynomial" terms.
    • Smoothing parameter: λ
    • The solution is a ridge regression estimator:

        ŷ = X(X′X + λ²D)⁻¹X′y
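A minimal sketch of the ridge/P-spline estimator b̂ = (X′X + λ²D)⁻¹X′y, with a tiny Gaussian-elimination solver included so it is self-contained; the helper names and the two-column toy design are invented for illustration:

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def pspline_fit(X, y, lam2, n_poly):
    """Ridge estimator b = (X'X + lam2*D)^{-1} X'y, where D is diagonal with
    0's on the first n_poly (polynomial) columns and 1's on the spline columns."""
    p = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(p)]
           for a in range(p)]
    for j in range(n_poly, p):   # add the penalty only to spline columns
        XtX[j][j] += lam2
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(p)]
    return solve(XtX, Xty)

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
print(pspline_fit(X, y, lam2=0.0, n_poly=1))   # unpenalized: recovers y = t
print(pspline_fit(X, y, lam2=1e8, n_poly=1))   # penalized column shrunk to ~0
```

With a huge λ² the penalized coefficient is shrunk essentially to zero and the unpenalized intercept absorbs the mean, which is exactly the linear-shrinkage behavior the next slide formalizes.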

  • Slide 12: Nonparametric Regression: Penalized Regression Splines

    • Can be represented by a linear mixed model!!

        Y = Xb + Zu + e,   u ~ N(0, λ²),   e ~ N(0, σ_e²)

      with

        X = [ 1  t_1  t_1²  t_1³        Z = [ (t_1 − κ1)³₊ … (t_1 − κp)³₊
              ⋮   ⋮    ⋮     ⋮                    ⋮              ⋮
              1  t_N  t_N²  t_N³ ]            (t_N − κ1)³₊ … (t_N − κp)³₊ ]

    • Spline coefficient estimators are BLUPs from the LMM!
      – Linearly shrunken towards zero according to the relative sizes of λ² and σ_e²; equivalent to a mean-zero Bayesian prior on u.
    • The smoothing parameter is a variance component!
      – Estimated from the data using REML.
    • Can also be fit as a Bayesian hierarchical model.

  • Slide 13: Nonparametric Regression

    • What if we have a spatially heterogeneous function?

  • Slide 14: Nonparametric Regression: Spatially Adaptive Function Estimation

    • What if we have spatially heterogeneous data?
      – Different degrees of smoothness at different parts of the curve
    • We need spatially adaptive smoothers (regularization rather than smoothness).
    • Kernels can be spatially adaptive:
      – Spatially varying bandwidth
    • Splines can be spatially adaptive:
      – Vary knot locations (free-knot splines)
      – Spatially varying smoothing parameter
    • Wavelet regression has been shown to adjust optimally to varying smoothness (Fan, et al. 1993).

  • Slide 15: Wavelets: Fourier Analysis

    • Any function f in L²[−π,π] can be represented by a Fourier series:

        y(t) = a_0/√(2π) + Σ_{j=1}^∞ { a_j cos(jt)/√π + b_j sin(jt)/√π }

        a_0 = ⟨y, 1/√(2π)⟩ = (1/√(2π)) ∫ y(t) dt
        a_j = ⟨y, cos(jt)/√π⟩ = (1/√π) ∫ y(t) cos(jt) dt
        b_j = ⟨y, sin(jt)/√π⟩ = (1/√π) ∫ y(t) sin(jt) dt

      (all integrals over [−π, π])
    • The coefficients (a_j, b_j) describe the behavior of the function at frequency j.
    • {1/√(2π), (1/√π)cos(jt), (1/√π)sin(jt), j = 1, …, ∞} form a complete orthonormal basis for L²[−π,π]:
      – ⟨f_j, f_k⟩ = I(j = k)
    • This set of functions spans L²[−π,π].

  • Slide 16: Wavelets: Fourier Analysis

    • Fourier series also have a complex representation:

        y(t) = Σ_{j=−∞}^∞ d_j ψ_j(t),   for ψ_j(t) = e^{ijt}/√(2π) = {cos(jt) + i sin(jt)}/√(2π)

        d_j = ⟨y, ψ_j⟩ = ∫_{−π}^{π} y(t) e^{−ijt}/√(2π) dt

    • {ψ_j, j = −∞, …, ∞} form a complete orthonormal basis for L²[−π,π].
    • Discrete Fourier analysis: for a signal defined on an equally spaced grid of size T, a fast algorithm {FFT, O(T log T)} is available for computing the Fourier coefficients from the observed signal. (Transform the data to the frequency domain: T data points → T coefficients.)
    • Fourier analysis is useful for signal processing, but has an important limitation: all location information is lost in the frequency domain.
      – Useful for processing stationary signals, but not so much for nonstationary ones.
      – What if we had a set of basis functions that decomposed information in both the frequency and time domains? We do! Wavelets!!
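The T-points-to-T-coefficients transform can be illustrated with a naive O(T²) DFT (the FFT computes exactly the same coefficients in O(T log T)); a small sketch with an invented function name:

```python
import cmath
import math

def dft(y):
    """Naive discrete Fourier transform, O(T^2): coefficient j measures
    the signal's content at frequency j."""
    T = len(y)
    return [sum(y[t] * cmath.exp(-2j * math.pi * j * t / T) for t in range(T))
            for j in range(T)]

# A pure cosine at frequency 1: all energy sits at j = 1 and j = T-1,
# so the transform pins down frequency but says nothing about location.
T = 8
y = [math.cos(2 * math.pi * t / T) for t in range(T)]
print([round(abs(dj), 6) for dj in dft(y)])
```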

  • Slide 17: Wavelets

        y(t) = Σ_{j,k∈ℑ} d_jk ψ_jk(t),   ψ_jk(t) = 2^{j/2} ψ(2^j t − k)

        d_jk = ⟨y, ψ_jk⟩ = ∫_ℝ y(t) ψ_jk(t) dt

    • The coefficient d_jk summarizes the behavior of the function at scale (frequency) indexed by j and location indexed by k.
    • {ψ_jk, j,k = −∞, …, ∞} form a complete orthonormal basis for L²(ℝ).
    • The mother wavelet basis function, ψ, has several properties:
      – ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1
      – It has a corresponding father wavelet ϕ (also called a scaling function) such that ⟨ϕ,ψ⟩ = ∫ϕ(t)ψ(t)dt = 0, ∫ϕ(t)dt = 1, ∫ϕ²(t)dt = 1
      – It has compact or vanishing support (unlike the Fourier bases)
      – It generates a multiresolution analysis

  • Slide 18: Wavelets: Multiresolution Analysis

    • Let V_j, j = …, −2, −1, 0, 1, 2, … be a sequence of function subspaces of L²(ℝ). The collection of spaces {V_j, j∈ℑ} is called a multiresolution analysis with scaling function ϕ if these conditions hold:
      – Orthonormal basis: The function ϕ belongs to V_0 and the set {ϕ(t−k), k∈ℑ} is an orthonormal basis for V_0.
      – Nested: V_j ⊂ V_{j+1}
      – Density: the closure of ∪V_j is L²(ℝ)
      – Separation: ∩V_j = {0}
      – Scaling: the function f(t) belongs to V_0 iff the function f(2^j t) belongs to V_j
      Thus {2^{j/2} ϕ(2^j t − k), k∈ℑ} is an orthonormal basis for V_j.
    • Note the following properties also hold:
      – W_j = the space spanned by {2^{j/2} ψ(2^j t − k), k∈ℑ}
      – V_j ⊕ W_j = V_{j+1}, so one can think of W_j as a "residual space"

  • Slide 19: Wavelets: Haar Wavelet Example

    • The simplest and oldest wavelet is the Haar wavelet:

        ϕ(t) = I_[0,1)(t)

        ψ(t) = 1 for t ∈ [0, 0.5); −1 for t ∈ [0.5, 1); 0 otherwise

    • ∫ϕ(t)ψ(t)dt = 0, ∫ϕ(t)dt = 1, ∫ϕ²(t)dt = 1, ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1
    • V_0 = space spanned by {ϕ(t−k), k∈ℑ} (piecewise constant, jumps at spacing 1)
    • W_0 = space spanned by {ψ(t−k), k∈ℑ}
    • V_1 = space spanned by {2^{1/2} ϕ(2t−k), k∈ℑ} (piecewise constant, jumps at spacing ½)
    • V_0 ⊂ V_1, V_1 = V_0 ⊕ W_0, and the closure of ∪V_j is L²(ℝ)

  • Slide 20: Wavelets: Haar Wavelet Example

    • Consider L²{[0,1)}:

        ϕ_00(t) = I_[0,1)(t)

        ψ_00(t) = 1 for t ∈ [0, 0.5); −1 for t ∈ [0.5, 1); 0 otherwise

  • Slide 21: Wavelets: Haar Wavelet Example

        ϕ_10(t) = 2^{1/2} I_[0,0.5)(t)
        ψ_10(t) = 2^{1/2} for t ∈ [0, 0.25); −2^{1/2} for t ∈ [0.25, 0.5); 0 otherwise

        ϕ_11(t) = 2^{1/2} I_[0.5,1)(t)
        ψ_11(t) = 2^{1/2} for t ∈ [0.5, 0.75); −2^{1/2} for t ∈ [0.75, 1.0); 0 otherwise

  • Slide 22: Wavelets: Haar Wavelet Example

        ϕ_20(t) = 2 I_[0,0.25)(t);    ψ_20(t) = 2 for t ∈ [0, 0.125); −2 for t ∈ [0.125, 0.25); 0 otherwise
        ϕ_21(t) = 2 I_[0.25,0.5)(t);  ψ_21(t) = 2 for t ∈ [0.25, 0.375); −2 for t ∈ [0.375, 0.5); 0 otherwise
        ϕ_22(t) = 2 I_[0.5,0.75)(t);  ψ_22(t) = 2 for t ∈ [0.5, 0.625); −2 for t ∈ [0.625, 0.75); 0 otherwise
        ϕ_23(t) = 2 I_[0.75,1.0)(t);  ψ_23(t) = 2 for t ∈ [0.75, 0.875); −2 for t ∈ [0.875, 1.0); 0 otherwise

  • Slide 23: Wavelets: Haar Wavelet Example

        y = (0.1, 0.0, 0.2, 0.6, 0.0, 0.2, 0.1, 0.3)′

  • Slide 24: Wavelets: Haar Wavelet Example (same data vector y)

  • Slide 25: Wavelets: Haar Wavelet Example (same data vector y)

  • Slide 26: Wavelets: Haar Wavelet Example (same data vector y)

  • Slide 27: Wavelets: Haar Wavelet Example (same data vector y)

  • Slide 28: Wavelets: Haar Wavelet Example

        y = (0.1, 0.0, 0.2, 0.6, 0.0, 0.2, 0.1, 0.3)′
        d = (c_00, d_00, d_10, d_11, d_20, d_21, d_22, d_23)′
          = (1.5, 0.3, −0.99, −0.28, 0.2, −0.8, −0.4, −0.4)′

    • A full-rank linear transformation from the data space to the wavelet space.
    • Can be written y = dW, where W is the orthonormal wavelet transform matrix whose rows are the basis functions evaluated on the grid:

        W = [ ϕ_00(t_1) … ϕ_00(t_T)
              ψ_00(t_1) … ψ_00(t_T)
                  ⋮     ⋱      ⋮
              ψ_23(t_1) … ψ_23(t_T) ]

    • W′W = WW′ = I
    • Thus yW′ = dWW′, so d = yW′.
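The 8×8 Haar transform matrix W of the example can be constructed explicitly and checked for orthonormality (WW′ = I). This sketch uses the standard orthonormal scaling, so numerical coefficient values need not match the scaling displayed on the slides; the function name is invented:

```python
import math

def haar_matrix(T):
    """Orthonormal Haar transform matrix W (T x T, T a power of 2):
    rows are the scaling function and the wavelets sampled on the grid."""
    rows = [[1 / math.sqrt(T)] * T]           # phi_00: constant row
    length = T
    while length > 1:
        half = length // 2
        amp = 1 / math.sqrt(length)
        for k in range(T // length):          # one psi_jk per shift k
            row = [0.0] * T
            for i in range(half):
                row[k * length + i] = amp
                row[k * length + half + i] = -amp
            rows.append(row)
        length = half
    return rows

W = haar_matrix(8)
# Check W W' = I, so d = y W' and y = d W.
ok = all(abs(sum(a * b for a, b in zip(r1, r2)) - (1.0 if i == j else 0.0)) < 1e-12
         for i, r1 in enumerate(W) for j, r2 in enumerate(W))
print(ok)  # -> True
```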

  • Slide 29: Wavelets: Haar Wavelet Example

    Level 0 decomposition:

        y = (0.1, 0.0, 0.2, 0.6, 0.0, 0.2, 0.1, 0.3)′
        d(0) = (c_30, c_31, …, c_37)′ = y

  • Slide 30: Wavelets: Haar Wavelet Example

    Level 1 decomposition:

        d(1) = (c_20, c_21, c_22, c_23, d_20, d_21, d_22, d_23)′
             = (0.2, 1.6, 0.4, 0.8, 0.2, −0.8, −0.4, −0.4)′

  • Slide 31: Wavelets: Haar Wavelet Example

    Level 2 decomposition:

        d(2) = (c_10, c_11, d_10, d_11, d_20, d_21, d_22, d_23)′
             = (1.27, 0.85, −0.99, −0.28, 0.2, −0.8, −0.4, −0.4)′

  • Slide 32: Wavelets: Haar Wavelet Example

    Level 3 (full) decomposition:

        d(3) = (c_00, d_00, d_10, d_11, d_20, d_21, d_22, d_23)′
             = (1.5, 0.3, −0.99, −0.28, 0.2, −0.8, −0.4, −0.4)′

  • Slide 33: Wavelets: DWT Pyramid Algorithm

    • Consider a function y(t) observed on an equally spaced grid of size T = 2^J.
    • Decomposition proceeds level by level (0, 1, 2, …, J = full): at each level the current smooth part is split by the low-pass filter s into the next smooth part and by the high-pass filter d into detail coefficients:

        y = c_J → (s) c_{J−1}, (d) d_{J−1} → (s) c_{J−2}, (d) d_{J−2} → … → c_0, d_0

  • Slide 34: Wavelets: Filter Coefficients

    • Note: Because of the special construction of the wavelet coefficients, each inner product only involves multiplication of a portion of the data vector by [2^{−1/2}, 2^{−1/2}] or [2^{−1/2}, −2^{−1/2}]. These are called the Haar filter coefficients.
    • Thus, the matrix W can be generated by convolutions of these filter coefficients within a banded structure. This allows the pyramid-based algorithm to run very quickly, in O(T). (DWT/IDWT)
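The pyramid algorithm with these two filters can be sketched directly; applied to the slides' example data vector, the IDWT recovers it exactly. Function names are invented; this is a minimal Haar-only sketch:

```python
import math

R = 1 / math.sqrt(2)  # Haar filter coefficient 2^(-1/2)

def haar_dwt(y):
    """Full Haar DWT via the pyramid: at each level, apply the low-pass
    filter [R, R] and high-pass filter [R, -R] to the smooth part.
    Total work is O(T). Returns (c0, details), coarsest detail level first."""
    c, details = list(y), []
    while len(c) > 1:
        half = len(c) // 2
        detail = [R * (c[2 * i] - c[2 * i + 1]) for i in range(half)]
        c = [R * (c[2 * i] + c[2 * i + 1]) for i in range(half)]
        details.insert(0, detail)
    return c[0], details

def haar_idwt(c0, details):
    """Invert the pyramid (IDWT) with the same two filters, run in reverse."""
    c = [c0]
    for detail in details:
        c = [v for s, d in zip(c, detail) for v in (R * (s + d), R * (s - d))]
    return c

# The slides' example data vector: the transform is exactly invertible.
y = [0.1, 0.0, 0.2, 0.6, 0.0, 0.2, 0.1, 0.3]
c0, details = haar_dwt(y)
y_back = haar_idwt(c0, details)
print(max(abs(a - b) for a, b in zip(y_back, y)) < 1e-12)  # -> True
```

Each level halves the length of the smooth part, so the total work is T + T/2 + T/4 + … < 2T multiplications per filter, i.e. O(T).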

  • Slide 35: Wavelets: Daubechies Family

    • Properties of Haar wavelets:
      – ∫ϕ(t)ψ(t)dt = 0, ∫ϕ(t)dt = ∫ϕ²(t)dt = 1, ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1, MRA
      – Orthonormal transform (y = dW for orthogonal W)
      – W determined by two "filter coefficients": [2^{−1/2}, 2^{−1/2}]
      – Has compact support on [0,1)
      – Vanishing "zeroth" moment: ∫t⁰ψ(t)dt = ∫ψ(t)dt = 0
    • Other wavelet bases:
      – Suppose we want a wavelet with orthonormality and compact support, but also a vanishing 1st moment: ∫tψ(t)dt = 0.
      – Is there a wavelet/scaling function pair with these properties?

  • Slide 36: Wavelets: Daubechies Family

    • No closed-form expression.
    • Determined recursively by just 4 filter coefficients (up to normalization):

        [ (1+√3)/4, (3+√3)/4, (3−√3)/4, (1−√3)/4 ]

    • Compact support on [0,3).

  • Slide 37: Wavelets: Daubechies Family

    • Daubechies family of wavelets (Daub 1988), indexed by number of vanishing moments (N).
    • All Daub wavelets have the properties:
      – ∫ϕ(t)ψ(t)dt = 0
      – ∫ϕ(t)dt = ∫ϕ²(t)dt = 1
      – ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1
      – Multiresolution analysis (MRA)
      – Orthonormal transform (y = dW for orthogonal W)
    • Determined by L = 2N filter coefficients.
    • Has compact support on [0, 2N−1).
    • N vanishing moments: ∫t^a ψ(t)dt = 0 for a = 0, 1, 2, …, N−1.

  • Slide 38: Wavelets: Daubechies Family

  • Slide 39: Wavelets: Coiflet Family

    • What if we also require vanishing moments for the scaling function ϕ?
    • Coiflet family (Daubechies 1992, for R. Coifman), indexed by N.
    • All Coiflet wavelets have the properties:
      – ∫ϕ(t)ψ(t)dt = 0
      – ∫ϕ(t)dt = ∫ϕ²(t)dt = 1
      – ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1
      – Multiresolution analysis (MRA)
      – Orthonormal transform (y = dW for orthogonal W)
    • Completely determined by L = 6N filter coefficients.
    • Has compact support on [0, 6N−1).
    • Wavelet: 2N vanishing moments: ∫t^a ψ(t)dt = 0 for a = 0, 1, 2, …, 2N−1.
    • Scaling function: 2N−1 vanishing moments: ∫t^a ϕ(t)dt = 0 for a = 1, 2, …, 2N−1.

  • Slide 40: Wavelets: Coiflet Family

  • Slide 41: Wavelets: Summary

    Linear representation:

        y(t) = Σ_{j,k∈ℑ} d_jk ψ_jk(t),   ψ_jk(t) = 2^{j/2} ψ(2^j t − k),   d_jk = ∫ y(t) ψ_jk(t) dt

        y (1×T) = d (1×T) · W (T×T),   d (1×T) = y (1×T) · W′ (T×T)

    • Given a T-vector y consisting of a function sampled on an equally-spaced grid, a pyramid-based algorithm for the DWT (Mallat) can be used to obtain d, the T-vector of wavelet coefficients for the given family, in O(T) operations (the converse also holds).
    • We never need to compute or think about ψ or ϕ, since we can apply the DWT using only the wavelet family's filter coefficients.

  • Slide 42: Wavelets: DWT Pyramid Algorithm

    • Consider a function y(t) observed on an equally spaced grid of size T = 2^J; decomposition levels 0, 1, 2, …, J (full):

        y = c_J → (s) c_{J−1}, (d) d_{J−1} → … → c_0, d_0

    • DWT: can be computed in O(T); likewise the IDWT.
    • Note: The low-pass and high-pass filters, s and d, are determined solely by the wavelet's filter coefficients.

  • Slide 43: Wavelets: Practical Issues

    • T not a power of 2?
      – Adjustments to the DWT are possible (Percival & Walden 2000)
    • Choice of wavelet basis: match the data's features.
      – Piecewise constant (aCGH): Haar
      – Smoother functions: Daub#N with N higher
      – Symmetric features: Symmlets or Coiflets
      – Tradeoff: smoothness vs. filter length
    • Boundary correction issues (all but Haar):
      – Periodic, pad with zeros, reflection
    • J = number of levels of decomposition:
      – Full decomposition: J = log2(T)
      – Choose J: floor(K_{J−1}/2) > L−1 (at least one coefficient in the middle unaffected by boundary conditions; PW 2000)

  • Slide 44: Wavelets: Properties

    • Very fast calculation: O(T)
    • Compact support: good for spatially heterogeneous data
    • Signal compression: energy concentrated on few d
    • Orthogonality: distributes white noise across d
    • Whitening: autocorrelation of wavelet coefficients tends to die away rapidly across k; the d_jk are approximately uncorrelated (Johnstone and Silverman 1997)
    • Time/frequency decomposition: the key to adaptive smoothing

  • Slide 45: Wavelets: Wavelet Regression

    • Wavelet regression:
      – Row vector y: response on an equally-spaced grid t (length T)

        y = g(t) + ε,   ε ~ N(0, σ²I_T)

    1. Project the data into the wavelet space using the DWT: d = yW′, where W′ is the orthogonal DWT matrix.
       Recall: d = {d_jk, j = 1, …, J; k = 1, …, K_j}

  • Slide 46: Wavelets: Wavelet Regression

    1. Project the data into the wavelet space using the DWT (d = yW′): applying W′ to both sides of the model,

        yW′ = g(t)W′ + εW′,   ε ~ N(0, σ²I_T)

  • Slide 47: Wavelets: Wavelet Regression

    1. In the wavelet domain:

        d = g(t)W′ + εW′,   ε ~ N(0, σ²I_T)

  • Slide 48: Wavelets: Wavelet Regression

    1. Writing θ = g(t)W′ for the wavelet coefficients of g:

        d = θ + εW′,   ε ~ N(0, σ²I_T)

       Note: θ = {θ_jk: j = 1, …, J; k = 1, …, K_j}. By the sparsity property, we expect few |θ_jk| > 0.

  • Slide 49: Wavelets: Wavelet Regression

    1. In the wavelet domain the model is

        d = θ + ε*,   ε ~ N(0, σ²I_T)

       Note: ε* = εW′ ~ N(0, Wσ²I_T W′) ~ N(0, σ²WW′) ~ N(0, σ²I_T), since WW′ = I_T (orthogonality).

  • Slide 50: Wavelets: Wavelet Regression

    1. Project the data into wavelet space using the DWT: d = yW′, where W′ is the orthogonal DWT matrix:

        d = θ + ε*,   ε* ~ N(0, σ²I_T)

    2. Estimate θ by hard thresholding:

        θ̂_jk = d_jk · I(|d_jk| > δσ),   δ = √(2 log T)  (the universal threshold; DJ 1994)

    3. Project back to the data space using the IDWT:

        ĝ(t) = θ̂W

    • Yields an adaptive, regularized nonparametric estimate of g(t).
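Steps 1-3 can be strung together into a toy denoiser: DWT, hard thresholding at the universal threshold, IDWT. A pure-Python sketch, Haar only, with invented names (a real application would use a longer filter and estimate σ from the finest-level coefficients):

```python
import math
import random

R = 1 / math.sqrt(2)  # Haar filter coefficient

def haar_dwt_flat(y):
    """Haar DWT returned as one flat vector [c00, d00, level-1 d's, ...]."""
    c, details = list(y), []
    while len(c) > 1:
        details = [R * (c[2 * i] - c[2 * i + 1]) for i in range(len(c) // 2)] + details
        c = [R * (c[2 * i] + c[2 * i + 1]) for i in range(len(c) // 2)]
    return c + details

def haar_idwt_flat(d):
    """Inverse of haar_dwt_flat."""
    c, rest = d[:1], d[1:]
    while rest:
        det, rest = rest[:len(c)], rest[len(c):]
        c = [v for s, w in zip(c, det) for v in (R * (s + w), R * (s - w))]
    return c

def wavelet_regression(y, sigma):
    """Steps 1-3: DWT, hard-threshold the detail coefficients at
    delta*sigma with delta = sqrt(2 log T), then IDWT."""
    delta = math.sqrt(2 * math.log(len(y)))
    d = haar_dwt_flat(y)
    d = [d[0]] + [v if abs(v) > delta * sigma else 0.0 for v in d[1:]]
    return haar_idwt_flat(d)

# Noisy step function: thresholding kills the small noise-only coefficients
# while keeping the few large coefficients that carry the jump.
random.seed(0)
g = [0.0] * 8 + [5.0] * 8
y = [gi + random.gauss(0, 0.1) for gi in g]
g_hat = wavelet_regression(y, sigma=0.1)
print([round(v, 2) for v in g_hat[:4]], [round(v, 2) for v in g_hat[-4:]])
```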

  • Slide 51: Wavelets: Wavelet Regression

  • Slide 52: Wavelets: Wavelet Regression

  • Slide 53: Wavelets: Wavelet Regression

  • Slide 54: Wavelets: Wavelet Regression (Spiky function: Local Linear Regression)

  • Slide 55: Wavelets: Wavelet Regression (Spiky function: Wavelet Regression)

  • Slide 56: Wavelets: Wavelet Regression (Donoho and Johnstone's universal threshold)

  • Slide 57: Wavelets: Wavelet Regression (Local Quadratic Regression: maximum h to preserve the spike)
  • Slide 58: Wavelets: Wavelet Regression

    • Hard vs. soft thresholding:
      – Hard thresholding: θ̂_jk = d_jk · I(|d_jk| > δσ)
      – Soft thresholding: θ̂_jk = sgn(d_jk) · (|d_jk| − δσ)₊
      – Hard preserves peak heights better, at the cost of smoothness.
    • Universal vs. level-dependent thresholds:
      – Level-dependent thresholds δσ_j perform even better.
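The two rules differ only in what happens to a surviving coefficient: hard keeps it unchanged, soft pulls it toward zero by the threshold. A minimal sketch (function names invented):

```python
def hard_threshold(d, thr):
    """Keep coefficients that exceed the threshold, unchanged."""
    return d if abs(d) > thr else 0.0

def soft_threshold(d, thr):
    """Shrink surviving coefficients toward zero by the threshold amount."""
    if abs(d) <= thr:
        return 0.0
    return (d - thr) if d > 0 else (d + thr)

print(hard_threshold(2.0, 1.5), soft_threshold(2.0, 1.5))    # -> 2.0 0.5
print(hard_threshold(-1.0, 1.5), soft_threshold(-1.0, 1.5))  # -> 0.0 0.0
```

Because soft thresholding shrinks every surviving coefficient, the reconstructed peaks lose some height, which is the smoothness-versus-peak-height tradeoff noted on the slide.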

  • Slide 59: Wavelets: Wavelet Regression

    • Bayesian hierarchical approach: a prior distribution on θ_jk can mimic the idea of thresholding:

        d_jk ~ N(θ_jk, σ²)
        θ_jk ~ γ_jk N(0, τ_j²) + (1 − γ_jk) δ_0
        γ_jk ~ Bernoulli(π_j)

    • τ_j² and π_j are regularization parameters.
      – Can be specified, or estimated from the data using ML in empirical Bayes fashion (Clyde and George 2000).
    • Conjugate model (given σ²), so (θ_jk | d_jk, σ²) has closed form.

  • Slide 60: Wavelets: Wavelet Regression

        (θ_jk | d_jk, σ²) = P_jk N(SF_j d_jk, SF_j σ²) + (1 − P_jk) δ_0

        P_jk = Pr(γ_jk = 1 | d_jk) = O_jk/(O_jk + 1)   ("nonlinear shrinkage factor"), O_jk = posterior odds:

        O_jk = [π_j/(1 − π_j)] × (1 + τ_j²/σ²)^{−1/2} exp{SF_j d_jk²/(2σ²)}
               (prior odds)     (Bayes factor)

        SF_j = τ_j²/(τ_j² + σ²)   ("linear shrinkage factor")

        θ̂_jk,hard_thresh = d_jk · I(P_jk > δ)
        θ̂_jk,soft_thresh = d_jk SF_j · I(P_jk > δ)
        θ̂_jk,shrinkage  = d_jk P_jk SF_j
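The posterior quantities above can be computed directly; a sketch assuming σ² known and using the slide's example settings σ² = 1, τ² = 10, π = 0.2, δ = 0.5 (function names are invented, and SF_j = τ_j²/(τ_j² + σ²) is the general form of the slide's shrinkage factor):

```python
import math

def posterior_prob(d, sigma2, tau2, pi):
    """P = Pr(gamma = 1 | d): posterior probability the coefficient is nonzero
    under the spike-and-slab prior; O = prior odds * Bayes factor."""
    sf = tau2 / (tau2 + sigma2)                    # linear shrinkage factor
    O = ((pi / (1 - pi)) * (1 + tau2 / sigma2) ** -0.5
         * math.exp(sf * d * d / (2 * sigma2)))
    return O / (1 + O), sf

def estimates(d, sigma2=1.0, tau2=10.0, pi=0.2, delta=0.5):
    """The three posterior-based estimators from the slide."""
    P, sf = posterior_prob(d, sigma2, tau2, pi)
    return {"hard": d * (P > delta),
            "soft": d * sf * (P > delta),
            "shrinkage": d * P * sf}

# A small coefficient is killed; a large one is (mostly) kept.
print(estimates(0.5)["hard"], round(estimates(4.0)["shrinkage"], 3))
```

The shrinkage rule interpolates smoothly between kill and keep, which is why its input-output curve on the following slides looks like a smoothed threshold.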

  • Slide 61: Wavelets: Wavelet Regression

    σ² = 1, τ² = 10, π = 0.20, δ = 0.50:

        θ̂_jk,hard_thresh = d_jk · I(P_jk > δ)
        θ̂_jk,soft_thresh = d_jk SF_j · I(P_jk > δ)
        θ̂_jk,shrinkage  = d_jk P_jk SF_j

  • Slide 62: Wavelets: Wavelet Regression

    σ² = 1, τ² = 10, π = 0.20:

        θ̂_jk,shrinkage = d_jk P_jk SF_j
        θ̂_jk,linear    = d_jk SF_j
        θ̂_jk,nonlinear = d_jk P_jk

  • Slide 63: Wavelets: Wavelet Regression

    σ² = 1, τ² = 5, π = 0.20:

        θ̂_jk,shrinkage = d_jk P_jk SF_j
        θ̂_jk,linear    = d_jk SF_j
        θ̂_jk,nonlinear = d_jk P_jk

  • Slide 64: Wavelets: Wavelet Regression

    σ² = 1, τ² = 2, π = 0.20:

        θ̂_jk,shrinkage = d_jk P_jk SF_j
        θ̂_jk,linear    = d_jk SF_j
        θ̂_jk,nonlinear = d_jk P_jk
