2/25/2008 http://biostatistics.mdanderson.org/Morris

Wavelets and Wavelet Regression
Jeffrey S. Morris
Department of Biostatistics
The University of Texas MD Anderson Cancer Center
Rice University, 2/25/2008
Nonparametric Regression
• Given a response vector y = {y1, …, yN}' with predictors t = {t1, …, tN}', a nonparametric regression model for y on t is given by
  yi = f(ti) + ei,  ei ~ N(0, σe²)
where the form of f is left unspecified.
• Goal: estimate f
• Various approaches:
– Kernel methods
– Spline-based methods
– Wavelet-based methods
Nonparametric Regression: Kernel Estimators
• The idea behind kernel estimation is that simple parametric estimators (mean, linear, quadratic) are applied locally, with data points weighted according to a kernel function (bounded, or decaying to zero)
• Nadaraya/Watson local mean estimator:
  f̂(t) = Σ_{i=1}^N w(t − ti; h) yi / Σ_{i=1}^N w(t − ti; h)
• w is a kernel function (e.g., Gaussian pdf)
• h is a bandwidth mitigating the trade-off between bias and variance (e.g., the Gaussian standard deviation)
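The Nadaraya–Watson estimator above can be sketched in numpy. This is a minimal illustration, not code from the slides; the Gaussian kernel, the test function, and the bandwidth value are arbitrary choices.

```python
import numpy as np

def nadaraya_watson(t0, t, y, h):
    """Nadaraya-Watson local mean estimate of f at the points t0.

    w is a Gaussian kernel whose standard deviation is the bandwidth h."""
    t0 = np.atleast_1d(np.asarray(t0, dtype=float))
    w = np.exp(-0.5 * ((t0[:, None] - t[None, :]) / h) ** 2)  # kernel weights
    return (w * y).sum(axis=1) / w.sum(axis=1)                # local weighted mean

# Noisy samples of a smooth function on [0, 1]
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size)
fhat = nadaraya_watson(t, t, y, h=0.05)
```

Larger h gives a smoother (higher bias, lower variance) fit; smaller h tracks the data more closely.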
Nonparametric Regression: N-W Local Mean Smoother
Nonparametric Regression: N-W Local Mean Smoother (Bandwidth)
• Can estimate h from the data by cross-validation (or GCV)
Nonparametric Regression: Kernel Estimators
• Can improve performance by replacing the local mean with local regression fits (line/parabola)
• Fits a line or parabola locally, with weights determined by the kernel function
• Local linear smoother:
  f̂(t) = least-squares solution to min_{b0,b1} Σ_{i=1}^N [yi − {b0 + b1(ti − t)}]² w(t − ti; h)
• Local quadratic smoother:
  f̂(t) = least-squares solution to min_{b0,b1,b2} Σ_{i=1}^N [yi − {b0 + b1(ti − t) + b2(ti − t)²}]² w(t − ti; h)
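The local linear smoother can be sketched as a weighted least-squares fit at each target point. This is an illustrative implementation, not from the slides; the kernel and bandwidth are arbitrary choices.

```python
import numpy as np

def local_linear(t0, t, y, h):
    """Local linear smoother: at each target point tm, fit b0 + b1*(t_i - tm)
    by weighted least squares with Gaussian kernel weights w(tm - t_i; h).
    The intercept b0 is the fitted value f_hat(tm)."""
    t0 = np.atleast_1d(np.asarray(t0, dtype=float))
    out = np.empty(t0.size)
    for m, tm in enumerate(t0):
        w = np.exp(-0.5 * ((tm - t) / h) ** 2)
        X = np.column_stack([np.ones_like(t), t - tm])
        WX = X * w[:, None]
        b = np.linalg.solve(X.T @ WX, WX.T @ y)  # weighted least-squares solution
        out[m] = b[0]                            # intercept = fit at tm
    return out

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size)
fhat = local_linear(t, t, y, h=0.05)
```

Replacing the local mean with a local line reduces boundary bias; adding a quadratic term handles curvature better in the interior.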
Nonparametric Regression: Local Quadratic Smoother
Nonparametric Regression: Local Quadratic Smoother (Bandwidth)
Nonparametric Regression: Smoothing Splines
• Another way to view the nonparametric regression problem is as penalized regression; minimize L:
  L = N⁻¹ Σ_{i=1}^N {yi − f(ti)}² + α ∫ f''(t)² dt
• It can be shown that this criterion is minimized by a cubic smoothing spline with knots at the values ti:
– Piecewise cubic polynomial between knots
– Two continuous derivatives
– Third derivative: step function with change points at the knots
• Smoothing parameter: α (can be estimated from the data)
Nonparametric Regression: Regression Splines
• An easier-to-fit alternative to smoothing splines is regression splines, which involve polynomial regression in the intervals between a chosen set of p knots κ = {κ1, κ2, …, κp}:
  minimize Σ_{i=1}^N (yi − Xi b)²
• Cubic regression spline design matrix:
  X = [ 1  t1  t1²  t1³  (t1−κ1)³+  …  (t1−κp)³+ ]
      [ ⋮   ⋮    ⋮    ⋮       ⋮      ⋱       ⋮    ]
      [ 1  tN  tN²  tN³  (tN−κ1)³+  …  (tN−κp)³+ ]
• where (t−κ)³+ = (t−κ)³ if (t−κ) > 0, 0 otherwise
• Knots frequently chosen to be at quantiles of t
– Can be shown that 7–8 knots are sufficient for most smooth functions
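The truncated power basis and the least-squares fit can be sketched as follows. The knot placement (7 interior quantiles) follows the slide's suggestion; the test function is an arbitrary illustration.

```python
import numpy as np

def cubic_spline_design(t, knots):
    """Truncated power basis for a cubic regression spline:
    columns 1, t, t^2, t^3, then (t - kappa_m)^3_+ for each knot."""
    t = np.asarray(t, dtype=float)
    poly = np.vander(t, 4, increasing=True)                    # 1, t, t^2, t^3
    trunc = np.clip(t[:, None] - np.asarray(knots)[None, :], 0, None) ** 3
    return np.hstack([poly, trunc])

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.2, t.size)
knots = np.quantile(t, np.linspace(0, 1, 9)[1:-1])             # 7 knots at quantiles of t
X = cubic_spline_design(t, knots)
b, *_ = np.linalg.lstsq(X, y, rcond=None)                      # minimize sum (y - Xb)^2
fhat = X @ b
```

With 4 polynomial columns plus 7 truncated cubics, the fit has 11 degrees of freedom; that is usually enough for a smooth target, but with many more knots the unpenalized fit becomes wiggly, motivating the next slide.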
Nonparametric Regression: Penalized Regression Splines
• Problem: wiggly fits because there is no penalty (overfitting)
• Solution: penalize Σ bj² for the spline terms
• P-splines (Eilers and Marx; Ruppert, Wand and Carroll):
  minimize Σ_{i=1}^N (yi − Xi b)² + λ² b'Db
• where D is a diagonal matrix with 1's corresponding to the "spline" terms, and 0's to the "polynomial" terms
• Smoothing parameter: λ
• Solution is a ridge regression estimator:
  ŷ = X(X'X + λ²D)⁻¹X'y
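The ridge estimator above is a one-line change from the unpenalized fit. This sketch uses a deliberately generous set of 20 knots, which would overfit without the penalty; the λ value is an arbitrary illustrative choice, not tuned.

```python
import numpy as np

def pspline_fit(t, y, knots, lam):
    """Penalized regression spline: yhat = X (X'X + lam^2 D)^{-1} X' y,
    where D is diagonal with 0's on the polynomial terms and 1's on the
    truncated-power (spline) terms, so only the latter are shrunk."""
    poly = np.vander(t, 4, increasing=True)
    trunc = np.clip(t[:, None] - np.asarray(knots)[None, :], 0, None) ** 3
    X = np.hstack([poly, trunc])
    D = np.diag([0.0] * 4 + [1.0] * len(knots))  # penalize spline terms only
    beta = np.linalg.solve(X.T @ X + lam**2 * D, X.T @ y)
    return X @ beta

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.2, t.size)
knots = np.quantile(t, np.linspace(0, 1, 22)[1:-1])  # 20 knots: wiggly if unpenalized
fhat = pspline_fit(t, y, knots, lam=0.1)
```

In practice λ would be chosen by cross-validation or, per the next slide, estimated as a variance component by REML.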
Nonparametric Regression: Penalized Regression Splines
• Can be represented by a linear mixed model!
  Y = Xb + Zu + e,  u ~ N(0, λ²),  e ~ N(0, σe²)
  X = [ 1  t1  t1²  t1³ ; ⋮ ; 1  tN  tN²  tN³ ]
  Z = [ (t1−κ1)³+ … (t1−κp)³+ ; ⋮ ; (tN−κ1)³+ … (tN−κp)³+ ]
• Spline coefficient estimators are BLUPs from the LMM!
– Linearly shrunken towards zero according to the relative sizes of λ² and σe²; equivalent to a mean-zero Bayesian prior on u
• Smoothing parameter is a variance component!
– Estimated from the data using REML
• Can also be fit as a Bayesian hierarchical model
Nonparametric Regression:
• What if we have a spatially heterogeneous function?
Nonparametric Regression: Spatially Adaptive Function Estimation
• What if we have spatially heterogeneous data?
– Different degrees of smoothness at different parts of the curve
• We need spatially adaptive smoothers (regularization rather than smoothness)
• Kernels can be spatially adaptive
– Spatially varying bandwidth
• Splines can be spatially adaptive
– Vary knot locations (free-knot splines)
– Spatially varying smoothing parameter
• Wavelet regression: has been shown to adjust optimally to varying smoothness (Fan, et al. 1993)
Wavelets: Fourier Analysis
• Any function y in L²[−π,π] can be represented by a Fourier series:
  y(t) = a0/(2π) + Σ_{j=1}^∞ { aj cos(jt)/π + bj sin(jt)/π }
  a0 = ⟨y, 1⟩ = ∫_{−π}^{π} y(t) dt
  aj = ⟨y, cos(jt)⟩ = ∫_{−π}^{π} y(t) cos(jt) dt
  bj = ⟨y, sin(jt)⟩ = ∫_{−π}^{π} y(t) sin(jt) dt
• Coefficients (aj, bj) describe the behavior of the function at frequency j
• {1/√(2π), cos(jt)/√π, sin(jt)/√π; j = 1, …, ∞} form a complete orthonormal basis for L²[−π,π]:
  ⟨fj, fk⟩ = I(j=k)
• This set of functions spans L²[−π,π]
Wavelets: Fourier Analysis
• Fourier series also have a complex representation:
  y(t) = Σ_{j=−∞}^∞ dj ψj(t),  for ψj(t) = e^{ijt}/√(2π) = cos(jt)/√(2π) + i sin(jt)/√(2π)
  dj = ⟨y, ψj⟩ = ∫_{−π}^{π} y(t) e^{−ijt}/√(2π) dt
• {ψj, j = −∞, …, ∞} form a complete orthonormal basis for L²[−π,π]
• Discrete Fourier analysis: for a signal defined on an equally spaced grid of size T, a fast algorithm {FFT, O(T log T)} is available for computing the Fourier coefficients from the observed signal. (Transforms data to the frequency domain: T → T)
• Fourier analysis is useful for signal processing, but has an important limitation: all location information is lost in the frequency domain
– Useful for processing stationary signals, but not so much for nonstationary ones
– What if we had a set of basis functions that decomposed information in both the frequency and time domains? We do! Wavelets!!
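The discrete Fourier analysis step, and its limitation, can be seen in a few lines of numpy (an illustrative example; the two frequencies are arbitrary choices):

```python
import numpy as np

# Discrete Fourier analysis via the FFT: O(T log T) transform of a
# T-point signal to the frequency domain, and exact reconstruction back.
T = 256
t = np.arange(T) / T
y = np.sin(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 12 * t)

coef = np.fft.fft(y)              # T complex Fourier coefficients
y_back = np.fft.ifft(coef).real   # inverse transform recovers the signal

# Energy concentrates at frequencies 5 and 12, but the coefficients carry
# no information about *where* in time a feature occurs -- the limitation
# that wavelets address.
power = np.abs(coef[: T // 2])
```

Shifting a transient feature in time changes only the phases of the coefficients, not this power spectrum, which is why purely frequency-domain methods struggle with nonstationary signals.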
Wavelets
  y(t) = Σ_{j,k∈ℑ} djk ψjk(t),  ψjk(t) = 2^{j/2} ψ(2^j t − k)
  djk = ⟨y, ψjk⟩ = ∫_ℜ y(t) ψjk(t) dt
• Coefficient djk summarizes the behavior of the function at scale (frequency) indexed by j and location indexed by k
• {ψjk, j,k = −∞, …, ∞} form a complete orthonormal basis for L²(ℜ)
• The mother wavelet basis function ψ has several properties:
– ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1
– It has a corresponding father wavelet ϕ (also called a scaling function) such that ⟨ϕ,ψ⟩ = ∫ϕ(t)ψ(t)dt = 0, ∫ϕ(t)dt = 1, ∫ϕ²(t)dt = 1
– It has compact or vanishing support (unlike Fourier bases)
– It generates a multiresolution analysis
Wavelets: Multiresolution Analysis
• Let Vj, j = …, −2, −1, 0, 1, 2, … be a sequence of function subspaces in L²(ℜ). The collection of spaces {Vj, j∈ℑ} is called a multiresolution analysis with scaling function ϕ if these conditions hold:
– Orthonormal basis: the function ϕ belongs to V0 and the set {ϕ(t−k), k∈ℑ} is an orthonormal basis for V0
– Nested: Vj ⊂ Vj+1
– Density: ∪Vj is dense in L²(ℜ)
– Separation: ∩Vj = {0}
– Scaling: the function f(t) belongs to V0 iff the function f(2^j t) belongs to Vj
Thus, {2^{j/2}ϕ(2^j t − k), k∈ℑ} is an orthonormal basis for Vj
• Note the following properties also hold:
– Wj = space spanned by {2^{j/2}ψ(2^j t − k), k∈ℑ}
– Vj ⊕ Wj = Vj+1, so one can think of Wj as a "residual space"
Wavelets: Haar Wavelet Example
• The simplest and oldest wavelet is the Haar wavelet:
  ϕ(t) = I_[0,1)(t)
  ψ(t) = 1 for t ∈ [0, 0.5), −1 for t ∈ [0.5, 1), 0 otherwise
• ∫ϕ(t)ψ(t)dt = 0, ∫ϕ(t)dt = 1, ∫ϕ²(t)dt = 1, ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1
• V0 = space spanned by {ϕ(t−k), k∈ℑ} (piecewise constant, jumps at the integers)
• W0 = space spanned by {ψ(t−k), k∈ℑ}
• V1 = space spanned by {2^{1/2}ϕ(2t−k), k∈ℑ} (piecewise constant, jumps at the half-integers)
• V0 ⊂ V1,  V1 = V0 ⊕ W0,  ∪Vj dense in L²(ℜ)
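The listed integral properties of the Haar pair can be checked numerically, approximating each integral by a Riemann sum on a fine midpoint grid (an illustrative check, not from the slides):

```python
import numpy as np

# Midpoint grid on [0, 1) and the Haar father/mother wavelets on it
n = 10000
t = (np.arange(n) + 0.5) / n
dt = 1.0 / n
phi = np.ones_like(t)                  # phi(t) = I_[0,1)(t)
psi = np.where(t < 0.5, 1.0, -1.0)     # psi(t) = 1 on [0,.5), -1 on [.5,1)

orth = (phi * psi).sum() * dt          # int phi(t) psi(t) dt  (should be 0)
mass_phi = phi.sum() * dt              # int phi(t) dt          (should be 1)
norm_phi = (phi**2).sum() * dt         # int phi^2(t) dt        (should be 1)
mass_psi = psi.sum() * dt              # int psi(t) dt          (should be 0)
norm_psi = (psi**2).sum() * dt         # int psi^2(t) dt        (should be 1)
```

The vanishing integral of ψ (its "zeroth moment") is what makes the wavelet coefficients of a locally constant signal zero, which drives the sparsity exploited later.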
Wavelets: Haar Wavelet Example
• Consider L²{[0,1)}:
  ϕ00(t) = I_[0,1)(t),  ψ00(t) = 1 on [0, 0.5), −1 on [0.5, 1), 0 otherwise
  ϕ10(t) = 2^{1/2} I_[0,0.5)(t),  ψ10(t) = 2^{1/2} on [0, 0.25), −2^{1/2} on [0.25, 0.5), 0 otherwise
  ϕ11(t) = 2^{1/2} I_[0.5,1)(t),  ψ11(t) = 2^{1/2} on [0.5, 0.75), −2^{1/2} on [0.75, 1), 0 otherwise
  ϕ20(t) = 2 I_[0,0.25)(t),  ψ20(t) = 2 on [0, 0.125), −2 on [0.125, 0.25), 0 otherwise
  ϕ21(t) = 2 I_[0.25,0.5)(t),  ψ21(t) = 2 on [0.25, 0.375), −2 on [0.375, 0.5), 0 otherwise
  ϕ22(t) = 2 I_[0.5,0.75)(t),  ψ22(t) = 2 on [0.5, 0.625), −2 on [0.625, 0.75), 0 otherwise
  ϕ23(t) = 2 I_[0.75,1)(t),  ψ23(t) = 2 on [0.75, 0.875), −2 on [0.875, 1), 0 otherwise
Wavelets: Haar Wavelet Example
• Example data vector of length T = 8:
  y = (0.1, 0.0, 0.2, 0.6, 0.0, 0.2, 0.1, 0.3)'
[Figures: the data vector y with the Haar basis functions]
Wavelets: Haar Wavelet Example
• The transform maps the data vector to the full set of wavelet coefficients:
  y' = (0.1, 0.0, 0.2, 0.6, 0.0, 0.2, 0.1, 0.3)
  d' = (c00, d00, d10, d11, d20, d21, d22, d23) = (1.5, 0.3, −0.99, −0.28, 0.2, −0.8, −0.4, −0.4)
• Full-rank linear transformation from data space to wavelet space
• Can be written y = dW, where W is the orthonormal wavelet transform matrix, with rows given by the basis functions evaluated on the grid:
  W = [ ϕ00(t1) … ϕ00(t8) ; ψ00(t1) … ψ00(t8) ; … ; ψ23(t1) … ψ23(t8) ]
• W'W = WW' = I
• Thus yW' = dWW', so d = yW'
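The example can be reproduced numerically. The data vector and matrix below are my reconstruction of the slides' garbled tables: rows of W are the Haar basis functions of the previous slides evaluated at the 8 grid points. Note that, as printed on the slides, the rows have squared norm T = 8, so it is W/√8 that is exactly orthonormal.

```python
import numpy as np

s = np.sqrt(2.0)
# Rows: phi00, psi00, psi10, psi11, psi20, psi21, psi22, psi23 on the 8-point grid
W = np.array([
    [1,  1,  1,  1,  1,  1,  1,  1],   # phi00
    [1,  1,  1,  1, -1, -1, -1, -1],   # psi00
    [s,  s, -s, -s,  0,  0,  0,  0],   # psi10
    [0,  0,  0,  0,  s,  s, -s, -s],   # psi11
    [2, -2,  0,  0,  0,  0,  0,  0],   # psi20
    [0,  0,  2, -2,  0,  0,  0,  0],   # psi21
    [0,  0,  0,  0,  2, -2,  0,  0],   # psi22
    [0,  0,  0,  0,  0,  0,  2, -2],   # psi23
])
y = np.array([0.1, 0.0, 0.2, 0.6, 0.0, 0.2, 0.1, 0.3])
d = y @ W.T   # d = y W': matches the coefficients listed on the slide
```

With this scaling, WW' = 8I rather than I; dividing W by √8 gives the orthonormal version without changing which coefficients are large or small.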
Wavelets: Haar Wavelet Example
• Stepping through the decomposition levels of the example vector:
  Level 0: d(0) = (c30, c31, …, c37)' = y = (0.1, 0.0, 0.2, 0.6, 0.0, 0.2, 0.1, 0.3)'
  Level 1: d(1) = (c20, c21, c22, c23, d20, d21, d22, d23)' = (0.2, 1.6, 0.4, 0.8, 0.2, −0.8, −0.4, −0.4)'
  Level 2: d(2) = (c10, c11, d10, d11, d20, d21, d22, d23)' = (1.27, 0.85, −0.99, −0.28, 0.2, −0.8, −0.4, −0.4)'
  Level 3 (full): d(3) = (c00, d00, d10, d11, d20, d21, d22, d23)' = (1.5, 0.3, −0.99, −0.28, 0.2, −0.8, −0.4, −0.4)'
Wavelets: DWT Pyramid Algorithm
• Consider a function y(t) observed on an equally spaced grid of size T = 2^J
• Decomposition level: 0, 1, 2, …, J (full)
  y = cJ → (cJ−1, dJ−1) → (cJ−2, dJ−2, dJ−1) → … → (c0, d0, d1, …, dJ−1)
• At each level, a smooth (s) filter produces the coarser scaling coefficients c and a detail (d) filter produces the wavelet coefficients d
Wavelets: Filter Coefficients
• Note: because of the special construction of the wavelet coefficients, each inner product only involves multiplication of a portion of the data vector by [2^{−1/2}, 2^{−1/2}] or [2^{−1/2}, −2^{−1/2}]. These are called the Haar filter coefficients.
• Thus, the matrix W can be generated by convolutions of these filter coefficients within a banded structure. This allows the pyramid-based algorithm to run very quickly, in O(T). (DWT/IDWT)
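The pyramid algorithm with the orthonormal Haar filters can be sketched directly: each level pairs adjacent entries, applying the low-pass filter [2^{−1/2}, 2^{−1/2}] and the high-pass filter [2^{−1/2}, −2^{−1/2}]. The example vector here is an arbitrary choice.

```python
import numpy as np

def haar_dwt(y):
    """Pyramid Haar DWT: returns (c0, [d0, d1, ..., d_{J-1}]) in O(T)."""
    c = np.asarray(y, dtype=float)
    details = []
    while c.size > 1:
        details.append((c[0::2] - c[1::2]) / np.sqrt(2))  # high-pass: detail coefs
        c = (c[0::2] + c[1::2]) / np.sqrt(2)              # low-pass: smooth coefs
    return c, details[::-1]                               # coarsest detail first

def haar_idwt(c, details):
    """Inverse pyramid: undo one level at a time, coarsest first."""
    c = np.asarray(c, dtype=float)
    for d in details:
        out = np.empty(2 * c.size)
        out[0::2] = (c + d) / np.sqrt(2)
        out[1::2] = (c - d) / np.sqrt(2)
        c = out
    return c

y = np.array([1.0, 0.0, 2.0, 6.0, 0.0, 2.0, 1.0, 3.0])
c0, details = haar_dwt(y)
y_back = haar_idwt(c0, details)
```

Each level halves the number of smooth coefficients, so the total work is T + T/2 + T/4 + … < 2T, i.e. O(T); and because the filters are orthonormal, the transform preserves the energy of the signal.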
Wavelets: Daubechies Family
• Properties of Haar wavelets:
– ∫ϕ(t)ψ(t)dt = 0, ∫ϕ(t)dt = ∫ϕ²(t)dt = 1, ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1, MRA
– Orthonormal transform (y = dW for orthogonal W)
– W determined by two "filter coefficients": [2^{−1/2}, 2^{−1/2}]
– Has compact support on [0,1)
– Vanishing "zeroth" moment: ∫t⁰ψ(t)dt = ∫ψ(t)dt = 0
• Other wavelet bases:
– Suppose we want a wavelet with orthonormality and compact support, but also a vanishing 1st moment: ∫tψ(t)dt = 0
– Is there a wavelet/scaling function pair with these properties?
Wavelets: Daubechies Family
• Yes: the Daubechies N = 2 wavelet
• No closed-form expression
• Determined recursively by just 4 filter coefficients:
  [ (1+√3)/(4√2), (3+√3)/(4√2), (3−√3)/(4√2), (1−√3)/(4√2) ]
• Compact support on [0,3)
Wavelets: Daubechies Family
• Daubechies family of wavelets (Daub 1988): indexed by number of vanishing moments (N)
– All Daub wavelets have the properties:
• ∫ϕ(t)ψ(t)dt = 0
• ∫ϕ(t)dt = ∫ϕ²(t)dt = 1
• ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1
• Multiresolution analysis (MRA)
• Orthonormal transform (y = dW for orthogonal W)
– Determined by L = 2N filter coefficients
– Has compact support on [0, 2N−1)
– N vanishing moments: ∫t^a ψ(t)dt = 0 for a = 0, 1, 2, …, N−1
Wavelets: Daubechies Family
[Figure: Daubechies family wavelets]
Wavelets: Coiflet Family
• What if we also require vanishing moments for the scaling function ϕ?
• Coiflet family (Daubechies 1992, for R. Coifman), indexed by N
• All Coiflet wavelets have the properties:
– ∫ϕ(t)ψ(t)dt = 0
– ∫ϕ(t)dt = ∫ϕ²(t)dt = 1
– ∫ψ(t)dt = 0, ∫ψ²(t)dt = 1
– Multiresolution analysis (MRA)
– Orthonormal transform (y = dW for orthogonal W)
• Completely determined by L = 6N filter coefficients
• Has compact support on [0, 6N−1)
• Wavelets: 2N vanishing moments: ∫t^a ψ(t)dt = 0 for a = 0, 1, 2, …, 2N−1
• Scaling: 2N−1 vanishing moments: ∫t^a ϕ(t)dt = 0 for a = 1, 2, …, 2N−1 (with ∫ϕ(t)dt = 1)
Wavelets: Coiflet Family
[Figure: Coiflet family wavelets]
Wavelets: Summary
• Linear representation:
  y(t) = Σ_{j,k∈ℑ} djk ψjk(t),  ψjk(t) = 2^{j/2}ψ(2^j t − k),  djk = ∫ y(t) ψjk(t) dt
  y_{1×T} = d_{1×T} W_{T×T},  d_{1×T} = y_{1×T} W'_{T×T}
• Given a T-vector y consisting of a function sampled on an equally spaced grid, a pyramid-based algorithm for the DWT (Mallat) can be used to obtain d, the T-vector of wavelet coefficients for the given family, in O(T) operations (converse also true)
• We never need to compute or think about ψ or ϕ, since we can apply the DWT using only the wavelet family's filter coefficients
Wavelets: DWT Pyramid Algorithm
• Consider a function y(t) observed on an equally spaced grid of size T = 2^J
• Decomposition level: 0, 1, 2, …, J (full)
  y = cJ → (cJ−1, dJ−1) → (cJ−2, dJ−2, dJ−1) → … → (c0, d0, …, dJ−1)
• DWT can be computed in O(T); also the IDWT
• Note: the low-pass (s) and high-pass (d) filters are determined solely by the wavelet's filter coefficients
Wavelets: Practical Issues
• T not a power of 2?
– Adjustments to the DWT possible (Percival & Walden 2000)
• Choice of wavelet basis: match data features
– Piecewise constant (aCGH): Haar
– Smoother functions: Daub N with higher N
– Symmetric features?: Symmlets or Coiflets
– Tradeoff: smoothness vs. filter length
• Boundary correction issues (all but Haar)
– Periodic, pad with zeros, reflection
• J = number of levels of decomposition
– Full decomposition: J = log2(T)
– Choose J: floor(K_{J−1}/2) > L−1 (at least one coefficient in the middle unaffected by boundary conditions, PW2000)
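One simple device for the first two issues, sketched below under my own assumptions (not from the slides): extend the series to the next power of 2 before the DWT, using reflection padding, which avoids the artificial jump at the boundary that zero-padding can create.

```python
import numpy as np

y = np.arange(300, dtype=float)    # T = 300, not a power of 2
T = y.size
T2 = 1 << (T - 1).bit_length()     # next power of 2 (here 512)

# 'symmetric' reflects the series about its endpoint, so the padded
# values continue the signal smoothly instead of dropping to zero.
y_pad = np.pad(y, (0, T2 - T), mode='symmetric')
```

After the analysis, one simply discards the estimates on the padded portion; dedicated boundary-handling DWT variants (Percival & Walden 2000) are preferable when the padding would dominate.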
Wavelets: Properties
• Very fast calculation: O(T)
• Compact support: good for spatially heterogeneous data
• Signal compression: concentration on few d
• Orthogonality: distributes white noise across d
• Whitening: autocorrelation of wavelet coefficients tends to die away rapidly across k; djk approximately uncorrelated (Johnstone and Silverman 1997)
• Time/frequency decomposition: key to adaptive smoothing
Wavelets: Wavelet Regression
• Wavelet regression: row vector y is the response on an equally spaced grid t (length T)
  y = g(t) + ε,  ε ~ N(0, σ²I_T)
1. Project the data into wavelet space using the DWT: d = yW', where W' is the orthogonal DWT matrix
  Recall: d = {djk, j = 1, …, J; k = 1, …, Kj}
  yW' = g(t)W' + εW'
  d = θ + ε*, with θ = g(t)W' = {θjk: j = 1, …, J; k = 1, …, Kj}
  By the sparsity property, we expect few |θjk| > 0
  Note: ε* = εW' ~ N(0, Wσ²I_T W') ~ N(0, σ²WW') ~ N(0, σ²I_T), since WW' = I_T (orthogonality)
2. Estimate θ by hard thresholding: θ̂jk = djk · I(|djk| > δσ), where δ = √(2 log T) is the universal threshold (DJ 1994)
3. Project back to data space using the IDWT: ĝ(t) = θ̂W
• Yields an adaptive, regularized nonparametric estimate of g(t)
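The three steps can be sketched end to end. This is an illustration, not the slides' code: a plain Haar DWT stands in for whatever basis one would actually choose, the test function is arbitrary, and σ is taken as known (in practice it is estimated, e.g. from the MAD of the finest-scale coefficients).

```python
import numpy as np

def haar_dwt(y):
    c = np.asarray(y, dtype=float)
    details = []
    while c.size > 1:
        details.append((c[0::2] - c[1::2]) / np.sqrt(2))
        c = (c[0::2] + c[1::2]) / np.sqrt(2)
    return c, details[::-1]

def haar_idwt(c, details):
    c = np.asarray(c, dtype=float)
    for d in details:
        out = np.empty(2 * c.size)
        out[0::2] = (c + d) / np.sqrt(2)
        out[1::2] = (c - d) / np.sqrt(2)
        c = out
    return c

rng = np.random.default_rng(4)
T = 512
t = np.arange(T) / T
g = np.where(t < 0.5, 0.0, 2.0) + t               # piecewise-smooth truth with a jump
y = g + rng.normal(0, 0.3, T)

# Step 1: DWT.  Step 2: hard threshold with the universal threshold.
c0, details = haar_dwt(y)
sigma = 0.3                                        # assumed known for this sketch
delta = np.sqrt(2 * np.log(T))                     # universal threshold (DJ 1994)
thresh = [d * (np.abs(d) > delta * sigma) for d in details]
# Step 3: IDWT back to the data space.
ghat = haar_idwt(c0, thresh)
```

Because the jump in g is captured by a few large coefficients that survive the threshold, the estimate adapts to the discontinuity instead of smoothing across it.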
Wavelets: Wavelet Regression
[Figures: a spiky test function fit by wavelet regression with Donoho and Johnstone's universal threshold; local linear regression on the same spiky function; local quadratic regression with the maximum h that preserves the spike]
Wavelets: Wavelet Regression• Hard vs. Soft Thresholding
– Hard thresholding: θjk
=djk
*I(|djk
|>δσ2)– Soft thresholding: θjk
=sgn(djk
)×(|djk
|-δσ2)+– Hard: preserves peak heights better at cost of smoothness
• Universal vs. Level Dependent Thresholds– Level dependent thresholds δσj2
perform even better
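The two rules differ only in how surviving coefficients are treated, as this small sketch (illustrative values) makes concrete:

```python
import numpy as np

def hard_threshold(d, lam):
    """Keep coefficients whose magnitude exceeds lam; zero the rest."""
    return d * (np.abs(d) > lam)

def soft_threshold(d, lam):
    """Shrink magnitudes toward zero by lam: sgn(d) * (|d| - lam)_+."""
    return np.sign(d) * np.clip(np.abs(d) - lam, 0, None)

d = np.array([-3.0, -0.5, 0.2, 1.1, 4.0])
lam = 1.0
hard = hard_threshold(d, lam)   # large coefficients kept at full height
soft = soft_threshold(d, lam)   # every survivor pulled toward zero by lam
```

Hard thresholding leaves the coefficient 4.0 untouched (preserving peak height), while soft thresholding shrinks it to 3.0, trading a little bias for a smoother, lower-variance fit.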
Wavelets: Wavelet Regression
• Bayesian hierarchical approach: a prior distribution on θjk can mimic the idea of thresholding
  djk ~ N(θjk, σ²)
  θjk ~ γjk N(0, τj²σ²) + (1 − γjk) δ0
  γjk ~ Bernoulli(πj)
• τj² and πj are regularization parameters
– Can be specified, or estimated from the data using ML in an empirical Bayes fashion (Clyde and George 2000)
• Conjugate model (given σ²), so closed form for (θjk | djk, σ²)
Wavelets: Wavelet Regression
• Posterior distribution of each wavelet coefficient:
  (θjk | djk, σ²) = Pjk N(SFj djk, SFj σ²) + (1 − Pjk) δ0
  Posterior odds: Ojk = {πj/(1 − πj)} (1 + τj²)^{−1/2} exp{SFj djk²/(2σ²)}  (prior odds × Bayes factor)
  SFj = τj²/(τj² + 1) = "linear shrinkage factor"
  Pjk = Pr(γjk = 1 | djk) = Ojk/(1 + Ojk) = "nonlinear shrinkage factor"
• Possible estimators:
  θ̂jk,hard_thresh = djk I(Pjk > δ)
  θ̂jk,soft_thresh = djk SFj I(Pjk > δ)
  θ̂jk,shrinkage = djk SFj Pjk
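The posterior-mean (shrinkage) rule above can be sketched directly from these formulas; the parameter values match the example settings on the following slides, and the coefficient values are arbitrary illustrations.

```python
import numpy as np

def posterior_shrinkage(d, sigma2, tau2, pi):
    """Spike-and-slab posterior shrinkage of a wavelet coefficient:
    theta_hat = P * SF * d, with linear shrinkage factor SF and
    nonlinear shrinkage factor P = Pr(gamma = 1 | d)."""
    SF = tau2 / (tau2 + 1.0)                        # linear shrinkage factor
    O = (pi / (1 - pi)) * (1 + tau2) ** -0.5 \
        * np.exp(SF * d**2 / (2 * sigma2))          # prior odds x Bayes factor
    P = O / (1 + O)                                 # nonlinear shrinkage factor
    return P * SF * d

d = np.array([0.1, 1.0, 5.0])
theta_hat = posterior_shrinkage(d, sigma2=1.0, tau2=10.0, pi=0.2)
```

Small coefficients are shrunk essentially to zero (P ≈ 0), while large ones are kept nearly intact apart from the mild linear factor SF, so the rule behaves like a smooth, data-driven threshold.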
Wavelets: Wavelet Regression
[Figures: the hard-thresholding (djk I(Pjk > δ)), soft-thresholding (djk SFj I(Pjk > δ)), and shrinkage (djk SFj Pjk) rules as functions of djk for σ² = 1, τ² = 10, π = 0.20, δ = 0.50; and the shrinkage, linear (djk SFj), and nonlinear (djk Pjk) rules for σ² = 1, π = 0.20 with τ² = 10, 5, and 2]