Upload
nancy-mcgee
View
217
Download
0
Tags:
Embed Size (px)
Citation preview
1
Reproducing Kernel Exponential Manifold: Estimation and Geometry
Kenji FukumizuInstitute of Statistical Mathematics, ROIS
Graduate University of Advanced Studies
Mathematical Explorations in Contemporary Statistics
May 19-20, 2008. Sestri Levante, Italy
2
Outline Introduction
Reproducing kernel exponential manifold (RKEM)
Statistical asymptotic theory of singular models
Concluding remarks
3
Introduction
4
Maximal Exponential Manifold Maximal exponential manifold (P&S‘95)
A Banach manifold is defined so that the cumulant generating function is well-defined on a neighborhood of each probability density.
Orlicz space Lcosh-1(f)
This space is (perhaps) the most general to guarantee the finiteness of the cumulant generating functions around a point.
,))(exp( fuuf fu ][log)( uff eEu
][and][s.t.0| uf
uf eEeEu
5
Estimation with Data
Estimation with a finite sample A finite dimensional exponential family is suitable for the maximum
likelihood estimation (MLE) with a finite sample.
MLE: that maximizes
Is MLE extendable to the maximal exponential manifold?
But, the function value u(Xi) is not a continuous functional on u in the exponential manifold.
n
i
m
a iaa
nn Xun 1
1)()(
1)X;(
01 :,, fXX n ~i.i.d.
n
ifinn uXu
nu
1
)()(1
)X;(
nn XX ,,X 1
6
Reproducing kernel exponential manifold
7
Reproducing Kernel Hilbert Space Reproducing kernel Hilbert space (RKHS)
: set. A Hilbert space H consisting of functions on is called a reproducing kernel Hilbert space (RKHS) if the evaluation functional
is continuous for each
A Hilbert space H consisting of functions on is a RKHS if and only if there exists (reproducing kernel) such that
(by Riesz’s lemma)
H ),( xk
)(),,( xffxk H ., xf H
)(,: xffex RH
.x
Reproducing Kernel Hilbert Space II Positive definite kernel and RKHS
A symmetric kernel k: x R is said to be positive definite, if for any and
Theorem (construction of RKHS)
If k: x R is positive definite, there uniquely exists a RKHS Hk on such that
(1) for all
(2) the linear hull of is dense in Hk ,
(3) is a reproducing kernel of Hk, i.e.,
8
nxx ,,1
,0),(1, n
ji jiji xxkcc
,,,1 Rncc
}|),({ xxk
,xH ),( xk
),( xk
)(),,( xffxkk
H ., xf kH
9
Reproducing Kernel Hilbert Space III Some properties
If the pos. def. kernel k is of Cr, so is every function in Hk.
If the pos. def. kernel k is bounded, so is every function in Hk.
Examples: positive definite kernels on Rm
Euclidean inner product
Gaussian RBF kernel
Polynomial kernel
yxyxk T),(
dT cyxyxk )(),( ),0( N dc
22exp),( yxyxk
Hk = {polyn. deg d}≦
dim Hk = ∞
mk RH
10
Exponential Manifold by RKHS Definitions
: topological space. : Borel probability measure on s.t. supp = .
k : continuous pos. def. kernel on such that Hk contains 1 (constants).
Note: If || u || < ,
Tangent space
,1),(0)( ,:::)( fdxxff|fkM continuousR
)()(,0 ),( xdxfe xxk
0)]([|: XuEuT fkf H closed subspace of Hk
M(k) is provided with a Hilbert manifold structure.
.][][][ ),(||||),(,)( XXkuf
Xkuf
Xuf eEeEeE
11
Exponential Manifold by RKHS II Local coordinate
For
Then, for any
Define
Lemma
(1) Wf is an open subset of Tf.
(2)
fXXkXu
fff TeETuW ][,0|: ),()( ),(kMf
fWu
).())(exp(: kMfuuf fu
uff fukMW ),(: (one-to-one)1,: fffff WS
)(: fff WE
works as a local coordinate
.gffg EEE
12
Exponential Manifold by RKHS III Reproducing Kernel Exponential Manifold (RKEM)
Theorem. The system is a -atlas of M(k).
A structure of Hilbert manifold is defined on M(k) with Riemannian metric Ef[uv]. Likelihood functional is continuous. The function u(x) is decoupled in the inner product
u: natural coordinate, : sufficient statistics The manifold depends on the choice of k.
e.g = R, = N(0,1), k(x,y) = (xy+1)2. Hk = {polyn. deg ≦ 2}
M(k) = {N(m, ) | m ∈ R, > 0 } : the normal distributions.
)(
),(kMfff
E C
g
fuuE
g
fuuu f
gf
fg
))(exp(log
))(exp(log)(1
g
fuE
g
fu g loglog
coordinate transform
),(, xku ),( xk
13
Mean parameter of RKEM
Mean parameter For any there uniquely exists such that
The mean parameter does not necessarily give a coordinate, as in the case of the maximal exponential manifold.
Empirical mean parameter X1, …, Xn: i.i.d. sample ~ f.
),(kMf kfm H
kff muXuE
H,)]([ for all .ku H
n
iin Xk
nm
1
),(1
:ˆ
)(1ˆ nnOmm pfnkH
Fact 1.
Empirical mean parameter:
)()(1
,ˆ1
k
n
iin fXf
nfm H
Fact 2.
14
Applications of RKEM Maximum likelihood estimation (IGAIA2005)
Maximum likelihood estimation with regularization is possible. The consistency of the estimator is proved.
Statistical asymptotic theory of singular models There are examples of statistical model which is a submodel
of an infinite dimensional exponential family, but not embeddable into a finite dimensional exponential family.
For a submodel of RKEM, developing asymptotic theory of the maximum likelihood estimator is easy.
Geometry of RKEM Dual connections can be introduced on the tangent
bundle in some cases.
)1(
15
Statistical asymptotic theory of singular models
16
Singular Submodel of exponential family Standard asymptotic theory
Statistical model on a measure space (,B,).
: (finite dimensional) manifold.
“True” density: f0(x) = f(x ;0)
Maximum likelihood estimator (MLE)
Under some regularity conditions,
Likelihood ratio
01 :,, fXX n ~i.i.d.
}|);({ xf
n
iin Xf
1
);(logmaxargˆ
)( 0
))( (0,)ˆ( 100
INn n in law )( n f0MLE
d-dim smooth manifold
Asymptotically normal
2
1 0 );(
)ˆ;(log2)ˆ(2 d
n
i i
ninn Xf
Xf
in law )( n
17
Singular Submodel of exponential family II Singular submodel in ordinary exponential family
Finite dimensional exponential family M :
Submodel
Tangent cone:
Under some regularity conditions,
}|);({ SMxfS
f0
S
M
SC f0
))()(exp();( xuxf T
)(
)}()( s.t.0,}{|)({ 000 nMTxuSC nnnSnf
Tf
n
i i
ninn Xf
Xf
1 0 );(
)ˆ;(log)ˆ(
)1()(sup2
1 2
11
1||, 200
pni in
T
uESCu
oXuT
ffT
)( n
More explicit formula can be derived in some cases.
projection of empirical mean parameter
18
Singular submodel in RKEM Submodel of an infinite dimensional exponential
family There are some models, which are not embeddable into a finite
dimensional exponential family, but can be embedded into an infinite dimensional RKEM.
Example:Mixture of Beta distributions (on [0,1])
Singularity at
),1,1;()1()1,;(),;( xBxBxf
11)()(
)( )1(),;( xxxB
0 10
0.5
1
1.5
2
2.5
3
B(x;1,1)
B(x;3,1)
B(x;3,2)
where
)1,1;(),0;()(0 xBxfxf
is not identifiable.
19
Singular submodel in RKEM II
Hk = Sobolev space H1(0,1)
Submodel of Ef0
Tangent cone at f0 is not finite dimensional.
)1,0(in )0(1:),;(log 11 Hxw
f
}2/3,10|))(exp(),;({ 0,, fuufS f
)],;([log),;(log:)(0, xfExfxu f
S is a submodel of Ef0, and f0 is a singularity of S.
|),|exp(),( yxyxk 1
0
22222 |)(||)('|2
1)1()0(
2
1|||| dxxuxuuuu
kH
)1,0(),;(log 1Hxf .2/3,10 forFact:
20
Singular submodel in RKEM III General theory of singular submodel
M(k): RKEM.
Submodel defined by
),(kMf
fES fTK ]1,0[:
fSingularity
such that
(1) K: compact set(2) (3) (a,t): Frechet differentiable w.r.t. t and
(4)
00),( tta
),( tat
is continuous on ]1,0[K
0),(min0
ttKata
S
Ef
Singular submodel in RKEM IV
21
Lemma (tangent cone)
KataSC tft
0|),(
R
Theorem
n
i i
i
Sg Xf
Xg
1 )(
)(logsup )1(ˆ,sup
2
1 2
1||, 2pn
wESCw
omwff
)( n
• Analogue to the asymptotic theory on submodel in a finite dimensional exponential family.• The same assertion holds without assuming exponential family, but the sufficient conditions and the proof are much more involved.
projection of empirical mean parameter
2
1||, 2
sup2
1w
wESCw
Gff
Gw: Gaussian processin law
22
Summary Exponential Hilbert manifolds, which can be infinite dimensional, is
defined using reproducing kernel Hilbert spaces.
From the estimation viewpoint, an interesting class is submodels of infinite dimensional exponential manifolds, which are not embeddable into a finite dimensional exponential family.
The asymptotic behavior of MLE is analyzed for singular submodels of infinite dimensional exponential manifolds.