1 Reproducing Kernel Exponential Manifold: Estimation and Geometry Kenji Fukumizu Institute of Statistical Mathematics, ROIS Graduate University of Advanced Studies Mathematical Explorations in Contemporary Statistics May 19-20, 2008. Sestri Levante, Italy


Reproducing Kernel Exponential Manifold: Estimation and Geometry

Kenji Fukumizu

Institute of Statistical Mathematics, ROIS

Graduate University of Advanced Studies

Mathematical Explorations in Contemporary Statistics

May 19-20, 2008. Sestri Levante, Italy


Outline

Introduction

Reproducing kernel exponential manifold (RKEM)

Statistical asymptotic theory of singular models

Concluding remarks


Introduction


Maximal Exponential Manifold

Maximal exponential manifold (Pistone & Sempi '95)

A Banach manifold is defined so that the cumulant generating function is well-defined on a neighborhood of each probability density.

$$f_u = \exp(u - \Psi_f(u))\, f, \qquad \Psi_f(u) = \log E_f[e^u]$$

Orlicz space $L^{\cosh - 1}(f) = \{\, u \mid \exists \alpha > 0 \ \text{s.t.}\ E_f[e^{\alpha u}] < \infty \ \text{and}\ E_f[e^{-\alpha u}] < \infty \,\}$

This space is (perhaps) the most general to guarantee the finiteness of the cumulant generating functions around a point.


Estimation with Data

Estimation with a finite sample

A finite dimensional exponential family is suitable for maximum likelihood estimation (MLE) with a finite sample.

$X_1, \ldots, X_n \sim f_0$: i.i.d.

MLE: $\theta$ that maximizes

$$\ell_n(\theta; \mathbf{X}^n) = \frac{1}{n} \sum_{i=1}^{n} \sum_{a=1}^{m} \theta^a u_a(X_i) - \Psi(\theta)$$

Is MLE extendable to the maximal exponential manifold?

$$\ell_n(u; \mathbf{X}^n) = \frac{1}{n} \sum_{i=1}^{n} u(X_i) - \Psi_f(u), \qquad \mathbf{X}^n = (X_1, \ldots, X_n)$$

However, the function value $u(X_i)$ is not a continuous functional of $u$ on the exponential manifold.
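As a minimal sketch of the finite dimensional case, assume a one-parameter Poisson family in natural form, with $u(x) = x$ and $\Psi(\theta) = e^\theta$. The family, the data, and the function names below are illustrative choices, not taken from the slides. The averaged log-likelihood $\theta \cdot \frac{1}{n}\sum_i u(X_i) - \Psi(\theta)$ is concave in $\theta$, so Newton's method finds its maximizer.

```python
import math

def avg_log_likelihood(theta, xs):
    """l_n(theta) = theta * mean(u(X_i)) - Psi(theta), with u(x) = x and Psi = exp.

    The Poisson base-measure term log(1/x!) does not depend on theta and is omitted.
    """
    mean_u = sum(xs) / len(xs)
    return theta * mean_u - math.exp(theta)

def mle_newton(xs, theta=0.0, steps=50):
    """Maximize the concave l_n by Newton's method (hypothetical helper)."""
    mean_u = sum(xs) / len(xs)
    for _ in range(steps):
        grad = mean_u - math.exp(theta)   # first derivative of l_n
        hess = -math.exp(theta)           # second derivative of l_n, always negative
        theta -= grad / hess
    return theta

xs = [2, 0, 3, 1, 4, 2, 2]   # made-up counts with sample mean 2
theta_hat = mle_newton(xs)   # closed form for this family: log(sample mean)
```

The maximizer depends on the data only through the sample mean of the sufficient statistic, which is what makes the finite dimensional case convenient.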


Reproducing kernel exponential manifold


Reproducing Kernel Hilbert Space

Reproducing kernel Hilbert space (RKHS)

$\Omega$: set. A Hilbert space $H$ consisting of functions on $\Omega$ is called a reproducing kernel Hilbert space (RKHS) if the evaluation functional

$$e_x : H \to \mathbb{R}, \quad f \mapsto f(x)$$

is continuous for each $x \in \Omega$.

A Hilbert space $H$ consisting of functions on $\Omega$ is an RKHS if and only if there exists $k(\cdot, x) \in H$ (reproducing kernel) such that

$$\langle k(\cdot, x), f \rangle_H = f(x) \qquad \forall x \in \Omega,\ f \in H$$

(by Riesz's lemma)


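Reproducing Kernel Hilbert Space II

Positive definite kernel and RKHS

A symmetric kernel $k: \Omega \times \Omega \to \mathbb{R}$ is said to be positive definite if, for any $x_1, \ldots, x_n \in \Omega$ and $c_1, \ldots, c_n \in \mathbb{R}$,

$$\sum_{i,j=1}^{n} c_i c_j k(x_i, x_j) \ge 0.$$

Theorem (construction of RKHS)

If $k: \Omega \times \Omega \to \mathbb{R}$ is positive definite, there uniquely exists an RKHS $H_k$ on $\Omega$ such that

(1) $k(\cdot, x) \in H_k$ for all $x \in \Omega$,

(2) the linear hull of $\{ k(\cdot, x) \mid x \in \Omega \}$ is dense in $H_k$,

(3) $k(\cdot, x)$ is a reproducing kernel of $H_k$, i.e., $\langle k(\cdot, x), f \rangle_{H_k} = f(x)$ for all $x \in \Omega$, $f \in H_k$.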
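The positive definiteness condition can be probed numerically. The sketch below assumes a Gaussian kernel on the real line and evaluates the quadratic form $\sum_{i,j} c_i c_j k(x_i, x_j)$ for random coefficient vectors. The point set, the seed, and the function names are illustrative, and random trials only fail to find a violation; they do not prove positive definiteness.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on R; the bandwidth sigma is an illustrative choice."""
    return math.exp(-(x - y) ** 2 / sigma ** 2)

def quadratic_form(kernel, xs, cs):
    """sum_{i,j} c_i c_j k(x_i, x_j), nonnegative when k is positive definite."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(cs, xs)
               for cj, xj in zip(cs, xs))

rng = random.Random(0)
xs = [rng.uniform(-3.0, 3.0) for _ in range(8)]   # arbitrary points x_1, ..., x_n

forms = []
for _ in range(200):
    cs = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # arbitrary coefficients c_i
    forms.append(quadratic_form(gaussian_kernel, xs, cs))

min_form = min(forms)   # should be nonnegative up to rounding error
```

The same quadratic form is the squared RKHS norm of $f = \sum_i c_i k(\cdot, x_i)$, which is how the theorem builds $H_k$ from the linear hull of the kernel sections.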


Reproducing Kernel Hilbert Space III

Some properties

If the pos. def. kernel $k$ is of class $C^r$, so is every function in $H_k$.

If the pos. def. kernel $k$ is bounded, so is every function in $H_k$.

Examples: positive definite kernels on $\mathbb{R}^m$

Euclidean inner product: $k(x, y) = x^T y$, $H_k \cong \mathbb{R}^m$

Gaussian RBF kernel: $k(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$, $\dim H_k = \infty$

Polynomial kernel: $k(x, y) = (x^T y + c)^d$ $(c \ge 0,\ d \in \mathbb{N})$, $H_k = \{\text{polyn. deg} \le d\}$
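To see why the polynomial kernel gives a finite dimensional space, one can write out an explicit feature map. The sketch below assumes the one-dimensional case with $c = 1$, $d = 2$, where $(xy + 1)^2 = 1 + 2xy + x^2 y^2$ is the Euclidean inner product of the features $(1, \sqrt{2}x, x^2)$. The helper names are illustrative.

```python
import math

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel (x*y + c)^d on R."""
    return (x * y + c) ** d

def feature_map(x):
    """Explicit features for (x*y + 1)^2 on R: (1, sqrt(2) x, x^2)."""
    return (1.0, math.sqrt(2.0) * x, x * x)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

pairs = [(0.5, -1.2), (2.0, 3.0), (-0.7, 0.0)]
# Gap between the kernel value and the inner product of the explicit features.
gaps = [abs(poly_kernel(x, y) - dot(feature_map(x), feature_map(y)))
        for x, y in pairs]
```

Every function in this $H_k$ is a linear combination of $1$, $x$, $x^2$, that is, a polynomial of degree at most 2.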


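Exponential Manifold by RKHS

Definitions

$\Omega$: topological space. $\mu$: Borel probability measure on $\Omega$ s.t. $\mathrm{supp}\,\mu = \Omega$.

$k$: continuous pos. def. kernel on $\Omega$ such that $H_k$ contains 1 (constants).

$$M(k) := \Big\{ f: \Omega \to \mathbb{R} \ \Big|\ f \text{: continuous},\ f(x) > 0\ (\forall x \in \Omega),\ \int f\, d\mu = 1,\ \exists \delta > 0,\ \int e^{\delta \sqrt{k(x,x)}} f(x)\, d\mu(x) < \infty \Big\}$$

Note: If $\|u\| < \delta$,

$$E_f[e^{u(X)}] = E_f[e^{\langle u, k(\cdot, X) \rangle}] \le E_f[e^{\|u\| \sqrt{k(X,X)}}] < \infty.$$

Tangent space

$T_f := \{ u \in H_k \mid E_f[u(X)] = 0 \}$: closed subspace of $H_k$

$M(k)$ is provided with a Hilbert manifold structure.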
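The bound in the note is the Cauchy-Schwarz inequality combined with the reproducing property. Written out as a short derivation (the norm subscript $H_k$ is added here for explicitness):

```latex
\begin{align*}
u(X) &= \langle u, k(\cdot, X) \rangle_{H_k}
      \le \|u\|_{H_k}\, \|k(\cdot, X)\|_{H_k}, \\
\|k(\cdot, X)\|_{H_k}^2 &= \langle k(\cdot, X), k(\cdot, X) \rangle_{H_k} = k(X, X), \\
\text{hence}\quad e^{u(X)} &\le e^{\|u\|_{H_k} \sqrt{k(X, X)}} .
\end{align*}
```

This is the step that lets an integrability condition on $\sqrt{k(x,x)}$ control the cumulant generating function on a whole ball of $H_k$.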


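Exponential Manifold by RKHS II

Local coordinate

For $f \in M(k)$,

$$W_f := \{ u \in T_f \mid \exists \delta > 0,\ E_f[e^{\delta \sqrt{k(X,X)} + u(X)}] < \infty \} \subset T_f.$$

Then, for any $u \in W_f$,

$$f_u := \exp(u - \Psi_f(u))\, f \in M(k).$$

Define

$$\xi_f : W_f \to M(k),\ u \mapsto f_u \ \text{(one-to-one)}, \qquad \mathcal{E}_f := \xi_f(W_f), \qquad \varphi_f := \xi_f^{-1} : \mathcal{E}_f \to W_f.$$

$\varphi_f$ works as a local coordinate.

Lemma

(1) $W_f$ is an open subset of $T_f$.

(2) $g \in \mathcal{E}_f \ \Rightarrow\ \mathcal{E}_g = \mathcal{E}_f$.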


Exponential Manifold by RKHS III

Reproducing Kernel Exponential Manifold (RKEM)

Theorem. The system $\{(\mathcal{E}_f, \varphi_f)\}_{f \in M(k)}$ is a $C^\infty$-atlas of $M(k)$.

Coordinate transform:

$$\varphi_g \circ \varphi_f^{-1}(u) = \log\frac{\exp(u - \Psi_f(u))\, f}{g} - E_g\Big[\log\frac{\exp(u - \Psi_f(u))\, f}{g}\Big] = u + \log\frac{f}{g} - E_g\Big[u + \log\frac{f}{g}\Big]$$

A structure of Hilbert manifold is defined on $M(k)$ with Riemannian metric $E_f[uv]$.

The likelihood functional is continuous.

The function $u(x)$ is decoupled in the inner product $\langle u, k(\cdot, x) \rangle$; $u$: natural coordinate, $k(\cdot, x)$: sufficient statistics.

The manifold depends on the choice of $k$.

e.g. $\Omega = \mathbb{R}$, $\mu = N(0, 1)$, $k(x, y) = (xy + 1)^2$. $H_k = \{\text{polyn. deg} \le 2\}$, $M(k) = \{ N(m, \sigma) \mid m \in \mathbb{R},\ \sigma > 0 \}$: the normal distributions.
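The normal-distribution example can be checked numerically. The sketch below assumes $\mu = N(0,1)$, the base point $f = 1$ (the density of $\mu$ with respect to itself), and a tangent vector $u(x) = a(x^2 - 1) + bx$ with $a < 1/2$. The parametrization by $(a, b)$ and all function names are illustrative. It computes $\Psi_f(u)$ by the trapezoidal rule and compares $f_u$ with the $N(m, \sigma^2)$ density for $m = b/(1-2a)$, $\sigma^2 = 1/(1-2a)$.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def u(x, a, b):
    """Degree-2 polynomial with E_mu[u] = 0, so u lies in T_f for f = 1."""
    return a * (x * x - 1.0) + b * x

def psi_numeric(a, b, lo=-12.0, hi=12.0, n=20000):
    """Psi_f(u) = log E_mu[exp(u(X))], computed by the trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(u(x, a, b)) * std_normal_pdf(x)
    return math.log(total * h)

def psi_closed_form(a, b):
    s = 1.0 - 2.0 * a   # requires a < 1/2 for integrability
    return -a + b * b / (2.0 * s) - 0.5 * math.log(s)

def normal_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

a, b = 0.2, 0.6
psi = psi_numeric(a, b)
mean, var = b / (1.0 - 2.0 * a), 1.0 / (1.0 - 2.0 * a)

def f_u_lebesgue(x):
    """Density of f_u d(mu) with respect to Lebesgue measure."""
    return math.exp(u(x, a, b) - psi) * std_normal_pdf(x)
```

Under these assumptions the two natural coordinates $(a, b)$ sweep out the means and variances of the normal family, with the constraint $a < 1/2$ corresponding to a finite variance.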


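Mean parameter of RKEM

Mean parameter

For any $f \in M(k)$ there uniquely exists $m_f \in H_k$ such that

$$E_f[u(X)] = \langle u, m_f \rangle_{H_k} \quad \text{for all } u \in H_k.$$

The mean parameter does not necessarily give a coordinate, as in the case of the maximal exponential manifold.

Empirical mean parameter

$X_1, \ldots, X_n$: i.i.d. sample $\sim f$.

Empirical mean parameter:

$$\hat m_n := \frac{1}{n} \sum_{i=1}^{n} k(\cdot, X_i)$$

Fact 1.

$$\| \hat m_n - m_f \|_{H_k} = O_p(1/\sqrt{n}) \quad (n \to \infty)$$

Fact 2.

$$\langle \hat m_n, f \rangle_{H_k} = \frac{1}{n} \sum_{i=1}^{n} f(X_i) \quad (\forall f \in H_k)$$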
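The empirical mean parameter is never formed explicitly in practice; inner products with it reduce to kernel evaluations. The sketch below assumes a Gaussian kernel on the real line and a test function $g = \sum_j c_j k(\cdot, z_j)$. The sample, centers, coefficients, and names are illustrative. It checks that $\langle \hat m_n, g \rangle$ equals the sample average of $g$ and computes $\|\hat m_n\|^2$ from the Gram matrix.

```python
import math
import random

def k(x, y):
    """Gaussian kernel with unit bandwidth (illustrative choice)."""
    return math.exp(-(x - y) ** 2)

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]   # X_1, ..., X_n
centers = [-1.0, 0.0, 2.0]                          # z_j
coefs = [0.5, -1.0, 0.3]                            # c_j
n = len(sample)

def g(x):
    """g = sum_j c_j k(., z_j), an element of H_k."""
    return sum(c * k(x, z) for c, z in zip(coefs, centers))

# <m_hat, g>, expanded by bilinearity using <k(., X_i), k(., z_j)> = k(X_i, z_j)
inner = sum(c * k(xi, z) for xi in sample for c, z in zip(coefs, centers)) / n

# Sample average of g, which the reproducing property says must agree with `inner`
sample_mean = sum(g(xi) for xi in sample) / n

# ||m_hat||^2 = (1/n^2) sum_{i,j} k(X_i, X_j)
norm_sq = sum(k(xi, xj) for xi in sample for xj in sample) / n ** 2
```

Distances such as $\|\hat m_n - m_f\|_{H_k}$ can be expanded in the same way, which is why the $O_p(1/\sqrt{n})$ rate can be studied through averages of kernel values.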


Applications of RKEM

Maximum likelihood estimation (IGAIA2005)

Maximum likelihood estimation with regularization is possible. The consistency of the estimator is proved.

Statistical asymptotic theory of singular models

There are examples of statistical models that are submodels of an infinite dimensional exponential family but are not embeddable into a finite dimensional exponential family.

For a submodel of RKEM, developing asymptotic theory of the maximum likelihood estimator is easy.

Geometry of RKEM

Dual connections ($\pm 1$) can be introduced on the tangent bundle in some cases.


Statistical asymptotic theory of singular models


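Singular Submodel of exponential family

Standard asymptotic theory

Statistical model $\{ f(x; \theta) \mid \theta \in \Theta \}$ on a measure space $(\Omega, \mathcal{B}, \mu)$.

$\Theta$: (finite dimensional) manifold.

"True" density: $f_0(x) = f(x; \theta_0)$ $(\theta_0 \in \Theta)$

$X_1, \ldots, X_n \sim f_0$: i.i.d.

Maximum likelihood estimator (MLE)

$$\hat\theta_n = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \log f(X_i; \theta)$$

Under some regularity conditions, the MLE is asymptotically normal:

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \to N(0, I(\theta_0)^{-1}) \quad \text{in law } (n \to \infty)$$

Likelihood ratio

$$2 L_n(\hat\theta_n) = 2 \sum_{i=1}^{n} \log \frac{f(X_i; \hat\theta_n)}{f(X_i; \theta_0)} \to \chi^2_d \quad \text{in law } (n \to \infty)$$

[Figure: $d$-dim smooth manifold with $f_0$ and the MLE.]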
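The chi-square limit of the likelihood ratio can be illustrated by simulation. The sketch below assumes the regular one-parameter model $N(\theta, 1)$ with true value $\theta_0 = 0$, for which the MLE is the sample mean and twice the log likelihood ratio is $n \bar X^2$. The model, sample sizes, and seed are illustrative choices; the value 3.841 is the 95% quantile of $\chi^2_1$.

```python
import random

def two_log_lr(xs):
    """2 * sum_i log[f(X_i; theta_hat) / f(X_i; 0)] for the N(theta, 1) model.

    Each term is theta * x - theta^2 / 2, so at theta_hat = xbar the doubled
    sum equals n * xbar^2.
    """
    n = len(xs)
    xbar = sum(xs) / n   # MLE of theta
    return n * xbar * xbar

rng = random.Random(2008)
n, reps = 100, 4000
stats = [two_log_lr([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]

mean_stat = sum(stats) / reps                         # chi^2_1 has mean 1
frac_above = sum(s > 3.841 for s in stats) / reps     # should be near 0.05
```

In this regular model $d = 1$ and the statistic is exactly $\chi^2_1$ distributed for every $n$; singular models are the cases where this behavior breaks down.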


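Singular Submodel of exponential family II

Singular submodel in ordinary exponential family

Finite dimensional exponential family $M$:

$$f(x; \theta) = \exp(\theta^T u(x) - \Psi(\theta)) \quad (\theta \in \Theta)$$

Submodel

$$S = \{ f(x; \theta) \in M \mid \theta \in \Theta_S \}$$

Tangent cone:

$$C_{f_0} S = \{ \alpha^T u(x) \in T_{f_0} M \mid \exists \{\theta_n\} \subset \Theta_S,\ \lambda_n > 0 \ \text{s.t.}\ \theta_n \to \theta_0,\ \lambda_n (\theta_n - \theta_0) \to \alpha \ (n \to \infty) \}$$

[Figure: submodel $S$ in $M$ with the tangent cone $C_{f_0} S$ at $f_0$.]

Under some regularity conditions,

$$L_n(\hat\theta_n) = \sum_{i=1}^{n} \log \frac{f(X_i; \hat\theta_n)}{f(X_i; \theta_0)} = \frac{1}{2} \sup_{\alpha^T u \in C_{f_0} S,\ E_{f_0}|\alpha^T u|^2 = 1} \Big( \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \alpha^T u(X_i) \Big)^2 + o_p(1) \quad (n \to \infty)$$

(projection of empirical mean parameter)

More explicit formula can be derived in some cases.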


Singular submodel in RKEM

Submodel of an infinite dimensional exponential family

There are some models which are not embeddable into a finite dimensional exponential family but can be embedded into an infinite dimensional RKEM.

Example: Mixture of Beta distributions (on [0,1])

$$f(x; \alpha, \beta) = \alpha B(x; \beta, 1) + (1 - \alpha) B(x; 1, 1),$$

where

$$B(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}.$$

[Figure: Beta densities $B(x; 1, 1)$, $B(x; 3, 1)$, $B(x; 3, 2)$ on $[0, 1]$.]

Singularity at

$$f_0(x) = f(x; 0, \beta) = B(x; 1, 1):$$

$\beta$ is not identifiable.
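The loss of identifiability at the singularity is easy to see numerically. The sketch below assumes the mixture $\alpha B(x; \beta, 1) + (1 - \alpha) B(x; 1, 1)$ with mixing weight $\alpha$ and shape $\beta$; the function names and grid are illustrative. At $\alpha = 0$ the density is the uniform density for every $\beta$.

```python
import math

def beta_pdf(x, a, b):
    """Beta density B(x; a, b) on (0, 1)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)

def mixture(x, alpha, beta):
    """f(x; alpha, beta) = alpha * B(x; beta, 1) + (1 - alpha) * B(x; 1, 1)."""
    return alpha * beta_pdf(x, beta, 1.0) + (1.0 - alpha) * beta_pdf(x, 1.0, 1.0)

grid = [0.1, 0.25, 0.5, 0.9]
# With alpha = 0 the value of beta has no effect: every row is the uniform density.
at_singularity = [[mixture(x, 0.0, beta) for x in grid] for beta in (2.0, 3.0, 5.0)]

# Away from the singularity, beta does matter.
away = [mixture(0.5, 0.5, beta) for beta in (2.0, 3.0)]
```

The whole line $\{(0, \beta)\}$ in parameter space maps to the single density $f_0$, so the Fisher information degenerates there and the standard asymptotic theory does not apply.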


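Singular submodel in RKEM II

$$k(x, y) = \exp(-|x - y|)$$

$H_k$ = Sobolev space $H^1(0, 1)$

$$\|u\|_{H_k}^2 = \frac{1}{2}\big( u(0)^2 + u(1)^2 \big) + \frac{1}{2} \int_0^1 \big( |u'(x)|^2 + |u(x)|^2 \big)\, dx$$

Fact: $\log f(x; \alpha, \beta) \in H^1(0, 1)$ for $0 \le \alpha < 1$, $\beta > 3/2$.

Submodel of $\mathcal{E}_{f_0}$

$$S = \{ f(x; \alpha, \beta) = \exp(u_{\alpha,\beta} - \Psi_{f_0}(u_{\alpha,\beta}))\, f_0 \mid 0 \le \alpha < 1,\ \beta > 3/2 \}$$

$$u_{\alpha,\beta}(x) := \log f(x; \alpha, \beta) - E_{f_0}[\log f(x; \alpha, \beta)]$$

$S$ is a submodel of $\mathcal{E}_{f_0}$, and $f_0$ is a singularity of $S$.

Tangent cone at $f_0$ is not finite dimensional.

$$\frac{\partial}{\partial \alpha} \log f(x; \alpha, \beta) \Big|_{\alpha = 0} = \beta x^{\beta - 1} - 1 =: w_\beta(x) \ (\neq 0) \ \text{in } H^1(0, 1)$$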
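The threshold $\beta > 3/2$ can be traced to square integrability of the derivative of the log-density. A short derivation, assuming the Beta mixture $f(x; \alpha, \beta) = \alpha \beta x^{\beta - 1} + 1 - \alpha$ on $[0, 1]$ with $0 < \alpha < 1$ and $\beta > 1$ (the case $\alpha = 0$ is trivial because $\log f \equiv 0$):

```latex
\begin{align*}
\frac{d}{dx} \log f(x; \alpha, \beta)
  &= \frac{\alpha \beta (\beta - 1)\, x^{\beta - 2}}{\alpha \beta x^{\beta - 1} + 1 - \alpha}, \\
1 - \alpha \;\le\; \alpha \beta x^{\beta - 1} + 1 - \alpha
  &\;\le\; \alpha \beta + 1 - \alpha \qquad (x \in [0, 1]), \\
\int_0^1 \Big| \frac{d}{dx} \log f \Big|^2 dx < \infty
  &\iff \int_0^1 x^{2(\beta - 2)}\, dx < \infty
  \iff \beta > \tfrac{3}{2}.
\end{align*}
```

Since the denominator is bounded above and below by positive constants, only the power $x^{\beta - 2}$ in the numerator decides membership in $H^1(0, 1)$.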


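Singular submodel in RKEM III

General theory of singular submodel

$M(k)$: RKEM. $f \in M(k)$.

Submodel $S \subset \mathcal{E}_f$ defined by

$$\phi : K \times [0, 1] \to T_f$$

such that

(1) $K$: compact set

(2) $\phi(a, t) = 0 \iff t = 0$

(3) $\phi(a, t)$: Fréchet differentiable w.r.t. $t$ and $\frac{\partial}{\partial t}\phi(a, t)$ is continuous on $K \times [0, 1]$

(4) $\min_{a \in K} \big\| \frac{\partial}{\partial t}\phi(a, t)\big|_{t=0} \big\| > 0$

[Figure: submodel $S$ in $\mathcal{E}_f$ with the singularity at $f$.]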


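Singular submodel in RKEM IV

Lemma (tangent cone)

$$C_f S = \mathbb{R}_{\ge 0} \Big\{ \frac{\partial}{\partial t}\phi(a, t)\Big|_{t=0} \ \Big|\ a \in K \Big\}$$

Theorem

$$\sup_{g \in S} \sum_{i=1}^{n} \log \frac{g(X_i)}{f(X_i)} = \frac{1}{2} \sup_{w \in C_f S,\ E_f|w|^2 = 1} \langle w, \sqrt{n}\, \hat m_n \rangle^2 + o_p(1) \quad (n \to \infty)$$

(projection of empirical mean parameter)

$$\to \frac{1}{2} \sup_{w \in C_f S,\ E_f|w|^2 = 1} G_w^2 \quad \text{in law}, \qquad G_w \text{: Gaussian process}$$

• Analogue to the asymptotic theory on submodel in a finite dimensional exponential family.

• The same assertion holds without assuming exponential family, but the sufficient conditions and the proof are much more involved.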
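The appearance of a Gaussian process in the limit can be motivated by rewriting the inner product with the empirical mean parameter as a normalized sum. A sketch for a single fixed direction $w$ in the tangent cone, using only the reproducing property and the central limit theorem (uniformity over $w$, which the theorem requires, is not addressed here):

```latex
\begin{align*}
\langle w, \sqrt{n}\, \hat m_n \rangle_{H_k}
  &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \langle w, k(\cdot, X_i) \rangle_{H_k}
   = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w(X_i), \\
E_f[w(X)] = 0,\quad E_f|w(X)|^2 = 1
  &\;\Longrightarrow\;
  \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w(X_i) \to N(0, 1) \quad \text{in law } (n \to \infty).
\end{align*}
```

The centering $E_f[w(X)] = 0$ holds because the tangent cone sits inside $T_f$, which matches the finite dimensional formula where the same normalized sum appears.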


Summary

Exponential Hilbert manifolds, which can be infinite dimensional, are defined using reproducing kernel Hilbert spaces.

From the estimation viewpoint, an interesting class is submodels of infinite dimensional exponential manifolds, which are not embeddable into a finite dimensional exponential family.

The asymptotic behavior of MLE is analyzed for singular submodels of infinite dimensional exponential manifolds.