
Page 1

Reproducing kernel Hilbert spaces

Nuno Vasconcelos ECE Department, UCSD

Page 2

Classification
a classification problem has two types of variables

• X - vector of observations (features) in the world
• Y - state (class) of the world

Perceptron: the classifier implements the linear decision rule

$$h(x) = \operatorname{sgn}[g(x)] \qquad \text{with} \qquad g(x) = w^T x + b$$

this is appropriate when the classes are linearly separable; to deal with non-linear separability we introduce a kernel

[figure: separating hyperplane g(x) = 0 with normal w; the offset and the distance from a point x to the boundary are b/||w|| and g(x)/||w||]
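As a quick illustration (not from the slides), a minimal NumPy sketch of the decision rule above; the weight vector w, offset b, and test point are made-up values.

```python
import numpy as np

def g(x, w, b):
    # linear discriminant g(x) = w^T x + b
    return np.dot(w, x) + b

def h(x, w, b):
    # decision rule h(x) = sgn[g(x)]
    return np.sign(g(x, w, b))

# made-up example: a 2D point classified by an arbitrary hyperplane
w = np.array([1.0, -2.0])   # normal to the decision boundary
b = 0.5                     # offset
x = np.array([0.3, 0.1])
print(h(x, w, b))           # the sign of g(x): which side of the boundary x falls on
```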

Page 3

Kernel summary
1. D not linearly separable in X: apply a feature transformation Φ: X → Z such that dim(Z) >> dim(X)

2. computing Φ(x) is too expensive:
• write your learning algorithm in dot-product form
• instead of Φ(x_i), we only need Φ(x_i)^T Φ(x_j) ∀ i,j

3. instead of computing Φ(x_i)^T Φ(x_j) ∀ i,j, define the "dot-product kernel"

$$K(x,z) = \Phi(x)^T \Phi(z)$$

and compute K(x_i, x_j) ∀ i,j directly
• note: the matrix

$$\mathbf{K} = \begin{bmatrix} \vdots & & \vdots \\ \cdots & K(x_i, x_j) & \cdots \\ \vdots & & \vdots \end{bmatrix}$$

is called the "kernel" or Gram matrix

4. forget about Φ(x) and use K(x,z) from the start! (see the sketch below)
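The step above is just filling in the Gram matrix. A minimal sketch, assuming a generic kernel function and some made-up data:

```python
import numpy as np

def gram_matrix(X, kernel):
    # K[i, j] = kernel(x_i, x_j) for all pairs of rows of X
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# made-up data and the linear kernel K(x, z) = x^T z
X = np.random.randn(5, 3)
K = gram_matrix(X, lambda x, z: np.dot(x, z))
print(K.shape)   # (5, 5); a learning algorithm in dot-product form only needs this matrix
```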

Page 4

Polynomial kernels
this makes a significant difference when K(x,z) is easier to compute than Φ(x)^T Φ(z); e.g., we have seen that

$$K(x,z) = (x^T z)^2 = \Phi(x)^T \Phi(z), \quad \text{with } \Phi: \mathbb{R}^d \to \mathbb{R}^{d^2}, \quad \begin{pmatrix} x_1 \\ \vdots \\ x_d \end{pmatrix} \to \left( x_1 x_1, \ldots, x_1 x_d, x_2 x_1, \ldots, x_d x_d \right)^T$$

while K(x,z) has complexity O(d), Φ(x)^T Φ(z) is O(d^2)
for K(x,z) = (x^T z)^k the gap widens: the explicit computation grows to O(d^k), while the kernel remains O(d)
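A small numerical check of the identity above (a sketch with made-up vectors): the quadratic kernel (x^T z)^2 matches the explicit d^2-dimensional feature map while costing only O(d) to evaluate.

```python
import numpy as np

def phi_quadratic(x):
    # explicit feature map: all d^2 products x_i * x_j (O(d^2) to compute and store)
    return np.outer(x, x).ravel()

d = 4
x = np.random.randn(d)
z = np.random.randn(d)

k_direct = np.dot(x, z) ** 2                              # kernel trick: O(d)
k_explicit = np.dot(phi_quadratic(x), phi_quadratic(z))   # explicit features: O(d^2)

print(np.isclose(k_direct, k_explicit))  # True: (x^T z)^2 = Phi(x)^T Phi(z)
```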

Page 5

Question
in practice:
• pick a kernel from a library of known kernels
• we talked about
  • the linear kernel K(x,z) = x^T z
  • the Gaussian family

$$K(x,z) = e^{-\frac{\|x-z\|^2}{\sigma^2}}$$

  • the polynomial family

$$K(x,z) = \left( x^T z + 1 \right)^k, \quad k \in \{1, 2, \ldots\}$$

what if this is not good enough?
• how do I know if a function k(x,y) is a dot-product kernel?
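For concreteness, a sketch of the three families above as plain functions; the bandwidth σ and degree k are free parameters with made-up default values.

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = x^T z
    return np.dot(x, z)

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / sigma^2)
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

def polynomial_kernel(x, z, k=2):
    # K(x, z) = (x^T z + 1)^k
    return (np.dot(x, z) + 1.0) ** k

x, z = np.random.randn(3), np.random.randn(3)
print(linear_kernel(x, z), gaussian_kernel(x, z), polynomial_kernel(x, z))
```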

Page 6

Dot-product kernels
let's start with the definition
Definition: a mapping

k: X × X → ℝ
(x,y) → k(x,y)

is a dot-product kernel if and only if

k(x,y) = <Φ(x), Φ(y)>

where Φ: X → H, H is a vector space, and <.,.> is a dot product in H
note that both H and <.,.> can be abstract, not necessarily ℝ^d

[figure: the feature transformation Φ maps the points x_1, ..., x_n, not linearly separable in X, into the feature space H]

Page 7

Dot product vs positive definite kernels
that is pretty abstract. how do I turn it into something I can compute?
Theorem: k(x,y), x,y ∈ X, is a dot-product kernel if and only if it is a positive definite kernel
this can be checked. let's start with the definition
Definition: k(x,y) is a positive definite kernel on X × X if, ∀ l and ∀ {x_1, ..., x_l}, x_i ∈ X, the Gram matrix

$$\mathbf{K} = [K_{ij}] = [k(x_i, x_j)]$$

is positive definite.
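This gives a practical check (necessary only, since we can test finitely many point sets): sample some points, build the Gram matrix, and verify its eigenvalues are non-negative. A sketch for the Gaussian kernel, with made-up sample points:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

# made-up candidate points x_1, ..., x_l
X = np.random.randn(20, 2)
l = X.shape[0]

# Gram matrix K_ij = k(x_i, x_j)
K = np.array([[gaussian_kernel(X[i], X[j]) for j in range(l)] for i in range(l)])

# a symmetric matrix is positive (semi)definite iff all its eigenvalues are >= 0
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)   # True for the Gaussian kernel (up to round-off)
```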

Page 8

Positive definite matrices
recall that (e.g. Linear Algebra and Its Applications, Strang)
Definition: each of the following is a necessary and sufficient condition for a real symmetric matrix A to be positive definite:
i) x^T A x ≥ 0, ∀ x ≠ 0
ii) all eigenvalues of A satisfy λ_i ≥ 0
iii) all upper-left submatrices A_k have non-negative determinant
iv) there is a matrix R with independent rows such that

$$A = R^T R$$

upper-left submatrices:

$$A_1 = a_{1,1}, \quad A_2 = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}, \quad A_3 = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{bmatrix}, \quad \ldots$$

Page 9

Dot product vs positive definite kernels
the equivalence between dot-product and positive definite kernels automatically proves some simple properties
• dot-product kernels are non-negative functions

$$k(x,x) \geq 0, \quad \forall x \in X$$

• dot-product kernels are symmetric

$$k(x,y) = k(y,x), \quad \forall x, y \in X$$

• Cauchy-Schwarz inequality for dot-product kernels

$$k(x,y)^2 \leq k(x,x)\, k(y,y), \quad \forall x, y \in X$$

• note that this is just

$$-1 \leq \underbrace{\frac{k(x,y)}{\sqrt{k(x,x)\, k(y,y)}}}_{\cos\theta} \leq 1$$

(a short derivation of the Cauchy-Schwarz property is sketched below)
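The Cauchy-Schwarz property is not proved on the slide, but it follows in one line from positive definiteness of the 2×2 Gram matrix of the pair {x, y}: since

$$\det \begin{bmatrix} k(x,x) & k(x,y) \\ k(y,x) & k(y,y) \end{bmatrix} = k(x,x)\, k(y,y) - k(x,y)\, k(y,x) \geq 0,$$

and k(y,x) = k(x,y) by symmetry, we get k(x,y)² ≤ k(x,x) k(y,y).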

Page 10

Dot product vs positive definite kernels
the proof actually gives insight into what the kernel does
• we defined H as the space spanned by linear combinations of k(., x_i)

$$H = \left\{ f(.) \;\middle|\; f(.) = \sum_{i=1}^m \alpha_i\, k(., x_i), \;\forall m, \;\forall x_i \in X \right\}$$

• e.g. the space of all linear combinations of Gaussians when we adopt the Gaussian kernel

$$K(., x_i) = e^{-\frac{\|\cdot - x_i\|^2}{\sigma^2}}$$

• note: k(., x_i) is a function of the first argument; x_i is fixed
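A small sketch (made-up centers and coefficients) of what an element of H looks like for the Gaussian kernel: a function f(.) = Σ_i α_i k(., x_i) that can be evaluated anywhere.

```python
import numpy as np

def gaussian_kernel(x, xi, sigma=1.0):
    # k(x, x_i) = exp(-||x - x_i||^2 / sigma^2)
    return np.exp(-np.sum((np.asarray(x) - np.asarray(xi)) ** 2) / sigma ** 2)

def make_f(alphas, centers, sigma=1.0):
    # f(.) = sum_i alpha_i k(., x_i): a linear combination of Gaussian bumps
    def f(x):
        return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alphas, centers))
    return f

# made-up expansion: 3 centers in R^2 with arbitrary coefficients
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([-1.0, 0.5])]
alphas = [0.7, -0.2, 1.1]
f = make_f(alphas, centers)
print(f(np.array([0.5, 0.5])))   # value of this element of H at an arbitrary point
```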

Page 11

Dot product vs positive definite kernels
• if f(.) and g(.) ∈ H, with

$$f(.) = \sum_{i=1}^m \alpha_i\, k(., x_i) \qquad g(.) = \sum_{j=1}^{m'} \beta_j\, k(., x'_j)$$

• we defined the dot product <.,.>* on H as

$$\langle f, g \rangle_* = \sum_{i=1}^m \sum_{j=1}^{m'} \alpha_i \beta_j\, k(x_i, x'_j)$$

• for the Gaussian kernel this is

$$\langle f, g \rangle_* = \sum_{i=1}^m \sum_{j=1}^{m'} \alpha_i \beta_j\, e^{-\frac{\|x_i - x'_j\|^2}{\sigma^2}}$$

• this is a dot product in H and a non-linear measure of similarity in X, somewhat related to likelihoods (a numerical sketch follows below)
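A numerical sketch of this inner product for the Gaussian kernel: f and g are represented only by their coefficients and centers (all made-up values), and <f,g>* is the double sum above.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

def rkhs_dot(alphas, centers_f, betas, centers_g, kernel=gaussian_kernel):
    # <f, g>_* = sum_i sum_j alpha_i beta_j k(x_i, x'_j)
    return sum(a * b * kernel(xi, xj)
               for a, xi in zip(alphas, centers_f)
               for b, xj in zip(betas, centers_g))

# made-up expansions for f and g
centers_f = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
alphas = [1.0, -0.5]
centers_g = [np.array([0.5, 0.5])]
betas = [2.0]

print(rkhs_dot(alphas, centers_f, betas, centers_g))
```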

Page 12

The dot product <.,.>*
it is not hard to show that

$$\langle k(., x_i), k(., x_j) \rangle_* = k(x_i, x_j)$$

this means that

$$k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle_* \qquad \text{with} \qquad \Phi: X \to H, \quad x \to k(., x)$$

the feature transformation associated with the kernel k(x,y) sends the points x_i to the functions k(., x_i)
furthermore, <.,.>* is itself a dot-product kernel on H × H

Page 13

The reproducing property
with this definition of H and <.,.>*

$$\forall f \in H, \quad \langle k(., x), f(.) \rangle_* = f(x)$$

this is called the reproducing property
an analogy is to think of linear time-invariant systems:
• the dot product as a convolution
• k(., x) as the Dirac delta
• f(.) as a system input
• the equation above is the basis of all linear time-invariant systems theory

we will see that it also plays a fundamental role in the theory (and use) of reproducing kernel Hilbert spaces (a short derivation is sketched below)
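The reproducing property is a one-line consequence of the two previous slides (a derivation not written out here): writing f(.) = Σ_i α_i k(., x_i) and using linearity of <.,.>* together with <k(.,x), k(.,x_i)>* = k(x, x_i),

$$\langle k(., x), f(.) \rangle_* = \sum_i \alpha_i\, \langle k(., x), k(., x_i) \rangle_* = \sum_i \alpha_i\, k(x, x_i) = f(x).$$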

Page 14

The big picture
when we use the Gaussian kernel

$$K(x, x_i) = e^{-\frac{\|x - x_i\|^2}{\sigma^2}}$$

• the point x_i ∈ X is mapped into the Gaussian G(x, x_i, σI)
• H is the space of all functions that are linear combinations of Gaussians
• the kernel is a dot product in H, and a non-linear similarity on X
• reproducing property on H: analogy to linear systems

[figure: Φ maps the points x_1, ..., x_n in X to Gaussian bumps centered at those points in H]

Page 15

Hilbert spaces
Hilbert spaces play an important role in functional analysis; we will do a very informal review
H is a vector space with a dot product:
• this is known as a "dot-product space" or a "pre-Hilbert" space
the difference to a Hilbert space is mostly technical
Definition: a Hilbert space is a complete dot-product space
what do we mean by completeness?
Definition: S is complete if all Cauchy sequences in S converge
this is useful mostly to prove convergence results

Page 16

Cauchy sequences
just for completeness (no pun intended)
Definition: a sequence {x_1, x_2, ...} in a normed space is a Cauchy sequence if

$$\forall \varepsilon > 0, \;\exists\, n \in \mathbb{N} \;\text{ s.t. }\; \forall n', n'' > n, \quad \|x_{n'} - x_{n''}\| \leq \varepsilon$$

you can picture this as the terms eventually staying within an ε-band of each other as n grows
why might this not converge? well, the limit point could be outside the space (e.g. a Cauchy sequence of rationals can converge to √2, which is not in ℚ)
why is this important? complete ⇒ (convergent = Cauchy)

Page 17

Reproducing kernel Hilbert spaces
to turn our pre-Hilbert space H into a Hilbert space we have to "complete it" with respect to the norm

$$\|f\| = \sqrt{\langle f, f \rangle_*}$$

this is just adding to H the limit points of all its Cauchy sequences
we represent the completion of H by H
more than a Hilbert space, H becomes a reproducing kernel Hilbert space (RKHS)

Page 18

Reproducing kernel Hilbert spaces
Definition: let H be a Hilbert space of functions f: X → ℝ. H is an RKHS endowed with the dot product <.,.>* if ∃ k: X × X → ℝ such that
1. k spans H, i.e., ∃ {x_i}, {α_i} such that

$$H = \text{span}\{k(., x)\} = \left\{ f(.) \;\middle|\; f(.) = \sum_i \alpha_i\, k(., x_i) \right\}$$

2. <f(.), k(., x)>* = f(x), ∀ f ∈ H

in summary,
• the H we built can easily be transformed into an RKHS by adding to it the limit points (functions) of all its Cauchy sequences

Question: is there a one-to-one mapping between kernels and RKHSs?

Page 19

Kernels vs RKHSs
answer: the RKHS does indeed specify the kernel uniquely
proof:
• assume that kernels k_1 and k_2 both span H
• using the reproducing property (with each kernel in turn)
  • k_1(x', x) = <k_1(., x), k_2(., x')>*
  • k_2(x, x') = <k_2(., x'), k_1(., x)>*
• hence, from symmetry of the dot product <.,.>* we must have

k_1(x', x) = k_2(x, x')

and from symmetry of the kernel it follows that

k_1(x, x') = k_2(x, x')

• since this holds for all x and x', the kernels are the same

Page 20

Kernels vs RKHSs
however, it is not clear that a kernel uniquely specifies the RKHS
• there might be multiple feature transforms and dot products that are consistent with a kernel

to study this we need to introduce Mercer kernels
Definition: a symmetric mapping k: X × X → ℝ such that

$$\int\!\!\int k(x,y)\, f(x)\, f(y)\, dx\, dy \geq 0, \quad \forall f \;\text{ s.t. }\; \int f^2(x)\, dx < \infty \tag{*}$$

is a Mercer kernel

Page 21

Mercer kernels
why do we care about them? two reasons
1) Theorem: a kernel is positive definite (and dot-product) if and only if it is a Mercer kernel
proof that Mercer implies PD:
• consider any sequence {x_1, ..., x_n}, x_i ∈ X. Let

$$f(x) = \sum_{i=1}^n w_i\, \delta(x - x_i), \quad \text{s.t. } \sum_{i=1}^n w_i^2 < \infty$$

• since ∫ f²(x) dx < ∞, if k(x,y) is Mercer, it follows from (*) that

$$0 \leq \int\!\!\int k(x,y)\, f(x)\, f(y)\, dx\, dy = \int\!\!\int k(x,y) \sum_{ij} w_i w_j\, \delta(x - x_i)\, \delta(y - x_j)\, dx\, dy$$

Page 22

Mercer kernels

$$0 \leq \sum_{ij} w_i w_j \int\!\!\int k(x,y)\, \delta(x - x_i)\, \delta(y - x_j)\, dx\, dy = \sum_{ij} w_i w_j\, k(x_i, x_j) = w^T \mathbf{K} w$$

• since this holds for any w, k(x,y) is PD

Page 23

Mercer kernels
proof that PD implies Mercer:
• suppose there is a g(x) s.t.

$$\int\!\!\int k(x,y)\, g(x)\, g(y)\, dx\, dy = \varepsilon < 0$$

• consider a partition of X × X fine enough that

$$\left| \sum_{ij} k(x_i, x_j)\, g(x_i)\, g(x_j)\, \Delta_x \Delta_y - \int\!\!\int k(x,y)\, g(x)\, g(y)\, dx\, dy \right| \leq \frac{|\varepsilon|}{2}$$

• hence

$$\sum_{ij} k(x_i, x_j)\, g(x_i)\, g(x_j)\, \Delta_x \Delta_y < 0 \;\Leftrightarrow\; u^T \mathbf{K} u < 0$$

with u = (g(x_1), ..., g(x_n))^T

• K is not PD and k(x,y) is not a PD kernel
• the theorem follows by contradiction

Page 24

Mercer kernels
2) they have the following property (without proof)
Theorem: let k: X × X → ℝ be a Mercer kernel. Then there exists an orthonormal set of functions

$$\int \phi_i(x)\, \phi_j(x)\, dx = \delta_{ij}$$

and a set of λ_i ≥ 0, such that

$$\text{i)} \;\; \int\!\!\int k^2(x,y)\, dx\, dy = \sum_{i=1}^\infty \lambda_i^2 \qquad\qquad \text{ii)} \;\; k(x,y) = \sum_{i=1}^\infty \lambda_i\, \phi_i(x)\, \phi_i(y) \tag{**}$$

intuition: think of this as a 2D Fourier series (+ Parseval); a numerical sketch is given below
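A rough numerical sketch of this theorem (not from the slides): discretize the kernel for the Gaussian case on a grid of [0, 1], eigendecompose the resulting integral operator, and check that the truncated expansion Σ_i λ_i φ_i(x) φ_i(y) rebuilds k(x,y) on the grid. The grid size, σ, and tolerance are arbitrary choices.

```python
import numpy as np

sigma = 0.3
n = 200
x = np.linspace(0.0, 1.0, n)
delta = x[1] - x[0]                                        # quadrature weight

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / sigma ** 2)   # k(x_i, x_j) on the grid

# (T f)(x) = int k(x, y) f(y) dy is approximated by (K * delta) acting on grid values
lam, U = np.linalg.eigh(K * delta)
lam, U = lam[::-1], U[:, ::-1]                             # eigenvalues in decreasing order

phi = U / np.sqrt(delta)                                   # phi_i(x_j) ~ U[j, i] / sqrt(delta), so int phi_i^2 dx ~= 1

# keep the numerically non-zero eigenvalues and rebuild the kernel from the expansion
d = np.sum(lam > 1e-10)
K_rebuilt = (phi[:, :d] * lam[:d]) @ phi[:, :d].T
print(np.max(np.abs(K - K_rebuilt)))                       # small: k(x,y) ~= sum_i lambda_i phi_i(x) phi_i(y)
```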

Page 25

Eigenfunctions
note: if we define the operator

$$(Tf)(x) = \int k(x,y)\, f(y)\, dy$$

then

$$(T\phi_i)(x) = \int k(x,y)\, \phi_i(y)\, dy = \int \sum_k \lambda_k\, \phi_k(x)\, \phi_k(y)\, \phi_i(y)\, dy = \sum_k \lambda_k\, \phi_k(x) \underbrace{\int \phi_k(y)\, \phi_i(y)\, dy}_{\delta_{ki}} = \lambda_i\, \phi_i(x)$$

the functions φ_i(x) are the eigenfunctions of the operator T
again, a connection to linear systems (T is an LTI system if k(x,y) = h(x − y))

Page 26

Mercer kernels
the eigenfunction decomposition gives us another way to design the feature transformation

$$\Phi: X \to l_2^d, \quad x \to \left( \sqrt{\lambda_1}\, \phi_1(x), \sqrt{\lambda_2}\, \phi_2(x), \ldots \right)^T$$

where l_2^d is the space of vectors such that $\sum_i a_i^2 < \infty$ and d is the number of non-zero eigenvalues λ_i
clearly

$$k(x,y) = \Phi(x)^T \Phi(y)$$

i.e. there is a vector space (l_2^d) other than H s.t. k(x,y) is a dot product in that space (a numerical sketch of this map follows below)
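Continuing the numerical sketch from the Mercer theorem slide (all grid and kernel parameters are made-up): the same eigendecomposition gives the Mercer feature map Φ(x) = (√λ_1 φ_1(x), √λ_2 φ_2(x), ...), and k(x,y) is recovered as an ordinary dot product of those feature vectors.

```python
import numpy as np

# numerical sketch of the Mercer feature map for the Gaussian kernel on a grid of [0, 1]
sigma, n = 0.3, 200
x = np.linspace(0.0, 1.0, n)
delta = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / sigma ** 2)

lam, U = np.linalg.eigh(K * delta)                 # eigendecomposition of the discretized operator
lam, U = lam[::-1], U[:, ::-1]
phi = U / np.sqrt(delta)                           # approximate eigenfunctions on the grid
d = np.sum(lam > 1e-10)                            # number of numerically non-zero eigenvalues

# Mercer map: Phi(x_j) = (sqrt(lambda_1) phi_1(x_j), ..., sqrt(lambda_d) phi_d(x_j))
Phi = phi[:, :d] * np.sqrt(lam[:d])

i, j = 10, 150
print(np.dot(Phi[i], Phi[j]), K[i, j])             # the two numbers agree: k(x, y) = Phi(x)^T Phi(y)
```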

Page 27

The picture
this is a mapping from ℝ^n to ℝ^d
much more like a multi-layer Perceptron than before: the kernelized Perceptron as a neural net

[figure: Φ maps X into l_2^d through the coordinate functions Φ_1(.), Φ_2(.), ..., Φ_d(.), sending each x_i to Φ(x_i); drawn as a neural network, each Φ_k(.) is a hidden unit feeding the linear classifier]

Page 28

In summary
Reproducing kernel map:

$$\Phi: x \to k(., x)$$
$$H_K = \left\{ f(.) \;\middle|\; f(.) = \sum_{i=1}^m \alpha_i\, k(., x_i) \right\}$$
$$\langle f, g \rangle_* = \sum_{i=1}^m \sum_{j=1}^{m'} \alpha_i \beta_j\, k(x_i, x'_j)$$

Mercer kernel map:

$$\Phi: x \to \left( \sqrt{\lambda_1}\, \phi_1(x), \sqrt{\lambda_2}\, \phi_2(x), \ldots \right)^T$$
$$H_M = l_2^d = \left\{ x \;\middle|\; \sum_{i=1}^d x_i^2 < \infty \right\}$$
$$\langle f, g \rangle_* = f^T g$$

where λ_i, φ_i are the eigenvalues and eigenfunctions of k(x,y)

two very different pictures of what the kernel does
are the two spaces really that different?

Page 29

RK vs Mercer maps
note that for H_M we are writing

$$\Phi_M(x) = \sqrt{\lambda_1}\, \phi_1(x)\, e_1 + \ldots + \sqrt{\lambda_d}\, \phi_d(x)\, e_d$$

where the e_k are the canonical basis vectors of l_2^d
but, since the φ_i(.) are orthonormal, there is a 1-1 map

$$\Gamma: l_2^d \to \text{span}\{\phi_k(.)\}, \quad e_k \to \sqrt{\lambda_k}\, \phi_k(.)$$

and we can write

$$(\Gamma \circ \Phi_M)(x) = \lambda_1\, \phi_1(x)\, \phi_1(.) + \ldots + \lambda_d\, \phi_d(x)\, \phi_d(.) = k(., x) \quad \text{from (**)} \tag{***}$$

hence x → k(., x) maps x into M = span{φ_k(.)}

Page 30

RK vs Mercer maps
define the dot product in M so that

$$\langle \phi_j, \phi_k \rangle_{**} = \frac{1}{\lambda_j} \int \phi_j(x)\, \phi_k(x)\, dx = \frac{1}{\lambda_j}\, \delta_{jk}$$

then {φ_k(.)} is a basis, M is a vector space, and any function in M can be written as

$$f(x) = \sum_k \alpha_k\, \phi_k(x)$$

and

$$\langle f(.), k(., x) \rangle_{**} = \left\langle \sum_i \alpha_i\, \phi_i(.),\; \sum_j \lambda_j\, \phi_j(x)\, \phi_j(.) \right\rangle_{**} = \sum_i \sum_j \alpha_i\, \lambda_j\, \phi_j(x) \underbrace{\langle \phi_i(.), \phi_j(.) \rangle_{**}}_{\frac{1}{\lambda_j} \delta_{ij}} = \sum_i \alpha_i\, \phi_i(x) = f(x)$$

i.e., k is a reproducing kernel on M

Page 31

RK vs Mercer maps
furthermore, since k(., x) ∈ M, any functions of the form

$$f(.) = \sum_i \alpha_i\, k(., x_i) \qquad g(.) = \sum_j \beta_j\, k(., x_j)$$

are in M and

$$\begin{aligned}
\langle f, g \rangle_{**} &= \sum_i \sum_j \alpha_i \beta_j\, \langle k(., x_i), k(., x_j) \rangle_{**} \\
&= \sum_{ij} \alpha_i \beta_j \left\langle \sum_l \lambda_l\, \phi_l(x_i)\, \phi_l(.),\; \sum_m \lambda_m\, \phi_m(x_j)\, \phi_m(.) \right\rangle_{**} \\
&= \sum_{ij} \alpha_i \beta_j \sum_{lm} \lambda_l \lambda_m\, \phi_l(x_i)\, \phi_m(x_j)\, \langle \phi_l, \phi_m \rangle_{**} \\
&= \sum_{ij} \alpha_i \beta_j \sum_l \lambda_l\, \phi_l(x_i)\, \phi_l(x_j) = \sum_{ij} \alpha_i \beta_j\, k(x_i, x_j)
\end{aligned}$$

note that f, g ∈ H and this is the dot product we had in H

Page 32

In summary
H ⊂ M and <.,.>* in H is the same as <.,.>** in M
Question: is M ⊂ H?
• need to show that any

$$f(x) = \sum_k \alpha_k\, \phi_k(x) \in M$$

is also in H
• from (***),

$$k(., x) = \lambda_1\, \phi_1(x)\, \phi_1(.) + \ldots + \lambda_d\, \phi_d(x)\, \phi_d(.)$$

and, for any sequence {x_1, ..., x_d},

$$\begin{bmatrix} k(., x_1) \\ \vdots \\ k(., x_d) \end{bmatrix} = \underbrace{\begin{bmatrix} \lambda_1 \phi_1(x_1) & \cdots & \lambda_d \phi_d(x_1) \\ \vdots & & \vdots \\ \lambda_1 \phi_1(x_d) & \cdots & \lambda_d \phi_d(x_d) \end{bmatrix}}_{P} \begin{bmatrix} \phi_1(.) \\ \vdots \\ \phi_d(.) \end{bmatrix}$$

• if there is an invertible P, then each

$$\phi_k(.) = \sum_j \alpha_j\, k(., x_j) \in H$$

and M ⊂ H

Page 33

In summary
since λ_i > 0,

$$P = \underbrace{\begin{bmatrix} \phi_1(x_1) & \cdots & \phi_d(x_1) \\ \vdots & & \vdots \\ \phi_1(x_d) & \cdots & \phi_d(x_d) \end{bmatrix}}_{\Pi} \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{bmatrix}$$

is invertible when Π is. If Π is not invertible, then

$$\exists\, \alpha \neq 0 \;\text{ s.t. }\; \sum_k \alpha_k\, \phi_k(x_i) = 0, \quad \forall i$$

if there is no sequence {x_1, ..., x_d} for which Π is invertible, then

$$\exists\, \alpha \neq 0 \;\text{ s.t. }\; \sum_k \alpha_k\, \phi_k(x) = 0, \quad \forall x$$

and the φ_k(x) cannot be orthonormal. Hence there must be an invertible P, and M ⊂ H.

Page 34

In summary
H = M and <.,.>* in H is the same as <.,.>** in M
the reproducing kernel map and the Mercer kernel map lead to the same RKHS

Reproducing kernel map:

$$\Phi_r: x \to k(., x)$$
$$H_K = \left\{ f(.) \;\middle|\; f(.) = \sum_{i=1}^m \alpha_i\, k(., x_i) \right\}$$
$$\langle f, g \rangle_* = \sum_{i=1}^m \sum_{j=1}^{m'} \alpha_i \beta_j\, k(x_i, x'_j)$$

Mercer kernel map:

$$\Phi_M: x \to \left( \sqrt{\lambda_1}\, \phi_1(x), \sqrt{\lambda_2}\, \phi_2(x), \ldots \right)^T$$
$$H_M = l_2^d = \left\{ x \;\middle|\; \sum_{i=1}^d x_i^2 < \infty \right\}$$
$$\langle f, g \rangle_* = f^T g$$

and the two are related by

$$\Gamma \circ \Phi_M = \Phi_r, \qquad \Gamma: l_2^d \to \text{span}\{\phi_k(.)\}, \quad e_k \to \sqrt{\lambda_k}\, \phi_k(.)$$
