45
On the Relation Between Universality, Characteristic Kernels and RKHS Embedding of Measures Bharath K. Sriperumbudur , Kenji Fukumizu and Gert R. G. Lanckriet UC San Diego The Institute of Statistical Mathematics AISTATS 2010

On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

On the Relation Between Universality,Characteristic Kernels and RKHS Embedding of

Measures

Bharath K. Sriperumbudur⋆, Kenji Fukumizu† andGert R. G. Lanckriet⋆

⋆UC San Diego †The Institute of Statistical Mathematics

AISTATS 2010

Page 2: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Outline

RKHS embedding of probability measures

Characteristic kernels

Universal kernels

Various notions of universality

Novel characterization of universality

Relation to RKHS embedding of signed measures

Page 3: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

RKHS Embedding of Probability Measures

Input space : X

Feature space : H (with reproducing kernel, k)

Feature map : Φ

Φ : X → H x 7→ Φ(x) := k(·, x)

Page 4: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

RKHS Embedding of Probability Measures

Input space : X

Feature space : H (with reproducing kernel, k)

Feature map : Φ

Φ : X → H x 7→ Φ(x) := k(·, x)

Extension to probability measures:

P 7→ Φ(P) :=

X

k(·, x) dP(x)

Page 5: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

RKHS Embeddings of Probability Measures

Input space : X

Feature space : H (with reproducing kernel, k)

Feature map : Φ

Φ : X → H x 7→ Φ(x) := k(·, x)

Extension to probability measures:

P 7→ Φ(P) :=

X

k(·, x) dP(x)

︸ ︷︷ ︸EY∼P[Φ(Y )]=EY∼P[k(·,Y )]

Page 6: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

RKHS Embeddings of Probability Measures

Input space : X

Feature space : H (with reproducing kernel, k)

Feature map : Φ

Φ : X → H x 7→ Φ(x) := k(·, x)

Extension to probability measures:

P 7→ Φ(P) :=

X

k(·, x) dP(x)

Advantage: Φ(P) can distinguish P by high-order moments.

k(y , x) = c0 + c1(xy) + c2(xy)2 + · · · (ci 6= 0) e.g. k(y , x) = exy

Φ(P)(y) = c0 + c1

(∫

X

x dP(x)

)y + c2

(∫

X

x2 dP(x)

)y2 + · · ·

Page 7: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Applications

Two-sample problem:

Given random samples X1, . . . ,Xm and Y1, . . . ,Yn drawn i.i.d.from P and Q, respectively.

Determine: are P and Q different?

Page 8: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Applications

Two-sample problem:

Given random samples X1, . . . ,Xm and Y1, . . . ,Yn drawn i.i.d.from P and Q, respectively.

Determine: are P and Q different?

γ(P,Q) = ‖Φ(P)− Φ(Q)‖H : distance metric between P and Q.

H0 : P = Q H0 : γ(P,Q) = 0≡

H1 : P 6= Q H1 : γ(P,Q) > 0

Test: Say H0 if γ(P,Q) < ε. Otherwise say H1.

Page 9: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Applications

Two-sample problem:

Given random samples X1, . . . ,Xm and Y1, . . . ,Yn drawn i.i.d.from P and Q, respectively.

Determine: are P and Q different?

γ(P,Q) = ‖Φ(P)− Φ(Q)‖H : distance metric between P and Q.

H0 : P = Q H0 : γ(P,Q) = 0≡

H1 : P 6= Q H1 : γ(P,Q) > 0

Test: Say H0 if γ(P,Q) < ε. Otherwise say H1.

Other applications:

Hypothesis testing : Independence test, Goodness of fit test, etc.

Feature selection, message passing, density estimation, etc.

Page 10: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Characteristic Kernels

Define: k is characteristic if

P 7→

X

k(·, x) dP(x) is injective.

In other words,∫

X

k(·, x) dP(x) =

X

k(·, x) dQ(x) ⇔ P = Q.

When k(·, x) = e√−1〈·,x〉, Φ(P) is the characteristic function of P.

Page 11: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Characteristic Kernels

Define: k is characteristic if

P 7→

X

k(·, x) dP(x) is injective.

In other words,∫

X

k(·, x) dP(x) =

X

k(·, x) dQ(x) ⇔ P = Q.

When k(·, x) = e√−1〈·,x〉, Φ(P) is the characteristic function of P.

Not all kernels are characteristic, e.g., k(x , y) = xT y .

µP = µQ ; P = Q

When is k characteristic? [Gretton et al., 2007,Sriperumbudur et al., 2008, Fukumizu et al., 2008,Fukumizu et al., 2009, Sriperumbudur et al., 2009].

Page 12: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Universal Kernels Regularization approach to supervised learning

minf∈H

1

n

n∑

i=1

ℓ(f (xi ), yi ) + λΩ[f ], (1)

where λ > 0 and (xi , yi )ni=1 is the training data.

Page 13: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Universal Kernels Regularization approach to supervised learning

minf∈H

1

n

n∑

i=1

ℓ(f (xi ), yi ) + λΩ[f ], (1)

where λ > 0 and (xi , yi )ni=1 is the training data.

Representer theorem : The solution to (1) is of the form

f =

n∑

i=1

cik(·, xi ),

where cini=1 ⊂ R are the parameters typically obtained from the

training data.

Page 14: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Universal Kernels Regularization approach to supervised learning

minf∈H

1

n

n∑

i=1

ℓ(f (xi ), yi ) + λΩ[f ], (1)

where λ > 0 and (xi , yi )ni=1 is the training data.

Representer theorem : The solution to (1) is of the form

f =

n∑

i=1

cik(·, xi ),

where cini=1 ⊂ R are the parameters typically obtained from the

training data.

Question: Can f approximate any target function arbitrarily “well”as n → ∞?

Page 15: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Universal Kernels Regularization approach to supervised learning

minf∈H

1

n

n∑

i=1

ℓ(f (xi ), yi ) + λΩ[f ], (1)

where λ > 0 and (xi , yi )ni=1 is the training data.

Representer theorem : The solution to (1) is of the form

f =

n∑

i=1

cik(·, xi ),

where cini=1 ⊂ R are the parameters typically obtained from the

training data.

Question: Can f approximate any target function arbitrarily “well”as n → ∞?

We need H to be “dense” in the space of target functions — k isuniversal.

Page 16: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Various Notions of Universality

Prior work

c-universality [Steinwart, 2001]

cc-universality [Micchelli et al., 2006]

Proposed notion: c0-universality

Characterization of c-, cc- and c0-universality : Relation to RKHSembedding of measures

Translation invariant kernels on Rd

Radial kernels on Rd

Page 17: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

c-universality [Steinwart, 2001]

X : compact metric space

k : continuous on X × X

Target function space : C (X ), continuous functions on X

Define k to be c-universal if H is dense in C (X ) w.r.t. the uniform norm(‖f ‖u := supx∈X |f (x)|).

Page 18: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

c-universality [Steinwart, 2001]

X : compact metric space

k : continuous on X × X

Target function space : C (X ), continuous functions on X

Define k to be c-universal if H is dense in C (X ) w.r.t. the uniform norm(‖f ‖u := supx∈X |f (x)|).

Sufficient conditions are obtained based on the Stone-Weierstraßtheorem. Not easy to check!

Examples: Gaussian and Laplacian kernels on any compact subset ofRd .

Page 19: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

c-universality [Steinwart, 2001]

X : compact metric space

k : continuous on X × X

Target function space : C (X ), continuous functions on X

Define k to be c-universal if H is dense in C (X ) w.r.t. the uniform norm(‖f ‖u := supx∈X |f (x)|).

Sufficient conditions are obtained based on the Stone-Weierstraßtheorem. Not easy to check!

Examples: Gaussian and Laplacian kernels on any compact subset ofRd .

Issue: X is compact which excludes many interesting spaces, such as Rd .

Page 20: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

cc-universality [Micchelli et al., 2006]

X : Hausdorff space

k : continuous on X × X

Target function space : C (X )

Define k to be cc-universal if H is dense in C (X ) endowed with thetopology of compact convergence.

Page 21: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

cc-universality [Micchelli et al., 2006]

X : Hausdorff space

k : continuous on X × X

Target function space : C (X )

Define k to be cc-universal if H is dense in C (X ) endowed with thetopology of compact convergence.

In other words, for any compact set Z ⊂ X , H|Z := f|Z : f ∈ H isdense in C (Z ) w.r.t. ‖ · ‖u.

Page 22: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

cc-universality [Micchelli et al., 2006]

X : Hausdorff space

k : continuous on X × X

Target function space : C (X )

Define k to be cc-universal if H is dense in C (X ) endowed with thetopology of compact convergence.

Necessary and sufficient conditions are obtained, which are relatedto the injectivity of RKHS embedding of measures.

Examples: Gaussian, Laplacian and Sinc kernels on Rd .

Page 23: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

cc-universality [Micchelli et al., 2006]

X : Hausdorff space

k : continuous on X × X

Target function space : C (X )

Define k to be cc-universal if H is dense in C (X ) endowed with thetopology of compact convergence.

Necessary and sufficient conditions are obtained, which are relatedto the injectivity of RKHS embedding of measures.

Examples: Gaussian, Laplacian and Sinc kernels on Rd .

Issue: Topology of compact convergence is weaker than the topology ofuniform convergence.

Page 24: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Proposed Notion: c0-universality

X : locally compact Hausdorff (LCH) space

Target function space : C0(X ), the space of bounded continuousfunctions that “vanish at infinity” (for every ǫ > 0,x ∈ X : |f (x)| ≥ ǫ is compact).

k is bounded and k(·, x) ∈ C0(X ) for all x ∈ X .

Page 25: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Proposed Notion: c0-universality

X : locally compact Hausdorff (LCH) space

Target function space : C0(X ), the space of bounded continuousfunctions that “vanish at infinity” (for every ǫ > 0,x ∈ X : |f (x)| ≥ ǫ is compact).

k is bounded and k(·, x) ∈ C0(X ) for all x ∈ X .

Define k to be c0-universal if H is dense in C0(X ) w.r.t. ‖ · ‖u.

Handles non-compact X and ensures uniform convergence overentire X .

Page 26: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Embedding Characterization of UniversalityTheorem

k is c0-universal if and only if

µ 7→

X

k(·, x) dµ(x), µ ∈ Mb(X ),

is injective. Mb(X ) is the space of finite signed Radon measures onX .

Page 27: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Embedding Characterization of UniversalityTheorem

k is c0-universal if and only if

µ 7→

X

k(·, x) dµ(x), µ ∈ Mb(X ),

is injective. Mb(X ) is the space of finite signed Radon measures onX .

k is cc-universal if and only if

µ 7→

X

k(·, x) dµ(x), µ ∈ Mbc(X ),

is injective. Mbc(X ) = µ ∈ Mb(X ) | supp(µ) is compact.

Page 28: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Embedding Characterization of UniversalityTheorem

k is c0-universal if and only if

µ 7→

X

k(·, x) dµ(x), µ ∈ Mb(X ),

is injective. Mb(X ) is the space of finite signed Radon measures onX .

k is cc-universal if and only if

µ 7→

X

k(·, x) dµ(x), µ ∈ Mbc(X ),

is injective. Mbc(X ) = µ ∈ Mb(X ) | supp(µ) is compact.

k is c-universal if and only if

µ 7→

X

k(·, x) dµ(x), µ ∈ Mb(X ),

is injective.

Page 29: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Postive Definite Characterization of Universality

Theorem

k is c0-universal (resp. c-universal) if and only if

X

X

k(x , y) dµ(x) dµ(y) > 0, ∀µ ∈ Mb(X )\0.

Page 30: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Postive Definite Characterization of Universality

Theorem

k is c0-universal (resp. c-universal) if and only if

X

X

k(x , y) dµ(x) dµ(y) > 0, ∀µ ∈ Mb(X )\0.

k is cc-universal if and only if

X

X

k(x , y) dµ(x) dµ(y) > 0, ∀µ ∈ Mbc(X )\0.

Page 31: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Postive Definite Characterization of Universality

Theorem

k is c0-universal (resp. c-universal) if and only if

X

X

k(x , y) dµ(x) dµ(y) > 0, ∀µ ∈ Mb(X )\0.

k is cc-universal if and only if

X

X

k(x , y) dµ(x) dµ(y) > 0, ∀µ ∈ Mbc(X )\0.

If k is c-, cc- or c0-universal, then it is strictly positive definite.

Page 32: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

X is an LCH space: Summary

Page 33: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to
Page 34: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Translation Invariant Kernels on Rd

X = Rd and k(x , y) = ψ(x − y), where

ψ(x) =

Rd

e√−1xTω dΛ(ω), x ∈ Rd ,

and Λ is a non-negative finite Borel measure.

Page 35: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Translation Invariant Kernels on Rd

X = Rd and k(x , y) = ψ(x − y), where

ψ(x) =

Rd

e√−1xTω dΛ(ω), x ∈ Rd ,

and Λ is a non-negative finite Borel measure.

Theorem

k is c0-universal if and only if supp(Λ) = Rd .

k is c0-universal if and only if it is characteristic.

If supp(Λ) has a non-empty interior, then k is cc-universal.[Micchelli et al., 2006]

Page 36: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Examples

Gaussian kernel: ψ(x) = e−x2/2σ2

; Ψ(ω) = σe−σ2ω2/2; dΛ(ω) = Ψ(ω) dω.

−4 −3 −2 −1 0 1 2 3 40

1

x

ψ(x

)

−4 −3 −2 −1 0 1 2 3 40

σ

Ψ(ω

)

Laplacian kernel: ψ(x) = e−σ|x|; Ψ(ω) =√

σσ2+ω2 .

−4 −3 −2 −1 0 1 2 3 40

1

x

ψ(x

)

−4 −3 −2 −1 0 1 2 3 40

(2/πσ2)1/2

Ψ(ω

)

Page 37: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Examples

B1-spline kernel: ψ(x) = (1− |x |)1[−1,1](x); Ψ(ω) = 2√

2√π

sin2(ω2)

ω2 .

−3 −2 −1 0 1 2 30

1

x

ψ(x

)

0

0.2

0.4

0.6

0.8

1

−8π −6π −4π −2π 0 2π 4π 6π 8π

(2π)

1/2 Ψ

(ω)

Sinc kernel: ψ(x) = sin(σx)x

; Ψ(ω) =√

π21[−σ,σ](ω).

−0.2

0

0.2

0.4

0.6

0.8

1

−6π −5π −4π −3π −2π −π 0 π 2π 3π 4π 5π 6π

ψ(x

)

Ψ(ω

)

−3 −2 −1 0 1 2 30

(π/2)1/2

Page 38: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Translation Invariant Kernels on Rd : Summary

Page 39: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to
Page 40: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Radial Kernels on Rd

Let

k(x , y) =

[0,∞)

e−t‖x−y‖22 dν(t),

where ν is a finite non-negative Borel measure on [0,∞).

Examples: Gaussian kernel, Inverse multi-quadratic kernel,k(x , y) = (c2 + ‖x − y‖22)

−β , β > d2 , c > 0, etc.

Page 41: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Radial Kernels on Rd

Let

k(x , y) =

[0,∞)

e−t‖x−y‖22 dν(t),

where ν is a finite non-negative Borel measure on [0,∞).

Examples: Gaussian kernel, Inverse multi-quadratic kernel,k(x , y) = (c2 + ‖x − y‖22)

−β , β > d2 , c > 0, etc.

TheoremThe following conditions are equivalent.

supp(ν) 6= 0.

k is c0-universal.

k is cc-universal.

k is characteristic.

k is strictly pd.

Page 42: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Radial Kernels on Rd : Summary

Page 43: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to
Page 44: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

Summary

Characteristic kernel

Injective RKHS embedding of probability measures.

Applications: Hypothesis testing, feature selection, etc.

Universal kernel

Consistency of learning algorithms.

Injective RKHS embedding of finite signed Radon measures.

Clarified the relation between various notions of universality andcharacteristic kernels.

Page 45: On the Relation Between Universality, Characteristic ...cc-universality [Micchelli et al., 2006] X: Hausdorff space k: continuous on X ×X Target function space: C(X) Define k to

References

Fukumizu, K., Gretton, A., Sun, X., and Scholkopf, B. (2008).Kernel measures of conditional dependence.In Platt, J., Koller, D., Singer, Y., and Roweis, S., editors, Advances in Neural Information Processing Systems 20, pages 489–496,Cambridge, MA. MIT Press.

Fukumizu, K., Sriperumbudur, B. K., Gretton, A., and Scholkopf, B. (2009).Characteristic kernels on groups and semigroups.In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 21, pages473–480.

Gretton, A., Borgwardt, K. M., Rasch, M., Scholkopf, B., and Smola, A. (2007).A kernel method for the two sample problem.In Scholkopf, B., Platt, J., and Hoffman, T., editors, Advances in Neural Information Processing Systems 19, pages 513–520. MITPress.

Micchelli, C. A., Xu, Y., and Zhang, H. (2006).Universal kernels.Journal of Machine Learning Research, 7:2651–2667.

Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Lanckriet, G. R. G., and Scholkopf, B. (2009).Kernel choice and classifiability for RKHS embeddings of probability distributions.In Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C. K. I., and Culotta, A., editors, Advances in Neural Information Processing

Systems 22, pages 1750–1758. MIT Press.

Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Lanckriet, G. R. G., and Scholkopf, B. (2008).Injective Hilbert space embeddings of probability measures.In Servedio, R. and Zhang, T., editors, Proc. of the 21st Annual Conference on Learning Theory, pages 111–122.

Steinwart, I. (2001).On the influence of the kernel on the consistency of support vector machines.Journal of Machine Learning Research, 2:67–93.