A Generalization of Principal Component Analysis to the...

Preview:

Citation preview

A Generalization of PrincipalComponent Analysis to the

Exponential FamilyMichael Collins, Sanjoy Dasgupta, Robert E. Schapire

Presented by: Sameer AgarwalArtificial Intelligence Laboratory, UCSD

A Generalization of Principal Component Analysis to the Exponential Family – p.1/??

The shoe size problem

If you had only one variable to describe this data, whatwould it be ?

A Generalization of Principal Component Analysis to the Exponential Family – p.2/??

The shoe size problem

Fit a line to the data and take the projection of the dataonto it.

A Generalization of Principal Component Analysis to the Exponential Family – p.3/??

The PCA problem statement

Given a set of data points X = {x1,x2, . . . ,xn}, xi ∈ Rn

and an integer k < n. Find a k-dimensional subspace Sof R

n and the corresponding projectionsΘ = {θ1, θ2, . . . , θn} of X onto S, s.t.

i

‖xi − θi‖2 (1)

is minimized. Subject to

ΘΘT is diagonal.

A Generalization of Principal Component Analysis to the Exponential Family – p.4/??

The Solution

The subspace spanned by the k-largest eigenvectors ofthe matrix

C = (X −X)(X −X)T

A Generalization of Principal Component Analysis to the Exponential Family – p.5/??

The PCA Algorithm

PCA (X, k)

1. [U,D, V ] = svd(X −X)

2. P = V [:, 1 : k]

or,

PCA (X, k)

1. C = (X −X)(X −X)T

2. CV = ΛV

3. P = V [:, 1 : k]

A Generalization of Principal Component Analysis to the Exponential Family – p.6/??

PCA in pictures

Assume that data is an ellipsoid.

A Generalization of Principal Component Analysis to the Exponential Family – p.7/??

PCA in pictures

Find the major and minor axes of the best fittingellipsoid.

A Generalization of Principal Component Analysis to the Exponential Family – p.8/??

Why PCA?

1. Reduce the dimenionality of the data.

2. Optimal in terms of L2 norm.

3. Output dimensions have zero correlation.

4. Denoises the data.

A Generalization of Principal Component Analysis to the Exponential Family – p.9/??

The age of the subspace

Assumption : The underlying model generating theobservations lives in a low dimensional subspace.

X = AV (2)size(V, 1) < size(X, 2) (3)

PCA Rows of V are orthonormal.

ICA Rows of V are independent.

LDA Rows of V are such that there is maximumdiscrimination between classes.

and variants thereof.

A Generalization of Principal Component Analysis to the Exponential Family – p.10/??

A gaussian view of PCA

Take a Line.

A Generalization of Principal Component Analysis to the Exponential Family – p.11/??

A gaussian view of PCA

Sample points from it.

A Generalization of Principal Component Analysis to the Exponential Family – p.12/??

A gaussian view of PCA

Add a sprinkling of unit variance gaussian noise.

A Generalization of Principal Component Analysis to the Exponential Family – p.13/??

A gaussian view of PCA

Remove all trace of the original model.

A Generalization of Principal Component Analysis to the Exponential Family – p.14/??

A gaussian view of PCA

Recover the best fitting subspace in the MLE sense, andthe projection of the data onto this subspace.

A Generalization of Principal Component Analysis to the Exponential Family – p.15/??

to summarize

Each data point xi is a sample from a probabilitydistribution P (x; θi) of the form

P (x; θi) =1√2π

e(x−θi)

2

2

Find Θ = {θ1, θ2, . . . , θn}, s.t. the likelihood of the data

given the parameters Θ is maximized, subject to the con-

straint that Θ lies in a k-dimensional subspace.

A Generalization of Principal Component Analysis to the Exponential Family – p.16/??

But what if ..

You knew that all your noise was positive ?

A Generalization of Principal Component Analysis to the Exponential Family – p.17/??

Beyond the Gaussian

1. Guassian error is not suitable for every problem.

2. Integer valued data ∼ Poisson.

3. Positive valued data ∼ Exponential.

4. Binary valued data ∼ Bernoulli.

A Generalization of Principal Component Analysis to the Exponential Family – p.18/??

The Exponential Family

P (x|θ) = P0(x)ex·θ−G(θ)

log P (x|θ) = log P0(x) + x · θ −G(θ)

1. θ is the natural parameter.

2. G(θ) = ln∫

ex·θP0(x)dx is the Cumulant Function.

3. G(θ) is strictly convex.

A Generalization of Principal Component Analysis to the Exponential Family – p.19/??

Members of the Exponential family

Gaussian

P (x|µ) =1√2π

e(x−µ)2

2 =e−x2

√2π

exθ− θ2

2

θ = µ

Bernoulli

P (x|p) = px(1− p)(1−x) = 1exθ−log(1+eθ)

θ = logp

1− p

A Generalization of Principal Component Analysis to the Exponential Family – p.20/??

Problem Statement

Given data points X = {x1,x2, . . . ,xn} and a memberof the exponential family PG(x|θ), find a setΘ = {θ1, θ2, . . . , θn}, which maximizes the totallikelihood of the data.

Θ = argmaxΘ

L(PG,X,Θ)

= argmaxΘ

i

P (xi|θi)

A Generalization of Principal Component Analysis to the Exponential Family – p.21/??

Convex Sets

Convex Not Convex

x

y x

y

S ⊆ Rn is a convex set if

x, y ∈ S, p, q ≥ 0, p + q = 1 ⇒ pq + qy ∈ S

A Generalization of Principal Component Analysis to the Exponential Family – p.22/??

Convex Function

yx

p q

f (px + qy) ≤ pf(x) + qf(y)

p + q = 1

p, q ≥ 0

A Generalization of Principal Component Analysis to the Exponential Family – p.23/??

Convex Optimization Problems

minimize f0(x)

subject to fi(x) ≤ 0, i = 1, . . . ,m

Ax = b, A ∈ Rp×n

f0, f1, . . . , fm are convex.

Affine equality constraints.

Feasible set is convex.

A Generalization of Principal Component Analysis to the Exponential Family – p.24/??

and the big deal is ?

Theorem: For a convex optimization problem, anylocal solution is also a global solution.

x ∈ C, is locally optimal if it satisfies

y ∈ C, ‖y − x‖ ≤ R ⇒ f0(y) ≥ f0(x)

x ∈ C is globally optimal if

y ∈ C ⇒ f0(y) ≥ f0(x)

A Generalization of Principal Component Analysis to the Exponential Family – p.25/??

Proof

Suppose x is locally optimal, buty ∈ C, f0(y) < f0(x)

Let z = εy + (1− ε)x with a small ε > 0.

z is near x, with

f0(z) = f0(εy + (1− ε)x)

≤ εf0(y) + (1− ε)f0(x)

< f0(x)

A contradiction, since x is locally optimal.

A Generalization of Principal Component Analysis to the Exponential Family – p.26/??

Bregman divergences

q p

BF (p, q) = F (p)− F (q)− (p− q) · ∇F (q)

F is real, convex and differentiable.

A Generalization of Principal Component Analysis to the Exponential Family – p.27/??

Properties of Bregman divergences

1. BF (q, p) measures the convexity ofF . It measuresthe increase in F (q) over F (p) above linear growthwith slope ∇F (p).

2. BF (q, p) ≥ 0.

3. BF (q, p) = 0 ⇐⇒ p = q and F (p) is strictlyconvex.

A Generalization of Principal Component Analysis to the Exponential Family – p.28/??

Properties of the Exponential Family

Let ,

λG(θ;x) = log PG(x; θ) = log P0(x) + x · θ −G(θ)

and assume,Eθ[∇θλ(θ;x)] = 0

then,

1. ∇θλ(θ;x) = x−∇θG(θ)

2. Eθ[x] = µ = ∇θG(θ) = g(θ)

3. G stricly convex ⇒ g(θ) is invertible.

A Generalization of Principal Component Analysis to the Exponential Family – p.29/??

Properties of the Exponential Family

Define, g−1(µ) = θ, then we can define the dual of G(θ)

F (µ) = θ · µ−G(θ)

G(θ) strictly convex ⇒ F (µ) is strictly convex too. now,

BG(θ, θ̂) = G(θ) + F (µ̂)− θ · µ̂ = BF (µ̂, µ)

λ(θ;x) = log P0(x) + θ · x−G(θ)

λ(θ;x) = log P0(x) + F (x)−BF (x, g(θ))

A Generalization of Principal Component Analysis to the Exponential Family – p.30/??

ePCA

since for members of the exponential family,

− log P (x|θ) = − log P0(x)− F (x) + BF (x, g(θ))

we have

L(Θ) = −∑

ij

log PG(xij|θij)

= C(X) +∑

ij

BF (xij, g(θij))

= C(X) + BF (X, g(Θ))

Θpca = argminΘ

BF (X, g(Θ))

A Generalization of Principal Component Analysis to the Exponential Family – p.31/??

Subspace rank constraint

We want Θ to be constrained in a subspace of dimensionk. Hence,

Θ = V A

wheresize(V ) = [d, k]

size(A) = [k, n]

V is a set of basis vectors for the subspace containing Θ.A is the set of bregman projections of X onto V .The optimization problem now is

argminΘ

L(Θ) = argmin{V,A}

BF (X, g(V A))

A Generalization of Principal Component Analysis to the Exponential Family – p.32/??

Minimizing L(V, A): idea

1. Start Randomly

2. Hold V constant and minimize w.r.t A

3. Hold A constant and minimize w.r.t V

4. Repeat 2-3 until convergence.

A Generalization of Principal Component Analysis to the Exponential Family – p.33/??

The 1-d Algorithm

1. V = rand(d, 1)

2. A = rand(1, n)

3. until convergence

4. ati = argmin

a

j BF (xij, g(av(t−1)j ))

5. vtj = argmin

v

i BF (xij, g(ativ))

6. t = t + 1

A Generalization of Principal Component Analysis to the Exponential Family – p.34/??

An example : Gaussian Noise

ati = argmin

aBF (xij, g(av

(t−1)j ))

g(x) = x

BF (q, p) =(p− q)2

2

→ ati = argmin

a

[

j

(xij − av(t−1)j )2

2

]

Differentiating and equating to zero we get∑

j

(xij − av(t−1)j )v

(t−1)j = 0

A Generalization of Principal Component Analysis to the Exponential Family – p.35/??

An example (contd.)

Differentiating and equating to zero we get

a∑

j

v(t−1)j

2=

j

xijv(t−1)j

a‖V (t−1)‖2 = Xi · V (t−1)

→ ati =

Xi · V (t−1)

‖V (t−1)‖2

→ At =X · V (t−1)

‖V (t−1)‖2

A Generalization of Principal Component Analysis to the Exponential Family – p.36/??

An example (contd.)

Similarly

V t =(At)

TX

‖At‖2

combining the two we get

V t = cV (t−1)XTX

equivalent to the power method of calculating the largest

eigenvector.

A Generalization of Principal Component Analysis to the Exponential Family – p.37/??

Comments

1. Each step of the algorithm solves a convexoptimization problem.

2. The over all optimization problem is not convex.

3. Convergence to global optima is difficult to prove ingeneral.

4. Gaussian is a special case, where the power methodconverges to the global optima.

A Generalization of Principal Component Analysis to the Exponential Family – p.38/??

Avoiding Infinity

Problem : A local optima may exist at infnity.Solution : Penalize large movements away from a fixedpoint in the range of g(θ).

L′(Θ) =∑

ij

[BF (xij , g(θij)) + εBF (µ0, g(θij))]

ε is a small positive constant. µ0 is an arbitrary point inthe range of g.

The alternating minimization procedure is guaranteed to

find a bounded solution to L′(Θ).

A Generalization of Principal Component Analysis to the Exponential Family – p.39/??

The General Algorithm: code

1. A = 0, V = 0

2. For n = 1, . . . , N, c = 1, . . . , l

3. Initialize v0c randomly.

4. sij =∑

k 6=c aikvkj

5. until convergence

6. i = 1, . . . , n

7. atic = argmin

a

j BF (xij, g(av(t−1)cj + sij))

8. j = 1, . . . , d

9. vtcj = argmin

v

j BF (xij, g(aticv + sij) + sij)

A Generalization of Principal Component Analysis to the Exponential Family – p.40/??

Hindsight and GLMs

Generalized Linear Models are extensions ofstandard regression

y = Ax + ε

where ε is a zero mean, constant variance normal.They generalize the model to

y = g(Ax) + η

here, g is a real differentiable function known asthe link function.η is distributed according to a fixed member ofthe exponential family.

A Generalization of Principal Component Analysis to the Exponential Family – p.41/??

Related Work

Hoffman et. al Probabilistc Latent SemanticIndexing

Seung et. al Non-negative Matrix Factorization

Rely on explicit constraints on A and V .

A Generalization of Principal Component Analysis to the Exponential Family – p.42/??

Summary

Interpret PCA using a generative model.

Extend the model to the exponential family.

Convert the Log Likelihood into a Bregmandivergence.

Minimize the divergence using an alternatingminimization procedure.

A Generalization of Principal Component Analysis to the Exponential Family – p.43/??

References

Collins and Schapire A Generalization of Principal Component

Analsysis to the Exponential Family

Tipping, M. E. & Bishop C. M. Probabilistic Principal

Component Analysis.

Azoury K. S. & Warmuth M. K. Relative loss bounds foron-line density estimation with the exponential family ofdistributions.

A Generalization of Principal Component Analysis to the Exponential Family – p.44/??

Acknowledgements

Sanjoy Dasgupta for answering last minutequestions.

The music of Nusrat Fateh Ali Khan for keeping mecompany.

A Generalization of Principal Component Analysis to the Exponential Family – p.45/??

Recommended