Tensor Factorization via Matrix Factorization

Volodymyr Kuleshov

Arun Tejasvi Chaganty

Percy Liang

Stanford University

May 11, 2015

Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 1 / 28

Introduction: tensor factorization

An application: community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013)

[Figure: a network of nodes a, b, c, d with unknown ("?") community labels; a tensor written as a sum of rank-one terms.]

Applications of tensor factorization

- Community detection
  - Anandkumar, Ge, Hsu, and S. Kakade 2013
- Parsing
  - Cohen, Satta, and Collins 2013
- Knowledge base completion
  - Chang et al. 2014
  - Singh, Rocktäschel, and Riedel 2015
- Topic modelling
  - Anandkumar, Foster, et al. 2012
- Crowdsourcing
  - Zhang et al. 2014
- Mixture models
  - Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013
- Bottlenecked models
  - Chaganty and Liang 2014
- . . .

What is tensor (CP) factorization?

- Tensor analogue of matrix eigen-decomposition. For a matrix,

  $$M = \sum_{i=1}^{k} \pi_i \, u_i \otimes u_i,$$

  and for a third-order tensor observed with noise,

  $$\hat{T} = \sum_{i=1}^{k} \pi_i \, u_i \otimes u_i \otimes u_i + \epsilon R.$$

- Goal: Given $T$ with noise, $\epsilon R$, recover factors $u_i$.

[Figure: a tensor drawn as a sum of $k$ rank-one terms plus a noise term, shown for the orthogonal and the non-orthogonal case.]
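To make the model above concrete, the following NumPy sketch builds a noisy symmetric third-order tensor from orthonormal factors. The helper name `cp_tensor`, the dimensions, the weights, and the Gaussian choice for the noise tensor $R$ are assumptions made for this illustration; the slides do not specify them.

```python
import numpy as np

def cp_tensor(pi, U):
    """Return sum_i pi[i] * u_i (x) u_i (x) u_i, where u_i is column i of U."""
    return np.einsum("i,ai,bi,ci->abc", pi, U, U, U)

rng = np.random.default_rng(0)
d, k, eps = 5, 3, 0.01

# Orthonormal factors: the first k columns of a random orthogonal matrix.
U = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :k]
pi = np.array([1.0, 2.0, 3.0])

T = cp_tensor(pi, U)                  # noiseless tensor
R = rng.standard_normal((d, d, d))    # assumed noise model (hypothetical)
T_hat = T + eps * R                   # observed tensor

# Contracting the noiseless tensor with u_1 in every mode gives back pi_1.
weight_1 = np.einsum("abc,a,b,c->", T, U[:, 0], U[:, 0], U[:, 0])
print(weight_1)
```

The `einsum` contraction keeps the construction in one line and makes the symmetry of the noiseless tensor explicit: swapping any two modes leaves `T` unchanged.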

Existing tensor factorization algorithms

- Tensor power method (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013)
  - Analog of matrix power method.
  - Sensitive to noise.
  - Restricted to orthogonal tensors.
- Alternating least squares (Comon, Luciani, and Almeida 2009; Anandkumar, Ge, and Janzamin 2014)
  - Sensitive to initialization.

Our approach: reduce to existing fast and robust matrix algorithms.
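For readers unfamiliar with the tensor power method listed above, here is a minimal sketch of its basic update, $u \leftarrow T(I, u, u) / \|T(I, u, u)\|$, on a noiseless orthogonal toy tensor. The function name, the toy tensor, and the starting point are assumptions for illustration; deflation and random restarts, which a practical implementation would need, are omitted.

```python
import numpy as np

def tensor_power_iteration(T, u0, iters=50):
    """Repeat u <- T(I, u, u) / ||T(I, u, u)|| starting from u0."""
    u = u0 / np.linalg.norm(u0)
    for _ in range(iters):
        v = np.einsum("abc,b,c->a", T, u, u)
        u = v / np.linalg.norm(v)
    return u

# Orthogonal toy tensor with factors e_1, e_2, e_3 and weights 1, 2, 3.
pi = np.array([1.0, 2.0, 3.0])
E = np.eye(3)
T = np.einsum("i,ai,bi,ci->abc", pi, E, E, E)

# Starting from the all-ones direction, the iterate is pulled toward the
# factor with the largest weighted overlap, here e_3.
u = tensor_power_iteration(T, np.ones(3))
print(np.round(u, 6))
```

In this toy case each coordinate is squared and reweighted at every step, so the largest weighted coordinate dominates very quickly. Which factor is found depends on the starting point.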

Orthogonal tensor factorization

Outline

- Introduction: tensor factorization
- Orthogonal tensor factorization
  - Projections
- Non-orthogonal tensor factorization
- Related work
- Empirical results
- Conclusions

Tensor factorization via single matrix factorization

$$T = \pi_1 u_1^{\otimes 3} + \pi_2 u_2^{\otimes 3} + \pi_3 u_3^{\otimes 3} + \epsilon R$$

With unit weights and no noise, projecting along a vector $w$ gives:

$$T = u_1^{\otimes 3} + u_2^{\otimes 3} + u_3^{\otimes 3}$$

$$\downarrow$$

$$T(I, I, w) = \underbrace{(w^\top u_1)}_{\lambda_1} u_1^{\otimes 2} + \underbrace{(w^\top u_2)}_{\lambda_2} u_2^{\otimes 2} + \underbrace{(w^\top u_3)}_{\lambda_3} u_3^{\otimes 2}$$

- Proposal: Eigen-decomposition on the projected matrix.
- Return: recovered eigenvectors, $u_i$.
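The single-projection proposal can be sketched in a few lines of NumPy. The example below is an illustration under stated assumptions: the tensor is noiseless with unit weights, and the projection vector `w` is deliberately built from the true factors so that the eigenvalues are known to be 1, 2, 3. In practice `w` would not be chosen this way.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Noiseless orthogonal tensor T = sum_i u_i^{(x)3} with unit weights.
U = np.linalg.qr(rng.standard_normal((d, d)))[0]
T = np.einsum("ai,bi,ci->abc", U, U, U)

# Project along w: T(I, I, w) = sum_i (w^T u_i) u_i u_i^T.
# Here w is chosen so that w^T u_i = i, giving well-separated eigenvalues.
w = U @ np.array([1.0, 2.0, 3.0])
M = np.einsum("abc,c->ab", T, w)

eigvals, eigvecs = np.linalg.eigh(M)

# Each eigenvector matches one factor up to sign.
overlap = np.abs(eigvecs.T @ U)
print(np.round(eigvals, 6))
print(np.round(overlap, 6))
```

Because `eigh` returns eigenvalues in ascending order and the eigenvalues were arranged to be increasing in $i$, the overlap matrix is the identity here. With a generic `w` it would be a permutation matrix instead.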

Sensitivity of single matrix projection

- Problem: Eigendecomposition is very sensitive to the eigengap.

  $$\text{error in factors} \propto \frac{1}{\min(\text{difference in eigenvalues})}.$$

- Intuition: If two eigenvalues are equal, corresponding eigenvectors are arbitrary.
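A two-dimensional toy example makes the eigengap sensitivity tangible. The sketch below perturbs $\mathrm{diag}(1 + \text{gap}, 1)$ by a fixed symmetric off-diagonal term and measures how far the top eigenvector rotates. The function name and the specific numbers are assumptions chosen for illustration.

```python
import numpy as np

def eigvec_angle(gap, delta):
    """Angle (radians) by which a perturbation rotates the top eigenvector of diag(1 + gap, 1)."""
    A = np.diag([1.0 + gap, 1.0])
    E = delta * np.array([[0.0, 1.0], [1.0, 0.0]])   # symmetric perturbation
    _, V = np.linalg.eigh(A + E)
    top = V[:, -1]                                   # eigenvector of the largest eigenvalue
    # The unperturbed top eigenvector is e_1; compare up to sign.
    return float(np.arccos(min(1.0, abs(top[0]))))

delta = 0.01
large_gap = eigvec_angle(gap=1.0, delta=delta)
small_gap = eigvec_angle(gap=0.01, delta=delta)
print(large_gap, small_gap)
```

For a symmetric 2-by-2 matrix the rotation angle $\theta$ satisfies $\tan 2\theta = 2\delta / \text{gap}$, so the same perturbation that barely moves the eigenvector when the gap is 1 rotates it by a large angle when the gap is comparable to the perturbation.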

Sensitivity analysis (contd.) (Cardoso 1994)

- Single matrix factorization:

  $$\text{error in factors} \propto \frac{1}{\min \text{ diff. in eigenvalues}}.$$

- Simultaneous matrix factorization:

  $$\text{error in factors} \propto \frac{1}{\min \text{ avg. diff. in eigenvalues}}.$$

Every coordinate pair needs one good projection (with a large eigengap).

Reduction to simultaneous diagonalization

$$\underbrace{T(I, I, w_1)}_{M_1} = \underbrace{(w_1^\top u_1)}_{\lambda_{11}} u_1 u_1^\top + \underbrace{(w_1^\top u_2)}_{\lambda_{21}} u_2 u_2^\top + \underbrace{(w_1^\top u_3)}_{\lambda_{31}} u_3 u_3^\top$$

$$\vdots$$

$$\underbrace{T(I, I, w_\ell)}_{M_\ell} = \underbrace{(w_\ell^\top u_1)}_{\lambda_{1\ell}} u_1 u_1^\top + \underbrace{(w_\ell^\top u_2)}_{\lambda_{2\ell}} u_2 u_2^\top + \underbrace{(w_\ell^\top u_3)}_{\lambda_{3\ell}} u_3 u_3^\top$$

- Projections share factors: can be simultaneously diagonalized.
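The shared-factor structure can be checked numerically. The sketch below forms a stack of projected matrices $M_\ell = T(I, I, w_\ell)$ and verifies that the same orthogonal $U$ diagonalizes all of them. The helper name `project`, the weights `pi`, and the sizes are assumptions for illustration; with non-unit weights the eigenvalues become $\lambda_{i\ell} = \pi_i \, w_\ell^\top u_i$.

```python
import numpy as np

def project(T, W):
    """Return the stack M_l = T(I, I, w_l) for the rows w_l of W, shape (L, d, d)."""
    return np.einsum("abc,lc->lab", T, W)

rng = np.random.default_rng(1)
d, L = 4, 6

U = np.linalg.qr(rng.standard_normal((d, d)))[0]   # orthonormal factors
pi = np.array([1.0, 2.0, 3.0, 4.0])
T = np.einsum("i,ai,bi,ci->abc", pi, U, U, U)

W = rng.standard_normal((L, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)       # random unit vectors

Ms = project(T, W)

# U^T M_l U is diagonal for every l, with entries lambda_{il} = pi_i * (w_l^T u_i).
D = np.einsum("ai,lab,bj->lij", U, Ms, U)
lam = pi[None, :] * (W @ U)                         # shape (L, d)
expected = np.einsum("li,ij->lij", lam, np.eye(d))
print(np.allclose(D, expected))
```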

Simultaneous diagonalization algorithm

- Algorithm: Simultaneously diagonalize projected matrices.

  $$\hat{U} = \arg\min_{U : U U^\top = I} \sum_{\ell=1}^{L} \operatorname{off}(U^\top M_\ell U), \qquad \operatorname{off}(A) = \sum_{i \neq j} A_{ij}^2.$$

- Optimize using Jacobi angles (Cardoso and Souloumiac 1996).
  - Used widely in the ICA community.
  - Empirically, appears to converge globally in the generic case.
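The following is a compact sketch of Jacobi-rotation joint diagonalization for real symmetric matrices, written in the spirit of the Jacobi angles approach cited above. It is not the authors' implementation: the function names, the sweep limit, and the stopping tolerance are assumptions. Each step picks a coordinate pair $(p, q)$ and applies the plane rotation whose angle minimizes the summed off-diagonal mass of that pair across all matrices.

```python
import numpy as np

def off(A):
    """Sum of squared off-diagonal entries of a square matrix."""
    return float(np.sum(A ** 2) - np.sum(np.diag(A) ** 2))

def joint_diagonalize(Ms, sweeps=100, tol=1e-12):
    """Jacobi-rotation joint diagonalization of real symmetric matrices.

    Returns an orthogonal U and the stack U^T M_l U.
    """
    A = np.array(Ms, dtype=float)      # shape (L, d, d), copied
    d = A.shape[1]
    U = np.eye(d)
    for _ in range(sweeps):
        rotated = False
        for p in range(d - 1):
            for q in range(p + 1, d):
                # Closed-form angle for the (p, q) plane, shared by all matrices.
                g = np.stack([A[:, p, p] - A[:, q, q], A[:, p, q] + A[:, q, p]])
                gg = g @ g.T
                ton = gg[0, 0] - gg[1, 1]
                toff = gg[0, 1] + gg[1, 0]
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                if abs(theta) <= tol:
                    continue
                rotated = True
                c, s = np.cos(theta), np.sin(theta)
                G = np.array([[c, -s], [s, c]])
                U[:, [p, q]] = U[:, [p, q]] @ G
                A[:, [p, q], :] = G.T @ A[:, [p, q], :]
                A[:, :, [p, q]] = A[:, :, [p, q]] @ G
        if not rotated:
            break
    return U, A

rng = np.random.default_rng(0)
d, L = 4, 5
U_true = np.linalg.qr(rng.standard_normal((d, d)))[0]
Ms = np.array([U_true @ np.diag(rng.standard_normal(d)) @ U_true.T for _ in range(L)])

U_hat, Ds = joint_diagonalize(Ms)
total_off = sum(off(D) for D in Ds)
print(total_off)                                  # close to zero
print(np.round(np.abs(U_hat.T @ U_true), 3))      # a permutation matrix
```

The recovered `U_hat` matches the true factors only up to column order and sign, which is why the check compares absolute overlaps rather than the matrices themselves.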

Orthogonal tensor factorization: Projections

Oracle and random projections

- Hypothetically: "oracle" projections along the factors are good.
- Practically: use projections along random directions.
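The difference between the two kinds of projection can be seen on a small noiseless example. In the sketch below, the helper `project`, the weights, and the dimension are assumptions for illustration. An oracle projection along a factor isolates a single rank-one term, while a random unit vector mixes all terms with eigenvalues $\pi_i \, w^\top u_i$.

```python
import numpy as np

def project(T, w):
    """T(I, I, w): contract the third mode of T with w."""
    return np.einsum("abc,c->ab", T, w)

rng = np.random.default_rng(0)
d = 4
U = np.linalg.qr(rng.standard_normal((d, d)))[0]   # orthonormal factors
pi = np.array([1.0, 2.0, 3.0, 4.0])
T = np.einsum("i,ai,bi,ci->abc", pi, U, U, U)

# Oracle projection along u_1: only the first term survives,
# so T(I, I, u_1) = pi_1 * u_1 u_1^T.
M_oracle = project(T, U[:, 0])
print(np.allclose(M_oracle, pi[0] * np.outer(U[:, 0], U[:, 0])))

# Random projection: a normalized Gaussian vector is uniform on the unit sphere.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
M_random = project(T, w)

# Its eigenvalues are pi_i * (w^T u_i), in some order.
eig_random = np.sort(np.linalg.eigvalsh(M_random))
print(np.allclose(eig_random, np.sort(pi * (U.T @ w))))
```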

Results: Orthogonal tensor decomposition

$$\hat{T} = \sum_{i=1}^{k} \pi_i \, u_i^{\otimes 3} + \epsilon R.$$

Theorem (Random projections). Pick $L = \Omega(k \log k)$ projections randomly from the unit sphere. Then, with high probability,

$$\text{error in factors} \le O\left( \underbrace{\frac{\|\pi\|_1 \, \pi_{\max}}{\pi_{\min}^2}}_{\text{oracle error}} \; \underbrace{\left(1 + \sqrt{\frac{d}{L}}\right)}_{\text{conc. term}} \right) \epsilon.$$

Empirical: Random vs. oracle projections

[Figure: error (0.000 to 0.010) against number of projections (0 to 60), orthogonal case.]

Figure: Comparing random vs. oracle projections ($d = k = 10$, $\epsilon = 0.05$)

Non-orthogonal tensor factorization

Non-orthogonal simultaneous diagonalization

$$\underbrace{T(I, I, w_1)}_{M_1} = \underbrace{(w_1^\top u_1)}_{\lambda_{11}} u_1 u_1^\top + \underbrace{(w_1^\top u_2)}_{\lambda_{21}} u_2 u_2^\top + \underbrace{(w_1^\top u_3)}_{\lambda_{31}} u_3 u_3^\top$$

$$\vdots$$

$$\underbrace{T(I, I, w_\ell)}_{M_\ell} = \underbrace{(w_\ell^\top u_1)}_{\lambda_{1\ell}} u_1 u_1^\top + \underbrace{(w_\ell^\top u_2)}_{\lambda_{2\ell}} u_2 u_2^\top + \underbrace{(w_\ell^\top u_3)}_{\lambda_{3\ell}} u_3 u_3^\top$$

- No unique non-orthogonal factorization for a single matrix.
- $\ge 2$ matrices have a unique non-orthogonal factorization.
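One classical way to see why two matrices suffice: if $M_1 = U D_1 U^\top$ and $M_2 = U D_2 U^\top$ with $U$ invertible, then $M_1 M_2^{-1} = U (D_1 D_2^{-1}) U^{-1}$, so the factors are the eigenvectors of $M_1 M_2^{-1}$ whenever the diagonal ratios are distinct. The sketch below illustrates this argument only; it is not the algorithm used in the talk, and the factor matrix and diagonal values are assumptions chosen for illustration.

```python
import numpy as np

# Non-orthogonal unit-norm factors (columns of U); illustrative values.
U = np.array([[1.0, 0.5, 0.2],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
U /= np.linalg.norm(U, axis=0)

# Two projected matrices sharing the factors but with different weights.
M1 = U @ np.diag([1.0, 2.0, 3.0]) @ U.T
M2 = U @ np.diag([3.0, 1.0, 2.0]) @ U.T

# M1 M2^{-1} = U diag(1/3, 2, 3/2) U^{-1}: its eigenvectors are the factors.
eigvals, V = np.linalg.eig(M1 @ np.linalg.inv(M2))
eigvals, V = np.real(eigvals), np.real(V)

# Every recovered eigenvector is parallel to one column of U (up to sign).
best_overlap = np.max(np.abs(V.T @ U), axis=1)
print(np.round(np.sort(eigvals), 6))
print(np.round(best_overlap, 6))
```

This direct approach inherits the eigengap sensitivity discussed earlier, now in terms of the gaps between the ratios, which is one motivation for jointly diagonalizing many projections instead.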

Non-orthogonal simultaneous diagonalization

- Algorithm: Simultaneously diagonalize projected matrices.

  $$\hat{U} = \arg\min_{U} \sum_{\ell=1}^{L} \operatorname{off}(U^{-1} M_\ell U^{-\top}), \qquad \operatorname{off}(A) = \sum_{i \neq j} A_{ij}^2.$$

- $U$ is not constrained to be orthogonal.
- Optimize using the QRJ1D algorithm (Afsari 2006).
  - Only guaranteed to have local convergence.
  - More stable than ALS in practice.
- Sensitivity analysis due to Afsari 2008.
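The objective above is easy to evaluate directly, which is useful for checking a candidate solution. The sketch below only computes the objective; it does not implement the optimizer cited on the slide. The function names, the factor matrix, and the random diagonal weights are assumptions for illustration.

```python
import numpy as np

def off(A):
    """Sum of squared off-diagonal entries."""
    return float(np.sum(A ** 2) - np.sum(np.diag(A) ** 2))

def nojd_objective(U, Ms):
    """Sum over l of off(U^{-1} M_l U^{-T}) for an invertible, not necessarily orthogonal U."""
    U_inv = np.linalg.inv(U)
    return sum(off(U_inv @ M @ U_inv.T) for M in Ms)

rng = np.random.default_rng(0)
d, L = 3, 4
U_true = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
Ms = [U_true @ np.diag(rng.standard_normal(d)) @ U_true.T for _ in range(L)]

at_truth = nojd_objective(U_true, Ms)      # zero up to rounding at the true factors
at_identity = nojd_objective(np.eye(d), Ms)  # positive for a wrong guess
print(at_truth, at_identity)
```

Note that the objective is invariant to rescaling and permuting the columns of $U$, so the minimizer is only identified up to those ambiguities.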

Results: Non-orthogonal tensor decomposition

Theorem (Random projections). Pick $L = \Omega(k \log k)$ projections randomly from the unit sphere. Then, with high probability,

$$\text{error in factors} \le O\left( \underbrace{\frac{\|U^{-\top}\|_2^2}{1 - \mu^2}}_{\text{non-ortho. cost}} \; \underbrace{\frac{\|\pi\|_1 \, \pi_{\max}}{\pi_{\min}^2} \left(1 + \sqrt{\frac{d}{L}}\right)}_{\text{ortho. cost}} \right) \epsilon,$$

where $U = [u_1 | \dots | u_k]$, $\mu = \max_{i \neq j} u_i^\top u_j$, and $\frac{\|U^{-\top}\|_2^2}{1 - \mu^2}$ measures non-orthogonality.
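The non-orthogonality factor in the bound can be computed for any square factor matrix with unit-norm columns. In the sketch below, the function name is hypothetical, and taking $\mu$ as the largest inner product between two distinct columns is an assumption about the slide's definition (including $i = j$ would make the denominator vanish).

```python
import numpy as np

def non_orthogonality_cost(U):
    """Return ||U^{-T}||_2^2 / (1 - mu^2) for a square U with unit-norm columns.

    mu is taken as the largest inner product between two distinct columns (assumption).
    """
    k = U.shape[1]
    gram = U.T @ U
    mu = gram[~np.eye(k, dtype=bool)].max()
    spectral_norm = np.linalg.norm(np.linalg.inv(U).T, 2)
    return spectral_norm ** 2 / (1.0 - mu ** 2)

# Orthonormal factors: mu = 0 and ||U^{-T}||_2 = 1, so the cost equals 1.
cost_orthogonal = non_orthogonality_cost(np.eye(3))

# Correlated factors: the first two columns have inner product 0.6.
U = np.array([[1.0, 0.6, 0.0],
              [0.0, 0.8, 0.0],
              [0.0, 0.0, 1.0]])
cost_correlated = non_orthogonality_cost(U)
print(cost_orthogonal, cost_correlated)
```

For the correlated example, the smallest squared singular value of $U$ is 0.4 and $1 - \mu^2 = 0.64$, so the factor is $2.5 / 0.64$, already several times larger than in the orthogonal case.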

Related work

Notes and related work

- Orthogonal tensor methods can factorize non-orthogonal tensors using a whitening transformation (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013).
  - Whitening is itself a major source of errors (Souloumiac 2009).
- Simultaneous diagonalization for tensors was proposed by Lathauwer 2006.
  - Relies on computing the SVD of a $d^4 \times k^2$ matrix.
- Simultaneous diagonalization of multiple projections is mentioned in Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013.
  - No analysis presented.
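For context, the whitening transformation mentioned above can be sketched as follows. The example assumes that a second-order moment $M_2 = \sum_i \pi_i u_i u_i^\top$ with positive weights is available alongside the tensor; the factor matrix, the weights, and the variable names are illustrative assumptions. Whitening maps the non-orthogonal factors to an orthonormal set, after which an orthogonal method can be applied.

```python
import numpy as np

# Non-orthogonal factors and positive weights (illustrative values).
U = np.array([[1.0, 0.6, 0.0],
              [0.0, 0.8, 0.0],
              [0.0, 0.0, 1.0]])
pi = np.array([1.0, 2.0, 3.0])

# Second-order moment M2 = sum_i pi_i u_i u_i^T (assumed to be available).
M2 = (U * pi) @ U.T

# Whitening matrix W with W^T M2 W = I, built from the eigendecomposition of M2.
eigvals, eigvecs = np.linalg.eigh(M2)
W = eigvecs / np.sqrt(eigvals)

# The whitened, rescaled factors v_i = sqrt(pi_i) * W^T u_i are orthonormal.
V = W.T @ (U * np.sqrt(pi))
print(np.allclose(W.T @ M2 @ W, np.eye(3)))
print(np.allclose(V.T @ V, np.eye(3)))
```

The division by `np.sqrt(eigvals)` is where estimation noise gets amplified when $M_2$ has small eigenvalues, which is consistent with the remark that whitening can itself be a source of error.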

Empirical results

Community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013)

[Figure: a network of nodes a, b, c, d; a tensor written as a sum of rank-one terms.]

[Figure: plot with axes Recall and Error, comparing TPM and OJD.]

Crowdsourcing (Zhang et al. 2014)

[Figure: a, b, c with Y/N labels.]

[Figure: accuracy (80 to 100) on the web, rte, birds, and dogs datasets for TPM, ALS, OJD, NOJD, and MV+EM.]

Conclusions

- Reduce tensor problems to matrix ones with random projections.
- Empirically, competitive with state of the art with support for non-orthogonal, asymmetric tensors of arbitrary order.
- Open question: is the Jacobi angles algorithm for orthogonal simultaneous diagonalization generically globally convergent?
- Github: https://github.com/kuleshov/tensor-factorization
- Codalab: https://www.codalab.org/worksheets/
- Thanks! Questions?
