Volodymyr Kuleshov* Arun Tejasvi Chaganty* …kuleshov/papers/aistats2015...Tensor Factorization...

Tensor Factorization via Matrix Factorization

Volodymyr Kuleshov*

Arun Tejasvi Chaganty*

Percy Liang

Stanford University

May 11, 2015

Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 1 / 28

Introduction: tensor factorization

An application: community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013)

[Figure: illustration of a decomposition into a sum of components]

Applications of tensor factorization

- Community detection
  - Anandkumar, Ge, Hsu, and S. Kakade 2013
- Parsing
  - Cohen, Satta, and Collins 2013
- Knowledge base completion
  - Chang et al. 2014
  - Singh, Rocktäschel, and Riedel 2015
- Topic modelling
  - Anandkumar, Foster, et al. 2012
- Crowdsourcing
  - Zhang et al. 2014
- Mixture models
  - Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013
- Bottlenecked models
  - Chaganty and Liang 2014
- ...

What is tensor (CP) factorization?

- Tensor analogue of matrix eigen-decomposition.

$$M = \sum_{i=1}^{k} \pi_i u_i^{\otimes 2}, \qquad T = \sum_{i=1}^{k} \pi_i u_i^{\otimes 3}$$

- Goal: Given $T$ with noise, $\epsilon R$, recover factors $u_i$.

$$\hat{T} = \sum_{i=1}^{k} \pi_i u_i^{\otimes 3} + \epsilon R.$$

[Figure: the tensor drawn as a sum of rank-one terms, with one extra term for the noise in the noisy case]
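To make the model concrete, here is a minimal NumPy sketch that builds a symmetric third-order tensor from weights and factors and then adds noise. The function names, the dimensions, and the choice of a symmetrized Gaussian noise tensor with unit Frobenius norm are assumptions made for illustration; the slides do not specify how $R$ is generated.

```python
import itertools
import numpy as np

def cp_tensor(weights, factors):
    """Return sum_i weights[i] * u_i (x) u_i (x) u_i, where u_i = factors[:, i]."""
    return np.einsum("i,ai,bi,ci->abc", weights, factors, factors, factors)

def noisy_cp_tensor(weights, factors, eps, rng):
    """Clean tensor plus eps * R (assumption: R is symmetric with unit Frobenius norm)."""
    d = factors.shape[0]
    G = rng.standard_normal((d, d, d))
    # Average over all six index permutations so that R is symmetric, like T.
    R = sum(G.transpose(p) for p in itertools.permutations(range(3))) / 6.0
    R /= np.linalg.norm(R)
    return cp_tensor(weights, factors) + eps * R

rng = np.random.default_rng(0)
d, k = 5, 3
U = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :k]  # orthonormal factors
pi = np.array([0.5, 0.3, 0.2])                           # hypothetical weights
T = cp_tensor(pi, U)
T_hat = noisy_cp_tensor(pi, U, eps=0.05, rng=rng)
```

The factorization problem is then to recover the columns of `U` from `T_hat` alone.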

Existing tensor factorization algorithms

- Tensor power method (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013)
  - Analog of matrix power method.
  - Sensitive to noise.
  - Restricted to orthogonal tensors.
- Alternating least squares (Comon, Luciani, and Almeida 2009; Anandkumar, Ge, and Janzamin 2014)
  - Sensitive to initialization.

Our approach: reduce to existing fast and robust matrix algorithms.
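For reference, the basic tensor power iteration can be sketched in a few lines. This is a bare-bones illustration of the update $u \leftarrow T(I, u, u) / \|T(I, u, u)\|$ for a symmetric tensor with orthogonal factors, without the restarts and deflation a full method would need; the function name and the iteration count are assumptions.

```python
import numpy as np

def tensor_power_iteration(T, u0, iters=100):
    """Repeat u <- T(I, u, u) / ||T(I, u, u)||, the tensor analogue of matrix power iteration."""
    u = u0 / np.linalg.norm(u0)
    for _ in range(iters):
        v = np.einsum("abc,b,c->a", T, u, u)   # contract T with u along two modes
        u = v / np.linalg.norm(v)
    eigenvalue = np.einsum("abc,a,b,c->", T, u, u, u)
    return eigenvalue, u
```

On a noiseless orthogonal tensor with positive weights, this converges to one of the factors; which one depends on the starting point.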

Orthogonal Tensor factorization

Outline

Orthogonal tensor factorization

Projections

Non-orthogonal tensor factorization

Related work

Empirical results

Conclusions

Tensor factorization via single matrix factorization

$$\hat{T} = \pi_1 u_1^{\otimes 3} + \pi_2 u_2^{\otimes 3} + \pi_3 u_3^{\otimes 3} + \epsilon R$$

Noiseless case with unit weights:

$$T = u_1^{\otimes 3} + u_2^{\otimes 3} + u_3^{\otimes 3}$$

$$T(I, I, w) = \underbrace{(w^\top u_1)}_{\lambda_1} u_1^{\otimes 2} + \underbrace{(w^\top u_2)}_{\lambda_2} u_2^{\otimes 2} + \underbrace{(w^\top u_3)}_{\lambda_3} u_3^{\otimes 2}$$

- Proposal: Eigen-decomposition on the projected matrix.
- Return: recovered eigenvectors, $u_i$.
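The single-projection proposal can be sketched as follows for the orthogonal case: contract the tensor with one vector $w$ and take an eigendecomposition of the resulting matrix. The function name and the rule of keeping the $k$ eigenvalues of largest magnitude are assumptions made for illustration.

```python
import numpy as np

def factorize_by_single_projection(T, w, k):
    """Project T along w and return the k eigenpairs of T(I, I, w) with largest |eigenvalue|."""
    M = np.einsum("abc,c->ab", T, w)        # T(I, I, w) = sum_i (w^T u_i) u_i u_i^T
    M = (M + M.T) / 2                       # symmetrize against round-off or noise
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(-np.abs(eigvals))[:k]  # drop (near-)zero eigenvalues when k < d
    return eigvals[top], eigvecs[:, top]
```

The recovered eigenvectors match the factors only up to sign and ordering, and only when the eigenvalues $w^\top u_i$ are distinct.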

Sensitivity of single matrix projection

- Problem: Eigendecomposition is very sensitive to the eigengap.

$$\text{error in factors} \propto \frac{1}{\min(\text{difference in eigenvalues})}.$$

- Intuition: If two eigenvalues are equal, the corresponding eigenvectors are arbitrary.
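A two-dimensional toy example shows the eigengap effect numerically. The matrix, the perturbation, and the function name below are assumptions chosen for illustration: the same small symmetric perturbation is applied to matrices whose two eigenvalues are separated by different gaps, and the rotation of the top eigenvector is measured.

```python
import numpy as np

def eigvec_error(gap, eps):
    """Angle (radians) by which the top eigenvector of diag(1 + gap, 1) moves under a perturbation of size eps."""
    M = np.diag([1.0 + gap, 1.0])
    E = eps * np.array([[0.0, 1.0], [1.0, 0.0]])  # symmetric off-diagonal perturbation
    _, vecs = np.linalg.eigh(M + E)
    top = vecs[:, -1]                              # eigenvector of the largest eigenvalue
    return np.arccos(min(1.0, abs(top[0])))        # the unperturbed top eigenvector is e_1

for gap in [1.0, 0.1, 0.01]:
    print(f"gap={gap:5.2f}  error={eigvec_error(gap, eps=1e-3):.2e}")
```

For this 2×2 case the angle has the closed form $\tfrac{1}{2}\arctan(2\epsilon/\text{gap})$, which is roughly $\epsilon/\text{gap}$ when the perturbation is small relative to the gap.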

Sensitivity analysis (contd.) (Cardoso 1994)

- Single matrix factorization:

$$\text{error in factors} \propto \frac{1}{\min \text{diff. in eigenvalues}}.$$

- Simultaneous matrix factorization:

$$\text{error in factors} \propto \frac{1}{\min \text{avg. diff. in eigenvalues}}.$$

Every coordinate pair needs one good projection (with a large eigengap).

Reduction to simultaneous diagonalization

$$\underbrace{T(I, I, w_1)}_{M_1} = \underbrace{(w_1^\top u_1)}_{\lambda_{11}} u_1 u_1^\top + \underbrace{(w_1^\top u_2)}_{\lambda_{21}} u_2 u_2^\top + \underbrace{(w_1^\top u_3)}_{\lambda_{31}} u_3 u_3^\top$$

$$\vdots$$

$$\underbrace{T(I, I, w_\ell)}_{M_\ell} = \underbrace{(w_\ell^\top u_1)}_{\lambda_{1\ell}} u_1 u_1^\top + \underbrace{(w_\ell^\top u_2)}_{\lambda_{2\ell}} u_2 u_2^\top + \underbrace{(w_\ell^\top u_3)}_{\lambda_{3\ell}} u_3 u_3^\top$$

- Projections share factors: can be simultaneously diagonalized.
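The shared-factor structure can be checked numerically. The sketch below, with hypothetical names and sizes, projects a noiseless orthogonal tensor with unit weights along several directions and verifies that the same matrix of factors diagonalizes every projection, with diagonal entries $\lambda_{i\ell} = w_\ell^\top u_i$.

```python
import numpy as np

def project_tensor(T, W):
    """Return the stack M_l = T(I, I, w_l), one matrix per row w_l of W."""
    return np.einsum("abc,lc->lab", T, W)

rng = np.random.default_rng(1)
d = 4
U = np.linalg.qr(rng.standard_normal((d, d)))[0]  # orthonormal factors u_1, ..., u_d
T = np.einsum("ai,bi,ci->abc", U, U, U)           # T = sum_i u_i^{(x)3}, unit weights
W = rng.standard_normal((3, d))                   # three projection directions
Ms = project_tensor(T, W)
D = np.einsum("ai,lab,bj->lij", U, Ms, U)         # U^T M_l U for every l
```

Each `D[l]` is diagonal, and its diagonal is `W[l] @ U`.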

Simultaneous diagonalization algorithm

- Algorithm: Simultaneously diagonalize projected matrices.

$$\hat{U} = \arg\min_{U : UU^\top = I} \sum_{\ell=1}^{L} \mathrm{off}(U^\top M_\ell U), \qquad \mathrm{off}(A) = \sum_{i \neq j} A_{ij}^2.$$

- Optimize using Jacobi angles (Cardoso and Souloumiac 1996).
  - Used widely in the ICA community.
  - Empirically appears to be generically globally convergent.
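The Jacobi-angle idea can be sketched for real symmetric matrices as below. This is a simplified illustration rather than the reference implementation: the function names, the sweep limit, and the stopping tolerance are assumptions. For each index pair $(p, q)$ it picks the plane rotation that minimizes the total squared off-diagonal mass over all matrices, which reduces to finding the principal eigenvector of a 2×2 matrix.

```python
import numpy as np

def off(A):
    """Sum of squared off-diagonal entries of A."""
    return np.sum((A - np.diag(np.diag(A))) ** 2)

def jacobi_joint_diag(mats, sweeps=100, tol=1e-12):
    """Find orthogonal V such that V^T M_l V is as diagonal as possible for all l."""
    A = np.array(mats, dtype=float)            # shape (L, d, d), symmetric matrices
    d = A.shape[1]
    V = np.eye(d)
    for _ in range(sweeps):
        rotated = False
        for p in range(d - 1):
            for q in range(p + 1, d):
                # After rotating by theta, a_pp - a_qq becomes [cos 2t, sin 2t] . h_l.
                h = np.stack([A[:, p, p] - A[:, q, q], 2.0 * A[:, p, q]])
                evals, evecs = np.linalg.eigh(h @ h.T)
                if evals[-1] < 1e-300:         # block already trivial for every matrix
                    continue
                x, y = evecs[:, -1]            # (cos 2t, sin 2t) maximizing the diagonal gap
                if x < 0:
                    x, y = -x, -y              # keep the rotation angle within [-pi/4, pi/4]
                c = np.sqrt((1.0 + x) / 2.0)
                s = y / (2.0 * c)
                if abs(s) < tol:
                    continue
                rotated = True
                R = np.eye(d)
                R[p, p] = R[q, q] = c
                R[p, q], R[q, p] = -s, s
                A = R.T @ A @ R                # broadcasts over the L matrices
                V = V @ R
        if not rotated:
            break
    return V, A
```

The columns of the returned `V` estimate the shared factors up to sign and ordering, and the second return value holds the (nearly) diagonalized matrices.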

Orthogonal Tensor factorization Projections

Oracle and random projections

- Hypothetically: “oracle” projections along the factors are good.
- Practically: use projections along random directions.
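The two kinds of projection can be contrasted in a short sketch. The helper names and sizes are assumptions; sampling normalized Gaussian vectors is one standard way to draw directions uniformly from the unit sphere. With orthonormal factors, projecting along a factor $u_j$ isolates the single rank-one term $\pi_j u_j u_j^\top$, which is what makes the oracle choice ideal.

```python
import numpy as np

def random_unit_vectors(L, d, rng):
    """Draw L directions uniformly from the unit sphere in R^d (normalized Gaussians)."""
    W = rng.standard_normal((L, d))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def project_tensor(T, W):
    """Return the stack M_l = T(I, I, w_l), one matrix per row w_l of W."""
    return np.einsum("abc,lc->lab", T, W)

rng = np.random.default_rng(5)
d = 4
U = np.linalg.qr(rng.standard_normal((d, d)))[0]   # orthonormal factors
pi = np.array([0.4, 0.3, 0.2, 0.1])                # hypothetical weights
T = np.einsum("i,ai,bi,ci->abc", pi, U, U, U)

M_oracle = project_tensor(T, U.T)                           # project along the true factors
M_random = project_tensor(T, random_unit_vectors(8, d, rng))  # project along random directions
```

In practice the factors are unknown, so only the random variant is available; the oracle variant serves as a baseline.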

Results: Orthogonal tensor decomposition

$$\hat{T} = \sum_{i=1}^{k} \pi_i u_i^{\otimes 3} + \epsilon R.$$

Theorem (Random projections). Pick $L = \Omega(k \log k)$ projections randomly from the unit sphere. Then, with high probability,

$$\text{error in factors} \le O\Bigg( \underbrace{\frac{\|\pi\|_1 \pi_{\max}}{\pi_{\min}^2}}_{\text{oracle error}} \; \underbrace{(\cdots)}_{\text{conc. term}} \Bigg) \epsilon.$$
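As a worked instance of the factor labelled "oracle error", consider two hypothetical weight vectors (these values are chosen for illustration and do not come from the slides). With uniform weights the factor grows only linearly in $k$, whereas a single small weight inflates it sharply:

```latex
% Uniform weights: \pi_i = 1/k for all i.
\|\pi\|_1 = 1, \quad \pi_{\max} = \pi_{\min} = \tfrac{1}{k}
\;\Longrightarrow\;
\frac{\|\pi\|_1 \, \pi_{\max}}{\pi_{\min}^2}
  = \frac{1 \cdot \tfrac{1}{k}}{\tfrac{1}{k^2}} = k.

% Skewed weights: k = 2, \pi = (0.9, 0.1).
\|\pi\|_1 = 1, \quad \pi_{\max} = 0.9, \quad \pi_{\min} = 0.1
\;\Longrightarrow\;
\frac{\|\pi\|_1 \, \pi_{\max}}{\pi_{\min}^2}
  = \frac{0.9}{0.01} = 90.
```

So for $k = 10$ equally weighted factors the factor is 10, while the two-component skewed example already gives 90.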

Empirical: Random vs. oracle projections

[Plot: panel title "Orthogonal case"; x-axis: Number of projections (0 to 60)]

Figure: Comparing random vs. oracle projections (d = k = 10, ε = 0.05)

Non-orthogonal tensor factorization

Non-orthogonal simultaneous diagonalization

$$\underbrace{T(I, I, w_1)}_{M_1} = \underbrace{(w_1^\top u_1)}_{\lambda_{11}} u_1 u_1^\top + \underbrace{(w_1^\top u_2)}_{\lambda_{21}} u_2 u_2^\top + \underbrace{(w_1^\top u_3)}_{\lambda_{31}} u_3 u_3^\top$$

$$\vdots$$

$$\underbrace{T(I, I, w_\ell)}_{M_\ell} = \underbrace{(w_\ell^\top u_1)}_{\lambda_{1\ell}} u_1 u_1^\top + \underbrace{(w_\ell^\top u_2)}_{\lambda_{2\ell}} u_2 u_2^\top + \underbrace{(w_\ell^\top u_3)}_{\lambda_{3\ell}} u_3 u_3^\top$$

- No unique non-orthogonal factorization for a single matrix.
- $\ge 2$ matrices have a unique non-orthogonal factorization.

$$\hat{U} = \arg\min_{U} \sum_{\ell=1}^{L} \mathrm{off}(U^{-1} M_\ell U^{-\top}), \qquad \mathrm{off}(A) = \sum_{i \neq j} A_{ij}^2.$$

- $U$ is not constrained to be orthogonal.
- Optimize using the QR1JD algorithm (Afsari 2006).
  - Only guaranteed to have local convergence.
  - More stable than ALS in practice.
- Sensitivity analysis due to Afsari 2008.
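The sketch below illustrates two points from this slide without implementing QR1JD. First, it evaluates the non-orthogonal objective and shows that it vanishes at the true factors. Second, it illustrates why two matrices suffice for uniqueness: if $M_\ell = U \Lambda_\ell U^\top$, then $M_1 M_2^{-1} = U \Lambda_1 \Lambda_2^{-1} U^{-1}$, so an ordinary eigendecomposition recovers the factor directions when the eigenvalue ratios are distinct. This eigendecomposition route is a swapped-in textbook technique, and all names and numbers are assumptions.

```python
import numpy as np

def off(A):
    """Sum of squared off-diagonal entries of A."""
    return np.sum((A - np.diag(np.diag(A))) ** 2)

def nojd_objective(U, Ms):
    """sum_l off(U^{-1} M_l U^{-T}) for an invertible, not necessarily orthogonal, U."""
    U_inv = np.linalg.inv(U)
    return sum(off(U_inv @ M @ U_inv.T) for M in Ms)

def factors_from_two_matrices(M1, M2):
    """Eigenvectors of M1 M2^{-1}: the factor directions, up to sign, scale, and order."""
    _, vecs = np.linalg.eig(M1 @ np.linalg.inv(M2))
    vecs = np.real(vecs)                      # eigenvalues are real in the exact model
    return vecs / np.linalg.norm(vecs, axis=0)

# Hypothetical non-orthogonal factors with unit-norm columns.
U = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
U = U / np.linalg.norm(U, axis=0)
M1 = U @ np.diag([1.0, 2.0, 3.0]) @ U.T
M2 = U @ np.diag([3.0, 1.0, 2.0]) @ U.T

at_truth = nojd_objective(U, [M1, M2])            # numerically zero
at_identity = nojd_objective(np.eye(3), [M1, M2]) # strictly positive
U_est = factors_from_two_matrices(M1, M2)
```

With noise, the two-matrix route inherits the eigengap sensitivity of a single eigendecomposition, which is one motivation for jointly diagonalizing many projections instead.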

Results: Non-orthogonal tensor decomposition

$$\text{error in factors} \le O\Bigg( \underbrace{\frac{\|U^{-\top}\|_2^2}{1 - \mu^2}}_{\text{non-ortho. cost}} \; \underbrace{\frac{\|\pi\|_1 \pi_{\max}}{\pi_{\min}^2} \Big(1 + \sqrt{\cdots}\Big)}_{\text{ortho. cost}} \Bigg) \epsilon$$

where $U = [u_1 | \dots | u_k]$, $\mu = \max_{i \neq j} u_i^\top u_j$, and $\frac{\|U^{-\top}\|_2^2}{1 - \mu^2}$ measures non-orthogonality.
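The non-orthogonality factor $\|U^{-\top}\|_2^2 / (1 - \mu^2)$ is easy to compute for a given set of factors. The sketch below assumes a square $U$ with unit-norm columns and takes $\mu$ to be the largest absolute inner product between distinct columns; both the absolute value and the function name are assumptions made for illustration.

```python
import numpy as np

def non_orthogonality_cost(U):
    """||U^{-T}||_2^2 / (1 - mu^2) for a square matrix U with unit-norm columns."""
    gram = U.T @ U
    mu = np.max(np.abs(gram - np.diag(np.diag(gram))))  # largest |u_i^T u_j| with i != j
    spectral = np.linalg.norm(np.linalg.inv(U).T, 2)    # operator 2-norm of U^{-T}
    return spectral**2 / (1.0 - mu**2)

orthogonal = np.eye(2)
theta = np.pi / 3                                       # two unit vectors 60 degrees apart
skewed = np.array([[1.0, np.cos(theta)], [0.0, np.sin(theta)]])
cost_orthogonal = non_orthogonality_cost(orthogonal)
cost_skewed = non_orthogonality_cost(skewed)
```

For orthonormal factors the cost equals 1, its minimum. For the 60-degree pair, $\mu = 1/2$ and $\|U^{-\top}\|_2^2 = 2$, giving $2 / 0.75 = 8/3$.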

Related work

Notes and Related work

- Orthogonal tensor methods can factorize non-orthogonal tensors using a whitening transformation (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013).
  - Whitening is itself a major source of errors (Souloumiac 2009).
- Simultaneous diagonalization for tensors was proposed by Lathauwer 2006.
  - Relies on computing the SVD of a $d^4 \times k^2$ matrix.
- Simultaneous diagonalization for multiple projections is mentioned in Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013.
  - No analysis presented.

Empirical results

Community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013)

[Figure: community detection results; axis ticks from 0 to 0.08; axis label: Recall; methods shown: TPM, OJD]

Crowdsourcing (Zhang et al. 2014)

[Figure: results on the web, rte, birds, and dogs datasets; methods compared: TPM, ALS, OJD, NOJD, MV+EM]

Conclusions

- Reduce tensor problems to matrix ones with random projections.
- Empirically, competitive with the state of the art, with support for non-orthogonal, asymmetric tensors of arbitrary order.
- Open question: is the Jacobi angles algorithm for orthogonal simultaneous diagonalization generically globally convergent?
- Github: https://github.com/kuleshov/tensor-factorization
- Codalab: https://www.codalab.org/worksheets/
- Thanks! Questions?
