Generalization of Tensor Factorization and Applications
Kohei Hayashi
Collaborators: T. Takenouchi, T. Shibata, Y. Kamiya, D. Kato, K. Kunieda, K. Yamada, K. Ikeda,
R. Tomioka, H. Kashima
May 14, 2012
Relational data is a collection of relationships among multiple objects.
• relationships of a pair ⇔ matrix
• relationships of a 3-tuple ⇔ 3-dim. array
• ... in general, an n-dim. array, i.e. a tensor
Relational data can be represented as a tensor with missing values.
Issue of tensor representation
Tensor representation is generally high-dimensional and large-scale.
• e.g. 1,000 users × 1,000 items × 1,000 times = 1,000,000,000 relationships in total
⇒ Dimensionality reduction techniques such as tensor factorization are used.
Tensor factorization
Tucker decomposition [Tucker 1966]: a tensor factorization method assuming
1. observation noise is Gaussian
2. the underlying tensor is low-dimensional
These assumptions are general ... but are not always true.
Contributions
Generalize Tucker decomposition and propose two new models:
1. Exponential family tensor factorization (ETF) [joint work with Takenouchi, Shibata, Kamiya, Kunieda, Yamada, and Ikeda]
   • Generalizes the noise distribution.
   • Can handle a tensor containing mixed discrete and continuous values.
2. Full-rank tensor completion (FTC) [joint work with Tomioka and Kashima]
   • Kernelizes Tucker decomposition.
   • Completes missing values without reducing the dimensionality.
Outline
1. Tucker decomposition
2. Exponential family tensor factorization (ETF)
3. Full-rank tensor completion (FTC)
4. Conclusion
Tucker decomposition
x_{ijk} = \sum_{q=1}^{K_1} \sum_{r=1}^{K_2} \sum_{s=1}^{K_3} u^{(1)}_{iq} u^{(2)}_{jr} u^{(3)}_{ks} z_{qrs} + \varepsilon_{ijk}    (1)

• ε: i.i.d. Gaussian noise
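Equation (1) is the element-wise form of the model; as a numerical illustration, here is a minimal NumPy sketch (all dimensions and array names are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

D1, D2, D3 = 4, 5, 6      # data dimensions (illustrative)
K1, K2, K3 = 2, 3, 2      # core (rank) dimensions
Z = rng.standard_normal((K1, K2, K3))   # core tensor
U1 = rng.standard_normal((D1, K1))      # factor matrix U^(1)
U2 = rng.standard_normal((D2, K2))      # factor matrix U^(2)
U3 = rng.standard_normal((D3, K3))      # factor matrix U^(3)

# Noise-free Tucker model: X[i,j,k] = sum_qrs U1[i,q] U2[j,r] U3[k,s] Z[q,r,s]
X = np.einsum('iq,jr,ks,qrs->ijk', U1, U2, U3, Z)

# Same value computed with the explicit triple sum of Eq. (1)
x_000 = sum(U1[0, q] * U2[0, r] * U3[0, s] * Z[q, r, s]
            for q in range(K1) for r in range(K2) for s in range(K3))
assert np.isclose(X[0, 0, 0], x_000)
```

Adding i.i.d. Gaussian noise to `X` gives a sample from the full model.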
Vectorized form
Let \vec{x} \in \mathbb{R}^D and \vec{z} \in \mathbb{R}^K denote the vectorized X and Z, respectively. Then

\vec{x} = W\vec{z} + \vec{\varepsilon}, \quad where \quad W \equiv U^{(3)} \otimes U^{(2)} \otimes U^{(1)}.

• D \equiv D_1 D_2 D_3, K \equiv K_1 K_2 K_3
• ⊗: the Kronecker product

Tucker decomposition = a linear Gaussian model:

\vec{x} \sim N(\vec{x} \mid W\vec{z}, \sigma^2 I)
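The vectorized form can be checked against the element-wise form. A small NumPy sketch (illustrative names); the identity holds for column-major (Fortran-order) vectorization, which is what matches the Kronecker order U^{(3)} ⊗ U^{(2)} ⊗ U^{(1)}:

```python
import numpy as np

rng = np.random.default_rng(1)
D1, D2, D3 = 3, 4, 2
K1, K2, K3 = 2, 2, 2
Z = rng.standard_normal((K1, K2, K3))
U1 = rng.standard_normal((D1, K1))
U2 = rng.standard_normal((D2, K2))
U3 = rng.standard_normal((D3, K3))

# Element-wise Tucker model
X = np.einsum('iq,jr,ks,qrs->ijk', U1, U2, U3, Z)

# Vectorized form: vec(X) = (U3 kron U2 kron U1) vec(Z),
# using column-major (order='F') vectorization
W = np.kron(U3, np.kron(U2, U1))          # (D1*D2*D3) x (K1*K2*K3)
x_vec = W @ Z.ravel(order='F')

assert np.allclose(x_vec, X.ravel(order='F'))
```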
Rank of tensor
The dimensionalities (K_1, K_2, K_3) of the core tensor are called the rank of the tensor.
• The rank of X is (K_1, K_2, K_3).
The rank of a tensor represents its complexity:
• a low-rank tensor carries less information.
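For a noise-free Tucker tensor, the rank (K_1, K_2, K_3) can (generically) be read off from the matrix ranks of the three mode unfoldings; a sketch with NumPy (dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
K1, K2, K3 = 2, 3, 2                      # rank of the core tensor
Z = rng.standard_normal((K1, K2, K3))
U1 = rng.standard_normal((8, K1))
U2 = rng.standard_normal((9, K2))
U3 = rng.standard_normal((7, K3))
X = np.einsum('iq,jr,ks,qrs->ijk', U1, U2, U3, Z)

def mode_rank(X, mode):
    """Matrix rank of the mode-`mode` unfolding of a 3-way array."""
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    return np.linalg.matrix_rank(Xm)

ranks = tuple(mode_rank(X, m) for m in range(3))
# For random (generic) factors, ranks recovers (K1, K2, K3) = (2, 3, 2)
```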
Motivation: heterogeneous tensor
Tucker decomposition assumes Gaussian noise
• not appropriate for heterogeneous data (e.g. mixed discrete and continuous values)

Approach: generalize Tucker decomposition, assuming a different distribution for each element.
Exponential family tensor factorization

Likelihood:

\vec{x} \sim \prod_{d=1}^{D} \underbrace{\mathrm{Expon}_{h_d}(\vec{x}_d \mid \vec{\theta}_d)}_{\text{exp. family}}, \quad \underbrace{\vec{\theta} \equiv W\vec{z}}_{\text{Tucker decomp.}}

where \mathrm{Expon}(x \mid \theta) \equiv \exp[x\theta - \psi(\theta) + F(x)].

The exponential family is a class of distributions:

         Gaussian   Poisson   Bernoulli
ψ(θ)     θ²/2       exp[θ]    ln(1 + exp[θ])

[Figure: density of each distribution at θ = 0]
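The three families in the table share the form exp[xθ − ψ(θ) + F(x)] and differ only in ψ and F. A minimal sketch of a mixed, per-element log-likelihood (the base measures F are standard textbook forms; all values and names below are illustrative, not from the thesis):

```python
import numpy as np
from math import lgamma

# Log-partition psi(theta) for each family, as in the table
psi = {
    'gaussian':  lambda t: t**2 / 2,
    'poisson':   lambda t: np.exp(t),
    'bernoulli': lambda t: np.log1p(np.exp(t)),
}

# Base measure F(x) in natural-parameter form
F = {
    'gaussian':  lambda x: -x**2 / 2 - 0.5 * np.log(2 * np.pi),  # unit variance
    'poisson':   lambda x: -lgamma(x + 1),                       # -log(x!)
    'bernoulli': lambda x: 0.0,
}

def log_lik(x, theta, h):
    """Sum of log Expon_{h_d}(x_d | theta_d) over elements d."""
    return sum(x[d] * theta[d] - psi[h[d]](theta[d]) + F[h[d]](x[d])
               for d in range(len(x)))

# A tiny mixed vector: one continuous, one count, one binary element
x     = [0.3, 4.0, 1.0]
theta = [0.0, 1.2, -0.5]
h     = ['gaussian', 'poisson', 'bernoulli']
ll = log_lik(x, theta, h)
```

In ETF, θ is not free but tied across elements through the Tucker structure θ ≡ Wz.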
Priors:
• for \vec{z}: a Gaussian prior N(0, I)
• for U^{(m)}: a Gaussian prior N(0, \alpha_m^{-1} I)

Joint log-likelihood:

L = \vec{x}^{\top} W\vec{z} - \sum_{d=1}^{D} \psi_{h_d}(w_d^{\top} \vec{z}) - \frac{1}{2}\|\vec{z}\|^2 - \sum_{m=1}^{M} \frac{\alpha_m}{2} \|U^{(m)}\|_{\mathrm{Fro}}^2 + \mathrm{const.}

Estimate the parameters by Bayesian inference.
Bayesian inference

Marginal-MAP estimator:

\underset{U^{(1)}, \ldots, U^{(M)}}{\mathrm{argmax}} \int \exp\big[L(\vec{z}, U^{(1)}, \ldots, U^{(M)})\big] \, d\vec{z}

• The integral is not analytically tractable.

We develop an efficient yet accurate approximation using the Laplace approximation and Gaussian processes.
• The computational cost is still higher than that of Tucker decomposition.
• For a long and thin tensor (e.g. a time series), an online algorithm is applicable (see thesis).
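The idea behind the Laplace approximation, in one dimension: expand L around its mode z* and integrate the resulting Gaussian, giving ∫ exp[L(z)] dz ≈ exp[L(z*)] √(2π / −L''(z*)). A toy sketch (the function L below is illustrative, not the model's joint log-likelihood):

```python
import numpy as np

# Toy log-integrand with mode z* = 0
L   = lambda z: -5 * z**2 - z**4 / 4
d2L = lambda z: -10 - 3 * z**2          # second derivative of L

# Laplace approximation: exp(L(z*)) * sqrt(2*pi / -L''(z*))
z_star = 0.0
laplace = np.exp(L(z_star)) * np.sqrt(2 * np.pi / -d2L(z_star))

# Brute-force numerical quadrature for comparison
z = np.linspace(-3, 3, 60001)
dz = z[1] - z[0]
exact = np.sum(np.exp(L(z))) * dz
```

The sharper the mode, the closer the two numbers; here they agree to within a few percent.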
Experiments
Anomaly detection by ETF

Purpose: find irregular parts of tensor data.
Method: apply the distance-based outlier criterion DB(p, D) [Knorr+ VLDB'00] to the estimated factor matrix U^{(m)}.

Definition of DB(p, D):
"An object O is a DB(p, D) outlier if at least a fraction p of the objects lie at a distance greater than D from O."
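The DB(p, D) rule above can be applied directly to the rows of a factor matrix; a minimal sketch (data and thresholds are illustrative; "objects" here means the other rows):

```python
import numpy as np

def db_outliers(U, p, D):
    """Flag row i as a DB(p, D) outlier if at least a fraction p of the
    other rows lie at Euclidean distance greater than D from row i."""
    n = U.shape[0]
    # Pairwise Euclidean distances between rows
    diff = U[:, None, :] - U[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Fraction of other rows farther than D (self-distance 0 never counts)
    far = (dist > D).sum(axis=1) / (n - 1)
    return far >= p

# Toy factor matrix: a tight cluster of 9 rows plus one far-away row
U = np.vstack([np.zeros((9, 2)) + 0.1 * np.arange(9)[:, None], [[10.0, 10.0]]])
flags = db_outliers(U, p=0.9, D=5.0)  # only the last row is flagged
```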
Data set
A time series of multi-sensor measurements
• Each sensor recorded human behavior (e.g. position) of researchers in the NEC lab. for 8 months.
• 6 (sensors) × 21 (persons) × 1927 (hours)

Sensor  Description           Type          Min      Max
X1      # of sent emails      Count         0.00     14.00
X2      # of received emails  Count         0.00     15.00
X3      # of typed keys       Count         0.00     50422.00
X4      X coordinate          Real          −550.17  3444.15
X5      Y coordinate          Real          128.71   2353.55
X6      Movement distance     Non-negative  0.00     203136.96
Evaluation
A list of irregular events is provided.

Examples of irregular events:
Date    Time         Description
Dec 21  All day      Private incident
Dec 22  15:00-16:00  Monthly seminar
Jun 15  13:00        Visiting tour
...

• Evaluated as binary classification
• 50% of the entries are missing; 10 trials
• Applied ETF with the online algorithm
Result: classification performance
[Figure: AUC vs. rank (2x2x2 to 6x7x7) for PARAFAC, Tucker, pTucker (EM), and ETF online; AUC range 0.4-0.7]

✓ ETF detects anomalous events well.
Skip the rest