Generalization of Tensor Factorization and Applications
Kohei Hayashi
Collaborators: T. Takenouchi, T. Shibata, Y. Kamiya, D. Kato, K. Kunieda, K. Yamada, K. Ikeda,
R. Tomioka, H. Kashima
May 14, 2012
Relational data is a collection of relationships among multiple objects.
• relationships of a pair ⇔ matrix
• relationships of a 3-tuple ⇔ 3-dim. array
• ... in general, an n-dim. array, i.e. a tensor
Relational data can be represented as a tensor with missing values.
Issue of tensor representation
Tensor representation is generally high-dimensional and large-scale.
• e.g. 1,000 users × 1,000 items × 1,000 times = 1,000,000,000 relationships in total
⇒ Dimensionality reduction techniques such as tensor factorization are used.
Tensor factorization
Tucker decomposition [Tucker 1966]: a tensor factorization method assuming
1. observation noise is Gaussian
2. the underlying tensor is low-dimensional
These assumptions are general ... but are not always true.
Contributions
Generalize Tucker decomposition and propose two new models:
1. Exponential family tensor factorization (ETF) [joint work with Takenouchi, Shibata, Kamiya, Kunieda, Yamada, and Ikeda]
   • Generalizes the noise distribution.
   • Can handle a tensor containing mixed discrete and continuous values.
2. Full-rank tensor completion (FTC) [joint work with Tomioka and Kashima]
   • Kernelizes Tucker decomposition.
   • Completes missing values without reducing the dimensionality.
Outline
1. Tucker decomposition
2. Exponential family tensor factorization (ETF)
3. Full-rank tensor completion (FTC)
4. Conclusion
Tucker decomposition
x_{ijk} = \sum_{q=1}^{K_1} \sum_{r=1}^{K_2} \sum_{s=1}^{K_3} u^{(1)}_{iq} u^{(2)}_{jr} u^{(3)}_{ks} z_{qrs} + \varepsilon_{ijk}    (1)

• ε: i.i.d. Gaussian noise
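Equation (1) is the element-wise form of the model; as a numerical illustration, here is a minimal NumPy sketch (all dimensions and array names are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

D1, D2, D3 = 4, 5, 6      # data dimensions (illustrative)
K1, K2, K3 = 2, 3, 2      # core (rank) dimensions
Z = rng.standard_normal((K1, K2, K3))   # core tensor
U1 = rng.standard_normal((D1, K1))      # factor matrix U^(1)
U2 = rng.standard_normal((D2, K2))      # factor matrix U^(2)
U3 = rng.standard_normal((D3, K3))      # factor matrix U^(3)

# Noise-free Tucker model: X[i,j,k] = sum_qrs U1[i,q] U2[j,r] U3[k,s] Z[q,r,s]
X = np.einsum('iq,jr,ks,qrs->ijk', U1, U2, U3, Z)

# Same value computed with the explicit triple sum of Eq. (1)
x_000 = sum(U1[0, q] * U2[0, r] * U3[0, s] * Z[q, r, s]
            for q in range(K1) for r in range(K2) for s in range(K3))
assert np.isclose(X[0, 0, 0], x_000)
```

Adding i.i.d. Gaussian noise to `X` gives a sample from the full model.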
Vectorized form
Let \vec{x} \in \mathbb{R}^D and \vec{z} \in \mathbb{R}^K denote the vectorized X and Z, respectively. Then

\vec{x} = W\vec{z} + \vec{\varepsilon}, \quad where \quad W \equiv U^{(3)} \otimes U^{(2)} \otimes U^{(1)}.

• D \equiv D_1 D_2 D_3, K \equiv K_1 K_2 K_3
• ⊗: the Kronecker product

Tucker decomposition = a linear Gaussian model:

\vec{x} \sim N(\vec{x} \mid W\vec{z}, \sigma^2 I)
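The vectorized form can be checked against the element-wise form. A small NumPy sketch (illustrative names); the identity holds for column-major (Fortran-order) vectorization, which is what matches the Kronecker order U^{(3)} ⊗ U^{(2)} ⊗ U^{(1)}:

```python
import numpy as np

rng = np.random.default_rng(1)
D1, D2, D3 = 3, 4, 2
K1, K2, K3 = 2, 2, 2
Z = rng.standard_normal((K1, K2, K3))
U1 = rng.standard_normal((D1, K1))
U2 = rng.standard_normal((D2, K2))
U3 = rng.standard_normal((D3, K3))

# Element-wise Tucker model
X = np.einsum('iq,jr,ks,qrs->ijk', U1, U2, U3, Z)

# Vectorized form: vec(X) = (U3 kron U2 kron U1) vec(Z),
# using column-major (order='F') vectorization
W = np.kron(U3, np.kron(U2, U1))          # (D1*D2*D3) x (K1*K2*K3)
x_vec = W @ Z.ravel(order='F')

assert np.allclose(x_vec, X.ravel(order='F'))
```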
Rank of tensor
The dimensionalities (K_1, K_2, K_3) of the core tensor are called the rank of the tensor.
• The rank of X is (K_1, K_2, K_3).
The rank of a tensor represents its complexity:
• a low-rank tensor carries less information.
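For a noise-free Tucker tensor, the rank (K_1, K_2, K_3) can (generically) be read off from the matrix ranks of the three mode unfoldings; a sketch with NumPy (dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
K1, K2, K3 = 2, 3, 2                      # rank of the core tensor
Z = rng.standard_normal((K1, K2, K3))
U1 = rng.standard_normal((8, K1))
U2 = rng.standard_normal((9, K2))
U3 = rng.standard_normal((7, K3))
X = np.einsum('iq,jr,ks,qrs->ijk', U1, U2, U3, Z)

def mode_rank(X, mode):
    """Matrix rank of the mode-`mode` unfolding of a 3-way array."""
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    return np.linalg.matrix_rank(Xm)

ranks = tuple(mode_rank(X, m) for m in range(3))
# For random (generic) factors, ranks recovers (K1, K2, K3) = (2, 3, 2)
```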
Motivation: heterogeneous tensor
Tucker decomposition assumes Gaussian noise
• not appropriate for heterogeneous data (e.g. mixed discrete and continuous values)

Approach: generalize Tucker decomposition, assuming a different distribution for each element.
Exponential family tensor factorization

Likelihood:

\vec{x} \sim \prod_{d=1}^{D} \underbrace{\mathrm{Expon}_{h_d}(\vec{x}_d \mid \vec{\theta}_d)}_{\text{exp. family}}, \quad \underbrace{\vec{\theta} \equiv W\vec{z}}_{\text{Tucker decomp.}}

where \mathrm{Expon}(x \mid \theta) \equiv \exp[x\theta - \psi(\theta) + F(x)].

The exponential family is a class of distributions:

         Gaussian   Poisson   Bernoulli
ψ(θ)     θ²/2       exp[θ]    ln(1 + exp[θ])

[Figure: density of each distribution at θ = 0]
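The three families in the table share the form exp[xθ − ψ(θ) + F(x)] and differ only in ψ and F. A minimal sketch of a mixed, per-element log-likelihood (the base measures F are standard textbook forms; all values and names below are illustrative, not from the thesis):

```python
import numpy as np
from math import lgamma

# Log-partition psi(theta) for each family, as in the table
psi = {
    'gaussian':  lambda t: t**2 / 2,
    'poisson':   lambda t: np.exp(t),
    'bernoulli': lambda t: np.log1p(np.exp(t)),
}

# Base measure F(x) in natural-parameter form
F = {
    'gaussian':  lambda x: -x**2 / 2 - 0.5 * np.log(2 * np.pi),  # unit variance
    'poisson':   lambda x: -lgamma(x + 1),                       # -log(x!)
    'bernoulli': lambda x: 0.0,
}

def log_lik(x, theta, h):
    """Sum of log Expon_{h_d}(x_d | theta_d) over elements d."""
    return sum(x[d] * theta[d] - psi[h[d]](theta[d]) + F[h[d]](x[d])
               for d in range(len(x)))

# A tiny mixed vector: one continuous, one count, one binary element
x     = [0.3, 4.0, 1.0]
theta = [0.0, 1.2, -0.5]
h     = ['gaussian', 'poisson', 'bernoulli']
ll = log_lik(x, theta, h)
```

In ETF, θ is not free but tied across elements through the Tucker structure θ ≡ Wz.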
Priors:
• for \vec{z}: a Gaussian prior N(0, I)
• for U^{(m)}: a Gaussian prior N(0, \alpha_m^{-1} I)

Joint log-likelihood:

L = \vec{x}^{\top} W\vec{z} - \sum_{d=1}^{D} \psi_{h_d}(w_d^{\top} \vec{z}) - \frac{1}{2}\|\vec{z}\|^2 - \sum_{m=1}^{M} \frac{\alpha_m}{2} \|U^{(m)}\|_{\mathrm{Fro}}^2 + \mathrm{const.}

Estimate the parameters by Bayesian inference.
Bayesian inference

Marginal-MAP estimator:

\underset{U^{(1)}, \ldots, U^{(M)}}{\mathrm{argmax}} \int \exp\big[L(\vec{z}, U^{(1)}, \ldots, U^{(M)})\big] \, d\vec{z}

• The integral is not analytically tractable.

We develop an efficient yet accurate approximation using the Laplace approximation and Gaussian processes.
• The computational cost is still higher than that of Tucker decomposition.
• For a long and thin tensor (e.g. a time series), an online algorithm is applicable (see thesis).
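The idea behind the Laplace approximation, in one dimension: expand L around its mode z* and integrate the resulting Gaussian, giving ∫ exp[L(z)] dz ≈ exp[L(z*)] √(2π / −L''(z*)). A toy sketch (the function L below is illustrative, not the model's joint log-likelihood):

```python
import numpy as np

# Toy log-integrand with mode z* = 0
L   = lambda z: -5 * z**2 - z**4 / 4
d2L = lambda z: -10 - 3 * z**2          # second derivative of L

# Laplace approximation: exp(L(z*)) * sqrt(2*pi / -L''(z*))
z_star = 0.0
laplace = np.exp(L(z_star)) * np.sqrt(2 * np.pi / -d2L(z_star))

# Brute-force numerical quadrature for comparison
z = np.linspace(-3, 3, 60001)
dz = z[1] - z[0]
exact = np.sum(np.exp(L(z))) * dz
```

The sharper the mode, the closer the two numbers; here they agree to within a few percent.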
Experiments
Anomaly detection by ETF

Purpose: find irregular parts of tensor data.
Method: apply the distance-based outlier criterion DB(p, D) [Knorr+ VLDB'00] to the estimated factor matrix U^{(m)}.

Definition of DB(p, D):
"An object O is a DB(p, D) outlier if at least a fraction p of the objects lie at a distance greater than D from O."
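The DB(p, D) rule above can be applied directly to the rows of a factor matrix; a minimal sketch (data and thresholds are illustrative; "objects" here means the other rows):

```python
import numpy as np

def db_outliers(U, p, D):
    """Flag row i as a DB(p, D) outlier if at least a fraction p of the
    other rows lie at Euclidean distance greater than D from row i."""
    n = U.shape[0]
    # Pairwise Euclidean distances between rows
    diff = U[:, None, :] - U[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Fraction of other rows farther than D (self-distance 0 never counts)
    far = (dist > D).sum(axis=1) / (n - 1)
    return far >= p

# Toy factor matrix: a tight cluster of 9 rows plus one far-away row
U = np.vstack([np.zeros((9, 2)) + 0.1 * np.arange(9)[:, None], [[10.0, 10.0]]])
flags = db_outliers(U, p=0.9, D=5.0)  # only the last row is flagged
```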
Data set
A time series of multi-sensor measurements
• Each sensor recorded human behavior (e.g. position) of researchers in the NEC lab. for 8 months.
• 6 (sensors) × 21 (persons) × 1927 (hours)

Sensor  Description           Type          Min      Max
X1      # of sent emails      Count         0.00     14.00
X2      # of received emails  Count         0.00     15.00
X3      # of typed keys       Count         0.00     50422.00
X4      X coordinate          Real          −550.17  3444.15
X5      Y coordinate          Real          128.71   2353.55
X6      Movement distance     Non-negative  0.00     203136.96
Evaluation
A list of irregular events is provided.

Examples of irregular events:
Date    Time         Description
Dec 21  All day      Private incident
Dec 22  15:00-16:00  Monthly seminar
Jun 15  13:00        Visiting tour
...

• Evaluated as binary classification
• 50% of the entries are missing; 10 trials
• Applied ETF with the online algorithm
Result: classification performance
[Figure: AUC vs. rank (2x2x2 to 6x7x7) for PARAFAC, Tucker, pTucker (EM), and ETF online; AUC range 0.4-0.7]

✓ ETF detects anomalous events well.
Skip the rest