30
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energys National Nuclear Security Administration under contract DE-AC04-94AL85000. Brett Bader*, Richard Harshman** & Tamara Kolda* *Sandia National Laboratories **University of Western Ontario Workshop for Algorithms on Modern Massive Data Sets June 24, 2006 Analysis of Latent Relationships in Semantic Graphs using DEDICOM

Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,for the United States Department of Energy’s National Nuclear Security Administration

under contract DE-AC04-94AL85000.

Brett Bader*, Richard Harshman** & Tamara Kolda**Sandia National Laboratories**University of Western Ontario

Workshop for Algorithms on Modern Massive Data SetsJune 24, 2006

Analysis of Latent Relationships in Semantic Graphs using DEDICOM

Page 2: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Common Graph Analysis Technique

Uk

Vk

Adjacencymatrix

ΣkT

Best rank-k matrix filters out noise and captures “latent” information, which improves

certain data mining tasks

But we may have ignored critical informationby not considering edge metadata!

Truncated SVD

Web search - HITS (Kleinberg, 1998)

Ak = UkΣkVT

k =

k∑

i=1

σiuivTi

For example:

Page 3: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Semantic Graphs

• Different types of edges

• Examples- WWW (anchor text)- Subway map [thanks Orly!]

- Email communications (time stamp, to/cc)

Page 4: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Tucker

New Paradigm:“Multidimensional Data Mining”

+ + ...Third dimension offers more explanatory power: uncovers new

latent information and reveals subtle relationships

Build an “adjacency tensor” such that there is an adjacency matrix for each edge type.

DEDICOM

PARAFAC

Multilinearalgebra

Adjacencymatrix

Adjacencytensor

Page 5: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Objective

Use DEDICOM to analyze a semantic graph of email communications

changing over time

David

Ellen

Bob

Frank

Alice Carl

IngridHenk

Gary

role

s

time patterns

3-way DEDICOM=

Page 6: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

DEDICOM

• DEcomposition into DIrectional COMponents

• Introduced in 1978 by Harshman

• Past applications- Study asymmetries in telephone calls among cities- Marketing research• car switching: car owners and what they buy next • free associations of words- words to describe hair in advertising shampoo:

“body” evokes “fullness” more often than “fullness” evokes “body”

- Asymmetric measures of world trade (import/export)

• Variations- Three-way DEDICOM- Constrained DEDICOM

Page 7: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

DEDICOM Models & Algorithms

=XR AT

=

All are “alternating” algorithms

• Generalized Takane method• New algorithm

• Kiers’ method• New algorithm

X AR

A

AT

(Takane, 1985; Kiers et al., 1990)

(Kiers, 1993)

Page 8: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Mathematical Notation

• Scalars• Vectors• Matrices• Tensors (3-way array) - frontal slices of :

• Special symbols- Kronecker product

- Hadamard product (elementwise)

A ⊗ B =

a11B . . . a1nB

.

.

.

...

.

.

.

am1B . . . amnB

a

a

A

D

XiX

X

A ∗ B =

a11b11 . . . a1nb1n

.

.

.

...

.

.

.

am1bm1 . . . amnbmn

X1

Page 9: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Two-way DEDICOM

X = ARAT

+ E

minA,R

X − ARAT

2

F

• A (n x p) is an orthogonal matrix of loadings or weights• R (p x p) is a dense matrix that captures asymmetric relationships

• Decomposition is not unique- A can be transformed with no loss of fit to the data- Nonsingular transformation Q:

- Usually “fix” A with some standard rotation (e.g., VARIMAX)

X ≈ ARAT

=X AR AT

s.t. A orthogonal

ARAT = (AQ)(Q−1RQ−T )(AQ)T

n

n p

n

Single domain model

Page 10: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

New Algorithm

Anew ←

(

X XT)

(

(

R RT)

(

AT 0

0 AT

))†

Anew =(

XART + XTAR

) (

R(ATA)RT + RT (ATA)R)

−1.

Solving for A:

or

Solving for R:

Stack data and model “side by side” in a single equation(

X XT)

=(

ARAT

ARTAT

)

= A

(

(

R RT)

(

AT 0

0 AT

))

...and solve least-squares problem:

Rnew = A†X(AT )†

= AY ZT

minA

Y − AZT

2

F

Page 11: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Three-way DEDICOM

Xi = ADiRDiAT + Ei for i = 1, . . . ,m,

=

minA,R,D

m∑

i=1

∥Xi − ADiRDiAT

2

F

• A (n x p) is a matrix of loadings or weights (not necessarily orthogonal)• R (p x p) is a dense matrix that captures asymmetric relationships• D (p x p x m) is a tensor with diagonal frontal slices giving the weights

of the columns of A for each slice in third mode

• *Unique* solution with enough slices of X with sufficient variation- i.e., no rotation of A possible- greater confidence in interpretation of results

n

nm

A

ATRD D

n

p

Page 12: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

New Algorithm - Updating A

(

X1 XT1 · · · Xm XT

m

)

= A(

D1RD1 D1RTD1 · · · DmRDm DmRTDm

) (

I2m ⊗ AT)

A =

[

m∑

i=1

(

XiADiRTDi + XT

i ADiRDi

)

] [

m∑

i=1

(Bi + Ci)

]

−1

Bi ≡ DiRDi(ATA)DiR

TDi,

Ci ≡ DiRTDi(A

TA)DiRDi.

Solving for A:

where

= AY

minA,R,D

m∑

i=1

∥Xi − ADiRDiAT

2

F

ZT

A = YZ(ZTZ)−1

Page 13: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

New Algorithm - Updating D

minDi

∥Xi − ADiRDiAT

2

F

A = QA,

minDi

QTXiQ − ADiRDiAT

2

F

gk = −

i,j

[

2(X − ADRDAT ) ∗ (ADrka

Tk + akrk,:DA

T )]

i,j

Solving for D:

Use compressionQR factorization:

Use Newton’s method to solve the optimization problem for

Smaller problem (p x p)

hst = −2∑

i,j

[

(X − ADRDAT ) ∗ (asrsta

Tt + atrtsa

Ts )

− (ADrsaTs + asrs:DA

T ) ∗ (ADrtaTt + atrt:DA

T )]

i,j

dnew = d − H−1g

d = diag(Di)

Gradient:

Hessian:

Page 14: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Our Algorithm - Updating R

Solving for R:

f(R) =

Vec(X1)...

Vec(Xm)

AD1 ⊗ AD1

...ADm ⊗ A Dm

Vec(R)

Vec(R) =

(

m∑

i=1

(DiATADi) ⊗ (DiA

TADi)

)

−1 m∑

i=1

Vec(DiATXiADi)

minimize:

Use the approach in (Kiers, 1993)

minR

m∑

i=1

∥Xi − ADiR DiAT

2

F

Page 15: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Algorithm Costs

ATA

XiART

XT

i AR

QTXiQ

QR factorization of AO(p2

n)

Xi

Dominant costs:

Updating A is most expensive part

linear in nnz of

Page 16: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Application: Enron Email Analysis

• Links consist of email communications

• What can we learn about this network strictly from their communication patterns? (Social network analysis)

David

Ellen

Bob

Frank

Alice Carl

IngridHenk

Gary

Page 17: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

!

"#$%&'()

*&$+,#-+.#/,(01+&.(/2(3,&/,(0/&4

!"#$"%"&'"(

!"#$"

)(*+$#,-

!"#$"

.$+(#/01#,(*'"2

!"#$"

31-/01#,(*'"2

!"#$"

3("(#1*'$"

!"#$"

)$#*4

56(#'71

!"#$"

/!"(#28

/9(#:'7(-

!"#$"

;#$1<=1"<

!"#$"

.'>(?'"(-

!"#$"

@#1"->$#*1*'$"

9(#:'7(-

!"#$"/A$#>

+,5(6783(9+.#/,+:(3,'&$;(7&/%4<((=1'(.'+>(+:?/(>'.(@#.1(&'4&'?',.+.#A'?(/2(.1'(9'@

B/&C(D'&E+,.#:'(3FE1+,$'(G9BD3HI(+,5(.1'(9'@(B/&C(J./EC(3FE1+,$'(G9BJ3I(./

:'+&,(/2(.1'#&('F4'&#',E'?(@#.1(':'E.&/,#E(.&+5#,$(+,5(+,;(4&/K:'>?(.1';(2/&'?''(@#.1

','&$;('L.&+5#,$(+?(#.('A/:A'?<((M,(+55#.#/,N(.1'(.'+>(>'.(@#.1(&'4&'?',.+.#A'?(/2(O#/5'FN(+

2#,+,E#+:(?/2.@+&'(2#&>N(.1+.(#?(5'A':/4#,$(+(,'@('L.&+5#,$(?;?.'>(2/&(.1'(9BD3H(+,5

@1/?'(&#?C(>+,+$'>',.(?/2.@+&'(#?(>+5'(+A+#:+K:'(/,(3*P(./(#.?(E%?./>'&?<((=1'

>'>K'&?(/2(.1'(.'+>(+:?/(.//C(E/%&?'?(#,(','&$;(5'&#A+.#A'?(+,5(2#,+,E#+:(+,5(','&$;

>+&C'.(.&+5#,$<

!"#$%&'()*

=1'(K%?#,'??(E/,E'4.(%,5'&:;#,$(3*P(#?(?#>4:'<((=&+5#,$(/,(3*P(%?#,$(.1'

M,.'&,'.(&'4:+E'?(>+&C'.#,$(.1+.(4&'A#/%?:;(.//C(4:+E'(K;(.':'41/,'(+,5(2+F<((*,(3*PN

3,&/,(>+&C'.'&?(+&'(/,(/,'(?#5'(/2('A'&;(.&+5'(Q%?.(+?(.1';(+&'(@1',(.1';(%?'(41/,'(+,5

2+F(./(.&+5'<((3,&/,*,:#,'(#?(+(E/>4%.'&(?;?.'>(/4'&+.'5(K;(+,(3,&/,(?%K?#5#+&;(E+::'5

3,&/,(9'.@/&C?N(M,E<((3,&/,(>+&C'.#,$(?%K?#5#+&#'?(3,&/,(6/@'&(D+&C'.#,$N(M,E<(

G3D6MI(+,5(3,&/,(9/&.1(R>'&#E+N(M,E<((G39RI(E/,5%E.(3,&/,S?(':'E.&#E(4/@'&(+,5

,+.%&+:($+?(.&+5#,$N(&'?4'E.#A':;+,,"#$%&'()(#?(+(?.;:#-'5(/&$+,#-+.#/,(E1+&.(/2(3,&/,(0/&4<

?1/@#,$(@1'&'(3*P(#?(:/E+.'5(#,(&':+.#/,(./($+?(+,5(4/@'&(>+&C'.#,$N(+,5(.1'(4#4':#,'?<

3*P(%?'?(+(/,'L./L>+,;(.&+5#,$(>/5':N(@1'&'(3,&/,(.+C'?(/,'(?#5'(/2('A'&;

.&+,?+E.#/,(.+C#,$(4:+E'(/,(3*P<((3*P(5#22'&?(2&/>(.&+5#.#/,+:('FE1+,$'?(:#C'(.1'(9BJ3

Enron Corp.

• U.S. corporation involved with creating energy markets- 7th largest by revenue• EnronOnline: e-trading business- natural gas- electric power

• Investigations- U.S. Federal Energy Regulatory Commission (FERC)• energy market manipulation• involved energy traders- U.S. Securities and Exchange Commission (SEC)• accounting fraud• insider trading

Page 18: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Enron Email Data

• FERC collected email of ~150 employees as evidence- Included emails saved in inbox, sent items, deleted

items, and all other folders

• Released to the public in 2002 by FERC as part of their investigation- To/from, date, subject, body- Attachments and some names/emails removed- Approx. 500,000 email messages

Page 19: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Smaller Enron Data Set

N D 99 F M A M J J A S O N D 00 F M A M J J A S O N D 01 F M A M J J A S O N D 02 F M A M J0

500

1000

1500

2000

2500

3000

3500

Month

Messages

Figure 1: Number of emails per month in the Enron email graph.

biasing from prolific emailers. Other weightings are possibleas well.

An obvious difficulty in dealing with the Enron corpusis the lack of information regarding the former employees.Without access to a corporate directory or organizationalchart at Enron at the time of these emails, it is difficult toascertain the validity of our results and assess the perfor-mance of the DEDICOM model. Other researchers usingthe Enron corpus have had this same problem, and informa-tion on the participants has been collected and slowly madeavailable.

The Priebe data set [32] provided partial information onthe 184 employees of the small Enron network, which ap-pears to be based largely on information collected by Shettyand Adibi [36]. It provides most employees’ position andbusiness unit. To facilitate a better analysis of the DEDI-COM results, we collected extra information on the partic-ipants from the email messages themselves. We searchedfor corroborating information of the preexisting data or fornew identification information, such as title, business unit,or manager to help analyze our results. We also collectedsome relevant information posted on the FERC website [9].

5. EXPERIMENTAL RESULTSIn this section we summarize our findings of applying two-

way and three-way DEDICOM on the Enron email network.Our algorithms were written in MATLAB, using sparse ex-tensions of the Tensor Toolbox [2].

Table 1 shows the A and R matrices for a single decompo-sition (p = 3) of the two-way DEDICOM model. The largeadjacency matrix X, showing nonsymmetric relations amongemployees at Enron, related by flows of email, is condensedinto a smaller matrix R giving the same kind of asymmetricrelations but among “types” or abstract idealized individ-uals. In this case, the relations among elements in R areexchanges of email. The latent components are patterns ofthe same kind of flow as among the surface objects, justabstracted into a “higher level” summary of patterns.

DEDICOM does not actually identify clusters, except inspecial circumstances when such clusters happen to exist inthe data as we are partially seeing in the Enron data. Thecomponents or patterns of asymmetric relationships that itidentifies have loadings in A that are continuously-valued,like factor loadings, rather than discrete cluster membershipassignments.

Here, DEDICOM describes the employees by the differentlatent dimensions. The first factor (a1) describes an execu-

tive role that fits many of the top executives. The secondfactor (a2) describes a legal role, and the third factor (a3)describes a pipeline employee.

The R matrices show that most of the communication isamong employees that share the same role, as evidenced bythe large diagonal values in R. We do see some asymmetriccommunication. The entries in the lower triangular por-tion are typically larger than the corresponding transposeentry in the upper triangular. This suggests that slightlymore communication “flows up” the management chain than“down.”

As a point of reference, we compute the singular valuedecomposition X = UΣV T . Table 1 shows the first threecolumns of the left singular vectors (U matrix) and rightsingular vectors (V matrix). Because X is nearly symmetric,the left and right singular vectors are nearly the same. Anydifferences between U and V indicate whether the person ismore likely to send mail (U) or receive mail (V).

The SVD solution is somewhat similar to the DEDICOMmodel. Many of the same people are identified and weightedsimilarly by DEDICOM and SVD. However, there are manymore negative entries in SVD than in DEDICOM. The DEDI-COM model also provides directional information betweenthe latent groups in the R matrix that the SVD does notshow.

Table 2 shows the A and R matrices for three instances(p = 2, 3, 4) of the three-way DEDICOM model. The 2-dimensional solution groups the employees largely from thelegal department and those executives dealing with govern-ment and regulatory affairs. The 3-dimensional solutionadds a another role of top executives, and the 4-dimensionalsolution includes those from the pipeline business in a fourthrole.

The aggregate communication patterns over the 44 monthsamong these 2-4 groups is summarized in the R matrix. Inthe 2-dimensional solution we see that most of the com-munication is within each group as evidenced by the largediagonal elements and small off-diagonal elements. The 3-dimensional solution shows some communication betweenthe government/regulatory affairs people and other seniorVP’s (dimensions 2 and 3, respectively). However, the com-munication is substantially asymmetric in that the r2,3 ele-ment is larger than r3,2. This indicates that the VP’s weremostly recipients of messages while the government/regulatoryaffairs employees were senders. With the addition of thepipeline employees in the 4-dimensional solution, we see thatthey interact almost exclusively with themselves due to the

Email communications at Enron (1998-2002)

34,427 emails among 184 employees over 44 months

• Limited information on the 184 employees

• No org chart

We used a smaller data set prepared by Priebe et al.

Page 20: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

DEDICOM Experiment

• Aggregate communications- Sparse matrix of size 184 x 184 (3007 nonzeros)

• Time series of communication graphs - Sparse tensor of size 184 x 184 x 44 (9838 nonzeros)

• Weighted adjacency matrix - scaling: x number of messages scaled by log(x)+1- other common choices give similar results

• Models:- SVD- 2-way DEDICOM- 3-way DEDICOM

Page 21: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Social Network Analysis

Communication graph among employees over

all times

patterns

• Description of employees by their roles• Aggregate communication patterns among roles

DEDICOM

Adjacencymatrix

role

s

Page 22: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

DEDICOM Results

LegalExecutives

Pipeline

patternsro

les

Some employees have dual rolesPattern of communications in R matrix

Lega

lEx

ecs

Pipe

line

DEDICOM SVD (left) SVD (right)Solution Solution Solution

Employee 1 2 3 1 2 3 1 2 3

J. Lavorato - CEO, Enron America 0.41 0.07 0.04 0.30 -0.07 -0.21 0.31 -0.09 -0.07L. Kitchen - President, Enron Online 0.26 0.21 0.04 0.31 0.07 -0.05 0.29 0.02 0.04M. Grigsby - Director, West Desk Gas Trading 0.22 -0.01 -0.01 0.16 -0.09 -0.33 0.14 -0.06 -0.20D. Delainey - CEO, ENA and Enron Energy Services 0.20 0.06 0.06 0.20 -0.05 -0.00 0.20 -0.05 0.03G. Whalley - President, 0.17 0.05 0.04 0.08 -0.02 -0.02 0.24 -0.07 0.02L. Taylor - Executive Assistant to Greg Whalley, 0.17 0.06 0.03 0.24 -0.05 -0.08 0.09 -0.01 -0.02T. Jones - Employee, Financial Trading Group (ENA Legal) -0.12 0.38 -0.02 0.17 0.36 0.13 0.10 0.24 0.10M. Taylor - Manager, Financial Trading Group ENA Legal -0.10 0.35 -0.01 0.13 0.27 0.13 0.13 0.26 0.12S. Shackleton - Employee, ENA Legal -0.13 0.31 -0.02 0.08 0.26 0.10 0.08 0.26 0.10S. Panus - Senior Legal Specialist, ENA Legal -0.11 0.26 -0.02 0.09 0.27 0.10 0.05 0.20 0.08M. Heard - Senior Legal Specialist, ENA Legal -0.10 0.24 -0.02 0.06 0.20 0.09 0.08 0.22 0.09E. Sager - VP and Asst Legal Counsel, ENA Legal -0.01 0.24 0.02 0.12 0.13 0.10 0.15 0.21 0.12S. Corman - VP, Regulatory Affairs -0.04 -0.01 0.33 0.08 -0.18 0.22 0.07 -0.18 0.21K. Watson - Employee, Transwestern Pipeline Company (ETS) -0.08 -0.03 0.32 0.03 -0.16 0.19 0.04 -0.18 0.22L. Donoho - Employee, Transwestern Pipeline Company (ETS) -0.08 -0.03 0.30 0.03 -0.16 0.18 0.03 -0.17 0.20D. Fossum - VP, Transwestern Pipeline Company (ETS)? -0.06 -0.00 0.30 0.07 -0.18 0.23 0.05 -0.13 0.16M. Lokay - Admin. Asst., Transwestern Pipeline Company (ETS) -0.07 -0.02 0.28 0.03 -0.14 0.17 0.04 -0.17 0.20K. Hyatt - Director, Asset Development TW Pipeline Co. (ETS) -0.06 -0.02 0.25 0.03 -0.13 0.17 0.04 -0.14 0.17R. Hayslett - VP, Also CFO and Treasurer -0.04 -0.01 0.23 0.04 -0.13 0.16 0.05 -0.14 0.16

R matrix / singular values 70.3 11.6 6.7 86.3 86.315.4 68.2 5.0 54.1 54.19.9 6.7 59.5 52.6 52.6

Figure 1:

1

Identify shared characteristics to

label group

159.911

6.7

5

6.7

Execs

Pipe-

lineLegal

Legal

Executives

Pipeline employees

Page 23: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Social Network Analysis

Communication graph among employees over

all times

• “Hubs” and “authorities” for different roles

Adjacencymatrix

Hubs Authorities

UkVk

ΣkT

SVD

Page 24: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

DEDICOM & SVD ResultsLe

gal

Exec

sPi

pelin

e

LegalExecutives

Pipeline

patternsro

les

SVD: Hubs and Authorities in U and VRoles more difficult to identify in singular vectors

UkVk

ΣkT

DEDICOM SVD (left) SVD (right)Solution Solution Solution

Employee 1 2 3 1 2 3 1 2 3

J. Lavorato - CEO, Enron America 0.41 0.07 0.04 0.30 -0.07 -0.21 0.31 -0.09 -0.07L. Kitchen - President, Enron Online 0.26 0.21 0.04 0.31 0.07 -0.05 0.29 0.02 0.04M. Grigsby - Director, West Desk Gas Trading 0.22 -0.01 -0.01 0.16 -0.09 -0.33 0.14 -0.06 -0.20D. Delainey - CEO, ENA and Enron Energy Services 0.20 0.06 0.06 0.20 -0.05 -0.00 0.20 -0.05 0.03G. Whalley - President, 0.17 0.05 0.04 0.08 -0.02 -0.02 0.24 -0.07 0.02L. Taylor - Executive Assistant to Greg Whalley, 0.17 0.06 0.03 0.24 -0.05 -0.08 0.09 -0.01 -0.02T. Jones - Employee, Financial Trading Group (ENA Legal) -0.12 0.38 -0.02 0.17 0.36 0.13 0.10 0.24 0.10M. Taylor - Manager, Financial Trading Group ENA Legal -0.10 0.35 -0.01 0.13 0.27 0.13 0.13 0.26 0.12S. Shackleton - Employee, ENA Legal -0.13 0.31 -0.02 0.08 0.26 0.10 0.08 0.26 0.10S. Panus - Senior Legal Specialist, ENA Legal -0.11 0.26 -0.02 0.09 0.27 0.10 0.05 0.20 0.08M. Heard - Senior Legal Specialist, ENA Legal -0.10 0.24 -0.02 0.06 0.20 0.09 0.08 0.22 0.09E. Sager - VP and Asst Legal Counsel, ENA Legal -0.01 0.24 0.02 0.12 0.13 0.10 0.15 0.21 0.12S. Corman - VP, Regulatory Affairs -0.04 -0.01 0.33 0.08 -0.18 0.22 0.07 -0.18 0.21K. Watson - Employee, Transwestern Pipeline Company (ETS) -0.08 -0.03 0.32 0.03 -0.16 0.19 0.04 -0.18 0.22L. Donoho - Employee, Transwestern Pipeline Company (ETS) -0.08 -0.03 0.30 0.03 -0.16 0.18 0.03 -0.17 0.20D. Fossum - VP, Transwestern Pipeline Company (ETS)? -0.06 -0.00 0.30 0.07 -0.18 0.23 0.05 -0.13 0.16M. Lokay - Admin. Asst., Transwestern Pipeline Company (ETS) -0.07 -0.02 0.28 0.03 -0.14 0.17 0.04 -0.17 0.20K. Hyatt - Director, Asset Development TW Pipeline Co. (ETS) -0.06 -0.02 0.25 0.03 -0.13 0.17 0.04 -0.14 0.17R. Hayslett - VP, Also CFO and Treasurer -0.04 -0.01 0.23 0.04 -0.13 0.16 0.05 -0.14 0.16

R matrix / singular values 70.3 11.6 6.7 86.3 86.315.4 68.2 5.0 54.1 54.19.9 6.7 59.5 52.6 52.6

Figure 1:

1

No patterns of communication

U (hubs) V (authorities)

Page 25: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Temporal Social Network Analysis

April

March

January

February

Time series of communication graphs

among employees

Adjacencytensor

role

stim

e patterns

• Unique description of employees by their roles• Aggregate communication patterns among roles• Behavior over time

3-way DEDICOM

Page 26: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Roles of Employees

2-Dimensional 3-Dimensional 4-DimensionalSolution Solution Solution

Employee 1 2 1 2 3 1 2 3 4

T. Jones - Employee, Financial Trading Group (ENA Legal) 0.64 -0.02 0.64 -0.02 0.01 0.64 -0.01 0.02 -0.00S. Shackleton - Employee, ENA Legal 0.45 -0.02 0.45 -0.01 -0.02 0.45 -0.00 -0.01 -0.00M. Taylor - Manager, Financial Trading Group ENA Legal 0.38 0.00 0.37 -0.01 0.01 0.37 0.01 0.02 -0.00S. Bailey - Legal Assistant, ENA Legal 0.26 -0.01 0.26 -0.01 -0.01 0.26 -0.00 -0.01 -0.00S. Panus - Senior Legal Specialist, ENA Legal 0.26 -0.01 0.26 -0.01 -0.01 0.26 -0.00 -0.00 -0.00M. Heard - Senior Legal Specialist, ENA Legal 0.23 -0.01 0.23 -0.01 0.00 0.23 -0.00 0.00 -0.00J. Hodge - Asst General Counsel, ENA Legal 0.13 0.03 0.13 0.03 0.00 0.13 0.03 0.01 -0.00L. Kitchen - President, Enron Online 0.10 0.08 0.11 -0.13 0.53 0.11 -0.09 0.53 0.00S. Dickson - Employee, ENA Legal 0.09 -0.00 0.09 -0.00 0.00 0.09 -0.00 0.00 -0.00E. Sager - VP and Asst Legal Counsel, ENA Legal 0.08 0.04 0.08 0.01 0.06 0.08 0.02 0.07 -0.00J. Dasovich - Employee, Government Relationship Executive -0.01 0.58 -0.02 0.57 0.04 -0.01 0.58 0.06 0.01J. Steffes - VP, Government Affairs -0.00 0.49 -0.01 0.52 -0.08 0.00 0.53 -0.06 -0.01R. Shapiro - VP, Regulatory Affairs -0.01 0.43 -0.01 0.39 0.09 -0.00 0.40 0.10 -0.00S. Kean - VP, Chief of Staff -0.01 0.35 -0.01 0.37 -0.05 -0.00 0.37 -0.04 -0.00R. Sanders - VP, Enron Wholesale Services 0.03 0.16 0.03 0.16 -0.01 0.03 0.16 -0.01 -0.00D. Delainey - CEO, ENA and Enron Energy Services 0.01 0.12 0.01 0.08 0.08 0.01 0.09 0.09 -0.00S. Corman - VP, Regulatory Affairs -0.00 0.08 -0.00 0.08 -0.01 -0.00 0.08 -0.00 0.20M. Carson - Employee, Corporate and Environmental Policy -0.00 0.07 -0.00 0.09 -0.02 -0.00 0.08 -0.02 -0.00S. Scott - Employee, Transwestern Pipeline Company (ETS) -0.00 0.08 -0.00 0.08 -0.00 -0.00 0.08 -0.00 0.04J. Lavorato - CEO, Enron America 0.02 0.12 0.02 -0.08 0.49 0.02 -0.04 0.49 0.00M. Grigsby - Director, West Desk Gas Trading 0.00 0.04 0.00 -0.04 0.20 0.00 -0.03 0.20 -0.00G. Whalley - President, 0.01 0.06 0.01 -0.03 0.19 0.01 -0.01 0.19 0.00J. Steffes - VP, Government Affairs 0.00 0.04 0.00 -0.04 0.19 0.00 -0.02 0.18 0.00K. Presto - VP, East Power Trading 0.01 0.01 0.01 -0.06 0.19 0.01 -0.05 0.18 0.00S. Beck - COO, 0.01 0.02 0.01 -0.05 0.17 0.01 -0.03 0.17 0.00B. Tycholiz - VP, Marketing 0.01 0.04 0.01 -0.03 0.17 0.01 -0.02 0.16 0.00J. Arnold - VP, Financial Enron Online 0.03 0.02 0.03 -0.05 0.16 0.03 -0.04 0.16 -0.00J. Williamson - Executive Assistant, 0.00 0.02 0.00 -0.03 0.14 0.00 -0.02 0.14 0.01K. Watson - Employee, Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.00 0.01 -0.00 -0.00 0.01 0.59M. Lokay - Admin. Asst., Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.01 0.01 -0.00 0.01 0.01 0.42L. Donoho - Employee, Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.01 0.01 -0.00 0.01 0.01 0.35M. McConnell - Employee, Transwestern Pipeline Company (ETS) 0.00 0.00 0.00 -0.00 0.01 0.00 -0.00 0.01 0.26L. Blair - Employee, Northern Natural Gas Pipeline (ETS) -0.00 0.01 -0.00 0.01 0.00 -0.00 0.00 0.00 0.22K. Hyatt - Director, Asset Development TW Pipeline Business (ETS) -0.00 0.02 -0.00 0.02 0.00 -0.00 0.01 0.00 0.20D. Schoolcraft - Employee, Gas Control (ETS) -0.00 0.00 -0.00 0.00 0.00 -0.00 0.00 0.00 0.18T. Geaccone - Manager, (ETS) 0.00 0.00 0.00 -0.00 0.01 0.00 -0.00 0.01 0.17R. Hayslett - VP, Also CFO and Treasurer 0.00 0.01 0.00 -0.00 0.02 0.00 -0.00 0.02 0.16

R matrix 438.3 12.1 440.3 18.6 -0.9 440.2 1.6 -15.0 0.415.3 291.9 19.7 292.5 168.4 1.6 278.3 135.4 1.6

-17.0 104.1 216.4 -29.3 70.7 201.6 -6.21.4 -4.6 -7.5 172.3

Table 2: Three-way DEDICOM results on the Enron email graph for three different decompositions, p = 2, 3, 4.The top 10 entries from all reported columns of A are listed in the table. Entries exceeding a threshold of0.06 are highlighted.

DtRDt

October 2000 22.2 0.1 -0.5 0.00.1 19.0 4.7 0.1-0.9 2.5 3.6 -0.10.0 -0.2 -0.1 3.5

October 2001 14.5 0.0 -0.9 0.00.0 4.1 5.5 0.1-1.8 2.9 22.5 -0.70.1 -0.2 -0.8 19.1

Table 3: DtRDt matrices showing communicationpatterns for October, 2000 and October, 2001.

that it identifies some people who were pretty much purelyof a certain type and other people who had mixed charac-teristics. For example, a given person might “load” on bothan executive and a lawyer component or aspect, and thusshow email exchanges resembling each of these two roles tosome extent.

The entries in matrix R describe the communication pat-terns between groups of the same and different type. They

show how a particular person’s combination of roles or at-tributes influences the pattern of messages he/she exchangeswith particular other employees given the other employee’sroles or attributes. The R matrix is asymmetric and of-fers an idealized version of a directed graph involving thecomponents identified in A.

In addition, three-way DEDICOM shows the associatedcommunication patterns over time in the tensor D. Thescales in each Dt show the strength of participation of aparticular group for time period t.

In the present study, we investigated a semantic graphwith edges labeled by time. As an alternative to time, wepoint out that our semantic graph could have incorporateddifferent types of communication media (e.g., email, phone,and mail communications) instead of time in the third mode.Then an analysis with three-way DEDICOM would repre-sent information about the vertices across all forms of com-munication (appropriately scaled by slices of D) in the Aand R matrices.

Furthermore, DEDICOM is not limited to the analysis ofsociometric and intercommunication data; DEDICOM may

2-Dimensional 3-Dimensional 4-DimensionalSolution Solution Solution

Employee 1 2 1 2 3 1 2 3 4

T. Jones - Employee, Financial Trading Group (ENA Legal) 0.64 -0.02 0.64 -0.02 0.01 0.64 -0.01 0.02 -0.00S. Shackleton - Employee, ENA Legal 0.45 -0.02 0.45 -0.01 -0.02 0.45 -0.00 -0.01 -0.00M. Taylor - Manager, Financial Trading Group ENA Legal 0.38 0.00 0.37 -0.01 0.01 0.37 0.01 0.02 -0.00S. Bailey - Legal Assistant, ENA Legal 0.26 -0.01 0.26 -0.01 -0.01 0.26 -0.00 -0.01 -0.00S. Panus - Senior Legal Specialist, ENA Legal 0.26 -0.01 0.26 -0.01 -0.01 0.26 -0.00 -0.00 -0.00M. Heard - Senior Legal Specialist, ENA Legal 0.23 -0.01 0.23 -0.01 0.00 0.23 -0.00 0.00 -0.00J. Hodge - Asst General Counsel, ENA Legal 0.13 0.03 0.13 0.03 0.00 0.13 0.03 0.01 -0.00L. Kitchen - President, Enron Online 0.10 0.08 0.11 -0.13 0.53 0.11 -0.09 0.53 0.00S. Dickson - Employee, ENA Legal 0.09 -0.00 0.09 -0.00 0.00 0.09 -0.00 0.00 -0.00E. Sager - VP and Asst Legal Counsel, ENA Legal 0.08 0.04 0.08 0.01 0.06 0.08 0.02 0.07 -0.00J. Dasovich - Employee, Government Relationship Executive -0.01 0.58 -0.02 0.57 0.04 -0.01 0.58 0.06 0.01J. Steffes - VP, Government Affairs -0.00 0.49 -0.01 0.52 -0.08 0.00 0.53 -0.06 -0.01R. Shapiro - VP, Regulatory Affairs -0.01 0.43 -0.01 0.39 0.09 -0.00 0.40 0.10 -0.00S. Kean - VP, Chief of Staff -0.01 0.35 -0.01 0.37 -0.05 -0.00 0.37 -0.04 -0.00R. Sanders - VP, Enron Wholesale Services 0.03 0.16 0.03 0.16 -0.01 0.03 0.16 -0.01 -0.00D. Delainey - CEO, ENA and Enron Energy Services 0.01 0.12 0.01 0.08 0.08 0.01 0.09 0.09 -0.00S. Corman - VP, Regulatory Affairs -0.00 0.08 -0.00 0.08 -0.01 -0.00 0.08 -0.00 0.20M. Carson - Employee, Corporate and Environmental Policy -0.00 0.07 -0.00 0.09 -0.02 -0.00 0.08 -0.02 -0.00S. Scott - Employee, Transwestern Pipeline Company (ETS) -0.00 0.08 -0.00 0.08 -0.00 -0.00 0.08 -0.00 0.04J. Lavorato - CEO, Enron America 0.02 0.12 0.02 -0.08 0.49 0.02 -0.04 0.49 0.00M. Grigsby - Director, West Desk Gas Trading 0.00 0.04 0.00 -0.04 0.20 0.00 -0.03 0.20 -0.00G. Whalley - President, 0.01 0.06 0.01 -0.03 0.19 0.01 -0.01 0.19 0.00J. Steffes - VP, Government Affairs 0.00 0.04 0.00 -0.04 0.19 0.00 -0.02 0.18 0.00K. Presto - VP, East Power Trading 0.01 0.01 0.01 -0.06 0.19 0.01 -0.05 0.18 0.00S. Beck - COO, 0.01 0.02 0.01 -0.05 0.17 0.01 -0.03 0.17 0.00B. Tycholiz - VP, Marketing 0.01 0.04 0.01 -0.03 0.17 0.01 -0.02 0.16 0.00J. Arnold - VP, Financial Enron Online 0.03 0.02 0.03 -0.05 0.16 0.03 -0.04 0.16 -0.00J. Williamson - Executive Assistant, 0.00 0.02 0.00 -0.03 0.14 0.00 -0.02 0.14 0.01K. Watson - Employee, Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.00 0.01 -0.00 -0.00 0.01 0.59M. Lokay - Admin. Asst., Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.01 0.01 -0.00 0.01 0.01 0.42L. Donoho - Employee, Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.01 0.01 -0.00 0.01 0.01 0.35M. McConnell - Employee, Transwestern Pipeline Company (ETS) 0.00 0.00 0.00 -0.00 0.01 0.00 -0.00 0.01 0.26L. Blair - Employee, Northern Natural Gas Pipeline (ETS) -0.00 0.01 -0.00 0.01 0.00 -0.00 0.00 0.00 0.22K. Hyatt - Director, Asset Development TW Pipeline Business (ETS) -0.00 0.02 -0.00 0.02 0.00 -0.00 0.01 0.00 0.20D. Schoolcraft - Employee, Gas Control (ETS) -0.00 0.00 -0.00 0.00 0.00 -0.00 0.00 0.00 0.18T. Geaccone - Manager, (ETS) 0.00 0.00 0.00 -0.00 0.01 0.00 -0.00 0.01 0.17R. Hayslett - VP, Also CFO and Treasurer 0.00 0.01 0.00 -0.00 0.02 0.00 -0.00 0.02 0.16

R matrix 438.3 12.1 440.3 18.6 -0.9 440.2 1.6 -15.0 0.415.3 291.9 19.7 292.5 168.4 1.6 278.3 135.4 1.6

-17.0 104.1 216.4 -29.3 70.7 201.6 -6.21.4 -4.6 -7.5 172.3

Table 2: Three-way DEDICOM results on the Enron email graph for three different decompositions, p = 2, 3, 4.The top 10 entries from all reported columns of A are listed in the table. Entries exceeding a threshold of0.06 are highlighted.

DtRDt

October 2000 22.2 0.1 -0.5 0.00.1 19.0 4.7 0.1-0.9 2.5 3.6 -0.10.0 -0.2 -0.1 3.5

October 2001 14.5 0.0 -0.9 0.00.0 4.1 5.5 0.1-1.8 2.9 22.5 -0.70.1 -0.2 -0.8 19.1

Table 3: DtRDt matrices showing communicationpatterns for October, 2000 and October, 2001.

that it identifies some people who were pretty much purelyof a certain type and other people who had mixed charac-teristics. For example, a given person might “load” on bothan executive and a lawyer component or aspect, and thusshow email exchanges resembling each of these two roles tosome extent.

The entries in matrix R describe the communication pat-terns between groups of the same and different type. They

show how a particular person’s combination of roles or at-tributes influences the pattern of messages he/she exchangeswith particular other employees given the other employee’sroles or attributes. The R matrix is asymmetric and of-fers an idealized version of a directed graph involving thecomponents identified in A.

In addition, three-way DEDICOM shows the associatedcommunication patterns over time in the tensor D. Thescales in each Dt show the strength of participation of aparticular group for time period t.

In the present study, we investigated a semantic graphwith edges labeled by time. As an alternative to time, wepoint out that our semantic graph could have incorporateddifferent types of communication media (e.g., email, phone,and mail communications) instead of time in the third mode.Then an analysis with three-way DEDICOM would repre-sent information about the vertices across all forms of com-munication (appropriately scaled by slices of D) in the Aand R matrices.

Furthermore, DEDICOM is not limited to the analysis ofsociometric and intercommunication data; DEDICOM may

Legal

Gov’t affairs

Execs - trading

Pipeline employees

role

stim

e patterns

Identify shared characteristics to

label group

LegalGov’t a

ffairs

Trade execs

Pipeline

Page 27: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Communication Patternsro

les

time patterns

LegalGov’t a

ffairs

Trade execs

PipelineLegal

Trade

execs

Gov't

affairs

Pipe-

line

2-Dimensional 3-Dimensional 4-DimensionalSolution Solution Solution

Employee 1 2 1 2 3 1 2 3 4

T. Jones - Employee, Financial Trading Group (ENA Legal) 0.64 -0.02 0.64 -0.02 0.01 0.64 -0.01 0.02 -0.00S. Shackleton - Employee, ENA Legal 0.45 -0.02 0.45 -0.01 -0.02 0.45 -0.00 -0.01 -0.00M. Taylor - Manager, Financial Trading Group ENA Legal 0.38 0.00 0.37 -0.01 0.01 0.37 0.01 0.02 -0.00S. Bailey - Legal Assistant, ENA Legal 0.26 -0.01 0.26 -0.01 -0.01 0.26 -0.00 -0.01 -0.00S. Panus - Senior Legal Specialist, ENA Legal 0.26 -0.01 0.26 -0.01 -0.01 0.26 -0.00 -0.00 -0.00M. Heard - Senior Legal Specialist, ENA Legal 0.23 -0.01 0.23 -0.01 0.00 0.23 -0.00 0.00 -0.00J. Hodge - Asst General Counsel, ENA Legal 0.13 0.03 0.13 0.03 0.00 0.13 0.03 0.01 -0.00L. Kitchen - President, Enron Online 0.10 0.08 0.11 -0.13 0.53 0.11 -0.09 0.53 0.00S. Dickson - Employee, ENA Legal 0.09 -0.00 0.09 -0.00 0.00 0.09 -0.00 0.00 -0.00E. Sager - VP and Asst Legal Counsel, ENA Legal 0.08 0.04 0.08 0.01 0.06 0.08 0.02 0.07 -0.00J. Dasovich - Employee, Government Relationship Executive -0.01 0.58 -0.02 0.57 0.04 -0.01 0.58 0.06 0.01J. Steffes - VP, Government Affairs -0.00 0.49 -0.01 0.52 -0.08 0.00 0.53 -0.06 -0.01R. Shapiro - VP, Regulatory Affairs -0.01 0.43 -0.01 0.39 0.09 -0.00 0.40 0.10 -0.00S. Kean - VP, Chief of Staff -0.01 0.35 -0.01 0.37 -0.05 -0.00 0.37 -0.04 -0.00R. Sanders - VP, Enron Wholesale Services 0.03 0.16 0.03 0.16 -0.01 0.03 0.16 -0.01 -0.00D. Delainey - CEO, ENA and Enron Energy Services 0.01 0.12 0.01 0.08 0.08 0.01 0.09 0.09 -0.00S. Corman - VP, Regulatory Affairs -0.00 0.08 -0.00 0.08 -0.01 -0.00 0.08 -0.00 0.20M. Carson - Employee, Corporate and Environmental Policy -0.00 0.07 -0.00 0.09 -0.02 -0.00 0.08 -0.02 -0.00S. Scott - Employee, Transwestern Pipeline Company (ETS) -0.00 0.08 -0.00 0.08 -0.00 -0.00 0.08 -0.00 0.04J. Lavorato - CEO, Enron America 0.02 0.12 0.02 -0.08 0.49 0.02 -0.04 0.49 0.00M. Grigsby - Director, West Desk Gas Trading 0.00 0.04 0.00 -0.04 0.20 0.00 -0.03 0.20 -0.00G. Whalley - President, 0.01 0.06 0.01 -0.03 0.19 0.01 -0.01 0.19 0.00J. Steffes - VP, Government Affairs 0.00 0.04 0.00 -0.04 0.19 0.00 -0.02 0.18 0.00K. Presto - VP, East Power Trading 0.01 0.01 0.01 -0.06 0.19 0.01 -0.05 0.18 0.00S. Beck - COO, 0.01 0.02 0.01 -0.05 0.17 0.01 -0.03 0.17 0.00B. Tycholiz - VP, Marketing 0.01 0.04 0.01 -0.03 0.17 0.01 -0.02 0.16 0.00J. Arnold - VP, Financial Enron Online 0.03 0.02 0.03 -0.05 0.16 0.03 -0.04 0.16 -0.00J. Williamson - Executive Assistant, 0.00 0.02 0.00 -0.03 0.14 0.00 -0.02 0.14 0.01K. Watson - Employee, Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.00 0.01 -0.00 -0.00 0.01 0.59M. Lokay - Admin. Asst., Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.01 0.01 -0.00 0.01 0.01 0.42L. Donoho - Employee, Transwestern Pipeline Company (ETS) -0.00 0.01 -0.00 0.01 0.01 -0.00 0.01 0.01 0.35M. McConnell - Employee, Transwestern Pipeline Company (ETS) 0.00 0.00 0.00 -0.00 0.01 0.00 -0.00 0.01 0.26L. Blair - Employee, Northern Natural Gas Pipeline (ETS) -0.00 0.01 -0.00 0.01 0.00 -0.00 0.00 0.00 0.22K. Hyatt - Director, Asset Development TW Pipeline Business (ETS) -0.00 0.02 -0.00 0.02 0.00 -0.00 0.01 0.00 0.20D. Schoolcraft - Employee, Gas Control (ETS) -0.00 0.00 -0.00 0.00 0.00 -0.00 0.00 0.00 0.18T. Geaccone - Manager, (ETS) 0.00 0.00 0.00 -0.00 0.01 0.00 -0.00 0.01 0.17R. Hayslett - VP, Also CFO and Treasurer 0.00 0.01 0.00 -0.00 0.02 0.00 -0.00 0.02 0.16

R matrix 438.3 12.1 440.3 18.6 -0.9 440.2 1.6 -15.0 0.415.3 291.9 19.7 292.5 168.4 1.6 278.3 135.4 1.6

-17.0 104.1 216.4 -29.3 70.7 201.6 -6.21.4 -4.6 -7.5 172.3

Table 2: Three-way DEDICOM results on the Enron email graph for three different decompositions, p = 2, 3, 4.The top 10 entries from all reported columns of A are listed in the table. Entries exceeding a threshold of0.06 are highlighted.

DtRDt

October 2000 22.2 0.1 -0.5 0.00.1 19.0 4.7 0.1-0.9 2.5 3.6 -0.10.0 -0.2 -0.1 3.5

October 2001 14.5 0.0 -0.9 0.00.0 4.1 5.5 0.1-1.8 2.9 22.5 -0.70.1 -0.2 -0.8 19.1

Table 3: DtRDt matrices showing communicationpatterns for October, 2000 and October, 2001.

that it identifies some people who were pretty much purelyof a certain type and other people who had mixed charac-teristics. For example, a given person might “load” on bothan executive and a lawyer component or aspect, and thusshow email exchanges resembling each of these two roles tosome extent.

The entries in matrix R describe the communication pat-terns between groups of the same and different type. They

show how a particular person’s combination of roles or at-tributes influences the pattern of messages he/she exchangeswith particular other employees given the other employee’sroles or attributes. The R matrix is asymmetric and of-fers an idealized version of a directed graph involving thecomponents identified in A.

In addition, three-way DEDICOM shows the associatedcommunication patterns over time in the tensor D. Thescales in each Dt show the strength of participation of aparticular group for time period t.

In the present study, we investigated a semantic graphwith edges labeled by time. As an alternative to time, wepoint out that our semantic graph could have incorporateddifferent types of communication media (e.g., email, phone,and mail communications) instead of time in the third mode.Then an analysis with three-way DEDICOM would repre-sent information about the vertices across all forms of com-munication (appropriately scaled by slices of D) in the Aand R matrices.

Furthermore, DEDICOM is not limited to the analysis ofsociometric and intercommunication data; DEDICOM may

LegalGovernment & regulatory affairs

Trade executivesPipeline employees

135.470.7

• Mostly communication within roles• Some large exchanges• Negative values complicates interpretation- Non-negative factorization being investigated

Page 28: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Temporal Patterns

N D 99 F M A M J J A S O N D 00 F M A M J J A S O N D 01 F M A M J J A S O N D 02 F M A M J0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Month

No

rma

lize

d s

ca

le

Group 1

Group 2

Group 3

Group 4

Figure 2: Scales in D indicate the strength of participation of each group’s communication over time.

derive useful information from any directed graph. New pos-sibilities include analyzing a network of web traffic betweenservers over time or perhaps a web/citation graph, whereedges convey authority among vertices. A third mode enterswhen the 2-way data are categorized by time, demographic,click number, or some other feature of the data.

Finally, we suggest a few extensions to the DEDICOMmodel and its application in data mining that we intend topursue. First, constrained DEDICOM [23] is an extensionof DEDICOM that has been suggested in the 90’s and pur-sued more recently. The idea is to put constraints on theA factors themselves so that the columns of A lie in a pre-scribed column space. For example, in the email graph, onemight want to impose a constraint on the first column ofA so that it contains only the top executives. Many othervariations are possible. This procedure allows for includingdomain knowledge or incorporating human understandinginto the problem. Kiers and Takane [23] offered an algorithmfor handling different subspace constraints on A. More re-cently, Rocci [33] proposed a new algorithm for fitting anyconstrained DEDICOM model.

Second, a nonnegative factorization of DEDICOM, whereA and/or R are nonnegative, would preserve the non-negativityof the data, which could be desirable in some domains andapplications.

Finally, DEDICOM has been applied to skew-symmetricdata [17] and has yielded some benefits. There might beways to apply this technique to semantic graphs as well.

7. REFERENCES[1] E. Acar, S. A. Camtepe, M. S. Krishnamoorthy, and

B. Yener. Modeling and multiway analysis ofchatroom tensors. In Intelligence and SecurityInformatics: IEEE Intl. Conf. on Intelligence andSecurity Informatics, ISI 2005, volume 3495 of LectureNotes in Computer Science, pages 256–268. SpringerVerlag, 2005.

[2] B. W. Bader and T. G. Kolda. MATLAB tensorclasses for fast algorithm prototyping. TechnicalReport SAND2004-5187, Sandia NationalLaboratories, Albquerque, NM 87185 and Livermore,CA 94550, Oct. 2004. Submitted to ACM Trans.Math. Software.

[3] M. W. Berry and M. Browne. Email surveillance usingnonnegative matrix factorization. In Workshop on

Link Analysis, Counterterrorism and Security, SIAMConf. on Data Mining, Newport Beach, CA, 2005.

[4] J. D. Carroll and J. J. Chang. Analysis of individualdifferences in multidimensional scaling via an N-waygeneralization of ‘Eckart-Young’ decomposition.Psychometrika, 35:283–319, 1970.

[5] A. Chapanond, M. S. Krishnamoorthy, and B. Yener.Graph theoretic and spectral analysis of Enron emaildata. In Workshop on Link Analysis,Counterterrorism and Security, SIAM Conf. on DataMining, Newport Beach, CA, 2005.

[6] W. W. Cohen. Enron email dataset. Webpage.http://www.cs.cmu.edu/∼enron/.

[7] J. E. Dennis, Jr. and R. B. Schnabel. NumericalMethods for Unconstrained Optimization andNonlinear Equations. Prentice-Hall, Englewood Cliffs,NJ, 1983.

[8] J. Diesner and K. M. Carley. Exploration ofcommunication networks from the Enron emailcorpus. In Workshop on Link Analysis,Counterterrorism and Security, SIAM Conf. on DataMining, Newport Beach, CA, 2005.

[9] Federal Energy Regulatory Commision. Ferc:Information released in Enron investigation.http://www.ferc.gov/industries/electric/indus-act/wec/enron/info-release.asp.

[10] C. W. Harris and H. F. Kaiser. Oblique factor analyticsolutions by orthogonal transformations.Psychometrika, 29(4):347–362, 1964.

[11] R. A. Harshman. Foundations of the PARAFACprocedure: models and conditions for an“explanatory” multi-modal factor analysis. UCLAworking papers in phonetics, 16:1–84, 1970.

[12] R. A. Harshman. Models for analysis of asymmetricalrelationships among n objects or stimuli. In FirstJoint Meeting of the Psychometric Society and theSociety for Mathematical Psychology, McMasterUniversity, Hamilton, Ontario, August 1978.http://publish.uwo.ca/∼harshman/asym1978.pdf.

[13] R. A. Harshman. Alternating least squares estimationfor the single domain DEDICOM model, 1981.Unpublished technical memorandum, BellLaboratories, Murray Hill, NJhttp://publish.uwo.ca/∼harshman/asym1981.pdf.

[14] R. A. Harshman. DEDICOM: A family of models

Enron crisis breaks; investigation begins

Communication patterns over time

role

s

time patterns

LegalGovernment & regulatory affairsTrade executivesPipeline employee

Filed for bankruptcy

Page 29: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

Summary

• Improvements to DEDICOM- New procedure for finding A- Newton step for finding D

• Modifications to handle large data arrays- Compression

• Novel approach to social network analysis using DEDICOM- Roles of employees- Communication patterns among roles and over time

• Future research- Nonnegative DEDICOM- Constrained DEDICOM- PARAFAC

Page 30: Analysis of Latent Relationships in Semantic Graphs using ...lekheng/meetings/mmds/...Common Graph Analysis Technique Uk Vk Adjacency matrix Σk T Best rank-k matrix filters out noise

More Information

• DEDICOM paper on Social Network Analysis:- Tech report SAND2006-2161 available

• MATLAB Tensor Toolbox:- http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox- Tech report SAND2004-5189 available on website- Paper to appear in ACM Trans. Math. Softw.- sparse_tensor class to be released soon

http://www.cs.sandia.gov/~bwbader/[email protected]