Typed Tensor Decomposition of Knowledge Bases for Relation Extraction

Kai-Wei Chang, Scott Wen-tau Yih, Bishan Yang & Chris Meek
Microsoft Research


• Useful resources for NLP applications
  • Semantic Parsing & Question Answering [e.g., Berant+, 2014]
  • Information Extraction [Riedel+, 2013]

Knowledge Base

Examples: Freebase, DBpedia, YAGO, NELL, OpenIE/ReVerb

• Captures world knowledge by storing properties of millions of entities, as well as relations among them

• Knowledge base is never complete!
  • Extract previously unknown facts from new corpora
  • Predict new facts via inference

• Modeling multi-relational data
  • Statistical relational learning [Getoor & Taskar, 2007]
  • Path ranking methods (e.g., random walk) [e.g., Lao+ 2011]
  • Knowledge base embedding
    • Very efficient
    • Better prediction accuracy

Reasoning with Knowledge Base

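• Each entity in a KB is represented by a real-valued vector
• Predict whether a triple $(e_i, r_k, e_j)$ is true by a scoring function of the two entity vectors
  • The scoring function is linear or bilinear in the entity vectors; the bilinear form used by RESCAL is $\mathbf{a}_i^T \mathcal{R}_k \mathbf{a}_j$

• Recent work on KB embedding
  • RESCAL [Nickel+, ICML-11], SME [Bordes+, AISTATS-12], NTN [Socher+, NIPS-13], TransE [Bordes+, NIPS-13]
  • Train on existing facts (e.g., triples)
  • Ignore relational domain knowledge available in the KB (e.g., ontology)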

Knowledge Base Embedding

• Example – type constraint: $(e_i, \text{born-in}, e_j)$ can be true only if $e_i$ is a person and $e_j$ is a location

• Example – common sense can be true only if

Relational Domain Knowledge

• KB embedding via Tensor Decomposition
  • Entity vector, Relation matrix

• Relational domain knowledge
  • Type information and constraints
  • Only legitimate entities are included in the loss

• Benefits of leveraging type information
  • Faster model training time
  • Highly scalable to large KB
  • Higher prediction accuracy

• Application to Relation Extraction

Typed Tensor Decomposition – TRESCAL

• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
• Experiments
• Discussion & Conclusions

Road Map

• Collection of subj-pred-obj triples

Knowledge Base Representation (1/2)

Subject         Predicate     Object
Obama           Born-in       Hawaii
Bill Gates      Nationality   USA
Bill Clinton    Spouse-of     Hillary Clinton
Satya Nadella   Work-at       Microsoft
…               …             …


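$n$: # entities, $m$: # relations

[Figure: the KB as an $n \times n \times m$ tensor $\mathcal{X}$ whose rows and columns are indexed by the entities $e_1 \ldots e_n$. The $k$-th slice $\mathcal{X}_k$ is the $n \times n$ matrix of the $k$-th relation (e.g., born-in); the entry at (Obama, Hawaii) is 1.]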

A zero entry means either:
• Incorrect (false)
• Unknown
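As a minimal sketch of the representation above, the toy triples from the table can be turned into a binary tensor with NumPy. The variable names (`ent_id`, `rel_id`, `X`) and the choice to put the relation index first are assumptions made for illustration, not part of the original slides.

```python
import numpy as np

# Toy triples copied from the table in this deck.
triples = [
    ("Obama", "Born-in", "Hawaii"),
    ("Bill Gates", "Nationality", "USA"),
    ("Bill Clinton", "Spouse-of", "Hillary Clinton"),
    ("Satya Nadella", "Work-at", "Microsoft"),
]

# Assign integer ids; sorting only makes the ids reproducible.
entities = sorted({e for s, _, o in triples for e in (s, o)})
relations = sorted({p for _, p, _ in triples})
ent_id = {e: i for i, e in enumerate(entities)}
rel_id = {r: k for k, r in enumerate(relations)}

n, m = len(entities), len(relations)

# X[k] is the k-th slice: an n x n matrix for relation k.
# A zero entry means "false or unknown", as noted above.
X = np.zeros((m, n, n))
for s, p, o in triples:
    X[rel_id[p], ent_id[s], ent_id[o]] = 1.0

print(X.shape)
```

A dense array is used only because the example is tiny; a KB of realistic size would need a sparse matrix per slice.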

• Objective:

Tensor Decomposition Objective

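RESCAL [Nickel+, ICML-11] approximates each slice as $\mathcal{X}_k \approx \mathbf{A}\mathcal{R}_k\mathbf{A}^T$, where $\mathcal{R}_k$ is the matrix of the $k$-th relation.

$$\underbrace{\frac{1}{2}\Big(\sum_k \|\mathcal{X}_k - \mathbf{A}\mathcal{R}_k\mathbf{A}^T\|_F^2\Big)}_{\text{Reconstruction Error}} + \underbrace{\frac{1}{2}\lambda\Big(\|\mathbf{A}\|_F^2 + \sum_k \|\mathcal{R}_k\|_F^2\Big)}_{\text{Regularization}}$$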
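The objective on this slide can be evaluated directly for small dense inputs. The sketch below assumes a tensor stored as an array of shape (m, n, n), entity vectors as the rows of `A`, and a regularization weight `lam` (the λ that appears in the update rules later in the deck); the function name is hypothetical.

```python
import numpy as np

def rescal_objective(X, A, R, lam):
    """Reconstruction error plus regularization for a RESCAL-style model.

    X:   (m, n, n) array, one n x n slice per relation
    A:   (n, r) array, one row per entity
    R:   (m, r, r) array, one r x r matrix per relation
    lam: regularization weight (assumed symbol, matches lambda in the updates)
    """
    m = X.shape[0]
    recon = sum(
        np.linalg.norm(X[k] - A @ R[k] @ A.T, "fro") ** 2 for k in range(m)
    )
    reg = np.linalg.norm(A, "fro") ** 2 + sum(
        np.linalg.norm(R[k], "fro") ** 2 for k in range(m)
    )
    return 0.5 * recon + 0.5 * lam * reg
```

This dense version touches every entry of every slice, so it is only meant to make the formula concrete.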

Measure the Degree of a Relationship

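[Figure: the score of (Obama, born-in, Hawaii) is $\mathbf{a}_{\text{Obama}}^T \mathcal{R}_{\text{born-in}} \mathbf{a}_{\text{Hawaii}}$, where $\mathbf{a}_e$ is the row of $\mathbf{A}$ for entity $e$.]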
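The bilinear score illustrated on this slide is a single vector-matrix-vector product. The sketch below uses made-up numbers for two entities and a 2-dimensional embedding; the values and the function name `triple_score` are hypothetical and chosen only so the result is easy to check by hand.

```python
import numpy as np

def triple_score(A, R_k, i, j):
    """Bilinear score a_i^T R_k a_j for subject row i and object row j of A."""
    return float(A[i] @ R_k @ A[j])

# Hypothetical toy embedding: row 0 = Obama, row 1 = Hawaii.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
R_born_in = np.array([[0.0, 0.9],
                      [0.1, 0.0]])

print(triple_score(A, R_born_in, 0, 1))  # (Obama, born-in, Hawaii) -> 0.9
print(triple_score(A, R_born_in, 1, 0))  # (Hawaii, born-in, Obama) -> 0.1
```

Because the relation matrix need not be symmetric, swapping subject and object generally changes the score.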

• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
  • Basic idea
  • Training procedure
  • Complexity analysis
• Experiments
• Discussion & Conclusions

Road Map

• Reconstruction error:

Typed Tensor Decomposition Objective

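RESCAL reconstruction error, over all entity pairs:

$$\frac{1}{2}\sum_k \|\mathcal{X}_k - \mathbf{A}\mathcal{R}_k\mathbf{A}^T\|_F^2$$

[Figure: for the relation born-in, only the rows of people and the columns of locations are kept in the slice.]

Typed (TRESCAL) reconstruction error, over legitimate entity pairs only:

$$\frac{1}{2}\sum_k \|\mathcal{X}'_k - \mathbf{A}_{k_l}\mathcal{R}_k\mathbf{A}_{k_r}^T\|_F^2$$

where $\mathcal{X}'_k$ is the sub-matrix of $\mathcal{X}_k$ restricted to entities compatible with the $k$-th relation, and $\mathbf{A}_{k_l}$, $\mathbf{A}_{k_r}$ are the rows of $\mathbf{A}$ for the compatible left (subject) and right (object) entities.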

Training Procedure – Alternating Least-Squares (ALS) Method

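RESCAL

Fix $\mathcal{R}_k$, update $\mathbf{A}$:

$$\mathbf{A} \leftarrow \Big[\sum_k \mathcal{X}_k\mathbf{A}\mathcal{R}_k^T + \mathcal{X}_k^T\mathbf{A}\mathcal{R}_k\Big]\Big[\sum_k B_k + C_k + \lambda\mathbf{I}\Big]^{-1}$$

where $B_k = \mathcal{R}_k\mathbf{A}^T\mathbf{A}\mathcal{R}_k^T$ and $C_k = \mathcal{R}_k^T\mathbf{A}^T\mathbf{A}\mathcal{R}_k$.

Fix $\mathbf{A}$, update $\mathcal{R}_k$:

$$\mathrm{vec}(\mathcal{R}_k) \leftarrow (\mathbf{Z}^T\mathbf{Z} + \lambda\mathbf{I})^{-1}\mathbf{Z}^T\,\mathrm{vec}(\mathcal{X}_k)$$

where $\mathrm{vec}$ is vectorization, $\mathbf{Z} = \mathbf{A} \otimes \mathbf{A}$, and $\otimes$ is the Kronecker product.

TRESCAL

Fix $\mathcal{R}_k$, update $\mathbf{A}$:

$$\mathbf{A} \leftarrow \Big[\sum_k \mathcal{X}'_k\mathbf{A}_{k_r}\mathcal{R}_k^T + \mathcal{X}'^T_k\mathbf{A}_{k_l}\mathcal{R}_k\Big]\Big[\sum_k B_{k_r} + C_{k_l} + \lambda\mathbf{I}\Big]^{-1}$$

where $B_{k_r} = \mathcal{R}_k\mathbf{A}_{k_r}^T\mathbf{A}_{k_r}\mathcal{R}_k^T$ and $C_{k_l} = \mathcal{R}_k^T\mathbf{A}_{k_l}^T\mathbf{A}_{k_l}\mathcal{R}_k$.

Fix $\mathbf{A}$, update $\mathcal{R}_k$:

$$\mathrm{vec}(\mathcal{R}_k) \leftarrow \big(\mathbf{A}_{k_r}^T\mathbf{A}_{k_r} \otimes \mathbf{A}_{k_l}^T\mathbf{A}_{k_l} + \lambda\mathbf{I}\big)^{-1} \times \mathrm{vec}\big(\mathbf{A}_{k_l}^T\mathcal{X}'_k\mathbf{A}_{k_r}\big)$$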
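The typed update of a relation matrix can be sketched in a few lines of NumPy. The code assumes column-major vectorization, so that vec(A_l R A_r^T) = (A_r ⊗ A_l) vec(R), and solves the small r² × r² system instead of forming an explicit inverse; the function and argument names are hypothetical.

```python
import numpy as np

def update_R_typed(X_sub, A_l, A_r, lam):
    """One least-squares update of a relation matrix with A held fixed.

    X_sub: (n_l, n_r) sub-matrix X'_k for compatible entities
    A_l:   (n_l, r) rows of A for compatible left (subject) entities
    A_r:   (n_r, r) rows of A for compatible right (object) entities
    lam:   regularization weight lambda
    """
    r = A_l.shape[1]
    # (A_r^T A_r) kron (A_l^T A_l) + lambda * I, an r^2 x r^2 matrix.
    G = np.kron(A_r.T @ A_r, A_l.T @ A_l) + lam * np.eye(r * r)
    # vec(A_l^T X'_k A_r), column-major to match the Kronecker ordering.
    rhs = (A_l.T @ X_sub @ A_r).reshape(-1, order="F")
    vec_R = np.linalg.solve(G, rhs)
    return vec_R.reshape(r, r, order="F")
```

The Kronecker product is taken over two r × r Gram matrices, so the cost of this step depends on r and on the number of compatible entities rather than on the full entity count.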

• Without type information (RESCAL), the cost is stated in terms of:
  • $n$: # entities
  • $p$: # non-zero entries
  • $r$: # dimensions of projected entity vectors

• With type information (TRESCAL):
  • $\bar{n}$: average # entities satisfying the type constraint

Complexity Analysis
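To get a feel for how much type constraints shrink the problem, one can compare the number of (subject, object) cells kept per relation with the full n × n slice. This is an illustrative calculation under assumed inputs, not the complexity formula from the slide; `kept_fraction` and its arguments are hypothetical names.

```python
def kept_fraction(n, left_sizes, right_sizes):
    """Fraction of tensor cells that survive type filtering.

    n:           total number of entities
    left_sizes:  per relation, # entities allowed as subject
    right_sizes: per relation, # entities allowed as object
    """
    kept = sum(l * r for l, r in zip(left_sizes, right_sizes))
    total = len(left_sizes) * n * n
    return kept / total

# Hypothetical example: 1,000 entities, two relations.
print(kept_fraction(1000, [200, 50], [100, 400]))
```

When each relation accepts only a small share of the entities on each side, the typed sub-matrices are far smaller than the full slices.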

• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
• Experiments
  • KB Completion
  • Application to Relation Extraction
• Discussion & Conclusions

Road Map

• KB – Never Ending Language Learning (NELL)
  • Training: version 165
  • Development: new facts between v.166 and v.533
  • Testing: new facts between v.534 and v.745

• Data statistics of the training set

Experiments – KB Completion

# Entities                 753k
# Relation Types           229
# Entity Types             300
# Entity-Relation Triples  1.8M

• Entity Retrieval:
  • One positive entity with 100 negative entities
• Relation Retrieval:
  • Positive entity pairs with an equal number of negative pairs

• Baselines:

Tasks & Baselines

  • RESCAL [Nickel+, ICML-11]
  • TransE [Bordes+, NIPS-13]

[Figure: entities $e_i$, $e_j$ and relation $r_k$]
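Both retrieval tasks on this slide are scored with Mean Average Precision in the results that follow. The sketch below is a generic implementation of that metric, not the authors' evaluation script; each query is assumed to be a list of model scores with matching 0/1 labels, and ties are broken by input order.

```python
def average_precision(scores, labels):
    """Average precision of one ranked list (higher score = ranked first)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0

def mean_average_precision(queries):
    """Mean of average precision over (scores, labels) pairs."""
    return sum(average_precision(s, l) for s, l in queries) / len(queries)

# Entity-retrieval style query: one positive among negatives, ranked second.
print(average_precision([0.9, 0.8, 0.7], [0, 1, 0]))  # 0.5
```

For entity retrieval with a single positive per query, average precision reduces to the reciprocal of the rank at which the positive entity appears.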

Training Time Reduction

• Both models finish training in 10 iterations.
• TRESCAL filters out 96% of entity triples, those with incompatible types.

[Bar chart: Model Training Time (hours). TRESCAL: 4.46; RESCAL: 20.5; 4.6x speed-up]

Training Time Reduction

• # iterations for TransE is set to 500 (the default value).

[Bar chart: Model Training Time (hours). TRESCAL: 4.46; TransE: 96; 21.5x speed-up]

Entity Retrieval

[Bar chart: Mean Average Precision (MAP) of three models, in chart order: 67.56%, 62.91%, 69.26%]

Relation Retrieval

[Bar chart: Mean Average Precision (MAP) of three models, in chart order: 70.71%, 73.08%, 75.70%]

Experiments – Relation Extraction

Satya Nadella is the CEO of Microsoft.

(Satya Nadella, work-at, Microsoft)

• Row: Entity Pair
• Column: Relation

Relation Extraction as Matrix Factorization [Riedel+ 13]

Fig.1 of [Riedel+ 13]

• Raw data: NY Times corpus & Freebase
  • Entities in NY Times and Freebase are aligned
• Raw tensor construction
  • 80,698 entities & 1,652 relations
  • Type information from Freebase & NER
  • Type constraints are derived from training data

• Task – identify FB relations of entity pairs in text
  • 10,000 entity pairs: 2,048 have both entities in FB
  • Evaluation metric – Weighted mean average precision (MAP) on 19 relations

Data & Task Description

Relation Extraction

[Bar chart: weighted MAP of five systems, in chart order: 0.49, 0.52, 0.58, 0.70, 0.72]

• Evaluated using only 2,048 FB entity pairs

[updated version]

Relation Extraction

[Bar chart: weighted MAP of five systems, in chart order: 0.33, 0.36, 0.39, 0.47, 0.57]

• Evaluated using all 10,000 entity pairs

• TRESCAL: A KB embedding model via tensor decomposition
  • Leverages entity type constraints
  • Faster model training time
  • Highly scalable to large KB
  • Higher prediction accuracy
  • Application to relation extraction

• Challenges & Future Work
  • Capture more types of relational domain knowledge
  • Support more sophisticated inferential tasks

Conclusions
