Kai-Wei Chang, Scott Wen-tau Yih, Bishan Yang & Chris Meek Microsoft Research Typed Tensor Decomposition of Knowledge Bases for Relation Extraction


Page 1: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Kai-Wei Chang, Scott Wen-tau Yih, Bishan Yang & Chris Meek, Microsoft Research

Page 2: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Knowledge Base

• Captures world knowledge by storing properties of millions of entities, as well as relations among them
• Examples: Freebase, DBpedia, YAGO, NELL, OpenIE/ReVerb
• Useful resources for NLP applications
  • Semantic Parsing & Question Answering [e.g., Berant+, 2014]
  • Information Extraction [Riedel+, 2013]

Page 3: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Reasoning with Knowledge Base

• Knowledge base is never complete!
  • Extract previously unknown facts from new corpora
  • Predict new facts via inference
• Modeling multi-relational data
  • Statistical relational learning [Getoor & Taskar, 2007]
  • Path ranking methods (e.g., random walk) [e.g., Lao+ 2011]
  • Knowledge base embedding
    • Very efficient
    • Better prediction accuracy

Page 4: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Knowledge Base Embedding

• Each entity in a KB is represented by a real-valued vector
• Predict whether a triple $(e_1, r, e_2)$ is true by scoring the two entity vectors with a function that is either linear or bilinear
• Recent work on KB embedding
  • RESCAL [Nickel+, ICML-11], SME [Bordes+, AISTATS-12], NTN [Socher+, NIPS-13], TransE [Bordes+, NIPS-13]
  • Train on existing facts (e.g., triples)
  • Ignore relational domain knowledge available in the KB (e.g., ontology)
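A minimal sketch of the bilinear case, assuming the score of a triple is the subject vector times a relation matrix times the object vector (the form RESCAL uses). The function name `bilinear_score` and the toy numbers are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def bilinear_score(v_subj, M_rel, v_obj):
    """Score a (subject, relation, object) triple as v_subj^T M_rel v_obj."""
    return float(v_subj @ M_rel @ v_obj)

# Toy 2-dimensional embeddings (hypothetical values).
v_subj = np.array([1.0, 0.0])
v_obj = np.array([0.0, 1.0])
M_rel = np.array([[2.0, 3.0],
                  [4.0, 5.0]])

score = bilinear_score(v_subj, M_rel, v_obj)
```

A higher score would be read as stronger evidence that the triple holds; how scores are thresholded or ranked is left open here.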

Page 5: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

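Relational Domain Knowledge

• Example – type constraint: (e1, born-in, e2) can be true only if e1 is a person and e2 is a location
• Example – common sense: a relation can be true only if its arguments satisfy a common-sense condition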

Page 6: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Typed Tensor Decomposition – TRESCAL

• KB embedding via Tensor Decomposition
  • Entity vector, Relation matrix
• Relational domain knowledge
  • Type information and constraints
  • Only legitimate entities are included in the loss
• Benefits of leveraging type information
  • Faster model training time
  • Highly scalable to large KB
  • Higher prediction accuracy
• Application to Relation Extraction

Page 7: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Road Map

• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
• Experiments
• Discussion & Conclusions

Page 9: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Knowledge Base Representation (1/2)

• Collection of subj-pred-obj triples – $(e_1, r, e_2)$

Subject | Predicate | Object
Obama | Born-in | Hawaii
Bill Gates | Nationality | USA
Bill Clinton | Spouse-of | Hillary Clinton
Satya Nadella | Work-at | Microsoft
… | … | …

$n$: # entities, $m$: # relations

Page 11: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

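Knowledge Base Representation (2/2)

[Figure: the KB as a three-way tensor $\mathcal{X}$ whose slices are indexed by relation, with entities $e_1 \dots e_n$ along both axes of each slice. The $k$-th slice $\mathcal{X}_k$ holds the relation born-in, and its (Obama, Hawaii) entry is 1.]

A zero entry means either:
• Incorrect (false)
• Unknown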
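The tensor view can be sketched as follows, using the example triples from the table. The helper name `build_slices` and the dense array layout (relation, subject, object) are assumptions made for illustration; a KB with hundreds of thousands of entities would need sparse slices instead.

```python
import numpy as np

triples = [
    ("Obama", "Born-in", "Hawaii"),
    ("Bill Gates", "Nationality", "USA"),
    ("Bill Clinton", "Spouse-of", "Hillary Clinton"),
    ("Satya Nadella", "Work-at", "Microsoft"),
]

def build_slices(triples):
    """Return a (relations x entities x entities) 0/1 array plus index maps."""
    entities = sorted({e for s, _, o in triples for e in (s, o)})
    relations = sorted({p for _, p, _ in triples})
    e_idx = {e: i for i, e in enumerate(entities)}
    r_idx = {r: k for k, r in enumerate(relations)}
    X = np.zeros((len(relations), len(entities), len(entities)))
    for s, p, o in triples:
        # Entry (k, i, j) is 1 when (e_i, r_k, e_j) is a known fact.
        X[r_idx[p], e_idx[s], e_idx[o]] = 1.0
    return X, e_idx, r_idx

X, e_idx, r_idx = build_slices(triples)
```

Every entry left at zero is ambiguous in exactly the sense listed above: the fact may be false or merely unobserved.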

Page 12: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

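Tensor Decomposition Objective

RESCAL [Nickel+, ICML-11]

[Figure: $\mathcal{X}_k \approx \mathbf{A} \times \mathcal{R}_k \times \mathbf{A}^T$, where $\mathcal{X}_k$ is the slice for the $k$-th relation.]

• Objective:

$$\underbrace{\frac{1}{2}\Big(\sum_k \|\mathcal{X}_k - \mathbf{A}\mathcal{R}_k\mathbf{A}^T\|_F^2\Big)}_{\text{Reconstruction Error}} + \underbrace{\frac{1}{2}\lambda\Big(\|\mathbf{A}\|_F^2 + \sum_k \|\mathcal{R}_k\|_F^2\Big)}_{\text{Regularization}}$$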
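A minimal sketch of evaluating this objective, assuming dense NumPy slices and a regularization weight `lam` (the weight $\lambda$ that appears in the ALS updates). The function name `rescal_objective` and the toy inputs are hypothetical.

```python
import numpy as np

def rescal_objective(slices, A, R, lam):
    """0.5 * sum_k ||X_k - A R_k A^T||_F^2 + 0.5 * lam * (||A||_F^2 + sum_k ||R_k||_F^2)."""
    recon = sum(
        np.linalg.norm(X_k - A @ R_k @ A.T, "fro") ** 2
        for X_k, R_k in zip(slices, R)
    )
    reg = np.linalg.norm(A, "fro") ** 2 + sum(
        np.linalg.norm(R_k, "fro") ** 2 for R_k in R
    )
    return 0.5 * recon + 0.5 * lam * reg

# Toy case: one relation that is reconstructed exactly, so only the
# regularization term contributes.
A = np.eye(2)
R = [np.eye(2)]
slices = [np.eye(2)]
value = rescal_objective(slices, A, R, lam=0.5)
```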

Page 13: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

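Measure the Degree of a Relationship

[Figure: the score of (Obama, born-in, Hawaii) is $\mathbf{a}_{\text{Obama}}^T \mathcal{R}_{\text{born-in}} \mathbf{a}_{\text{Hawaii}}$, the product of Obama's row of $\mathbf{A}$, the matrix $\mathcal{R}_{\text{born-in}}$, and Hawaii's column of $\mathbf{A}^T$.]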
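Once $\mathbf{A}$ and $\mathcal{R}_k$ are learned, the same product can score every candidate object for a fixed subject at once. The sketch below assumes row `i` of `A` is the vector of entity `i`; the function name `rank_objects` and the toy matrices are illustrative assumptions.

```python
import numpy as np

def rank_objects(A, R_k, subject):
    """Score (subject, r_k, e_j) for every entity e_j and rank them, best first."""
    scores = A[subject] @ R_k @ A.T
    order = np.argsort(-scores, kind="stable")
    return order, scores

# Toy factorization with three entities (hypothetical values).
A = np.eye(3)
R_k = np.array([[0.0, 1.0, 2.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
order, scores = rank_objects(A, R_k, subject=0)
```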

Page 14: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Road Map

• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
  • Basic idea
  • Training procedure
  • Complexity analysis
• Experiments
• Discussion & Conclusions

Page 18: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

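Typed Tensor Decomposition Objective

• Reconstruction error:

$$\frac{1}{2}\sum_k \|\mathcal{X}_k - \mathbf{A}\mathcal{R}_k\mathbf{A}^T\|_F^2$$

[Figure: $\mathcal{X}_k \approx \mathbf{A} \times \mathcal{R}_k \times \mathbf{A}^T$ for the relation born-in, with people highlighted as the legitimate subjects and locations as the legitimate objects.]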

Page 19: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

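Typed Tensor Decomposition Objective

• Reconstruction error:

$$\frac{1}{2}\sum_k \|\mathcal{X}'_k - \mathbf{A}_{k_l}\mathcal{R}_k\mathbf{A}_{k_r}^T\|_F^2$$

[Figure: $\mathcal{X}'_k \approx \mathbf{A}_{k_l} \times \mathcal{R}_k \times \mathbf{A}_{k_r}^T$]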
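One way to read the typed reconstruction error: for each relation, keep only the rows of the slice whose entities have a legitimate subject type and the columns whose entities have a legitimate object type, and use the matching rows of $\mathbf{A}$. The sketch below assumes those index lists are given; `left_idx`, `right_idx` and the function name are hypothetical.

```python
import numpy as np

def typed_reconstruction_error(slices, A, R, left_idx, right_idx):
    """0.5 * sum_k ||X'_k - A_kl R_k A_kr^T||_F^2 over type-compatible blocks only."""
    total = 0.0
    for k, X_k in enumerate(slices):
        rows, cols = left_idx[k], right_idx[k]
        X_sub = X_k[np.ix_(rows, cols)]   # X'_k: legitimate subjects x objects
        A_l, A_r = A[rows], A[cols]       # A_kl and A_kr
        resid = X_sub - A_l @ R[k] @ A_r.T
        total += 0.5 * np.sum(resid ** 2)
    return total

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))
R = [rng.normal(size=(2, 2))]
X = A @ R[0] @ A.T
X_noisy = X.copy()
X_noisy[3, 0] += 5.0  # an entry outside the type-compatible block

err = typed_reconstruction_error([X_noisy], A, R, left_idx=[[0, 1]], right_idx=[[2, 3]])
```

In this toy case the perturbed entry lies outside the selected block, so it does not contribute to the loss, which is the sense in which only legitimate entities are included.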

Page 20: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

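Training Procedure – Alternating Least-Squares (ALS) Method

Fix $\mathcal{R}_k$, update $\mathbf{A}$:

$$\mathbf{A} \leftarrow \Big[\sum_k \mathcal{X}_k\mathbf{A}\mathcal{R}_k^T + \mathcal{X}_k^T\mathbf{A}\mathcal{R}_k\Big]\Big[\sum_k \mathbf{B}_k + \mathbf{C}_k + \lambda\mathbf{I}\Big]^{-1}$$

where $\mathbf{B}_k = \mathcal{R}_k\mathbf{A}^T\mathbf{A}\mathcal{R}_k^T$ and $\mathbf{C}_k = \mathcal{R}_k^T\mathbf{A}^T\mathbf{A}\mathcal{R}_k$.

Fix $\mathbf{A}$, update $\mathcal{R}_k$:

$$\mathrm{vec}(\mathcal{R}_k) \leftarrow (\mathbf{Z}^T\mathbf{Z} + \lambda\mathbf{I})^{-1}\mathbf{Z}^T\,\mathrm{vec}(\mathcal{X}_k)$$

where $\mathbf{Z} = \mathbf{A} \otimes \mathbf{A}$, $\mathrm{vec}(\cdot)$ is vectorization, and $\otimes$ is the Kronecker product.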

Page 23: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

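Training Procedure – Alternating Least-Squares (ALS) Method

$$\mathbf{A} \leftarrow \Big[\sum_k \mathcal{X}'_k\mathbf{A}_{k_r}\mathcal{R}_k^T + {\mathcal{X}'_k}^T\mathbf{A}_{k_l}\mathcal{R}_k\Big]\Big[\sum_k \mathbf{B}_{k_r} + \mathbf{C}_{k_l} + \lambda\mathbf{I}\Big]^{-1}$$

where $\mathbf{B}_{k_r} = \mathcal{R}_k\mathbf{A}_{k_r}^T\mathbf{A}_{k_r}\mathcal{R}_k^T$ and $\mathbf{C}_{k_l} = \mathcal{R}_k^T\mathbf{A}_{k_l}^T\mathbf{A}_{k_l}\mathcal{R}_k$.

$$\mathrm{vec}(\mathcal{R}_k) \leftarrow \big(\mathbf{A}_{k_r}^T\mathbf{A}_{k_r} \otimes \mathbf{A}_{k_l}^T\mathbf{A}_{k_l} + \lambda\mathbf{I}\big)^{-1} \times \mathrm{vec}\big(\mathbf{A}_{k_l}^T\mathcal{X}'_k\mathbf{A}_{k_r}\big)$$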
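The typed $\mathcal{R}_k$ update can be sketched directly from the formula, assuming column-major vectorization so that $\mathrm{vec}(\mathbf{A}_{k_l}\mathcal{R}_k\mathbf{A}_{k_r}^T) = (\mathbf{A}_{k_r} \otimes \mathbf{A}_{k_l})\,\mathrm{vec}(\mathcal{R}_k)$. The function name `update_relation` and the random toy inputs are assumptions; the point is that the linear system is only $d^2 \times d^2$, independent of the number of entities.

```python
import numpy as np

def update_relation(X_sub, A_l, A_r, lam):
    """Solve (A_r^T A_r kron A_l^T A_l + lam I) vec(R) = vec(A_l^T X' A_r)."""
    d = A_l.shape[1]
    gram_l = A_l.T @ A_l
    gram_r = A_r.T @ A_r
    lhs = np.kron(gram_r, gram_l) + lam * np.eye(d * d)
    # order="F" gives the column-major vec() that matches the Kronecker identity.
    rhs = (A_l.T @ X_sub @ A_r).reshape(-1, order="F")
    vec_r = np.linalg.solve(lhs, rhs)
    return vec_r.reshape(d, d, order="F")

rng = np.random.default_rng(1)
A_l = rng.normal(size=(6, 3))   # rows of A for legitimate subjects
A_r = rng.normal(size=(5, 3))   # rows of A for legitimate objects
R_true = rng.normal(size=(3, 3))
X_sub = A_l @ R_true @ A_r.T

R_hat = update_relation(X_sub, A_l, A_r, lam=0.0)
```

With `lam=0.0` and an exactly factorizable block, the update recovers the generating matrix; a positive `lam` shrinks the solution toward zero.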

Page 24: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Complexity Analysis

• Without type information (RESCAL), the complexity depends on:
  • # entities
  • # non-zero entries
  • # dimensions of projected entity vectors
• With type information (TRESCAL), it depends instead on:
  • average # entities satisfying the type constraint

Page 25: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Road Map

• Introduction
• KB embedding via Tensor Decomposition
• Typed tensor decomposition (TRESCAL)
• Experiments
  • KB Completion
  • Application to Relation Extraction
• Discussion & Conclusions

Page 26: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Experiments – KB Completion

• KB – Never Ending Language Learning (NELL)
  • Training: version 165
  • Development: new facts between v.166 and v.533
  • Testing: new facts between v.534 and v.745
• Data statistics of the training set

# Entities | 753k
# Relation Types | 229
# Entity Types | 300
# Entity-Relation Triples | 1.8M

Page 27: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Tasks & Baselines

• Entity Retrieval:
  • One positive entity with 100 negative entities
• Relation Retrieval:
  • Positive entity pairs with equal number of negative pairs
• Baselines:
  • RESCAL [Nickel+, ICML-11]
  • TransE [Bordes+, NIPS-13]

[Figure: vectors $e_i$, $e_j$ and $r_k$]
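Both tasks are reported with Mean Average Precision (MAP). A minimal sketch of the standard definition follows, assuming each query is a list of 0/1 relevance labels already sorted by model score; the function names are illustrative and this is not the authors' evaluation script.

```python
def average_precision(ranked_labels):
    """Average of precision@rank taken at each positive item in the ranking."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(queries):
    """Mean of the per-query average precision."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two toy queries: the positive is ranked first in one and second in the other.
map_value = mean_average_precision([[1, 0], [0, 1]])
```

With a single positive per query, as in the entity retrieval setup, average precision reduces to the reciprocal of the positive item's rank.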

Page 28: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Training Time Reduction

• Both models finish training in 10 iterations.
• TRESCAL filters out 96% of entity triples because their types are incompatible.

[Bar chart: Model Training Time (hours); bars at 4.46 and 20.5, a 4.6x speed-up]

Page 29: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Training Time Reduction

• # iterations for TransE is set to 500 (the default value).

[Bar chart: Model Training Time (hours); bars at 4.46 and 96, a 21.5x speed-up]
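The two speed-up figures follow from the reported training times. The short check below assumes the 4.46-hour bar is TRESCAL and that the 20.5-hour and 96-hour bars are RESCAL and TransE respectively; the chart legends did not survive, so that assignment is inferred from the slide text.

```python
# Training times in hours, as read from the two bar charts.
trescal_hours = 4.46
rescal_hours = 20.5   # assumed to be the RESCAL bar
transe_hours = 96.0   # assumed to be the TransE bar

speedup_vs_rescal = rescal_hours / trescal_hours
speedup_vs_transe = transe_hours / trescal_hours

print(f"{speedup_vs_rescal:.1f}x, {speedup_vs_transe:.1f}x")
```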

Page 30: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Entity Retrieval

[Bar chart: Mean Average Precision (MAP); bar 1: 67.56%, bar 2: 62.91%, bar 3: 69.26%]

Page 31: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Relation Retrieval

[Bar chart: Mean Average Precision (MAP); bar 1: 70.71%, bar 2: 73.08%, bar 3: 75.70%]

Page 32: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Experiments – Relation Extraction

Satya Nadella is the CEO of Microsoft.

(Satya Nadella, work-at, Microsoft)

Page 33: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Relation Extraction as Matrix Factorization [Riedel+ 13]

• Row: Entity Pair
• Column: Relation

[Figure: Fig. 1 of [Riedel+ 13]]

Page 34: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Data & Task Description

• Raw data: NY Times corpus & Freebase
  • Entities in NY Times and Freebase are aligned
• Raw tensor construction
  • 80,698 entities & 1,652 relations
  • Type information from Freebase & NER
  • Type constraints are derived from training data
• Task – identify FB relations of entity pairs in text
  • 10,000 entity pairs: 2,048 have both entities in FB
  • Evaluation metric – Weighted mean average precision (MAP) on 19 relations

Page 35: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Relation Extraction [updated version]

• Evaluated using only 2,048 FB entity pairs

[Bar chart: five bars at 0.49, 0.52, 0.58, 0.70 and 0.72]

Page 36: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Relation Extraction

• Evaluated using all 10,000 entity pairs

[Bar chart: five bars at 0.33, 0.36, 0.39, 0.47 and 0.57]

Page 37: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Conclusions

• TRESCAL: A KB embedding model via tensor decomposition
  • Leverages entity type constraint
  • Faster model training time
  • Highly scalable to large KB
  • Higher prediction accuracy
  • Application to relation extraction
• Challenges & Future Work
  • Capture more types of relational domain knowledge
  • Support more sophisticated inferential tasks