Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations


Nitish Aggarwal
Supervised by Dr. Paul Buitelaar
PhD Viva

Motivation

Brad Pitt

Motivation: Entity Recommendation

Semantic Web
• Technologies: 1. RDF, 2. SPARQL, 3. Ontology, 4. Linked data, 5. Turtle (syntax)
• Companies: 1. Metaweb, 2. Ontoprise GmbH, 3. OpenLink Software, 4. Ontotext, 5. Powerset (company)

Myosin
• Proteins and cells: 1. Actin, 2. Muscle contraction, 3. Sarcomere, 4. Myofibril, 5. Cytoskeleton
• Biologists: 1. Hugh Huxley, 2. James Spudich, 3. Ronald Vale, 4. Manuel Morales, 5. Brunó Ferenc Straub

Entity Relatedness

Determine the degree of relatedness between two entities, e.g. Brad Pitt and Tom Cruise.

Background: Entity

Entity types:
• Person, location, organization
• Time, date, money, percent
• Event, movie, disease, symptom, side effect, law, license, and more

• Many such types are covered in Wikipedia
• More than 2K classes in DBpedia
• More than 350K classes in YAGO
• Every Wikipedia article is considered to be about an entity

Background: Relatedness

[Diagram: "Motor vehicle", "Car", "Motorcycle", "Automobile", "Auto", "Car seat", "Car windows", connected by synonym (s), hyponym (h), and meronym (m) links]

[Diagram: spectrum from synonyms to similar to related terms, ordered by substitutability]

Outline

• Motivation
• Entity Relatedness
  • Distributional Semantics for Entity Relatedness (DiSER)
  • Evaluation
• Entity Recommendation
  • Wikipedia-based Features for Entity Recommendation (WiFER)
  • Evaluation
• Text Relatedness
  • Non-Orthogonal Explicit Semantic Analysis (NESA)
  • Evaluation
• Application and Industry Use Cases
• Conclusion

Thesis Overview

• Distributional Semantics for Entity Relatedness (DiSER): distributional representation (Chapter IV)
• Wikipedia-based Features for Entity Recommendation (WiFER): feature extraction (Chapter V)
• Non-Orthogonal Explicit Semantic Analysis (NESA) (Chapter VI)


Entity Relatedness


Entity Relatedness: State of the Art

• Graph-based methods
  • Path distance in the Wikipedia graph (Strube and Ponzetto, 2006)
  • Normalized Google Distance on the Wikipedia graph (Witten and Milne, 2008)
  • Personalized PageRank on the Wikipedia graph (Agirre et al., 2015)
  • Path-based measures on the DBpedia graph (Hulpus et al., 2015)
• Corpus-based methods
  • Key-phrase Overlap for Related Entities (KORE): partial overlaps between key-phrases in the corresponding Wikipedia articles (Hoffart et al., 2012)
  • Text relatedness measures: use collocation information in text

Entity Relatedness: State of the Art (Distributional Semantics)

Explicit Semantic Analysis (ESA) uses explicit (manually defined) concepts, such as Wikipedia articles, where every article is considered to describe a single concept (Gabrilovich and Markovitch, 2007).

[Matrix: words (word1 … wordm) × documents (doc1 … docn), with weights Wij]
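As a sketch of how ESA derives relatedness, the following toy implementation builds each word's vector of TF-IDF weights over a tiny invented "Wikipedia" (the three article titles and texts are illustrative, not real article content) and compares vectors by cosine:

```python
import math

# Toy "Wikipedia": each concept (article title) maps to its text.
# Titles and texts are invented for illustration.
concepts = {
    "Association football": "football soccer ball goal league",
    "FIFA": "football governing body world cup",
    "Apple Inc.": "apple iphone computer company",
}

def esa_vector(word):
    """ESA vector: TF-IDF-style weight of `word` in each concept article."""
    n = len(concepts)
    df = sum(1 for text in concepts.values() if word in text.split())
    idf = math.log(n / df) if df else 0.0
    vec = {}
    for title, text in concepts.items():
        tokens = text.split()
        tf = tokens.count(word) / len(tokens)
        if tf and idf:
            vec[title] = tf * idf
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With this data, "football" and "soccer" share the highly weighted concept "Association football", so their cosine is positive, while "football" and "apple" share no concepts and score zero.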

Entity Relatedness: State of the Art (Distributional Semantics)

Implicit/latent semantic analysis transforms the sparse document space (n ≈ 1M dimensions) into a dense latent topic space (k < 1000) by dimensionality reduction:
• Latent Semantic Analysis (LSA) (Deerwester et al., 1990)
• Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
• Neural embeddings (Word2Vec) (Mikolov et al., 2013)

[Matrix: words (word1 … wordm) × documents (doc1 … docn) reduced to words × topics (topic1 … topick)]
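A minimal LSA sketch of the reduction step: truncated SVD compresses a word-by-document count matrix into k latent topic dimensions, and relatedness becomes cosine similarity in the reduced space. The counts below are invented for illustration:

```python
import numpy as np

# Toy word-by-document count matrix (rows: words, cols: docs).
words = ["car", "automobile", "fruit", "apple"]
X = np.array([
    [2, 1, 0, 0],   # car
    [1, 2, 0, 0],   # automobile
    [0, 0, 3, 1],   # fruit
    [0, 0, 1, 2],   # apple
], dtype=float)

# LSA: truncated SVD projects the sparse document space onto k latent topics.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_topics = U[:, :k] * s[:k]   # dense k-dimensional word vectors

def rel(i, j):
    """Cosine similarity of two words in the latent topic space."""
    u, v = word_topics[i], word_topics[j]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Here "car" and "automobile" co-occur in the same documents, so they land on the same latent topic and score near 1, while "car" and "fruit" score near 0.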

Limitations of Text Relatedness Measures

• Compositionality
  • Most entities are multiword expressions
  • Vector(Brad Pitt) = Vector(Brad) + Vector(Pitt)?
• Ambiguity
  • Vector of an entity with an ambiguous name, e.g. "Nice" (the French city)

Chapter IV: Distributional Semantics for Entity Relatedness (DiSER)
Wikipedia-based Distributional Semantics for Entity Relatedness. In: AAAI-FSS-2014

[Matrix: entities (entity1 … entityn) × documents (doc1 … docn), with weights Wij]

DiSER builds distributional vectors over Wikipedia annotated with entities (one sense per document):

[Steve Jobs] co-founded Apple in 1976 to sell Wozniak's [Apple I] [Personal Computer]. [Steve Jobs | Jobs] was CEO of [Apple Inc. | Apple] and largest shareholder of [Pixar]. [Steve Jobs | Jobs] is widely recognized as a pioneer of the [Microcomputer Revolution], along with [Steve Wozniak | Wozniak].
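An entity-annotated corpus like the passage above can be turned into DiSER-style vectors roughly as follows. The toy documents are invented, the `[Entity | surface]` markup parsing mirrors the slide's notation, and the TF-IDF weighting (with add-one smoothing in the IDF) is only an assumption about one reasonable weighting scheme:

```python
import math
import re
from collections import defaultdict

# Toy annotated corpus in the slide's [Entity | surface] notation
# (documents are illustrative, not real Wikipedia text).
docs = [
    "[Steve Jobs] co-founded Apple in 1976 with [Steve Wozniak | Wozniak]. "
    "[Steve Jobs | Jobs] was CEO of [Apple Inc. | Apple].",
    "[Apple Inc.] designs the iPhone. [Steve Jobs | Jobs] presented it.",
    "[Pixar] made Toy Story. [Steve Jobs | Jobs] was its largest shareholder.",
]

def entities(text):
    """Extract canonical entity names from [Entity] / [Entity | surface] markup."""
    return [m.split("|")[0].strip() for m in re.findall(r"\[([^\]]+)\]", text)]

# Entity-by-document counts: annotated mentions instead of surface words.
counts = defaultdict(lambda: defaultdict(int))
for d, text in enumerate(docs):
    for e in entities(text):
        counts[e][d] += 1

def diser_vector(entity):
    """TF-IDF weights of the entity over documents (smoothed IDF)."""
    df = len(counts[entity])
    idf = math.log(1 + len(docs) / df) if df else 0.0
    return {d: c * idf for d, c in counts[entity].items()}

def relatedness(e1, e2):
    """Cosine similarity between two entities' DiSER vectors."""
    u, v = diser_vector(e1), diser_vector(e2)
    dot = sum(u[d] * v[d] for d in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Because mentions are disambiguated before counting, "Steve Jobs" and "Apple Inc." co-occur in the same documents and score above zero, with no surface-form noise from the word "apple".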

ESA vs DiSER Vector (Chapter IV)
Wikipedia-based Distributional Semantics for Entity Relatedness. In: AAAI-FSS-2014

Brad Pitt (DiSER), top concepts: The Tree of Life (film), Falmouth, Cornwall, World War Z (film), What Just Happened, A Mighty Heart (film), Plan B Entertainment, Jamaican Patois, Richard: A Novel, Sobriquet, I Want a Famous Face

Brad Pitt (ESA), top concepts: Damiani (jewelry company), University of Pittsburgh Band, Brad Pitt, Make It Right Foundation, Pittsburgh men's basketball, Brangelina, Pittsburgh Panthers baseball, Pitt (Comics), Pitt River, Brad Pitt filmography

Entity Relatedness: Evaluation


Entity Relatedness: Dataset

• Absolute relatedness score
  • Relatedness between "Apple Inc." and "Steve Jobs"
  • Very low inter-annotator agreement
• Relative relatedness score
  • Is "Steve Jobs" more related to "Apple Inc." than "Bill Gates"?
  • High inter-annotator agreement
• KORE (Hoffart et al., 2012)
  • 21 seed entities
  • Every entity has a list of 20 entities with their relatedness scores
  • 420 entity pairs in total

Results: KORE Dataset
Wikipedia-based Distributional Semantics for Entity Relatedness. In: AAAI-FSS-2014

Approaches                                Spearman rank correlation

Graph-based measures:
  Path-DBpedia (Hulpus et al., 2015)      0.610
  WLM (Witten and Milne, 2008)            0.659
  PPR (Agirre et al., 2015)               0.662

Corpus-based measures:
  Word2Vec (Mikolov et al., 2013)         0.181
  GloVe (Pennington et al., 2014)         0.194
  LSA (Landauer et al., 1998)             0.375
  KORE (Hoffart et al., 2012)             0.679
  ESA (Gabrilovich and Markovitch, 2007)  0.691

DiSER                                     0.781
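The Spearman rank correlation used in these evaluations is the Pearson correlation of the (tie-averaged) ranks; a small self-contained sketch:

```python
def ranks(xs):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

A system's relatedness scores for the 420 KORE pairs are compared against the gold ranking with this statistic: identical orderings give 1.0, reversed orderings give -1.0.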

DiSER Vector for non-Wikipedia Entities


Context-DiSER

DiSER vectors can also be built for non-Wikipedia entities from the context of an article, e.g. a BBC article about Savita: http://www.bbc.com/news/world-europe-22204377

• Noun phrase extraction: Stanford NLP
• Entity linking: prior probability
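The prior-probability ("commonness") linking step picks, for each extracted mention, the entity that anchor text most frequently points to in Wikipedia. A sketch with invented anchor counts (real statistics come from Wikipedia's link graph):

```python
# Anchor-text statistics: mention -> {entity: count}. All counts are
# invented for illustration, not real Wikipedia anchor frequencies.
anchor_counts = {
    "apple": {"Apple Inc.": 700, "Apple": 300},       # company vs fruit
    "jobs": {"Steve Jobs": 600, "Employment": 400},
    "nice": {"Nice": 550, "Niceness": 50},            # the French city
}

def prior(mention, entity):
    """P(entity | mention): how often the mention links to this entity."""
    counts = anchor_counts.get(mention.lower(), {})
    total = sum(counts.values())
    return counts.get(entity, 0) / total if total else 0.0

def link(mention):
    """Link the mention to its most probable entity, or None if unknown."""
    counts = anchor_counts.get(mention.lower())
    if not counts:
        return None
    return max(counts, key=counts.get)
```

This baseline ignores context entirely, which is why the slide later distinguishes manual from automatic linking: the prior alone misfires on genuinely ambiguous mentions.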

Context-DiSER

[Diagram: the non-Wikipedia entity "Savita Halappanavar", represented by context terms (Irish abortion law, Death of Savita, Galway University Hospital, Miscarriage, Catholic Country, …), related to the Wikipedia entities Abortion, Abortion-rights movement, The Irish Times, United States pro-life movement, Vincent Browne, and Michael D. Higgins]

Context-DiSER: Results on the KORE Dataset
Wikipedia-based Distributional Semantics for Entity Relatedness. In: AAAI-FSS-2014

Approaches                              Spearman rank correlation
KORE (state of the art)                 0.679
Context-ESA                             0.684
Context-DiSER (manual linking)          0.769
Context-DiSER (automatic linking)       0.719


Entity Recommendation


Entity Recommendation: State of the Art

• Classical recommendation systems
  • Focus on personalized recommendation
  • Require user-item preferences
• Entity Recommendation in Web Search (Blanco et al., 2013)
  • Co-occurrence features: query logs, query sessions, Flickr tags, tweets
  • Graph-based features: shared connections in the Yahoo knowledge graph and other domain-specific knowledge bases
  • Entity and relation types in the knowledge graph
  • More than 100 features
  • Combines features using learning to rank

Wikipedia-based Features for Entity Recommendation (WiFER)
Leveraging Wikipedia Knowledge for Entity Recommendations. In: ISWC 2015

Features, combined with learning to rank:
• Prior probability of entity1
• Prior probability of entity2
• Joint probability
• Conditional probability
• Reverse conditional probability
• Cosine similarity
• Pointwise mutual information
• Distributional semantic model
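Most of these features can be derived from entity-document incidence counts. A sketch with toy counts (real WiFER computes them over the whole Wikipedia corpus, so the sets and numbers here are purely illustrative):

```python
import math

# Each entity maps to the set of documents mentioning it (toy data).
docs_of = {
    "Steve Jobs": {0, 1, 2},
    "Apple Inc.": {0, 1},
    "Pixar": {2, 3},
}
N = 4  # total documents in the toy corpus

def features(e1, e2):
    """Co-occurrence features for an entity pair, as named on the slide."""
    a, b = docs_of[e1], docs_of[e2]
    p1, p2 = len(a) / N, len(b) / N        # prior probabilities
    joint = len(a & b) / N                 # joint probability
    cond = joint / p2 if p2 else 0.0       # conditional probability P(e1 | e2)
    rcond = joint / p1 if p1 else 0.0      # reverse conditional P(e2 | e1)
    pmi = math.log(joint / (p1 * p2)) if joint else 0.0
    cosine = len(a & b) / math.sqrt(len(a) * len(b))
    return {"p1": p1, "p2": p2, "joint": joint, "cond": cond,
            "rcond": rcond, "pmi": pmi, "cosine": cosine}
```

Computed once over Wikipedia text and once over the entity-annotated corpus, these values form the feature vector fed to the ranker.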

Wikipedia-based Features for Entity Recommendation (WiFER)

The same features are computed over two corpora:
• Wikipedia text: prior probability of entity1, prior probability of entity2, joint probability, conditional probability, reverse conditional probability, cosine similarity, pointwise mutual information, distributional semantic model (ESA)
• Wikipedia entities: prior probability of entity1, prior probability of entity2, joint probability, conditional probability, reverse conditional probability, cosine similarity, pointwise mutual information, distributional semantic model (DiSER)

Combining Features

• Learning to rank: Gradient Boosted Decision Trees (GBDT) (Li Hang, 2011), which builds the model in a stage-wise fashion
• Dataset: entity recommendation in web search
  • 4,797 web search queries (entities)
  • Every entity query has a list of entity candidates (47,623 entity pairs)
  • All candidates are labeled on a 5-point scale: Excellent, Prefer, Good, Fair, and Bad

Type       Total instances   Percentage
Location   22,062            46.32
People     21,626            45.41
Movies     3,031             6.36
TV Shows   280               0.58
Album      563               1.18
Total      47,623            100

Entity Recommendation: Results
Insights into Entity Recommendation in Web Search. In: IESD at ISWC, 2015

• Evaluation
  • Normalized discounted cumulative gain (NDCG@10)
  • 10-fold cross validation

Features                      All      Person   Location
Spark (Blanco et al., 2013)   0.9276   0.9479   0.8882
WiFER                         0.9173   0.9431   0.8795
Spark+WiFER                   0.9325   0.9505   0.8987
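NDCG@10 can be sketched as follows, using the common graded-relevance formulation with gain 2^rel - 1 and a log2 position discount (the exact gain function used in the thesis isn't specified on the slide, so this is an assumption):

```python
import math

def dcg(labels, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(labels[:k]))

def ndcg(labels, k=10):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal else 0.0
```

Here `labels` are the graded judgments (e.g. Excellent=4 … Bad=0) in the order the ranker returned the candidates; a perfect ranking scores 1.0, and burying highly rated candidates lowers the score.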

Entity Recommendation: Feature Analysis in Spark+WiFER
Insights into Entity Recommendation in Web Search. In: IESD at ISWC, 2015

Top features:
• Relation type
• Cosine similarity over Flickr tags
• Probability of target entity over the Wikipedia text corpus
• CF7 over Flickr tags
• DSM over the Wikipedia entities corpus (DiSER)
• Conditional user probability over query terms
• DSM over the Wikipedia text corpus (ESA)
• Probability of source entity over the Wikipedia entities corpus
• Probability of target entity over Flickr tags
• Probability of target entity over the Wikipedia entities corpus


Text Relatedness: Non-Orthogonal Explicit Semantic Analysis (NESA)


Orthogonality in ESA (Chapter VI)
Improving ESA with Document Similarity. In: ECIR-2013

ESA assumes that related words share highly weighted concepts in their distributional vectors.

"soccer": History of Soccer in the United States, Soccer in the United States, United States Soccer Federation, North American Soccer League, United Soccer Leagues

"football": FIFA, Football, History of association football, Football in England, Association football

The two concept vectors do not overlap, so ESA(football, soccer) = 0.0

Non-Orthogonal Explicit Semantic Analysis (NESA) (Chapter VI)
Improving ESA with Document Similarity. In: ECIR-2013

NESA also scores pairs of distinct but similar concepts across the two vectors:

NESA(football, soccer) = (FIFA × Soccer in the United States + FIFA × United Soccer Leagues + …) = 0.38

Non-Orthogonal Explicit Semantic Analysis (NESA)

• ESA: v1 and v2 are the n-dimensional vectors for words w1 and w2
  • relESA(w1, w2) = v1ᵀ · v2
• NESA: correlation between vector dimensions
  • relNESA(w1, w2) = v1ᵀ · C · v2
  • C(n×n) = Eᵀ · E
• Dimension correlation methods
  • DiSER scores between the corresponding Wikipedia articles
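The relNESA formula in code: relatedness is v1ᵀ C v2 instead of the plain dot product, so correlated concept dimensions contribute even when the two ESA vectors share no concepts. The concept embeddings E below are invented purely to produce a plausible correlation matrix C:

```python
import numpy as np

# Columns of E are (invented) representations of three concepts:
# 0 = "Association football", 1 = "FIFA", 2 = "Apple Inc."
E = np.array([
    [1.0, 0.9, 0.0],
    [0.2, 0.4, 1.0],
])
E = E / np.linalg.norm(E, axis=0)   # unit-normalize each concept column
C = E.T @ E                          # concept-concept correlation matrix

# Toy ESA vectors: "football" weights only FIFA, "soccer" only
# Association football, so the vectors are orthogonal.
v_football = np.array([0.0, 0.8, 0.0])
v_soccer   = np.array([0.7, 0.0, 0.0])

esa = float(v_football @ v_soccer)        # 0.0: no shared concepts
nesa = float(v_football @ C @ v_soccer)   # > 0: correlated concepts count
```

Because FIFA and Association football are highly correlated in C, NESA yields a positive score where ESA gives exactly zero, mirroring the football/soccer example on the slides.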

Word Relatedness Datasets

• WN353: 353 word pairs annotated by 13-15 experts on a scale of 1-10
• RG65: 65 word pairs annotated by 51 experts on a scale of 0-4
• MC30: 30 word pairs annotated by 38 experts on a scale of 0-1
• MT287: 287 word pairs annotated by 10-12 experts on a scale of 0-1

NESA: Results
Non-Orthogonal Explicit Semantic Analysis. In: *SEM-2015 (Chapter VI)

Spearman rank correlation with word similarity gold standard datasets:

            WN353   MC30    RG65    MT287
LSA         0.579   0.667   0.616   0.555
LSA (Wiki)  0.538   0.744   0.697   0.353
Word2Vec    0.663   0.824   0.751   0.560
ESA         0.660   0.765   0.826   0.507
NESA        0.696   0.784   0.839   0.572

NESA: Results
Non-Orthogonal Explicit Semantic Analysis. In: *SEM-2015 (Chapter VI)

• Word similarity vs relatedness (Agirre et al., 2009)
  • WN353Rel: 202 word pairs from WN353
  • WN353Sim: 252 word pairs from WN353

Spearman rank correlation with word similarity vs relatedness datasets:

            WN353Rel   WN353Sim
LSA         0.521      0.662
LSA (Wiki)  0.506      0.559
Word2Vec    0.601      0.741
ESA         0.643      0.663
NESA        0.663      0.719


Chapter VII
http://enrg.insight-centre.org/


EnRG SPARQL Endpoint

National University of Ireland, Galway

Industrial Use Cases

• Medical entity linking for question answering and relationship explanation in a knowledge graph
• Entity recommendation in web search
• Company name disambiguation for social profiling

Conclusion

• Entity Relatedness
  • Distributional Semantics for Entity Relatedness (DiSER)
  • Outperformed state-of-the-art entity relatedness measures
• Entity Recommendation
  • Wikipedia-based Features for Entity Recommendation (WiFER)
  • Effective features for entity recommendation in web search
• Text Relatedness
  • Non-Orthogonal Explicit Semantic Analysis (NESA)
  • Outperformed other existing word relatedness measures
• Entity Relatedness Graph (EnRG)
  • Contains all Wikipedia entities and their pre-computed relatedness scores
  • Contains distributional vectors for all Wikipedia entities

Future Research Directions

• Relationship explanation for recommended entities
  • Best path in the knowledge graph
  • Best natural language description
• Knowledge discovery
  • Analogy querying over the knowledge graph, e.g. Google is to Motorola as Microsoft is to ?
  • Example-based querying, e.g. Google is to Motorola as ? is to ?
