32
Entity Recommendations Using Hierarchical Knowledge Bases Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data (KNOW@LOD ) at ESWC2015, May 31, 2015 Siva Kumar Cheekula, Pavan Kapanipathi, Derek Doran, Prateek Jain, Amit Sheth Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton - OH

Entity Recommendations Using Hierarchical Knowledge Bases

Embed Size (px)

Citation preview

Entity Recommendations Using Hierarchical Knowledge Bases

Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data (KNOW@LOD)at ESWC2015, May 31, 2015

Siva Kumar Cheekula, Pavan Kapanipathi, Derek Doran, Prateek Jain, Amit Sheth

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton - OH

Entity Recommendations

Entity Recommendations

Entity Recommendations

• Content Based (Users’ Ratings)– Types• Term based • Entity based • Knowledge based/Graph based

– Provides the most personalized results• Collaborative (Other users’ ratings)• Hybrid (Both content and collaborative)– Works best

Entity Recommendations

6

• Structured representation of relationships among concepts

• Allows better understanding of the user interests

• Cold start problem is addressed well

Knowledge Base Enabled Recommendations

7

Interstellar (film)

Gravity (film)

Relationships

Directe

d by

Starring

Produced by

Distrib

uted by

Knowledge Base Enabled Recommendations

8

Interstellar (film)

Gravity (film)

Relationships

Directe

d by

Starring

Produced by

Distrib

uted by

Knowledge Base Enabled Recommendations

Too many properties to address

9

Ball Games

Football

Games

Is-a

Focus specifically on hierarchical relationships

Hierarchical Relations

10

• Human memory has been argued to be structured as a hierarchy of concepts (Semantic Network)

• Spreading activation theory has been utilized to simulate search on semantic network

• This theory has not been well explored for user interest modeling and entity recommendations

Hierarchical Relations

11

• Domain Coverage• Crowd-sourced knowledge base–Regular updates–Driven by strict guidelines

Wikipedia

12

Wikipedia Category Graph

13Parent Categories

Wikipedia Category Graph

14

Spreading Activation

Hierarchical Interest Graphs

1

2 3

3 4

Entity Un-rated by user

23

Reverse Spreading

Wikipedia Category Hierarchy

Approach

15

Hierarchical Interest Graphs

1

2 3

3 4

Entity Un-rated by user

23

Reverse Spreading

Approach

Spreading Activation

Wikipedia Category Hierarchy

16

Internet

Semantic Search Linked Data Metadata

Technology

World Wide Web

Semantic Web

Structured Information

0.8 0.2 0.6Scores for Interests

User Interests

0.7

0.5

0.4

0.3

Activation Function

17

• Wikipedia Category Structure is transformed to a hierarchy (Wikipedia Hierarchy)– 0.8 Million Categories with approx 2 Million relationships

between them – Category “Main Topic Classification” is root of the hierarchy

as it subsumes 98% of the categories– Each Category is assigned a Hierarchy level based on

shortest path from root node• Spreading Activation Theory on Wikipedia Hierarchy• Experimented with different Activation Functions

– Bell, Bell Log, Priority Intersect

Hierarchical Interest Graphs

18

Romance Films

Titanic Anna_Karenina Forever Young

Films

Romantic Drama Films

1990s romantic drama films

Structured Information

5 4 4Ratings of

Movies

Movies

4

3

3

1

19

Spreading Activation

Hierarchical Interest Graphs

1

2 3

3 4

Entity Un-rated by user

23

Reverse Spreading

Wikipedia Category Hierarchy

Approach

20

1990s Romantic

Drama Films

Romantic Drama Films

4

3

Function to rank the entities based on category scores

The Remains

of the DayJude

5 4

Ranking Unrated Entities

21

The Terminator

1984 Films

1980s Action Films

Films Directed by James Cameron

Action Horror Films

Robot Films

4

3

2

3

4

Reverse Spreading

• Sum the activation values of each parent of an entity 22

The Terminator

1984 Films

1980s Action Films

Films Directed by James Cameron

Action Horror Films

Robot Films

4

3

2

3

4

16

Summing

Sum

23

• Ranking is biased towards categories having larger number of entities

• Example:– Category “English Language Films” has good

interesting score for user – Results are topped by the entities connected to

“English Language Films” for user

Summing – Drawbacks

24

• Parent category activation value is normalized by it’s out degree• Bias towards generic categories is addressed

The Terminator

1984 Films

Footloose

3

1 1 1

Ghostbusters

Degree Based Normalization

Out Degree = 3Spread Weight =

3/3 = 1

25

1 2 3 4

Category Priority

26

• Priority of a category to an entity as edge weight

The Terminator

1984 Films

Footloose

3

1 2 1

3 1.5 3

Ghostbusters

Category Priority

27

Ai = Σ(Aj / (Oj * Pij))

Reverse Spreading Activation Function

Activation Value of the Entity i (Movie)

Activation Value of the Categories of “i”

Outdegree of Category j

Priority of Category j to Entity “i”

28

• Focus on “5” rated movies for test as per Cremonesi et al. – New approach for evaluation of recommendation systems

• Dataset - Movielens– Movies -- 3158– Users -- 1476– 0.9 m ratings

• Baseline – Ziegler et al.– Hybrid(Content & Collaborative) technique to recommend books

using Amazon book taxonomy– We compared with the content based part of Ziegler et al

Evaluation

29

Summing Summing with Degree Normalization

Summing with Degree Normalization and Priority

Decay BellLog Top 1 -- 0.01 Top 20 -- 0.15

Top 1 -- 0.01 Top 20 -- 0.18

Top 1 -- 0.07 Top 20 -- 0.21

Bell Log Intersect Top 1 -- 0.0 Top 20 -- 0.09

Top 1 -- 0.01Top 20 -- 0.20

Top 1 -- 0.07 Top 20 -- 0.24

Ziegler et al. Top 1 -- 0.0Top 20 -- 0.002

Recall @Top 1 – Top 20

30

• While normalization is necessary, we are looking for better features for normalization

• Priority works, however we find the category-entity priority added by users on Wikipedia needs a quality check

• We are looking into collaborative Hierarchical Interest Graphs

Future Work

Paper at http://knoesis.org/library/resource.php?id=2150

Thank you

ContactSiva Kumar ([email protected])Pavan Kapanipathi ([email protected]) @pavankapsAmit Sheth ([email protected]) @amit_p

32

Top N Evaluation

If: Rank(T) <= NHits++

Else Miss++

Recall = Hits/Total

User Ratings – Test Set

Pick top rated entity ‘T’

Pick 1000 random entities

Generate Rank for 1001 entities

Cremonesi et al. Top N Evaluation