Mendeley: Recommendation Systems for Academic Literature
Kris Jack, PhD, Data Mining Team Lead


DESCRIPTION

I gave this talk to an MSc class about Semantic Technologies at the Technical University of Graz (TUG) on 2012/01/12. It presents what recommendation systems are and how they are often used before delving into how they are used at Mendeley. Real-world results from Mendeley’s article recommendation system are also presented. The work presented here has been partially funded by the European Commission as part of the TEAM IAPP project (grant no. 251514) within the FP7 People Programme (Marie Curie).


Page 1: Mendeley: Recommendation Systems for Academic Literature

Mendeley: Recommendation Systems for Academic Literature

Kris Jack, PhD
Data Mining Team Lead

Page 2: Mendeley: Recommendation Systems for Academic Literature

“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer's [...].

But a lot of the state of knowledge of the human race is sitting in the scientists' computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”

Page 3: Mendeley: Recommendation Systems for Academic Literature

➔ what's a recommender and what does it look like?

➔ what's Mendeley?

➔ the secrets behind recommenders

➔ recommenders @ Mendeley

Overview

Page 4: Mendeley: Recommendation Systems for Academic Literature

What's a recommender and

what does it look like?

Page 5: Mendeley: Recommendation Systems for Academic Literature

Definition:

A recommendation system (recommender) is a subclass of information filtering system that aims to predict a user's interest in items.

What's a recommender?

Page 6: Mendeley: Recommendation Systems for Academic Literature

Recommendation Systems in the Wild

Page 7: Mendeley: Recommendation Systems for Academic Literature

Recommendation Vs. Search

➔ search is a pull strategy

vs.

➔ recommendation is a push strategy

Page 8: Mendeley: Recommendation Systems for Academic Literature

Recommendation Vs. Search

search is like following a path...

Page 9: Mendeley: Recommendation Systems for Academic Literature

Recommendation Vs. Search

recommendation is like being on a roller coaster...

A different sense of control

Page 10: Mendeley: Recommendation Systems for Academic Literature

What's Mendeley?

Page 11: Mendeley: Recommendation Systems for Academic Literature

...a large data technology startup company

...and it's on a mission to change the way that

research is done!

What is Mendeley?

Page 12: Mendeley: Recommendation Systems for Academic Literature

Last.fm works like this:

1) Install “Audioscrobbler”

2) Listen to music

3) Last.fm builds your music profile and recommends music you might also like... and it's the world's biggest open music database

Last.fm / Mendeley

Page 13: Mendeley: Recommendation Systems for Academic Literature

Last.fm            Mendeley
music libraries    research libraries
artists            researchers
songs              papers
genres             disciplines

Page 14: Mendeley: Recommendation Systems for Academic Literature

Mendeley provides tools to help users...

...organise their research

Page 15: Mendeley: Recommendation Systems for Academic Literature

Mendeley provides tools to help users...

...organise their research

...collaborate with one another

Page 16: Mendeley: Recommendation Systems for Academic Literature

US National Academy of Engineering “Grand Challenges”:

Clean energy

Clean water

Sustainable food supplies

Pandemic diseases

Terrorist violence

Climate change

Artificial Intelligence

Tools of scientific discovery

Page 17: Mendeley: Recommendation Systems for Academic Literature

Mendeley provides tools to help users...

...organise their research

...collaborate with one another

...discover new research

Page 18: Mendeley: Recommendation Systems for Academic Literature
Page 19: Mendeley: Recommendation Systems for Academic Literature

Mendeley provides tools to help users...

...organise their research

...collaborate with one another

...discover new research

Page 20: Mendeley: Recommendation Systems for Academic Literature

1.4 million+ users; the 20 largest userbases:

University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina

Page 21: Mendeley: Recommendation Systems for Academic Literature

Real-time data on 28m unique papers:

Thomson Reuters' Web of Knowledge (dating from 1934)

Mendeley after 16 months:

50m

Page 22: Mendeley: Recommendation Systems for Academic Literature

Q1/2: How can a tool generate recommendations?

Q2/2: How can you measure the tool's performance?

The secrets behind recommenders

Page 23: Mendeley: Recommendation Systems for Academic Literature

Q1/2: How can a tool generate recommendations?

Content-based Filtering

➔ Find items with similar characteristics (e.g. title, discipline) to what the user previously liked
➔ TF-IDF, BM25, Bayesian classifiers, decision trees, artificial neural networks
➔ Quickly absorbs new items (overcomes cold start problem)
➔ Can make good recommendations from very few examples

Collaborative Filtering

➔ Find items that users who are similar to you also liked (wisdom of the crowds)
➔ User-based and item-based variations, matrix factorisation
➔ No need to understand item characteristics
➔ Tends to give more novel recommendations

Hybrid tools too...
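
To make the collaborative filtering idea concrete, here is a minimal sketch of the user-based variation. It is illustrative only: the Jaccard similarity measure, the function names and the toy libraries are assumptions chosen for the example, not something taken from the slides.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two sets of liked items (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_user_based(target, libraries, top_n=3):
    """Score unseen items by the similarity of the users who liked them."""
    seen = libraries[target]
    scores = defaultdict(float)
    for user, items in libraries.items():
        if user == target:
            continue
        sim = jaccard(seen, items)
        # Every item the other user has, and the target lacks, gets their similarity.
        for item in items - seen:
            scores[item] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical toy data: user name -> set of article ids in their library.
libraries = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p4"},
    "carol": {"p5", "p6"},
}
recommendations = recommend_user_based("alice", libraries)
```

Here "bob" shares two articles with "alice", so his remaining article is recommended, while "carol" shares nothing and contributes no recommendations. Note that no article metadata is consulted at any point.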

Page 24: Mendeley: Recommendation Systems for Academic Literature

Q2/2: How can you measure the tool's performance?

➔ Cross validation with hold outs
    ➔ get yourself a good ground truth
    ➔ hide a fraction of your data from the system
    ➔ try to predict the hidden fraction from the remaining data
    ➔ calculate precision and recall

➔ Let users decide
    ➔ set up evaluations with real users (experimental)
    ➔ track tool usage by users
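
The hold-out steps above can be sketched in a few lines. This is a minimal illustration, assuming each user's data is a set of item ids; the function names, the 20% default and the fixed random seed are assumptions made for the example.

```python
import random


def split_holdout(items, fraction=0.2, seed=0):
    """Hide a fraction of the items; return (visible, hidden) sets."""
    ordered = sorted(items)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ordered)
    n_hidden = max(1, int(len(ordered) * fraction))
    return set(ordered[n_hidden:]), set(ordered[:n_hidden])


def precision_recall(recommended, hidden):
    """Precision and recall of a recommendation list against the hidden items."""
    hits = len(set(recommended) & hidden)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(hidden) if hidden else 0.0
    return precision, recall


# Hypothetical example: 2 of 4 recommendations are among the 3 hidden items.
precision, recall = precision_recall(["a", "b", "x", "y"], {"a", "b", "c"})
```

Precision asks how many of the recommendations were right; recall asks how much of the hidden data was recovered. In the example, precision is 2/4 and recall is 2/3.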

Page 25: Mendeley: Recommendation Systems for Academic Literature

Recommenders @ Mendeley

1) Related Research
● given 1 research article
● find other related articles

2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find new articles of interest to them

Page 26: Mendeley: Recommendation Systems for Academic Literature
Page 27: Mendeley: Recommendation Systems for Academic Literature

Use Case 1: Related Research

Strategy

content-based approach (tf-idf with Lucene implementation)

search for articles with same metadata (e.g. title, tags)

Evaluation

cross-validation with hold outs on a ground truth data set
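
As a rough illustration of the content-based strategy, the sketch below ranks articles by tf-idf cosine similarity over a single metadata string. It is plain Python rather than Lucene, so the weighting differs from Lucene's scoring; the function names and sample titles are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each doc id to a {term: tf-idf weight} dictionary."""
    tokenised = {d: text.lower().split() for d, text in docs.items()}
    # Document frequency: in how many documents does each term occur?
    df = Counter(t for tokens in tokenised.values() for t in set(tokens))
    n = len(docs)
    return {
        d: {t: count * math.log(n / df[t]) for t, count in Counter(tokens).items()}
        for d, tokens in tokenised.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related(doc_id, docs, top_n=5):
    """Return up to top_n other documents, most similar first."""
    vectors = tfidf_vectors(docs)
    query = vectors[doc_id]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != doc_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:top_n] if score > 0]


# Hypothetical metadata (e.g. titles or tags) keyed by article id.
docs = {
    "a": "recommender systems collaborative filtering",
    "b": "collaborative filtering matrix factorisation",
    "c": "protein folding simulation",
}
related_to_a = related("a", docs)
```

Articles "a" and "b" share two terms, so "b" is returned as related to "a"; "c" shares no terms with either and is never returned.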

Page 28: Mendeley: Recommendation Systems for Academic Literature
Page 29: Mendeley: Recommendation Systems for Academic Literature

Use Case 1: Related Research

Q2/2 What are our results?

[Bar chart: tf-idf Precision per Field when Field is Available. x-axis: metadata field (tag, abstract, mesh-term, title, general-keyword, author, keyword); y-axis: precision @ 5, scale 0 to 0.5]

Results 1) tags are the most informative field for finding related research

Page 30: Mendeley: Recommendation Systems for Academic Literature

Use Case 1: Related Research

[Bar chart: tf-idf Precision for Field Combos when Field is Available. x-axis: metadata field(s) (tag, bestCombo, abstract, mesh-term, title, general-keyword, author, keyword); y-axis: precision @ 5, scale 0 to 0.5. Chart label: abstract+author+general-keyword+tag+title]

Results 2) tags outperform combinations of fields

Page 31: Mendeley: Recommendation Systems for Academic Literature

How does Mendeley use recommendation technologies?

Personalised Recommendations

2/2

2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find new articles of interest to them

Page 32: Mendeley: Recommendation Systems for Academic Literature
Page 33: Mendeley: Recommendation Systems for Academic Literature

Use Case 2: Personalised Recommendations

Strategy

collaborative filtering (item-based with Apache Mahout)

recommend articles to researchers that would interest them

Evaluation

cross-validation with hold outs on a ground truth data set
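
A minimal sketch of item-based collaborative filtering follows, using simple co-occurrence counts between articles. It does not use Apache Mahout and is not Mendeley's implementation; the scoring scheme, function names and toy libraries are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def item_cooccurrence(libraries):
    """Count how often each pair of articles appears in the same library."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in libraries.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend_item_based(library, counts, top_n=10):
    """Rank articles that co-occur most often with those already in the library."""
    scores = defaultdict(int)
    for owned in library:
        for other, count in counts.get(owned, {}).items():
            if other not in library:
                scores[other] += count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical user libraries: user id -> set of article ids.
libraries = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p3"},
    "u3": {"p2", "p4"},
}
counts = item_cooccurrence(libraries)
recommendations = recommend_item_based({"p1"}, counts)
```

The input is user libraries only, as in the talk. For a library containing just "p1", article "p3" ranks first because it sits alongside "p1" in two libraries, and "p2" ranks second with one.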

Page 34: Mendeley: Recommendation Systems for Academic Literature
Page 35: Mendeley: Recommendation Systems for Academic Literature

Use Case 2: Personalised Recommendations

Strategy

collaborative filtering (item-based with Apache Mahout)

recommend articles to researchers that would interest them

Evaluation

cross-validation with hold outs on a ground truth data set

Page 36: Mendeley: Recommendation Systems for Academic Literature

Input: User libraries

Output: Recommend 10 articles to each user

Page 37: Mendeley: Recommendation Systems for Academic Literature

16 months ago

Test:
10-fold cross validation
50,000 user libraries

Results:
<0.025 precision at 10
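
For readers unfamiliar with the metric, precision at 10 is the fraction of the 10 recommended articles that turn out to be relevant, so 0.025 corresponds to roughly one hit in every 40 recommendations. The sketch below shows one way a k-fold test could be wired up for a single library. How the folds were actually drawn in the talk's experiment is not stated, so the per-library split, the function names and the `recommend` callback are all assumptions.

```python
def k_folds(items, k=10):
    """Split items into k folds whose sizes differ by at most one."""
    items = list(items)
    return [items[i::k] for i in range(k)]


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the first k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def cross_validate(library, recommend, n_folds=10, k=10):
    """Hide each fold in turn, recommend from the rest, average precision at k."""
    folds = k_folds(sorted(library), n_folds)
    scores = []
    for i, hidden in enumerate(folds):
        if not hidden:
            continue  # libraries smaller than n_folds leave some folds empty
        visible = {x for j, fold in enumerate(folds) if j != i for x in fold}
        scores.append(precision_at_k(recommend(visible), set(hidden), k))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical check with an "oracle" recommender that returns exactly the hidden items.
library = {f"p{i:02d}" for i in range(20)}
oracle = lambda visible: sorted(library - visible)
score = cross_validate(library, oracle)
```

With 20 articles and 10 folds, each fold hides 2 articles, so even the oracle can only reach 2/10 = 0.2. Precision at 10 is therefore capped by how many items are hidden, which is worth bearing in mind when reading offline scores.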

Page 38: Mendeley: Recommendation Systems for Academic Literature

10 months ago (i.e. + 6 months)

Test:
10-fold cross validation
50,000 user libraries

Results:
~0.1 precision at 10

Page 39: Mendeley: Recommendation Systems for Academic Literature

10 months ago (i.e. + 6 months)

Test:
Release to a subset of users

Results:
~0.4 precision at 10

Page 40: Mendeley: Recommendation Systems for Academic Literature

[Chart: Article Recommendation Acceptance Rates. x-axis: number of months live; y-axis: acceptance rate (i.e. accept/reject clicks)]

Page 41: Mendeley: Recommendation Systems for Academic Literature

[Chart: Precision by Library Size. x-axis: number of articles in user library; y-axis: precision at 10 articles]

Page 42: Mendeley: Recommendation Systems for Academic Literature

Test:
10-fold cross validation
50,000 user libraries

So, results comparable to non-distributed recommender

Completely distributed, so can easily run on EC2 within 24 hours...

Page 43: Mendeley: Recommendation Systems for Academic Literature
Page 44: Mendeley: Recommendation Systems for Academic Literature

➔ Recommendations can be complementary to search

➔ They can help users to discover interesting items

➔ They can exploit item metadata (content-based)

➔ They can exploit the 'wisdom of the crowds' (CF)

Summary / Conclusions

Page 45: Mendeley: Recommendation Systems for Academic Literature

➔ Crowd-sourced metadata can be highly informative (e.g. article tags)

➔ Sometimes you need to let data grow

➔ Evaluations under lab conditions don't always predict real world results well

➔ Recommenders don't just have to be about making money … remember where we started...?

Summary / Conclusions

Page 46: Mendeley: Recommendation Systems for Academic Literature

“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer's [...].

But a lot of the state of knowledge of the human race is sitting in the scientists' computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”

Page 47: Mendeley: Recommendation Systems for Academic Literature

www.mendeley.com