
Paper the plista dataset


Page 1

The plista Dataset

ACM RecSys 2013, Hong Kong

Authors: Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias

Speaker: Brodt, Torben

International News Recommender Systems Workshop and Challenge

October 13th, 2013

Page 2

Introduction and Motivation

● Context: News Article Recommendation

Page 3

Introduction and Motivation

● Do we need another recommendation data set? We already have several.

● What features are those data sets missing?

● What requirements does recommending news articles entail?

...

Page 4

Introduction and Motivation

● Features that had not been available in existing data sets:
○ contextual features: device, operating system, browser, etc.
○ cross-domain features: 13 different news providers included
○ different interaction types: interactions with recommendations (clicks) as well as with news items (impressions)
○ content features: headline, URL, images, text snippets, etc.

Page 5

Introduction and Motivation

● Additional requirements for recommending news articles:
○ real-time → recommendations must be provided within a short time interval (< 200 ms)
○ changing relevancy → items' relevancy decreases over time
○ dynamics → new news items are continuously added

● Requirements inherent to existing recommender systems:
○ sparsity → users typically read only a few news articles
○ cold start → systems refrain from asking users to create profiles; as a result, most user profiles are small
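The "changing relevancy" requirement can be made concrete with a recency-weighted popularity score. The Python sketch below is illustrative only: it assumes an exponential decay with a hypothetical `half_life_hours` parameter and made-up click counts; the slides do not say how relevancy decays or how it should be modelled.

```python
import math

def decayed_score(clicks: int, age_hours: float, half_life_hours: float = 6.0) -> float:
    """Weight an item's popularity by its age (hypothetical exponential decay)."""
    # The weight halves after every half-life: 0.5 ** (age / half_life).
    return clicks * math.pow(0.5, age_hours / half_life_hours)

def rank_items(items, now_hours: float, half_life_hours: float = 6.0):
    """Return item IDs ordered by decayed score, highest first.

    items: iterable of (item_id, clicks, published_at_hours) tuples.
    """
    scored = [
        (decayed_score(clicks, now_hours - published, half_life_hours), item_id)
        for item_id, clicks, published in items
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored]

# Hypothetical items: "a" is older with more clicks, "b" is fresher.
items = [("a", 100, 0.0), ("b", 40, 10.0)]
print(rank_items(items, now_hours=12.0))  # ['b', 'a']
```

Here item "a" (12 hours old) scores 25 while item "b" (2 hours old) scores about 31.7, so the fresher item ranks first despite fewer clicks. A cheap scoring function like this also fits the < 200 ms response budget better than retraining a model per request.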

Page 6

Dataset characteristics

{ // json
    "type": "impression",
    "context": {
        "simple": {
            "27": 418, // publisher
            "14": 31721, // widget
            ...
        },
        "lists": {
            "10": [100, 101] // channel
        }
        ...
    }
}

api specs hosted at http://orp.plista.com
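Because the example message carries `//` annotations and `...` elisions, it is not valid JSON as shown. The sketch below reads a comment-free version containing only the fields on the slide and maps the numeric context keys to the names given in the annotations ("27" publisher, "14" widget, "10" channel). The names `SIMPLE_KEYS`, `LIST_KEYS` and `parse_message` are hypothetical; the full key list is defined in the API specs.

```python
import json

# Context keys as annotated on the slide; other keys are documented in the API specs.
SIMPLE_KEYS = {"27": "publisher", "14": "widget"}
LIST_KEYS = {"10": "channel"}

def parse_message(raw: str) -> dict:
    """Extract the message type and the annotated context fields from one message."""
    msg = json.loads(raw)
    context = msg.get("context", {})
    simple = context.get("simple", {})
    lists = context.get("lists", {})
    parsed = {"type": msg.get("type")}
    for key, name in SIMPLE_KEYS.items():
        if key in simple:
            parsed[name] = simple[key]
    for key, name in LIST_KEYS.items():
        if key in lists:
            parsed[name] = lists[key]
    return parsed

# Comment-free version of the example message (elided fields omitted).
raw = '{"type": "impression", "context": {"simple": {"27": 418, "14": 31721}, "lists": {"10": [100, 101]}}}'
print(parse_message(raw))
# {'type': 'impression', 'publisher': 418, 'widget': 31721, 'channel': [100, 101]}
```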

Page 7

Dataset characteristics

● object types
○ impressions → users reading news articles
○ clicks → users clicking recommendations
○ creates → news articles being created
○ updates → news articles being updated
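A stream mixing these four object types is typically split or tallied by type before analysis. In the sketch below only the value "impression" is confirmed by the example message; the strings "click", "create" and "update" are assumptions for illustration and should be checked against the API specs. `count_by_type` is a hypothetical helper.

```python
from collections import Counter

# Assumed "type" values; only "impression" appears in the example message.
OBJECT_TYPES = {
    "impression": "user reading a news article",
    "click": "user clicking a recommendation",
    "create": "news article being created",
    "update": "news article being updated",
}

def count_by_type(messages):
    """Count messages per object type; unrecognised types are grouped as "unknown"."""
    counts = Counter()
    for msg in messages:
        kind = msg.get("type")
        counts[kind if kind in OBJECT_TYPES else "unknown"] += 1
    return counts

msgs = [{"type": "impression"}, {"type": "impression"}, {"type": "click"}, {"type": "other"}]
print(count_by_type(msgs))
```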


Page 8

Dataset usage

Page 9

Dataset usage

● Evaluation based on click-through rate (CTR)

● ~84 million impressions

● ~1 million clicks
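CTR is the ratio of clicks to the number of times something was shown. As a rough illustration only, dividing the two approximate totals above gives about 1.2 % clicks per impression; note that impressions here are article reads rather than recommendation displays, so this ratio is a proxy and not necessarily the CTR definition used in the paper. `click_through_rate` is a hypothetical helper.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions; returns 0.0 when nothing was shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions

# Rounded totals from the slide: ~1 million clicks, ~84 million impressions.
ctr = click_through_rate(1_000_000, 84_000_000)
print(f"{ctr:.2%}")  # 1.19%
```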

Page 10

Dataset usage

● evaluating cross-news-portal recommenders

● 10–36 % user overlap between different news portals
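The slides do not define how user overlap is measured. A minimal sketch, assuming a directional definition (the share of portal A's session IDs that also appear on portal B) and hypothetical session IDs, is shown below; a symmetric measure such as Jaccard similarity would be an equally plausible choice. Since users are identified by session IDs, any such figure inherits the session-ID caveats discussed later in the deck.

```python
def user_overlap(users_a: set, users_b: set) -> float:
    """Fraction of portal A's session IDs that also appear on portal B.

    This directional definition is an assumption; the slides do not define overlap.
    """
    if not users_a:
        return 0.0
    return len(users_a & users_b) / len(users_a)

# Hypothetical session IDs for two portals.
portal_a = {"s1", "s2", "s3", "s4"}
portal_b = {"s3", "s4", "s5"}
print(user_overlap(portal_a, portal_b))  # 0.5
print(user_overlap(portal_b, portal_a))  # 0.6666666666666666
```

The asymmetry is deliberate: a small portal can share most of its users with a large one while the reverse share stays low.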

Page 11

Dataset usage

● news portal comparisons

● do we observe similar user behaviour on news portals offering similar content?
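One possible way to operationalise "similar user behaviour" is to compare when users are active on each portal. The sketch below builds a normalised hour-of-day activity profile per portal and compares profiles with cosine similarity; both the feature and the similarity measure are assumptions chosen for illustration, and the interaction hours are hypothetical.

```python
import math

def hourly_profile(hours):
    """Normalised 24-bin histogram of the hours (0-23) at which interactions occur."""
    counts = [0] * 24
    for h in hours:
        counts[h] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Hypothetical interaction hours for two portals.
portal_x = hourly_profile([7, 8, 8, 12, 20])
portal_y = hourly_profile([7, 8, 12, 12, 21])
print(round(cosine_similarity(portal_x, portal_y), 3))  # 0.714
```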

Page 12

Dataset usage

● evaluating contextual recommendation algorithms

● sensitive to
○ weekday
○ hour of day
○ ...
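A first step when evaluating context-sensitive algorithms is to check whether CTR actually varies across contexts such as weekday and hour. The sketch assumes events arrive as hypothetical `(unix_timestamp, type)` pairs with timestamps in seconds, interprets them in UTC (local German time might be more meaningful for these portals), and assumes the type string "click"; `ctr_by_context` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timezone

def ctr_by_context(events):
    """Group events by (weekday, hour) and compute clicks / impressions per bucket.

    events: iterable of (unix_timestamp, type) pairs; weekday() is 0 for Monday.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for ts, kind in events:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        bucket = (t.weekday(), t.hour)
        if kind == "impression":
            impressions[bucket] += 1
        elif kind == "click":  # assumed type string
            clicks[bucket] += 1
    return {b: clicks[b] / n for b, n in impressions.items() if n}

# Timestamp 0 is Thursday 1970-01-01 00:00 UTC, so weekday() == 3.
events = [(0, "impression"), (60, "impression"), (120, "click"), (3600, "impression")]
print(ctr_by_context(events))  # {(3, 0): 0.5, (3, 1): 0.0}
```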

Page 13

Dataset usage

When using the data set, you may want to consider that…

● … we identify users by session IDs
○ individual users may have several IDs
○ users sharing a device might be mapped to one ID

● … interactions (clicks, impressions) and content dynamics (creates, updates) differ between news portals

● … content is restricted to German-language articles

● … preferences are represented on a binary scale (user read article, user clicked recommendation)

● … clicking on recommendations might not reveal the actual relevancy of an item
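The binary preference scale and the session-based user IDs shape how interaction data is typically turned into a user-item structure. A minimal sketch, assuming hypothetical `(session_id, item_id)` pairs: repeated interactions collapse to a single observed preference, and the density of the resulting matrix gives a quick indication of the sparsity mentioned earlier. `binary_preferences` and `density` are hypothetical helpers.

```python
from collections import defaultdict

def binary_preferences(events):
    """Map each session ID to the set of item IDs it interacted with.

    Repeated interactions collapse to one entry, matching the binary scale.
    A session ID need not correspond to exactly one person.
    """
    prefs = defaultdict(set)
    for session_id, item_id in events:
        prefs[session_id].add(item_id)
    return dict(prefs)

def density(prefs, n_items):
    """Share of observed (session, item) pairs: a simple sparsity indicator."""
    n_sessions = len(prefs)
    if n_sessions == 0 or n_items == 0:
        return 0.0
    return sum(len(items) for items in prefs.values()) / (n_sessions * n_items)

events = [("s1", 10), ("s1", 10), ("s1", 11), ("s2", 12)]
prefs = binary_preferences(events)
print(density(prefs, n_items=3))  # 0.5
```

A binary entry only records that an interaction happened; it says nothing about how relevant the item was, which is the last caveat above.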

Page 14

Conclusions

● we introduce a new data set intended to support recommender systems research

● we outlined novel features which existing data sets lacked

● we presented scenarios which can be evaluated using the data set

● we pointed to critical aspects which ought to be considered when working with the data set

Page 15

Summary

● news articles
○ of ~13 publishers

● transactional data
○ impressions
○ clicks

● contextual data
○ ~50 attributes

● cross-domain application

Page 16

The plista Dataset

@inproceedings{Kille:2013,
  title = {The plista Dataset},
  author = {Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias},
  booktitle = {NRS'13: Proceedings of the International Workshop and Challenge on News Recommender Systems},
  year = {2013},
  month = {10},
  location = {Hong Kong, China},
  publisher = {ACM},
  pages = {14--21}
}