WP 3 User profiling & Recommendation (Part 3). BBC, Pro-netics, VUA. Wednesday, March 28, 12

NoTube: Recommendations (Collaborative)

Page 1: NoTube: Recommendations (Collaborative)

WP 3: User profiling & Recommendation (Part 3)

BBC, Pro-netics, VUA

Wednesday, March 28, 12

Page 2: NoTube: Recommendations (Collaborative)

Contents

NoTube 3rd Review, 26-27 March 2012

Overview

User profiling
– General goal & approach
– From activity streams to profile
– Issues
– Analytics
– Beancounter

Recommendations
– General goal & approach
– Semantic recommendation
– Statistical recommendation
– Hybrid recommendation

Exploitation

Conclusions

Page 3: NoTube: Recommendations (Collaborative)

Overview

[Figure: architecture diagram. Labelled components: EPG Metadata (BBC); TV Program Enrichment; RDF Graph TV Programs; Semantic Content Patterns for TV Programs; Semantic Pattern-based Recommendation Strategy; User Ratings & Demographics (BBC EPG Data); User Data Analysis; Similarity Clusters of Programs; Statistical Similarity-based Recommendation Strategy; Hybrid Recommendation Strategy; Recommendation Service; End Users.]

Page 4: NoTube: Recommendations (Collaborative)

Overview

[Figure: the architecture diagram from the previous slide, with the BEANCOUNTER component added.]

Page 5: NoTube: Recommendations (Collaborative)

Statistical recommendations

• We had privileged access to two bulk user ratings datasets from the BBC

• From these, we used the Apache Mahout toolkit to derive "item to item" similarity measures between each pair of items

• With the larger dataset (20k users) this worked well; with the smaller dataset (1k users), less well

• With the BBC, we are investigating publication of these behaviour-derived similarity measures
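
As a rough illustration of what an "item to item" similarity measure over user ratings looks like, here is a minimal Python sketch. It is not the Mahout code used in the project: the Jaccard (Tanimoto) overlap, the function name and the toy data are all assumptions chosen to keep the example small.

```python
from collections import defaultdict
from itertools import combinations


def item_item_similarity(ratings):
    """Map each co-rated item pair to the Jaccard overlap of its user sets.

    ratings: iterable of (user, item) pairs (hypothetical toy format).
    """
    users_by_item = defaultdict(set)
    for user, item in ratings:
        users_by_item[item].add(user)

    sims = {}
    # Visit every unordered pair of items once, in a stable order.
    for a, b in combinations(sorted(users_by_item), 2):
        users_a, users_b = users_by_item[a], users_by_item[b]
        overlap = len(users_a & users_b)
        if overlap:  # pairs with no shared users get no entry
            sims[(a, b)] = overlap / len(users_a | users_b)
    return sims


# Toy data, invented for illustration only.
ratings = [
    ("u1", "news"), ("u1", "drama"),
    ("u2", "news"), ("u2", "drama"),
    ("u3", "news"), ("u3", "comedy"),
]
sims = item_item_similarity(ratings)
```

With only a few users per item, a single shared viewer can dominate a score, which is one plausible reason a small dataset behaves less well than a large one.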

Page 6: NoTube: Recommendations (Collaborative)

Hybrid models: factual paths and statistical similarity

(not to mention that ‘@wossy’ is on Twitter with 1 million followers...)

Page 7: NoTube: Recommendations (Collaborative)

Statistical recommendation

[Figure: matrix of example rating values, with the labels "20k" and "12k".]

Page 8: NoTube: Recommendations (Collaborative)

Statistical recommendation

[Figure: the matrix from the previous slide with many entries replaced by zeros.]

Page 13: NoTube: Recommendations (Collaborative)

TV Preference Data is very sparse

• Even for a single service (e.g. Netflix), data is ‘overwhelmingly sparse’

• For NoTube’s open systems, challenges multiply:

– often no global view, only per-user data

– many ways of identifying the same content item

– many ways of identifying the same user

– never mind other entities (actors, directors, ...)

• Q: Can we tell a story about how organizations with such privileged overviews can contribute, in a privacy-respecting way, to the public commons of linked data? (A: yes! see WP4)
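
To make the sparsity and identifier points above concrete, the following Python sketch measures how much of a user-by-item matrix is filled, before and after merging two identifiers for the same programme. The site prefixes, identifiers and alias table are hypothetical, invented for this example.

```python
def density(observations, alias=None):
    """Fraction of (user, item) cells observed, after mapping item ids via alias.

    observations: iterable of (user, item_id) pairs.
    alias: optional dict mapping an item id to a canonical id (hypothetical).
    """
    alias = alias or {}
    cells = {(user, alias.get(item, item)) for user, item in observations}
    users = {user for user, _ in cells}
    items = {item for _, item in cells}
    return len(cells) / (len(users) * len(items))


# Two sites use different identifiers for the same programme (invented ids).
obs = [
    ("u1", "siteA:show-1"),
    ("u2", "siteB:s1"),
    ("u3", "siteA:show-1"),
]
same_show = {"siteB:s1": "siteA:show-1"}

unmerged = density(obs)            # identifiers treated as distinct items
merged = density(obs, same_show)   # identifiers reconciled first
```

Here the unreconciled data fills half of the matrix, while the reconciled data fills all of it, so part of the apparent sparsity comes from fragmentation of identifiers rather than from missing viewing behaviour.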

Page 14: NoTube: Recommendations (Collaborative)

Fragmentation by site

Page 17: NoTube: Recommendations (Collaborative)

Statistical recommendation: Process

• Build on best-in-class open-source code, rather than re-invent

• Big-data ready (Hadoop-based)

• Of various options, LogLikelihoodSimilarity generally gave best results (standard 'withhold some ratings' evaluation strategy)

• Other explorations: including large-scale (half a billion tweets) Twitter analysis, Spectral Clustering, using demographics, ...
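
The log-likelihood similarity and the 'withhold some ratings' evaluation mentioned above can be sketched in a few lines of Python. This is an illustration only, not the project's Mahout pipeline: the score is Dunning's log-likelihood ratio over a 2x2 co-occurrence table, while the mapping into [0, 1), the zero-overlap rule, the hit-rate metric and the toy data are assumptions made for this sketch.

```python
import math
from collections import defaultdict


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalised entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of co-occurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(users_a, users_b, n_users):
    k11 = len(users_a & users_b)   # users who rated both items
    if k11 == 0:
        return 0.0                 # assumption: no overlap, no similarity
    k12 = len(users_a) - k11       # rated only the first item
    k21 = len(users_b) - k11       # rated only the second item
    k22 = n_users - k11 - k12 - k21
    return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22))  # map into [0, 1)


def hit_rate(prefs, held_out, top_n=1):
    """Share of users whose withheld item appears in their top-N list.

    prefs: {user: set of items kept for training}
    held_out: {user: the single item withheld from that user}
    """
    users_by_item = defaultdict(set)
    for user, items in prefs.items():
        for item in items:
            users_by_item[item].add(user)
    n_users = len(prefs)
    hits = 0
    for user, hidden in held_out.items():
        seen = prefs[user]
        # Score each unseen item by its summed similarity to the seen items.
        scores = {
            cand: sum(
                llr_similarity(users_by_item[cand], users_by_item[s], n_users)
                for s in seen
            )
            for cand in users_by_item
            if cand not in seen
        }
        ranked = sorted(scores, key=lambda c: (-scores[c], c))[:top_n]
        hits += hidden in ranked
    return hits / len(held_out)


# Toy data (invented): two taste clusters, one item withheld from u5 and u6.
train = {
    "u1": {"A", "B"}, "u2": {"A", "B"},
    "u3": {"C", "D"}, "u4": {"C", "D"},
    "u5": {"A"}, "u6": {"C"},
}
held_out = {"u5": "B", "u6": "D"}
score = hit_rate(train, held_out)
```

The same harness can compare similarity measures by swapping out `llr_similarity` and re-running `hit_rate` on identical withheld ratings.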

Page 18: NoTube: Recommendations (Collaborative)

Exploitation & Further Development

Beancounter:

• Pronetics’ user profiling SaaS

• integration in the e-commerce technological solution

• making it more general purpose

• making it capable of big data management

• a SaaS playground for Semantic Web researchers

• open source licensing

• community extensions

Page 19: NoTube: Recommendations (Collaborative)

Exploitation & Further Development

Recommendations:

• explore further the combination of demographic stereotypes & semantics in a hybrid approach to learn a prediction model for the shows a user is most likely to be interested in

• integrate in personalized semantic search frameworks

• extend with additional LOD sources

• test further the measures for diversity, serendipity and predictability

• open source licensing

• community extensions

Page 20: NoTube: Recommendations (Collaborative)

Acknowledgements
