
NoTube: Pattern-based Recommendations (part 3)


WP 3 User profiling & Recommendation (Part 3)

BBC, Pro-netics, VUA

Wednesday, March 28, 12


Contents

26-27 March 2012, NoTube 3rd Review

• Overview

• User profiling
– General goal & approach
– From activity streams to profile
– Issues
– Analytics
– Beancounter

• Recommendations
– General goal & approach
– Semantic recommendation
– Statistical recommendation
– Hybrid recommendation

• Exploitation

• Conclusions


Overview

[Architecture diagram. Labelled components: EPG Metadata (BBC); TV Program Enrichment; RDF Graph TV Programs; Semantic Content Patterns for TV Programs; Semantic Pattern-based Recommendation Strategy; User Ratings & Demographics (BBC EPG Data); User Data Analysis; Similarity Clusters of Programs; Statistical Similarity-based Recommendation Strategy; Hybrid Recommendation Strategy; Recommendation Service; End Users]


Overview

[The architecture diagram from the previous slide, with a BEANCOUNTER component added]


Statistical recommendations

• We had privileged access to two bulk user-ratings datasets from the BBC

• From these, we used the Apache Mahout toolkit to derive "item to item" similarity measures between each pair of items

• With the larger dataset (20k users) this worked well; with the smaller (1k) dataset, less well

• With the BBC, we are investigating publication of these behaviour-derived similarity measures
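
As an illustration of what an "item to item" similarity measure looks like, here is a minimal Python sketch. It is not the Mahout API and not the project's actual pipeline: it assumes the ratings have been reduced to (user, item) pairs and uses the Tanimoto (Jaccard) coefficient as a stand-in measure. The function name `item_similarities` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_similarities(ratings):
    """Return {(item_a, item_b): similarity} for every pair with shared users.

    `ratings` is an iterable of (user_id, item_id) pairs, treated as
    implicit "this user liked this item" signals (an assumption).
    """
    # Invert the data: for each item, the set of users who rated it.
    users_by_item = defaultdict(set)
    for user, item in ratings:
        users_by_item[item].add(user)

    sims = {}
    # Compare each unordered pair of items exactly once.
    for a, b in combinations(sorted(users_by_item), 2):
        users_a, users_b = users_by_item[a], users_by_item[b]
        overlap = len(users_a & users_b)
        if overlap:
            # Tanimoto coefficient: shared users / users of either item.
            sims[(a, b)] = overlap / len(users_a | users_b)
    return sims


# Hypothetical sample data, not taken from the BBC datasets.
ratings = [
    ("u1", "news"), ("u1", "quiz"),
    ("u2", "news"), ("u2", "quiz"), ("u2", "drama"),
    ("u3", "drama"),
]
sims = item_similarities(ratings)
for pair, score in sorted(sims.items()):
    print(pair, round(score, 3))
```

With few users, most item pairs share no raters at all, so the overlap counts are tiny and the scores become noisy. That is one plausible reading of why the smaller dataset worked less well.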


Hybrid models: factual paths and statistical similarity

(and not to mention ‘@wossy’ is on Twitter with 1 million followers...)


Statistical recommendation

[Figure: matrix of numeric values, labelled 20k and 12k]


Statistical recommendation

[Figure: matrix of numeric values in which most entries are zero]


TV Preference Data is very sparse

• Even for a single service (e.g. Netflix), data is ‘overwhelmingly sparse’

• For NoTube’s open systems, challenges multiply:

– often no global view, only per-user data

– many ways of identifying the same content item

– many ways of identifying the same user

– never mind other entities (actors, directors, ...)

• Q: Can we tell a story about how organizations with such privileged overviews can contribute, in a privacy-respecting way, to the public commons of linked data? (A: yes! see WP4)
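
To make the sparsity point concrete, the following Python sketch measures how much of a user-by-item grid is actually filled, and shows how multiple identifiers for the same content item make it look emptier than it is. Everything here is illustrative: the identifiers, the `ALIASES` mapping and the `density` helper are hypothetical and do not come from the NoTube systems.

```python
def density(pairs):
    """Fraction of the user x item grid covered by observed (user, item) pairs."""
    users = {user for user, _ in pairs}
    items = {item for _, item in pairs}
    return len(set(pairs)) / (len(users) * len(items))


# Hypothetical observations: the same programme appears under two identifiers.
raw = [
    ("alice", "bbc:b00abc"),
    ("bob", "imdb:tt123"),
    ("carol", "bbc:b00xyz"),
]

# Hypothetical alias table mapping alternative identifiers to a canonical one.
ALIASES = {"imdb:tt123": "bbc:b00abc"}

# Rewrite every item identifier to its canonical form before counting.
canonical = [(user, ALIASES.get(item, item)) for user, item in raw]

print("density before merging identifiers:", round(density(raw), 3))
print("density after merging identifiers: ", round(density(canonical), 3))
```

Merging identifiers does not create new data; it only lets evidence about the same programme accumulate in one place instead of being split across several rows.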


Fragmentation by site


Statistical recommendation: Process

• Build on best-in-class open-source code rather than re-invent it

• Big-data ready (Hadoop-based)

• Of the various options, LogLikelihoodSimilarity generally gave the best results (using the standard 'withhold some ratings' evaluation strategy)

• Other explorations: large-scale Twitter analysis (half a billion tweets), Spectral Clustering, using demographics, ...
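
For readers unfamiliar with log-likelihood similarity, the sketch below computes the log-likelihood ratio (the G² statistic) for two items from a 2x2 contingency table of user counts. It is a plain Python illustration, not Mahout code. The final mapping to a score between 0 and 1, `1 - 1 / (1 + LLR)`, is an assumption about how the result is normalised. The function names are hypothetical.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalised Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of user counts.

    k11: users who rated both items    k12: only the first item
    k21: only the second item          k22: neither item
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(users_a, users_b, n_users):
    """Map the ratio to [0, 1): 0 means no evidence of association."""
    k11 = len(users_a & users_b)
    k12 = len(users_a - users_b)
    k21 = len(users_b - users_a)
    k22 = n_users - k11 - k12 - k21
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


# Hypothetical example: 20 users in total, two items rated by the same 10.
same_audience = llr_similarity(set(range(10)), set(range(10)), 20)
# Two items whose audiences overlap exactly as much as chance predicts.
independent = llr_similarity({0, 1}, {0, 2}, 4)
print(round(same_audience, 3), round(independent, 3))
```

Unlike a raw overlap count, the ratio asks how surprising the overlap is given how popular each item is, so two very popular programmes are not scored as similar merely because many users rated both.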


Exploitation & Further Development

Beancounter:

• Pronetics’ user profiling SaaS

• integration in the e-commerce technological solution

• making it more general purpose

• making it capable of big data management

• a SaaS playground for Semantic Web researchers

• open source licensing

• community extensions


Exploitation & Further Development

Recommendations:

• explore further the combination of demographic stereotypes & semantics in a hybrid approach to learn a prediction model for the shows a user is most likely interested in

• integrate in personalized semantic search frameworks

• extend with additional LOD sources

• test further the measures for diversity, serendipity and predictability

• open source licensing

• community extensions


Acknowledgements
