Danbri 2nd Review WP3 Challenges

  • 8/6/2019 Danbri 2nd Review WP3 Challenges

    WP3

    Challenges & Hybrid models

    Dan Brickley, VUA

    Pro-netics & BBC

    Overview

    Challenges for TV in Social Web

    theory and practice of our hybrid approach

    3 Interconnected problems: Privacy, Sparsity and Heterogeneity

    What we built (and why)

    different kinds of recommender

    ways of integrating them

    Plans and options for final developments

    Theory and Practice

    89 05 2 9

    00 88 8 6

    23 97 9 8

    More likely ...

    09 00 0 9

    00 88 0 0

    00 97 0 8

    TV preference data is very sparse!

    Even for a single service (e.g. Netflix), data is overwhelmingly sparse

    For NoTube's open systems, challenges multiply:

    often no global view, only per-user data

    many ways of identifying the same content item

    many ways of identifying the same user

    never mind other entities (actors, directors, ...)
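
To make "overwhelmingly sparse" concrete, here is a minimal sketch of how sparsity could be measured for a set of ratings. The function name, the toy users and the programme identifiers are all hypothetical; nothing here comes from NoTube data.

```python
def sparsity(ratings, n_users, n_items):
    """Fraction of user-item cells that have no observed rating."""
    observed = len({(user, item) for (user, item, _) in ratings})
    return 1.0 - observed / (n_users * n_items)


# Hypothetical toy data: (user, item, rating) triples.
ratings = [
    ("alice", "prog-1", 5),
    ("alice", "prog-3", 4),
    ("bob", "prog-2", 3),
]

# 3 of the 12 possible cells are filled, so sparsity is 0.75.
print(sparsity(ratings, n_users=3, n_items=4))
```

In an open setting the denominator itself is uncertain, because the same user or item may appear under several identifiers.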

    Challenges: Sparsity, Fragmentation

    Content identifiers (WP1)

    Wikipedia/DBpedia URLs? Freebase?

    RottenTomatoes.com, IMDB.com, broadcaster IDs

    Social Web interoperability

    Bob's on Facebook, Charlie's on Twitter

    negotiating access to non-public data (OAuth)

    reconciling metadata models, rating models

    Fragmentation by site

    A hybrid approach to sparsity

    Find patterns and paths in factual data

    Collaborative filtering - from bulk rating data

    Experiments with big data (e.g. Twitter crawl)

    Models for combining recommenders

    Strategies for inferring sameAs links

    ...or grouping items together (by series, brand)
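
One possible way to act on inferred sameAs links is to merge identifiers into equivalence groups. The sketch below uses a plain union-find; it is an illustration under stated assumptions, not the strategy used in the project, and every identifier in it is invented.

```python
def merge_same_as(links):
    """Group identifiers that are connected by sameAs links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical identifiers for illustration only.
links = [
    ("imdb:tt0000001", "dbpedia:Example_Film"),
    ("dbpedia:Example_Film", "freebase:/m/example"),
    ("bbc:b0000001", "bbc:b0000002"),
]
for group in merge_same_as(links):
    print(group)
```

The same grouping step could serve for series or brand membership if the links express "belongs to the same brand" rather than strict identity.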

    Challenge: Privacy

    TV preferences are very personal data

    Relevant standards (OAuth) are new

    deployed widely in Social Web during NoTube

    slower adoption in TV and broadcast world

    We can use OAuth to request permission to read a user's closed data (e.g. Facebook Likes)

    this limits the ability to find general trends across an entire audience (except public data - Twitter?)

    Diversity and Fragmentation

    Diversity of the Web

    reading lists: bookcrossing, librarything, amazon

    music on last.fm, spotify, ...

    news sites, social networks, blogs ...

    How to integrate while respecting privacy?

    Good news: OAuth deployment growing & social sites expose their recommendations

    Bad news: user-by-user data makes large-scale analysis of trends harder

    OAuth? RDFa?

    OAuth lets sites negotiate access with users

    e.g., Facebook knows lots of movies I like.

    NoTube can use OAuth to ask me to share that data with TV services

    RDFa data from movie pages (IMDB, Rotten Tomatoes) is consumed at Facebook

    This makes certain pages attractive as content identifiers, a taste graph alongside social graph
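
As a rough illustration of the first step in asking a user for access, the sketch below builds an authorization request URL in the style of the OAuth 2.0 authorization-code flow. The slides do not say which OAuth version was used, so that choice is an assumption, and the endpoint, client id, redirect URI and scope name are all hypothetical.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope):
    """Return (url, state) for sending the user to the provider's consent page."""
    state = secrets.token_urlsafe(16)  # random value to check on the redirect back
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return authorize_endpoint + "?" + urlencode(params), state


# Hypothetical provider and application values.
url, state = build_authorization_url(
    "https://social.example/oauth/authorize",
    client_id="tv-service-demo",
    redirect_uri="https://tv.example/callback",
    scope="user_likes",
)
print(url)
```

The user decides on the provider's site whether to grant access; the TV service only ever sees data for users who agreed, which is why audience-wide trends are hard to derive this way.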

    RDFa in IMDB and RottenTomatoes HTML

    Aggregated by Facebook (and then, by us...)
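
The markup in question is typically a set of `<meta property="..." content="...">` elements in the page head, as in Facebook's Open Graph protocol. The following is a minimal sketch of reading such properties with the Python standard library; it is not a full RDFa parser, and the sample page is made up.

```python
from html.parser import HTMLParser


class PropertyMetaParser(HTMLParser):
    """Collect <meta property="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        content = attributes.get("content")
        if prop and content is not None:
            self.properties.setdefault(prop, content)  # keep first value seen


def extract_properties(html):
    parser = PropertyMetaParser()
    parser.feed(html)
    return parser.properties


# Hypothetical movie page for illustration.
sample = """<html><head>
<meta property="og:title" content="Example Film" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://movies.example/title/1/" />
</head></html>"""

print(extract_properties(sample))
```

The `og:url` value is what makes such a page usable as a shared content identifier across sites.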

    What we built

    Main WP3 work: beancounter and pattern recommender

    Aggregate, normalize and merge social Web activity streams, then match against enriched TV metadata to produce recommendations

    We also have a Mahout-based collaborative filtering recommender, with item-to-item recommendations based on bulk ratings data
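
The aggregate, normalize, merge and match steps can be pictured with a toy pipeline. This is only a sketch of the general idea; the record fields, the matching rule and the data are assumptions and do not describe the beancounter's actual design.

```python
def normalize(source, raw):
    """Map a source-specific activity record onto one common shape."""
    return {
        "user": raw["user"],
        "verb": raw.get("verb", "like"),
        "title": raw["title"].strip().lower(),
        "source": source,
    }


def merge_streams(streams):
    """Merge activities from several sources, dropping duplicates."""
    seen, merged = set(), []
    for source, activities in streams.items():
        for raw in activities:
            activity = normalize(source, raw)
            key = (activity["user"], activity["verb"], activity["title"])
            if key not in seen:
                seen.add(key)
                merged.append(activity)
    return merged


def recommend(user, merged, programmes):
    """Return programmes whose enriched keywords overlap the user's interests."""
    interests = {a["title"] for a in merged if a["user"] == user}
    return sorted(p for p, keywords in programmes.items() if interests & keywords)


# Hypothetical streams and enriched metadata.
streams = {
    "facebook": [{"user": "bob", "title": "Doctor Who "}],
    "twitter": [
        {"user": "bob", "verb": "like", "title": "doctor who"},
        {"user": "bob", "verb": "like", "title": "Top Gear"},
    ],
}
programmes = {
    "prog-a": {"doctor who", "science fiction"},
    "prog-b": {"cooking"},
    "prog-c": {"top gear"},
}

merged = merge_streams(streams)
print(recommend("bob", merged, programmes))
```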

    LOD challenges

    Linked Open Data for TV is new

    datasets evolving, changing

    quality varies

    modelling styles vary

    lumpy, uneven coverage

    Pattern recommender finds paths

    from items in user profile to new content

    handles variation between Linked Data sources

    Content Pattern-based Recommendations

    Paths in Linked Open Data

    Diversity & Serendipity measures

    Participation Pattern

    Person X played role Y in TV program Z

    194,649 lmdb:actor triples

    53,180 lmdb:director triples

    28,549 lmdb:writer triples

    1,262 lmdb:film_story_contributor triples
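
A participation path links a programme in the user's profile to a new programme through a shared person. The sketch below finds such paths over an in-memory list of triples. The triple direction (programme, role, person), the function name and the sample data are assumptions for illustration; the real system runs against Linked Data stores.

```python
from collections import defaultdict

ROLES = {
    "lmdb:actor",
    "lmdb:director",
    "lmdb:writer",
    "lmdb:film_story_contributor",
}


def participation_paths(triples, profile):
    """Find paths: liked programme -> person -> programme not yet in profile."""
    people_of = defaultdict(set)      # programme -> {(role, person)}
    programmes_of = defaultdict(set)  # person -> {(role, programme)}
    for programme, role, person in triples:
        if role in ROLES:
            people_of[programme].add((role, person))
            programmes_of[person].add((role, programme))

    paths = []
    for liked in profile:
        for role_in, person in people_of[liked]:
            for role_out, other in programmes_of[person]:
                if other not in profile:
                    paths.append((liked, role_in, person, role_out, other))
    return sorted(paths)


# Hypothetical triples for illustration.
triples = [
    ("prog:A", "lmdb:actor", "person:X"),
    ("prog:B", "lmdb:director", "person:X"),
    ("prog:C", "lmdb:actor", "person:Y"),
]
print(participation_paths(triples, profile={"prog:A"}))
```

Keeping the whole path, rather than only the target programme, lets an application explain why an item was recommended.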

    Influence Pattern

    Person X influenced by person Y (direct)

    Person X and Y influenced by person Z (indirect)

    6,562 dbpedia:influenced triples

    11,776 dbpedia:influencedBy triples
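
The direct and indirect cases can be derived from the same set of influencedBy pairs. A minimal sketch, assuming pairs of the form (X, Y) meaning "X was influenced by Y"; the names are invented.

```python
def influence_links(influenced_by):
    """Return (direct, indirect) links from (x, y) pairs: x influenced by y."""
    direct = set(influenced_by)

    by_influencer = {}
    for x, y in influenced_by:
        by_influencer.setdefault(y, set()).add(x)

    # Indirect: two different people who share the same influencer Z.
    indirect = set()
    for z, people in by_influencer.items():
        ordered = sorted(people)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                indirect.add((a, b, z))
    return direct, indirect


# Hypothetical pairs for illustration.
pairs = {
    ("person:A", "person:Z"),
    ("person:B", "person:Z"),
    ("person:C", "person:Y"),
}
direct, indirect = influence_links(pairs)
print(sorted(indirect))
```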

    Analysis of Patterns in Dataset

    Dataset (BBC EPG metadata):

    12,777 (7,756 title enrichment) programmes

    1,260 (401 enriched) brands (unique titles)

    35,227 (19,394 enriched) person names in metadata

    9,315 (4,590 enriched) unique person names in metadata

                                             # items
    recommendations                            1,266
    - Individual brands                          411
    paths                                     17,001
    - with linkedmdb:actor                    15,257
    - with linkedmdb:director                  1,155
    - with linkedmdb:writer                      569
    - with linkedmdb:film_story_contributor       20

                                             # items
    recommendations                              222
    - Individual brands                          100
    paths
    - influencedBy (all)                       1,202
    - influencedBy (unique)                      521

    Collaborative filtering

    (item similarity measures from bulk ratings data)
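
To show what an item similarity measure from bulk ratings looks like, here is a small cosine-similarity sketch in plain Python. It stands in for the idea only; it is not Mahout code and does not claim to match the measure the project configured. The ratings are invented.

```python
from math import sqrt


def item_similarity(ratings):
    """ratings: {user: {item: rating}} -> {(item_a, item_b): cosine similarity}."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating

    def norm(vector):
        return sqrt(sum(value * value for value in vector.values()))

    similarities = {}
    names = sorted(vectors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = vectors[a].keys() & vectors[b].keys()
            dot = sum(vectors[a][u] * vectors[b][u] for u in shared)
            if dot:  # skip pairs with no co-rating users
                similarities[(a, b)] = dot / (norm(vectors[a]) * norm(vectors[b]))
    return similarities


# Hypothetical ratings for illustration.
ratings = {
    "u1": {"A": 5, "B": 5},
    "u2": {"A": 4, "B": 4},
    "u3": {"C": 3},
}
sims = item_similarity(ratings)
print(sims)
```

Items that no user has rated together get no similarity at all, which is exactly where sparse data hurts this kind of recommender.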

    Hybrid models:

    factual paths and statistical similarity

    (and not to mention @wossy is on Twitter with 1 million followers...)
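
One simple way to combine factual paths with statistical similarity is a weighted sum of the two scores per item. This is a sketch of one possible hybrid, not the combination model used in the project; the weight, the score scales and the item names are assumptions.

```python
def hybrid_scores(path_scores, cf_scores, weight=0.5):
    """Blend path-based and collaborative-filtering scores per item.

    weight is the share given to path-based evidence (0.0 to 1.0).
    Items missing from one recommender contribute 0.0 for that part.
    """
    items = set(path_scores) | set(cf_scores)
    combined = {
        item: weight * path_scores.get(item, 0.0)
        + (1.0 - weight) * cf_scores.get(item, 0.0)
        for item in items
    }
    # Highest score first; ties broken by item name for a stable order.
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical scores for illustration.
path_scores = {"prog-a": 1.0, "prog-b": 0.5}
cf_scores = {"prog-b": 1.0, "prog-c": 0.5}
ranked = hybrid_scores(path_scores, cf_scores)
print(ranked)
```

An item supported by both kinds of evidence ("prog-b" here) rises above items supported by only one, which is the practical appeal of a hybrid when either source alone is sparse.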

    Status

    We can show a standards-based system that

    integrates TV preference data from diverse Web

    matches this with enriched TV metadata

    finds graph patterns linking users to content

    integrates with classic recommender approaches

    builds on open source (ClioPatria, Mahout)

    supports real-time multi-screen exploration

    Plans and challenges

    Richer integration between components

    currently this occurs in the application; can we exploit LOD patterns prior to Mahout analysis?

    Polish & packaging; more patterns and rules

    Track and influence evolving standards (W3C)

    Work-in-progress with big data analysis - what kinds of TV links are shared by the kind of people who follow @stephenfry on Twitter?