Exploiting Twitter’s Collective Knowledge for Music Recommendations


Eva Zangerle, Wolfgang Gassler and Günther Specht


Outline

• Motivation

• Data set creation

• Artist and track resolution

• Music recommendation

• Future directions

• Conclusion


Motivation

• Twitter as a source for music recommendations

• User stream: the set of #nowplaying tweets of one user (representing that user's music preferences)


Data Set Creation

• Crawling Twitter for the keywords nowplaying, listeningto, listento, etc.

• 5 million tweets from 07/2011 to 02/2012

• User stream analysis


Tweets in Stream   Users
1                  457,657
> 3                196,422
> 10               63,017
> 100              3,190
> 1,000            253
> 10,000           5
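The user stream analysis above can be sketched as follows. This is a minimal illustration, not the authors' crawler: it assumes tweets are already available as (user, text) pairs, and the helper names `is_music_tweet`, `stream_sizes`, and `users_above` are hypothetical.

```python
from collections import Counter

# Keywords named on the slide; the "etc." there means this list is incomplete.
KEYWORDS = ("nowplaying", "listeningto", "listento")

def is_music_tweet(text):
    """Return True if the tweet text contains one of the crawl keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

def stream_sizes(tweets):
    """Count matching tweets per user; `tweets` yields (user, text) pairs."""
    return Counter(user for user, text in tweets if is_music_tweet(text))

def users_above(sizes, thresholds=(3, 10, 100, 1000, 10000)):
    """For each threshold t, count users whose stream holds more than t tweets."""
    return {t: sum(1 for n in sizes.values() if n > t) for t in thresholds}

# Toy input (invented for illustration only).
tweets = [
    ("alice", "#nowplaying Neil Young - Hey Hey My My"),
    ("alice", "#NowPlaying Lloyd ft. Lil Wayne - You"),
    ("bob", "good morning"),
    ("bob", "#listeningto Heart of Gold"),
]
sizes = stream_sizes(tweets)
```

Grouping by user first keeps the later steps (track resolution, co-occurrence analysis) independent of how the tweets were collected.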

Track Resolution

listening to Hey Hey My My (Out Of The Blue) by Neil Young on @Grooveshark: #nowplaying #musicmonday http://t.co/7os3eeA

#nowplaying @Lloyd_YG ft. @LilTunechi - You

• Problem: extraction of

– Title of the track

– Artist performing the track

– Metainformation (links, Twitter accounts, etc.)

-> Reference database (FreeDB or MusicBrainz)
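Separating the metainformation from the remaining tweet text could look like the sketch below. The regular expressions and the function name `split_tweet` are assumptions made for illustration; the slides do not say how the authors handled this step.

```python
import re

# Simple patterns for the metainformation mentioned on the slide (assumed, not from the source).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def split_tweet(text):
    """Split a tweet into URLs, mentions, hashtags, and the leftover text."""
    urls = URL_RE.findall(text)
    rest = URL_RE.sub(" ", text)          # remove URLs first so they are not re-scanned
    mentions = MENTION_RE.findall(rest)
    hashtags = HASHTAG_RE.findall(rest)
    rest = MENTION_RE.sub(" ", rest)
    rest = HASHTAG_RE.sub(" ", rest)
    return {
        "urls": urls,
        "mentions": mentions,
        "hashtags": hashtags,
        "text": " ".join(rest.split()),   # collapse the whitespace left by removals
    }

parts = split_tweet(
    "listening to Hey Hey My My (Out Of The Blue) by Neil Young "
    "on @Grooveshark: #nowplaying #musicmonday http://t.co/7os3eeA"
)
```

The leftover text still mixes title, artist, and filler words such as "listening to" and "by", which is why a reference database is needed to decide which tokens belong to a track.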


Track Resolution

• Matching of tweet content to reference DB

• Fulltext index (Lucene, tf/idf + cosine similarity)

• Custom similarity measure:

$$\mathrm{sim}(\mathit{tweet}, \mathit{track}) = \frac{|\mathit{tweet} \cap \mathit{track}|}{|\mathit{track}|}$$

• Query: listening to Hey Hey My My (Out Of The Blue) by Neil Young on @Grooveshark: #nowplaying #musicmonday http://t.co/7os3eeA

MusicBrainz track = Hey Hey My My (Out of the Blue)

MusicBrainz artist = Neil Young
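The custom similarity measure can be sketched as a token-overlap ratio. This assumes that tweet and track are compared as sets of lowercased alphanumeric tokens and that a track is represented by artist and title concatenated; neither detail is stated on the slides. The names `tokens`, `sim`, and `best_match` are hypothetical, and the 0.8 threshold is the one used in the evaluation.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens of `text`, as a set (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def sim(tweet, track):
    """Share of the track's tokens that also occur in the tweet."""
    track_tokens = tokens(track)
    if not track_tokens:
        return 0.0
    return len(tokens(tweet) & track_tokens) / len(track_tokens)

def best_match(tweet, candidates, threshold=0.8):
    """Return the best-scoring candidate if it exceeds the threshold, else None."""
    best = max(candidates, key=lambda c: sim(tweet, c), default=None)
    if best is not None and sim(tweet, best) > threshold:
        return best
    return None

tweet = ("listening to Hey Hey My My (Out Of The Blue) by Neil Young "
         "on @Grooveshark: #nowplaying #musicmonday http://t.co/7os3eeA")
candidates = [
    "Neil Young Hey Hey My My (Out of the Blue)",
    "Neil Young Heart of Gold",
]
match = best_match(tweet, candidates)
```

Normalizing by the track's token count rather than the tweet's means extra words in the tweet (hashtags, service names, links) do not lower the score, which suits noisy tweet text.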


Evaluation of Resolution Process

• Ground truth data set (100 tweets)

• MusicBrainz and FreeDB tracks assigned manually

• Automatically assigned tracks (custom similarity > 0.8)

• Matched tracks:

RefDB         Manually   Automatically   False Positives
MusicBrainz   59         43 (73%)        5 (10%)
FreeDB        57         31 (54%)        18 (36%)

• FreeDB very noisy -> many false positives

Use Case: Music Recommendation

• Recommendation based on co-occurrence analysis on user streams

• Evaluation through comparison with last.fm

• 79% coverage of co-occurrence rules

• Top 10 recommendations: only 1% coverage

• Sparsity!
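Co-occurrence analysis on user streams can be sketched as below. This is an illustrative assumption about the scoring (raw co-occurrence counts summed over a user's known tracks), not the authors' exact rule mining; `cooccurrence_counts` and `recommend` are hypothetical names and the streams are toy data.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_counts(streams):
    """Count, for each pair of tracks, the number of user streams containing both."""
    counts = defaultdict(Counter)
    for tracks in streams.values():
        # Deduplicate so a track tweeted twice by one user counts once per stream.
        for a, b in combinations(sorted(set(tracks)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, user_tracks, k=10):
    """Rank unseen tracks by summed co-occurrence with the user's known tracks."""
    known = set(user_tracks)
    scores = Counter()
    for track in known:
        for other, n in counts.get(track, {}).items():
            if other not in known:
                scores[other] += n
    # Sort by descending score, then by name so ties are ordered deterministically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [track for track, _ in ranked[:k]]

# Toy user streams (invented for illustration only).
streams = {"u1": ["A", "B", "C"], "u2": ["A", "B"], "u3": ["B", "D"]}
counts = cooccurrence_counts(streams)
recs = recommend(counts, ["A"])
```

With short user streams most track pairs never co-occur, so few candidates get a non-zero score; this is the sparsity problem noted above.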


Problems & Future Directions

• Sparsity of Data

– User stream crawling (more complete user streams)

– Exploit Metadata (URLs, …)

• Matching Process

– Tweets <-> refDB tracks

– refDB tracks <-> last.fm tracks

– Last.fm similar tracks <-> refDB tracks


Conclusion

• Data Set Creation

– Crawl Twitter

– Reference database

– Matching of tracks

• Music recommendation by co-occurrence analysis
