Exploiting Twitter’s
Collective Knowledge for Music Recommendations
Eva Zangerle, Wolfgang Gassler and Günther Specht
Outline
• Motivation
• Data set creation
• Artist and track resolution
• Music recommendation
• Future directions
• Conclusion
Motivation
• Twitter as source for music recommendations
• User stream: the set of #nowplaying tweets of one user (representing that user's music preferences)
Data Set Creation
• Crawling Twitter for keywords nowplaying, listeningto, listento, etc.
• 5 million tweets from 07/2011 to 02/2012
• User stream analysis
Tweets in Stream    Users
1                   457,657
> 3                 196,422
> 10                63,017
> 100               3,190
> 1,000             253
> 10,000            5
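A table like the one above can be derived by grouping tweets by user and counting how many users exceed each stream-size threshold. The following is a minimal sketch, not the authors' code; the input shape (an iterable of `(user_id, text)` pairs) and the function name `stream_size_histogram` are assumptions made for illustration.

```python
from collections import Counter


def stream_size_histogram(tweets, thresholds=(3, 10, 100, 1000, 10000)):
    """Count users whose stream holds more than each threshold of tweets.

    `tweets` is an iterable of (user_id, text) pairs (assumed input shape).
    """
    # Number of tweets per user, i.e. the size of each user stream.
    per_user = Counter(user for user, _ in tweets)
    # For every threshold, count the users with strictly more tweets.
    return {t: sum(1 for n in per_user.values() if n > t) for t in thresholds}


# Hypothetical toy data: user "a" has 5 tweets, "b" has 2, "c" has 12.
sample = [("a", "x")] * 5 + [("b", "y")] * 2 + [("c", "z")] * 12
histogram = stream_size_histogram(sample, thresholds=(3, 10))
```

Here `histogram` maps each threshold to the number of users above it, mirroring the "> 3", "> 10" rows of the table.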
Track Resolution
listening to Hey Hey My My (Out Of The Blue) by Neil Young on @Grooveshark: #nowplaying #musicmonday http://t.co/7os3eeA
#nowplaying @Lloyd_YG ft. @LilTunechi - You
• Problem: extraction of
– Title of the track
– Artist performing the track
– Meta-information (links, Twitter accounts, etc.)
-> Reference database (FreeDB or MusicBrainz)
Track Resolution
• Matching of tweet content to reference DB
• Fulltext index (Lucene, tf/idf + cosine sim)
• Custom similarity measure:
\[
\mathrm{sim}(\mathit{tweet}, \mathit{track}) = \frac{|\mathit{tweet} \cap \mathit{track}|}{|\mathit{track}|}
\]
• Query: listening to Hey Hey My My (Out Of The Blue) by Neil Young on @Grooveshark: #nowplaying #musicmonday http://t.co/7os3eeA
MusicBrainz track = Hey Hey My My (Out of the Blue)
MusicBrainz artist = Neil Young
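The custom similarity measure can be sketched as the share of a track's terms that also appear in the tweet. This is an illustrative reading, not the authors' implementation: treating tweet and track as sets of lowercased terms, and dropping URLs, @mentions and #hashtags before comparison, are assumptions, as are the helper names `tokenize` and `sim`.

```python
import re


def tokenize(text):
    """Return the set of lowercased alphanumeric terms in `text`.

    URLs, @mentions and #hashtags are removed first (an assumed normalisation).
    """
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    return set(re.findall(r"[a-z0-9]+", text))


def sim(tweet, track):
    """Fraction of the track's terms that also occur in the tweet."""
    track_terms = tokenize(track)
    if not track_terms:
        return 0.0
    return len(tokenize(tweet) & track_terms) / len(track_terms)


tweet = ("listening to Hey Hey My My (Out Of The Blue) by Neil Young "
         "on @Grooveshark: #nowplaying #musicmonday http://t.co/7os3eeA")
# Candidate from the reference DB: track title and artist joined into one string.
candidate = "Hey Hey My My (Out of the Blue) Neil Young"
score = sim(tweet, candidate)
```

For the query above every term of the candidate occurs in the tweet, so the score is 1.0. Dividing by the number of track terms rather than tweet terms keeps the score from being penalised by extra words in the tweet such as "listening to" or "on".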
Evaluation of Resolution Process
• Ground truth data set (100 tweets)
• MusicBrainz and FreeDB tracks assigned manually
• Automatically assigned tracks (custom similarity > 0.8)
• Matched tracks:
• FreeDB very noisy -> many false positives
RefDB         Manually   Automatically   False Positive
MusicBrainz   59         43 (73%)        5 (10%)
FreeDB        57         31 (54%)        18 (36%)
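The comparison against the ground truth can be sketched as follows. The match rate in the table is the number of automatically matched tracks divided by the number of manually assigned ones (for MusicBrainz, 43 / 59 ≈ 73%). The code below is a hedged illustration only: the dictionaries mapping tweet ids to track ids, the function name `evaluate`, and the toy data are all hypothetical.

```python
def evaluate(manual, automatic):
    """Compare automatic track assignments with a manual ground truth.

    Both arguments map a tweet id to a track id (hypothetical identifiers).
    `automatic` is assumed to hold only assignments above the similarity threshold.
    """
    # An automatic assignment is correct if it equals the manual one.
    correct = sum(1 for t, track in automatic.items() if manual.get(t) == track)
    # Every other automatic assignment counts as a false positive.
    false_positive = len(automatic) - correct
    return {
        "manual": len(manual),
        "correct": correct,
        "false_positive": false_positive,
        "match_rate": correct / len(manual) if manual else 0.0,
    }


# Toy example: four manually labelled tweets, three automatic assignments.
manual = {1: "a", 2: "b", 3: "c", 4: "d"}
automatic = {1: "a", 2: "x", 5: "q"}
report = evaluate(manual, automatic)
```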
Use Case: Music Recommendation
• Recommendation based on co-occurrence analysis on user streams
• Evaluation through comparison with last.fm
• 79% coverage of co-occurrence rules
• Top-10 recommendations: only 1% coverage
• Sparsity!
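Co-occurrence-based recommendation over user streams can be sketched like this: two tracks co-occur when the same user has tweeted both, and the tracks that co-occur most often with a given track are recommended. This is a minimal sketch under assumptions, not the authors' method; counting each track pair once per user and the names `cooccurrence_counts` and `recommend` are choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(user_streams):
    """Count, for each pair of tracks, the number of users who played both.

    `user_streams` maps a user id to a list of resolved track ids (assumed shape).
    """
    counts = defaultdict(Counter)
    for tracks in user_streams.values():
        # Each pair is counted once per user, regardless of repeated plays.
        for a, b in combinations(sorted(set(tracks)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(track, counts, n=10):
    """Return up to `n` tracks that co-occur most often with `track`."""
    return [t for t, _ in counts.get(track, Counter()).most_common(n)]


# Hypothetical user streams with already-resolved track ids.
streams = {
    "u1": ["A", "B", "C"],
    "u2": ["A", "B"],
    "u3": ["A", "C", "D"],
    "u4": ["A", "B"],
}
counts = cooccurrence_counts(streams)
top = recommend("A", counts, n=2)
```

With sparse streams most track pairs are shared by very few users, which is one way to read the low top-10 coverage noted above.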
Problems & Future Directions
• Sparsity of Data
– User stream crawling (more complete user streams)
– Exploit Metadata (URLs, …)
• Matching Process
– Tweets <-> refDB tracks
– refDB tracks <-> last.fm tracks
– Last.fm similar tracks <-> refDB tracks
Conclusion
• Data Set Creation
– Crawl Twitter
– Reference database
– Matching of tracks
• Music recommendation by co-occurrence analysis