My presentation at the adMIRe workshop at ISM 2009 in San Diego. The presentation covers our study on using search engines to classify music genres.
USING SEARCH ENGINES FOR CLASSIFICATION: DOES IT STILL WORK?
Sten Govaerts, Nik Corthaut, Erik Duval
• Our problem
• Classification using search engines
• The setup
• The evaluation
• Conclusion
TUNIFY
HOW DOES IT WORK?
• manually annotated metadata
• 5 music experts at Aristo Music and different consultants
• almost 80,000 songs
• but, not enough...
PROBLEMS
• satisfying the music choice of all customers
• retail and catering differ from you and me!
• new markets
• react quickly to emerging music trends
• adding the full Belgian library catalog
GENERATE THE METADATA
• from different sources:
  • the audio signal
  • web sources
  • the Aristo database
  • attention metadata
• using our metadata generation framework: SamgI
GENRE...
• a master's thesis in our group looked at different ways to generate genre metadata...
ONE APPROACH...
• M. Schedl, T. Pohle, P. Knees, G. Widmer, “Assigning and Visualizing Music Genres by Web-based Co-occurrence Analysis”, Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 260-265.
• G. Geleijnse, J. Korst, “Web-based Artist Categorization”, Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 266-271.
CLASSIFICATION WITH SEARCH ENGINES
using co-occurrence
Artist + Genre + Schema
[Figure: example per-genre values for Rock, Blues, Country, Jazz, Pop and Metal; values as extracted, in order: 0.013, 0.009, 0.013, 0.005, 0.015, 0.009]
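The co-occurrence idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: the query format, the default schema keyword "music", the normalisation by the artist's own page count, and all page counts in FAKE_COUNTS are hypothetical. A real run would replace fake_hit_count with a call to a search engine that returns a result count.

```python
from typing import Callable, Dict, List


def build_query(artist: str, genre: str, schema: str = "music") -> str:
    """Build one 'Artist + Genre + Schema' query (the exact format is an assumption)."""
    return f'"{artist}" "{genre}" {schema}'


def genre_scores(artist: str, genres: List[str],
                 hit_count: Callable[[str], int],
                 schema: str = "music") -> Dict[str, float]:
    """Score each genre as pages(artist, genre, schema) / pages(artist)."""
    artist_hits = hit_count(f'"{artist}"')
    scores: Dict[str, float] = {}
    for genre in genres:
        both = hit_count(build_query(artist, genre, schema))
        # Guard against artists with no hits at all.
        scores[genre] = both / artist_hits if artist_hits else 0.0
    return scores


def classify(artist: str, genres: List[str],
             hit_count: Callable[[str], int],
             schema: str = "music") -> str:
    """Return the genre with the highest co-occurrence score."""
    scores = genre_scores(artist, genres, hit_count, schema)
    return max(scores, key=scores.get)


# Invented page counts standing in for a real search engine.
FAKE_COUNTS = {
    '"Robert Johnson"': 1000,
    '"Robert Johnson" "Blues" music': 130,
    '"Robert Johnson" "Rock" music': 90,
    '"Robert Johnson" "Jazz" music': 50,
}


def fake_hit_count(query: str) -> int:
    """Hypothetical stand-in for a search engine result count."""
    return FAKE_COUNTS.get(query, 0)


GENRES = ["Blues", "Rock", "Jazz"]
print(classify("Robert Johnson", GENRES, fake_hit_count))
```

Passing the hit-count function as a parameter keeps the classifier independent of any one search engine, which matters when several engines are compared.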
RESULTS
• the master's thesis student's results were much worse than the published results
• what happened?
• did Google's search result counts change?
• does the Google Search API return different results?
• is the student’s implementation correct?
HOW TO EVALUATE THIS?
• re-run the original experiment
• evaluate on the same data set: 1995 artists and 9 genres.
• different search engines: Google, Yahoo! and Live! Search.
• over time: 8 times over a period of 36 days.
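The evaluation described above comes down to computing classification accuracy per search engine and per run. The sketch below shows one way to do that. The function names and the data layout are assumptions, and the predictions are invented toy values, not results from the study.

```python
from typing import Dict, List


def accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of artists in `truth` whose predicted genre matches."""
    if not truth:
        return 0.0
    correct = sum(1 for artist, genre in truth.items()
                  if predicted.get(artist) == genre)
    return correct / len(truth)


def accuracy_over_time(runs: Dict[str, List[Dict[str, str]]],
                       truth: Dict[str, str]) -> Dict[str, List[float]]:
    """Map each engine to its list of accuracies, one entry per run."""
    return {engine: [accuracy(pred, truth) for pred in predictions]
            for engine, predictions in runs.items()}


# Ground truth for a handful of artists.
TRUTH = {
    "Robert Johnson": "Blues",
    "Art Pepper": "Jazz",
    "Peter Tosh": "Reggae",
    "Lou Rawls": "RnB",
}

# Invented predictions for two runs per engine (toy data only).
RUNS = {
    "Google": [
        dict(TRUTH),
        {**TRUTH, "Art Pepper": "Blues", "Lou Rawls": "Jazz"},
    ],
    "Yahoo!": [
        {**TRUTH, "Lou Rawls": "Blues"},
        {**TRUTH, "Peter Tosh": "Rap"},
    ],
}

for engine, accuracies in accuracy_over_time(RUNS, TRUTH).items():
    print(engine, accuracies)
```

Keeping one accuracy value per run, instead of a single average, makes day-to-day fluctuations of an engine visible.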
THE DATA SET
[Pie chart of the data set by genre: Blues, Country, Electronic, Folk, Jazz, Metal, Rap, Reggae, RnB; slice labels as extracted, in order: 9%, 12%, 5%, 4%, 41%, 13%, 2%, 3%, 10%]
MOTION CHART
• http://hmdb.cs.kuleuven.be/muzik/gapminder.html
MORE FINE-GRAINED...
• 18 artists
• more search engines: Google.co.uk/.fr/.be, uk/fr.search.yahoo.com
• twice a day for 53 days
• 250,000 queries!
2 Pac            Rap
Alan Lomax       Folk
Art Pepper       Jazz
Cradle of Filth  Metal
David Parsons    Electronic
Desmond Dekker   Reggae
Downpour         Metal
IceT             Rap
Jerry Butler     RnB
Joy Lynn White   Country
Louisiana Red    Blues
Lou Rawls        RnB
LTJ Bukem        Electronic
Peter Tosh       Reggae
Pinetop Smith    Jazz
Robert Johnson   Blues
Roy Rogers       Country
Steeleye Span    Folk
MAIN SEARCH ENGINE RESULTS
REGIONAL GOOGLES
WHAT TO USE?
• use Google when it is stable; otherwise rely on Yahoo!
• when is it stable? test with a small set
• some artists get classified incorrectly on bad days
• compare the accuracy achieved with the test set to the average.
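The rule of thumb above can be written as a small check: classify a small test set with Google, compare the accuracy to Google's average accuracy, and fall back to Yahoo! when the gap is too large. The sketch below is one possible reading of that rule. The function names, the tolerance of 0.05 and the example accuracies are all assumptions, not values from the study.

```python
def is_stable(test_accuracy: float, average_accuracy: float,
              tolerance: float = 0.05) -> bool:
    """Treat the engine as stable if today's test accuracy is within
    `tolerance` of (or above) its average accuracy."""
    return test_accuracy >= average_accuracy - tolerance


def choose_engine(google_test_accuracy: float,
                  google_average_accuracy: float,
                  tolerance: float = 0.05) -> str:
    """Prefer Google on a stable day, otherwise fall back to Yahoo!."""
    if is_stable(google_test_accuracy, google_average_accuracy, tolerance):
        return "Google"
    return "Yahoo!"


# Hypothetical accuracies: a normal day and a bad day.
print(choose_engine(0.80, 0.82))
print(choose_engine(0.60, 0.82))
```

The tolerance controls how much day-to-day variation is accepted before switching engines. A sensible value would have to come from the measured fluctuations.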
CONCLUSION
• still works after 3 years
• Google -> Yahoo! -> Live! Search
• why does Google fluctuate?
• a generic, all-purpose version of the classifier is implemented in our metadata generation framework, SamgI
FUTURE WORK
• understand the performance differences of regional search engines
• use alternative search engines
• tweak the genre taxonomy depending on the search engine
Q & A.
DEMO METADATA GENERATION
• http://ariadne.cs.kuleuven.be/samgi-service/