Automatic music classification and the importance of instrument identification

Cory McKay and Ichiro Fujinaga

Music Technology Area, Faculty of Music

McGill University, Montreal, Canada

10 March 2005, CIM Montreal

Overview

Examination of the relative importance of different high-level features in automatic music classification

Performed an experiment involving automatic genre classification of MIDI files

Found that features based on instrumentation (an abstraction of timbre) were of particular importance


Topics

Introduction to automatic music classification

Related research

Details of experiment performed

Features used

Feature weighting

Taxonomies used

Classifiers and training data used

Results

Conclusions


Introduction to automatic music classification

There are many ways in which computers can classify music:

Genre

Composer

Performer

Geographical/temporal/cultural origin

etc.

Music classification can be difficult for both humans and computers

Rarely have precise, clear and consistent guidelines delineating the musical characteristics of categories


Applications of automatic music classification

Discovery of probable authorship of anonymous compositions

Sociological and psychological research into how humans construct the notion of musical similarity and form musical groupings

Automatic sorting of large databases

Music recommendation systems

Sorting of personal music collections

e.g. based on mood or listening scenarios

Automated transcription

Detection of pirated recordings


Advantages of automatic music classification

Computers can perform classifications faster and more consistently than humans

Computers can analyze music in novel and non-intuitive ways that might not occur to humans

Computers can avoid human preconceptions that might contaminate experimental results


How automatic classification works

“Feature” extraction:

Properties or characteristics of recordings

Percepts that classifiers base decisions on

Can be extracted from audio (e.g. MP3) or symbolic (e.g. MIDI) recordings

Good features are essential to successful classification

Classification can be done using:

Expert systems: utilize pre-set heuristics

Machine learning (AI): supervised or unsupervised learning


Features

Low-level features:

Signal processing quantities

e.g. spectral centroid and spectral flux

Can be effective practically

Can have psychoacoustic significance

Have little direct theoretical meaning musicologically or sociologically

High-level features:

Based on musical abstractions

e.g. tempo and meter

Currently difficult or impossible to extract from audio recordings

Have more theoretical relevance than low-level features


Overview of this experiment

Empirical examination of which features are most useful to classifiers

Used high-level features because of their theoretical significance

Used test task of genre classification:

A particularly difficult type of classification

Related to many other types of classification

Features useful for this task likely to be particularly robust


Related research

Relatively little work has been done on features that could be useful for arbitrary types of music:

Cantometrics project (Lomax 1968)

Tagg (1982)

Cope (1991)

Arden and Huron (2001): studied the correlation between musical features and geographical regions

Automatic genre classification has received considerable attention recently:

Audio classification work of Tzanetakis and Cook (2002) is often cited

Best results to date with symbolic data have been achieved by McKay and Fujinaga (2004)


Bodhidharma

Experiments carried out with the Bodhidharma system

A general-purpose symbolic feature extraction and classification system

Easy to use

Portable

Applicable to a wide range of research tasks


Features studied

111 high-level features implemented:

Instrumentation: e.g. whether modern instruments are present

Musical Texture: e.g. standard deviation of the average melodic leap of different lines

Rhythm: e.g. standard deviation of note durations

Dynamics: e.g. average note-to-note change in loudness

Pitch Statistics: e.g. fraction of notes in the bass register

Melody: e.g. fraction of melodic intervals comprising a tritone

Largest available set of implemented high-level features

42 more features have been proposed, but have not been implemented yet

More information available in Cory McKay’s master’s thesis (2004)
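As a rough illustration of what such high-level features look like in practice, the sketch below computes three of the examples named above from a list of notes. It is not Bodhidharma's implementation: the `Note` record, the function names, the treatment of the notes as a single monophonic line, and the bass-register cutoff (MIDI pitch below 55) are all assumptions made for this example.

```python
from statistics import pstdev
from typing import NamedTuple


class Note(NamedTuple):
    """A hypothetical, simplified note record (not Bodhidharma's data model)."""
    pitch: int       # MIDI pitch number, 0-127
    duration: float  # duration in beats


BASS_CUTOFF = 55  # assumption: pitches below this count as the bass register


def note_duration_std(notes):
    """Standard deviation of note durations (a rhythm feature)."""
    return pstdev([n.duration for n in notes])


def bass_register_fraction(notes):
    """Fraction of notes in the bass register (a pitch statistics feature)."""
    return sum(n.pitch < BASS_CUTOFF for n in notes) / len(notes)


def tritone_fraction(notes):
    """Fraction of melodic intervals spanning a tritone (a melody feature).

    Treats the notes as one monophonic line and counts an interval as a
    tritone when it spans exactly six semitones, up or down.
    """
    intervals = [abs(b.pitch - a.pitch) for a, b in zip(notes, notes[1:])]
    if not intervals:
        return 0.0
    return sum(i == 6 for i in intervals) / len(intervals)


# A made-up five-note line used only to exercise the functions.
melody = [Note(60, 1.0), Note(66, 1.0), Note(48, 2.0), Note(54, 2.0), Note(55, 2.0)]
feature_vector = [
    note_duration_std(melody),
    bass_register_fraction(melody),
    tritone_fraction(melody),
]
```

Each feature reduces a whole recording to one number, so a piece ends up represented as a fixed-length vector that a classifier can compare against other pieces.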


Features to use

An insufficient number of features can fail to provide classifiers with enough information to make good decisions

Too many features can overwhelm and confuse classifiers

Can be difficult to predict in advance which features will work well together

Individual performance of a feature is not necessarily indicative of its performance in combination with other features
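A toy illustration of the last point, using made-up binary features rather than anything from the experiment: when the class label is the exclusive-or of two features, each feature on its own predicts no better than chance, while the two together predict perfectly. The data and the `best_accuracy` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical data: ((feature_a, feature_b), label) with label = a XOR b.
samples = [((a, b), a ^ b) for a in (0, 1) for b in (0, 1)] * 5


def best_accuracy(samples, feature_indices):
    """Accuracy of the best lookup-table classifier that sees only the chosen features."""
    by_key = defaultdict(Counter)
    for features, label in samples:
        key = tuple(features[i] for i in feature_indices)
        by_key[key][label] += 1
    # For each observed combination of feature values, predict its most common label.
    correct = sum(counts.most_common(1)[0][1] for counts in by_key.values())
    return correct / len(samples)


acc_a = best_accuracy(samples, [0])        # feature a alone
acc_b = best_accuracy(samples, [1])        # feature b alone
acc_both = best_accuracy(samples, [0, 1])  # both features together
```

Ranking features one at a time would discard both of these, which is one reason to evaluate weightings of the whole feature set instead.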


Feature weighting

Feature weighting is a technique for experimentally determining the importance of various features by assigning weights to them

Used genetic algorithms here

“Evolves” a good set of weights

The weights produced by the genetic algorithm provide an indication of the importance of particular features in particular contexts
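The slides do not say how the genetic algorithm was configured, so the following is only a generic sketch of the idea, not the method used in the experiment. Everything specific in it is an assumption: the synthetic data (one informative feature, two noise features), the leave-one-out 1-nearest-neighbour fitness function, the population size, and the selection, crossover and mutation settings.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def make_sample(label):
    # Hypothetical data: feature 0 separates the classes, features 1-2 are noise.
    return ([label + rng.gauss(0, 0.3), rng.uniform(0, 3), rng.uniform(0, 3)], label)


data = [make_sample(label) for label in (0, 1) for _ in range(20)]


def fitness(weights):
    """Leave-one-out accuracy of 1-nearest-neighbour using a weighted squared distance."""
    correct = 0
    for i, (x, label) in enumerate(data):
        def dist(other):
            return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, other[0]))
        nearest = min((s for j, s in enumerate(data) if j != i), key=dist)
        correct += nearest[1] == label
    return correct / len(data)


def evolve(pop_size=12, generations=15, n_features=3):
    """Evolve a weight vector; returns (best_weights, best fitness per generation)."""
    population = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(w), w) for w in population), reverse=True)
        history.append(scored[0][0])
        next_gen = [w for _, w in scored[:2]]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents, then uniform crossover.
            p1 = max(rng.sample(scored, 3))[1]
            p2 = max(rng.sample(scored, 3))[1]
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Gaussian mutation, clamped so weights stay non-negative.
            child = [max(0.0, w + rng.gauss(0, 0.1)) for w in child]
            next_gen.append(child)
        population = next_gen
    final_score, best = max((fitness(w), w) for w in population)
    return best, history + [final_score]


best_weights, best_per_generation = evolve()
```

Because the best individuals are carried over unchanged, the best fitness never drops from one generation to the next. The evolved weights can then be read as a rough indication of which features the classifier relies on.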


Types of classification performed

The choice of “best” features is context-dependent

e.g. the best features for distinguishing between Baroque and Romantic are different from those for distinguishing between Punk and Heavy Metal

Performed three types of classification:

Flat

Hierarchical

Round-robin

Hierarchical and round-robin feature weighting allowed classifiers to use specialized weightings in order to improve performance
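A minimal sketch of round-robin classification with specialized weightings, assuming that round-robin here means one binary classifier per pair of categories whose votes are then tallied. The nearest-centroid classifier, the two-feature training vectors, and the per-pair weights are all hypothetical and chosen only to keep the example short.

```python
from collections import Counter
from itertools import combinations

# Hypothetical training data: category -> list of feature vectors.
training = {
    "Baroque": [[0.9, 0.1], [0.8, 0.2]],
    "Romantic": [[0.6, 0.3], [0.7, 0.4]],
    "Punk": [[0.1, 0.9], [0.2, 0.8]],
}


def centroid(vectors):
    """Per-feature mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


centroids = {name: centroid(vs) for name, vs in training.items()}

# Hypothetical specialized weights for one pair; other pairs use equal weights.
pair_weights = {("Baroque", "Romantic"): [1.0, 3.0]}


def weighted_dist(x, c, weights):
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, c))


def round_robin_classify(x):
    """Let every pair of categories vote, each with its own weighting."""
    votes = Counter()
    for a, b in combinations(centroids, 2):
        weights = pair_weights.get((a, b), [1.0] * len(x))
        da = weighted_dist(x, centroids[a], weights)
        db = weighted_dist(x, centroids[b], weights)
        votes[a if da <= db else b] += 1
    return votes.most_common(1)[0][0]


prediction = round_robin_classify([0.15, 0.85])
```

The point of the design is that the Baroque-versus-Romantic classifier can emphasize different features than the classifiers involving Punk, which a single flat weighting cannot do.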


Taxonomies used

Used hierarchical taxonomies:

A recording could belong to more than one category

A category could be a child of multiple parents in the taxonomical hierarchy

Performed experiments with two taxonomies:

Large (38 leaf categories): used to test system under realistic conditions

Small (9 leaf categories): used to loosely compare system to existing systems


Large taxonomy


Small taxonomy

Jazz: Bebop, Jazz Soul, Swing

Popular: Rap, Punk, Country

Western Classical: Baroque, Modern Classical, Romantic


Training and testing

Used ensembles of k-nearest neighbour and neural network classifiers

950 MIDI files

Hand-classified for training based on a variety of on-line databases

5-fold cross-validation

80% training, 20% testing
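The 80%/20% split follows directly from using five folds: each of the 950 files is held out for testing exactly once, and the other four folds are used for training. The sketch below shows that arithmetic. The random shuffling and the function name are assumptions; the slides do not say how files were assigned to folds.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumption: folds are assigned at random
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(950, k=5))
sizes = [(len(train), len(test)) for train, test in splits]
# Each of the 5 rounds trains on 760 files and tests on 190.
```

Averaging the success rate over the five rounds gives a figure in which every recording has contributed to testing once.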


Average success rates

9-category taxonomy:

Leaf: 90%

Root: 98%

38-category taxonomy:

Leaf: 57%

Root: 81%

[Figure: Classification Performance. Bar chart of success rate (%), from 0 to 100, for Classical, Jazz, Pop, Average and Chance, with separate bars for root genres and leaf genres.]
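One way to read the difference between leaf and root success rates is that a wrong leaf label can still fall under the correct root genre. The sketch below scores predictions both ways using the small taxonomy listed earlier. It is a simplification under stated assumptions: each recording has a single label, each leaf has a single parent, and the root result is derived from the leaf prediction, none of which the slides confirm. The example predictions are made up.

```python
# Small taxonomy from the slides: leaf category -> root category.
ROOT_OF = {
    "Bebop": "Jazz", "Jazz Soul": "Jazz", "Swing": "Jazz",
    "Rap": "Popular", "Punk": "Popular", "Country": "Popular",
    "Baroque": "Western Classical", "Modern Classical": "Western Classical",
    "Romantic": "Western Classical",
}


def success_rates(pairs):
    """Return (leaf_rate, root_rate) in percent for (true_leaf, predicted_leaf) pairs."""
    leaf_hits = sum(t == p for t, p in pairs)
    root_hits = sum(ROOT_OF[t] == ROOT_OF[p] for t, p in pairs)
    return 100 * leaf_hits / len(pairs), 100 * root_hits / len(pairs)


# Hypothetical predictions, made up for illustration only.
example = [("Bebop", "Bebop"), ("Swing", "Bebop"), ("Punk", "Rap"), ("Baroque", "Punk")]
leaf_rate, root_rate = success_rates(example)
```

Under this scoring the root rate can never be lower than the leaf rate, since every correct leaf is also a correct root.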


Success rates achieved in previous research

Audio results:

Many systems have been implemented

Generally only used 10 categories or fewer

Success rates generally below 80% for more than 5 categories

Symbolic results:

84% for 2-way classifications (Shan & Kuo 2003)

89% for 2-way classifications (Ponce de Leon & Inesta 2004)

63% for 3-way classifications (Chai & Vercoe 2001)

60-70% for 6-way classifications (Basili, Serafini & Stellato 2004)


Feature performance

| Feature Group | Number of Features | Weighting Scaled by Number of Features (%) |
|---|---|---|
| Instrumentation | 20 | 46.1 |
| Pitch | 25 | 24.5 |
| Rhythm | 30 | 14.3 |
| Melody | 18 | 11.6 |
| Texture | 14 | 1.7 |
| Dynamics | 4 | 1.6 |

Features based on instrumentation were assigned 46.1% of all weightings (after scaling)

At least one instrumentation feature played a major role in almost every classifier

Two of the top three features were based on instrumentation


Importance of instrumentation

Features based on instrumentation clearly dominant

A high-level abstraction of timbre

Implies that audio classification systems could benefit from instrument identification modules

Caveat:

These results present the overall averages of weightings

Other features played a dominant role in certain stages of classification

The best results were achieved by including a wide variety of features and applying feature weighting


Conclusions

Features based on instrumentation can play an essential role in automatic music classification, and should be used if possible

High-level features can produce good results, and should not be neglected in favour of low-level features

Bodhidharma’s large feature library combined with feature weighting is an effective approach

Very good genre classification success rates can be achieved with small taxonomies, and we are at least approaching a point where large taxonomies can be dealt with effectively

