Automatic Description and Classification of Instrumental Sounds
Geoffroy Peeters, Ircam (Analysis/Synthesis Team), [email protected]
1. Introduction

Musical Instrument Sound Classification
- Numerous studies on sound classification.
- Few of them address the problem of generalization of sound sources (recognition of the same source possibly recorded in different conditions, with various instrument manufacturers and players).

Evaluation of the system performance
- Training on a subset of the database, evaluation on the rest of the database.
- This does not prove any applicability to the classification of sounds which do not belong to the database.
- Martin [1999]: 76% (family), 39% for 14 instruments
- Eronen [2001]: 77% (family), 35% for 16 instruments

Goal of this study
- Study large database classification.
- How? New classification system:
  - extract a large amount of features
  - new feature selection algorithm
  - compare flat and hierarchical Gaussian classifiers
Outline: Feature extraction, Feature selection, Feature transform, Classification, Evaluation, Confusion matrix, Which features, Classes organization
2. Feature extraction

Features for sound recognition come from:
- the speech recognition community,
- previous studies on musical instrument sound classification,
- results of psycho-acoustical studies.
Each feature set is supposed to perform well for a specific task.

Principle:
1) extract a large set of features
2) filter the feature set a posteriori with a Feature Selection Algorithm
2. Feature extraction: Audio features taxonomy
- Global descriptors
- Instantaneous descriptors
- Temporal modeling: mean, variance, modulation (pitch, energy)

2. Feature extraction: Audio features taxonomy
- DT: temporal descriptors
- DE: energy descriptors
- DS: spectral descriptors
- DH: harmonic descriptors
- DP: perceptual descriptors
2. Feature extraction: DT/DE, Temporal/Energy descriptors
Signal flow: sound -> energy envelope
- DT.zero-crossing rate
- DT.auto-correlation
- DT.log-attack time
- DT.temporal increase
- DT.temporal decrease
- DT.temporal centroid
- DT.effective duration
- DE.total energy
- DE.energy of harmonic part
- DE.energy of noise part
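Three of the temporal descriptors listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 20%/90% attack thresholds and the use of a precomputed energy envelope are choices made here, not details given in the slides.

```python
import math


def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def temporal_centroid(envelope, sample_rate):
    """Energy-weighted mean time of the envelope, in seconds."""
    total = sum(envelope)
    weighted = sum(i * e for i, e in enumerate(envelope))
    return weighted / (total * sample_rate)


def log_attack_time(envelope, sample_rate, lo=0.2, hi=0.9):
    """log10 of the time taken to rise from lo*max to hi*max.

    The thresholds lo and hi are assumed values for this sketch.
    """
    peak = max(envelope)
    start = next(i for i, e in enumerate(envelope) if e >= lo * peak)
    stop = next(i for i, e in enumerate(envelope) if e >= hi * peak)
    # Guard against a zero-length attack so that log10 stays finite.
    return math.log10(max(stop - start, 1) / sample_rate)
```

A plucked or struck sound would typically give a much smaller log-attack time than a bowed one, which is why such descriptors are natural candidates for the sustained / non-sustained separation.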
2. Feature extraction: DS, Spectral descriptors
Signal flow: sound -> window -> FFT
- DS.centroid, DS.spread, DS.skewness, DS.kurtosis
- DS.slope, DS.decrease, DS.roll-off
- DS.variation
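The first four spectral descriptors can be read as the statistical moments of the spectrum treated as a distribution over frequency. The sketch below assumes that reading and takes one frame of bin frequencies and magnitudes as input; the function name and the input layout are hypothetical.

```python
def spectral_moments(freqs, mags):
    """Return (centroid, spread, skewness, kurtosis) of one spectrum frame.

    freqs: bin centre frequencies; mags: non-negative magnitudes.
    The magnitudes are normalized to sum to 1, so the result does not
    depend on the recording level.
    """
    total = sum(mags)
    weights = [m / total for m in mags]
    centroid = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    spread = variance ** 0.5
    skewness = sum(
        (f - centroid) ** 3 * w for f, w in zip(freqs, weights)
    ) / spread ** 3
    kurtosis = sum(
        (f - centroid) ** 4 * w for f, w in zip(freqs, weights)
    ) / spread ** 4
    return centroid, spread, skewness, kurtosis
```

The same function could be applied to harmonic amplitudes or to Bark-band energies, which is presumably how the DH and DP variants of these descriptors relate to the DS ones.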
2. Feature extraction: DH, Harmonic descriptors
Signal flow: sound -> window -> FFT -> sinusoidal model
- DH.Centroid, DH.Spread, DH.Skewness, DH.Kurtosis
- DH.Slope, DH.Decrease, DH.Roll-off
- DH.Variation
- DH.Fundamental frequency
- DH.Noisiness, DH.OddEvenRatio, DH.Inharmonicity
- DH.Tristimulus
- DH.Deviation
2. Feature extraction: DP, Perceptual descriptors / DV, Various descriptors
Signal flow: sound -> window -> FFT -> perception (mid-ear filtering, Bark scale, Mel scale)
- DP.Centroid, DP.Spread, DP.Skewness, DP.Kurtosis
- DP.Slope, DP.Decrease, DP.Roll-off
- DP.Variation
- DP.Loudness, Relative Specific Loudness
- DP.Sharpness, DP.Spread
- DP.Roughness, DP.FluctuationStrength
- DV.MFCC, DV.Delta-MFCC, DV.Delta-Delta-MFCC
- DV.SpectralFlatness, DV.SpectralCrest
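Spectral flatness and spectral crest are commonly defined as the ratio of the geometric mean to the arithmetic mean, and the ratio of the maximum to the arithmetic mean. The sketch below uses those common definitions, which the slides do not spell out. The `floor` parameter is an assumption added here to keep the logarithm finite for empty bins.

```python
import math


def spectral_flatness(mags, floor=1e-12):
    """Geometric mean over arithmetic mean: near 1 for noise, near 0 for tones."""
    clipped = [max(m, floor) for m in mags]  # avoid log(0) = -infinity
    log_mean = sum(math.log(m) for m in clipped) / len(clipped)
    return math.exp(log_mean) / (sum(clipped) / len(clipped))


def spectral_crest(mags):
    """Maximum over arithmetic mean: large when one peak dominates."""
    return max(mags) / (sum(mags) / len(mags))
```

Both values are ratios of magnitudes, so scaling the whole spectrum by a constant gain leaves them unchanged.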
2. Feature extraction: Audio features design
- No consensus on the use of amplitude and frequency scales.
- All features are computed using the following scales:
  - Frequency scale: linear / log / Bark bands
  - Amplitude scale: linear / power / log (note: log(0.0) = -infinity -> normalization 24 bits)
- Features must be independent of the recording level:
  - normalization in linear and in power scale
  - normalization in logarithmic scale
- Features must be independent of the sampling rate:
  - maximum frequency taken into account: 11025/2 Hz
  - resampling (for zcr, xcorr)
3. Feature selection algorithm (FSA)

Problem: using a high number of features
- some features can be irrelevant for the given task
- overfitting of the model to the training set (especially with LDA)
- classification models are difficult for humans to interpret

Goal of the feature selection algorithm (FSA): find the minimal set of features that satisfy
- criterion 1) informative features with respect to the classes
- criterion 2) features that provide non-redundant information

Forms of feature selection algorithm
- embedded: the FSA is part of the classifier
- filter: the FSA is distinct from the classifier and used before the classifier
- wrapper: the FSA makes use of the classification results
3. Feature selection algorithm: IRMFSP (Inertia Ratio Maximization using Feature Space Projection)

Criterion 1: informative features with respect to the classes
- Principle: feature values for sounds belonging to a specific class should be separated from the values for all the other classes.
- Measure: for a specific feature i, the ratio r = B/T of the between-class inertia B to the total inertia T.

Criterion 2: features that provide non-redundant information
- Apply an orthogonalization process of the feature space after the selection of each new feature (Gram-Schmidt orthogonalization).
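The two criteria above can be combined into a short greedy loop, sketched here in plain Python. Several details are assumptions of this sketch and are not stated in the slides: feature columns are centred before projection, the loop stops when the best remaining ratio falls below `t` times the first selected ratio (one plausible reading of the parameter t used in the experiments), and `max_desc` caps the number of selected features.

```python
def inertia_ratio(values, labels):
    """Between-class inertia B divided by total inertia T for one feature."""
    n = len(values)
    mean = sum(values) / n
    total = sum((v - mean) ** 2 for v in values) / n
    if total < 1e-12:  # constant (or fully projected-out) feature
        return 0.0
    between = 0.0
    for c in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == c]
        class_mean = sum(members) / len(members)
        between += len(members) / n * (class_mean - mean) ** 2
    return between / total


def irmfsp(columns, labels, t=0.01, max_desc=20):
    """Return the indices of the selected feature columns, in selection order."""
    # Centre every column (assumption of this sketch).
    cols = [[v - sum(c) / len(c) for v in c] for c in columns]
    selected, first_ratio = [], None
    while len(selected) < min(max_desc, len(cols)):
        ratios = {
            j: inertia_ratio(cols[j], labels)
            for j in range(len(cols))
            if j not in selected
        }
        best = max(ratios, key=ratios.get)
        if first_ratio is None:
            first_ratio = ratios[best]
        if first_ratio <= 0.0 or ratios[best] / first_ratio < t:
            break
        selected.append(best)
        # Gram-Schmidt step: remove from every remaining feature its
        # component along the feature that was just selected.
        norm2 = sum(v * v for v in cols[best])
        for j in ratios:
            if j != best:
                coef = sum(a * b for a, b in zip(cols[j], cols[best])) / norm2
                cols[j] = [a - coef * b for a, b in zip(cols[j], cols[best])]
    return selected
```

With this projection step, a feature that is a scaled copy of an already selected one collapses to zero and is never picked, which is the non-redundancy behaviour that criterion 2 asks for.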
3. Feature selection algorithm: IRMFSP

Example: sustained / non-sustained sound separation
- Computation of the B/T ratio for each feature:
  - feature with the weakest ratio (r = 6.9e-6): specific loudness m8 mean
  - feature with the highest ratio (r = 0.58): energy temporal decrease
- First three selected dimensions:
  - 1st dim: temporal decrease
  - 2nd dim: spectral centroid
  - 3rd dim: temporal increase
4. Feature transformation: LDA

Linear Discriminant Analysis
- Find linear combinations among the features in order to maximize discrimination between classes: F -> F'
- Total inertia: $T = \frac{1}{N} \sum_{n} (x_n - m)(x_n - m)^T$
- Between-class inertia: $B = \sum_{k} \frac{N_k}{N} (m_k - m)(m_k - m)^T$
- Transform the initial feature space F by a transformation matrix U in order to maximize the ratio $\frac{U^T B U}{U^T T U}$
- Solution: eigenvectors of $T^{-1} B$ associated to the largest eigenvalues (discriminative power)
5. Class modeling: flat classifiers

Flat Gaussian classifier (F-GC)
- Flat = all classes are considered on the same level.
- Training: model each class k by a multi-dimensional Gaussian pdf (mean vector, covariance matrix).
- Evaluation: Bayes formula.

Flat KNN classifier (F-KNN)
- Instance-based algorithm.
- Assign to the input sound the majority class among its K nearest neighbors in the feature space.
- Euclidean distance => weighting of the axes?
- Apply it to the output of the LDA (implicit weighting of the axes).
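A flat Gaussian classifier of the kind described above can be sketched as follows. To stay short and dependency-free, this sketch uses a diagonal covariance per class, which is a simplification of the full covariance matrix mentioned in the slides. The function names and the variance floor are likewise assumptions.

```python
import math
from collections import defaultdict


def train_gaussians(samples, labels, var_floor=1e-6):
    """Fit one diagonal Gaussian per class; returns {class: (log_prior, mean, var)}."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    models = {}
    for y, rows in by_class.items():
        n = len(rows)
        mean = [sum(col) / n for col in zip(*rows)]
        var = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(zip(*rows), mean)
        ]
        models[y] = (math.log(n / len(samples)), mean, var)
    return models


def log_posterior(x, model):
    """Unnormalized log posterior: log prior + Gaussian log likelihood (Bayes)."""
    log_prior, mean, var = model
    log_likelihood = sum(
        -0.5 * (math.log(2 * math.pi * s) + (v - m) ** 2 / s)
        for v, m, s in zip(x, mean, var)
    )
    return log_prior + log_likelihood


def classify(x, models):
    """Pick the class with the highest posterior probability."""
    return max(models, key=lambda y: log_posterior(x, models[y]))
```

Because every class competes in a single `max`, all classes sit on the same level, which is what "flat" means here.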
5. Class modeling: hierarchical classifiers

Hierarchical Gaussian classifier (H-GC)
- Training: a tree of flat Gaussian classifiers; each node has its own FSA, FTA and F-GC.
  - Tree construction is supervised (unlike a decision tree).
  - Only the subset of sounds belonging to the classes of the current node is used.
- Evaluation: the local probability decides which branch of the tree to follow.
- Advantages of H-GC:
  - Learning facilities: it is easier to learn differences in a small subset of classes.
  - Reduced class confusion: benefit from the higher recognition rate at the higher levels of the tree.

Hierarchical KNN classifier (H-KNN)
Decision trees: Binary Entropy Reduction Tree (BERT), C4.5, Partial Decision Tree (PART)
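The hierarchical scheme can be sketched as a supervised tree in which every node trains its own local classifier on the sounds below it. In this sketch the local classifier is a simple nearest-mean rule standing in for the per-node FSA, FTA and F-GC of the slides. The nested-dict taxonomy format and all function names are assumptions made for illustration.

```python
def leaves(name, subtree):
    """Set of leaf class names below a taxonomy node (None marks a leaf)."""
    if subtree is None:
        return {name}
    found = set()
    for child, sub in subtree.items():
        found |= leaves(child, sub)
    return found


def fit_nearest_mean(samples, labels):
    """Stand-in local classifier: predicts the label with the closest mean."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    means = {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in groups.items()
    }

    def predict(x):
        return min(
            means, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y]))
        )

    return predict


def train_tree(subtree, samples, labels):
    """Train one local classifier per node, using only that node's sounds."""
    branch_of = {
        leaf: child
        for child, sub in subtree.items()
        for leaf in leaves(child, sub)
    }
    keep = [(x, y) for x, y in zip(samples, labels) if y in branch_of]
    local = fit_nearest_mean(
        [x for x, _ in keep], [branch_of[y] for _, y in keep]
    )
    children = {
        child: train_tree(sub, [x for x, _ in keep], [y for _, y in keep])
        for child, sub in subtree.items()
        if sub is not None
    }
    return local, children


def classify_tree(node, x):
    """Follow the branch chosen by each local classifier down to a leaf."""
    local, children = node
    branch = local(x)
    return classify_tree(children[branch], x) if branch in children else branch
```

A two-level taxonomy such as `{"sustained": {"violin": None, "flute": None}, "non-sustained": {"piano": None, "guitar": None}}` would first decide between sustained and non-sustained, then pick an instrument inside the chosen branch.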
6. Evaluation: Taxonomy used

Three different levels
- T1: sustained / non-sustained sounds
- T2: instrument families
- T3: instrument names

6. Evaluation: Test set

6 databases
- Ircam Studio OnLine (1323 sounds, 16 instruments)
- Iowa University database (816 sounds, 12 instruments)
- McGill University database (585 sounds, 23 instruments)
- Microsoft Musical Instruments CD-ROM (216 sounds, 20 instruments)
- two commercial databases: Pro (532 sounds, 20 instruments) and Vi (691 sounds, 18 instruments)
Total = 4163 sounds.

Notes:
- 27 instruments have been considered
- a large pitch range has been considered (4 octaves on average)
- no muted or martelé/staccato sounds
[Figures: number of sounds per instrument and per instrument family in each of the six databases (SOL, Iowa, McGill, Microsoft, Pro, Vi). Instrument families in the embedded data: struckedstr, pluckedstr, pizzstr, bowedstr, brass, airreed, singledoublereed.]
6. Evaluation: Evaluation process

1) Random 66%/33% partition of the database (50 sets)
2) One to One (O2O) [Livshin 2003]: each database is used in turn to classify all the other databases
3) Leave One Database Out (LODO) [Livshin 2003]: all databases except one are used in turn to classify the remaining one
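The three protocols differ only in how training and test material are paired, which can be sketched as split generators. The function names, the seed handling and the rounding of the 66% cut are assumptions of this sketch. The database names are the ones listed in the test set.

```python
import random
from itertools import permutations

DATABASES = ["SOL", "Iowa", "McGill", "Microsoft", "Pro", "Vi"]


def random_partition(items, train_fraction=2 / 3, seed=0):
    """Protocol 1: one random 66%/33% split of a single database."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def one_to_one(databases):
    """Protocol 2 (O2O): train on one database, test on each other one."""
    return [([train], test) for train, test in permutations(databases, 2)]


def leave_one_database_out(databases):
    """Protocol 3 (LODO): train on all databases but one, test on that one."""
    return [
        ([d for d in databases if d != held_out], held_out)
        for held_out in databases
    ]
```

With six databases, `one_to_one` yields 6 * 5 = 30 train/test pairs and `leave_one_database_out` yields 6, so every LODO model sees five recordings of each instrument where an O2O model sees one.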
6. Evaluation: Results O2O (II)

[Embedded spreadsheet, last update 2003/04/18: recognition rates per training and test database (sol, iowa, mcgill, microsoft, prosonus, vitus) for the 66%/33%, O2O and LODO protocols, comparing CFS and IRMFSP feature selection and flat Gaussian, hierarchical Gaussian, flat and hierarchical KNN (K = 1 to 20) and decision tree (BERT, weka J48, weka PART) classifiers.]
Database: SOL
Evaluation: 66%/33% (50 sets)
Algorithm: FGC manudesc

                                 T1          T2          T3
LDA                              96          89          86
CFS weka                         99.0 (0.5)  93.2 (0.8)  60.8 (12.9)
IRMFSP (t=0.01, nbdescmax=20)    99.2 (0.4)  95.8 (1.2)  95.1 (1.2)

Databases: all
Evaluation: O2O (mean value over the 30 (6*5) experiments)
Feature selection: IRMFSP t=0.05, maxdesc=10 manudesc

                T1   T2   T3
F-GC            89   57   30
H-GC            93   63   38

Discussion
- Low recognition rate for O2O compared to 66%/33% -> problem of generalization?
- The system mainly learns the instrument instance instead of the instrument (each database contains a single instance of an instrument).

Databases: all
Evaluation: LODO (mean value over the 6 left-out databases)
Feature selection: IRMFSP t=0.01, maxdesc=40 manudesc
- Goal: to increase the number of instances of each instrument.
- How: by combining several databases.

                T1   T2   T3
F-GC            98   78   53
F-GC (G+LDA)    99   81   52
H-GC            98   80   57
H-GC (G+LDA)    99   85   64
6. Evaluation: Confusion matrix

Low confusion between sustained / non-sustained sounds.

[Figure: confusion matrix. Columns: original class; rows: recognized class; last row: number of sounds per class. Classes: piano, guitar, harp, viola-pizz, bass-pizz, cello-pizz, violin-pizz, viola, bass, cello, violin, french-horn, cornet, trombone, trumpet, tuba, flute, piccolo, recorder, bassoon, clarinet, english-horn, oboe.]
6. Evaluation: Confusion matrix

The largest confusions occur inside each instrument family.
6. Evaluation: Confusion matrix

The lowest recognition rates correspond to the smallest training sets.
6. Evaluation: Confusion matrix

Confusion between piano and guitar / harp.
6. Evaluation: Confusion matrix

Cross-family confusions
- Cornet -> Bassoon
- Cornet -> English-horn
- Flute -> Clarinet
- Oboe -> Flute
- Trombone -> Flute
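Cross-family confusions such as the ones listed above can be extracted mechanically from the pairs of original and recognized classes. The sketch below assumes a family mapping that follows the grouping of the sound-count data (brass, airreed, singledoublereed). The function names are hypothetical and the labels in the usage note are toy values, not results from the study.

```python
from collections import Counter

# Family of each instrument (subset, following the grouping used
# for the sound counts per database).
FAMILY = {
    "cornet": "brass",
    "trombone": "brass",
    "flute": "airreed",
    "bassoon": "singledoublereed",
    "clarinet": "singledoublereed",
    "english-horn": "singledoublereed",
    "oboe": "singledoublereed",
}


def confusion_counts(original, recognized):
    """Count every (original class, recognized class) pair."""
    return Counter(zip(original, recognized))


def cross_family(confusions, family=FAMILY):
    """Keep only the pairs whose two classes belong to different families."""
    return {
        pair: count
        for pair, count in confusions.items()
        if family[pair[0]] != family[pair[1]]
    }
```

For the toy labels `original = ["cornet", "cornet", "flute", "oboe"]` and `recognized = ["bassoon", "trombone", "flute", "flute"]`, only the pairs cornet -> bassoon and oboe -> flute are cross-family; cornet -> trombone stays inside the brass family.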
6. Evaluation: Main selected features

By FSA (IRMFSP)
Graph1
Feuil1
LAST UPDATE 2003/04/18
TODOOno2One en fat gaussian nogaussianity,noanadisc (ca risque d'tre long: 36 expriences * 3 niveaux)
LODO fatgaussian et hiergaussian gaussianity,noanadisc
Diffrence avec Classresult2: erase des pizz cello dans sustained dans Prosonus
66/33
Test Feature Selection
nogaussianty, noanadiscMETAeval_Fsinertiecfs_Dbsol (_manudesc)
sol
CFS99.53 (0.47)92.56 (0.63)10.95 (3.68)
IRMFSP t=0.01 maxdesc=2099.65 (0.31)96.2 (1.28)95.95 (0.91)
remove Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scmCFS99.01 (0.50)93.25 (0.80)60.76 (12.87)
remove Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scmIRMFSP t=0.01 maxdesc=2099.25 (0.37)95.81 (1.22)95.17 (1.26)
Ono2one
Test paramtre FSinertie
Flat Gaussiannogaussianty, noanadiscMETAeval_CLfg_Dbmixed_O2O_10005
NEW IRMFSP t=0.05 maxdesc=10soliowamcgillmicrosoftprosonusvitus
sol by15151931390023.8
iowa by33351546450034.8
mcgill by31111931410026.6
microsoft by3393049350031.2
prosonus27232715460027.6
vitus by23202614390024.4
0029.40015.60026.60016.40039.20041.20028.0666666667
Flat Gaussiannogaussianty, noanadiscMETAeval_CLfg_Dbmixed_O2O_20005
NEW IRMFSP t=0.05 maxdesc=20soliowamcgillmicrosoftprosonusvitus
sol by216023320016.4
iowa by35161124360024.4
mcgill by43141033430028.6
microsoft by37121840310027.6
prosonus3826180390024.2
vitus by3524140370022
0037.60019.40014.4004.20031.40036.20023.8666666667
Flat Gaussiannogaussianty, noanadiscMETAeval_CLfg_Dbmixed_O2O_40001
NEW IRMFSP t=0.01 maxdesc=40soliowamcgillmicrosoftprosonusvitus
sol by1150912007.4
iowa by3822114220019.4
mcgill by42151013210020.2
microsoft by44121215260021.8
prosonus30880250014.2
vitus by23108090010
0035.40011.20011004.200100021.20015.5
Fsinertie parameter test, with remove Dei,
All three tables below: remove Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm

Flat Gaussian, nogaussianity, noanadisc (METAeval_CLfg_Dbmixed_O2O_10005_manudesc _manudesc1 _manudesc2)
NEW IRMFSP t=0.05 maxdesc=10. Cells: T1/T2/T3 (%).

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               95/45/22        98/63/16        90/65/19        94/60/31        85/56/36        92.4/57.8/24.8
iowa by       96/51/33        -               96/65/35        94/50/15        97/65/42        83/65/54        93.2/59.2/35.8
mcgill by     72/42/27        86/42/39        -               96/44/19        92/64/30        84/71/41        86/52.6/31.2
microsoft by  75/57/36        94/48/36        98/70/31        -               94/64/44        84/71/39        89/62/37.2
prosonus      86/44/22        95/30/27        94/77/28        92/56/15        -               88/75/43        91/56.4/27
vitus by      44/34/12        83/43/25        88/77/26        85/54/14        99/58/40        -               79.8/53.2/23.4
mean          74.6/45.6/26    90.6/41.6/29.8  94.8/70.4/27.2  91.4/53.8/16.4  95.2/62.2/37.4  84.8/67.6/42.6  88.57/56.87/29.9

Flat Gaussian, nogaussianity, noanadisc (METAeval_CLfg_Dbmixed_O2O_20005_manudesc)
NEW IRMFSP t=0.05 maxdesc=20. Only the third level (T3, %) is filled; T1 and T2 are 0 in the sheet.

              sol        iowa       mcgill     microsoft  prosonus   vitus      mean
sol by        -          26         8          0          24         35         18.6
iowa by       43         -          24         11         26         35         27.8
mcgill by     34         43         -          10         34         40         32.2
microsoft by  34         31         23         -          40         36         32.8
prosonus      24         31         18         0          -          42         23
vitus by      27         30         16         0          38         -          22.2
mean          32.4       32.2       17.8       4.2        32.4       37.6       26.1

Flat Gaussian, nogaussianity, noanadisc (METAeval_CLfg_Dbmixed_O2O_40001_manudesc)
NEW IRMFSP t=0.01 maxdesc=40. Only the third level (T3, %) is filled; T1 and T2 are 0 in the sheet.

              sol        iowa       mcgill     microsoft  prosonus   vitus      mean
sol by        -          19         5          0          9          16         9.8
iowa by       42         -          22         11         4          22         20.2
mcgill by     39         34         -          10         13         20         23.2
microsoft by  44         26         10         -          9          25         22.8
prosonus      31         41         8          0          -          25         21
vitus by      18         22         7          0          3          -          10
mean          34.8       28.4       10.4       4.2        7.6        21.6       17.83
Cells in the four tables below: T1/T2/T3 (%).

OK NEW Hierarchical Gaussian, nogaussianity, noanadisc (METAeval_CLhg_Dbmixed_O2O_10005)
IRMFSP t=0.05 maxdesc=10

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               100/35/21       98/69/39        96/53/20        86/59/32        100/79/44       96/59/31.2
iowa by       97/41/22        -               91/46/38        98/57/28        19/10/6         90/67/36        79/44.2/26
mcgill by     71/50/21        84/35/14        -               100/58/36       97/69/44        94/72/48        89.2/56.8/32.6
microsoft by  70/54/38        99/38/14        100/68/48       -               99/73/45        97/87/69        93/64/42.8
prosonus      79/54/25        96/37/22        96/75/41        93/58/33        -               96/80/45        92/60.8/33.2
vitus by      35/27/13        84/32/15        100/69/40       94/58/32        95/63/41        -               81.6/49.8/28.2
mean          70.4/45.2/23.8  92.6/35.4/17.2  97/65.4/41.2    96.2/56.8/29.8  79.2/54.8/33.6  95.4/77/48.4    88.47/55.77/32.33

OK NEW Hierarchical Gaussian, nogaussianity, noanadisc (METAeval_CLhg_Dbmixed_O2O_10001)
IRMFSP t=0.01 maxdesc=10

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               99/35/21        98/69/39        98/61/25        95/60/32        100/79/43       98/60.8/32
iowa by       99/43/22        -               98/49/41        98/42/24        81/40/22        91/68/38        93.4/48.4/29.4
mcgill by     91/62/27        88/33/12        -               100/56/33       99/69/42        98/75/51        95.2/59/33
microsoft by  94/65/38        97/34/14        100/68/49       -               100/74/44       99/89/66        98/66/42.2
prosonus      92/63/25        96/36/20        94/74/40        92/54/29        -               98/82/47        94.4/61.8/32.2
vitus by      84/60/24        93/38/19        97/69/40        95/53/28        98/64/38        -               93.4/56.8/29.8
mean          92/58.6/27.2    94.6/35.2/17.2  97.4/65.8/41.8  96.6/53.2/27.8  94.6/61.4/35.6  97.2/78.6/49    95.4/58.8/33.1

OK NEW Hierarchical Gaussian, nogaussianity, noanadisc (METAeval_CLhg_Dbmixed_O2O_20005)
IRMFSP t=0.05 maxdesc=20

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               100/36/21       98/70/38        96/54/22        86/59/33        100/78/42       96/59.4/31.2
iowa by       97/46/25        -               91/49/36        98/48/26        19/8/4          90/70/36        79/44.2/25.4
mcgill by     71/50/21        84/35/10        -               100/54/36       97/70/42        94/73/47        89.2/56.4/31.2
microsoft by  70/53/31        99/38/6         100/76/50       -               99/74/49        97/89/69        93/66/41
prosonus      79/56/25        96/37/22        96/48/23        93/60/35        -               96/84/47        92/57/30.4
vitus by      35/27/14        84/33/14        100/46/20       94/57/30        95/66/39        -               81.6/45.8/23.4
mean          70.4/46.4/23.2  92.6/35.8/14.6  97/57.8/33.4    96.2/54.6/29.8  79.2/55.4/33.4  95.4/78.8/48.2  88.47/54.8/30.43

OK Hierarchical Gaussian, nogaussianity, noanadisc (METAeval_CLhg_Dbmixed_O2O_40001)
IRMFSP t=0.01 maxdesc=40

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               99/53/26        98/59/24        98/49/20        95/55/16        100/81/22       98/59.4/21.6
iowa by       99/54/26        -               98/57/13        98/37/21        81/40/5         91/75/30        93.4/52.6/19
mcgill by     91/52/19        88/39/10        -               100/32/16       99/52/15        98/70/35        95.2/49/19
microsoft by  94/69/35        97/51/19        100/71/24       -               100/65/14       99/90/37        98/69.2/25.8
prosonus      92/49/17        96/45/26        94/41/11        92/27/11        -               98/81/31        94.4/48.6/19.2
vitus by      84/51/19        93/52/23        97/43/10        95/29/13        98/59/16        -               93.4/46.8/16.2
mean          92/55/23.2      94.6/48/20.8    97.4/54.2/16.4  96.6/34.8/16.2  94.6/54.2/13.2  97.2/79.4/31    95.4/54.27/20.13
Fsinertie parameter test, with remove Dei,
All four tables below: remove Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm. Cells: T1/T2/T3 (%).

OK NEW Hierarchical Gaussian, nogaussianity, noanadisc (METAeval_CLhg_Dbmixed_O2O_10005_manudesc)
IRMFSP t=0.05 maxdesc=10

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               100/60/47       98/69/42        90/60/29        98/64/38        100/79/44       97.2/66.4/40
iowa by       95/46/22        -               96/57/40        93/59/33        99/55/25        86/65/37        93.8/56.4/31.4
mcgill by     66/50/30        84/58/41        -               96/56/36        96/68/42        94/69/42        87.2/60.2/38.2
microsoft by  76/55/39        99/73/47        98/75/49        -               99/74/49        99/91/68        94.2/73.6/50.4
prosonus      83/60/25        96/64/47        94/77/41        91/60/34        -               90/74/47        90.8/67/38.8
vitus by      34/27/14        84/54/38        88/65/32        85/50/27        99/65/38        -               78/52.2/29.8
mean          70.8/47.6/26    92.6/61.8/44    94.8/68.6/40.8  91/57/31.8      98.2/65.2/38.4  93.8/75.6/47.6  90.2/62.63/38.1

NEW Hierarchical Gaussian, nogaussianity, noanadisc (METAeval_CLhg_Dbmixed_O2O_10001_manudesc)
IRMFSP t=0.01 maxdesc=10

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               99/60/46        99/69/43        94/66/34        95/60/34        100/78/46       97.4/66.6/40.6
iowa by       96/48/22        -               98/57/40        98/48/27        99/57/26        88/68/40        95.8/55.6/31
mcgill by     69/49/29        88/57/41        -               93/52/31        97/68/41        96/71/43        88.6/59.4/37
microsoft by  78/56/39        97/64/41        99/76/50        -               99/73/47        99/90/69        94.4/71.8/49.2
prosonus      90/62/24        96/64/49        94/77/41        92/63/34        -               96/80/53        93.6/69.2/40.2
vitus by      55/38/21        93/60/40        94/69/34        91/53/29        98/64/38        -               86.2/56.8/32.4
mean          77.6/50.6/27    94.6/61/43.4    96.8/69.6/41.6  93.6/56.4/31    97.6/64.4/37.2  95.8/77.4/50.2  92.67/63.23/38.4

OK NEW Hierarchical Gaussian, nogaussianity, noanadisc (METAeval_CLhg_Dbmixed_O2O_20005_manudesc)
IRMFSP t=0.05 maxdesc=20

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               100/63/50       98/69/43        90/57/29        98/65/35        100/84/43       97.2/67.6/40
iowa by       95/62/39        -               96/52/31        93/54/31        99/55/25        86/63/33        93.8/57.2/31.8
mcgill by     66/51/30        84/52/35        -               96/52/32        96/69/37        94/72/47        87.2/59.2/36.2
microsoft by  76/57/45        99/66/42        98/81/48        -               99/75/48        99/93/67        94.2/74.4/50
prosonus      83/50/23        96/42/33        94/81/35        91/59/33        -               90/76/47        90.8/61.6/34.2
vitus by      34/27/13        84/43/27        88/67/32        85/55/31        99/67/38        -               78/51.8/28.2
mean          70.8/49.4/30    92.6/53.2/37.4  94.8/70/37.8    91/55.4/31.2    98.2/66.2/36.6  93.8/77.6/47.4  90.2/61.97/36.73

OK NEW Hierarchical Gaussian, nogaussianity, noanadisc (METAeval_CLhg_Dbmixed_O2O_40001_manudesc)
IRMFSP t=0.01 maxdesc=40

              sol             iowa            mcgill          microsoft       prosonus        vitus           mean
sol by        -               99/58/42        99/58/21        94/45/22        95/57/11        100/79/23       97.4/59.4/23.8
iowa by       96/63/38        -               98/64/20        98/34/22        99/58/14        88/71/26        95.8/58/24
mcgill by     69/47/26        88/51/35        -               93/35/19        97/53/17        96/70/38        88.6/51.2/27
microsoft by  78/57/37        97/57/34        99/75/25        -               99/64/16        99/88/39        94.4/68.2/30.2
prosonus      90/51/19        96/43/32        94/48/14        92/39/10        -               96/78/32        93.6/51.8/21.4
vitus by      55/31/16        93/51/29        94/49/11        91/40/20        98/57/22        -               86.2/45.6/19.6
mean          77.6/49.8/27.2  94.6/52/34.4    96.8/58.8/18.2  93.6/38.6/18.6  97.6/57.8/16    95.8/77.2/31.6  92.67/55.7/24.33
LODO
In the LODO tables each column is the database left out ("without"). Cells: T1/T2/T3 (%).

Flat Gaussian (METAeval_CLfg_Dbmixed_LODO_nomanudesc _gauss _anadisc _gaussanadisc)
IRMFSP t=0.01, max=40

                            sol         iowa        mcgill      microsoft   prosonus    vitus       mean
(no transform)              99/72/44    97/70/37    100/70/48   100/78/59   95/81/60    98/83/49    98.17/75.67/49.5
+ gaussianity               98/70/38    86/57/27    99/69/52    100/82/59   99/92/61    99/86/54    96.83/76/48.5
+ anadisc                   100/74/46   99/66/38    100/78/57   100/83/57   94/79/57    100/84/57   98.83/77.33/52
+ gaussianity + anadisc     99/77/44    93/62/33    99/79/58    100/88/64   99/88/62    99/87/58    98.17/80.17/53.17

Flat Gaussian (METAeval_CLfg_Dbmixed_LODO _gauss _anadisc _gaussanadisc)
remove Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm
IRMFSP t=0.01, max=40

                            sol         iowa        mcgill      microsoft   prosonus    vitus       mean
(no transform)              98/76/42    100/72/50   98/73/53    97/85/58    98/83/59    96/78/54    97.83/77.83/52.67
+ gaussianity               98/76/36    99/72/40    99/71/51    100/81/59   99/93/62    99/84/56    99/79.5/50.67
+ anadisc                   98/67/43    99/66/48    99/77/57    99/85/56    93/81/59    96/76/58    97.33/75.33/53.5
+ gaussianity + anadisc     99/81/41    100/65/41   99/81/51    100/90/55   99/89/60    95/79/64    98.67/80.83/52
Hierarchical Gaussian (METAeval_CLhg_Dbmixed_LODO 10-001 20-001 gaussanadisc)
remove Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm
IRMFSP t=0.01. Columns: database left out ("without"). Cells: T1/T2/T3 (%).

                                  sol         iowa        mcgill      microsoft   prosonus    vitus       mean
max=10                            98/74/53    100/57/48   98/68/50    97/79/57    98/78/51    96/75/52    97.83/71.83/51.83
max=10 + gaussianity + anadisc    99/73/53    100/73/62   99/66/48    100/84/65   99/82/53    95/74/58    98.67/75.33/56.5
max=20                            98/81/55    100/63/53   98/75/54    97/90/70    98/85/55    96/84/56    97.83/79.67/57.17
max=20 + gaussianity + anadisc    99/83/59    100/73/66   99/77/56    100/91/73   99/92/62    95/83/64    98.67/83.17/63.33

Hierarchical Gaussian (METAeval_CLhg_Dbmixed_LODO _gauss _anadisc _gaussanadisc)
IRMFSP t=0.01, max=40. Columns: database left out ("without"). Cells: T1/T2/T3 (%).

                            sol         iowa        mcgill      microsoft   prosonus    vitus       mean
(no transform)              99/77/49    97/66/54    100/70/49   100/93/72   95/86/57    98/85/53    98.17/79.5/55.67
+ gaussianity               98/75/49    85/64/41    99/76/51    100/90/63   99/89/61    99/82/54    96.67/79.33/53.17
+ anadisc                   100/78/56   99/71/61    100/81/59   100/92/75   94/88/64    100/92/69   98.83/83.67/64
+ gaussianity + anadisc     99/84/58    93/68/54    99/85/60    100/89/69   99/93/66    99/87/67    98.17/84.33/62.33

Hierarchical Gaussian (METAeval_CLhg_Dbmixed_LODO _gauss _anadisc _gaussanadisc)
remove Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm
IRMFSP t=0.01, max=40. Columns: database left out ("without"). Cells: T1/T2/T3 (%).

                            sol         iowa        mcgill      microsoft   prosonus    vitus       mean
(no transform)              98/75/52    100/70/54   98/75/52    97/86/64    98/86/59    96/86/60    97.83/79.67/56.83
+ gaussianity               98/78/54    99/73/56    99/81/57    100/88/61   99/89/59    99/78/49    99/81.17/56
+ anadisc                   98/75/59    99/67/58    99/81/58    99/87/68    93/86/64    96/89/68    97.33/80.83/62.5
+ gaussianity + anadisc     99/84/60    100/73/64   99/84/62    100/88/63   99/93/65    95/86/69    98.67/84.67/63.83
Flat KPPV (K nearest neighbours) (IRMFSP t=0.01 max=40) + anadisc, in newresult
LODO. Columns: database left out ("without"). Cells: T1/T2/T3 (%).

          sol         iowa        mcgill      microsoft   prosonus    vitus       mean
K=1       99/72/49    97/61/45    100/74/45   100/77/62   94/75/40    100/83/60   98.33/73.67/50.17
K=3       100/73/50   98/63/45    100/75/47   100/81/65   94/79/38    100/84/64   98.67/75.83/51.5
K=5       100/74/50   99/62/45    100/76/47   100/80/65   94/79/37    100/85/64   98.83/76/51.33
K=10      100/74/48   99/62/46    100/77/46   100/83/63   94/80/39    100/85/66   98.83/76.83/51.33
K=20      100/74/45   99/61/45    100/77/44   100/85/64   94/80/39    100/86/68   98.83/77.17/50.83

Note: swapped sound between cor and corn

Hierarchical KPPV (IRMFSP t=0.01 max=40) + anadisc
LODO. Columns: database left out ("without"). Cells: T1/T2/T3 (%).

          sol         iowa        mcgill      microsoft   prosonus    vitus       mean
K=1       99/75/52    97/74/60    100/81/59   100/89/70   94/87/64    100/91/65   98.33/82.83/61.67
K=3       100/76/53   98/73/63    100/81/59   100/90/72   94/87/66    100/92/68   98.67/83.17/63.5
K=5       100/75/53   99/73/64    100/81/58   100/92/72   94/88/65    100/92/70   98.83/83.5/63.67
K=10      100/76/54   99/74/65    100/80/58   100/93/74   94/88/64    100/92/70   98.83/83.83/64.17
K=10 *    98/82/59    100/75/65   99/77/54    100/91/71   99/91/61    95/82/63    98.5/83/62.17
K=20      100/76/52   99/74/66    100/79/57   100/93/75   94/88/62    100/92/71   98.83/83.67/63.83

* gaussianity K=10: IRMFSP t=0.01 max=20, remove Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm + gaussianity + anadisc
Decision tree, in newresult
LODO. Columns: database left out ("without"). Cells: T1/T2/T3 (%); the weka rows give only T2/T3 (T1 mean is 0 in the sheet).

                    sol         iowa        mcgill      microsoft   prosonus    vitus       mean
* BERT >0.01        96/52/41    97/55/38    90/64/31    92/69/39    98/67/37    95/69/41    94.67/62.67/37.83
weka-j48 (-M10)     58/32       59/37       58/36       67/41       73/39       66/49       63.5/39
weka-PART (-M10)    59/27       61/44       62/34       68/45       83/41       71/44       67.33/39.17
Flat Gaussian, ALL (with). Cells: T1/T2/T3 (%).

                                                 sol         iowa        mcgill      microsoft   prosonus    vitus
IRMFSP t=0.01, max=40                            100/97/95   99/95/94    100/97/94   100/99/95   99/95/90    100/97/95
IRMFSP t=0.01, max=40 + gaussianity              (empty)
IRMFSP t=0.01, max=40 + gaussianity + anadisc    (empty)

Hierarchical Gaussian, ALL (with): rows IRMFSP t=0.01, max=40; + gaussianity; + gaussianity + anadisc (all empty)

Comparison sheet (empty): columns sol, iowa, mcgill, microsoft, prosonus, vitus, ao; rows gausshier, kppvhier K=10
Database: SOL
Evaluation: 66%/33% (50 sets)
Algorithm: FGC manudesc

                                 T1          T2          T3
LDA                              96          89          86
CFS weka                         99.0 (0.5)  93.2 (0.8)  60.8 (12.9)
IRMFSP (t=0.01, nbdescmax=20)    99.2 (0.4)  95.8 (1.2)  95.1 (1.2)
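A minimal sketch of the 66%/33% protocol with 50 random sets, assuming each set is an independent random split and that the parenthesised values in the table are standard deviations over the sets (neither is stated on the slide). The function names are hypothetical, and the classifier itself is left out.

```python
import random
import statistics


def holdout_splits(n_items, n_sets=50, train_frac=2 / 3, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_train = round(n_items * train_frac)
    for _ in range(n_sets):
        idx = list(range(n_items))
        rng.shuffle(idx)
        yield sorted(idx[:n_train]), sorted(idx[n_train:])


def summarize(rates):
    """Mean and sample standard deviation of per-set recognition rates."""
    return statistics.mean(rates), statistics.stdev(rates)


# Usage: nine items split 6/3, fifty times.
splits = list(holdout_splits(9))
print(len(splits), len(splits[0][0]), len(splits[0][1]))

# Illustrative rates only (not results from the study).
mean, std = summarize([99.0, 100.0, 98.0])
print(f"{mean:.1f} ({std:.1f})")
```

Whether the original spreadsheet used the sample or the population standard deviation is not recoverable from the slide; `statistics.stdev` (sample) is used here as one option.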
Databases: all
Evaluation: O2O
Feature selection: IRMFSP t=0.05, maxdesc=10 manudesc

        T1   T2   T3
F-GC    89   57   30
H-GC    93   63   38
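A minimal sketch of how the O2O (one-to-one) experiments can be enumerated, assuming O2O means that a model built from one database is evaluated on one other database. Which of the two is the training side in the "sol by" rows is not stated in the deck, so the pair order below is an assumption.

```python
from itertools import permutations

DATABASES = ["sol", "iowa", "mcgill", "microsoft", "prosonus", "vitus"]


def o2o_pairs(databases):
    """Ordered (database_a, database_b) pairs with a != b."""
    return list(permutations(databases, 2))


def mean_rate(rates_by_pair):
    """Average a recognition rate over all evaluated pairs."""
    return sum(rates_by_pair.values()) / len(rates_by_pair)


pairs = o2o_pairs(DATABASES)
print(len(pairs))  # 6 databases give 6 * 5 = 30 ordered pairs
```

Six databases give 30 ordered pairs, which matches the 6 x 5 filled cells of each O2O table in the worksheet.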
Databases: all
Evaluation: LODO
Feature selection: IRMFSP t=0.01, maxdesc=40 manudesc

                T1   T2   T3
F-GC            98   78   53
F-GC (G+LDA)    99   81   52
H-GC            98   80   57
H-GC (G+LDA)    99   85   64
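A minimal sketch of the LODO splits, reading LODO as "leave one database out": each database in turn is held out for testing and the models are trained on the remaining ones. That reading and the helper name `lodo_splits` are assumptions; the sample identifiers are illustrative.

```python
def lodo_splits(samples_by_db):
    """Yield (held_out_db, train_samples, test_samples) for each database."""
    for held_out in samples_by_db:
        train = [
            sample
            for db, samples in samples_by_db.items()
            if db != held_out
            for sample in samples
        ]
        test = list(samples_by_db[held_out])
        yield held_out, train, test


# Toy data: sample identifiers per database (illustrative only).
toy = {
    "sol": ["sol_vln_1", "sol_flt_1"],
    "iowa": ["iowa_vln_1"],
    "mcgill": ["mcgill_trp_1", "mcgill_vln_1"],
}

for held_out, train, test in lodo_splits(toy):
    print(held_out, len(train), len(test))
```

With six databases this gives six experiments, one per "without" column in the LODO worksheet tables.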
Selected features per tree node (sol, iowa, mcgill, microsoft, prosonus, vitus; manudesc; inertie maxdesc-t=0.01; nogauss, noanadisc)
Nodes: sust/non-sust | non-sust | pluck string | pizz string | sust | bowed string | brass | reed | air reed | single | double
temporal increase | temporal decrease | temporal decrease | temporal decrease
temporal decrease | temporal centroid
temporal log-attack
spectral centroid | spectral centroid | spectral spread + std | spectral spread | spectral spread | spectral centroid | spectral centroid | spectral skewness | spectral centroid
spectral spread | spectral spread | spectral slope | sharpness | spectral skewness | spectral spread | spectral skewness | spectral kurtosis + std | spectral spread
spectral skewness | spectral variation + std | spectral kurtosis | spectral kurtosis + std | sharpness | spectral kurtosis std | spectral slope | spectral skewness
spectral slope | spectral variation | spectral skewness std | spectral variation std
spectral variation | spectral decrease std | spectral kurtosis
harmonic deviation | tristimulus + std | tristimulus | harmonic deviation | tristimulus | noisiness | harmonic deviation | tristimulus
tristimulus std | harmonic deviation
mfcc2,6 std | various mfcc | various mfcc | various mfcc | mfcc3,4,6 | xcorr 3, 6, 8 | xcorr3 | xcorr3
Nodes: sust./non-sust. | among non-sust. | among sust. | among bowed-string | among brass | among air reeds | among sing/dbl reeds
temporal increase | temporal decrease | temporal decrease | temporal decrease
temporal decrease | temporal centroid
temporal log-attack
spectral centroid | spectral centroid | spectral spread | spectral centroid | spectral centroid | spectral skewness | spectral centroid
spectral spread | spectral spread | spectral skewness | spectral spread | spectral skewness | spectral kurtosis + std | spectral spread
spectral skewness | spectral kurtosis + std | sharpness | spectral kurtosis std | spectral slope | spectral skewness
spectral variation | spectral skewness std | spectral variation std
spectral decrease std | spectral kurtosis
harmonic deviation | harmonic deviation | tristimulus | noisiness | harmonic deviation | tristimulus
tristimulus std | harmonic deviation
mfcc2,6 std | various mfcc | mfcc3,4,6 | xcorr 3, 6, 8 | xcorr3 | xcorr3
5. Evaluation
Main selected features
By decision tree (C4.5)
DTg_decr 0.079858 AND
DHg_mod_am > 0.000158 AND
DPi_specloud_m21-mm > 0.026443 AND
DPi_specloud_m5-mm -0.000202: vln (66.0/8.0)
7. Instrument Class Similarity
Goal: check that the proposed tree structure corresponds to a natural class organization
How?
- Most people use the Martin hierarchy
- 1) check the grouping among the decision tree leaves
- 2) MDS?
MDS on acoustic features? [Herrera AES114th]
- Compute the dissimilarity between each class
  How? Compute the between-group F-matrix between class models
- Observe the dissimilarity between the classes
  How? MDS (multi-dimensional scaling) analysis
- MDS preserves the distances between the data as much as possible and allows representing them in a lower-dimensional space
- MDS is usually used for representing dissimilarity judgements (timbre similarity); it is used here on acoustic features
- MDS (Kruskal's STRESS formula 1 scaling method)
- 3-dimensional space
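As a minimal sketch of the quantity named above, Kruskal's STRESS formula 1 measures how well the distances in a low-dimensional configuration reproduce the target dissimilarities. The sketch takes the raw dissimilarities as the targets (the metric simplification); a non-metric scaling would first replace them by disparities obtained through monotone regression. The function name and the toy data are assumptions.

```python
import math
from itertools import combinations


def stress1(dissim, points):
    """Kruskal's stress-1 of a configuration.

    dissim: square symmetric matrix of target dissimilarities
    points: one coordinate tuple per class (the MDS configuration)
    """
    pairs = list(combinations(range(len(points)), 2))
    dist = {(i, j): math.dist(points[i], points[j]) for i, j in pairs}
    num = sum((dist[i, j] - dissim[i][j]) ** 2 for i, j in pairs)
    den = sum(dist[i, j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)


# Toy example: three classes placed on a line, all target dissimilarities 1.
toy_dissim = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
toy_points = [(0.0,), (1.0,), (2.0,)]
print(round(stress1(toy_dissim, toy_points), 3))
```

A stress of 0 means the configuration reproduces the target dissimilarities exactly; an MDS procedure searches for the 3-dimensional configuration that minimises this value.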
7. Instrument Class Similarity
Clusters?
- non-sustained sounds
Class abbreviations:
PIAN    Piano
GUI     Guitar
HARP    Harp
VLNP    Violin pizz
VLAP    Viola pizz
CELLP   Cello pizz
DBLP    Double bass pizz
VLN     Violin (bowed)
VLA     Viola (bowed)
CELL    Cello (bowed)
DBL     Double bass (bowed)
TRPU    Trumpet
COR     Cornet
TBTB    Trombone
FHOR    French-horn
TUBB    Tuba
FLTU    Flute
PICC    Piccolo
RECO    Recorder
CLA     Clarinet
SAXTE   Tenor sax
SAXAL   Alto sax
SAXSO   Soprano sax
ACC     Accordion
OBOE    Oboe
BS      Bassoon
EHOR    English-horn
7. Instrument Class Similarity
Clusters?
- non-sustained sounds
- bowed-string sounds
- brass sounds (TRPU?)
- mix between single/double reeds and brass instruments
7. Instrument Class Similarity
Dimension 1: separates sustained sounds / non-sustained sounds
- negative values: PIAN, GUI, HARP, VLNP, VLAP, CELLP, DBLP
- -> attack time, decrease time
Dimension 2: brightness
- dark sounds: TUBB, BSN, TBTB, FHOR
- bright sounds: PICC, CLA, FLUT
- problem: DBL?
Dimension 3: ?
- separation of bowed strings (VLN, VLA, CELL, DBL)
- amount of modulation?
Conclusion
State of the art
- Martin [1999]: 76% (family), 39% for 14 instruments
- Eronen [2001]: 77% (family), 35% for 16 instruments
This study
- 85% (family), 64% for 23 instruments
- increased recognition rates are mainly explained by the use of new features
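A minimal sketch of how a family-level and an instrument-level recognition rate can be computed from the same instrument-level predictions. This is the flat-classifier reading: in a hierarchical classifier the family decision would come from the family-level node instead. The family map is an assumption that uses a few of the deck's class codes, and the labels are illustrative, not results from the study.

```python
# Hypothetical family map over a few of the class codes used in the deck.
FAMILY = {
    "VLN": "bowed strings",
    "VLA": "bowed strings",
    "TRPU": "brass",
    "TBTB": "brass",
    "FLTU": "air reeds",
    "PICC": "air reeds",
}


def recognition_rates(true_labels, predicted_labels):
    """Return (family_rate, instrument_rate) in percent."""
    n = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))
    instrument_hits = sum(t == p for t, p in pairs)
    family_hits = sum(FAMILY[t] == FAMILY[p] for t, p in pairs)
    return 100.0 * family_hits / n, 100.0 * instrument_hits / n


# Illustrative labels only.
true_labels = ["VLN", "VLA", "TRPU", "FLTU"]
predicted = ["VLN", "VLN", "TBTB", "TRPU"]
print(recognition_rates(true_labels, predicted))
```

A within-family error (VLA recognised as VLN) lowers the instrument rate but not the family rate, which is why the family-level figures are higher than the instrument-level ones.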
Perspectives
- derive the tree structure automatically (analysis of decision tree?)
- test other classification algorithms (GMM, SVM, ...)
- test the system for other sound classes (non-instrumental sounds, sound FX)
- extend the system to musical phrases
- extend the system to polyphonic sounds
- extend the system to multi-source sounds
Links:
http://www.cuidado.mu
http://www.cs.waikato.ac.nz/ml/weka/
Ircam - CUIDADO I.S.T.