Description et Classification automatique des sons instrumentaux


  • Description et Classification automatique des sons instrumentaux Geoffroy Peeters Ircam (Analysis/Synthesis Team) [email protected]

    1. Introduction

    Musical Instrument Sound Classification
      • numerous studies on sound classification
      • few of them address the problem of generalization of sound sources (recognition of the same source, possibly recorded in different conditions, with various instrument manufacturers and players)

    Evaluation of the system performance
      • training on a subset of the database, evaluation on the rest of the database
      • does not prove any applicability to the classification of sounds that do not belong to the database
      • Martin [1999]: 76% (family), 39% for 14 instruments
      • Eronen [2001]: 77% (family), 35% for 16 instruments

    Goal of this study
      • study classification on a large database
      • How? New classification system:
          • extract a large number of features
          • new feature selection algorithm
          • compare flat and hierarchical Gaussian classifiers

      • Feature extraction
      • Feature selection
      • Feature transform
      • Classification
      • Evaluation
      • Confusion matrix
      • Which features
      • Classes organization

    2. Feature extraction

    Features for sound recognition come from:
      • the speech recognition community
      • previous studies on musical instrument sound classification
      • results of psycho-acoustical studies
    Each feature set is supposed to perform well for a specific task.

    Principle:
      1) extract a large set of features
      2) filter the feature set a posteriori with a Feature Selection Algorithm

    2. Feature extraction: audio features taxonomy

      • Global descriptors
      • Instantaneous descriptors
      • Temporal modeling: mean, variance, modulation (pitch, energy)

      • DT: temporal descriptors
      • DE: energy descriptors
      • DS: spectral descriptors
      • DH: harmonic descriptors
      • DP: perceptual descriptors

    2. Feature extraction: DT/DE, temporal/energy descriptors

    Diagram labels: sound, energy envelope

      • DT.zero-crossing rate
      • DT.auto-correlation
      • DT.log-attack time
      • DT.temporal increase
      • DT.temporal decrease
      • DT.temporal centroid
      • DT.effective duration

      • DE.total energy
      • DE.energy of harmonic part
      • DE.energy of noise part
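Two of the temporal descriptors listed above can be sketched in a few lines. This is a minimal illustration with NumPy, not the author's implementation: the function names, the test signal and the use of the squared signal as energy envelope are assumptions made for the example.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.signbit(signal)
    return float(np.mean(signs[1:] != signs[:-1]))

def temporal_centroid(envelope, sample_rate):
    """Energy-weighted mean time (in seconds) of an energy envelope."""
    times = np.arange(len(envelope)) / sample_rate
    return float(np.sum(times * envelope) / np.sum(envelope))

# Hypothetical test signal: one second of a decaying 100 Hz tone.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100 * t) * np.exp(-5 * t)

zcr = zero_crossing_rate(tone)          # about 200 crossings per 8000 samples
tc = temporal_centroid(tone ** 2, sr)   # squared signal used as a crude envelope
```

A decaying sound has its temporal centroid close to the onset, while a sustained sound has it near the middle of the note, which is why this kind of descriptor helps with the sustained/non-sustained distinction.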

    2. Feature extraction: DS, spectral descriptors

    Diagram labels: sound, window, FFT

      • DS.centroid, DS.spread, DS.skewness, DS.kurtosis
      • DS.slope, DS.decrease, DS.roll-off
      • DS.variation
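As an illustration of the window, FFT and descriptor chain, the sketch below computes the spectral centroid, spread and roll-off of one frame. It is a hedged example: the Hann window, the normalization of the amplitude spectrum into a distribution and the 95% roll-off fraction are assumptions, since the slides do not give these details.

```python
import numpy as np

def spectral_moments(frame, sample_rate):
    """Centroid and spread (Hz) of the windowed amplitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    weights = spectrum / np.sum(spectrum)        # spectrum seen as a distribution
    centroid = np.sum(freqs * weights)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * weights))
    return float(centroid), float(spread)

def spectral_rolloff(frame, sample_rate, fraction=0.95):
    """Frequency below which `fraction` of the spectral power lies (assumed 95%)."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    cumulative = np.cumsum(power)
    index = np.searchsorted(cumulative, fraction * cumulative[-1])
    return float(freqs[index])

# Hypothetical frame: 1024 samples of a 1000 Hz sine at 11025 Hz.
sr = 11025
frame = np.sin(2 * np.pi * 1000 * np.arange(1024) / sr)
centroid, spread = spectral_moments(frame, sr)
rolloff = spectral_rolloff(frame, sr)
```

Skewness and kurtosis follow the same pattern, using the third and fourth centered moments of the same distribution.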

    2. Feature extraction: DH, harmonic descriptors

    Diagram labels: sound, window, FFT, sinusoidal model

      • DH.Centroid, DH.Spread, DH.Skewness, DH.Kurtosis
      • DH.Slope, DH.Decrease, DH.Roll-off
      • DH.Variation

      • DH.Fundamental frequency
      • DH.Noisiness, DH.OddEvenRatio, DH.Inharmonicity
      • DH.Tristimulus
      • DH.Deviation

    2. Feature extraction: DP, perceptual descriptors / DV, various descriptors

    Diagram labels: sound, window, FFT, perception, mid-ear filtering, Bark scale, Mel scale

      • DP.Centroid, DP.Spread, DP.Skewness, DP.Kurtosis
      • DP.Slope, DP.Decrease, DP.Roll-off
      • DP.Variation

      • DP.Loudness, Relative Specific Loudness
      • DP.Sharpness, DP.Spread
      • DP.Roughness, DP.FluctuationStrength

      • DV.MFCC, DV.Delta-MFCC, DV.Delta-Delta-MFCC
      • DV.SpectralFlatness, DV.SpectralCrest

    2. Feature extraction: audio features design

    No consensus on the use of amplitude and frequency scales. All features are computed using the following scales:
      • Frequency scale: linear / log / Bark bands
      • Amplitude scale: linear / power / log
        note: log(0.0) = -infinity -> normalization to 24 bits

    Features must be independent of the recording level
      • normalization in linear and in power scale
      • normalization in logarithmic scale

    Features must be independent of the sampling rate
      • maximum frequency taken into account: 11025/2 Hz
      • resampling (for zcr, xcorr)


    3. Feature selection algorithm (FSA)

    Problem: using a high number of features
      • some features can be irrelevant for the given task
      • overfitting of the model to the training set (especially with LDA)
      • classification models are difficult for humans to interpret

    Goal of the feature selection algorithm (FSA): find the minimal set of features that satisfy
      criterion 1) informative features with respect to the classes
      criterion 2) features that provide non-redundant information

    Forms of feature selection algorithm
      • embedded: the FSA is part of the classifier
      • filter: the FSA is distinct from the classifier and used before the classifier
      • wrapper: the FSA makes use of the classification results

    3. Feature selection algorithm: IRMFSP (Inertia Ratio Maximization using Feature Space Projection)

    Criterion 1: informative features with respect to the classes
      • principle: feature values for sounds belonging to a specific class should be separated from the values for all the other classes
      • measure: for a specific feature i, the ratio of the between-class inertia B to the total inertia T

    Criterion 2: features that provide non-redundant information
      • apply an orthogonalization of the feature space after the selection of each new feature (Gram-Schmidt orthogonalization)
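The two criteria can be combined in a short loop: pick the feature with the largest B/T ratio, project it out of all remaining features, and repeat. The sketch below is an illustration only. The parameter names `max_features` and `t` mirror the "maxdesc" and "t" settings that appear in the results, but the stopping rule (stop when the best ratio falls below `t` times the first selected ratio) and the numerical floor are assumptions, not details given in the slides.

```python
import numpy as np

def inertia_ratios(X, y, floor):
    """Ratio B/T of between-class to total inertia for every column of X."""
    mean_all = X.mean(axis=0)
    total = ((X - mean_all) ** 2).sum(axis=0)
    between = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]
        between += len(Xk) * (Xk.mean(axis=0) - mean_all) ** 2
    ratios = np.zeros(X.shape[1])
    usable = total > floor               # skip columns emptied by the projection
    ratios[usable] = between[usable] / total[usable]
    return ratios

def irmfsp(X, y, max_features=20, t=0.01):
    """Return the indices of the selected features, in selection order."""
    X = X - X.mean(axis=0)                       # centered working copy
    floor = 1e-10 * (X ** 2).sum(axis=0)         # assumed numerical floor
    selected, first_ratio = [], None
    for _ in range(min(max_features, X.shape[1])):
        ratios = inertia_ratios(X, y, floor)
        if selected:
            ratios[selected] = 0.0
        best = int(np.argmax(ratios))
        if ratios[best] <= 0.0:
            break
        if first_ratio is None:
            first_ratio = ratios[best]
        elif ratios[best] < t * first_ratio:     # assumed stopping rule
            break
        selected.append(best)
        g = X[:, best] / np.linalg.norm(X[:, best])
        X = X - np.outer(g, g @ X)               # Gram-Schmidt: remove direction g
    return selected

# Hypothetical data: feature 1 is an exact rescaling of feature 0, feature 2 is noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
informative = 4.0 * y + rng.normal(size=200)
X = np.column_stack([informative, 2.0 * informative, rng.normal(size=200)])
selected = irmfsp(X, y, max_features=3, t=0.01)
```

Because the redundant copy is reduced to zero by the projection, it can no longer be selected, which is the purpose of criterion 2.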

    3. Feature selection algorithm: IRMFSP

    Example: sustained/non-sustained sound separation
      • computation of the B/T ratio for each feature
      • feature with the weakest ratio (r = 6.9e-6): specific loudness m8 mean
      • feature with the highest ratio (r = 0.58): energy temporal decrease
      • first three selected dimensions:
          1st dim: temporal decrease
          2nd dim: spectral centroid
          3rd dim: temporal increase


    4. Feature transformation: LDA (Linear Discriminant Analysis)

    Find linear combinations of the features that maximize the discrimination between classes: F -> F'

      • T: total inertia
      • B: between-class inertia

    Transform the initial feature space F by a transformation matrix U chosen to maximize the ratio of the between-class inertia to the total inertia after projection, (U^T B U) / (U^T T U).

    Solution: the eigenvectors of T^-1 B, ordered by their associated eigenvalues (discriminative power)
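The eigenvector solution can be sketched with NumPy as follows. This is a minimal illustration under the standard textbook definitions of total and between-class inertia matrices; the function name and the synthetic data are assumptions made for the example.

```python
import numpy as np

def lda_transform_matrix(X, y, n_components):
    """Columns of U are the leading eigenvectors of T^-1 B."""
    mean_all = X.mean(axis=0)
    centered = X - mean_all
    T = centered.T @ centered / len(X)                 # total inertia matrix
    B = np.zeros_like(T)
    for label in np.unique(y):
        Xk = X[y == label]
        d = (Xk.mean(axis=0) - mean_all)[:, None]
        B += (len(Xk) / len(X)) * (d @ d.T)            # between-class inertia
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(T, B))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs.real[:, order], eigvals.real[order]

# Hypothetical data: two classes separated along the first of three axes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 150)
X = rng.normal(size=(300, 3))
X[:, 0] += np.where(y == 0, -3.0, 3.0)

U, power = lda_transform_matrix(X, y, n_components=1)
projected = (X - X.mean(axis=0)) @ U                   # F -> F'
```

With K classes, at most K - 1 eigenvalues are non-zero, so the transformed space has at most K - 1 useful dimensions.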


    5. Class modeling: flat classifiers

    Flat Gaussian classifier (F-GC)
      • flat = all classes are considered on the same level
      • training: model each class k by a multi-dimensional Gaussian pdf (mean vector, covariance matrix)
      • evaluation: Bayes formula

    Flat KNN classifier (F-KNN)
      • instance-based algorithm
      • assign to the input sound the majority class among its K nearest neighbors in the feature space
      • Euclidean distance => weighting of the axes?
      • applied to the output of the LDA (implicit weighting of the axes)
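Both flat classifiers fit in a short sketch. The code below is illustrative rather than the system described in the slides: the class and function names are hypothetical, the small diagonal term added to each covariance matrix is an assumption made for numerical stability, and class priors are estimated from the training counts.

```python
import numpy as np

class FlatGaussianClassifier:
    """One multi-dimensional Gaussian per class, decision by Bayes formula."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for label in self.classes_:
            Xk = X[y == label]
            cov = np.cov(Xk, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # assumed regularization
            self.params_.append((Xk.mean(axis=0), np.linalg.inv(cov),
                                 np.linalg.slogdet(cov)[1], np.log(len(Xk) / len(X))))
        return self

    def log_posterior(self, X):
        """Unnormalized log posterior of every class for every row of X."""
        scores = []
        for mean, inv_cov, logdet, log_prior in self.params_:
            d = X - mean
            mahalanobis = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            scores.append(-0.5 * (mahalanobis + logdet) + log_prior)
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_posterior(X), axis=1)]

def knn_predict(X_train, y_train, X_test, k=10):
    """Majority class among the k nearest neighbors (Euclidean distance)."""
    distances = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(distances, axis=1)[:, :k]
    predictions = []
    for neighbor_labels in y_train[nearest]:
        labels, counts = np.unique(neighbor_labels, return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Hypothetical data: two well-separated classes in two dimensions.
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 2)) + 6.0 * y[:, None]
gaussian_accuracy = np.mean(FlatGaussianClassifier().fit(X, y).predict(X) == y)
knn_accuracy = np.mean(knn_predict(X, y, X, k=10) == y)
```

Because the Euclidean distance treats every axis equally, the KNN function is meant to be fed LDA-transformed features, as the slide notes.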

    5. Class modeling: hierarchical classifiers

    Hierarchical Gaussian classifier (H-GC)
      • training: a tree of flat Gaussian classifiers; each node has its own FSA, FTA and F-GC
      • tree construction is supervised (as opposed to a decision tree)
      • only the subset of sounds belonging to the classes of the current node is used
      • evaluation: the local probability decides which branch of the tree to follow

    Advantages of H-GC
      • learning facilities: it is easier to learn differences within a small subset of classes
      • reduced class confusion: benefits from the higher recognition rate at the higher levels of the tree

    Hierarchical KNN classifier (H-KNN)

    Decision trees: Binary Entropy Reduction Tree (BERT), C4.5, Partial Decision Tree (PART)
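The hierarchical scheme (a supervised tree in which each node trains its own classifier on the sounds of its own sub-classes) can be sketched as below. To keep the example short, a nearest-class-mean classifier stands in for the per-node feature selection, transform and Gaussian model; that substitution, the nested-dict tree format and all names are assumptions made for illustration.

```python
import numpy as np

class NearestMean:
    """Stand-in for the per-node classifier (not the F-GC itself)."""

    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.means = np.array([X[y == label].mean(axis=0) for label in self.labels])
        return self

    def predict_one(self, x):
        return self.labels[int(np.argmin(((self.means - x) ** 2).sum(axis=1)))]

def leaf_names(name, subtree):
    """All leaf classes found under a node."""
    if not subtree:
        return [name]
    return [leaf for child, sub in subtree.items() for leaf in leaf_names(child, sub)]

def train_node(subtree, X, y):
    """Train one classifier per internal node, using only that node's sounds."""
    if not subtree:
        return None
    child_of = {leaf: child for child, sub in subtree.items()
                for leaf in leaf_names(child, sub)}
    mask = np.isin(y, list(child_of))
    node_labels = np.array([child_of[label] for label in y[mask]])
    model = NearestMean().fit(X[mask], node_labels)
    return model, {child: train_node(sub, X, y) for child, sub in subtree.items()}

def classify(node, x, name=None):
    """Follow the locally best branch until a leaf class is reached."""
    if node is None:
        return name
    model, children = node
    child = model.predict_one(x)
    return classify(children[child], x, child)

# Hypothetical two-level taxonomy and synthetic 2-D features.
taxonomy = {"non-sustained": {"piano": {}, "guitar": {}},
            "sustained": {"violin": {}, "flute": {}}}
centers = {"piano": (0, 0), "guitar": (0, 3), "violin": (10, 0), "flute": (10, 3)}
rng = np.random.default_rng(3)
y = np.repeat(list(centers), 50)
X = np.array([centers[label] for label in y]) + 0.3 * rng.normal(size=(200, 2))
root = train_node(taxonomy, X, y)
result = classify(root, np.array([10.0, 3.1]))
```

An error made at an upper node cannot be corrected lower in the tree, so the benefit relies on the upper levels being recognized reliably.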


    6. Evaluation: taxonomy used

    Three different levels:
      • T1: sustained/non-sustained sounds
      • T2: instrument families
      • T3: instrument names

    6. Evaluation: test set

    6 databases:
      • Ircam Studio OnLine (1323 sounds, 16 instruments)
      • Iowa University database (816 sounds, 12 instruments)
      • McGill University database (585 sounds, 23 instruments)
      • Microsoft Musical Instruments CD-ROM (216 sounds, 20 instruments)
      • two commercial databases: Pro (532 sounds, 20 instruments) and Vi (691 sounds, 18 instruments)

    total = 4163 sounds

    notes:
      • 27 instruments have been considered
      • a large pitch range has been considered (4 octaves on average)
      • no muted or martelé/staccato sounds

    [Charts: number of sounds per instrument and per instrument family in each database (SOL, Iowa, McGill, Microsoft, Pro, Vi)]

    Number of sounds per instrument and database ("-" = instrument absent from the database)

    Instrument                    SOL  Iowa  McGill  Microsoft   Pro    Vi  Total
    piano                           -    86      48         12     -     -    146
    struck strings (all)            0    86      48         12     0     0    146
    guitar                        121     -      20         12     -     -    153
    harp                           80     -      20         12    18     -    130
    plucked strings (all)         201     0      40         24    18     0    283
    viola-pizz                      -     -      11          -    29    14     54
    double-pizz                     -   100      20          4    29    33    186
    cello-pizz                      -   107      14          4    26    19    170
    violin-pizz                     -     -      19          4    24    50     97
    pizzicato strings (all)         0   207      64         12   108   116    507
    viola                         103     -      32          -    34    56    225
    double                         99    98      18         12    19    34    280
    cello                          96   110      20         12    40    97    375
    violin                         92     -      43         12    47    70    264
    bowed strings (all)           390   208     113         36   140   257   1144
    french horn                    80    32      18         12    51    49    242
    cornet                          -     -      10         12    31     -     53
    trombone                       77    33      19         12    39    22    202
    trumpet                        62     -      33         12    28    22    157
    tuba                           69     -      16         12    16    27    140
    brass (all)                   288    65      96         60   165   120    794
    flute                          76    78      76         12    34    47    323
    piccolo                         -     -      29          -     6    48     83
    recorder                        -     -      27         12     -     -     39
    air reeds (all)                76    78     132         24    40    95    445
    accordeon                      75     -       -          -     -     -     75
    bassoon                        79    41      16         12    14    41    203
    clarinet                       84    42      34         12    18    22    212
    english-horn                    -     -      16          -    12    13     41
    oboe                           67    35      26         12    17    27    184
    saxsop                         63     -       -          -     -     -     63
    saxalto                         -    54       -          -     -     -     54
    saxtenor                        -     -       -         12     -     -     12
    single/double reeds (all)     368   172      92         48    61   103    844
    all instruments              1323   816     585        216   532   691   4163

    6. Evaluation: evaluation process

      1) Random 66%/33% partition of the database (50 sets)
      2) One to One (O2O) [Livshin 2003]: each database is used in turn to classify all the other databases
      3) Leave One Database Out (LODO) [Livshin 2003]: all databases except one are used in turn to classify the remaining one
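The three protocols differ only in how training and test material are chosen, which the following sketch makes explicit. It is an illustration with hypothetical function names; the database labels are those used in the test-set description, and the seed and rounding of the random partition are assumptions.

```python
import random

DATABASES = ["SOL", "Iowa", "McGill", "Microsoft", "Pro", "Vi"]

def random_partition(n_sounds, train_fraction=2 / 3, seed=0):
    """One random 66%/33% split of the sound indices (repeat with 50 seeds)."""
    indices = list(range(n_sounds))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_sounds)
    return indices[:cut], indices[cut:]

def one_to_one(databases):
    """O2O: every ordered (training database, test database) pair."""
    return [(train, test) for train in databases for test in databases if train != test]

def leave_one_database_out(databases):
    """LODO: train on all databases but one, test on the held-out one."""
    return [([d for d in databases if d != held_out], held_out) for held_out in databases]

o2o_runs = one_to_one(DATABASES)                 # 6 * 5 = 30 experiments
lodo_runs = leave_one_database_out(DATABASES)    # 6 experiments
train_idx, test_idx = random_partition(9)
```

Only the last two protocols test a sound against models that never saw its recording conditions, which is what makes them informative about generalization.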

    6. Evaluation: results, O2O (II)

    [Embedded spreadsheet _1120545457.xls, last update 2003/04/18]

    Spreadsheet notes:
      • TODO: One2One with flat Gaussian, nogaussianity, noanadisc (may take long: 36 experiments * 3 levels)
      • LODO: flat Gaussian and hierarchical Gaussian, gaussianity, noanadisc
      • Difference with Classresult2: the pizz cello sounds were erased from "sustained" in Prosonus

    Feature selection test: 66%/33% partitions, SOL database, nogaussianity, noanadisc (METAeval_Fsinertiecfs_Dbsol, _manudesc)
    manudesc = descriptors Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm removed

    Feature selection            Descriptors   T1             T2             T3
    CFS                          all           99.53 (0.47)   92.56 (0.63)   10.95 (3.68)
    IRMFSP t=0.01 maxdesc=20     all           99.65 (0.31)   96.2 (1.28)    95.95 (0.91)
    CFS                          manudesc      99.01 (0.50)   93.25 (0.80)   60.76 (12.87)
    IRMFSP t=0.01 maxdesc=20     manudesc      99.25 (0.37)   95.81 (1.22)   95.17 (1.26)

    One2One (O2O): test of the IRMFSP parameters, nogaussianity, noanadisc

    [Spreadsheet tables METAeval_CLfg_Dbmixed_O2O_* and METAeval_CLhg_Dbmixed_O2O_*: one row per database pair (sol, iowa, mcgill, microsoft, prosonus, vitus); only the overall mean of each table is kept here]
    manudesc = descriptors Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm removed
    "-" = entered as 0 in the spreadsheet

    Classifier              IRMFSP               Descriptors   T1      T2      T3
    Flat Gaussian           t=0.05 maxdesc=10    all           -       -       28.07
    Flat Gaussian           t=0.05 maxdesc=20    all           -       -       23.87
    Flat Gaussian           t=0.01 maxdesc=40    all           -       -       15.50
    Flat Gaussian           t=0.05 maxdesc=10    manudesc      88.57   56.87   29.90
    Flat Gaussian           t=0.05 maxdesc=20    manudesc      -       -       26.10
    Flat Gaussian           t=0.01 maxdesc=40    manudesc      -       -       17.83
    Hierarchical Gaussian   t=0.05 maxdesc=10    all           88.47   55.77   32.33
    Hierarchical Gaussian   t=0.01 maxdesc=10    all           95.40   58.80   33.10
    Hierarchical Gaussian   t=0.05 maxdesc=20    all           88.47   54.80   30.43
    Hierarchical Gaussian   t=0.01 maxdesc=40    all           95.40   54.27   20.13
    Hierarchical Gaussian   t=0.05 maxdesc=10    manudesc      90.20   62.63   38.10
    Hierarchical Gaussian   t=0.01 maxdesc=10    manudesc      92.67   63.23   38.40
    Hierarchical Gaussian   t=0.05 maxdesc=20    manudesc      90.20   61.97   36.73
    Hierarchical Gaussian   t=0.01 maxdesc=40    manudesc      92.67   55.70   24.33

    LODO

    [Spreadsheet tables METAeval_CLfg_Dbmixed_LODO* and METAeval_CLhg_Dbmixed_LODO*: one column group per left-out database (sol, iowa, mcgill, microsoft, prosonus, vitus); only the mean over the 6 left-out databases is kept here]
    manudesc = descriptors Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm removed
    gaussianity and anadisc are the spreadsheet's option names (their combination is labelled G+LDA in the summary tables)
    "-" = entered as 0 in the spreadsheet

    Classifier              Configuration                                  Descriptors   T1      T2      T3
    Flat Gaussian           IRMFSP t=0.01 max=40                           all           98.17   75.67   49.50
    Flat Gaussian           IRMFSP t=0.01 max=40 + gaussianity             all           96.83   76.00   48.50
    Flat Gaussian           IRMFSP t=0.01 max=40 + anadisc                 all           98.83   77.33   52.00
    Flat Gaussian           IRMFSP t=0.01 max=40 + gaussianity + anadisc   all           98.17   80.17   53.17
    Flat Gaussian           IRMFSP t=0.01 max=40                           manudesc      97.83   77.83   52.67
    Flat Gaussian           IRMFSP t=0.01 max=40 + gaussianity             manudesc      99.00   79.50   50.67
    Flat Gaussian           IRMFSP t=0.01 max=40 + anadisc                 manudesc      97.33   75.33   53.50
    Flat Gaussian           IRMFSP t=0.01 max=40 + gaussianity + anadisc   manudesc      98.67   80.83   52.00
    Hierarchical Gaussian   IRMFSP t=0.01 max=10                           manudesc      97.83   71.83   51.83
    Hierarchical Gaussian   IRMFSP t=0.01 max=10 + gaussianity + anadisc   manudesc      98.67   75.33   56.50
    Hierarchical Gaussian   IRMFSP t=0.01 max=20                           manudesc      97.83   79.67   57.17
    Hierarchical Gaussian   IRMFSP t=0.01 max=20 + gaussianity + anadisc   manudesc      98.67   83.17   63.33
    Hierarchical Gaussian   IRMFSP t=0.01 max=40                           all           98.17   79.50   55.67
    Hierarchical Gaussian   IRMFSP t=0.01 max=40 + gaussianity             all           96.67   79.33   53.17
    Hierarchical Gaussian   IRMFSP t=0.01 max=40 + anadisc                 all           98.83   83.67   64.00
    Hierarchical Gaussian   IRMFSP t=0.01 max=40 + gaussianity + anadisc   all           98.17   84.33   62.33
    Hierarchical Gaussian   IRMFSP t=0.01 max=40                           manudesc      97.83   79.67   56.83
    Hierarchical Gaussian   IRMFSP t=0.01 max=40 + gaussianity             manudesc      99.00   81.17   56.00
    Hierarchical Gaussian   IRMFSP t=0.01 max=40 + anadisc                 manudesc      97.33   80.83   62.50
    Hierarchical Gaussian   IRMFSP t=0.01 max=40 + gaussianity + anadisc   manudesc      98.67   84.67   63.83

    Flat KNN (KPPV), IRMFSP t=0.01 max=40 + anadisc (in newresult)

    K                                                                      T1      T2      T3
    K=1                                                                    98.33   73.67   50.17
    K=3                                                                    98.67   75.83   51.50
    K=5                                                                    98.83   76.00   51.33
    K=10                                                                   98.83   76.83   51.33
    K=20                                                                   98.83   77.17   50.83

    Remark: horn and cornet sounds inverted

    Hierarchical KNN (KPPV), IRMFSP t=0.01 max=40 + anadisc

    K                                                                      T1      T2      T3
    K=1                                                                    98.33   82.83   61.67
    K=3                                                                    98.67   83.17   63.50
    K=5                                                                    98.83   83.50   63.67
    K=10                                                                   98.83   83.83   64.17
    K=10, IRMFSP t=0.01 max=20, manudesc, gaussianity + anadisc            98.50   83.00   62.17
    K=20                                                                   98.83   83.67   63.83

    Decision trees (in newresult)

    Algorithm                                                              T1      T2      T3
    BERT >0.01                                                             94.67   62.67   37.83
    weka-j48 (-M10)                                                        -       63.50   39.00
    weka-PART (-M10)                                                       -       67.33   39.17

    Flat Gaussian, ALL, IRMFSP t=0.01, max=40

    Database     T1    T2    T3
    sol         100    97    95
    iowa         99    95    94
    mcgill      100    97    94
    microsoft   100    99    95
    prosonus     99    95    90
    vitus       100    97    95

    [The remaining ALL rows (flat and hierarchical Gaussian, with gaussianity and anadisc) are empty in the spreadsheet]

    Database: SOL
    Evaluation: 66%/33% (50 sets)
    Algorithm: F-GC, manudesc

    Method                            T1           T2           T3
    LDA                               96           89           86
    CFS (weka)                        99.0 (0.5)   93.2 (0.8)   60.8 (12.9)
    IRMFSP (t=0.01, nbdescmax=20)     99.2 (0.4)   95.8 (1.2)   95.1 (1.2)

    Databases: all
    Evaluation: O2O
    Feature selection: IRMFSP t=0.05, maxdesc=10, manudesc

    Classifier                        T1   T2   T3
    F-GC                              89   57   30
    H-GC                              93   63   38

    Databases: all
    Evaluation: LODO
    Feature selection: IRMFSP t=0.01, maxdesc=40, manudesc

    Classifier                        T1   T2   T3
    F-GC                              98   78   53
    F-GC (G+LDA)                      99   81   52
    H-GC                              98   80   57
    H-GC (G+LDA)                      99   85   64

    6. Evaluation: results, O2O (II)

    O2O (mean value over the 30 (6*5) experiments)

    Discussion
      • low recognition rate for O2O compared to 66%/33% -> problem of generalization?
      • the system mainly learns the instrument instance instead of the instrument (each database contains a single instance of an instrument)

    LODO (mean value over the 6 left-out databases)
      • goal: to increase the number of instances of each instrument
      • how: by combining several databases


    6. Evaluation: confusion matrix

    Low confusion between sustained / non-sustained sounds

    [Confusion matrix from embedded spreadsheet _1120556391.xls (sheet exportexcelmeanall): columns = original class, rows = recognized class; the positions of the cell values are not recoverable]

    Number of sounds per original class:
      piano 146, guitar 159, harp 130, viola-pizz 54, bass-pizz 186, cello-pizz 170, violin-pizz 97, viola 225, bass 280, cello 356, violin 264, french-horn 242, cornet 53, trombone 202, trumpet 157, tuba 140, flute 323, piccolo 83, recorder 39, bassoon 203, clarinet 212, english-horn 41, oboe 184
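A confusion matrix with the layout used on this slide (one column per original class, one row per recognized class) can be built as in the sketch below. The function names and the toy labels are hypothetical, and the conversion to percentages per original class is an assumption about how the slide's values were normalized.

```python
import numpy as np

def confusion_matrix(original, recognized, classes):
    """counts[r, o] = number of sounds of original class o recognized as class r."""
    index = {name: i for i, name in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)), dtype=int)
    for o, r in zip(original, recognized):
        counts[index[r], index[o]] += 1
    return counts

def to_percent(counts):
    """Normalize every column (original class) so that it sums to 100."""
    totals = counts.sum(axis=0, keepdims=True)
    return 100.0 * counts / np.maximum(totals, 1)

# Toy example with three of the classes.
classes = ["piano", "guitar", "harp"]
original = ["piano", "piano", "guitar", "harp"]
recognized = ["piano", "guitar", "guitar", "harp"]
counts = confusion_matrix(original, recognized, classes)
percent = to_percent(counts)
```

Reading down a column shows where the sounds of one instrument end up, so confusions inside a family appear as blocks around the diagonal when the classes are ordered by family.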

    6. Evaluation: confusion matrix

    Largest confusions inside each instrument family

    6. Evaluation: confusion matrix

    Lowest recognition rates -> smallest training sets

    [email protected] *

    5. Evaluation: Confusion matrix

    Confusion between piano and guitar/harp.



    5. Evaluation: Confusion matrix

    Cross-family confusions:
      Cornet -> Bassoon
      Cornet -> English-horn
      Flute -> Clarinet
      Oboe -> Flute
      Trombone -> Flute
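    The readings made on the confusion matrix (per-class recognition rates, largest confusions) can be reproduced with a few lines of NumPy. This is a minimal sketch, not the code used in the study: the function name `summarize_confusions` and the three-class toy counts are hypothetical, and the layout is assumed to follow the slide (original classes in columns, recognized classes in rows).

```python
import numpy as np


def summarize_confusions(counts, labels, top=3):
    """counts[i, j] = number of sounds of original class j recognized as class i."""
    counts = np.asarray(counts, dtype=float)
    per_class_total = counts.sum(axis=0)              # sounds per original class
    recognition_rate = np.diag(counts) / per_class_total

    # Off-diagonal cells, expressed as a fraction of the original class size.
    confusion_rate = counts / per_class_total
    np.fill_diagonal(confusion_rate, 0.0)

    order = np.argsort(confusion_rate, axis=None)[::-1][:top]
    confusions = []
    for flat_index in order:
        i, j = np.unravel_index(flat_index, confusion_rate.shape)
        confusions.append((labels[j], labels[i], float(confusion_rate[i, j])))
    return dict(zip(labels, recognition_rate)), confusions


# Hypothetical toy counts, NOT taken from the slide.
labels = ["cornet", "trumpet", "flute"]
counts = [[40, 10, 0],
          [8, 85, 2],
          [2, 5, 98]]
rates, confusions = summarize_confusions(counts, labels, top=2)
for original, recognized, rate in confusions:
    print(f"{original} -> {recognized}: {rate:.2f}")
```

    Normalizing each column by the size of the original class makes small classes comparable with large ones, which matters when the lowest rates coincide with the smallest training sets.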


    5. Evaluation: Main selected features

    By FSA (IRMFSP)

    LAST UPDATE 2003/04/18

    TODO: One2One (O2O) with flat Gaussian, nogaussianity, noanadisc (this may take long: 36 experiments * 3 levels)

    LODO: flat Gaussian and hierarchical Gaussian, gaussianity, noanadisc

    Difference from Classresult2: the pizz cello sounds were erased from the sustained sounds in Prosonus

    Test Feature Selection (66/33)

    nogaussianity, noanadisc; METAeval_Fsinertiecfs_Dbsol (_manudesc); database: sol

    Feature selection           | T1           | T2           | T3
    CFS                         | 99.53 (0.47) | 92.56 (0.63) | 10.95 (3.68)
    IRMFSP t=0.01 maxdesc=20    | 99.65 (0.31) | 96.2 (1.28)  | 95.95 (0.91)

    After removing Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm:
    CFS                         | 99.01 (0.50) | 93.25 (0.80) | 60.76 (12.87)
    IRMFSP t=0.01 maxdesc=20    | 99.25 (0.37) | 95.81 (1.22) | 95.17 (1.26)

    One2One (O2O): test of the FSinertie (IRMFSP) parameters

    Recognition rates averaged over all database pairs (sol, iowa, mcgill, microsoft, prosonus, vitus), nogaussianity, noanadisc. "manudesc" = after removing Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm. A dash marks a level left at 0 in the sheet.

    Classifier             | IRMFSP             | manudesc | T1   | T2   | T3
    Flat Gaussian          | t=0.05 maxdesc=10  | no       | -    | -    | 28.1
    Flat Gaussian          | t=0.05 maxdesc=20  | no       | -    | -    | 23.9
    Flat Gaussian          | t=0.01 maxdesc=40  | no       | -    | -    | 15.5
    Flat Gaussian          | t=0.05 maxdesc=10  | yes      | 88.6 | 56.9 | 29.9
    Flat Gaussian          | t=0.05 maxdesc=20  | yes      | -    | -    | 26.1
    Flat Gaussian          | t=0.01 maxdesc=40  | yes      | -    | -    | 17.8
    Hierarchical Gaussian  | t=0.05 maxdesc=10  | no       | 88.5 | 55.8 | 32.3
    Hierarchical Gaussian  | t=0.01 maxdesc=10  | no       | 95.4 | 58.8 | 33.1
    Hierarchical Gaussian  | t=0.05 maxdesc=20  | no       | 88.5 | 54.8 | 30.4
    Hierarchical Gaussian  | t=0.01 maxdesc=40  | no       | 95.4 | 54.3 | 20.1
    Hierarchical Gaussian  | t=0.05 maxdesc=10  | yes      | 90.2 | 62.6 | 38.1
    Hierarchical Gaussian  | t=0.01 maxdesc=10  | yes      | 92.7 | 63.2 | 38.4
    Hierarchical Gaussian  | t=0.05 maxdesc=20  | yes      | 90.2 | 62.0 | 36.7
    Hierarchical Gaussian  | t=0.01 maxdesc=40  | yes      | 92.7 | 55.7 | 24.3

    LODO

    Each cell gives T1 T2 T3 for the run "without" the database named in the column header; the last column is the mean over the six databases. All rows use IRMFSP with t=0.01.

    Flat Gaussian, all descriptors (METAeval_CLfg_Dbmixed_LODO_nomanudesc, _gauss, _anadisc, _gaussanadisc)
    Setting                        | sol       | iowa      | mcgill    | microsoft | prosonus | vitus     | mean
    max=40                         | 99 72 44  | 97 70 37  | 100 70 48 | 100 78 59 | 95 81 60 | 98 83 49  | 98.2 75.7 49.5
    max=40 + gaussianity           | 98 70 38  | 86 57 27  | 99 69 52  | 100 82 59 | 99 92 61 | 99 86 54  | 96.8 76.0 48.5
    max=40 + anadisc               | 100 74 46 | 99 66 38  | 100 78 57 | 100 83 57 | 94 79 57 | 100 84 57 | 98.8 77.3 52.0
    max=40 + gaussianity + anadisc | 99 77 44  | 93 62 33  | 99 79 58  | 100 88 64 | 99 88 62 | 99 87 58  | 98.2 80.2 53.2

    Flat Gaussian, after removing Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm (METAeval_CLfg_Dbmixed_LODO, _gauss, _anadisc, _gaussanadisc)
    Setting                        | sol       | iowa      | mcgill    | microsoft | prosonus | vitus     | mean
    max=40                         | 98 76 42  | 100 72 50 | 98 73 53  | 97 85 58  | 98 83 59 | 96 78 54  | 97.8 77.8 52.7
    max=40 + gaussianity           | 98 76 36  | 99 72 40  | 99 71 51  | 100 81 59 | 99 93 62 | 99 84 56  | 99.0 79.5 50.7
    max=40 + anadisc               | 98 67 43  | 99 66 48  | 99 77 57  | 99 85 56  | 93 81 59 | 96 76 58  | 97.3 75.3 53.5
    max=40 + gaussianity + anadisc | 99 81 41  | 100 65 41 | 99 81 51  | 100 90 55 | 99 89 60 | 95 79 64  | 98.7 80.8 52.0

    Hierarchical Gaussian, after removing Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm (METAeval_CLhg_Dbmixed_LODO, 10-001, 20-001, gaussanadisc)
    Setting                        | sol       | iowa      | mcgill    | microsoft | prosonus | vitus     | mean
    max=10                         | 98 74 53  | 100 57 48 | 98 68 50  | 97 79 57  | 98 78 51 | 96 75 52  | 97.8 71.8 51.8
    max=10 + gaussianity + anadisc | 99 73 53  | 100 73 62 | 99 66 48  | 100 84 65 | 99 82 53 | 95 74 58  | 98.7 75.3 56.5
    max=20                         | 98 81 55  | 100 63 53 | 98 75 54  | 97 90 70  | 98 85 55 | 96 84 56  | 97.8 79.7 57.2
    max=20 + gaussianity + anadisc | 99 83 59  | 100 73 66 | 99 77 56  | 100 91 73 | 99 92 62 | 95 83 64  | 98.7 83.2 63.3

    Hierarchical Gaussian, all descriptors (METAeval_CLhg_Dbmixed_LODO, _gauss, _anadisc, _gaussanadisc)
    Setting                        | sol       | iowa      | mcgill    | microsoft | prosonus | vitus     | mean
    max=40                         | 99 77 49  | 97 66 54  | 100 70 49 | 100 93 72 | 95 86 57 | 98 85 53  | 98.2 79.5 55.7
    max=40 + gaussianity           | 98 75 49  | 85 64 41  | 99 76 51  | 100 90 63 | 99 89 61 | 99 82 54  | 96.7 79.3 53.2
    max=40 + anadisc               | 100 78 56 | 99 71 61  | 100 81 59 | 100 92 75 | 94 88 64 | 100 92 69 | 98.8 83.7 64.0
    max=40 + gaussianity + anadisc | 99 84 58  | 93 68 54  | 99 85 60  | 100 89 69 | 99 93 66 | 99 87 67  | 98.2 84.3 62.3

    Hierarchical Gaussian, after removing Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm (METAeval_CLhg_Dbmixed_LODO, _gauss, _anadisc, _gaussanadisc)
    Setting                        | sol       | iowa      | mcgill    | microsoft | prosonus | vitus     | mean
    max=40                         | 98 75 52  | 100 70 54 | 98 75 52  | 97 86 64  | 98 86 59 | 96 86 60  | 97.8 79.7 56.8
    max=40 + gaussianity           | 98 78 54  | 99 73 56  | 99 81 57  | 100 88 61 | 99 89 59 | 99 78 49  | 99.0 81.2 56.0
    max=40 + anadisc               | 98 75 59  | 99 67 58  | 99 81 58  | 99 87 68  | 93 86 64 | 96 89 68  | 97.3 80.8 62.5
    max=40 + gaussianity + anadisc | 99 84 60  | 100 73 64 | 99 84 62  | 100 88 63 | 99 93 65 | 95 86 69  | 98.7 84.7 63.8

    Flat KPPV (k nearest neighbours; IRMFSP t=0.01 max=40) + anadisc, in newresult

    LODO; each cell gives T1 T2 T3 for the run "without" the database named in the column header.
    K    | sol       | iowa     | mcgill    | microsoft | prosonus | vitus     | mean
    K=1  | 99 72 49  | 97 61 45 | 100 74 45 | 100 77 62 | 94 75 40 | 100 83 60 | 98.3 73.7 50.2
    K=3  | 100 73 50 | 98 63 45 | 100 75 47 | 100 81 65 | 94 79 38 | 100 84 64 | 98.7 75.8 51.5
    K=5  | 100 74 50 | 99 62 45 | 100 76 47 | 100 80 65 | 94 79 37 | 100 85 64 | 98.8 76.0 51.3
    K=10 | 100 74 48 | 99 62 46 | 100 77 46 | 100 83 63 | 94 80 39 | 100 85 66 | 98.8 76.8 51.3
    K=20 | 100 74 45 | 99 61 45 | 100 77 44 | 100 85 64 | 94 80 39 | 100 86 68 | 98.8 77.2 50.8

    Note: cor (french-horn) and corn (cornet) sounds inverted.

    Hierarchical KPPV (IRMFSP t=0.01 max=40) + anadisc

    K    | sol       | iowa     | mcgill    | microsoft | prosonus | vitus     | mean
    K=1  | 99 75 52  | 97 74 60 | 100 81 59 | 100 89 70 | 94 87 64 | 100 91 65 | 98.3 82.8 61.7
    K=3  | 100 76 53 | 98 73 63 | 100 81 59 | 100 90 72 | 94 87 66 | 100 92 68 | 98.7 83.2 63.5
    K=5  | 100 75 53 | 99 73 64 | 100 81 58 | 100 92 72 | 94 88 65 | 100 92 70 | 98.8 83.5 63.7
    K=10 | 100 76 54 | 99 74 65 | 100 80 58 | 100 93 74 | 94 88 64 | 100 92 70 | 98.8 83.8 64.2
    K=20 | 100 76 52 | 99 74 66 | 100 79 57 | 100 93 75 | 94 88 62 | 100 92 71 | 98.8 83.7 63.8

    gaussianity, K=10 (IRMFSP t=0.01 max=20, after removing Dei_, Dpi_loud_v, specloud, flustr, roughn, sfm, scm, + gaussianity + anadisc):
    K=10 | 98 82 59  | 100 75 65 | 99 77 54 | 100 91 71 | 99 91 61 | 95 82 63  | 98.5 83.0 62.2
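    KPPV is the French abbreviation for k nearest neighbours (k plus proches voisins). The flat variant can be sketched as below. This is an illustration only: the Euclidean distance, the majority vote and the two-dimensional toy features are assumptions, since the slides do not state which distance or voting rule was used.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    train_x = np.asarray(train_x, dtype=float)
    distances = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional features (for example after the anadisc projection).
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["piano", "piano", "flute", "flute"]
print(knn_predict(train_x, train_y, [0.2, 0.4], k=3))
```

    A hierarchical variant would apply the same vote once per node of the class tree, each time restricted to the training sounds and features of that node.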

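    Decision Tree, in newresult

    LODO; each cell gives T1 T2 T3 for the run "without" the database named in the column header (a dash marks a level left empty in the sheet).
    Method            | sol      | iowa     | mcgill   | microsoft | prosonus | vitus    | mean
    * BERT >0.01      | 96 52 41 | 97 55 38 | 90 64 31 | 92 69 39  | 98 67 37 | 95 69 41 | 94.7 62.7 37.8
    weka-j48 (-M10)   | - 58 32  | - 59 37  | - 58 36  | - 67 41   | - 73 39  | - 66 49  | - 63.5 39.0
    weka-PART (-M10)  | - 59 27  | - 61 44  | - 62 34  | - 68 45   | - 83 41  | - 71 44  | - 67.3 39.2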

    Flat Gaussian, ALL (each cell gives T1 T2 T3)

    Setting               | sol       | iowa     | mcgill    | microsoft | prosonus | vitus
    IRMFSP t=0.01, max=40 | 100 97 95 | 99 95 94 | 100 97 94 | 100 99 95 | 99 95 90 | 100 97 95

    The remaining ALL rows (flat Gaussian with gaussianity and anadisc, hierarchical Gaussian, hierarchical KPPV K=10) are empty in the sheet.

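    Feuil2: O2O recognition rates per database pair (rows "X by", columns = database, 0 on the diagonal). The three matrices equal the T1, T2 and T3 values of the O2O hierarchical Gaussian run with IRMFSP t=0.01, maxdesc=40, manudesc.

    T1           | sol | iowa | mcgill | microsoft | prosonus | vitus
    sol by       | 0   | 99   | 99     | 94        | 95       | 100
    iowa by      | 96  | 0    | 98     | 98        | 99       | 88
    mcgill by    | 69  | 88   | 0      | 93        | 97       | 96
    microsoft by | 78  | 97   | 99     | 0         | 99       | 99
    prosonus by  | 90  | 96   | 94     | 92        | 0        | 96
    vitus by     | 55  | 93   | 94     | 91        | 98       | 0

    T2           | sol | iowa | mcgill | microsoft | prosonus | vitus
    sol by       | 0   | 58   | 58     | 45        | 57       | 79
    iowa by      | 63  | 0    | 64     | 34        | 58       | 71
    mcgill by    | 47  | 51   | 0      | 35        | 53       | 70
    microsoft by | 57  | 57   | 75     | 0         | 64       | 88
    prosonus by  | 51  | 43   | 48     | 39        | 0        | 78
    vitus by     | 31  | 51   | 49     | 40        | 57       | 0

    T3           | sol | iowa | mcgill | microsoft | prosonus | vitus
    sol by       | 0   | 42   | 21     | 22        | 11       | 23
    iowa by      | 38  | 0    | 20     | 22        | 14       | 26
    mcgill by    | 26  | 35   | 0      | 19        | 17       | 38
    microsoft by | 37  | 34   | 25     | 0         | 16       | 39
    prosonus by  | 19  | 32   | 14     | 10        | 0        | 32
    vitus by     | 16  | 29   | 11     | 20        | 22       | 0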
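    The O2O means reported in the sheet are averages over the off-diagonal cells of such a matrix. The sketch below recomputes them for the first (T1) matrix of Feuil2, using the values as read from the sheet; the helper name `offdiag_means` is hypothetical.

```python
import numpy as np

databases = ["sol", "iowa", "mcgill", "microsoft", "prosonus", "vitus"]

# First Feuil2 matrix as read from the sheet; the diagonal 0 is a placeholder.
t1 = np.array([
    [0, 99, 99, 94, 95, 100],
    [96, 0, 98, 98, 99, 88],
    [69, 88, 0, 93, 97, 96],
    [78, 97, 99, 0, 99, 99],
    [90, 96, 94, 92, 0, 96],
    [55, 93, 94, 91, 98, 0],
], dtype=float)


def offdiag_means(matrix):
    """Row means, column means and overall mean, ignoring the diagonal."""
    matrix = np.asarray(matrix, dtype=float)
    off = ~np.eye(len(matrix), dtype=bool)
    masked = np.where(off, matrix, np.nan)
    return np.nanmean(masked, axis=1), np.nanmean(masked, axis=0), matrix[off].mean()


row_means, col_means, overall = offdiag_means(t1)
for name, value in zip(databases, row_means):
    print(f"{name} by: {value:.1f}")
print(f"overall: {overall:.2f}")
```

    Masking the diagonal matters: averaging the placeholder zeros with the real values would lower every mean by one sixth.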


    Database: SOL
    Evaluation: 66%/33% (50 sets)
    Algorithm: FGC, manudesc

    Method                        | T1         | T2         | T3
    LDA                           | 96         | 89         | 86
    CFS (weka)                    | 99.0 (0.5) | 93.2 (0.8) | 60.8 (12.9)
    IRMFSP (t=0.01, nbdescmax=20) | 99.2 (0.4) | 95.8 (1.2) | 95.1 (1.2)

    Databases: all
    Evaluation: O2O
    Feature selection: IRMFSP t=0.05, maxdesc=10, manudesc

    Classifier | T1 | T2 | T3
    F-GC       | 89 | 57 | 30
    H-GC       | 93 | 63 | 38

    Databases: all
    Evaluation: LODO
    Feature selection: IRMFSP t=0.01, maxdesc=40, manudesc

    Classifier   | T1 | T2 | T3
    F-GC         | 98 | 78 | 53
    F-GC (G+LDA) | 99 | 81 | 52
    H-GC         | 98 | 80 | 57
    H-GC (G+LDA) | 99 | 85 | 64
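    The LODO protocol can be sketched as a fold generator, assuming LODO stands for leave-one-database-out: each database is held out in turn for testing while the others are used for training. The function name, the tuple layout `(database, features, label)` and the toy samples are hypothetical.

```python
from collections import defaultdict


def leave_one_database_out(samples):
    """Yield (held_out_database, train_samples, test_samples) for each database."""
    by_database = defaultdict(list)
    for sample in samples:
        by_database[sample[0]].append(sample)
    for held_out in sorted(by_database):
        train = [s for name in sorted(by_database) if name != held_out
                 for s in by_database[name]]
        yield held_out, train, by_database[held_out]


# Hypothetical samples: (database, feature vector, instrument label).
samples = [
    ("sol", [0.1], "flute"),
    ("sol", [0.9], "oboe"),
    ("iowa", [0.2], "flute"),
    ("mcgill", [0.8], "oboe"),
]
folds = list(leave_one_database_out(samples))
for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

    Because no sound of the held-out database is seen during training, the resulting rates say something about generalization across recording conditions, which a random 66%/33% split of a single database cannot show.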

    Databases: sol, iowa, mcgill, microsoft, prosonus, vitus; manudesc; inertie maxdesc-t=0.01; nogauss, noanadisc

    Selected features per node of the class tree. Empty cells were lost in the export, so rows with fewer entries than columns cannot be aligned to columns reliably.

    First version. Columns: sust/non-sust | non-sust | pluckstring | pizzstring | sust | bowedstring | brass | reed | airreed | singledouble
      temporal increase | temporal decrease | temporal decrease | temporal decrease
      temporal decrease | temporal centroid
      temporal log-attack
      spectral centroid | spectral centroid | spectral spread + std | spectral spread | spectral spread | spectral centroid | spectral centroid | spectral skewness | spectral centroid
      spectral spread | spectral spread | spectral slope | sharpness | spectral skewness | spectral spread | spectral skewness | spectral kurtosis + std | spectral spread
      spectral skewness | spectral variation + std | spectral kurtosis | spectral kurtosis + std | sharpness | spectral kurtosis std | spectral slope | spectral skewness
      spectral slope | spectral variation | spectral skewness std | spectral variation std
      spectral variation | spectral decrease std | spectral kurtosis
      harmonic deviation | tristimulus + std | tristimulus | harmonic deviation | tristimulus | noisiness | harmonic deviation | tristimulus
      tristimulus std | harmonic deviation
      mfcc2,6 std | various mfcc | various mfcc | various mfcc | mfcc3,4,6 | xcorr 3, 6, 8 | xcorr3 | xcorr3

    Second version. Columns: sust./non-sust. | among non-sust. | among sust. | among bowed-string | among brass | among air reeds | among sing/dbl reeds
      temporal increase | temporal decrease | temporal decrease | temporal decrease
      temporal decrease | temporal centroid
      temporal log-attack
      spectral centroid | spectral centroid | spectral spread | spectral centroid | spectral centroid | spectral skewness | spectral centroid
      spectral spread | spectral spread | spectral skewness | spectral spread | spectral skewness | spectral kurtosis + std | spectral spread
      spectral skewness | spectral kurtosis + std | sharpness | spectral kurtosis std | spectral slope | spectral skewness
      spectral variation | spectral skewness std | spectral variation std
      spectral decrease std | spectral kurtosis
      harmonic deviation | harmonic deviation | tristimulus | noisiness | harmonic deviation | tristimulus
      tristimulus std | harmonic deviation
      mfcc2,6 std | various mfcc | mfcc3,4,6 | xcorr 3, 6, 8 | xcorr3 | xcorr3
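    Several of the selected features (spectral centroid, spread, skewness, kurtosis) are moments of the spectrum treated as a distribution over frequency. The sketch below uses one common definition, with the magnitude spectrum normalized to sum to one; the exact definitions used in the study (amplitude or power spectrum, frequency scale) are not given on the slide, so this is an assumption, and the function name is hypothetical.

```python
import numpy as np


def spectral_moments(freqs, magnitudes):
    """Centroid, spread, skewness and kurtosis of a magnitude spectrum."""
    freqs = np.asarray(freqs, dtype=float)
    weights = np.asarray(magnitudes, dtype=float)
    weights = weights / weights.sum()                 # normalize to a distribution

    centroid = np.sum(freqs * weights)
    deviation = freqs - centroid
    spread = np.sqrt(np.sum(deviation ** 2 * weights))
    skewness = np.sum(deviation ** 3 * weights) / spread ** 3
    kurtosis = np.sum(deviation ** 4 * weights) / spread ** 4
    return centroid, spread, skewness, kurtosis


# Hypothetical three-bin spectrum, symmetric around 200 Hz.
centroid, spread, skewness, kurtosis = spectral_moments([100.0, 200.0, 300.0],
                                                        [1.0, 2.0, 1.0])
print(centroid, round(spread, 2), skewness, kurtosis)
```

    Computed per analysis frame, these values give a time series; the "std" entries in the table would then correspond to the standard deviation of that series over the sound, again under the assumption stated above.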


    5. Evaluation: Main selected features

    By decision tree (C4.5)

    DTg_decr 0.079858 AND

    DHg_mod_am > 0.000158 AND

    DPi_specloud_m21-mm > 0.026443 AND

    DPi_specloud_m5-mm -0.000202: vln (66.0/8.0)
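    The trailing `vln (66.0/8.0)` can be read with a small parser, assuming the rule was exported in the Weka J48 style used elsewhere in this evaluation, where the first number counts the training sounds reaching the leaf and the second counts those that are misclassified. The function name and the regular expression are hypothetical helpers, not part of any tool.

```python
import re

# Matches ": label (total)" or ": label (total/errors)" at the end of a rule line.
LEAF = re.compile(r":\s*(?P<label>\S+)\s*\((?P<total>[\d.]+)(?:/(?P<errors>[\d.]+))?\)\s*$")


def parse_leaf(line):
    """Return (label, total, errors, purity) for a rule line, or None if no leaf."""
    match = LEAF.search(line)
    if match is None:
        return None
    total = float(match["total"])
    errors = float(match["errors"] or 0.0)
    return match["label"], total, errors, (total - errors) / total


leaf = parse_leaf("DPi_specloud_m5-mm -0.000202: vln (66.0/8.0)")
print(leaf)
```

    Under that reading, 58 of the 66 sounds reaching this leaf are violin sounds, a purity of about 88%.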


    7. Instrument Class Similarity

    Goal: check that the proposed tree structure corresponds to a natural organization of the classes.
    How?
      Most people use the Martin hierarchy.
      1) Check the grouping among the leaves of the decision trees.
      2) MDS?

    MDS on acoustic features? [Herrera AES114th]
    Compute the dissimilarity between each class.
      How? Compute the between-group F-matrix between class models.
    Observe the dissimilarity between the classes.
      How? MDS (multi-dimensional scaling) analysis.
      MDS preserves the distances between the data as much as possible and allows representing them in a lower-dimensional space.
      MDS is usually used for representing dissimilarity judgements (timbre similarity); here it is used on acoustic features.
      MDS: Kruskal's STRESS formula 1 scaling method, 3-dimensional space.
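    The embedding step can be illustrated as follows. Note the swap: the slide names Kruskal's STRESS formula 1 scaling, an iterative method, whereas this sketch uses classical (Torgerson) MDS, which has a closed form, and then evaluates the result with Kruskal's stress-1, taking the input dissimilarities directly as disparities (no monotone regression). Function names and the toy dissimilarities are hypothetical.

```python
import numpy as np


def classical_mds(dissimilarities, n_dims=3):
    """Embed points so that Euclidean distances approximate the dissimilarities."""
    d = np.asarray(dissimilarities, dtype=float)
    n = len(d)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering    # double-centred squared distances
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    top = np.argsort(eigenvalues)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale


def stress1(dissimilarities, coords):
    """Kruskal's stress formula 1 between dissimilarities and embedded distances."""
    d = np.asarray(dissimilarities, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    embedded = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(d), k=1)
    return np.sqrt(((d[upper] - embedded[upper]) ** 2).sum() / (embedded[upper] ** 2).sum())


# Hypothetical dissimilarities: distances between four points of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
coords = classical_mds(d, n_dims=3)
print(coords.shape, stress1(d, coords))
```

    A stress close to zero means the low-dimensional map reproduces the class dissimilarities almost exactly; for real between-class dissimilarities the value is larger and indicates how much the 3-dimensional map distorts them.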


    7. Instrument Class Similarity

    Clusters?
      non-sustained sounds

    Class labels:
      PIAN = Piano, GUI = Guitar, HARP = Harp
      VLNP = Violin pizz, VLAP = Viola pizz, CELLP = Cello pizz, DBLP = Double bass pizz
      VLN = Violin, VLA = Viola, CELL = Cello, DBL = Double bass
      TRPU = Trumpet, COR = Cornet, TBTB = Trombone, FHOR = French-horn, TUBB = Tuba
      FLTU = Flute, PICC = Piccolo, RECO = Recorder
      CLA = Clarinet, SAXTE = Tenor sax, SAXAL = Alto sax, SAXSO = Soprano sax, ACC = Accordion
      OBOE = Oboe, BS = Bassoon, EHOR = English-horn

    7. Instrument Class Similarity

    Clusters?
      non-sustained sounds
      bowed-string sounds

    7. Instrument Class Similarity

    Clusters?
      non-sustained sounds
      bowed-string sounds
      brass sounds (TRPU?)

    7. Instrument Class Similarity

    Clusters?
      non-sustained sounds
      bowed-string sounds
      brass sounds (TRPU?)
      mix between single/double reeds and brass instruments

    7. Instrument Class Similarity

    Dimension 1: separates sustained sounds from non-sustained sounds
      negative values: PIAN, GUI, HARP, VLNP, VLAP, CELLP, DBLP
      -> attack time, decrease time

    7. Instrument Class Similarity

    Dimension 1: separates sustained sounds from non-sustained sounds
      negative values: PIAN, GUI, HARP, VLNP, VLAP, CELLP, DBLP
      -> attack time, decrease time
    Dimension 2: brightness
      dark sounds: TUBB, BSN, TBTB, FHOR
      bright sounds: PICC, CLA, FLUT
      problem: DBL?

    7. Instrument Class Similarity

    Dimension 1: separates sustained sounds from non-sustained sounds
      negative values: PIAN, GUI, HARP, VLNP, VLAP, CELLP, DBLP
      -> attack time, decrease time
    Dimension 2: brightness
      dark sounds: TUBB, BSN, TBTB, FHOR
      bright sounds: PICC, CLA, FLUT
      problem: DBL?
    Dimension 3: ?
      separation of the bowed strings (VLN, VLA, CELL, DBL)
      amount of modulation?

    Conclusion ?


    Conclusion

    State of the art
      Martin [1999]: 76% (family), 39% for 14 instruments
      Eronen [2001]: 77% (family), 35% for 16 instruments

    This study: 85% (family), 64% for 23 instruments
      The increased recognition rates are mainly explained by the use of new features.

    Perspectives
      derive the tree structure automatically (analysis of the decision tree?)
      test other classification algorithms (GMM, SVM, ...)
      test the system on other sound classes (non-instrumental sounds, sound FX)
      extend the system to musical phrases
      extend the system to polyphonic sounds
      extend the system to multi-source sounds

    Links: http://www.cuidado.mu, http://www.cs.waikato.ac.nz/ml/weka/

    Ircam - CUIDADO I.S.T.