Transcript
Page 1: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 10, OCTOBER 2013 2207

Automatic Ontology Generation for MusicalInstruments Based on Audio Analysis

Şefki Kolozali, Student Member, IEEE, Mathieu Barthet, Member, IEEE, György Fazekas, andMark Sandler, Member, IEEE

Abstract—In this paper we present a novel hybrid system thatinvolves a formal method of automatic ontology generation forweb-based audio signal processing applications. An ontologyis seen as a knowledge management structure that representsdomain knowledge in a machine interpretable format. It de-scribes concepts and relationships within a particular domain,in our case, the domain of musical instruments. However, thedifferent tasks of ontology engineering including manual anno-tation, hierarchical structuring and organization of data can belaborious and challenging. For these reasons, we investigate howthe process of creating ontologies can be made less dependenton human supervision by exploring concept analysis techniquesin a Semantic Web environment. In this study, various musicalinstruments, from wind to string families, are classified usingtimbre features extracted from audio. To obtain models of theanalysed instrument recordings, we use K-means clustering todetermine an optimised codebook of Line Spectral Frequencies(LSFs), or Mel-frequency Cepstral Coefficients (MFCCs). Twoclassification techniques based on Multi-Layer Perceptron (MLP)neural network and Support Vector Machines (SVM) were tested.Then, Formal Concept Analysis (FCA) is used to automaticallybuild the hierarchical structure of musical instrument ontologies.Finally, the generated ontologies are expressed using the OntologyWeb Language (OWL). System performance was evaluated undernatural recording conditions using databases of isolated notes andmelodic phrases. Analysis of Variance (ANOVA) were conductedwith the feature and classifier attributes as independent variablesand the musical instrument recognition F-measure as dependentvariable. Based on these statistical analyses, a detailed comparisonbetween musical instrument recognition models is made to investi-gate their effects on the automatic ontology generation system. Theproposed system is general and also applicable to other researchfields that are related to ontologies and the Semantic Web.

Index Terms—Automatic ontology generation, instrumentrecognition, semantic web intelligence.

I. INTRODUCTION

I N recent years, theWorldWideWeb has gone through rapiddevelopment both technologically and in its popularity. It

became closely integrated with our lives. The Web exposes vast

Manuscript received May 14, 2012; revised October 23, 2012 and April 12,2013; acceptedApril 24, 2013. Date of publicationMay 17, 2013; date of currentversion August 09, 2013. This work was supported in part by the Networked En-vironment for Music Analysis (NEMA) project, U.K. Engineering and PhysicalSciences Research Council (EPSRC), the U.K. Engineering and Physical Sci-ences Research Council (EPSRC), and the Making Musical Moods MetadataTSB project. The associate editor coordinating the review of this manuscriptand approving it for publication was Prof. Bryan Pardo.The authors are with the School of Electronic Engineering and Com-

puter Science, Queen Mary University of London, London E1 4NS, U.K.(e-mail: [email protected]; [email protected];[email protected]; [email protected]).Color versions of one or more of the figures in this paper are available online

at http://ieeexplore.ieee.org.Digital Object Identifier 10.1109/TASL.2013.2263801

amounts of resources including music, photos, video and textcontained in unstructured web documents. However, these doc-uments cannot be interpreted and used by machines directly,since standard Hyper Text Markup Language (HTML) docu-ments do not provide machine readable information about theircontent. For this reason, the Web presents an important chal-lenge in knowledge management. The Semantic Web was con-ceived in order to resolve these issues by creating a web of ma-chine-interpretable data as an extension to the current Web. Theconcept of the Semantic Web was initially proposed by BernersLee [1] in order to enable search through explicit specificationsof meaning in the content of web pages. Creating ontologies toenable formalized description and linking of resources within aparticular application domain is among the first steps towardsbuilding a web of machine-interpretable data.The semantic interpretation of music audio analysis relies

heavily on the availability of formal structures that encoderelevant domain knowledge. Many research groups built on-tologies manually to represent different types of data (e.g.music data, social data) within the formation of the SemanticWeb. Some examples of ontologies in the music domain arethe music ontology (MO) and the music performance ontology,grounded in the MO [2], [3]. The use of ontological modelsto access and integrate knowledge repositories is an importantcontribution, improving knowledge-based reasoning and musicinformation retrieval (MIR) systems alike [4]. There is alsosignificant benefits for the discovery of cultural heritagesof exchanging data among diverse knowledge repositories,such as musical instrument museums, libraries, institutions orrepositories. However, knowledge management in the domainof musical instruments is a complex issue, involving a widerange of instrument characteristics, for instance, physical as-pects of instruments such as different types of sound initiation,resonators, as well as the player-instrument relationship1. Rep-resenting every type of sound producing material or instrumentis a very challenging task since musical instruments evolvewith time and vary across cultures. The domain of musicalinstruments is broad and dynamic, including both folkloricinstruments made out of non-manufactured items (e.g. bladesof grass or conch sells), and new instruments relying on hightechnology (e.g. iPad app). Although much work has been doneon instrument classification in organology, there is currentlyno classification scheme encompassing such complexity anddiversity [6]. Thus, there is a need for automated systemsthat overcome these problems in knowledge management andontology engineering.

1More information, including detailed examples can be found in [5].

1558-7916 © 2013 IEEE

Page 2: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

2208 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 10, OCTOBER 2013

Hence, as a first step to solve this issue with regards to audiocontent, we developed a hybrid system for generating the classhierarchy of an ontology, automatically relying on the acousticalanalysis of isolated notes and solo performances played on var-ious musical instruments. The system performs two main tasks:i) musical instrument recognition, and ii) the construction of in-strument concept hierarchies. In the first part, the hybrid systemuses a classifier, either aMulti-Layer Perceptron neural network,or Support Vector Machines, to model sounds produced by in-struments according to their instrument categories (e.g., violin)and their attributes (e.g. bowed) using content-based timbre fea-tures, to find the ratio of relationship among instrument cate-gories and attributes. In the second part, the output of the in-strument recognition system is processed using Formal ConceptAnalysis to construct a hierarchy of instrument concepts.As a possible use case scenario, the proposed system may be

used as a plugin tab for ontology editor applications, such asProtege, in order to facilitate the ontology development processfor ontology engineers. Due to the fact that different sounds canindicate different characteristics of instruments, the processingof information directly derived from audio content contains veryimportant clues for the retrieval and management of musical in-strument knowledge. Therefore, the proposed system may alsobe used to overcome issues in current musical instrument clas-sification schemes developed in organology and musicology.To the knowledge of the authors, the proposed audio analysis

based automatic ontology generation is the first of its kind. Thesystem is based on a general conceptual analysis approach andcan be applied to any research fields that deal with knowledgemanagement issues. We, therefore, believe that this study con-tributes to the theory of the Semantic Web intelligence as wellas audio music analysis, and it will enable an improved method-ical design of automatic ontology generation.The rest of the paper is organized as follows: previous works

related to the musical instrument recognition and automatic on-tology generation are described in the second section. Section IIIexplains the automatic ontology generation system and the al-gorithms used in this study. Section IV presents the statisticalanalyses of the musical instrument recognition system and theevaluation of the automatically generated ontologies. Finally, inthe concluding section, we remark on the importance of this re-search problem and outline our future work.

II. RELATED WORK

The aim of the proposed content-based automatic ontologygeneration system is to process information from audio fileswith a classification system and accurately extract the termi-nology related to musical instruments in order to analyze theconceptual structure and assist ontology engineers during theontology construction process. Since the proposed system in-volves two different research areas, we will review both musicalinstrument identification and conceptual analysis studies.

A. Music Instrument and Family Identification

To automate musical instrument identification, various ap-proaches have been developed based on isolated notes, solo per-formances, or complex mixtures. The latter case is still in its

early days (see [7] for a thorough review). The results obtaineddepend on three main factors: the databases used during thelearning and testing stages, the features selected to characterizethe timbre of the instruments, and the classification methods.The isolated note or solo performances present an advantage

of simplicity and tractability, since there is no need to sepa-rate the sounds from different instruments. For example, Chétryet al. [8] proposed a system based on Line Spectral Frequen-cies (LSF), which are derived from a linear predictive anal-ysis of the signal and represent well the formant structure ofthe spectral envelope. The instrument identification unit of oursystem is based on this model. K-means clustering is used toconstruct a collection of LSF feature vectors, called codebook(due to the use of LSF features in speech data compression).The principle of K-means clustering is to partition a n-dimen-sional space (here the feature space) into K distinct regions (orclusters), which are characterized by their centres (called code-vectors). During the training stage, the K-means clustering isapplied on the LSF feature vectors extracted from several instru-ment recordings (isolated notes or solo performances). Duringthe testing stage, the K-means clustering is applied on the LSFfeature vectors extracted from the instrument recording to beidentified. The collection of the K codevectors (LSF vectors)constitutes a codebook, whose function, within this context, is tocapture the most relevant features to characterize the timbre ofan audio signal segment. The classification decisions are madeby finding which instrument minimizes the Euclidean distancebetween the LSF codebook associated with the audio sample tobe predicted and the LSF codebook associated with the instru-ment (training stage). The system achieved 95% performanceon a dataset comprising 4415 instrumental sound instances.In another study [9], Vincent and Rodet proposed a system

based on Gaussian Mixture Models (GMM) which was trainedand tested on isolated notes and solo recordings. The datasetwas gathered by extracting 2 excerpts of 5 seconds from eachof the 10 solo recordings used in the experiment. This approachyielded up to 90% of accuracy. Essid et al. [10], proposed asystem tested on a relatively large dataset. The same classifi-cation technique, GMM, was compared to Support Vector Ma-chines (SVM) with different audio features. Their system ob-tained a 12% performance improvement compared to a systembased on the SVM classifier, leading up to 87% of accuracyfor 0.5 s-long audio segments. Furthermore, the performance oftheir system increased from 6% points up to 93% of accuracy,using SVM on 5 s-long audio segments.However, there are only a few studieswhere instrument recog-

nition produces a hierarchical instrument structure. For example,Martin [11] proposed a system which was based on three dif-ferent hierarchical levels: 1) pizzicato (plucked) and sustainedsounds, 2) instrument families such as strings, woodwinds, andbrass 3) individual instruments for the corresponding instrumentfamilies. The recognition rate obtained with this system was90% for instrument family and 70% for individual instruments,while the dataset consisted of 1023 solo tones samples from 15instruments. Other hierarchical systems have been developedsince then by Eronen [12], Kitahara et al. [13] and Peeters [14].The overall correct identification rate of these systems are in therange of 35% to 80% for individual instruments, and 77% to 91%

Page 3: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

KOLOZALI et al.: AUTOMATIC ONTOLOGY GENERATION FOR MUSICAL INSTRUMENTS BASED ON AUDIO ANALYSIS 2209

for instrument family recognition. In general, the problem withhierarchical classification systems is that the errors at each levelpropagate increasingly to the other levels of the hierarchy.

B. Conceptual Analysis

Creating a class hierarchy is an important aspect of ontologydesign. Establishing such a hierarchy is a difficult task that isoften accomplishedwithout any clear guidance and tool support.Yet, the most commonly used hierarchical data mining tech-niques such as Hierarchical Agglomerative Clustering [15] andDecision Trees [16] do not take into account the relationshipsbetween objects. Therefore they do not provide an applicablesolution to knowledge representation issues and the multi-re-lational hierarchical design of ontology systems. This problembecomes even more apparent considering the multi-relationalnature of musical data.On the other hand, Formal Concept Analysis (FCA) allows

to generate and visualise the hierarchies relying on the relation-ships of objects and attributes. FCA, also known as concept lat-tice, was first proposed by GermanmathematicianWille in 1982[17]. It has been used in many software engineering topics suchas the identification of objects in legacy code, or the identifi-cation and restructuring of schema in object-oriented databases[18]. These works are important since ontologies provide thebasis for information and database systems [19]. Various spec-ification techniques for hierarchical design in object-orientedsoftware development have been proposed in [20]. This studysuggested alternative designs for FCA by not only utilizing at-tribute-based categorizations but also using different levels ofspecification details (e.g., objects, attributes, methods) in orderto obtain the class diagram of the software system. Furthermore,FCA has been used in conceptual knowledge discovery in col-laborative tagging systems [21], andwebmining studies in orderto create adaptive web sites utilizing user access patterns ex-tracted from Web logs [22].By offering a solution to bridge the gap between data and

knowledge automatically, FCA has generated considerable re-search interest, recently one of the influential ideas of automaticontology generation has been originally proposed by Maedcheand Staad [23] and can be described as the acquisition of a do-main model from data. Other FCA-based systems have beendeveloped since then by Cimiano [24], and Stumme [25]. Forinstance, one crucial requirement in ontology learning is that theinput data should represent the application domain very well.

III. AUTOMATIC GENERATION OF A SEMANTIC WEBONTOLOGY FOR MUSICAL INSTRUMENTS

The general architecture of the proposed system is shown inFig. 1. The system aims to automatically obtaining ontologydesigns in an Ontology Web Language2 (OWL) expressedfrom pre-labelled (tagged) music audio collection. The inputof the system is pre-labelled music audio collection and theoutput of the system is an OWL document that represents thecorresponding conceptualized structure of the data collection.The taxonomy of musical instruments given by Hornbostel and

2The Ontology Web Language is a W3C recommendation for defining andinstantiating web ontologies.

Fig. 1. Automatic ontology generation system based on audio features.

Sachs3 was considered as the basis for instrument terminologyand initial hierarchical structure. The hybrid system consists oftwo main units: i) Content-based audio analysis, ii) ConceptualAnalysis.Our experimental dataset consists of two sets of audio sam-

ples: one set of instruments’ isolated notes, and another set ofsolo performances, both collected from the following datasets,the Real World Computing (RWC) music collection4, Univer-sity of IOWA’s Musical Instrument Samples (MIS)5, McGillUniversity Master Samples6, and additional samples recorded atthe Centre for Digital Music, QueenMary University of London(QMUL). In the rest of this section, we describe each functionalunit of our system in more detail. The isolated note dataset con-tains recordings of 10 different musical instruments (15 pre-de-fined classes/objects and 12 musical attributes). The solo instru-ments dataset contains recordings of 8 musical instruments (12pre-defined classes/objects and 9 musical attributes).

A. Content-Based Audio Analysis

The content-based audio analysis involves two stages,namely, feature extraction and classification. The aim of theunit is to process information from audio files with a classifica-tion system and identify musical instruments and their proper-ties, accurately. In the content-based audio analysis stage, thesystem identifies the pre-labelled musical terms, 15 predefinedclasses of musical instruments (the hierarchy is deliberatelyomitted from the input)—chordophones7, aerophones8, bas-soon, cello, clarinet, flute, oboe, piano, saxophone, trombone,tuba, violin—and 12 musical instrument attributes—vibratingstring, vibrating air, sound initiation process:Bowed, soundinitiation process:Struck, reeds, edge, lip vibrated, reeds no:1,reeds no:2, valves:With valves, valves:Without valves, trueflutes.1) Feature Extraction and Clustering: The system is based

on the Short Time Fourier Transform (STFT) time frequencyrepresentation of audio signals. On the account of the factthat spectral envelope provides a good representation of the

3H. Doktorski, http://www.free-reed.net/description/taxonomy.html4http://www.staff.aist.go.jp/m.goto/RWC-MDB/5http://www.theremin.music.uiowa.edu/MIS.html.6http://www.music.mcgill.ca/resources/mums/html/.7Chordophone is a musical instrument category in which sounds are initiated

by string vibrations.8Aerophone is a musical instrument category in which sounds are initiated by

a vibrating mass of air.

Page 4: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

2210 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 10, OCTOBER 2013

spectrum while keeping a small amount of features, the timbremodels used in this study rely on features modeling the spectralenvelope which are obtained either from linear prediction (LP)or from Mel-Frequency Cepstral Coefficients (MFCCs).MFCCs model the short-time spectral characteristics of the

signal onto the Mel psycho acoustic frequency scale. For thecalculation of the MFCCs; the audio is first divided into shortframes, which are then transformed from time domain to fre-quency domain using the Discrete Cosine Transform (DFT).The magnitude spectrum of each frame is sent into a bank ofmel-frequency filters, whose logarithm is transformed back intothe time domain using the discrete cosine transform (DCT) [26].Line Spectral Frequencies (LSFs) are derived from the poly-

nomial coefficient of the inverse filter associated with LinearPredictive Coding (LPC) analysis. In Linear Predictive Codingthe production of sound in resonant systems is approximatedby a source-filter model. In this model, an excitation is passedthrough a resonant all-pole synthesis filter (inverse of the anal-ysis filter). The analysis filter coefficients are estimated on ashort-time basis from the auto-correlation of the signal. LineSpectral Frequencies can be obtained using the LPC analysisfilter polynomial written as a pair of symmetric and anti-sym-metric polynomials. The most important advantages of LSF fea-tures compared to direct form LPC coefficients are their simplefrequency domain interpretation and their robustness to quanti-zation [27]. Both LPC and LSF coefficients can be used to char-acterize the same aspect of timbre: the formant structure of asound, an important aspect of timbre.In order to classify various music performances or iso-

lated notes, we extract Mel-Frequency Cepstral Coefficientsand Line Spectral Frequencies (LSF) over overlapping audioframes using the method proposed in [8] and [28]. The timbre ofeach instrument is then characterized by a collection of MFCCand LSF feature vectors. In order to determine the best featurevector (codebook) dimensions with regard to performance,different number of feature coefficients (8, 16, 24 and 32) andnumber of clusters (K-means) were tested (8, 16, 32, and 64)[29]. In total 16 different codebook dimensions were tested foreach spectral feature set (LSFs and MFCCs). The details aregiven in the statistical analysis section.The recordings were sampled at 44100 Hz and the short-term

audio features were considered on successive frames of 1024samples, weighted by a Hamming window [8].2) Classification: The classification was performed using

both Multi-Layer Perceptron (MLP) neural network and Sup-port Vector Machines (SVM) which are supervised learning al-gorithms. Our goal is to associate audio signals related to in-strument and attributes. The input features are represented by asingle LSF/MFCC feature vector which is composed by the av-erage and variance of the K vectors obtained after the K-meansclustering of vectors calculated from all frames. MFCC and LSFfeature sets have both been used separately in the experiment.The associated results have then been analysed using statisticalanalysis (ANOVA). In the training and testing stages, we use amulti-network system (one network for each instrument objectsor attribute) consisting of 27 networks for the isolated notes and21 networks for the solo music dataset both in case of the MLPand SVM classifiers. For this experiment, we used the default

Matlab Neural Network Toolbox, and the SVM Toolbox whichwas published in [30]. The dataset was randomly divided 4 timesfor cross validation. In each experimental run 75% of the sam-ples were used for training and 25% were used for testing. Theoverall results are obtained by averaging the results obtained inthe 4 experimental runs.

a) Multi-layer perceptron: Multi-Layer Perceptron isamong the most common types of neural networks. Its com-puting power results from the parallel and distributed structure,and the ability to learn. The MLP networks contain two hiddenlayers, with 10 neurons in each hidden layer and an output layerwith one neuron. The activation function for each neuron inthe hidden layer is a tan-sigmoid function, and a linear transferfunction was selected for the output (purelin function inMatlab). The MLP was trained using the Levenberg-Marquardt(LM) [31] back-propagation algorithm. For this experiment,the number of iterations was set to 1000 and the parameters forthe learning rate and momentum of the MLP were 0.3 and 0.6,respectively.

b) Support vector machines: Support Vector Machines(SVM) have been widely used as an alternative to NeuralNetworks in modern machine learning. Their basic principle isto find out the best hyperplane leading to the largest distanceseparation between classes. The formulation embodies theStructural Risk Minimization (SRM) principle used to maxi-mize the margin of separation. SVM algorithms determine thedata points in each class that lie closest to the margin (decisionboundary), which are called support vectors. Intuitively, a goodseparation is achieved by a margin that has the largest distanceto the support vectors [32].There are a number of different kernels that can be used in

Support Vector Machines models. These include linear, polyno-mial, radial basis function (RBF), and sigmoid. In our experi-ments, we focused on polynomial kernel functions of variousdegrees. We only tested one type of kernel in the present studysince the use of different SVM kernels only lead to small accu-racy differences (2–3%) in previous musical instrument classifi-cation studies [10], [33], and that such small difference wouldn’taffect the output of the automatic ontology generation system.For this experiment, the lambda kernel parameter ( ) was set to

as described in [34], where is the number of instrumentsin the database. The degree of polynomial kernel function wasset to two different values, 2 and 3, to test which performed best.Due to the satisfactory results obtained with a polynomial kernelof degree 3, higher degrees were not tested.

B. Conceptual Analysis

The conceptual analysis involves two stages, namely, FormalConcept Analysis (FCA) and lattice pruning. Formal ConceptAnalysis is performed using the Colibri-Java library9 in order togenerate a hierarchical structure using the outputs of the classi-fiers. To determine the binary associations between instrumentsand attributes two criteria need to be verified: (i) a candidate re-lationship is determined as follows:

(1)

9http://www.code.google.com/p/colibri-java/

Page 5: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

KOLOZALI et al.: AUTOMATIC ONTOLOGY GENERATION FOR MUSICAL INSTRUMENTS BASED ON AUDIO ANALYSIS 2211

TABLE ICROSS TABLE REPRESENTING A FORMAL CONTEXT BETWEEN A SETOF INSTRUMENTS (CELLO, PIANO, VIOLIN) AND A SET OF ATTRIBUTES

(VIBRATING STRING, BOWED, STRUCK)

where is the attribute, is the instrument, and theis Precision. (ii) the binary association criteria is given by:

(2)

The obtained relationships are used to form the formal con-text, that represents the relationships among instrument classes/objects and attributes, to generate a graphical representation ofconcepts into a lattice form. Finally, in the lattice pruning stage,empty concepts are eliminated and the hierarchical form is re-vised in order to generate the OWL output of the system.1) Formal Concept Analysis:a) Definition: Formal Concept Analysis is a mathematical

theory of concept hierarchies which is based on Lattice Theory.Thekeystoneof theFCAis thenotionof formal context.A formalcontext is defined as a binary relation between a set of objects anda set of attributes. In a formal context, a pair, formed by a set ofobjects and a set of attributes that uniquely associate with eachother, is called a formal concept. The set of objects are referred toas extent closure, and the set of attributes are referred to as intentclosure. In the reminder of this Section, the notions underlyingFormal Concept Analysis are defined following Ganter et al.’sformalism [35] and illustrative examples are given.Definition 1 (Formal Context): A formal context

is composed of a set of objects G, a setof attributes M, and a binary relation . We call I theincidence relation and read as the object g has theattribute m. The relation of an object to an attribute is denotedas .A formal context can be represented by a cross table where

the rows are defined by the object names and the columns aredefined by the attribute names. In Table I, the formal context iscomposed of three objects representing three instruments (cello,piano, violin), and three attributes representing three instrumentproperties (vibrating string, sound initiation process:Bowed,and sound initiation process:Struck). A symbol “ ” in rowand column means that the object has the attribute—that is, the object has the indicated attributes (e.g. the

cello instrument has the attributes “vibrating string and soundinitiation process:Bowed”).Definition 2 (Derivation Operators): For a subset of

objects, we define a set of attributes common to the objects inA as:

(3)

and reciprocally, for a subset of attributes we define aset of objects which have all attributes in B as:

(4)

The following statements are the derivation operators for agiven context , its subsets of objectsas well as its subsets of attributes:

(5)

(6)

(7)

(8)

The first derivation of the set of objects is the attributeswhich are possessed by those objects, and we can apply the

second derivative operator to obtain the objects possessedby these attributes . In addition, if a selected object set isenlarged, then the common attributes of the larger object setis among the common attributes of the smaller object set. Thesame principle applies for the enlarged attribute set.Definition 3 (Formal Concept): A pair (A, B) is a formal

concept of if and only if

(9)

The set A is called the extent, and B is called the intent of theformal concept (A, B).Example 1: Table I gives an example of formal context

based on

and the binary relation “ ” represented bythe “ ” (has/has not) in the cross table. As intent

,and extent

)is not a formal concept of (G, M, I). However, intent

, extent, therefore the pair

is a formal concept.Definition 4: Let (A1, B1) and (A2, B2) be two formal con-

cepts of a formal context (G, M, I), (A1, B1) is called the sub-concept of (A2, B2) and denoted as , ifand only if . Equivalently, iscalled the superconcept of . The relation is calledthe hierarchical order (or simply order) of the formal concepts.Example 2: Let

andbe two

formal concepts by considering Table I. Asand

,C0 is a superconcept of C1. Equivalently, C1 is called a sub-concept of C0.Definition 5: The family of concepts which obeys the above

mathematical axioms is called a concept lattice. The lowerbound of the concept lattice is called infimum, and its upperbound is called supremum.

b) Many valued contexts: The representation of instru-ment classes and attributes in OWL is another important taskin ontology generation. Therefore, we need to explicitly define

Page 6: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

2212 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 10, OCTOBER 2013

TABLE IIA NAIVE SCALING AND CROSS TABLE OF A FORMAL CONTEXT

the type of attributes we use as an input. In computer science,an attribute is a specification that defines a property of an ob-ject, data or file, in our case instrument classes. In OWL thereare two kind of properties: object properties, which relates ob-jects to objects, and data properties, which relates objects todata type values. Since the basic data type of Formal ConceptAnalysis is that of a binary formal context (single-valued), itis difficult to explicitly define whether an attribute is an ob-ject or a data property. Therefore, we relate this issue to thesingle-valued and many-valued representation of data in data-base design. In order to overcome this issue in the context ofFormal Concept Analysis, before the experiment in the taggingstage we identify the type of attribute in a many-valued context,and during the experiment we use a conceptual scaling tech-nique, the dichotomous scale, to transform a many-valued con-text into a one-valued context. The dichotomous scale whichdefines the negations among attributes has been used by Ganter[36], [37]. To obtain the scaled context, each many-valued at-tribute has been replaced by the corresponding row of the scale.We applied this method on instrument many-valued attributes,such as sound initiation process, reeds no and valves, in orderto understand if the property is an ‘object property’ or a ‘dataproperty’. Thus, these attributes have been replaced by scale at-tributes: e.g., sound initiation process:Bowed, sound initiationprocess:Struck, reeds no:1, reeds no:2, valves:With Valves andvalves:Without Valves. A many-valued context example is givenin Table II illustrating the transformation of the many-valuedcontext into a one-valued context. Note that in this way, thesystem automatically detects the data properties with their cor-responding data, as these are easily identifiable (see right handside of the table). Expressions which do not contain a colon (notpresent in the particular example in Table II) represent objectproperties.2) Lattice Pruning: Lattice Pruning is the process of re-

moval of “empty or unnecessary repetitions“ of concepts, ob-jects or attributes based on any of the necessity and stabilitynotions that are defined by knowledge engineers. The conceptlattice of is the set of all formal concepts, ordered assubconcepts-superconcepts, that depicts particularities and re-lationships of our data. Each node represents a formal concept.However, each of these nodes involve object and attribute rep-etitions in order to illustrate the relationship among the nodes.Therefore, in order to formally define the transformation of the

lattice into the partial order or a concept hierarchy, to subse-quently make it simpler and more readable, we used a pruningtechnique called reduced labelling.Here, the main idea is that each object is entered only once

in the hierarchical form [38]. In other words, we remove anyterms from the inner node which are the same as their children[24]. The difference, as compared with previous Formal Con-cept Analysis studies, is that we deleted the corresponding in-fimum edges of the empty and the non empty sets on the latticeform. Thus, the taxonomic (i.e. symmetric) reflection of eachsuperconcept for the predicted concept hierarchy is removed.Then, the reduced labelling technique is applied to the latticediagram. For example in Table III, in order to obtain a concepthierarchy for Chordophones concept that involves the concepts, , and , we delete the node , since we accept it

as a symmetric reflection of . Subsequently, we remove theChordophones term from the extents of and that led tous having a simple concept hierarchy. In addition, andare the supremum and the infimum of the lattice diagram, there-fore, we also delete these nodes during the pruning process.3) Construction of Musical Instrument Concept Hierar-

chies: As mentioned previously, the aim of the instrumentidentification experiments were to find the associations be-tween musical instruments and their attributes, in order toautomatically generate a Semantic Web ontology for musicalinstruments. Therefore, the outputs of the best musical instru-ment recognition systems of each dataset were used to obtainthe associations among instrumental attributes. The overallperformance of the musical instrument recognition system wasevaluated by computing the average and standard deviation ofthe system’s F-measure across instruments. The binarizationprocess was applied on the obtained results and each networkexperiment was run for 4 different training/testing sets (crossvalidation technique), to prevent biased results.In order to generate a binary context for FCA, a threshold

of 0.5 was used to determine whether an instrument possessedan attribute or not, as given in (1)–(2). The results obtainedwith SVM (3rd degree Polynomial kernel) are satisfactory (onboth datasets, solo music and isolated notes) for our purpose offormal context generation since all the attributes are associatedto the relevant instruments (no errors were made after binariza-tion). The formal context obtained after binarization of the re-sults of the isolated notes can be seen in Table IV.The identified formal context was used as an input to the

FCA algorithm. The formal concepts are extracted by applyingFCA to the context generated in the instrument recognitionsystem. Table III shows the extracted formal concepts andTable III(b) shows the graphical representation of the corre-sponding concepts in a concept lattice form using a line dia-gram. Each concept is a cluster of instrument objects and at-tributes shared by the objects. The concept lattice is constructedto interpret the subconcept and superconcept relationships be-tween concepts. It consists of 17 formal concepts which arerepresented by the 17 circles in the diagram. The labels of thecircles represent the extent (E) and intent (I) of each formalconcept node. By following the ascending paths and the con-nected circles by edges, the diagram shows the concepts andtheir subconcepts.

Page 7: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

KOLOZALI et al.: AUTOMATIC ONTOLOGY GENERATION FOR MUSICAL INSTRUMENTS BASED ON AUDIO ANALYSIS 2213

TABLE IIIEXTRACTED FORMAL CONCEPTS FROM Table IV (LEFT) AND THE GRAPHICAL CONCEPT LATTICE REPRESENTATION OF THE CORRESPONDING CONCEPTS (RIGHT)

TABLE IVFORMAL CONTEXT OBTAINED AFTER BINARIZATION OF THE RESULTSFOR SVM WITH THE 3RD DEGREE POLYNOMIAL KERNEL USING 32LSF FEATURES AND 64 CODEVECTORS FOR ISOLATED NOTES

Finally, as we mentioned in this section on reduce labeling,the concept lattice labels are reduced successfully into 12 formalconcepts after removing the empty and non empty infimum con-cepts, such as , , , and . The class hierarchyof the instrument ontology is transformed to the ontology weblanguage (OWL) using the OWL API Java library10. The final

10http://www.owlapi.sourceforge.net/

class hierarchies of the acquired instrument ontologies are givenin Fig. 2.

IV. EVALUATIONS

A. Statistical Analysis of the Musical Instrument RecognitionSystem

In order to determine the level of accuracy of the musicalinstrument recognition system, F-Measures were computed forvarious combinations of classifiers, audio spectral features, andcodebook dimensions (i.e. no. of coefficients and no. of clus-ters). The F-measure was classically obtained based on the pre-cision and recall of the identification.In order to test whether any significant differences occur

when the codebook dimensions, the spectral features, and theclassifiers, four-way analyses of variance (ANOVA) were con-ducted with the classifiers (i.e. MLP, SVMwith Polynomial 2ndDegree, SVM with Polynomial 3rd Degree), the audio spectralfeatures (i.e. LSFs and MFCCs), and the codebook dimensionsranging from (8 to 32) to (8 to 64), as independent variables.The dependent variable was the F-Measure. The effects ofthe factors along with interaction effects were analyzed fromone-way up to four-ways by using the partial eta squared indexof effect size. When interactions were observed, a multiplecomparison procedure (MCP) was conducted to identify ifthere was a significant difference between the factors that havebeen used in the experiments. The Holm-Sidak procedure [39]was used here, as in [40]. The definitions in [41] have beenadopted to discuss the effect sizes: small effect size ,medium effect size and large effect size

. ANOVA level of significance are reported

Page 8: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

2214 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 10, OCTOBER 2013

TABLE VPERFORMANCE OF THE MUSICAL INSTRUMENT RECOGNITION SYSTEMS FOR THE ISOLATED NOTES AND

SOLO MUSIC DATASETS. IN EACH CASE, THE BEST PERFORMANCE IS REPORTED IN BOLD

TABLE VIRESULTS OF FOUR-WAY ANALYSES OF VARIANCE FOR THE MUSICAL INSTRUMENT RECOGNITION SYSTEM. IS THE PARTIAL ETA SQUAREDMEASURE OF EFFECT SIZE. . CORRESPONDING MEANINGS OF THE ABBREVIATIONS ARE AS FOLLOWS:

CSLR: CLASSIFIER; ASFT: AUDIO SPECTRAL FEATURE SET; COEF: COEFFICIENT NUMBER; CLUN: CLUSTER NUMBER

using the F-statistics and probability . A risk of .05 wasused in all statistical tests.1) Results of the Musical Instrument Recognition System:

The performance of six systems using LSFs and MFCCs are re-ported in Table V. The total average correct identification ratesranges between 67.5% and 87.5% for the solo music, and 38.5%and 90.3% for the Isolated Notes. For the MLP classifier, the av-erage correct identification rates were 76% and 46.7%, for thesolo music and isolated notes, respectively.Overall, the best results were found by using the SVM poly-

nomial at 3rd degree for both of the datasets: for instance, theaverage correct identification rate was slightly increased, up to83.0% and 86.3% for the solo music and the isolated notes, re-

spectively. The highest performance for this classifier was ob-tained with 32 coefficients and 64 codevectors for both of thefeature sets, LSF (87.5%) andMFCC (83.1%), on the solomusicdataset. For the isolated notes, although the highest accuracy(90.3%)was obtained with the same settings, 32 coefficients and64 codevectors, for the LSF feature set, the best performancewas obtained with 8 MFCCs and 64 codevectors (87.7%) forthe MFCCs.2) Comparison of the Classifiers and the Spectral Feature

Sets: The results of the four-way analyses of variance results formusical instrument recognition system is reported in Table VI.Highly significant effects of the classifiers were foundfor both the solo music and isolated note datasets,

Page 9: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

KOLOZALI et al.: AUTOMATIC ONTOLOGY GENERATION FOR MUSICAL INSTRUMENTS BASED ON AUDIO ANALYSIS 2215

and ,respectively. The effect size of the classifier factor was mediumsize on the solo music dataset, whereas a very largeeffect size was found for the isolated notes.We also found a significant effect for the audio spectral

feature sets on both of the datasets,, for the solo music, and, for the isolated notes. The effects were

of small size for both datasets ( and ),respectively.The posthoc analyses (multiple comparison procedure) con-

ducted for the solo music and the isolated notes showed thatthe SVM classifier was significantly better than the MLP clas-sifier independently of the polynomial degree andthe dataset. The average differences between the MLP classi-fier and the SVM degree 2 , 3

polynomials was significant at the .05 level forthe solo music dataset. Although a significant difference oc-curred for the isolated notes as well, the average differences be-tween MLP and SVM-based F-measures surprisingly increased( for degree 2, andfor degree 3), respectively. This shows that there was also asignificant difference between the performance of SVM poly-nomial kernels of 2nd and 3rd degrees. The SVM with a 3rdorder polynomial kernel performed significantly better than theSVM with a 2nd order polynomial kernel for both datasets:

for solo music, andfor the isolated notes.The posthoc test also revealed that there was significant dif-

ference between LSF and MFCC feature sets for both datasets.The LSF feature sets performed slightly better than the MFCCfeature set for the solo music dataset, whereas, MFCCs per-formed slightly better than LSFs on the isolated notes dataset.This is in line with Chetry’s results [34]. The average F-mea-sure differences between the LSF and the MFCC feature setswere in the range for the solo music dataset,

for the isolated notes dataset.3) Influence of the Codebook Dimensions: The anal-

ysis of variance showed that the effect number of coef-ficients was highly significant for solo musicand isolated notes, and

, respectively. The effect sizeswere found to be small for both datasets ( , solo music,and , isolated notes). There was also a significanteffect of the number of codebook clusters related tothe K-means algorithm, on both datasets. The effect sizes weresmall ( , solo music, and , isolated notes).There was no significant mean differences in the results whenthe number of clusters was varied.With regard to the codebook dimensions in the case of solo

music, the posthoc test revealed that there was no significantdifferences, except for 8 coefficients, when the number of fea-ture coefficients varied by only 8. However, there was a sig-nificant difference between experiments where thecoefficient number differed from 16 (e.g. between 16 and 32)with a small average difference, . Highly sig-nificant differences were found when the numberof coefficient was small (8) compared to other cases (16, 24,

32) for spectral features in general, with a small average dif-ferences ( , , and

, respectively).For isolated notes, the same pattern occured. The average dif-

ference between the 16 and 32 feature coefficient was, and the average differences between 8 feature co-

efficients and the other coefficient dimensions, 16, 24 and 32,were , , and

, respectively.4) Relationships Between the Factors: The interaction be-

tween the classifier factor, the spectral feature sets, and the di-mensions of the feature vector were highly significant for solomusic, and

,, respectively. Although

the effect size of the interaction between classifier and spectralfeature sets was larger than the interaction between classifierfactor and coefficient factor, both interaction effects were small

. There was also a highly significanteffect of the interaction between the classifier and the spectralfeature set factors, , for iso-lated notes with a small effect size . Nevertheless,there was no significant interaction effect between the classifierand the dimensions of the feature vectors for the isolated notes.The interaction of spectral features and the number of coef-

ficients yielded an ratio of ,and , for solo music and isolatednotes respectively, indicating that there was highly significanteffect of the interaction of these factors on both datasets. Theeffect sizes of the interaction were very small ( and

) for both datasets.Finally, the interaction between the classifier, the spectral fea-

ture set, and the dimensions of the feature vectors were highlysignificant on the solo music dataset,

, and significant on the isolated notes dataset,. The effect sizes were small for

the solo music, whereas it was very small for iso-lated notes.

B. Evaluation of Conceptual Analysis

A number of ontology evaluation metrics for automaticallygenerated ontologies have been proposed in the literature. Themostly used techniques are: i) human assessment, ii) compar-ison to a pre-defined gold-standard or hand-crafted ontology,and iii) task-based evaluation in a running application. Humanassessment evaluation wasn’t used since it is difficult to choosethe right set of people, e.g., ontologists, end-users or domain ex-perts. Considering the limited number of instrument categoriesin the generated instrument ontologies, and the difficulty offinding labeled music pieces with a single instrument, we didn’tapply a task-based evaluation. Thus, we based our evaluationon comparison to an existing hand-crafted taxonomy.1) Evaluation Metrics: We used the evaluation method pro-

posed by [42]. This approach is the modified version of one ofthe most popular approach in the ontology learning field [43].We applied Taxonomic Overlap, which is one of the measuresused in both studies. Taxonomic overlap is a similarity measurethat takes into account the taxonomy structures of ontologies.

Page 10: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

2216 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 10, OCTOBER 2013

In particular, each concept in a learned taxonomy and a corre-sponding concept in a reference ontology are compared basedon howmuch their ancestors and descendants are similar to eachother as described in [44].The idea is based on two different measures: i) local taxo-

nomic measure, and ii) global taxonomic measure. The localtaxonomic measure compares the positions of two concepts, andthe global taxonomic measure compares the entire concept hi-erarchy of the two ontologies. The local taxonomic precision isgiven by the following equation:

(10)

where is the character extraction that gives the characteristicobjects for the position of a concept in the hierarchies and. For the taxonomic overlap measure the semantic cotopy,, and common semantic cotopy, is given by:

(11)

(12)

where and represent AND and OR logical commands.returns all the descendants and returns all the ancestorsfor the concept in taxonomy. The corresponding ontology isdefined by whereas the corresponding set of concepts for thetaxonomy is defined by . The common semantic cotopy, ,is another taxonomy overlap measure which exclude the corre-sponding concept from its common semantic cotopy, as well asall the concepts that are not included in the concept set of theother ontology. The set of concepts for the corresponding on-tologies ( and ) are defined as and , respectively. Inthe optimistic assessment as in [43], the current concept is com-pared with all the concepts from the reference ontology and thehighest precision is chosen by picking the best match of in .The global taxonomic precision and recall are definedby the following equations:

ifif

(13)

(14)

where represents the local taxonomic precision andrepresents the taxonomic overlap recall of the corresponding on-tology based on the semantic cotopy, . The local taxonomicprecisions are summed up and averaged over all the taxonomicoverlaps for a set of concepts in the corresponding ontology. The common semantic cotopy of the global taxonomic

overlap is computed as follows:

(15)

(16)

In the (15) and (16), the local and are summed upand averaged over all the taxonomic overlaps according to acommon set of concepts of the ontologies. Finally, we used thetaxonomic F-measure calculating the harmonic averageof taxonomic overlap in both and , the automaticallygenerated and reference ontologies, respectively. The equationfor the taxonomic F-measure is given by:

(17)

At the lexical comparison level, the comparison of the on-tologies is restricted to comparing their lexicons without con-sidering the conceptual structures defined above. The measureis based on a technique called edit distance, proposed by Lev-enshtein [45]. This measure has been used in conjunction withthe taxonomic measures to evaluate ontologies. The equationsfor the lexical precision and lexical recall are asfollows:

(18)

(19)

In addition to the taxonomic F-measure, there is a need fora higher level metric that involves not only the quality of con-cept hierarchy but also the lexical measure of the ontologies. Toavoid this issue we used a higher level F-measure, , givenby (for further details, including detailed examples see [42]):

(20)

2) Quantitative Comparison: In this section the measurespresented in Section IV-B-1 will be analytically evaluated onthe basis of our experimental concept hierarchies. Fig. 2 illus-trates the automatically generated ontologies and the referenceontology. In order to analyse the ontology generation system,the highest and the lowest instrument recognition performancein both datasets were used to determine ontologies. The hand-crafted reference concept hierarchy (based on Hornbostel andSachs’ system) is denoted, , and the automatically gener-ated concept hierarchies for the highest and the lowest perfor-mance of the recognition system are denoted, and forthe isolated notes, and and for the solo music dataset.The ontology evaluation results are given in Table VII.Compared to the values of the taxonomic measures

are slightly lower than the corresponding values of the lexicalmeasures, since there is no error on the lexical term layer. Itshould be noted that the {Brass Instruments, Tuba, Trombone}concepts was not taken into account during evaluations of solomusic, since those instruments were not present in the solomusic dataset.As can be seen in Table VII, the semantic cotopy of the on-

tology is almost identical to the reference ontology .For example, the semantic cotopy of the concept Aerophones inthe hand-crafted ontology in Fig. 2(a) is {Thing, Aero-phones, Edge instruments, Brass Instruments, Reed Pipe Instru-ments, Flute, Trombone, Tuba, Clarinet, Saxophone, Bassoon,

Page 11: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

KOLOZALI et al.: AUTOMATIC ONTOLOGY GENERATION FOR MUSICAL INSTRUMENTS BASED ON AUDIO ANALYSIS 2217

Fig. 2. Automatically generated concept hierarchies , compared to the reference concept hierarchy .

Oboe} and the semantic cotopy of the Aerophones in the au-tomatically generated ontology in Fig. 2(b), , is {Thing,Aerophones, Edge instruments, Brass Instruments, Reed PipeInstruments, Flute, Trombone, Tuba, Clarinet, Saxophone, Bas-soon, Oboe}. That is, the semantic cotopy of the concept Aero-phones is identical in both ontologies. It is possible to see the

same identical overlap for other concepts as well. However, forthe concepts Edge Instruments and Flute the semantic cotopy in

is Thing, Aerophones, Edge Instruments, Flute, whereasfor the concept Edge Instruments the semantic cotopy inis Thing, Aerophones, Edge Instruments and for the Flute it isThing, Aerophones, Flute.

Page 12: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

2218 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 10, OCTOBER 2013

TABLE VIIEVALUATION OF AUTOMATICALLY GENERATED ONTOLOGIES WITH A

SEMANTIC COTOPY AND COMMON SEMANTIC COTOPY BASED MEASURE FORISOLATED NOTES ( AND ) AND SOLO INSTRUMENTS ( AND )

Parallel to this, there were slightly lower results for thecommon semantic cotopy of ontology . It is possible tosee the same problem with this metric as well, for example,the common semantic cotopies of the concept Aerophonesand many other concepts are identical (e.g., {Thing, Edgeinstruments, Brass Instruments, Reed Pipe Instruments, Flute,Trombone, Tuba, Clarinet, Saxophone, Bassoon, Oboe}).However, the conceptual hierarchy dissimilarity of the EdgeInstruments and the Flute concepts were also reflected oncommon semantic cotopy measurements; for instance, for theEdge Instruments and the Flute in , the common semanticcotopies are Thing, Aerophones, Flute and Thing, Aerophones,Edge Instruments, whereas in for both of the concepts, itis Thing, Aerophones.In fact, except the concepts Edge Instruments and Flute,

every leaf concept in the reference concept hierarchy has amaximum overlap with the corresponding concept hierarchyin . Thus, it is evident from the results of , in termsof arrangement of the leaf nodes and abstracting from theinner nodes, obtained fairly high as shown in Fig. 2(b). Thegood correspondences obtained from instrument identificationsystem lead us to a high precision and recall with respect to thetaxonomic overlap. As can be seen in the and columnsof the Table VII, there are no errors on the lexical term layer,since all the terms also exist in the generated ontology.Compared to eight concepts are missing in , but

the hierarchy of the remaining concepts are mostly not changed.The errors on the lexical term layer of the learned ontologyare slightly less than the taxonomic measures; this leads to avery small increase of the . On the contrary,and are independent from the lexical precision and re-call, therefore there was a significantly high taxonomicmeasure,as can be seen in Table VII(b). However, it is worth pointingout that the influence of the lexical measure is even higher oncommon semantic cotopy compared to semantic cotopy tax-onomy measures in .For the third and the fourth ontologies , which

are based on the solo music dataset, although the difference be-tween the average correct identification rates was almost 20%(see Table V), it is highly probable that the conceptual termsin both ontologies successfully passed the predefined threshold(50%) regarding the binary context. This evidence suggests thatthe ontology generated from solo music dataset was unaffected

Listing 1. Sample of generated musical instrument ontology.

by the changes in spectral feature sets and the codebook dimen-sions. As can be seen in Fig. 2(d), the hierarchy of the ontologywas not changed except for the concepts of Edge Instrumentsand Flute. Due to the fact that there is only one leaf for the EdgeInstruments, and that every data in the concept of Flute is mostlikely to be the same data in the Edge Instruments, it may bereasonable to assume that this case represent a challenge for thecase of audio signals to generate ontologies by FCA. The Flutecould only be separated from Edge Instrument if there was atleast one more instrument concepts with an identical attribute.Thus, the problem could be solved in the lattice pruning phaseby removing the corresponding lower bound, infimum edge, ofthe Edge Instruments.It is worth pointing out that the derived musical instrument

It is worth pointing out that the derived musical instrument hierarchy based on audio features is valid from an organological point of view. Indeed, the instrument hierarchy shown in Fig. 2(a) is fairly similar to the corresponding part of the Hornbostel and Sachs hierarchy, which was used as a reference (training stage) in this study. The results show that the proposed system produces a promising conceptual hierarchy design using the Ontology Web Language, which could be of assistance to music ontology engineers. A sample from the generated OWL file is presented in Listing 1. More details regarding the generated OWL files can be found online11.
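The listing itself is not reproduced in this transcript, but as a rough indication of what such a serialization step can look like, the sketch below writes a small class hierarchy as OWL using Python's rdflib; the namespace IRI, class names and the choice of rdflib are illustrative assumptions rather than the authors' actual tooling or output.

# Illustrative sketch: serialising a learned concept hierarchy as OWL classes
# with rdfs:subClassOf axioms using rdflib. The namespace, class names and the
# use of rdflib are assumptions for illustration, not the authors' pipeline.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

INSTR = Namespace("http://example.org/instrument-ontology#")  # hypothetical IRI

# A learned (child, parent) hierarchy, e.g. as output by the FCA stage.
hierarchy = [
    ("Aerophones", "Thing"),
    ("EdgeInstruments", "Aerophones"),
    ("ReedPipeInstruments", "Aerophones"),
    ("Flute", "EdgeInstruments"),
    ("Clarinet", "ReedPipeInstruments"),
]

g = Graph()
g.bind("instr", INSTR)

for child, parent in hierarchy:
    g.add((INSTR[child], RDF.type, OWL.Class))
    if parent != "Thing":
        g.add((INSTR[parent], RDF.type, OWL.Class))
        g.add((INSTR[child], RDFS.subClassOf, INSTR[parent]))
    else:
        g.add((INSTR[child], RDFS.subClassOf, OWL.Thing))

print(g.serialize(format="xml"))  # RDF/XML, as commonly used for OWL documents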

V. CONCLUSIONS

Our research is motivated by the fact that current music ontology design processes do not incorporate automated learning systems. This makes the construction of ontologies expensive and highly dependent on human supervision. To date, only manually created ontologies have been proposed, without any clear guidance.

In this study, we proposed a hybrid system that automatically builds an Ontology Web Language based ontology using timbre-based instrument recognition techniques and Formal Concept Analysis.

11 http://www.isophonics.net/content/automatic-ontology-generation-based-semantic-audio-analysis

Page 13: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis


The system was tested using MLP and SVM classifiers, modeling different codebook dimensions of the timbre features, LSFs and MFCCs, for various instruments from the wind and string families in order to find the best performance. We found that in all cases the system succeeded in generating a fairly good ontology for the solo music dataset. In contrast, significant effects of the classifiers and the codebook vectors on ontology generation were highlighted for the isolated notes dataset. Overall, the best results were obtained for an SVM with a polynomial kernel of the 3rd degree using 32 LSFs and dictionaries of 64 codevectors on both the solo music (87.5%) and isolated notes (90.3%) datasets.
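As a rough sketch of this best-performing configuration, the snippet below pairs a 64-codevector K-means codebook with a 3rd-degree polynomial-kernel SVM using scikit-learn; the random frame-level features, the flattened-codebook representation and the scikit-learn implementation are assumptions made for illustration and may differ from the authors' exact pipeline.

# Rough sketch of the best-performing configuration reported above: a
# 64-codevector K-means codebook per recording, classified by an SVM with a
# 3rd-degree polynomial kernel. scikit-learn and the flattened-codebook
# representation are illustrative assumptions, not the authors' exact pipeline.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def codebook_vector(frames, n_codevectors=64, seed=0):
    """frames: (n_frames, n_coeffs) LSF or MFCC matrix for one recording.
    Returns the flattened, sorted K-means codebook as a fixed-length vector."""
    km = KMeans(n_clusters=n_codevectors, n_init=10, random_state=seed).fit(frames)
    centers = km.cluster_centers_
    # Sort codevectors by their first coefficient so the ordering is stable.
    return centers[np.argsort(centers[:, 0])].ravel()

# Hypothetical data: per-recording frame features and instrument labels.
rng = np.random.default_rng(0)
recordings = [rng.normal(size=(300, 32)) for _ in range(20)]  # e.g. 32 LSFs
labels = ["Flute", "Clarinet", "Trombone", "Violin"] * 5

X = np.vstack([codebook_vector(r) for r in recordings])
clf = SVC(kernel="poly", degree=3).fit(X, labels)
print(clf.predict(X[:4]))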

In general, reasonable results were obtained with regard to the hierarchical design of the instrument ontology on both datasets. In the context of the Semantic Web, these findings confirm that the proposed hybrid system enables a novel methodical design for automatic ontology generation. Most notably, to the authors' knowledge, this is the first study to investigate automatic ontology generation in the context of audio and music analysis. In addition, the proposed system can be applied to a broad range of research fields investigating knowledge exploitation and management, in order to overcome complex problems of communication, knowledge sharing and project management. In future work, we will continue towards incorporating a wider set of musical instruments and attributes and utilizing more OWL language features. The development of techniques to handle multimodal data (e.g., textual, audio and visual) may improve future ontology generation systems relying on the wide diversity of data on the Web (e.g., text, sound, photos, and videos).

ACKNOWLEDGMENT

The authors wish to thank Dan Tidhar for his assistance in developing the ground truth, and the anonymous musicologists for providing valuable resources which helped in the evaluation of our experiment. This work was partly supported by the Networked Environment for Music Analysis (NEMA) project, the Online Music Recognition And Searching (OMRAS2) project, the UK Engineering and Physical Sciences Research Council (EPSRC), and the Making Musical Moods Metadata TSB project.

REFERENCES
[1] T. Berners-Lee, J. Hendler, and O. Lassila, "The semantic web," Sci. Amer., 2001.
[2] Y. Raimond, S. Abdallah, M. Sandler, and F. Giasson, "The music ontology," in Proc. 7th Int. Conf. Music Inf. Retrieval, 2007.
[3] G. Fazekas, Y. Raimond, K. Jacobson, and M. Sandler, "An overview of semantic web activities in the OMRAS2 project," J. New Music Res., Spec. Iss. Music Informatics and the OMRAS2 Project, vol. 39, no. 4, pp. 295–311, 2010.
[4] M. Abulaish, Ontology Engineering for Imprecise Knowledge Management. Saarbrücken, Germany: Lambert Academic, 2008.
[5] S. Kolozali, G. Fazekas, M. Barthet, and M. Sandler, "Knowledge representation issues in musical instrument ontology design," in Proc. Int. Soc. for Music Inf. Retrieval Conf., Oct. 24–28, 2011, pp. 465–470.
[6] M. Kartomi, "The classification of musical instruments: Changing trends in research from the late nineteenth century, with special reference to the 1990s," Ethnomusicol., vol. 45, pp. 283–314, 2001.
[7] A. Klapuri and M. Davy, Eds., Signal Processing Methods for Music Transcription. New York, NY, USA: Springer, 2006.
[8] N. Chétry, M. Davies, and M. Sandler, "Musical instrument identification using LSF and K-means," in Proc. AES 118th Conv., 2005.
[9] E. Vincent and X. Rodet, "Instrument identification in solo and ensemble music using independent subspace analysis," in Proc. Int. Conf. Music Inf. Retrieval, 2004.
[10] S. Essid, G. Richard, and B. David, "Musical instrument recognition by pairwise classification strategies," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1401–1412, Jul. 2006.
[11] K. D. Martin, E. D. Scheirer, and B. L. Vercoe, "Music content analysis through models of audition," in Proc. ACM Multimedia Workshop Content Process. Music for Multimedia Applicat., Sep. 1998.
[12] A. Eronen and A. Klapuri, "Musical instrument recognition using cepstral coefficients and temporal features," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2000, pp. 753–756.
[13] T. Kitahara, M. Goto, and H. G. Okuno, "Musical instrument identification based on f0-dependent multivariate normal distribution," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2003, pp. 421–424.
[14] G. Peeters and X. Rodet, "Hierarchical gaussian tree with inertia ratio maximization for the classification of large musical instrument databases," in Proc. 6th Int. Conf. Digital Audio Effects (DAFx-03), 2003.
[15] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," J. Artif. Intell. Res., 2005.
[16] A.-E. Elsayed, S. R. El-Beltagy, M. Rafea, and O. Hegazy, "Applying data mining for ontology building," in Proc. ISSR, 2007.
[17] R. Wille, Restructuring Lattice Theory: An Approach Based on Hierarchies of Concepts, I. Rival, Ed. Boston, MA, USA: Reidel, 1982, pp. 445–470.
[18] G. Snelting, "Concept lattices in software analysis," in Proc. Int. Conf. Formal Concept Anal., 2003, pp. 272–287.
[19] A. Yahia, L. Lakhal, J. P. Bordat, and R. Cicchetti, "An algorithmic method for building inheritance graphs in object database design," in Proc. 15th Int. Conf. Conceptual Modeling. New York, NY, USA: Springer, 1996, pp. 422–437.
[20] R. Godin and P. Valtchev, Formal Concept Analysis-Based Hierarchy Design in Object-Oriented Software Development. Berlin/Heidelberg, Germany: Springer, 2005, pp. 304–323.
[21] Y.-K. Kang, S.-H. Hwang, and K.-M. Yang, "FCA-based conceptual knowledge discovery in folksonomy," World Acad. Sci., Eng., Technol., 2009.
[22] D. Vasumathi and D. A. Govardhan, "Efficient web usage mining based on formal concept analysis," J. Theoret. Appl. Inf. Technol., 2009.
[23] A. Maedche and S. Staab, "Ontology learning for the semantic web," IEEE Intell. Syst., vol. 16, no. 2, pp. 72–79, Mar. 2001.
[24] P. Cimiano, Ontology Learning and Population From Text: Algorithms, Evaluation and Applications. New York, NY, USA: Springer, 2006.
[25] G. Stumme, R. Wille, and U. Wille, "Conceptual knowledge discovery in databases using formal concept analysis methods," in Proc. 2nd Eur. Symp. Principles of Data Mining Knowl. Discov., 1998.
[26] E. Brown and K. Deffenbacher, Perception and the Senses. Oxford, U.K.: Oxford Univ. Press, 1979.
[27] G. Fazekas and M. Sandler, "Structural decomposition of recorded vocal performances and its application to intelligent audio editing," in Proc. 123rd Conv. Audio Eng. Soc., New York, NY, USA, Oct. 5–8, 2007.
[28] M. Barthet and M. Sandler, "Time-dependent automatic musical instrument recognition in solo recordings," in Exploring Music Contents: 7th Int. Symp., CMMR 2010, S. Ystad, M. Aramaki, R. Kronland-Martinet, and K. Jensen, Eds. Malaga, Spain: Springer, Jun. 2010, pp. 183–194.
[29] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COM-28, no. 1, pp. 84–95, Jan. 1980.
[30] S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy, "SVM and kernel methods Matlab toolbox," Perception Systèmes et Information, INSA de Rouen, Rouen, France, 2005.
[31] M. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. Neural Netw., vol. 5, no. 6, pp. 989–993, Nov. 1994.
[32] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.
[33] S. Essid, G. Richard, and B. David, "Musical instrument recognition on solo performances," in Proc. Eur. Signal Process. Conf., Sep. 2004.
[34] N. D. Chetry, "Computer models for musical instrument identification," Ph.D. dissertation, Queen Mary Univ. of London, London, U.K., 2006.
[35] B. Ganter, G. Stumme, and R. Wille, Eds., Formal Concept Analysis: Foundations and Applications. New York, NY, USA: Springer, 2005.
[36] B. Ganter, "Finger exercises in formal concept analysis," Dresden ICCL Summer School, Technische Univ. Dresden, Jul. 2006.

Page 14: Automatic Ontology Generation for Musical Instruments Based on Audio Analysis


[37] B. Ganter and R. Wille, Conceptual Scaling, F. Roberts, Ed. Berlin, Germany: Springer-Verlag, 1989.
[38] U. Krohn, N. J. Davies, and R. Weeks, "Concept lattices for knowledge management," BT Technol. J., vol. 17, 1999.
[39] S. Holm, "A simple sequentially rejective multiple test procedure," Scand. J. Statist., vol. 6, pp. 65–70, 1979.
[40] M. Barthet, P. Depalle, R. Kronland-Martinet, and S. Ystad, "Acoustical correlates of timbre and expressiveness in clarinet performance," Music Percept., vol. 28, no. 2, pp. 135–153, 2010.
[41] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Hillsdale, NJ, USA: Lawrence Erlbaum Associates, 1977.
[42] K. Dellschaft and S. Staab, "On how to perform a gold standard based evaluation of ontology learning," Lecture Notes in Comput. Sci., vol. 4273, pp. 228–241, 2006.
[43] A. Maedche and S. Staab, "Measuring similarity between ontologies," Eng. Knowl. Manage.: Ontologies and the Semantic Web, vol. 2473/2002, pp. 15–21, 2002.
[44] A. Plangprasopchok and K. Lerman, "Constructing folksonomies from user-specified relations on Flickr," in Proc. 18th Int. Conf. World Wide Web (WWW'09), 2009.
[45] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," Cybern. Control Theory, vol. 10, no. 8, pp. 707–710, 1966.

Şefki Kolozali was born in 1984. He received the B.Sc. degree in Computer Engineering from the Near East University, Nicosia, Turkish Republic of Northern Cyprus, in 2005 and the M.Sc. in Electronic Commerce Technology from University of Essex, Essex, U.K., in 2008. He is currently pursuing the Ph.D. degree at the School of Electronic Engineering and Computer Science, Queen Mary, University of London, with a thesis on automatic ontology generation based on semantic audio analysis. His research interests include semantic web technologies, knowledge management, music information retrieval, and machine learning techniques.

Mathieu Barthet was born in Paris, France, in 1979. He received the M.Sc. degree in Acoustics, Aix-Marseille II University, Ecole Centrale Marseille, France, and the Ph.D. degree from the Aix-Marseille II University in 2004 in the field of musical acoustics and signal processing. From 2007 to 2008, he was a Teaching and Research Assistant at Aix-Marseille I University. Since 2009, he has been a Postdoctoral Research Assistant at the Centre for Digital Music, Queen Mary University of London, London, U.K. He has taken part in several projects in collaboration with the BBC, the British Library, and I Like Music.

György Fazekas is a postdoctoral research assistant at Queen Mary University of London, working at the Centre for Digital Music (C4DM), School of Electronic Engineering and Computer Science. He received his BSc degree at Kando Kalman College of Electrical Engineering, now Budapest Polytechnic University, Kando Kalman Faculty of Electrical Engineering. He received an MSc degree at Queen Mary University of London, and a subsequent PhD degree at the same institution in 2012. His thesis, titled Semantic Audio Analysis—Utilities and Applications, explores novel applications of semantic audio analysis, Semantic Web technologies and ontology-based information management in Music Information Retrieval and intelligent audio production tools. His main research interests include the development of semantic audio technologies and their application to creative music production. He is working on extending audio applications with ontology-based information management. He is involved in several collaborative research projects, and he is a member of the AES, ACM and BCS.

Mark Sandler (SM’98) was born in 1955. He received the B.Sc. and Ph.D. degrees from the University of Essex, Essex, U.K., in 1978 and 1984, respectively. He is a Professor of Signal Processing at Queen Mary University of London, London, U.K., and Head of the School of Electronic Engineering and Computer Science. He has published over 350 papers in journals and conferences. Prof. Sandler is a Fellow of the Institution of Electrical Engineers (IEE) and a Fellow of the Audio Engineering Society. He is a two-time recipient of the IEE A. H. Reeves Premium Prize.