Automatic Ontology Generation for Musical Instruments Based on Audio Analysis

<ul><li><p>IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 10, OCTOBER 2013 2207</p><p>Automatic Ontology Generation for Musical Instruments Based on Audio Analysis</p><p>Şefki Kolozali, Student Member, IEEE, Mathieu Barthet, Member, IEEE, György Fazekas, and Mark Sandler, Member, IEEE</p><p>Abstract: In this paper we present a novel hybrid system that involves a formal method of automatic ontology generation for web-based audio signal processing applications. An ontology is seen as a knowledge management structure that represents domain knowledge in a machine interpretable format. It describes concepts and relationships within a particular domain, in our case, the domain of musical instruments. However, the different tasks of ontology engineering, including manual annotation, hierarchical structuring and organization of data, can be laborious and challenging. For these reasons, we investigate how the process of creating ontologies can be made less dependent on human supervision by exploring concept analysis techniques in a Semantic Web environment. In this study, various musical instruments, from wind to string families, are classified using timbre features extracted from audio. To obtain models of the analysed instrument recordings, we use K-means clustering to determine an optimised codebook of Line Spectral Frequencies (LSFs) or Mel-frequency Cepstral Coefficients (MFCCs). Two classification techniques, based on a Multi-Layer Perceptron (MLP) neural network and Support Vector Machines (SVM), were tested. Then, Formal Concept Analysis (FCA) is used to automatically build the hierarchical structure of musical instrument ontologies. Finally, the generated ontologies are expressed using the Web Ontology Language (OWL). System performance was evaluated under natural recording conditions using databases of isolated notes and melodic phrases. 
Analyses of Variance (ANOVA) were conducted with the feature and classifier attributes as independent variables and the musical instrument recognition F-measure as the dependent variable. Based on these statistical analyses, a detailed comparison between musical instrument recognition models is made to investigate their effects on the automatic ontology generation system. The proposed system is general and also applicable to other research fields that are related to ontologies and the Semantic Web.</p><p>Index Terms: Automatic ontology generation, instrument recognition, semantic web intelligence.</p><p>Manuscript received May 14, 2012; revised October 23, 2012 and April 12, 2013; accepted April 24, 2013. Date of publication May 17, 2013; date of current version August 09, 2013. This work was supported in part by the Networked Environment for Music Analysis (NEMA) project, U.K. Engineering and Physical Sciences Research Council (EPSRC), the U.K. Engineering and Physical Sciences Research Council (EPSRC), and the Making Musical Moods Metadata TSB project. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Bryan Pardo.</p><p>The authors are with the School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, U.K.</p><p>Versions of one or more of the figures in this paper are available online.</p><p>Digital Object Identifier 10.1109/TASL.2013.2263801</p><p>I. INTRODUCTION</p><p>In recent years, the World Wide Web has gone through rapid development both technologically and in its popularity. It has become closely integrated with our lives. The Web exposes vast amounts of resources including music, photos, video and text contained in unstructured web documents. However, these documents cannot be interpreted and used by machines directly, since standard Hyper Text Markup Language (HTML) documents do not provide machine readable information about their content. 
For this reason, the Web presents an important challenge in knowledge management. The Semantic Web was conceived in order to resolve these issues by creating a web of machine-interpretable data as an extension to the current Web. The concept of the Semantic Web was initially proposed by Berners-Lee [1] in order to enable search through explicit specifications of meaning in the content of web pages. Creating ontologies to enable formalized description and linking of resources within a particular application domain is among the first steps towards building a web of machine-interpretable data.</p><p>The semantic interpretation of music audio analysis relies heavily on the availability of formal structures that encode relevant domain knowledge. Many research groups built ontologies manually to represent different types of data ( data, social data) within the formation of the Semantic Web. Some examples of ontologies in the music domain are the music ontology (MO) and the music performance ontology, grounded in the MO [2], [3]. The use of ontological models to access and integrate knowledge repositories is an important contribution, improving knowledge-based reasoning and music information retrieval (MIR) systems alike [4]. There are also significant benefits for the discovery of cultural heritage in exchanging data among diverse knowledge repositories, such as musical instrument museums, libraries, institutions or repositories. However, knowledge management in the domain of musical instruments is a complex issue, involving a wide range of instrument characteristics, for instance, physical aspects of instruments such as different types of sound initiation and resonators, as well as the player-instrument relationship.¹ Representing every type of sound producing material or instrument is a very challenging task since musical instruments evolve with time and vary across cultures. 
The domain of musical instruments is broad and dynamic, including both folkloric instruments made out of non-manufactured items (e.g. blades of grass or conch shells), and new instruments relying on high technology (e.g. iPad apps). Although much work has been done on instrument classification in organology, there is currently no classification scheme encompassing such complexity and diversity [6]. Thus, there is a need for automated systems that overcome these problems in knowledge management and ontology engineering.</p><p>¹ More information, including detailed examples, can be found in [5].</p><p>1558-7916 © 2013 IEEE</p></li><li><p>Hence, as a first step to solve this issue with regards to audio content, we developed a hybrid system for generating the class hierarchy of an ontology automatically, relying on the acoustical analysis of isolated notes and solo performances played on various musical instruments. The system performs two main tasks: i) musical instrument recognition, and ii) the construction of instrument concept hierarchies. In the first part, the hybrid system uses a classifier, either a Multi-Layer Perceptron neural network or Support Vector Machines, to model sounds produced by instruments according to their instrument categories (e.g. violin) and their attributes (e.g. bowed) using content-based timbre features, in order to estimate the strength of the relationships between instrument categories and attributes. In the second part, the output of the instrument recognition system is processed using Formal Concept Analysis to construct a hierarchy of instrument concepts.</p><p>As a possible use case scenario, the proposed system may be used as a plugin tab for ontology editor applications, such as Protégé, in order to facilitate the ontology development process for ontology engineers. 
Because different sounds can indicate different characteristics of instruments, information derived directly from audio content contains very important clues for the retrieval and management of musical instrument knowledge. Therefore, the proposed system may also be used to overcome issues in current musical instrument classification schemes developed in organology and musicology.</p><p>To the knowledge of the authors, the proposed audio-analysis-based automatic ontology generation is the first of its kind. The system is based on a general conceptual analysis approach and can be applied to any research field that deals with knowledge management issues. We therefore believe that this study contributes to the theory of Semantic Web intelligence as well as audio music analysis, and that it will enable an improved methodical design of automatic ontology generation.</p><p>The rest of the paper is organized as follows: previous works related to musical instrument recognition and automatic ontology generation are described in Section II. Section III explains the automatic ontology generation system and the algorithms used in this study. Section IV presents the statistical analyses of the musical instrument recognition system and the evaluation of the automatically generated ontologies. Finally, in the concluding section, we remark on the importance of this research problem and outline our future work.</p><p>II. RELATED WORK</p><p>The aim of the proposed content-based automatic ontology generation system is to process information from audio files with a classification system and accurately extract the terminology related to musical instruments in order to analyze the conceptual structure and assist ontology engineers during the ontology construction process. Since the proposed system involves two different research areas, we will review both musical instrument identification and conceptual analysis studies.</p><p>A. 
Music Instrument and Family Identification</p><p>To automate musical instrument identification, various approaches have been developed based on isolated notes, solo performances, or complex mixtures. The latter case is still in its early days (see [7] for a thorough review). The results obtained depend on three main factors: the databases used during the learning and testing stages, the features selected to characterize the timbre of the instruments, and the classification methods.</p><p>Isolated notes and solo performances present the advantage of simplicity and tractability, since there is no need to separate the sounds from different instruments. For example, Chétry et al. [8] proposed a system based on Line Spectral Frequencies (LSF), which are derived from a linear predictive analysis of the signal and represent well the formant structure of the spectral envelope. The instrument identification unit of our system is based on this model. K-means clustering is used to construct a collection of LSF feature vectors, called a codebook (due to the use of LSF features in speech data compression). The principle of K-means clustering is to partition an n-dimensional space (here the feature space) into K distinct regions (or clusters), which are characterized by their centres (called codevectors). During the training stage, K-means clustering is applied to the LSF feature vectors extracted from several instrument recordings (isolated notes or solo performances). During the testing stage, K-means clustering is applied to the LSF feature vectors extracted from the instrument recording to be identified. The collection of the K codevectors (LSF vectors) constitutes a codebook, whose function, within this context, is to capture the most relevant features to characterize the timbre of an audio signal segment. 
The classification decisions are made by finding which instrument minimizes the Euclidean distance between the LSF codebook associated with the audio sample to be predicted and the LSF codebook associated with the instrument (training stage). The system achieved 95% performance on a dataset comprising 4415 instrumental sound instances.</p><p>In another study [9], Vincent and Rodet proposed a system based on Gaussian Mixture Models (GMM) which was trained and tested on isolated notes and solo recordings. The dataset was gathered by extracting 2 excerpts of 5 seconds from each of the 10 solo recordings used in the experiment. This approach yielded up to 90% accuracy. Essid et al. [10] proposed a system tested on a relatively large dataset. The same classification technique, GMM, was compared to Support Vector Machines (SVM) with different audio features. Their system obtained a 12% performance improvement compared to a system based on the SVM classifier, leading to up to 87% accuracy for 0.5 s-long audio segments. Furthermore, the performance of their system increased by 6 percentage points, up to 93% accuracy, using SVM on 5 s-long audio segments.</p><p>However, there are only a few studies where instrument recognition produces a hierarchical instrument structure. For example, Martin [11] proposed a system which was based on three different hierarchical levels: 1) pizzicato (plucked) and sustained sounds, 2) instrument families such as strings, woodwinds, and brass, and 3) individual instruments for the corresponding instrument families. The recognition rate obtained with this system was 90% for instrument family and 70% for individual instruments, while the dataset consisted of 1023 solo tone samples from 15 instruments. Other hierarchical systems have been developed since then by Eronen [12], Kitahara et al. 
[13] and Peeters [14]. The overall correct identification rates of these systems are in the range of 35% to 80% for individual instruments, and 77% to 91% for instrument family recognition. In general, the problem with hierarchical classification systems is that the errors at each level propagate increasingly to the other levels of the hierarchy.</p></li><li><p>B. Conceptual Analysis</p><p>Creating a class hierarchy is an important aspect of ontology design. Establishing such a hierarchy is a difficult task that is often accomplished without any clear guidance and tool support. Yet, the most commonly used hierarchical data mining techniques such as Hierarchical Agglomerative Clustering [15] and Decision Trees [16] do not take into account the relationships between objects. Therefore they do not provide an applicable solution to knowledge representation issues and the multi-relational hierarchical design of ontology systems. This problem becomes even more apparent considering the multi-relational nature of musical data.</p><p>On the other hand, Formal Concept Analysis (FCA) makes it possible to generate and visualise hierarchies based on the relationships between objects and attributes. FCA, also known as concept lattice, was first proposed by the German mathematician Wille in 1982 [17]. It has been used in many software engineering topics such as the identification of objects in legacy code, or the identification and restructuring of schema in object-oriented databases [18]. These works are important since ontologies provide the basis for information and database systems [19]. Various specification techniques for hierarchical design in object-oriented software development have been proposed in [20...</p></li></ul>
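The codebook-based identification described in Section II-A (K-means codebooks per instrument, then a nearest-codebook decision) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation of Chétry et al. or of the authors: the two-dimensional toy vectors stand in for LSF feature vectors, the names `kmeans_codebook`, `codebook_distance`, `classify` and `toy_features` are hypothetical, and the codebook distance used here (mean Euclidean distance from each test codevector to its nearest trained codevector) is an assumption, since the text only states that a Euclidean distance between codebooks is minimized.

```python
import math
import random


def kmeans_codebook(vectors, k, iterations=20, seed=0):
    """Return k codevectors (cluster centres) for a list of feature vectors."""
    rng = random.Random(seed)
    # Initialise the centres with k distinct training vectors.
    centres = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each vector joins the cluster of its nearest centre.
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, centres[i]))
            clusters[nearest].append(v)
        # Update step: each centre moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def codebook_distance(test_codebook, trained_codebook):
    """Assumed measure: mean distance from each test codevector
    to its nearest codevector in the trained codebook."""
    total = sum(min(math.dist(u, v) for v in trained_codebook)
                for u in test_codebook)
    return total / len(test_codebook)


def classify(test_vectors, trained_codebooks, k):
    """Build a codebook for the unknown recording and pick the closest instrument."""
    test_codebook = kmeans_codebook(test_vectors, k)
    return min(trained_codebooks,
               key=lambda name: codebook_distance(test_codebook,
                                                  trained_codebooks[name]))


def toy_features(centre, n, spread, seed):
    """Hypothetical stand-in for feature extraction: points scattered around a centre."""
    rng = random.Random(seed)
    return [tuple(c + rng.uniform(-spread, spread) for c in centre)
            for _ in range(n)]


# Training stage: one codebook per instrument (toy data, not real LSFs).
codebooks = {
    "violin": kmeans_codebook(toy_features((0.2, 0.8), 50, 0.05, seed=1), 4),
    "flute": kmeans_codebook(toy_features((0.7, 0.3), 50, 0.05, seed=2), 4),
}

# Testing stage: a codebook is built for the unknown recording and compared.
unknown = toy_features((0.2, 0.8), 30, 0.05, seed=3)
predicted = classify(unknown, codebooks, 4)
print(predicted)
```

With real recordings, `toy_features` would be replaced by an LSF or MFCC extraction step, and the codebook size K would be tuned rather than fixed at 4.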

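To make the idea behind Formal Concept Analysis in Section II-B more concrete, the sketch below derives all formal concepts from a small object-attribute table. It is only an illustration under stated assumptions: the instruments and attributes are hypothetical toy data (only "violin" and "bowed" appear as examples in the text), the function names are invented for this sketch, and the brute-force enumeration over all object subsets is chosen for readability and is exponential in the number of objects, so it is not the algorithm used by the authors.

```python
from itertools import combinations

# Hypothetical formal context: each object (instrument) maps to its attributes.
context = {
    "violin": {"string", "bowed"},
    "cello": {"string", "bowed"},
    "guitar": {"string", "plucked"},
    "flute": {"wind"},
}


def common_attributes(ctx, objects):
    """Attributes shared by every object in the set
    (all attributes of the context if the set is empty)."""
    shared = set().union(*ctx.values())
    for obj in objects:
        shared &= ctx[obj]
    return shared


def objects_having(ctx, attributes):
    """Objects that possess every attribute in the set."""
    return {obj for obj, attrs in ctx.items() if attributes <= attrs}


def formal_concepts(ctx):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = sorted(ctx)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(ctx, subset)
            extent = objects_having(ctx, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)

# List concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by inclusion of their extents gives the hierarchy: in this toy context the concept with intent {"string"} sits above the concept with intent {"string", "bowed"}, which is the kind of class and subclass relation an ontology hierarchy needs.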
