
Automatic Mood Classification of Indian

Popular Music

Dissertation

Submitted in partial fulfillment of the requirements

for the degree of

Master of Technology, Computer Engineering

by

Aniruddha M. Ujlambkar

Roll No: 121022001

under the guidance of

Prof. V. Z. Attar

Department of Computer Engineering and Information Technology

College of Engineering, Pune

Pune - 411005.

June 2012


Dedicated to

my mother, Smt. Manasi M. Ujlambkar, who has always emphasized the importance of education, discipline and integrity, and has been a constant source of inspiration for me my entire life,

and

my father, Shri. Mukund K. Ujlambkar, who has always been my role model for hard work, persistence and patience, and has always supported me open-heartedly in all my endeavors.


DEPARTMENT OF COMPUTER ENGINEERING AND

INFORMATION TECHNOLOGY,

COLLEGE OF ENGINEERING, PUNE

CERTIFICATE

This is to certify that the dissertation titled

Automatic Mood Classification of Indian Popular Music

has been successfully completed

By

Aniruddha M. Ujlambkar

(121022001)

and is approved for the degree of

Master of Technology, Computer Engineering.

Prof. V. Z. Attar,
Guide,
Department of Computer Engineering and Information Technology,
College of Engineering, Pune,
Shivaji Nagar, Pune-411005.

Dr. Jibi Abraham,
Head,
Department of Computer Engineering and Information Technology,
College of Engineering, Pune,
Shivaji Nagar, Pune-411005.

Date :


Abstract

Music has been an inherent part of human life when it comes to recreation, entertainment and, more recently, even as a therapeutic medium. The way music is composed, played and listened to has witnessed an enormous transition from the age of magnetic tape recorders to the recent age of digital music players streaming music from the cloud. What has remained intact is the special relation that music shares with human emotions. We most often choose to listen to a song or piece of music which best fits our mood at that instant. In spite of this strong correlation, most music software available today still lacks the facility of mood-aware playlist generation. This increases the time music listeners spend manually choosing a list of songs suiting a particular mood or occasion, which could be avoided by annotating songs with the relevant emotion category they convey. The problem, however, lies in the overhead of manually annotating music with its corresponding mood, and the challenge is to identify this aspect automatically and intelligently.

The study of mood recognition in the field of music has gained a lot of momentum in recent years, with machine learning and data mining techniques contributing considerably to analyzing and identifying the relation of mood with music. We take the same inspiration forward and contribute by building a system for automatic identification of the mood underlying audio songs by mining their spectral and temporal audio features. Our focus is specifically on Indian Popular Hindi songs. We have analyzed various data classification algorithms in order to learn, train and test the model representing the moods of these audio songs, and have developed an open-source framework for the same. We have achieved a satisfactory precision of 70% to 75% in identifying the mood underlying Indian popular music by introducing a bagging (ensemble) of random forests approach, evaluated over 4600 audio clips.


Acknowledgments

I express my deepest gratitude towards my guide Prof. V. Z. Attar for her constant help and encouragement throughout the project work. I have been fortunate to have a guide who gave me the freedom to explore on my own and at the same time helped me plan the project with timely reviews and constructive comments and suggestions wherever required. A big thanks to her for having faith in me throughout the project and helping me walk through the new avenues of research papers and publications.

I would like to thank Prof. A. A. Sawant for the continuous support and encouragement he extended through the enthusiastic discussions he used to have with us very often, and the insightful thoughts and ideas he used to share. I also take this opportunity to thank all those teachers, staff and colleagues who have constantly helped me grow, learn and mature, both personally and professionally, throughout the process.

A BIG thanks goes to my dearest friends who have always supported, guided and even criticized me, always for the right reasons, and have helped me stay sane throughout this and every other chapter of my life. I greatly value their friendship and deeply appreciate their belief in me. Special thanks to all the new friends from M.Tech. I have made, without whom the journey wouldn't have been so interesting and memorable!

Most importantly, none of this would have happened without the love and patience of my family - my parents, to whom this dissertation is dedicated. I would like to express my heartfelt gratitude to my family.

Aniruddha M. Ujlambkar
College of Engineering, Pune
June 2012


Contents

Abstract
Acknowledgments
List of Figures
List of Tables
1 Introduction
  1.1 Music and Mood
  1.2 Introduction to music features
  1.3 Music and Data Mining
  1.4 Motivation
  1.5 Thesis Objective and Scope
  1.6 Thesis Outline
2 Literature Survey
  2.1 Music Mood Model and Audio Features
  2.2 Music classification
  2.3 Summary
3 Music Mood Model
  3.1 Music Mood Relation
  3.2 Mood (Emotion) Models
    3.2.1 Hevner's experiment
    3.2.2 Russell's model
    3.2.3 Thayer's model
    3.2.4 Indian Classical model: Navras
4 Audio Features
  4.1 Low level Audio Features
  4.2 Feature List
5 Mining Mood from Audio Features
  5.1 Overview of Data Mining
  5.2 Overview of Data Mining functionalities
  5.3 Classification
    5.3.1 Classification using Decision-tree
    5.3.2 Random Forest Classification
  5.4 Random Forest Highlights
  5.5 Our Approach: Bagging of Random Forests
    5.5.1 Algorithm
6 Mood Identification System
  6.1 Mood Model Selection
  6.2 System Overview
  6.3 System Design and Components
    6.3.1 Audio Pre-processor
    6.3.2 Audio Feature Extractor
    6.3.3 Mood Identification System
7 Experiments and Results
  7.1 Experimental Setup
    7.1.1 Data Collection
    7.1.2 Data pre-processing
    7.1.3 Training and Testing
  7.2 Results
    7.2.1 Evaluation Metrics
8 Applications
  8.1 Music Therapy Applications
  8.2 Music Information Retrieval
  8.3 Intelligent Automatic Music Composition
9 Conclusion and Future Work
  9.1 Conclusion
  9.2 Future Work
10 Project Milestones
  10.1 Project Schedule
  10.2 Publications' status
Bibliography

List of Figures

3.1 Hevner's Mood Model
3.2 Russell's Mood Model
3.3 Thayer's Mood Model
3.4 Navras: Indian Classical emotion model
4.1 Audio Features Taxonomy
5.1 Data Mining in Knowledge discovery
5.2 Data Mining disciplines
5.3 Classification process
5.4 Classification using Decision Tree
6.1 Mood Recognition System
6.2 Mood Detection System: Detailed Design
7.1 Area under ROC statistics
7.2 Recall statistics
7.3 Precision statistics
7.4 F-measure statistics

List of Tables

6.1 Mood Model: Indian popular Hindi music
7.1 Experimental Results on Test Dataset of 2938 music clips
7.2 Experimental Results on Test Dataset of 2938 music clips
10.1 Weekly Schedule of Project Starting 1 July, 2011
10.2 Paper publications' status


Chapter 1

Introduction

1.1 Music and Mood

The well-known German philosopher Friedrich Nietzsche once famously said: "Without music, life would be a mistake". Music has always been an inherent part of human recreation. Music is not just useful for entertainment; studies have shown that listening to the right music plays an important role in healing, rejuvenating and even inspiring the human mind in difficult conditions, as is widely studied and demonstrated by the field of Music Therapy [27]. With rapidly advancing technology and the advent of the latest multimedia gadgets, music has reached almost every individual's personal device, be it a laptop, music player or cell phone. Music which in the olden days was limited to live concerts, performances or radio broadcasts is now available at everyone's fingertips within a few clicks. Music has thus become very easily accessible and available. However, the music database is ever increasing; it would not be wrong to say that we might hear a few completely new, never-heard music pieces every single day. Today, the overall music collection worldwide runs to a few million recordings and continues to grow every day. With such a variety of music easily available, we humans do not always listen to the same type of music all the time. We have our interests: favorite artists, albums, music types. To put it simply, we have our personal choices, and more importantly, even our choice may differ from time to time. This choice is quite naturally governed by our emotional state at that particular instant. The relation between musical sounds and their influence on the listener's emotion has been well studied, as is evident from much-celebrated papers such as those of Hevner [18] and Farnsworth [13]. These papers described experiments which eventually substantiated the hypothesis that music inherently carries emotional meaning.

Currently we can store, sort and retrieve our digital music files based on various traditional music classification tags like Artist, Band (Group), Album, Movie and Genre. However, choosing a song or music piece suiting our mood from a large database is difficult and time consuming, since none of the mentioned parameters can sufficiently convey the emotional aspect associated with a song. What we need here is an additional parameter, or rather a search filter - Mood - which signifies the emotion of that particular music piece. However, classifying music as per its mood is a much harder task. The main reasons are:

• First, the emotion or mood of music is very subjective. Human mood, surrounding environment, personality and cultural background can all influence the perceived emotion of a particular piece of music.

• Second, the adjectives describing emotion can be ambiguous. For instance, happy and refreshing may refer to the same song.

• Third, it is inexplicable how music arouses emotion. What intrinsic quality of music, if any, creates a specific emotional response is still far from well understood [6].

1.2 Introduction to music features

As it is a well-established fact that music indeed has an emotional quotient attached to it, it is essential to know what intrinsic factors present in music associate it with a particular mood or emotion. A lot of research has been done, and is still going on, into capturing various features from an audio file on the basis of which we can analyze and classify a list of audio files. Audio features are nothing but mathematical functions calculated over the audio data in order to describe some unique aspect of that data. Over the last decades a huge number of features have been developed for the analysis of audio content. Dalibor Mitrovic and team [7] have analyzed various state-of-the-art audio features useful for content-based audio retrieval.

Audio features were initially studied and explored for application domains like speech recognition [29]. With upcoming novel application areas, the analysis of music and general-purpose environmental sounds gained importance. Different research fields evolved, such as audio segmentation, music information retrieval (MIR), and environmental sound recognition (ESR), and each of these areas developed its specific description techniques (features). Many audio features have been proposed in the literature for music classification. Different taxonomies exist for the categorization of audio features. Weihs et al. [40] have categorized audio features into four subcategories, namely short-term features, long-term features, semantic features, and compositional features. Scaringella [33] followed a more standard taxonomy by dividing the audio features used for genre classification into three groups based on timbre, rhythm, and pitch information, respectively. Each taxonomy attempts to capture audio features from a certain perspective. Instead of a single-level taxonomy, Zhouyu Fu and team [42] unify the two taxonomies and present a hierarchical taxonomy that characterizes audio features from different perspectives and levels. From the perspective of music understanding, we can divide audio features into two levels, low-level and mid-level features, along with top-level labels. Low-level features can be further divided into two classes, timbre and temporal features. Timbre features capture the tonal quality of sound that is related to different instrumentation, whereas temporal features capture the variation and evolution of timbre over time. Low-level features are the basic description of the audio data, for instance tempo, beats per minute and so on. Mid-level features, on the other hand, are derived from these basic features to provide music-related technical understanding such as rhythm and pitch, which in turn are perceived by humans as genre and mood, which form the top level of the taxonomy.

A wide range of audio features has been studied by specialists and experts over the past many years, and many of the features have even been standardized, for instance in the MPEG-7 standards [28], which provide a list of low-level audio descriptors (features) along with techniques and tools to extract them. Audio feature extraction involves a lot of complex mathematics and signal processing to convert the digital audio data into meaningful features represented by numbers (of fixed or variable dimensions). To name a few, the following are examples of low-level audio features: zero-crossing rate, magnitude spectrum, spectral centroid, spectral roll-off and many more, which will be discussed in detail in the coming chapters.

With the increasing standardization and research in audio features, an effort is always made to obtain features which are orthogonal and which provide descriptions with a high variance for the underlying data.
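For illustration, two of the low-level features named above can be computed with a few lines of NumPy. The frame-based helper functions and the synthetic test tones below are our own illustrative choices, not part of any standard:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return np.mean(signs[1:] != signs[:-1])

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of the frame's spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

sr = 22050
t = np.arange(sr) / sr                # one second of audio
low = np.sin(2 * np.pi * 220 * t)     # dark 220 Hz tone
high = np.sin(2 * np.pi * 4400 * t)   # bright 4400 Hz tone

# A brighter sound yields both a higher zero-crossing rate
# and a higher spectral centroid.
assert zero_crossing_rate(high) > zero_crossing_rate(low)
assert spectral_centroid(high, sr) > spectral_centroid(low, sr)
```

In practice such functions are applied per short frame (e.g. 20-50 ms) and the per-frame values form the feature trajectory that later stages summarize.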

1.3 Music and Data Mining

Data Mining is a relatively young and interdisciplinary field in computer science which deals with analyzing and discovering interesting and useful patterns from large data sets. The field involves various tasks for analyzing data, of which the most important in the context of our work is classification. The classification task involves generalizing a known structure or pattern among available data already assigned some specific class or label. This generalized pattern can then be used to predict the class of new and unknown data. Clustering, on the other hand, is the data mining task of discovering groups and structures of data which are in some way similar to each other and differ in a similar way from other groups, without any prior knowledge of the structure of the data. Music mood detection fits the criteria for a data mining problem, as we have a huge number of music pieces, each with a few hundred audio features associated with it.

Music emotion detection and classification has been studied and researched before, and most early work adopted a pattern recognition approach. Wang et al. [39] extracted features from MIDI files and used a support vector machine (SVM) to classify music into six classes: joyous, robust, restless, lyrical, sober, and gloomy. High classification accuracy was reported; however, one cannot easily transcribe real-world music into symbolic form, as done in MIDI files. Li et al. [21] divided emotion into thirteen categories and combined them into six classes. They then adopted MARSYAS [25] in their system to extract music features from acoustic data and used an SVM to train and recognize music emotion. Liu et al. [22] presented a hierarchical mood recognition system, which uses a Gaussian mixture model (GMM) to represent the feature dataset and a Bayesian classifier to classify music clips. Byeong-jun Han proposed a music emotion recognition system using support vector regression and Thayer's emotion model [37].

Overall, be it genre classification, instrument classification or mood classification, data mining techniques, especially classification techniques, have proved very effective in analyzing and categorizing music.
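The classification workflow described above - learn a pattern from labeled feature vectors, then predict labels for unseen ones - can be sketched with a deliberately simple nearest-centroid classifier. The mood labels and two-dimensional feature values below are synthetic illustrations, not actual audio features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D "audio feature" vectors for two illustrative mood classes.
happy = rng.normal(loc=[0.8, 0.7], scale=0.05, size=(50, 2))
sad = rng.normal(loc=[0.2, 0.3], scale=0.05, size=(50, 2))
X = np.vstack([happy, sad])
y = np.array(["happy"] * 50 + ["sad"] * 50)

# Training: summarize each labeled class by its centroid.
centroids = {label: X[y == label].mean(axis=0) for label in ("happy", "sad")}

def predict(x):
    """Prediction: assign the label of the nearest class centroid."""
    return min(centroids, key=lambda lbl: np.linalg.norm(x - centroids[lbl]))

print(predict(np.array([0.75, 0.72])))  # -> happy
print(predict(np.array([0.25, 0.28])))  # -> sad
```

The systems surveyed above replace the centroid step with far more capable learners (SVM, GMM plus Bayesian classifier, support vector regression), but the train/predict structure is the same.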

1.4 Motivation

The way we choose a song to listen to is very much restricted by the search options we are provided with by the underlying software of the music device. Today we can search for a song by tags like Name, Artist, Album, and Genre. After years of research and study, it is an established fact that human emotions are influenced by music. Hence, it is high time to introduce a parameter, Mood, for annotating an audio file so that users can search for the list of songs relating to their mood at that instant. The idea is great, but the question is: how will the music be annotated with this Mood tag? It has been observed that search tags (like Genre/Mood) are most of the time edited manually. This is prone to a lot of human error and needs a better solution.

This is where we intend to contribute, so that the mood can be automatically detected for a given audio file and no manual intervention is needed to annotate a song with a particular mood. This can not only reduce manual effort, but also minimize to a certain extent the human errors associated with it. This in turn can help users organize their music according to their moods, instead of remembering and searching for the particular artist or album name that contained the song of that particular mood. A more interesting observation is that a great deal of the work done so far has been on non-Indian music. We therefore find it a challenging task to see how we can utilize the relevant work done until now and take it further to Indian music mood recognition, with the intention of not only providing a much richer user experience but also contributing to the good of the digital music community.

1.5 Thesis Objective and Scope

Most of the experimentation in the field of music mood recognition has been done with respect to non-Indian music. Music being subjective to cultural backgrounds, it is but natural that Indian music might need a different treatment compared to non-Indian music. Our goal is to develop a music emotion recognition system for Indian popular music by analyzing the relation of the timbral, spectral and temporal features of an audio file with the emotion it represents.

The main goals of this thesis can be stated as:

a. To build an automatic mood recognition system for Indian popular songs.

b. To develop an open-source framework that can help analyze and experiment with music data using various machine learning and data mining algorithms.

In order to achieve these ultimate goals, we have laid down a list of sub-goals that together help achieve the end objective:

a. Identifying the various moods associated with Indian popular music and finalizing a mood model.

b. Identifying and finalizing the set of audio features important from the mood perspective.

c. Identifying and developing the tools required for extracting audio features.

d. Identifying and developing the data mining technique to construct the mood classification model.

e. Design, implementation and testing of the framework integrating the whole process of mood classification and prediction.

The scope of the work is limited to Indian popular Hindi music.

1.6 Thesis Outline

The rest of the thesis is organized as follows. In Chapter 2 we give a brief description of the important papers and literature that we have studied or utilized as part of our literature survey. In Chapter 3, we discuss our mood model. Chapter 4 explains the various features associated with music and the feature set important from the perspective of this project. In Chapter 5, data mining and its use in mining mood from music is discussed. Chapter 6 puts forth the overall design of the Music Mood Identification system, followed by Chapter 7 discussing the experiments and corresponding results obtained for the performance of the system. Chapter 8 gives an overview of the possible applications and uses of this project. Chapter 9 outlines the conclusion and future work related to this project. Finally, Chapter 10 lists the various project milestones and the publications' status that resulted during the course of the project.

Chapter 2

Literature Survey

The research and study behind this topic can be subdivided into three different subfields:

• First: the mood model, which involves identifying and defining the list of adjectives precisely describing all possible moods.

• Second: audio feature identification and extraction, which involves identifying and extracting the essential features from an audio file by applying signal processing algorithms and techniques in order to analyze the file.

• Third: the mining (machine learning) algorithm, which involves learning and choosing the appropriate algorithm(s) that help mine the music datasets efficiently and with substantial accuracy.

2.1 Music Mood Model and Audio Features

Various experts in the fields of psychology and musicology have come up with models describing human emotions. One of the earliest experiments, by Hevner [18], helped categorize various adjectives into 8 different groups, each representing a class of mood. It was essentially a categorical model, wherein a list of adjectives representing the same emotion was grouped together. Russell [31] later came up with the circumplex model, representing human emotions on a circle with each mood category plotted within the circle, separated from the other categories along polar coordinates. Thayer [37] too came up with a dimensional model, plotted along two axes (stress versus energy), with mood represented by a two-dimensional coordinate system and lying either on the two axes or in the four quadrants formed by the two-dimensional plot. Details of these models are discussed in further chapters.

JungHyun Kim and team [20] proposed an Arousal-Valence (A-V) based mood classification model for a music recommendation system. Music mood tags and A-V values collected from 20 subjects were analyzed, and the A-V plane was partitioned into 8 mood regions using the k-means clustering algorithm. Their work shows that some regions on the A-V plane can be identified by representative mood tags, as in previous mood models, but some mood tags overlap across almost all regions.
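The clustering step in this approach can be illustrated with a plain NumPy implementation of Lloyd's k-means. The (arousal, valence) points below are synthetic stand-ins for the subjects' ratings, and we use k = 2 instead of the paper's 8 regions for clarity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (arousal, valence) ratings forming two well-separated groups.
calm = rng.normal([-0.5, 0.5], 0.1, size=(30, 2))
excited = rng.normal([0.6, 0.6], 0.1, size=(30, 2))
points = np.vstack([calm, excited])

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm: alternate nearest-center assignment
    and centroid update (empty clusters keep their old center)."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
    return centers, labels

centers, labels = kmeans(points, k=2)
print(centers)  # two centers, roughly at the two group means
```

Each resulting region of the A-V plane can then be tagged with the mood adjectives most frequent among the points assigned to it.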

Akase and group [1] discuss an approach to feature extraction for audio mood classification. Timbral information has been widely used for this task; however, many musical moods are characterized not only by timbral information but also by musical scale and by temporal features such as rhythm patterns and bass-line patterns. Their paper proposes the extraction of rhythm and bass-line patterns, and this unit-pattern analysis is combined with statistical feature extraction for mood classification. In combination with statistical features including MFCCs and a musical scale feature, the effectiveness of the features was verified experimentally.

McKay and team [24] developed "jAudio", an open-source audio feature extraction framework that includes implementations of 26 core features, covering both features proven in MIR research and more experimental, perceptually motivated features. jAudio places an even greater emphasis on implementations of meta-features and aggregators that can be used to automatically generate many more features from the core features (for instance, standard deviation, derivative, running mean, etc.) that can be useful for music analysis. The tool has been quite useful and widely accepted for music analysis research.
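A minimal sketch of this kind of meta-feature generation (the per-frame core feature values here are synthetic, and the 5-frame window is an arbitrary choice, not jAudio's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(2)

# A core feature computed per frame, e.g. a spectral centroid trajectory
# over 100 frames (the values here are synthetic).
core = rng.normal(1500.0, 200.0, size=100)

# Meta-features summarize the per-frame trajectory as fixed-size numbers:
mean_val = core.mean()                  # overall level
std_val = core.std()                    # variability over time
deriv = np.diff(core)                   # frame-to-frame change
run_mean = np.convolve(core, np.ones(5) / 5, mode="valid")  # smoothed trajectory

# One compact vector usable by a classifier, derived from a single core feature.
feature_vector = np.array([mean_val, std_val, deriv.std(), run_mean.std()])
print(feature_vector.shape)  # -> (4,)
```

Applying a handful of such aggregators to each of the 26 core features multiplies the feature set with almost no extra implementation effort, which is the design idea behind jAudio's meta-features.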

Dalibor Mitrovic and team's work [7] deals with the statistical analysis of a broad set of state-of-the-art audio features and low-level MPEG-7 audio descriptors. The investigation comprises data analysis to reveal redundancies between state-of-the-art audio features and MPEG-7 audio descriptors. The work employs Principal Components Analysis, which reveals low redundancy between most of the MPEG-7 descriptor groups. However, there is high redundancy within some groups of descriptors, such as the BasicSpectral group and the TimbralSpectral group. Redundant features capture similar properties of the media objects and should not be used in conjunction. The paper provides good insight into the choice of audio features for analysis.

2.2 Music classification

Doris Baum and team introduce EmoMusic [3] in their paper which presents a user

study on the usefulness of the “PANAS-X” emotion descriptors as mood labels

for music. It describes the attempt to organize and categorize music according

to emotions with the help of different machine learning methods, namely Self-Organizing Maps and Naive Bayes, Random Forest and Support Vector Machine classifiers. The study showed that emotions may very well be derivable in an

automatic way, although the procedure certainly can be refined further. Naive

Bayes and Random Forest classifiers can, for instance, be used to predict the

emotion of a piece of music with reasonable success.

Z. Fu and team [14] provide a comprehensive review of audio-based classification in their paper. It systematically summarizes the state-of-the-art techniques for music classification along with recent progress in this field. The survey emphasizes recent developments in these techniques and discusses several open issues for future research. It provides an up-to-date discussion of the audio features and classification techniques used in the literature. In addition, it reviews the individual tasks for music classification and annotation and identifies task-specific issues.

K. C. Dewi and A. Harjoko [10] put forth a music classification system based on mood parameters using the K-Nearest Neighbor classification method and Self-Organizing Maps. The mood parameters used are based on Robert Thayer's energy-stress model, and the features used are the rhythm patterns of the music. Classification of music based on mood parameters with 30 songs reached an accuracy of 66.67%; with 120 songs, it reached 73.33% accuracy for the K-Nearest Neighbor method and 86.67% for the Self-Organizing Map method.

B. Han and group [16] proposed SMERS, a music emotion recognition system using Support Vector Regression. In their paper, automatic emotion recognition of music was evaluated using various machine learning algorithms such as SVM, SVR and GMM, with a remarkable increase in accuracy using SVR as compared to GMM. For further research, the paper suggests that more perceptual features be considered, along with other classification algorithms such as fuzzy methods and kNN (k-Nearest Neighbor). The paper also suggests comparing the results of machine-learning-based emotion recognition with human-annotated arousal/valence data.

Chia-Chu Liu and team [6] presented an emotion detection and classification


system for pop music. The system extracts feature values from the training music

files by PsySound2 and generates a music model from the resulting feature dataset

by a classification algorithm. The system is designed using a hierarchical frame-

work followed by an accuracy enhancement mechanism. The experimental results

show that the system gives satisfactory performance. Furthermore, the system

aims at popular music, so it can be applied to public music database software to

provide emotion-based search. The features that affect the perception of emotion

are associated with frequency centroid, spectral dissonance and pure tonalness.

The paper suggests finding out the deeper relation between these features and

music emotion in order to have a more accurate music mood classification.

T. Li and M. Ogihara [21] discussed an SVM-based multi-label classification method for two problems: classification into the thirteen adjective groups and classification into the six super-groups. The experiments showed an overall low performance, which can be attributed to the numerous borderline cases for which the labeler found it difficult to make a decision. The experiments show that emotion detection is a rather difficult problem and that improving performance is the immediate issue, which can be addressed by expanding the sound data sets and collecting labels in multiple rounds.

Trung-Thanh Dang and Kiyoaki Shirai [8] proposed the classification of the moods of songs based on lyrics and meta-data, and proposed several methods for supervised learning of classifiers. The training data was collected from a LiveJournal blog site in which each blog entry is tagged with a mood and a song. Three kinds of machine learning algorithms were then applied for training classifiers: SVM, Naive Bayes and graph-based methods. The results showed that the accuracy of the mood classification methods is not good enough to apply to a real music search engine system, for two main reasons: mood is a subjective meta-data, and lyrics are short and contain many metaphors which only humans can understand. The authors hence planned to integrate audio information with lyrics for further improvement.

As per the contribution of Atin Das and Pritha Das [9], explained in their

paper, some of the prevailing classifications of Indian songs were quantified by

measuring their fractal dimension. Samples were collected from three categories:

Classical, Semi-classical, and Light. After appropriate processing, the samples

were converted into time series datasets and their fractal dimension was computed.

The analysis presented here can be generalized to categorize different types of

songs. Samples can be chosen by playing a prerecorded song or taken directly from the recording device, and are filtered to remove the sounds of accompanying musical instruments so as to retain only the voice. In the present case this filtering was done manually, and the length of the music pieces used was not sufficient to accurately classify the songs.

2.3 Summary

The literature survey helped us gain a better insight with reference to the mood

analysis of music, various techniques used for the same along with their current

performance limitations and corresponding improvement suggestions given by respective authors. It is clearly evident that a great deal of serious work has been going on in automatic mood identification and music analysis. Observing the work done so far, it is seen that Data Mining and Machine Learning techniques have played a significant part in learning from music data. The fact remains, however, that the accuracy achieved so far needs improvement from a learning and identification perspective, which calls for better algorithms and techniques. It is also seen that classification techniques have been much more prevalent and have performed better in mining music data as compared to clustering techniques, and we too prefer the former.

The striking and most important finding from the survey is that much of the music research has been done on non-Indian music. Although some work has been done on Indian Classical music, it has not been explored with respect to mood to the same extent as non-Indian music. Indian Popular Music accounts for almost 72% of the music sales in India, which shows its immense popularity among the people. Identifying the lack of mood-based categorizers and the growing popularity and use of Indian popular music, we take this opportunity to develop an automatic mood recognition system for Indian popular music by analyzing existing classification mining techniques and developing a novel approach to automatically categorize songs belonging to Indian Popular Music according to their underlying mood.


Chapter 3

Music Mood Model

Most of the literature dealing with music and psychology tells us that music mood is subjective and that the mood of the same music piece can be interpreted differently by different individuals. However, there is considerable agreement about the moods underlying music belonging to a similar cultural context [30]. Thus music from a similar cultural background has a better chance of consensus among individuals in interpreting the mood of a song. Our work limits its scope to Indian popular music, which falls under a common cultural context, thus increasing the chances of similar interpretations of the music among individuals when it comes to understanding mood.

In order to classify songs according to their mood, it is essential to identify the list of moods into which a song can be categorized. This chapter explores the various mood models that have been proposed and proven constructive in categorizing music as per the emotions it conveys.

3.1 Music Mood Relation

Music psychology studies on music mood have a number of fundamental general-

izations that can benefit MIR research as mentioned below:

• Mood effects do exist in music, and studies have confirmed the existence of functions of music which can change people's mood [5]. Also, it comes naturally to listeners to associate mood labels with the music they listen to [36].

• Not all moods are equally likely to be aroused by listening to music. For instance, emotions like sadness, happiness and peace have a much higher probability of being induced through music than anger or disgust.


• There do exist uniform mood effects among different people. Sloboda and

Juslin [36] summarized that listeners are often consistent in their judgment

about the emotional expression of music.

• There is definitely some correspondence between listeners' judgments regarding mood and musical parameters such as tempo, rhythm, dynamics, pitch, mode, beats, harmony etc. [36]. People relate to the tune or rhythm of a song and will, most of the time, hum along with it.

3.2 Mood(Emotion) Models

Mood Models are generally studied by two approaches:

• Categorical approach: This introduces distinct classes of moods which form

the basis for all other possible emotional variations.

• Dimensional approach: This classifies emotions along several axes such as valence (pleasure), arousal (activity), potency (dominance) and so on. This is generally the most commonly used approach in music applications.

Human psychologists have done a great deal of work and proposed a number of models of human emotions. Musicologists too have adopted and extended a few of the influential models that we will be navigating through.

emotions defined by Ekman [12]: anger, disgust, fear, happiness, sadness, and

surprise, are well known in psychology. However, since they were designed for

encoding facial expressions, some of them may not be suitable for music (for

instance, disgust), and some common music moods are missing (for instance, calm

or soothing).

3.2.1 Hevner’s experiment

In music psychology, the earliest and still best known systematic attempt at creating a music mood taxonomy was by Kate Hevner [18]. Hevner examined the affective value of six musical features, namely tempo, mode, rhythm, pitch, harmony and melody, and studied how they relate to mood. Based on the study, 67 adjectives were categorized into eight emotional groups, each containing similar emotions. Figure 3.1 shows the emotional groups with the adjectives belonging to each group.


Figure 3.1: Hevner's Mood Model, showing eight clusters of adjectives: (1) merry, joyous, gay, happy, cheerful, bright; (2) humorous, playful, whimsical, fanciful, quaint, sprightly; (3) delicate, light, graceful, lyrical, leisurely, satisfying, serene, tranquil, quiet, soothing; (4) dreamy, yielding, tender, sentimental, longing, yearning, pleading, plaintive; (5) pathetic, sad, mournful, tragic, melancholy, frustrated, depressing, gloomy, heavy, dark; (6) vigorous, robust, emphatic, martial, ponderous, majestic, exalting; (7) exhilarated, triumphant, dramatic, passionate, sensational, agitated, excited, impetuous, restless; (8) spiritual, lofty, inspiring, dignified, sacred, solemn, sober, serious

3.2.2 Russell’s model

Both Ekman's and Hevner's models are categorical models, because their mood spaces consist of a set of discrete mood categories. On the contrary, James Russell [31] came up with a circumplex model of emotions arranging 28 adjectives in a circle on a two-dimensional bipolar space (arousal - valence). This model helped in separating and keeping apart the opposite emotions. Figure 3.2 depicts Russell's model, which has been adopted in a considerable number of music psychology studies [31] [35] [38].

3.2.3 Thayer’s model

Yet another well-known dimensional model was proposed by Thayer [37]. It describes mood with two factors, a stress dimension (happy/anxious) and an energy dimension (calm/energetic), and divides music mood into four clusters according to the four quadrants of the two-dimensional space: Contentment, Depression, Exuberance and Anxious (Frantic). In this model, Contentment refers to happy and calm music; Depression refers to calm and anxious music; Exuberance refers to happy and energetic music; and Anxious (Frantic) refers to anxious and energetic music. Such definitions of the four clusters are clear and have high discriminatory power. Such a dimensional mood model, which divides the whole music emotion


Figure 3.2: Russell's Mood Model, arranging 28 adjectives around the arousal-valence circle (0°, 90°, 180°, 270°): aroused, astonished, excited, delighted, happy, pleased, glad, serene, content, satisfied, calm, at ease, relaxed, sleepy, tired, droopy, bored, depressed, sad, gloomy, miserable, frustrated, distressed, annoyed, afraid, angry, tense, alarmed

Figure 3.3: Thayer's Mood Model: the energy axis (high/low) and stress axis (+ve/-ve) divide the plane into four quadrants: Anxious, Exuberance, Depression and Contentment


Figure 3.4: Navras, the Indian Classical emotion model, comprising nine sentiments: Shringar (Love), Hasya (Happy), Karuna (Pathos), Raudra (Angry), Veer (Valor), Bhayanak (Horrific), Vibhatsa (Disgust), Adbhut (Surprise) and Shaanti (Peace)

space into four meaningful quadrants, facilitates rough music mood categorization and thus is widely adopted in mood recognition studies. Figure 3.3 depicts Thayer's model of mood.
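The four quadrant definitions given above can be sketched as a lookup on the two dimensions. This is our own illustration; the function name and the boolean encoding of the two axes are assumptions, not part of Thayer's notation:

```python
def thayer_quadrant(energetic, happy):
    """Map Thayer's two dimensions to one of the four mood clusters.

    `energetic` encodes the energy dimension (calm/energetic) and
    `happy` the stress dimension (happy/anxious); both names are
    illustrative assumptions made here for readability.
    """
    if energetic:
        return "Exuberance" if happy else "Anxious"
    return "Contentment" if happy else "Depression"

print(thayer_quadrant(True, True))    # happy and energetic -> Exuberance
print(thayer_quadrant(False, False))  # calm and anxious -> Depression
```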

3.2.4 Indian Classical model: Navras

Since we are considering the analysis of Indian music, we need to look at the traditional mood model that has been prevalent since ancient times in Indian Classical Music, which forms the base of Indian music. Navras, as it is termed in Sanskrit, means nine sentiments. This model sums up all the major categories of emotion that a human can exhibit into nine classes in total. These nine sentiments are depicted in Figure 3.4.

Studying the advantages and shortcomings of the various models discussed so far, and taking into consideration the Indian popular music scenario, deriving a mood model exactly from the existing ones mentioned in the literature cannot do justice to the selection of mood categories. Hence, after careful study and experiments, we put forth a simple mood model covering the majority of mood aspects, as will be witnessed in the coming chapters.


Chapter 4

Audio Features

4.1 Low level Audio Features

The key components of a classification system are feature extraction and classifier

learning [11]. Feature extraction addresses the problem of how to represent the

music pieces to be classified in terms of feature vectors or pair-wise similarities.

Many audio features have been proposed in the literature for music classifica-

tion. Different taxonomies exist for the categorization of audio features. Weihs et

al. [40] has categorized the audio features into four subcategories, namely short-

term features, long-term features, semantic features, and compositional features.

Scaringella [33] followed a more standard taxonomy by dividing audio features

used for genre classification into three groups based on timbre, rhythm, and pitch

information, respectively. Each taxonomy attempts to capture audio features from a certain perspective. Zhouyu Fu [42] characterizes the audio features into two levels, low-level and middle-level, as seen in Figure 4.1. Our audio feature

selection is inspired by this two-tier taxonomy of audio features.

Low-level features, although not closely related to the intrinsic properties of music as perceived by human listeners, form the basic features from which the mid-level features, which bear a closer relationship to perception, can be derived; the mid-level features mainly include three classes, namely rhythm, pitch, and harmony, as seen in Figure 4.1. In our work we focus only on the low-level audio features, which can be further categorized as:

• Timbral features: These capture the tonal quality of sound that is related

to different instrumentation. “Timbre” is the quality of a musical note or

sound or tone that distinguishes different types of sound production, such as

voices and musical instruments, string instruments, wind instruments, and percussion instruments. The physical characteristics of sound that determine the perception of timbre include the spectrum and the envelope.

Figure 4.1: Audio Features Taxonomy: top-level labels (user perspective: Genre, Mood, Instrument, Artist, Style), separated by the semantic gap from mid-level features (Pitch: PH/PCP, EPCP; Rhythm: BH, BPM; Harmony: CP, CH) and low-level features (Timbre: ZCR, SC, SR, MFCC, SF; Temporal: SM, ARM, FP, AM)

• Temporal features: These capture the variation and evolution of timbre over time. In this work, more focus is laid on instantaneous timbre values than on their temporal variation, although the latter is not completely ignored.

These low-level features are extracted using various signal processing tech-

niques like Fourier transform, Spectral/Cepstral analysis, autoregressive modeling

and similar computations. We follow the MPEG-7 standardization [28] and make

use of the jAudio [24] and Marsyas [25] open-source tools for extracting selected timbral, spectral and temporal audio features from the music pieces. The features are extracted and consolidated for each music piece in the standard Attribute-Relation File Format (ARFF) [2] so as to make it easy to mine the relations between these features with respect to the corresponding mood of the audio files.
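For concreteness, a consolidated ARFF file might look like the following minimal sketch; the attribute names, mood labels and data rows here are invented purely for illustration:

```
@relation music_mood

@attribute rms numeric
@attribute spectral_centroid numeric
@attribute zero_crossings numeric
@attribute mood {happy,sad,calm,excited}

@data
0.42,0.31,1240,happy
0.11,0.18,530,sad
```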

After a careful study and survey of various experts' papers and publications, our current consolidated list of selected and extracted features is as given below. The list names each form of audio feature, but the feature vector comprises its actual value as well as corresponding meta-features such as standard deviation, mean and logarithm, wherever required, as identified by McKay and team [24].


4.2 Feature List

• Root Mean Square (RMS): RMS is calculated on a per-window basis. It is defined by the equation:

R.M.S. = \sqrt{\frac{\sum_{n=1}^{N} x_n^2}{N}}

where N is the total number of samples in the time-domain window. RMS is used to calculate the amplitude of a window.
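As a minimal Python sketch of this definition (our own illustration, not jAudio's implementation):

```python
import math

def rms(window):
    """Root Mean Square of one analysis window: sqrt(sum(x_n^2) / N)."""
    n = len(window)
    return math.sqrt(sum(x * x for x in window) / n)

# A full-scale square wave has RMS 1.0; a silent window has RMS 0.0.
print(rms([1.0, -1.0, 1.0, -1.0]))  # 1.0
print(rms([0.0] * 4))               # 0.0
```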

• Magnitude Spectrum: This feature extracts the FFT (Fast Fourier Trans-

form) magnitude spectrum from a set of audio samples. It gives a good idea

about the magnitude of different frequency components within a window.

The magnitude spectrum is found by first calculating the FFT with a Han-

ning window [give ref]. The magnitude spectrum value for each bin is found

by first summing the squares of the real and imaginary components. The

square root of this is then found and the result is divided by the number of

bins.
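A direct sketch of this computation, using a naive O(N²) DFT in place of an optimized FFT (our own illustration of the steps just described, not the tool's actual code):

```python
import cmath
import math

def magnitude_spectrum(window):
    """Hanning-window the samples, take the DFT, then for each bin take
    sqrt(re^2 + im^2) and divide by the number of bins."""
    n = len(window)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * h for s, h in zip(window, hann)]
    bins = n // 2  # non-redundant half of the spectrum for a real signal
    mags = []
    for k in range(bins):
        c = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(abs(c) / bins)
    return mags
```

For a 32-sample window containing a sinusoid at bin 4, the returned magnitudes peak at index 4.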

• Power Spectrum: This feature extracts the FFT power from a set of au-

dio samples. It gives a good idea about the power of different frequency

components within a window.

• Spectral Roll-off Point [15]: The spectral roll-off point is the fraction of bins in the power spectrum at which 85% of the power is at lower frequencies. It denotes the amount of right-skewness of the power spectrum.

• Spectral Centroid [15]: This is a measure of the “center of mass” of the power spectrum. It is obtained by calculating the mean bin of the power spectrum. The result returned is a number from 0 to 1 that represents the fraction of the total number of bins at which this central frequency lies.
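Both the roll-off point and the centroid can be sketched as simple loops over a list of per-bin power values (a minimal illustration of the two definitions above; `power` is an assumed list of per-bin powers):

```python
def spectral_rolloff(power, fraction=0.85):
    """Fraction of bins below which `fraction` of the total power lies."""
    total = sum(power)
    cumulative = 0.0
    for k, p in enumerate(power):
        cumulative += p
        if cumulative >= fraction * total:
            return k / len(power)
    return 1.0

def spectral_centroid(power):
    """Mean bin of the power spectrum, as a fraction of the bin count."""
    total = sum(power)
    mean_bin = sum(k * p for k, p in enumerate(power)) / total
    return mean_bin / len(power)

# With all power concentrated in bin 4 of 8, both measures give 0.5.
power = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(spectral_rolloff(power))   # 0.5
print(spectral_centroid(power))  # 0.5
```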

• Spectral Flux: It measures the amount of spectral change of a signal by

calculating the difference between the current value of each magnitude spec-

tral bin in current window and the corresponding value of the magnitude


spectrum of the previous window. Each of these differences is then squared,

and the result is the sum of the squares.
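A minimal sketch of this difference-and-sum computation (our own illustration):

```python
def spectral_flux(current, previous):
    """Sum of squared bin-wise differences between the magnitude
    spectra of the current and previous windows."""
    return sum((c - p) ** 2 for c, p in zip(current, previous))

# Identical spectra produce zero flux; any change produces a positive value.
print(spectral_flux([1.0, 2.0], [1.0, 2.0]))  # 0.0
print(spectral_flux([1.0, 2.0], [0.0, 0.0]))  # 5.0
```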

• Spectral Variability: It stands for the standard deviation of the magni-

tude spectrum of the audio signal.

• Fraction of low energy windows [26]: This measures the quietness of

the signal relative to the rest of a signal and is calculated by taking the mean

of the RMS (Root Mean Square) of the last 100 windows and finding what

fraction of these 100 windows are below the mean.
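A minimal sketch, assuming `rms_values` is a hypothetical list holding the per-window RMS values seen so far:

```python
def low_energy_fraction(rms_values):
    """Fraction of the last 100 windows whose RMS falls below the
    mean RMS of those same windows."""
    recent = rms_values[-100:]
    mean = sum(recent) / len(recent)
    return sum(1 for r in recent if r < mean) / len(recent)

# 50 quiet windows (RMS 1.0) and 50 loud windows (RMS 3.0): mean is 2.0,
# so half the windows fall below it.
print(low_energy_fraction([1.0] * 50 + [3.0] * 50))  # 0.5
```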

• Zero Crossings [26]: This feature helps identify the pitch as well as the

noisiness of a signal. It is calculated by finding the number of times the

signal changes sign from one sample to another crossing the zero value.
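A minimal sketch counting sign changes between consecutive samples (our own illustration):

```python
def zero_crossings(window):
    """Number of times consecutive samples differ in sign."""
    count = 0
    for a, b in zip(window, window[1:]):
        if (a >= 0) != (b >= 0):
            count += 1
    return count

print(zero_crossings([1.0, -1.0, 1.0, -1.0]))  # 3
print(zero_crossings([1.0, 2.0, 3.0]))         # 0
```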

• Strongest Beat: This feature finds the strongest beat in a signal.

• Beat sum: It is calculated by summing up the beat values of a signal and

gives a measure of how important a role regular beats play in a piece of music.

• Beat histogram: This feature helps to identify the strength of different

rhythmic periodicities in a signal. This is calculated by taking the RMS of

256 windows and then taking the FFT of the result.

• Strongest frequency via Zero Crossings: It denotes the highest fre-

quency of the signal present at the Zero crossing point. This is found by

mapping the fraction in the zero-crossings to a frequency in Hertz.

• Mel-Frequency Cepstral Coefficients (MFCC): This feature constitutes the coefficients derived from the cepstral representation of the audio signal such that the frequency bands are equally spaced on the Mel scale, approximating the human auditory system's response more closely. MFCCs are commonly and widely used as features in speech recognition systems. In recent times these


features are increasingly finding use in music information retrieval, audio similarity measures, and genre classification.

• Linear Predictive Coding (LPC) coefficients: This feature helps in

representing the spectral envelope of an audio or speech signal.

• Spectral smoothness: This feature is calculated by evaluating the log of a partial minus the average of the log of the surrounding partials, and is based upon Stephen McAdams's spectral smoothness [23]. It provides a peak-based calculation of the smoothness of an audio signal.

• Relative difference function: It represents the onset detection and is

calculated as the log of the derivative of the Root Mean Square value.

• Mood: This is the class attribute that needs to be populated during the

training and which is detected automatically while testing the classifier

against a new audio file.

In the given list of audio features, some features have a single dimension (for instance, Strongest Beat, which has just one value) while others have variable dimensions (for instance, Beat Histogram, which has a series of values forming the histogram). The variable dimension depends on the window size, which in this work has been kept constant at 32 for every 30-second audio clip in the data-set. Hence, including the class attribute, the data-set consists of a total of 330 feature vectors.


Chapter 5

Mining Mood from Audio

Features

5.1 Overview of Data Mining

Data mining is the field which deals with the extraction of interesting, non-trivial, implicit, previously unknown and potentially useful patterns or knowledge from huge amounts of data. Data mining is often termed knowledge discovery in databases (KDD). A typical knowledge discovery process is depicted in Figure 5.1.

Data Mining can be considered as a confluence of various disciplines, includ-

ing database systems, statistics, machine learning, visualization, and informa-

tion science. Moreover, depending on the data mining approach used, techniques

from other disciplines may be applied, such as neural networks, fuzzy and/or

rough set theory, knowledge representation, inductive logic programming, or high-performance computing. Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, business, bio-informatics, or psychology. Figure 5.2 depicts a few of these prominent disciplines closely associated with Data mining.

Figure 5.1: Data Mining in Knowledge discovery

Figure 5.2: Data Mining disciplines

5.2 Overview of Data Mining functionalities

Data mining functionalities are used to specify the kind of patterns to be found

in data mining tasks. In general, data mining tasks can be classified into two

categories: descriptive and predictive. Descriptive mining tasks characterize the

general properties of the data in the database. Predictive mining tasks perform

inference on the current data in order to make predictions. Since this work is

related to predicting the mood underlying a particular music piece, the focus of

the work is directed towards exploring the “predictive” mining tasks rather than

“descriptive” mining. The following are the formally recognized Data Mining functionalities, with a short description of each:

• Characterization and discrimination: Data characterization is a sum-

marization of the general characteristics or features of a target class of data.

The data corresponding to the user-specified class are typically collected

by a database query. For example, to study the characteristics of software

products whose sales increased by 10% in the last year, the data related to

such products can be collected by executing an SQL query. Data discrim-

ination is a comparison of the general features of target class data objects

with the general features of objects from one or a set of contrasting classes.

The target and contrasting classes can be specified by the user, and the corresponding data objects retrieved through database queries. The output in both cases can be in the form of pie charts, bar graphs and similar constructs for the analyst to study the data.

• Frequent patterns, Association, Correlation: Frequent patterns, as

the name suggests, are patterns that occur frequently in data. There are

many kinds of frequent patterns, including item-sets, subsequences, and sub-

structures. The data under consideration can be analyzed for such frequently

occurring patterns of data attributes which leads to the discovery of inter-

esting associations and correlations within data.

• Classification and prediction: Classification is the process of finding

a model (or function) that describes and distinguishes data classes or con-

cepts, for the purpose of being able to use the model to predict the class

of objects whose class label is unknown. The derived model is based on

the analysis of a set of training data (i.e., data objects whose class label

is known). Whereas classification predicts categorical (discrete, unordered)

labels, prediction models continuous-valued functions. That is, it is used to

predict missing or unavailable numerical data values rather than class labels.

• Cluster analysis: Unlike classification and prediction, which analyze

class-labeled data objects, clustering analyzes data objects without consult-

ing a known class label. In general, the class labels are not present in the

training data simply because they are not known to begin with. Clustering

can be used to generate such labels. The objects are clustered or grouped

based on the principle of maximizing the intraclass similarity and minimiz-

ing the interclass similarity. That is, clusters of objects are formed so that

objects within a cluster have high similarity in comparison to one another,

but are very dissimilar to objects in other clusters.


• Outlier analysis: A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions. However, in some applications, such as fraud detection, the rare events can be more interesting than the more regularly occurring ones. The analysis of outlier data is referred to as outlier mining.

• Trend and evolution analysis: Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. Although this may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data, distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis.

5.3 Classification

This work involves learning the mood aspect of music by analyzing the feature vectors extracted from each audio file. The learning done thus can facilitate identifying which specific category of mood a particular audio file belongs to, provided its fixed set of audio features is available. Of all the data mining functionalities just described, “classification” and “cluster analysis” seem the best suited to discovering the mood information in the music feature data-set. Also, as witnessed throughout the literature survey, classification algorithms have so far proved quite effective compared to others in analyzing the mood or genre aspects of music data-sets. Our own experimentation, too, has revealed considerably higher performance from classification algorithms compared to clustering algorithms. Hence, we opt for classification techniques to mine this music feature data-set with a supervised learning approach.

Figure 5.3 shows the general process of classification. It is a two-step process:

• First: This step, also called the “learning step” or “training phase”, involves learning a mapping or function y = f(X) that can predict the associated class label y of a given tuple X. In this view, we wish to learn a mapping or function that separates the data classes. Typically, this mapping is represented in the form of classification rules, decision trees, or mathematical formulae, and is generally termed the “Classification Model”. As seen in step 1 of Figure 5.3, each row of


Figure 5.3: Classification process

the table represents a tuple X. The function f(X) is learnt during training using classification algorithms, and the corresponding rule is stored in the classifier model. This rule helps predict whether the person represented by the tuple X is tenured (yes) or not (no), depending upon the values of the tuple's various attributes.

• Second: The model is used for classification. It is evaluated against the test data-set in order to predict the class label of each data instance, as learned from the model. The results are compared with the actual classes of the test data, and it is decided accordingly whether the model is accurate enough to classify the test data. If the model is acceptable, it can be used further for classifying data with unknown classes. As seen in step 2 of Figure 5.3, the classifier model evaluates an unknown tuple X by applying the learnt function f(X) to predict its outcome.
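The two steps above can be sketched in miniature. The following is an illustrative sketch only, using a toy 1-nearest-neighbour learner with invented tuples and labels; it is not part of the described system.

```python
# Step 1 (learning): store labelled tuples as the "model" f.
def train(tuples, labels):
    return list(zip(tuples, labels))

# Step 2 (classification): apply f to an unknown tuple X.
def predict(model, x):
    def dist(a, b):
        # squared Euclidean distance between two tuples
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(model, key=lambda pair: dist(pair[0], x))
    return label

# Training tuples X with known class labels y (invented values).
X = [(1.0, 2.0), (1.5, 1.8), (8.0, 8.5), (9.0, 7.5)]
y = ["no", "no", "yes", "yes"]

model = train(X, y)
print(predict(model, (8.5, 8.0)))  # a tuple near the "yes" group
```

In a real run, the second phase would first be evaluated against held-out test data before the model is accepted, exactly as the step above describes.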

5.3.1 Classification using Decision-tree

Classification of data can be achieved by various methods, to name a few:

• Classification by Decision Tree Induction

• Bayesian Classification

• Artificial Neural Networks

• Rule-Based Classification

• Classification by Back-propagation


Figure 5.4: Classification using Decision Tree

• Support Vector Machines

• Associative Classification

Since this work focuses on a Decision-tree based classification approach, the description of the remaining methods is outside the scope of this document, although relevant information can be found in the book by Han and Kamber [17]. Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. A typical example of a decision tree is shown in Figure 5.4.

Given a tuple X for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily be converted to classification rules. Figure 5.4 represents a decision tree for predicting whether to sanction credit (class values: yes, no) depending upon credit risk assessment parameters like age, current credit rating and profession. For instance, a senior with an excellent credit rating has far higher chances of being sanctioned credit than one with only a fair credit rating.
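The traversal just described can be sketched as follows. The tree structure, attribute values and outcomes below are assumptions loosely modeled on the credit example of Figure 5.4, not its literal content.

```python
# A decision tree encoded as nested dicts: internal nodes test an attribute,
# branches are test outcomes, leaves hold a class label. All values invented.
tree = {
    "attribute": "age",
    "branches": {
        "youth":       {"label": "no"},
        "middle_aged": {"label": "yes"},
        "senior": {
            "attribute": "credit_rating",
            "branches": {
                "excellent": {"label": "yes"},
                "fair":      {"label": "no"},
            },
        },
    },
}

def classify(node, x):
    """Trace a path from the root to a leaf holding the class prediction."""
    while "label" not in node:          # internal node: test an attribute
        node = node["branches"][x[node["attribute"]]]
    return node["label"]

print(classify(tree, {"age": "senior", "credit_rating": "excellent"}))  # yes
```

Each root-to-leaf path here corresponds directly to one classification rule, which is why the conversion mentioned above is straightforward.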


Why Decision trees?

Following are a few strong reasons why decision trees have been considered so often when it comes to classification techniques [17]:

• The construction of decision tree classifiers does not require any domain

knowledge or parameter setting, and therefore is appropriate for exploratory

knowledge discovery.

• Decision trees can handle high dimensional data.

• Their representation of acquired knowledge in tree form is intuitive and

generally easy to assimilate by humans.

• The learning and classification steps of decision tree induction are simple

and fast.

• In general, decision tree classifiers have good accuracy.

• Decision tree induction algorithms have been successfully used for classification in many application areas, such as medicine, manufacturing and production, financial analysis, astronomy, and molecular biology.

5.3.2 Random Forest Classification

In order to improve classification accuracy, ensemble methods like bagging and boosting have proved quite productive. Ensemble methods combine a series of k learned classification models M1, M2, ..., Mk, with the aim of creating a composite model with improved classification accuracy. Our work makes use of the “Bagging” approach, also called “Bootstrap aggregation”. In this method, bootstrap samples of the data-set are created by randomly sampling features and data instances from the given training set with replacement. These samples are then used independently and simultaneously to train and learn a separate classifier model for each sample. Finally, classification is done by taking the majority of the votes from all the models learnt. The Random forest approach involves learning such an ensemble consisting of a bagging of un-pruned decision tree learners with a randomized selection of features at each split. This is done by randomly sampling a feature subset for each decision tree (as in Random Subspaces [19]), and/or by randomly sampling a training data subset for each decision tree (as in Bagging [4]).
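The two sources of randomness named above, bootstrap sampling of instances (Bagging) and random feature subsets at a split (Random Subspaces), can be sketched as follows; the sizes and names are illustrative assumptions.

```python
import random

random.seed(42)  # fixed seed only so the sketch is repeatable

n_instances, n_features, mtry = 6, 10, 3

# Bagging: sample instance indices with replacement. Some indices repeat,
# and the ones never drawn form the "out-of-bag" set for this tree.
bootstrap = [random.randrange(n_instances) for _ in range(n_instances)]

# Random subspace: at an internal node, only mtry randomly chosen features
# are considered when searching for the best split.
candidate_features = random.sample(range(n_features), mtry)

print(bootstrap)            # instance indices drawn with replacement
print(candidate_features)   # the mtry features eligible at this split
```

A full random forest repeats the first draw once per tree and the second draw at every internal node of every tree.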


Random Forests Algorithm

Following is a simplified algorithm explaining the Random forest approach:

Data: Training set; Ntrees = number of trees
Result: Majority vote of classification

initialization;
for i ← 1 to Ntrees do
    select a new bootstrap sample from the training set;
    grow an un-pruned tree on this bootstrap sample;
    for each internal node do
        Mtry ← randomly sampled subset of predictors;
        choose the best split among these Mtry predictors;
    end
    save the un-pruned tree built;
    record the tree's vote of classification for each class;
end
return the majority vote;

Algorithm 1: Random forest

CART (Classification and Regression Trees) is chosen for building the randomly generated trees, as it is evident from the literature [4] that Random forests built from CART trees yield better results than those built from other tree algorithms in most cases.

5.4 Random Forest Highlights

Random Forests have time and again proven useful and effective in many classification problem scenarios. Here are a few highlights of this approach that make it appropriate and suitable for the purpose of mood classification of high-dimensional music data-sets:

• Random forests readily handle large numbers of input features.

• They are faster to train and evaluate than comparable approaches.

• Random forests exhibit stronger resistance to over-training and thus over-

fitting.

• Separate cross-validation is unnecessary in the case of Random forests, since it is already taken care of at the time of forest building.


• Random forests generally have accuracy similar to Support Vector Machines and Neural Networks, although Random forests have shown much better performance on huge, high-dimensional data-sets.

5.5 Our Approach: Bagging of Random Forests

In this work we present an additional level of ensemble hierarchy by generating an ensemble of Random Forests using bootstrap aggregation, also known as Bagging. The algorithm for the same is explained below.

5.5.1 Algorithm

Data: Training set; Nforests = number of forests
Result: Majority vote of classification

initialization;
for i ← 1 to Nforests do
    select a new bootstrap sample from the training set;
    generate a Random forest on this bootstrap sample with un-pruned random trees, as mentioned in Algorithm 1;
    save the Random forest built;
    record the majority vote of classification among its trees;
end
return the majority vote of classification among the Random forests;

Algorithm 2: Bagging of Random forests

For growing the random trees, the randomly sampled data attributes are split on the basis of the “Gini Index”, which has shown better results when working with CART trees. The Gini index measures the impurity of the data set D at hand and is given by the formula:

Gini(D) = 1 − Σ_{i=1..m} p_i²

where p_i is the probability that a tuple in D belongs to class C_i, and the sum is computed over the m classes. The Gini index considers a binary split for each attribute. For each split, a weighted sum of the impurity of each resulting partition is calculated. Suppose data-set D is split into partitions D1 and D2 on the basis of attribute A1; then the Gini index of attribute A1 for splitting data-set D is given by:


Gini_A1(D) = (|D1| / |D|) · Gini(D1) + (|D2| / |D|) · Gini(D2)

The Gini index is computed for all the eligible splitting attributes, and the reduction in Gini impurity for each attribute is calculated by the formula:

∆Gini_A1 = Gini(D) − Gini_A1(D)

The attribute maximizing the above-mentioned reduction in impurity is selected as the splitting attribute.
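The Gini computations above can be written out directly. The following is a minimal sketch with a toy two-class partition; the variable names and sample labels are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum over the m classes of p_i squared."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(d1, d2):
    """Weighted impurity of a binary split of D into partitions D1 and D2."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

D = ["happy"] * 6 + ["sad"] * 4            # toy partition: p = (0.6, 0.4)
print(gini(D))                             # 1 - (0.36 + 0.16) = 0.48

D1, D2 = ["happy"] * 6, ["sad"] * 4        # a perfectly pure split
reduction = gini(D) - gini_split(D1, D2)   # ΔGini = 0.48 - 0 = 0.48
print(reduction)
```

A split separating the classes completely, as here, achieves the maximum possible impurity reduction, so it would be chosen as the splitting attribute.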

The approach of Algorithm 2 has not only shown a rise in classification accuracy on music data-sets compared to the traditional Random forest approach, but has also shown consistently better performance than other classification techniques, as will be discussed in the coming chapters.


Chapter 6

Mood Identification System

6.1 Mood Model Selection

As seen in the literature, the mood models studied were mostly proposed from the perspective of psychology. Given the way dimensional models like Thayer's model [37] and Russell's model [31] were formulated, mapping music emotion onto any of them plots the different emotions at different coordinates on a two-dimensional plane, which would then have to be grouped together to obtain distinct categories of emotions. There is always a trade-off in the number of emotions a mood model should portray: a very large number of moods can be confusing and frustrating for an end-user choosing a song belonging to one of them, while a very small number is too general to isolate the basic emotions. Since most mood models in the literature have been evaluated on non-Indian music, in this work we consider the Indian perspective described by the nine sentiments (Navras) mentioned in section 3.2.4.

Out of these nine emotions, however, emotions like anger, horrific and surprise are very rarely described by music alone; they are a combined effect of music, expression and act. Also, some emotions like Hasya (Happiness) need further subdivision, for instance into happy and excited. Hence this model cannot be used as-is for analyzing the mood aspect of Indian popular songs; it needs further changes reflecting how people interpret these songs.

A list of 2500 well-known Indian popular songs was compiled by surveying the songs most liked by the majority of people. A short experiment similar to Hevner's [18] was conducted with the help of a panel of five music listeners, wherein each member of the panel independently listened to a 30-second clip of each of the 2500 songs and noted down the adjective(s) they thought best described the song's emotion. The panel consisted of one music


expert, two avid music listeners and two common music listeners. The adjectives collected were then grouped by similarity with respect to the music clips under consideration. A total of five mood groups were categorized, each covering a group of adjectives for songs with a similar emotional quotient. These five mood categories form our mood model, as shown in Table 6.1:

Table 6.1: Mood Model: Indian popular Hindi music

Mood Category   Adjectives represented
Happy           cheerful, funny, comic, happy, jovial
Sad             depressed, frustrated, angry, betrayal, withdrawal, serious
Silent          peaceful, calm, silent, nostalgia, slow-paced
Excited         danceable, celebration, fast-track, excited, motivational, inspirational
Romantic        love, romantic, playful

6.2 System Overview

The Mood Identification system is the main engine that identifies the mood of given music or audio files. It is designed as an open-source software system. The system would generally form part of the back-end in most applications, with its results used by the application layer on top in whatever way is required. The system has the two-fold objectives mentioned below:

1. The system should provide for analyzing music files and learning the classifier models associated with them.

2. It should be able to predict the class of mood that a particular audio file or

music belongs to.

An abstract view of the Mood Identification system from a user's perspective is shown in Figure 6.1. The system accepts music files as input from the user and returns to the end-user the respective mood associated with each file.


Figure 6.1: Mood Recognition System

6.3 System Design and Components

The system can be divided into several components, each dedicated to a particular task, as shown in Figure 6.2 and explained below:

6.3.1 Audio Pre-processor

This component, as the name signifies, has the main task of preprocessing the audio files fed to the system by the user. The preprocessing task involves:

a. Audio file splitting: Each input music file is split into consecutive clips, each of 30 seconds duration. A song generally lasts at least a couple of minutes, which makes it difficult to analyze as a whole due to the enormous data content within that duration. Moreover, 30 seconds has proven to be quite a good duration from an analysis point of view, as it is not too short to lose any important content, and not too long to inflate the processing time. It is very much possible to relate a particular mood to a song by just listening to a 30-second excerpt of that song.

b. Audio format conversion: Each of the 30 second music clip is converted to a

standard WAV format (PCM signed 16 bit, stereo) with a sampling rate of


Figure 6.2: Mood Detection System: Detailed Design. (The figure shows input music files passing through the Audio Pre-processor (file splitter and WAV format converter) and the Audio Feature Extractor; the resulting train data feeds the Mood Classifier Learner and its mood model, while test data feeds the Mood Detector, whose output is the mood.)

44.1 kHz. Currently the system supports the MP3 and WAV formats, which are widely used for audio; conversion of other formats can be supported by extending the existing code interfaces. Conversion to a single format is necessary to ensure that the files to be processed and analyzed are consistent in structure and format, thereby ensuring uniform treatment and processing, which would not be the case if the formats differed.

This component thus makes sure that the input music files provided by the user

are transformed so that they can be ready for processing and analyzing further.
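The splitting step can be sketched in terms of clip boundaries alone (actual audio input/output in the system is handled by audio tooling); `clip_boundaries` is a hypothetical helper, not part of the system's code.

```python
def clip_boundaries(duration_s, clip_s=30):
    """Return (start, end) second offsets of consecutive fixed-length clips.

    A shorter trailing remainder is dropped, so only full clips are kept.
    """
    return [(start, start + clip_s)
            for start in range(0, duration_s - clip_s + 1, clip_s)]

print(clip_boundaries(95))   # a 95-second file yields three full 30 s clips
```

Each (start, end) pair would then be cut from the source file and converted to the standard WAV format described above.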

6.3.2 Audio Feature Extractor

This module revolves around the audio signal features associated with the music clips obtained from the Audio Pre-processor. The module performs two main tasks:

a. Feature Extraction: Each 30-second music clip received as input is processed by applying mathematical signal computations, such as Fourier transforms, logarithms and integrals, along with their variants and combinations. These mathematical functions represent each of the features listed in section 4.2, and the module implements the computations involved in calculating each of them. Most of the module's implementation is inspired by and extended from the well-known open-source tool jAudio [24], with some variations and customizations as required for this work. The features extracted are either fixed-dimension or multi-dimensional. A feature vector comprising all of these features is extracted for each music clip.

b. Data-set generation: The feature vectors thus extracted form the attributes of each music clip, which can then be called a data instance. These feature vectors, computed in memory, are stored in a flat file following the standard ARFF file format understood by most data mining tools, such as Weka [41]. In addition to the extracted features, another attribute called “Mood” is appended to each data instance. This attribute is manually filled in for a training set and can hold any dummy value in real mood-prediction scenarios.
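A minimal sketch of such a flat file follows, with invented attribute names and values; the real attribute list comes from the feature extractor, and the sketch only illustrates the ARFF layout (header declarations, then a @data section).

```python
# Invented feature names; the mood attribute lists the five model classes.
header = ["@relation music_mood",
          "@attribute spectral_centroid numeric",
          "@attribute rms numeric",
          "@attribute mood {happy,sad,silent,excited,romantic}",
          "@data"]

instances = [
    (2145.3, 0.21, "happy"),   # training instance: mood filled by annotators
    (1033.9, 0.08, "?"),       # prediction instance: mood unknown (dummy)
]

lines = header + ["%s,%s,%s" % row for row in instances]
arff = "\n".join(lines)
print(arff)
```

The "?" placeholder in the second instance stands in for the dummy mood value mentioned above.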

6.3.3 Mood Identification System

This is the main processing unit of the whole system and is responsible for mining the mood from the music data-set obtained as input from the Audio Feature Extractor module. It comprises the actual implementation of the algorithms mentioned in section 5.5. The module has the two important roles mentioned below:

a. Mood Learner: In this case, the input received is a training data-set of music features with the “Mood” attribute manually filled in by the domain experts for training purposes. The Mood learner can make use of existing or newly written mining algorithms, provided they follow the conventions and framework laid down by the Weka tool [41]. Thus, this module can serve as an experimenter, allowing the user (an analyst or researcher) to try various algorithms for mining the mood aspect of the underlying music data-set. The classifier model learnt can be saved for later evaluation. The output of this part of the module generally serves end-users who are analysts or researchers keen to understand and tune the machine learning aspect of this whole process. Using this module, the classifier model for the bagging of Random forests approach was trained and stored after careful evaluation and comparison with other comparable models. Mood learning is generally a one-time activity: once done, the model is saved and can be re-used for evaluations any number of times. However, depending upon user preference, the learning can be made iterative to improve accuracy with the most up-to-date music data, which


evolves over time to a great extent. This change, however, might require a few code changes, which is currently outside the scope of this project.

b. Mood Detector: In this case, the music data-set received as input has dummy data in the “Mood” attribute, as this feature is not known and is expected to be predicted by this module. The Mood detector evaluates the data-set under consideration against the saved mood classifier model. The evaluation predicts the mood of every 30-second music clip fed to the system by the user. If a whole song is fed instead, the system returns the maximum-voted mood among the moods predicted for all of the clips derived from that song. The output of this module is generally used by an end-user application, such as a mood annotator or any music information retrieval application, or even by the end-user himself/herself. Although the module helps detect the mood of the music under consideration, full control over accepting or rejecting this decision can always be given to the end-user with some minor enhancements to the code.
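The whole-song decision just described can be sketched as a simple majority vote over clip-level predictions; the clip moods below are invented.

```python
from collections import Counter

def song_mood(clip_moods):
    """Return the most common (maximum-voted) mood across a song's clips."""
    return Counter(clip_moods).most_common(1)[0][0]

# Hypothetical per-clip predictions for one song's 30-second clips.
clips = ["romantic", "silent", "romantic", "romantic", "happy"]
print(song_mood(clips))  # romantic
```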


Chapter 7

Experiments and Results

The project involved rigorous experimentation from a data mining point of view; the preparation and pre-processing involved in carrying it out are also worth mentioning. This chapter describes the experimental apparatus, flow and results obtained during the music mood identification process.

7.1 Experimental Setup

The apparatus included:

• A large, diverse personal music collection of Indian popular music in MP3 or WAV format.

• Open-source tools and libraries for audio processing.

• Open-source data mining framework - Weka [41].

• Music Mood Identification System.

• Panel of five people: one music expert, two avid music listeners, and two common music listeners.

• One workstation for software development, assembly and execution

7.1.1 Data Collection

The data collection drew on a personal music collection of Indian popular Hindi songs. Only songs that are generally popular and famous among people were selected, and care was taken to ensure a good mix of


songs spanning each of the five mood classes. Only songs in MP3 or WAV format were shortlisted, in alignment with the scope of the project.

7.1.2 Data pre-processing

Dataset generation was carried out in three stages. The first stage consisted of 490 songs, the second of 2200 songs, and by the third stage a total of 2300 popular audio songs from Indian Hindi films had been processed to generate the dataset. All the songs were trimmed to 30-second clips, and their low-level features were extracted and consolidated into an ARFF file dataset. Each entry was annotated with the most probable mood from the data collected by consulting the panel of five people, in order to recreate a real scenario for supervised training.

7.1.3 Training and Testing

The datasets in each stage were subjected to a range of existing classification algorithms under numerous runs and folds. Algorithms showing a bias towards only specific class labels, or performing very poorly, were discarded; the remaining algorithms were then trained and evaluated on a 66%-34% training-testing split of the dataset. The following 11 algorithms showed the top comparable results during this experimentation:

• NaiveBayes

• Support Vector Machines

• J48 (C4.5 algorithm implementation)

• Random Tree

• Random Forest

• REPTree

• Simple CART (Classification and Regression Trees)

• Bagging of Random Trees

• Bagging of Random Forests

• Bagging of simple CART

• Bagging of REPTree


7.2 Results

7.2.1 Evaluation Metrics

The 11 classification algorithms were evaluated with respect to four evaluation measures for each of the generated datasets:

• Receiver Operating Characteristic (ROC): The ROC curve shows the trade-off between the true positive rate and the false positive rate. It is a two-dimensional plot with the vertical axis representing the true positive rate and the horizontal axis the false positive rate; it ranks the test tuples in decreasing order, so the tuple most likely to belong to the positive class appears at the top of the list. The area under the ROC curve is a measure of the accuracy of the model: a model with perfect accuracy has an area of 1, and the closer the curve lies to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model. The area under ROC was mainly used in signal detection theory and the medical domain, where it was described as a plot of sensitivity versus 1 − specificity, which is the same plot as defined earlier. For each of the five classes of the mood model, the area under ROC is calculated; the nearer the value is to 1, the more accurate the classification.

• Confusion Matrix: The columns of the confusion matrix represent the predictions, and the rows represent the actual classes. Correct predictions always lie on the diagonal of the matrix. Equation 7.1 shows the general structure of a confusion matrix for the two-class case:

    [ TP  FN ]
    [ FP  TN ]                                             (7.1)

wherein True Positives (TP) is the number of instances of a class that were correctly predicted; True Negatives (TN) is the number of instances NOT of a particular class that were correctly predicted NOT to belong to that class; False Positives (FP) is the number of instances NOT belonging to a class that were incorrectly predicted as belonging to it; and False Negatives (FN) is the number of instances of a class that were incorrectly predicted as belonging to another class. Though the confusion matrix gives a better outlook on classifier performance than accuracy alone, a more detailed analysis is preferable, which the following metrics provide. Since we have five mood classes in this case, the confusion matrix is a 5 × 5 matrix, with each diagonal element representing the True Positives of its class.


• Recall: Recall gives the percentage of actual class members that the classifier correctly identified; (TP + FN) represents the total number of actual members of the class. Recall is given by equation 7.2:

    Recall = TP / (TP + FN)                                (7.2)

• Precision: Precision gives the percentage of instances assigned by the model or classifier to a particular class that actually belong to that class; (TP + FP) represents the total number of positive predictions by the classifier. Precision is given by equation 7.3:

    Precision = TP / (TP + FP)                             (7.3)

Thus it is generally said that Recall is a completeness measure and Precision is an exactness measure. An ideal classifier would give a value of 1 for both Recall and Precision, but if a classifier scores higher (closer to one) on one of these metrics and lower on the other, choosing between classifiers is a difficult task. For such cases, further metrics, discussed next, are suggested in the literature.

• F-Measure: The F-measure is the harmonic mean of Precision and Recall; it is essentially an average of the two percentages and greatly simplifies comparison between classifiers. It is given by equation 7.4:

    F-Measure = 2 / (1/Recall + 1/Precision)               (7.4)
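Equations 7.2 through 7.4 can be computed per class directly from a multi-class confusion matrix (rows: actual class; columns: predicted class), as the following sketch with a toy two-class matrix shows.

```python
def class_metrics(matrix, k):
    """Precision, recall and F-measure for class index k."""
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # actual k, predicted otherwise
    fp = sum(row[k] for row in matrix) - tp   # predicted k, actually otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 / (1 / recall + 1 / precision)
    return precision, recall, f_measure

toy = [[8, 2],
       [1, 9]]                                # invented 2-class matrix
print(class_metrics(toy, 0))                  # precision 8/9, recall 8/10
```

The same column/row sums over a 5 × 5 matrix yield the per-mood figures reported for this work.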

Figures 7.1, 7.2, 7.3 and 7.4 depict the performance of the algorithms with reference to the four measures, namely AUROC, Recall, Precision and F-measure. From each of the results it can be seen that the bagging (ensemble) approach over classification tree algorithms such as Random Forest, Random Tree and Simple CART gave better results than the other algorithms, with Bagging of Random Forests consistently topping them all.


Figure 7.1: Area under ROC statistics

Figure 7.2: Recall statistics


Figure 7.3: Precision statistics

Figure 7.4: F-measure statistics


Table 7.1: Experimental Results on Test Dataset of 2938 music clips

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Mood Class
0.964     0.106     0.751       0.964    0.845       0.991      excited
0.805     0.021     0.914       0.805    0.856       0.978      happy
0.770     0.006     0.971       0.770    0.859       0.967      romantic
0.822     0.019     0.867       0.822    0.844       0.977      sad
0.871     0.038     0.849       0.871    0.860       0.983      silent
0.853     0.042     0.867       0.853    0.853       0.980      Weighted Avg.

Table 7.2: Confusion Matrix on Test Dataset of 2938 music clips

  a     b     c     d     e    ← Classified As
704    16     1     0     9    a = excited
 69   511     7    15    33    b = happy
 94    16   470    10    20    c = romantic
 34     5     1   314    28    d = sad
 36    11     5    23   506    e = silent

Table 7.1 shows the evaluation results obtained for the said metrics after
performing a test run on a dataset of 2938 music clips belonging to Indian
popular Hindi music. The table shows the classification performance for each
of the mood categories defined in the mood model. The last row gives the
metric values as a weighted average taken over all the classes.

Table 7.2 displays the confusion matrix for the evaluation on the test
dataset of 2938 music clips belonging to Indian popular music. As seen from
the matrix, the diagonal elements are the correctly identified data
instances and denote the true positives. From the data in the matrix, the
following can be inferred:

Total number of instances: 2938

Number of correctly classified instances: 2505 (85.26%)

Number of incorrectly classified instances: 433 (14.74%)
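These summary figures can be reproduced directly from Table 7.2; a minimal Python sketch (the matrix below is transcribed from the table, rows being actual classes and columns predicted classes):

```python
# Confusion matrix from Table 7.2
labels = ["excited", "happy", "romantic", "sad", "silent"]
matrix = [
    [704, 16, 1, 0, 9],
    [69, 511, 7, 15, 33],
    [94, 16, 470, 10, 20],
    [34, 5, 1, 314, 28],
    [36, 11, 5, 23, 506],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
# Per-class recall: diagonal element divided by its row sum
recalls = {lab: matrix[i][i] / sum(matrix[i]) for i, lab in enumerate(labels)}

print(total, correct)                        # 2938 2505
print(round(correct / total * 100, 2))       # 85.26
print(round(recalls["excited"], 3))          # 0.964
```

The per-class recalls computed this way agree with the Recall column of Table 7.1.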


Chapter 8

Applications

We believe our work can contribute substantially to a variety of real-world
applications involving music. The following are a few of the many fields that
can reap the benefits of this system:

8.1 Music Therapy Applications

The field of Music Therapy involves the clinical use of music in a therapeutic
way to treat individuals by addressing their physical, emotional, social and
cognitive needs. As a result of tremendous research and successful experiments,
Music Therapy has emerged as an important field using music as a medium to
improve the quality of life of people in spite of diversity, disability or
illness. Receptive Music Therapy is one of the many important streams of this
field, wherein, after examining the condition of the individual, the Music
Therapy expert plans and recommends a routine involving listening to a
particular type of music. Since this therapy closely addresses the emotional
and psychological needs of the individual, the mood underlying the music plays
an important role in the choice of music. Automatic mood recognition of music
can help reduce the expert's effort to manage, search and recommend the music
appropriate for the individual. This can also be extended to online
self-therapy applications, wherein individuals can themselves accurately
choose the appropriate music as directed by the expert, without much search
effort.

8.2 Music Information Retrieval

MIR systems aim at extracting information from music. This information can
be used for various music applications such as recommender systems, instrument
recognition and separation applications, automatic categorization systems and
many more. Our system can contribute to automatic categorization systems,
wherein music can be categorized automatically by the mood our system
recognizes. This will not only help organize music in a much better way but
also reduce the overhead on users of selecting a list of songs suiting the
current mood or occasion. With this system in place, the user can simply
choose a mood and the system can return the list of all songs belonging to
that mood. The user then selects the songs he wishes to listen to from this
subset, which is very small compared to the whole collection; with traditional
techniques, the user browses the song list by song name, album or artist and
then searches for a song that matches the mood. The system can also find
application in recommender systems, recommending songs that match the mood
alongside other traditional parameters, which can give noticeably better
results.
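As an illustration of the workflow described above (the song records and the helper function here are hypothetical, not part of the system), mood-based selection shrinks the user's search space to a small subset:

```python
# Hypothetical song library annotated with moods predicted by the classifier
library = [
    {"title": "Song A", "artist": "X", "mood": "happy"},
    {"title": "Song B", "artist": "Y", "mood": "sad"},
    {"title": "Song C", "artist": "X", "mood": "happy"},
]

def songs_for_mood(library, mood):
    # The user picks a mood; the system returns only the matching subset
    return [song["title"] for song in library if song["mood"] == mood]

print(songs_for_mood(library, "happy"))  # ['Song A', 'Song C']
```

The user browses only the returned subset instead of scanning the full library by name, album or artist.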

8.3 Intelligent Automatic Music Composition

Music in today's world is created and composed by highly skilled and trained
musicians. With increasing innovations in technology, many software tools and
devices have also proved beneficial in assisting musicians, easing the effort
required to compose music from various instruments and singers and to merge or
process it. A lot of research is going on around the world with the aim of
building a system that can compose music automatically and intelligently
enough to sound as interesting as human compositions. Building such an
application will require not only a great deal of music signal processing,
pattern recognition and matching but also a great deal of information and
data about the music in order to produce a novel composition. The mood of
music pieces can form one of the important parameters in searching for music
pieces to be put together to generate new music. Our system can help at this
stage by automatically recognizing and annotating music pieces.


Chapter 9

Conclusion and Future Work

9.1 Conclusion

We successfully experimented with the task of mapping audio features of Indian
Popular Music to the respective moods, with top performance ranging between
75% and 81% in terms of F-measure and between 70% and 75% in terms of
precision. The best accuracy with respect to area under ROC was observed in
the range 0.91 to 0.94, which seems quite satisfactory. The Bagging of Random
Forest approach thus performed much better not just than other decision-tree
based algorithms but than other classification algorithms as well. This was a
new observation in the analysis of Indian popular music, unlike western music,
where SVM and neural network algorithms dominated classifier accuracy. The
classification performance achieved seems satisfactory so far, making the
system useful in real applications. The open-source framework developed as a
part of the project also serves as a common framework for music data mining
analysis in terms of an end-to-end solution. Although the current approach has
produced satisfactory results, we consider this just a first step in exploring
Indian popular music, and it opens avenues for further research and
development in this area to bring more efficient results.

9.2 Future Work

The path forward involves a further cycle of experimentation and refinement of
the audio features and, if required, the mood categories, so as to enrich the
dataset, in addition to an increased number and variety of songs from which to
extract further valuable information for mood learning. During this
development cycle the mood model itself is also likely to undergo some changes
to best suit Indian song scenarios. The current system is capable of
recognizing the mood of songs of 30-second duration. This can further be
extended to derive the mood of an entire song by collectively recognizing and
weighing the moods recognized for the 30-second trimmed clips of the song. In
future, this system can be extended to other genres of Indian music, such as
Hindustani classical and Carnatic music, with changes to the audio features
and classification techniques. Customization of this system to non-Indian
songs cannot be ruled out either, after thorough experimentation. Since some
of the moods represented by Indian popular music are strongly governed by
expressions, which are very well conveyed through lyrics, lyrics analysis in
combination with audio features can make the system much stronger, with
better accuracy.
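One simple way to realize the clip-level aggregation suggested above (illustrative, not a method prescribed by this work) is a weighted majority vote over the moods predicted for each 30-second segment:

```python
from collections import Counter

def song_mood(clip_moods, weights=None):
    # Weighted vote over per-clip predictions; uniform weights by default
    weights = weights or [1] * len(clip_moods)
    votes = Counter()
    for mood, w in zip(clip_moods, weights):
        votes[mood] += w
    return votes.most_common(1)[0][0]

# Moods predicted for six consecutive 30-second clips of one song
print(song_mood(["happy", "happy", "romantic", "happy", "silent", "happy"]))
# happy
```

The weights could, for instance, favor clips where the classifier was more confident, or down-weight intros and outros.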


Chapter 10

Project Milestones

10.1 Project Schedule

Table 10.1 outlines the project schedule and the major milestones achieved on
the way to completion of the project. The project stayed on schedule and was
completed in the time planned with respect to the scope assigned for the
project.

Table 10.1: Weekly Schedule of Project Starting 1 July, 2011

Week      Task                                           Status
1 to 4    Problem Statement Identification               Completed
5 to 6    Problem Statement Finalization                 Completed
7         Project Synopsis, Literature Survey - MIR      Completed
8 to 9    Literature Survey - MIR and Music Analysis     Completed
10 to 11  Literature Survey - Mood Classification        Completed
12        Literature Survey - Audio Features             Completed
13 to 14  Literature Survey - Feature Extraction Tools   Completed
15 to 16  Literature Survey - Data Mining for Music      Completed
17        Data Collection and Preprocessing              Completed
18 to 19  Detailed System Design, Data Processing        Completed
20 to 23  Existing mining algorithms training            Completed
24        Algorithm performance analysis                 Completed
25 to 26  Algorithm refinement                           Completed
27 to 29  Feature selection refinement                   Completed
30 to 31  Mood Model refinement                          Completed
32        Feature and Mood model finalization            Completed
33        Data collection and Dataset re-structuring     Completed
34 to 35  Re-analysis and evaluation of model learnt     Completed
36 to 40  Code integration and testing                   Completed
41 to 44  Bug solving and fixing                         Completed


10.2 Publications’ status

Right from the conception of the project till its completion for the said
scope, the project has been through various stages wherein we have received
very good responses, suggestions and criticism regarding our work as presented
through our research papers. A few of our papers, which have been reviewed,
accepted and appreciated by notable international conferences, are listed
below.

Table 10.2: Paper publications’ status

Title: Automatic Mood Classification Model for Indian Popular Music
Conference: Asia Modelling Symposium 2012; proceedings to be published in the
IEEE Computer Society Digital Library (CSDL) and IEEE Xplore,
http://ams2012.info
Status: Published

Title: Mood Based Classification of Indian Popular Music
Conference: CUBE 2012; proceedings to be published by ACM,
http://www.thecubeconf.com/academic/
Status: Accepted

Title: Music Mood Identification: A Data Mining Approach
Conference: 5th International Conference on Psychology of Music and Mental
Health, Bangalore, http://www.nada.in/
Status: Accepted
