GRAPHICAL VISUALIZATION
OF MUSICAL EMOTIONS
Presented by:
Pranay Prasoon
MT/SC/10002/2012
M. TECH. SCIENTIFIC COMPUTING
Under the guidance of:
Dr. Saubhik Chakraborty
Associate Professor, Dept. of Applied Mathematics
Hindustani Classical Music (ICM)
It is rich in both emotional and musical content.
There are seven notes in ICM: Sa, Re, Ga, Ma, Pa, Dha, Ni.
The basis of Indian classical music is the raga.
RAGA
A raga is simply a group of notes.
Each raga evokes a certain mood/emotion.
Different sequences of notes represent different ragas.
Examples:
Bageshri (sad): ni sa ga ma dha ni sa
Bhupali (happy): sa re ga pa dha sa
Music's relation to mathematics
Mathematics is "the basis of sound", and sound is the
basis of musical aspects.
Some basic terms with which we can relate music to
mathematics:
1. Sound
Music is sound that is organized in a meaningful way
with rhythm, melody, and harmony. These are
considered the three dimensions of music.
Sound is a form of energy.
Music and mathematics (contd.)
2. Frequency
The number of times a sound wave completes a cycle of
oscillation in one second is called its frequency. Frequency is
measured in cycles per second, or Hertz (Hz).
The higher the frequency, the higher the pitch, and vice
versa.
3. Amplitude
Amplitude is the size of the vibration, and this determines
how loud the sound is. It is measured in decibels; the range for
the human ear is about 2-130 dB.
Music and mathematics (contd.)
4. Pitch Scale
In Indian classical music, each note's pitch value depends
on the previous note's pitch value.
S = 1           P = 1.5
R = 1.125       D = 1.6875
G = 1.265625    N = 1.8984375
M = 1.4238281
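The ratio pattern in this table can be checked with a short Python sketch: each note is 9/8 times the previous one, with Pa anchored at the perfect fifth 3/2 (this scheme is inferred from the values above; it is not stated explicitly on the slide).

# Each note is 9/8 times the previous one; Pa is anchored at the perfect
# fifth 3/2 (scheme inferred from the table, not stated on the slide).
ratios = {"Sa": 1.0}
prev = 1.0
for note in ("Re", "Ga", "Ma"):
    prev *= 9 / 8
    ratios[note] = prev
prev = 3 / 2          # Pa: perfect fifth above Sa
ratios["Pa"] = prev
for note in ("Dha", "Ni"):
    prev *= 9 / 8
    ratios[note] = prev

for note, r in ratios.items():
    print(f"{note}: {r:.7f}")
# Sa 1.0, Re 1.125, Ga 1.265625, Ma 1.4238281, Pa 1.5, Dha 1.6875, Ni 1.8984375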
Music and mathematics (contd.)
5. Time
Tempo: the speed of the beats.
Rhythm: the pattern of relative time durations of the notes.
Literature survey
We carefully studied papers from international publishers to understand
the concepts of Indian classical music, to study its ragas in depth, and to
cover neural network approaches in music, pattern recognition, emotion
recognition, and features of musical clips.
Soltani, K. and Ainon, R.N. (2007) and Yongjin Wang and Ling Guan (2008)
discuss emotion recognition in speech signals in depth.
Coutinho, E. & Cangelosi, A. (2010): In this book the authors present
a model for predicting human emotion while listening to
music, combining both psychoacoustic and physiological features for
the prediction of emotion.
Keshi Dai, Harriet J. Fell, and Joel MacAuslan (2012): The authors show
the effective use of audio features to recognize emotion, testing different
combinations of emotions and reporting the accuracy achieved for each
combination.
Literature survey (contd.)
Zhen-Guo Che, Tzu-An Chiang and Zhen-Hua Che (2010): In
this paper the authors discuss the advantages and
characteristics of the genetic algorithm and of back-propagation
learning for training a feed-forward neural network to cope
with the weight-adjustment problem.
We found that very few models have been developed for raga
identification in Indian classical music.
From our survey we find that neural networks give high
identification accuracy when used with multiple features.
Characterization of the Problem
Our aim is to better explain and explore the relationship between musical features and emotion.
The main problem initially identified was to extract musical features for each audio file.
Another problem was to develop a model for recognition of emotion from the musical clips.
Finally, to graphically visualize the performance of the recognition process.
Objectives of the work
1. The main objective of the work is to shed light on emotion recognition in Indian classical music.
2. To understand the dependence of the recognition process on the musical features.
3. To understand the ANN concept for the recognition process.
4. To analyze the error in each recognition process.
5. To visualize the performance of the model (for training, validation and testing).
Research methodology
Ground-truth data, together with their features, drive the recognition
process with an ANN.
Artificial neural network: ANNs are computational models
inspired by the brain's nervous system, capable of
machine learning as well as pattern recognition.
An ANN, fed with the features of the audio clips, is used for
emotion classification.
Model Formulation
Data Selection
Serial No.  Target emotion  Ragas in audio clips (number of clips)           Total
1           Happy           Bhupali (20), Bihag (28), Desh (30), Marwa (20)  98
2           Sad             Bageshree (24), Bhairavi (19), Bhimpalashi (20),
                            Deskar (15), Todi (20)                           98
Total       2 emotions      9 ragas                                          196
Pre-processing
Manually divide all samples into two parts, labelled happy and
sad.
Convert the dataset to standard WAV format at 44100 Hz.
Each audio clip we take is 30 seconds long.
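A minimal pre-processing sketch, assuming Python with librosa and soundfile as stand-ins for the authors' MATLAB workflow; the file names are hypothetical.

import librosa
import soundfile as sf

def preprocess(path, out_path, sr=44100, duration=30.0):
    # Load the clip resampled to 44100 Hz, mono, truncated to 30 seconds.
    y, sr = librosa.load(path, sr=sr, mono=True, duration=duration)
    sf.write(out_path, y, sr)   # write back as a standard WAV file
    return y, sr

y, sr = preprocess("raga_bhupali_01.mp3", "raga_bhupali_01.wav")   # hypothetical files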
Feature selection
Feature extraction involves analysis of the audio signal.
We extracted 13 features for our work (see the sketch after this list):
1. Roll-off
2. Spread
3. Zero-crossing rate
4. Centroid
5. RMS energy
6. Low energy
7. Event density
8. Pulse clarity
9. Mode
10. Entropy
11. Brightness
12. Probability of increment of two successive pitches
13. Probability of decrement of two successive pitches
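The authors extracted the features with the MATLAB MIRtoolbox [10]. As an illustrative sketch only, a few of the spectral features have rough librosa equivalents in Python (mapping "spread" to spectral bandwidth is our assumption; the example clip is a stand-in).

import numpy as np
import librosa

y, sr = librosa.load(librosa.example("trumpet"), sr=44100)   # stand-in clip

features = {
    "rolloff":   np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr)),
    "spread":    np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr)),  # assumed analog
    "zerocross": np.mean(librosa.feature.zero_crossing_rate(y)),
    "centroid":  np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)),
    "rms":       np.mean(librosa.feature.rms(y=y)),
}
print(features)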
Root Mean Square Energy
The energy of a signal x can be computed simply by taking the root of the average
of the square of the amplitude, called the root mean square (RMS):
Formula: x_RMS = sqrt( (1/N) Σ(n=1..N) x(n)^2 )
Happy-labelled songs have more RMS energy than sad-labelled
songs.
Root Mean Square Energy (contd.)
S.N Happy Sad
1 .20566 .095603
2 .16613 .07754
Low energy
It is defined as the percentage of analysis windows that have
less RMS energy than the average RMS energy across the
texture window.
For example, vocal music with silences will have a large low-energy
value, while continuous strings will have a small low-energy value.
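A direct reading of this definition as a minimal numpy/librosa sketch (the frame and hop lengths are assumed defaults, not the authors' settings).

import numpy as np
import librosa

def low_energy_rate(y, frame_length=2048, hop_length=512):
    # RMS energy per analysis frame, then the fraction of frames below the mean.
    frame_rms = librosa.feature.rms(y=y, frame_length=frame_length,
                                    hop_length=hop_length)[0]
    return np.mean(frame_rms < frame_rms.mean())   # value in [0, 1]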
Low energy (contd.)
S.N HAPPY S.N SAD
1 .341212 1 .51122
2 .59215 2 .51104
3 .46183 3 .48924
4 .48755 4 .51524
Entropy
Entropy relates to the emotion of surprise.
If the probability of an event occurring is low, then it is more
unexpected when it does occur, and the element of surprise is
greater.
The entropy measure is based on the Shannon equation:
H = - Σi pi log2(pi)
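One plausible implementation of this measure, assuming it is computed as the Shannon entropy of the normalized magnitude spectrum (MIRtoolbox's entropy feature works on a normalized spectral distribution in a similar spirit; the FFT size is our assumption).

import numpy as np

def spectral_entropy(y, n_fft=2048):
    mag = np.abs(np.fft.rfft(y, n=n_fft))   # magnitude spectrum
    p = mag / mag.sum()                     # treat it as a probability distribution
    p = p[p > 0]                            # avoid log(0)
    h = -np.sum(p * np.log2(p))             # Shannon entropy in bits
    return h / np.log2(len(mag))            # normalize to [0, 1]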
Entropy (contd.)
S.N Happy S.N sad
1 .85721 1 .83682
2 .85596 2 .83643
3 .85319 3 .83185
4 .85982 4 .83892
Zero-Crossing Rate
The zero-crossing rate is the rate of sign changes along
a signal:
zcr = (1/(T-1)) Σ(t=1..T-1) 1{ s_t · s_(t-1) < 0 }
where s is a signal of length T and the indicator function
1{·} is 1 if its argument is true and 0 otherwise.
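The indicator-function definition translates directly into a few lines of numpy. Here the rate is expressed as sign changes per second, an assumption about the units chosen to match the magnitude of the values in the next table.

import numpy as np

def zcr_per_second(s, sr):
    # Count sign changes between successive samples, then divide by duration.
    crossings = np.sum(np.signbit(s[1:]) != np.signbit(s[:-1]))
    return crossings / (len(s) / sr)   # sign changes per second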
Zero-Crossing Rate (contd.)
S.N Happy Sad
1 789.1929 878.5048
2 736.8308 873.3451
3 945.9796 776.6267
4 1045.694 855.9022
Pitch
Pitch is a perceptual property that allows the ordering
of sounds on a frequency-related scale.
Pitches are compared as "higher" and "lower".
Pitches are usually quantified as frequencies in cycles per
second.
From the pitch contour we derive the two landmark
features of our work: the probability of increment
between two successive pitches and the probability of decrement
between two successive pitches.
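A sketch of the two landmark features, assuming the pitch contour comes from a frame-wise pitch tracker and the probabilities are the fractions of upward and downward steps between successive frames (the tracker and its frequency range are our assumptions, not the authors' exact setup).

import numpy as np
import librosa

def pitch_step_probabilities(y, sr):
    # Frame-wise fundamental-frequency contour (assumed range C2-C7).
    f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"),
                     fmax=librosa.note_to_hz("C7"), sr=sr)
    steps = np.diff(f0)
    p_inc = np.mean(steps > 0)   # probability of increment between successive pitches
    p_dec = np.mean(steps < 0)   # probability of decrement between successive pitches
    return p_inc, p_dec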
Pitch (contd.)
S.N  Happy_Prob_inc  Happy_Prob_dec  Sad_Prob_inc  Sad_Prob_dec
1    0.4403          0.4283          0.4137        0.3673
2    0.4295          0.4815          0.4133        0.391
3    0.406           0.4233          0.3317        0.3627
4    0.4147          0.4487          0.3881        0.3848
Event Density
It estimates the number of note onsets per second (see the sketch after the table below).
S.N Happy Sad
1 2.3038 1.5359
2 2.1369 2.0367
3 3.3723 1.5359
4 2.6711 0.70117
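A sketch of event density using an onset detector as a stand-in for MIRtoolbox's event-density function (the detector and its defaults are assumptions).

import librosa

def event_density(y, sr):
    # Detected note onsets per second of audio.
    onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    return len(onset_times) / (len(y) / sr)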
Centroid
The centroid is defined as the center of gravity of the
spectrum.
It is calculated as the mean of the frequencies present in the
signal, with their magnitudes as the weights:
centroid = Σn f(n) x(n) / Σn x(n)
where x(n) represents the weighted frequency value (the magnitude
of bin n), and f(n) represents the centre frequency of that bin.
Centroid (contd.)
S.N  Happy      Sad
1    1907.3599  1857.3502
2    2353.4706  1936.1145
3    2307.5883  1748.8305
4    1995.0818  1756.3829
Mode
It estimates the modality, i.e. major vs. minor.
Mode returns a value between -1 and +1.
The closer the value is to +1, the more major the given excerpt is
predicted to be; the closer the value is to -1, the more minor
the excerpt might be.
Mode (contd.)
S.N  Happy     Sad
1    -0.18676  0.14946
2    -0.06964  -0.0049425
3    -0.16293  -0.051353
4    -0.14412  0.061022
Pulse clarity
Pulse clarity is considered a high-level musical dimension
that conveys how easily, in a given musical piece, listeners can
perceive the underlying rhythmic pulse.
S.N Happy S.N Sad
1 .40913 1 .11393
2 .30407 2 .31914
3 .18223 3 .18962
4 .14884 4 .18083
Roll-off
Roll-off is the steepness of a transmission
function with frequency.
The roll-off refers to the rate at which a filter attenuates the
input frequency beyond the cut-off frequency point.
As an audio feature, the roll-off value is the frequency below which a
fixed percentage (typically 85%) of the spectral energy is concentrated,
which is why the values in the table below are in Hz.
S.N Happy S.N Sad
1 4091.8974 1 3543.9783
2 4236.4054 2 3168.4931
3 4525.5054 3 4038.4008
4 4271.0604 4 4007.0263
Classification
Classify the data into two categories.
As input we take the list of features.
Classifier used: ANN
Three processes:
1. Training 2. Validation 3. Testing
Artificial Neural Network
It is a computational model inspired by the human nervous system.
An ANN is generally presented as interconnected neurons, which can compute values from inputs.
An ANN can be defined based on three characteristics:
1. Architecture 2. Learning mechanism 3. Activation function
Every such system basically has 3 layers:
1. Input 2. Hidden layer 3. Output
ANN (contd.)
Architecture:
Directed graph: each edge is assigned an orientation.
Classification using: multilayer feed-forward NN.
Learning method: supervised learning.
Algorithm used: back-propagation.
ANN (contd.)
Steps of the back-propagation algorithm:
1. Normalize all input values to lie between 0 and 1.
2. Number of hidden nodes = (number of input nodes × number of output nodes) / 2
3. [V] = weights between input and hidden nodes;
[W] = weights between hidden and output nodes
(weights are initialized to random values between -1 and +1).
4. Input and output of the input layer:
{O}I = {I}I
ANN (contd.)
The input to the hidden layer is computed by multiplying the input
values by their corresponding weights:
{I}H = [V] {O}I
The output of the hidden layer is computed using the sigmoidal function:
{O}Hi = 1 / (1 + e^(-{I}Hi))
ANN (contd.)
The input to the output layer is computed by multiplying the hidden-layer
outputs by their corresponding weights:
{I}O = [W] {O}H
The output of the output layer is calculated as:
{O}Oi = 1 / (1 + e^(-{I}Oi))
ANN (contd.)
The error can be calculated as:
E = (1/2) Σj (Tj - {O}Oj)^2,  with Tj the target output
{d}, the local gradient at each output node, is calculated as:
dj = (Tj - {O}Oj) {O}Oj (1 - {O}Oj)
ANN (contd.)
The [Y] matrix is calculated as the outer product of the hidden outputs
and the output gradients:
[Y] = {O}H × {d}
Change in weights (with learning rate η and momentum α):
[ΔW]^(t+1) = α [ΔW]^t + η [Y]
ANN (contd.)
The error at the hidden layer is:
{e} = [W] {d}
and the new local gradient {d*} is:
d*i = ei {O}Hi (1 - {O}Hi)
Calculate the [X] matrix (outer product):
[X] = {O}I × {d*}
ANN (contd.)
Change in weights of the input-hidden layer:
[ΔV]^(t+1) = α [ΔV]^t + η [X]
Updated weights for the next training step:
[V]^(t+1) = [V]^t + [ΔV]^(t+1),   [W]^(t+1) = [W]^t + [ΔW]^(t+1)
The process is repeated until the error becomes very small (a sketch of these steps follows).
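The steps above condense into a short numpy sketch. This is a minimal illustration of the listed update rules, not the authors' MATLAB implementation; the toy AND data and the values of the learning rate eta and the momentum alpha are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, T, eta=0.5, alpha=0.8, epochs=10000, tol=1e-3):
    n_in, n_out = X.shape[1], T.shape[1]
    n_hid = max(1, (n_in * n_out) // 2)        # step 2: (inputs x outputs) / 2
    V = rng.uniform(-1, 1, (n_in, n_hid))      # input-to-hidden weights
    W = rng.uniform(-1, 1, (n_hid, n_out))     # hidden-to-output weights
    dV, dW = np.zeros_like(V), np.zeros_like(W)
    for _ in range(epochs):
        O_h = sigmoid(X @ V)                   # output of hidden layer
        O_o = sigmoid(O_h @ W)                 # output of output layer
        err = T - O_o
        if 0.5 * np.sum(err ** 2) < tol:       # stop once the error is very small
            break
        d = err * O_o * (1 - O_o)              # local gradient at output nodes
        e = d @ W.T                            # error propagated back to hidden layer
        d_star = e * O_h * (1 - O_h)           # local gradient at hidden nodes
        dW = alpha * dW + eta * (O_h.T @ d)    # [Y] = {O}H x {d}, with momentum
        dV = alpha * dV + eta * (X.T @ d_star) # [X] = {O}I x {d*}, with momentum
        W += dW
        V += dV
    return V, W

# Toy usage on a linearly separable task (logical AND), inputs already in [0, 1];
# the outputs should converge to approximately [0, 0, 0, 1].
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [0], [0], [1]], dtype=float)
V, W = train(X, T)
print(np.round(sigmoid(sigmoid(X @ V) @ W), 2))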
RESULTS and STUDY
Training data: 70% of the total (138)
Validation data: 15% of the total (39)
Testing data: 15% of the total (39)
Recognition of two emotional states
1st experiment:
13 features taken as input.
Testing data: 39 clips
Correctly classified as happy: 15 out of 17
Correctly classified as sad: 21 out of 22
Confusion matrix (rows: input class, columns: predicted output):
Input \ Output   Happy   Sad
Happy            15      2
Sad              1       21
Error histogram and performance graphs (training, validation, testing) [figures omitted].
Performance analysis
Performance over 10 training and testing runs; the model's accuracy shows
little variation across runs.
Test number   Accuracy with landmark features (%)   Accuracy without landmark features (%)
1.            96.687                                62.646
2.            96.947                                73.454
3.            98.159                                65.09
4.            97.31                                 62.112
5.            98.054                                63.878
6.            96.004                                63.003
7.            96.808                                68.433
8.            98.691                                65.06
9.            82.258                                70.112
10.           97.243                                64.123
Conclusion
We proposed a new model for automatic recognition of musical emotion
based on an artificial neural network.
We used a multilayer feed-forward neural network for classification, and the
algorithm used is back-propagation.
A total of 13 features were extracted from each audio sample. We proposed two
new features (the probability of increment and the probability of decrement
between two successive pitches). The classification process was
carried out using the Neural Network Toolbox in MATLAB.
A total of 10 experiments were done, and in all but one of them the recognition
accuracy was more than 90%. When we train the model without our landmark
features, the accuracy drops by about 30 percentage points. The average accuracy
achieved by our model is 95.8161%.

Features       Without landmark features   With landmark features
Accuracy (%)   65.7911                     95.8161
Scope
This work is useful for those who have
difficulty understanding ragas in Indian classical
music.
The study will be helpful in psychology for
studying the changes in the brain while listening to Indian
classical music.
Our study is also useful in medical science.
Future work
We plan to develop an automatic emotion recognizer for
Indian classical music with more emotion categories, for those
people who have difficulty understanding and identifying
emotion in Indian classical music.
We plan to develop a model in which we will add some
physiological features such as heart rate, skin temperature
and brain signals. We expect that including physiological
features will further increase the accuracy of the system.
Publication
P. Prasoon and S. Chakraborty, “Raga Analysis
using Artificial Neural Network” - Communicated to
Computational Music Science (Book Series), Springer
as a research monograph.
References
[1] A. Srinivasan (2011). "Speech Recognition Using Hidden Markov Model". Applied Mathematical Sciences, Vol. 5, no. 79, pp. 3943-3948.
[2] Björn Schuller, Manfred Lang, Gerhard Rigoll (2002). "Multimodal Emotion Recognition in Audiovisual Communication". Proc. ICME 2002, 3rd International Conference on Multimedia and Expo, IEEE, vol. 1, pp. 745-748, Lausanne, Switzerland.
[3] Coutinho, E. & Cangelosi, A. (2010). "A Neural Network Model for the Prediction of Musical Emotions". In S. Nefti-Meziani & J.G. Grey (Eds.), Advances in Cognitive Systems (pp. 331-368). London: IET Publisher. ISBN: 978-1849190756.
References (contd.)
[4] Daniela Willimek and Bernd Willimek (2013). Music and Emotions: Research on the Theory of Musical Equilibration (die Strebetendenz-Theorie).
[5] Derya Ozkan, Stefan Scherer and Louis-Philippe Morency (2013). "Step-wise emotion recognition using concatenated-HMM". IEEE Transactions on Multimedia 15(2): 326-338.
[6] Gaurav Pandey, Chaitanya Mishra and Paul Ipe (2003). "TANSEN: A System for Automatic Raga Identification". Indian International Conference on AI, pp. 1350-1363.
[7] Jack H. David Jr. (1995). "The Mathematics of Music". Spring, Math 1513.5097.
References (contd.)
[8] Keshi Dai, Harriet J. Fell, and Joel MacAuslan (2012). "Recognizing emotion in speech using neural networks".
[9] Mohammad Abd-Alrahman Mahmoud Abushariah, Raja Noor Ainon, Roziati Zainuddin, Moustafa Elshafei, Othman Omran Khalifa (2012). "Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus". Int. Arab J. Inf. Technol. 9(1): 84-93.
[10] O. Lartillot and P. Toiviainen (2007). "A Matlab toolbox for musical feature extraction from audio". In Proc. Digital Audio Effects (DAFx-07), Bordeaux, France, Sep. 10-15, 2007.
[11] Sandeep Bagchee (1998). Nad: Understanding Raga Music. Eshwar, 1st edition. ISBN-13: 978-8186982075.
References (contd.)
[12] www.wikipedia.org
[13] www.paragchordia.com
[14] www.swarganga.org
[15] www.mathworks.in
[16] www.shadjamadhyam.com
[17] www.22shruti.com
[18] www.knowyourraga.com
[19]www.skeptic.skepticgeek.com
References (contd.)
[20] Yading Song, Simon Dixon, Marcus Pearce (2012). "Evaluation of Musical Features for Emotion Classification". 13th International Society for Music Information Retrieval Conference (ISMIR).
[21] Yongjin Wang, Ling Guan (2008). "Recognizing Human Emotional State From Audiovisual Signals". IEEE Transactions on Multimedia 10(4): 659-668.
[22] Zhen-Guo Che, Tzu-An Chiang and Zhen-Hua Che (2010). "Feed forward neural networks training: A comparison between genetic algorithm and back propagation learning algorithm". International Journal of Innovative Computing, Information and Control, volume 7.
Thank you.....