GRAPHICAL VISUALIZATION
OF MUSICAL EMOTIONS
Presented by:
Pranay Prasoon
MT/SC/10002/2012
M. TECH. SCIENTIFIC COMPUTING
Under the guidance of:
Dr. Saubhik Chakraborty
Associate Professor, Dept. of Applied Mathematics
Hindustani Classical Music (ICM)
It is rich in both emotional and musical content.
There are seven notes in ICM: Sa, Re, Ga, Ma, Pa, Dha, Ni.
The basis of Indian classical music is the raga.
RAGA
A raga is simply a group of notes.
Each raga evokes a certain mood/emotion.
Different sequences of notes represent different ragas.
Examples:
Bageshri (sad): ni sa ga ma dha ni sa
Bhupali (happy): sa re ga pa dha sa
Music's relation to mathematics
Mathematics is "the basis of sound", and sound is the
basis of musical aspects.
Some basic terms with which we can relate music to
mathematics:
1. Sound
Music is sound that is organized in a meaningful way
with rhythm, melody, and harmony. These are
considered the three dimensions of music.
Sound is a form of energy.
Music and mathematics (contd.)
2. Frequency
The number of times a sound wave completes a cycle of
oscillation in one second is called its frequency. Frequency is
measured in cycles per second, or Hertz (Hz).
The higher the frequency, the higher the pitch, and vice
versa.
3. Amplitude
Amplitude is the size of the vibration, and this determines
how loud the sound is. It is measured in decibels; the range for
the human ear is about 2-130 dB.
Music and mathematics (contd.)
4. Pitch Scale
In Indian classical music, each note's pitch value depends
on the previous note's pitch value.
S = 1           P = 1.5
R = 1.125       D = 1.6875
G = 1.265625    N = 1.8984375
M = 1.4238281
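The ratio pattern in this table can be checked with a short Python sketch: each note is 9/8 times the previous one, with Pa anchored at the perfect fifth 3/2 (this scheme is inferred from the values above; it is not stated explicitly on the slide).

# Each note is 9/8 times the previous one; Pa is anchored at the perfect
# fifth 3/2 (scheme inferred from the table, not stated on the slide).
ratios = {"Sa": 1.0}
prev = 1.0
for note in ("Re", "Ga", "Ma"):
    prev *= 9 / 8
    ratios[note] = prev
prev = 3 / 2          # Pa: perfect fifth above Sa
ratios["Pa"] = prev
for note in ("Dha", "Ni"):
    prev *= 9 / 8
    ratios[note] = prev

for note, r in ratios.items():
    print(f"{note}: {r:.7f}")
# Sa 1.0, Re 1.125, Ga 1.265625, Ma 1.4238281, Pa 1.5, Dha 1.6875, Ni 1.8984375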
Music and mathematics (contd.)
5. Time
Tempo: the speed of the beats.
Rhythm: the pattern of relative time durations of the notes.
Literature survey
We carefully studied papers from international publishers to understand
the concepts of Indian classical music, to study its ragas in depth, and to
cover neural network approaches in music, pattern recognition, emotion
recognition, and features of musical clips.
Soltani, K. and Ainon, R.N. (2007) and Yongjin Wang and Ling Guan (2008)
discuss emotion recognition in speech signals in depth.
Coutinho, E. & Cangelosi, A. (2010): In this book the authors present
a model for predicting human emotion while listening to
music, combining both psychoacoustic and physiological features for
the prediction of emotion.
Keshi Dai, Harriet J. Fell, and Joel MacAuslan (2012): The authors show
the effective use of audio features to recognize emotion, testing different
combinations of emotions and reporting the accuracy achieved for each
combination.
Literature survey (contd.)
Zhen-Guo Che, Tzu-An Chiang and Zhen-Hua Che (2010): In
this paper the authors discuss the advantages and
characteristics of the genetic algorithm and of back-propagation
learning for training a feed-forward neural network to cope
with the weight-adjustment problem.
We found that very few models have been developed for raga
identification in Indian classical music.
From our survey we find that neural networks give high
identification accuracy when used with multiple features.
Characterization of the Problem
Our aim is to better explain and explore the relationship between musical features and emotion.
The main problem initially identified was to extract musical features for each audio file.
Another problem was to develop a model for recognition of emotion from the musical clips.
Finally, to graphically visualize the performance of the recognition process.
Objectives of the work
1. The main objective of the work is to shed light on emotion recognition in Indian classical music.
2. To understand the dependence of the recognition process on the musical features.
3. To understand the ANN concept for the recognition process.
4. To analyze the error in each recognition process.
5. To visualize the performance of the model (for training, validation and testing).
Research methodology
Ground-truth data, together with their features, drive the recognition
process with an ANN.
Artificial neural network: ANNs are computational models
inspired by the brain's nervous system, capable of
machine learning as well as pattern recognition.
An ANN, fed with the features of the audio clips, is used for
emotion classification.
Model Formulation
Data Selection
Serial No.  Target emotion  Ragas in audio clips (number of clips)           Total
1           Happy           Bhupali (20), Bihag (28), Desh (30), Marwa (20)  98
2           Sad             Bageshree (24), Bhairavi (19), Bhimpalashi (20),
                            Deskar (15), Todi (20)                           98
Total       2 emotions      9 ragas                                          196
Pre-processing
Manually divide all samples into two parts, labelled happy and
sad.
Convert the dataset to standard WAV format at 44100 Hz.
Each audio clip we take is 30 seconds long.
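A minimal pre-processing sketch, assuming Python with librosa and soundfile as stand-ins for the authors' MATLAB workflow; the file names are hypothetical.

import librosa
import soundfile as sf

def preprocess(path, out_path, sr=44100, duration=30.0):
    # Load the clip resampled to 44100 Hz, mono, truncated to 30 seconds.
    y, sr = librosa.load(path, sr=sr, mono=True, duration=duration)
    sf.write(out_path, y, sr)   # write back as a standard WAV file
    return y, sr

y, sr = preprocess("raga_bhupali_01.mp3", "raga_bhupali_01.wav")   # hypothetical files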
Feature selection
Feature extraction involves analysis of the audio signal.
We extracted 13 features for our work (see the sketch after this list):
1. Roll-off
2. Spread
3. Zero-crossing rate
4. Centroid
5. RMS energy
6. Low energy
7. Event density
8. Pulse clarity
9. Mode
10. Entropy
11. Brightness
12. Probability of increment of two successive pitches
13. Probability of decrement of two successive pitches
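The authors extracted the features with the MATLAB MIRtoolbox [10]. As an illustrative sketch only, a few of the spectral features have rough librosa equivalents in Python (mapping "spread" to spectral bandwidth is our assumption; the example clip is a stand-in).

import numpy as np
import librosa

y, sr = librosa.load(librosa.example("trumpet"), sr=44100)   # stand-in clip

features = {
    "rolloff":   np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr)),
    "spread":    np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr)),  # assumed analog
    "zerocross": np.mean(librosa.feature.zero_crossing_rate(y)),
    "centroid":  np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)),
    "rms":       np.mean(librosa.feature.rms(y=y)),
}
print(features)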
Root Mean Square Energy
The energy of a signal x can be computed simply by taking the root of the average
of the square of the amplitude, called the root mean square (RMS):
Formula: x_RMS = sqrt( (1/N) Σ(n=1..N) x(n)^2 )
Happy-labelled songs have more RMS energy than sad-labelled
songs.
Root Mean Square Energy (contd.)
S.N Happy Sad
1 .20566 .095603
2 .16613 .07754
Low energy
It is defined as the percentage of analysis windows that have
less RMS energy than the average RMS energy across the
texture window.
For example, vocal music with silences will have a large low-energy
value, while continuous strings will have a small low-energy value.
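A direct reading of this definition as a minimal numpy/librosa sketch (the frame and hop lengths are assumed defaults, not the authors' settings).

import numpy as np
import librosa

def low_energy_rate(y, frame_length=2048, hop_length=512):
    # RMS energy per analysis frame, then the fraction of frames below the mean.
    frame_rms = librosa.feature.rms(y=y, frame_length=frame_length,
                                    hop_length=hop_length)[0]
    return np.mean(frame_rms < frame_rms.mean())   # value in [0, 1]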
Low energy (contd.)
S.N HAPPY S.N SAD
1 .341212 1 .51122
2 .59215 2 .51104
3 .46183 3 .48924
4 .48755 4 .51524
Entropy
Entropy relates to the emotion of surprise.
If the probability of an event occurring is low, then it is more
unexpected when it does occur, and the element of surprise is
greater.
The entropy measure is based on the Shannon equation:
H = - Σi pi log2(pi)
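One plausible implementation of this measure, assuming it is computed as the Shannon entropy of the normalized magnitude spectrum (MIRtoolbox's entropy feature works on a normalized spectral distribution in a similar spirit; the FFT size is our assumption).

import numpy as np

def spectral_entropy(y, n_fft=2048):
    mag = np.abs(np.fft.rfft(y, n=n_fft))   # magnitude spectrum
    p = mag / mag.sum()                     # treat it as a probability distribution
    p = p[p > 0]                            # avoid log(0)
    h = -np.sum(p * np.log2(p))             # Shannon entropy in bits
    return h / np.log2(len(mag))            # normalize to [0, 1]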
Entropy (contd.)
S.N Happy S.N sad
1 .85721 1 .83682
2 .85596 2 .83643
3 .85319 3 .83185
4 .85982 4 .83892
Zero-Crossing Rate
The zero-crossing rate is the rate of sign changes along
a signal:
zcr = (1/(T-1)) Σ(t=1..T-1) 1{ s_t · s_(t-1) < 0 }
where s is a signal of length T and the indicator function
1{·} is 1 if its argument is true and 0 otherwise.
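The indicator-function definition translates directly into a few lines of numpy. Here the rate is expressed as sign changes per second, an assumption about the units chosen to match the magnitude of the values in the next table.

import numpy as np

def zcr_per_second(s, sr):
    # Count sign changes between successive samples, then divide by duration.
    crossings = np.sum(np.signbit(s[1:]) != np.signbit(s[:-1]))
    return crossings / (len(s) / sr)   # sign changes per second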
Zero-Crossing Rate (contd.)
S.N Happy Sad
1 789.1929 878.5048
2 736.8308 873.3451
3 945.9796 776.6267
4 1045.694 855.9022
Pitch
Pitch is a perceptual property that allows the ordering
of sounds on a frequency-related scale.
Pitches are compared as "higher" and "lower".
Pitches are usually quantified as frequencies in cycles per
second.
From the pitch contour we derive the two landmark
features of our work: the probability of increment
between two successive pitches and the probability of decrement
between two successive pitches.
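A sketch of the two landmark features, assuming the pitch contour comes from a frame-wise pitch tracker and the probabilities are the fractions of upward and downward steps between successive frames (the tracker and its frequency range are our assumptions, not the authors' exact setup).

import numpy as np
import librosa

def pitch_step_probabilities(y, sr):
    # Frame-wise fundamental-frequency contour (assumed range C2-C7).
    f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"),
                     fmax=librosa.note_to_hz("C7"), sr=sr)
    steps = np.diff(f0)
    p_inc = np.mean(steps > 0)   # probability of increment between successive pitches
    p_dec = np.mean(steps < 0)   # probability of decrement between successive pitches
    return p_inc, p_dec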
Pitch (contd.)
S.N  Happy_Prob_inc  Happy_Prob_dec  Sad_Prob_inc  Sad_Prob_dec
1    0.4403          0.4283          0.4137        0.3673
2    0.4295          0.4815          0.4133        0.391
3    0.406           0.4233          0.3317        0.3627
4    0.4147          0.4487          0.3881        0.3848
Event Density
It estimates the number of note onsets per second (see the sketch after the table below).
S.N Happy Sad
1 2.3038 1.5359
2 2.1369 2.0367
3 3.3723 1.5359
4 2.6711 0.70117
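A sketch of event density using an onset detector as a stand-in for MIRtoolbox's event-density function (the detector and its defaults are assumptions).

import librosa

def event_density(y, sr):
    # Detected note onsets per second of audio.
    onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    return len(onset_times) / (len(y) / sr)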
Centroid
The centroid is defined as the center of gravity of the
spectrum.
It is calculated as the mean of the frequencies present in the
signal, with their magnitudes as the weights:
centroid = Σn f(n) x(n) / Σn x(n)
where x(n) represents the weighted frequency value (the magnitude
of bin n), and f(n) represents the centre frequency of that bin.
Centroid (contd.)
S.N  Happy      Sad
1    1907.3599  1857.3502
2    2353.4706  1936.1145
3    2307.5883  1748.8305
4    1995.0818  1756.3829
Mode
It estimates the modality, i.e. major vs. minor.
Mode returns a value between -1 and +1.
The closer the value is to +1, the more major the given excerpt is
predicted to be; the closer the value is to -1, the more minor
the excerpt might be.
Mode (contd.)
S.N  Happy     Sad
1    -0.18676  0.14946
2    -0.06964  -0.0049425
3    -0.16293  -0.051353
4    -0.14412  0.061022
Pulse clarity
Pulse clarity is considered a high-level musical dimension
that conveys how easily, in a given musical piece, listeners can
perceive the underlying rhythmic pulse.
S.N Happy S.N Sad
1 .40913 1 .11393
2 .30407 2 .31914
3 .18223 3 .18962
4 .14884 4 .18083
Roll-off
Roll-off is the steepness of a transmission
function with frequency.
The roll-off refers to the rate at which a filter attenuates the
input frequency beyond the cut-off frequency point.
As an audio feature, the roll-off value is the frequency below which a
fixed percentage (typically 85%) of the spectral energy is concentrated,
which is why the values in the table below are in Hz.
S.N Happy S.N Sad
1 4091.8974 1 3543.9783
2 4236.4054 2 3168.4931
3 4525.5054 3 4038.4008
4 4271.0604 4 4007.0263
Classification
Classify the data into two categories.
As input we take the list of features.
Classifier used: ANN
Three processes:
1. Training 2. Validation 3. Testing
Artificial Neural Network
It is a computational model inspired by the human nervous system.
An ANN is generally presented as interconnected neurons, which can compute values from inputs.
An ANN can be defined based on three characteristics:
1. Architecture 2. Learning mechanism 3. Activation function
Every such system basically has 3 layers:
1. Input 2. Hidden layer 3. Output
ANN (contd.)
Architecture:
Directed graph: each edge is assigned an orientation.
Classification using: multilayer feed-forward NN.
Learning method: supervised learning.
Algorithm used: back-propagation.
ANN (contd.)
Steps of the back-propagation algorithm:
1. Normalize all input values to lie between 0 and 1.
2. Number of hidden nodes = (number of input nodes × number of output nodes) / 2
3. [V] = weights between input and hidden nodes;
[W] = weights between hidden and output nodes
(weights are initialized to random values between -1 and +1).
4. Input and output of the input layer:
{O}I = {I}I
ANN (contd.)
The input to the hidden layer is computed by multiplying the input
values by their corresponding weights:
{I}H = [V] {O}I
The output of the hidden layer is computed using the sigmoidal function:
{O}Hi = 1 / (1 + e^(-{I}Hi))
ANN (contd.)
The input to the output layer is computed by multiplying the hidden-layer
outputs by their corresponding weights:
{I}O = [W] {O}H
The output of the output layer is calculated as:
{O}Oi = 1 / (1 + e^(-{I}Oi))
ANN (contd.)
The error can be calculated as:
E = (1/2) Σj (Tj - {O}Oj)^2,  with Tj the target output
{d}, the local gradient at each output node, is calculated as:
dj = (Tj - {O}Oj) {O}Oj (1 - {O}Oj)
ANN (contd.)
The [Y] matrix is calculated as the outer product of the hidden outputs
and the output gradients:
[Y] = {O}H × {d}
Change in weights (with learning rate η and momentum α):
[ΔW]^(t+1) = α [ΔW]^t + η [Y]
ANN (contd.)
The error at the hidden layer is:
{e} = [W] {d}
and the new local gradient {d*} is:
d*i = ei {O}Hi (1 - {O}Hi)
Calculate the [X] matrix (outer product):
[X] = {O}I × {d*}
ANN (contd.)
Change in weights of the input-hidden layer:
[ΔV]^(t+1) = α [ΔV]^t + η [X]
Updated weights for the next training step:
[V]^(t+1) = [V]^t + [ΔV]^(t+1),   [W]^(t+1) = [W]^t + [ΔW]^(t+1)
The process is repeated until the error becomes very small (a sketch of these steps follows).
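The steps above condense into a short numpy sketch. This is a minimal illustration of the listed update rules, not the authors' MATLAB implementation; the toy AND data and the values of the learning rate eta and the momentum alpha are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, T, eta=0.5, alpha=0.8, epochs=10000, tol=1e-3):
    n_in, n_out = X.shape[1], T.shape[1]
    n_hid = max(1, (n_in * n_out) // 2)        # step 2: (inputs x outputs) / 2
    V = rng.uniform(-1, 1, (n_in, n_hid))      # input-to-hidden weights
    W = rng.uniform(-1, 1, (n_hid, n_out))     # hidden-to-output weights
    dV, dW = np.zeros_like(V), np.zeros_like(W)
    for _ in range(epochs):
        O_h = sigmoid(X @ V)                   # output of hidden layer
        O_o = sigmoid(O_h @ W)                 # output of output layer
        err = T - O_o
        if 0.5 * np.sum(err ** 2) < tol:       # stop once the error is very small
            break
        d = err * O_o * (1 - O_o)              # local gradient at output nodes
        e = d @ W.T                            # error propagated back to hidden layer
        d_star = e * O_h * (1 - O_h)           # local gradient at hidden nodes
        dW = alpha * dW + eta * (O_h.T @ d)    # [Y] = {O}H x {d}, with momentum
        dV = alpha * dV + eta * (X.T @ d_star) # [X] = {O}I x {d*}, with momentum
        W += dW
        V += dV
    return V, W

# Toy usage on a linearly separable task (logical AND), inputs already in [0, 1];
# the outputs should converge to approximately [0, 0, 0, 1].
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [0], [0], [1]], dtype=float)
V, W = train(X, T)
print(np.round(sigmoid(sigmoid(X @ V) @ W), 2))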
RESULTS and STUDY
Training data: 70% of the total (138)
Validation data: 15% of the total (39)
Testing data: 15% of the total (39)
Recognition of two emotional states
1st experiment:
13 features taken as input.
Testing data: 39 clips
Correctly classified as happy: 15 out of 17
Correctly classified as sad: 21 out of 22
Confusion matrix (rows: input class, columns: predicted output):
Input \ Output   Happy   Sad
Happy            15      2
Sad              1       21
Error histogram and performance graphs (training, validation, testing) [figures omitted].
Performance analysis
Performance over 10 training and testing runs; the model's accuracy shows
little variation across runs.
Test number   Accuracy with landmark features (%)   Accuracy without landmark features (%)
1.            96.687                                62.646
2.            96.947                                73.454
3.            98.159                                65.09
4.            97.31                                 62.112
5.            98.054                                63.878
6.            96.004                                63.003
7.            96.808                                68.433
8.            98.691                                65.06
9.            82.258                                70.112
10.           97.243                                64.123
Conclusion
We proposed a new model for automatic recognition of musical emotion
based on an artificial neural network.
We used a multilayer feed-forward neural network for classification, and the
algorithm used is back-propagation.
A total of 13 features were extracted from each audio sample. We proposed two
new features (the probability of increment and the probability of decrement
between two successive pitches). The classification process was
carried out using the Neural Network Toolbox in MATLAB.
A total of 10 experiments were done, and in all but one of them the recognition
accuracy was more than 90%. When we train the model without our landmark
features, the accuracy drops by about 30 percentage points. The average accuracy
achieved by our model is 95.8161%.

Features       Without landmark features   With landmark features
Accuracy (%)   65.7911                     95.8161
Scope
This work is useful for those who have
difficulty understanding ragas in Indian classical
music.
The study will be helpful in psychology for
studying the changes in the brain while listening to Indian
classical music.
Our study is also useful in medical science.
Future work
We plan to develop an automatic emotion recognizer for
Indian classical music with more emotion categories, for those
people who have difficulty understanding and identifying
emotion in Indian classical music.
We plan to develop a model in which we will add some
physiological features such as heart rate, skin temperature
and brain signals. We expect that including physiological
features will further increase the accuracy of the system.
Publication
P. Prasoon and S. Chakraborty, “Raga Analysis
using Artificial Neural Network” - Communicated to
Computational Music Science (Book Series), Springer
as a research monograph.
References
[1] A. Srinivasan (2011). "Speech Recognition Using Hidden Markov Model". Applied Mathematical Sciences, Vol. 5, no. 79, pp. 3943-3948.
[2] Björn Schuller, Manfred Lang, Gerhard Rigoll (2002). "Multimodal Emotion Recognition in Audiovisual Communication". Proc. ICME 2002, 3rd International Conference on Multimedia and Expo, IEEE, vol. 1, pp. 745-748, Lausanne, Switzerland.
[3] Coutinho, E. & Cangelosi, A. (2010). "A Neural Network Model for the Prediction of Musical Emotions". In S. Nefti-Meziani & J.G. Grey (Eds.), Advances in Cognitive Systems (pp. 331-368). London: IET Publisher. ISBN: 978-1849190756.
References (contd.)
[4] Daniela Willimek and Bernd Willimek (2013). Music and Emotions: Research on the Theory of Musical Equilibration (die Strebetendenz-Theorie).
[5] Derya Ozkan, Stefan Scherer and Louis-Philippe Morency (2013). "Step-wise emotion recognition using concatenated-HMM". IEEE Transactions on Multimedia 15(2): 326-338.
[6] Gaurav Pandey, Chaitanya Mishra and Paul Ipe (2003). "TANSEN: A System for Automatic Raga Identification". Indian International Conference on AI, pp. 1350-1363.
[7] Jack H. David Jr. (1995). "The Mathematics of Music". Spring, Math 1513.5097.
References (contd.)
[8] Keshi Dai, Harriet J. Fell, and Joel MacAuslan (2012). "Recognizing emotion in speech using neural networks".
[9] Mohammad Abd-Alrahman Mahmoud Abushariah, Raja Noor Ainon, Roziati Zainuddin, Moustafa Elshafei, Othman Omran Khalifa (2012). "Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus". Int. Arab J. Inf. Technol. 9(1): 84-93.
[10] O. Lartillot and P. Toiviainen (2007). "A Matlab toolbox for musical feature extraction from audio". In Proc. Digital Audio Effects (DAFx-07), Bordeaux, France, Sep. 10-15, 2007.
[11] Sandeep Bagchee (1998). Nad: Understanding Raga Music. Eshwar, 1st edition. ISBN-13: 978-8186982075.
References (contd.)
[12] www.wikipedia.org
[13] www.paragchordia.com
[14] www.swarganga.org
[15] www.mathworks.in
[16] www.shadjamadhyam.com
[17] www.22shruti.com
[18] www.knowyourraga.com
[19]www.skeptic.skepticgeek.com
References (contd.)
[20] Yading Song, Simon Dixon, Marcus Pearce (2012). "Evaluation of Musical Features for Emotion Classification". 13th International Society for Music Information Retrieval Conference (ISMIR).
[21] Yongjin Wang, Ling Guan (2008). "Recognizing Human Emotional State From Audiovisual Signals". IEEE Transactions on Multimedia 10(4): 659-668.
[22] Zhen-Guo Che, Tzu-An Chiang and Zhen-Hua Che (2010). "Feed forward neural networks training: A comparison between genetic algorithm and back propagation learning algorithm". International Journal of Innovative Computing, Information and Control, volume 7.
Thank you.....