RAGA VERIFICATION IN CARNATIC MUSIC USING ?· RAGA VERIFICATION IN CARNATIC MUSIC USING LONGEST COMMON…

  • Published on
    04-Sep-2018

  • View
    212

  • Download
    0

Transcript

  • RAGA VERIFICATION IN CARNATIC MUSIC USING LONGESTCOMMON SEGMENT SET

    Shrey DuttaDept. of Computer Sci. & Engg.Indian Institute of Technology

    Madrasshrey@cse.iitm.ac.in

    Krishnaraj Sekhar PVDept. of Computer Sci. & Engg.Indian Institute of Technology

    Madraspvkrajpv@gmail.com

    Hema A. MurthyDept. of Computer Sci. & Engg.Indian Institute of Technology

    Madrashema@cse.iitm.ac.in

    ABSTRACT

    There are at least 100 ragas that are regularly performedin Carnatic music concerts. The audience determines theidentity of ragas within a few seconds of listening to anitem. Most of the audience consists of people who are onlyavid listeners and not performers.

    In this paper, an attempt is made to mimic the listener.A raga verification framework is therefore suggested. Theraga verification system assumes that a specific raga isclaimed based on similarity of movements and motivic pat-terns. The system then checks whether this claimed raga iscorrect. For every raga, a set of cohorts are chosen. A ragaand its cohorts are represented using pallavi lines of com-positions. A novel approach for matching, called LongestCommon Segment Set (LCSS), is introduced. The LCSSscores for a raga are then normalized with respect to itscohorts in two different ways. The resulting systems anda baseline system are compared for two partitionings of adataset. A dataset of 30 ragas from Charsur Foundation 1is used for analysis. An equal error rate (EER) of 12% isobtained.

    1 IntroductionRaga identification by machine is a difficult task in Car-

    natic music. This is primarily because a raga is not definedjust by the solfege but by svaras (ornamented notes) [13].The melodic histograms obtained for the Carnatic musicare more or less continuous owing to the gamaka 2 ladensvaras of the raga [23]. Although the svaras in Carnaticmusic are not quantifiable, for notational purposes an oc-tave is divided into 12 semitones: S, R1, R2(G1), R3(G2),G3, M1, M2, P, D1, D2(N1), D3(N2) and N3. Each raga ischaracterised by atleast 5 svaras. Arohana and avarohanacorrespond to an ordering of svaras in the ascent and de-

    1 http://www.charsurartsfoundation.org2 Gamaka is a meandering of a svara encompassing other permissible

    frequencies around it.

    c Shrey Dutta, Krishnaraj Sekhar PV, Hema A. Murthy.

    Licensed under a Creative Commons Attribution 4.0 International Li-cense (CC BY 4.0). Attribution: Shrey Dutta, Krishnaraj Sekhar PV,Hema A. Murthy. Raga Verification in Carnatic Music Using LongestCommon Segment Set, 16th International Society for Music Informa-tion Retrieval Conference, 2015.

    scent of the raga, respectively. Ragas with linear orderingof svaras are referred to as linear ragas such as Mohonamraga (S R2 G3 P D2 S). Similarly, non linear ragas havenon linear ordering such as Ananda Bhairavi raga (S G2R2 G2 M1 P D2 P S). A further complication arises owingto the fact that although the svaras in different ragas maybe identical, the ordering can be different. Even if the or-dering is the same, in one raga the approach to the svaracan be different, for example, todi and dhanyasi.

    There is no parallel in Western classical music to ragaverification. The closest that one can associate with, iscover song detection [6, 16, 22], where the objective is todetermine the same song rendered by different musicians.Whereas, two different renditions of the same raga maynot contain identical renditions of the motifs.

    Several attempts have been made to identify ragas [24,7,8,12,14,26]. Most of these efforts have used small reper-toires or have focused on ragas for which ordering is notimportant. In [26], the audio is transcribed to a sequence ofnotes and string matching techniques are used to performraga identification. In [2], pitch-class and pitch-dyads dis-tributions are used for identifying ragas. Bigrams on pitchare obtained using a twelve semitone scale. In [18], the au-thors assume that an automatic note transcription systemfor the audio is available. The transcribed notes are thensubjected to HMM based raga analysis. In [12,25], a tem-plate based on the arohana and avarohana is used to deter-mine the identity of the raga. The frequency of the svarasin Carnatic music is seldom fixed. Further, as indicatedin [27] and [28], the improvisations in extempore enuncia-tion of ragas can vary across musicians and schools. Thisbehaviour is accounted for in [10, 11, 14] by decreasingthe binwidth for computing melodic histograms. In [14],steady note transcription along with n-gram models is usedto perform raga identification. In [3] chroma features areused in an HMM framework to perform scale indepen-dent raga identification, while in [4] hierarchical randomforest classifier is used to match svara histograms. Thesvaras are obtained using the Western transcription sys-tem. These experiments are performed on 4/8 differentragas of Hindustani music. In [7], an attempt is made toperform raga identification using semi-continuous Gaus-sian mixtures models. This will work only for linear ragas.Recent research indicates that a raga is characterised bestby a time-frequency trajectory rather than a sequence of

    605

  • Vocal Instruments TotalMale Female Violin Veena Saxophone FluteNumber of Ragas 25 27 8 3 2 2 30 (distinct)Number of Artists 53 37 8 3 1 3 105Number of Recordings 134 97 14 4 2 3 254Total Duration of Recordings 30 h 22 h 3 h 31 m 10 m 58 m 57 hNumber of Pallavi Lines 655 475 69 20 10 15 1244Average Duration of Pallavi Lines 11 s 8 s 10 s 6 s 6 s 8 s 8 s (avg.)Total Duration of Pallavi Lines 2 h 1 h 11 m 2 m 55 s 2 m 3 h

    Table 1. Details of the database used. Durations are given in approximate hours (h), minutes (m) or seconds (s).

    quantised pitches [5, 8, 9, 19, 20, 24]. In [19, 20], the samaof the tala (emphasised by the bol of tabla) is used to seg-ment a piece. The repeating pattern in a bandish in Hin-dustani Khyal music is located using the sama informa-tion. In [8, 19], motif identification is performed for Car-natic music. Motifs for a set of five ragas are defined andmarked carefully by a musician. Motif identification is per-formed using hidden Markov models (HMMs) trained foreach motif. Similar to [20], motif spotting in an alapanain Carnatic music is performed in [9]. In [24], a number ofdifferent similarity measures for matching melodic motifsof Indian music was attempted. It was shown that the in-tra pattern melodic motif has higher variation for Carnaticmusic in comparison with that of Hindustani music. It wasalso shown that the similarity obtained is very sensitive tothe measure used. All these efforts are ultimately aimedat obtaining typical signatures of ragas. It is shown in [9]that there can be many signatures for a given raga. To alle-viate this problem in [5], an attempt was made to obtain asmany signatures for a raga by comparing lines of compo-sitions. Here again, it was observed that the typical motifdetection was very sensitive to the distance measure cho-sen. Using typical motifs/signatures for raga identificationis not scalable, when the number of ragas under consider-ation increases.

    In this paper, this problem is addressed in a differentway. The objective is to mimic a listener in a Carnatic mu-sic concert. There are at least 100 ragas that are activelyperformed today. Most listeners identify ragas by refer-ring to the compositions with similar motivic patterns thatthey might have heard before. In raga verification, a ragasname (claim) and an audio clip is supplied. The machinehas to primarily verify whether the clip belongs to a givenraga or not.

    This task therefore requires the definition of cohorts fora raga. Cohorts of a given raga are the ragas which havesimilar movements while at the same time have subtle dif-ferences, for example, darbar and nayaki. In darbar raga,G2 is repeated twice in avarohana. The first is more or lessflat and short, while the second repetition is inflected. TheG2 in nayaki is characterised by a very typical gamaka.In order to verify whether a given audio clip belongs to aclaimed raga, the similarity is measured with respect to theclaimed raga and compared with its cohorts using a novelalgorithm called longest common segment set (LCSS). LCSS

    scores are then normalized using Z and T norms [1, 17].The rest of the paper is organised as follows. Section 2

    describes the dataset used in the study. Section 3 describesthe LCSS algorithm and its relevance for raga verifica-tion. As the task is raga verification, score normalisation iscrucial. Different score normalisation techniques are dis-cussed in Section 4. The experimental results are presentedin Section 5 and discussed in Section 6. The main conclu-sions drawn from the key results in this paper are discussedin Section 7

    2 Dataset usedTable 1 gives the details of the dataset used in this work.

    This dataset is obtained from the Charsur arts foundation 3 .The dataset consists of 254 vocal and instrument live record-ings spread across 30 ragas, including both target ragasand their cohorts. For every new raga that needs to be ver-ified, templates for the raga and its cohorts are required.

    2.1 Extraction of pallavi lines

    A composition in Carnatic music is composed of threeparts, namely, pallavi, anupallavi and caranam. It is be-lieved that the first phrase of the first pallavi line of a com-position contains the important movements in a raga. Abasic sketch is initiated in the pallavi line, developed fur-ther in the anupallavi and caranam [21] and therefore con-tains the gist of the raga. The algorithm described in [21]is used for extracting pallavi lines from compositions. De-tails of the extracted pallavi lines are given in Table 1. Ex-periments are performed on template and test recordings,selected from these pallavi lines, as discussed in greaterdetail in Section 5.

    2.2 Selection of cohorts

    Wherever possible 4-5 ragas are chosen as cohorts ofevery raga. The cohorts of every raga were defined by aprofessional musician. Professionals are very careful aboutthis as they need to ensure that during improvisation, theydo not accidentally sketch the cohort. Interestingly, as in-dicated by the musicians, cohorts need not be symmetric.A raga A can be similar in movement to a raga B, butraga B need not share the same commonality with ragaA. The identity of raga B may depend on phrases similarto raga A with some additional movement. For example,

    3 http://www.charsurartsfoundation.org

    606 Proceedings of the 16th ISMIR Conference, Malaga, Spain, October 26-30, 2015

  • to identify the raga Indolam, the phrase G2 M1 D1 N2 S isadequate, while Jayantashree raga requires the phrase G2M1 D1 N2 S N2 D1 P M1 G2 S.

    3 Longest common segment setIn raga verification, matching needs to be performed

    between two audio clips. The number of similar portionscould be more than one and spread across the entire clip.Therefore, there is a need for a matching approach that canfind these similar portions without issuing large penaltiesfor gaps in between them. In this section, a novel algorithmcalled Longest Common Segment Set is described whichattempts to do the same.

    Let X = hx1, , xm; xi 2 R; i = 1 mi be a se-quence of m symbols and Y = hy1, , xn; yj 2 R; j =1 ni be a sequence of n symbols where xi and yj are thetonic normalized pitch values in cents [9]. The similaritybetween two pitch values, xi and yj , is defined as

    sim(xi, yj) =

    (

    1 |xiyj |3(3st)3 if | xi yj |< 3st0 otherwise

    (1)

    where st represents a semitone in cents. Due to differentstyles of various musicians, an exact match between twopitch values contributing to the same svara cannot be ex-pected. Hence, in this paper a leeway of 3 semitones isallowed between pitch values. Musically two pitch values,3 semitones apart, cannot be called similar but this issueis addressed by the cubic nature of the similarity function.The function reaches its half value when the difference intwo symbols is approximately half a semitone. Therefore,higher similarity scores are obtained when the correspond-ing pitch values are at most half a semitone apart.

    A common subsequence ZXY in sequences X and Y isdefined as

    ZXY =

    8

    >

    >

    >

    >

    >

    >

    >

    :

    (xi1 , yj1), , (xip , yjp)

    1 i1 < < ip m1 j1 < < jp n

    simk=1, ,p

    (xik , yjk) sim(2)

    where sim is a threshold which decides the membershipof the symbol pair (xik , yjk) in a subsequence ZXY . Thevalue of sim is decided empirically based on the domainof the problem as discussed in Section 5. An example com-mon subsequence is shown with red color in Figure 1.

    3.1 Common segments

    Continuous symbol pairs in a common subsequence arereferred to as a segment. Two different types of segmentsare defined, namely hard and soft segments.

    Hard segment is a group of common subsequence sym-bols such that there are no gaps in between as shown ingreen color in Figure 1. Then a hard segment, starting with

    Soft segment running score

    10

    20

    30

    40

    50

    60

    70

    80

    90

    100

    110

    Common subsequence

    Hard segments

    Soft segments

    450 500 550 600 650 700 750 800

    Sequence 2

    0

    200

    400

    600

    800

    1,000

    1,200

    Sequence 1

    DDDD

    200300400200

    300

    400

    Pitch

    Pitch

    Figure 1. An example of a common segment set betweentwo sequences representing the real data

    a symbol pair (xi, yj), must be of the form

    H lXiYj =

    8

    >

    :

    h(xi, yj), (xi+1, yj+1), , (xi+l, yj+l)i1 i < i + 1 < < i + l m1 j < j + 1 < < j + l n

    (3)where l + 1 represents the length of the hard segment. Thescore of the kth hard segment H lXik Yjk is defined as

    hc

    H lXik Yjk

    =l

    X

    d=0

    sim (xik+d, yjk+d) (4)

    Soft segment is a group of common subsequence sym-bols where gaps are permitted with a penalty. Therefore, asoft segment consists of one or more hard segments (shownwith blue color in Figure 1). The gaps between the hardsegments decides the penalty assigned. Thus, the score ofthe kth soft segment SXik Yjk , consisting of r hard seg-ments, is defined as

    sc

    SXik Yjk

    =rX

    s=1

    hc

    H lXik Yjk

    (5)

    where is the total number of gaps between r hard seg-ments and is the penalty for each gap. The number ofhard segments to be included in a soft segment is decidedby the running score of the soft segment. The runningscore of the soft segment increases during the hard segmentand decreases during the gap due to penalties as shown ingray-scale in Figure 1. During a gap, if the running scoredecreases below a threshold rc (or becomes almost whitein Figure 1) then that gap is ignored and all the hard seg-ments, encountered before it, are included into a soft seg-ment.

    Proceedings of the 16th ISMIR Conference, Malaga, Spain, October 26-30, 2015 607

  • 3.2 Common segment setAll segments together correspond to a segment set. The

    score of a segment set (ss) is defined as

    score (ssXY ) =

    Ppk=1 c

    ZXik Yjk

    2

    min(m, n)2(6)

    where p is the number of segments, c refers to the scorecomputed in either (4) or (5) and Z refers to a segment(hard or soft). This equation gives preference to longersegments. For example, in case 1, there are 10 segmentseach of length 2 and in case 2, there are 4 segments each oflength 5. In both the cases the total length of the...

Recommended

View more >