A STEP TOWARDS AUTOMATIC IDENTIFICATION OF INFLUENCE: LICK DETECTION IN A MUSICAL PASSAGE

Anis Haron
Media Arts & Technology, University of California Santa Barbara

ABSTRACT

Using techniques borrowed from speech recognition, this paper presents an algorithm to detect known licks in a musical passage; its results could be used to help identify who influenced a given improviser.

A lick can be understood as a known phrase, or a relatively short pattern consisting of a series of notes used in solos or melodic lines. Jazz trumpeter Clark Terry remarked that the art of learning jazz improvisation consists of imitation, assimilation, and innovation. Ethnomusicologist John Murphy, in a paper on jazz improvisers and their precursors, explained that “by invoking and reworking music that is familiar to the audience, the jazz performer involves the audience in the process and makes it meaningful for those who recognize the sources.” As an analytical perspective on the concept of influence, Murphy went on to analyze several improvisations by saxophonist Joe Henderson in which Henderson quoted and transformed a melody segment (lick) by Charlie Parker.

1. ALGORITHM

1.1 Licks

For this demo we have arbitrarily chosen a lick consisting of a sequence of seven notes. The sequence starts on the tonic, followed by the major second, minor third and perfect fourth. It then descends three semitones from the perfect fourth back to the major second, moves to the minor seventh, and finally resolves back to the tonic. In the key of C major, this translates to the note sequence C, D, Eb, F, D, Bb, C. It is important to note that a lick could be played in any key, at any speed, and possibly with some degree of rhythmic variation and ornamentation.
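To make the representation concrete, the lick can be written down as pitch classes (semitones above the tonic). This is a minimal sketch; the variable names and the chosen register for the MIDI version are our own illustrative assumptions, not values prescribed by the paper.

import numpy as np

# The demo lick as pitch classes relative to C (tonic = 0):
# C, D, Eb, F, D, Bb, C
LICK_PITCH_CLASSES = [0, 2, 3, 5, 2, 10, 0]

# The same lick as MIDI note numbers in one plausible register (middle C = 60).
LICK_MIDI = [60, 62, 63, 65, 62, 58, 60]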

1.2 Source separation

Since we will be implementing our algorithm on audio files, the first step is to separate out unwanted sound sources. For example, if we were to analyze the singer (the singer improvises), sources from the accompanying musicians should be excluded or, at the very least, suppressed as much as possible. For this task, we will use the Flexible Audio Source Separation Toolbox (FASST) developed by Ozerov et al. [1].

Figure 1. Algorithm overview.

© Anis Haron. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Anis Haron. “A step towards automatic identification of influence: Lick detection in a musical passage”, 15th International Society for Music Information Retrieval Conference, 2014.

The source we will be using in this demonstration is a 15-second excerpt from Melody Gardot's Baby I'm a Fool, taken from the 1'55" to 2'10" mark in the song. This particular phrase is chosen because the lick we are looking for is quoted in it. Furthermore, the song carries a calm mood and a sparse accompaniment, making it an ideal candidate for good source separation results in our demo.

1.3 Pitch extraction

For pitch extraction, we will use Essentia, an open source audio analysis library developed by Bogdanov et al. of the Music Technology Group at Universitat Pompeu Fabra [2].

Firstly, an equal-loudness filter is applied to our signal. As the human ear does not perceive all frequencies at the same loudness, the equal-loudness filter compensates for this by applying an inverted approximation of the equal-loudness contour. A Hanning window is applied to chunks of the signal, zero-padded to increase frequency resolution, and the magnitude spectrum of each chunk is calculated. The YIN algorithm is then applied to the magnitude spectrum, returning a list of detected pitches along with a list of confidence values. Next, we run a peak detection algorithm to detect local maxima. We arbitrarily use the mean of the confidence values as the threshold for peak detection. This way, only pitches whose confidence is above average are considered correct, and detected pitches with below-average confidence are ignored. It should be noted that the chosen threshold value might have to be tweaked for use with other sources. Our peak detection algorithm returns the position (time) along with the amplitude (frequency) of each peak. Lastly, we convert our list of frequencies to MIDI note numbers, rounding to the nearest integer.
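The extraction chain above could be sketched with Essentia's standard-mode Python bindings roughly as follows. Frame and hop sizes, the zero-padding amount, and the file name are illustrative assumptions rather than values taken from the paper, and the peak-picking step is simplified to a plain confidence threshold.

import numpy as np
import essentia.standard as es

# Load the (source-separated) vocal track as a mono signal at 44.1 kHz.
audio = es.MonoLoader(filename='gardot_vocals.wav', sampleRate=44100)()

eq = es.EqualLoudness()                                # equal-loudness filter
window = es.Windowing(type='hann', zeroPadding=2048)   # Hanning window + zero padding
spectrum = es.Spectrum()                               # magnitude spectrum
pitch_yin = es.PitchYinFFT(frameSize=4096)             # YIN (FFT variant) on the spectrum

times, pitches, confidences = [], [], []
filtered = eq(audio)
for i, frame in enumerate(es.FrameGenerator(filtered, frameSize=2048, hopSize=512)):
    spec = spectrum(window(frame))
    f0, conf = pitch_yin(spec)
    times.append(i * 512 / 44100.0)
    pitches.append(f0)
    confidences.append(conf)

# Keep only frames whose pitch confidence is above the mean (the arbitrary threshold),
# then convert the surviving frequencies to MIDI note numbers.
threshold = np.mean(confidences)
peaks = [(t, f) for t, f, c in zip(times, pitches, confidences) if c > threshold and f > 0]
midi_notes = [int(round(69 + 12 * np.log2(f / 440.0))) for _, f in peaks]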

1.4 Removal of duplicates

The next step is to remove any duplicates in our note list. This step is necessary because our algorithm runs on overlapping, windowed segments. As our window moves across the signal, the algorithm is bound to detect the same note repeatedly, provided the note lasts longer than our window. The times corresponding to duplicate notes should also be removed.
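A minimal sketch of this de-duplication step, collapsing consecutive repeats while keeping the time of each note's first occurrence (variable names are our own, continuing from the extraction sketch above):

def remove_consecutive_duplicates(notes, times):
    """Collapse runs of identical consecutive MIDI notes, keeping the
    onset time of the first frame in each run."""
    out_notes, out_times = [], []
    for note, t in zip(notes, times):
        if not out_notes or note != out_notes[-1]:
            out_notes.append(note)
            out_times.append(t)
    return out_notes, out_times

note_list, note_times = remove_consecutive_duplicates(midi_notes, [t for t, _ in peaks])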

1.5 Pitch difference as an additional feature

As mentioned earlier, we know a lick could be played in any key, therefore it makes sense to use pitch difference (interval size) as an additional feature. In this demonstration, the unit for pitch difference will be semitones.

First we will need to map our MIDI notes onto their respective pitch classes, removing octave information. Pitch difference is calculated by subtracting the previous note (pitch-class value) from the current note. This is done both for the passage we are analyzing and for our known lick.
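Both derived features take only a couple of lines to compute. This sketch assumes the note_list produced by the de-duplication step and the LICK_PITCH_CLASSES defined earlier.

import numpy as np

def to_pitch_classes(midi_notes):
    """Remove octave information: MIDI note number modulo 12."""
    return [n % 12 for n in midi_notes]

def pitch_differences(pitch_classes):
    """Signed difference (in semitones) between each note and the previous one."""
    return list(np.diff(pitch_classes))

passage_pc = to_pitch_classes(note_list)
passage_diff = pitch_differences(passage_pc)

lick_pc = LICK_PITCH_CLASSES            # [0, 2, 3, 5, 2, 10, 0]
lick_diff = pitch_differences(lick_pc)  # [2, 1, 2, -3, 8, -10]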

1.6 Hidden Markov Models (HMM)

HMMs are a statistical technique used in various fields, most notably in speech [3], handwriting and gesture recognition. An HMM is an unsupervised, generative probabilistic model whereby a sequence of observable variables (our known lick in this instance) is generated by a sequence of unobservable states. For this task, we will use scikit-learn, an easy to use, open source machine learning library for the Python programming language.

At this point, we need to estimate the most likely series of states that could generate our observed data. The two features described above are our observed data. Using scikit-learn's 'fit' method, we estimate the most likely hidden states for each observed sequence, yielding a pair of models in this demonstration: one for the sequence of notes as pitch classes, and another for the sequence of pitch differences.
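At the time of the paper, scikit-learn shipped an HMM module; it has since been split out into the separate hmmlearn package. The following is a minimal sketch of fitting the two models with hmmlearn's GaussianHMM, where the number of hidden states (here 3) is an arbitrary assumption of ours.

import numpy as np
from hmmlearn import hmm

def fit_sequence_model(sequence, n_states=3):
    """Fit an HMM to a single 1-D observation sequence (shape: n_samples x 1)."""
    X = np.array(sequence, dtype=float).reshape(-1, 1)
    model = hmm.GaussianHMM(n_components=n_states, n_iter=100)
    model.fit(X)
    return model

# One model per feature of the known lick.
note_model = fit_sequence_model(lick_pc)    # pitch-class sequence model
diff_model = fit_sequence_model(lick_diff)  # pitch-difference sequence model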

Next, we compute the probability that the note sequence in the passage we are analyzing contains shorter sub-sequences that could have been generated by our two models. We will refer to this process as the evaluation. For this task, we first define a function to return the 'n' highest scores, arbitrarily choosing n = 3 in this demonstration.
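The top-n helper is a one-liner with NumPy; a sketch, assuming the scores are kept in a plain list:

import numpy as np

def top_n_locations(scores, n=3):
    """Return the window indices of the n highest scores, best first."""
    return list(np.argsort(scores)[::-1][:n])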

1.6.1 Evaluating the note sequence model and pitch difference sequence model

In this step we will be evaluating our known lick's estimated note sequence model against the passage we are analyzing (the 15-second excerpt of Melody Gardot's Baby I'm a Fool). Note that we have already mapped each note in the passage onto its respective pitch class earlier.

To evaluate, we first apply a rectangular window, with the length of our known lick's note sequence (7 in this instance), to the passage we are analyzing. The note sequence of the analyzed passage inside the rectangular window is compared against our known lick's note sequence model using scikit-learn's 'score' method. The rectangular window is then advanced one position at a time up to the end of the sequence, and at every iteration a probability score under the compared model is recorded. Lastly, we list the locations of the three highest scores, as we chose n = 3 earlier.
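A sketch of this sliding-window evaluation, reusing the models and helper defined above; the log-likelihood returned by 'score' stands in for the probability score.

import numpy as np

def evaluate(model, sequence, window_len):
    """Score every window of length window_len in sequence under the given HMM."""
    scores = []
    for start in range(len(sequence) - window_len + 1):
        window = np.array(sequence[start:start + window_len], dtype=float).reshape(-1, 1)
        scores.append(model.score(window))   # log-likelihood of this window
    return scores

window_len = len(lick_pc)                                   # 7 notes
note_scores = evaluate(note_model, passage_pc, window_len)
note_top3 = top_n_locations(note_scores, n=3)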

We will also be evaluating our estimated pitch difference sequence model against the passage we are analyzing. Since the pitch differences remain the same even when the observed passage is transposed to any key, this additional feature helps filter out false positives. Just as before, the evaluation for the pitch difference sequence is carried out, listing the locations of the three highest scores.

1.7 Distance computation and estimation of lick location in the analyzed passage

At this stage, we have the top three location points for sequences that could have been generated with our known lick's note sequence model, and another three for the pitch difference model. We now calculate the spatial distance between the two lists of vectors. The nearest distance between the two lists suggests the estimated location of our known lick in the analyzed passage.
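A minimal sketch of this final step, using pairwise absolute distance between candidate window indices; the paper does not spell out the exact distance measure, so this choice is an assumption.

from itertools import product
import numpy as np

def best_match(note_top, diff_top):
    """Return the candidate pair whose window indices lie closest together."""
    return min(product(note_top, diff_top), key=lambda pair: abs(pair[0] - pair[1]))

diff_scores = evaluate(diff_model, passage_diff, window_len - 1)  # interval sequence has one fewer element
diff_top3 = top_n_locations(diff_scores, n=3)

note_idx, diff_idx = best_match(note_top3, diff_top3)
# note_times[note_idx] would give the estimated onset time of the lick in the excerpt.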

In this demonstration, the location of our known lick is estimated at around the 4.5-second mark in our 15-second excerpt. Since the excerpt starts at the 1'55" mark, the final position of the lick is estimated at roughly the 2'00" mark of the song. 1

2. CONCLUSION

The algorithm presented has demonstrated the possibility of locating a known phrase embedded in a given musical passage. It suggests the possibility of tracing the influences of an improviser, given the availability of a relevant database. Additionally, to make the algorithm more robust, further features should be extracted and studied to increase the probability of successful detection.

3. REFERENCES

[1] Alexey Ozerov, Emmanuel Vincent, Frédéric Bimbot: “A general flexible framework for the handling of prior information in audio source separation,” IEEE Transactions on Audio, Speech, and Language Processing, pp. 1118–1133, 2012.

[2] Dmitry Bogdanov et al.: “Essentia: An audio analysis library for music information retrieval,” Proceedings of ISMIR, 2013.

[3] Lawrence R. Rabiner: “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, pp. 257–286, 1989.

1 Melody Gardot's Baby I'm a Fool is available for streaming on Melody Gardot's VEVO channel on YouTube.