GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval
Tonal Harmony and Chord Recognition
Juhan Nam
Slide 3
Outline
- Tonal Harmony
  - Tonality
  - Critical Bands and Consonance
  - Perceptual Distance of Two Tones
  - Chords and Scales
- Chroma and Chroma Features
  - Pitch Helix and Chroma
  - FFT-based approach
  - Filter-bank approach
- Key Estimation
- Chord Recognition
Slide 4
Introduction
Examples of tonal harmony: Bach's chorale harmonizations, the jazz Real Book, pop music
Slide 5
Tonality
- Tonal music has a tonal center called a key: 12 keys (C, C#, D, ..., B)
- Notes on a musical scale have different roles given a key note, e.g., the C major scale
- A sequence of notes or a chord progression provides a certain degree of stability or instability, e.g., cadences (V-I, ...) or tension (sus2, sus4)
- Why is tonality formed? In other words, why do we perceive different degrees of stability or tension from notes?
Slide 6
Critical Bands and Consonance
- Critical band: the bandwidth within which two sinusoidal signals interact
  - If one is below a certain level relative to the other, it is masked
  - Otherwise, they create beats or harshness
- Consonance and dissonance
  - If two sinusoidal tones are within a critical band, they sound dissonant; otherwise, they sound consonant
  - Even a single tone can sound dissonant, e.g., an impulse train
[Figures: deflection of the basilar membrane for a 200 Hz wave; tonotopic organization of the cochlea]
Slide 7
Perceptual Distance of Two Tones
- Critical bands are a little less than 3 semitones (a minor 3rd) wide
  - Two sinusoidal tones whose F0s are within 3 semitones sound dissonant
  - Dissonance is greatest when the tones are about one quarter of the critical band apart
  - Critical bands become relatively wider below 500 Hz, so two low notes can sound dissonant
- Consonance of two harmonic tones
  - Determined by how many closely located overtones of the two tones fall within critical bands
[Figure: first eight harmonics of two tones a fifth apart]
Slide 8
Consonance Rating of Intervals in Music
The perceptual distance between two notes is different from the semitone distance between them.
Slide 9
Chords
- The basic units of tonal harmony: triads, 7ths, 9ths, 11ths, ...
- Triads are formed by choosing three notes that make the most consonant (most harmonized) sound; this amounts to stacking major or minor 3rds
- 7th and 9th chords are obtained by stacking further 3rds; the quality of consonance becomes more sophisticated as more notes are added
- Music theory is largely about how to create tension and resolve it with different qualities of consonance
Slide 10
Scales in Tonal Harmony
- Major scale: formed by spreading the notes of three major chords
- Minor scale: formed by spreading the notes of three minor chords (natural minor scale)
  - Harmonic or melodic minor scales can be formed by using both minor and major chords
Slide 11
Chord Recognition
- Identifying the chord progression of tonal music
- A very challenging task
  - Chords are not explicit in music: non-chord notes or passing notes
  - Key changes and chromaticism require in-depth knowledge of music theory
  - In audio, multiple musical instruments are mixed
    - Relevant: harmonically arranged notes
    - Irrelevant: percussive sounds (though they can help detect chord changes)
- What kind of audio features can be extracted to recognize chords robustly?
Slide 12
Pitch Helix
- The basic assumption in tonal harmony is that notes an octave apart belong to the same pitch class: there is no dissonance among them
- As a result, there are 12 pitch classes
- Shepard represented octave equivalence with the pitch helix
  - Chroma: represents the inherent circularity of pitch organization
  - Height: increases naturally; one full rotation corresponds to one octave
[Figure: pitch helix and chroma (Shepard, 2001)]
Slide 13
Chroma
- Chroma is independent of height
  - Shepard tone: a single pitch class across all harmonics, giving the illusion of constantly rising or falling pitch
- Chroma captures the relative distribution of pitch classes, while pitch height is noisy variation for chord recognition
- Thus, chroma is considered well suited for analyzing harmony
[Figures: Shepard tone; optical-illusion stairs]
Slide 14
Chroma Features
- Chroma features are audio feature vectors that capture the chroma characteristics
- Ideally obtained by polyphonic note transcription, but that is too expensive; in addition, as notes become more harmonized, separating polyphonic notes becomes harder
- In practice, chroma features are obtained by projecting all time-frequency energy onto the 12 pitch classes
- Used not only for chord recognition but also for key estimation, segmentation, synchronization, and cover-song detection
Slide 15
Chroma Features: FFT-based approach
- Compute the spectrogram
- Compute a mapping matrix
  - Convert each frequency to the musical pitch scale and take its pitch class
  - Set the entry of the corresponding pitch class to one and all others to zero
  - Adjust the non-zero values so that low-frequency content gets more weight
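The mapping above can be sketched as follows. This is a minimal illustration, not the lecture's actual code; the function names are hypothetical, and the low-frequency weighting step is omitted for brevity.

```python
import math

def freq_to_pitch_class(freq, ref=440.0):
    """Map a frequency in Hz to a pitch class 0..11 (0 = C), via MIDI pitch (A4 = 69)."""
    midi = 69 + 12 * math.log2(freq / ref)
    return int(round(midi)) % 12

def chroma_mapping_matrix(n_fft, sr, fmin=55.0, fmax=2000.0):
    """Binary 12 x (n_fft//2 + 1) matrix assigning each FFT bin to a pitch class."""
    n_bins = n_fft // 2 + 1
    M = [[0.0] * n_bins for _ in range(12)]
    for k in range(1, n_bins):
        f = k * sr / n_fft
        if fmin <= f <= fmax:
            M[freq_to_pitch_class(f)][k] = 1.0
    return M

def chroma_from_spectrum(mag, M):
    """Project a magnitude spectrum onto the 12 pitch classes."""
    return [sum(m * x for m, x in zip(row, mag)) for row in M]
```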
Slide 16
Improvements
- Blurring: an intrinsic problem with the STFT
  - Solution: find amplitude peaks and use only those
- De-tuning: notes can deviate from the reference tuning
  - Compute 36-bin chroma features: add two neighboring tuning bins to each pitch class
  - Use only the peak value among the three bins per pitch class
- Normalization: divide the frame chroma features by the local maximum or mean to regularize volume changes
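The de-tuning and normalization steps can be sketched as below, assuming (as an illustrative layout, not stated in the slide) that bins 3i..3i+2 of the 36-bin vector are the three tuning bins of pitch class i.

```python
def fold_36_to_12(chroma36):
    """Keep only the peak value among the three tuning bins per pitch class."""
    return [max(chroma36[3 * i:3 * i + 3]) for i in range(12)]

def normalize(chroma, eps=1e-9):
    """Divide by the frame maximum to regularize volume changes."""
    m = max(chroma)
    return [v / (m + eps) for v in chroma]
```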
Slide 17
Chroma Features: Filter-bank approach
- Alternatively, a filter bank can be used to get a log-scale time-frequency representation
  - Center frequencies are arranged over the 88 piano notes
  - Bandwidths are set to be constant-Q and robust to +/- 25 cents of de-tuning
- The outputs that belong to the same pitch class are wrapped and summed (Müller, 2011)
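A sketch of the filter-bank layout: center frequencies on the 88 piano notes in equal temperament (constant-Q means each band's width is proportional to its center frequency), and octave wrapping of the band outputs. This is an assumed minimal version, not the toolbox's implementation.

```python
def piano_center_freqs():
    """Center frequencies for MIDI notes 21 (A0) through 108 (C8), A4 = 440 Hz."""
    return [440.0 * 2 ** ((p - 69) / 12) for p in range(21, 109)]

def wrap_to_chroma(band_energies):
    """Sum the outputs of bands that share a pitch class (MIDI 21 -> class 9 = A)."""
    chroma = [0.0] * 12
    for i, e in enumerate(band_energies):
        chroma[(21 + i) % 12] += e
    return chroma
```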
Slide 18
Beat-Synchronous Chroma Features
Make chroma features homogeneous within a beat (Bartsch and Wakefield, 2001)
[Figure from Ellis's slides]
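A minimal sketch of beat-synchronous averaging: given per-frame chroma vectors and beat positions (as frame indices, e.g. from a beat tracker), average the frames inside each beat interval.

```python
def beat_sync_chroma(chroma_frames, beat_frames):
    """Average chroma frames within each beat interval."""
    bounds = list(beat_frames) + [len(chroma_frames)]
    dims = len(chroma_frames[0])
    out = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = chroma_frames[start:end]
        n = max(len(seg), 1)
        out.append([sum(f[i] for f in seg) / n for i in range(dims)])
    return out
```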
Slide 19
Key Estimation: Overview
- Estimate the musical key from music data
- One of 24 keys: 12 pitch classes (C, C#, D, ..., B) x major/minor
- General framework (Gomez, 2006): chroma features -> average -> similarity measure against key templates -> key strength -> estimated key (e.g., G major)
Slide 20
Key Template
- Probe tone profile (Krumhansl and Kessler, 1982): the relative stability or weight of tones
- Listeners rated which tones best completed the first seven notes of a major scale; for example, in the key of C major: C, D, E, F, G, A, B, ... what?
[Figure: probe tone profile - relative pitch ranking]
Slide 21
Key Estimation
- Measure similarity by cross-correlating the chroma features with the key templates
- Select the key that produces the maximum correlation
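The procedure can be sketched as follows, using the Krumhansl-Kessler probe-tone ratings (C-based, as commonly reported) as key templates and Pearson correlation over all 24 rotations. The profile values and function names are this sketch's assumptions, not taken from the slides.

```python
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def estimate_key(mean_chroma):
    """Return (tonic pitch class, mode) maximizing the template correlation."""
    best = None
    for tonic in range(12):
        for mode, prof in (("major", MAJOR), ("minor", MINOR)):
            template = prof[-tonic:] + prof[:-tonic]  # rotate profile to the tonic
            r = pearson(mean_chroma, template)
            if best is None or r > best[0]:
                best = (r, tonic, mode)
    return best[1], best[2]
```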
Slide 22
Chord Recognition
- Estimate chords from music data
- Typically one of 24 chords: 12 pitch classes x major/minor; often diminished chords are added (36 chords)
- General framework: audio -> transform -> chroma features -> template matching (chord templates) or models (HMM, SVM) -> decision making -> chords
Slide 23
Template-Based Approach
Use chord templates (Fujishima, 1999; Harte and Sandler, 2005) and find the best matches
[Figure: chord templates (from Bello's slides)]
Slide 24
Template-Based Approach
Compute the cross-correlation between the chroma features and the chord templates, and select the chords that have the maximum values
[Figure from Bello's slides]
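A minimal sketch of binary triad templates and maximum-score matching over the 24 major/minor chords; for simplicity it uses an inner product per frame rather than a normalized cross-correlation.

```python
def triad_template(root, quality):
    """Binary 12-d template: root, third (4 or 3 semitones up), and fifth."""
    t = [0.0] * 12
    for interval in (0, 4 if quality == "maj" else 3, 7):
        t[(root + interval) % 12] = 1.0
    return t

TEMPLATES = {(r, q): triad_template(r, q) for r in range(12) for q in ("maj", "min")}

def match_chord(chroma):
    """Pick the chord template with the highest inner product with the frame."""
    return max(TEMPLATES, key=lambda k: sum(a * b for a, b in zip(TEMPLATES[k], chroma)))
```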
Slide 25
Problems
- The template approach is too simplistic
  - The binary templates are hard assignments
  - The temporal dependency of chords is not considered: the majority of tonal music follows certain types of chord progression
  - The recognized chords are not smooth; some post-processing (smoothing) is necessary
Slide 26
Hidden Markov Model (HMM)
- A probabilistic model of time series: speech, gesture, DNA sequences, financial data, weather data, ...
- Assumes that time series data are generated from hidden states, and that the hidden states follow a Markov model
- A learning-based approach: needs training data annotated with labels; the labels usually correspond to the hidden states
Slide 27
Markov Model
- A random variable q has N states (s_1, s_2, ..., s_N) and, at each time step, one of the states is chosen
- The probability distribution of the next state is determined only by the current state (the first-order Markov model)
- Thus, the joint probability of a sequence of states simplifies to
  P(q_1 = s_1, q_2 = s_2, ..., q_N = s_N)
    = P(q_1 = s_1) P(q_2 = s_2 | q_1 = s_1) P(q_3 = s_3 | q_2 = s_2) ... P(q_N = s_N | q_{N-1} = s_{N-1})
- Example: chord recognition with three chords C, F, G (plus start and end states)
  P(q_{t+1}=C | q_t=C) = 0.7   P(q_{t+1}=F | q_t=C) = 0.1   P(q_{t+1}=G | q_t=C) = 0.2
  P(q_{t+1}=C | q_t=F) = 0.2   P(q_{t+1}=F | q_t=F) = 0.6   P(q_{t+1}=G | q_t=F) = 0.2
  P(q_{t+1}=C | q_t=G) = 0.3   P(q_{t+1}=F | q_t=G) = 0.1   P(q_{t+1}=G | q_t=G) = 0.6
Slide 28
What can we do with a Markov Model?
- Generate a chord sequence, e.g. (beat-wise): C C C C F F C C G G C C ...
- Evaluate whether one chord progression is more likely than another
  - Assuming P(q_1 = C) = 1:
    P(q = C, G, C) = P(q_1=C) P(q_2=G | q_1=C) P(q_3=C | q_2=G) = 1 x 0.2 x 0.3 = 0.06
    P(q = C, F, C) = P(q_1=C) P(q_2=F | q_1=C) P(q_3=C | q_2=F) = 1 x 0.1 x 0.2 = 0.02
    so C, G, C is more likely than C, F, C
- Compute the probability that the chord at time T is C (or F or G)
  - Naive method: count all paths that reach chord C at time T - exponential cost
  - Clever method: recursive induction
    P(q_T=C) = P(q_T=C | q_{T-1}=C) P(q_{T-1}=C) + P(q_T=C | q_{T-1}=F) P(q_{T-1}=F) + P(q_T=C | q_{T-1}=G) P(q_{T-1}=G)
    Repeat this for P(q_i=C), P(q_i=F), and P(q_i=G) where i = T-1, T-2, T-3, ...
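The recursive induction can be sketched as propagating the state distribution one step at a time (linear in T) instead of enumerating all paths; starting from C at t = 1 is an assumption carried over from the example above.

```python
TRANSITION = {
    "C": {"C": 0.7, "F": 0.1, "G": 0.2},
    "F": {"C": 0.2, "F": 0.6, "G": 0.2},
    "G": {"C": 0.3, "F": 0.1, "G": 0.6},
}

def state_probs(T):
    """P(q_T = s) for each chord s, starting from C at t = 1."""
    p = {"C": 1.0, "F": 0.0, "G": 0.0}
    for _ in range(T - 1):
        # One step of the recursion: P(q_{t+1}=s) = sum_r P(q_t=r) P(s | r)
        p = {s: sum(p[r] * TRANSITION[r][s] for r in p) for s in TRANSITION}
    return p
```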
Slide 29
HMM for Chord Recognition
- What we observe are not chords but audio features
  - Treat chords as hidden states and infer them from audio features (i.e., chroma features)
- Hidden Markov Model
  - Hidden states follow a Markov model
  - Given a state, the corresponding observation distribution is independent of previous states and observations
- Model parameters
  - Initial state probabilities: P(q_0)
  - Transition probability matrix: P(q_j | q_i) or a_ij (first-order Markov)
  - Observation distribution given a state: P(O | q_j) or b_j (e.g., Gaussian)
[Figure: graphical model with hidden states q_{t-1}, q_t, q_{t+1} emitting observations o_{t-1}, o_t, o_{t+1}; chord states F, C, G with observation distribution P(O | q_t)]
Slide 30
Training an HMM for Chord Recognition
- Model parameters are trained with labeled data
- If every time frame is labeled: easy to train, but such data is expensive to obtain
  - Transition probabilities: count chord-to-chord transitions and normalize
  - Observation distribution: fit the chroma features of each chord with a single Gaussian or a Gaussian mixture model (GMM)
- If labeled without time information: use Baum-Welch, the forward-backward algorithm (expectation-maximization)
[Figure: chord transition probability matrix (Lee, 2008)]
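The fully supervised case can be sketched as below: normalized transition counts, plus per-chord mean chroma vectors (the means of a single-Gaussian observation model; covariances are omitted for brevity). Function names are this sketch's own.

```python
def train_transitions(chord_seq, states):
    """Count chord-to-chord transitions and row-normalize into probabilities."""
    counts = {a: {b: 0 for b in states} for a in states}
    for prev, cur in zip(chord_seq, chord_seq[1:]):
        counts[prev][cur] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values())
        # Unseen rows fall back to a uniform distribution.
        probs[a] = {b: counts[a][b] / total if total else 1.0 / len(states)
                    for b in states}
    return probs

def fit_chord_means(frames, labels):
    """Mean chroma vector per chord label."""
    sums, counts = {}, {}
    for f, lab in zip(frames, labels):
        acc = sums.setdefault(lab, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
```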
Slide 31
Evaluating an HMM for Chord Recognition
- Find the most likely sequence of hidden states given the observations and the HMM parameters: the Viterbi algorithm
- Define a probability variable delta_t(j): the highest probability of any state path ending in state j at time t
  - Initialization (from the start state): delta_1(j) = P(q_1 = s_j) b_j(o_1)
  - Recursion: delta_t(j) = max_i [ delta_{t-1}(i) a_ij ] b_j(o_t)
  - Termination (to the end state): P* = max_j delta_T(j); the best path is recovered by backtracking
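A minimal sketch of the Viterbi recursion above, in the log domain to avoid underflow. Inputs are a T x N matrix of per-frame observation log-likelihoods log b_j(o_t), initial log-probabilities, and an N x N log-transition matrix; the interface is an assumption of this sketch.

```python
def viterbi(obs_loglik, log_init, log_trans):
    """Return the most likely state index path given log-domain HMM parameters."""
    T, N = len(obs_loglik), len(log_init)
    # Initialization: delta_1(j) = log pi_j + log b_j(o_1)
    delta = [log_init[j] + obs_loglik[0][j] for j in range(N)]
    back = []
    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) + log a_ij] + log b_j(o_t)
    for t in range(1, T):
        new, ptr = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new.append(delta[best_i] + log_trans[best_i][j] + obs_loglik[t][j])
        delta = new
        back.append(ptr)
    # Termination and backtracking
    path = [max(range(N), key=lambda j: delta[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```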
Slide 32
The Viterbi Trellis
[Figure: trellis with states C, F, G at each time step t = 1, 2, 3, ..., T-1, T, plus start and end states]
Recall dynamic programming!
Slide 33
Chord Recognition Result
- Trained on the Beatles data set (141 songs)
- Accuracy: Viterbi 71.5%; frame-wise maximum likelihood (without the Markov model) 44.9%
[Figure: true vs. Viterbi vs. ML chord labels (from Ellis, E4896 practicals)]
Slide 34
Demo
Yanno: chord recognition for YouTube videos - http://yanno.eecs.qmul.ac.uk/
Slide 35
References
- P. R. Cook (Ed.), Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics, 2001
- C. Krumhansl, Cognitive Foundations of Musical Pitch, 1990
- M. A. Bartsch and G. H. Wakefield, "To Catch a Chorus: Using Chroma-Based Representations for Audio Thumbnailing," 2001
- E. Gomez and P. Herrera, "Estimating the Tonality of Polyphonic Audio Files: Cognitive Versus Machine Learning Modeling Strategies," 2004
- M. Müller and S. Ewert, "Chroma Toolbox: MATLAB Implementations for Extracting Variants of Chroma-Based Audio Features," 2011
- T. Fujishima, "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music," 1999
- A. Sheh and D. Ellis, "Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models," 2003
- K. Lee and M. Slaney, "Acoustic Chord Transcription and Key Extraction from Audio Using Key-Dependent HMMs Trained on Synthesized Audio," 2008