GCT634: Musical Applications of Machine Learning
Rhythm Transcription and Dynamic Programming
Graduate School of Culture Technology, KAIST. Juhan Nam. September 14, 2018.


  • GCT634: Musical Applications of Machine Learning: Rhythm Transcription

    Dynamic Programming

    Graduate School of Culture Technology, KAIST. Juhan Nam

  • Outline

    • Overview of Automatic Music Transcription (AMT)
      - Types of AMT tasks

    • Rhythm Transcription
      - Introduction
      - Onset detection
      - Tempo estimation

    • Dynamic Programming
      - Beat tracking

  • Overview of Automatic Music Transcription (AMT)

    • Predicting musical score information from audio
      - The primary score information is the notes, but they are arranged on the basis of rhythm, harmony, and structure
      - Equivalent to automatic speech recognition (ASR) for speech signals

    (Figure: a model maps audio to onsets, tempo, beat, key, chord, and structure)

  • Types of AMT Tasks

    • Rhythm transcription
      - Onset detection
      - Tempo estimation
      - Beat tracking

    • Tonal analysis
      - Key estimation
      - Chord recognition

    • Timbre analysis
      - Instrument identification

    • Note transcription
      - Monophonic notes
      - Polyphonic notes
      - Expression detection (e.g. vibrato, pedal)

    • Structure analysis
      - Musical structure
      - Musical boundary / repetition detection
      - Highlight detection


    We will mainly focus on these topics!

  • Overview of AMT Systems

    • Acoustic model
      - Estimates the target information from the input audio (usually a short segment)

    • Musical knowledge
      - Music theory (e.g. rhythm, harmony) and performance practice (e.g. playability)

    • Prior/lexical model
      - Statistical distribution of score-level music information (e.g. chord progressions)

    (Figure: an acoustic model turns audio-level input into beats/tempo, keys/chords, and notes; musical knowledge and a prior or lexical model constrain the score-level transcription model)

  • Introduction to Rhythm

    • Rhythm
      - A strong, regular, repeated pattern of sound
      - Distinguishes music from speech

    • The most primitive and foundational element of music
      - Melody, harmony, and other musical elements are arranged on the basis of rhythm

    • Humans and rhythm
      - Humans have an innate sense of rhythm: heartbeat, walking
      - Associated with motor control: dance, labor songs

  • Introduction to Rhythm

    • Hierarchical structure of rhythm
      - Beat (tactus): the most prominent level; the foot-tapping rate
      - Division (tatum): the temporal atom, e.g. an eighth or sixteenth note
      - Measure (bar): the unit of a rhythmic pattern (and also of harmonic changes)

    • Notation
      - Tempo: beats per minute, e.g. 90 bpm
      - Time signature: e.g. 4/4, 3/4, 6/8

    [Wikipedia]

  • Human Perception of Tempo

    • McKinney and Moelants (2006)
      - Collected tapping data from 40 human subjects
      - Initial synchronization delay and anticipation (by tempo estimation)
      - Ambiguity in tempo: the beat or its division?

    [D. Ellis' e4896 slides]

  • Overview of Rhythm Transcription Systems

    • Consists of several cascaded tasks that detect moments of musical stress (accents) and their regularity

    (Pipeline: Onset Detection → Tempo Estimation → Beat Tracking, informed by musical knowledge)

  • Onset Detection

    • Identify the starting times of musical events
      - Notes, drum sounds

    • Types of onsets
      - Hard onsets: percussive sounds
      - Soft onsets: source-driven sounds (e.g. singing voice, woodwinds, bowed strings)

    [M. Müller]

  • Example: Onset Detection

    (Waveform plot, 0 to 6 sec: where are the onsets?)

    "Eat (꺼내먹어요)" by Zion.T

  • Onset Detection Systems

    • Onset detection function (ODF)
      - An instantaneous measure of temporal change, often called a "novelty" function
      - Types: time-domain energy, spectral or sub-band energy, phase difference

    • Decision algorithm
      - Rule-based approach
      - Learning-based approach

    (Pipeline: audio representations → onset detection function (feature extraction) → decision algorithm (classifier))

  • Onset Detection Function (ODF)

    • Types of ODFs
      - Time-domain energy
      - Spectral or sub-band energy
      - Phase difference

  • Time-Domain Onset Detection

    • Local energy
      - Onsets usually have high energy
      - Effective for percussive sounds

    • Variants
      - Frame-level energy:

        ODF(n) = E(n) = Σ_m |x(n + m)|² w(m)

      - Half-wave rectified difference:

        ODF(n) = H(E(n + 1) − E(n)),  where  H(r) = (r + |r|)/2 = { r if r ≥ 0; 0 if r < 0 }

    (Plots: waveform, 0 to 6 sec; frame-level energy ODF; half-wave rectified ODF)
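As a concrete sketch of the frame-level energy ODF with half-wave rectification (the window size, hop size, and test signal are illustrative choices, not values from the lecture):

```python
import numpy as np

def energy_odf(x, frame_len=1024, hop=512):
    """Frame-level local energy followed by a half-wave rectified difference."""
    w = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # E(n): windowed energy of each frame
    energy = np.array([
        np.sum((x[n * hop: n * hop + frame_len] ** 2) * w)
        for n in range(n_frames)
    ])
    # ODF(n) = H(E(n+1) - E(n)): keep only energy increases
    return np.maximum(np.diff(energy), 0.0)

# A click train: the energy jumps at each click should produce ODF peaks.
np.random.seed(0)
sr = 22050
x = np.zeros(sr * 2)
for t in (0.25, 0.75, 1.25, 1.75):
    i = int(t * sr)
    x[i:i + 256] = np.random.randn(256)
odf = energy_odf(x)
```

Because of the rectification the ODF is nonnegative, and for this percussive signal its peaks align with the clicks.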

  • Spectral-Based Onset Detection

    • Spectral flux
      - Sum of the positive bin-wise differences of the log spectrogram
      - The ODF changes depending on the amount of compression ρ

      ODF(n) = Σ_{k=0}^{N/2} H(Y(n + 1, k) − Y(n, k)),  where  Y(n, k) = log(1 + ρ |X(n, k)|)  and  X(n, k) is the STFT

    (Plots: spectrogram (frequency in Hz vs. time) and the resulting ODF)
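A minimal sketch of the log-compressed spectral flux (the FFT size, hop size, and compression factor ρ are illustrative, not the lecture's values):

```python
import numpy as np

def spectral_flux(x, n_fft=1024, hop=512, rho=1000.0):
    """Spectral flux: sum of positive bin-wise differences of log(1 + rho*|X|)."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Magnitude STFT, bins 0..n_fft/2
    X = np.array([
        np.abs(np.fft.rfft(x[n * hop: n * hop + n_fft] * w))
        for n in range(n_frames)
    ])
    Y = np.log1p(rho * X)                          # log compression
    diff = np.diff(Y, axis=0)                      # Y(n+1, k) - Y(n, k)
    return np.sum(np.maximum(diff, 0.0), axis=1)   # half-wave rectify, sum over k

# A 440 Hz tone that starts at 0.5 s should produce a flux peak near its onset.
sr = 8000
x = np.zeros(sr)
x[sr // 2:] = np.sin(2 * np.pi * 440.0 * np.arange(sr - sr // 2) / sr)
sf = spectral_flux(x)
```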

  • Phase Deviation

    • The sinusoidal components of a note are continuous while the note is sustained
      - An abrupt change in phase suggests that there may be a new event

    [D. Ellis' e4896 slides]

    • Phase continuation (e.g. during the sustain of a single note):

      φ_k(n) − φ_k(n − 1) ≈ φ_k(n − 1) − φ_k(n − 2)

      Δφ_k(n) = φ_k(n) − 2 φ_k(n − 1) + φ_k(n − 2) ≈ 0

    • Deviation from the steady state, averaged over all N frequency bins:

      ζ_p(n) = (1/N) Σ_{k=1}^{N} |Δφ_k(n)|
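A sketch of the phase-deviation ODF (frame parameters are illustrative): during a steady sinusoid each bin's phase advances linearly across frames, so the second difference stays near zero.

```python
import numpy as np

def phase_deviation(x, n_fft=1024, hop=256):
    """Mean absolute second difference of the unwrapped STFT phase."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    phi = np.array([
        np.angle(np.fft.rfft(x[n * hop: n * hop + n_fft] * w))
        for n in range(n_frames)
    ])
    phi = np.unwrap(phi, axis=0)
    # Delta-phi(n, k) = phi(n) - 2*phi(n-1) + phi(n-2)
    d2 = phi[2:] - 2 * phi[1:-1] + phi[:-2]
    return np.mean(np.abs(d2), axis=1)

# A steady 440 Hz sine: the phase advances linearly, so the deviation stays small.
sr = 8000
x = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
zeta = phase_deviation(x)
```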

  • Post-Processing

    • DC removal
      - Subtract the mean of the ODF

    • Normalization
      - Scale the level of the ODF

    • Low-pass filtering
      - Remove small peaks

    • Down-sampling
      - For data reduction

    (Plot: ODF before and after low-pass filtering; the filtered version is the solid line)

    (Tzanetakis, 2010)

  • Onset Decision Algorithm

    • Rule-based approach: peak detection rules
      - Peaks above a threshold are determined to be onsets
      - The threshold is often computed adaptively from the ODF
      - The mean and the median are popular choices for computing the threshold

      threshold = α + β · median(ODF),  where α is an offset and β a scaling factor
      (median computed over a sliding window, e.g. of size 5)

    (Plot: ODF over 0 to 5 sec with the adaptive threshold overlaid)
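The rule above can be sketched as follows (α, β, and the median window size are illustrative parameter values, not the lecture's):

```python
import numpy as np

def pick_onsets(odf, alpha=1.0, beta=1.5, half=2):
    """Peak picking with an adaptive median threshold.

    A frame is an onset if it is a local maximum and exceeds
    threshold(n) = alpha + beta * median(ODF) over a sliding window.
    """
    onsets = []
    for n in range(1, len(odf) - 1):
        lo, hi = max(0, n - half), min(len(odf), n + half + 1)
        thresh = alpha + beta * np.median(odf[lo:hi])
        if odf[n] > thresh and odf[n] >= odf[n - 1] and odf[n] >= odf[n + 1]:
            onsets.append(n)
    return onsets

# Two clear peaks (frames 3 and 7) stand above the local median threshold.
odf = np.array([0, 1, 0, 8, 1, 0, 1, 9, 1, 0, 1, 0], dtype=float)
onsets = pick_onsets(odf)
```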

  • Challenging Issue in Onset Detection: Vibrato

    (Figure: spectral-flux onset detection on a note with vibrato)

  • SuperFlux

    • A state-of-the-art rule-based onset detection function
      - S. Böck et al., "Maximum Filter Vibrato Suppression for Onset Detection", DAFx, 2013

    • Step 1: log spectrogram
      - Makes the harmonic partials have the same depth of vibrato contour

      Y(n, m) = log(1 + |X(n, k)| · F(k, m)),  where X(n, k) is the STFT and F(k, m) is a filterbank

    • Step 2: max-filtering
      - Take the maximum within a window along the frequency axis
      - The vibrato contours become thicker

      Y_max(n, m) = max(Y(n, m − l : m + l))


  • SuperFlux

    (Figures: log spectrogram and max-filtered log spectrogram)

  • SuperFlux

    • Step 3: SuperFlux ODF
      - Take the difference with some frame distance μ
      - Assumption: the frame rate is high in onset detection (i.e. a small hop size)

      SF*(n) = Σ_k H(Y(n + μ, k) − Y_max(n, k))

      (μ ≥ 1 is derived from the analysis window, the hop size h, and a ratio parameter r, 0 ≤ r ≤ 1)

    • Step 4: peak-picking. Frame n is an onset if:
      1) SF*(n) = max(SF*(n − pre_max : n + post_max))
      2) SF*(n) ≥ mean(SF*(n − pre_avg : n + post_avg)) + δ
      3) n − n_last_onset > combination_width
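The max-filter-plus-lagged-difference idea can be sketched on a toy spectrogram (μ and the filter half-width l are illustrative): a contour that merely wobbles between adjacent bins cancels out, while genuinely new energy still produces a peak.

```python
import numpy as np

def superflux_odf(Y, mu=2, l=1):
    """SuperFlux-style ODF on a (frames x bins) log-magnitude spectrogram.

    Max-filters each frame along frequency (half-width l), then sums the
    positive differences against the max-filtered frame mu steps earlier.
    """
    n_frames, n_bins = Y.shape
    Ymax = np.empty_like(Y)
    for k in range(n_bins):
        Ymax[:, k] = Y[:, max(0, k - l): k + l + 1].max(axis=1)
    diff = Y[mu:] - Ymax[:-mu]               # lagged difference
    return np.sum(np.maximum(diff, 0.0), axis=1)

# Vibrato: energy wobbling between bins 3 and 4 is suppressed...
vib = np.zeros((10, 8))
for n in range(10):
    vib[n, 3 + n % 2] = 1.0
sf_vib = superflux_odf(vib)

# ...while a genuine new event (energy appearing at frame 5) still peaks.
onset = np.zeros((10, 8))
onset[5:, 0] = 1.0
sf_onset = superflux_odf(onset)
```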

  • SuperFlux

    (Figures: max-filtered log spectrogram and peak-picking on the resulting SuperFlux ODF)

  • Tempo Estimation

    • Estimate the regular time interval between beats
      - Tempo is usually treated as a global attribute of a song, e.g. 90 bpm or "a mid-tempo song"

    • Tempo often changes within a song
      - Intentionally, e.g. for dramatic effect ("Top 10 tempo changes")
      - Unintentionally, e.g. in re-mastering or live performance

    • There are also local tempo changes, e.g. rubato

  • Tempo Estimation Methods

    • Auto-correlation
      - Find the periodicity, as in pitch detection

    • Discrete Fourier Transform
      - Apply the DFT to the ODF and find the periodicity

    • Comb-filter banks
      - Leverage the "oscillating nature" of musical beats

  • Auto-Correlation

    • The ACF is a generic method to detect the periodicity of a signal
      - It can therefore be applied to the ODF to find a dominant period that may correspond to the tempo
      - The ACF shows dominant peaks that indicate the dominant tempi

    (Plots: onset detection function (spectral flux) and its auto-correlation)
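A sketch of ACF-based tempo estimation from an ODF (the bpm search range and frame rate are illustrative; this is not a robust estimator):

```python
import numpy as np

def tempo_from_odf(odf, fps, bpm_range=(60, 240)):
    """Estimate tempo as the dominant auto-correlation peak of the ODF.

    fps is the ODF frame rate; the search is restricted to lags whose
    implied tempo lies inside bpm_range.
    """
    odf = odf - odf.mean()                                  # remove DC
    ac = np.correlate(odf, odf, mode="full")[len(odf) - 1:] # lags >= 0
    lag_min = int(round(fps * 60.0 / bpm_range[1]))
    lag_max = int(round(fps * 60.0 / bpm_range[0]))
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return 60.0 * fps / lag

# Synthetic ODF: impulses every 0.5 s at 100 frames/sec, i.e. 120 bpm.
fps = 100
odf = np.zeros(1000)
odf[::50] = 1.0
bpm = tempo_from_odf(odf, fps)
```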

  • Tempo Estimation Using Tempo Prior

    • The tempo is estimated by multiplying the prior with the auto-correlation (the observation)
      - The auto-correlation corresponds to a likelihood function
      - The tempo prior can be calculated from the beat annotations of a dataset
      - The distribution fits a log-normal distribution well

    (Plot: histogram of beat periods from a dataset)

    [D. Ellis' e4896 slides] (Klapuri, 2003)

  • Beat Spectrum

    • Leverages the repetitive nature of music

    • Algorithm (Foote, 2001)
      - Step 1: compute the cosine similarity between every pair of frames of the magnitude spectrogram:

        S(i, j) = (V_i · V_j) / (‖V_i‖ ‖V_j‖)

      - Step 2: sum the elements along the diagonals:

        B(l) = Σ_k S(k, k + l)
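The two steps above can be sketched directly (the per-diagonal normalization is my addition so that diagonals of different lengths are comparable):

```python
import numpy as np

def beat_spectrum(S_mag):
    """Beat spectrum from a (frames x bins) magnitude spectrogram.

    Cosine similarity between all frame pairs, then a sum along each
    diagonal of the similarity matrix, normalized by its length.
    """
    norms = np.linalg.norm(S_mag, axis=1, keepdims=True) + 1e-12
    V = S_mag / norms
    S = V @ V.T                     # S(i, j): cosine similarity of frames i, j
    n = S.shape[0]
    # B(l) = sum_k S(k, k + l), divided by the number of terms per diagonal
    return np.array([np.trace(S, offset=l) / (n - l) for l in range(n)])

# A spectrogram that repeats every 4 frames peaks at lags 0, 4, 8, ...
frames = np.tile(np.eye(4), (8, 1))   # 32 frames, period 4
B = beat_spectrum(frames)
```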

  • Beat Spectrum

    • A more robust version can be obtained from the 2D auto-correlation of the similarity matrix:

      B(k, l) = Σ_{i,j} S(i, j) · S(i + k, j + l)

    • The final beat spectrum is derived by summing over one axis
      - The plot shows five beats, with a triplet within a beat

    • A "beat spectrogram" can also be obtained from successive beat spectra

    (Foote, 2001)

  • Tempogram

    • Models the onset function with a sinusoid as the predominant local periodicity (PLP) (Grosche, 2009)

    • Algorithm
      - Step 1: compute the ODF from the half-wave rectified spectral flux
      - Step 2: find the frequency ŵ and phase φ̂ that maximize the correlation with the ODF, and form a local sinusoidal kernel:

        κ(m) = w(m − n) cos(2π(ŵm − φ̂))

      - Step 3: accumulate the successive local sinusoidal kernels to form a PLP curve
      - Step 4: take the DFT or the auto-correlation

  • Tempogram

    • Cyclic tempogram
      - Accumulates the tempogram over integer multiples of a tempo (up to four octaves)
      - Conceptually similar to the chromagram

    (Grosche, 2011)

  • Comb-Filter Banks

    • Also called resonant filter banks
      - Comb filter equation:

        y(n) = x(n) + α y(n − τ)

    • Builds up rhythmic evidence (by anticipation?)

    (Klapuri, 2006)
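The comb filter equation can be sketched directly; a filter whose delay τ matches the pulse period builds up much more energy than a mismatched one, which is how a bank of these filters scores candidate tempi (the feedback gain α and the toy signal are illustrative):

```python
import numpy as np

def comb_filter(x, tau, alpha=0.9):
    """Resonant comb filter: y(n) = x(n) + alpha * y(n - tau)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (alpha * y[n - tau] if n >= tau else 0.0)
    return y

# A pulse train with period 10 resonates in the matching filter only.
x = np.zeros(200)
x[::10] = 1.0
matched = comb_filter(x, tau=10)
mismatched = comb_filter(x, tau=7)
```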

  • Sub-band Resonant Filter Banks

    • Algorithm (Scheirer, 1998)
      - A sub-band filter bank as front-end processing
      - Parallel ODFs for 6 bands
      - 150 resonators per band, covering all candidate tempo values (60 to 240 bpm)
      - Pick the delay that produces the highest peak as the tempo

  • Beat Tracking

    • Estimate the positions of beats in music
      - Usually a subset of the detected onsets, selected according to the tempo

  • Beat Tracking by the Resonator Model

    • Once the resonator model has chosen the tempo that returns the highest peaks, its output is a sequence of resonated peaks
      - These correspond to the beats

    (Scheirer, 1998)

  • Beat Tracking by Dynamic Programming

    • Find the optimal "hopping" path through the music (Ellis, 2007)
      - C({t_i}): score of the beat sequence {t_i}
      - O(t_i): onset strength function (i.e. the ODF)
      - F(Δt, T): tempo (T) consistency score, e.g. F(Δt, T) = −(log(Δt/T))²

      C({t_i}) = Σ_{i=1}^{N} O(t_i) + α Σ_{i=2}^{N} F(t_i − t_{i−1}, T)

  • Finding the Minimum-Cost-Path

    • Naïve approach
      - Enumerate all paths from A to K, calculate the cost of each, and choose the path with the minimum cost
      - As the number of nodes increases, the number of possible paths grows exponentially

    (Figure: a stage-by-stage weighted graph with nodes A through K and per-edge costs)

  • Dynamic Programming (DP)

    • Observation
      - Suppose the minimum-cost path passes through a node p
      - What is the minimum-cost path from A to p?
      - It is just a sub-path of the minimum-cost path from A to K
      - Thus we do not have to compute the cost from scratch; we can reuse the costs computed at the previous nodes

    (Figure: the same weighted graph as above)

  • Dynamic Programming (DP)

    • The minimum cost is computed by the following recurrence:

      C_k(j) = O_k(j) + min_i { C_{k−1}(i) + c_ij }

      - C_k(j): minimum cost up to node j
      - O_k(j): local cost at node j
      - c_ij: transition cost from node i to node j

    • The minimum-cost path is then found by tracing the computation back

    (Figure: the same weighted graph as above)
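The recurrence and traceback can be sketched on a small stage-by-stage graph (this toy example is mine, not the slide's A..K graph):

```python
def min_cost_path(local_cost, trans_cost):
    """DP for C_k(j) = O_k(j) + min_i { C_{k-1}(i) + c_ij }, with traceback.

    local_cost[k][j]: cost of node j in stage k.
    trans_cost[k][i][j]: cost of moving from node i in stage k to node j
    in stage k+1.
    """
    n_stages = len(local_cost)
    C = [list(local_cost[0])]
    back = []                           # back[k-1][j]: best predecessor of j
    for k in range(1, n_stages):
        row, ptr = [], []
        for j in range(len(local_cost[k])):
            cands = [C[k - 1][i] + trans_cost[k - 1][i][j]
                     for i in range(len(C[k - 1]))]
            best_i = min(range(len(cands)), key=cands.__getitem__)
            row.append(local_cost[k][j] + cands[best_i])
            ptr.append(best_i)
        C.append(row)
        back.append(ptr)
    # Trace back from the cheapest final node
    j = min(range(len(C[-1])), key=C[-1].__getitem__)
    total, path = C[-1][j], [j]
    for k in range(n_stages - 1, 0, -1):
        j = back[k - 1][j]
        path.append(j)
    return total, path[::-1]

# Toy graph: 3 stages of 2 nodes each; the cheap route avoids the costly node.
local = [[0, 0], [1, 5], [0, 0]]
trans = [[[1, 2], [4, 1]], [[1, 1], [1, 1]]]
total, path = min_cost_path(local, trans)
```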

  • Applying DP to Beat Tracking

    • Objective to optimize:

      C({t_i}) = Σ_{i=1}^{N} O(t_i) + α Σ_{i=2}^{N} F(t_i − t_{i−1}, T)

    • DP solution
      - Define C*(t) as the best score up to time t and compute it for every t:

        C*(t) = O(t) + max_τ { α F(t − τ, T) + C*(τ) }

      - Also store the time τ that yields the maximum score:

        P(t) = argmax_τ { α F(t − τ, T) + C*(τ) }

      - At the end of the sequence, trace back through P(t), which returns the best beat sequence {t_i}

    (Plot: ODF with the candidate predecessor τ and the current time t marked)
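An Ellis-style DP beat tracker can be sketched as follows; the tempo is assumed known (e.g. from the estimation stage), and the search window, the weight α, and the clamping of negative scores to zero are illustrative choices, not values from the lecture:

```python
import numpy as np

def track_beats(odf, fps, bpm=120.0, alpha=10.0):
    """DP beat tracking: C*(t) = O(t) + max_tau { alpha*F(t - tau, T) + C*(tau) }
    with F(dt, T) = -(log(dt / T))**2, then a traceback through P(t)."""
    T = fps * 60.0 / bpm                 # beat period in frames
    n = len(odf)
    C = odf.astype(float).copy()         # C*(t), initialized to O(t)
    P = np.full(n, -1)                   # P(t): best predecessor, -1 if none
    for t in range(n):
        lo = max(0, t - int(2 * T))      # search tau in roughly [t-2T, t-T/2]
        hi = t - max(1, int(T / 2))
        if hi <= lo:
            continue
        taus = np.arange(lo, hi)
        dt = (t - taus).astype(float)
        score = C[taus] - alpha * np.log(dt / T) ** 2
        best = int(np.argmax(score))
        if score[best] > 0:              # only chain onto a worthwhile predecessor
            C[t] = odf[t] + score[best]
            P[t] = taus[best]
    # Trace back from the best-scoring frame
    beats = [int(np.argmax(C))]
    while P[beats[-1]] >= 0:
        beats.append(int(P[beats[-1]]))
    return beats[::-1]

# Impulses every 0.5 s at 100 frames/sec: the tracker should land on them.
fps = 100
odf = np.zeros(500)
odf[::50] = 1.0
beats = track_beats(odf, fps, bpm=120.0)
```

Exact-period hops cost nothing (F = 0 when Δt = T), so the tracker chains through the impulse frames spaced 50 frames apart.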

  • Example of DP to Beat Tracking

  • References

    • E. Scheirer, "Tempo and Beat Analysis of Acoustic Musical Signals", 1998
    • J. Foote and S. Uchihashi, "The Beat Spectrum: A New Approach to Rhythm Analysis", 2001
    • G. Tzanetakis, "Musical Genre Classification of Audio Signals", 2002
    • A. Klapuri, "Analysis of the Meter of Acoustic Musical Signals", 2006
    • P. Grosche and M. Müller, "Computing Predominant Local Periodicity Information in Music Recordings", 2009
    • P. Grosche and M. Müller, "Cyclic Tempogram: A Mid-Level Tempo Representation for Music Signals", 2010
    • D. Ellis, "Beat Tracking by Dynamic Programming", 2007
    • S. Böck and G. Widmer, "Maximum Filter Vibrato Suppression for Onset Detection", 2013