47
Meinard Müller, Christof Weiß Audio Features International Audio Laboratories Erlangen [email protected], [email protected] Tutorial T3 A Basic Introduction to Audio-Related Music Information Retrieval

2017 MuellerWeiss ISMIR AudioFeatures

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 2017 MuellerWeiss ISMIR AudioFeatures

Meinard Müller, Christof Weiß

Audio Features

International Audio Laboratories [email protected], [email protected]

Tutorial T3A Basic Introduction to Audio-Related Music Information Retrieval

Page 2: 2017 MuellerWeiss ISMIR AudioFeatures

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

Page 3: 2017 MuellerWeiss ISMIR AudioFeatures

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

Page 4: 2017 MuellerWeiss ISMIR AudioFeatures

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

Page 5: 2017 MuellerWeiss ISMIR AudioFeatures

Chapter 2: Fourier Analysis of Signals

2.1 The Fourier Transform in a Nutshell2.2 Signals and Signal Spaces2.3 Fourier Transform2.4 Discrete Fourier Transform (DFT)2.5 Short-Time Fourier Transform (STFT)2.6 Further Notes

Important technical terminology is covered in Chapter 2. In particular, weapproach the Fourier transform—which is perhaps the most fundamental toolin signal processing—from various perspectives. For the reader who is moreinterested in the musical aspects of the book, Section 2.1 provides a summaryof the most important facts on the Fourier transform. In particular, the notion ofa spectrogram, which yields a time–frequency representation of an audiosignal, is introduced. The remainder of the chapter treats the Fourier transformin greater mathematical depth and also includes the fast Fourier transform(FFT)—an algorithm of great beauty and high practical relevance.

Page 6: 2017 MuellerWeiss ISMIR AudioFeatures

Chapter 3: Music Synchronization

3.1 Audio Features3.2 Dynamic Time Warping3.3 Applications3.4 Further Notes

As a first music processing task, we study in Chapter 3 the problem of musicsynchronization. The objective is to temporally align compatiblerepresentations of the same piece of music. Considering this scenario, weexplain the need for musically informed audio features. In particular, weintroduce the concept of chroma-based music features, which captureproperties that are related to harmony and melody. Furthermore, we study analignment technique known as dynamic time warping (DTW), a concept that isapplicable for the analysis of general time series. For its efficient computation,we discuss an algorithm based on dynamic programming—a widely usedmethod for solving a complex problem by breaking it down into a collection ofsimpler subproblems.

Page 7: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Sinusoids

Time (seconds)

Time (seconds)

Idea: Decompose a given signal into a superpositionof sinusoids (elementary signals).

Signal

Page 8: 2017 MuellerWeiss ISMIR AudioFeatures

Each sinusoid has a physical meaningand can be described by three parameters:

Fourier Transform

Sinusoids

Time (seconds)

Page 9: 2017 MuellerWeiss ISMIR AudioFeatures

Each sinusoid has a physical meaningand can be described by three parameters:

Fourier Transform

Sinusoids

Time (seconds)

Time (seconds)

Signal

Page 10: 2017 MuellerWeiss ISMIR AudioFeatures

Each sinusoid has a physical meaningand can be described by three parameters:

Fourier Transform

Frequency (Hz)

Fourier transform

1 2 3 4 5 6 7 8

1

0.5

Time (seconds)

Signal

Page 11: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: Superposition of two sinusoids

Page 12: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by piano

Page 13: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by trumpet

Page 14: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by violin

Page 15: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Frequency (Hz)

Example: C4 played by flute

Time (seconds)

Page 16: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Frequency (Hz)

Example: C-major scale (piano)

Time (seconds)

Page 17: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Signal

Fourier representation

Fourier transform

Page 18: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Tells which frequencies occur, but does not tell when the frequencies occur.

Frequency information is averaged over the entiretime interval.

Time information is hidden in the phase

Signal

Fourier representation

Fourier transform

Page 19: 2017 MuellerWeiss ISMIR AudioFeatures

Fourier Transform

Frequency (Hz)

Time (seconds) Time (seconds)

Frequency (Hz)

Page 20: 2017 MuellerWeiss ISMIR AudioFeatures

Idea (Dennis Gabor, 1946):

Consider only a small section of the signal for the spectral analysis

→ recovery of time information

Short Time Fourier Transform (STFT)

Section is determined by pointwise multiplication of the signal with a localizing window function

Short Time Fourier Transform

Page 21: 2017 MuellerWeiss ISMIR AudioFeatures

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Page 22: 2017 MuellerWeiss ISMIR AudioFeatures

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Page 23: 2017 MuellerWeiss ISMIR AudioFeatures

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Page 24: 2017 MuellerWeiss ISMIR AudioFeatures

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Page 25: 2017 MuellerWeiss ISMIR AudioFeatures

Definition

Signal

Window function ( , )

STFT

with for

Short Time Fourier Transform

Page 26: 2017 MuellerWeiss ISMIR AudioFeatures

Short Time Fourier TransformIntuition:

Freq

uenc

y(H

z)

Time (seconds)

4

8

12

1 2 3 4 5 60

0

is “musical note” of frequency ω centered at time t Inner product measures the correlation

between the musical note and the signal

Page 27: 2017 MuellerWeiss ISMIR AudioFeatures

Time-Frequency Representation

Time (seconds)

SpectrogramFr

eque

ncy

(Hz)

Page 28: 2017 MuellerWeiss ISMIR AudioFeatures

Time-Frequency Representation

Time (seconds)

Freq

uenc

y(H

z)Spectrogram

Frequency (Hz)

Page 29: 2017 MuellerWeiss ISMIR AudioFeatures

Time-Frequency Representation

Time (seconds)

Freq

uenc

y(H

z)

Intensity

Spectrogram

Frequency (Hz)

Page 30: 2017 MuellerWeiss ISMIR AudioFeatures

Time-Frequency Representation

Time (seconds)

SpectrogramFr

eque

ncy

(Hz)

Intensity

Page 31: 2017 MuellerWeiss ISMIR AudioFeatures

Time-Frequency Representation

Time (seconds)

SpectrogramFr

eque

ncy

(Hz)

Intensity (dB)

Page 32: 2017 MuellerWeiss ISMIR AudioFeatures

Time-Frequency RepresentationSignal and STFT with Hann window of length 20 ms

Freq

uenc

y(H

z)

Time (seconds)

Page 33: 2017 MuellerWeiss ISMIR AudioFeatures

Time-Frequency RepresentationSignal and STFT with Hann window of length 100 ms

Freq

uenc

y(H

z)

Time (seconds)

Page 34: 2017 MuellerWeiss ISMIR AudioFeatures

Size of window constitutes a trade-off between time resolution and frequency resolution:

Large window : poor time resolutiongood frequency resolution

Small window : good time resolutionpoor frequency resolution

Heisenberg Uncertainty Principle: there is nowindow function that localizes in time andfrequency with arbitrary precision.

Time-Frequency RepresentationTime-Frequency Localization

Page 35: 2017 MuellerWeiss ISMIR AudioFeatures

Audio Features

C124

C236

C348

C460

C572

C684

C796

C8108

Example: Chromatic scaleFr

eque

ncy

(Hz)

Inte

nsity

(dB

)

Inte

nsity

(dB

)

Freq

uenc

y(H

z)

Time (seconds)

Spectrogram

Page 36: 2017 MuellerWeiss ISMIR AudioFeatures

Audio Features

C124

C236

C348

C460

C572

C684

C796

C8108

Example: Chromatic scaleFr

eque

ncy

(Hz)

Inte

nsity

(dB

)

Inte

nsity

(dB

)

Freq

uenc

y(H

z)

Time (seconds)

Spectrogram

Page 37: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesExample: Chromatic scale

C4: 262 HzC5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

C3: 131 Hz

Inte

nsity

(dB

)

Time (seconds)

SpectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Page 38: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesExample: Chromatic scale

C4: 262 Hz

C5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

C3: 131 Hz Inte

nsity

(dB

)

Time (seconds)

Log-frequency spectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Page 39: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesExample: Chromatic scale

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Time (seconds)

Log-frequency spectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Page 40: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesExample: Chromatic scale

Chroma C

Inte

nsity

(dB

)

Pitc

h (M

IDI n

ote

num

ber)

Time (seconds)

Log-frequency spectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Page 41: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesExample: Chromatic scale

Chroma C#

Inte

nsity

(dB

)

Pitc

h (M

IDI n

ote

num

ber)

Time (seconds)

Log-frequency spectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Page 42: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesExample: Chromatic scale

Chromagram

Inte

nsity

(dB

)

Time (seconds)

Chr

oma

C124

C236

C348

C460

C572

C684

C796

C8108

Page 43: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesChroma features

Chromatic circle Shepard’s helix of pitch

Page 44: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesChroma features

Freq

uenc

y(p

itch)

Time (seconds)

Page 45: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesChroma features

Freq

uenc

y(p

itch)

Time (seconds)

Page 46: 2017 MuellerWeiss ISMIR AudioFeatures

Audio FeaturesChroma features

Time (seconds)

Chr

oma

Page 47: 2017 MuellerWeiss ISMIR AudioFeatures

Audio Features

There are many ways to implement chroma features

Properties may differ significantly

Appropriateness depends on respective application

Chroma Toolbox (MATLAB)https://www.audiolabs-erlangen.de/resources/MIR/chromatoolbox

LibROSA (Python)https://librosa.github.io/librosa/

Feature learning: “Deep Chroma”[Korzeniowski/Widmer, ISMIR 2016]