53
MIRtoolbox: Sound and music analysis of audio recordings using Matlab MUS4831, Olivier Lartillot, 26.10.2017

MIRtoolbox - UiO

  • Upload
    others

  • View
    31

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MIRtoolbox - UiO

MIRtoolbox: Sound and music analysis of

audio recordings using MatlabMUS4831, Olivier Lartillot, 26.10.2017

Page 2: MIRtoolbox - UiO

Part 1

• MIRtoolbox overview

• Basic signal processing operators

• Auditory models

• Pitch estimation

• Timbral descriptions

Page 3: MIRtoolbox - UiO

Part 2 (in 2 weeks)

• Rhythm, metrical structure

• Tonal analysis

• Segmentation, structure

• Statistical descriptions, similarity

• Music & emotion

• Advanced use

Page 4: MIRtoolbox - UiO

Lecture-Workshop

• Lecture slides in PDF

• Workshop handout

• We will install MIRtoolbox together…

• Sound examples

• Useful: MIRtoolbox User’s Guide

Page 5: MIRtoolbox - UiO

1. Overview & syntax

Page 6: MIRtoolbox - UiO

General Principles• Why did we create MIRtoolbox?

• Research project about music & emotion

• Analysis tool for students from various background

• Modular framework: Building blocks

• Simple and adaptive syntax

• User can focus on the general design.

• MIRtoolbox takes care of the technical details.

• Free software, open source

• One standard tool for MIR study and research (10000s downloads)

Page 7: MIRtoolbox - UiO

MIRtoolbox Features

miraudio

mirinharmonicitymirrms mirlowenergy

mirplay

mirsave

mirlength

mirfilterbankmirframe

mirsegment

mirsum

mirnovelty mirpeaks

mirgetdata

mirexport

mirstat

mirzerocross

mirhisto

mircentroid

mirspread

mirskewness

mirkurtosis

mirflatness

mircluster

mirclassify

mirfeatures

mirzerocross

mirdist

mirquery

mirpeaks mirtempomirautocor

mirspectrum

*

mirpulseclarity

mirattackslopemirpeaks

mirattacktime

mireventdensity

mirsimatrix

mirtonalcentroid mirflux mirhcdf

mirpeaks

mirmode

mirkey

mirkeysom

mirchromagram mirkeystrength

mirpeaks mirpitch*

mirautocor

mircepstrum

mirpeaks

mirrolloff

mirbrightness

mirmfcc

mirroughness

mirregularity

mirspectrum

mirflux

mironsetsmirenvelope

mirbeatstrength

mirfluctuationmirspectrum

Page 8: MIRtoolbox - UiO

Let’s now install MIRtoolbox…

Requires:

• Matlab,

• Signal Processing toolbox,

• Statistics and Machine Learning toolbox

http://bit.ly/mirtoolbox

Page 9: MIRtoolbox - UiO

Basic Operationsmiraudio(’ragtime.wav’)

miraudio(’Folder’) ‘Folder’ = all files in Current Directory

.wav.mp3.mp4.m4a.ogg.flac.au

Page 10: MIRtoolbox - UiO

miraudio(..., ‘Extract’) extraction options

• miraudio(..., ‘Extract’, 1, 2)extracts signal from 1 s to 2 s after the start

a = miraudio(‘ragtime.wav’)b = miraudio(a, ‘Extract’,1,2)

mirplay(b)b = miraudio(‘ragtime.wav’, ‘Extract’,1,2)

mirsave(b)

Page 11: MIRtoolbox - UiO

miraudio(..., ‘Trim’) trimming options

• miraudio(‘ragtime.wav’, ‘Trim’)

trims (pseudo-)silence at start and end

• miraudio(..., ‘TrimStart’) at start only

• miraudio(..., ‘TrimEnd’) at end only

• miraudio(..., ‘TrimThreshold’, t)

specifies the silence threshold t = .06

Silent frames have RMS amplitude below t times the medium RMS amplitude of the whole audio file.

Page 12: MIRtoolbox - UiO

2. Basic signal processing operators

Page 13: MIRtoolbox - UiO

mirspectrum(‘trumpet.wav’) Discrete Fourier Transform

of audio signal x:

• mirspectrum(..., ‘Min’, f1) f1=0 Hz

• mirspectrum(..., ‘Max’, f2) f2=sampling rate/2

• mirspectrum(..., ‘Window’, ‘hamming’ )

0 1000 2000 3000 4000 5000 60000

200

400

600

800

1000Spectrum

frequency (Hz)

magnitude

f2f1

Xk =

�����

N�1X

n=0

xne� 2⇡i

N kn

����� , k = 0, . . . , N/2

Page 14: MIRtoolbox - UiO

mirspectrum(…, ‘Terhardt’) auditory model: outer-ear filter

0 0.5 1 1.5 2 2.5

x 104

0

200

400Spectrum

frequency (Hz)

magnitude

0 0.5 1 1.5 2 2.5

x 104

0

50

100Spectrum

frequency (Hz)

magnitude

based on MA toolbox

• mirspectrum

• mirspectrum(..., ‘Terhardt’)

5000Hz

5000Hz

Page 15: MIRtoolbox - UiO

mirspectrum(…, ‘Mel’) auditory model: Mel-band spectrum

based on Auditory toolbox

mfcc

28

mfcc

PurposeMel-frequency cepstral coefficient transform of an audio signal

Synopsis[ceps,freqresp,fb,recon] = mfcc(input, samplingRate)

DescriptionFind the cepstral coefficients (ceps) corresponding to the input. Three other quanti-ties are optionally returned that represent the detailed FFT magnitude (freqresp), the log10 mel-scale filter bank output (fb), and the reconstruction of the filter bank output by inverting the cosine transform.

The sequence of processing includes for each chunk of data:Window the data with a hamming window,Shift it into FFT order,Find the magnitude of the FFT,Convert the FFT data into filter bank outputs,Find the log base 10,Find the cosine transform to reduce dimensionality.

The filter bank is constructed using 13 linearly-spaced filters (133.33Hz between center frequencies,) followed by 27 log-spaced filters (separated by a factor of 1.0711703 in frequency.) Each filter is constructed by combining the amplitude of FFT bin as shown in the figure below.

The forty filters look like this.

CF - 133Hz or CF CF + 133Hz orCF/1.0718 CF*1.0718

Frequency

1030

0.005

0.01

Frequency

0 5 10 15 20 25 30 35 400

5

10Mel!Spectrum

Mel bands

ma

gn

itu

de

• frequency bands equally spaced on mel scale

• in each mel band, perceptually same pitch range

Page 16: MIRtoolbox - UiO

mirspectrum(…, ‘Bark’) auditory model: Bark-band spectrum

based on MA toolbox

0 5 10 15 20 250

5000

10000Bark!Spectrum

Bark bands

ma

gn

itu

de

• another similar auditory model, decomposing the frequency axis into bands

Page 17: MIRtoolbox - UiO

mirspectrum pitch-based distribution

• mirspectrum(..., ‘Cents’)

6000 7000 8000 9000 10000 11000 12000 130000

500

1000Spectrum

pitch (in midicents)

mag

nitu

de0 2000 4000 6000 8000 10000 12000

0

500

1000Spectrum

frequency (Hz)

mag

nitu

de

• mirspectrum(‘...’)

0 200 400 600 800 1000 12000

1000

2000Spectrum

Cents

magnitude• mirspectrum(...,

‘Collapsed’)

0000

0000

0000

0000

0000

00

0000

0000

0000

C D E F G A B C

Page 18: MIRtoolbox - UiO

mirautocor autocorrelation function

!"#$% !"#& !"#&% !"#' !"#'% !"#( !"#(% !") !")!% !")*

!!"%

!

!"%

+,-./012345/67

8.7409:;

27<=.8,-4

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05!0.05

0

0.05

0.1

0.15Waveform autocorrelation

lag (s)

co

eff

icie

nts

x

j

xR x

miraudio(‘trumpet.wav’)

mirautocor(‘trumpet.wav’)

Page 19: MIRtoolbox - UiO

mirautocor autocorrelation function

• mirautocor(…, ‘Min’, t1, ‘s’) t1=0 s

• mirautocor(..., ‘Max’, t2, ‘s’) t2=.05 s (audio) or t2=2 s (envelope)

• mirautocor(..., ‘Freq’) lags in Hz.

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05!0.05

0

0.05

0.1

0.15Waveform autocorrelation

lag (s)

co

eff

icie

nts

Page 20: MIRtoolbox - UiO

mirautocor(..., ‘Compres’) “compressed” autocorrelation

mirautocor(‘Cmaj.wav’, ’Freq’)

mirautocor(‘Cmaj.wav’, ’Freq’, ‘Compres’)

• Autocorrelation (by default):

• y = IDFT(|DFT(x)|2)

• “Compressed” autocorrelation:

• y = IDFT(|DFT(x)|k)

• mirautocor(…, ‘Compres’, k) k=.67

Tolonen, Karjalainen. A Computationally Efficient Multipitch Analysis Model, IEEE Transactions on Speech and Audio Processing, 8(6), 2000.

Page 21: MIRtoolbox - UiO

0 0.002 0.004 0.006 0.008 0.01 0.012 0.014!0.01

0

0.01

0.02

0.03

0.04

0.05

0.06

0 0.002 0.004 0.006 0.008 0.01 0.012 0.0140

0.005

0.01

0.015

0.02

0.025

0.03

0 0.002 0.004 0.006 0.008 0.01 0.012 0.0140

0.005

0.01

0.015

0.02

0.025

0.03

0 0.002 0.004 0.006 0.008 0.01 0.012 0.0140

0.005

0.01

0.015

0.02

0.025

0.03

0 0.002 0.004 0.006 0.008 0.01 0.012 0.0140

0.005

0.01

0.015

0.02

0.025

0.03

0 0.002 0.004 0.006 0.008 0.01 0.012 0.0140

0.005

0.01

0.015

0.02

0.025

0.03

mirautocor(..., ‘Enhanced’) enhanced autocorrelation

• mirautocor(‘Amin3’, ‘Enhanced’, 2:10)

Tolonen, Karjalainen. A Computationally Efficient Multipitch Analysis Model, IEEE Transactions on Speech and Audio Processing, 8(6), 2000.

234567

mirautocor(‘Cmaj.wav’, ’Freq’, ‘Compres’, ‘Enhanced’)

mirautocor(‘Cmaj.wav’, ’Freq’, ‘Compres’, ‘Enhanced’, 2:20)

Page 22: MIRtoolbox - UiO

• mirframe(..., ‘WinLength’, l, ‘s’) unit: ‘s’ (seconds), ‘sp’ (samples)

• mirframe(..., ‘Hop’, h, ‘/1’) unit: ‘/1’ (ratio from 0 to 1),‘%’ (percentage), ‘s’, ‘sp’

• mirframe(‘.., l, ‘s’, h, ‘/1’)

mirframe frame decomposition

0 1 2 3 4 5 6!0.4

!0.3

!0.2

!0.1

0

0.1

0.2

0.3Audio waveform

time (s)

am

plit

ude

h

llmirframe(‘ragtime.wav’,1,.5)

Page 23: MIRtoolbox - UiO

mirframe syntax

mirframef

miraudioa

a = miraudio(‘mysong’)

f = mirframe(a)

or: s = mirspectrum(‘mysong’, ‘Frame’)

mirspectrums

f = mirframe(‘mysong’) s = mirspectrum(f)

Page 24: MIRtoolbox - UiO

•miraudio(..., ‘Frame’, l, ‘s’, h, ‘/1’)

•mirspectrum(..., ‘Frame’, l, ‘s’, h, ‘/1’)

•mirspectrum(’mysong’, ’Frame’, 1, .5, ’Mel’)

‘Frame’ option syntax

Page 25: MIRtoolbox - UiO

mirflux distance between successive frames

s = mirspectrum(a, ‘Frame’)

mirflux(s)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.01

0.02

0.03

frames

co

eff

icie

nt

va

lue

(in

)

Spectral flux

• mirflux(a) = mirflux(mirspectrum(a, ‘Frame’, .05, .5))

• ac = mirautocor(a, ‘Frame’), mirflux(ac)

• mirflux(..., ‘Dist’, d) d = ‘Euclidean’, ‘City’, ‘Cosine’

Page 26: MIRtoolbox - UiO

mirrms root mean square

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.01

0.02

0.03

0.04

frames

coeffic

ient valu

e (

in )

RMS energy

mirrms(‘ragtime.wav’)

The RMS energy related to file ragtime is 0.017932

mirrms(‘ragtime.wav’, ‘Frame’)

Default frame size .05 s, frame hop = .5

Page 27: MIRtoolbox - UiO

mirenvelope envelope extraction

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.005

0.01

0.015

0.02

0.025Envelope

time (s)

am

plit

ud

e

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.1

!0.05

0

0.05

0.1Audio waveform

time (s)

am

plit

ud

e

a:

e = mirenvelope(a) mirplay(e)mirsave(e)

Page 28: MIRtoolbox - UiO

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.005

0.01

0.015

0.02

0.025Envelope

time (s)

am

plit

ude

e = mirenvelope(a)

mirrms(a, ‘Frame’)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.01

0.02

0.03

0.04

frames

co

eff

icie

nt

va

lue

(in

)

RMS energy

Page 29: MIRtoolbox - UiO

mirenvelope(..., ‘Filter’) based on low-pass filtering

mirenvelope(..., ‘Tau’, .02): time constant (in s.)

Down-Sampling N mirenvelope(..., ‘PostDecim’, N) N=16

mirenvelope(..., ‘Sampling’, f)

Low-Pass Filter

LPF

Full-wave rectificationabs0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

!0.1

!0.05

0

0.05

0.1Audio waveform

time (s)

am

plit

ud

e

Page 30: MIRtoolbox - UiO

mirenvelope post-processing options

• mirenvelope(..., ‘Center’)

‘HalfWaveCenter’)

‘Diff’)

‘HalfWaveDiff’)

• mirenvelope(..., ‘Power’)

• mirenvelope(..., ‘Normal’)

• mirenvelope(..., ‘Smooth’,o) moving average, order o = 30

• mirenvelope(..., ‘Gauss’,o) gaussian, std deviation o = 30 sp

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.02

0

0.02Envelope

time (s)

am

plit

ud

e

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.01

0.02Envelope (half!wave rectified)

time (s)

am

plit

ude

! !"# $ $"# % %"# & &"# ' '"#!$

!

$

%

()$!!' *+,,-.-/0+10-2)-/3-456-

0+7-)89:

1764+0;2-

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

2

4x 10

!4 Envelope (half!wave rectified)

time (s)a

mp

litu

de

mirplay

Page 31: MIRtoolbox - UiO

MakeERBFilters

24

MakeERBFilters

PurposeDesign the filters needed to implement an ERB cochlear model.

Synopsisfcoefs = MakeERBFilters(fs,numChannels,lowFreq)

DescriptionThis function computes the filter coefficients for a bank of Gammatone filters. These filters were defined by Patterson and Holdworth for simulating the cochlea.

The result is returned as an array of filter coefficients. Each row of the filter arrays contains the coefficients for four second order filters. The transfer function for these four filters share the same denominator (poles) but have different numerators (zeros). All of these coefficients are assembled into one vector that the ERBFilterBank func-tion can take apart to implement the filter.

The filter bank contains numChannels channels that extend from half the sampling rate (fs) to lowFreq.

Note this implementation fixes a problem in the original code. The original version calculated a single eighth-order polynomial representing the entire filter function. This new version computes four separate second order filters. This avoids a problem with round off errors in cases with very small characteristic frequencies (<100Hz) and large sample rates (>44kHz). The problem is caused by roundoff error when a number of poles are combined, all very close to the unit circle. Small errors in the eighth-order coefficient, are magnified when the eighth root is taken to give the pole location. These small errors lead to poles outside the unit circle and instability. Thanks to Julius Smith for pointer me to the proper explanation.

ExamplesThe ten ERB filters between 100 and 8000Hz are computed using

»fcoefs = MakeERBFilters(16000,10,100);

The resulting frequency response is given by»y = ERBFilterBank([1 zeros(1,511)], fcoefs);

»resp = 20*log10(abs(fft(y')));

»freqScale = (0:511)/512*16000;

»semilogx(freqScale(1:255),resp(1:255,:));

»axis([100 16000 -60 0])

»xlabel('Frequency (Hz)');

»ylabel('Filter Response (dB)');

A simple cochlear model can be formed by filtering an utterance with these filters. To convert this data into an image we pass each row of the cochleagram through a half-wave-rectifier, a low-pass filter, and then decimate by a factor of 100. A cochleagram of the ‘A huge tapestry hung in her hallway’ utterance from the TIMIT database

102

103

104

-60

-40

-20

0

• mirfilterbank(..., ‘Gammatone’)

Equivalent Rectangular Bandwidth (ERB) Gammatone filterbank

• f = mirfilterbank(..., ‘NbChannels’, N=10)

• mirplay(f)

miraudio

filterN

filter2filter1

...

Filter frequency responses

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.01

0

0.01

1

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.05

0

0.05

2

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.1

0

0.1

3

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.1

0

0.1

4

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.05

0

0.05

5

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.02

0

0.02

6

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.02

0

0.02

7

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.01

0

0.01

8

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!2

0

2x 10

!3

9

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!1

0

1x 10

!3 Audio waveform

10

Hz

dB

mirfilterbankfilterbank decomposition

Page 32: MIRtoolbox - UiO

mirfilterbank filterbank decomposition

• mirfilterbank(..., ‘Manual’, [44, 88, 176, 352, 443])

0 0.005 0.01 0.015 0.02 0.025

−40

−30

−20

−10

0

N li d F ( d/ l )

Mag

nitu

de (d

B)

Magnitude Response (dB)

• mirfilterbank(..., ‘Manual’, [-Inf 200 400 800 1600 Inf])

0.02 0.04 0.06 0.08 0.1 0.12

−50

−40

−30

−20

−10

0

N li d F ( d/ l )

Mag

nitu

de (d

B)

Magnitude Response (dB)

Page 33: MIRtoolbox - UiO

mirsum across-channels summation

e = mirenvelope(f)

a

e 0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.1

0.2

1

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.5

1

2

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.2

0.4

3

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.05

0.1

4

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

5x 10

!3 Envelope

5

mirenvelope

mirfilterbank f = mirfilterbank(a)0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

!1

0

1

1

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!5

0

5

2

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!2

0

2

3

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!1

0

1

4

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.1

0

0.1Audio waveform (centered)

5

f

mirsum

s

s = mirsum(e)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.08

!0.06

!0.04

!0.02

0

0.02

0.04

0.06Envelope

time (s)

am

plit

ud

e

Page 34: MIRtoolbox - UiO

mirsum stereo summation

e = mirenvelope(a)

miraudioa

a = miraudio(..., ‘Mono’, 0)

emirenvelope

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.01

0.02

0.03

time (s)

1

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.01

0.02

0.03Envelope

2

mirsum

s

s = mirsum(e)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.01

0.02

0.03

0.04

0.05Envelope

time (s)

ampl

itude

Page 35: MIRtoolbox - UiO

mirsum summary

e = mirenvelope(f)

amirfilterbank

ff = mirfilterbank(a)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!1

0

1

1

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!5

0

5

2

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!2

0

2

3

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!1

0

1

4

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5!0.1

0

0.1Audio waveform (centered)

5

e0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

0

0.1

0.2

1

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.5

1

2

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.2

0.4

3

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

0.05

0.1

4

0 0.5 1 1.5 2 2.5 3 3.5 4 4.50

5x 10

!3 Envelope

5

mirenvelope

ac = mirautocor(e)ac

mirautocor

0 0.2 0.4 0.6 0.8 1 1.2 1.40

1

2x 10

!7

1

0 0.2 0.4 0.6 0.8 1 1.2 1.4!5

0

5x 10

!6

2

0 0.2 0.4 0.6 0.8 1 1.2 1.4!1

0

1x 10

!6

3

0 0.2 0.4 0.6 0.8 1 1.2 1.4!5

0

5x 10

!8

4

0 0.2 0.4 0.6 0.8 1 1.2 1.4

!0.50

0.5

x 10!10 Envelope autocorrelation

5

mirsums

s = mirsum(e)

0 0.2 0.4 0.6 0.8 1 1.2 1.4!5

0

5x 10

!6 Envelope autocorrelation

lag (s)

coeffic

ients

Page 36: MIRtoolbox - UiO

mirpeaks peak picking

• p = mirpeaks(..., ‘Total’, 3, ‘NoBegin’)

• To get peak positions:

• mirgetdata(p)

• To get peak amplitudes:

• get(p, ‘PeakVal’)

Page 37: MIRtoolbox - UiO

default: t=0

default: c=.1

! " #! #" $! $" %! %" &!!

"

#!

#"

$!'()!*+(,-./0

'()123456

03748-/5(

*

* OKOK

OKnot OK

mirpeaks parameters specification

• mirpeaks(..., ‘Threshold’, t)

• mirpeaks(..., ‘Contrast’, c)

t

1

00

1

c

Page 38: MIRtoolbox - UiO

3. Feature extractors

• Pitch / f0

• Timbre

• Tempo

• Tonality

• Segmentation(Wed

nesd

ay)

Page 39: MIRtoolbox - UiO

mirpitch pitch estimation

mirautocor

mirfilterbank

mirframe

mirpeaks

mirsum

mirpitch(...,

‘2Channels’,

‘Enhanced’, 2:10, ‘Compress’, .5

‘Total’, Inf, ‘Min’, 75, ‘Max’, 2400, ‘Contrast’, .1, ‘Threshold’, .4)

or ‘NoFilterbank’,

Page 40: MIRtoolbox - UiO

Timbre

• Zero-crossing rate: mirzerocross

• Spectral distribution: mirrolloff, mirbrightness, mircentroid, mirspread, …

• Mel-Frequency Cepstral Coefficients: mirmfcc

• Sensory Dissonance: mirroughness

• mirregularity

Page 41: MIRtoolbox - UiO

mirzerocross waveform sign-change rate

• Is supposed to indicate how noisy the sound is.

• But highly dependent on the presence of low or high frequency components in the sound.

1.701 1.702 1.70& 1.704 1.705 1.706 1.707 1.708 1.709

!0.0&

!0.02

!0.01

0

0.01

0.02

0.0&

Audio waveform

time (s)

am

plit

ude

Page 42: MIRtoolbox - UiO

mirrolloff high-frequency energy (I)

• mirrolloff(..., ‘Threshold’, .85)

0 2000 4000 6000 8000 10000 120000

500

1000

1500

2000

2500

3000

3500

Spectrum

frequency (Hz)

magnitude 85% of the energy

5640.53 Hz

Page 43: MIRtoolbox - UiO

mirbrightness high-frequency energy (II)

• mirbrightness(..., ‘CutOff’, 1500) (in Hz)

• mirbrightness(..., ‘Unit’, u) u = ‘/1’ or ‘%’

0 2000 4000 6000 8000 10000 120000

500

1000

1500

2000

2500

3000

3500

Spectrum

frequency (Hz)

magnitude

1500 Hz

53.96% of the energy

Page 44: MIRtoolbox - UiO

mircentroid geometric center of spectral distribution

µ1 =�

xf(x)dx

µ1

Page 45: MIRtoolbox - UiO

mirspread spectral dispersion

⇥2 = µ2 =�

(x� µ1)2f(x)dxsecond moment:

µ1

Page 46: MIRtoolbox - UiO

mirkurtosis spectral pickiness

< 0 > 00-2

mirflatness smooth vs. spiky

geometric mean

arithmetic mean

N

⌅⇤N�1n=0 x(n)

�PN�1n=0 x(n)

N

Page 47: MIRtoolbox - UiO

mirmfcc mel-frequency cepstral coefficients

• Description of spectral shape.

mirspectrum (‘Mel’)Abs LogDiscreteCosine

Transform

Page 48: MIRtoolbox - UiO

mirroughness sensory dissonance

• mirroughness(..., ‘Sethares’)

Dissonance produced by two sinusoids

depending on their frequency ratio

mirspectrum mirpeaks dissonanceestimated for

each pair of peaks

sum

Page 49: MIRtoolbox - UiO

mironsets onset detection function

time (s)0.5 1 1.5 2 2.5 3 3.5 4

ampl

itude

0.2

0.4

0.6

0.8

1Onset curve (Envelope)

• mironsets• mirpeaks(mirsum(mirspectrum(…, ‘Frame’)))

• mironsets(…, ‘Filter’)• mirpeaks(mirsum(mirenvelope(mirfilterbank(…, ‘NbChannels’, 40))))

• mironsets(…, ‘SpectralFlux’)• mirpeaks(mirflux(…, ‘Inc’, ‘Halfwave’))

mironsets(…, ‘Contrast’,

…)

Page 50: MIRtoolbox - UiO

mironsets(..., ‘Attack’)

time (s)0.5 1 1.5 2 2.5 3 3.5 4

ampl

itude

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Onset curve (Envelope)

Page 51: MIRtoolbox - UiO

mirattackslope average slope of note attacks

• o = mironsets(‘george.wav’, …)

• mirattackslope(o)

4 5 6 7 8 9

!0.5

0

0.5

1

1.5

2

2.5

Envelope (centered)

time (s)

am

plit

ude

Page 52: MIRtoolbox - UiO

mirattackleap amplitude of note attacks

• o = mironsets(‘george.wav’, …)

• mirattackleap(o)

4 5 6 7 8 9

!0.5

0

0.5

1

1.5

2

2.5

Envelope (centered)

time (s)

am

plit

ude

Page 53: MIRtoolbox - UiO

Part 2 (in 2 weeks)

• Rhythm, metrical structure

• Tonal analysis

• Segmentation, structure

• Statistical descriptions, similarity

• Music & emotion

• Advanced use