MIRtoolbox: Sound and music analysis of audio recordings using Matlab
MUS4831, Olivier Lartillot, 26.10.2017
Part 1
• MIRtoolbox overview
• Basic signal processing operators
• Auditory models
• Pitch estimation
• Timbral descriptions
Part 2 (in 2 weeks)
• Rhythm, metrical structure
• Tonal analysis
• Segmentation, structure
• Statistical descriptions, similarity
• Music & emotion
• Advanced use
Lecture-Workshop
• Lecture slides in PDF
• Workshop handout
• We will install MIRtoolbox together…
• Sound examples
• Useful: MIRtoolbox User’s Guide
1. Overview & syntax
General Principles
• Why did we create MIRtoolbox?
• Research project about music & emotion
• Analysis tool for students from various backgrounds
• Modular framework: Building blocks
• Simple and adaptive syntax
• User can focus on the general design.
• MIRtoolbox takes care of the technical details.
• Free software, open source
• One standard tool for MIR study and research (tens of thousands of downloads)
MIRtoolbox Features
[Diagram: map of MIRtoolbox operators and features. Functions shown:]
miraudio, mirplay, mirsave, mirlength, mirframe, mirfilterbank, mirsegment, mirsum, mirpeaks, mirgetdata, mirexport, mirstat, mirhisto
mirspectrum, mircepstrum, mirautocor, mirenvelope, mirflux, mirnovelty, mirsimatrix
mirrms, mirlowenergy, mirzerocross, mirrolloff, mirbrightness, mirmfcc, mirroughness, mirregularity, mirinharmonicity
mircentroid, mirspread, mirskewness, mirkurtosis, mirflatness
mironsets, mirattackslope, mirattacktime, mireventdensity, mirtempo, mirpulseclarity, mirbeatstrength, mirfluctuation
mirpitch, mirchromagram, mirkeystrength, mirkey, mirkeysom, mirmode, mirtonalcentroid, mirhcdf
mirfeatures, mircluster, mirclassify, mirdist, mirquery
Let’s now install MIRtoolbox…
Requires:
• Matlab,
• Signal Processing toolbox,
• Statistics and Machine Learning toolbox
http://bit.ly/mirtoolbox
Basic Operations
miraudio(‘ragtime.wav’)
miraudio(‘Folder’) ‘Folder’ = all files in Current Directory
File formats: .wav, .mp3, .mp4, .m4a, .ogg, .flac, .au
miraudio(..., ‘Extract’) extraction options
• miraudio(..., ‘Extract’, 1, 2) extracts signal from 1 s to 2 s after the start
a = miraudio(‘ragtime.wav’)
b = miraudio(a, ‘Extract’, 1, 2)
or: b = miraudio(‘ragtime.wav’, ‘Extract’, 1, 2)
mirplay(b)
mirsave(b)
miraudio(..., ‘Trim’) trimming options
• miraudio(‘ragtime.wav’, ‘Trim’)
trims (pseudo-)silence at start and end
• miraudio(..., ‘TrimStart’) at start only
• miraudio(..., ‘TrimEnd’) at end only
• miraudio(..., ‘TrimThreshold’, t)
specifies the silence threshold t = .06
Silent frames have RMS amplitude below t times the medium RMS amplitude of the whole audio file.
2. Basic signal processing operators
mirspectrum(‘trumpet.wav’) Discrete Fourier Transform
of audio signal x:
• mirspectrum(..., ‘Min’, f1) f1=0 Hz
• mirspectrum(..., ‘Max’, f2) f2=sampling rate/2
• mirspectrum(..., ‘Window’, ‘hamming’ )
[Figure: Spectrum; x-axis: frequency (Hz), y-axis: magnitude; f1 and f2 marked on the frequency axis]
$$X_k = \left| \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} k n} \right|, \quad k = 0, \ldots, N/2$$
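As an illustration of the formula above, here is a minimal Python sketch (standard library only) of the magnitude spectrum for k = 0, ..., N/2. It is not MIRtoolbox code: the function name and the test signal are hypothetical, and the direct summation is far slower than the FFT that a real implementation would use.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Return |X_k| for k = 0..N/2, computed by direct summation."""
    n_samples = len(x)
    spectrum = []
    for k in range(n_samples // 2 + 1):
        total = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples))
        spectrum.append(abs(total))
    return spectrum

# Hypothetical test signal: a sinusoid with exactly 3 cycles over 32 samples.
signal = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
mags = magnitude_spectrum(signal)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print(peak_bin)  # index of the strongest frequency bin
```

With a sampling rate sr, bin k corresponds to the frequency k * sr / N, which is why the default upper limit f2 is half the sampling rate.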
mirspectrum(…, ‘Terhardt’) auditory model: outer-ear filter
based on MA toolbox
• mirspectrum
[Figure: Spectrum; x-axis: frequency (Hz), y-axis: magnitude; marker at 5000 Hz]
• mirspectrum(..., ‘Terhardt’)
[Figure: Spectrum; x-axis: frequency (Hz), y-axis: magnitude; marker at 5000 Hz]
mirspectrum(…, ‘Mel’) auditory model: Mel-band spectrum
based on Auditory toolbox
mfcc
Purpose: Mel-frequency cepstral coefficient transform of an audio signal
Synopsis: [ceps,freqresp,fb,recon] = mfcc(input, samplingRate)
Description: Find the cepstral coefficients (ceps) corresponding to the input. Three other quantities are optionally returned that represent the detailed FFT magnitude (freqresp), the log10 mel-scale filter bank output (fb), and the reconstruction of the filter bank output by inverting the cosine transform.
The sequence of processing includes, for each chunk of data:
• Window the data with a Hamming window,
• Shift it into FFT order,
• Find the magnitude of the FFT,
• Convert the FFT data into filter bank outputs,
• Find the log base 10,
• Find the cosine transform to reduce dimensionality.
The filter bank is constructed using 13 linearly-spaced filters (133.33 Hz between center frequencies), followed by 27 log-spaced filters (separated by a factor of 1.0711703 in frequency). Each filter is constructed by combining the amplitude of FFT bins as shown in the figure below.
The forty filters look like this.
[Figure: one filter, centred on CF and extending from CF - 133 Hz (or CF/1.0718) to CF + 133 Hz (or CF*1.0718); x-axis: Frequency]
[Figure: the forty filters; x-axis: Frequency]
[Figure: Mel-Spectrum; x-axis: Mel bands, y-axis: magnitude]
• frequency bands equally spaced on mel scale
• in each mel band, perceptually same pitch range
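The filter layout quoted above (13 linearly spaced centre frequencies followed by 27 log-spaced ones) can be sketched as follows. This is only an illustration in Python: the function name is hypothetical, the starting frequency of 200 Hz is an assumption that does not come from the slides, and the sketch assumes the first log-spaced centre is the last linear centre multiplied by the factor.

```python
def mel_filter_centers(first_center, n_linear=13, linear_step=133.33,
                       n_log=27, log_factor=1.0711703):
    """Centre frequencies (Hz): linear spacing first, then geometric spacing."""
    centers = [first_center + i * linear_step for i in range(n_linear)]
    for _ in range(n_log):
        centers.append(centers[-1] * log_factor)
    return centers

# 200 Hz is a hypothetical starting point, chosen only for the example.
centers = mel_filter_centers(first_center=200.0)
print(len(centers))             # number of filters
print(centers[0], centers[12])  # first and last linearly spaced centres
print(centers[-1])              # highest centre frequency
```

The constant step at low frequencies and the constant ratio at high frequencies are what make the bands roughly equally spaced on the mel scale.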
mirspectrum(…, ‘Bark’) auditory model: Bark-band spectrum
based on MA toolbox
[Figure: Bark-Spectrum; x-axis: Bark bands, y-axis: magnitude]
• another similar auditory model, decomposing the frequency axis into bands
mirspectrum pitch-based distribution
• mirspectrum(‘...’)
[Figure: Spectrum; x-axis: frequency (Hz), y-axis: magnitude]
• mirspectrum(..., ‘Cents’)
[Figure: Spectrum; x-axis: pitch (in midicents), y-axis: magnitude]
• mirspectrum(..., ‘Collapsed’)
[Figure: Spectrum; x-axis: Cents (0 to 1200), labelled C D E F G A B C; y-axis: magnitude]
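To make the pitch axis concrete, here is a small Python sketch of the conversion from Hz to midicents and of the folding into a single octave. It assumes the usual MIDI convention (A4 = 440 Hz = note 69, so 6900 midicents); whether MIRtoolbox uses exactly this reference is an assumption, and both function names are hypothetical.

```python
import math

def hz_to_midicents(freq_hz):
    """Map a frequency in Hz to midicents (100 midicents per semitone)."""
    return 6900.0 + 1200.0 * math.log2(freq_hz / 440.0)

def collapse_to_octave(midicents):
    """Fold a pitch into one octave: 0 corresponds to C, 1200 to the next C."""
    return midicents % 1200.0

print(hz_to_midicents(440.0))                      # A4
print(collapse_to_octave(hz_to_midicents(880.0)))  # A5 folds onto the same pitch class as A4
```

Because an octave is always 1200 cents, all partials that are octaves apart land on the same position of the collapsed axis.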
mirautocor autocorrelation function
$$R_{xx}(j) = \sum_n x_n \, x_{n-j}$$
miraudio(‘trumpet.wav’)
[Figure: Audio waveform; x-axis: time (s), y-axis: amplitude]
mirautocor(‘trumpet.wav’)
[Figure: Waveform autocorrelation; x-axis: lag (s), y-axis: coefficients]
mirautocor autocorrelation function
• mirautocor(…, ‘Min’, t1, ‘s’) t1=0 s
• mirautocor(..., ‘Max’, t2, ‘s’) t2=.05 s (audio) or t2=2 s (envelope)
• mirautocor(..., ‘Freq’) lags in Hz.
[Figure: Waveform autocorrelation; x-axis: lag (s), y-axis: coefficients]
mirautocor(..., ‘Compres’) “compressed” autocorrelation
mirautocor(‘Cmaj.wav’, ’Freq’)
mirautocor(‘Cmaj.wav’, ’Freq’, ‘Compres’)
• Autocorrelation (by default):
• y = IDFT(|DFT(x)|^2)
• “Compressed” autocorrelation:
• y = IDFT(|DFT(x)|^k)
• mirautocor(…, ‘Compres’, k) k=.67
Tolonen, Karjalainen. A Computationally Efficient Multipitch Analysis Model, IEEE Transactions on Speech and Audio Processing, 8(6), 2000.
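The two formulas above differ only in the exponent applied to the magnitude spectrum. The Python sketch below (standard library only, with a slow direct DFT) shows this; it is not the MIRtoolbox implementation. The function names are hypothetical, and the zero-padding to twice the signal length is an assumption made here so that the result is a linear rather than circular autocorrelation.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def generalized_autocor(x, k=2.0):
    """y = IDFT(|DFT(x)|^k); k = 2 gives the ordinary autocorrelation."""
    padded = list(x) + [0.0] * len(x)  # zero-pad to avoid circular wrap-around
    compressed = [abs(c) ** k for c in dft(padded)]
    return [v.real for v in idft(compressed)][:len(x)]

x = [1.0, 2.0, 3.0, 4.0]
plain = generalized_autocor(x, k=2.0)        # lags 0..3 of the usual autocorrelation
compressed = generalized_autocor(x, k=0.67)  # "compressed" variant
print([round(v, 6) for v in plain])
```

An exponent below 2 reduces the dominance of the strongest spectral components before the inverse transform.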
mirautocor(..., ‘Enhanced’) enhanced autocorrelation
• mirautocor(‘Amin3’, ‘Enhanced’, 2:10)
Tolonen, Karjalainen. A Computationally Efficient Multipitch Analysis Model, IEEE Transactions on Speech and Audio Processing, 8(6), 2000.
mirautocor(‘Cmaj.wav’, ’Freq’, ‘Compres’, ‘Enhanced’)
mirautocor(‘Cmaj.wav’, ’Freq’, ‘Compres’, ‘Enhanced’, 2:20)
mirframe frame decomposition
• mirframe(..., ‘WinLength’, l, ‘s’) unit: ‘s’ (seconds), ‘sp’ (samples)
• mirframe(..., ‘Hop’, h, ‘/1’) unit: ‘/1’ (ratio from 0 to 1), ‘%’ (percentage), ‘s’, ‘sp’
• mirframe(..., l, ‘s’, h, ‘/1’)
[Figure: Audio waveform; x-axis: time (s), y-axis: amplitude; frames of length l, hop h between successive frames]
mirframe(‘ragtime.wav’,1,.5)
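The relation between frame length l and hop h (given as a ratio of the frame length) can be sketched in Python as follows. The function name is hypothetical, and dropping the trailing incomplete frame is an assumption; MIRtoolbox may treat the end of the signal differently.

```python
def frame_bounds(n_samples, sampling_rate, length_s, hop_ratio):
    """Return (start, end) sample indices of successive frames."""
    frame_len = round(length_s * sampling_rate)
    hop = max(1, round(frame_len * hop_ratio))
    starts = range(0, n_samples - frame_len + 1, hop)
    return [(s, s + frame_len) for s in starts]

# Hypothetical example: 6 s of signal at 10 samples per second,
# frames of 1 s with a hop of half a frame (as in mirframe(..., 1, .5)).
bounds = frame_bounds(n_samples=60, sampling_rate=10, length_s=1, hop_ratio=0.5)
print(len(bounds), bounds[0], bounds[-1])
```

A hop ratio of .5 means that each frame overlaps the previous one by half its length.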
mirframe syntax
[Diagram: miraudio → a → mirframe → f → mirspectrum → s]
a = miraudio(‘mysong’)
f = mirframe(a)
s = mirspectrum(f)
or: f = mirframe(‘mysong’)
or: s = mirspectrum(‘mysong’, ‘Frame’)
‘Frame’ option syntax
• miraudio(..., ‘Frame’, l, ‘s’, h, ‘/1’)
• mirspectrum(..., ‘Frame’, l, ‘s’, h, ‘/1’)
• mirspectrum(‘mysong’, ‘Frame’, 1, .5, ‘Mel’)
mirflux distance between successive frames
s = mirspectrum(a, ‘Frame’)
mirflux(s)
[Figure: Spectral flux; x-axis: frames, y-axis: coefficient value]
• mirflux(a) = mirflux(mirspectrum(a, ‘Frame’, .05, .5))
• ac = mirautocor(a, ‘Frame’), mirflux(ac)
• mirflux(..., ‘Dist’, d) d = ‘Euclidean’, ‘City’, ‘Cosine’
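A minimal Python sketch of flux as a distance between successive frames, for the three distance names listed above. The function names are hypothetical, and taking the cosine distance as one minus the cosine similarity is an assumption about the convention, not something stated on the slide.

```python
import math

def frame_distance(a, b, dist="Euclidean"):
    if dist == "Euclidean":
        return math.sqrt(sum((y - x) ** 2 for x, y in zip(a, b)))
    if dist == "City":
        return sum(abs(y - x) for x, y in zip(a, b))
    if dist == "Cosine":
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norms  # undefined for an all-zero frame
    raise ValueError("unknown distance: " + dist)

def flux(frames, dist="Euclidean"):
    """Distance between each frame and the next one."""
    return [frame_distance(frames[i], frames[i + 1], dist)
            for i in range(len(frames) - 1)]

# Three hypothetical 2-bin spectra: a sudden change, then no change.
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(flux(frames), flux(frames, "City"), flux(frames, "Cosine"))
```

The same function applies to any framed representation, which is why flux can be computed on a framed autocorrelation as well as on a spectrogram.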
mirrms root mean square
[Figure: RMS energy; x-axis: frames, y-axis: coefficient value]
mirrms(‘ragtime.wav’)
The RMS energy related to file ragtime is 0.017932
mirrms(‘ragtime.wav’, ‘Frame’)
Default frame size .05 s, frame hop = .5
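For reference, the root mean square of a signal, globally and frame by frame, can be sketched in Python like this. The function names are hypothetical, and frame length and hop are given in samples here for simplicity.

```python
import math

def rms(x):
    """Root mean square: square root of the mean of the squared samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def framed_rms(x, frame_len, hop):
    """RMS of each frame; an incomplete trailing frame is dropped."""
    return [rms(x[s:s + frame_len])
            for s in range(0, len(x) - frame_len + 1, hop)]

print(rms([1.0, -1.0, 1.0, -1.0]))
print(framed_rms([0.0, 0.0, 2.0, 2.0], frame_len=2, hop=1))
```

The framed version gives an energy curve over time, while the global version gives a single number for the whole file.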
mirenvelope envelope extraction
a:
[Figure: Audio waveform; x-axis: time (s), y-axis: amplitude]
e = mirenvelope(a)
[Figure: Envelope; x-axis: time (s), y-axis: amplitude]
mirplay(e)
mirsave(e)
e = mirenvelope(a)
[Figure: Envelope; x-axis: time (s), y-axis: amplitude]
mirrms(a, ‘Frame’)
[Figure: RMS energy; x-axis: frames, y-axis: coefficient value]
mirenvelope(..., ‘Filter’) based on low-pass filtering
[Figure: Audio waveform; x-axis: time (s), y-axis: amplitude]
• Full-wave rectification (abs)
• Low-Pass Filter (LPF): mirenvelope(..., ‘Tau’, .02): time constant (in s.)
• Down-Sampling N: mirenvelope(..., ‘PostDecim’, N) N=16
  mirenvelope(..., ‘Sampling’, f)
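The three stages of the ‘Filter’ method (full-wave rectification, low-pass filtering, down-sampling) can be sketched in Python as below. This is a sketch under stated assumptions, not the MIRtoolbox filter: a one-pole low-pass with coefficient exp(-1 / (tau * sr)) is assumed here, and the function name is hypothetical.

```python
import math

def envelope(x, sampling_rate, tau=0.02, post_decim=16):
    """Rectify, smooth with a one-pole low-pass filter, then keep every N-th sample."""
    a = math.exp(-1.0 / (tau * sampling_rate))  # assumed filter coefficient
    state = 0.0
    smoothed = []
    for v in x:
        state = (1.0 - a) * abs(v) + a * state  # abs() is the full-wave rectification
        smoothed.append(state)
    return smoothed[::post_decim]               # down-sampling by N

# Hypothetical input: a constant negative signal, 2 s at 1000 Hz.
env = envelope([-1.0] * 2000, sampling_rate=1000)
print(len(env), env[-1])
```

A larger time constant gives a smoother envelope that reacts more slowly to changes in amplitude.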
mirenvelope post-processing options
• mirenvelope(..., ‘Center’)
• mirenvelope(..., ‘HalfWaveCenter’)
• mirenvelope(..., ‘Diff’)
• mirenvelope(..., ‘HalfWaveDiff’)
• mirenvelope(..., ‘Power’)
• mirenvelope(..., ‘Normal’)
• mirenvelope(..., ‘Smooth’,o) moving average, order o = 30
• mirenvelope(..., ‘Gauss’,o) gaussian, std deviation o = 30 sp
[Figures, each with x-axis: time (s), y-axis: amplitude: Envelope; Envelope (half-wave rectified); Differentiated envelope; Envelope (half-wave rectified)]
mirplay
MakeERBFilters
Purpose: Design the filters needed to implement an ERB cochlear model.
Synopsis: fcoefs = MakeERBFilters(fs,numChannels,lowFreq)
Description: This function computes the filter coefficients for a bank of Gammatone filters. These filters were defined by Patterson and Holdsworth for simulating the cochlea.
The result is returned as an array of filter coefficients. Each row of the filter arrays contains the coefficients for four second order filters. The transfer functions for these four filters share the same denominator (poles) but have different numerators (zeros). All of these coefficients are assembled into one vector that the ERBFilterBank function can take apart to implement the filter.
The filter bank contains numChannels channels that extend from half the sampling rate (fs) to lowFreq.
Note this implementation fixes a problem in the original code. The original version calculated a single eighth-order polynomial representing the entire filter function. This new version computes four separate second order filters. This avoids a problem with round-off errors in cases with very small characteristic frequencies (<100Hz) and large sample rates (>44kHz). The problem is caused by round-off error when a number of poles are combined, all very close to the unit circle. Small errors in the eighth-order coefficient are magnified when the eighth root is taken to give the pole location. These small errors lead to poles outside the unit circle and instability. Thanks to Julius Smith for pointing me to the proper explanation.
Examples: The ten ERB filters between 100 and 8000Hz are computed using
»fcoefs = MakeERBFilters(16000,10,100);
The resulting frequency response is given by
»y = ERBFilterBank([1 zeros(1,511)], fcoefs);
»resp = 20*log10(abs(fft(y')));
»freqScale = (0:511)/512*16000;
»semilogx(freqScale(1:255),resp(1:255,:));
»axis([100 16000 -60 0])
»xlabel('Frequency (Hz)');
»ylabel('Filter Response (dB)');
A simple cochlear model can be formed by filtering an utterance with these filters. To convert this data into an image we pass each row of the cochleagram through a half-wave-rectifier, a low-pass filter, and then decimate by a factor of 100. A cochleagram of the ‘A huge tapestry hung in her hallway’ utterance from the TIMIT database
mirfilterbank filterbank decomposition
• mirfilterbank(..., ‘Gammatone’)
Equivalent Rectangular Bandwidth (ERB) Gammatone filterbank
• f = mirfilterbank(..., ‘NbChannels’, N) N=10
• mirplay(f)
[Diagram: miraudio → filter1, filter2, ..., filterN]
[Figure: Filter frequency responses; x-axis: Hz, y-axis: dB]
[Figure: Audio waveform, channels 1 to 10]
mirfilterbank filterbank decomposition
• mirfilterbank(..., ‘Manual’, [44, 88, 176, 352, 443])
[Figure: Magnitude Response (dB); x-axis: Normalized Frequency (×π rad/sample), y-axis: Magnitude (dB)]
• mirfilterbank(..., ‘Manual’, [-Inf 200 400 800 1600 Inf])
[Figure: Magnitude Response (dB); x-axis: Normalized Frequency (×π rad/sample), y-axis: Magnitude (dB)]
mirsum across-channels summation
[Diagram: a → mirfilterbank → f → mirenvelope → e → mirsum → s]
f = mirfilterbank(a)
[Figure: Audio waveform (centered), channels 1 to 5]
e = mirenvelope(f)
[Figure: Envelope, channels 1 to 5]
s = mirsum(e)
[Figure: Envelope; x-axis: time (s), y-axis: amplitude]
mirsum stereo summation
[Diagram: miraudio → a → mirenvelope → e → mirsum → s]
a = miraudio(..., ‘Mono’, 0)
e = mirenvelope(a)
[Figure: Envelope, channels 1 and 2; x-axis: time (s)]
s = mirsum(e)
[Figure: Envelope; x-axis: time (s), y-axis: amplitude]
mirsum summary
[Diagram: a → mirfilterbank → f → mirenvelope → e → mirautocor → ac → mirsum → s]
f = mirfilterbank(a)
[Figure: Audio waveform (centered), channels 1 to 5]
e = mirenvelope(f)
[Figure: Envelope, channels 1 to 5]
ac = mirautocor(e)
[Figure: Envelope autocorrelation, channels 1 to 5]
s = mirsum(ac)
[Figure: Envelope autocorrelation; x-axis: lag (s), y-axis: coefficients]
mirpeaks peak picking
• p = mirpeaks(..., ‘Total’, 3, ‘NoBegin’)
• To get peak positions:
• mirgetdata(p)
• To get peak amplitudes:
• get(p, ‘PeakVal’)
mirpeaks parameters specification
• mirpeaks(..., ‘Threshold’, t) default: t=0
• mirpeaks(..., ‘Contrast’, c) default: c=.1
[Figure: curve normalized between 0 and 1, with threshold t and contrast c indicated; candidate peaks labelled OK or not OK]
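The roles of the threshold t and the contrast c can be illustrated with the simplified Python sketch below. It is not the MIRtoolbox algorithm: the curve is normalized between 0 and 1, a local maximum is kept if its normalized height is at least t and if it exceeds the nearest local minimum on each side by at least c. The function name and the example curve are hypothetical.

```python
def pick_peaks(curve, threshold=0.0, contrast=0.1):
    """Indices of local maxima that pass the threshold and contrast tests."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return []
    y = [(v - lo) / (hi - lo) for v in curve]   # normalize between 0 and 1
    peaks = []
    for i in range(1, len(y) - 1):
        if not (y[i] > y[i - 1] and y[i] >= y[i + 1]):
            continue                            # not a local maximum
        if y[i] < threshold:
            continue                            # too low
        left = i
        while left > 0 and y[left - 1] <= y[left]:
            left -= 1                           # walk down to the previous local minimum
        right = i
        while right < len(y) - 1 and y[right + 1] <= y[right]:
            right += 1                          # walk down to the next local minimum
        if y[i] - y[left] >= contrast and y[i] - y[right] >= contrast:
            peaks.append(i)
    return peaks

curve = [0.0, 0.5, 0.45, 1.0, 0.0, 0.3, 0.0]
print(pick_peaks(curve))                 # the bump at index 1 lacks contrast
print(pick_peaks(curve, threshold=0.4))  # the low peak at index 5 is discarded
print(pick_peaks(curve, contrast=0.01))  # a small contrast accepts every local maximum
```

In this example the local maximum at index 1 is "not OK" with the default contrast because the dip that follows it is only 0.05 deep.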
3. Feature extractors
• Pitch / f0
• Timbre
• Tempo
• Tonality
• Segmentation
(Wednesday)
mirpitch pitch estimation
[Diagram: processing chain combining mirfilterbank, mirframe, mirautocor, mirsum and mirpeaks]
mirpitch(...,
‘2Channels’, or ‘NoFilterbank’,
‘Enhanced’, 2:10, ‘Compress’, .5,
‘Total’, Inf, ‘Min’, 75, ‘Max’, 2400, ‘Contrast’, .1, ‘Threshold’, .4)
Timbre
• Zero-crossing rate: mirzerocross
• Spectral distribution: mirrolloff, mirbrightness, mircentroid, mirspread, …
• Mel-Frequency Cepstral Coefficients: mirmfcc
• Sensory Dissonance: mirroughness
• mirregularity
mirzerocross waveform sign-change rate
• Is supposed to indicate how noisy the sound is.
• But highly dependent on the presence of low or high frequency components in the sound.
[Figure: Audio waveform; x-axis: time (s), y-axis: amplitude]
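A minimal Python sketch of a sign-change rate, which also shows the dependence on frequency content mentioned above: two noise-free sinusoids give very different rates. The function name is hypothetical, and expressing the rate per second is an assumption about the unit.

```python
import math

def zero_cross_rate(x, sampling_rate):
    """Number of sign changes per second."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings * sampling_rate / len(x)

sr = 1000  # hypothetical sampling rate (Hz), 1 s of signal
low = [math.sin(2 * math.pi * 2 * n / sr) for n in range(sr)]    # 2 Hz sinusoid
high = [math.sin(2 * math.pi * 20 * n / sr) for n in range(sr)]  # 20 Hz sinusoid
print(zero_cross_rate(low, sr), zero_cross_rate(high, sr))
```

Neither signal is noisy, yet the higher-pitched one has a rate roughly ten times larger.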
mirrolloff high-frequency energy (I)
• mirrolloff(..., ‘Threshold’, .85)
[Figure: Spectrum; x-axis: frequency (Hz), y-axis: magnitude; 85% of the energy lies below 5640.53 Hz]
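The roll-off frequency can be sketched in Python as the first frequency at which the cumulative sum of the spectrum reaches the given fraction of the total. The function name is hypothetical, and the sketch sums the spectrum values as given; whether they should be squared first is left as an assumption to be checked against the toolbox.

```python
def rolloff(freqs, magnitudes, threshold=0.85):
    """Lowest frequency below which `threshold` of the total is contained."""
    total = sum(magnitudes)
    cumulative = 0.0
    for f, m in zip(freqs, magnitudes):
        cumulative += m
        if cumulative >= threshold * total:
            return f
    return freqs[-1]

# Hypothetical flat spectrum with four bins.
freqs = [100.0, 200.0, 300.0, 400.0]
mags = [1.0, 1.0, 1.0, 1.0]
print(rolloff(freqs, mags), rolloff(freqs, mags, threshold=0.5))
```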
mirbrightness high-frequency energy (II)
• mirbrightness(..., ‘CutOff’, 1500) (in Hz)
• mirbrightness(..., ‘Unit’, u) u = ‘/1’ or ‘%’
[Figure: Spectrum; x-axis: frequency (Hz), y-axis: magnitude; 53.96% of the energy lies above 1500 Hz]
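Brightness is the complementary view: the cut-off frequency is fixed and the share of the spectrum above it is measured. A minimal Python sketch, with a hypothetical function name and the same caveat that the spectrum values are summed as given:

```python
def brightness(freqs, magnitudes, cutoff=1500.0):
    """Fraction (from 0 to 1) of the spectrum located at or above the cut-off."""
    total = sum(magnitudes)
    above = sum(m for f, m in zip(freqs, magnitudes) if f >= cutoff)
    return above / total

# Hypothetical flat spectrum with four bins.
ratio = brightness([500.0, 1000.0, 1500.0, 2000.0], [1.0, 1.0, 1.0, 1.0])
print(ratio, 100 * ratio)  # as a ratio ('/1') and as a percentage ('%')
```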
mircentroid geometric center of spectral distribution
$$\mu_1 = \int x \, f(x) \, dx$$
[Figure: distribution with µ1 marked]
mirspread spectral dispersion
second moment:
$$\sigma^2 = \mu_2 = \int (x - \mu_1)^2 f(x) \, dx$$
[Figure: distribution with µ1 marked]
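For a sampled spectrum, the two integrals above become weighted sums once the magnitudes are normalized to sum to 1, so that they play the role of the distribution f(x). A minimal Python sketch, with hypothetical function names:

```python
def centroid(freqs, magnitudes):
    """First moment: mean frequency weighted by the normalized spectrum."""
    total = sum(magnitudes)
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total

def spread_variance(freqs, magnitudes):
    """Second central moment (variance) around the centroid."""
    total = sum(magnitudes)
    mu1 = centroid(freqs, magnitudes)
    return sum((f - mu1) ** 2 * m for f, m in zip(freqs, magnitudes)) / total

# Hypothetical two-bin spectrum with equal magnitudes.
print(centroid([100.0, 300.0], [1.0, 1.0]))
print(spread_variance([100.0, 300.0], [1.0, 1.0]))
```

Taking the square root of the variance gives the standard deviation, expressed in the same unit as the frequency axis.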
mirkurtosis spectral peakiness
[Figure: distribution shapes labelled < 0, 0 and > 0; the value -2 is also marked]
mirflatness smooth vs. spiky
$$\frac{\text{geometric mean}}{\text{arithmetic mean}} = \frac{\sqrt[N]{\prod_{n=0}^{N-1} x(n)}}{\left( \frac{\sum_{n=0}^{N-1} x(n)}{N} \right)}$$
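The ratio above can be sketched in Python as follows. The geometric mean is computed through logarithms to avoid overflow of the product, which requires strictly positive values; the function name and the example spectra are hypothetical.

```python
import math

def flatness(x):
    """Geometric mean divided by arithmetic mean (values must be > 0)."""
    n = len(x)
    geometric = math.exp(sum(math.log(v) for v in x) / n)
    arithmetic = sum(x) / n
    return geometric / arithmetic

print(flatness([1.0, 1.0, 1.0, 1.0]))    # smooth (flat) distribution
print(flatness([100.0, 1.0, 1.0, 1.0]))  # spiky distribution
```

A flat distribution gives a value close to 1, and the value decreases as a few components dominate.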
mirmfcc mel-frequency cepstral coefficients
• Description of spectral shape.
[Diagram: mirspectrum (‘Mel’) → Abs → Log → Discrete Cosine Transform]
mirroughness sensory dissonance
• mirroughness(..., ‘Sethares’)
[Figure: Dissonance produced by two sinusoids depending on their frequency ratio]
[Diagram: mirspectrum → mirpeaks → dissonance estimated for each pair of peaks → sum]
mironsets onset detection function
[Figure: Onset curve (Envelope); x-axis: time (s), y-axis: amplitude]
• mironsets
    • mirpeaks(mirsum(mirspectrum(…, ‘Frame’)))
• mironsets(…, ‘Filter’)
    • mirpeaks(mirsum(mirenvelope(mirfilterbank(…, ‘NbChannels’, 40))))
• mironsets(…, ‘SpectralFlux’)
    • mirpeaks(mirflux(…, ‘Inc’, ‘Halfwave’))
mironsets(…, ‘Contrast’, …)
mironsets(..., ‘Attack’)
[Figure: Onset curve (Envelope); x-axis: time (s), y-axis: amplitude]
mirattackslope average slope of note attacks
• o = mironsets(‘george.wav’, …)
• mirattackslope(o)
[Figure: Envelope (centered); x-axis: time (s), y-axis: amplitude]
mirattackleap amplitude of note attacks
• o = mironsets(‘george.wav’, …)
• mirattackleap(o)
[Figure: Envelope (centered); x-axis: time (s), y-axis: amplitude]
Part 2 (in 2 weeks)
• Rhythm, metrical structure
• Tonal analysis
• Segmentation, structure
• Statistical descriptions, similarity
• Music & emotion
• Advanced use