A comprehensive framework for audio and music analysisOlivier Lartillot Aalborg University, Denmark
MIRtoolbox MiningSuite
Outline
Audio and musical feature extraction from recordings
MIRtoolbox
Music analysis from symbolic representation:
grouping, reduction, motivic patterns
MiningSuite
Comprehensive set of audio / musical features
Modular syntactic layer on top of Matlab
both simple and powerful language
Referential framework in Music Information Retrieval
10000s of downloads, 500+ citations in papers
MIRtoolbox
Dynamics: Global energy
Dvorak, Symphony 8 G major, Allegretto grazioso - Molto vivace
10 20 30 40 50 60 70 80 900
0.01
0.02
0.03
0.04
0.05
0.06RMS energy, son13
Temporal location of events (in s.)
coef
ficie
nt v
alue
pp
ff>>>
>>
>>>
Timbre Energy decomposed into frequencies (spectrum)
0 2000 4000 6000 8000 10000 120000
500
1000
1500
2000
2500
3000
3500
Spectrum
frequency (Hz)
magnitude
Low frequency energy: dark
sound
High frequency energy: bright sound
Energy in close frequency range: rough sound
Harmonic sound: regular frequency series vs. noise
Timbral gesture: brightness
Beethoven, 9th Symphony, Scherzo70 75 80 85 90 95 100
0.2
0.4
0.6
Brightness, son5
Temporal location of events (in s.)
coef
ficie
nt v
alue
dark
bright
20 40 60 80 100 120 1400.10.20.30.40.5
Brightness, son6
Temporal location of events (in s.)
coef
ficie
nt v
alue
Beethoven, 7th Symphony, Allegrettodark
bright
Pitch gesture
30 35 40 45 50
400
500
600
700
800
900
Pitch, ANemo1.wav
Temporal location of events (in s.)
coef
ficie
nt v
alue
(in
Hz)
Freq
uenc
ies
Hz
Low
High
Van Zijl, Toiviainen, Lartillot, Luck. The sound of emotion: The effect of performers’ experienced emotions on auditory performance characteristics. Music Perception (2014)
Harty, Miniature for oboe and piano, Orientale
Dynamics: Note attacks
CPE Bach, Cello Concerto in A major, WQ172, 3rd movement
0 5 10 150
0.2
0.4
0.6
0.8
1Onset curve (Envelope)
time (s)
ampl
itude
sf>
>>
o = mironsets(’mysong’, ‘Detect’, ‘No’)
do = mironsets(o, ‘Diff’)
ac = mirautocor(do)
pa = mirpeaks(ac, ’Total’, 1)
In short:
[t, pa] = mirtempo(’mysong’)
0 1 2 3 4 50
0.02
0.04
0.06
0.08Onset curve (Envelope)
time (s)
am
plit
ude
o
0 1 2 3 4 5!0.05
0
0.05
0.1
0.15Onset curve (Differentiated envelope)
time (s)
am
plit
ud
e
do
0.2 0.4 0.6 0.8 1 1.2 1.4 1.6!0.4
!0.2
0
0.2
0.4
0.6Envelope autocorrelation
lag (s)
co
eff
icie
nts
ac
0.2 0.4 0.6 0.8 1 1.2 1.4 1.6!0.4
!0.2
0
0.2
0.4
0.6Envelope autocorrelation
lag (s)
coeffic
ients
pa
t = 129.6333 bpm
Tempo estimation
ac
Tempo estimationo = mironsets(’mysong’, ‘Detect’, ‘No’)
do = mironsets(o, ‘Diff’)
f = mirframe(do)
ac = mirautocor(do)
pa = mirpeaks(ac, ’Total’, 1)
In short:
[t, pa] = mirtempo(’mysong’, ‘Frame’)
0 2 4 $ % 10 12 14!1
0
1
2Onset curve (Differentiated envelope)
time (s)
am
plit
ud
e
f
pa
0 2 4 6 8 10 12130
140
150
160
170
180Tempo
Temporal location of events (in s.)
coeffic
ient valu
e (
in b
pm
) t
Tonal gestureK
eys
5 10 15 20 25 30
CC#
DD#
EF
F#G
G#A
Temporal location of events (in s.)
coef
ficie
nt v
alue
Key, Umile01
majmin
5 10 15 20 25 30
CC#
DD#
EF
F#G
G#A
Temporal location of events (in s.)
coef
ficie
nt v
alue
Key, Umile01
majmin
Monteverdi, Hor che’l ciel e la terra, 1st part
20 40 60 80 100 120 140 160
CC#
DD#
EF
F#G
G#A
A#B
Temporal location of events (in s.)
coef
ficie
nt v
alue
Key, son5.wav
Beethoven, 9th Symphony, Scherzo20 40 60 80 100 120 140
CC#
DD#
EF
F#G
G#A
A#B
Temporal location of events (in s.)
coef
ficie
nt v
alue
Key, son27.wav
Schönberg, Verklärte Nacht, Sehr Ruhig
20 40 60 80 100 120C#
DD#
EF
F#G
G#A
A#B
Temporal location of events (in s.)
coef
ficie
nt v
alue
Key, son32.wav
Tiersen, Comptine d’un autre été : L’après-midi
* Sykes, Glowinski, Grandjean, Lartillot, Eliard. Is a contemporary listener able to distinguish between the musical emotional figures created by Monteverdi? SysMus 2013
*
Sound
Notes
Audio level
“Symbolic” level
Mode Metre Motifs
Patterns
Dynamics Timbre Pitch
Segments
MIRtoolbox
Timbral structure
0 20 40 60 80 100 120 140 160
2
4
6
8
10
12
MFCC
time axis (in s.)
coef
ficie
nt ra
nks MFCC representation (timbre)
Similarity matrix
temporal location of frame centers (in s.)tem
pora
l loc
atio
n of
fram
e ce
nter
s (in
s
20 40 60 80 100 120 140 160
50
100
150 Similarity matrix
20 40 60 80 100 120 140 1600
0.2
0.4
0.6
0.8
1Novelty, son8
Temporal location of events (in s.)
coef
ficie
nt v
alue
Novelty curveBrahms, Symphony No.3
in F major, Poco Allegretto
Lartillot, Cereghetti, Eliard, Grandjean. A simple, high-yield method for assessing structural novelty. International Conference on Music & Emotion (2013)
0 50 100 150CM
C#MDM
D#MEMFM
F#MGM
G#MAM
A#MBMCm
C#mDm
D#mEmFm
F#mGm
G#mAm
A#mBm
Key strength
time axis (in s.)
tona
l cen
ter
Similarity matrix
temporal location of frame centers (in s.)
tem
pora
l loc
atio
n of
fram
e ce
nter
s (in
s.)
20 40 60 80 100 120 140
20
40
60
80
100
120
140 Similarity matrix
20 40 60 80 100 120 1400
0.5
1Novelty, son4
Temporal location of events (in s.)
coef
ficie
nt v
alue Novelty curve
Bazzini, Dance of the GoblinsTonal structure
Sequential repetition
Notes“Symbolic” level
Mode Metre Motifs
Patterns
Groups Segments
SoundAudio level
Dynamics Timbre Pitch
MIRtoolboxMiningSuite
Groups Segments
! !"!"!! ! #$ ! "!"!42!# "!"!!
!% !! !" ! ! # " !& !
" ! ! ! !!5
$ !"
!" ! !
"
!! &
'"!!
#
! # !! !!$7 !(!!
Music engraving by LilyPond 2.18.2—www.lilypond.org
Mozart, Variation XI on “Ah, vous dirai-je maman”, K.265/300e
Local grouping
Sound
Notes
Audio level
“Symbolic” level
Groups Segments
Dynamics Timbre Pitch
Ornament reduction
MiningSuite
! !"!"!! ! #$ ! "!"!42!# "!"!!
!% !! !" ! ! # " !& !
" ! ! ! !!5
$ !"
!" ! !
"
!! &
'"!!
#
! # !! !!$7 !(!!
Music engraving by LilyPond 2.18.2—www.lilypond.org
Mozart, Variation XI on “Ah, vous dirai-je maman”, K.265/300e
Ornamentation reduction
syntagmatic network
head
Sound
Notes
Audio level
“Symbolic” level
Groups Segments
Dynamics Timbre Pitch
Ornament reduction
MiningSuite
Motifs
Patterns
!! ! !" !#! !"!!!$ !L1%& !'$"
!! ! ! !($ ' !! ! !" " ! !$) %M1
! !" ! ! !!
!#! ! ! !($ ' !! ! !" ! ! !$) %
U1
!" ! ! !!
!#!' # *!#!
#!! !L2% $!& !
!#! !"!!$"! ! ! ' ! !$) %
U2 !" !
(!!!!!(!' !M2%) $ !'$
"
#!#!!!+!!!U3
%) $ " !!'$
,!,! ! !$! ! !" ' !# !!& %L3
!!! !
Music engraving by LilyPond 2.18.2—www.lilypond.org
J.S. Bach, Well-Tempered Clavier, Book II, Fugue XX Detected subject entries
Lartillot, ISMIR 2014
Sound
Notes
Audio level
“Symbolic” level
Groups Segments
Motifs
Patterns
Dynamics
Emotion
Timbre Pitch
Ornament reduction
Mode Metre
MiningSuiteFormStyleLrn2
Cre8
MiningSuitecode.google.com/p/miningsuite
SIGMINR!signal processing
AUDMINR!auditory modelling
SEQMINR!sequence processing
PATMINR!pattern mining
MUSMINR!music analysis
MiningSuitecode.google.com/p/miningsuite
• New syntactic layer in the source code itself
• Matlab optimisation & code readability
• Significantly optimised (in speed and memory) Fully rewritten using recent Matlab object-oriented programming capabilities
• Open source. Developers’ community
• Tutorial at ISMIR 2014 conferenceLrn2Cre8