ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 1: Course Outline & DSP

E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - /23

Lecture 1: Course Introduction

Dan Ellis Dept. Electrical Engineering, Columbia University

[email protected] http://www.ee.columbia.edu/~dpwe/e4896/1

1. Course Structure2. DSP: The Short-Time Fourier Transform

ELEN E4896 MUSIC SIGNAL PROCESSING

mailto:[email protected]

http://labrosa.ee.columbia.edu


Signal Processing and Music

• Music is a very rich signal

• Signal Processing can expose some of it

• We want to get inside it2

freq

/ kHz

0

1

2

3

4

freq

/ kHz

0

1

2

3

4fre

q / k

Hz

0

1

2

3

4

freq

/ kHz

0

1

2

3

4

time / sec0 2 4 6 8 10 12 14 16 18 20


1. Course Goals• Survey of applications

of signal processing to music audiomusic synthesismusic/audio processing (modification)music audio analysis

• Connect basic DSP theory to sound phenomena and effects

• Hands-on, live investigations of audio processing algorithmsusing Matlab, Pd, ...

3


Course Structure• Two weekly sessions

Monday: presentations, practical exposition, discussionWednesday: presentations, practical, sharing

• Grade structure20% practicals participation10% one presentation30% three mini-project assignments40% one final project

4


Flipped Classroom• Video Lectures are required viewing

before Monday session

30-50 mins each

recorded in 2013

bring one question (at least) to class

pop quizzes

5


Hands-on Practicals• Music Signal Processing involves

connecting algorithms with perceptions

• Our goal is to be able to ‘play’ with algorithms to feel how they workIn class, on laptops, in small groups

• We will use several platforms:

6

MatlabPureData (Pd) Python


Projects• There will be 3 “mini projects” assigned

approx. week 3, week 5, week 7provide basic pieces, but open-endedinvolve implementation & experimentation

• Also, Final projecton a topic of your choiceimplement some kind of music signal processingdue at end of semester

• Good projects ...... start with clear goals... apply ideas from class... evaluate performance & modify accordingly

• Could continue into summer...7


Project Examples

• Music Transcription

8

Figure 1: Screenshot of the PD patch implementing the Scheirer algorithm.Much of the logic is hidden inside the “combfilterbank” object at the bottom.

2.2 Tempo Selection

Each of the six onset pulse signals (one for each band) is fed into its ownbank of 150 comb filters – you can think of these filters as mapping to 150possible tempos that we want to detect. A comb filter is essentially just a delayimplemented a circular buer. The delay time on each filters corresponds tothe tempo that that filter is meant to detect – so for example, the filter thatis detecting a tempo of 90 bpm has a delay time of 90

60 = 1.5 Hz, which yieldsa buer length of 44100

1.5 = 29400 samples. The tempo selector then, for eachbpm, sums the energy across all 6 subbands at that tempo. The tempo withthe highest total energy is chosen as the tempo of the piece.

Because of the sheer number of filters (150 6 = 900), it was impossible toimplement this using standard PD objects. Therefore I created a PD Externalwritten in C [3] and used this within my PD patch. The external has six inputs,one for each subband, and has two float outlets – one indicating the detectedtempo, and the other indicating the phase. I added a signal outlet as well for

2

• Beat Tracking

• Mood Classification

Fig. 4. Comparison of effectiveness of different combination of features.

the most effective combination of features. The result is shownin Figure 4 Although there are some features especially useful,I also observe that the number of features enhance the overallclassification accuracy. Therefore, the best combination is touse all the features.

Fig. 5. Composition of hits in each class.

VI. EVALUATION AND RESULTS

In the evaluation part, I will use 100 test samples (20samples for each mood class) to evaluate the effectivenessof my classifier. Essentially, there are two tasks I wouldlike to investigate in the evaluation. First, I use the bestcombination of feature found in the feature selection part,which is to use all of the features. The average hit rate is0.64. However, I noticed that the accuracy of each moodclass are dramatically different. The hit rates for angry and

Fig. 6. Comparison of effectiveness of different features for distinguishingtwo classes

happy are almost 1 while hit rate for calm is 0. Thus I furtherinvestigate the composition of hits in each class. The result isshown in Figure 5It is quite intuitive that happy songs will bemistaken as angry songs since they both have fast tempo. Inthe same logic, it is also intuitive that calm songs, sad songsand scary songs are easily obfuscated. However, it is hard toenvision that calm songs would be largely classified as angry.This phenomenon is still left to be investigated. Second, Iwould like to investigate which features are capable to tellthe difference between any two different classes. Differentcombination of features will be applied in the above two tasksto find out the most effective combination of features. Theresult is shown in Figure 6. Previously we have known thattimber features are very useful. In this investigation, we furtherfound that the harmony features are useful when the two songsare melodically different. For example, harmony features areeffective in terms of classifying calm songs and happy songs,while they are not useful when distinguishing calm songs andsad songs.

VII. CONCLUSION

In this project, I implemented and evaluated a music moodclassifier. Specifically, I focus on investigating the effective-ness of each feature. Compared with other feature selectedworks, we found that tempo feature does not always work wellwhen classifying music moods, while timbre features play asa key role of recognizing music moods. Also we found thatnumber of features also affects classification accuracy. Whenfurther analyzing the classification results, we found that someclasses are easily recognized to another one, and then thefeatures that are suitable for distinguishing two mood class arefound. For example, harmony features are useful when the twosongs are melodically different. However, there are still someunexplainable phenomenon that two classes are obfuscated.In the future work, I plan to find the explanation for this


Presentations• Everyone will make at least one

in-class presentatione.g. presentation of a relevant paper... or your latest mini-project... or just something you’re curious about

• We will assign time slots first-come-first servedslots each classcheck course calendar to choose a timeemail TA

• Encourage discussioninformal presentations of practical findings

9


Web Site• Information distributed via:

http://www.ee.columbia.edu/~dpwe/e4896/

10

http://www.ee.columbia.edu/~dpwe/e4896/


Course Outline

• Anything missing?

11

Jan 20: Intro Mar 07: Pitch tracking

Jan 25: Acoustics Mar 21: Time & pitch scaling

Feb 01: Perception Mar 28: Beat tracking

Feb 08: Analog synthesis Apr 04: Chroma & Chords

Feb 15: Sinusoidal models Apr 11: Audio alignment

Feb 22: LPC models Apr 18: Music fingerprinting

Feb 29: Filtering & Reverb Apr 25: Source separation


2. Digital Signals

• Sampling interval

• Sampling frequency

• Quantizer

12

time

xd[n] = Q( xc(nT ) )

Discrete-time samplinglimits bandwidth

Discrete-levelquantization

limitsdynamic range

T

0 = 2/TT

Q(x) = · round(x/)


Fourier Series• Observation:

Any periodic signal can be constructed from sinusoids at integer multiples of the fundamental frequency

• E.g., square wave

13

k = 0; ak =

(1)

k12 1

k k = 1, 3, 5, . . .

0 otherwise

k1 2 3 5 6 74

|ak|1.0

1.5 1 0.5 0 0.5 1 1.5 1

0.5

0

0.5

t

x(t)

x(t) = x(t + T ) x(t) M

k=0

ak cos(2k

Tt + k)


Fourier Transform• Complex form of Fourier Series + analysis

• Let Harmonics become infinitely close:

are equivalent, dual descriptions14

x(t) M

k=M

ckej 2kT t ck =

1T

T/2

T/2x(t)ej 2k

T tdt

T

x(t) =12

X(j)ejtd

X(j) =

x(t)ejtdt

x(t), X(j)

0 0.002 0.004 0.006 0.008time / sec

level / dB

-0.01

0

0.01

0.02

x(t)

0 2000 4000 6000 8000freq / Hz

-80

-60

-40

-20 |X(jΩ)|


Spectrum of Periodic Sounds• If sound is (nearly) periodic (i.e., pitched),

Fourier Transform approaches Fourier Series

• n.b.: plotted in log units:

15

x(t) x(t + T )X(j)

k ck( k 2

T )

1.52 1.54 1.56 1.58-0.1

0

0.1

0 1000 2000 3000 4000-100

-80

-60

-40

time / s freq / Hzlev

el / d

B

ampli

tude

|X(j)|dB(x) = 20 log10(x)


Discrete-Time Fourier Transf. (DTFT)• Sampled has same FT form

• ... but results in spectrum with period 2π:

16

x[n] = x(nT )

x[n] =12

X(ej)ejnd

X(ej) =

n=x[n]ejn

0 π

|X(ejω)|

ω

2π 3π 4π 5π

1

2

3 πargX(ejω)

ω

−π

π 2π 3π 4π 5π


Discrete Fourier Transform (DFT)• N-point finite-length sampled signal

the kind you can have on a computer!

• Just a matrix-multiply between vectors

17

x[n] =1N

N1

k=0

X[k]WnkN

X[k] =N1

n=0

x[n]WnkN

WN = ej 2N

X[0]X[1]X[2]

...X[N 1]

=

1 1 1 · · · 11 W 1

N W 2N · · · W (N1)

N

1 W 2N W 4

N · · · W 2(N1)N

......

.... . .

...1 W (N1)

N W 2(N1)N · · · W (N1)2

N

x[0]x[1]x[2]

...x[N 1]


Short-Time Fourier Transform• To localize energy in time and frequency...

chop signal into N-point windows every L pointstake DFT of each one

18

2.35 2.4 2.45

0

4000

3000

2000

1000

2.5 2.55 2.6-0.1

0

0.1

time / s

freq

/ Hz

k →

short-timewindow

DFT

m = 0 m = 1 m = 2 m = 3

L 2L 3L

X[k, m] =N1

n=0

x[n + mL]w[n]ej 2knN


The Spectrogram• Plot STFT |X[k, m]| as a grayscale image

19

time / s

time / s

freq

/ Hz

intensity / dB

2.35 2.4 2.45 2.5 2.55 2.60

1000

2000

3000

4000

freq

/ Hz

0

1000

2000

3000

4000

0

0.1

-50

-40

-30

-20

-10

0

10

0 0.5 1 1.5 2 2.5


Time-Frequency Tradeoff• Shorter window

better resolution of time detail worse resolution of frequency detail

201.4 1.6 1.8 2 2.2 2.4 2.6

freq

/ Hz

time / s

level/ dB

0

1000

2000

3000

4000

freq

/ Hz

0

1000

2000

3000

4000

0

0.2

Win

dow

= 2

56 p

t N

arro

wba

nd

Win

dow

= 4

8 pt

Wid

eban

d

-50

-40

-30

-20

-10

0

10


Spectrogram in Matlab-

• https://www.ee.columbia.edu/~dpwe/e4810/matlab/M14-fft.diary

21

https://www.ee.columbia.edu/~dpwe/e4810/matlab/M14-fft.diary


Live Matlab Spectrogram

• http://www.wakayama-u.ac.jp/~kawahara/MatlabRealtimeSpeechTools/

22

http://www.wakayama-u.ac.jp/~kawahara/MatlabRealtimeSpeechTools/


Summary + Assignment• Music Signal Processing

synthesis, modification, analysishands-on investigation

• Courseparticipation, practicals, projects, presentations

• Digital Signal Processingsignals on computersFourier analysis & spectrogram

• AssignmentWatch Acoustics video & prepare questionDo Pd tutorial http://en.flossmanuals.net/pure-data/ through to “Amplitude Modulation”

23

http://en.flossmanuals.net/pure-data/

Documents

ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 1: Course Outline & DSP