ELEN E4896 MUSIC SIGNAL PROCESSING

Lecture 1: Course Introduction

Dan Ellis
Dept. Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/

E4896 Music Signal Processing (Dan Ellis) 2016-01-20

1. Course Structure
2. DSP: The Short-Time Fourier Transform


Signal Processing and Music

• Music is a very rich signal

• Signal Processing can expose some of it

• We want to get inside it

[Figure: four spectrogram panels, freq / kHz (0 to 4) against time / sec (0 to 20)]

1. Course Goals

• Survey of applications of signal processing to music audio
    music synthesis
    music/audio processing (modification)
    music audio analysis

• Connect basic DSP theory to sound phenomena and effects

• Hands-on, live investigations of audio processing algorithms
    using Matlab, Pd, ...

Course Structure

• Two weekly sessions
    Monday: presentations, practical exposition, discussion
    Wednesday: presentations, practical, sharing

• Grade structure
    20% practicals participation
    10% one presentation
    30% three mini-project assignments
    40% one final project

Flipped Classroom

• Video Lectures are required viewing before Monday session
    30-50 mins each
    recorded in 2013
    bring one question (at least) to class
    pop quizzes

Hands-on Practicals

• Music Signal Processing involves connecting algorithms with perceptions

• Our goal is to be able to ‘play’ with algorithms to feel how they work
    In class, on laptops, in small groups

• We will use several platforms:
    Matlab
    PureData (Pd)
    Python

Projects

• There will be 3 “mini projects” assigned
    approx. week 3, week 5, week 7
    provide basic pieces, but open-ended
    involve implementation & experimentation

• Also, Final project
    on a topic of your choice
    implement some kind of music signal processing
    due at end of semester

• Good projects ...
    ... start with clear goals
    ... apply ideas from class
    ... evaluate performance & modify accordingly

• Could continue into summer...

Project Examples

• Music Transcription

Figure 1: Screenshot of the PD patch implementing the Scheirer algorithm. Much of the logic is hidden inside the “combfilterbank” object at the bottom.

2.2 Tempo Selection

Each of the six onset pulse signals (one for each band) is fed into its own bank of 150 comb filters; you can think of these filters as mapping to 150 possible tempos that we want to detect. A comb filter is essentially just a delay implemented as a circular buffer. The delay time of each filter corresponds to the tempo that the filter is meant to detect. For example, the filter that detects a tempo of 90 bpm corresponds to a beat rate of 90/60 = 1.5 Hz, which yields a buffer length of 44100/1.5 = 29400 samples. The tempo selector then, for each bpm, sums the energy across all 6 subbands at that tempo. The tempo with the highest total energy is chosen as the tempo of the piece.
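The tempo-selection step described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the report's C external: the function names, the feedback gain `alpha = 0.9`, and the toy 100 Hz pulse train are all hypothetical, and the delay line is a plain list rather than a circular buffer.

```python
def delay_samples(bpm, sample_rate):
    """Comb-filter delay (in samples) of one beat period at the given tempo."""
    beats_per_second = bpm / 60.0
    return round(sample_rate / beats_per_second)


def comb_energy(x, delay, alpha=0.9):
    """Energy of a feedback comb filter y[n] = (1-alpha)*x[n] + alpha*y[n-delay].

    alpha is an assumed feedback gain; the excerpt does not state one.
    """
    y = [0.0] * len(x)
    for n, value in enumerate(x):
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = (1.0 - alpha) * value + alpha * feedback
    return sum(v * v for v in y)


def select_tempo(bands, sample_rate, tempos):
    """Pick the candidate tempo with the highest energy summed over all bands."""
    def total_energy(bpm):
        delay = delay_samples(bpm, sample_rate)
        return sum(comb_energy(band, delay) for band in bands)
    return max(tempos, key=total_energy)


# The arithmetic from the text: 90 bpm at 44100 Hz.
print(delay_samples(90, 44100))

# Toy input: six identical onset bands pulsing every 0.5 s (120 bpm) at 100 Hz.
sr = 100
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(1000)]
estimated = select_tempo([pulses] * 6, sr, [60, 90, 120, 150])
print(estimated)
```

A filter whose delay matches the pulse spacing reinforces every pulse, so its output energy grows fastest; a real system would use many more candidate tempos and band-specific onset signals.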

Because of the sheer number of filters (150 × 6 = 900), it was impossible to implement this using standard PD objects. Therefore I created a PD External written in C [3] and used this within my PD patch. The external has six inputs, one for each subband, and has two float outlets: one indicating the detected tempo, and the other indicating the phase. I added a signal outlet as well for

• Beat Tracking

• Mood Classification

Fig. 4. Comparison of effectiveness of different combinations of features.

the most effective combination of features. The result is shown in Figure 4. Although some features are especially useful, I also observe that a larger number of features improves the overall classification accuracy. Therefore, the best combination is to use all the features.

Fig. 5. Composition of hits in each class.

VI. EVALUATION AND RESULTS

In the evaluation part, I will use 100 test samples (20 samples for each mood class) to evaluate the effectiveness of my classifier. Essentially, there are two tasks I would like to investigate in the evaluation. First, I use the best combination of features found in the feature selection part, which is to use all of the features. The average hit rate is 0.64. However, I noticed that the accuracy differs dramatically between mood classes. The hit rates for angry and happy are almost 1 while the hit rate for calm is 0. Thus I further investigate the composition of hits in each class. The result is shown in Figure 5. It is quite intuitive that happy songs will be mistaken for angry songs since they both have fast tempo. By the same logic, it is also intuitive that calm songs, sad songs and scary songs are easily confused. However, it is hard to envision that calm songs would be largely classified as angry. This phenomenon is still left to be investigated. Second, I would like to investigate which features are capable of telling the difference between any two different classes. Different combinations of features will be applied in the above two tasks to find the most effective combination of features. The result is shown in Figure 6. Previously we have seen that timbre features are very useful. In this investigation, we further found that the harmony features are useful when the two songs are melodically different. For example, harmony features are effective for separating calm songs from happy songs, while they are not useful for distinguishing calm songs from sad songs.

Fig. 6. Comparison of effectiveness of different features for distinguishing two classes

VII. CONCLUSION

In this project, I implemented and evaluated a music mood classifier. Specifically, I focus on investigating the effectiveness of each feature. Compared with other feature selection work, we found that the tempo feature does not always work well when classifying music moods, while timbre features play a key role in recognizing music moods. We also found that the number of features affects classification accuracy. When further analyzing the classification results, we found that some classes are easily mistaken for one another, and we identified the features that are suitable for distinguishing two mood classes. For example, harmony features are useful when the two songs are melodically different. However, there are still some unexplained cases in which two classes are confused. In future work, I plan to find the explanation for this

Presentations

• Everyone will make at least one in-class presentation
    e.g. presentation of a relevant paper
    ... or your latest mini-project
    ... or just something you’re curious about

• We will assign time slots first-come-first-served
    slots each class
    check course calendar to choose a time
    email TA

• Encourage discussion
    informal presentations of practical findings

Web Site

• Information distributed via:
    http://www.ee.columbia.edu/~dpwe/e4896/

Course Outline

Jan 20: Intro                 Mar 07: Pitch tracking
Jan 25: Acoustics             Mar 21: Time & pitch scaling
Feb 01: Perception            Mar 28: Beat tracking
Feb 08: Analog synthesis      Apr 04: Chroma & Chords
Feb 15: Sinusoidal models     Apr 11: Audio alignment
Feb 22: LPC models            Apr 18: Music fingerprinting
Feb 29: Filtering & Reverb    Apr 25: Source separation

• Anything missing?

2. Digital Signals

$$x_d[n] = Q(x_c(nT))$$

• Sampling interval $T$

• Sampling frequency $\Omega_T = 2\pi/T$

• Quantizer $Q(x) = \epsilon \cdot \mathrm{round}(x/\epsilon)$

Discrete-time sampling limits bandwidth

Discrete-level quantization limits dynamic range

[Figure: sampled and quantized waveform against time]
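As a rough illustration of sampling followed by quantization, the sketch below samples a 440 Hz sine and rounds each sample to a fixed step size. The sample rate, tone frequency, 8-bit step, and function names are assumptions chosen for the example, not values from the slides.

```python
import math


def sample(xc, T, num_samples):
    """Discrete-time sampling: x[n] = xc(n*T) for n = 0 .. num_samples-1."""
    return [xc(n * T) for n in range(num_samples)]


def quantize(x, step):
    """Uniform quantizer: step * round(x / step).

    Python's round() sends exact ties to the nearest even integer;
    the error is still at most step / 2.
    """
    return step * round(x / step)


def tone(t):
    """Continuous-time test signal (assumed): a 440 Hz sine."""
    return math.sin(2.0 * math.pi * 440.0 * t)


sample_rate = 8000.0          # assumed sampling frequency in Hz
T = 1.0 / sample_rate         # sampling interval in seconds
step = 2.0 / 2 ** 8           # assumed step: 8 bits across the range [-1, 1)

x = sample(tone, T, 64)
xd = [quantize(v, step) for v in x]

# The quantization error never exceeds half a step.
max_error = max(abs(q - v) for q, v in zip(xd, x))
print(max_error <= step / 2)
```

Halving `step` (one more bit) halves the worst-case error, which is the sense in which quantization limits dynamic range.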

Fourier Series

• Observation:
    Any periodic signal can be constructed from sinusoids at integer multiples of the fundamental frequency

$$x(t) = x(t + T) \;\Rightarrow\; x(t) \approx \sum_{k=0}^{M} a_k \cos\left(\frac{2\pi k}{T} t + \phi_k\right)$$

• E.g., square wave

$$\phi_k = 0; \quad a_k = \begin{cases} (-1)^{\frac{k-1}{2}} \frac{1}{k} & k = 1, 3, 5, \ldots \\ 0 & \text{otherwise} \end{cases}$$

[Figure: coefficient magnitudes |a_k| against k (1 to 7), and the waveform x(t) against t (-1.5 to 1.5)]
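The square-wave example can be checked numerically by summing the cosine series directly. The sketch below is a minimal illustration; the function names and the choice of M = 999 harmonics are assumptions. With these coefficients the partial sums settle near π/4 on the positive half-cycle and near -π/4 on the negative one.

```python
import math


def square_coeff(k):
    """a_k for the square wave: (-1)^((k-1)/2) / k for odd k, else 0."""
    if k % 2 == 1:
        return (-1) ** ((k - 1) // 2) / k
    return 0.0


def partial_sum(t, period, M):
    """Sum of a_k * cos(2*pi*k*t/period) for k = 0 .. M (all phases zero)."""
    return sum(
        square_coeff(k) * math.cos(2.0 * math.pi * k * t / period)
        for k in range(M + 1)
    )


period = 1.0
for t in (0.0, 0.1, 0.5):
    print(t, partial_sum(t, period, 999))
```

Using a small M (say 7) instead shows the familiar ripple near the jumps at t = ±T/4.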

Fourier Transform

• Complex form of Fourier Series + analysis

$$x(t) \approx \sum_{k=-M}^{M} c_k e^{j\frac{2\pi k}{T}t} \qquad c_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)\, e^{-j\frac{2\pi k}{T}t}\, dt$$

• Let Harmonics become infinitely close: $T \to \infty$

$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(j\Omega)\, e^{j\Omega t}\, d\Omega \qquad X(j\Omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\Omega t}\, dt$$

$x(t)$, $X(j\Omega)$ are equivalent, dual descriptions

[Figure: waveform x(t) against time / sec (0 to 0.008), and magnitude spectrum |X(jΩ)| as level / dB (-80 to -20) against freq / Hz (0 to 8000)]

Spectrum of Periodic Sounds

• If sound is (nearly) periodic (i.e., pitched), Fourier Transform approaches Fourier Series

$$x(t) \approx x(t + T) \;\Rightarrow\; X(j\Omega) \approx \sum_k c_k\, \delta\left(\Omega - k\frac{2\pi}{T}\right)$$

• n.b.: plotted in log units: $\mathrm{dB}(x) = 20 \log_{10}(x)$

[Figure: waveform amplitude against time / s (1.52 to 1.58), and |X(jΩ)| as level / dB (-100 to -40) against freq / Hz (0 to 4000)]

Discrete-Time Fourier Transform (DTFT)

• Sampled $x[n] = x(nT)$ has same FT form

$$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega \qquad X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$$

• ... but results in spectrum with period 2π:

[Figure: |X(e^{jω})| and arg X(e^{jω}) against ω, from 0 to 5π]
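The 2π periodicity of the DTFT is easy to confirm numerically for a finite-length sequence. This sketch assumes the sequence starts at n = 0 and is zero elsewhere; the helper name `dtft` and the test values are illustrative only.

```python
import cmath
import math


def dtft(x, omega):
    """DTFT of a finite sequence x[0..len-1] evaluated at one frequency omega."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))


x = [1.0, 0.5, -0.25, 0.125]
omega = 0.7

a = dtft(x, omega)
b = dtft(x, omega + 2.0 * math.pi)

# Shifting omega by 2*pi changes each term by exp(-j*2*pi*n) = 1.
print(abs(a - b) < 1e-9)
```

Because n is an integer, every term is unchanged by the shift, so one period of ω describes the whole spectrum.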

Discrete Fourier Transform (DFT)

• N-point finite-length sampled signal
    the kind you can have on a computer!

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk} \qquad x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\, W_N^{-nk} \qquad W_N = e^{-j\frac{2\pi}{N}}$$

• Just a matrix-multiply between vectors

$$\begin{bmatrix} X[0] \\ X[1] \\ X[2] \\ \vdots \\ X[N-1] \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N^{1} & W_N^{2} & \cdots & W_N^{(N-1)} \\ 1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{(N-1)} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2} \end{bmatrix} \begin{bmatrix} x[0] \\ x[1] \\ x[2] \\ \vdots \\ x[N-1] \end{bmatrix}$$
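The matrix view of the DFT can be written out literally. The sketch below builds the N-by-N matrix of powers of W_N and multiplies it by the signal vector; the inverse uses the conjugate matrix scaled by 1/N. Function names are hypothetical, and this O(N²) form is for illustration only (in practice an FFT routine such as `numpy.fft.fft` would be used).

```python
import cmath


def dft_matrix(N):
    """N x N matrix with entry [k][n] = W_N^(n*k), where W_N = exp(-j*2*pi/N)."""
    return [
        [cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)]
        for k in range(N)
    ]


def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dft(x):
    """X[k] = sum over n of x[n] * W_N^(n*k)."""
    return matvec(dft_matrix(len(x)), x)


def idft(X):
    """x[n] = (1/N) * sum over k of X[k] * W_N^(-n*k)."""
    N = len(X)
    conjugate = [[w.conjugate() for w in row] for row in dft_matrix(N)]
    return [v / N for v in matvec(conjugate, X)]


x = [0.5, -1.0, 2.0, 0.25, 3.0]
X = dft(x)
recovered = idft(X)
print([round(v.real, 6) for v in recovered])
```

The matrix is symmetric, so conjugating it is the same as taking its conjugate transpose, which is why the inverse has the same structure as the forward transform.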

Short-Time Fourier Transform

• To localize energy in time and frequency...
    chop signal into N-point windows every L points
    take DFT of each one

$$X[k, m] = \sum_{n=0}^{N-1} x[n + mL]\, w[n]\, e^{-j\frac{2\pi k n}{N}}$$

[Figure: waveform against time / s (2.35 to 2.6) cut into short-time windows m = 0, 1, 2, 3 starting at 0, L, 2L, 3L; the DFT of each window gives one column over k (freq / Hz, 0 to 4000)]
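The STFT recipe (window every L points, then take an N-point DFT) translates almost directly into code. This is a minimal standard-library sketch, not the course's Matlab code: the Hann window, the function names, and the test tone are assumptions, and the direct DFT is O(N²) per frame.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window of length N (an assumed choice of w[n])."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]


def stft(x, N, L, window=None):
    """Return frames[m][k] = sum_n x[n + m*L] * w[n] * exp(-j*2*pi*k*n/N)."""
    w = window if window is not None else hann(N)
    num_frames = 1 + (len(x) - N) // L if len(x) >= N else 0
    frames = []
    for m in range(num_frames):
        segment = [x[n + m * L] * w[n] for n in range(N)]
        frames.append([
            sum(s * cmath.exp(-2j * math.pi * k * n / N)
                for n, s in enumerate(segment))
            for k in range(N)
        ])
    return frames


# Test tone: exactly 4 cycles per 32-sample window, so energy sits in bin k = 4.
N, L = 32, 16
x = [math.cos(2.0 * math.pi * 4 * n / N) for n in range(128)]
frames = stft(x, N, L)

for m, frame in enumerate(frames):
    magnitudes = [abs(v) for v in frame[: N // 2 + 1]]
    print(m, magnitudes.index(max(magnitudes)))
```

Each frame is one column of the time-frequency picture; choosing L smaller than N makes successive windows overlap.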

The Spectrogram

• Plot STFT $|X[k, m]|$ as a grayscale image

[Figure: waveform and spectrogram, freq / Hz (0 to 4000) against time / s, shown for 0 to 2.5 s and zoomed to 2.35 to 2.6 s; intensity / dB scale from -50 to 10]
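Turning STFT values into the image that gets plotted mostly means taking magnitudes, converting to dB, and labelling the axes. The sketch below assumes the STFT is stored as `frames[m][k]` (a list of complex DFT frames); the function names, the -80 dB floor, and the example numbers are illustrative assumptions.

```python
import math


def spectrogram_db(frames, floor_db=-80.0):
    """Convert STFT frames[m][k] into image[k][m] in dB, for bins k = 0 .. N/2.

    Levels are 20*log10(|X[k, m]|), clipped below at floor_db so that
    zero-magnitude cells do not produce -infinity.
    """
    if not frames:
        return []
    num_bins = len(frames[0]) // 2 + 1
    image = []
    for k in range(num_bins):
        row = []
        for frame in frames:
            magnitude = abs(frame[k])
            level = 20.0 * math.log10(magnitude) if magnitude > 0.0 else floor_db
            row.append(max(level, floor_db))
        image.append(row)
    return image


def bin_frequency(k, N, sample_rate):
    """Center frequency in Hz of DFT bin k for an N-point transform."""
    return k * sample_rate / N


def frame_time(m, L, sample_rate):
    """Start time in seconds of frame m for a hop of L samples."""
    return m * L / sample_rate


# Two made-up 4-point frames, just to show the conversion.
frames = [[10 + 0j, 0j, 1j, 0j], [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]]
image = spectrogram_db(frames)
print(image)
print(bin_frequency(256, 1024, 8000.0), frame_time(10, 128, 8000.0))
```

Only bins 0 to N/2 are kept because the upper half of the DFT of a real signal mirrors the lower half.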

Time-Frequency Tradeoff

• Shorter window
    better resolution of time detail
    worse resolution of frequency detail

[Figure: waveform and two spectrograms, freq / Hz (0 to 4000) against time / s (1.4 to 2.6), level / dB from -50 to 10; panels: Window = 256 pt (Narrowband), Window = 48 pt (Wideband)]
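The tradeoff can be put into rough numbers for the two window lengths named in the figure. The sketch assumes a sample rate of 8000 Hz (suggested by the 0 to 4000 Hz axis but not stated on the slide) and uses DFT bin spacing as a simple proxy for frequency resolution; the true resolution also depends on the window shape.

```python
def window_resolution(N, sample_rate):
    """Return (window duration in seconds, DFT bin spacing in Hz) for N points."""
    duration_s = N / sample_rate
    bin_spacing_hz = sample_rate / N
    return duration_s, bin_spacing_hz


sample_rate = 8000.0  # assumed; not given on the slide

for N in (256, 48):
    duration_s, bin_spacing_hz = window_resolution(N, sample_rate)
    print(N, duration_s, bin_spacing_hz)
```

The product of the two numbers is always 1, so shortening the window to sharpen timing necessarily widens the frequency bins by the same factor.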

Spectrogram in Matlab

• https://www.ee.columbia.edu/~dpwe/e4810/matlab/M14-fft.diary

Live Matlab Spectrogram

• http://www.wakayama-u.ac.jp/~kawahara/MatlabRealtimeSpeechTools/

Summary + Assignment

• Music Signal Processing
    synthesis, modification, analysis
    hands-on investigation

• Course
    participation, practicals, projects, presentations

• Digital Signal Processing
    signals on computers
    Fourier analysis & spectrogram

• Assignment
    Watch Acoustics video & prepare question
    Do Pd tutorial http://en.flossmanuals.net/pure-data/ through to “Amplitude Modulation”