CSP05-06 - Auditory input processing

Auditory input processing

Lecturer: Smilen Dimitrov

Cross-sensorial processing – MED7

Introduction

• The immobot base exercise
• Work on the auditory input
• Goal – sound source localization in 3D
• Setup:
– PC
– Two microphones
– Sound card

Setup – microphone problems

• We need to use two microphones to obtain a stereo signal
• For regular PC microphones (like our Sandbergs):
– Take note they are electret!
– They demand +5V from the PC in order to work
– All PC mic inputs follow this standard: although we have a tip-ring-sleeve jack connector, it is NOT a stereo jack.
• Thus a PC mic input will always show as mono (the stereo button will be greyed out in the Recording control of the Windows mixer)

Setup – microphone problems

• We need to use two microphones to obtain a stereo signal
• For regular PC microphones (like our Sandbergs):
• Hence the connection cable below will NOT work (as it assumes that the electret connector is a stereo one)

Setup – microphone problems

• Hence, we will have to use:
– a dedicated audio card,
– with two microphone inputs,
even if we want to use cheap electrets for stereo!
• One possible soundcard: M-Audio MobilePre USB

Setup – microphone problems

• Interfacing two electrets for stereo input:
– would involve a cable wired as in the schematic below:
• (assuming we have a stereo plug mic input on the card)

Setup – microphone problems

• To avoid these problems with electrets, we are going to use capacitor microphones (Generis)

• Note that these microphones must be connected using an XLR cable (the M-Audio card has such mic inputs)

• Note that condenser/capacitor microphones demand a power supply – so-called “phantom power” (the M-Audio card provides it)

• Thus, we should make sure the sound card and the microphones are compatible.

Setup

• Setup for a PC:

(In addition to the microphones and the sound card):
1. M-Audio MobilePre USB drivers
2. Max/MSP/Jitter

• Microphone parameters need not be specified in the algorithm discussed today.

Goal of the auditory processing algorithm

• Object detection:
– the application needs to detect the presence of a new object whenever it enters the monitored environment (say, a sound louder than a threshold)
• Object recognition:
– once a new object is detected, it needs to be classified to determine its type (e.g., a car versus a truck, a tiger versus a deer); this involves comparing sounds (spectrum signatures)
• Object tracking:
– assuming the new object is of interest to the application, it can be tracked as it moves through the environment. Tracking involves computing the current location of the object and its trajectory

[Diagram boxes: Preprocess-audio; Estimation of 3D location through ITD / cross-correlation]

• Relation to the model we had for visual input processing
– Not really applicable for the algorithm discussed, but could be – here we will directly do tracking
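
The detection stage mentioned above ("a sound louder than a threshold") can be sketched as a block-wise level check. This is only an illustration, not part of the algorithm discussed in these slides: the function names, the block size and the threshold value are all assumptions chosen for the example.

```python
import math

def rms(block):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))

def detect_onsets(samples, block_size, threshold):
    """Return the start index of every block where the level rises above threshold.

    Only the transition from quiet to loud is reported, so a long sound
    produces a single detection.
    """
    onsets = []
    was_loud = False
    for start in range(0, len(samples) - block_size + 1, block_size):
        loud = rms(samples[start:start + block_size]) > threshold
        if loud and not was_loud:
            onsets.append(start)
        was_loud = loud
    return onsets

# Hypothetical test signal: quiet, burst, quiet, burst (8 samples each).
quiet = [0.01, -0.01] * 4
burst = [0.5, -0.5] * 4
signal = quiet + burst + quiet + burst
onsets = detect_onsets(signal, block_size=8, threshold=0.1)
print(onsets)
```

In a real setup the threshold would have to be tuned to the room noise and the microphone gain.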

Goal of the auditory processing algorithm

Sound-source localization using ITD and cross-correlation

• Small comparison between stereo camera and microphone systems
– Camera – 2D sensor (2D array of photocells)
– A single camera can give a vector of direction to the tracked object
– Two cameras can give a point (intersection of direction vectors – CPA)
– Microphone – 1D sensor (senses values at a single point – corresponds to a single photocell in a camera)
– A single microphone cannot give any geometric information
– Two microphones can only give the azimuthal angle – which corresponds to a vector of direction, confined to the “horizontal” plane

Sound-source localization using ITD and cross-correlation

• Algorithm – computing the time delay of arrival (TDOA) of the wave front at the two microphones
– In biological terms this is the equivalent of the Interaural Time Difference (ITD)
– We compute the lag of the wave at a specific point received at both microphones (the Interaural Phase Difference (IPD))
– Must find the time difference between two identical points in the left and right sound signal – using cross-correlation

Sound-source localization using ITD and cross-correlation

• Cross-correlation – two arrays, representing the left and right audio signal: g and h – their correlation is also an array

$$\mathrm{Corr}(g,h)_j \equiv \sum_{k=0}^{N-1} g_{j+k}\, h_k$$

• The length of the cross-correlation array is

$$\mathrm{length}(C) = \big(\mathrm{length}(A) + \mathrm{length}(B)\big) - 1$$
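
A minimal pure-Python sketch of the definition above, assuming that samples outside the arrays count as zero. The function name `cross_correlate` and the example values are made up for illustration; the sketch only shows that the result has length(g) + length(h) − 1 entries, one per lag j.

```python
def cross_correlate(g, h):
    """Full cross-correlation of g and h.

    Returns (lags, corr), where corr[i] = sum over k of g[lags[i] + k] * h[k],
    with lags running from -(len(h) - 1) to len(g) - 1.
    """
    lags = list(range(-(len(h) - 1), len(g)))
    corr = []
    for j in lags:
        total = 0.0
        for k in range(len(h)):
            if 0 <= j + k < len(g):  # samples outside g are treated as zero
                total += g[j + k] * h[k]
        corr.append(total)
    return lags, corr

g = [1.0, 2.0, 3.0]
h = [0.0, 1.0]
lags, corr = cross_correlate(g, h)
print(lags)
print(corr)
print(len(corr) == len(g) + len(h) - 1)
```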

Sound-source localization using ITD and cross-correlation

• Cross-correlation – in essence, what we are doing is taking one array and “sliding” it across the other, finding the sum of the products between respective elements at each position.
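
The "sliding" idea can be sketched as a search for the shift with the largest sum of products. This is an illustrative sketch only: `estimate_lag`, the `max_lag` search range and the sign convention (a positive result means the right channel is delayed) are assumptions, not something taken from the lecture's implementation.

```python
def estimate_lag(left, right, max_lag):
    """Return the shift (in samples) at which right best matches left.

    Assumed convention: a positive result means the right channel
    is delayed relative to the left channel.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for k, sample in enumerate(left):
            if 0 <= k + lag < len(right):  # ignore samples that slide off the end
                score += sample * right[k + lag]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical pulse, with the right channel delayed by two samples.
left = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
print(estimate_lag(left, right, max_lag=4))
```

Limiting the search to `max_lag` keeps the cost down; the physically possible lag is bounded by the microphone spacing anyway.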

Sound-source localization using ITD and cross-correlation

• Cross-correlation – algorithm
• First, find the time increment between samples (at a 44.1 kHz sampling rate):

$$\Delta = \frac{1}{44.1 \times 10^{3}} = 2.2676 \times 10^{-5}\ \mathrm{s}$$

• Assume the sound can be analyzed through the diagram below:
• Sound arriving at the left channel will arrive at the right channel after crossing distance a – we know the speed of sound, so we can also calculate the time difference
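
As a quick numeric check of the sampling increment, the sketch below also estimates how far sound travels during one sample and how many samples of lag are physically possible. The 44.1 kHz rate is from the slide; the microphone spacing of 0.20 m and the speed of sound of about 343 m/s (air at roughly 20 °C) are assumptions made only for this example.

```python
SAMPLE_RATE_HZ = 44_100        # sampling rate used on the slide
SPEED_OF_SOUND_M_S = 343.0     # assumed: air at about 20 degrees C
MIC_DISTANCE_M = 0.20          # hypothetical spacing between the microphones

# Time between two consecutive samples.
delta = 1.0 / SAMPLE_RATE_HZ

# Distance the wave front covers during one sample period.
path_per_sample = SPEED_OF_SOUND_M_S * delta

# Largest lag (in samples) that the geometry can produce.
max_lag_samples = MIC_DISTANCE_M / path_per_sample

print(f"delta = {delta:.4e} s")
print(f"path per sample = {path_per_sample * 1000:.2f} mm")
print(f"max lag = {max_lag_samples:.1f} samples")
```

Under these assumptions one sample corresponds to roughly 7.8 mm of path difference, so the lag can take only about 25 distinct values on each side, which limits the angular resolution.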

Sound-source localization using ITD and cross-correlation

• Cross-correlation – algorithm
• Assume the sound can be analyzed through the diagram below:
• Trigonometry:

$$\sin\theta = \frac{a}{c}, \qquad \cos\theta = \frac{b}{c}, \qquad \tan\theta = \frac{a}{b}$$

Sound-source localization using ITD and cross-correlation

• Cross-correlation – algorithm
• Assume the sound can be analyzed through the diagram below:
• The time difference:

$$t = \sigma \cdot \Delta$$

– Where Δ = time between sound samples, and σ = the number of delay samples returned from the cross-correlation function.

Sound-source localization using ITD and cross-correlation

• Cross-correlation – algorithm

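• Calc length of line a

$$a = t \cdot v_{sound} = \sigma \cdot \Delta \cdot v_{sound}$$

– Speed of sound v_sound ≈ 343 m/s at room temperature

• Finally, calc the angle θ

$$\sin\theta = \frac{a}{c} \;\Rightarrow\; \theta = \sin^{-1}\frac{a}{c} = \sin^{-1}\frac{\sigma \cdot \Delta \cdot v_{sound}}{c}$$

– Where c is a known distance between the microphones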
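
The chain from delay samples to angle can be sketched in a few lines. This is a sketch under stated assumptions: the function name `azimuth_from_lag`, the default speed of sound of 343 m/s and the 0.20 m spacing in the example are hypothetical. The clamping step is an addition of this sketch, needed because noise can produce a lag whose path length a slightly exceeds c, which would put the arcsine outside its domain.

```python
import math

def azimuth_from_lag(lag_samples, sample_rate_hz, mic_distance_m,
                     speed_of_sound_m_s=343.0):
    """Angle theta (radians) from the lag sigma returned by cross-correlation."""
    delta = 1.0 / sample_rate_hz            # time between samples
    t = lag_samples * delta                 # time difference, t = sigma * delta
    a = t * speed_of_sound_m_s              # extra path length, a = t * v_sound
    ratio = a / mic_distance_m              # sin(theta) = a / c
    ratio = max(-1.0, min(1.0, ratio))      # clamp (sketch-only safeguard)
    return math.asin(ratio)

theta = azimuth_from_lag(10, 44_100, 0.20)
print(f"{math.degrees(theta):.1f} degrees")
```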

Sound-source localization using ITD and cross-correlation

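• When θ is finally computed, we obtain a direction vector by rotating the unit vector in the horizontal plane (xz) around the vertical axis (y) by the angle θ

• So, the vector DA with components (-sin θ, 0, cos θ) will represent the direction of the detected audio source

$$DA = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\sin\theta \\ 0 \\ \cos\theta \end{pmatrix}$$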
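
The rotation can be checked with a short sketch. It assumes the sign convention implied by the stated result (-sin θ, 0, cos θ), that is, the unit vector along z is rotated about the vertical y axis; the function names are made up for the example.

```python
import math

def rotate_about_y(theta, v):
    """Rotate vector v = (x, y, z) about the vertical y axis by theta radians."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * z, y, s * x + c * z)

def direction_vector(theta):
    """Direction of the detected source: the unit z vector rotated by theta."""
    return rotate_about_y(theta, (0.0, 0.0, 1.0))

da = direction_vector(math.radians(30))
print(da)
```

The y component is always zero, which reflects that only a direction in the horizontal plane is recovered.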

Sound-source localization using ITD and cross-correlation

• Overview of the algorithm (architecture)

Sound-source localization using ITD and cross-correlation

• Problems with the approach

• We only retrieve a direction vector in a plane (azimuthal angle) – information about the “vertical” position of the sound source is lost

• 3D localization of audio as a 3D point is possible using two microphones, if some medium (that changes sound) is placed between the microphones (a “head”), and then a head-related transfer function is calculated.
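
Why the vertical position is lost can be illustrated numerically: rotating a source around the axis through the two microphones leaves its distance to each microphone unchanged, so the arrival-time difference is identical. The microphone positions and source coordinates below are hypothetical, with the microphones placed on the x axis and y taken as the vertical axis.

```python
import math

MIC_LEFT = (-0.1, 0.0, 0.0)    # hypothetical positions in metres
MIC_RIGHT = (0.1, 0.0, 0.0)

def path_difference(source):
    """Difference in distance from the source to the two microphones."""
    return math.dist(source, MIC_LEFT) - math.dist(source, MIC_RIGHT)

def rotate_about_mic_axis(source, phi):
    """Rotate the source around the x axis (the line through both microphones)."""
    x, y, z = source
    c, s = math.cos(phi), math.sin(phi)
    return (x, y * c + z * s, -y * s + z * c)

level = (1.0, 0.0, 2.0)                                  # source in the horizontal plane
elevated = rotate_about_mic_axis(level, math.radians(40))  # same source, raised

print(path_difference(level))
print(path_difference(elevated))
```

Both positions give the same path difference, so a delay-only method with two microphones cannot tell them apart.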

Implementation in Max/MSP

• We will program our own MSP object to perform audio cross-correlation in real time – then proceed to vector calculation and display