Communicating in a Natural Cocktail Party: Relating Human and Animal Behavior to Neural Response

Cocktail Party: Shinn-Cunningham et al. CNS Colloquium, 14 April 2006


Barbara Shinn-Cunningham, Boston University

Auditory Neuroscience Laboratory


Erol Ozmeral

Erick Gallun

Gin Best

Michele Dent

Liz McClaine

Kamal Sen

Rajiv Narayan

With funding from ONR, AFOSR, & NIH


(Cocktail Party by SLAW, Maniscalco Gallery)

The everyday acoustic environment is full of competition and clutter

The “Cocktail Party Problem”


Penguins and other social birds suffer from the cocktail party problem

(thanks to M. Dent)

Penguins recognize their mates and offspring amidst thousands of birds

Chicks identify parents from 11 m -- when call is 6 dB below the level of the background noise


Zebra finches learn to make a call by listening to a tutor… while in a large colony

Zebra finches are a model system for studying
• vocal production learning
• hierarchical encoding of complex signals (e.g., “bird’s own song” neurons; Narayan et al., 2006)


How do we figure out what is in the world from the sound mixtures we hear?

[Figure: air pressure versus time (sec)]



Syllables / words heard as units

Confusions occur between sources (streaming over time)


Frequency analysis breaks sound into parallel channels

When sounds overlap in their spectral content, neural responses are a mixture

[Figure: air pressure versus time (sec), showing mechanical vibration in air, and neural firing (electrical spikes) in auditory nerve for frequency channels from low to high]
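As a rough illustration of the point above, the sketch below splits a sound into a few frequency bands and compares the energy in each band for a target alone and for two mixtures. It uses a plain DFT in place of a cochlear model, and the band edges, tone frequencies, and function names are all assumptions chosen for illustration, not values from the talk.

```python
import math

def tone(freq_hz, n_samples, sample_rate):
    """Unit-amplitude sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def band_energies(signal, sample_rate, band_edges_hz):
    """Energy of `signal` in each band, summed over DFT bins (direct DFT)."""
    n = len(signal)
    energies = [0.0] * (len(band_edges_hz) - 1)
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        power = (re * re + im * im) / n
        for b in range(len(energies)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                energies[b] += power
    return energies

def mix(a, b):
    """Sample-by-sample sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]

SAMPLE_RATE = 8000              # Hz (assumed)
N = 400                         # 50 ms of signal
EDGES = [0, 1000, 2000, 4000]   # three illustrative channels: low, mid, high

target = tone(500, N, SAMPLE_RATE)
far_masker = tone(2500, N, SAMPLE_RATE)   # falls in a different channel
near_masker = tone(600, N, SAMPLE_RATE)   # falls in the target's channel

e_target = band_energies(target, SAMPLE_RATE, EDGES)
e_far = band_energies(mix(target, far_masker), SAMPLE_RATE, EDGES)
e_near = band_energies(mix(target, near_masker), SAMPLE_RATE, EDGES)

print("target alone:", [round(e, 1) for e in e_target])
print("with far masker:", [round(e, 1) for e in e_far])
print("with near masker:", [round(e, 1) for e in e_near])
```

With the spectrally distant masker, the low channel carries the same energy as for the target alone. With the nearby masker, the low channel carries a mixture of both sources.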


Spectrotemporal structure of sound is critical

(in contrast with simple, “traditional” stimuli)

Supports segregation of competing source “units”
• harmonicity
• common onsets
• comodulation

Reduces likelihood of spectrotemporal overlap (important elements are unlikely to be masked)

Moreover, the important information is contained in the spectrotemporal structure of sound.

BUT…

removing linguistic/semantic effects may tease out different contributing mechanisms


This project uses birdsongs from male zebra finches

Can compare results with avian behavior (Dent lab) and neurophysiological responses (Sen lab)

Spectrotemporal structure supports segregation


Listeners were trained to identify individual bird songs

Moe, Uno, Toro, Nibbles, Junior


A quick test for you…

What is this?

Moe, Uno, Toro, Nibbles, Junior

What is this?


Short-term “units” segregate, but streaming errors occur for similar sources

What is this?

Moe, Uno, Toro, Nibbles, Junior

Moe + Nibbles

What is this?

Uno + noise


Three maskers, to tease apart different types of interference


Why these maskers?

Maskers: Noise, Mod. Noise, Chorus

Spectrotemporal structure
• Dense: much overlap
• Sparse: key target features audible
• More sparse: temporal control for chorus

Main type of interference
• Reduce audibility
• Cause streaming confusions
• ?

Effect of spatial separation
• Improve audibility through acoustic better-ear effects
• Allow spatial attention to combat confusions
• ?
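To make the contrast between a dense and a sparse masker concrete, here is a minimal sketch that builds a steady noise and a modulated noise. The slides do not give the recipe for the modulated noise, so this assumes it is noise shaped by the smoothed envelope of a sparse, chorus-like source. The stand-in source, window length, and all names are hypothetical.

```python
import math
import random

def smoothed_envelope(signal, window):
    """Rectify the signal, then smooth it with a moving average."""
    rectified = [abs(x) for x in signal]
    out, running = [], 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]
        out.append(running / window)
    return out

def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

rng = random.Random(0)   # fixed seed so the sketch is repeatable
N = 800

# Stand-in for a sparse source: 200-sample tone bursts separated by silence.
sparse_source = [math.sin(2 * math.pi * i / 16) if (i // 200) % 2 == 0 else 0.0
                 for i in range(N)]

steady_noise = [rng.gauss(0.0, 1.0) for _ in range(N)]   # dense masker
envelope = smoothed_envelope(sparse_source, window=20)
modulated_noise = [n * e for n, e in zip(steady_noise, envelope)]

print("steady noise, burst vs gap:",
      round(rms(steady_noise[50:200]), 2), round(rms(steady_noise[250:400]), 2))
print("modulated noise, burst vs gap:",
      round(rms(modulated_noise[50:200]), 2), round(rms(modulated_noise[250:400]), 2))
```

The steady noise has similar energy everywhere, while the modulated noise is nearly silent in the gaps, which leaves those stretches of a target unmasked.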


Better-ear effects: the Target-to-Masker Energy Ratio improves with separation

[Figure panels: separated, co-located]
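The better-ear idea can be sketched numerically. The example below computes the target-to-masker energy ratio (TMR) in dB at each ear and takes the larger one. Head shadow is reduced to a single broadband gain at the far ear, which is a deliberate simplification, and the gain value, signals, and function names are assumptions for illustration only.

```python
import math

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def tmr_db(target, masker):
    """Target-to-masker energy ratio in dB at one ear."""
    return 10 * math.log10(energy(target) / energy(masker))

def better_ear_tmr_db(target_lr, masker_lr):
    """Larger of the left-ear and right-ear TMRs."""
    return max(tmr_db(t, m) for t, m in zip(target_lr, masker_lr))

N = 800
target = [math.sin(2 * math.pi * 10 * i / N) for i in range(N)]
masker = [math.sin(2 * math.pi * 17 * i / N) for i in range(N)]   # equal energy

# Co-located: target and masker reach both ears at the same level.
colocated = better_ear_tmr_db((target, target), (masker, masker))

# Separated: masker moved toward the right, so the head attenuates it at the
# left ear. FAR_EAR_GAIN is a hypothetical amplitude gain (about -6 dB).
FAR_EAR_GAIN = 0.5
shadowed_masker = [FAR_EAR_GAIN * v for v in masker]
separated = better_ear_tmr_db((target, target), (shadowed_masker, masker))

print(f"better-ear TMR, co-located: {colocated:.2f} dB")
print(f"better-ear TMR, separated:  {separated:.2f} dB")
```

In this toy case the better-ear TMR rises by 20·log10(1/FAR_EAR_GAIN) dB when the masker is moved to the side. Real head shadow is frequency dependent, so this is only the shape of the effect.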




Binaural effects: Interaural decorrelation causes masked signal to be audible

[Figure: running cross-correlation output for 500-Hz channel (simple model of brainstem processing in Medial Superior Olive)]

Important for signals below about 1500 Hz, but the birdsongs have a lot of high-frequency information
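A minimal sketch of the decorrelation cue, under simplifying assumptions: the noise is identical at the two ears, a 500-Hz tone is added with opposite sign at each ear during the second half of the signal, and the "running" correlation is just a normalized zero-lag correlation computed in successive windows. There is no peripheral filtering here, and the tone level, window length, and names are hypothetical.

```python
import math
import random

def interaural_correlation(left, right):
    """Normalized zero-lag correlation between the two ear signals."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den

def running_correlation(left, right, window):
    """Correlation in successive non-overlapping windows."""
    return [interaural_correlation(left[s:s + window], right[s:s + window])
            for s in range(0, len(left) - window + 1, window)]

rng = random.Random(0)
SAMPLE_RATE = 8000   # Hz (assumed)
N = 1600
WINDOW = 400

noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Tone present only in the second half of the signal.
tone = [0.5 * math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) if i >= N // 2 else 0.0
        for i in range(N)]

left = [n + s for n, s in zip(noise, tone)]    # tone added in phase
right = [n - s for n, s in zip(noise, tone)]   # tone added out of phase

correlations = running_correlation(left, right, WINDOW)
print([round(c, 3) for c in correlations])
```

The correlation stays at 1 while only the noise is present and drops once the out-of-phase tone arrives, which is the kind of change the cross-correlation model can detect.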


Hypothesized role of spatial attention in complex settings

?


Measure performance with and without spatial separation of target / masker

[Figure: masker positions in the two conditions, each labeled Masker]


Quantify spatial unmasking = improvement in threshold

[Figure: masker (M) co-located with the target versus separated from it]

Spatial Unmasking = (threshold with M co-located) - (threshold with M separated)
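The threshold difference above can be worked through with a small sketch. The percent-correct values below are invented purely to exercise the arithmetic, and the 70% criterion and linear interpolation are assumptions; the talk does not say how thresholds were estimated.

```python
def threshold_db(tmrs_db, percent_correct, criterion=70.0):
    """TMR at which performance first crosses `criterion` (linear interpolation)."""
    points = list(zip(tmrs_db, percent_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= criterion <= y1 and y1 > y0:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("criterion not bracketed by the data")

def spatial_unmasking_db(colocated_threshold_db, separated_threshold_db):
    """Improvement in threshold when the masker is moved away from the target."""
    return colocated_threshold_db - separated_threshold_db

# Hypothetical psychometric data (NOT from the study).
tmrs = [-20, -10, 0, 10]
pc_colocated = [20, 40, 80, 100]
pc_separated = [50, 90, 100, 100]

thr_colocated = threshold_db(tmrs, pc_colocated)
thr_separated = threshold_db(tmrs, pc_separated)
unmasking = spatial_unmasking_db(thr_colocated, thr_separated)

print(f"co-located threshold: {thr_colocated} dB")
print(f"separated threshold:  {thr_separated} dB")
print(f"spatial unmasking:    {unmasking} dB")
```

A positive result means the listener tolerates a quieter target once the masker is separated.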


For birdsong, better-ear energy effects are large

[Figure: results for masker (M) configurations labeled Better ear and No better ear]


Is there an additional benefit of perceived separation? Diotic versus binaural

Diotic (masker M)
- Better-ear benefit
- No binaural processing
- Sources perceived at same location

Binaural (masker M)
- Maybe binaural processing
- Sources perceived at different locations


Why these maskers?

Maskers: Noise, Mod. Noise, Chorus

Main type of interference
• Reduce audibility
• Cause streaming confusions
• ?

Effect of spatial separation
• Improve audibility through acoustic better-ear effects
• Allow spatial attention to combat confusions
• ?

Diotic vs. binaural performance?
• Identical
• Binaural much better than diotic performance
• ?


For co-located target/masker, the chorus caused the most interference

[Figure: human target threshold (dB re: masker) for Noise, Mod Noise, and Chorus maskers; higher thresholds mean worse performance (need louder target)]


Spatial separation causes unmasking due to better-ear acoustics (diotic presentation)

[Figure: Identify Target task; improvement with spatial separation]

Size of acoustic effect decreases as masker becomes sparser (audibility less of a problem)

Best et al. 2005


For dissimilar maskers, there is no added benefit of perceived spatial separation

No advantage from spatial attention or binaural processing (high-frequency content)

[Figure: Identify Target task]

Best et al. 2005


For a chorus masker, perceived location differences improve identification

Perceived separation adds 10 dB of spatial unmasking, for confusable masker

No advantage from spatial attention or binaural processing (high-frequency content)

[Figure: Identify Target task]

Best et al. 2005


Ask listener to identify a song from a random location, occurring at a random time

LEDs on the speakers:
- no information
- which speaker
- which time
- or both

Five simultaneous, similar sources, every 15 deg


For identification of familiar birdsongs in a chorus, “when” and “where” both help

For best subjects, the “when” cue is less important; they report “pop out” of familiar songs


For identifying digits in time-reversed digits, “when” doesn’t help

For all subjects, the “when” cue is less important; forward digits “pop out” of reversed speech


Ongoing work

How prior knowledge affects spatial attention

The role of visual cuing of spatial attention

Divided auditory attention

Comparisons with visual attention

Modeling spatial release from different interference


Psychophysics shows different maskers cause different forms of perceptual interference

Noise and modulated noise
• reduce audibility of song elements
• are dissimilar from targets
• are easily segregated
• don’t cause confusion
• show spatial release due to acoustic better-ear improvements

Chorus (or reversed speech)
• is sparse enough that overall interference is not as great
• consists of “units” (syllables) like those in the targets
• is hard to segregate from target
• causes confusion between target and masker
• shows spatial release through spatial attention


Comparing to avian behavior

Dent Lab

SUNY Buffalo


Dent Lab: Teaching birds to recognize the songs

PECK left key to begin variable waiting period (2-7 s)

HEAR a call from one of six individuals

RECOGNIZE and CATEGORIZE call

PECK left or right key

Correct: food reward

Incorrect: lights extinguished


Zebra finches and budgerigars learn zebra finch songs

[Figure: percent correct versus number of 100-trial sessions, two panels. Panel with birds Maddox, Truman, Milo, and Zola: *average sessions to criterion = 34.25. Panel with birds Bucky, Bundy, Cosmo, and Dixie: *average sessions to criterion = 22.]


Relative effectiveness of the maskers differs across species

[Figure: target threshold (dB re: masker) for Noise, Mod Noise, and Chorus maskers in humans, zebra finches, and budgerigars; higher thresholds mean worse performance (need louder target)]


Next stage: measuring whether the birds use spatial attention like humans

Budgerigars exhibit spatial release for noise maskers

[Figure: Binaural Hearing. Masked threshold (in dB) versus frequency (in kHz: 1.00, 2.00, 2.86, 4.00) for unilateral and bilateral sound sources, with signal (S) and noise (N) configurations]

*Dent et al., Behav. Neurosci., 1997


Avian psychophysics shows different maskers cause different levels of interference

Degree of interference differs from species to species

Next stage will explore whether effect of spatial separation differs with masker type, as in humans


Comparing to avian physiology

Sen Lab

Biomedical Engineering


Recording from zebra finch forebrain Field L (homologue of primary auditory cortex)

Record neural spike trains in response to multiple copies of clean songs from five birds

Record neural spike trains in response to repetitions of each song embedded in each masker


Each neuron has a set of spectrotemporal features to which it responds

[Figure: time-frequency tuning of a broadband onset neuron and a narrowband neuron]
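One common way to describe such tuning is a linear spectrotemporal receptive field (STRF), where the predicted drive at each moment is a weighted sum of recent spectrogram values. The sketch below is a toy version with two hand-made STRFs loosely modeled on the two neuron types named above. The weights, the tiny spectrogram, and the rectification step are all assumptions, not the recorded neurons' tuning.

```python
def strf_response(spectrogram, strf):
    """Rectified linear STRF prediction.

    spectrogram[f][t] is the energy in frequency channel f at time bin t.
    strf[f][lag] is the weight on channel f at `lag` bins in the past.
    """
    n_freq, n_time, n_lag = len(spectrogram), len(spectrogram[0]), len(strf[0])
    out = []
    for t in range(n_time):
        drive = 0.0
        for f in range(n_freq):
            for lag in range(n_lag):
                if t - lag >= 0:
                    drive += strf[f][lag] * spectrogram[f][t - lag]
        out.append(max(drive, 0.0))   # firing rates cannot go negative
    return out

# Toy spectrogram: 3 channels x 6 time bins. The middle channel starts one
# bin earlier than the broadband part of the sound.
spectrogram = [
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
]

# Hypothetical narrowband neuron: listens to the middle channel only.
narrowband_strf = [[0, 0], [1, 0], [0, 0]]
# Hypothetical broadband onset neuron: excited by current energy in every
# channel and inhibited by energy one bin earlier.
onset_strf = [[1, -1], [1, -1], [1, -1]]

narrowband_rate = strf_response(spectrogram, narrowband_strf)
onset_rate = strf_response(spectrogram, onset_strf)
print("narrowband:", narrowband_rate)
print("broadband onset:", onset_rate)
```

The narrowband unit follows the energy in its channel for as long as it lasts, while the onset unit responds only when energy appears.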


Compare clean-song templates to target + masker


Compute single-neuron classification performance
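The template comparison and classification steps named on these slides could be sketched as a nearest-template classifier. The slides do not state which spike-train distance was used, so this version simply bins spikes and uses Euclidean distance. The bin width, song labels, and spike times are invented for illustration.

```python
import math

def bin_spikes(spike_times, duration, bin_width):
    """Convert spike times (s) into a vector of spike counts per time bin."""
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        idx = int(t / bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def distance(a, b):
    """Euclidean distance between two binned spike trains."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(response, templates):
    """Label of the clean-song template closest to the response."""
    return min(templates, key=lambda name: distance(response, templates[name]))

def percent_correct(trials, templates):
    """Percentage of (true_label, response) trials assigned to the right song."""
    hits = sum(classify(resp, templates) == label for label, resp in trials)
    return 100.0 * hits / len(trials)

DURATION, BIN = 1.0, 0.1   # seconds (assumed)

# Hypothetical clean-song templates for one neuron.
templates = {
    "song_a": bin_spikes([0.05, 0.15, 0.55], DURATION, BIN),
    "song_b": bin_spikes([0.35, 0.75, 0.85], DURATION, BIN),
}

# Hypothetical responses to song + masker mixtures.
trials = [
    ("song_a", bin_spikes([0.05, 0.16, 0.55, 0.95], DURATION, BIN)),  # one extra spike
    ("song_b", bin_spikes([0.35], DURATION, BIN)),                    # spikes missing
    ("song_a", bin_spikes([0.35, 0.75], DURATION, BIN)),              # resembles song_b
]

score = percent_correct(trials, templates)
print(f"single-neuron percent correct: {score:.1f}")
```

Both added spikes and missing spikes move a response away from its template, so a single percent-correct score cannot by itself say which of the two happened.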


The chorus causes the least interference in performance

[Figure: percent correct versus target-to-masker energy ratio (-10, -5, 0, 5, 10, and clean) for chorus, mod. noise, and noise maskers]


Narrowband neurons perform better, with larger differences between masker types

[Figure: two panels of percent correct versus target-to-masker energy ratio (-10, -5, 0, 5, 10, and clean) for chorus, noise, and mod. noise maskers]


Best single neuron performance is very good

[Figure: best-neuron and average-neuron performance for chorus, noise, and mod. noise maskers]


But overall percent correct classification does not describe the kind of interference


Information in neural spike train is in timing / pattern

[Figure: representation of the neuron’s tuning (e.g., features in time-frequency) and the resulting spike output]


Hypothesize that noise masker suppresses spikes to target features

[Figure: time-frequency plots of target content, masker content, and target / masker mixture]


Hypothesize that modulated noise masker adds extra spikes at noise onsets

[Figure: time-frequency plots of target content and masker content]



Hypothesize that chorus masker adds spikes, but fewer than the modulated noise

[Figure: time-frequency plots of target content and masker content]



Wideband neuron example: response to clean song

[Figure: target pressure waveform, spike rate color plot (clean song), and average spike rate versus time]


Wideband response to song in noise: spike suppression

[Figure: target pressure waveform, average spike rate, and spike rate as a function of SNR, versus time]


Wideband response to song in mod noise: spike addition (some suppression)

[Figure: target pressure waveform and average spike rate versus time]


Wideband response to song in chorus: spike suppression!!

[Figure: target pressure waveform and average spike rate versus time]


Narrowband response to song in noise: spike suppression

[Figure: target pressure waveform and average spike rate versus time]


Narrowband response to song in mod noise: spike addition

[Figure: target pressure waveform and average spike rate versus time]


Narrowband response to song in chorus: spike suppression!!

[Figure: target pressure waveform and average spike rate versus time]


In between target syllables, all maskers add some spikes

Mod noise adds the most spikes

Noise adds almost as many

The chorus adds the least

[Figure reference line: average for clean targets]


For Mod Noise, there is no effect of target level

[Figure reference line: clean average]


Within syllables, the rate increases with target level for Chorus and Noise

[Figure reference line: clean average]


Neurophysiology shows neural correlates of different forms of perceptual interference

Single Field L neurons contain enough information to classify complex bird songs

Noise suppresses responses to song features

Modulated noise causes spurious extra spikes

Chorus response shows surprising amount of suppression rather than expected spike addition

Results consistent with extraordinary nonlinearity in response to complex features in chorus (further supported by estimates of linear spectrotemporal receptive field analysis)


Future Work

Compare release from interference with spatial separation in avian species and humans

Measure effects of spatial position of sources on avian forebrain neural responses

Develop awake-behaving neurophysiological preparation to explore attention and single-trial events

Relate human psychophysics to fMRI measures (with David Somers)

Develop model based on neurophysiological results that describes factors affecting listening in complex settings


Space helps even when all elements are audible… if sources are similar

[Figure: air pressure versus time (sec)]
