Cortical Encoding of Auditory Objects at the Cocktail Party
Jonathan Z. Simon
University of Maryland
Computational Audition Workshop, June 2013
Introduction
• Auditory Objects
• Magnetoencephalography (MEG)
• Neural Representations of Auditory Objects in Cortex: Decoding
• Neural Representations of Auditory Objects in Cortex: Encoding
Auditory Objects
• What is an auditory object?
• perceptual construct (not neural, not acoustic)
• commonalities with visual objects
• several potential formal definitions
Auditory Object Definition
• Griffiths & Warren definition:
• corresponds with something in the sensory world
• object information separate from information of rest of sensory world
• abstracted: object information generalized over particular sensory experiences
Alex Katz, The Cocktail Party
Auditory Objects at the Cocktail Party
Auditory Objects at the Cocktail Party
Ding & Simon, PNAS (2012)
Experiments

Magnetoencephalography (MEG)
[Figure: recording geometry — current flow in cortical tissue produces a magnetic dipolar field (B) projected onto the recording surface, passing through CSF, skull, and scalp; MEG vs. EEG (V)]
Photo by Fritz Goro
• Direct electrophysiological measurement (not hemodynamic; real-time)
• Measures spatially synchronized cortical activity
• Fine temporal resolution (~1 ms)
• Moderate spatial resolution (~1 cm)
• No unique solution for a distributed source
Phase-Locking in MEG to Slow Temporal Modulations
Ding & Simon, J Neurophysiol (2009); Wang et al., J Neurophysiol (2012)
AM at 3 Hz → 3 Hz phase-locked response
[Figure: response spectrum (subject R0747), 0–10 Hz, with peaks at 3 Hz and 6 Hz]
MEG activity is precisely phase-locked to temporal modulations of sound
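The phase-locked response above can be illustrated with a toy computation (all signal parameters here are synthetic stand-ins, not the study's recordings): a response component locked to a 3 Hz amplitude modulation shows up as a sharp spectral peak at exactly the modulation rate, even in the presence of broadband noise.

```python
import numpy as np

fs = 200.0                    # sampling rate (Hz), illustrative
n = 2000                      # 10 s of data
t = np.arange(n) / fs

# Synthetic "MEG" trace: a component phase-locked to 3 Hz AM, plus noise
rng = np.random.default_rng(0)
response = np.sin(2 * np.pi * 3 * t) + 0.5 * rng.standard_normal(n)

# Response spectrum: single-sided FFT magnitude
spectrum = np.abs(np.fft.rfft(response)) / n
freqs = np.fft.rfftfreq(n, d=1 / fs)

# The phase-locked response appears as a sharp peak at the 3 Hz modulation rate
peak_freq = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
print(peak_freq)  # 3.0
```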
MEG Responses to Speech Modulations
Ding & Simon, J Neurophysiol (2012)
Auditory Model → Spectro-Temporal Response Function (STRF)
(up to ~10 Hz)
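The forward model named above treats the neural response as a linear filtering of the stimulus spectrogram by the STRF. A minimal sketch, with made-up dimensions and random data standing in for a real spectrogram and fitted STRF:

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq, n_lags, n_time = 8, 20, 500   # hypothetical: freq bands, time lags, samples

spectrogram = rng.random((n_freq, n_time))    # stimulus spectrogram S(f, t)
strf = rng.standard_normal((n_freq, n_lags))  # STRF h(f, tau)

# Forward prediction: r(t) = sum over f, tau of h(f, tau) * S(f, t - tau),
# i.e. each frequency band is convolved with its row of the STRF and summed
predicted = np.zeros(n_time)
for f in range(n_freq):
    for tau in range(n_lags):
        predicted[tau:] += strf[f, tau] * spectrogram[f, : n_time - tau]
```

In practice the STRF itself is estimated from data; here it is random because only the forward-prediction step is being shown.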
MEG Responses Predicted by STRF Model
Ding & Simon, J Neurophysiol (2012); Zion-Golumbic et al., Neuron (in press)
Neural Reconstruction of Speech Envelope
[Figure: stimulus speech envelope vs. reconstructed stimulus speech envelope over a 2 s window; MEG responses → decoder → speech envelope]
Reconstruction accuracy comparable to single unit & ECoG recordings
(up to ~10 Hz)
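The decoding direction can be sketched as a linear backward model: a ridge-regularized regression from time-lagged MEG channels onto the speech envelope, scored by correlation. Everything below is a toy with synthetic data (channel count, lag count, and regularization are invented); the published decoder's details differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n_time, n_ch, n_lags = 2000, 10, 15   # hypothetical: samples, MEG channels, lags

# Synthetic data: each channel carries a scaled copy of the envelope plus noise
envelope = rng.random(n_time)
mixing = rng.standard_normal(n_ch)
meg = np.outer(mixing, envelope) + 0.5 * rng.standard_normal((n_ch, n_time))

# Lagged design matrix: one column per (channel, lag) regressor
X = np.zeros((n_time, n_ch * n_lags))
for ch in range(n_ch):
    for lag in range(n_lags):
        X[lag:, ch * n_lags + lag] = meg[ch, : n_time - lag]

# Ridge-regularized least-squares decoder (lambda chosen arbitrarily here)
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)
reconstructed = X @ w

# Reconstruction accuracy: correlation with the true envelope
r = np.corrcoef(reconstructed, envelope)[0, 1]
```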
Speech Stream as an Auditory Object
• corresponds with something in the sensory world
• information separate from information of rest of sensory world (e.g. other speech streams or noise)
• abstracted: object information generalized over particular sensory experiences (e.g. different sound mixtures)

Neural Representation of an Auditory Object
• neural representation is of something in sensory world
• when other sounds mixed in, neural representation is of auditory object, not entire acoustic scene
• neural representation invariant under broad changes in specific acoustics
Selective Neural Encoding
Unselective vs. Selective Neural Encoding
Stream-Specific Representation
[Figure: attended speech envelopes reconstructed from MEG (grand average over subjects; representative subject) when attending to speaker 1 vs. attending to speaker 2 — identical stimuli!]
Single Trial Speech Reconstruction
Overall Speech Reconstruction
[Figure: bar plot of reconstruction correlation (0–0.2), attended speech reconstruction vs. background reconstruction]
Distinct neural representations for different speech streams
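Single-trial attentional decoding of the kind shown here reduces to a simple comparison: reconstruct the envelope from the neural data, then ask which speaker's envelope it correlates with better. A toy sketch with synthetic envelopes (the noise level and trial length are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n_time = 1000

env1 = rng.random(n_time)   # speaker 1 envelope
env2 = rng.random(n_time)   # speaker 2 envelope

# Hypothetical single-trial reconstruction: tracks the attended speaker
# (speaker 1 here) more faithfully than the background speaker
reconstruction = env1 + 0.7 * rng.standard_normal(n_time)

r1 = np.corrcoef(reconstruction, env1)[0, 1]
r2 = np.corrcoef(reconstruction, env2)[0, 1]

# Classify the attended speaker as the one whose envelope correlates best
attended = 1 if r1 > r2 else 2
```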
Invariance Under Acoustic Changes
Stream-Based Gain Control?
[Figure: reconstruction correlation (0.1–0.2) vs. speaker relative intensity (-8, -5, 0, 5, 8 dB), for attended and background speech]

Gain-Control Models
[Figure: predicted attended-speech correlation vs. speaker relative intensity under an Object-Based model and a Stimulus-Based model]

Neural Results
[Figure: measured correlation vs. speaker relative intensity (-8 to 8 dB), attended and background]
• Stream-based, not stimulus-based
• Neural representation is invariant to acoustic changes
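The two gain-control hypotheses can be caricatured numerically. This is a toy illustration, not the authors' fitted models: under a stimulus-based account, the attended-speech representation should strengthen with the speaker's relative acoustic intensity; under an object-based account, each segregated stream is normalized, so the attended representation is (idealized here as) flat across relative intensity.

```python
import numpy as np

# Relative intensity of the attended speaker (dB), as on the slide's x-axis
rel_db = np.array([-8, -5, 0, 5, 8])
gain_lin = 10.0 ** (rel_db / 20)   # linear amplitude of the attended speaker

# Stimulus-based gain: response tracks the acoustic mixture, so the attended
# share of the representation grows with its share of the mixture amplitude
stimulus_based = gain_lin / (gain_lin + 1)

# Object-based gain: streams are normalized after segregation, so the
# attended representation is constant across relative intensity (idealized)
object_based = np.full_like(stimulus_based, stimulus_based[2])
```

The neural data on the slide follow the flat, object-based pattern.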
Neural Representation of an Auditory Object
✓ neural representation is of something in sensory world
✓ when other sounds mixed in, neural representation is of auditory object, not entire acoustic scene
✓ neural representation invariant under broad changes in specific acoustics
Forward STRF Model
Spectro-Temporal Response Function (STRF)
STRF Results
[Figure: attended and background STRFs and TRFs; frequency (kHz) 0.2–3, time (ms) 0–200; attended vs. unattended panels]
• STRF separable (time, frequency)
• 300 Hz – 2 kHz dominant carriers
• M50_STRF: positive peak
• M100_STRF: negative peak
• M100_STRF strongly modulated by attention, but not M50_STRF
Neural Sources
[Figure: source locations, left and right hemispheres (anterior–posterior, medial axes), for M50_STRF, M100_STRF, and M100; 5 mm scale bar]
• M100_STRF source near (same as?) M100 source: PT
• M50_STRF source is anterior and medial to M100 (same as M50?): HG
Cortical Object-Processing Hierarchy
[Figure: TRFs over 0–400 ms. Left panel, Attentional Modulation: attended vs. background. Right panel, Influence of Relative Intensity: clean, -8, -5, 0, 5, 8 dB]
• M100_STRF strongly modulated by attention, but not M50_STRF
• M100_STRF invariant against acoustic changes
• Objects well represented neurally at 100 ms, but not at 50 ms
Not Just Speech: Competing Tone Streams
Xiang et al., J Neuroscience (2010); Elhilali et al., PLoS Biology (2009)
Tone Stream in Masker Cloud
[Figure: spectrogram schematic, frequency vs. time — a regular tone stream embedded in a cloud of random masker tones]
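A stimulus of this type — a rhythmic target tone embedded in a random cloud of masker tone pips — can be synthesized along these lines. All parameters (pip duration, rates, frequency range, masker density) are illustrative, not the values used in the cited studies.

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 8000                 # sampling rate (Hz), illustrative
dur = 2.0                 # stimulus duration (s)
pip_len = int(0.05 * fs)  # 50 ms tone pips

def pip(freq):
    """A single 50 ms tone pip, Hann-windowed to avoid onset/offset clicks."""
    n = np.arange(pip_len)
    window = 0.5 * (1 - np.cos(2 * np.pi * n / pip_len))
    return np.sin(2 * np.pi * freq * n / fs) * window

stimulus = np.zeros(int(dur * fs))

# Target stream: a 1 kHz tone pip repeating at a regular 4 Hz rhythm
for onset in np.arange(0, dur, 0.25):
    i = int(onset * fs)
    stimulus[i : i + pip_len] += pip(1000.0)

# Masker cloud: tone pips at random times and random frequencies
for _ in range(40):
    i = rng.integers(0, stimulus.size - pip_len)
    stimulus[i : i + pip_len] += pip(rng.uniform(300.0, 3000.0))
```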
Competing Tone Streams
[Figure: spectrogram schematic, frequency vs. time — two competing tone streams]
Summary
• Cortical representations of speech found here:
✓ consistent with being neural representations of auditory (perceptual) objects
✓ meet 3 formal criteria for auditory objects
• Object representation fully formed by 100 ms latency (PT), but not by 50 ms (HG)
• Not special to speech
Acknowledgements
Funding: NIH R01 DC 008342
Collaborators: Catherine Carr, Alain de Cheveigné, Didier Depireux, Mounya Elhilali, Jonathan Fritz, Cindy Moss, David Poeppel, Shihab Shamma
Past Postdocs: Dan Hertz, Yadong Wang
Grad Students: Francisco Cervantes, Marisel Villafane Delgado, Kim Drnec, Krishna Puvvada
Past Grad Students: Nayef Ahmar, Claudia Bonin, Maria Chait, Nai Ding, Victor Grau-Serrat, Ling Ma, Raul Rodriguez, Juanjuan Xiang, Kai Sum Li, Jiachen Zhuo
Undergraduate Students: Abdulaziz Al-Turki, Nicholas Asendorf, Sonja Bohr, Elizabeth Camenga, Corinne Cameron, Julien Dagenais, Katya Dombrowski, Kevin Hogan, Kevin Kahn, Andrea Shome, Madeleine Varmer, Ben Walsh
Collaborators' Students: Murat Aytekin, Julian Jenkins, David Klein, Huan Luo
Thank You
Reconstruction of Same-Sex Speech
[Figure: scatter plot of reconstruction correlations, Speaker Two vs. Speaker One, both axes 0–0.4]
Single Trial Decoding Results
[Figure: attended-speech reconstruction correlation (0.1–0.2) by attended speaker (One, Two), for attended speech vs. background speech]
Speech in Noise: Stimuli
Speech in Noise: Results