68
Cortical Encoding of Auditory Objects at the Cocktail Party Jonathan Z. Simon University of Maryland Computational Audition Workshop, June 2013

Cortical Encoding of Auditory Objects at the Cocktail Party

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Cortical Encoding of Auditory Objects at the Cocktail Party

Cortical Encoding of Auditory Objects at the

Cocktail PartyJonathan Z. Simon

University of Maryland

Computational Audition Workshop, June 2013

Page 2: Cortical Encoding of Auditory Objects at the Cocktail Party

Introduction

• Auditory Objects

• Magnetoencephalography (MEG)

• Neural Representations of Auditory Objects in Cortex: Decoding

• Neural Representations of Auditory Objects in Cortex: Encoding

Page 3: Cortical Encoding of Auditory Objects at the Cocktail Party

Auditory Objects

• What is an auditory object?

• perceptual construct (not neural, not acoustic)

• commonalities with visual objects

• several potential formal definitions

Page 4: Cortical Encoding of Auditory Objects at the Cocktail Party

Auditory Object Definition

• Griffiths & Warren definition:

• corresponds with something in the sensory world

• object information separate from information of rest of sensory world

• abstracted: object information generalized over particular sensory experiences

Page 5: Cortical Encoding of Auditory Objects at the Cocktail Party

Alex Katz, The Cocktail Party

Auditory Objects at the Cocktail Party

Page 6: Cortical Encoding of Auditory Objects at the Cocktail Party

Alex Katz, The Cocktail Party

Auditory Objects at the Cocktail Party

Page 7: Cortical Encoding of Auditory Objects at the Cocktail Party

Alex Katz, The Cocktail Party

Auditory Objects at the Cocktail Party

Page 8: Cortical Encoding of Auditory Objects at the Cocktail Party

Alex Katz, The Cocktail Party

Auditory Objects at the Cocktail Party

Page 9: Cortical Encoding of Auditory Objects at the Cocktail Party

Alex Katz, The Cocktail Party

Auditory Objects at the Cocktail Party

Page 10: Cortical Encoding of Auditory Objects at the Cocktail Party

Alex Katz, The Cocktail Party

Auditory Objects at the Cocktail Party

Page 11: Cortical Encoding of Auditory Objects at the Cocktail Party

Alex Katz, The Cocktail Party

Auditory Objects at the Cocktail Party

Page 12: Cortical Encoding of Auditory Objects at the Cocktail Party

Auditory Objects at the Cocktail Party

Ding & Simon, PNAS (2012)

Page 13: Cortical Encoding of Auditory Objects at the Cocktail Party

Experiments

Page 14: Cortical Encoding of Auditory Objects at the Cocktail Party

tissue

CSF

skull

scalpB

MEG

VEEG

recordingsurface

currentflow

orientationof magneticfield

MagneticDipolarField

Projection

•Direct electrophysiological measurement•not hemodynamic•real-time

•No unique solution for distributed source

Magnetoencephalography (MEG)

Photo by Fritz Goro

•Measures spatially synchronized cortical activity

•Fine temporal resolution (~ 1 ms)•Moderate spatial resolution (~ 1 cm)

Page 15: Cortical Encoding of Auditory Objects at the Cocktail Party

tissue

CSF

skull

scalpB

MEG

VEEG

recordingsurface

currentflow

orientationof magneticfield

MagneticDipolarField

Projection

•Direct electrophysiological measurement•not hemodynamic•real-time

•No unique solution for distributed source

Magnetoencephalography (MEG)

Photo by Fritz Goro

•Measures spatially synchronized cortical activity

•Fine temporal resolution (~ 1 ms)•Moderate spatial resolution (~ 1 cm)

Page 16: Cortical Encoding of Auditory Objects at the Cocktail Party

Phase-Locking in MEG to Slow Temporal Modulations

Ding & Simon, J Neurophysiol (2009)Wang et al., J Neurophysiol (2012)

AM  at  3  Hz   3  Hz  phase-­locked  response  

response  spectrum  (subject  R0747)  

MEG activity is precisely phase-locked to temporal modulations of sound

0 10

Frequency (Hz)

3 Hz

6 Hz

Page 17: Cortical Encoding of Auditory Objects at the Cocktail Party

Phase-Locking in MEG to Slow Temporal Modulations

Ding & Simon, J Neurophysiol (2009)Wang et al., J Neurophysiol (2012)

AM  at  3  Hz   3  Hz  phase-­locked  response  

response  spectrum  (subject  R0747)  

MEG activity is precisely phase-locked to temporal modulations of sound

0 10

Frequency (Hz)

3 Hz

6 Hz

Page 18: Cortical Encoding of Auditory Objects at the Cocktail Party

MEG Responses

AuditoryModel

to Speech Modulations

Page 19: Cortical Encoding of Auditory Objects at the Cocktail Party

Ding & Simon, J Neurophysiol (2012)

Spectro-Temporal Response Function (STRF)

(up to ~10 Hz)

MEG ResponsesPredicted by STRF Model

Page 20: Cortical Encoding of Auditory Objects at the Cocktail Party

Ding & Simon, J Neurophysiol (2012)Zion-Golumbic et al., Neuron (in press)

Neural Reconstruction of Speech Envelope

2 s

stimulus speech envelopereconstructed stimulus speech envelope

Reconstruction accuracy comparable to single unit & ECoG recordings

(up to ~ 10 Hz)

MEG Responses

...

DecoderSpeech Envelope

Page 21: Cortical Encoding of Auditory Objects at the Cocktail Party

Ding & Simon, J Neurophysiol (2012)Zion-Golumbic et al., Neuron (in press)

Neural Reconstruction of Speech Envelope

2 s

stimulus speech envelopereconstructed stimulus speech envelope

Reconstruction accuracy comparable to single unit & ECoG recordings

(up to ~ 10 Hz)

MEG Responses

...

DecoderSpeech Envelope

Page 22: Cortical Encoding of Auditory Objects at the Cocktail Party

Ding & Simon, J Neurophysiol (2012)Zion-Golumbic et al., Neuron (in press)

Neural Reconstruction of Speech Envelope

2 s

stimulus speech envelopereconstructed stimulus speech envelope

Reconstruction accuracy comparable to single unit & ECoG recordings

(up to ~ 10 Hz)

MEG Responses

...

DecoderSpeech Envelope

Page 23: Cortical Encoding of Auditory Objects at the Cocktail Party

Speech Stream as an Auditory Object

• corresponds with something in the sensory world

• information separate from information of rest of sensory worlde.g. other speech streams or noise

• abstracted: object information generalized over particular sensory experiencese.g. different sound mixtures

Page 24: Cortical Encoding of Auditory Objects at the Cocktail Party

• neural representation is of something in sensory world

• when other sounds mixed in, neural representation is of auditory object, not entire acoustic scene

• neural representation invariant under broad changes in specific acoustics

Neural Representation of an Auditory Object

Page 25: Cortical Encoding of Auditory Objects at the Cocktail Party

Selective Neural Encoding

Page 26: Cortical Encoding of Auditory Objects at the Cocktail Party

Selective Neural Encoding

Page 27: Cortical Encoding of Auditory Objects at the Cocktail Party

Selective Neural Encoding

Page 28: Cortical Encoding of Auditory Objects at the Cocktail Party

Unselective vs. Selective Neural Encoding

Page 29: Cortical Encoding of Auditory Objects at the Cocktail Party

Unselective vs. Selective Neural Encoding

Page 30: Cortical Encoding of Auditory Objects at the Cocktail Party

Selective Neural Encoding

Page 31: Cortical Encoding of Auditory Objects at the Cocktail Party

Stream-Specific Representation

grand average over subjects

representative subject

Identical Stimuli!

reconstructed from MEG

attended speech envelopes

reconstructed from MEG

attending tospeaker 1

attending tospeaker 2

Page 32: Cortical Encoding of Auditory Objects at the Cocktail Party

Stream-Specific Representation

grand average over subjects

representative subject

Identical Stimuli!

reconstructed from MEG

attended speech envelopes

reconstructed from MEG

attending tospeaker 1

attending tospeaker 2

Page 33: Cortical Encoding of Auditory Objects at the Cocktail Party

Single Trial Speech Reconstruction

Page 34: Cortical Encoding of Auditory Objects at the Cocktail Party

Single Trial Speech Reconstruction

Page 35: Cortical Encoding of Auditory Objects at the Cocktail Party

Overall Speech Reconstruction

0.2

0

0.1

correlation

attended  speechreconstruction

backgroundreconstruction

attended  speech background  

Distinct neural representations for different speech streams

Page 36: Cortical Encoding of Auditory Objects at the Cocktail Party

Invariance Under Acoustic Changes

Page 37: Cortical Encoding of Auditory Objects at the Cocktail Party

Invariance Under Acoustic Changes

Page 38: Cortical Encoding of Auditory Objects at the Cocktail Party

Invariance Under Acoustic Changes

Page 39: Cortical Encoding of Auditory Objects at the Cocktail Party

Invariance Under Acoustic Changes

Page 40: Cortical Encoding of Auditory Objects at the Cocktail Party

Invariance Under Acoustic Changes

Page 41: Cortical Encoding of Auditory Objects at the Cocktail Party

correlation

.1

.2

-­8 -­5 0 5 8

attended

background

Stream-Based Gain Control?

correlation

.1

.2

-­8 -­5 0 5 8Speaker  Relative  Intensity    (dB)

attended

background

Gain-Control Models

Obj

ect-B

ased

Stim

ulus

- Bas

ed

Speaker  Relative  Intensity    (dB)

Page 42: Cortical Encoding of Auditory Objects at the Cocktail Party

correlation

.1

.2

-­8 -­5 0 5 8

attended

background

Stream-Based Gain Control?

correlation

.1

.2

-­8 -­5 0 5 8Speaker  Relative  Intensity    (dB)

attended

background

Gain-Control Models

Obj

ect-B

ased

Stim

ulus

- Bas

ed

attended

backgroundcorrelation

.1

.2

-­8 -­5 0 5 8Speaker  Relative  Intensity    (dB)

Neural Results

Speaker  Relative  Intensity    (dB)

Page 43: Cortical Encoding of Auditory Objects at the Cocktail Party

correlation

.1

.2

-­8 -­5 0 5 8

attended

background

Stream-Based Gain Control?

correlation

.1

.2

-­8 -­5 0 5 8Speaker  Relative  Intensity    (dB)

attended

background

Gain-Control Models

Obj

ect-B

ased

Stim

ulus

- Bas

ed

attended

backgroundcorrelation

.1

.2

-­8 -­5 0 5 8Speaker  Relative  Intensity    (dB)

Neural Results

Speaker  Relative  Intensity    (dB)

Page 44: Cortical Encoding of Auditory Objects at the Cocktail Party

correlation

.1

.2

-­8 -­5 0 5 8

attended

background

Stream-Based Gain Control?

correlation

.1

.2

-­8 -­5 0 5 8Speaker  Relative  Intensity    (dB)

attended

background

Gain-Control Models

Obj

ect-B

ased

Stim

ulus

- Bas

ed

attended

backgroundcorrelation

.1

.2

-­8 -­5 0 5 8Speaker  Relative  Intensity    (dB)

Neural Results

•Stream-based not stimulus-based•Neural representation is invariant to acoustic changes.

Speaker  Relative  Intensity    (dB)

Page 45: Cortical Encoding of Auditory Objects at the Cocktail Party

✓ neural representation is of something in sensory world

✓ when other sounds mixed in, neural representation is of auditory object, not entire acoustic scene

✓ neural representation invariant under broad changes in specific acoustics

Neural Representation of an Auditory Object

Page 46: Cortical Encoding of Auditory Objects at the Cocktail Party

Forward STRF Model

Spectro-Temporal Response Function (STRF)

Page 47: Cortical Encoding of Auditory Objects at the Cocktail Party

Forward STRF Model

Spectro-Temporal Response Function (STRF)

Page 48: Cortical Encoding of Auditory Objects at the Cocktail Party

STRF Results

•STRF separable (time, frequency)•300 Hz - 2 kHz dominant carriers•M50STRF positive peak•M100STRF negative peak

TRF

•M100STRF strongly modulated by attention, but not M50STRF

attended

unattended

.2

.5

1

3

0 100 200

Background

frequency  (kHz)

.2

.5

1

3

0 100 200

Attended

time  (ms) time  (ms)

Page 49: Cortical Encoding of Auditory Objects at the Cocktail Party

STRF Results

•STRF separable (time, frequency)•300 Hz - 2 kHz dominant carriers•M50STRF positive peak•M100STRF negative peak

TRF

•M100STRF strongly modulated by attention, but not M50STRF

attended

unattended

.2

.5

1

3

0 100 200

Background

frequency  (kHz)

.2

.5

1

3

0 100 200

Attended

time  (ms) time  (ms)

Page 50: Cortical Encoding of Auditory Objects at the Cocktail Party

STRF Results

•STRF separable (time, frequency)•300 Hz - 2 kHz dominant carriers•M50STRF positive peak•M100STRF negative peak

TRF

•M100STRF strongly modulated by attention, but not M50STRF

attended

unattended

.2

.5

1

3

0 100 200

Background

frequency  (kHz)

.2

.5

1

3

0 100 200

Attended

time  (ms) time  (ms)

Page 51: Cortical Encoding of Auditory Objects at the Cocktail Party

Neural Sources

RightLeft

anterior

posterior

medial

M50STRFM100STRFM100

•M100STRF source near (same as?) M100 source: PT

•M50STRF source is anterior and medial to M100 (same as M50?): HG

5 mm

Page 52: Cortical Encoding of Auditory Objects at the Cocktail Party

Cortical Object-Processing Hierarchy

0 100 200 400time  (ms)

0

attendedbackground

Attentional  Modulation

0 100 200 400

0

time  (ms)

clean

-­5  dB-­8  dB

Influence  of  Relative  Intensity

0  dB5  dB8  dB

•M100STRF strongly modulated by attention, but not M50STRF.•M100STRF invariant against acoustic changes.•Objects well-neurally represented at 100 ms, but not 50 ms.

Page 53: Cortical Encoding of Auditory Objects at the Cocktail Party

Not Just SpeechCompeting Tone Streams

Xiang et al., J Neuroscience (2010) Elhilali et al., PLoS Biology (2009)

Tone Stream in Masker Cloud

Time

Frequency

Time

Frequency

Page 54: Cortical Encoding of Auditory Objects at the Cocktail Party

Tone Stream in Masker Cloud

Time

Frequency

Page 55: Cortical Encoding of Auditory Objects at the Cocktail Party

Tone Stream in Masker Cloud

Time

Frequency

Page 56: Cortical Encoding of Auditory Objects at the Cocktail Party

Tone Stream in Masker Cloud

Time

Frequency

Page 57: Cortical Encoding of Auditory Objects at the Cocktail Party

Tone Stream in Masker Cloud

Time

Frequency

Page 58: Cortical Encoding of Auditory Objects at the Cocktail Party

Competing Tone Streams

Time

Frequency

Page 59: Cortical Encoding of Auditory Objects at the Cocktail Party

Competing Tone Streams

Time

Frequency

Page 60: Cortical Encoding of Auditory Objects at the Cocktail Party

Competing Tone Streams

Time

Frequency

Page 61: Cortical Encoding of Auditory Objects at the Cocktail Party

Summary

• Cortical representations of speech found here:

✓ consistent with being neural representations of auditory (perceptual) objects

✓ meet 3 formal criteria for auditory objects

• Object representation fully formed by 100 ms latency (PT), but not by 50 ms (HG)

• Not special to speech

Page 62: Cortical Encoding of Auditory Objects at the Cocktail Party

Acknowledgements

FundingNIH R01 DC 008342

CollaboratorsCatherine CarrAlain de CheveignéDidier DepireuxMounya ElhilaliJonathan FritzCindy MossDavid PoeppelShihab Shamma

Past PostdocsDan HertzYadong Wang

Grad StudentsFrancisco CervantesMarisel Villafane Delgado Kim DrnecKrishna Puvvada

Past Grad StudentsNayef AhmarClaudia BoninMaria ChaitNai DingVictor Grau-SerratLing MaRaul RodriguezJuanjuan XiangKai Sum LiJiachen Zhuo

Undergraduate StudentsAbdulaziz Al-Turki Nicholas AsendorfSonja BohrElizabeth CamengaCorinne CameronJulien DagenaisKatya DombrowskiKevin HoganKevin KahnAndrea ShomeMadeleine VarmerBen Walsh

Collaborators’ StudentsMurat AytekinJulian JenkinsDavid KleinHuan Luo

Page 63: Cortical Encoding of Auditory Objects at the Cocktail Party

Thank You

Page 64: Cortical Encoding of Auditory Objects at the Cocktail Party

Reconstruction of Same-Sex Speech

0 0.1 0.2 0.3 0.40

0.1

0.2

0.3

0.4

Speaker

Tw

o

Speaker One

Single  Trial  Decoding  Results

Two

One

Attended  Speaker

Attended  SpeechReconstruction

Correlation

0.1

0.2

attended  speech

background  speech

Page 65: Cortical Encoding of Auditory Objects at the Cocktail Party

Speech in Noise: Stimuli

Page 66: Cortical Encoding of Auditory Objects at the Cocktail Party

Speech in Noise: Results

Page 67: Cortical Encoding of Auditory Objects at the Cocktail Party

Speech in Noise: Results

Page 68: Cortical Encoding of Auditory Objects at the Cocktail Party

Speech in Noise: Results