Cortical Encoding of Auditory Objects at the Cocktail Party
Jonathan Z. Simon
University of Maryland
Computational Audition Workshop, June 2013
Introduction
• Auditory Objects
• Magnetoencephalography (MEG)
• Neural Representations of Auditory Objects in Cortex: Decoding
• Neural Representations of Auditory Objects in Cortex: Encoding
Auditory Objects
• What is an auditory object?
• perceptual construct (not neural, not acoustic)
• commonalities with visual objects
• several potential formal definitions
Auditory Object Definition
• Griffiths & Warren definition:
• corresponds with something in the sensory world
• object information separate from information of rest of sensory world
• abstracted: object information generalized over particular sensory experiences
Alex Katz, The Cocktail Party
Auditory Objects at the Cocktail Party
Auditory Objects at the Cocktail Party
Ding & Simon, PNAS (2012)
Experiments

Magnetoencephalography (MEG)
[Figure: recording geometry — current flow in cortical tissue produces a magnetic dipolar field (B) projected onto the recording surface, passing through CSF, skull, and scalp; MEG vs. EEG (V)]
Photo by Fritz Goro
• Direct electrophysiological measurement (not hemodynamic; real-time)
• Measures spatially synchronized cortical activity
• Fine temporal resolution (~1 ms)
• Moderate spatial resolution (~1 cm)
• No unique solution for a distributed source
Phase-Locking in MEG to Slow Temporal Modulations
Ding & Simon, J Neurophysiol (2009); Wang et al., J Neurophysiol (2012)
AM at 3 Hz → 3 Hz phase-locked response
[Figure: response spectrum (subject R0747), 0–10 Hz, with peaks at 3 Hz and 6 Hz]
MEG activity is precisely phase-locked to temporal modulations of sound
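The phase-locked response above can be illustrated with a toy computation (all signal parameters here are synthetic stand-ins, not the study's recordings): a response component locked to a 3 Hz amplitude modulation shows up as a sharp spectral peak at exactly the modulation rate, even in the presence of broadband noise.

```python
import numpy as np

fs = 200.0                    # sampling rate (Hz), illustrative
n = 2000                      # 10 s of data
t = np.arange(n) / fs

# Synthetic "MEG" trace: a component phase-locked to 3 Hz AM, plus noise
rng = np.random.default_rng(0)
response = np.sin(2 * np.pi * 3 * t) + 0.5 * rng.standard_normal(n)

# Response spectrum: single-sided FFT magnitude
spectrum = np.abs(np.fft.rfft(response)) / n
freqs = np.fft.rfftfreq(n, d=1 / fs)

# The phase-locked response appears as a sharp peak at the 3 Hz modulation rate
peak_freq = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
print(peak_freq)  # 3.0
```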
MEG Responses to Speech Modulations
Ding & Simon, J Neurophysiol (2012)
Auditory Model → Spectro-Temporal Response Function (STRF)
(up to ~10 Hz)
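The forward model named above treats the neural response as a linear filtering of the stimulus spectrogram by the STRF. A minimal sketch, with made-up dimensions and random data standing in for a real spectrogram and fitted STRF:

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq, n_lags, n_time = 8, 20, 500   # hypothetical: freq bands, time lags, samples

spectrogram = rng.random((n_freq, n_time))    # stimulus spectrogram S(f, t)
strf = rng.standard_normal((n_freq, n_lags))  # STRF h(f, tau)

# Forward prediction: r(t) = sum over f, tau of h(f, tau) * S(f, t - tau),
# i.e. each frequency band is convolved with its row of the STRF and summed
predicted = np.zeros(n_time)
for f in range(n_freq):
    for tau in range(n_lags):
        predicted[tau:] += strf[f, tau] * spectrogram[f, : n_time - tau]
```

In practice the STRF itself is estimated from data; here it is random because only the forward-prediction step is being shown.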
MEG Responses Predicted by STRF Model
Ding & Simon, J Neurophysiol (2012); Zion-Golumbic et al., Neuron (in press)
Neural Reconstruction of Speech Envelope
[Figure: stimulus speech envelope vs. reconstructed stimulus speech envelope over a 2 s window; MEG responses → decoder → speech envelope]
Reconstruction accuracy comparable to single unit & ECoG recordings
(up to ~10 Hz)
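The decoding direction can be sketched as a linear backward model: a ridge-regularized regression from time-lagged MEG channels onto the speech envelope, scored by correlation. Everything below is a toy with synthetic data (channel count, lag count, and regularization are invented); the published decoder's details differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n_time, n_ch, n_lags = 2000, 10, 15   # hypothetical: samples, MEG channels, lags

# Synthetic data: each channel carries a scaled copy of the envelope plus noise
envelope = rng.random(n_time)
mixing = rng.standard_normal(n_ch)
meg = np.outer(mixing, envelope) + 0.5 * rng.standard_normal((n_ch, n_time))

# Lagged design matrix: one column per (channel, lag) regressor
X = np.zeros((n_time, n_ch * n_lags))
for ch in range(n_ch):
    for lag in range(n_lags):
        X[lag:, ch * n_lags + lag] = meg[ch, : n_time - lag]

# Ridge-regularized least-squares decoder (lambda chosen arbitrarily here)
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)
reconstructed = X @ w

# Reconstruction accuracy: correlation with the true envelope
r = np.corrcoef(reconstructed, envelope)[0, 1]
```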
Speech Stream as an Auditory Object
• corresponds with something in the sensory world
• information separate from information of rest of sensory world (e.g. other speech streams or noise)
• abstracted: object information generalized over particular sensory experiences (e.g. different sound mixtures)

Neural Representation of an Auditory Object
• neural representation is of something in sensory world
• when other sounds mixed in, neural representation is of auditory object, not entire acoustic scene
• neural representation invariant under broad changes in specific acoustics
Selective Neural Encoding
Unselective vs. Selective Neural Encoding
Stream-Specific Representation
[Figure: attended speech envelopes reconstructed from MEG (grand average over subjects; representative subject) when attending to speaker 1 vs. attending to speaker 2 — identical stimuli!]
Single Trial Speech Reconstruction
Overall Speech Reconstruction
[Figure: bar plot of reconstruction correlation (0–0.2), attended speech reconstruction vs. background reconstruction]
Distinct neural representations for different speech streams
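Single-trial attentional decoding of the kind shown here reduces to a simple comparison: reconstruct the envelope from the neural data, then ask which speaker's envelope it correlates with better. A toy sketch with synthetic envelopes (the noise level and trial length are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n_time = 1000

env1 = rng.random(n_time)   # speaker 1 envelope
env2 = rng.random(n_time)   # speaker 2 envelope

# Hypothetical single-trial reconstruction: tracks the attended speaker
# (speaker 1 here) more faithfully than the background speaker
reconstruction = env1 + 0.7 * rng.standard_normal(n_time)

r1 = np.corrcoef(reconstruction, env1)[0, 1]
r2 = np.corrcoef(reconstruction, env2)[0, 1]

# Classify the attended speaker as the one whose envelope correlates best
attended = 1 if r1 > r2 else 2
```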
Invariance Under Acoustic Changes
Stream-Based Gain Control?
[Figure: reconstruction correlation (0.1–0.2) vs. speaker relative intensity (-8, -5, 0, 5, 8 dB), for attended and background speech]

Gain-Control Models
[Figure: predicted attended-speech correlation vs. speaker relative intensity under an Object-Based model and a Stimulus-Based model]

Neural Results
[Figure: measured correlation vs. speaker relative intensity (-8 to 8 dB), attended and background]
• Stream-based, not stimulus-based
• Neural representation is invariant to acoustic changes
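The two gain-control hypotheses can be caricatured numerically. This is a toy illustration, not the authors' fitted models: under a stimulus-based account, the attended-speech representation should strengthen with the speaker's relative acoustic intensity; under an object-based account, each segregated stream is normalized, so the attended representation is (idealized here as) flat across relative intensity.

```python
import numpy as np

# Relative intensity of the attended speaker (dB), as on the slide's x-axis
rel_db = np.array([-8, -5, 0, 5, 8])
gain_lin = 10.0 ** (rel_db / 20)   # linear amplitude of the attended speaker

# Stimulus-based gain: response tracks the acoustic mixture, so the attended
# share of the representation grows with its share of the mixture amplitude
stimulus_based = gain_lin / (gain_lin + 1)

# Object-based gain: streams are normalized after segregation, so the
# attended representation is constant across relative intensity (idealized)
object_based = np.full_like(stimulus_based, stimulus_based[2])
```

The neural data on the slide follow the flat, object-based pattern.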
Neural Representation of an Auditory Object
✓ neural representation is of something in sensory world
✓ when other sounds mixed in, neural representation is of auditory object, not entire acoustic scene
✓ neural representation invariant under broad changes in specific acoustics
Forward STRF Model
Spectro-Temporal Response Function (STRF)
STRF Results
[Figure: attended and background STRFs and TRFs; frequency (kHz) 0.2–3, time (ms) 0–200; attended vs. unattended panels]
• STRF separable (time, frequency)
• 300 Hz – 2 kHz dominant carriers
• M50_STRF: positive peak
• M100_STRF: negative peak
• M100_STRF strongly modulated by attention, but not M50_STRF
Neural Sources
[Figure: source locations, left and right hemispheres (anterior–posterior, medial axes), for M50_STRF, M100_STRF, and M100; 5 mm scale bar]
• M100_STRF source near (same as?) M100 source: PT
• M50_STRF source is anterior and medial to M100 (same as M50?): HG
Cortical Object-Processing Hierarchy
[Figure: TRFs over 0–400 ms. Left panel, Attentional Modulation: attended vs. background. Right panel, Influence of Relative Intensity: clean, -8, -5, 0, 5, 8 dB]
• M100_STRF strongly modulated by attention, but not M50_STRF
• M100_STRF invariant against acoustic changes
• Objects well represented neurally at 100 ms, but not at 50 ms
Not Just Speech: Competing Tone Streams
Xiang et al., J Neuroscience (2010); Elhilali et al., PLoS Biology (2009)
Tone Stream in Masker Cloud
[Figure: spectrogram schematic, frequency vs. time — a regular tone stream embedded in a cloud of random masker tones]
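A stimulus of this type — a rhythmic target tone embedded in a random cloud of masker tone pips — can be synthesized along these lines. All parameters (pip duration, rates, frequency range, masker density) are illustrative, not the values used in the cited studies.

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 8000                 # sampling rate (Hz), illustrative
dur = 2.0                 # stimulus duration (s)
pip_len = int(0.05 * fs)  # 50 ms tone pips

def pip(freq):
    """A single 50 ms tone pip, Hann-windowed to avoid onset/offset clicks."""
    n = np.arange(pip_len)
    window = 0.5 * (1 - np.cos(2 * np.pi * n / pip_len))
    return np.sin(2 * np.pi * freq * n / fs) * window

stimulus = np.zeros(int(dur * fs))

# Target stream: a 1 kHz tone pip repeating at a regular 4 Hz rhythm
for onset in np.arange(0, dur, 0.25):
    i = int(onset * fs)
    stimulus[i : i + pip_len] += pip(1000.0)

# Masker cloud: tone pips at random times and random frequencies
for _ in range(40):
    i = rng.integers(0, stimulus.size - pip_len)
    stimulus[i : i + pip_len] += pip(rng.uniform(300.0, 3000.0))
```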
Competing Tone Streams
[Figure: spectrogram schematic, frequency vs. time — two competing tone streams]
Summary
• Cortical representations of speech found here:
✓ consistent with being neural representations of auditory (perceptual) objects
✓ meet 3 formal criteria for auditory objects
• Object representation fully formed by 100 ms latency (PT), but not by 50 ms (HG)
• Not special to speech
Acknowledgements
Funding: NIH R01 DC 008342
Collaborators: Catherine Carr, Alain de Cheveigné, Didier Depireux, Mounya Elhilali, Jonathan Fritz, Cindy Moss, David Poeppel, Shihab Shamma
Past Postdocs: Dan Hertz, Yadong Wang
Grad Students: Francisco Cervantes, Marisel Villafane Delgado, Kim Drnec, Krishna Puvvada
Past Grad Students: Nayef Ahmar, Claudia Bonin, Maria Chait, Nai Ding, Victor Grau-Serrat, Ling Ma, Raul Rodriguez, Juanjuan Xiang, Kai Sum Li, Jiachen Zhuo
Undergraduate Students: Abdulaziz Al-Turki, Nicholas Asendorf, Sonja Bohr, Elizabeth Camenga, Corinne Cameron, Julien Dagenais, Katya Dombrowski, Kevin Hogan, Kevin Kahn, Andrea Shome, Madeleine Varmer, Ben Walsh
Collaborators' Students: Murat Aytekin, Julian Jenkins, David Klein, Huan Luo
Thank You
Reconstruction of Same-Sex Speech
[Figure: scatter plot of reconstruction correlations, Speaker Two vs. Speaker One, both axes 0–0.4]
Single Trial Decoding Results
[Figure: attended-speech reconstruction correlation (0.1–0.2) by attended speaker (One, Two), for attended speech vs. background speech]
Speech in Noise: Stimuli
Speech in Noise: Results