Audio Workgroup
Neuro-inspired Speech Recognition
Group Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermanski, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney
Audio Projects
Localization
Speech Recognition
More ASR
Shihab is Running
See http://www.hardrock100.com/index.asp
Shihab arriving in Telluride in 2004
(should happen around 4PM today)
Localization Effort
Interaural Time Difference (ITD)
Estimated from time difference between spikes of two matching channels.
Interaural Intensity Difference (IID)
Difference of spike counts between two cochleae.
Azimuth: Combination of ITD and IID
ITD estimation from pure tones
Azimuth estimation from music
Speaker
Microphones
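The two cues above can be sketched in a few lines of Python. This is only an illustration, not the workgroup's code: the function names, the nearest-spike pairing, the 1 ms lag limit, and the normalized form of the IID are all assumptions. The slides do not say how ITD and IID were combined into an azimuth, so that step is left out.

```python
import numpy as np

def estimate_itd(left_spikes, right_spikes, max_lag=1e-3):
    """Median (right minus left) time difference between nearest spike pairs.

    Spike times are in seconds and come from two matching channels.
    A positive value means the right channel fired later.
    """
    left = np.asarray(left_spikes, dtype=float)
    right = np.sort(np.asarray(right_spikes, dtype=float))
    diffs = []
    for t in left:
        # Look at the right-channel spikes on either side of this left spike.
        i = np.searchsorted(right, t)
        candidates = right[max(i - 1, 0):i + 1]
        if candidates.size == 0:
            continue
        d = candidates[np.argmin(np.abs(candidates - t))] - t
        if abs(d) <= max_lag:  # assumed limit; discards unrelated pairs
            diffs.append(d)
    return float(np.median(diffs)) if diffs else float("nan")

def estimate_iid(left_count, right_count):
    """Normalized spike-count difference in [-1, 1]; positive means more right spikes."""
    total = left_count + right_count
    return (right_count - left_count) / total if total else 0.0
```

For example, a right-channel spike train that lags the left one by 0.3 ms yields an ITD estimate of about 0.0003 s, and counts of 100 (left) and 300 (right) yield an IID of 0.5.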
FPAA/Mote – Word Recognition
Field Programmable Analog Array (FPAA)-based analog cochlea (non-spiking) with envelope detection.
MOTE-based pattern matching using matched filtering with “receptive fields”.
Robosapien: listens to the spoken commands.
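A minimal sketch of matched filtering on envelope features, assuming each word's "receptive field" is a (channels x frames) template and that the score is the peak normalized correlation as the template slides along time. The function names, the normalization, and the dictionary of templates are assumptions for illustration; the slides do not describe the MOTE implementation.

```python
import numpy as np

def match_score(envelopes, template):
    """Peak normalized correlation of a (channels x frames) template slid along time."""
    n_frames = template.shape[1]
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best = -1.0
    for start in range(envelopes.shape[1] - n_frames + 1):
        window = envelopes[:, start:start + n_frames]
        w = window - window.mean()
        denom = t_norm * np.linalg.norm(w)
        if denom > 0:  # skip flat windows, which carry no pattern
            best = max(best, float(np.sum(t * w) / denom))
    return best

def recognize(envelopes, templates):
    """Return the best-matching word and the score of every template."""
    scores = {word: match_score(envelopes, tpl) for word, tpl in templates.items()}
    return max(scores, key=scores.get), scores
```

Usage: `recognize(envelopes, {"go": go_template, "stop": stop_template})` returns the word whose template correlates best with some stretch of the envelope matrix, together with all scores.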
Status:
FPAA – (we are using a new FPAA) 2nd-order sections synthesized, but a full auditory filter bank is not yet up.
MOTE – real-time communication with Matlab and sampling operational.
Relational Network (Simple)
[Figure: patches X, Y, and Z, each shown with a map M]
Patches of neurons
Each measure one quantity
Bidirectional relations for feedback/feedforward
Thanks to Rodney Douglas
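To make the idea concrete, here is a small sketch in which each patch is a vector of activities over discrete values, and the patches sharpen one another through a shared relation. The relation z = x + y, the multiplicative update, and all names are assumptions chosen for illustration; the slide does not specify the relation or the dynamics.

```python
import numpy as np

def normalize(p):
    """Scale activities to sum to 1; fall back to uniform if all are zero."""
    s = p.sum()
    return p / s if s > 0 else np.full_like(p, 1.0 / p.size)

def relax(px, py, pz, steps=5):
    """Let three patches constrain each other under the assumed relation z = x + y."""
    # rel[i, j, k] is 1 where the values x=i, y=j, z=k satisfy the relation.
    rel = np.zeros((px.size, py.size, pz.size))
    for i in range(px.size):
        for j in range(py.size):
            if i + j < pz.size:
                rel[i, j, i + j] = 1.0
    for _ in range(steps):
        # Each patch receives input from the other two through the relation.
        fx = np.einsum("ijk,j,k->i", rel, py, pz)
        fy = np.einsum("ijk,i,k->j", rel, px, pz)
        fz = np.einsum("ijk,i,j->k", rel, px, py)
        px, py, pz = normalize(px * fx), normalize(py * fy), normalize(pz * fz)
    return px, py, pz
```

Because the links are bidirectional, the same code fills in whichever patch is uncertain: with X and Y peaked and Z flat, Z settles on their sum; with Y and Z peaked and X flat, X settles on the difference.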
Relational Network (example)
[Figure labels: input here; relational specification; relational feedback]
ASR Relational Network
Cochlea
Delay
Phone Recognizer
Word Recognizer
A patch of neurons (one-of-N output)
Note: We don’t know how to represent delays
Phone Recognizer
Bidirectional links enforce phoneme/word constraints
Relational Advantages
Not an HMM: HMMs are great, but…
Incorporate other knowledge: bottom-up perception, top-down word hypothesis
Hallucinate: based on experience
Hear “ba..” and know that bad, bat, bar, bass, band follow
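The "ba.." example can be sketched as a prefix lookup that turns a partial bottom-up percept into top-down word hypotheses. The lexicon, the function names, and the use of letters as stand-ins for phonemes are assumptions for illustration only.

```python
# Hypothetical toy lexicon; letters stand in for phonemes.
LEXICON = ["bad", "bat", "bar", "bass", "band", "cat", "dog"]

def word_hypotheses(partial, lexicon=LEXICON):
    """Words consistent with what has been heard so far."""
    return [w for w in lexicon if w.startswith(partial)]

def expected_next(partial, lexicon=LEXICON):
    """Symbols the listener should expect next, given the surviving hypotheses."""
    return {w[len(partial)] for w in word_hypotheses(partial, lexicon)
            if len(w) > len(partial)}
```

Calling `word_hypotheses("ba")` on this lexicon keeps bad, bat, bar, bass, and band, and `expected_next("ba")` gives the set of symbols that top-down feedback could prime.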
Silicon Cochlea
[Figure: silicon cochlea stages: basilar membrane (high frequency to low frequency), inner hair cells, ganglion cells] (van Schaik, Liu, 2004)
Silicon Frequency Response
Tone ramps into two cochleas
Cochlear Rate Profiles
[Figure: left cochlea and right cochlea panels; axis: spikes per utterance]
Learning Algorithms
Statistical: SAS (pick best channels for decision); least squares (for software demo)
Liquid State Machine: take input to high dimensions with spiking net
Spike Timing Dependent Plasticity (STDP): Giacomo/Srinjoy chip; Brader/Fusi
[Figure: LSM spiking output for Vowel 1 (V1) and Vowel 2 (V2)]
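As an illustration of the least-squares option listed above, the sketch below fits a linear readout on per-channel spike counts with +1/-1 class labels. The feature choice (one spike count per channel per utterance), the bias term, and the function names are assumptions; the slides do not describe the software demo in detail.

```python
import numpy as np

def _with_bias(counts):
    """Append a constant column so the readout can learn an offset."""
    counts = np.asarray(counts, dtype=float)
    return np.hstack([counts, np.ones((counts.shape[0], 1))])

def fit_readout(counts, labels):
    """Least-squares weights mapping spike counts to +1/-1 labels."""
    w, *_ = np.linalg.lstsq(_with_bias(counts), np.asarray(labels, dtype=float),
                            rcond=None)
    return w

def predict(w, counts):
    """Classify each row of spike counts by the sign of the linear readout."""
    return np.sign(_with_bias(counts) @ w)
```

Usage: call `fit_readout` on a (utterances x channels) count matrix with one label per utterance, then `predict` on new count vectors.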
Learning Chip Architecture
[Figure labels: Cochlea Chip; Immediate Cochlea; Delayed Cochlea; Plastic synapses; Learning Chip Neurons; Non-plastic synapses (Excit., Inhib.); Relational Network; Phoneme 1; Phoneme 2; Binary synaptic weights]
Tone Results
Tone recognition: spike input from silicon cochlea
Training: two tones, duplicated input, positive and negative examples
Testing
Phoneme Results
Phoneme recognition: spike input from silicon cochlea
Training: two phonemes, duplicated inputs, positive and negative examples
Testing
Behind the Curtain
Hardware Overview
[Figure labels: Cochlea (x2); PCI-AER (for remapping) (x2); Learning (Phoneme, Word); Shih-Chii Liu; Giacomo Indiveri; Implemented in MATLAB]
Infrastructure Difficulties
Remapper: Because of problems with the AER mapper boards, the AER data from the silicon cochlea had to be remapped to the learning chip in Matlab (very slow).
Power: Unpredictable problems caused by supply-voltage variation of as much as 1 V.
Sharing chips: The learning chip had to be shared with two other workgroups.
PC replacement
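The remapping step mentioned above amounts to a table lookup on event addresses. The sketch below assumes events are (timestamp, address) pairs and that a table entry of -1 means "drop this event"; the actual address layout of the cochlea and learning chips is not given in the slides, so the table contents are hypothetical.

```python
import numpy as np

def remap_events(timestamps, addresses, table):
    """Translate source AER addresses to destination addresses.

    table[src] holds the destination address, or -1 to discard the event.
    Timestamps are passed through unchanged for the events that are kept.
    """
    timestamps = np.asarray(timestamps)
    dst = np.asarray(table)[np.asarray(addresses)]
    keep = dst >= 0
    return timestamps[keep], dst[keep]
```

With a table of `[5, -1, 7]`, events on source addresses 0, 1, 2, 0 come out on destination addresses 5, 7, 5, and the event on address 1 is dropped.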
Impedance Difficulties
Cochlear firing rates
Cochlea: 6M spikes/second (30k channels, 200 spikes/second each)
Silicon Cochlea: 30k spikes/second (30 channels, 1k spikes/second each)
Learning Chip: 3k spikes/second (30 channels, 100 spikes/second each)
Dynamic range
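The totals above are channel count times per-channel rate, which a short check confirms. The `thin_spikes` helper shows one possible way to bridge the tenfold gap between the silicon cochlea and the learning chip by keeping every tenth spike; that strategy is an assumption for illustration, since the slides do not say how the mismatch was handled.

```python
def total_rate(channels, rate_per_channel):
    """Aggregate spike rate in spikes/second."""
    return channels * rate_per_channel

def thin_spikes(spike_times, factor):
    """Keep every `factor`-th spike of one channel to reduce its rate."""
    return spike_times[factor - 1::factor]

cochlea = total_rate(30_000, 200)        # biological cochlea
silicon = total_rate(30, 1_000)          # silicon cochlea
learning_chip = total_rate(30, 100)      # learning chip input budget
mismatch = silicon // learning_chip      # rate reduction needed per channel
```

Thinning a 1k spikes/second channel by the computed mismatch factor leaves 100 spikes/second, which matches the per-channel rate listed for the learning chip.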
Desired Results
[Figure: phoneme input A A A I; /A/ and /I/ phoneme patches; AI and IA word patches; shown without and with relational feedback]
Simulation
Simulation 2
Simulation 3
Great Job!
Student Members: Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor
Silicon Cochlea
[Figure: mean firing rate versus channel number; panel titles "Mean firing rates in response to two tones" and "Mean firing rates for two different vowel inputs" (/a/, /i/)]
[Figure: raster plots for two different tone inputs (200 Hz, 1000 Hz); channel number versus time in microseconds]
Word Recognizer
Four example raster plots (silence, A_, A_ with relational feedback, AI)
Software Simulation