

Biologically-Motivated Approaches to Speech Recognition

Mishal Awadah, John Mayer, Robert Hass, Dr. Mark Liberman

Sr. Design Poster Day - April 25, 2011
Department of Computer and Information Science, University of Pennsylvania

This project investigates the emulation of human speech perception in computerized speech recognition. Hierarchical Temporal Memory (HTM), a machine learning algorithm modeled after the behavior of neural circuits, is compared to Gaussian classification, a conventional machine learning method; input audio formats of varying fidelity to the structure of the human ear are contrasted; experimental trials are conducted in parallel using a custom-built job dispatching system. Future work will improve recognition performance by testing more HTM configurations, expanding from isolated vowels to continuous speech, and employing X-ray articulation data.

» Discrete populations of neurons along the ear’s basilar membrane are activated by particular sound frequencies.

» A biologically realistic representation of digital audio can be generated that follows this distribution of sensitivity.

» Preliminary results suggest improved performance with ear-scaled sound data.
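The idea of an ear-scaled representation can be sketched as follows. The poster does not say which frequency scale was used, so this sketch assumes the mel scale as a stand-in; the function names and the band count are hypothetical. It places analysis bands at equal steps on the mel scale, which packs them densely at low frequencies and sparsely at high ones.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (assumed scale, not stated on the poster)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def ear_scaled_centers(n_bands, f_min_hz, f_max_hz):
    """Return n_bands center frequencies equally spaced on the mel scale.

    Requires n_bands >= 2. Equal mel steps give bands that widen as
    frequency rises, loosely mirroring the ear's falling resolution.
    """
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_bands - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands)]

# Hypothetical settings: 10 bands between 100 Hz and 8 kHz.
centers = ear_scaled_centers(10, 100.0, 8000.0)
```

A linearly spaced set of bands over the same range would serve as the lower-fidelity comparison format.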

» Trials to date have used isolated vowels; recognizing continuous speech is a harder problem in which differences between approaches will become more apparent. It is in the latter task that HTM networks are expected to outperform their counterparts.

» HTM networks with increasingly complex multi-node configurations (as below) will be tried. The sophistication of the patterns that a network can recognize is closely tied to the structure of the network.

» The Motor Theory of Speech Perception suggests that the movements of speakers’ lips and tongues represent speech better than the sound itself does. The utility of incorporating X-ray articulation data will be assessed.

» HTM networks have many parameters to adjust: which combination is optimal?

» How can parallelism be used to allow searching a large parameter space?

» The “queen” dispatches individual trial runs to “drones”.

» A drone runs a simulation with particular parameters, reports results to the queen, then awaits more work.
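The queen/drone pattern described above can be sketched in a few lines. This is a minimal illustration, not the project's dispatcher: it assumes threads on one machine in place of whatever transport the real system used, and `placeholder_trial`, the parameter names, and the grid values are all hypothetical.

```python
import itertools
import queue
import threading

def placeholder_trial(params):
    """Stand-in for one simulation run; NOT a real recognition score."""
    return params["nodes"] * 0.1 + params["threshold"]

def drone(jobs, results, run_trial):
    """Take parameter sets from the queen until a None sentinel arrives."""
    while True:
        params = jobs.get()
        if params is None:  # sentinel: no more work
            break
        results.put((params, run_trial(params)))

def queen(grid, run_trial, n_drones=4):
    """Dispatch every combination in the parameter grid to the drones."""
    jobs, results = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=drone, args=(jobs, results, run_trial))
        for _ in range(n_drones)
    ]
    for worker in workers:
        worker.start()
    keys = sorted(grid)
    n_jobs = 0
    for values in itertools.product(*(grid[k] for k in keys)):
        jobs.put(dict(zip(keys, values)))
        n_jobs += 1
    for _ in workers:  # one sentinel per drone
        jobs.put(None)
    for worker in workers:
        worker.join()
    return [results.get() for _ in range(n_jobs)]

# Hypothetical grid: 3 x 2 = 6 trials.
grid = {"nodes": [1, 2, 3], "threshold": [0.1, 0.2]}
outcomes = queen(grid, placeholder_trial, n_drones=3)
```

Because each trial is independent, the number of drones can grow with the size of the parameter grid without changing the queen's logic.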

» A Gaussian classifier calculates a statistical distribution (mean and covariance) for each category in its training input.

» Hierarchical Temporal Memory mimics the computational design of neural circuits.

» Each node in an HTM network identifies spatial and temporal patterns in its input.

» Hierarchical organization of nodes allows detection of nested patterns.

» At each level in the hierarchy, an HTM network identifies “causes” of lower-level phenomena.

» Hierarchical Temporal Memory was developed by Jeff Hawkins at Numenta, Inc.
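To make the notion of a node that identifies spatial and temporal patterns more concrete, here is a deliberately simplified toy. It is not Numenta's algorithm or API: `ToyNode`, `min_count`, and the grouping rule are assumptions made for illustration. The node memorizes each distinct input vector as a spatial pattern, counts which patterns follow one another, and merges frequently adjacent patterns into one temporal group.

```python
from collections import defaultdict

class ToyNode:
    """Toy illustration only; not the HTM implementation used in the project."""

    def __init__(self, min_count=2):
        self.patterns = {}                   # spatial pattern -> index
        self.transitions = defaultdict(int)  # (previous index, index) -> count
        self.min_count = min_count
        self.groups = {}                     # pattern index -> temporal group id
        self._prev = None

    def learn(self, pattern):
        """Memorize the pattern and count the transition from the previous one."""
        idx = self.patterns.setdefault(tuple(pattern), len(self.patterns))
        if self._prev is not None and self._prev != idx:
            self.transitions[(self._prev, idx)] += 1
        self._prev = idx

    def finalize(self):
        """Merge patterns linked by frequent transitions (union-find)."""
        parent = list(range(len(self.patterns)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for (a, b), count in self.transitions.items():
            if count >= self.min_count:
                parent[find(a)] = find(b)
        self.groups = {i: find(i) for i in range(len(self.patterns))}

    def infer(self, pattern):
        """Return the temporal group of a known pattern, or None if unseen."""
        idx = self.patterns.get(tuple(pattern))
        return None if idx is None else self.groups[idx]

# Hypothetical input: two alternating sequences of made-up 2-bit patterns.
node = ToyNode(min_count=2)
A, B, C, D = (0, 0), (0, 1), (1, 0), (1, 1)
for pattern in [A, B, A, B, A, B, C, D, C, D, C, D]:
    node.learn(pattern)
node.finalize()
```

In a hierarchy, a parent node of the same kind could take the group ids reported by several child nodes as its own input vector, which is one way nested patterns could be detected.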

» The algorithmic structures of the machine learning techniques used in modern speech processing systems are not optimal for recognizing speech sounds; they are mathematically useful tools that bear little relation to the tasks to which they are applied.

» While impressive results can be obtained from the heavily optimized use of such systems, their underlying architectures impose a strict upper bound on performance.

Poster panel titles: Abstract; Audio and the Ear; Future Work; Test Harness; Gaussian Classification; Hierarchical Temporal Memory (HTM); Machine Learning

» A test vector is assigned to the category whose distribution is most likely to have generated it.

» Each of the three distributions in the example at left corresponds to a different category.
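The Gaussian classification procedure described on this poster (fit a mean and covariance per category, then pick the most likely distribution for a test vector) can be sketched as below. This is an illustrative sketch only: it assumes two-dimensional features so the covariance inverse has a closed form, and the training points and category labels are invented toy values, not data from the project.

```python
import math
from collections import defaultdict

def fit_gaussians(samples):
    """Fit a mean and 2x2 covariance per label (needs >= 2 points per label)."""
    by_label = defaultdict(list)
    for point, label in samples:
        by_label[label].append(point)
    models = {}
    for label, pts in by_label.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        # Unbiased sample covariance entries.
        sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
        syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
        models[label] = ((mx, my), (sxx, sxy, syy))
    return models

def log_likelihood(point, model):
    """Log density of a 2-D Gaussian, using the closed-form 2x2 inverse."""
    (mx, my), (sxx, sxy, syy) = model
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (maha + math.log(det)) - math.log(2.0 * math.pi)

def classify(point, models):
    """Return the label whose Gaussian gives the point the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(point, models[label]))

# Invented toy training data: three categories, four points each.
TRAINING = [
    ((700, 1100), "a"), ((750, 1050), "a"), ((720, 1200), "a"), ((680, 1150), "a"),
    ((280, 2250), "i"), ((300, 2300), "i"), ((260, 2350), "i"), ((290, 2200), "i"),
    ((310, 850), "u"), ((330, 900), "u"), ((290, 950), "u"), ((320, 800), "u"),
]
models = fit_gaussians(TRAINING)
```

Calling `classify((300, 2280), models)` then selects among the three fitted distributions; with real features of higher dimension, a linear algebra library would replace the hand-written 2x2 inverse.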

Credit: Ari J. Hollander, Human Interface Technology Lab, University of Washington

Credit: Steve Renals, School of Informatics, University of Edinburgh