NetTalk Project
Speech Generation Using a Neural Network
Michael J Euhardy
The Speech Generation Idea
Input: a specific letter whose sound is to be generated
Input: three letters on each side of it, for a total of seven input letters
Output: the sound that should be generated, based on the input letter and the surrounding letters
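The seven-letter input can be pictured as a window that slides along a word, one window per letter. The following is a minimal Python sketch. It assumes that slots falling off either end of a word are left empty (`None`), because the slides do not say how word edges were handled. The function name `windows` is hypothetical.

```python
# Sketch: build the 7-letter input window for every letter of a word.
CONTEXT = 3  # letters on each side of the centre letter (from the slides)

def windows(word):
    """Yield one 7-slot window per letter of `word`.

    Slots that fall off either end of the word are filled with None;
    how the original project handled word edges is not stated (assumption).
    """
    word = word.lower()
    padded = [None] * CONTEXT + list(word) + [None] * CONTEXT
    for i in range(len(word)):
        # padded[i + CONTEXT] is the centre letter whose sound is generated
        yield padded[i : i + 2 * CONTEXT + 1]

for w in windows("cat"):
    print(w)
```

Each window's middle slot, `w[3]`, is the letter whose sound the network should produce.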
The Strategy
26 possible letters, 7 input positions
Map each letter in each position to a unique input
7 × 26 = 182 total inputs
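One way to give every (letter, position) pair its own input is to use index `position * 26 + letter_index`. In that scheme exactly one unit per position is switched on. The sketch below makes that assumption. The slides give only the 182-input total, not the ordering. `encode_window` is a hypothetical name, and an empty (`None`) slot at a word edge leaves all 26 units for that position at 0.

```python
import string

LETTERS = string.ascii_lowercase        # the 26 possible letters
POSITIONS = 7                           # 3 left + centre + 3 right
N_INPUTS = POSITIONS * len(LETTERS)     # 7 * 26 = 182

def encode_window(window):
    """Return a 182-element 0/1 input vector for a 7-slot window.

    Assumed layout: input index = position * 26 + letter index, so each
    (letter, position) pair maps to its own unit. None slots stay all-zero.
    """
    x = [0] * N_INPUTS
    for pos, ch in enumerate(window):
        if ch is not None:
            x[pos * len(LETTERS) + LETTERS.index(ch)] = 1
    return x

x = encode_window([None, None, None, 'c', 'a', 't', None])
print(len(x), sum(x))  # 182 3
```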
The Strategy
57 possible sounds generated
Map to 57 output labels
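Mapping sounds to 57 output labels can be sketched as a one-hot target vector for training. A network output is then read back as the label of its largest unit. This is a minimal sketch under two assumptions. First, which sound corresponds to which index is hypothetical, because the slides do not list the 57 sounds. Second, decoding by argmax is assumed rather than stated.

```python
N_SOUNDS = 57  # number of possible sounds, from the slides

def encode_label(sound_index):
    """One-hot target: 1 at the sound's label, 0 everywhere else."""
    if not 0 <= sound_index < N_SOUNDS:
        raise ValueError("sound index out of range")
    y = [0] * N_SOUNDS
    y[sound_index] = 1
    return y

def decode_output(outputs):
    """Assumed decoding: choose the sound whose output unit is largest."""
    return max(range(len(outputs)), key=lambda k: outputs[k])

print(decode_output([0.1, 0.9, 0.3]))  # 1
```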
The Resulting ANN
A fully connected single-layer perceptron with 182 inputs and 57 outputs
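The following pure-Python sketch shows a fully connected single-layer perceptron of this shape. Every one of the 182 inputs has a weight to each of the 57 outputs, and the prediction is the highest-scoring output. It trains with the standard multi-class perceptron rule. That rule is an assumption: the slides do not say which training rule the Matlab version used, and the class and method names are hypothetical.

```python
import random

N_INPUTS, N_OUTPUTS = 182, 57  # sizes from the slides

class SingleLayerPerceptron:
    """Fully connected single-layer perceptron: every input feeds every output."""

    def __init__(self, n_in=N_INPUTS, n_out=N_OUTPUTS, seed=0):
        rng = random.Random(seed)
        # weights[k][i] connects input unit i to output unit k
        self.weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)]
                        for _ in range(n_out)]
        self.bias = [0.0] * n_out

    def scores(self, x):
        """Weighted sum (plus bias) for each output unit."""
        return [b + sum(wi * xi for wi, xi in zip(w, x))
                for w, b in zip(self.weights, self.bias)]

    def predict(self, x):
        """Index of the highest-scoring output unit, i.e. the chosen sound."""
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)

    def update(self, x, label, lr=1.0):
        """Multi-class perceptron rule (assumed). On a mistake, pull the
        correct output's weights toward x and push the wrongly chosen
        output's weights away. Returns True if x was already correct."""
        guess = self.predict(x)
        if guess != label:
            for i, xi in enumerate(x):
                self.weights[label][i] += lr * xi
                self.weights[guess][i] -= lr * xi
            self.bias[label] += lr
            self.bias[guess] -= lr
        return guess == label
```

One epoch of training is a call to `update` for each (input vector, label) pair in the training set. This is why training time grows with the size of the data set.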
The Findings
The trained neural network performs very well, and the larger the training set and the longer it is trained, the better it performs
Training can be an extremely long process if a high classification rate is desired and the training set is large
Problems
Time
Space
Time
You can't rush training the network. Even on a dual PIII-733 with 512MB of RAM, it took a very long time to train on any data set of significant size. Just converting all of the characters in the data file into the matrices needed as inputs and labels took hours.
Space
20000 words of data at roughly 7 letters per word gives 140000 rows, one per letter. With 182 input columns plus 57 label columns, that's a matrix of
140000x239
Stored in double precision in Matlab (8 bytes per element), that's roughly 268 MB, more than half of the machine's 512MB
Workarounds
Smaller data set, only 1000 words
Lower training standard: train only to an 80% classification rate
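Training "only to 80% classification" amounts to a stopping rule: keep running epochs until the classification rate on the training set reaches the target. The sketch below adds an epoch cap so that it cannot run forever. It takes the epoch and accuracy steps as hypothetical callables (`run_epoch`, `accuracy`), so it does not assume how the network itself is implemented.

```python
def train_until(run_epoch, accuracy, target=0.80, max_epochs=1000):
    """Run training epochs until the classification rate reaches `target`
    (80% on the slide) or `max_epochs` epochs have run.

    run_epoch(): one pass over the training set (hypothetical hook)
    accuracy():  current fraction of training examples classified correctly
    Returns (epochs_run, final_accuracy).
    """
    acc = accuracy()
    epochs = 0
    while acc < target and epochs < max_epochs:
        run_epoch()
        epochs += 1
        acc = accuracy()
    return epochs, acc

# Toy stand-in for a real network: accuracy rises by 0.25 per epoch.
state = {"acc": 0.5}

def fake_epoch():
    state["acc"] = min(1.0, state["acc"] + 0.25)

print(train_until(fake_epoch, lambda: state["acc"]))
```

Lowering `target` trades classification rate for fewer epochs, which is the trade-off the workaround makes.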
Next Time
C++: Matlab is way too slow and way too memory-intensive
Start earlier, it's a long process
Multi-Layer Perceptron
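A multi-layer perceptron would add a hidden layer between the 182 inputs and the 57 outputs. The sketch below shows only the forward pass, with sigmoid units. The hidden-layer size of 64 is a hypothetical choice, because the slides give none. Training such a network (for example with backpropagation) is not shown.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Small random weights (n_out x n_in) and zero biases for one layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(layer, x):
    """Fully connected layer followed by a sigmoid on each unit."""
    w, b = layer
    return [sigmoid(bk + sum(wi * xi for wi, xi in zip(wk, x)))
            for wk, bk in zip(w, b)]

rng = random.Random(0)
N_HIDDEN = 64                          # hypothetical hidden-layer size
hidden = make_layer(182, N_HIDDEN, rng)
output = make_layer(N_HIDDEN, 57, rng)

x = [0] * 182
x[80] = 1                              # one active input unit, for illustration
y = forward(output, forward(hidden, x))
print(len(y))  # 57
```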
Conclusion
I give up! I don't know how Microsoft's Narrator does it, but I bet it doesn't do it this way.