Voice Recognition (Presentation 2) By: Priya Devi A. S/W Developer, Xsys technologies Bangalore.

Preparing Grammar

The grammar file has currently been extended to 56 tokens, and it can be generated dynamically.

A user interface for entering a grammar token and its action is implemented. Tokens entered into the grammar file are recognized by the Sphinx recognizer when they are detected in the microphone input.

Actions are associated with tokens and stored in a hash table.

The grammar file follows the JSGF format.
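The token-and-action hash table and the dynamically generated grammar file could fit together roughly as in the Java sketch below. It assumes the table maps each token to an action string (for example, a command to run); the class `GrammarBuilder`, the method `toJsgf`, and the grammar and rule names are hypothetical, not taken from the project.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: turns the token -> action hash table into JSGF grammar text.
class GrammarBuilder {

    // Builds a grammar with one public rule whose alternatives are the table's tokens.
    static String toJsgf(String grammarName, String ruleName, Map<String, String> tokenToAction) {
        StringJoiner alternatives = new StringJoiner(" | ");
        for (String token : tokenToAction.keySet()) {
            alternatives.add(token);
        }
        return "#JSGF V1.0;\n"
                + "grammar " + grammarName + ";\n"
                + "public <" + ruleName + "> = " + alternatives + ";\n";
    }
}
```

Regenerating and rewriting the grammar file whenever a token is added through the UI keeps it in sync with the hash table; the action for a recognized token is then simply `tokenToAction.get(recognizedText)`.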

JSGF (Java Speech Grammar Format)

The JSpeech Grammar Format (JSGF) is a platform-independent, vendor-independent textual representation of grammars for use in speech recognition.

An example token definition in JSGF is as follows:

public <desktopAction> = open (Computer | Document | Recycle | Network | <defaultApplication> );

public <defaultApplication> = player | word | powerpoint | internet | start | tasks ;
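To make the rule reference concrete, the hypothetical Java sketch below expands `<desktopAction>` by hand into every phrase it accepts, inlining `<defaultApplication>`. It only handles these two flat rules and is not a general JSGF parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hand expansion of the example rules (not a general JSGF parser).
class DesktopActionRule {

    // Alternatives of <defaultApplication>.
    static final List<String> DEFAULT_APPLICATION =
            List.of("player", "word", "powerpoint", "internet", "start", "tasks");

    // Literal alternatives of <desktopAction>, apart from the <defaultApplication> reference.
    static final List<String> DESKTOP_OBJECTS =
            List.of("Computer", "Document", "Recycle", "Network");

    // Every phrase accepted by <desktopAction>: "open" followed by one alternative.
    static List<String> sentences() {
        List<String> objects = new ArrayList<>(DESKTOP_OBJECTS);
        objects.addAll(DEFAULT_APPLICATION); // inline the rule reference
        List<String> phrases = new ArrayList<>();
        for (String object : objects) {
            phrases.add("open " + object);
        }
        return phrases;
    }
}
```

The two rules together accept ten phrases, from "open Computer" to "open tasks".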

Major Challenge - Accuracy

Accuracy is currently only 45%. Accuracy depends on many factors, such as noise and microphone quality, and it depends heavily on the recognizer. The recognizer searches the grammar file for tokens using a best-first scheme, and this scheme fails when the comparison picks the wrong text. For example, “word” can be recognized as “ward”.
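The slides do not say how the 45% figure was measured. One common approach, assumed here, is to speak a fixed list of test tokens and count how many are recognized exactly as expected. The Java sketch below computes that percentage; `AccuracyCheck` and `accuracyPercent` are hypothetical names.

```java
import java.util.List;

// Hypothetical helper: word-level accuracy over a scripted test session.
class AccuracyCheck {

    // Percentage of utterances whose recognized text matches the expected token.
    static double accuracyPercent(List<String> expected, List<String> recognized) {
        if (expected.isEmpty() || expected.size() != recognized.size()) {
            throw new IllegalArgumentException("need one recognized result per expected token");
        }
        int correct = 0;
        for (int i = 0; i < expected.size(); i++) {
            // Case-insensitive, so "Word" and "word" count as the same token.
            if (expected.get(i).equalsIgnoreCase(recognized.get(i))) {
                correct++;
            }
        }
        return 100.0 * correct / expected.size();
    }
}
```

With the misrecognition from the slide, `accuracyPercent(List.of("open", "word"), List.of("open", "ward"))` returns 50.0.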

Improving Accuracy

Limit the size of the grammar file by removing trivial tokens from it. All the tokens given on slide 3 are trivial tokens. Trivial tokens can be recognized through .WAV file training and left out of the grammar file, which reduces the search space of the grammar file. With this change, accuracy increased to 72%, and the command and control application is complete.
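Leaving the .WAV-trained tokens out of the grammar amounts to a set difference between the grammar tokens and the trained tokens. A minimal Java sketch, with a hypothetical class name and illustrative token lists:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: shrinks the grammar's search space by dropping trained tokens.
class GrammarPruner {

    // Returns the grammar tokens that are not already covered by .WAV training.
    static Set<String> prune(Set<String> grammarTokens, Set<String> wavTrainedTokens) {
        Set<String> remaining = new LinkedHashSet<>(grammarTokens); // keeps the original order
        remaining.removeAll(wavTrainedTokens);
        return remaining;
    }
}
```

Only the pruned set would be written to the grammar file, so the recognizer has fewer alternatives to search.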

.WAV file training

.WAV file training is the process of recording small .wav files in the user’s voice to improve the accuracy of the speech recognition application.

Users are provided with an interface for reading a set of lines aloud before they start using the speech recognition application.

The set of lines consists of words that are trivial for the command and control application, such as open, close, file, computer, document, player and internet.
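Recording these training words could be done with the standard `javax.sound.sampled` API. The Java sketch below assumes 16 kHz, 16-bit, mono PCM (a common input format for Sphinx acoustic models; the slides do not state a format) and one .wav file per training word. `WavTrainingRecorder`, `record`, and `saveWav` are hypothetical names; `record` needs a working microphone, while `saveWav` only writes bytes to disk.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

// Hypothetical recorder for the .WAV training step.
class WavTrainingRecorder {

    // Assumed format: 16 kHz, 16-bit samples, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    // Captures roughly `seconds` of microphone audio (requires a capture device).
    static byte[] record(double seconds) throws LineUnavailableException {
        TargetDataLine line = AudioSystem.getTargetDataLine(FORMAT);
        line.open(FORMAT);
        line.start();
        byte[] buffer = new byte[(int) (seconds * FORMAT.getFrameRate()) * FORMAT.getFrameSize()];
        int read = 0;
        while (read < buffer.length) {
            int n = line.read(buffer, read, buffer.length - read);
            if (n <= 0) {
                break; // line closed or no more data; the rest of the buffer stays silent
            }
            read += n;
        }
        line.stop();
        line.close();
        return buffer;
    }

    // Writes raw PCM bytes in FORMAT to a .wav file, e.g. one file per training word.
    static void saveWav(byte[] pcm, File out) throws IOException {
        long frames = pcm.length / FORMAT.getFrameSize();
        try (AudioInputStream stream =
                     new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames)) {
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out);
        }
    }
}
```

A training screen would call `record` while the user reads a word such as "open" and then `saveWav` to store it, for example as `open.wav`.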

The recognizer first matches the spoken token against the .wav files. If the token is not found among the .wav files, the grammar file is searched.
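The lookup order described here can be sketched in Java as a two-stage recognizer. How a spoken input is compared with the trained .wav files is not described in the slides, so both stages are passed in as functions; `TwoStageRecognizer` and its parameters are hypothetical.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical two-stage lookup: trained .wav samples first, grammar search second.
class TwoStageRecognizer<A> {
    private final Function<A, Optional<String>> wavMatcher; // compares input with the user's .wav samples
    private final Function<A, String> grammarSearch;        // searches the JSGF grammar

    TwoStageRecognizer(Function<A, Optional<String>> wavMatcher, Function<A, String> grammarSearch) {
        this.wavMatcher = wavMatcher;
        this.grammarSearch = grammarSearch;
    }

    // The grammar is searched only when no trained .wav sample matches.
    String recognize(A audio) {
        return wavMatcher.apply(audio).orElseGet(() -> grammarSearch.apply(audio));
    }
}
```

Using `orElseGet` rather than `orElse` matters here: the grammar search runs only on a miss, which is what keeps trivial tokens out of the larger search.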

Next task: Dictation

Dictation is different from command and control. It requires a large number of words to be recognized.

Dictation should start when the “Start dictation” token is recognized; after that, input from the microphone should be used not as commands but as keystrokes. This is a complex task, because the grammar file and .wav file training fail in this case: the user can speak anything, which may not be present in the grammar file or the .wav files.

Thank You

Voice Recognition (Presentation 3)

By:

Priya Devi A., S/W Developer,

Xsys technologies

Bangalore.

Dictation Functionality

Speech dictation treats the input voice not as commands but as text.

Spoken words are recognized in the same way as in the command and control application.

Once a spoken word is recognized as “Start Dictation”, all the remaining words are treated as text until the recognizer recognizes “Stop Dictation”.

After recognizing “Stop Dictation”, the application works as a command and control application again.

Dictation is implemented using the algorithm given on the next slide.
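The switching described on this slide behaves like a two-state machine. The Java sketch below is one way to express it, assuming the hash table maps tokens to `Runnable` actions and that typing is delegated to a callback; `ModeController`, `onRecognized`, and the callback are hypothetical names.

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical controller for switching between command and control and dictation.
class ModeController {

    enum Mode { COMMAND, DICTATION }

    private Mode mode = Mode.COMMAND;
    private final Map<String, Runnable> actions; // token -> action hash table
    private final Consumer<String> typeText;     // sends dictated text as keystrokes

    ModeController(Map<String, Runnable> actions, Consumer<String> typeText) {
        this.actions = actions;
        this.typeText = typeText;
    }

    Mode mode() {
        return mode;
    }

    // Handles one recognized utterance according to the current mode.
    void onRecognized(String text) {
        if (mode == Mode.COMMAND) {
            if (text.equalsIgnoreCase("start dictation")) {
                mode = Mode.DICTATION;
            } else {
                Runnable action = actions.get(text);
                if (action != null) {
                    action.run();
                }
            }
        } else if (text.equalsIgnoreCase("stop dictation")) {
            mode = Mode.COMMAND;
        } else {
            typeText.accept(text); // in dictation mode everything else is text
        }
    }
}
```

In this design the phrase “Stop Dictation” itself can never be typed, which matches the limitation listed under Open Points.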

Algorithm Dictation

Changes in command and control:

    if ( Recognizer(spoken_word) = "Start Dictation" )
        call RecognizeDictation()
    else
        match spoken_word in the hash table and perform its action

RecognizeDictation():

    create an object of the Robot class from the java.awt package
    while (true)
        start recording
        word = Recognizer(spoken_word)
        if ( word = "Stop Dictation" )
            return
        for i = 0 to word.length - 1
            RobotObject.keyPress(key code of word.charAt(i))
            RobotObject.keyRelease(key code of word.charAt(i))
        end for
    end while
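A runnable Java version of the typing loop is sketched below. Like the algorithm, it uses `java.awt.Robot` for key injection, but it maps each character to a `KeyEvent` virtual key code (holding Shift for capitals) because `Robot.keyPress` expects key codes rather than ASCII values. It also adds a space after each word and skips characters without a simple key mapping; both are choices made for this sketch. `KeySink`, `RobotKeySink`, and `DictationTypist` are hypothetical names, and `KeySink` exists only so the logic can be exercised without a desktop.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.KeyEvent;

// Minimal abstraction over key events so the typing logic can run without a display.
interface KeySink {
    void press(int keyCode);
    void release(int keyCode);
}

// Sends key events to the focused window through java.awt.Robot (needs a desktop session).
class RobotKeySink implements KeySink {
    private final Robot robot;

    RobotKeySink() throws AWTException {
        robot = new Robot();
    }

    public void press(int keyCode) { robot.keyPress(keyCode); }

    public void release(int keyCode) { robot.keyRelease(keyCode); }
}

class DictationTypist {
    private final KeySink keys;

    DictationTypist(KeySink keys) {
        this.keys = keys;
    }

    // Maps a character to a KeyEvent virtual key code, or -1 if unsupported.
    // VK_A..VK_Z and VK_0..VK_9 have the same values as 'A'..'Z' and '0'..'9'.
    static int keyCodeFor(char c) {
        char upper = Character.toUpperCase(c);
        if (upper >= 'A' && upper <= 'Z') return KeyEvent.VK_A + (upper - 'A');
        if (c >= '0' && c <= '9') return KeyEvent.VK_0 + (c - '0');
        if (c == ' ') return KeyEvent.VK_SPACE;
        return -1;
    }

    // Types one recognized word, followed by a space, as key presses and releases.
    void typeWord(String word) {
        for (char c : (word + " ").toCharArray()) {
            int code = keyCodeFor(c);
            if (code < 0) continue; // no simple key for this character
            boolean shift = Character.isUpperCase(c);
            if (shift) keys.press(KeyEvent.VK_SHIFT);
            keys.press(code);
            keys.release(code);
            if (shift) keys.release(KeyEvent.VK_SHIFT);
        }
    }
}
```

On a desktop session, `new DictationTypist(new RobotKeySink()).typeWord(recognizedText)` would type into the focused window; creating a `Robot` in a headless environment throws `AWTException`.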

Open Points

Paragraph framing for training .wav files.

Modification of the dictation functionality, since “Stop Dictation” itself cannot be dictated.

Proper GUI creation with a logo and standard design.

Deployment with the existing system on CentOS.

Testing on CentOS.

Code cleanup.

Complete testing of command and control and dictation.

Documentation.