BENGALI SPEECH RECOGNITION, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, LEADING UNIVERSITY, SYLHET, 1st January 2013

BENGALI SPEECH RECOGNITION

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

LEADING UNIVERSITY SYLHET

1st January

2013

BENGALI SPEECH RECOGNITION

1st JANUARY 2013

This project report is submitted to the Department of Computer Science and Engineering, Leading University, in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering.

Supervised By

Mrs. Arpita Chakraborty

Assistant Professor

Department of Computer Science and Engineering

Leading University, Sylhet

&

Mrinal Kanti Dhar

Lecturer

Department of Electrical & Electronic Engineering

Leading University, Sylhet

Conducted By

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

LEADING UNIVERSITY SYLHET BANGLADESH

Shimul Dey
B.Sc. (Hon's) Final Semester Examination-2013
ID: 0901020032
Session: 2009-2013

Sanjoy Ranjan Das
B.Sc. (Hon's) Final Semester Examination-2013
ID: 0901020016
Session: 2009-2013

Md Badrul Alom Chowdhury
B.Sc. (Hon's) Final Semester Examination-2013
ID: 0901020004
Session: 2009-2013

To

The Head

Department of Computer Science and Engineering

Leading University, Sylhet, Bangladesh

Sub: Proposal for Project

Respected Sir,

We would like to inform you that we, students of your department, would like to carry out a project on "BENGALI SPEECH RECOGNITION".

We would be grateful if you would kindly allow us to proceed with the project on the above-mentioned topic in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering.

Thanking you,

Yours sincerely,

Name                        ID
Md Badrul Alom Chowdhury    0901020004
Sanjoy Ranjan Das           0901020016
Shimul Dey                  0901020032

DECLARATION

We hereby declare that the project work entitled "Bengali Speech Recognition", submitted to Leading University, is a record of original work done by us under the guidance of Arpita Chakraborty, Assistant Professor in the Department of Computer Science and Engineering, Leading University, and that this project work is submitted in fulfillment of the requirements for the degree of Bachelor in Computer Science & Engineering. The results of this project have not been submitted to any other university or institute for the award of any degree or diploma. Material drawn from the work of other researchers is acknowledged by reference.

Signature of Supervisor & Co-supervisor

Name of Supervisor                          Signature
Mrs. Arpita Chakraborty
Assistant Professor

Name of Co-supervisor                       Signature
Mrinal Kanti Dhar
Lecturer

Signature of Authors

Name of Authors                             Signature
Md Badrul Alom Chowdhury
Sanjoy Ranjan Das
Shimul Dey

ACKNOWLEDGEMENT

We would like to thank our honorable supervisor, Arpita Chakraborty, and co-supervisor, Mrinal Kanti Dhar, for their guidance throughout the process. They exposed us to the real professional research world with their precious experience, and we cherish the time spent working with them on such an interesting topic. We would also like to thank the students of our university who let us record their voices for the experiments, and our Computer Science & Engineering Department for giving us the authority and facilities to complete the project. Last but not least, thanks to the Almighty for helping us in every step of this project work.

Table of Contents

Declaration
Acknowledgement
List of Figures
List of Charts
List of Tables
List of Abbreviations & Symbols
Abstract
Literature Survey

Chapter 1: Introduction
1.1 Introduction
1.2 History of Speech Recognition
1.3 Types of Speech Recognition
1.3.1 Isolated Words
1.3.2 Connected Words
1.3.3 Continuous Speech
1.3.4 Spontaneous Speech
1.3.5 Speaker-dependent
1.3.6 Speaker-independent
1.3.7 Overview of Speech Recognition System
1.4 Terms and Concepts
1.4.1 Utterance
1.4.2 Pronunciation
1.4.3 Grammars
1.4.4 Vocabularies
1.4.5 Training
1.4.6 Accuracy
1.4.7 Language Dictionary
1.4.8 Filler Dictionary
1.4.9 Phone
1.4.10 HMM
1.4.11 Language Model
1.5 Overview of the Full System

Chapter 2: Methodology
2.1 Data Preparation
2.1.1 Corpus
2.1.2 Audio Files
2.1.3 Dictionary File
2.1.4 Phone File
2.1.5 Language Model File (.lm Format)
2.1.6 Language Model File (.DMP Format)
2.1.7 Transcription File
2.1.8 Fileids File
2.1.9 Filler File
2.2 Setting up the System Environment
2.2.1 Software Requirements
2.2.2 Trainer Setup
2.2.3 Project Folder Setup
2.2.4 Training the Acoustic Model
2.2.5 Testing Part
2.2.5.1 Testing with Pocket Sphinx
2.2.5.2 Testing with Sphinx4

Chapter 3: Testing and Performance Evaluation
3.1 Testing & Performance Evaluation
3.2 Test Results with Pocket Sphinx
3.3 Test Results with Sphinx4
3.3.1 Input Type: Microphone
3.3.2 Input Type: Audio

Chapter 4: Applications & Developing
4.1 Review of Some Developed Recognized Application
4.1.1 Dictation Application
4.1.2 Phonetic Translator
4.1.3 Training File Creator
4.1.4 Training File Creator

Chapter 5: Limitation & Future Work
5.1 Limitation
5.2 Future Work

Chapter 6: Conclusion & References
6.1 Conclusion
6.2 References

List of Figures

Fig. No.    Name of Figure
1.3.7       Overview of Speech Recognition System
1.4.10      Applying Hidden Markov Model on Speech Recognition
1.5         Overview of the Full System Model
2.1.2       Audio File Recording Format
2.2.5.1     Testing with Pocket Sphinx
2.2.5.2     Testing with Sphinx4
4.1.2       Dictionary files with phonetic translation
4.1.3.1     Fileids files with phonetic translation
4.1.4.2     Transcription File

List of Charts

Chart No.   Name of Chart
3.2.2       Experiment Results with Pocket Sphinx
3.3.1.2     Experimental Details with Results for Sphinx4 Live
3.3.2.2     Experimental Details with Results for Sphinx4 Audio

List of Tables

Table No.   Name of Table
1.2         History of Speech Recognition
2.2.3       Configuration of Sphinx-train.cfg
3.2.1       Experimental Details with Results for Pocket Sphinx
3.3.1.1     Test Results with Sphinx4, Input Type: Microphone
3.3.2.1     Test Results with Sphinx4, Input Type: Audio
7.1         Speaker Profiles
7.2         Unicode to IPA Chart
7.3         Corpus About University Admission Information

List of Abbreviations & Symbols

ASR       Automatic Speech Recognition
BSD       Berkeley Software Distribution
CMU       Carnegie Mellon University
HMM       Hidden Markov Model
IPA       International Phonetic Alphabet
PCA       Principal Component Analysis
ASCII     American Standard Code for Information Interchange
MERL      Mitsubishi Electric Research Laboratories
CRBLP     Center for Research on Bangla Language Processing
D2P       Dictionary to Pronunciation
SBSBSR    Sanjoy Bappy Shimul Bengali Speech Recognition
IDE       Integrated Development Environment
ABI       Allied Business Intelligence

ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that recognizes human speech with the help of the CMU Sphinx speech recognition API.

Literature Survey

Today speech technology plays an important role in many applications and has moved from research to commercial use. Many human-machine interfaces have been invented and applied, for example in telephone food ordering systems, airport information systems, ticketing systems, and restaurant reservation systems. This is one reason we selected this field for our project. In addition, most languages have a speech recognition system, but our mother tongue, Bangla, has no proper one; this is the main reason for selecting this topic. In the early period most research work was done using Artificial Neural Networks (ANN). Since we use an HMM-based technique, some HMM-based and related research is mentioned below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report over one week and acquired a lot of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for a clearly and smoothly written report; it is very helpful for new students who want to work in this field [9].

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned a lot about the types of speech recognition, the approaches to speech recognition, and so on [1].

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md Abul Hasnat, Jabir Mowla, and Mumit Khan, BRAC): We studied the past work, and to the best of our knowledge this is the first reported attempt to recognize Bangla speech using the HMM technique. We took most of our guidance on the steps to build a speech recognition system from this publication. From it we learned how to increase the quality of the input audio signal through noise elimination and an end detection algorithm, how the features of a sound are extracted and which parameters are stored in feature files, and the algorithm for creating HMM models [8].

Bengali segmented automated speech recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md Rezaul Karim, Md Shahidur Rahman, and Md Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S M Murtoza Habib, and Mumit Khan): From these we learned the techniques used to recognize letters, vowels, and consonants. Here we found the basic steps towards a recognizer and the common steps needed to build a fully functioning one [7].

Chapter 1

INTRODUCTION

Introduction

History of Speech Recognition

Types of Speech Recognition

Terms and Concepts

1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. It is a broad term: it covers systems that can recognize almost anybody's speech, and it is also known as computer speech recognition, meaning that the computer understands a voice and performs the required task. Put simply, speech recognition is the process of converting spoken input to text, and it is thus sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as "mouse up". We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying "Music". For this reason many scientists and researchers are working on speech recognition. Most of the languages of the world have speech recognizers of their own, but our mother tongue, Bengali, is not yet well served. A small amount of research has been carried out on Bengali speech recognition, but without great outcomes. Implementing a continuous speech recognizer for Bengali is our main goal throughout the project work. However, developing a full-blown continuous speech recognizer is a huge task for a short span of time. We have therefore chosen a domain-based continuous speech recognizer that covers a conversation on the university admission process.

Throughout the period of work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open-source software, distributed under a Berkeley Software Distribution (BSD) style license, and has high accuracy. Many high-quality and widely used commercial packages are available for this work, but they are costly. The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics, or accent [9].

1.2 History of Speech Recognition

Although AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

Speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency, and reducing operating costs. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, look for more and more companies to move toward speech recognition, and watch the industry take its place as a leader in the technology sector [1].

Year    Event
1936    AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970    The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971    DARPA established the Speech Understanding Research (SUR) program.
1982    Dragon Systems was founded.
1984    SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995    Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997    Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998    Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000    Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003    ScanSoft ships Dragon NaturallySpeaking 7 Medical, lowering healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be divided into classes according to the types of utterances they are able to recognize. The classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). "Isolated utterance" might be a better name for this class. Put simply, isolated words are single words such as "me", "you", "go", etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterance systems) are similar to isolated word systems but allow separate utterances to be run together with a minimal pause between them, such as "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent 2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his or her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his or her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.

1.3.7 Overview of Speech Recognition System

Fig. 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word", and "I'd like to run Microsoft Word" are all possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input, in other words a lack of silence, the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
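The start and end rules described above can be illustrated with a minimal Python sketch. It assumes that the audio has already been reduced to one energy value per frame; the function name `find_utterances`, the threshold, and the required number of silent frames are illustrative assumptions and are not taken from any speech engine.

```python
def find_utterances(energies, threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame index pairs for each utterance, end exclusive.

    An utterance starts at the first frame whose energy reaches the threshold
    and ends once `min_silence_frames` consecutive silent frames follow it.
    The threshold and frame count are illustrative values.
    """
    utterances = []
    start = None      # frame where the current utterance began
    silent_run = 0    # consecutive silent frames seen inside the utterance
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i             # lack of silence: utterance begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # enough silence: the utterance ended before the silent run
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # audio ended while an utterance was still open
        utterances.append((start, len(energies) - silent_run))
    return utterances


frames = [0, 0, 0.5, 0.6, 0, 0.7, 0, 0, 0, 0.4, 0.5, 0]
print(find_utterances(frames))
```

A short pause (a single silent frame in this example) does not end the utterance; only a silence of the required length does.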

1.4.2 Pronunciation

The word pronunciation comes up whenever one learns a language. In any language, pronunciation refers to the sounds that are produced to make meaning. Some aspects of speech go beyond the individual sounds and make a language unique: phrasing, stress, intonation, timing, and rhythm. The voice is then projected to communicate what the speaker wants to say. Add cultural nuances, gestures, and local expressions, and the way a person speaks immediately tells the people around them something about that person. When learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It may not seem fair, but people can be judged by the way they speak and may be seen as uneducated, incompetent, or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the engine uses to process a word is its pronunciation, which represents what the engine thinks the word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech recognition system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes", and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no", and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
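One common way to calculate accuracy at the word level is through the word edit distance between the reference text and the recognized text. The following Python sketch shows this calculation; it is one possible measure among several, the function names `word_errors` and `word_accuracy` are our own, and it is not necessarily the formula used by any particular toolkit.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the reference sentence into the hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = edits between the first i reference words
    # and the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match or substitution
    return dist[len(ref)][len(hyp)]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (word errors / number of reference words)."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / n


print(word_accuracy("i eat rice", "i rice"))
print(word_accuracy("nope", "no"))
```

Under this measure the "nope"/"no" case scores zero even though the dialog succeeded, which is why word-level accuracy and task-level success are usually reported separately.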

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines in which, for each observation in a sequence, there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after the voice is recorded it is divided into many frames, which must be processed in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model, we store the reference values of states, phonemes, and words in order to compare them with the observed data (the voice). By applying HMMs we construct a statistical model for each phone, whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and the previous state. The goal of a speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is complicated, we do not go into detail here; more can be found in Appendix A.
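Finding the most probable state sequence is classically done with the Viterbi algorithm. The sketch below is a textbook version in Python on a toy two-state model; the states, observation symbols, and every probability in it are invented for illustration and do not come from the models trained in this project.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, probability) of the most likely state sequence."""
    # best[t][s] = probability of the most likely path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # trace back from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (all numbers are made up): is each frame silence or speech?
STATES = ["sil", "speech"]
START_P = {"sil": 0.6, "speech": 0.4}
TRANS_P = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.4, "speech": 0.6}}
EMIT_P = {"sil": {"quiet": 0.5, "soft": 0.4, "loud": 0.1},
          "speech": {"quiet": 0.1, "soft": 0.3, "loud": 0.6}}

path, prob = viterbi(["quiet", "soft", "loud"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

For long observation sequences, practical decoders work with log probabilities instead of raw products to avoid numerical underflow; the sketch keeps raw probabilities for readability.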

Fig. 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words could follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
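As an illustration of the n-gram idea, the following Python sketch estimates bigram probabilities by simple counting. It is a simplified stand-in: real language modeling tools also apply smoothing and back-off, which are omitted here, and the two training sentences and the function names are placeholders rather than parts of the project corpus or of any toolkit.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams, adding <s> and </s> sentence markers."""
    unigrams = Counter()   # how often each word occurs as a history
    bigrams = Counter()    # how often each (previous, next) pair occurs
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


uni, bi = train_bigram(["i eat rice", "i eat fish"])
print(bigram_prob(uni, bi, "eat", "rice"))
print(bigram_prob(uni, bi, "rice", "fish"))
```

A word pair that never occurs in the training text gets probability zero here, which is how the model "strips" improbable continuations; smoothing in real toolkits softens this so that unseen pairs are penalized rather than forbidden.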

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to prepare some important files that are required for training and also for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means restricted to a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

Corpus
Audio files
Dictionary file
Phone file
Language Model file (.lm format)
Language Model file (.DMP format)
Transcription file
Fileids file
Filler file

2.1.1 Corpus

The corpus is a list of sentences that is used to train the language model. Put simply, the corpus is the collection of sentences that we want our machine to recognize. For our project we collected important sentences according to our domain. Some sentences from our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio for the corpus in .wav or .sph format. During the recording sessions the following parameters of the wave files were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)

Fig. 2.1.2: Audio File Recording Format

A 16 kHz sample rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample quantizes each sample to one of 65,536 (2^16) possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor and saved the sound files in .wav format. Each .wav file is named using the speaker ID and the sentence ID. For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID: 01, Sentence ID: 01

When we collected audio from a speaker, we also saved personal information about the speaker, such as:

Name, age, gender, and the environment in which the audio was collected

Some other information was also noted down:

Environmental conditions of the recording (for example, classroom conditions, number of students present, sources of noise such as a fan or generator)
Technical details of the device (PC, microphone)
Date and time of the recording
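The recording parameters and the naming scheme above can be checked automatically. The following Python sketch uses the standard `wave` module; the helper names `check_wav` and `parse_name` are our own illustrative choices, not part of the project tools, and the sketch assumes file names of the form speakerID_sentenceID.wav.

```python
import wave


def check_wav(path):
    """Return True if the file is 16 kHz, 16-bit, mono, as required above."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == 16000   # sampling rate in Hz
                and wav.getsampwidth() == 2   # 2 bytes = 16 bits per sample
                and wav.getnchannels() == 1)  # mono


def parse_name(filename):
    """Split a name such as '01_01.wav' into (speaker_id, sentence_id)."""
    stem = filename.rsplit(".", 1)[0]
    speaker_id, sentence_id = stem.split("_")
    return speaker_id, sentence_id


print(parse_name("01_01.wav"))
```

Running such a check over every recording before training helps catch files that were exported with the wrong sample rate or in stereo.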

2.1.3 Dictionary file

Put simply, the dictionary file is the list of words obtained from our corpus file, each followed by its pronunciation, such as AA P NA KE. For this work we need software that produces the pronunciation file from the dictionary file. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. We therefore developed our own software for the project, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words and has the extension .dic. The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I
The file format is .dic
The file encoding is UTF-8 without BOM
A word cannot be repeated
A blank line is required at the end of the file (i.e. an extra line)
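The rules in the note above can be checked mechanically. The following Python sketch is a hypothetical validator, not the D2P tool itself. It assumes that each line holds a word followed by its space-separated phonemes, and the Bengali spellings in the example are illustrative guesses rather than entries copied from sbsbsr.dic.

```python
def validate_dictionary(data):
    """Return a list of problems found in the raw bytes of a .dic file."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
    text = data.decode("utf-8")
    if not text.endswith("\n"):
        problems.append("file does not end with a newline")
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue                      # skip empty lines
        word, phones = fields[0], fields[1:]
        if word in seen:
            problems.append(f"line {number}: word is repeated")
        seen.add(word)
        if not phones:
            problems.append(f"line {number}: no phonemes given")
        elif any(p != p.upper() for p in phones):
            problems.append(f"line {number}: phonemes must be upper case")
    return problems


# Illustrative entries only; the word spellings are assumptions.
sample = "আপনি AA P N I\nআমি AA M I\n".encode("utf-8")
print(validate_dictionary(sample))
```

The check works on raw bytes rather than decoded text so that a byte order mark, which most editors hide, can be detected.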

2.1.4 Phone file

The phone file is the list of phonemes that occur in the words, such as the 4 phonemes in
"AA P N I". It is a simple text file that tells the trainer which phonemes are part of the
training set. The file has one phone per line, and no duplicates are allowed. It can be
generated with a small program written for this project, which takes the dic file as input and
gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of our phone file are

A
AA
B
BH
C
CCHA
CH
...

Note:

All phones are in capital letters

File format is .phone

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

All phonemes in the dic file are present in the phone file, without repetition
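The dic-to-phone conversion described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the program written for the project: the class name PhoneListBuilder is hypothetical, each dictionary line is assumed to have the form "WORD PH1 PH2 ..." separated by whitespace, and the words W1 and W2 in the demo are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of a dic-to-phone converter. */
class PhoneListBuilder {

    /**
     * Collects every phoneme used in the dictionary lines. SIL is always added.
     * The TreeSet removes duplicates and keeps the phones in alphabetical order.
     */
    static List<String> buildPhoneList(List<String> dicLines) {
        TreeSet<String> phones = new TreeSet<String>();
        phones.add("SIL");
        for (String line : dicLines) {
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the word itself; the phonemes start at index 1.
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i]);
            }
        }
        return new ArrayList<String>(phones);
    }

    public static void main(String[] args) {
        // W1 and W2 stand in for real dictionary words.
        List<String> dic = Arrays.asList("W1 AA P N I", "W2 AA M I");
        for (String phone : buildPhoneList(dic)) {
            System.out.println(phone);
        }
    }
}
```

Using a set makes the "no duplicates" rule hold by construction, and adding SIL up front covers the silence phoneme that never appears in the dictionary itself.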


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text, based on some training data.
The language model file is plain text in the commonly used ARPA format, which is standard in
speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first
field) and a back-off factor (the third field). To build this file we used the CMU lmtool, a
web-based tool that lets users quickly compile the text-based components needed by an ASR
decoder. The tool needs a corpus, which in this case means a set of sentences (or, more
precisely, utterances) that the recognition system is expected to handle. The corpus has to be
a text file with one sentence per line. Originally it had to be ASCII, but the newer version
also supports Unicode text. After uploading this file and clicking the compile button, the tool
returns a set of lexical (pronunciation dictionary) and language modeling files. We use only
the LM file, because the pronunciation dictionary is built as described above. The tool is best
suited to small domains. For our project the file name is sbsbsr.lm and the file format is .lm.
Some contents of our language model file are

-2.0719  -0.2626
-2.0719  -0.2861
-2.0719  -0.2973
-1.7709  -0.2936
-2.0719  -0.2626
-1.7709 এ -0.2861
-2.0719  -0.2626
-2.0719 ও -0.2973
...
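To make the three fields of a 1-gram line concrete, here is a small sketch that reads one line and converts the first field back to a probability. The class name ArpaUnigram and the word "w" in the demo are hypothetical. The sketch relies on the ARPA convention that both numbers are base-10 logarithms.

```java
/** Hypothetical reader for one 1-gram line of an ARPA file: "logProb word backoff". */
class ArpaUnigram {
    final double logProb;  // first field: log10 probability of the word
    final String word;     // second field: the word itself
    final double backoff;  // third field: log10 back-off weight

    ArpaUnigram(double logProb, String word, double backoff) {
        this.logProb = logProb;
        this.word = word;
        this.backoff = backoff;
    }

    /** Splits the line on whitespace and expects exactly three fields. */
    static ArpaUnigram parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but found " + f.length);
        }
        return new ArpaUnigram(Double.parseDouble(f[0]), f[1], Double.parseDouble(f[2]));
    }

    /** Converts the log10 value back to a plain probability. */
    double probability() {
        return Math.pow(10.0, logProb);
    }

    public static void main(String[] args) {
        ArpaUnigram u = ArpaUnigram.parse("-2.0719 w -0.2626");
        System.out.println(u.word + " has probability " + u.probability());
    }
}
```

A value of -2.0719 corresponds to a probability a little below 0.01, which is plausible for a single word in a vocabulary of about a hundred words.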

2.1.6 Language Model file (DMP format)

We also need the language model in DMP format for Sphinx4. We used a Linux environment to
convert the .lm file into DMP format with the following terminal command:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the language model file in .lm format and model.DMP is the language model file
in DMP format. For our project the command is

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this
file each utterance is written exactly as it was recorded, enclosed in the silence tags
(starting tag <s>, ending tag </s>) and followed by the file ID of the utterance. This file is
known as the transcription file. There are two transcription files: one is used to train the
system and the other to test it. The files are named after the project. Our project is named
"sbsbsr", so the training file is sbsbsr_train.transcription and the test file is
sbsbsr_test.transcription. Some contents of our transcription file are

<s> ... </s> (01_01)
<s> ... </s> (01_02)
<s> ... </s> (01_03)
<s> ... </s> (01_04)
...

Note:

File format is .transcription

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
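A transcription line can be assembled from a sentence and its file ID as sketched below. This is an illustration only: the class name TranscriptBuilder is hypothetical, the demo sentences are placeholders, and it assumes that the sentence ID is the 1-based position of the sentence in the corpus, written with two digits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: builds transcription lines for one speaker. */
class TranscriptBuilder {

    /** Wraps one sentence in the start and end tags and appends the file ID in parentheses. */
    static String line(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    /**
     * Assumption: sentence IDs are the 1-based corpus position with two digits,
     * so each file ID matches the SpeakerId_SentenceId audio file name.
     */
    static List<String> forSpeaker(String speakerId, List<String> sentences) {
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < sentences.size(); i++) {
            String fileId = String.format(Locale.ROOT, "%s_%02d", speakerId, i + 1);
            lines.add(line(sentences.get(i), fileId));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder sentences; the real corpus is in Bengali.
        for (String l : forSpeaker("01", Arrays.asList("FIRST SENTENCE", "SECOND SENTENCE"))) {
            System.out.println(l);
        }
    }
}
```

Generating the file ID from the same counter that names the audio files keeps the transcription and the recordings aligned.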

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the .wav or .sph extension.
There are two fileids files, one for training and one for testing. For our project the training
file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
...
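Turning an audio file path into a fileids entry amounts to dropping the extension. The sketch below shows one way to do it. The class name FileIdEntry and the example path are illustrative, and converting backslashes to forward slashes is an assumption made so that paths collected on Windows can be used by the trainer on Linux.

```java
/** Hypothetical sketch: converts a relative audio path into a fileids entry. */
class FileIdEntry {

    /** Normalizes separators to forward slashes and strips a trailing .wav or .sph. */
    static String toFileId(String relativeAudioPath) {
        String normalized = relativeAudioPath.replace('\\', '/');
        if (normalized.endsWith(".wav") || normalized.endsWith(".sph")) {
            // Both extensions are four characters long, including the dot.
            normalized = normalized.substring(0, normalized.length() - 4);
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(toFileId("sbsbsr_train/sanjoy/sanjoy1/01_01.wav"));
    }
}
```

Checking the suffix explicitly avoids removing ".wav" from the middle of a folder name.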

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the
recording database. It is a dictionary in which non-speech sounds are mapped to corresponding
non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and must be present in the
filler dictionary. At least one of them must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU
Sphinx we need two software packages, sphinxbase and sphinxtrain, which we downloaded from the
CMU Sphinx web site. Before installing them, some dependencies have to be installed on the
Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two packages, sphinxbase and sphinxtrain. After
downloading them we decompressed them into a folder in Linux. Any folder can be used; we used
the Linux root folder. After decompressing, the packages can be installed with the following
terminal commands [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the training environment in place, we created a project folder in which sphinxtrain
writes the trained files, i.e. the acoustic model. First we go to the root directory that
contains the installed sphinxbase and sphinxtrain folders and create a folder there. We named
it "sbsbsr". We then open a terminal, change into the sbsbsr folder, and create a project task
for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and
"model_parameters".

Next we copy in the files created in the data preparation part: the dic, filler, phone,
transcript, fileids, lm and lm.DMP files go into the "etc" folder, and the collected audio goes
into the "wav" folder. Finally, some training parameters have to be changed in the
sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task
is set up. The parameters we changed are listed below.

Parameter                   Value Before         Value After
$CFG_WAVFILE_EXTENSION      sph                  wav
$CFG_WAVFILE_TYPE           nist / mswav / raw   raw
$CFG_FEATURE                2s_c_d_dd            1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                    1
$CFG_STATESPERHMM           6                    3
$CFG_N_TIED_STATES          100                  100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is complete.

2.2.4 Training the Acoustic Model

The first step of training is to convert the collected raw speech audio into .mfc feature
files. To do this we opened the project directory in the Linux terminal and ran the feature
extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all .wav files into .mfc files in the "feat" directory under the project
folder "sbsbsr".

We are now ready to run the main training command, which creates the acoustic model. Staying in
the project directory, we execute

perl scripts_pl/RunAll.pl

This trains the acoustic model. The model files are placed in the "model_parameters" directory
under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4.
Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download Pocketsphinx from the CMU Sphinx site and extract it in the root directory
where sphinxbase and sphinxtrain are installed. We then install it with the following commands
in the Linux terminal:

./configure
make

After installation we go into our project folder and run a command that sets up the folder
structure:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx takes audio files as
its input here, and copy the test fileids and transcript files into the "etc" folder. Decoding
with Pocketsphinx requires feature files for the audio, which we create with

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we run the main decoding (testing) command:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The test
results are written to the "result" folder under the project folder. For our model trained on
fifty sentences, the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java
library to test our trained model on the Windows 7 operating system. Two pieces of software are
needed: Sphinx4 and the Eclipse IDE. After installing Eclipse on Windows, we download Sphinx4
from the CMU Sphinx site and extract the zip file anywhere on the machine. We then create a new
Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx.
After that we add the following files from our training project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

dic

lm.DMP

filler

We create a config.xml file in the Eclipse project to configure the recognizer and tell it
where the required model files are located. It can be written with the help of the config file
in the Sphinx4 demo application. We also add four Java jar files from the sphinx4/lib directory
to our project: js.jar, jsapi.jar, sphinx4.jar and tags.jar. The Java project is then ready to
build and run. Once built, we run the project and test it with live voice input from a
microphone. For our model trained on ten sentences, the result is

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built with Sphinx4 in Java. We built some applications using
Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university
lab room and a common room. Testing was completed using audio inputs from six test speakers.
For live testing we used a microphone in different environments [9]. We completed our tests
using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All experiments use the trained data set.

Experiment  Speakers (Male/Female)  Total Words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)
01          5 (4/1)                 1025         975      98      95.12                9.56       90.44
02          3 (3/0)                 615          591      39      96.10                6.34       93.66
03          3 (3/0)                 615          601      22      97.72                3.58       96.42
04          5 (2/3)                 1025         938      184     91.51                17.95      82.05
05          3 (0/3)                 615          545      125     88.62                20.33      79.67

Table 3.2.1 Experimental details with Results for Pocket Sphinx
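The percentages in this table are consistent with three simple relations: percent correct is the share of correct words, error is the share of errors, and accuracy is 100 minus the error. These relations are inferred from the numbers, not stated in the text, and the class name RecognitionScore is hypothetical. The sketch below reproduces the figures of Experiment 01.

```java
import java.util.Locale;

/** Hypothetical sketch of the scoring relations inferred from the Pocket Sphinx results. */
class RecognitionScore {

    /** Share of reference words that were recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correct) {
        return 100.0 * correct / totalWords;
    }

    /** Number of errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy is taken as 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.println(String.format(Locale.ROOT, "%.2f", percentCorrect(1025, 975))); // 95.12
        System.out.println(String.format(Locale.ROOT, "%.2f", errorRate(1025, 98)));       // 9.56
        System.out.println(String.format(Locale.ROOT, "%.2f", accuracy(1025, 98)));        // 90.44
    }
}
```

Because the error count can exceed the number of incorrect words (for example when extra words are inserted), accuracy can be lower than percent correct, as it is in every row of the table.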

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device in all experiments is a microphone.

Experiment  User Type  Speakers  Environment        Speaker Type  Words  Correct  Errors  Percent correct (%)  Errors (%)  Accuracy (%)
01          Trained    2         Closed Room        Male          156    154      2       98                   2           98
02          Untrained  3         Lab Room           Male          130    120      10      90                   10          90
03          Untrained  3         University Campus  Male          140    122      18      82                   18          82
04          Untrained  2         University Campus  Female        108    95       13      87                   13          87
05          Trained    3         University floor   Female        126    115      11      89                   11          89
06          Trained    2         Closed Room        Male          128    127      1       99                   1           99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All experiments use the trained data set.

Experiment  Speakers (Male/Female)  Total Words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)
01          3 (3/0)                 210          194      16      85.71                15.71      85.71
02          3 (2/1)                 210          188      22      77.14                22         77.14
03          3 (2/1)                 210          182      28      72.38                28         72.38
04          3 (2/1)                 210          194      16      84.44                16         84.44
05          3 (1/2)                 210          184      26      74.76                26         74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training
file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is
to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a file called dic, which holds all training words together
with their pronunciations. While training acoustic models experimentally we had to write these
pronunciations out several times, and over time this led us to think about an automatic
pronunciation (phonetic translation) generator. This software is the implementation of that
idea. First we built a database that stores all phonemes and their corresponding letters,
drawing on the IPA chart and various thesis papers [1] [9] [10]. However, different sources
define phonemes in different ways, and not every letter has a defined phoneme, so we defined
phonemes ourselves for some consonant and conjunct letters. We then built the phonetic
translator, which produces the phonetic translation of Bengali words using this database.

Fig 4.1.2 Dictionary files with phonetic translation
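The database lookup at the heart of such a translator can be illustrated with a small table. This sketch is not the project's translator: the class name LetterToPhoneme is hypothetical, the table holds only seven consonants taken from the chart in the appendix, and it ignores vowels, vowel signs and conjunct letters, which a real translator has to handle. The Bengali letters are written as Unicode escapes so that the source file stays plain ASCII.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: letter-by-letter lookup in a small letter-to-phoneme table. */
class LetterToPhoneme {
    private static final Map<Character, String> TABLE = new HashMap<Character, String>();
    static {
        // A handful of consonants from the appendix chart; not the full database.
        TABLE.put('\u0995', "K");
        TABLE.put('\u0996', "KH");
        TABLE.put('\u0997', "G");
        TABLE.put('\u09A8', "N");
        TABLE.put('\u09AA', "P");
        TABLE.put('\u09AE', "M");
        TABLE.put('\u09B2', "L");
    }

    /** Maps each letter to its phoneme and joins the phonemes with single spaces. */
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char letter = word.charAt(i);
            String phoneme = TABLE.get(letter);
            if (phoneme == null) {
                throw new IllegalArgumentException(
                        "No phoneme defined for U+" + Integer.toHexString(letter).toUpperCase());
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(phoneme);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The three letters ka, la, ma in sequence.
        System.out.println(translate("\u0995\u09B2\u09AE"));
    }
}
```

Failing loudly on an unknown letter makes gaps in the database visible instead of silently producing an incomplete pronunciation.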

4.1.3 Training File Creator

Training the acoustic model also requires the fileids and transcript files. These two files
contain the training audio file paths and their corresponding sentences. Before we wrote this
program we had to create the two files manually. With it we can generate both large files
automatically within moments. For example, with 8000 audio files we would have to write 8000
audio file paths in the fileids file and 8000 audio file names with their corresponding
sentences in the transcript file. Now both files are created automatically when we give the
software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, it performs
some common tasks such as opening My Computer, left click and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system
was built on a small amount of data. We selected a domain, university admission information for
newly arriving students, with 185 sentences. However, it was difficult to collect a large
amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185
sentences for training. More speakers are needed to get more accurate results. We also faced
problems in creating the dictionary file, because the Bengali phoneme list is not precisely
defined and we do not know the exact number of phonemes in Bengali: different researchers
report different numbers. The performance of the system depends on the speaker's
pronunciation, the environment and the microphone. It recognizes sentences accurately when the
speaker says them loudly and clearly, and sometimes fails to recognize them because of slow
speech or pronunciation problems. We created a program to generate the pronunciation of a
Bengali word automatically, but it does not yet work properly because of an encoding problem.
Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good
output from this software once we have an accurate phoneme inventory.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will
increase the amount of data to create a complete model, and we plan to improve its ability to
recognize speech accurately and to enlarge its vocabulary. We have already developed software
for creating the dictionary, fileids and transcription files. We want to build a
user-friendly, stand-alone GUI application for writing in Bengali, and we also intend to
develop a complete desktop voice command application for Bengali. At present, training and
creating a model still depend on a developer, so we plan to build an automatic trainer that an
ordinary user can operate. With it, a user will be able to train any sentence from its
corresponding audio. We also want to integrate this system into various document applications,
so that a Bengali sentence can be written just by uttering it. We want to build a voice
response application, in which the user asks the software a question and the software gives the
predefined answer to that question. Building a good recognition application requires a lot of
audio, so we want to develop a website for collecting audio from people and use that audio to
enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of
research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese
and other languages, but work on our mother tongue, Bengali, is still at a beginner level in
this field. We therefore tried to learn about this field and to develop some tools to recognize
the Bengali language. Throughout this report we have discussed our objectives, the various
tools we used and the process of speech recognition. Our tools are still at a preliminary
level. A good and complete recognition application requires many improvements, such as a large
training database and audio from many speakers with low noise. No speech recognizer is yet
100% accurate, but if we can meet the requirements of a good recognizer and train our system
more accurately, the results of the system will be sufficient to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS)
International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition
with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information
Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing,
Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition",
BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition
of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech
Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla",
BRACU, August 2010.

[10] Qqbal S. O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic
Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous
Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and
Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide",
1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using
CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used
only 50 of them for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()

int lineCounts = 0

String dirTreeRootName = Ctrainnign_filesbs_asr_train

String root = getRootFromPath(dirTreeRootName)

listdir(dirTreeRootName1)

Collectionssort(dirTreeLevel1)

int sizeOfdirTreeLevel1 = dirTreeLevel1size()

int i = 0j=0k=0

while(sizeOfdirTreeLevel1gti)

String path2 = dirTreeRootName+dirTreeLevel1get(i)

dirTreeLevel2clear()

listdir(path22)

dirTreeLevel2 = sortObjsortList(dirTreeLevel2)

int sizeOfdirTreeLevel2 = dirTreeLevel2size()

String path3=

while(sizeOfdirTreeLevel2gtj)

path3 = path2++dirTreeLevel2get(j)+

listdir(path33)

while(dirTreeLevel3size()gtk)

String targetPath =

root++dirTreeLevel1get(i)++dirTreeLevel2get(j)++dirTreeLevel3get(k)

Bengali Speech Recognition

69

targetPath = targetPathreplaceAll(wav )

writIntoFile(targetPathCtrainnign_filesbs_asr_train+sbs_asr_trainfileids)

lineCounts++

k++

j++

file listing end

j=0

i++

transcriptCreator tcObj = new transcriptCreator()

try

tcObjreadCorpus(Ctrainnign_filecorpustxt)

tcObjCreateTransCriptFile(dirTreeLevel3

Ctrainnign_filesbs_asr_trainsbs_asr_traintranscript)

catch (FileNotFoundException e)

eprintStackTrace()

int si=0

// Lists the entries of `path`. Directory names are stored in dirTreeLevel1 or
// dirTreeLevel2 (depending on Level); plain files are stored in dirTreeLevel3.
public static void listdir(String path, int Level) {
    File folder = new File(path);
    File[] listOfFiles = folder.listFiles();
    int numofL_I = 0;
    int numOfL = listOfFiles.length;
    while (numOfL > numofL_I) {
        if (listOfFiles[numofL_I].isDirectory()) {
            if (Level == 1) {
                dirTreeLevel1.add(listOfFiles[numofL_I].getName());
            } else if (Level == 2) {
                dirTreeLevel2.add(listOfFiles[numofL_I].getName());
            }
        } else {
            if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
        }
        numofL_I++;
    }
}

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

// Appends one line of text to the file at `path`.
public static void writIntoFile(String data, String path) {
    try {
        FileWriter fstream = new FileWriter(path, true);
        BufferedWriter out = new BufferedWriter(fstream);
        out.write(data + "\n");
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
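The fileids entries built above can be illustrated with a small pure function. This is only a sketch under assumptions: the class and method names are hypothetical, the `/` separator is assumed (the separators in the listing were lost in extraction), and the folder names in the usage note are made up. It strips a literal `.wav` suffix, whereas a regular expression such as `.wav` also matches any character followed by `wav`.

```java
final class FileIdsSketch {

    // Builds one fileids entry: root/level1/level2/name, without the ".wav" suffix.
    // The "/" separator is an assumption made for this sketch.
    static String toFileId(String rootName, String level1, String level2, String wavName) {
        String base = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - ".wav".length())
                : wavName;
        return rootName + "/" + level1 + "/" + level2 + "/" + base;
    }
}
```

For example, `FileIdsSketch.toFileId("sbs_asr_train", "speaker_1", "s1", "file_1.wav")` yields `sbs_asr_train/speaker_1/s1/file_1`.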

........................................

ARRAY

........................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names of the form "<prefix><number>" by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of each name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Take the non-numeric prefix from the first name and rebuild every name from it.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
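The same numeric ordering can also be expressed with a comparator. The following is a minimal sketch, not the project's code; the class and method names are hypothetical. It sorts the original names by the number they contain instead of rebuilding them from a prefix, so names with different prefixes or leading zeros are returned unchanged. It assumes the digits of each name fit in an `int`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NumericSuffixSort {

    // Returns the digits contained in a name as an int; names without digits count as 0.
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the numeric part; the input list is not modified.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }
}
```

A plain alphabetical sort would place `s10` before `s2`; the comparator above orders `s1`, `s2`, `s10`.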

........................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator

static ArrayListltObjectgt inputLines=new ArrayListltObjectgt()

// Reads the corpus file line by line into inputLines.
public void readCorpus(String path) throws FileNotFoundException {
    FileInputStream fstream = new FileInputStream(path);
    DataInputStream in = new DataInputStream(fstream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    try {
        while ((strLine = br.readLine()) != null) {
            inputLines.add(strLine);
        }
        in.close();
    } catch (Exception e) { // Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
    int inputLineSize = inputLines.size();
    int i = 0;
    while (inputLineSize > i) {
        i++;
    }
}

// Writes one transcript line per audio file: "<S> sentence </S> (fileid)".
// Corpus lines are reused from the start when there are more files than sentences.
public static void CreateTransCriptFile(List<String> data, String path) {
    try {
        FileWriter fstream = new FileWriter(path, true);
        int numOfwriting = data.size();
        int i = 0;
        int lineStart = 0;
        int lineEnd = inputLines.size();
        BufferedWriter out = new BufferedWriter(fstream);
        while (numOfwriting > i) {
            if (lineStart == lineEnd) lineStart = 0;
            String wavRemove = data.get(i).toString();
            wavRemove = wavRemove.replaceAll(".wav", "");
            String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
            leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
            String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
            out.write(pattern + "\n");
            lineStart++;
            i++;
        }
        // Close the output stream
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
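The shape of a single transcript line can be shown in isolation. This is a sketch only; the class name, method name and the romanized sample sentence in the usage note are hypothetical. It writes the sentence markers in lowercase, as in the CMU Sphinx training tutorial, while the listing above appears to use uppercase markers.

```java
final class TranscriptLineSketch {

    // Formats one transcript entry: "<s> sentence </s> (fileid)".
    static String transcriptLine(String sentence, String wavName) {
        String fileId = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - ".wav".length())
                : wavName;
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }
}
```

For example, `TranscriptLineSketch.transcriptLine(" ami bhat khai ", "file_1.wav")` gives `<s> ami bhat khai </s> (file_1)`.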

........................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator

SuppressWarnings(null)


public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

........................................

PHONETIC TRANSLATION

package ptpack

import javautilArrayList

public class PhoneticTranslation


public static void main(String[] args)

PronounciationGenarator pgObj = new PronounciationGenarator()

FileOperator foObj = new FileOperator()

ArrayListltObjectgt inputStrings = new ArrayListltObjectgt()

inputStrings = foObjgetStrings()

String pro =

Systemoutprintln(in phonetic translation)

int i = 0

String is =

String fileImage =

while(inputStringssize()gti)

is = inputStringsget(i)toString()trim()

pro = pgObjgetPronouciation(is)

pro = protrim()

fileImage = fileImage+is+ +pro+n

i++

Systemoutprintln(fileImage)

foObjcreateFile(fileImage)

main end

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)


eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try


conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())


return isBBorno

class end
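The consonant check above needs one database query per character. As an illustration only, the same question can be answered from Unicode code points without a database. The sketch below is an assumption-laden alternative, not the project's method: the class name is hypothetical, and the code-point ranges follow the Unicode Bengali block, which may not match the id range 13 to 48 of the `banglatab` table exactly.

```java
final class BanglaLetters {

    // True for Bengali consonant letters: U+0995 (ka) to U+09B9 (ha), skipping the
    // unassigned code points in that range, plus khanda ta (U+09CE), rra (U+09DC),
    // rha (U+09DD) and yya (U+09DF).
    static boolean isConsonant(char ch) {
        boolean inMainRange = ch >= '\u0995' && ch <= '\u09B9' && Character.isLetter(ch);
        return inMainRange || ch == '\u09CE' || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }
}
```

Vowels, vowel signs and the hasanta fall outside these ranges, so `isConsonant` returns false for them.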

........................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct


i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2


SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null


Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans
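The conjunct detection in the listing above can be sketched more compactly. The sketch assumes that the conjunct marker (`ConjuctsIdentifyCharacter`, whose value was lost in extraction) is the Bengali hasanta, U+09CD; the class and method names are hypothetical. It only groups characters into lookup units and does not reproduce the special handling of র or the insertion of the inherent vowel অ.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {

    // Assumed conjunct marker: the Bengali hasanta (U+09CD).
    static final char HASANTA = '\u09CD';

    // Splits a word into lookup units; letters joined by a hasanta stay in one unit.
    static List<String> units(String word) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // Absorb every "hasanta + following letter" pair into the current unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            out.add(unit.toString());
        }
        return out;
    }
}
```

For the word শক্তি this yields three units: শ, ক্ত and the vowel sign ি. Each unit could then be looked up in the pronunciation table with a single query.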

........................................

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else


Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
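Most methods in the listing above repeat the same press and release pattern. One possible way to factor it out is sketched below. The `KeyChord` class and its `Keys` interface are hypothetical names introduced for this sketch; they are not part of `java.awt`. The helper presses the keys in order and releases them in reverse order, which differs slightly from the release order used in the listing.

```java
final class KeyChord {

    // Minimal key sink. A java.awt.Robot can be adapted to it by forwarding
    // press to robot.keyPress and release to robot.keyRelease.
    interface Keys {
        void press(int keyCode);
        void release(int keyCode);
    }

    // Presses all keys in the given order, then releases them in reverse order.
    static void tap(Keys keys, int... keyCodes) {
        for (int code : keyCodes) {
            keys.press(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            keys.release(keyCodes[i]);
        }
    }
}
```

With such a helper, a shortcut like copy would reduce to one call with `KeyEvent.VK_CONTROL` and `KeyEvent.VK_C`, and the key sink can be replaced by a recording stub when testing without a display.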

SBSBSRJAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))


allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )


private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()
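The `giveCommand` method compares the recognized text with one command phrase at a time. A table of phrases and actions is one way to keep this manageable as the command set grows. The sketch below is illustrative only: `CommandDispatcher` is a hypothetical class, and the Bengali command phrases of the original were lost in extraction, so the usage note uses an English placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CommandDispatcher {

    private final Map<String, Runnable> commands = new LinkedHashMap<>();

    // Registers an action for a spoken phrase (surrounding whitespace is ignored).
    void register(String phrase, Runnable action) {
        commands.put(phrase.trim(), action);
    }

    // Runs the action for the recognized text; returns false if no command matches.
    boolean dispatch(String recognizedText) {
        String key = recognizedText == null ? "" : recognizedText.trim();
        Runnable action = commands.get(key);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

A phrase would be registered once, for example `register("right click", ...)`, and every recognition result passed to `dispatch`. Because the `CommandActivator` methods declare `AWTException`, each call would need to be wrapped in a try and catch block inside the `Runnable`.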

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------



DECLARATION

We hereby declare that the project work entitled "Bengali Speech Recognition", submitted to Leading University, is a record of original work done by us under the guidance of Arpita Chakraborty, Assistant Professor in the Department of Computer Science and Engineering, Leading University. This project work is submitted in fulfillment of the requirements for the degree of Bachelor of Science in Computer Science & Engineering. The results of this project have not been submitted to any other university or institute for the award of any degree or diploma. Work by other researchers is acknowledged by reference.

Signature of Supervisor & Co-supervisor

Name of Supervisor Signature

Mrs Arpita Chakraborty

Assistant Professor

Name of Co-supervisor Signature

Mrinal Kanti Dhar

Lecturer

Signature of Authors

Name of Authors Signature

Md Badrul Alom Chowdhury

Sanjoy Ranjan Das

Shimul Dey


ACKNOWLEDGEMENT

We would like to thank our honorable supervisor, Arpita Chakraborty, and co-supervisor, Mrinal Kanti Dhar, for their guidance throughout the process. They exposed us to the professional research world through their valuable experience, and we cherish the time spent working with them on such an interesting topic. We would also like to thank the students of our university for letting us record their voices for our experiments, and the Department of Computer Science & Engineering for giving us the authority and facilities to complete the project. Last but not least, we thank the Almighty for helping us at every step of this project work.


Table of Contents

Declaration
Acknowledgments
List of Figures
List of Charts
List of Tables
List of Abbreviations & Symbols
Abstract
Literature Survey

Chapter 1 Introduction
1.1 Introduction
1.2 History of Speech Recognition
1.3 Types of Speech Recognition
1.3.1 Isolated Words
1.3.2 Connected Words
1.3.3 Continuous Words
1.3.4 Spontaneous Words
1.3.5 Speaker Dependent
1.3.6 Speaker Independent
1.3.7 Overview of Speech Recognition System
1.4 Terms and Concepts
1.4.1 Utterance
1.4.2 Pronunciation
1.4.3 Grammars
1.4.4 Vocabularies
1.4.5 Training
1.4.6 Accuracy
1.4.7 Language Dictionary
1.4.8 Filler Dictionary
1.4.9 Phone
1.4.10 HMM
1.4.11 Language Model
1.5 Overview of the Full System

Chapter 2 Methodology
2.1 Data Preparation
2.1.1 Corpus
2.1.2 Audio Files
2.1.3 Dictionary Files
2.1.4 Phone File
2.1.5 Language Model File (lm Format)
2.1.6 Language Model File (DMP Format)
2.1.7 Transcription File
2.1.8 Fileids File
2.1.9 Filler File
2.2 Setting up the System Environment
2.2.1 Software Requirements
2.2.2 Trainer Setup
2.2.3 Project Folder Setup
2.2.4 Training the Acoustic Model
2.2.5 Testing Part
2.2.5.1 Testing with Pocket Sphinx
2.2.5.2 Testing with Sphinx4

Chapter 3 Testing and Performance Evaluation
3.1 Testing & Performance Evaluation
3.2 Test Results with Pocket Sphinx
3.3 Test Results with Sphinx4
3.3.1 Input Type: Microphone
3.3.2 Input Type: Audio

Chapter 4 Applications & Developing
4.1 Review of Some Developed Recognized Applications
4.1.1 Dictation Application
4.1.2 Phonetic Translator
4.1.3 Training File Creator
4.1.4 Training File Creator

Chapter 5 Limitation & Future Work
5.1 Limitation
5.2 Future Work

Chapter 6 Conclusion & References
6.1 Conclusion
6.2 References


List of Figures

Fig. No.   Name of figure
1.3.7      Overview of Speech Recognition System
1.4.10     Applying Hidden Markov Model on Speech Recognition
1.5        Overview of the Full System Model
2.1.2      Audio File Recording Format
2.2.5.1    Testing with Pocket Sphinx
2.2.5.2    Testing with Sphinx4
4.1.2      Dictionary Files with Phonetic Translation
4.1.3.1    Fileids Files with Phonetic Translation
4.1.4.2    Transcription File

List of Charts

Chart No.  Name of chart
3.2.2      Experiment Results with Pocket Sphinx
3.3.1.2    Experimental Details with Results for Sphinx 4 Live
3.3.2.2    Experimental Details with Results for Sphinx 4 Audio

List of Tables

Table No.  Name of table
1.2        History of Speech Recognition
2.2.3      Configuration of Sphinx-train.cfg
3.2.1      Experimental Details with Results for Pocket Sphinx
3.3.1.1    Test Results with Sphinx4, Input Type: Microphone
3.3.2.1    Test Results with Sphinx4, Input Type: Audio
7.1        Speaker Profiles
7.2        Unicode to IPA Chart
7.3        Corpus About University Admission Information

List of Abbreviations & Symbols

ASR Automatic Speech recognition

BSD Berkeley Software Distribution

CMU Carnegie Mellon University

HMM Hidden Markov Model

IPA International Phonetic Alphabet

PCA Principal Component Analysis

ASCII American Standard Code for Information Interchange

MERL Mitsubishi Electric Research Labs

CRBLP Center for Research Bangla Language Processing

D2P Dictionary to pronunciation

SBSBSR Sanjoy Bappy Shimul Bengali Speech Recognition

IDE Integrated Development Environment

ABI Allied Business Intelligence


ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another, but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that identifies human speech with the help of the CMU Sphinx speech recognition API.


Literature Survey

Today speech technology plays an important role in many applications Speech

technology has moved from research to commercial application Many human machine

interfaces have been invented and applied today in telephone food ordering system airport

information system ticketing system restaurant reservation system etc As a result we have

selected this important field for our project On the other hand most of the languages have a

speech recognition system but our mother tongue Bangla has no proper speech recognition

system this is the main reasons to select this topics At the starting era most of the research

works are done by using Artificial Neural Network (ANN) but as we are using HMM

based technique so some HMM based and related research are mentioned below

Implementation of Speech Recognition System for Bangla (Shammur Absar

Chowdhury-August 2010) We have studied this thesis report within one week and acquire lot of

knowledge about Speech Recognition We are really very thankful to Shammur Absar

Chowdhury for writing his thesis report easily and smoothly it is very helpful for new students

who want to work in these fields [9]

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned a lot about the types of speech recognition, the approaches to speech recognition, etc. [1]

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC): We studied the past work and, to the best of our knowledge, this is the first reported attempt to recognize Bangla speech using the HMM technique. We took most of our guidance on the steps for building a speech recognition system from this publication. From it we learned how to improve the quality of the input audio signal through noise elimination and an end-point detection algorithm, how the features of a sound are extracted, which parameters are stored in the feature files, and the algorithm for creating HMM models [8].

Bengali segmented automated speech recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan): From these works we learned the techniques used to recognize letters, vowels and consonants. They showed us the basic steps towards a recognizer and the common steps for building a fully functioning one [7].


Chapter 1

INTRODUCTION

INTRODUCTION

HISTORY OF SPEECH RECOGNITION

TYPES OF SPEECH RECOGNITION

TERMS AND CONCEPTS


1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. In its broad sense the term covers systems that can recognize almost anybody's speech. It is also known as computer speech recognition: the computer understands a voice and performs the required task. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as “mouse up”, control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying “Music”.

For this reason many scientists and researchers are working on speech recognition. Most widely spoken languages have speech recognizers of their own, but our mother tongue, Bengali, does not. A few small research works have been carried out on Bengali speech recognition, but without notable results. Implementing a continuous speech recognizer for Bengali is the main goal of this project. Developing a full-blown continuous speech recognizer, however, is a huge task for a short span of time, so we selected a domain-based continuous speech recognizer that covers a conversation about the university admission process.

Throughout the work we studied different tools and chose CMU Sphinx4 as the speech recognition API because it is open source software and has high accuracy. Many high-quality and widely used software packages are available for this work, but they are costly and require commercial licenses, whereas CMU Sphinx is distributed under a BSD-style license. The ultimate goal of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

While AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that used discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

According to industry figures, speech recognition is used in a wide range of applications because it satisfies the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, more and more companies are expected to move toward speech recognition, and the industry is expected to take its place as a leader in the technology sector [1].

1936: AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970: The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971: DARPA established the Speech Understanding Research (SUR) program.
1982: Dragon Systems was founded.
1984: SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995: Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997: Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998: Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000: Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003: ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into classes according to the types of utterances they are able to recognize. The classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have “Listen/Not-Listen” states, where they require the speaker to wait between utterances (usually doing processing during the pauses). “Isolated utterance” might be a better name for this class. Put simply, isolated words are single words such as “Me”, “You”, “Go”, etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterance systems) are similar to isolated word systems but allow separate utterances to be run together with a minimal pause between them, for example “I eat rice”.

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features, such as words being run together, “ums” and “ahs”, and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent 2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses can't ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig. 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following basic terms and concepts are fundamental to speech recognition, and it is important to have a good understanding of them [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, “Word”, “Microsoft Word” and “I'd like to run Microsoft Word” are all examples of possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is always listening for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
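The silence-based segmentation described above can be illustrated with a small sketch. It is not the endpointing algorithm used by CMU Sphinx; it assumes the audio has already been reduced to one energy value per frame, and the names `find_utterances`, `threshold` and `min_silence_frames` are hypothetical.

```python
def find_utterances(frame_energies, threshold, min_silence_frames):
    """Return (start, end) frame index pairs of utterances; end is exclusive.

    A frame counts as speech when its energy exceeds `threshold`. An
    utterance ends once `min_silence_frames` silent frames follow it.
    """
    utterances = []
    start = None        # first frame of the utterance currently being tracked
    silence_run = 0     # consecutive silent frames seen inside that utterance
    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            if start is None:
                start = i            # lack of silence: an utterance begins
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # enough trailing silence: close the utterance before it
                utterances.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # the signal ended while an utterance was still open
        utterances.append((start, len(frame_energies) - silence_run))
    return utterances


energies = [0, 0, 5, 6, 0, 7, 0, 0, 0, 4, 0]
print(find_utterances(energies, threshold=1, min_silence_frames=2))
# prints [(2, 6), (9, 10)]
```

The single silent frame at index 4 does not split the first utterance because it is shorter than `min_silence_frames`, which mirrors the "certain amount of silence" condition in the text.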

1.4.2 Pronunciation

The word pronunciation is familiar from learning any language. In any language, pronunciation refers to the sounds that are produced to make meaning. Beyond the individual sounds, there are aspects of speech that make a language unique: phrasing, stress, intonation, timing and rhythm. The voice is then projected to communicate what the speaker wants to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It does not seem fair, but people can be judged by the way they speak and may be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the engine uses to process a word is its pronunciation, which represents what the engine thinks the word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in US English: “thee” and “thuh”.

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly, so general-purpose speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also help speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said “yes”, the engine returned “yes” and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said “nope”, the engine returned “no”, and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
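The text does not say which formula is used for accuracy. One widely used quantitative measure is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below is an illustration under that assumption, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by the
    number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                  # deletion
                dist[i][j - 1] + 1,                  # insertion
                dist[i - 1][j - 1] + substitution,   # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("i eat rice", "i eat rice"))  # prints 0.0
print(word_error_rate("yes", "no"))                 # prints 1.0
```

Note that WER judges the text only. In the “nope”/“no” example above the WER is 1.0 even though the dialog succeeded, which is why application-level success is sometimes measured separately.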

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines in which there is a state transition for each observation in the sequence and an output symbol emission for each state. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore “hidden” from the outside, hence the name Hidden Markov Model.

Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable ones. In speech recognition, after the voice is recorded it is divided into many frames, which are processed to generate the sentence in text form. Each frame is represented as a state, a group of states as a phoneme, and a group of phonemes as a word that we need to recognize. In a database known as the linguist model we store reference values for states, phonemes and words, to compare with the observed data (voice). By applying HMM we construct a statistical model for each phone, whose states are assigned probabilities by comparison with the reference values. The probability of each state depends on the state itself and on the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. HMM theory is complicated, so we do not go into detail here; more information is given in Appendix A.
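Finding the state sequence with the maximum probability is usually done with the Viterbi algorithm. The following is a minimal sketch on a toy two-state model, not the decoder used inside CMU Sphinx; the state names, observation symbols and all probabilities are invented for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               the state that precedes s on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # choose the predecessor that maximises the path probability
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # backtrack from the most probable final state
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model (all values are assumptions chosen for the example)
states = ["sil", "speech"]
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {
    "sil": {"sil": 0.6, "speech": 0.4},
    "speech": {"sil": 0.3, "speech": 0.7},
}
emit_p = {
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

path, prob = viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p)
print(path)  # prints ['sil', 'speech', 'speech']
```

Real recognizers work with log probabilities and continuous emission densities rather than small lookup tables, but the dynamic-programming structure is the same.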


Fig. 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty assigned when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words can follow previously recognized words and helps to restrict the matching process significantly by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
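As an illustration of the "statistics of word sequences" held by an n-gram model, the sketch below estimates bigram probabilities from a tiny corpus by simple counting. It is a simplified example: real toolkits also apply smoothing and back-off, which are omitted here, and the function name `train_bigram_model` and the sample sentences are assumptions.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Return {(previous_word, word): P(word | previous_word)} estimated
    by maximum likelihood from whitespace-tokenised sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {
        (prev, word): count / history_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


model = train_bigram_model(["i eat rice", "i eat fish"])
print(model[("eat", "rice")])   # prints 0.5
print(("rice", "eat") in model)  # prints False
```

A word pair that never occurs in the corpus, such as ("rice", "eat"), is absent from the model, which is how the language model strips improbable continuations from the search.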


1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model


Chapter 2

METHODOLOGY Data Preparation

Setting up the System Environment


2.1 Data Preparation

We have to prepare some important files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (.DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model. Put simply, it is the collection of sentences that we want our machine to recognize. For our project we collected sentences that are important for our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio for it in (.wav) or (.sph) format. During the recording sessions the following parameters of the wave files were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)
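A recording can be checked against these three parameters before training. The sketch below uses Python's standard `wave` module; it is an illustrative helper and not part of the project's tooling, and the function name `has_expected_format` is hypothetical.

```python
import wave


def has_expected_format(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV file is 16 kHz, 16-bit (2 bytes), mono."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == rate
            and wav_file.getsampwidth() == sample_width_bytes
            and wav_file.getnchannels() == channels
        )
```

For example, `has_expected_format("01_01.wav")` returns `False` for a stereo or 44.1 kHz recording, so such a file can be re-recorded or converted before it reaches the trainer.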


Fig. 2.1.2: Audio File Recording Format

For this work a 16 kHz sample rate was chosen because it captures high-frequency information more accurately, and 16 bits per sample quantizes each sample to one of 65536 (2^16) possible values. After recording, the audio was split manually into one file per sentence using the WavePad sound editor. Each file is stored in wav format and named with the speaker ID and the sentence ID. For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID 01, Sentence ID 01
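The naming convention can be decoded mechanically, which is convenient when building the fileids and transcription files later. The helper below is a hypothetical illustration that assumes every name has exactly the form `<speaker id>_<sentence id>.wav`.

```python
from pathlib import Path


def parse_recording_name(file_name):
    """Split a name such as '01_01.wav' into (speaker_id, sentence_id).

    Raises ValueError if the name does not contain exactly one underscore.
    """
    speaker_id, sentence_id = Path(file_name).stem.split("_")
    return speaker_id, sentence_id


print(parse_recording_name("01_01.wav"))  # prints ('01', '01')
```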

When we collected audio from a speaker, we also saved personal information about the speaker:

Name, age, gender and the environment in which the audio was collected

Some other information was noted down as well:

Environmental conditions of the recording (for example classroom conditions, number of students present, sources of noise such as a fan or generator sound)

Technical details of the devices (PC, microphone)

Date and time of the recording

2.1.3 Dictionary file

The dictionary file is the list of words taken from our corpus file together with the pronunciation of each word, such as “AA P NA KE”. This requires software that converts a word list into pronunciations. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. We therefore developed our own software, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some of its contents are

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (.dic)

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes used in the words; for example “AA P N I” contains four phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and duplicates are not allowed. It can be generated with a small program written for this project, which takes the dic file as input and produces the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is (.phone)

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme “SIL” is also included in the phone file

All phonemes in the dic file are present in the phone file, without repetition
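The program used in the project is not reproduced in this report. As a rough sketch of what such a dic-to-phone conversion could look like, the function below assumes that each dictionary line consists of a word followed by its space-separated phonemes; the function name and the placeholder words in the example are hypothetical.

```python
def phones_from_dictionary(dic_lines):
    """Collect the unique phones used in dictionary lines of the form
    'WORD PH1 PH2 ...' and return them sorted, including SIL."""
    phones = {"SIL"}                 # the silence phone is always listed
    for line in dic_lines:
        fields = line.split()
        if len(fields) < 2:
            continue                 # skip blank or malformed lines
        phones.update(fields[1:])    # fields[0] is the word itself
    return sorted(phones)


# WORD1 and WORD2 stand in for the Bengali words of the real dictionary
print(phones_from_dictionary(["WORD1 AA P N I", "WORD2 AA M I", ""]))
# prints ['AA', 'I', 'M', 'N', 'P', 'SIL']
```

Writing the returned list one phone per line, followed by a final newline, satisfies the format rules listed in the note above.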


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text in the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtoolkit is used. lmtool is a web-based tool that lets users quickly compile the text-based components needed by an ASR decoder. It requires a corpus, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line; newer versions also support Unicode text files. Upload this file and click the compile button. This produces a set of lexical (pronunciation dictionary) and language modeling files. Only the LM file is used here, because the pronunciation dictionary is built as described above. The tool works best for small domains. For our project the file name is sbsbsr.lm and the file format is (.lm). Some contents of the language model file for our project are

-20719 -02626

-20719 -02861

-20719 -02973

-17709 -02936

-20719 -02626

-17709 এ -02861

-20719 -02626

-20719 ও -02973

…

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for Sphinx4. We used the Linux environment to produce the DMP file from the lm file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project the command is

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP


2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In the transcription file, each utterance is written exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project; here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)

…

Note:

File format is (.transcription)

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
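Lines in this format can be generated from the corpus rather than typed by hand. The sketch below is a hypothetical helper, not the procedure used in the project; it assumes that sentences are numbered from 01 in corpus order and that the file ID has the form `<speaker id>_<sentence id>` described in Section 2.1.2.

```python
def transcription_line(sentence, speaker_id, sentence_id):
    """Wrap one sentence in <s> ... </s> and append its file ID."""
    return "<s> {} </s> ({}_{})".format(sentence.strip(), speaker_id, sentence_id)


def build_transcription(sentences, speaker_id):
    """Return the transcription file content for one speaker, ending
    with a newline as the format requires."""
    lines = [
        transcription_line(sentence, speaker_id, "{:02d}".format(number))
        for number, sentence in enumerate(sentences, start=1)
    ]
    return "\n".join(lines) + "\n"


print(build_transcription(["first sentence", "second sentence"], "01"), end="")
# prints:
# <s> first sentence </s> (01_01)
# <s> second sentence </s> (01_02)
```

When saving the result for Bengali text, the file should be written as UTF-8 without a BOM, for example with `open(path, "w", encoding="utf-8")`.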

2.1.8 Fileids file

The fileids files contain the names of all audio files without the wav or sph extension. There are two fileids files, one for training and one for testing. For our project the training file is named sbsbsr_train.fileids and the testing file is named sbsbsr_test.fileids.

For example:

sbsbsr_trainsanjoysanjoy101_01

sbsbsr_ train sanjoysanjoy101_02

sbsbsr_ train sanjoysanjoy101_03

…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of them must be mapped to a phone called SIL.

<s> symbolizes “beginning of speech”

</s> symbolizes “end of speech”

<sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. Training the recognition engine from CMU Sphinx requires two software packages, SphinxBase and SphinxTrain, which we downloaded from the CMU Sphinx web site. Before installing them we first need some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As already mentioned, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading them we decompressed them into a folder in Linux; it can be any folder, and we used the Linux root folder. After decompressing the packages we installed them with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd Sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

With the system environment ready, we created a project folder in which SphinxTrain will create the trained files (the acoustic model). First we entered the root directory where the installed sphinxbase and sphinxtrain folders are placed and created a folder there named “sbsbsr”. We then opened a terminal, changed to the sbsbsr folder and created a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as “etc”, “wav” and “model_parameters”.

Next we copied the files created in the data preparation part: the dic, filler, phone, transcription, fileids, lm and lm.DMP files go into the “etc” folder and the collected audio into the “wav” folder. Finally we changed some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder along with the project task. The changed parameters are listed below.

Parameter                     Value Before      Value After
$CFG_WAVFILE_EXTENSION        sph               wav
$CFG_WAVFILE_TYPE             nist/mswav/raw    raw
$CFG_FEATURE                  2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES      8                 1
$CFG_STATESPERHMM             6                 3
$CFG_N_TIED_STATES            100               100

Table 2.2.3: Configuration of sphinx_train.cfg


With these changes the project folder setup is complete.
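The parameters were edited by hand in this project. If the same changes have to be repeated for several experiments, they can be scripted. The sketch below assumes that the configuration file contains Perl assignments of the form `$CFG_NAME = value;` and rewrites only the value part; the function name `apply_overrides` is hypothetical and the quoting of the values is an assumption.

```python
import re


def apply_overrides(cfg_text, overrides):
    """Replace the value of each '$NAME = value;' assignment named in
    `overrides`, leaving every other line of the text unchanged."""
    for name, value in overrides.items():
        pattern = re.compile(
            r"^(\s*\$" + re.escape(name) + r"\s*=\s*).*?;", re.MULTILINE
        )
        # keep the '$NAME = ' prefix and substitute only the value
        cfg_text = pattern.sub(lambda m, v=value: m.group(1) + v + ";", cfg_text)
    return cfg_text


sample = "$CFG_WAVFILE_EXTENSION = 'sph';\n$CFG_STATESPERHMM = 6;\n"
print(apply_overrides(sample, {
    "CFG_WAVFILE_EXTENSION": "'wav'",
    "CFG_STATESPERHMM": "3",
}), end="")
# prints:
# $CFG_WAVFILE_EXTENSION = 'wav';
# $CFG_STATESPERHMM = 3;
```

The remaining rows of Table 2.2.3 can be added to the `overrides` dictionary in the same way.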

2.2.4 Training the Acoustic Model

In the training process we first have to convert the collected raw speech audio into mfc (feature) files. To do this we opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all wav files into mfc files in the “feat” directory under the project folder “sbsbsr”.

We are then ready to run the main training command, which creates the acoustic model. Staying in the project directory, we executed:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we downloaded Pocketsphinx from the CMU Sphinx site and extracted it in the root directory where sphinxbase and sphinxtrain are installed. We then installed it with the following commands in the Linux terminal:

./configure

make

After installing the software we went into our project folder and ran a command to set up the folder structure for decoding:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

We then put some test audio in the “wav” directory, because Pocketsphinx here takes audio files as input, and copied the test fileids and transcription files into the “etc” folder. To decode (test) with Pocketsphinx we first need feature files for the audio files, which we created with the following command:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we ran the main decoding command:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The result of the test is written to the “result” folder under the project folder. For our model trained on fifty sentences the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE in Windows, we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file. Then we created a new Java project in Eclipse and wrote a Java file with the help of the demo application from CMU Sphinx. After that we added the following files from our training project to the Eclipse project [11]:

the "sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

the .dic file

the .lm.DMP file

the .filler file

We created a config.xml file in the Eclipse project to pass the configuration to the recognizer and to state where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We also added four Java jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4/lib directory to our project. The Java project was then ready to build and run. After building the project, we ran it and tested it with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built with Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and the common room. We completed our testing using audio inputs from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All five experiments used the trained data set.

Experiment      Speakers   Male   Female   Total words   Correct   Errors   Percent correct   Error     Accuracy
Experiment 01   5          4      1        1025          975       98       95.12%            9.56%     90.44%
Experiment 02   3          3      0        615           591       39       96.10%            6.34%     93.66%
Experiment 03   3          3      0        615           601       22       97.72%            3.58%     96.42%
Experiment 04   5          2      3        1025          938       184      91.51%            17.95%    82.05%
Experiment 05   3          0      3        615           545       125      88.62%            20.33%    79.67%

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
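The figures in Table 3.2.1 are consistent with computing each percentage against the total number of words, and with taking accuracy as 100 minus the error percentage. The following Java sketch reproduces the Experiment 01 row under that assumption. The class and method names are illustrative only and are not part of the project code.

```java
import java.util.Locale;

public class RecognitionMetrics {

    // Share of the total words that were recognized correctly, in percent.
    public static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    // Number of errors relative to the total number of words, in percent.
    public static double errorPercent(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    // Accuracy as assumed here: 100 minus the error percentage.
    public static double accuracy(int totalWords, int errors) {
        return 100.0 - errorPercent(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01 of Table 3.2.1: 1025 words, 975 correct, 98 errors.
        int total = 1025, correct = 975, errors = 98;
        System.out.println(String.format(Locale.ROOT,
                "correct=%.2f%% error=%.2f%% accuracy=%.2f%%",
                percentCorrect(total, correct),
                errorPercent(total, errors),
                accuracy(total, errors)));
        // prints: correct=95.12% error=9.56% accuracy=90.44%
    }
}
```

Note that the error count (98) is larger than the number of words that were not recognized correctly (1025 - 975 = 50). One plausible explanation is that extra words inserted by the decoder are also counted as errors.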


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device was a microphone in all six experiments.

Experiment      User type   Speakers   Environment         Speaker type   Words   Correct   Errors   Percent correct (%)   Error (%)   Accuracy (%)
Experiment 01   Trained     2          Closed Room         Male           156     154       2        98                    2           98
Experiment 02   Untrained   3          Lab Room            Male           130     120       10       90                    10          90
Experiment 03   Untrained   3          University Campus   Male           140     122       18       82                    18          82
Experiment 04   Untrained   2          University Campus   Female         108     95        13       87                    13          87
Experiment 05   Trained     3          University floor    Female         126     115       11       89                    11          89
Experiment 06   Trained     2          Closed Room         Male           128     127       1        99                    1           99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%


3.3.2 Input Type: Audio

All five experiments used the trained data set.

Experiment      Speakers   Male   Female   Total words   Correct   Errors   Percent correct   Error    Accuracy
Experiment 01   3          3      0        210           194       16       85.71             15.71    85.71
Experiment 02   3          2      1        210           188       22       77.14             22       77.14
Experiment 03   3          2      1        210           182       28       72.38             28       72.38
Experiment 04   3          2      1        210           194       16       84.44             16       84.44
Experiment 05   3          1      2        210           184       26       74.76             26       74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Applications

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a .dic file, which contains all training words and their pronunciations. We wrote these pronunciations by hand several times while training acoustic models experimentally. Over time we thought about an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various thesis papers [1] [9] [10] to make this database. However, different sources define phonemes in different ways, and phonemes are not defined for all letters. We therefore defined some phonemes ourselves for some consonants and conjunct letters. After making this database, we built the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2 Dictionary files with phonetic translation
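The core idea of the phonetic translator, a lookup from letters to phoneme symbols, can be sketched in Java as follows. This is a minimal illustration and not the project code: the class name and the in-memory map are assumptions (the project stores the mapping in a database), only six pairs from the Unicode to IPA chart in the appendix are included, and conjuncts and inherent vowels are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhonemeLookupSketch {

    // A few letter-to-phoneme pairs taken from the Unicode to IPA chart.
    // Unicode escapes are used so the file compiles with any source encoding.
    private static final Map<Character, String> PHONEMES = new LinkedHashMap<>();
    static {
        PHONEMES.put('\u0995', "K");   // Bengali letter KA
        PHONEMES.put('\u0996', "KH");  // Bengali letter KHA
        PHONEMES.put('\u0997', "G");   // Bengali letter GA
        PHONEMES.put('\u09AE', "M");   // Bengali letter MA
        PHONEMES.put('\u09B2', "L");   // Bengali letter LA
        PHONEMES.put('\u09B0', "R");   // Bengali letter RA
    }

    // Translates a word letter by letter; letters missing from the
    // table are reported as "?" so that gaps are easy to spot.
    public static String translate(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(PHONEMES.getOrDefault(c, "?"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two letters KA and MA.
        System.out.println(translate("\u0995\u09AE")); // prints: K M
    }
}
```

Reporting unknown letters as "?" instead of skipping them makes it visible where the phoneme table is still incomplete, which matters because not every letter has an agreed phoneme.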

4.1.3 Training File Creator

To train the acoustic model we also need a fileids file and a transcript file. These two files contain the training audio file paths and their corresponding sentences. Before creating this program we had to make these two files manually, but with our software we can make these two big files automatically within moments. For example, if we have 8000 audio files, we have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now we can create these two files automatically by giving the software the root folder name of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
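The two file formats described in Section 4.1.3 can be sketched in Java as follows. This is a simplified illustration, not the project code: the class name, the method names and the example file name speaker_1/file_1.wav are hypothetical, and the transcript line follows the usual CMU Sphinx layout of a sentence between <s> and </s> followed by the utterance name in parentheses.

```java
public class TrainingFilesSketch {

    // Turns an audio path such as "speaker_1/file_1.wav" into a
    // fileids entry, which is the same path without the extension.
    public static String fileId(String relativeWavPath) {
        if (relativeWavPath.endsWith(".wav")) {
            return relativeWavPath.substring(0, relativeWavPath.length() - 4);
        }
        return relativeWavPath;
    }

    // Builds one transcript line: the sentence between <s> and </s>,
    // followed by the utterance name in parentheses.
    public static String transcriptLine(String sentence, String utteranceName) {
        return "<s> " + sentence.trim() + " </s> (" + utteranceName + ")";
    }

    public static void main(String[] args) {
        // Hypothetical audio file and placeholder sentence text.
        System.out.println(fileId("speaker_1/file_1.wav"));
        System.out.println(transcriptLine("  sentence text ", "file_1"));
        // prints:
        // speaker_1/file_1
        // <s> sentence text </s> (file_1)
    }
}
```

Generating both lines from the same list of audio files keeps the fileids file and the transcript file in the same order.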

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks, such as opening My Computer, left click and right click.
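One way to connect recognized words to desktop actions is a lookup table from the recognized text to an action, as in the following Java sketch. It is an illustration under assumptions, not the project code: the class name, the enum and the keys (WORD_FOR_OPEN and so on) are placeholders that stand in for the Bengali command words, and the sketch only selects the action without performing it.

```java
import java.util.HashMap;
import java.util.Map;

public class VoiceCommandSketch {

    // Actions named in Section 4.1.4, plus NONE for unknown input.
    public enum Action { OPEN_MY_COMPUTER, LEFT_CLICK, RIGHT_CLICK, NONE }

    // Placeholder keys: the real application would use the Bengali
    // command words returned by the recognizer.
    private static final Map<String, Action> COMMANDS = new HashMap<>();
    static {
        COMMANDS.put("WORD_FOR_OPEN", Action.OPEN_MY_COMPUTER);
        COMMANDS.put("WORD_FOR_LEFT", Action.LEFT_CLICK);
        COMMANDS.put("WORD_FOR_RIGHT", Action.RIGHT_CLICK);
    }

    // Maps the recognizer output to an action; anything that is not
    // a known command is ignored by returning NONE.
    public static Action dispatch(String recognizedText) {
        if (recognizedText == null) {
            return Action.NONE;
        }
        Action action = COMMANDS.get(recognizedText.trim());
        return action == null ? Action.NONE : action;
    }

    public static void main(String[] args) {
        System.out.println(dispatch("WORD_FOR_LEFT"));  // prints: LEFT_CLICK
        System.out.println(dispatch("something else")); // prints: NONE
    }
}
```

Returning NONE for unknown text means that a misrecognized word does nothing, which is safer than triggering a wrong mouse action.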

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has some limitations in specific tasks. Because of time constraints, the system was built on a small amount of data. We selected the domain of university admission information for new students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced some problems in creating the dictionary file, because the Bengali phoneme list is not defined precisely and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and it sometimes fails when the speaker speaks slowly or pronounces words unclearly. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciation output, we can expect good output from this software once we have an accurate phoneme set.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the data size to create a complete model, and we plan to improve its capability to recognize speech more accurately and to enlarge its vocabulary. We have also developed software for making the dictionary file, the fileids file and the transcription file. We want to make a user-friendly, stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this trainer, a user will be able to train any sentence with the corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use this audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a big training database, many speakers and audio with low noise. No speech recognizer is 100% accurate yet, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID   Name      Age   Gender   District       Environment   Institution/Other
02           Bappy     23    Male     Sylhet         Closed Room   Leading University
03           Bijoy     23    Male     Moulovibazar   Closed Room   MC College
04           Dola      24    Female   Chittagong     Department    Leading University
05           Falguny   24    Female   Sylhet         Open Space    Leading University
06           Lovely    20    Female   Sylhet         Class Room    Lotifa Shofi Chowdhury Mohila College
07           Mazed     23    Male     Moulovibazar   Closed Room   Leading University
08           Moni      23    Female   Sylhet         Lab           Leading University
09           Pinku     23    Male     Sylhet         Closed Room   Sylhet Govt College
10           Polash    24    Male     Feni           Cafeteria     Leading University
11           Pritom    20    Male     Sylhet         Closed Room   Leading University
12           Razib     23    Male     Sylhet         Cafeteria     Leading University
13           Rumi      22    Female   Sylhet         Lab           Leading University
14           Sanjoy    23    Male     Sylhet         Closed Room   Leading University
15           Shimul    23    Male     Sylhet         Closed Room   Leading University
16           Sumit     22    Male     Shunamgonj     Closed Room   Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ, consonants) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার, numbers) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ, conjuncts) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we trained and recognized only 50 of them.

No Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Folder names at level 1 and level 2, and audio file names at level 3.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Windows-style path separators are assumed in all path strings.
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues from the last file written.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Literal replace, so that "." is not treated as a regex wildcard.
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists a directory: folders go to the level 1 or level 2 list, files to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

........................................

ARRAY

........................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names of the form <name><number> by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of each name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Folder name without its number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // Rebuild the names in numeric order.
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

........................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences read from the corpus file, one per line.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file; the corpus is reused
    // from the beginning when all of its sentences have been written.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // Literal replace, so that "." is not treated as a regex wildcard.
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

........................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            // Windows-style path separators are assumed in the path strings.
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

........................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    // Load the MySQL JDBC driver once, when the class is loaded.
    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is a consonant (banjon borno),
    // that is, if its id in the banglatab table lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

........................................

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is missing in the source; the hasanta (U+09CD),
    // which joins Bengali consonants into conjuncts, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;

        // Record the positions before, at and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;

        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // Look up the pronunciation of every search string in the database.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
                rs.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

........................................................................
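The conjunct-grouping idea in getPronouciation can be illustrated without the database. The following is only a sketch and not the thesis code: it assumes that the conjunct marker is the Bengali hasanta (U+09CD), and the class ConjunctSplitter and its split method are hypothetical names. Each run of consonants joined by a hasanta becomes one search unit, and every other character stays a unit of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the thesis listings.
class ConjunctSplitter {
    // Assumption: conjuncts are marked by the hasanta sign.
    static final char HASANTA = '\u09CD';

    /** Splits a word into units, keeping hasanta-joined characters together. */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            current.append(ch);
            // The unit continues if this character is a hasanta
            // or if the next character is one.
            boolean joinsNext = ch == HASANTA
                    || (i + 1 < word.length() && word.charAt(i + 1) == HASANTA);
            if (!joinsNext) {
                units.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            units.add(current.toString()); // word ended on a hasanta
        }
        return units;
    }

    public static void main(String[] args) {
        // The word "bishwo" written with Unicode escapes: ba, i-sign, sha, hasanta, ba.
        System.out.println(split("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

Each returned unit could then be looked up in a pronunciation table in the same way the listing queries banglatab.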

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which
        // case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate (spoken command: "sokrio")
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
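Most CommandActivator methods repeat the same press-then-release pattern. As a sketch only (the class KeyChord and its methods are hypothetical and not part of the thesis code), the ordering can be computed in one place: keys are pressed in the given order and released in reverse order, so that a modifier such as Ctrl stays down for the whole chord. This differs slightly from the listing, which releases the modifier first.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the thesis listings.
final class KeyChord {

    /** One step of a chord: press (true) or release (false) of a key code. */
    static final class Step {
        final int keyCode;
        final boolean press;

        Step(int keyCode, boolean press) {
            this.keyCode = keyCode;
            this.press = press;
        }
    }

    /** Presses in the given order, then releases in reverse order. */
    static List<Step> steps(int... keyCodes) {
        List<Step> result = new ArrayList<>();
        for (int code : keyCodes) {
            result.add(new Step(code, true));
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            result.add(new Step(keyCodes[i], false));
        }
        return result;
    }

    /** Replays the chord on a Robot supplied by the caller. */
    static void play(Robot robot, int... keyCodes) {
        for (Step step : steps(keyCodes)) {
            if (step.press) {
                robot.keyPress(step.keyCode);
            } else {
                robot.keyRelease(step.keyCode);
            }
        }
    }

    public static void main(String[] args) {
        // Prints the planned steps for Ctrl+C without touching the keyboard.
        for (Step step : steps(KeyEvent.VK_CONTROL, KeyEvent.VK_C)) {
            System.out.println((step.press ? "press " : "release ")
                    + KeyEvent.getKeyText(step.keyCode));
        }
    }
}
```

With such a helper, the body of copy() could shrink to a single call like KeyChord.play(new Robot(), KeyEvent.VK_CONTROL, KeyEvent.VK_C).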

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop command that matches the recognized text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase in this string literal is missing in the source.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
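The comparison in giveCommand can also be organised as a lookup table from recognized phrase to action. The sketch below is an illustration under stated assumptions, not the thesis implementation: the class CommandTable is hypothetical, the romanized phrases "dan click" and "bam click" are placeholders for the Bengali command phrases that are missing from the listing, and the actions only write to a log so that the example runs without a display.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the thesis listings.
final class CommandTable {
    private final Map<String, Runnable> actions = new HashMap<>();

    /** Associates a spoken phrase with the action to run for it. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action for the recognized text; returns false if none matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable();
        StringBuilder log = new StringBuilder();
        // Placeholder phrases; a real table would hold the Bengali commands.
        table.register("dan click", () -> log.append("rightClick "));
        table.register("bam click", () -> log.append("leftClick "));
        table.dispatch("dan click");
        table.dispatch("not a command");
        System.out.println(log.toString().trim());
    }
}
```

In the real application each action would call a CommandActivator method; because those methods throw the checked AWTException, the call would need to be wrapped in a try/catch inside the lambda.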

DECLARATION

We hereby declare that the project work entitled "Bengali Speech Recognition", submitted to Leading University, is a record of original work done by us under the guidance of Arpita Chakraborty, Assistant Professor, Department of Computer Science and Engineering, Leading University. This project work is submitted in fulfillment of the requirements for the degree of Bachelor of Science in Computer Science & Engineering. The results of this project have not been submitted to any other university or institute for the award of any degree or diploma. Material drawn from the work of other researchers is acknowledged by reference.

Signature of Supervisor & Co-supervisor

Name of Supervisor                  Signature
Mrs. Arpita Chakraborty
Assistant Professor

Name of Co-supervisor               Signature
Mrinal Kanti Dhar
Lecturer

Signature of Authors

Name of Authors                     Signature
Md. Badrul Alom Chowdhury
Sanjoy Ranjan Das
Shimul Dey

ACKNOWLEDGEMENT

We would like to thank our honorable supervisor, Arpita Chakraborty, and our co-supervisor, Mrinal Kanti Dhar, for their guidance throughout the process. They exposed us to the real professional research world through their valuable experience, and we cherish the time spent working with them on such an interesting topic. We would also like to thank the students of our university who let us record their voices for the experiments, and the Department of Computer Science & Engineering for giving us the authority and facilities to complete the project. Last but not least, thanks to the Almighty for helping us at every step of this project work.

Table of Contents

Declaration
Acknowledgement
List of Figures
List of Charts
List of Tables
List of Abbreviations & Symbols
Abstract
Literature Survey

Chapter 1: Introduction
1.1 Introduction
1.2 History of Speech Recognition
1.3 Types of Speech Recognition
    1.3.1 Isolated Words
    1.3.2 Connected Words
    1.3.3 Continuous Speech
    1.3.4 Spontaneous Speech
    1.3.5 Speaker-dependent
    1.3.6 Speaker-independent
    1.3.7 Overview of Speech Recognition System
1.4 Terms and Concepts
    1.4.1 Utterance
    1.4.2 Pronunciation
    1.4.3 Grammars
    1.4.4 Vocabularies
    1.4.5 Training
    1.4.6 Accuracy
    1.4.7 Language Dictionary
    1.4.8 Filler Dictionary
    1.4.9 Phone
    1.4.10 HMM
    1.4.11 Language Model
1.5 Overview of the Full System

Chapter 2: Methodology
2.1 Data Preparation
    2.1.1 Corpus
    2.1.2 Audio Files
    2.1.3 Dictionary Files
    2.1.4 Phone File
    2.1.5 Language Model File (.lm Format)
    2.1.6 Language Model File (DMP Format)
    2.1.7 Transcription File
    2.1.8 Fileids File
    2.1.9 Filler File
2.2 Setting Up the System Environment
    2.2.1 Software Requirements
    2.2.2 Trainer Setup
    2.2.3 Project Folder Setup
    2.2.4 Training the Acoustic Model
    2.2.5 Testing Part
        2.2.5.1 Testing with Pocket Sphinx
        2.2.5.2 Testing with Sphinx4

Chapter 3: Testing and Performance Evaluation
3.1 Testing & Performance Evaluation
3.2 Test Results with Pocket Sphinx
3.3 Test Results with Sphinx4
    3.3.1 Input Type: Microphone
    3.3.2 Input Type: Audio

Chapter 4: Applications & Developing
4.1 Review of Some Developed Recognized Application
    4.1.1 Dictation Application
    4.1.2 Phonetic Translator
    4.1.3 Training File Creator
    4.1.4 Training File Creator

Chapter 5: Limitation & Future Work
5.1 Limitation
5.2 Future Work

Chapter 6: Conclusion & References
6.1 Conclusion
6.2 References

List of Figures

Fig. No.    Name of Figure
1.3.7       Overview of Speech Recognition System
1.4.10      Applying Hidden Markov Model on Speech Recognition
1.5         Overview of the Full System Model
2.1.2       Audio File Recording Format
2.2.5.1     Testing with Pocket Sphinx
2.2.5.2     Testing with Sphinx4
4.1.2       Dictionary Files with Phonetic Translation
4.1.3.1     Fileids Files with Phonetic Translation
4.1.4.2     Transcription File

List of Charts

Chart No.   Name of Chart
3.2.2       Experiment Results with Pocket Sphinx
3.3.1.2     Experimental Details with Results for Sphinx 4 Live
3.3.2.2     Experimental Details with Results for Sphinx 4 Audio

List of Tables

Table No.   Name of Table
1.2         History of Speech Recognition
2.2.3       Configuration of sphinx_train.cfg
3.2.1       Experimental Details with Results for Pocket Sphinx
3.3.1.1     Test Results with Sphinx4, Input Type: Microphone
3.3.2.1     Test Results with Sphinx4, Input Type: Audio
7.1         Speaker Profiles
7.2         Unicode to IPA Chart
7.3         Corpus About University Admission Information

List of Abbreviations & Symbols

ASR       Automatic Speech Recognition
BSD       Berkeley Software Distribution
CMU       Carnegie Mellon University
HMM       Hidden Markov Model
IPA       International Phonetic Alphabet
PCA       Principal Component Analysis
ASCII     American Standard Code for Information Interchange
MERL      Mitsubishi Electric Research Labs
CRBLP     Center for Research on Bangla Language Processing
D2P       Dictionary to Pronunciation
SBSBSR    Sanjoy Bappy Shimul Bengali Speech Recognition
IDE       Integrated Development Environment
ABI       Allied Business Intelligence

ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not only to convey words from one person to another but also to make the listener understand the depth of what is spoken. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that recognizes human speech with the help of the CMU Sphinx speech recognition API.

Literature Survey

Today, speech technology plays an important role in many applications and has moved from research to commercial use. Many human-machine interfaces have been built and deployed, for example in telephone food-ordering systems, airport information systems, ticketing systems and restaurant reservation systems. This is why we selected this field for our project. Moreover, while many languages already have a speech recognition system, our mother tongue Bangla has no proper one, which is the main reason for choosing this topic. Most of the early research was done using Artificial Neural Networks (ANN), but because we use an HMM-based technique, the HMM-based and related research is summarized below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report over one week and acquired a great deal of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the report in a clear and accessible way; it is very helpful for new students who want to work in this field [9].

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysore, India): From this review we learned about the types of speech recognition, the approaches to speech recognition and related topics [1].

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC University): We studied the past works, and to the best of our knowledge this is the first reported attempt to recognize Bangla speech using the HMM technique. We therefore took most of our guidance on the steps needed to build a speech recognition system from this publication. From it we learned how to improve the quality of the input audio signal through noise elimination and an end-point detection algorithm, how the features of a sound are extracted and which parameters are stored in the feature files, and the algorithm for creating HMM models [8].

Bengali Segmented Automated Speech Recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan): From these papers we learned the techniques used to recognize letters, vowels and consonants. In essence, they gave us the basic steps toward a recognizer and the common steps needed to build a fully functioning one [7].

Chapter 1

INTRODUCTION

INTRODUCTION
HISTORY OF SPEECH RECOGNITION
TYPES OF SPEECH RECOGNITION
TERMS AND CONCEPTS

1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. It is a broad term, generally implying that the system can recognize almost anybody's speech. It is also known as computer speech recognition, which means that the computer understands a voice and performs the required task. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as "mouse up". We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying "Music".

For this reason, many scientists and researchers are working on speech recognition. Many of the world's languages have speech recognizers of their own, but our mother tongue, Bengali, is not yet so well served. A few small research efforts have been carried out on Bengali speech recognition, but without any great outcome. Implementing a continuous speech recognizer for Bengali is the main goal of this project. Developing a full-blown continuous speech recognizer, however, is a huge task for a short span of time. We have therefore built a domain-based continuous speech recognizer that covers conversations about the university admission process.

Throughout the work we tried out different tools and chose CMU Sphinx4 as the speech recognition API because it is open-source software and offers high accuracy. Many high-quality and widely used commercial packages are available for this kind of work, but they are costly, whereas CMU Sphinx is released under a Berkeley Software Distribution (BSD) style license. The ultimate goal of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

While AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech, that is, verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

Speech recognition is used in a wide range of applications because it satisfies the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs. Furthermore, according to figures provided by Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As large vocabularies and continuous recognition continue to merge, more and more companies can be expected to move toward speech recognition, and the industry to take its place as a leader in the technology sector [1].

Year    Event
1936    AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970    The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971    DARPA established the Speech Understanding Research (SUR) program.
1982    Dragon Systems was founded.
1984    SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995    Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997    Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998    Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000    Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003    ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). "Isolated utterance" might be a better name for this class. Put simply, isolated words are single words such as "me", "you" or "go".

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterances) are similar to isolated words but allow separate utterances to be run together with a minimal pause between them, for example "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker-dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses can't ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word", or "I'd like to run Microsoft Word" are all examples of possible utterances. Put another way, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is listening for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
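The silence-based start and end detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the endpointing used by the Sphinx engine: the per-frame energy values, the `threshold`, and the `min_silence_frames` parameter are all hypothetical choices made for the example.

```python
def find_utterances(energies, threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame index pairs, end exclusive, per utterance.

    energies: per-frame energy values (hypothetical input).
    An utterance starts at the first frame above `threshold` and ends once
    `min_silence_frames` consecutive frames fall at or below it.
    """
    utterances = []
    start = None        # index of the first speech frame of the current utterance
    silence_run = 0     # consecutive silent frames seen while inside an utterance
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # Enough silence: close the utterance just after its last speech frame.
                utterances.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # The input ended before enough silence was seen.
        utterances.append((start, len(energies) - silence_run))
    return utterances
```

A short pause (fewer than `min_silence_frames` silent frames) does not split an utterance, which mirrors the "certain amount of silence" condition in the text.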

1.4.2 Pronunciation

You have heard the word pronunciation in connection with learning any language. What is pronunciation, and what are some of the fundamental aspects of this important part of learning a language? In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech beyond the individual sounds that make a language unique: phrasing, stress, intonation, timing and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain or context within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry doesn't have to be a single word. Entries can be as long as a sentence or two. Smaller vocabularies can have as few as 1 or 2 recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable. Perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes", and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no", and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing pronunciation. The mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where for each sequence unit observation there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

Stated differently, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after our voice is recorded it is divided into many frames that we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model, we store the reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying HMM, we construct a statistical model for each phone in which its states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is very complicated, we do not go into much detail about it here. To learn more, see Appendix A.
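Finding the state sequence with the maximum probability is commonly done with the Viterbi algorithm. The sketch below is a generic textbook version for a small discrete HMM, not the decoder used by Sphinx; the function name, the dictionary-based model layout, and the requirement that every probability is nonzero (so that the logarithm is defined) are assumptions made for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    start_p[s], trans_p[prev][s] and emit_p[s][obs] are plain probabilities
    and are assumed to be nonzero.
    """
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path
```

Working in log-probabilities avoids numerical underflow when many frames are multiplied together, which matters for real utterances with hundreds of frames.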

Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. A language model is used to restrict word search. It defines which word could follow previously recognized words and helps to significantly restrict the matching process by stripping words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
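To make the idea of "statistics of word sequences" concrete, here is a minimal bigram (2-gram) sketch. It uses plain maximum-likelihood counts with no smoothing or back-off, so it is simpler than the ARPA-format models used later in this project; the function names and the toy English sentences in the usage note are assumptions for illustration only.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and word pairs over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])                 # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))  # (previous word, next word) pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]
```

For instance, after training on "i eat rice" and "i eat fish", `bigram_prob(unigrams, bigrams, "eat", "rice")` is 0.5, so a recognizer using this model would treat "rice" and "fish" as equally likely after "eat" and rule out words never seen in that position.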

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to prepare some important files that are required for training and also for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means a particular topic containing a small amount of data. We have selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (lm format)

Language Model file (DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model. Put simply, the corpus is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences according to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to collect the audio for this corpus in (wav) or (sph) format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit rate (bits per sample): 16

• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format

For this work a 16 kHz sample rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample divides the amplitude range into 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor. Each sound file is in wav format and is named using the speaker id and the sentence id.

For example, an audio file of our project is

01_01.wav, which stands for

Speaker Id: 01, Sentence Id: 01
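A small script can confirm that each recording follows the format and naming convention above before training starts. The sketch below uses Python's standard `wave` module; the function names `check_recording` and `parse_name` are hypothetical helpers written for this illustration and are not part of the project's tooling.

```python
import wave

def check_recording(path, rate=16000, sample_width=2, channels=1):
    """Return True if the WAV file is 16 kHz, 16-bit (2 bytes per sample), mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == rate
                and wav.getsampwidth() == sample_width
                and wav.getnchannels() == channels)

def parse_name(filename):
    """Split a name such as '01_01.wav' into (speaker_id, sentence_id)."""
    stem = filename.rsplit(".", 1)[0]          # drop the extension
    speaker_id, sentence_id = stem.split("_")  # assumes exactly one underscore
    return speaker_id, sentence_id
```

Running `check_recording` over the whole audio folder catches files saved with the wrong sample rate or in stereo, which would otherwise only show up as poor training results.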

When we collected audio from a speaker, we saved personal information about the speaker, such as:

Name, Age, Gender, Audio collection environment

Some other information was also noted down:

Environmental condition of recording (for example classroom condition, number of students present, sources of noise such as a fan or generator sound, etc.)

Technical details of the device (PC, microphone)

Date and time of recording

2.1.3 Dictionary file

The dictionary file is the list of words taken from our corpus file, together with the pronunciation of each word, such as -AA P NA KE. For this work we need software that turns a word list into a pronunciation file. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. As a result, for our project we developed a tool called D2P (Dictionary to Pronunciation), which generates the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (dic). The name of our dictionary file is sbsbsr.dic, and some contents of the dictionary file for our project are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (dic)

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within words. For example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the dic file as input and gives the phone file as output.
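The small program mentioned above is not reproduced in this report; the following is only a sketch of how such a step could look. It assumes each dictionary line has the form "WORD PHONE PHONE ...", and the function name and the romanized placeholder words in the usage note are hypothetical.

```python
def phones_from_dictionary(dic_lines):
    """Collect the sorted, duplicate-free list of phones used in dictionary lines.

    Each non-empty line is assumed to be 'WORD PHONE PHONE ...'.
    'SIL' is always included because the phone file must contain the silence phone.
    """
    phones = {"SIL"}
    for line in dic_lines:
        fields = line.split()
        if len(fields) > 1:
            phones.update(fields[1:])  # skip the word itself, keep its phones
    return sorted(phones)
```

For example, `phones_from_dictionary(["apni AA P N I", "ami AA M I"])` returns `['AA', 'I', 'M', 'N', 'P', 'SIL']`; writing one entry per line, followed by a final blank line, gives a file in the layout described in this section.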

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is (phone)

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

All phonemes in the dic file are present in the phone file without repetition

2.1.5 Language Model file (lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be in the form of an ASCII text file with one sentence per line; the newer version also supports Unicode text files. Upload this file and click the compile button. This produces a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, since the pronunciation dictionary is built as described above. The tool is best for small domains. For our project the file name is sbsbsr.lm and the file format is lm. Some contents of the language model file for our project are:

-2.0719 -0.2626

-2.0719 -0.2861

-2.0719 -0.2973

-1.7709 -0.2936

-2.0719 -0.2626

-1.7709 এ -0.2861

-2.0719 -0.2626

-2.0719 ও -0.2973

…

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used a Linux environment to produce the DMP-format language model file from the lm-format file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project the command is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file the speech of the speaker is written down exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named after the project. Here the project name is "sbsbsr", so the train file name is sbsbsr_train.transcription and the test file name is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)

<s> </s> (01_02)

<s> </s> (01_03)

<s> </s> (01_04)

…

Note:

File format is (transcription)

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files without the wav or sph extension. There are two fileids files, one for training and one for testing. The name of the training file for our project is sbsbsr_train.fileids and the name of the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01

sbsbsr_train/sanjoy/sanjoy1/01_02

sbsbsr_train/sanjoy/sanjoy1/01_03

…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL

<sil> SIL

</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped onto a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: sphinxbase and sphinxtrain. We downloaded them from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted, setting up the trainer requires two packages, sphinxbase and sphinxtrain. After downloading them, we decompressed them into a folder in Linux. It can be any folder; we used the Linux root folder. After decompressing the packages, we install them with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd Sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which sphinxtrain will create the trained files, that is, the acoustic model. First we enter the root directory where our installed sphinxbase and sphinxtrain folders are placed. Here we created a folder named "sbsbsr". After creating the folder, we open a terminal and change into the created folder sbsbsr. Then we create a project task for sphinxtrain with the following command in the terminal:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav", "model_parameters", etc.

Now it is time to copy the files created in the data preparation part. We copy the dic, filler, phone, transcription, fileids, lm and lm.DMP files into the "etc" folder and our collected audio into the "wav" folder. Next we need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. We changed the parameters listed below in the sphinx_train.cfg file:

Parameter                    Value Before      Value After

$CFG_WAVFILE_EXTENSION       sph               wav

$CFG_WAVFILE_TYPE            nist/mswav/raw    raw

$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd

$CFG_FINAL_NUM_DENSITIES     8                 1

$CFG_STATESPERHMM            6                 3

$CFG_N_TIED_STATES           100               100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes, the project folder setup is complete.

2.2.4 Training the Acoustic Model

In the training process, we first have to convert our collected raw speech audio data into mfc (feature) files. To do this we again opened the project directory in the Linux terminal and ran the feature extraction command. The command we executed for this task is [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. For this task we stay in the project directory in the terminal and execute the following command:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The trained model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. After downloading it, we go to the root directory where we installed sphinxbase and sphinxtrain and extract the downloaded pocketsphinx archive there. After extracting, we install the software with the following commands in the Linux terminal:

./configure

make

After installing the software, we go into our project folder and execute a command from the terminal to create the folder structure for decoding. The command for this task is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because pocketsphinx recognizes from audio files as input. We also have to copy the test fileids and transcription files into the "etc" folder. To decode (test) our model from audio with pocketsphinx, we first need feature files for the audio files. We can create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding or testing in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of the test can be found in the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We use the Sphinx4 Java library to test our trained model on the Windows 7 operating system. We need two pieces of software to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on the Windows machine. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the files listed below from our training project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

dic

lm.DMP

filler

We create a config.xml file in the Eclipse project to give the recognizer its configuration and tell it where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project, we run it and can test with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room, a common room, etc. We completed our testing using audio input from six test speakers. For the live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All experiments use the trained data set.

Experiment      Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error     Accuracy

Experiment 01   5 (4/1)                  1025          975       98       95.12%            9.56%     90.44%

Experiment 02   3 (3/0)                  615           591       39       96.10%            6.34%     93.66%

Experiment 03   3 (3/0)                  615           601       22       97.72%            3.58%     96.42%

Experiment 04   5 (2/3)                  1025          938       184      91.51%            17.95%    82.05%

Experiment 05   3 (0/3)                  615           545       125      88.62%            20.33%    79.67%

Table 3.2.1: Experimental details with Results for Pocket Sphinx
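The three reported figures appear to be related as follows: percent correct is correct words over total words, error is errors over total words, and accuracy is 100 minus the error. The sketch below reproduces the Experiment 01 figures from its raw counts (1025 words, 975 correct, 98 errors). The function name is hypothetical, and the reading that the error count includes insertions (which is why it can exceed total words minus correct words) is an assumption, not something stated in this report.

```python
def score(total_words, correct, errors):
    """Return (percent_correct, error_rate, accuracy), each rounded to 2 decimals.

    `errors` is assumed to count substitutions, deletions and insertions,
    so it may be larger than total_words - correct.
    """
    percent_correct = 100.0 * correct / total_words
    error_rate = 100.0 * errors / total_words
    accuracy = 100.0 - error_rate
    return round(percent_correct, 2), round(error_rate, 2), round(accuracy, 2)
```

Calling `score(1025, 975, 98)` gives `(95.12, 9.56, 90.44)`, matching the first row of the table; the same function also reproduces the other four rows.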

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device is a microphone in all experiments.

Experiment      User Type   Speakers   Environment         Speaker Type   Words   Correct   Errors   Percent Correct   Error   Accuracy

Experiment 01   Trained     2          Closed Room         Male           156     154       2        98%               2%      98%

Experiment 02   Untrained   3          Lab Room            Male           130     120       10       90%               10%     90%

Experiment 03   Untrained   3          University Campus   Male           140     122       18       82%               18%     82%

Experiment 04   Untrained   2          University Campus   Female         108     95        13       87%               13%     87%

Experiment 05   Trained     3          University floor    Female         126     115       11       89%               11%     89%

Experiment 06   Trained     2          Closed Room         Male           128     127       1        99%               1%      99%

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All experiments use the trained data set.

Experiment      Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error     Accuracy

Experiment 01   3 (3/0)                  210           194       16       85.71%            15.71%    85.71%

Experiment 02   3 (2/1)                  210           188       22       77.14%            22%       77.14%

Experiment 03   3 (2/1)                  210           182       28       72.38%            28%       72.38%

Experiment 04   3 (2/1)                  210           194       16       84.44%            16%       84.44%

Experiment 05   3 (1/2)                  210           184       26       74.76%            26%       74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

For training the acoustic model we need a file called dic. This file holds all training words and their pronunciations. We wrote these pronunciations by hand several times while training acoustic models experimentally. Over time we began to think about an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we built a database in which all phonemes and their corresponding letters are stored. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and phonemes are not defined for every letter. For that reason we defined some phonemes ourselves for certain consonants and conjunct letters. After building this database, we wrote the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2: Dictionary files with phonetic translation

4.1.3 Training File Creator

For training the acoustic model we also need the fileids and transcription files. These two files contain the training audio file paths and their corresponding sentences. Before creating this program we had to write these two files manually, but with our software we can generate these two large files automatically within moments. For example, if we have 8000 audio files, we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create these two files automatically by giving the software the root folder name of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
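The training file creator itself is not listed in this report; the sketch below only illustrates the idea under stated assumptions. It assumes utterance ids of the form speaker id, underscore, sentence id (as in 01_01), a transcription line layout of `<s> sentence </s> (id)`, and a single root folder prefix for the fileids entries. The function name and the romanized placeholder sentences in the usage note are hypothetical.

```python
def build_training_files(audio_ids, sentences, root):
    """Return (fileids_lines, transcription_lines) for the given utterances.

    audio_ids: utterance ids such as '01_01' (speaker id, then sentence id).
    sentences: dict mapping a sentence id such as '01' to its text.
    root: folder prefix written in front of each id in the fileids file.
    """
    fileids, transcription = [], []
    for audio_id in audio_ids:
        sentence_id = audio_id.split("_")[1]
        fileids.append(f"{root}/{audio_id}")
        transcription.append(f"<s> {sentences[sentence_id]} </s> ({audio_id})")
    return fileids, transcription
```

For instance, `build_training_files(["01_01"], {"01": "ami bhat khai"}, "speaker1")` returns `(["speaker1/01_01"], ["<s> ami bhat khai </s> (01_01)"])`. Generating both lists from the same loop keeps the two files in the same order, which the trainer relies on.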

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click, right click, etc.

Chapter 5

LIMITATION & FUTURE WORK

LIMITATION

FUTURE WORK

Limitation

Our project has limitations in several specific areas. The system was built on a small data set because of time constraints. We selected the domain of university admission information for newly admitted students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results.

We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined: we do not know the exact number of phonemes in Bengali, as different researchers report different numbers.

The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and it sometimes fails when the speaker speaks slowly or mispronounces words.

We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciation output, we can expect good output from this software once we have an accurate phoneme inventory.
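The core idea of the pronunciation program, looking up a phone label for each letter, can be sketched as follows. This is a simplified illustration and not the project's generator: the class `PhoneLookupSketch` is hypothetical, only ten consonants from the Unicode to IPA chart in the appendix are included, and vowels, conjuncts and the inherent vowel are ignored. The letters are written as Unicode escapes so that the source file does not depend on the platform encoding; whether this relates to the encoding problem mentioned above is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-letter lookup using a few labels from the appendix chart. */
final class PhoneLookupSketch {
    private static final Map<Character, String> PHONES = new HashMap<>();

    static {
        PHONES.put('\u0995', "K");   // Bengali letter KA
        PHONES.put('\u0996', "KH");  // Bengali letter KHA
        PHONES.put('\u0997', "G");   // Bengali letter GA
        PHONES.put('\u09A8', "N");   // Bengali letter NA
        PHONES.put('\u09AC', "B");   // Bengali letter BA
        PHONES.put('\u09AE', "M");   // Bengali letter MA
        PHONES.put('\u09B0', "R");   // Bengali letter RA
        PHONES.put('\u09B2', "L");   // Bengali letter LA
        PHONES.put('\u09B8', "S");   // Bengali letter SA
        PHONES.put('\u09B9', "H");   // Bengali letter HA
    }

    /** Returns space-separated phone labels; letters without an entry are skipped. */
    static String toPhones(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phone = PHONES.get(word.charAt(i));
            if (phone == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phone);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // KA + LA + MA
        System.out.println(toPhones("\u0995\u09B2\u09AE"));
    }
}
```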

Future Works

We have implemented Bengali speech recognition for a small data set. In future, we will increase the data size to create a complete model, and we plan to improve recognition accuracy and enlarge the vocabulary. We have also developed software for making the dictionary file, the fileids file and the transcription file. We want to build a user-friendly, stand-alone GUI application for writing in Bengali, and we intend to develop a complete desktop command application for Bengali.

Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate. With this automatic trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into document applications so that a Bengali sentence can be written just by uttering it.

We want to build a voice response application, in which a user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.


Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools for recognizing Bengali speech. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database, audio from many speakers and low-noise recordings. No speech recognizer is yet 100% accurate. However, if we can meet the requirements of a good recognizer and train our system more accurately, the results should be sufficient to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 71 Speaker Profiles


Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9


Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 72 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 73 Corpus about University Admission Information


CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates every file seen so far, so k keeps counting
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // literal replace: a regex "." would match any character
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    static int si = 0;

    // Adds sub-folder names (levels 1 and 2) or file names (level 3) to the matching list
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a backslash
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at the given path
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "name12" by their numeric part
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of each name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // the name prefix is taken from the first element only
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
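The same ordering can also be expressed with a comparator. The following is a hedged alternative sketch, not the project code: `NumericSuffixSortSketch` and the sample folder names are hypothetical. It sorts by the digits found in each name but returns the original strings unchanged, so prefixes, underscores and zero padding are preserved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical alternative: sort names by the number they contain. */
final class NumericSuffixSortSketch {

    /** Extracts all digits of a name as one number; names without digits count as 0. */
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a new list ordered by the embedded number; the input is not modified. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSortSketch::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        // Assumed folder names, for illustration only
        System.out.println(sortByNumber(List.of("speaker_10", "speaker_2", "speaker_1")));
    }
}
```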

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads every line of the sentence corpus into inputLines
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file; sentences restart from the
    // first one when the corpus is shorter than the list of audio files
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the output file
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter is stored with an id between 13 and 48,
    // the range this class treats as consonants (banjon borno)
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal was lost in extraction; the hasanta (U+09CD), which joins
    // the letters of a conjunct, is assumed here
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        // record the positions before, at and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition:");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that are not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace = " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) = " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                } // conjuct adding end
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the pronunciation of every letter or conjunct in turn
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

Bengali Speech Recognition

88

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

// Voice command: "sokrio" (activate)
public void enter() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ENTER);
    robot.keyRelease(KeyEvent.VK_ENTER);
}

// Voice commands: "porer window" (next window) | "ager window" (previous window)
public void altTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_TAB);
}

// Voice commands: "porer tab" (next tab) | "ager tab" (previous tab)
public void ctlTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_TAB);
}

// Starts Notepad as a separate process
public void openNotepad() throws AWTException, IOException {
    ProcessBuilder proc = new ProcessBuilder("notepad.exe");
    Process p = proc.start();
}

// Opens the URL through the Windows URL protocol handler
public void openBrowser() throws AWTException, IOException {
    String theUrl = "http://www.google.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openFacebook() throws AWTException, IOException {
    String theUrl = "http://www.facebook.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openYahoo() throws AWTException, IOException {
    String theUrl = "http://www.yahoo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}


public void openTechtunes() throws AWTException, IOException {
    String theUrl = "http://www.techtunes.com.bd";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openProthomAlo() throws AWTException, IOException {
    String theUrl = "http://www.prothom-alo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" + "\n ");
    }

    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


DECLARATION

We hereby declare that the project work entitled "Bengali Speech Recognition", submitted to Leading University, is a record of original work done by us under the guidance of Arpita Chakraborty, Assistant Professor in the Department of Computer Science and Engineering, Leading University, and that this project work is submitted in fulfillment of the requirements for the degree of Bachelor in Computer Science & Engineering. The results of this project have not been submitted to any other university or institute for the award of any degree or diploma. Material drawn from the work of other researchers is acknowledged by reference.

Signature of Supervisor & Co-supervisor

Name of Supervisor                    Signature

Mrs Arpita Chakraborty
Assistant Professor

Name of Co-supervisor                 Signature

Mrinal Kanti Dhar
Lecturer

Signature of Authors

Name of Authors                       Signature

Md Badrul Alom Chowdhury

Sanjoy Ranjan Das

Shimul Dey


ACKNOWLEDGEMENT

We would like to thank our honorable supervisor, Arpita Chakraborty, and co-supervisor, Mrinal Kanti Dhar, for their guidance throughout the process. They exposed us to the real world of professional research through their valuable experience, and we cherish the time spent working with them on such an interesting topic. We also thank the students of our university who let us record their voices for the experiments, and our Computer Science & Engineering Department for giving us the authority and facilities to complete the project. Last but not least, thanks to the Almighty for helping us at every step of this project work.


Table of Contents

Declaration
Acknowledgement
List of Figures
List of Charts
List of Tables
List of Abbreviations & Symbols
Abstract
Literature Survey

Chapter 1 Introduction
1.1 Introduction
1.2 History of Speech Recognition
1.3 Types of Speech Recognition
1.3.1 Isolated Words
1.3.2 Connected Words
1.3.3 Continuous Words
1.3.4 Spontaneous Words
1.3.5 Speaker Dependent
1.3.6 Speaker Independent
1.3.7 Overview of Speech Recognition System
1.4 Terms and Concepts
1.4.1 Utterance
1.4.2 Pronunciation
1.4.3 Grammars
1.4.4 Vocabularies
1.4.5 Training
1.4.6 Accuracy
1.4.7 Language Dictionary
1.4.8 Filler Dictionary
1.4.9 Phone
1.4.10 HMM
1.4.11 Language Model
1.5 Overview of the Full System

Chapter 2 METHODOLOGY
2.1 Data Preparation
2.1.1 Corpus
2.1.2 Audio Files
2.1.3 Dictionary Files
2.1.4 Phone File
2.1.5 Language Model File (lm Format)
2.1.6 Language Model File (DMP Format)
2.1.7 Transcription File
2.1.8 Fileids File
2.1.9 Filler File
2.2 Setting up the System Environment
2.2.1 Software Requirements
2.2.2 Trainer Setup
2.2.3 Project Folder Setup
2.2.4 Training the Acoustic Model
2.2.5 Testing Part
2.2.5.1 Testing with Pocket Sphinx
2.2.5.2 Testing with Sphinx4

Chapter 3 TESTING AND PERFORMANCE EVALUATION
3.1 Testing & Performance Evaluation
3.2 Test Results with Pocket Sphinx
3.3 Test Results with Sphinx4
3.3.1 Input Type: Microphone
3.3.2 Input Type: Audio

Chapter 4 Applications & Developing
4.1 Review of Some Developed Recognized Application
4.1.1 Dictation Application
4.1.2 Phonetic Translator
4.1.3 Training File Creator
4.1.4 Training File Creator

Chapter 5 Limitation & Future Work
5.1 Limitation
5.2 Future Work

Chapter 6 CONCLUSION & REFERENCES
6.1 Conclusion
6.2 References


List of Figures

Fig. No.    Name of figure
1.3.7       Overview of Speech Recognition System
1.4.10      Applying Hidden Markov Model on Speech Recognition
1.5         Overview of the Full System Model
2.1.2       Audio File Recording Format
2.2.5.1     Testing with Pocket Sphinx
2.2.5.2     Testing with Sphinx4
4.1.2       Dictionary files with phonetic translation
4.1.3.1     Fileids files with phonetic translation
4.1.4.2     Transcription File

List of Charts

Chart No.   Name of chart
3.2.2       Experiment Results with Pocket Sphinx
3.3.1.2     Experimental Details with Results for Sphinx 4 Live
3.3.2.2     Experimental Details with Results for Sphinx 4 Audio


List of Tables

Table No.   Name of table
1.2         History of Speech Recognition
2.2.3       Configuration of Sphinx-train.cfg
3.2.1       Experimental Details with Results for Pocket Sphinx
3.3.1.1     Test Results with Sphinx4, Input Type: Microphone
3.3.2.1     Test Results with Sphinx4, Input Type: Audio
7.1         Speaker Profiles
7.2         Unicode to IPA Chart
7.3         Corpus About University Admission Information

List of Abbreviations & Symbols

ASR      Automatic Speech Recognition
BSD      Berkeley Software Distribution
CMU      Carnegie Mellon University
HMM      Hidden Markov Model
IPA      International Phonetic Alphabet
PCA      Principal Component Analysis
ASCII    American Standard Code for Information Interchange
MERL     Mitsubishi Electric Research Labs
CRBLP    Center for Research on Bangla Language Processing
D2P      Dictionary to Pronunciation
SBSBSR   Sanjoy Bappy Shimul Bengali Speech Recognition
IDE      Integrated Development Environment
ABI      Allied Business Intelligence


ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that identifies human speech with the help of the CMU Sphinx speech recognition API.


Literature Survey

Today, speech technology plays an important role in many applications and has moved from research to commercial use. Many human-machine interfaces have been invented and are applied today in telephone food-ordering systems, airport information systems, ticketing systems, restaurant reservation systems and so on. For this reason we selected this field for our project. In addition, many languages have a speech recognition system, but our mother tongue, Bangla, has no proper one; this is the main reason for selecting this topic. In the early era most research was done using Artificial Neural Networks (ANN), but because we use an HMM-based technique, some HMM-based and related research is described below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report over one week and acquired a lot of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the thesis so clearly; it is very helpful for new students who want to work in this field [9].

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned about the types of speech recognition, approaches to speech recognition and related topics [1].

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC): We studied past work, and to the best of our knowledge this is the first reported attempt to recognize Bangla speech using the HMM technique. From this publication we took most of our guidance on the steps for building a speech recognition system. We learned how to improve the quality of the input audio signal through noise elimination and an end-point detection algorithm, how the features of a sound are extracted, which parameters are stored in the feature files, and the algorithm for creating HMM models [8].

Bengali segmented automated speech recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan): From these papers we learned the techniques used to recognize letters, vowels and consonants. They showed us the basic steps towards a recognizer and the common steps for building a fully functioning recognizer [7].


Chapter 1

INTRODUCTION

Introduction
History of Speech Recognition
Types of Speech Recognition
Terms and Concepts


1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. It is a broad term: it is also known as computer speech recognition, meaning that the computer understands a voice and performs the required task, and ideally it can recognize almost anybody's speech. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as "mouse up", control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying "Music". For this reason many scientists and researchers are working on speech recognition.

Many languages in the world have speech recognizers of their own, but our mother tongue, Bengali, does not. A small amount of research has been carried out on Bengali speech recognition, but without great results. Implementing a continuous speech recognizer for Bengali is the main goal of this project. Developing a full-blown continuous speech recognizer, however, is a huge task for a short span of time. We therefore chose a domain-based continuous speech recognizer covering a conversation about the university admission process. Throughout the work we tried out different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open source software and offers high accuracy. Many high-quality, widely used software packages are available for this work, but they are costly, whereas CMU Sphinx is distributed under a Berkeley Software Distribution (BSD) style license. The ultimate goal of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

Although AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to perceive subtle and complex verbal input accurately and consistently. In the 1960s researchers therefore turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that used discrete speech, that is, verbal stimuli punctuated by small pauses. In the 1970s work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

According to industry figures, speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, more and more companies can be expected to move toward speech recognition, and the industry to take its place as a leader in the technology sector [1].

Year   Event
1936   AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970   The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971   DARPA established.
1982   Dragon Systems was founded.
1984   SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995   Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997   Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998   Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000   Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003   ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2 History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. The classes are as follows [1].

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (a lack of audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing the processing during the pauses). "Isolated utterance" might be a better name for this class. Put simply, isolated words are single words such as "me", "you" or "go".

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterances) are similar to isolated words but allow separate utterances to be run together with a minimal pause between them, for example "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his or her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his or her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig. 1.3.7 Overview of Speech Recognition System

1.4 Terms and Concepts

The following basic terms and concepts are fundamental to speech recognition, and it is important to have a good understanding of them [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word" and "I'd like to run Microsoft Word" are all possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input, in other words a lack of silence, the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
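The way silence delimits an utterance can be illustrated with a minimal Java sketch. It is not the endpointing mechanism used inside Sphinx; the class name UtteranceDetector, the per-frame energy input, the threshold and the minSilence parameter are all assumptions made for illustration. An utterance starts at the first frame whose energy exceeds the threshold and ends once a given number of consecutive silent frames has been seen.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: marks utterances as runs of non-silent frames. */
final class UtteranceDetector {

    /**
     * @param frameEnergies one energy value per audio frame (assumed input)
     * @param threshold     energy above this value counts as speech (assumed)
     * @param minSilence    consecutive silent frames that close an utterance
     * @return list of {startFrame, endFrameExclusive} pairs
     */
    static List<int[]> detect(double[] frameEnergies, double threshold, int minSilence) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1;     // -1 means we are currently in silence
        int silentRun = 0;  // consecutive silent frames inside an utterance
        for (int i = 0; i < frameEnergies.length; i++) {
            boolean speech = frameEnergies[i] > threshold;
            if (start < 0) {
                if (speech) {          // lack of silence: utterance begins
                    start = i;
                    silentRun = 0;
                }
            } else if (speech) {
                silentRun = 0;         // a short pause did not end the utterance
            } else if (++silentRun >= minSilence) {
                // enough silence: the utterance ended where the silence began
                utterances.add(new int[] {start, i - silentRun + 1});
                start = -1;
            }
        }
        if (start >= 0) {              // audio ended while still inside an utterance
            utterances.add(new int[] {start, frameEnergies.length - silentRun});
        }
        return utterances;
    }
}
```

With minSilence set to 2, a single silent frame between two bursts of speech is treated as a pause inside one utterance, which mirrors the "certain amount of silence" condition described above.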

1.4.2 Pronunciation

The word pronunciation comes up whenever a language is being learned. What is pronunciation, and what are its fundamental aspects? In any language, pronunciation pertains to the sounds that are produced to make meaning. Some aspects of speech go beyond the individual sounds and make a language unique: phrasing, stress, intonation, timing and rhythm. The voice is then projected to communicate what the speaker wants to say. Together with cultural nuances, gestures and local expressions, the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice, because it leads to social isolation. It may not seem fair, but people can be judged by the way they speak and be seen as uneducated, incompetent or lacking in knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the engine uses to process a word is its pronunciation, which represents what the engine thinks the word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly, so speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech recognition (SR) system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; an entry can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes" and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no" and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
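One common way of turning accuracy into a number is to count word-level errors between what was said and what the engine returned. The following Java sketch shows this under stated assumptions: the class name WordAccuracy is hypothetical, the errors are counted with a standard word-level edit distance (substitutions, deletions and insertions), and the formula 1 - errors / N is one of the "several ways" mentioned above rather than necessarily the measure reported later in this report.

```java
/** Hypothetical sketch: word-level accuracy from an edit distance. */
final class WordAccuracy {

    /** Minimum number of word substitutions, deletions and insertions. */
    static int wordErrors(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // delete all reference words
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insert all hypothesis words
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitute = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** Accuracy as 1 - errors / (number of reference words). */
    static double accuracy(String reference, String hypothesis) {
        int n = reference.trim().split("\\s+").length;
        return 1.0 - (double) wordErrors(reference, hypothesis) / n;
    }
}
```

Note that this measure is purely textual: in the "nope" versus "no" example it would count one error, even though the dialog succeeded. Which notion of accuracy is appropriate depends on whether the application cares about the exact text or about the action taken.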

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines in which there is a state transition for each observation in the sequence and an output symbol emission for each state. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after the voice is recorded it is divided into many frames, which must be processed to generate the sentence in text form. Each frame is represented as a state, a group of states represents a phoneme, and a group of phonemes represents a word to be recognized. In a database known as the linguist model we store reference values for states, phonemes and words, to be compared with the observed data (voice). By applying the HMM we construct a statistical model for each phone, whose states are assigned probabilities by comparison with the reference values. The probability of each state depends on the state itself and the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is complicated we do not go into detail here; more can be found in Appendix A.
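Finding the state sequence with the maximum probability is commonly done with the Viterbi algorithm. The Java sketch below illustrates the idea on a discrete HMM; it is not the decoder used by Sphinx, and the class name ViterbiSketch, the array layout and any probabilities passed to it are assumptions for illustration. Here init[s] is the probability of starting in state s, trans[p][s] the probability of moving from state p to state s, and emit[s][o] the probability of observing symbol o in state s.

```java
/** Hypothetical sketch: Viterbi search for the most probable state sequence. */
final class ViterbiSketch {

    static int[] bestPath(double[] init, double[][] trans, double[][] emit, int[] obs) {
        int n = init.length;
        int steps = obs.length;
        double[][] score = new double[steps][n]; // best path probability ending in each state
        int[][] back = new int[steps][n];        // predecessor state on that best path

        for (int s = 0; s < n; s++) {
            score[0][s] = init[s] * emit[s][obs[0]];
        }
        for (int t = 1; t < steps; t++) {
            for (int s = 0; s < n; s++) {
                int best = 0;
                for (int p = 1; p < n; p++) {
                    if (score[t - 1][p] * trans[p][s] > score[t - 1][best] * trans[best][s]) {
                        best = p;
                    }
                }
                score[t][s] = score[t - 1][best] * trans[best][s] * emit[s][obs[t]];
                back[t][s] = best;
            }
        }

        // pick the most probable final state, then follow the back pointers
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[steps - 1][s] > score[steps - 1][last]) last = s;
        }
        int[] path = new int[steps];
        path[steps - 1] = last;
        for (int t = steps - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

The sketch multiplies raw probabilities, which is adequate for a few observations; for the long frame sequences of real speech, implementations usually add log probabilities instead to avoid numerical underflow.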


Fig. 1.4.10 Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty assigned when a sequence or collection of words is seen. A language model is used to restrict the word search: it defines which words can follow previously recognized words and significantly restricts the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
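The "statistics of word sequences" in an n-gram model can be illustrated for n = 2 with a minimal Java sketch. This is not how the language model files of this project were built; the class name BigramModel, the sentence markers and the unsmoothed maximum-likelihood estimate are assumptions chosen for brevity. The model counts how often each word follows another and estimates P(next | prev) as count(prev next) / count(prev).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: unsmoothed bigram language model. */
final class BigramModel {
    private final Map<String, Integer> contextCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Adds one sentence, wrapped in start and end markers. */
    void addSentence(String sentence) {
        String[] w = ("<s> " + sentence.trim() + " </s>").split("\\s+");
        for (int i = 0; i + 1 < w.length; i++) {
            contextCounts.merge(w[i], 1, Integer::sum);               // word seen as a predecessor
            bigramCounts.merge(w[i] + " " + w[i + 1], 1, Integer::sum); // word pair seen
        }
    }

    /** Maximum-likelihood estimate of P(next | prev); 0 for unseen contexts. */
    double probability(String prev, String next) {
        int context = contextCounts.getOrDefault(prev, 0);
        if (context == 0) {
            return 0.0;
        }
        return (double) bigramCounts.getOrDefault(prev + " " + next, 0) / context;
    }
}
```

A word pair with probability 0 is exactly the kind of candidate that the recognizer can strip from the search. Practical models add smoothing so that unseen but valid word sequences are penalized rather than ruled out entirely.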


1.5 Overview of the Full System

Fig. 1.5 Overview of the Full System Model


Chapter 2

METHODOLOGY

Data Preparation
Setting up the System Environment


2.1 Data Preparation

Several files are required for training and for testing. As already mentioned, our project is a domain-based recognition application, where domain-based means a particular topic covering a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

• Corpus
• Audio files
• Dictionary file
• Phone file
• Language Model file (lm format)
• Language Model file (DMP format)
• Transcription file
• Fileids file
• Filler file

2.1.1 Corpus

The corpus is simply a list of sentences used to train the language model; in other words, it is the collection of sentences that we want our machine to recognize. For our project we collected important sentences from our domain. Some sentences from our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio files of the corpus in (.wav) or (.sph) format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit rate (bits per sample): 16
• Channel: mono (single channel)

Fig. 2.1.2 Audio File Recording Format

A 16 kHz sample rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample lets each sample take one of 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor and saved the sound files in wav format, naming each wav file by speaker ID and sentence ID. For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID: 01, Sentence ID: 01
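The recording parameters and the file naming scheme can be expressed in a short Java sketch using the standard javax.sound.sampled.AudioFormat class. The class name RecordingFormat and its helper methods are hypothetical, and the choice of signed, little-endian PCM is an assumption (it is the usual layout of wav files, but it is not stated above).

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical sketch: the recording format and file naming used for the corpus. */
final class RecordingFormat {

    /** 16 kHz, 16 bits per sample, mono; signed little-endian PCM is assumed. */
    static final AudioFormat TARGET = new AudioFormat(16000f, 16, 1, true, false);

    /** Checks the three parameters that were kept fixed during recording. */
    static boolean matches(AudioFormat format) {
        return format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    /** Number of distinct values a sample of the given bit depth can take. */
    static long quantizationLevels(int bitsPerSample) {
        return 1L << bitsPerSample;
    }

    /** Splits a name such as "01_01.wav" into {speakerId, sentenceId}. */
    static String[] parseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        return base.split("_", 2);
    }
}
```

For 16 bits, quantizationLevels gives 2^16 = 65,536, which is the figure quoted above. A check like matches could be run over every recorded file before training, since a file with a different sample rate would otherwise be read incorrectly.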

When we collected audio from a speaker, we saved personal information about the speaker, such as:

• Name, age, gender and the environment in which the audio was collected

Some other information was also noted down:

• Environmental conditions of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or generator)
• Technical details of the device (PC, microphone)
• Date and time of recording

2.1.3 Dictionary file

The dictionary file is simply the list of words taken from our corpus file, together with the pronunciation of each word, such as AA P N AA K E. For this work we need software that produces a pronunciation file from a dictionary file. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. As a result we developed our own software, D2P (Dictionary to Pronunciation), which generates the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E
AA P N AA K E
AA P N I
AA M I
I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (.dic)

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
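The rules in the note can be checked automatically. The following is a minimal sketch, assuming each dictionary line consists of a word followed by its phonemes separated by whitespace; the class DictionaryCheck is hypothetical, and the romanized words used in the usage example are stand-ins for Bengali words.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical validator for the dictionary rules listed above.
class DictionaryCheck {
    /** Returns a description of every problem found (empty list if none). */
    static List<String> validate(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> seenWords = new HashSet<>();
        for (int n = 0; n < lines.size(); n++) {
            String line = lines.get(n).trim();
            if (line.isEmpty()) {
                continue; // the trailing blank line is allowed
            }
            String[] fields = line.split("\\s+");
            if (fields.length < 2) {
                problems.add("line " + (n + 1) + ": no pronunciation");
                continue;
            }
            // Rule: a word cannot be repeated.
            if (!seenWords.add(fields[0])) {
                problems.add("line " + (n + 1) + ": repeated word " + fields[0]);
            }
            // Rule: all phonemes are in capital letters.
            for (int i = 1; i < fields.length; i++) {
                if (!fields[i].matches("[A-Z]+")) {
                    problems.add("line " + (n + 1) + ": phoneme not in capitals: " + fields[i]);
                }
            }
        }
        return problems;
    }
}
```

For example, validate(Arrays.asList("ami AA M I", "ami AA M I")) reports one problem, the repeated word on line 2.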

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some of its contents are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is (.phone)

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

All phonemes in the dic file are present in the phone file, without repetition
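The generation step described above can be sketched as follows. This is an illustration under the assumption that each dic line is a word followed by whitespace-separated phonemes; it is not the actual program written for the project, and the class name PhoneListBuilder is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: derives the phone list from dictionary lines.
class PhoneListBuilder {
    /** Returns every phoneme used in the dictionary once, sorted, plus SIL. */
    static List<String> buildPhoneList(List<String> dicLines) {
        TreeSet<String> phones = new TreeSet<>(); // sorted, no duplicates
        phones.add("SIL");                        // silence phone is always included
        for (String line : dicLines) {
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the word; the remaining fields are its phonemes.
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i]);
            }
        }
        return new ArrayList<>(phones);
    }
}
```

Writing the returned list to sbsbsr.phone, one entry per line and followed by a blank line, would give a file in the format described in the note.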

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text. Its format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their log likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtool is used. lmtool is a web-based tool that lets users quickly compile the text-based components needed for an ASR decoder. It requires a corpus, which in this case means a set of sentences (or more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line, although the newer version also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, because the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is (.lm). Some contents of the language model file for our project are:

-2.0719 -0.2626

-2.0719 -0.2861

-2.0719 -0.2973

-1.7709 -0.2936

-2.0719 -0.2626

-1.7709 এ -0.2861

-2.0719 -0.2626

-2.0719 ও -0.2973

…
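To make the fields of such an entry concrete, the sketch below parses one line into its log probability, its word or words, and its back-off weight. It assumes the line carries a back-off field as in the listing above, and that the logarithms are base 10, as is usual for ARPA files. The class ArpaEntry is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: one n-gram entry of an ARPA-format language model.
class ArpaEntry {
    final double logProb; // log10 probability (first field)
    final String ngram;   // the word or words of the n-gram
    final double backoff; // log10 back-off weight (last field)

    ArpaEntry(double logProb, String ngram, double backoff) {
        this.logProb = logProb;
        this.ngram = ngram;
        this.backoff = backoff;
    }

    /** Parses "logprob word [word ...] backoff". */
    static ArpaEntry parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected: logprob words... backoff");
        }
        double logProb = Double.parseDouble(fields[0]);
        double backoff = Double.parseDouble(fields[fields.length - 1]);
        String ngram = String.join(" ", Arrays.copyOfRange(fields, 1, fields.length - 1));
        return new ArpaEntry(logProb, ngram, backoff);
    }

    /** Converts the log10 value back to a plain probability. */
    double probability() {
        return Math.pow(10.0, logProb);
    }
}
```

Under the base-10 assumption, an entry with log probability -1.7709 corresponds to a probability of roughly 0.017.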

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used a Linux environment to convert the language model file from lm format to DMP format, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project the command is

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in each audio file. In this file the words of the speaker are noted exactly as they were recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named after the project. Here the project name is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)

<s> </s> (01_02)

<s> </s> (01_03)

<s> </s> (01_04)

…

Note:

File format is (.transcription)

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
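The line format described above can be produced by a few lines of code. The sketch below assumes that sentence ids are numbered from 01 in corpus order and written with two digits, as in the file IDs shown; the class TranscriptBuilder is hypothetical and is not the training file creator of our project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: formats transcription lines for one speaker.
class TranscriptBuilder {
    /** Wraps a sentence in <s> ... </s> and appends the file ID in parentheses. */
    static String line(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    /** Builds one line per sentence, with file IDs such as 01_01, 01_02, ... */
    static List<String> forSpeaker(String speakerId, List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (int n = 0; n < sentences.size(); n++) {
            String fileId = String.format("%s_%02d", speakerId, n + 1);
            lines.add(line(sentences.get(n), fileId));
        }
        return lines;
    }
}
```

The sentences passed in would be the lines of the corpus file, in the same order in which the audio files were recorded.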

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the wav or sph extension. There are two fileids files, one for training and one for testing. The training file for our project is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01

sbsbsr_train/sanjoy/sanjoy1/01_02

sbsbsr_train/sanjoy/sanjoy1/01_03

…
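Turning an audio file path into a fileids entry only requires removing the extension. The sketch below also normalizes Windows-style separators to forward slashes, which is an assumption on our part about how the paths are written; the class FileIds is hypothetical.

```java
// Hypothetical sketch: converts a relative audio path into a fileids entry.
class FileIds {
    /** Drops the file extension and uses '/' as the path separator. */
    static String toFileId(String relativePath) {
        String path = relativePath.replace('\\', '/');
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        // Only strip a dot that belongs to the file name, not to a folder name.
        return dot > slash ? path.substring(0, dot) : path;
    }
}
```

The same method works for both wav and sph files, since it removes whatever extension follows the last dot in the file name.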

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL

<sil> SIL

</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and must be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages, SphinxBase and SphinxTrain, which we collected from the CMU Sphinx web site. Before installing them, some dependencies must be installed on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading them we decompressed them into a folder in Linux. It can be any folder; we used the Linux root folder. After decompressing, the packages can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd Sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which SphinxTrain creates the trained files, i.e. the acoustic model. First we entered the root directory where the installed sphinxbase and sphinxtrain folders are placed, and created a folder there named "sbsbsr". We then opened a terminal, changed to the sbsbsr folder, and created a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav", "model_parameters", etc.

Next, the files created in the data preparation part are copied in: the dic, filler, phone, transcript, fileids, lm and lm.DMP files go into the "etc" folder, and the collected audio goes into the "wav" folder. Some training parameters must then be changed in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. We changed the following parameters in sphinx_train.cfg:

Parameter                   Value Before      Value After
$CFG_WAVFILE_EXTENSION      sph               wav
$CFG_WAVFILE_TYPE           nist/mswav/raw    raw
$CFG_FEATURE                2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                 1
$CFG_STATESPERHMM           6                 3
$CFG_N_TIED_STATES          100               100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio data into mfc (feature) files. To do so we again opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

We are now ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. We then go to the root directory where sphinxbase and sphinxtrain are installed and extract the downloaded Pocketsphinx archive there. After extracting, we install the software with the following commands in the Linux terminal:

./configure

make

After installing the software we go into our project folder and execute a command from the terminal to create the folder structure for decoding:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx takes audio files as input. We also copy the test fileids and transcript files into the "etc" folder. To decode (test) audio with Pocketsphinx we first need feature files for the audio files. We create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding or testing:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of testing can be found in the "result" folder under the project folder. For our model trained on fifty sentences the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 itself and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the following files from our previous (training) project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

dic

lm.DMP

filler

We create a config.xml file in the Eclipse project to pass the configuration to the recognizer and to tell it where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We also add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building it, we run the project and can test it with live voice input from the microphone. For our model trained on ten sentences the result is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built with Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room, the common room, etc. The tests with recorded audio used input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All experiments used the trained data set.

Experiment   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error     Accuracy
01           5 (4/1)                  1025          975       98       95.12%            9.56%     90.44%
02           3 (3/0)                  615           591       39       96.10%            6.34%     93.66%
03           3 (3/0)                  615           601       22       97.72%            3.58%     96.42%
04           5 (2/3)                  1025          938       184      91.51%            17.95%    82.05%
05           3 (0/3)                  615           545       125      88.62%            20.33%    79.67%

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
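The percentages reported for Pocket Sphinx follow from the word counts. The sketch below reproduces them under the assumption that accuracy is 100% minus the error rate, which is consistent with the figures in Table 3.2.1. In several experiments the error count exceeds the number of words that were not recognized correctly, presumably because inserted words are also counted as errors. The class WordScores is hypothetical.

```java
// Hypothetical sketch: recomputes the scores reported for Pocket Sphinx.
class WordScores {
    /** Share of reference words recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correct) {
        return 100.0 * correct / totalWords;
    }

    /** Errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy taken as 100% minus the error rate (an assumption, see above). */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.printf("%.2f %.2f %.2f%n",
                percentCorrect(1025, 975), errorRate(1025, 98), accuracy(1025, 98));
    }
}
```

For Experiment 01 this gives about 95.12% correct, a 9.56% error rate and 90.44% accuracy, matching the first row of the table.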

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device was a microphone in every experiment.

Experiment   User Type   Speakers   Environment          Speaker Type   Words   Correct   Errors   Percent Correct   Error   Accuracy
01           Trained     2          Closed Room          Male           156     154       2        98%               2%      98%
02           Untrained   3          Lab Room             Male           130     120       10       90%               10%     90%
03           Untrained   3          University Campus    Male           140     122       18       82%               18%     82%
04           Untrained   2          University Campus    Female         108     95        13       87%               13%     87%
05           Trained     3          University floor     Female         126     115       11       89%               11%     89%
06           Trained     2          Closed Room          Male           128     127       1        99%               1%      99%

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All experiments used the trained data set.

Experiment   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error     Accuracy
01           3 (3/0)                  210           194       16       85.71%            15.71%    85.71%
02           3 (2/1)                  210           188       22       77.14%            22%       77.14%
03           3 (2/1)                  210           182       28       72.38%            28%       72.38%
04           3 (2/1)                  210           194       16       84.44%            16%       84.44%
05           3 (1/2)                  210           184       26       74.76%            26%       74.76%

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a file called dic, which holds all training words and their pronunciations. We had to write these pronunciations by hand several times while training acoustic models experimentally. Over time we came up with the idea of an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we built a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various thesis papers [1] [9] [10] to build this database. However, the sources define phonemes in different ways, and some letters have no defined phoneme at all. That is why we defined some phonemes ourselves for certain consonants and conjunct letters. After building the database we wrote the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2 Dictionary files with phonetic translation
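The lookup at the heart of such a translator can be sketched as a letter-to-phoneme map. The example below is a deliberate simplification and not our actual software: it holds only a few consonants taken from the Unicode to IPA chart in the appendix, and it ignores vowel signs, inherent vowels and conjunct letters. The class PhoneticTranslatorSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of a letter-to-phoneme lookup.
class PhoneticTranslatorSketch {
    static final Map<Character, String> LETTER_TO_PHONEME = new HashMap<>();
    static {
        // A few consonants from the appendix chart, written as Unicode escapes.
        LETTER_TO_PHONEME.put('\u0995', "K");  // Bengali letter KA
        LETTER_TO_PHONEME.put('\u0996', "KH"); // Bengali letter KHA
        LETTER_TO_PHONEME.put('\u0997', "G");  // Bengali letter GA
        LETTER_TO_PHONEME.put('\u09A8', "N");  // Bengali letter NA
        LETTER_TO_PHONEME.put('\u09AA', "P");  // Bengali letter PA
        LETTER_TO_PHONEME.put('\u09AE', "M");  // Bengali letter MA
        LETTER_TO_PHONEME.put('\u09B2', "L");  // Bengali letter LA
    }

    /** Maps each letter to its phoneme; fails on letters missing from the map. */
    static String translate(String word) {
        List<String> phones = new ArrayList<>();
        for (char letter : word.toCharArray()) {
            String phone = LETTER_TO_PHONEME.get(letter);
            if (phone == null) {
                throw new IllegalArgumentException(
                        "No phoneme defined for U+" + Integer.toHexString(letter).toUpperCase());
            }
            phones.add(phone);
        }
        return String.join(" ", phones);
    }
}
```

Failing loudly on an unmapped letter makes gaps in the database visible, which matters here because not every letter has an agreed phoneme.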

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcript files. These two files contain the training audio file paths and their corresponding sentences. Before creating this program we had to make these two files manually, but with our software we can generate both of these large files automatically within moments. For example, if we have 8000 audio files, then we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. Now we can create these two files automatically by giving the software the root folder name of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks, such as opening My Computer, left click, right click, etc.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific tasks. The system was built on a small data set because of time constraints. We selected the domain of university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect that much audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are also needed to obtain more accurate results. We faced some problems in creating the dictionary file as well, because the Bengali phoneme list has not been established precisely: we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and sometimes fails to recognize them accurately because of slow speech or pronunciation problems. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once we have an accurate set of phonemes.

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have already developed software for making the dictionary, fileids and transcription files. We want to make a user-friendly stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer the user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to it. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected through this website we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. So we tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the various tools we used and the process of speech recognition. Our developed tools are still at a preliminary level. Making a good and complete recognition application requires many improvements, such as a big training database, many speakers, audio with low noise, etc. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007

APPENDICES

Speaker Profile

Speaker ID   Name      Age   Gender   District       Environment   Institution/Other
02           Bappy     23    Male     Sylhet         Closed Room   Leading University
03           Bijoy     23    Male     Moulovibazar   Closed Room   MC College
04           Dola      24    Female   Chittagong     Department    Leading University
05           Falguny   24    Female   Sylhet         Open Space    Leading University
06           Lovely    20    Female   Sylhet         Class Room    Lotifa Shofi Chowdhury Mohila College
07           Mazed     23    Male     Moulovibazar   Closed Room   Leading University
08           Moni      23    Female   Sylhet         Lab           Leading University
09           Pinku     23    Male     Sylhet         Closed Room   Sylhet Govt College
10           Polash    24    Male     Feni           Cafeteria     Leading University
11           Pritom    20    Male     Sylhet         Closed Room   Leading University
12           Razib     23    Male     Sylhet         Cafeteria     Leading University
13           Rumi      22    Female   Sylhet         Lab           Leading University
14           Sanjoy    23    Male     Sylhet         Closed Room   Leading University
15           Shimul    23    Male     Sylhet         Closed Room   Leading University
16           Sumit     22    Male     Shunamgonj     Closed Room   Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শনয 0

এক 1

দই 2

তিন 3

চার 4

পাচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus has 185 sentences in total, but because of limited time we used only 50 of them for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Folder names found at each level of the training tree; level 3 holds the audio file names.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 is never cleared, so k keeps counting across folders.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collects sub-folder names (levels 1 and 2) or file names (level 3) found under path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
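The directory walk above can also be sketched with `java.nio.file`, which removes the three hand-maintained level lists. This is only an illustrative alternative, not the code used in the project: the class name `FileIdsSketch` and the default folder name in `main` are assumptions, and entries are sorted lexicographically rather than by the numeric folder suffix that `SortArrayList` produces.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileIdsSketch {

    /** Returns one fileids entry per .wav file under root, relative to root's parent. */
    static List<String> fileIds(Path root) throws IOException {
        Path absRoot = root.toAbsolutePath().normalize();
        // Entries start with the root folder's own name, as in FileidsCreator.
        Path base = absRoot.getParent() != null ? absRoot.getParent() : absRoot;
        try (Stream<Path> paths = Files.walk(absRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    // Use forward slashes regardless of platform.
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    // Drop the ".wav" extension.
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real training folder as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "sbs_asr_train");
        for (String id : fileIds(root)) {
            System.out.println(id);
        }
    }
}
```

Each printed line has the form `rootFolder/speakerFolder/sessionFolder/fileName`, without the `.wav` extension.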

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as s1, s2, ..., s10 by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of each name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The name prefix is taken from the first element and assumed common to all.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
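The numeric ordering that `sortList` builds by hand can also be expressed with a comparator, which keeps the original names intact instead of rebuilding them from a shared prefix. The following is a minimal sketch under that assumption; the class name `NumericSuffixSort` and the example folder names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {

    // Extracts the digits of a name as an int; names without digits sort first.
    static int numericPart(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the numeric part of each name.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numericPart));
        return sorted;
    }

    public static void main(String[] args) {
        // Prints [speaker_1, speaker_2, speaker_10]
        System.out.println(sortByNumber(Arrays.asList("speaker_10", "speaker_2", "speaker_1")));
    }
}
```

A plain `Collections.sort` would place `speaker_10` before `speaker_2`, which is why a numeric comparison is needed for folders named this way.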

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Corpus sentences, one per line.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated pronunciation dictionary.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its phones.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when ch is a Bangla consonant (rows 13 to 48 of banglatab).
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Hasanta (U+09CD), the sign that joins consonants into a conjunct.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every hasanta.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Append characters while the recorded positions stay adjacent.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the phone for every letter or conjunct and join them with spaces.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Each method simulates one mouse or keyboard action, or launches a program.
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that matches the recognized text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
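The listing above shows only the first branch of `giveCommand`. One way to organize such a mapping from recognized text to desktop actions is a lookup table instead of a chain of `if` statements. The sketch below is an assumption about how this could be structured, not the project's code: `CommandDispatcher` is a hypothetical class, and the phrases are taken from the transliterated comments in `CommandActivator`, whose exact recognizer output is not given in this report. The actions only print a label so that the example runs without `java.awt.Robot`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommandDispatcher {

    // Recognized phrase -> action to run.
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    public void register(String phrase, Runnable action) {
        actions.put(normalize(phrase), action);
    }

    /** Runs the action registered for the recognized text; returns false if none matches. */
    public boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(normalize(recognizedText));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // Trims the text and collapses repeated spaces so minor spacing differences still match.
    private static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // In the real application these would call CommandActivator methods.
        dispatcher.register("sokrio", () -> System.out.println("enter"));
        dispatcher.register("porer window", () -> System.out.println("altTab"));
        dispatcher.register("porer tab", () -> System.out.println("ctlTab"));

        if (!dispatcher.dispatch(" porer  window ")) {
            System.out.println("No command matched");
        }
    }
}
```

With this layout, adding a new voice command is a single `register` call, and unrecognized text is reported instead of being silently ignored.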

ACKNOWLEDGEMENT

We would like to thank our honorable supervisor, Arpita Chakraborty, and co-supervisor, Mrinal Kanti Dhar, for their guidance throughout the process. They exposed us to the real professional research world with their precious experience. We truly cherish the time spent working with them on such an interesting topic. We would also like to thank the students of our university for letting us record their voices for experiments, and our Department of Computer Science & Engineering for giving us the authority and facilities to complete the project. Last but not least, thanks to the Almighty for helping us in every step of this project work.

Table of Contents

Declaration
Acknowledgments
List of Figures
List of Charts
List of Tables
List of Abbreviations & Symbols
Abstract
Literature Survey

Chapter 1 Introduction
1.1 Introduction
1.2 History of Speech Recognition
1.3 Types of Speech Recognition
1.3.1 Isolated Words
1.3.2 Connected Words
1.3.3 Continuous Words
1.3.4 Spontaneous Words
1.3.5 Speaker Dependent
1.3.6 Speaker Independent
1.3.7 Overview of Speech Recognition System
1.4 Terms and Concepts
1.4.1 Utterance
1.4.2 Pronunciation
1.4.3 Grammars
1.4.4 Vocabularies
1.4.5 Training
1.4.6 Accuracy
1.4.7 Language Dictionary
1.4.8 Filler Dictionary
1.4.9 Phone
1.4.10 HMM
1.4.11 Language Model
1.5 Overview of the Full System

Chapter 2 Methodology
2.1 Data Preparation
2.1.1 Corpus
2.1.2 Audio Files
2.1.3 Dictionary Files
2.1.4 Phone File
2.1.5 Language Model File (.lm Format)
2.1.6 Language Model File (.DMP Format)
2.1.7 Transcription File
2.1.8 Fileids File
2.1.9 Filler File
2.2 Setting up the System Environment
2.2.1 Software Requirements
2.2.2 Trainer Setup
2.2.3 Project Folder Setup
2.2.4 Training the Acoustic Model
2.2.5 Testing Part
2.2.5.1 Testing with Pocket Sphinx
2.2.5.2 Testing with Sphinx4

Chapter 3 Testing and Performance Evaluation
3.1 Testing & Performance Evaluation
3.2 Test Results with Pocket Sphinx
3.3 Test Results with Sphinx4
3.3.1 Input Type: Microphone
3.3.2 Input Type: Audio

Chapter 4 Applications & Developing
4.1 Review of Some Developed Recognized Application
4.1.1 Dictation Application
4.1.2 Phonetic Translator
4.1.3 Training File Creator
4.1.4 Training File Creator

Chapter 5 Limitation & Future Work
5.1 Limitation
5.2 Future Work

Chapter 6 Conclusion & References
6.1 Conclusion
6.2 References

List of Figures

Fig. No.   Name of figure
1.3.7      Overview of Speech Recognition System
1.4.10     Applying Hidden Markov Model on Speech Recognition
1.5        Overview of the Full System Model
2.1.2      Audio File Recording Format
2.2.5.1    Testing with Pocket Sphinx
2.2.5.2    Testing with Sphinx4
4.1.2      Dictionary files with phonetic translation
4.1.3.1    Fileids files with phonetic translation
4.1.4.2    Transcription File

List of Charts

Fig. No.   Name of chart
3.2.2      Experiment Results with Pocket Sphinx
3.3.1.2    Experimental Details with Results for Sphinx 4 Live
3.3.2.2    Experimental Details with Results for Sphinx 4 Audio

List of Tables

Table No.  Name of table
1.2        History of Speech Recognition
2.2.3      Configuration of Sphinx-train.cfg
3.2.1      Experimental Details with Results for Pocket Sphinx
3.3.1.1    Test Results with Sphinx4, Input Type: Microphone
3.3.2.1    Test Results with Sphinx4, Input Type: Audio
7.1        Speaker Profiles
7.2        Unicode to IPA Chart
7.3        Corpus about University Admission Information

List of Abbreviations & Symbols

ASR      Automatic Speech Recognition
BSD      Berkeley Software Distribution
CMU      Carnegie Mellon University
HMM      Hidden Markov Model
IPA      International Phonetic Alphabet
PCA      Principal Component Analysis
ASCII    American Standard Code for Information Interchange
MERL     Mitsubishi Electric Research Labs
CRBLP    Center for Research on Bangla Language Processing
D2P      Dictionary to Pronunciation
SBSBSR   Sanjoy Bappy Shimul Bengali Speech Recognition
IDE      Integrated Development Environment
ABI      Allied Business Intelligence

ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that identifies human speech with the help of the CMU Sphinx speech recognition API.

Literature Survey

Today speech technology plays an important role in many applications Speech

technology has moved from research to commercial application Many human machine

interfaces have been invented and applied today in telephone food ordering system airport

information system ticketing system restaurant reservation system etc As a result we have

selected this important field for our project On the other hand most of the languages have a

speech recognition system but our mother tongue Bangla has no proper speech recognition

system this is the main reasons to select this topics At the starting era most of the research

works are done by using Artificial Neural Network (ANN) but as we are using HMM

based technique so some HMM based and related research are mentioned below

Implementation of Speech Recognition System for Bangla (Shammur Absar

Chowdhury-August 2010) We have studied this thesis report within one week and acquire lot of

knowledge about Speech Recognition We are really very thankful to Shammur Absar

Chowdhury for writing his thesis report easily and smoothly it is very helpful for new students

who want to work in these fields [9]

Speech Recognition by Machine A Review (MAAnusuya and SKKatti Department

of Computer Science and Engineering Sri Jaya Chamarajendra College of Engineering Mysore

India) from this review we have learn lot of things about the types of Speech Recognition

approaches of speech recognition etc [1]

Isolated and Continuous Bangla Speech Recognition Implementation

Performance and application perspective (by Md Abul Hasnat Jabir Mowla and Mumit

Khan- BRAC) ndash We have studied the past works and to the best of us knowledge this work is the

first reported attempt to recognized Bangla speech using HMM Technique so from this

publication we have taken most of us suggestion about the steps to build Speech

Recognition System for our report From here we have learned how to increase the quality

of audio signal given as input by noise elimination process and end detection algorithm

from this paper we have also learned that how feature of a sound is extracted and what are the

parameters taken in feature files we have also learn the algorithm for creating HMM models [8]

Bengali segmented automated speech recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and unvoiced stops, and the Hidden Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan): From these works we learned the techniques used to recognize letters, vowels and consonants. Here we found the basic steps towards a recognizer and the common steps needed to build a fully functioning recognizer [7].


Chapter 1

INTRODUCTION

INTRODUCTION

HISTORY OF SPEECH RECOGNITION

TYPES OF SPEECH RECOGNITION

TERMS AND CONCEPTS


1.1 Introduction

Automatic Speech Recognition (ASR) is the process by which a machine converts an acoustic signal, captured by a microphone or a telephone, into a set of words. It is a broad term, also known as computer speech recognition, meaning that the computer understands the voice of almost any speaker and performs the required task. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as “mouse up”. We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying “Music”. For this reason many scientists and researchers are working on speech recognition.

Most major languages in the world have speech recognizers of their own, but our mother tongue, Bengali, does not yet have a mature speech recognizer. A few small research works have been carried out on Bengali speech recognition, but they have not produced a significant outcome. Implementing a continuous speech recognizer for Bengali is the main goal of this project. Developing a full-blown continuous speech recognizer, however, is a huge task for a short span of time. We have therefore selected a domain-based continuous speech recognizer, which covers a conversation about the university admission process. Throughout the work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open-source software and offers high accuracy. Many other high-quality and widely used tools are available for this work, but most of them are costly commercial products, whereas Sphinx4 is distributed under a BSD-style license. The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

Although AT&T Bell Laboratories had developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, however, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user-friendly, with most contemporary products performing tasks with over 90 percent accuracy.

According to figures provided by the industry, speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, look for more and more companies to move toward speech recognition, and watch the industry take its place as a leader in the technology sector [1].

1936: AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970: The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971: DARPA established the Speech Understanding Research (SUR) program.
1982: Dragon Systems was founded.
1984: SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995: Dragon released discrete-word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997: Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998: Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000: Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003: ScanSoft ships Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are as follows [1].

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (a lack of audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have “Listen/Not-Listen” states, where they require the speaker to wait between utterances (usually doing processing during the pauses). “Isolated utterance” might be a better name for this class. Simply put, isolated words are single words such as “me”, “you”, “go”, etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterance systems) are similar to isolated word systems but allow separate utterances to be run together with a minimal pause between them, for example “I eat rice”.

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features, such as words being run together, “ums” and “ahs”, and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his or her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker-dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his or her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig. 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, “Word”, “Microsoft Word” and “I'd like to run Microsoft Word” are all possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is always listening for speech input. When the engine detects audio input, in other words a lack of silence, the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
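The start and end rules described above can be sketched with a simple energy threshold. This is only an illustrative sketch, not the method used by any particular engine: the function names, the frame length of 160 samples (10 ms at 16 kHz), the threshold and the required number of silent frames are all assumptions chosen for the example.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def find_utterance(samples, frame_len=160, threshold=1000.0, min_silence_frames=3):
    """Return (start_frame, end_frame) of the first utterance, or None.

    start_frame is the first frame whose energy exceeds the threshold.
    end_frame is one past the last speech frame; the end is declared once
    min_silence_frames consecutive quiet frames have followed the speech.
    All parameter values here are illustrative assumptions.
    """
    start = None
    silence_run = 0
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i          # lack of silence: the utterance begins
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                return start, i - silence_run + 1   # enough silence: it ends
    if start is None:
        return None
    # The audio ran out before enough trailing silence was seen.
    return start, len(energies) - silence_run
```

For instance, two silent frames followed by three loud frames and then a long silence give an utterance spanning frames 2 to 4. Practical end-pointers usually add smoothing and adaptive thresholds on top of this idea.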

1.4.2 Pronunciation

The word pronunciation usually comes up in connection with learning a language. In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech that go beyond the individual sounds and make a language unique: phrasing, stress, intonation, timing and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in US English: “thee” and “thuh”.

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said “yes”, the engine returned “yes” and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said “nope”, the engine returned “no”, and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
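As one illustration of how such a measurement can be calculated, the sketch below derives word-level scores from three counts. It assumes the common convention in which errors include substitutions, deletions and insertions (so errors can exceed the number of words that were not correct) and accuracy is 100% minus the error rate; the function name is hypothetical. The example counts are those reported for Experiment 01 in Chapter 3.

```python
def recognition_scores(total_words, correct_words, errors):
    """Return (percent correct, error rate, accuracy), all in percent.

    Assumes errors = substitutions + deletions + insertions, and
    accuracy = 100 - error rate (a common word-level convention).
    """
    percent_correct = 100.0 * correct_words / total_words
    error_rate = 100.0 * errors / total_words
    accuracy = 100.0 - error_rate
    return percent_correct, error_rate, accuracy


# Counts taken from Experiment 01: 1025 words, 975 correct, 98 errors.
scores = recognition_scores(1025, 975, 98)
print([round(value, 2) for value in scores])  # [95.12, 9.56, 90.44]
```

Because insertions count as errors but do not reduce the number of correct words, accuracy is lower than percent correct whenever the recognizer inserts extra words.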

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where, for each unit in the observation sequence, there is a state transition and, for each state, there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore “hidden” from the outside, hence the name Hidden Markov Model.

In other words, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable ones. In the speech recognition process, after the voice is recorded it is divided into many frames that must be processed in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model we store the reference values of states, phonemes and words in order to compare them with the observed data (the voice). By applying HMM, we construct a statistical model for each phone in which the states are assigned specific probabilities by comparison with the reference values. The probability of each state depends on the state itself and on the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite complicated, we do not go into much detail here; more information can be found in Appendix A.
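Finding the state sequence with maximum probability is commonly done with the Viterbi algorithm. The following is a minimal sketch on a toy model, not the decoder used by Sphinx: the two states, the three discrete observation symbols and all probabilities are invented for illustration, whereas a real recognizer works on continuous acoustic feature vectors.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its probability) for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               the state that precedes s on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)

    # Trace the best path backwards from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model (all values are assumptions made up for this example).
states = ["AA", "P"]
start_p = {"AA": 0.6, "P": 0.4}
trans_p = {"AA": {"AA": 0.7, "P": 0.3}, "P": {"AA": 0.4, "P": 0.6}}
emit_p = {
    "AA": {"a": 0.5, "b": 0.4, "c": 0.1},
    "P": {"a": 0.1, "b": 0.3, "c": 0.6},
}
path, prob = viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p)
print(path)  # ['AA', 'AA', 'P']
```

The table `best` keeps, for every time step and state, only the single best way of arriving there, which is what makes the search tractable compared with enumerating every possible state sequence.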


Fig. 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty associated with a sequence or collection of words. A language model is used to restrict the word search. It defines which words can follow previously recognized words and helps to restrict the matching process significantly by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
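To make the idea of word-sequence statistics concrete, the sketch below estimates bigram probabilities by simple counting. It is a simplified illustration under stated assumptions: the function name and the English toy corpus are invented, and no smoothing or back-off is applied, unlike the models produced by real language modeling tools.

```python
from collections import Counter


def train_bigram(sentences):
    """Return a function prob(prev, word) estimated from raw bigram counts."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(words[:-1])            # words that have a successor
        bigram_counts.update(zip(words, words[1:]))  # adjacent word pairs

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


# Toy corpus (an assumption for illustration only).
prob = train_bigram(["i eat rice", "i eat fish", "you eat rice"])
print(prob("i", "eat"))     # 1.0: "eat" always follows "i" in this corpus
print(prob("rice", "eat"))  # 0.0: this order never occurs, so it is pruned
```

A decoder using such a model would prefer “i eat rice” over word orders that never occur in the corpus, which is how the language model restricts the search.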

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to prepare some important files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means restricted to a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (.DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is simply a list of sentences that is used to train the language model; in other words, it is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences from our domain. Some of these sentences are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record the audio for this corpus in .wav or .sph format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)

Fig. 2.1.2: Audio File Recording Format

A sampling rate of 16 kHz was chosen because it preserves high-frequency information more accurately than lower rates, and 16 bits per sample allow each sample to take one of 65,536 possible values. After recording, the audio was split into one file per sentence manually using recording software; for our project we used the WavePad sound editor. Each sound file is in .wav format and is named using the speaker ID and the sentence ID.

For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID: 01, Sentence ID: 01

When we collected audio from a speaker, we saved the personal information of this speaker:

Name, age, gender and the environment in which the audio was collected

Some other information was also noted down:

Environmental condition of recording (for example class room condition, number of students present, sources of noise such as a fan or a generator, etc.)

Technical details of the device (PC, microphone)

Date and time of recording

2.1.3 Dictionary file

The dictionary file is simply the list of words taken from our corpus file, each followed by its pronunciation, for example AA P N AA K E. For this work we need software that converts the word list into pronunciations. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. As a result, for our project we developed our own software, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is .dic. The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, e.g. AA P N I

File format is .dic

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, “AA P N I” contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file are:

A
AA
B
BH
C
CCHA
CH

…

Note:

All phones are in capital letters

File format is .phone

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme “SIL” is also included in the phone file

All phonemes in the .dic file are present in the phone file without repetition
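The conversion from dictionary to phone list can be sketched in a few lines. This is not the program written for the project, only an illustration of the rules listed above; it assumes each dictionary line holds a word followed by its space-separated phones, and the romanized words in the example are hypothetical stand-ins for the Bengali entries.

```python
def phones_from_dictionary(dic_lines):
    """Collect the unique phones used in dictionary lines, plus SIL, sorted."""
    phones = {"SIL"}                  # the silence phone is always included
    for line in dic_lines:
        fields = line.split()
        phones.update(fields[1:])     # first field is the word, the rest are phones
    return sorted(phones)


# Hypothetical romanized entries standing in for Bengali words.
example = ["APNI AA P N I", "AMI AA M I", ""]
print("\n".join(phones_from_dictionary(example)))
```

Using a set guarantees that no phone is repeated, and blank lines in the input are ignored because they contribute no fields.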


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this, a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to be able to handle. The corpus needs to be in the form of an ASCII text file, with one sentence per line; the newer version also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here the only file used is the LM file, as the pronunciation dictionary is built as stated above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973

…
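The fields of one ARPA entry can be read as sketched below. This is a simplified illustration: the function name is hypothetical, the word `AMI` is an invented stand-in, and the n-gram order is passed in explicitly because entries of the highest order carry no back-off field. ARPA files store base-10 logarithms, so the probability is recovered as 10 raised to the first field.

```python
def parse_arpa_entry(line, order):
    """Split one ARPA n-gram line into (log10 probability, words, back-off).

    `order` is the n-gram order (1 for unigrams, 2 for bigrams, ...).
    The back-off weight is None when the line does not carry one.
    """
    fields = line.split()
    log_prob = float(fields[0])
    words = tuple(fields[1:1 + order])
    backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
    return log_prob, words, backoff


# Hypothetical unigram entry using the number format shown above.
log_prob, words, backoff = parse_arpa_entry("-2.0719 AMI -0.2626", order=1)
probability = 10 ** log_prob   # roughly 0.0085, i.e. about 1 in 118
```

A log probability near -2.07 is about what one would expect for a unigram in a vocabulary of a little over a hundred equally likely words.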

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used the Linux environment to produce the DMP-format language model file from the .lm-format file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP


2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file the words of the speaker are written down exactly as they were recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named after the project; here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)

…

Note:

File format is .transcription

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and the other for testing. The name of the training file for our project is sbsbsr_train.fileids and the name of the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03

……
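Because the transcription and fileids files must list the same utterances in the same order, it can help to generate both from one list. The sketch below is only an illustration under assumptions: the function names, the directory name `speaker01` and the romanized sentences are hypothetical, and the trailing newline stands in for the extra line required at the end of each file.

```python
def transcription_line(sentence, file_id):
    """Format one transcript entry: sentence inside <s> ... </s>, then the file ID."""
    return f"<s> {sentence} </s> ({file_id})"


def build_control_files(utterances, audio_dir):
    """Return (transcription text, fileids text) for (file_id, sentence) pairs."""
    transcription = [transcription_line(text, fid) for fid, text in utterances]
    fileids = [f"{audio_dir}/{fid}" for fid, _ in utterances]
    # Both texts end with a newline so that the last entry is terminated.
    return "\n".join(transcription) + "\n", "\n".join(fileids) + "\n"


# Hypothetical romanized sentences standing in for the Bengali corpus.
utterances = [("01_01", "AMI VORTI HOTE CHAI"), ("01_02", "APNAKE DHONNOBAD")]
transcription_text, fileids_text = build_control_files(utterances, "speaker01")
print(transcription_text)
print(fileids_text)
```

Generating both files from a single source keeps the file ID in each transcript line aligned with the corresponding audio path.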

2.1.9 Filler file

The filler file contains the user's definition of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped onto a phone called SIL.

<s> symbolizes “beginning of speech”

</s> symbolizes “end of speech”

<sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two packages: sphinxbase and sphinxtrain. We obtained them from the CMU Sphinx web site. Before installing these two packages we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two packages, sphinxbase and sphinxtrain. After downloading them, we decompressed them into a folder in Linux; it can be any folder, and we used the Linux root folder. After decompressing, the packages can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

Next we created the system environment for training: a project folder in which sphinxtrain will create the trained files, i.e. the acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed. Here we created a folder named “sbsbsr”. After creating the folder, we open a terminal and change into the sbsbsr folder. Then we create a project task for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as “etc”, “wav”, “model_parameters”, etc.

Now it is time to copy the files created in the data preparation part. We copy the dictionary, filler, phone, transcription, fileids, .lm and DMP files into the “etc” folder and our collected audio into the “wav” folder. We then need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is created. We changed the following parameters in sphinx_train.cfg:

Parameter                    Value Before      Value After
$CFG_WAVFILE_EXTENSION       sph               wav
$CFG_WAVFILE_TYPE            nist/mswav/raw    raw
$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                 1
$CFG_STATESPERHMM            6                 3
$CFG_N_TIED_STATES           100               100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio into feature (.mfc) files. To do this we opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all wav files into mfc files in the “feat” directory under the project folder “sbsbsr”.

Now we are ready to execute the main training command, which creates the acoustic model. For this task we stay in the project directory and execute the following command in the terminal:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. After downloading it, we go to the root directory where sphinxbase and sphinxtrain are installed and extract the downloaded pocketsphinx file there. After extracting, we install the software with the following commands in the Linux terminal:

./configure
make

After installing the software we go into our project folder and execute a command from the terminal to create the folder structure for testing. The command is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the “wav” directory, because pocketsphinx recognizes from audio file input. We also have to copy the test fileids and transcription files into the “etc” folder. To decode, i.e. to test our model on audio with pocketsphinx, we first need feature files for the audio files. We can create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding (testing) in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of the test can be found in the “result” folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. We need two pieces of software to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on Windows. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we need to add the files listed below from our previous project to the Eclipse project [11]:

“sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters
.dic
.lm.DMP
.filler

We create a config.xml file in the Eclipse project to pass the configuration to the recognizer and to say where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project we run it and can test it with live voice input from the microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room, and the common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We carried out our tests with two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment  Data set  Speakers (male/female)  Total words  Correct  Errors  Percent correct  Error    Accuracy
01          Trained   5 (4/1)                 1025         975      98      95.12%           9.56%    90.44%
02          Trained   3 (3/0)                 615          591      39      96.10%           6.34%    93.66%
03          Trained   3 (3/0)                 615          601      22      97.72%           3.58%    96.42%
04          Trained   5 (2/3)                 1025         938      184     91.51%           17.95%   82.05%
05          Trained   3 (0/3)                 615          545      125     88.62%           20.33%   79.67%

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
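The percentages reported for each Pocket Sphinx experiment can be reproduced from the word counts. The following Java sketch is only an illustration and not part of our tools: it assumes that the error count includes insertions as well as substitutions and deletions (which is why the errors can exceed the number of words that were not recognized correctly), and the class and method names are hypothetical.

```java
import java.util.Locale;

/** Word-level scoring sketch; names are hypothetical, not taken from the Sphinx tools. */
final class RecognitionMetrics {

    /** Percentage of reference words that were recognized correctly. */
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    /** Errors (assumed: substitutions + deletions + insertions) as a percentage of reference words. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy is taken as 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Counts of Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.printf(Locale.ROOT, "correct=%.2f%% error=%.2f%% accuracy=%.2f%%%n",
                percentCorrect(1025, 975), errorRate(1025, 98), accuracy(1025, 98));
    }
}
```

With the counts of Experiment 01 this gives 95.12% correct, a 9.56% error rate, and 90.44% accuracy, matching the reported values.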

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment  User type  Speakers  Environment        Speaker type  Input device  Words  Correct  Errors  Percent correct  Error  Accuracy
01          Trained    2         Closed Room        Male          Microphone    156    154      2       98%              2%     98%
02          Untrained  3         Lab Room           Male          Microphone    130    120      10      90%              10%    90%
03          Untrained  3         University Campus  Male          Microphone    140    122      18      82%              18%    82%
04          Untrained  2         University Campus  Female        Microphone    108    95       13      87%              13%    87%
05          Trained    3         University floor   Female        Microphone    126    115      11      89%              11%    89%
06          Trained    2         Closed Room        Male          Microphone    128    127      1       99%              1%     99%

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment  Data set  Speakers (male/female)  Total words  Correct  Errors  Percent correct  Error   Accuracy
01          Trained   3 (3/0)                 210          194      16      85.71%           15.71   85.71%
02          Trained   3 (2/1)                 210          188      22      77.14%           22      77.14%
03          Trained   3 (2/1)                 210          182      28      72.38%           28      72.38%
04          Trained   3 (2/1)                 210          194      16      84.44%           16      84.44%
05          Trained   3 (1/2)                 210          184      26      74.76%           26      74.76%

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION DEVELOPMENT

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a file called .dic, which holds all the training words and their pronunciations. While training acoustic models experimentally, we had to write these pronunciations by hand several times. Over time we began to think about an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we built a database that stores every phoneme together with its corresponding letter. We drew on the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and phonemes are not defined for every letter. We therefore defined the phonemes for some consonants and conjunct letters ourselves. With the database in place, we built the phonetic translator, which gives the phonetic translation of a Bengali word by looking it up in the database.

Fig 4.1.2 Dictionary files with phonetic translation
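The lookup idea behind the phonetic translator can be sketched in a few lines of Java. This is a simplified illustration and not our actual program: it replaces the database with a small in-memory map holding a few consonant entries from our Unicode to IPA chart, it ignores conjuncts and the inherent vowel, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Letter-by-letter pronunciation lookup (sketch; an in-memory map stands in for the database). */
final class PhoneticLookupSketch {

    private static final Map<Character, String> PHONES = new HashMap<>();
    static {
        // A few consonant entries taken from the Unicode to IPA chart in the appendix.
        PHONES.put('\u0995', "K");   // Bengali letter KA
        PHONES.put('\u0996', "KH");  // Bengali letter KHA
        PHONES.put('\u0997', "G");   // Bengali letter GA
        PHONES.put('\u09AC', "B");   // Bengali letter BA
        PHONES.put('\u09AE', "M");   // Bengali letter MA
        PHONES.put('\u09B0', "R");   // Bengali letter RA
        PHONES.put('\u09B2', "L");   // Bengali letter LA
    }

    /** Builds one dictionary line: the word followed by the phone of each letter. */
    static String dictionaryLine(String word) {
        StringBuilder line = new StringBuilder(word);
        for (char letter : word.toCharArray()) {
            String phone = PHONES.get(letter);
            if (phone == null) {
                throw new IllegalArgumentException("No phone defined for letter " + letter);
            }
            line.append(' ').append(phone);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // KA + LA + MA: the three letters are looked up one after another.
        System.out.println(dictionaryLine("\u0995\u09B2\u09AE"));
    }
}
```

A fuller version would also have to join the letters of a conjunct before the lookup and insert the inherent vowel between two consecutive consonants.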

4.1.3 Training File Creator

To train the acoustic model we also need a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files manually; with the software we can generate these two large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. Now we can create both files automatically by giving the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
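The two kinds of lines that the training file creator writes can be illustrated with a short Java sketch. It is a minimal example under stated assumptions, not our full program: the folder name speaker_02, the file name utt_001.wav, and the sample sentence are hypothetical, and the `<s> ... </s> (id)` layout follows the usual SphinxTrain transcript convention.

```java
/** Builds single fileids and transcript lines (sketch with hypothetical names). */
final class TrainingFilesSketch {

    /** Drops a trailing ".wav" extension from a file name. */
    static String stripWav(String wavName) {
        return wavName.replaceAll("\\.wav$", "");
    }

    /** One fileids entry: the path of the audio file relative to the wav root, without extension. */
    static String fileId(String speakerDir, String wavName) {
        return speakerDir + "/" + stripWav(wavName);
    }

    /** One transcript entry: the sentence between sentence markers, then the utterance id. */
    static String transcriptLine(String sentence, String wavName) {
        return "<s> " + sentence.trim() + " </s> (" + stripWav(wavName) + ")";
    }

    public static void main(String[] args) {
        System.out.println(fileId("speaker_02", "utt_001.wav"));
        System.out.println(transcriptLine("  hello world ", "utt_001.wav"));
    }
}
```

Generating the complete files then only requires calling these two methods once per audio file while walking the audio folder and the corpus in the same order.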

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks, such as opening My Computer, left click, and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has some limitations. Because of time constraints, the system was built on a small data set. We chose the domain of university admission information for new students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file: the Bengali phoneme list is not precisely defined, and we do not know the exact number of phonemes in Bengali, since different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, but it sometimes fails when the speaker speaks slowly or mispronounces words. We created a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once the phonemes are defined accurately.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the size of our data to create a complete model, and we plan to improve the system so that it recognizes speech more accurately and covers a larger vocabulary. We have already developed software for creating the dictionary, fileids, and transcription files. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that an ordinary user can operate. With this trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate the system into document applications, so that a Bengali sentence can be written simply by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to it. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected through this website we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. In this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, “Speech Recognition by Machine: A Review”, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, “A Review of Speech Recognition with Sphinx Engine in Language Detection”, Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner & B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, “An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition”, Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, “Bengali Segmented Automatic Speech Recognition”, BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, “Recognition of Spoken Letters in Bangla”, banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, “Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective”, BRACU, 2007.

[9] Shammur Absar Chowdhury, “Implementation of Speech Recognition System for Bangla”, BRACU, August 2010.

[10] Qqbal SO Shahzad, “Speech Recognition System”, Iqra University, March 2009.

[11] Tran Viet Khai, “Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition”, Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech”, IEEE, 2004.

[13] M. S. Islam, “Research on Bangla Language Processing in Bangladesh: Progress and Challenges”, BUET, 2009.

[14] P. Foster, T. Schalk, “Speech Recognition: The Complete Practical Reference Guide”, 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, “Introduction to Arabic Speech Recognition Using CMUSphinx System”, 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শনয 0

এক 1

দই 2

তিন 3

চার 4

পাচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues from the last file written
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                            root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collects directory names (levels 1 and 2) or file names (level 3) found under path
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as name1, name2, name10 by their numeric part
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of every name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                    folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
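The SortArrayList class rebuilds every folder name from its letters and its number. A shorter alternative, sketched below and not the code we used, sorts the original names directly with a comparator. It assumes that each folder name contains at most one number; the class name and the sample names speaker_1, speaker_2, and speaker_10 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sorts names by the number they contain (sketch; assumes at most one number per name). */
final class NumericSuffixSort {

    /** Extracts the digits of a name as an int; names without digits count as 0. */
    static int numberIn(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        // A plain string sort would place speaker_10 before speaker_2.
        System.out.println(sortByNumber(Arrays.asList("speaker_10", "speaker_2", "speaker_1")));
    }
}
```

Because the original names are kept, this variant also works when the folders do not share a common prefix.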

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input words, one per line
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // A letter is a consonant (banjon borno) if its id in banglatab lies between 13 and 48
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
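isBanjonBorno opens a database connection for every letter it checks. As a possible alternative (a sketch under our own assumptions, not what the project does), the same question can be answered from the Unicode code point of the letter, because the Bengali consonants occupy a contiguous range of the Bengali block. The class and method names are hypothetical.

```java
/** Consonant test based on Unicode code points (sketch; not the database lookup used in the project). */
final class BengaliLetters {

    /** Hasanta (U+09CD), the sign that joins the consonants of a conjunct. */
    static final char HASANTA = '\u09CD';

    /** True for KA..HA (U+0995..U+09B9), KHANDA TA, RRA, RHA and YYA. */
    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')
                || ch == '\u09CE'   // khanda ta
                || ch == '\u09DC'   // rra
                || ch == '\u09DD'   // rha
                || ch == '\u09DF';  // yya
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // KA, a consonant
        System.out.println(isConsonant('\u0986')); // AA, an independent vowel
        System.out.println(isConsonant(HASANTA));  // a sign, not a letter
    }
}
```

The range also contains a few unassigned code points, which do not occur in normal Bengali text.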

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal was lost in extraction; the hasanta (U+09CD),
    // which joins Bengali consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();

        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;

        // record the positions before, at and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;

        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }

        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;

        // collect the positions that do not belong to any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;

        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }

        i = 0;
        while (ssi > i) {
            i++;
        }

        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the pronunciation of every search unit in turn
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
                rs.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
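The core idea of the generator above, grouping the characters joined by a conjunct marker into one lookup unit, can be sketched in isolation. This is only a minimal illustration, not the authors' implementation: the class name ConjunctSplitter and the method split are hypothetical, the marker is assumed to be the hasanta (U+09CD), and the inherent-vowel insertion and the special case for 'র' are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a Bengali word into lookup units. */
final class ConjunctSplitter {
    /** Assumed conjunct marker: the Bengali hasanta (virama). */
    static final char HASANTA = '\u09CD';

    /**
     * Each character becomes its own unit, except that a hasanta and the
     * character after it are absorbed into the unit that precedes them.
     * A hasanta at the very end of the word is left as a unit of its own.
     */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // absorb every "hasanta + next character" pair that follows
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }
}
```

For example, ConjunctSplitter.split("বিশ্ব") yields the three units ব, ি and শ্ব; each unit could then be looked up in a pronunciation table such as banglatab.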

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        // the Bengali sample sentences were lost in extraction
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

SBSBSR TRANSCRIBER

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");

        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

DESKTOP COMMAND APPLICATION

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate ("sokrio")
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        // the Bengali sample sentences were lost in extraction
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // the Bengali command phrase compared here was lost in extraction
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
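The giveCommand method above matches the recognized text against one command phrase and calls the corresponding CommandActivator method. One way such matching could be organised for many commands is a lookup table from phrase to action. The sketch below is only an illustration under assumptions: CommandDispatcher, register and dispatch are hypothetical names that do not appear in the project, and the phrases are placeholders for the Bengali commands.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical table-driven dispatcher from recognized text to an action. */
final class CommandDispatcher {
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    /** Associates a command phrase with the action to run when it is heard. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action registered for the text; returns false if none matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

In the desktop command application, each registered action would wrap one CommandActivator call and handle its checked AWTException, and the recognition loop would pass resultText to dispatch.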


Table of Contents

Declaration
Acknowledgments
List of Figures
List of Charts
List of Tables
List of Abbreviations & Symbols
Abstract
Literature Survey

Chapter 1 Introduction
1.1 Introduction
1.2 History of Speech Recognition
1.3 Types of Speech Recognition
1.3.1 Isolated Words
1.3.2 Connected Words
1.3.3 Continuous Words
1.3.4 Spontaneous Words
1.3.5 Speaker Dependent
1.3.6 Speaker Independent
1.3.7 Overview of Speech Recognition System
1.4 Terms and Concepts
1.4.1 Utterance
1.4.2 Pronunciation
1.4.3 Grammars
1.4.4 Vocabularies
1.4.5 Training
1.4.6 Accuracy
1.4.7 Language Dictionary
1.4.8 Filler Dictionary
1.4.9 Phone
1.4.10 HMM
1.4.11 Language Model
1.5 Overview of the Full System

Chapter 2 Methodology
2.1 Data Preparation
2.1.1 Corpus
2.1.2 Audio Files
2.1.3 Dictionary Files
2.1.4 Phone File
2.1.5 Language Model File (.lm Format)
2.1.6 Language Model File (DMP Format)
2.1.7 Transcription File
2.1.8 Fileids File
2.1.9 Filler File
2.2 Setting up the System Environment
2.2.1 Software Requirements
2.2.2 Trainer Setup
2.2.3 Project Folder Setup
2.2.4 Training the Acoustic Model
2.2.5 Testing Part
2.2.5.1 Testing with Pocket Sphinx
2.2.5.2 Testing with Sphinx4

Chapter 3 Testing and Performance Evaluation
3.1 Testing & Performance Evaluation
3.2 Test Results with Pocket Sphinx
3.3 Test Results with Sphinx4
3.3.1 Input Type: Microphone
3.3.2 Input Type: Audio

Chapter 4 Applications & Developing
4.1 Review of Some Developed Recognized Application
4.1.1 Dictation Application
4.1.2 Phonetic Translator
4.1.3 Training File Creator
4.1.4 Training File Creator

Chapter 5 Limitation & Future Work
5.1 Limitation
5.2 Future Work

Chapter 6 Conclusion & References
6.1 Conclusion
6.2 References


List of Figures

Fig. No.   Name of figure
1.3.7      Overview of Speech Recognition System
1.4.10     Applying Hidden Markov Model on Speech Recognition
1.5        Overview of the Full System Model
2.1.2      Audio File Recording Format
2.2.5.1    Testing with Pocket Sphinx
2.2.5.2    Testing with Sphinx4
4.1.2      Dictionary files with phonetic translation
4.1.3.1    Fileids files with phonetic translation
4.1.4.2    Transcription File

List of Charts

Chart No.  Name of chart
3.2.2      Experiment Results with Pocket Sphinx
3.3.1.2    Experimental Details with Results for Sphinx4 (Live)
3.3.2.2    Experimental Details with Results for Sphinx4 (Audio)


List of Tables

Table No.  Name of table
1.2        History of Speech Recognition
2.2.3      Configuration of Sphinx-train.cfg
3.2.1      Experimental Details with Results for Pocket Sphinx
3.3.1.1    Test Results with Sphinx4, Input Type: Microphone
3.3.2.1    Test Results with Sphinx4, Input Type: Audio
7.1        Speaker Profiles
7.2        Unicode to IPA Chart
7.3        Corpus About University Admission Information

List of Abbreviations & Symbols

ASR      Automatic Speech Recognition
BSD      Berkeley Software Distribution
CMU      Carnegie Mellon University
HMM      Hidden Markov Model
IPA      International Phonetic Alphabet
PCA      Principal Component Analysis
ASCII    American Standard Code for Information Interchange
MERL     Mitsubishi Electric Research Labs
CRBLP    Center for Research on Bangla Language Processing
D2P      Dictionary to Pronunciation
SBSBSR   Sanjoy Bappy Shimul Bengali Speech Recognition
IDE      Integrated Development Environment
ABI      Allied Business Intelligence


ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that identifies human speech with the help of the CMU Sphinx speech recognition API.


Literature Survey

Today speech technology plays an important role in many applications, having moved from research to commercial use. Many human-machine interfaces have been invented and are applied today in telephone food-ordering systems, airport information systems, ticketing systems, restaurant reservation systems, etc. For this reason we selected this field for our project. Moreover, most major languages have a speech recognition system, but our mother tongue, Bangla, has no proper one; this is the main reason for selecting this topic. In the early era most research was done using Artificial Neural Networks (ANN), but as we use an HMM-based technique, some HMM-based and related research is mentioned below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report over one week and acquired a lot of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the report so clearly; it is very helpful for new students who want to work in this field. [9]

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned a lot about the types of speech recognition, the approaches to speech recognition, etc. [1]

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC): We studied the past works and, to the best of our knowledge, this work is the first reported attempt to recognize Bangla speech using the HMM technique, so we took most of our guidance on the steps for building a speech recognition system from this publication. From it we learned how to improve the quality of the input audio signal through noise elimination and an end-detection algorithm, how the features of a sound are extracted and which parameters are stored in the feature files, and the algorithm for creating HMM models. [8]

Bengali segmented automated speech recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model. [6]

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan): From these we learned the techniques used to recognize letters, vowels and consonants, and found the basic, common steps needed to build a fully functioning recognizer. [7]


Chapter 1

INTRODUCTION

INTRODUCTION
HISTORY OF SPEECH RECOGNITION
TYPES OF SPEECH RECOGNITION
TERMS AND CONCEPTS


1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. It is a broad term, also known as computer speech recognition, which covers a computer understanding almost anybody's voice and performing a required task. Put simply, speech recognition is the process of converting spoken input to text, and is thus sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as “mouse up”. We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying “Music”. For this reason many scientists and researchers are working on speech recognition.

Most major languages of the world have speech recognizers of their own, but our mother tongue, Bengali, does not yet have a mature one. A few small research works have been carried out on Bengali speech recognition, but without great outcomes. Implementing a continuous speech recognizer for Bengali is the main goal of our project. Developing a full-blown continuous speech recognizer within a short span of time is a huge task, however, so we selected a domain-based continuous speech recognizer covering a conversation about the university admission process.

Throughout the work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open-source software and has high accuracy. Many high-quality and widely used software packages are available for this work, but they are costly, whereas CMU Sphinx is distributed under a Berkeley Software Distribution (BSD) style license. The ultimate goal of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

While AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

Because it satisfies the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs, speech recognition is used in a wide range of applications. Furthermore, according to figures provided by Allied Business Intelligence (ABI), the increased popularity of speech recognition was expected to push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, look for more and more companies to move toward speech recognition, and watch the industry take its place as a leader in the technology sector [1].

Year   Event
1936   AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970   The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971   DARPA established.
1982   Dragon Systems was founded.
1984   SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995   Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997   Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998   Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000   Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003   ScanSoft ships Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have “Listen/Not-Listen” states, where they require the speaker to wait between utterances (usually doing processing during the pauses). “Isolated utterance” might be a better name for this class. Simply put, isolated words are single words such as “me”, “you”, “go”, etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterance systems) are similar to isolated word systems but allow separate utterances to be run together with a minimal pause between them, for example “I eat rice”.

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, “ums” and “ahs”, and even slight stutters.

Based on the speaker, there are two types of speech recognition:
1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his or her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his or her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig. 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of them [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, “Word”, “Microsoft Word”, or “I'd like to run Microsoft Word” are all possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input, in other words a lack of silence, the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
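The silence-based delimiting of utterances described above can be illustrated with a small sketch. It is a simplified illustration and not how the Sphinx front end is implemented: the class name UtteranceDetector, the per-frame energy input, the fixed threshold and the minSilenceFrames parameter are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical energy-based utterance detector working on per-frame energies. */
final class UtteranceDetector {

    /**
     * Returns one {startFrame, endFrameExclusive} pair per utterance.
     * A frame counts as speech when its energy exceeds the threshold; an
     * utterance ends once minSilenceFrames quiet frames follow the speech.
     */
    static List<int[]> detect(double[] frameEnergy, double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1;     // first frame of the current utterance, -1 while in silence
        int silentRun = 0;  // consecutive quiet frames seen inside an utterance
        for (int f = 0; f < frameEnergy.length; f++) {
            boolean speech = frameEnergy[f] > threshold;
            if (start < 0) {
                if (speech) {           // lack of silence: an utterance begins
                    start = f;
                    silentRun = 0;
                }
            } else if (speech) {
                silentRun = 0;          // a short pause inside the utterance has ended
            } else if (++silentRun >= minSilenceFrames) {
                // enough silence: close the utterance at its last speech frame
                utterances.add(new int[] {start, f - silentRun + 1});
                start = -1;
            }
        }
        if (start >= 0) {               // the audio ended while still inside an utterance
            utterances.add(new int[] {start, frameEnergy.length - silentRun});
        }
        return utterances;
    }
}
```

With minSilenceFrames set to 2, a single quiet frame between two speech frames is treated as a pause within the same utterance, while two or more quiet frames end it.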

1.4.2 Pronunciation

You have heard the word pronunciation in connection with learning any language. What is pronunciation, and what are some of the fundamental aspects of this important part of learning a language such as English? In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech beyond the individual sounds that make a language unique: phrasing, stress, intonation, timing and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice, because you do not want to experience social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what you are trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the engine uses to process a word is its pronunciation, which represents what the engine thinks the word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in US English: “thee” and “thuh”.
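A pronunciation dictionary that allows several pronunciations per word can be pictured as a map from a word to a list of phone strings. The sketch below is a minimal illustration; the class PronunciationLexicon and its methods are hypothetical, and the phone strings used in the usage note are illustrative spellings rather than entries taken from the project's dictionary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical lexicon mapping each word to one or more pronunciations. */
final class PronunciationLexicon {
    private final Map<String, List<String>> entries = new LinkedHashMap<>();

    /** Adds one pronunciation (a space-separated phone string) for a word. */
    void add(String word, String phones) {
        entries.computeIfAbsent(word.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
               .add(phones);
    }

    /** Returns every known pronunciation of the word, or an empty list. */
    List<String> lookup(String word) {
        return entries.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyList());
    }
}
```

Calling add("the", "DH IY") and then add("the", "DH AH") stores both variants, so lookup("the") returns two pronunciations that a recognizer could treat as alternatives.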

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly, so speech engines usually have a very large grammar.


1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; an entry can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it recognizes utterances. The performance of a speech recognition system is measurable. Perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said “yes”, the engine returned “yes”, and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said “nope”, the engine returned “no”, and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where, for each unit of the observation sequence, there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore “hidden” from the outside, hence the name Hidden Markov Model.

In other words, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after the voice is recorded it is divided into many frames, which we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model we store the reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying an HMM, we construct a statistical model for each phone in which the states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and on the previous state. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is very complicated, we do not go into much detail here; more can be found in Appendix A.
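Finding the state sequence with maximum probability is commonly done with the Viterbi algorithm. The sketch below is a minimal illustration of that algorithm on a toy three-state left-to-right model; the state names, the observation symbols `x`, `y`, `z` and all probability values are assumptions made up for the example and are not taken from our trained models.

```python
from typing import Dict, List, Sequence

# Hypothetical left-to-right HMM with three states for one sound unit.
STATES = ["begin", "middle", "end"]
START_P = {"begin": 1.0, "middle": 0.0, "end": 0.0}
TRANS_P = {
    "begin":  {"begin": 0.5, "middle": 0.5, "end": 0.0},
    "middle": {"begin": 0.0, "middle": 0.5, "end": 0.5},
    "end":    {"begin": 0.0, "middle": 0.0, "end": 1.0},
}
EMIT_P = {
    "begin":  {"x": 0.8, "y": 0.1, "z": 0.1},
    "middle": {"x": 0.1, "y": 0.8, "z": 0.1},
    "end":    {"x": 0.1, "y": 0.1, "z": 0.8},
}


def viterbi(observations: Sequence[str],
            states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


print(viterbi(list("xxyzz"), STATES, START_P, TRANS_P, EMIT_P))
```

Real recognizers work with log probabilities and continuous feature vectors rather than the small discrete tables used here; the plain products are kept only to make the recursion easy to follow.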

Fig 1.4.10 Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words can follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most commonly used language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
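As a rough illustration of how an n-gram model restricts which word may follow another, the following sketch estimates bigram probabilities from a tiny corpus by simple counting. The English sentences are hypothetical stand-ins for the Bengali corpus, and no smoothing or back-off is applied, so this is a simplification of what a real toolkit produces.

```python
from collections import Counter
from typing import Dict, List, Tuple


def train_bigram(sentences: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Estimate P(word | previous word) by maximum likelihood counting."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        history_counts.update(tokens[:-1])            # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {pair: count / history_counts[pair[0]]
            for pair, count in bigram_counts.items()}


def allowed_successors(model: Dict[Tuple[str, str], float], previous: str) -> List[str]:
    """Words that were seen after `previous`; all others are stripped from the search."""
    return sorted(word for (prev, word) in model if prev == previous)


# Hypothetical three-sentence corpus.
corpus = [["open", "file"], ["open", "door"], ["close", "door"]]
model = train_bigram(corpus)
print(model[("open", "file")])
print(allowed_successors(model, "open"))
```

With this model, only "door" and "file" are considered after "open", which is the kind of restriction of the matching process described above.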

1.5 Overview of the Full System

Figure 1.5 Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to create some important files that are required for training and for testing. We have already mentioned that our project is a domain-based recognition application. Domain-based means that it covers a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is a list of sentences that is used to train the language model; put simply, it is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences according to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to collect the audio files for this corpus in (.wav) or (.sph) format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit depth (bits per sample): 16

• Channel: mono (single channel)

Fig 2.1.2 Audio File Recording Format

For this work a 16 kHz sample rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample allows each sample to take one of 65536 possible values. After recording, the audio was split into one file per sentence manually using recording software; for our project we used the WavePad sound editor. The sound files are in .wav format, and each .wav file is named using the speaker ID and the sentence ID. For example, an audio file of our project is

01_01.wav

which stands for Speaker ID 01, Sentence ID 01.

When we collected audio from a speaker, we saved the personal information of this speaker, such as:

Name, Age, Gender, Audio collection environment

Some other information was also noted down:

Environmental condition of the recording (for example classroom condition, number of students present, sources of noise such as a fan or generator sound, etc.)

Technical details of the device (PC, microphone)

Date and time of the recording
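The recording parameters and the file naming scheme of this section can be checked automatically. The sketch below uses Python's standard `wave` module; the function names `check_recording` and `parse_ids` are hypothetical helpers introduced only for this illustration and are not part of our tools.

```python
import re
import wave
from typing import Tuple


def check_recording(source) -> bool:
    """True if the WAV file is 16 kHz, 16 bits per sample and mono.

    `source` may be a file path or an open binary file object.
    """
    with wave.open(source, "rb") as wav:
        return (wav.getframerate() == 16000      # sampling rate: 16 kHz
                and wav.getsampwidth() == 2      # 2 bytes = 16 bits per sample
                and wav.getnchannels() == 1)     # mono


def parse_ids(filename: str) -> Tuple[str, str]:
    """Split a name such as '01_01.wav' into (speaker ID, sentence ID)."""
    match = re.fullmatch(r"(\d+)_(\d+)\.wav", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return match.group(1), match.group(2)


print(parse_ids("01_01.wav"))
```

Running `check_recording` over every file before training is one way to catch recordings that were saved with the wrong sampling rate or in stereo.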

2.1.3 Dictionary file

Simply put, the dictionary file is the list of words that we get from our corpus file, together with the pronunciation of each word, such as AA P NA KE. For this work we need software that produces the pronunciation file from the dictionary word list. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. As a result, for our project we developed a program, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some contents of the dictionary file for our project are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (.dic)

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, “AA P N I” contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is .phone

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme “SIL” is also included in the phone file

All phonemes in the .dic file are present in the phone file without repetition
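The generation of the phone file from the dictionary file can be sketched as follows. This is not the program written for the project, only a minimal illustration of the same idea; the romanized words `apni` and `ami` are hypothetical placeholders for dictionary entries.

```python
from typing import Iterable, List


def phones_from_dictionary(dic_lines: Iterable[str]) -> List[str]:
    """Collect every phone used in the dictionary, once each, plus SIL."""
    phones = {"SIL"}                      # the silence phone is always included
    for line in dic_lines:
        fields = line.split()
        if len(fields) < 2:               # skip blank or malformed lines
            continue
        phones.update(fields[1:])         # first field is the word, the rest are phones
    return sorted(phones)


def format_phone_file(phones: List[str]) -> str:
    """One phone per line, ending with a newline as the notes above require."""
    return "\n".join(phones) + "\n"


# Hypothetical dictionary lines: word followed by its phones.
lines = ["apni AA P N I", "ami AA M I", ""]
print(format_phone_file(phones_from_dictionary(lines)), end="")
```

When writing the result to disk, opening the file with `encoding="utf-8"` in Python produces UTF-8 without a BOM, which matches the encoding note above.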

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this, a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to be able to handle. The corpus needs to be in the form of an ASCII text file with one sentence to a line, although the newer version also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here the only file used is the LM file, as the pronunciation dictionary is built as stated above. The tool is best for small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626

-2.0719 -0.2861

-2.0719 -0.2973

-1.7709 -0.2936

-2.0719 -0.2626

-1.7709 এ -0.2861

-2.0719 -0.2626

-2.0719 ও -0.2973

…
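To make the three fields of an ARPA n-gram line concrete, the sketch below splits one line into its log likelihood, word sequence and back-off weight. It assumes that the values are base-10 logarithms, as is usual for the ARPA format, and that the n-gram order of the line is known; the example line is illustrative.

```python
from typing import Optional, Tuple


def parse_arpa_line(line: str, order: int) -> Tuple[float, Tuple[str, ...], Optional[float]]:
    """Parse 'logprob w1 ... wn [backoff]' for an n-gram of the given order."""
    fields = line.split()
    log_prob = float(fields[0])                      # first field: log10 likelihood
    words = tuple(fields[1:1 + order])               # next `order` fields: the words
    # The back-off weight is optional (highest-order n-grams have none).
    backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
    return log_prob, words, backoff


log_prob, words, backoff = parse_arpa_line("-1.7709 এ -0.2861", order=1)
print(words, 10 ** log_prob, backoff)
```

Converting with `10 ** log_prob` shows that a value of -1.7709 corresponds to a probability of roughly 0.017.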

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for training in Sphinx4. We use a Linux environment to produce the DMP-format language model file from the .lm-format file, with the following command in a Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file, the dialogue of each speaker is noted exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID, which identifies the utterance. This file is known as the transcription file, and there are basically two types: one is used to train the system and the other to test it. The files are named using the project name; here the name of the project is “sbsbsr”, so the training file name is sbsbsr_train.transcription and the test file name is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)

<s> </s> (01_02)

<s> </s> (01_03)

<s> </s> (01_04)

…

Note:

File format is .transcription

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and the other for testing. The name of the training file for our project is sbsbsr_train.fileids and the name of the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01

sbsbsr_train/sanjoy/sanjoy1/01_02

sbsbsr_train/sanjoy/sanjoy1/01_03

……

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL

<sil> SIL

</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes “beginning of speech”

</s> symbolizes “end of speech”

<sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: SphinxBase and SphinxTrain. We obtained them from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies on the Ubuntu distribution of Linux, such as Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in a Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As already noted, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading the packages we decompressed them into a folder in Linux; it can be any folder. We used the Linux root folder. After decompressing the packages, they can be installed with the following commands in a terminal [14]:

SphinxBase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

SphinxTrain

cd sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

We then created the system environment for training. We created a project folder in which SphinxTrain creates the trained files, or acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed. Here we created a folder named “sbsbsr”. After creating the folder, we open a terminal and change to the sbsbsr folder. Then we create a project task for SphinxTrain with the following command in the terminal:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as “etc”, “wav”, “model_parameters”, etc.

Now it is time to copy the files that we created in the data preparation part. We copy the dictionary, filler, phone, transcription, fileids, .lm and DMP files into the “etc” folder and our collected audio into the “wav” folder. Next we need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is created. We changed the following parameters in sphinx_train.cfg:

Parameter                   Value Before      Value After
$CFG_WAVFILE_EXTENSION      sph               wav
$CFG_WAVFILE_TYPE           nist/mswav/raw    raw
$CFG_FEATURE                2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                 1
$CFG_STATESPERHMM           6                 3
$CFG_N_TIED_STATES          100               100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio data into .mfc feature files. To do so, we again opened the project directory in a Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converted all .wav files into .mfc files in the “feat” directory under the project folder “sbsbsr”.

Now we are ready to execute the main training command, which creates the acoustic model. For this task we stay in the project directory in the terminal and execute the following command:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. After downloading it, we go to the root directory where we installed sphinxbase and sphinxtrain and extract the downloaded Pocketsphinx archive there. After extracting it, we install the software with the following commands in a Linux terminal:

./configure

make

After installing the software, we go into our project folder and execute a command from the terminal to create the folder structure for testing:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the “wav” directory, because Pocketsphinx recognizes speech from audio file input. We also copy the test fileids and transcription files into the “etc” folder. To decode, or test, our model from audio with Pocketsphinx, we first need feature files for the audio files. We create them with the following command in a Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding, or testing, in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of testing is written to the “result” folder under the project folder. For our model trained on fifty sentences the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We use the Sphinx4 Java library to test our trained model on the Windows 7 operating system. We need two pieces of software to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on the Windows machine. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the following files from our previous project to the Eclipse project [11]:

“sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters

.dic

.lm.DMP

.filler

We create a config.xml file in the Eclipse project to pass the configuration to the recognizer and to specify where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Our Java project is then ready to build and run. After building the project, we run it and can test it with live voice input from a microphone. For our model trained on ten sentences the result is:

Figure 2.2.5.2 Testing with Sphinx4

We can build various applications in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room, the common room, etc. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment      Data set   Speakers   Male   Female   Total words   Correct   Errors   Percent correct   Error     Accuracy
Experiment 01   Trained    5          4      1        1025          975       98       95.12%            9.56%     90.44%
Experiment 02   Trained    3          3      0        615           591       39       96.10%            6.34%     93.66%
Experiment 03   Trained    3          3      0        615           601       22       97.72%            3.58%     96.42%
Experiment 04   Trained    5          2      3        1025          938       184      91.51%            17.95%    82.05%
Experiment 05   Trained    3          0      3        615           545       125      88.62%            20.33%    79.67%

Table 3.2.1 Experimental details with Results for Pocket Sphinx
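The three percentages reported for each Pocket Sphinx experiment appear to follow directly from the word counts: percent correct is the share of correct words, the error rate is the share of errors, and accuracy is 100 minus the error rate. The sketch below reproduces that arithmetic; the function name `score` is introduced only for this illustration, and the interpretation that the error count includes inserted words (which is why correct plus errors can exceed the total) is our assumption.

```python
from typing import Tuple


def score(total_words: int, correct: int, errors: int) -> Tuple[float, float, float]:
    """Return (percent correct, error rate, accuracy), rounded to two decimals."""
    percent_correct = 100.0 * correct / total_words
    error_rate = 100.0 * errors / total_words
    accuracy = 100.0 - error_rate
    return round(percent_correct, 2), round(error_rate, 2), round(accuracy, 2)


# Counts of Experiment 01 and Experiment 03 for Pocket Sphinx.
print(score(1025, 975, 98))
print(score(615, 601, 22))
```

Because insertions count as errors but not as incorrect reference words, accuracy is always less than or equal to percent correct.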

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment      User type   Speakers   Environment         Speaker type   Input device   Words   Correct   Errors   Percent correct   Error   Accuracy
Experiment 01   Trained     2          Closed Room         Male           Microphone     156     154       2        98%               2%      98%
Experiment 02   Untrained   3          Lab Room            Male           Microphone     130     120       10       90%               10%     90%
Experiment 03   Untrained   3          University Campus   Male           Microphone     140     122       18       82%               18%     82%
Experiment 04   Untrained   2          University Campus   Female         Microphone     108     95        13       87%               13%     87%
Experiment 05   Trained     3          University floor    Female         Microphone     126     115       11       89%               11%     89%
Experiment 06   Trained     2          Closed Room         Male           Microphone     128     127       1        99%               1%      99%

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment      Data set   Speakers   Male   Female   Total words   Correct   Errors   Percent correct   Error     Accuracy
Experiment 01   Trained    3          3      0        210           194       16       85.71%            15.71%    85.71%
Experiment 02   Trained    3          2      1        210           188       22       77.14%            22%       77.14%
Experiment 03   Trained    3          2      1        210           182       28       72.38%            28%       72.38%
Experiment 04   Trained    3          2      1        210           194       16       84.44%            16%       84.44%
Experiment 05   Trained    3          1      2        210           184       26       74.76%            26%       74.76%

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. The main objective of this application is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

For training the acoustic model we need a .dic file, in which all training words and their pronunciations are placed. We wrote these pronunciations by hand several times while experimentally training the acoustic model. Over time we thought of an automatic pronunciation, or phonetic translation, maker, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various thesis papers [1] [9] [10] to make this database. However, various sources define phonemes in different ways, and phonemes are not defined for all letters. That is why we defined some phonemes ourselves for some consonants and conjunct letters. After making this database we made the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2 Dictionary files with phonetic translation
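The idea of a letter-to-phoneme database can be illustrated with a small lookup table. The sketch below is not our phonetic translator: the table holds only five hypothetical entries, the phone labels are assumptions chosen to match the ASCII style of the dictionary file, and a plain character-by-character lookup ignores inherent vowels and conjunct letters, which a real translator has to handle.

```python
from typing import Dict

# Hypothetical excerpt of a letter-to-phone table (Unicode code points in comments).
LETTER_TO_PHONE: Dict[str, str] = {
    "\u0986": "AA",   # Bengali letter AA
    "\u09aa": "P",    # Bengali letter PA
    "\u09a8": "N",    # Bengali letter NA
    "\u09ae": "M",    # Bengali letter MA
    "\u09bf": "I",    # Bengali vowel sign I
}


def to_phones(word: str) -> str:
    """Map each character of a Bengali word to its phone label."""
    phones = []
    for ch in word:
        if ch not in LETTER_TO_PHONE:
            raise KeyError(f"no phone defined for U+{ord(ch):04X}")
        phones.append(LETTER_TO_PHONE[ch])
    return " ".join(phones)


print(to_phones("\u0986\u09aa\u09a8\u09bf"))
```

Working on Unicode code points in this way avoids depending on any particular byte encoding of the input text, as long as the file is decoded correctly when it is read.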

4.1.3 Training File Creator

For training the acoustic model we also need the fileids and transcription files. These two files contain the paths of the training audio files and their corresponding sentences. Before creating this program we had to make these two files manually; with our software we can create these two large files automatically within moments. For example, if we have 8000 audio files, we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create these two files automatically by providing the root folder name of the audio files and the sentence corpus file to this software as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
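A minimal sketch of such a generator is given below. It is not the program we wrote, only an illustration of the idea under stated assumptions: the function name, the `speaker_dir` layout and the English placeholder sentences are hypothetical, and utterance IDs are assumed to follow the speaker ID and sentence ID naming used for the audio files.

```python
from typing import List, Tuple


def build_training_files(speaker_dir: str,
                         speaker_id: str,
                         sentences: List[str]) -> Tuple[str, str]:
    """Return the text of the fileids file and of the transcription file."""
    fileids = []
    transcripts = []
    for number, sentence in enumerate(sentences, start=1):
        utt_id = f"{speaker_id}_{number:02d}"                 # e.g. 01_01
        fileids.append(f"{speaker_dir}/{utt_id}")             # path without extension
        transcripts.append(f"<s> {sentence} </s> ({utt_id})")
    # Both files end with a newline after the last entry.
    return "\n".join(fileids) + "\n", "\n".join(transcripts) + "\n"


fileids_text, transcription_text = build_training_files(
    "sanjoy", "01", ["first sentence", "second sentence"])
print(fileids_text, end="")
print(transcription_text, end="")
```

Because both files are produced from the same loop, the order of the entries in the fileids file and the transcription file always matches.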

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click, right click, etc.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has some limitations in specific tasks. The system has been built on a small amount of data because of time constraints. We selected a domain, university admission information for newcomer students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers within a short span of time. As a result, we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced some problems in creating the dictionary file, because the Bengali phoneme list has not been defined precisely, and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and it sometimes fails to recognize sentences accurately because of slow speech or pronunciation problems. We created a program to generate the pronunciation of a Bengali word automatically, but this software does not work properly because of an encoding problem. An accurate phoneme set is a prerequisite for good pronunciations, so if we have an accurate set of phonemes we can expect good output from this software.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase our data size to create a complete model, and we plan to increase its capability to recognize speech more accurately and to enhance its vocabulary. We have also developed software for creating the dictionary file, the fileids file and the transcription file. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. Using this automatic trainer, a user will be able to train any sentence with the corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. Using the audio collected from this website, we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the various tools we used and the process of speech recognition. Our developed tools are at a preliminary level. Making a good and complete recognition application requires many improvements, such as a big training database and audio from many speakers with low noise. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, “Speech Recognition by Machine: A Review”, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009

[2] Morched Derbali, Mu’tasem Jarrah, Mohd Taib Wahid, “A Review of Speech Recognition with Sphinx Engine in Language Detection”, Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005-2012

[3] L. Rabiner & B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993

[4] Daniel Jurafsky and James H. Martin, “An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition”, Prentice Hall, 2000

[5] L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall, 1978

[6] A. K. M. Mahmudul Hoque, “Bengali Segmented Automatic Speech Recognition”, BRACU, 2006

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, “Recognition of Spoken Letters in Bangla”, banglacomputing.net, 2002

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, “Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective”, BRACU, 2007

[9] Shammur Absar Chowdhury, “Implementation of Speech Recognition System for Bangla”, BRACU, August 2010

[10] Qqbal SO Shahzad, “Speech Recognition System”, Iqra University, March 2009

[11] Tran Viet Khai, “Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition”, Bo Xuan Tu, Hochiminh city, Vietnam, 2008

[12] Sadaoki Furui, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech”, IEEE, 2004

[13] M. S. Islam, “Research on Bangla Language Processing in Bangladesh: Progress and Challenges”, BUET, 2009

[14] P. Foster, T. Schalk, “Speech Recognition: The Complete Practical Reference Guide”, 1993, ISBN 0936648392

[15] H. Satori, M. Harti and N. Chenfour, “Introduction to Arabic Speech Recognition Using CMU Sphinx System”, 2007

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we have recognized only 50 of them.

No Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বছরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators and dots were lost in this listing; "/" and the
        // file extensions below are assumed.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates over all folders, so k is never reset
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // replace() instead of replaceAll(): "." must not act as a regex wildcard
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collects directory names (levels 1 and 2) or file names (level 3) found under path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
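The directory walk performed by the fileids creator can also be sketched with `java.nio.file`. The following is a minimal sketch under stated assumptions, not the project's implementation: the class name `FileidsSketch` and the method `collect` are hypothetical, entries are sorted lexicographically rather than by embedded number, and every entry starts with the name of the root folder, as in the listing above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: list every .wav file below a training root as a fileids entry.
final class FileidsSketch {

    // Returns entries such as "sbs_asr_train/speaker_1/s1/a" (no ".wav", "/" separators).
    static List<String> collect(Path root) throws IOException {
        Path absoluteRoot = root.toAbsolutePath();
        Path base = absoluteRoot.getParent(); // entries begin with the root folder name
        try (Stream<Path> paths = Files.walk(absoluteRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Each returned entry could then be written to the fileids file on its own line.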

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "name1", "name10", "name2" by their number.
    // All names are assumed to share the prefix of the first element.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of each name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // rebuild the names in numeric order
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
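The numeric ordering of folder names done above can be sketched more compactly with a comparator that keeps the original names intact. This is an illustrative alternative, not the project's code: the class `NumericSuffixSort` and its methods are hypothetical, and it assumes the digits in a name fit into an `int`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order names by the number embedded in them.
final class NumericSuffixSort {

    // Extracts the digits of a name as a number; names without digits sort first.
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by embedded number; the input list is not modified.
    static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }
}
```

Because the names themselves are kept, this variant also works when the folders do not all share the same prefix.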

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file, one sentence per line, into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file; the corpus is reused
    // from its first sentence once all sentences have been written.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // replace() instead of replaceAll(): "." must not act as a regex wildcard
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
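The transcript line assembled in the loop above has the shape "start marker, sentence, end marker, file id in parentheses". A small self-contained sketch of that formatting step follows; the class name `TranscriptLine` and the method `format` are hypothetical names introduced for illustration, and the markers are written as in the listing above.

```java
// Hypothetical sketch: build one transcript line for one audio file.
final class TranscriptLine {

    // Returns "<S> sentence </S> (fileId)"; a trailing ".wav" on the file name is dropped.
    static String format(String sentence, String wavName) {
        String fileId = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - ".wav".length())
                : wavName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }
}
```

Calling `TranscriptLine.format` once per audio file, in the same order as the fileids entries, keeps the two training files aligned.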

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one word per line.
    // Path separators and dots were lost in this listing; the file names below are assumed.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one dictionary line "word pronunciation" per input word
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}
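The dictionary lines built above pair each word with a sequence of phone symbols. A database-free sketch of that lookup is given below. It is only an illustration under stated assumptions: the class `PhoneMapSketch` is hypothetical, it covers just five consonants whose symbols are taken from the Unicode to IPA chart (ক K, ন N, ব B, ম M, ল L), and it ignores vowels, the inherent vowel and conjuncts, all of which the project's pronunciation generator handles through its database table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: map Bengali consonants to phone symbols without a database.
final class PhoneMapSketch {
    private static final Map<Character, String> PHONES = new LinkedHashMap<>();

    static {
        PHONES.put('\u0995', "K"); // ক
        PHONES.put('\u09A8', "N"); // ন
        PHONES.put('\u09AC', "B"); // ব
        PHONES.put('\u09AE', "M"); // ম
        PHONES.put('\u09B2', "L"); // ল
    }

    // Returns "word P1 P2 ..." using only the characters known to the table;
    // characters without an entry are skipped.
    static String dictionaryLine(String word) {
        StringBuilder phones = new StringBuilder();
        for (char ch : word.toCharArray()) {
            String phone = PHONES.get(ch);
            if (phone != null) {
                if (phones.length() > 0) {
                    phones.append(' ');
                }
                phones.append(phone);
            }
        }
        return word + " " + phones;
    }
}
```

A full mapping would need one entry per row of the chart, plus rules for vowel signs and conjuncts.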

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when ch is a consonant (banjonborno), i.e. its id in
    // the banglatab table lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character was lost in this listing; the hasanta (U+09CD), which joins
    // the letters of a conjunct, is assumed.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions before, at and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // record the positions that are not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace = "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) = "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // look up the pronunciation of every search string in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // dots in the resource name were lost in this listing; the name is assumed
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the programm exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    // (the Bengali sample sentences are missing from this listing)
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // dots in the resource name were lost in this listing; the name is assumed
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        // Test scaffolding; treated as commented out because giveCommand()
        // without an argument does not match the method below.
        // Robot robot = new Robot();
        // robot.delay(2000);
        // giveCommand("bangla command");
        // robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the programm exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    // (the Bengali sample sentences are missing from this listing)
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that belongs to the recognized command.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase is missing from this listing.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
        // (the listing ends here in the source)
    }
}


1.4.4 Vocabularies
1.4.5 Training
1.4.6 Accuracy
1.4.7 Language Dictionary
1.4.8 Filler Dictionary
1.4.9 Phone
1.4.10 HMM
1.4.11 Language Model
1.5 Overview of the Full System

Chapter 2 METHODOLOGY
2.1 Data Preparation
2.1.1 Corpus
2.1.2 Audio Files
2.1.3 Dictionary Files
2.1.4 Phone File
2.1.5 Language Model File .lm Format
2.1.6 Language Model File DMP Format
2.1.7 Transcription File
2.1.8 Fileids File
2.1.9 Filler File
2.2 Setting up the System Environment
2.2.1 Software Requirements
2.2.2 Trainer Setup
2.2.3 Project Folder Setup
2.2.4 Training the Acoustic Model
2.2.5 Testing Part
2.2.5.1 Testing with Pocket Sphinx
2.2.5.2 Testing with Sphinx4

Chapter 3 TESTING AND PERFORMANCE EVALUATION
3.1 Testing & Performance Evaluation
3.2 Test Results with Pocket Sphinx
3.3 Test Results with Sphinx4
3.3.1 Input Type Microphone
3.3.2 Input Type Audio

Chapter 4 Applications & Developing
4.1 Review of Some Developed Recognized Application
4.1.1 Dictation Application
4.1.2 Phonetic Translator
4.1.3 Training File Creator
4.1.4 Training File Creator

Chapter 5 Limitation & Future Work
5.1 Limitation
5.2 Future Work

Chapter 6 CONCLUSION & REFERENCES
6.1 Conclusion
6.2 References


List of Figures

Fig. No.    Name of figure
1.3.7       Overview of Speech Recognition System
1.4.10      Applying Hidden Markov Model on Speech Recognition
1.5         Overview of the Full System Model
2.1.2       Audio File Recording Format
2.2.5.1     Testing with Pocket Sphinx
2.2.5.2     Testing with Sphinx4
4.1.2       Dictionary files with phonetic translation
4.1.3.1     Fileids files with phonetic translation
4.1.4.2     Transcription File

List of Charts

Fig. No.    Name of chart
3.2.2       Experiment Results with Pocket Sphinx
3.3.1.2     Experimental Details with Results for Sphinx 4 Live
3.3.2.2     Experimental Details with Results for Sphinx 4 Audio


List of Tables

Table No.   Name of table
1.2         History of Speech Recognition
2.2.3       Configuration of Sphinx-train.cfg
3.2.1       Experimental details with Results for Pocket Sphinx
3.3.1.1     Test results with Sphinx4, Input Type Microphone
3.3.2.1     Test results with Sphinx4, Input Type Audio
7.1         Speaker Profiles
7.2         Unicode to IPA Chart
7.3         Corpus About University Admission Information

List of Abbreviations & Symbols

ASR      Automatic Speech Recognition
BSD      Berkeley Software Distribution
CMU      Carnegie Mellon University
HMM      Hidden Markov Model
IPA      International Phonetic Alphabet
PCA      Principal Component Analysis
ASCII    American Standard Code for Information Interchange
MERL     Mitsubishi Electric Research Labs
CRBLP    Center for Research on Bangla Language Processing
D2P      Dictionary to Pronunciation
SBSBSR   Sanjoy Bappy Shimul Bengali Speech Recognition
IDE      Integrated Development Environment
ABI      Allied Business Intelligence


ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another, but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that identifies human speech with the help of the CMU Sphinx speech recognition API.


Literature Survey

Today, speech technology plays an important role in many applications, and it has moved from research to commercial use. Many human-machine interfaces have been invented and are applied today in telephone food ordering systems, airport information systems, ticketing systems, restaurant reservation systems, etc. For this reason we selected this field for our project. In addition, most languages have a speech recognition system, but our mother tongue, Bangla, has no proper one; this is the main reason for selecting this topic. In the early era most research work was done using Artificial Neural Networks (ANN), but as we are using an HMM-based technique, some HMM-based and related research is mentioned below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report for one week and acquired a lot of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the thesis report so clearly; it is very helpful for new students who want to work in this field. [9]

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned a lot about the types of speech recognition, the approaches to speech recognition, etc. [1]

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC): We studied the past works, and to the best of our knowledge this work is the first reported attempt to recognize Bangla speech using the HMM technique, so we took most of our guidance on the steps for building a speech recognition system from this publication. From it we learned how to increase the quality of the input audio signal through a noise elimination process and an end detection algorithm, how the features of a sound are extracted, and which parameters are stored in the feature files. We also learned the algorithm for creating HMM models. [8]

Bengali Segmented Automated Speech Recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model. [6]

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan): From these we learned the techniques used to recognize letters, vowels and consonants. Here we found the basic steps towards a recognizer and the common steps for building a fully functioning recognizer. [7]


Chapter 1

INTRODUCTION

INTRODUCTION
HISTORY OF SPEECH RECOGNITION
TYPES OF SPEECH RECOGNITION
TERMS AND CONCEPTS


1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. It is a broad term, meaning that it can recognize almost anybody's speech. It is also known as computer speech recognition, which means that the computer understands a voice and performs any required task. Put simply, speech recognition is the process of converting spoken input to text, and it is thus sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as “mouse up”. We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying “Music”. For this reason many scientists and researchers are working on speech recognition.

Most languages in the world have speech recognizers of their own, but our mother tongue, Bengali, is not enriched with one. Small research works have been carried out on Bengali speech recognition, but they have not had a great outcome. Implementing a continuous speech recognizer for Bengali is our main goal throughout the project work. However, developing a full-blown continuous speech recognizer is a huge task for a short span of time. As a result, we selected a domain-based continuous speech recognizer that covers a conversation on the university admission process.

Throughout the whole period of work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open source software and has high accuracy. Many high-quality and widely used software packages are available for this work, but they are costly commercial products, whereas CMU Sphinx is distributed under a Berkeley Software Distribution (BSD) style license. The ultimate goal of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, but researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

According to figures provided by industry, speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition was projected to push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, look for more and more companies to move toward speech recognition, and watch the industry take its place as a leader in the technology sector [1].

Year   Event
1936   AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970   The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971   DARPA established
1982   Dragon Systems was founded.
1984   SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995   Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997   Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998   Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000   Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003   ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (a lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have “Listen/Not-Listen” states, where they require the speaker to wait between utterances (usually doing processing during the pauses). “Isolated utterance” might be a better name for this class. Simply put, isolated words are single words such as “me”, “you”, “go”, etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterances) are similar to isolated word systems but allow separate utterances to be run together with a minimal pause between them, such as “I eat rice”.

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, “ums” and “ahs”, and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig. 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, “Word”, “Microsoft Word” and “I'd like to run Microsoft Word” are all examples of possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
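The way silence delimits an utterance can be illustrated with a minimal Java sketch. It is not the endpoint detector used by Sphinx; the class name UtteranceSplitter, the frame size, the amplitude threshold and the required number of silent frames are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a sample stream into utterances at runs of silence. */
final class UtteranceSplitter {

    /**
     * Returns {startFrame, endFrameExclusive} pairs, one per detected utterance.
     * A frame is "silent" when its mean absolute amplitude is below threshold;
     * an utterance ends once minSilenceFrames consecutive silent frames are seen.
     */
    static List<int[]> split(short[] samples, int frameSize,
                             double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int frames = samples.length / frameSize;
        int start = -1;     // first frame of the current utterance, -1 if none is open
        int silentRun = 0;  // consecutive silent frames seen inside the open utterance
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                sum += Math.abs(samples[i]);
            }
            boolean silent = (sum / frameSize) < threshold;
            if (!silent) {
                if (start < 0) {
                    start = f;          // lack of silence: utterance begins
                }
                silentRun = 0;
            } else if (start >= 0) {
                silentRun++;
                if (silentRun >= minSilenceFrames) {
                    // enough trailing silence: utterance ends after the last loud frame
                    utterances.add(new int[] {start, f - silentRun + 1});
                    start = -1;
                    silentRun = 0;
                }
            }
        }
        if (start >= 0) {
            utterances.add(new int[] {start, frames - silentRun});
        }
        return utterances;
    }
}
```

With 16 kHz audio, a frame size of 160 samples would correspond to 10 ms frames; suitable threshold values depend on the microphone and the recording environment.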

1.4.2 Pronunciation

You have heard the word “pronunciation” in connection with learning any language. What is pronunciation, and what are some of the fundamental aspects of this important part of learning English? In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech beyond the individual sounds that make a language unique: phrasing, stress, intonation, timing and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice, because you do not want to experience social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what you are trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in US English: “thee” and “thuh”.

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech recognition (SR) system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said “yes”, the engine returned “yes”, and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said “nope”, the engine returned “no”, and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
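One common way to quantify accuracy at the word level is to count the word substitutions, insertions and deletions needed to turn the reference sentence into the recognized one. The Java sketch below illustrates that calculation only; it is not taken from the evaluation tools used in this project, and the class and method names are hypothetical.

```java
/** Hypothetical helper: word-level accuracy based on edit distance. */
final class WordAccuracy {

    /** Minimum number of word substitutions, insertions and deletions turning ref into hyp. */
    static int wordErrors(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) {
            d[i][0] = i;                       // delete all remaining reference words
        }
        for (int j = 0; j <= hyp.length; j++) {
            d[0][j] = j;                       // insert all hypothesis words
        }
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** Word accuracy = 1 - errors / number of reference words (can be negative). */
    static double wordAccuracy(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return 1.0 - (double) wordErrors(ref, hyp) / ref.length;
    }
}
```

A measure like this scores the “nope”/“no” case above as an error even though the dialog succeeded, which is why application-level success and word-level accuracy are usually reported separately.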

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the dictionary sometimes also includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where for each sequence unit observation there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore “hidden” to the outside, hence the name Hidden Markov Model.

In other words, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after our voice is recorded it is divided into many frames that we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model we store the reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying an HMM we construct a statistical model for each phone, whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is very complicated, we do not go into much detail here; to learn more, see Appendix A.
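Finding the state sequence with the maximum probability is usually done with the Viterbi algorithm. The following Java sketch shows the algorithm for a small discrete HMM; it is a textbook illustration under simplifying assumptions (discrete observation symbols, a non-empty observation sequence), not the decoder implemented in Sphinx, and the class name is hypothetical.

```java
/** Hypothetical helper: Viterbi decoding for a discrete-observation HMM. */
final class Viterbi {

    /**
     * @param init  init[s]     = P(first state is s)
     * @param trans trans[p][s] = P(next state is s | current state is p)
     * @param emit  emit[s][o]  = P(observing symbol o | state s)
     * @param obs   observed symbol indices (must not be empty)
     * @return the most probable state sequence for the observations
     */
    static int[] decode(double[] init, double[][] trans, double[][] emit, int[] obs) {
        int n = init.length;
        int t = obs.length;
        double[][] score = new double[t][n]; // best log-probability of a path ending in state s
        int[][] back = new int[t][n];        // predecessor state on that best path

        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(init[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int k = 1; k < t; k++) {
            for (int s = 0; s < n; s++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < n; p++) {
                    double candidate = score[k - 1][p] + Math.log(trans[p][s]);
                    if (candidate > bestScore) {
                        bestScore = candidate;
                        best = p;
                    }
                }
                score[k][s] = bestScore + Math.log(emit[s][obs[k]]);
                back[k][s] = best;
            }
        }

        // Pick the best final state, then follow the back-pointers to the start
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[t - 1][s] > score[t - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[t];
        path[t - 1] = last;
        for (int k = t - 1; k > 0; k--) {
            path[k - 1] = back[k][path[k]];
        }
        return path;
    }
}
```

Log-probabilities are used so that long observation sequences do not underflow; in a speech recognizer the observations are continuous feature vectors rather than discrete symbols.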


Fig. 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words could follow previously recognized words, and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
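To make the idea of n-gram statistics concrete, the Java sketch below estimates bigram probabilities by simple counting (maximum likelihood, without the smoothing and back-off that real language modeling tools apply). The class name and the sentence markers it adds are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: unsmoothed bigram model, P(word | previous word). */
final class BigramModel {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    /** Counts the words and adjacent word pairs of one corpus sentence. */
    void addSentence(String sentence) {
        String[] words = ("<s> " + sentence.trim() + " </s>").split("\\s+");
        for (int i = 0; i < words.length; i++) {
            unigrams.merge(words[i], 1, Integer::sum);
            if (i > 0) {
                bigrams.merge(words[i - 1] + " " + words[i], 1, Integer::sum);
            }
        }
    }

    /** Maximum-likelihood estimate: count(prev word) pairs divided by count(prev). */
    double probability(String prev, String word) {
        int context = unigrams.getOrDefault(prev, 0);
        if (context == 0) {
            return 0.0; // unseen context: no estimate without smoothing
        }
        return (double) bigrams.getOrDefault(prev + " " + word, 0) / context;
    }
}
```

A word pair that never occurs in the corpus gets probability zero here, which is how such a model strips improbable words from the search; practical models reserve some probability for unseen pairs through back-off weights.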


1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model


Chapter 2

METHODOLOGY

Data Preparation
Setting up the System Environment


2.1 Data Preparation

We have to prepare some important files that are required for training and also for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means that it covers a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

• Corpus
• Audio files
• Dictionary file
• Phone file
• Language model file, .lm format
• Language model file, DMP format
• Transcription file
• Fileids file
• Filler file

2.1.1 Corpus

The corpus is simply a list of sentences that is used to train the language model; in other words, it is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences according to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record the audio for this corpus in the .wav or .sph format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)

Fig. 2.1.2: Audio File Recording Format

For this work a 16 kHz sampling rate was chosen because it captures high-frequency information more accurately, and 16 bits per sample quantizes each sample into 65,536 possible values. After recording, the audio was split into one file per sentence manually using recording software; for our project we used the WavePad sound editor. Each sound file is in .wav format and is named using the speaker ID and the sentence ID.

For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID: 01, Sentence ID: 01
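The naming scheme above can be read back programmatically, for example when building the fileids and transcription files. The Java sketch below assumes that every name has exactly the form speakerId_sentenceId followed by an optional extension; the class RecordingName is hypothetical and not part of the project code.

```java
/** Hypothetical helper: splits a recording name such as "01_01.wav" into its two IDs. */
final class RecordingName {
    final String speakerId;
    final String sentenceId;

    private RecordingName(String speakerId, String sentenceId) {
        this.speakerId = speakerId;
        this.sentenceId = sentenceId;
    }

    /** Parses "speakerId_sentenceId" with an optional file extension. */
    static RecordingName parse(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot >= 0 ? fileName.substring(0, dot) : fileName;
        String[] parts = base.split("_");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Unexpected recording name: " + fileName);
        }
        return new RecordingName(parts[0], parts[1]);
    }
}
```

The IDs are kept as strings so that leading zeros, as in 01, are preserved.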

When we collected audio from a speaker, we saved the personal information of that speaker, such as:

• Name
• Age
• Gender
• Environment in which the audio was collected

Some other information was also noted down:

• Environmental conditions of the recording (for example, classroom condition, number of students present, sources of noise such as a fan or a generator's sound, etc.)
• Technical details of the devices (PC, microphone)
• Date and time of the recording

2.1.3 Dictionary file

The dictionary file is simply the list of words that we get from our corpus file, together with the pronunciation of each word, such as “AA P NA KE”. For this work we need software that turns the dictionary file into a pronunciation file. A grapheme-to-phoneme (G2P) tool has been developed by CRBLP, which gives the pronunciation in the Unicode IPA system, but we need an ASCII format. As a result, for our project we developed a software tool, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is .dic. The name of our dictionary file is sbsbsr.dic, and some contents of the dictionary file for our project are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

• All phonemes are in capital letters, such as AA P N I.
• File format is .dic.
• File encoding is UTF-8 without BOM.
• A word cannot be repeated.
• A blank line is required at the end of the file (i.e. an extra line).

2.1.4 Phone file

The phone file is the list of phonemes that occur in the words; for example, “AA P N I” contains four phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.
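The project's own generator is not listed in this report, so the following Java sketch only illustrates what such a program might do. It assumes that each dictionary line starts with the word and is followed by its phones separated by whitespace, and that SIL must always be present; the class name and the romanized sample words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical helper: derives the phone list from pronunciation dictionary lines. */
final class PhoneListBuilder {

    /** Returns every distinct phone in sorted order, always including SIL. */
    static List<String> phones(List<String> dictionaryLines) {
        TreeSet<String> phones = new TreeSet<>(); // sorted and free of duplicates
        phones.add("SIL");                        // the silence phone is always required
        for (String line : dictionaryLines) {
            String[] tokens = line.trim().split("\\s+");
            // tokens[0] is the word itself; the remaining tokens are its phones
            for (int i = 1; i < tokens.length; i++) {
                phones.add(tokens[i].toUpperCase(Locale.ROOT));
            }
        }
        return new ArrayList<>(phones);
    }

    public static void main(String[] args) {
        // Romanized stand-ins for dictionary entries (assumed, not from the corpus)
        List<String> lines = List.of("apni AA P N I", "ami AA M I");
        for (String phone : phones(lines)) {
            System.out.println(phone);            // one phone per line
        }
    }
}
```

Writing the returned list one phone per line, followed by a final blank line, gives a file in the layout described in the notes of this section.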

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A
AA
B
BH
C
CCHA
CH
…

Note:

• All phones are in capital letters.
• File format is .phone.
• File encoding is UTF-8 without BOM.
• A phone cannot be repeated.
• A blank line is required at the end of the file (i.e. an extra line).
• The silence phoneme “SIL” is also included in the phone file.
• All phonemes in the .dic file are present in the phone file without repetition.


2.1.5 Language Model file .lm format

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to be able to handle. The corpus needs to be in the form of an ASCII text file, although the newer version also supports Unicode text files, with one sentence per line. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here the only file used is the LM file, as the pronunciation dictionary is built as stated above. The tool is best for small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
…

2.1.6 Language Model file DMP format

We also need the language model file in DMP format for Sphinx4. We used a Linux environment to obtain the DMP format file from the .lm format file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP


2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file the dialogue of the speaker is noted in exactly the same way as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named using the project name; here the name of the project is “sbsbsr”, so the training file name is sbsbsr_train.transcription and the test file name is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
…

Note:

• File format is .transcription.
• File encoding is UTF-8 without BOM.
• A sentence cannot be repeated.
• A blank line is required at the end of the file (i.e. an extra line).
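The line layout described above (start tag, sentence, end tag, file ID in parentheses) can be produced mechanically. The Java sketch below is only an illustration of that layout; the class name is hypothetical and the English sample sentences stand in for the Bengali corpus sentences.

```java
import java.util.List;

/** Hypothetical helper: formats transcription lines as "<s> sentence </s> (fileId)". */
final class TranscriptionLine {

    /** Formats a single utterance. */
    static String format(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    /** Joins all utterances, one per line, and ends with the required extra blank line. */
    static String fileContents(List<String> sentences, List<String> fileIds) {
        if (sentences.size() != fileIds.size()) {
            throw new IllegalArgumentException("Each sentence needs exactly one file ID");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sentences.size(); i++) {
            out.append(format(sentences.get(i), fileIds.get(i))).append('\n');
        }
        out.append('\n'); // trailing blank line
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder sentences (assumed); the real corpus sentences are in Bengali
        System.out.print(fileContents(
                List.of("first sentence", "second sentence"),
                List.of("01_01", "01_02")));
    }
}
```

When the result is written to disk, the encoding should be set explicitly to UTF-8 so that the file matches the notes above.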

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the .wav or .sph extension. There are two fileids files, one for training and the other for testing. The name of the training file for our project is sbsbsr_train.fileids and the name of the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_trainsanjoysanjoy101_01

sbsbsr_ train sanjoysanjoy101_02

sbsbsr_ train sanjoysanjoy101_03

helliphellip
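Producing a fileids entry from an audio path amounts to dropping the extension. The sketch below is a minimal illustration under stated assumptions: the class FileIds and the method toFileId are hypothetical names, and the sample path assumes a root/speaker/session/file folder layout.

```java
import java.util.Locale;

// Hypothetical helper (not part of the project code): turns an audio path into a fileids entry.
final class FileIds {

    // Strips a trailing .wav or .sph extension (case-insensitive); other names are returned unchanged.
    static String toFileId(String relativePath) {
        String lower = relativePath.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".wav") || lower.endsWith(".sph")) {
            return relativePath.substring(0, relativePath.length() - 4);
        }
        return relativePath;
    }

    public static void main(String[] args) {
        // Assumed folder layout: root/speaker/session/file
        System.out.println(toFileId("sbsbsr_train/sanjoy/sanjoy1/01_01.wav"));
    }
}
```

Checking the suffix explicitly, rather than using a regular expression, avoids removing "wav" when it appears elsewhere in a folder or file name.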

2.1.9 Filler File

The filler file contains the user's definitions of any background noise occurring in the recordings. It is a dictionary in which non-speech sounds are mapped to the corresponding non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and must be present in the filler dictionary. At least one of them must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"
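The two rules stated above (the three special words must be present, and at least one of them must map to SIL) can be checked mechanically. The following Java sketch is an assumption-laden illustration: FillerCheck, parse and isValid are hypothetical names, and the parser assumes one "word phone" pair per line separated by whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical checker (not part of the project code) for the filler dictionary rules.
final class FillerCheck {

    // Parses "word phone" lines into a map; blank or incomplete lines are ignored.
    static Map<String, String> parse(String text) {
        Map<String, String> entries = new LinkedHashMap<String, String>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2) {
                entries.put(parts[0], parts[1]);
            }
        }
        return entries;
    }

    // True when <s>, </s> and <sil> are all present and at least one of them maps to SIL.
    static boolean isValid(Map<String, String> entries) {
        boolean hasAll = entries.containsKey("<s>")
                && entries.containsKey("</s>")
                && entries.containsKey("<sil>");
        boolean anySil = "SIL".equals(entries.get("<s>"))
                || "SIL".equals(entries.get("</s>"))
                || "SIL".equals(entries.get("<sil>"));
        return hasAll && anySil;
    }

    public static void main(String[] args) {
        String filler = "<s> SIL\n<sil> SIL\n</s> SIL\n";
        System.out.println(isValid(parse(filler)));
    }
}
```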

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. Training the CMU Sphinx recognition engine requires two packages, sphinxbase and sphinxtrain, which we downloaded from the CMU Sphinx web site. Before installing them we first had to install their dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl:

sudo apt-get install perl

GCC:

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two packages, sphinxbase and sphinxtrain. After downloading them we decompressed them into a folder. Any folder will do; we used the Linux root folder. After decompressing, the packages can be installed with the following terminal commands [14]:

Sphinxbase:

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain:

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which sphinxtrain writes the trained files (the acoustic model). First we went to the root directory where the sphinxbase and sphinxtrain folders are located and created a folder there named "sbsbsr". We then opened a terminal, changed into the sbsbsr folder, and created a project task for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Running this command creates several folders inside sbsbsr, such as "etc", "wav" and "model_parameters".

Next we copied in the files created during data preparation. The dic, filler, phone, transcription, fileids, lm and lm.DMP files go into the "etc" folder and the collected audio goes into the "wav" folder. We then changed some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed are listed below.

Parameter                   Value Before     Value After
$CFG_WAVFILE_EXTENSION      sph              wav
$CFG_WAVFILE_TYPE           nist/mswav/raw   raw
$CFG_FEATURE                2s_c_d_dd        1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                1
$CFG_STATESPERHMM           6                3
$CFG_N_TIED_STATES          100              100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is complete.

2.2.4 Training the Acoustic Model

The first step of training is to convert the collected raw speech audio into .mfc feature files. To do this we opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all the wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

We are then ready to run the main training command, which creates the acoustic model. Still in the project directory, we ran the following command:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing

We tested our model with two recognizers from CMU Sphinx, Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we downloaded Pocketsphinx from the CMU Sphinx site and extracted it in the root directory where sphinxbase and sphinxtrain are installed. We then installed it with the following commands in the Linux terminal:

./configure
make

After installation we went into our project folder and ran a command to set up the folder structure:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

We then put some test audio in the "wav" directory, because Pocketsphinx recognizes from audio files, and copied the test fileids and transcription files into the "etc" folder. Decoding with Pocketsphinx first requires feature files for the test audio, which we created with the following command:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we ran the main decoding command:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The test results are written to the "result" folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing Eclipse on Windows we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file. We then created a new Java project in Eclipse and wrote a Java file based on the demo application from CMU Sphinx. After that we added the following files from our training project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
dic
lm.DMP
filler

We created a config.xml file in the Eclipse project to configure the recognizer and tell it where the required model files are located. This config.xml can be written with the help of the config file in the Sphinx4 demo application. We also added four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building the project we ran it and tested it with live voice input from a microphone. The result for our model trained on ten sentences is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built on Sphinx4 using the Java language. The applications we built with Sphinx4 are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room and a common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We tested with two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment  Data Set  Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          Trained   5         4     1       1025         975      98      95.12                9.56       90.44
02          Trained   3         3     0       615          591      39      96.10                6.34       93.66
03          Trained   3         3     0       615          601      22      97.72                3.58       96.42
04          Trained   5         2     3       1025         938      184     91.51                17.95      82.05
05          Trained   3         0     3       615          545      125     88.62                20.33      79.67

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
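The Pocket Sphinx figures are consistent with three simple ratios: percent correct is the number of correct words over total words, error is the number of errors over total words, and accuracy is 100 minus the error. The Java sketch below reproduces Experiment 01 (1025 words, 975 correct, 98 errors) under that reading. The class and method names are hypothetical, and the formulas are inferred from the reported numbers rather than taken from the decoder's documentation.

```java
import java.util.Locale;

// Hypothetical helper (not part of the project code): word-level scores as percentages.
final class RecognitionScore {

    private static void requirePositive(int totalWords) {
        if (totalWords <= 0) {
            throw new IllegalArgumentException("totalWords must be positive");
        }
    }

    // Share of reference words recognized correctly.
    static double percentCorrect(int totalWords, int correct) {
        requirePositive(totalWords);
        return 100.0 * correct / totalWords;
    }

    // Errors relative to the number of reference words; the error count can exceed
    // (total - correct), as it does in the reported results.
    static double errorRate(int totalWords, int errors) {
        requirePositive(totalWords);
        return 100.0 * errors / totalWords;
    }

    // Accuracy taken as 100 minus the error rate.
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01 of the Pocket Sphinx results
        System.out.println(String.format(Locale.ROOT, "%.2f %.2f %.2f",
                percentCorrect(1025, 975), errorRate(1025, 98), accuracy(1025, 98)));
        // prints "95.12 9.56 90.44"
    }
}
```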

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment  User Type  Speakers  Environment        Speaker Type  Input Device  Words  Correct  Errors  Percent Correct (%)  Errors (%)  Accuracy (%)
01          Trained    2         Closed Room        Male          Microphone    156    154      2       98                   2           98
02          Untrained  3         Lab Room           Male          Microphone    130    120      10      90                   10          90
03          Untrained  3         University Campus  Male          Microphone    140    122      18      82                   18          82
04          Untrained  2         University Campus  Female        Microphone    108    95       13      87                   13          87
05          Trained    3         University floor   Female        Microphone    126    115      11      89                   11          89
06          Trained    2         Closed Room        Male          Microphone    128    127      1       99                   1           99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment  Data Set  Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          Trained   3         3     0       210          194      16      85.71                15.71      85.71
02          Trained   3         2     1       210          188      22      77.14                22         77.14
03          Trained   3         2     1       210          182      28      72.38                28         72.38
04          Trained   3         2     1       210          194      16      84.44                16         84.44
05          Trained   3         1     2       210          184      26      74.76                26         74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize whole sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a file called dic, which contains all the training words and their pronunciations. We wrote these pronunciations by hand several times while training acoustic models experimentally. Over time we began to think about generating pronunciations (phonetic translations) automatically, and this software is the result. First we built a database that stores all phonemes and their corresponding letters, drawing on the IPA chart and several theses [1] [9] [10]. However, different sources define the phonemes in different ways, and phonemes are not defined for every letter, so we defined phonemes ourselves for some consonants and conjunct letters. With this database in place we built the phonetic translator, which produces the phonetic translation of Bengali words.

Fig 4.1.2 Dictionary files with phonetic translation
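The idea of a letter-to-phoneme lookup can be sketched in a few lines of Java. This is a deliberately simplified illustration and not the project's PronounciationGenarator: the class PhoneticSketch is a hypothetical name, the in-memory map stands in for the database, it covers only six consonants taken from the Unicode to IPA chart in the appendix (ক K, খ KH, গ G, ম M, ল L, স S), and it ignores vowels, vowel signs and conjuncts. Letters are written as Unicode escapes so that the source stays ASCII-only.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical letter-to-phoneme lookup; not the project's translator.
final class PhoneticSketch {

    private static final Map<Character, String> PHONES = new HashMap<Character, String>();

    static {
        // A small sample of the consonant chart in the appendix
        PHONES.put('\u0995', "K");   // ka
        PHONES.put('\u0996', "KH");  // kha
        PHONES.put('\u0997', "G");   // ga
        PHONES.put('\u09AE', "M");   // ma
        PHONES.put('\u09B2', "L");   // la
        PHONES.put('\u09B8', "S");   // sa
    }

    // Maps each known letter to its phone symbol, separated by single spaces.
    // Letters outside the sample table are skipped.
    static String translate(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phone = PHONES.get(word.charAt(i));
            if (phone == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phone);
        }
        return sb.toString();
    }

    // One dictionary line: the word followed by its phone sequence.
    static String dictionaryLine(String word) {
        return word + " " + translate(word);
    }

    public static void main(String[] args) {
        // ka + ma
        System.out.println(translate("\u0995\u09AE"));
    }
}
```

A real translator would also need rules for vowels and conjunct letters, which is where the differing phoneme definitions mentioned above come into play.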

4.1.3 Training File Creator

Training the acoustic model also requires the fileids and transcription files. These two files contain the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create the two files by hand. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcription file. Now the software creates both of these large files automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs some common tasks such as opening My Computer, left click and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific areas. Because of time constraints the system was built on a small amount of data. We selected the domain of university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, but it sometimes fails when the speaker speaks slowly or mispronounces words. We created a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate set of phonemes.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the amount of data to build a complete model, and we plan to improve its recognition accuracy and enlarge its vocabulary. We have already developed software for creating the dictionary, fileids and transcription files. We want to build a user-friendly, stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate; with it the user will be able to train any sentence with its corresponding audio. We also want to integrate the system into document applications so that Bengali sentences can be written just by speaking them. We want to build a voice response application, in which the user asks the software a question and the software gives the predefined answer. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. In this report we have discussed our objectives, the tools we used and the speech recognition process. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database, many speakers and low-noise audio. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005-2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal, S. O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (বযজঞনবরণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নামবার) IPA

শনয 0

এক 1

দই 2

তিন 3

চার 4

পাচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যকতবরণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we used 50 of them for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Folder names found at level 1 and level 2 of the audio tree,
    // and the audio file names found at level 3.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 is never cleared, so k continues from the last file written
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // literal replace, so that only the ".wav" extension is removed
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds sub-folder names to the level 1 or level 2 list, and file names to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

........................................................................

ARRAY

........................................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names by the number they contain, so that a name ending in 2
    // comes before a name ending in 10. All names are assumed to share one prefix.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of each name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

........................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences of the corpus, one per line.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file; the corpus is reused from the
    // start once its last sentence has been written.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // literal replace, so that only the ".wav" extension is removed
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

........................................................................

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

Bengali Speech Recognition

74

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // One dictionary line per word: the word, a space, then its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter's id in banglatab lies in the consonant range 13..48
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
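isBanjonBorno opens a database connection for every character it tests. As a hedged alternative, and not the method used in the thesis, a consonant test can be sketched directly on Unicode code points. The class name ConsonantCheckSketch is hypothetical, and the set of characters accepted here is an assumption: it may not match exactly the rows with id 13 to 48 in the banglatab table.

```java
final class ConsonantCheckSketch {

    // Assumption: the Bengali consonants from KA (U+0995) to HA (U+09B9) form one run
    // in the Unicode Bengali block, with a few unassigned code points inside it.
    // RRA (U+09DC), RHA (U+09DD), YYA (U+09DF) and KHANDA TA (U+09CE) lie outside the run.
    static boolean isConsonant(char ch) {
        if (!Character.isDefined(ch)) {
            return false; // skips the unassigned code points inside the run
        }
        return (ch >= '\u0995' && ch <= '\u09B9')
                || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF' || ch == '\u09CE';
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // KA, a consonant
        System.out.println(isConsonant('\u0985')); // A, an independent vowel
    }
}
```

A check of this kind needs no database round trip, but the table lookup remains the reference for which letters the pronunciation rules treat as consonants.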

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    char ConjuctsIdentifyCharacter = '\u09CD'; // hasanta, the sign that joins consonants into a conjunct
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every conjunct sign
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two adjacent consonants: insert the inherent vowel between them
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // Look up the pronunciation of every search string in the banglatab table
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
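The core idea of getPronouciation is to group characters joined by the conjunct sign into a single lookup string before querying the pronunciation table. A compact sketch of that grouping step follows. It assumes the conjunct sign is the Bengali hasanta (U+09CD); the class and method names (ConjunctSplitSketch, split) are hypothetical, and the sketch leaves out both the special handling of র and the insertion of the inherent vowel অ that the thesis code performs.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitSketch {

    static final char HASANTA = '\u09CD'; // assumed conjunct sign (Bengali virama)

    // Splits a word into lookup units: characters chained together by hasanta
    // form one unit; every other character is a unit of its own.
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            boolean joinsPrevious = ch == HASANTA
                    || (current.length() > 0
                        && current.charAt(current.length() - 1) == HASANTA);
            if (current.length() > 0 && !joinsPrevious) {
                units.add(current.toString()); // close the previous unit
                current.setLength(0);
            }
            current.append(ch);
        }
        if (current.length() > 0) {
            units.add(current.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // The word "bishwo": BA, vowel sign I, SHA + hasanta + BA
        System.out.println(split("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

A single pass with a StringBuilder avoids the fixed-size position arrays, so words with more than a handful of conjuncts do not overflow.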

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals("")) { // Bengali command phrase not recoverable from the source
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
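giveCommand compares the recognized text against one command phrase per if statement. One possible way to keep such a mapping manageable as commands are added is a lookup table from phrase to action. The sketch below is an illustration under assumptions and not the thesis implementation: the class name CommandDispatchSketch and the English placeholder phrases are hypothetical, and real CommandActivator methods throw AWTException, so they would need a small wrapper before being stored as Runnable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CommandDispatchSketch {

    // Maps a recognized phrase to the action it should trigger
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action registered for the recognized text.
    // Returns false when the text is null or no phrase matches.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        // Placeholder phrases; a real table would hold the Bengali command strings
        dispatcher.register("right click", () -> System.out.println("rightClick()"));
        dispatcher.register("open notepad", () -> System.out.println("openNotepad()"));
        dispatcher.dispatch("open notepad");
    }
}
```

With a table, adding a command is one register call, and an unrecognized phrase is reported through the return value instead of falling silently through a chain of comparisons.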

2.2.4 Training the Acoustic Model

2.2.5 Testing Part

2.2.5.1 Testing with Pocket Sphinx

2.2.5.2 Testing with Sphinx4

Chapter 3: TESTING AND PERFORMANCE EVALUATION

3.1 Testing & Performance Evaluation

3.2 Test Results with Pocket Sphinx

3.3 Test Results with Sphinx4

3.3.1 Input Type Microphone

3.3.2 Input Type Audio

Chapter 4: Applications & Developing

4.1 Review of Some Developed Recognized Application

4.1.1 Dictation Application

4.1.2 Phonetic Translator

4.1.3 Training File Creator

4.1.4 Training File Creator

Chapter 5: Limitation & Future Work

5.1 Limitation

5.2 Future Work

Chapter 6: CONCLUSION & REFERENCES

6.1 Conclusion

6.2 References

List of Figures

Fig. No.    Name of Figure
1.3.7       Overview of Speech Recognition System
1.4.10      Applying Hidden Markov Model on Speech Recognition
1.5         Overview of the full System Model
2.1.2       Audio File Recording Format
2.2.5.1     Testing with Pocket Sphinx
2.2.5.2     Testing with Sphinx4
4.1.2       Dictionary files with phonetic translation
4.1.3.1     Fileids files with phonetic translation
4.1.4.2     Transcription File

List of Charts

Fig. No.    Name of Chart
3.2.2       Experiment Results with Pocket Sphinx
3.3.1.2     Experimental Details with Results for Sphinx 4 Live
3.3.2.2     Experimental Details with Results for Sphinx 4 Audio

List of Tables

Table No.   Name of Table
1.2         History of Speech Recognition
2.2.3       Configuration of Sphinx-train.cfg
3.2.1       Experimental details with Results for Pocket Sphinx
3.3.1.1     Test results with Sphinx4 Input Type Microphone
3.3.2.1     Test results with Sphinx4 Input Type Audio
7.1         Speaker Profiles
7.2         Unicode to IPA Chart
7.3         Corpus About University Admission Information

List of Abbreviations & Symbols

ASR       Automatic Speech Recognition
BSD       Berkeley Software Distribution
CMU       Carnegie Mellon University
HMM       Hidden Markov Model
IPA       International Phonetic Alphabet
PCA       Principal Component Analysis
ASCII     American Standard Code for Information Interchange
MERL      Mitsubishi Electric Research Labs
CRBLP     Center for Research on Bangla Language Processing
D2P       Dictionary to Pronunciation
SBSBSR    Sanjoy Bappy Shimul Bengali Speech Recognition
IDE       Integrated Development Environment
ABI       Allied Business Intelligence

ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another, but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that identifies human speech with the help of the CMU Sphinx speech recognition API.

Literature Survey

Today, speech technology plays an important role in many applications and has moved from research to commercial use. Many human-machine interfaces have been invented and deployed, for example in telephone food-ordering systems, airport information systems, ticketing systems, and restaurant reservation systems. For this reason we selected this field for our project. In addition, most languages have a speech recognition system, but our mother tongue, Bangla, has no proper one; this is the main reason for choosing this topic. In the early era most research was done using Artificial Neural Networks (ANN), but since we use an HMM-based technique, the HMM-based and related research we drew on is described below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report over one week and acquired a great deal of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the report so clearly; it is very helpful for new students who want to work in this field [9].

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned a great deal about the types of speech recognition, the approaches to speech recognition, and related topics [1].

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC): We studied the past work, and to the best of our knowledge this is the first reported attempt to recognize Bangla speech using the HMM technique. We took most of our guidance on the steps for building a speech recognition system from this publication. From it we learned how to improve the quality of the input audio signal through noise elimination and an end-detection algorithm, how the features of a sound are extracted, which parameters are stored in the feature files, and the algorithm for creating HMM models [8].

Bengali Segmented Automated Speech Recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan): From these papers we learned the techniques used to recognize letters, vowels, and consonants. They gave us the basic steps towards a recognizer and the common steps for building a fully functioning one [7].

Chapter 1

INTRODUCTION

INTRODUCTION

HISTORY OF SPEECH RECOGNITION

TYPES OF SPEECH RECOGNITION

TERMS AND CONCEPTS

1.1 Introduction

Automatic Speech Recognition (ASR) is the process by which a machine converts an acoustic signal, captured by a microphone or a telephone, into a set of words. It is a broad term: it covers systems that can recognize almost anybody's speech, and it is also known as computer speech recognition, meaning that the computer understands a voice and performs the required task. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as "mouse up". We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying "Music". For this reason, many scientists and researchers are working on speech recognition.

Most of the languages in the world have speech recognizers of their own, but our mother tongue, Bengali, does not yet have a mature one. Some small research efforts have been carried out on Bengali speech recognition, but they have not produced strong results. Implementing a continuous speech recognizer for Bengali is the main goal of this project. Developing a full-blown continuous speech recognizer, however, is a huge task for a short span of time. We therefore chose a domain-based continuous speech recognizer that covers conversations about the university admission process.

Over the course of the work we studied several tools and chose CMU Sphinx4 as the speech recognition API because it is open-source software and offers high accuracy. Many high-quality, widely used packages are available for this work, but such software is costly, whereas CMU Sphinx is released under a Berkeley Software Distribution (BSD) style license. The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics, or accent [9].

1.2 History of Speech Recognition

Although AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to perceive subtle and complex verbal input accurately and consistently. In the 1960s, researchers therefore turned their focus to a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that used discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to them for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

According to industry figures, speech recognition is used in a wide range of applications because it satisfies the needs of consumers and businesses by simplifying customer interaction, increasing efficiency, and reducing operating costs. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, more and more companies can be expected to move toward speech recognition, and the industry is likely to take its place as a leader in the technology sector [1].

Year    Event
1936    AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970    The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971    DARPA established.
1982    Dragon Systems was founded.
1984    SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995    Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997    Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998    Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000    Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003    ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be divided into classes according to the types of utterances they are able to recognize. These classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, in which they require the speaker to wait between utterances (usually doing processing during the pauses). "Isolated utterance" might be a better name for this class. Put simply, isolated words are single words such as "me", "you", or "go".
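The requirement of quiet on both sides of an utterance can be illustrated with a simple energy threshold: a frame counts as speech when its mean squared amplitude reaches a chosen level, and the utterance is taken to span the first through the last such frame. The sketch below is an illustration under assumptions and is not the end-detection algorithm of any system discussed in this report; the class name EndpointSketch, the frame size, and the threshold value are all hypothetical.

```java
final class EndpointSketch {

    // Returns {firstSpeechFrame, lastSpeechFrame} (inclusive), or null when every
    // frame stays below the energy threshold. Frames do not overlap, and samples
    // left over after the last full frame are ignored.
    static int[] findSpeech(short[] samples, int frameSize, double threshold) {
        int first = -1;
        int last = -1;
        int frames = samples.length / frameSize;
        for (int f = 0; f < frames; f++) {
            double energy = 0.0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                energy += (double) samples[i] * samples[i];
            }
            energy /= frameSize; // mean squared amplitude of this frame
            if (energy >= threshold) {
                if (first < 0) {
                    first = f;
                }
                last = f;
            }
        }
        return first < 0 ? null : new int[] {first, last};
    }

    public static void main(String[] args) {
        // Quiet, speech, speech, quiet: four frames of four samples each
        short[] samples = {0, 0, 0, 0, 1000, 1000, 1000, 1000,
                           1000, 1000, 1000, 1000, 0, 0, 0, 0};
        int[] span = findSpeech(samples, 4, 10000.0);
        System.out.println(span[0] + " to " + span[1]);
    }
}
```

A fixed threshold of this kind is sensitive to background noise, which is one reason practical recognizers use more elaborate end-detection methods.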

132 Connected Words Connected word systems (or more correctly connected

utterances) are similar to isolated words but allows separate utterances to be run-together

with a minimal pause between them Such as- I eat rice

133 Continuous Speech Continuous speech recognizers allow users to speak almost

naturally while the computer determines the content (Basically its computer

dictation) Recognizers with continuous speech capabilities are some of the most

difficult to create because they utilize special methods to determine utterance boundaries

134 Spontaneous Speech At a basic level it can be thought of as speech that is natural

sounding and not rehearsed An ASR system with spontaneous speech ability should be able

to handle a variety of natural speech features such as words being run together ums and

ahs and even slight stutters

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent 2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his or her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his or her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.

1.3.7 Overview of Speech Recognition System

Fig 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word" and "I'd like to run Microsoft Word" are all examples of possible utterances. In other words, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
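The start and end rule described above can be sketched as a simple energy-based endpoint detector. This is only an illustration and not the method used inside the CMU Sphinx engine; the function name, the per-frame energy input, the threshold and the required number of silent frames are all assumptions made for the example.

```python
def find_utterances(energies, threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame index pairs, end exclusive, for each utterance.

    energies: per-frame energy values (hypothetical input).
    threshold, min_silence_frames: illustrative values, not tuned ones.
    """
    utterances = []
    start = None       # index of the first speech frame of the current utterance
    silent_run = 0     # consecutive silent frames seen since the last speech frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i              # a lack of silence signals the beginning
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # enough silence: the utterance ended after the last speech frame
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:              # audio ended while still inside an utterance
        utterances.append((start, len(energies) - silent_run))
    return utterances


frames = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7, 0.8, 0]
print(find_utterances(frames))
```

A short pause (one silent frame in the example) does not end the utterance; only a run of silence at least `min_silence_frames` long does.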

1.4.2 Pronunciation

You have heard the word "pronunciation" in connection with learning any language. What is pronunciation, and what are some of the fundamental aspects of this important part of learning English? In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech that go beyond the individual sound and make the language unique: phrasing, stress, intonation, timing and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because you do not want to experience social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.


1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; an entry can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes", and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no", and the NO action was executed? Should that be considered a successful dialog? The answer to that question is yes, because the desired result was achieved.

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation. The dictionary sometimes also includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines in which there is a state transition for each observation in the sequence and an output symbol emission for each state. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

In other words, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after our voice is recorded it is divided into many frames that we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model, we store the reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying HMM, we construct a statistical model of each phone in which the states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and on the previous state. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is very complicated, we do not go into much detail about it here. If you want to learn more, see Appendix A.
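Finding the state sequence with the maximum probability is commonly done with the Viterbi algorithm. The following is a minimal sketch on a toy two-state model, not the decoder used by CMU Sphinx; the state names, observation symbols and all probabilities are made-up values for illustration only.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_probability, best_state_path) for an observation sequence."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Pick the most probable final state, then follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return probability, path


# Toy model (all numbers are assumptions for the example).
states = ["SIL", "AA"]
start_p = {"SIL": 0.6, "AA": 0.4}
trans_p = {"SIL": {"SIL": 0.7, "AA": 0.3}, "AA": {"SIL": 0.4, "AA": 0.6}}
emit_p = {
    "SIL": {"quiet": 0.5, "soft": 0.4, "loud": 0.1},
    "AA": {"quiet": 0.1, "soft": 0.3, "loud": 0.6},
}
probability, path = viterbi(["quiet", "soft", "loud"], states, start_p, trans_p, emit_p)
print(path, probability)
```

Each state's score depends only on its own emission and on the best score of the previous state, which matches the dependency described in the text.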

Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which word could follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
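As an illustration of the "statistics of word sequences" in an n-gram model, the sketch below estimates bigram probabilities from a tiny corpus by simple counting. It assumes maximum-likelihood estimates without smoothing or back-off, so it is simpler than the models produced by real language modeling tools; the function name and the toy English sentences are assumptions made for the example.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate maximum-likelihood bigram probabilities P(w2 | w1)."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])               # every word that has a successor
        bigrams.update(zip(words, words[1:]))     # adjacent word pairs
    return {pair: count / contexts[pair[0]] for pair, count in bigrams.items()}


model = train_bigram(["i eat rice", "i eat fish", "you eat rice"])
print(model[("i", "eat")])      # "eat" always follows "i" in this corpus
print(model[("eat", "rice")])   # "rice" follows "eat" in two of three cases
```

A word pair that never occurs in the corpus has no entry in the model, which is how such a model restricts the search to probable word sequences.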

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation
Setting up the System Environment

2.1 Data Preparation

We have to make some important files that are required for training and also for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means that it covers a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

Corpus
Audio files
Dictionary file
Phone file
Language Model file (.lm format)
Language Model file (.DMP format)
Transcription file
Fileids file
Filler file

2.1.1 Corpus

The corpus is simply a list of sentences used to train the language model; that is, the corpus is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences according to our domain. Some sentences of our project are as follows:

...

2.1.2 Audio files

After collecting the corpus, the next step is to record the audio for each sentence of the corpus in (.wav) or (.sph) format. During the recording sessions, the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit rate (bits per sample): 16
• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format

A 16 kHz sampling rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample quantizes each sample into 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor. Each sound file is in wav format and is named using the speaker id and the sentence id. For example, an audio file of our project is

01_01.wav, which stands for
Speaker Id: 01, Sentence Id: 01

When we collected audio from a speaker, we saved personal information about the speaker, such as:

Name, Age, Gender, Audio collection environment

Some other information was also noted down, such as:

Environmental condition of recording (for example classroom condition, number of students present, sources of noise such as a fan or a generator, etc.)
Technical details of the device (PC, microphone)
Date and time of recording
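The recording parameters and the naming scheme above can be checked automatically before training. The following sketch uses Python's standard `wave` module; the function names are assumptions made for this example and are not part of the project's own tools.

```python
import os
import wave


def recording_format(path):
    """Return (sample_rate_hz, bits_per_sample, channels) of a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def matches_project_format(path):
    """True if the file is 16 kHz, 16 bits per sample, mono."""
    return recording_format(path) == (16000, 16, 1)


def parse_ids(path):
    """Split a name such as '01_01.wav' into (speaker_id, sentence_id)."""
    name = os.path.splitext(os.path.basename(path))[0]
    speaker_id, sentence_id = name.split("_")
    return speaker_id, sentence_id


print(parse_ids("01_01.wav"))
```

Running `matches_project_format` over every file in the audio folder would flag recordings made with the wrong sampling rate or in stereo.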

2.1.3 Dictionary file

The dictionary file is the list of words obtained from our corpus file, together with the pronunciation of each word, such as AA P N AA K E. For this work we need software that produces the pronunciation file from the list of words. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. For our project we therefore developed a program, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E
AA P N AA K E
AA P N I
AA M I
I CCHA U K
...

Note:
All phonemes are in capital letters, such as AA P N I.
The file format is (.dic).
The file encoding is UTF-8 without BOM.
A word cannot be repeated.
A blank line is required at the end of the file (i.e. an extra line).
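The rules in the note above can be checked with a small script. The sketch below is an assumption about how such a check could look, not part of the D2P program; it interprets "a blank line at the end" as the text ending with a newline, and the sample entries pairing Bengali words with phonemes are illustrative.

```python
def check_dictionary(text):
    """Return a list of problems found in the text of a .dic file."""
    problems = []
    if text.startswith("\ufeff"):
        problems.append("file starts with a BOM")
    if not text.endswith("\n"):
        problems.append("missing trailing blank line")
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        word, *phones = line.split()          # first field is the word itself
        if word in seen:
            problems.append(f"line {number}: repeated word {word}")
        seen.add(word)
        if not phones:
            problems.append(f"line {number}: no pronunciation")
        if any(p != p.upper() for p in phones):
            problems.append(f"line {number}: phoneme not in capital letters")
    return problems


good = "আপনি AA P N I\nআমি AA M I\n"          # illustrative entries
bad = "আমি AA m I\nআমি AA M I"                 # lowercase phone, repeat, no final newline
print(check_dictionary(good))
print(check_dictionary(bad))
```

The text should be read with `open(path, encoding="utf-8")` so that a leading BOM, if present, shows up as the first character.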

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, "AA P N I" contains four phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

...

Note:
All phones are in capital letters.
The file format is (.phone).
The file encoding is UTF-8 without BOM.
A phone cannot be repeated.
A blank line is required at the end of the file (i.e. an extra line).
The silence phoneme "SIL" is also included in the phone file.
All phonemes in the dic file are present in the phone file without repetition.
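The conversion from the dic file to the phone file can be sketched as follows. This is not the project's own program, only a minimal illustration of the rules listed above; the function name and the sample dictionary entries are assumptions.

```python
def phones_from_dictionary(dic_text):
    """Return the sorted, duplicate-free phone list for the text of a .dic file."""
    phones = {"SIL"}                      # the silence phone is always included
    for line in dic_text.splitlines():
        fields = line.split()
        phones.update(fields[1:])         # first field is the word, the rest are phones
    return sorted(phones)


dic_text = "আপনি AA P N I\nআমি AA M I\n"      # illustrative entries
phone_list = phones_from_dictionary(dic_text)
phone_file_text = "\n".join(phone_list) + "\n"  # one phone per line, newline at the end
print(phone_list)
```

Using a set guarantees that every phoneme of the dictionary appears exactly once in the output.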


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this, a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line; the newer version also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Only the LM file is used here, because the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is (.lm). Some contents of the language model file for our project are:

-2.0719  -0.2626
-2.0719  -0.2861
-2.0719  -0.2973
-1.7709  -0.2936
-2.0719  -0.2626
-1.7709 এ -0.2861
-2.0719  -0.2626
-2.0719 ও -0.2973
...

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for use with Sphinx4. We use a Linux environment to convert the language model file from lm format to DMP format. We used the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project it is

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP


2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file, the dialogue of the speaker is noted exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that represents the utterance. This file is known as the transcription file. There are two transcription files: one is used to train the system and the other is used for testing. The files are named after the project; here the project name is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
...

Note:
The file format is (.transcription).
The file encoding is UTF-8 without BOM.
A sentence cannot be repeated.
A blank line is required at the end of the file (i.e. an extra line).

2.1.8 Fileids file

The fileids files contain the names of all audio files without the wav or sph extension. There are two fileids files, one for training and one for testing. The training file for our project is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
...

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"
</s> symbolizes "end of speech"
<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: sphinxbase and sphinxtrain. We obtained them from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As already mentioned, setting up the trainer requires two software packages, sphinxbase and sphinxtrain. After downloading them, we decompressed them into a folder in Linux; it can be any folder. We did it in the Linux root folder. After decompressing the packages, we can install them with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

We have now created the system environment for training. Next we create a project folder in which sphinxtrain will create the trained files, that is, the acoustic model. First we go to the root directory where the installed sphinxbase and sphinxtrain folders are placed. Here we created a folder named "sbsbsr". After creating the folder, we open a terminal and change to the sbsbsr folder. Then we create a project task for sphinxtrain with the following command in the terminal:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Now it is time to copy the files that we created in the data preparation part. We copy the dic, filler, phone, transcription, fileids, lm and lm.DMP files into the "etc" folder and our collected audio into the "wav" folder. We then need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. We changed the following parameters in sphinx_train.cfg:

Parameter                  Value Before        Value After
$CFG_WAVFILE_EXTENSION     sph                 wav
$CFG_WAVFILE_TYPE          nist, mswav, raw    raw
$CFG_FEATURE               2s_c_d_dd           1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES   8                   1
$CFG_STATESPERHMM          6                   3
$CFG_N_TIED_STATES         100                 100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes, the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio data into mfc (feature) files. To do this we again opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. For this task we stay in the project directory and execute the following command in the terminal:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. After downloading it, we go to the root directory where we installed sphinxbase and sphinxtrain and extract the downloaded pocketsphinx file there. After extracting it, we install the software with the following commands in the Linux terminal:

./configure
make

After installing the software, we go into our project folder and execute a command from the terminal to create the folder structure for testing. The command for this task is

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because pocketsphinx takes audio files as input for recognition. We also copy the test fileids and transcription files into the "etc" folder. To decode, that is, to test our model on audio with pocketsphinx, we first need the feature files of the audio files. We can create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding (testing) in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of testing can be found in the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. We need two pieces of software to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the following files from our previous project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
dic
lm.DMP
filler

We create a config.xml file in the Eclipse project to pass the configuration to the recognizer and to specify where the required model files are placed. We can create this config.xml with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project, we run it and can test it with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built with Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room and a common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment     Data set  Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error    Accuracy
Experiment 01  Trained   5         4     1       1025         975      98      95.12%           9.56%    90.44%
Experiment 02  Trained   3         3     0       615          591      39      96.10%           6.34%    93.66%
Experiment 03  Trained   3         3     0       615          601      22      97.72%           3.58%    96.42%
Experiment 04  Trained   5         2     3       1025         938      184     91.51%           17.95%   82.05%
Experiment 05  Trained   3         0     3       615          545      125     88.62%           20.33%   79.67%

Table 3.2.1: Experimental details with Results for Pocket Sphinx
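The figures reported for Pocket Sphinx are consistent with the following reading of the three measures, which is inferred from the numbers and not stated explicitly in the text: percent correct is the share of correct words, error is the share of errors, and accuracy is 100 minus the error. The function name below is an assumption made for the example.

```python
def score(total_words, correct, errors):
    """Return (percent_correct, error_rate, accuracy) as percentages."""
    percent_correct = 100.0 * correct / total_words
    error_rate = 100.0 * errors / total_words
    accuracy = 100.0 - error_rate
    return percent_correct, error_rate, accuracy


# Experiment 01 of the Pocket Sphinx results: 1025 words, 975 correct, 98 errors.
print([round(value, 2) for value in score(1025, 975, 98)])
```

Under this reading, correct words and errors need not add up to the total (975 + 98 exceeds 1025), which would be the case if inserted words are counted as errors.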

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment     User type  Speakers  Environment        Speaker type  Input device  Words  Correct  Errors  Percent correct  Error  Accuracy
Experiment 01  Trained    2         Closed room        Male          Microphone    156    154      2       98%              2%     98%
Experiment 02  Untrained  3         Lab room           Male          Microphone    130    120      10      90%              10%    90%
Experiment 03  Untrained  3         University campus  Male          Microphone    140    122      18      82%              18%    82%
Experiment 04  Untrained  2         University campus  Female        Microphone    108    95       13      87%              13%    87%
Experiment 05  Trained    3         University floor   Female        Microphone    126    115      11      89%              11%    89%
Experiment 06  Trained    2         Closed room        Male          Microphone    128    127      1       99%              1%     99%

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment     Data set  Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error    Accuracy
Experiment 01  Trained   3         3     0       210          194      16      85.71%           15.71%   85.71%
Experiment 02  Trained   3         2     1       210          188      22      77.14%           22%      77.14%
Experiment 03  Trained   3         2     1       210          182      28      72.38%           28%      72.38%
Experiment 04  Trained   3         2     1       210          194      16      84.44%           16%      84.44%
Experiment 05  Trained   3         1     2       210          184      26      74.76%           26%      74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. The main objective of this application is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a file called dic. This file contains all training words and their pronunciations. We wrote these pronunciations by hand several times while training the acoustic model experimentally. Over time we began to think about an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various thesis papers [1] [9] [10] to make this database. However, various sources define phonemes in different ways, and phonemes are not defined for every letter. For this reason we defined some phonemes ourselves for some consonants and conjunct letters. After making this database, we built the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2: Dictionary files with phonetic translation
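The idea of a letter-to-phoneme database can be sketched as a simple lookup table. The table below is hypothetical and contains only a handful of letters, chosen so that the output resembles dictionary entries such as AA P N I; it is not the database built for this project, and it ignores conjunct letters and inherent vowels, which a real translator has to handle.

```python
# Hypothetical letter-to-phoneme table; only a few Bengali letters are shown.
LETTER_TO_PHONE = {
    "আ": "AA",
    "প": "P",
    "ন": "N",
    "ম": "M",
    "ি": "I",
}


def to_phonemes(word, table=LETTER_TO_PHONE):
    """Translate a word letter by letter into a space-separated phoneme string."""
    phones = []
    for letter in word:
        if letter not in table:
            raise KeyError(f"no phoneme defined for {letter!r}")
        phones.append(table[letter])
    return " ".join(phones)


print(to_phonemes("আপনি"))
print(to_phonemes("আমি"))
```

Raising an error for an undefined letter makes gaps in the table visible, which matters because not every letter has an agreed phoneme.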

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program, we had to create both files by hand. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now the software creates both files automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
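The generation of the two files can be sketched as below. This is a simplified illustration, not the project's code: the class and method names are hypothetical, the folder name in `main` is invented, and the `<s> sentence </s> (id)` line layout is an assumption based on the transcript creator listed in the appendix. Sentences are reused cyclically, on the assumption that every speaker reads the same corpus in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrainingFilesSketch {

    // Strips a trailing ".wav" so the entry can serve as the utterance id.
    static String utteranceId(String fileName) {
        return fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - ".wav".length())
                : fileName;
    }

    // One fileids line per audio file: folder path plus the name without extension.
    static List<String> fileidsLines(String folder, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (String name : wavNames) {
            lines.add(folder + "/" + utteranceId(name));
        }
        return lines;
    }

    // One transcript line per audio file; sentences repeat cyclically
    // when there are more recordings than corpus sentences.
    static List<String> transcriptLines(List<String> sentences, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size()).trim();
            lines.add("<s> " + sentence + " </s> (" + utteranceId(wavNames.get(i)) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> wavs = Arrays.asList("s1.wav", "s2.wav", "s3.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        fileidsLines("speaker_02", wavs).forEach(System.out::println);
        transcriptLines(corpus, wavs).forEach(System.out::println);
    }
}
```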

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click, and right click.
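One way to connect recognized text to desktop actions is a lookup table from command phrase to action. The sketch below is an assumption about how such a dispatcher could be organized, not the project's design: the class name and the English command phrases are hypothetical placeholders for the Bengali commands, and each action only records a label where a real application would call `java.awt.Robot`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class CommandDispatchSketch {

    private final Map<String, Runnable> actions = new HashMap<>();
    final List<String> log = new ArrayList<>();

    CommandDispatchSketch() {
        // Hypothetical command phrases; the real actions would drive java.awt.Robot.
        actions.put("left click", () -> log.add("LEFT_CLICK"));
        actions.put("right click", () -> log.add("RIGHT_CLICK"));
        actions.put("open my computer", () -> log.add("OPEN_MY_COMPUTER"));
    }

    // Runs the action for a recognized phrase; returns false for unknown phrases.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim().toLowerCase(Locale.ROOT));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        System.out.println(dispatcher.dispatch("left click"));
        System.out.println(dispatcher.log);
    }
}
```

Returning false for an unknown phrase lets the caller ignore misrecognized speech instead of triggering a wrong action.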

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in several specific areas. Because of time constraints, the system was built on a small amount of data. We selected the domain of university admission information for new students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, but it sometimes fails when the speaker speaks slowly or mispronounces words. We wrote a program to generate the pronunciation of a Bengali word automatically, but it does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciations, we can expect good output from this software once an accurate phoneme set is available.
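The report does not say where the encoding problem arises. One common cause, assumed here purely for illustration, is reading or writing Bengali text with the platform default character set instead of UTF-8. The sketch below shows file access with the character set stated explicitly; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Utf8CorpusSketch {

    // Writes the lines as UTF-8, independent of the platform default charset.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }

    // Reads the file as UTF-8 and trims each line.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line.trim());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("corpus", ".txt");
        // A short Bengali word (letters A, MA and vowel sign I) as test data.
        writeLines(tmp, Arrays.asList("\u0986\u09AE\u09BF"));
        System.out.println(readLines(tmp).size());
        Files.delete(tmp);
    }
}
```

The database connection in the project listing already requests UTF-8, so under this assumption the file readers and writers would be the places to check first.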

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the data size to create a complete model, and we plan to make recognition more accurate and to enlarge the vocabulary. We have also developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. At present, training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate. With this trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate the system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to build a voice response application, in which the user asks the software a question and the software returns the predefined answer to that question. A good recognition application needs a large amount of audio, so we want to develop a website for collecting audio from people. The audio collected through this website will be used to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary stage. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. However, if we can meet the requirements of a good recognizer and train our system more accurately, the results should be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Ho Chi Minh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of the short time available we recognized only 50 of them.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        // Path separators and dots were lost in the printed listing;
        // the paths in this class are inferred, with "/" as the separator.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int i = 0, j = 0, k = 0;
        while (dirTreeLevel1.size() > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            while (dirTreeLevel2.size() > j) {
                String path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // k is not reset: dirTreeLevel3 accumulates every audio file found so far
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            transcriptCreator.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Lists one directory level: sub-folder names go to level 1 or 2, file names to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // not a directory, or it could not be read
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (Level == 3 && listOfFiles[numofL_I].isFile()) {
                // only files in the innermost folders are audio files
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        int[] indexes = new int[2];
        int i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        return UserDir.substring(indexes[0] + 1, indexes[1]);
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "s10", "s2" by their numeric part. All names are assumed
    // to share the letters of the first entry; leading zeros are not preserved.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int[] sortint = new int[unsortList.size()];
        int i = 0;
        while (unsortList.size() > i) {
            // keep only the digits of each name
            String str = unsortList.get(i).replaceAll("[^\\d]", "");
            sortint[i] = Integer.valueOf(str);
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
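The SortArrayList listing rebuilds each name from a shared prefix and the sorted numbers, so it depends on every folder having the same prefix and no leading zeros. An alternative, shown here only as a sketch and not as the project's method, sorts the original names with a comparator on their numeric part. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NumericSuffixSortSketch {

    // Extracts the digits of a name as a number; names without digits sort first.
    static int numberIn(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a sorted copy; the original names are kept unchanged.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSortSketch::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s01")));
    }
}
```

Because the names themselves are never rewritten, the sorted entries still match the folders on disk.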

…………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file name in data.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // start again from the first sentence when the corpus is exhausted
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per entry.
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        try {
            // Separators and dots were lost in the printed listing; this path is inferred.
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output .dic file.
    public void createFile(String finalData) {
        try {
            // Path inferred, as above.
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one "word pronunciation" line per input word
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and the connection, reporting any failure.
    private static void closeAll(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeAll(stmt, con);
        }
    }

    // Returns true when the letter is a consonant, i.e. its table id lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeAll(stmt, con);
        }
        return isBBorno;
    }
} // class end

…………………………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal was lost in the printed listing; the hasanta sign (U+09CD),
    // which joins the letters of a conjunct, is inferred here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        cpi = 0;
        k = 0;
        // record the positions before, at, and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // record the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // join consecutive conjunct positions into one search string
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                } // conjuct adding end
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the phoneme of every search string in turn
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                if (rs.next()) {
                    String pro = rs.getString("pro");
                    phoneticTrans = phoneticTrans + pro + " ";
                }
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
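The conjunct handling in the listing tracks character positions in fixed-size arrays. A simpler way to obtain the same kind of lookup units is sketched below, as an illustration only and not as the project's algorithm. It assumes that conjuncts are encoded in Unicode as letters joined by the hasanta sign (U+09CD), and it groups a letter with every following "hasanta + letter" pair. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitSketch {

    static final char HASANTA = '\u09CD'; // Bengali sign that joins the letters of a conjunct

    // Splits a word into lookup units: single characters, or whole conjunct clusters.
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Keep absorbing "hasanta + next letter" pairs into the same unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // BA, vowel sign I, SHA, hasanta, BA: the last three characters form one conjunct.
        System.out.println(split("\u09AC\u09BF\u09B6\u09CD\u09AC").size());
    }
}
```

Each resulting unit can then be looked up in the phoneme table as a whole, so no position bookkeeping is needed.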

…………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // The dots in the file name were lost in the printed listing; their placement is inferred.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand(); // called in the original, but not defined in this listing

        // loop the recognition until the programm exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        // The Bengali sample sentences were lost in the printed listing.
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        // The dot in the file name was lost in the printed listing; its placement is inferred.
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

// Page Up
public void up() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_PAGE_UP);
    robot.keyRelease(KeyEvent.VK_PAGE_UP);
}

// Page Down
public void down() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_PAGE_DOWN);
    robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
}

// Ctrl+Page Up
public void previousPage() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_PAGE_UP);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_PAGE_UP);
}

// Ctrl+Page Down
public void nextPage() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_PAGE_DOWN);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
}

// Ctrl+N
public void openNewFile() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_N);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_N);
}

// Ctrl+O
public void openHere() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_O);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_O);
}

// Alt+F4
public void close() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_F4);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_F4);
}

// Windows key
public void startMenu() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
}

// F5
public void refresh() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_F5);
    robot.keyRelease(KeyEvent.VK_F5);
}

// F1
public void help() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_F1);
    robot.keyRelease(KeyEvent.VK_F1);
}

// Windows+D
public void showDesktop() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyPress(KeyEvent.VK_D);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_D);
}

// Windows+E
public void openMyComputer() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyPress(KeyEvent.VK_E);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_E);
}

// "sokrio" (activate): Enter key
public void enter() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ENTER);
    robot.keyRelease(KeyEvent.VK_ENTER);
}

// "porer window" (next window) | "ager window" (previous window): Alt+Tab
public void altTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_TAB);
}

// "porer tab" (next tab) | "ager tab" (previous tab): Ctrl+Tab
public void ctlTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_TAB);
}

// Launch Notepad as a separate process.
public void openNotepad() throws AWTException, IOException {
    ProcessBuilder proc = new ProcessBuilder("notepad.exe");
    Process p = proc.start();
}

// The methods below open a URL through the Windows URL protocol handler.
public void openBrowser() throws AWTException, IOException {
    String theUrl = "http://www.google.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openFacebook() throws AWTException, IOException {
    String theUrl = "http://www.facebook.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openYahoo() throws AWTException, IOException {
    String theUrl = "http://www.yahoo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openTechtunes() throws AWTException, IOException {
    String theUrl = "http://www.techtunes.com.bd";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openProthomAlo() throws AWTException, IOException {
    String theUrl = "http://www.prothom-alo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}
}

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        // Use the configuration file given on the command line, if any.
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();  // no-argument call has no matching method, so it stays commented out

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n ");
    }

    // Maps a recognized sentence to a desktop action.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


List of Figures

Fig. No.   Name of Figure                                          Page No.
1.3.7      Overview of Speech Recognition System                   17
1.4.10     Applying Hidden Markov Model on Speech Recognition      20
1.5        Overview of the Full System Model                       21
2.1.2      Audio File Recording Format                             24
2.2.5.1    Testing with Pocket Sphinx                              31
2.2.5.2    Testing with Sphinx4                                    32
4.1.2      Dictionary files with phonetic translation              41
4.1.3.1    Fileids files with phonetic translation                 42
4.1.4.2    Transcription File                                      42

List of Charts

Fig. No.   Name of Chart                                           Page No.
3.2.2      Experiment Results with Pocket Sphinx                   35
3.3.1.2    Experimental Details with Results for Sphinx 4 Live     37
3.3.2.2    Experimental Details with Results for Sphinx 4 Audio    39

List of Tables

Table No.  Name of Table                                           Page No.
1.2        History of Speech Recognition                           15
2.2.3      Configuration of Sphinx-train.cfg                       29-30
3.2.1      Experimental details with Results for Pocket Sphinx     34
3.3.1.1    Test results with Sphinx4, Input Type: Microphone       36
3.3.2.1    Test results with Sphinx4, Input Type: Audio            38
7.1        Speaker Profiles                                        48
7.2        Unicode to IPA Chart                                    49-63
7.3        Corpus About University Admission Information           64-70

List of Abbreviations & Symbols

ASR      Automatic Speech Recognition
BSD      Berkeley Software Distribution
CMU      Carnegie Mellon University
HMM      Hidden Markov Model
IPA      International Phonetic Alphabet
PCA      Principal Component Analysis
ASCII    American Standard Code for Information Interchange
MERL     Mitsubishi Electric Research Labs
CRBLP    Center for Research on Bangla Language Processing
D2P      Dictionary to Pronunciation
SBSBSR   Sanjoy Bappy Shimul Bengali Speech Recognition
IDE      Integrated Development Environment
ABI      Allied Business Intelligence

ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology and then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another, but also to make the other person understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that identifies human speech with the help of the CMU Sphinx speech recognition API.

Literature Survey

Today speech technology plays an important role in many applications, and it has moved from research to commercial use. Many human-machine interfaces have been built and deployed, for example in telephone food ordering, airport information, ticketing, and restaurant reservation systems. For this reason we selected this field for our project. Moreover, most major languages already have a speech recognition system, but our mother tongue, Bangla, has no proper one; this is the main reason for selecting this topic. In the early era most research was done using Artificial Neural Networks (ANN), but because we use an HMM-based technique, some HMM-based and related research is mentioned below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report over one week and acquired a lot of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the report so clearly; it is very helpful for new students who want to work in this field [9].

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned a great deal about the types of speech recognition, the approaches to speech recognition, etc. [1].

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla, and Mumit Khan, BRAC): We studied the past works, and to the best of our knowledge this work is the first reported attempt to recognize Bangla speech using the HMM technique. From this publication we took most of our guidance on the steps needed to build the speech recognition system in our report. From it we learned how to improve the quality of the input audio signal through noise elimination and an end-point detection algorithm, how the features of a sound are extracted, and which parameters are stored in the feature files. We also learned the algorithm for creating HMM models [8].

Bengali segmented automated speech recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman, and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib, and Mumit Khan): From these works we learned the techniques used to recognize letters, vowels, and consonants. In particular, we identified the basic steps towards a recognizer and the common steps needed to build a fully functioning one [7].

Chapter 1

INTRODUCTION

INTRODUCTION
HISTORY OF SPEECH RECOGNITION
TYPES OF SPEECH RECOGNITION
TERMS AND CONCEPTS

1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. The term is broad: such a system can recognize almost anybody's speech. It is also known as computer speech recognition, which means that the computer understands a voice and performs the required task. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as "mouse up". We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying "Music". For this reason many scientists and researchers are working on speech recognition.

Most major languages of the world have speech recognizers of their own, but our mother tongue, Bengali, does not yet have a mature one. A small amount of research has been carried out on Bengali speech recognition, but without great results. Implementing a continuous speech recognizer for Bengali is our main goal throughout the project work. However, developing a full-blown continuous speech recognizer is a huge task for a short span of time. We therefore chose a domain-based continuous speech recognizer that covers a conversation about the university admission process. Throughout the work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open source software and has high accuracy. Many high-quality and widely used commercial packages are available for this work, but they are costly, whereas Sphinx4 is distributed under a Berkeley Software Distribution (BSD) style license.

The ultimate goal of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics, or accent [9].

1.2 History of Speech Recognition

AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, but researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that used discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

Speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency, and reducing operating costs. Furthermore, according to figures from Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, more and more companies can be expected to move toward speech recognition, and the industry may take its place as a leader in the technology sector [1].

Year   Event
1936   AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970   The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971   DARPA established.
1982   Dragon Systems was founded.
1984   SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995   Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997   Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998   Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000   Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003   ScanSoft ships Dragon NaturallySpeaking 7 Medical; lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are as follows [1].

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). "Isolated utterance" might be a better name for this class. Simply put, isolated words are single words such as "me", "you", "go", etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterance systems) are similar to isolated word systems but allow separate utterances to be run together with a minimal pause between them, such as "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker-dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.

1.3.7 Overview of Speech Recognition System

Fig 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word", or "I'd like to run Microsoft Word" are all examples of possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is always listening for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
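The start and end rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the detector used by Sphinx: the class name `UtteranceDetector`, the per-frame energy input, the fixed `threshold`, and the `minSilenceFrames` parameter are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a sequence of per-frame energies into utterances.
class UtteranceDetector {

    /**
     * Returns {startFrame, endFrameExclusive} for each detected utterance.
     * A frame counts as speech when its energy exceeds the threshold; an
     * utterance ends once minSilenceFrames silent frames follow in a row.
     */
    static List<int[]> detect(double[] frameEnergy, double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1;     // first frame of the current utterance, -1 while in silence
        int silentRun = 0;  // consecutive silent frames seen inside an utterance
        for (int i = 0; i < frameEnergy.length; i++) {
            boolean speech = frameEnergy[i] > threshold;
            if (start < 0) {
                if (speech) {          // lack of silence: an utterance begins
                    start = i;
                    silentRun = 0;
                }
            } else if (speech) {
                silentRun = 0;         // a short pause inside the utterance is ignored
            } else if (++silentRun >= minSilenceFrames) {
                // enough silence: the utterance ended where the silence started
                utterances.add(new int[] {start, i - silentRun + 1});
                start = -1;
            }
        }
        if (start >= 0) {              // input ended while an utterance was open
            utterances.add(new int[] {start, frameEnergy.length - silentRun});
        }
        return utterances;
    }

    public static void main(String[] args) {
        double[] energy = {0, 0, 5, 6, 0, 7, 0, 0, 0, 4};
        for (int[] u : detect(energy, 1.0, 2)) {
            System.out.println("utterance frames " + u[0] + " to " + (u[1] - 1));
        }
    }
}
```

A single silent frame between frames 3 and 5 does not end the first utterance, because `minSilenceFrames` is 2 in this run; a real engine would tune both parameters to the recording conditions.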

1.4.2 Pronunciation

You have heard the word pronunciation in connection with learning any language. What is pronunciation, and what are some of the fundamental aspects of this important part of learning a language? In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech that go beyond the individual sounds and make the language unique: phrasing, stress, intonation, timing, and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures, and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language, it would be easy to avoid speaking in public, but that is not the best choice because you do not want to experience social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent, or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech recognition (SR) system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; an entry can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes", and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no", and the NO action was executed? Should that be considered a successful dialog? The answer to that question is yes, because the desired result was achieved.
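One common way to put a number on accuracy is the word error rate (WER): the word-level edit distance between the reference sentence and the recognizer output, divided by the number of reference words. The sketch below is an illustration only; the text above does not say which measure this project uses, and the class and method names are hypothetical.

```java
// Hypothetical sketch: word error rate via word-level edit distance.
class WordAccuracy {

    /** Minimum number of word substitutions, insertions and deletions turning ref into hyp. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;  // delete every reference word
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;  // insert every hypothesis word
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitute = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** WER = edit distance / number of reference words (the reference must be non-empty). */
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        System.out.println(wordErrorRate("i eat rice", "i ate rice"));
    }
}
```

Word accuracy is then often reported as 1 minus the WER. Note that this strict word-level measure would count the "nope"/"no" dialog above as an error even though the desired action was executed, which is why task-level success is sometimes reported separately.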

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where, for each observation in the sequence, there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable ones. In the speech recognition process, after our voice is recorded it is divided into many frames, which we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model, we store the reference values of states, phonemes, and words in order to compare them with the observed data (voice). By applying HMM, we construct a statistical model for each phone whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on the state itself and on the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite involved, we do not go into much detail here; more can be found in Appendix A.
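Finding the state sequence with maximum probability is usually done with the Viterbi algorithm. The following is a minimal textbook sketch of it for a discrete HMM, written for illustration and not taken from Sphinx; the class name, the array layout (`start[s]`, `trans[from][to]`, `emit[state][symbol]`) and the toy numbers in `main` are assumptions made for the example.

```java
// Hypothetical sketch: Viterbi decoding for a small discrete HMM.
class Viterbi {

    /**
     * Returns the most likely hidden state sequence for the observations.
     * Log probabilities are used so long sequences do not underflow.
     * Assumes obs contains at least one observation.
     */
    static int[] mostLikelyStates(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int n = start.length;
        int t = obs.length;
        double[][] score = new double[t][n];  // best log probability of a path ending in state s at time k
        int[][] back = new int[t][n];         // predecessor state on that best path
        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int k = 1; k < t; k++) {
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < n; p++) {
                    double v = score[k - 1][p] + Math.log(trans[p][s]);
                    if (v > best) {
                        best = v;
                        arg = p;
                    }
                }
                score[k][s] = best + Math.log(emit[s][obs[k]]);
                back[k][s] = arg;
            }
        }
        // pick the best final state, then follow the back pointers
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[t - 1][s] > score[t - 1][last]) last = s;
        }
        int[] path = new int[t];
        path[t - 1] = last;
        for (int k = t - 1; k > 0; k--) {
            path[k - 1] = back[k][path[k]];
        }
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        int[] path = mostLikelyStates(start, trans, emit, new int[] {0, 1, 2});
        System.out.println(java.util.Arrays.toString(path));  // prints [0, 0, 1]
    }
}
```

In a recognizer the observations are acoustic feature vectors rather than three discrete symbols, and the emission probabilities come from the trained acoustic model, but the search over state sequences follows the same recurrence.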

Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words could follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
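To make the idea of "statistics of word sequences" concrete, here is a minimal bigram (2-gram) model sketch. It uses plain maximum-likelihood counts with no smoothing or back-off, unlike the ARPA models produced by real toolkits; the class name, the `<s>` and `</s>` boundary markers, and the romanized sample sentences are assumptions chosen for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an unsmoothed bigram language model built from a tiny corpus.
class BigramModel {
    private final Map<String, Integer> unigram = new HashMap<>();
    private final Map<String, Integer> bigram = new HashMap<>();

    /** Counts every word and every adjacent word pair of one corpus sentence. */
    void addSentence(String sentence) {
        if (sentence.trim().isEmpty()) return;
        String prev = "<s>";                       // sentence-start marker
        unigram.merge(prev, 1, Integer::sum);
        for (String word : sentence.trim().split("\\s+")) {
            bigram.merge(prev + " " + word, 1, Integer::sum);
            unigram.merge(word, 1, Integer::sum);
            prev = word;
        }
        bigram.merge(prev + " </s>", 1, Integer::sum);  // sentence-end marker
    }

    /** Maximum-likelihood estimate of P(next | prev); 0 when prev was never seen. */
    double probability(String prev, String next) {
        int count = unigram.getOrDefault(prev, 0);
        if (count == 0) return 0.0;
        return (double) bigram.getOrDefault(prev + " " + next, 0) / count;
    }

    public static void main(String[] args) {
        BigramModel lm = new BigramModel();
        lm.addSentence("ami bhat khai");
        lm.addSentence("ami jabo");
        System.out.println(lm.probability("ami", "bhat"));
    }
}
```

A word pair with probability 0 is exactly the kind of candidate that the decoder can strip from the search; real models smooth these zeros so that unseen but valid sequences are penalized rather than forbidden.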

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation
Setting up the System Environment

2.1 Data Preparation

We have to prepare several files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

Corpus
Audio files
Dictionary file
Phone file
Language Model file (.lm format)
Language Model file (.DMP format)
Transcription file
Fileids file
Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model; put simply, it is the collection of sentences that we want our machine to recognize. For our project we collected a set of important sentences from our domain. Some sentences from our project are as follows:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio for it in (.wav) or (.sph) format. During the recording sessions, the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit rate (bits per sample): 16
• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format
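The three recording parameters map directly onto Java's standard `javax.sound.sampled.AudioFormat` class. The sketch below only describes the format; it does not record anything. Signed samples and little-endian byte order are assumptions (they are the usual convention for 16-bit PCM in WAV files), and the class name `RecordingFormat` is hypothetical.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical sketch: the recording parameters expressed as a Java AudioFormat.
class RecordingFormat {

    /** 16 kHz, 16 bits per sample, mono; signed little-endian PCM is assumed. */
    static AudioFormat create() {
        float sampleRate = 16000f;   // 16 kHz
        int bitsPerSample = 16;      // 2^16 = 65536 possible sample values
        int channels = 1;            // mono
        return new AudioFormat(sampleRate, bitsPerSample, channels, true, false);
    }

    /** Bytes of audio data produced per second of recording. */
    static int bytesPerSecond(AudioFormat format) {
        return (int) (format.getFrameRate() * format.getFrameSize());
    }

    public static void main(String[] args) {
        AudioFormat format = create();
        System.out.println(format);
        System.out.println(bytesPerSecond(format) + " bytes per second");
    }
}
```

At these settings one second of speech takes 16000 frames of 2 bytes each, which is useful when estimating the disk space needed for a corpus recorded by several speakers.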

For this work a 16 kHz sample rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample divides the amplitude range into 65536 possible values. After recording, the audio was split into one file per sentence manually using recording software; for our project we used the WavePad sound editor. Each sound file is stored in wav format and named using the speaker id and the sentence id. For example, an audio file in our project is

01_01.wav, which stands for

Speaker Id: 01, Sentence Id: 01
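The naming convention can be checked and decoded mechanically. The following sketch assumes that both ids are digit strings separated by an underscore and followed by the `.wav` extension, as in the example above; the class name `RecordingName` and the exact pattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: splits a recording file name into speaker id and sentence id.
class RecordingName {
    // Assumed convention: <speaker id>_<sentence id>.wav, both ids made of digits.
    private static final Pattern NAME = Pattern.compile("(\\d+)_(\\d+)\\.wav");

    final String speakerId;
    final String sentenceId;

    private RecordingName(String speakerId, String sentenceId) {
        this.speakerId = speakerId;
        this.sentenceId = sentenceId;
    }

    /** Parses names such as "01_01.wav"; rejects anything that breaks the convention. */
    static RecordingName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return new RecordingName(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        RecordingName name = parse("01_01.wav");
        System.out.println("Speaker Id: " + name.speakerId + ", Sentence Id: " + name.sentenceId);
    }
}
```

Keeping the ids as strings preserves the leading zeros, so the parsed values can be used directly when writing the fileids and transcription files.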

When we collected audio from a speaker, we saved the personal information of that speaker:

Name, Age, Gender, Audio collection environment

Some other information was also noted down:

Environmental condition of the recording (for example, class room condition, number of students present, sources of noise such as a fan or a generator's sound, etc.)
Technical details of the device (PC, microphone)
Date and time of recording

2.1.3 Dictionary file

The dictionary file is simply the list of words taken from our corpus file, together with the pronunciation of each word, such as "AA P NA KE". For this work we need software that turns the word list into a pronunciation file. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. We therefore developed our own software, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E
AA P N AA K E
AA P N I
AA M I
I CCHA U K
…

Note:

All phonemes are in capital letters, such as AA P N I
File format is (.dic)
File encoding is UTF-8 without BOM
A word cannot be repeated
A blank line is required at the end of the file (i.e. an extra line)
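Two of the rules in the note (no repeated words, phonemes in capital letters) are easy to check automatically before training. The sketch below is a hypothetical helper, not part of the project's D2P tool; it assumes each dictionary line has the form "word PHONE PHONE ...", with the word as the first whitespace-separated field.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: checks dictionary lines against two rules from the note above.
class DictionaryChecker {

    /** Returns one message per problem found; an empty list means the lines passed. */
    static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> seenWords = new HashSet<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;          // the trailing blank line is allowed
            String[] fields = line.trim().split("\\s+");  // assumed layout: word, then phonemes
            if (!seenWords.add(fields[0])) {
                problems.add("repeated word: " + fields[0]);
            }
            for (int i = 1; i < fields.length; i++) {
                if (!fields[i].equals(fields[i].toUpperCase(Locale.ROOT))) {
                    problems.add("phoneme not in capitals: " + fields[i]);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("apni AA P N I");
        lines.add("apni AA p N I");
        lines.add("");
        for (String problem : check(lines)) {
            System.out.println(problem);
        }
    }
}
```

The romanized word "apni" only stands in for a Bangla entry; since the file is UTF-8, the same check works unchanged on words written in Bangla script.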

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the dic file as input and gives the phone file as output.
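The project's own generator is not reproduced in this section, so the following is only a sketch of how such a program could work. It assumes each dictionary line has the form "word PHONE PHONE ...", adds the silence phone SIL that the phone file also contains, and returns the phones sorted and without duplicates; the class name `PhoneListBuilder` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: derives the phone list from the lines of a dictionary file.
class PhoneListBuilder {

    /** Collects every phone used in the dictionary, sorted, each listed once, plus SIL. */
    static List<String> build(List<String> dictionaryLines) {
        TreeSet<String> phones = new TreeSet<>();         // sorted and duplicate-free
        phones.add("SIL");                                // silence phone
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");  // assumed layout: word, then phones
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i]);
            }
        }
        return new ArrayList<>(phones);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apni AA P N I", "ami AA M I", "");
        for (String phone : build(lines)) {
            System.out.println(phone);                    // one phone per line
        }
    }
}
```

Writing the returned list to a file, one entry per line with a final blank line, gives output in the layout described for the phone file.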

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

hellip Note

All phones are in capital letter File format is phone File encoding is utf_8 without BOM

Word can not be repeated

A blank line is required in the end of file (ie an extra line)

Silence phoneme ldquoSILrdquo also included in phone file

All phoneme in dic file are present in phone file without repetition
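
The program that derives the phone file is not reproduced in this section. A minimal sketch of the idea is given here, assuming that each dictionary line has the form "WORD PH1 PH2 …" and that the phones are written one per line in sorted order. The class and method names are hypothetical and are not taken from the project code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper: derives the phone list from dictionary lines.
class PhoneListSketch {

    // Assumption: every line is "WORD PH1 PH2 ..."; the first token
    // (the word itself) is skipped and the remaining tokens are phones.
    static List<String> collectPhones(List<String> dictionaryLines) {
        TreeSet<String> phones = new TreeSet<String>(); // sorted, no duplicates
        for (String line : dictionaryLines) {
            String[] tokens = line.trim().split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                phones.add(tokens[i].toUpperCase(Locale.ROOT));
            }
        }
        phones.add("SIL"); // the silence phone is always part of the phone file
        return new ArrayList<String>(phones);
    }

    public static void main(String[] args) {
        // Placeholder words; a real dictionary holds Bengali words here.
        List<String> dic = Arrays.asList("WORD1 AA P N I", "WORD2 AA M I");
        for (String phone : collectPhones(dic)) {
            System.out.println(phone); // one phone per line
        }
    }
}
```

A sorted set is used so that the "no repetition" rule of the phone file holds by construction, whatever the order of the dictionary entries.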


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text. Its format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to be able to handle. The corpus needs to be an ASCII text file with one sentence per line; the newer version of the tool also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, as the pronunciation dictionary is built as stated above. The tool is best for small domains. For our project the file name is sbsbsr.lm and the file format is (.lm). Some contents of the language model file are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
…

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for use in Sphinx4. We used a Linux environment to convert the language model file from .lm format to DMP format, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file the dialogue of each speaker is noted down exactly the way it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID, which identifies the utterance. This file is known as the transcription file, and there are two types of it: one is used to train the system and the other to test it. The files are named after the project; here the project name is "sbsbsr", so the train file name is sbsbsr_train.transcription and the test file name is sbsbsr_test.transcription. Some contents of the transcription file for our project, with the Bengali sentence placed between the tags, are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
…

Note:

The file format is (.transcription).
The file encoding is UTF-8 without BOM.
A sentence cannot be repeated.
A blank line is required at the end of the file (i.e. an extra line).
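
The layout of one transcription line can be sketched as follows. This is only an illustration: the class and method names are hypothetical, the sentence in the example is a placeholder, and the utterance ID is assumed to be the last path component of the matching fileids entry.

```java
// Hypothetical helper, not part of the project code.
class TranscriptLineSketch {

    // Assumption: the utterance ID is the last component of a fileids entry.
    static String utteranceId(String fileId) {
        return fileId.substring(fileId.lastIndexOf('/') + 1);
    }

    // Wraps the sentence in <s> ... </s> and appends the utterance ID in parentheses.
    static String transcriptLine(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + utteranceId(fileId) + ")";
    }

    public static void main(String[] args) {
        // Placeholder sentence; a real transcript holds the Bengali sentence here.
        System.out.println(transcriptLine("FIRST SENTENCE", "sbsbsr_train/sanjoy/sanjoy1/01_01"));
    }
}
```

Building each line from the fileids entry keeps the two files in step, since the trainer matches transcript lines to audio files through these IDs.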

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and the other for testing. The name of the training file for our project is sbsbsr_train.fileids and the name of the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
…

2.1.9 Filler file

The filler file contains the user's definition of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech".
</s> symbolizes "end of speech".
<sil> symbolizes "silence in speech".

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages, sphinxbase and sphinxtrain, which we collected from the CMU Sphinx web site. Before installing these two packages we first need to install some dependencies in the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As already noted, setting up the trainer requires two software packages, sphinxbase and sphinxtrain. After downloading them we decompressed them into a folder in Linux; it can be any folder, and we used the Linux root folder. After decompressing the packages we can install them with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the trainer installed, we created the system environment for training. We created a project folder in which sphinxtrain will create the trained files, i.e. the acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed. Here we created a folder named "sbsbsr". After creating the folder we open a terminal and go to the created folder sbsbsr. Then we created a project task for sphinxtrain with the following command in the terminal:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Now it is time to copy the files created in the data preparation part. We copy the dictionary, filler, phone, transcription, fileids and language model (.lm and DMP) files into the "etc" folder and our collected audio into the "wav" folder. Next we need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. We changed the parameters listed below:

Parameter                  Value Before       Value After
$CFG_WAVFILE_EXTENSION     sph                wav
$CFG_WAVFILE_TYPE          nist, mswav, raw   raw
$CFG_FEATURE               2s_c_d_dd          1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES   8                  1
$CFG_STATESPERHMM          6                  3
$CFG_N_TIED_STATES         100                100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio into feature (.mfc) files. For this we again opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. We then go to the root directory where we installed sphinxbase and sphinxtrain and extract the downloaded Pocketsphinx archive there. After extracting, we install the software with the following commands in the Linux terminal:

./configure
make

After installing the software we go into our project folder and execute a command from the terminal to create the folder structure for testing:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some testing audio in the "wav" directory, because Pocketsphinx recognizes speech from audio files given as input. We also copy the test fileids and transcription files into the "etc" folder. To decode, i.e. to test our model on audio with Pocketsphinx, we first need feature files for the audio files. We can create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of testing can be found in the "result" folder under the project folder. For our model trained on fifty sentences the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE in Windows we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the following files from our previous (training) project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
the .dic file
the language model DMP file
the .filler file

We create a config.xml file in the Eclipse project to tell the recognizer its configuration and where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project we run it and can test it with live voice input from a microphone. For our model trained on ten sentences the result is:

Figure 2.2.5.2: Testing with Sphinx4

We can build various applications with the help of Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and the common room. We completed our testing using audio inputs from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment  Data set  Speakers (M/F)  Total words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)
01          Trained   5 (4/1)         1025         975      98      95.12                9.56       90.44
02          Trained   3 (3/0)         615          591      39      96.10                6.34       93.66
03          Trained   3 (3/0)         615          601      22      97.72                3.58       96.42
04          Trained   5 (2/3)         1025         938      184     91.51                17.95      82.05
05          Trained   3 (0/3)         615          545      125     88.62                20.33      79.67

Table 3.2.1: Experimental details with Results for Pocket Sphinx


Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
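
The percentages reported for Pocket Sphinx are consistent with the following relations, which are inferred from the reported numbers rather than stated in the text: percent correct = correct / total words, error = errors / total words, and accuracy = 100 − error. The error count presumably includes inserted words, which would explain why correct and errors together can exceed the total. A minimal sketch with hypothetical class and method names:

```java
import java.util.Locale;

// Hypothetical helper: recomputes the scores reported for Pocket Sphinx.
class WordScoreSketch {

    // Share of reference words that were recognized correctly, in percent.
    static double percentCorrect(int correct, int totalWords) {
        return 100.0 * correct / totalWords;
    }

    // Error count relative to the number of reference words, in percent.
    static double errorRate(int errors, int totalWords) {
        return 100.0 * errors / totalWords;
    }

    // Accuracy is taken here as 100 minus the error rate (an inferred relation).
    static double accuracy(int errors, int totalWords) {
        return 100.0 - errorRate(errors, totalWords);
    }

    public static void main(String[] args) {
        // Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.println(String.format(Locale.ROOT, "Percent correct = %.2f", percentCorrect(975, 1025)));
        System.out.println(String.format(Locale.ROOT, "Error = %.2f", errorRate(98, 1025)));
        System.out.println(String.format(Locale.ROOT, "Accuracy = %.2f", accuracy(98, 1025)));
    }
}
```

With the figures of Experiment 01 these relations give 95.12, 9.56 and 90.44, matching the reported values.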


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Input device: Microphone (all experiments)

Experiment  User type  Speakers  Environment        Speaker type  Words  Correct  Errors  Percent correct (%)  Errors (%)  Accuracy (%)
01          Trained    2         Closed Room        Male          156    154      2       98                   2           98
02          Untrained  3         Lab Room           Male          130    120      10      90                   10          90
03          Untrained  3         University Campus  Male          140    122      18      82                   18          82
04          Untrained  2         University Campus  Female        108    95       13      87                   13          87
05          Trained    3         University floor   Female        126    115      11      89                   11          89
06          Trained    2         Closed Room        Male          128    127      1       99                   1           99

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live


Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%


3.3.2 Input Type: Audio

Experiment  Data set  Speakers (M/F)  Total words  Correct  Errors  Percent correct (%)  Error  Accuracy (%)
01          Trained   3 (3/0)         210          194      16      85.71                15.71  85.71
02          Trained   3 (2/1)         210          188      22      77.14                22     77.14
03          Trained   3 (2/1)         210          182      28      72.38                28     72.38
04          Trained   3 (2/1)         210          194      16      84.44                16     84.44
05          Trained   3 (1/2)         210          184      26      74.76                26     74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio


Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%


Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

For training the acoustic model we need a file called .dic, in which all training words and their pronunciations are placed. We had to write these pronunciations by hand several times while training acoustic models experimentally. Over time we thought about an automatic pronunciation (phonetic translation) maker, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various thesis papers [1] [9] [10] to make this database. However, various sources define phonemes in different ways, and not every letter has a defined phoneme. That is why we defined some phonemes ourselves for some consonants and conjunct letters. After making this database we built the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2: Dictionary files with phonetic translation

4.1.3 Training File Creator

For training the acoustic model we also need the fileids and transcription files. These two files contain the paths of the training audio files and their corresponding sentences. Before creating this program we had to make these two files manually, but with our software we can make these two big files automatically within moments. For example, if we have 8000 audio files, then we have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create these two files automatically if we give the software the root folder name of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.


Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system has been built on a small amount of data. We selected a domain about university admission information for newcomer students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time. As a result we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced some problems in creating the dictionary file, because the Bengali phoneme list is not defined precisely and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers of phonemes. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails to recognize them accurately because of slow speech or pronunciation problems. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciation, we can expect good output from this software once we have an accurate set of phonemes.

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase our data size to create a complete model, and we plan to increase its capability to recognize speech more accurately and to enlarge its vocabulary. We have also developed software for making the dictionary, fileids and transcription files. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer the user will be able to train any sentence with the corresponding audio. We also want to integrate this system into various document applications so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which a user asks the software a question and the software gives the predefined answer to that question. Making a good recognition application requires a lot of audio, so we want to develop a website for collecting audio from people. Using the audio collected through this website we will enrich our recognition application.


Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages. But work on our mother tongue, Bengali, is still at a beginner level in this field. So we tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the various tools we used and the process of speech recognition. Our developed tools are still at a preliminary level. Making a good and complete recognition application requires many improvements, such as a big training database, audio from many speakers and low noise. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9


Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus has 185 sentences in total, but because of the short time available we have used only 50 of them for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Folder names found directly under the root (level 1), folder names
    // under the current level-1 folder (level 2) and audio file names (level 3).
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Forward slashes are accepted by java.io.File on Windows as well.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing across folders;
                // k marks the first file that has not been written yet.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Literal replace: a regex ".wav" would also match e.g. "xwav".
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Lists one folder: sub-folders are collected at level 1 and 2,
    // plain files are collected at level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (Level == 3 && listOfFiles[numofL_I].isFile()) {
                // Only level-3 files are audio files; this skips stray files
                // such as a fileids file left in the root by an earlier run.
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator,
    // e.g. "sbs_asr_train" for "C:/trainnign_file/sbs_asr_train/".
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(path, true))) {
            out.write(data + "\n");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………

ARRAY

…………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of every folder name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Folder name without its number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
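
The sortList listing above orders folder names by the number embedded in each name and then rebuilds the names from the prefix of the first entry. As a minimal alternative sketch (not the authors' method), the same numeric ordering can be obtained with a comparator that leaves the original names untouched. The class name NumericNameSort and the helper numberIn are hypothetical, and the sketch assumes every name contains at most one run of digits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NumericNameSort {

    // Extracts the digits of a name as an int; names without digits count as 0.
    static int numberIn(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the embedded number; the input is not modified.
    static List<String> sort(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericNameSort::numberIn));
        return copy;
    }

    public static void main(String[] args) {
        List<String> folders = new ArrayList<>();
        folders.add("speaker10");
        folders.add("speaker2");
        folders.add("speaker1");
        System.out.println(sort(folders));
    }
}
```

Because the comparator only reads the number, folders with different prefixes keep their own names, which the prefix-rebuilding approach would not preserve.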

…………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            // Read the corpus line by line into memory
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // Wrap around to the first corpus line when the corpus is exhausted
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved =
                    inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
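
To make the output of CreateTransCriptFile concrete, the following is a small sketch of how one transcript entry is assembled and how recordings are paired with corpus sentences in a cycle. It is an illustration under assumptions, not the authors' code: the class TranscriptLines and its methods are hypothetical names, the sentence markers are written as `<S>` and `</S>` as they appear in the listing above, and the file names are invented placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TranscriptLines {

    // Builds one entry of the form "<S> sentence </S> (fileId)",
    // where fileId is the recording name without its ".wav" suffix.
    static String format(String sentence, String wavFileName) {
        String fileId = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    // Pairs each recording with a corpus sentence, starting again from the
    // first sentence whenever the corpus runs out.
    static List<String> build(List<String> corpus, List<String> wavFiles) {
        List<String> lines = new ArrayList<>();
        if (corpus.isEmpty()) {
            return lines;
        }
        for (int i = 0; i < wavFiles.size(); i++) {
            lines.add(format(corpus.get(i % corpus.size()), wavFiles.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        List<String> wavs = Arrays.asList("rec_1.wav", "rec_2.wav", "rec_3.wav");
        for (String line : build(corpus, wavs)) {
            System.out.println(line);
        }
    }
}
```

The modulo index plays the role of the `lineStart == lineEnd` reset in the listing: the third recording is paired with the first sentence again.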

…………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            // One word per line; trimmed before storing
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

PRODB

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                    rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                // Rows 13 to 48 of banglatab hold the consonants
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

…………………………………………………………………………………

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    char ConjuctsIdentifyCharacter = '\u09CD'; // Bengali hasanta, which joins consonants into a conjunct
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions around every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                        ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the pronunciation of every letter or conjunct in turn
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

…………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

SBSBSR TRANSCRIBER

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

DESKTOP COMMAND APPLICATION

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
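
The giveCommand method above compares the recognized text against one command phrase at a time. As a hedged sketch of how such a mapping could be organized once many phrases are involved, the example below keeps a lookup table from phrase to action. This is an illustration only, not the authors' implementation: CommandDispatcher, register, and dispatch are hypothetical names, the English phrases are placeholders for the Bengali commands, and the actions print a message where the real program would call a CommandActivator method.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatcher {

    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> actions = new HashMap<>();

    // Registers an action under a phrase; surrounding whitespace is ignored.
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action for the recognized text and reports whether one was found.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder phrases; the real program would register Bengali commands.
        dispatcher.register("right click", () -> System.out.println("rightClick() would run here"));
        dispatcher.register("open notepad", () -> System.out.println("openNotepad() would run here"));
        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

With a table like this, adding a new voice command means adding one register call rather than another if branch, and an unrecognized phrase is reported through the boolean result instead of being silently ignored.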


List of Tables

Table No.   Name of Table
1.2         History of Speech Recognition
2.2.3       Configuration of Sphinx-train.cfg
3.2.1       Experimental Details with Results for Pocket Sphinx
3.3.1.1     Test Results with Sphinx4, Input Type: Microphone
3.3.2.1     Test Results with Sphinx4, Input Type: Audio
7.1         Speaker Profiles
7.2         Unicode to IPA Chart
7.3         Corpus: About University Admission Information

List of Abbreviations & Symbols

ASR      Automatic Speech Recognition
BSD      Berkeley Software Distribution
CMU      Carnegie Mellon University
HMM      Hidden Markov Model
IPA      International Phonetic Alphabet
PCA      Principal Component Analysis
ASCII    American Standard Code for Information Interchange
MERL     Mitsubishi Electric Research Labs
CRBLP    Center for Research on Bangla Language Processing
D2P      Dictionary to Pronunciation
SBSBSR   Sanjoy Bappy Shimul Bengali Speech Recognition
IDE      Integrated Development Environment
ABI      Allied Business Intelligence


ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not just to convey words from one person to another but also to make the listener understand the depth of the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that identifies human speech with the help of the CMU Sphinx speech recognition API.


Literature Survey

Today, speech technology plays an important role in many applications and has moved from research to commercial use. Many human-machine interfaces have been built and deployed, for example in telephone food-ordering systems, airport information systems, ticketing systems, and restaurant reservation systems. For this reason we selected this field for our project. Moreover, most major languages have a speech recognition system, but our mother tongue, Bangla, has no proper one; this is the main reason for selecting this topic. Much of the early research was done using Artificial Neural Networks (ANN), but because we use an HMM-based technique, the HMM-based and related research is summarized below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report over one week and acquired a lot of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the thesis report so clearly; it is very helpful for new students who want to work in this field [9].

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned about the types of speech recognition, the approaches to speech recognition, and related topics [1].

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla, and Mumit Khan, BRAC): We studied the past work and, to the best of our knowledge, this is the first reported attempt to recognize Bangla speech using the HMM technique. We took most of our guidance on the steps for building a speech recognition system from this publication. From it we learned how to improve the quality of the input audio signal through a noise elimination process and an end detection algorithm, how the features of a sound are extracted and which parameters are stored in the feature files, and the algorithm for creating HMM models [8].

Bengali segmented automated speech recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman, and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib, and Mumit Khan): From these works we learned the techniques used to recognize letters, vowels, and consonants. In particular, we identified the basic steps toward a recognizer and the common steps for building a fully functioning one [7].


Chapter 1

INTRODUCTION

INTRODUCTION
HISTORY OF SPEECH RECOGNITION
TYPES OF SPEECH RECOGNITION
TERMS AND CONCEPTS


1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. It is a broad term: it is also known as computer speech recognition, meaning that the computer understands a voice and performs the required task, and ideally it can recognize almost anybody's speech. Put simply, speech recognition is the process of converting spoken input to text, so it is sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as "mouse up", control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying "Music". For this reason many scientists and researchers are working on speech recognition.

Most of the world's major languages have speech recognizers of their own, but our mother tongue, Bengali, does not yet have a mature one. Some small research efforts on Bengali speech recognition have been carried out, but without significant results. Implementing a continuous speech recognizer for Bengali is the main goal of this project. Developing a full-blown continuous speech recognizer, however, is a huge task for a short span of time. We therefore selected a domain-based continuous speech recognizer that covers conversations about the university admission process.

Throughout the work we learned about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open-source software, distributed under a Berkeley Software Distribution (BSD) style license, and offers high accuracy. Many high-quality and widely used commercial packages are available for this work, but they are costly. The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics, or accent [9].

1.2 History of Speech Recognition

Although AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus toward a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that used discrete speech, that is, verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

According to figures provided by the industry, speech recognition is used in a wide range of applications, meeting the needs of consumers and businesses by simplifying customer interaction, increasing efficiency, and reducing operating costs. Furthermore, Allied Business Intelligence (ABI) projected that the increased popularity of speech recognition would push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, more and more companies can be expected to move toward speech recognition, and the industry is expected to take its place as a leader in the technology sector [1].

Year   Event
1936   AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970   The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971   DARPA established.
1982   Dragon Systems was founded.
1984   SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995   Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997   Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998   Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000   Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003   ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to
the types of utterances they are able to recognize. These classes are
the following [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to
have quiet (lack of an audio signal) on both sides of the sample window. They accept a single
word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they
require the speaker to wait between utterances (usually doing processing during the pauses).
"Isolated utterance" might be a better name for this class. Simply put, isolated words are single
words such as "Me", "You", "Go", etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected
utterance systems) are similar to isolated word systems, but allow separate utterances to be run together
with a minimal pause between them, for example "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost
naturally while the computer determines the content (basically, it is computer
dictation). Recognizers with continuous speech capabilities are some of the most
difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, it can be thought of as speech that is natural
sounding and not rehearsed. An ASR system with spontaneous speech ability should be able
to handle a variety of natural speech features, such as words being run together, "ums" and
"ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent  2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the
system to his/her voice are known as speaker-dependent systems. Most desktop dictation
systems, such as IBM ViaVoice, are speaker-dependent. Because they
operate on very large vocabularies, dictation systems perform much better when the
speaker has spent the time to train the system to his/her voice. Speaker-dependent software
is commonly used for dictation. It works by learning the unique characteristics of a single
person's voice, in a way similar to voice recognition. New users must first train the software by
speaking to it, so the computer can analyze how the person talks. This often means users have to
read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to
train the system are known as speaker-independent systems. Speech recognition in the
VoiceXML world must be speaker-independent. Speaker-independent software is more commonly
found in telephone applications. It is designed to recognize anyone's voice, so no training is
involved. This means it is the only real option for applications such as interactive voice response
systems, where businesses cannot ask callers to read pages of text before using the system. The
downside is that speaker-independent software is generally less accurate than speaker-dependent
software.

1.3.7 Overview of Speech Recognition System

Fig 1.3.7 Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech
recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For
example, "Word", "Microsoft Word", or "I'd like to run Microsoft Word" are all examples of
possible utterances. More precisely, an utterance is any stream of speech between two
periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech
recognition is almost as important as what is spoken, because silence delineates the start and end
of an utterance. The speech recognition engine is listening for speech input. When the engine
detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled.
Similarly, when the engine detects a certain amount of silence following the audio, the end of the
utterance occurs.
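The start/end rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the endpointer used by Sphinx: the per-frame energies, the `threshold`, and the `min_silence_frames` parameter are all hypothetical names introduced here.

```python
def find_utterances(energies, threshold, min_silence_frames):
    """Return (start, end) frame index pairs for each utterance, end exclusive.

    A frame counts as speech when its energy exceeds `threshold`. An utterance
    ends once `min_silence_frames` consecutive silent frames follow speech.
    """
    utterances = []
    start = None        # frame index where the current utterance began
    silent_run = 0      # consecutive silent frames seen inside an utterance
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i               # lack of silence: utterance begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # enough trailing silence: close the utterance before it
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:               # audio ended while still in an utterance
        utterances.append((start, len(energies) - silent_run))
    return utterances


frames = [0, 0, 5, 6, 0, 7, 0, 0, 0, 4, 4, 0]
print(find_utterances(frames, threshold=1, min_silence_frames=2))
```

A single silent frame inside speech (frame 4 above) does not split the utterance, because it is shorter than the required amount of silence.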

1.4.2 Pronunciation

The word "pronunciation" usually comes up in the context of learning a language. What is
pronunciation, and what are some of its fundamental aspects?
In any language, pronunciation pertains to the sounds that are produced to make
meaning. There are aspects of speech beyond the individual sounds that make a
language unique: phrasing, stress, intonation, timing, and rhythm. Your voice is then projected to
communicate what you want to say. Add to that cultural nuances, gestures, and local expressions,
and the way you speak immediately tells the people around you something about yourself. When
you are just learning a new language, it would be easy to avoid speaking in public, but that is not
the best choice because it leads to social isolation. It does not seem fair, but
people can be judged by the way they speak and can be seen as uneducated, incompetent, or lacking
knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying
to communicate. The speech recognition engine uses all sorts of data, statistical models, and
algorithms to convert spoken input into text. One piece of information that the speech
recognition engine uses to process a word is its pronunciation, which represents what the speech
engine thinks a word should sound like. Words can have multiple pronunciations associated with
them. For example, the word "the" has at least two pronunciations in US English:
"thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The
engine compares the current utterance against the words and phrases in the active
grammars. If the user says something that is not in the grammar, the speech engine will not be
able to understand it correctly. For this reason, speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR
system. Generally, smaller vocabularies are easier for a computer to recognize, while larger
vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single
word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as 1 or 2
recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred
thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability,
it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained
by having the speaker repeat standard or common phrases and adjusting its comparison
algorithms to match that particular speaker. Training a recognizer usually improves its accuracy.
Training can also be used by speakers who have difficulty speaking or pronouncing
certain words. As long as the speaker can consistently repeat an utterance, ASR systems with
training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it
recognizes utterances. The performance of a speech recognition system is measurable, and perhaps
the most widely used measurement is accuracy. It is typically a quantitative measurement and
can be calculated in several ways. This measurement is useful in validating application design.
For example, if the user said "yes", the engine returned "yes", and the YES action was
executed, it is clear that the desired result was achieved. But what happens if the engine
returns text that does not exactly match the utterance? For example, what if the user
said "nope", the engine returned "no", and the NO action was executed? Should that be
considered a successful dialog? The answer is yes, because the desired result was
achieved.

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their
pronunciation, sometimes including syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for
representing phones is the International Phonetic Alphabet, or IPA. English uses a
transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines in which, for each unit of the
observation sequence, there is a state transition, and for each state there is an output symbol
emission. Transitions among the states are governed by a set of probabilities called transition
probabilities. In a particular state, an outcome or observation can be generated according to the
associated probability distribution. Only the outcome, not the state, is visible to an external
observer; the states are therefore "hidden" to the outside, hence the name Hidden Markov
Model.

Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system
being modeled is assumed to be a Markov process with unknown parameters, and the challenge is
to determine the hidden parameters from the observable parameters. In the speech recognition
process, after the voice is recorded, it is divided into many frames that must be processed
in order to generate the sentence in text form. Each frame is represented as a state, a group of
states is represented as a phoneme, and a group of phonemes is represented as a word that we
need to recognize. In a database known as the linguist model, we store the reference values of states,
phonemes, and words in order to compare them with the observed data (voice). By applying HMMs, we
construct a statistical model for each phone, whose states are assigned specific probabilities in
comparison with the reference values. The probability of each state depends on itself and the previous
one. The goal of the speech recognition system is to find the sequence of states that has the
maximum probability. Because HMM theory is quite involved, we do not go into detail
here; more information can be found in Appendix A.
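Finding "the sequence of states that has the maximum probability" is commonly done with the Viterbi algorithm. The sketch below is a minimal illustration of that idea and is not taken from the Sphinx sources. The two states (named after the phones AA and P), the three observation symbols x, y, z, and every probability value are made-up assumptions chosen only so the arithmetic is easy to follow.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_state_path, probability) for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # choose the predecessor that maximizes the path probability into s
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)

    # backtrack from the most probable final state
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, probability


# Hypothetical toy model: all numbers are illustrative assumptions.
states = ["AA", "P"]
start_p = {"AA": 0.6, "P": 0.4}
trans_p = {"AA": {"AA": 0.7, "P": 0.3}, "P": {"AA": 0.4, "P": 0.6}}
emit_p = {
    "AA": {"x": 0.5, "y": 0.4, "z": 0.1},
    "P": {"x": 0.1, "y": 0.3, "z": 0.6},
}
path, probability = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
print(path, probability)
```

A real acoustic model emits continuous feature vectors scored by Gaussian mixtures rather than discrete symbols, and works with log probabilities to avoid underflow; the dynamic-programming structure is the same.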

Fig 1.4.10 Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty taken when a sequence or
collection of words is seen. A language model is used to restrict the word search. It defines which
words could follow previously recognized words and helps to significantly restrict the matching
process by stripping out words that are not probable. The most common language models are
n-gram language models, which contain statistics of word sequences, and finite state language
models, which define speech sequences by a finite state automaton, sometimes with weights.
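As a rough illustration of what "statistics of word sequences" means, the sketch below estimates bigram probabilities by simple counting. It is an assumption-laden toy: the three English sentences are invented for the example, and no smoothing or back-off is applied, unlike the models produced by real toolkits.

```python
from collections import Counter


def train_bigram(sentences):
    """Return {(previous_word, word): P(word | previous_word)} by relative frequency."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(words[:-1])            # every word that has a successor
        bigram_counts.update(zip(words, words[1:]))  # adjacent word pairs
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Invented toy corpus, for illustration only.
corpus = ["I eat rice", "I eat fish", "you eat rice"]
model = train_bigram(corpus)
print(model[("eat", "rice")])   # probability that "rice" follows "eat"
```

A pair that never occurs in the corpus, such as ("rice", "eat"), is simply absent from the model, which is how the language model prunes improbable word sequences during search.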

1.5 Overview of the Full System

Figure 1.5 Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to prepare some important files that are required for training and also for testing.
We have already mentioned that our project is a domain-based recognition application.
Domain-based means a particular topic containing a small amount of data. We have selected fifty
sentences for our recognition application, and the files below are created based on these data. The
required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (.DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is simply a list of sentences used to train the language model. In other words, the
corpus is the collection of sentences that we want our machine to recognize. For our
project we have collected some important sentences according to our domain. Some
sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to collect the audio files of this corpus in (.wav) or
(.sph) format. During the recording sessions, the following parameters of the wave file were
maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit depth (bits per sample): 16

• Channel: mono (single channel)

Fig 2.1.2 Audio File Recording Format

For this work a 16 kHz sample rate has been chosen because it provides more accurate high
frequency information, and 16 bits per sample quantizes each sample into 65,536 (2^16)
possible values. After the recording, the audio was split into one file per sentence
manually using recording software. For our project we used the WavePad sound editor and saved each sound
file in wav format, where each wav file is named using the speaker ID and the sentence ID.
For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID: 01, Sentence ID: 01

When we collected the audio from a speaker, we saved the personal information of
this speaker, such as:

Name, age, gender, and the environment in which the audio was collected

Some other information was also noted down:

Environmental conditions of the recording (for example classroom conditions, number of
students present, and sources of noise such as fan or generator sound)

Technical details of the device (PC, microphone)

Date and time of the recording
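The recording parameters and the naming scheme above can be checked mechanically before training. The following is a minimal sketch using Python's standard `wave` module; the helper names `parse_name` and `matches_recording_format` are hypothetical and introduced only for this illustration.

```python
import os
import wave


def parse_name(path):
    """Split a file name such as '01_01.wav' into (speaker_id, sentence_id)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    speaker_id, sentence_id = stem.split("_")
    return speaker_id, sentence_id


def matches_recording_format(path, rate=16000, sample_bytes=2, channels=1):
    """True if the wav file is 16 kHz, 16 bits (2 bytes) per sample, mono."""
    with wave.open(path, "rb") as recording:
        return (
            recording.getframerate() == rate
            and recording.getsampwidth() == sample_bytes
            and recording.getnchannels() == channels
        )


print(parse_name("01_01.wav"))   # speaker ID and sentence ID taken from the name
print(2 ** 16)                   # number of amplitude levels with 16 bits per sample
```

Running `matches_recording_format` over every file in the recording folder would flag any recording saved with a different sample rate or in stereo.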

2.1.3 Dictionary file

The dictionary file is simply the list of words obtained from our corpus file, together with
the pronunciation of each word, such as AA P NA KE. For this work we need
software that converts the word list into a pronunciation file. A grapheme to
phoneme (G2P) tool has been developed by CRBLP, which gives pronunciations in the Unicode IPA
system, but we need ASCII format. As a result, for our project we developed a tool called D2P
(Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our
dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our
dictionary file is sbsbsr.dic, and some contents of the dictionary file for our project are:

AA CH E

AA P N AA K E

AA P N I

AA M I

I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (.dic)

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, "AA P N I" contains 4 phonemes. It is
a simple text file that tells the trainer which phonemes are part of the training set. The file has one
phone on each line, and no duplicates are allowed. This file can be generated using a small program
written for this project, which takes the dic file as input and gives the phone file as output.
For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is (.phone)

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

All phonemes in the dic file are present in the phone file without repetition
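The rules in the note above are simple enough to sketch. The function below is an assumed reconstruction of what such a dic-to-phone program might do, not the authors' actual tool; the romanized words `apni` and `ami` are placeholders standing in for the Bengali entries of the real dictionary.

```python
def phones_from_dictionary(dic_lines):
    """Collect the unique phones used in dictionary lines of the form 'WORD PH1 PH2 ...'."""
    phones = {"SIL"}                     # the silence phone is always included
    for line in dic_lines:
        fields = line.split()
        if len(fields) < 2:              # skip blank or malformed lines
            continue
        phones.update(fields[1:])        # first field is the word, the rest are phones
    return sorted(phones)


def render_phone_file(phones):
    """One phone per line, followed by the required extra blank line."""
    return "\n".join(phones) + "\n\n"


# Placeholder dictionary lines (hypothetical words, pronunciations from the text above).
example_dic = ["apni AA P N I", "ami AA M I", ""]
phone_list = phones_from_dictionary(example_dic)
print(render_phone_file(phone_list))
```

Using a set guarantees that no phone is repeated, and sorting gives a stable, alphabetical file like the excerpt shown above.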

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text, based on some training data.
The language model file is plain text. The format is the commonly used ARPA format, which is
standard in speech recognition research. It lists 1-, 2-, and 3-grams along with their likelihood
(the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used.
lmtool is a web-based tool that allows users to quickly compile the text-based components needed
for using an ASR decoder. To do this a corpus is needed, which in this case means a set
of sentences (or, more precisely, utterances) that the recognition system is expected to be able
to handle. The corpus needs to be in the form of an ASCII text file, although the newer
version also supports Unicode text files, with one sentence per line. Upload this file and click the
compile button. This gives a set of lexical (pronunciation dictionary) and language
modeling files. Here the only file used is the LM file, as the pronunciation dictionary is built as
described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and
the file format is (.lm). Some contents of the language model file for our project are:

-2.0719  -0.2626

-2.0719  -0.2861

-2.0719  -0.2973

-1.7709  -0.2936

-2.0719  -0.2626

-1.7709 এ -0.2861

-2.0719  -0.2626

-2.0719 ও -0.2973

…
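ARPA files store base-10 logarithms, so the first field of each line has to be exponentiated to recover a probability. The helper below is a small sketch for reading one unigram line; `parse_unigram` is a hypothetical name, and the input line is one of the excerpt lines above with its decimal points restored.

```python
def parse_unigram(line):
    """Parse 'log10_prob word [log10_backoff]' into (word, probability, backoff_weight)."""
    fields = line.split()
    log_prob = float(fields[0])
    word = fields[1]
    # the back-off field is optional in ARPA files; treat a missing one as log10(1) = 0
    log_backoff = float(fields[2]) if len(fields) > 2 else 0.0
    return word, 10 ** log_prob, 10 ** log_backoff


word, probability, backoff = parse_unigram("-1.7709 এ -0.2861")
print(word, round(probability, 4), round(backoff, 4))
```

For this line the probability comes out just under 0.017, that is, roughly one word in sixty of the unigram mass.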

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for Sphinx4. We used a Linux
environment to produce the DMP language model file from the .lm
language model file. We used the following command in the Linux terminal to obtain the language
model file in DMP format:

    sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the
language model file in DMP format. For our project it is:

    sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file, the
dialogue of each speaker is noted exactly as it was recorded, enclosed in
silence tags (starting tag <s>, ending tag </s>) and followed by the file ID, which identifies the
utterance. This file is known as the transcription file, and there are two
transcription files: one is used to train the system and the other to test it. The
files are named using the project name. Here the name of the project is "sbsbsr", so the training file name is
sbsbsr_train.transcription and the test file name is sbsbsr_test.transcription. Some contents of the
transcription file for our project are:

<s> </s> (01_01)

<s> </s> (01_02)

<s> </s> (01_03)

<s> </s> (01_04)

…

Note:

File format is (.transcription)

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files without the wav or sph extension. There are two
fileids files, one for training and the other for testing. The name of the training file for our project is
sbsbsr_train.fileids and the name of the testing file for our project is sbsbsr_test.fileids.

For example:

sbsbsr_trainsanjoysanjoy101_01

sbsbsr_ train sanjoysanjoy101_02

sbsbsr_ train sanjoysanjoy101_03

…
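Because the transcription and fileids files list the same utterances in the same order, both can be generated from one table. The sketch below assumes a list of (utterance ID, sentence) pairs and a directory prefix; the function name, the `speaker_dir` value, and the placeholder sentences are all hypothetical and do not reproduce the project's actual folder layout.

```python
def build_training_files(utterances, audio_dir):
    """Return (transcription_text, fileids_text) for (utterance_id, sentence) pairs."""
    transcription_lines = [
        f"<s> {sentence} </s> ({utt_id})" for utt_id, sentence in utterances
    ]
    fileids_lines = [f"{audio_dir}/{utt_id}" for utt_id, _ in utterances]
    # each file ends with the extra blank line that the notes above require
    transcription_text = "\n".join(transcription_lines) + "\n\n"
    fileids_text = "\n".join(fileids_lines) + "\n\n"
    return transcription_text, fileids_text


# Placeholder data: real entries would hold the Bengali sentences of the corpus.
utterances = [("01_01", "sentence one"), ("01_02", "sentence two")]
transcription, fileids = build_training_files(utterances, audio_dir="speaker_dir")
print(transcription)
print(fileids)
```

Generating both files from a single list keeps line N of the transcription aligned with line N of the fileids file, which the trainer relies on.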

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording
database. It is a dictionary in which non-speech sounds are mapped to corresponding
non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL

<sil> SIL

</s> SIL

Note that the words <s>, </s>, and <sil> are treated as special words and are required to be
present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from
CMU Sphinx we need two software packages: sphinxbase and sphinxtrain. We collected them from the
CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies
on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].
We installed these two dependencies with the following commands in the Linux terminal:

Perl

    sudo apt-get install perl

GCC

    sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two software packages, sphinxbase and
sphinxtrain. After downloading the software we decompressed it into a folder in Linux;
it can be any folder. We did it in the Linux root folder. After decompressing the packages, we can
install them with the following commands in the terminal [14]:

Sphinxbase

    cd sphinxbase
    sudo ./configure
    sudo make
    sudo make install

Sphinxtrain

    cd sphinxtrain
    sudo ./configure
    sudo make
    sudo make install

2.2.3 Project Folder Setup

Next we created the system environment for training. We created a project folder
where sphinxtrain will create the trained files, or acoustic model. First we need to enter the
root directory where our installed sphinxbase and sphinxtrain folders are placed. Here we
created a folder named "sbsbsr". After creating the folder, we need to open a
terminal and go to the created folder sbsbsr. Then we created a project task for
sphinxtrain with the following command in the terminal:

    ../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command from the terminal creates various folders in sbsbsr, such as "etc",
"wav", "model_parameters", etc.

Now it is time to copy the files that we created in the data preparation part. We have to
copy the dic, filler, phone, transcription, fileids, lm, and lm.DMP files into the "etc" folder and our collected audio into the
"wav" folder. Now we need to change some training parameters in the sphinx_train.cfg file, which is
created automatically in the "etc" folder when creating the project task. We changed the
parameters listed below in the sphinx_train.cfg file:

Parameter                    Value Before      Value After

$CFG_WAVFILE_EXTENSION       sph               wav

$CFG_WAVFILE_TYPE            nist/mswav/raw    raw

$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd

$CFG_FINAL_NUM_DENSITIES     8                 1

$CFG_STATESPERHMM            6                 3

$CFG_N_TIED_STATES           100               100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes we have finished the project folder setup.

2.2.4 Training the Acoustic Model

In the training process, we first have to convert our collected raw speech audio data into mfc files.
To do so, we opened the project directory in the Linux terminal and ran the feature extraction
command. The command we executed for this task is [14]:

    perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all wav files into mfc files in the "feat" directory under the project
folder "sbsbsr".

Now we are ready to execute the main training command in the Linux terminal, which creates the
acoustic model. For this task we stay in the project directory in the terminal and execute the
following command:

    perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The resulting model files are placed in the
"model_parameters" directory under the project folder
"sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4.
Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we have to download this tool from the CMU Sphinx site. After downloading it, we
go to the root directory where we installed sphinxbase and sphinxtrain. Here we extract the
downloaded Pocketsphinx file. After extracting, we install the software with the following
commands in the Linux terminal:

    ./configure
    make

After installing the software, we go into our project folder and execute a command from the
terminal to set up the folder structure. For this task the command is:

    ../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx recognizes from audio file
input. We also have to copy the test fileids and transcription files into the "etc" folder. To decode, or
test, our model on audio with Pocketsphinx, we first need feature files for the audio files. We
can make these with the following command in the Linux terminal:

    perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding, or testing, in the Linux terminal:

    perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of
our trained acoustic model. The result of testing can be found in the "result" folder under the project
folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library
to test our trained model on the Windows 7 operating system. We need two software packages to test our
model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE
on Windows, we have to download Sphinx4 from the CMU Sphinx site. After downloading Sphinx4,
we extract the zip file to any location on Windows. Then we have to create a new Java project in
Eclipse and make a Java file with the help of the demo application from CMU Sphinx. After that we
need to add the files listed below from our previous project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

dic

lm.DMP

filler

We create a config.xml file in the Eclipse project to give the recognizer its configuration and to say where
the required model files are placed. We can create this config.xml with the help of the config file in the
Sphinx4 demo application. We need to add four Java jar files (js.jar, jsapi.jar, sphinx4.jar, tags.jar)
from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After
building the project, we run it and can test with live voice input from a microphone. For
our model trained on ten sentences, the result is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built with the help of Sphinx4 using the Java language. We built
some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a
university lab room, and a common room. We completed our testing using audio input from six
test speakers. For the live testing we used a microphone in different environments [9]. We
completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All experiments use the trained data set.

Experiment  Speakers (Male/Female)  Total Words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)

01          5 (4/1)                 1025         975      98      95.12                9.56       90.44

02          3 (3/0)                 615          591      39      96.10                6.34       93.66

03          3 (3/0)                 615          601      22      97.72                3.58       96.42

04          5 (2/3)                 1025         938      184     91.51                17.95      82.05

05          3 (0/3)                 615          545      125     88.62                20.33      79.67

Table 3.2.1 Experimental details with Results for Pocket Sphinx
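The three percentages reported for each Pocket Sphinx experiment follow from the raw counts. The sketch below reproduces them under one assumption that the source does not state explicitly: that Accuracy is 100 minus the error percentage, with the error count including insertions (which would explain why Correct plus Errors can exceed Total Words). The function name `score` is hypothetical.

```python
def score(total_words, correct, errors):
    """Return (percent_correct, error_percent, accuracy), each rounded to 2 decimals."""
    percent_correct = 100.0 * correct / total_words
    error_percent = 100.0 * errors / total_words
    accuracy = 100.0 - error_percent      # assumed definition, see lead-in
    return round(percent_correct, 2), round(error_percent, 2), round(accuracy, 2)


# Counts from Experiment 01 of the Pocket Sphinx results.
print(score(1025, 975, 98))
```

Applying the same function to the other four Pocket Sphinx experiments gives the remaining reported values, which supports the assumed definition for this table.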

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment  User Type  Speakers  Environment        Speaker Type  Input Device  Words  Correct  Errors  Correct (%)  Errors (%)  Accuracy (%)

01          Trained    2         Closed Room        Male          Microphone    156    154      2       98           2           98

02          Untrained  3         Lab Room           Male          Microphone    130    120      10      90           10          90

03          Untrained  3         University Campus  Male          Microphone    140    122      18      82           18          82

04          Untrained  2         University Campus  Female        Microphone    108    95       13      87           13          87

05          Trained    3         University floor   Female        Microphone    126    115      11      89           11          89

06          Trained    2         Closed Room        Male          Microphone    128    127      1       99           1           99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All experiments use the trained data set.

Experiment  Speakers (Male/Female)  Total Words  Correct  Errors  Percent correct (%)  Error   Accuracy (%)

01          3 (3/0)                 210          194      16      85.71                15.71   85.71

02          3 (2/1)                 210          188      22      77.14                22      77.14

03          3 (2/1)                 210          182      28      72.38                28      72.38

04          3 (2/1)                 210          194      16      84.44                16      84.44

05          3 (1/2)                 210          184      26      74.76                26      74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Bengali Speech Recognition

39

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%
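The average above can be checked with a short calculation. The sketch below is illustrative only and is not part of the project code; it reads the five audio-input accuracy values as 85.71, 77.14, 72.38, 84.44 and 74.76 percent, and the class name `AverageAccuracy` is an assumed name. The mean of these values is 394.43 / 5 = 78.886, which the report gives as 78.88.

```java
class AverageAccuracy {

    /** Arithmetic mean of per-experiment accuracies, in percent. */
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Accuracy values of the five audio-input experiments.
        double[] audioAccuracies = {85.71, 77.14, 72.38, 84.44, 74.76};
        System.out.println("Average accuracy = " + mean(audioAccuracies));
    }
}
```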


Chapter 4

APPLICATION & DEVELOPING

Review of Developed Applications

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx-4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

For training the acoustic model we need a dictionary file (.dic). This file lists all training words together with their pronunciations. We wrote these pronunciations by hand several times while training the acoustic model experimentally. Over time we considered building an automatic pronunciation (phonetic translation) generator, and this software implements that idea. First we built a database that stores all phonemes and their corresponding letters. We drew on the IPA chart and several theses [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and a phoneme is not defined for every letter. We therefore defined the phonemes ourselves for some consonants and conjunct letters. With this database in place, we built the phonetic translator, which gives the phonetic translation of Bengali words using the database.

Fig 4.1.2 Dictionary files with phonetic translation
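The letter-to-phoneme lookup described in this section can be sketched as follows. This is a minimal illustration, not the project's implementation: it uses an in-memory map in place of the database, handles only five consonants whose codes are copied from the Unicode to IPA chart in the appendix, and ignores vowels and conjuncts. The class and method names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class PhoneticSketch {

    // A few consonant codes taken from the appendix chart (assumed subset).
    private static final Map<Character, String> CODES = new HashMap<>();
    static {
        CODES.put('\u0995', "K"); // ka
        CODES.put('\u09B2', "L"); // la
        CODES.put('\u09AE', "M"); // ma
        CODES.put('\u09AC', "B"); // ba
        CODES.put('\u09B8', "S"); // sa
    }

    /** Space-separated codes for the letters of a word; unknown letters are skipped. */
    static String toPhonemes(String word) {
        StringBuilder sb = new StringBuilder();
        for (char ch : word.toCharArray()) {
            String code = CODES.get(ch);
            if (code == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(code);
        }
        return sb.toString();
    }

    /** One dictionary line: the word followed by its phoneme codes. */
    static String dictionaryLine(String word) {
        return word + " " + toPhonemes(word);
    }
}
```

A full translator would also need rules for vowel signs, the inherent vowel between consonants, and conjunct letters, which this sketch leaves out.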

4.1.3 Training File Creator

For training the acoustic model we also need a fileids file and a transcript file. These two files contain the paths of the training audio files and their corresponding sentences. Before we created this program, we had to make these two files manually. With our software we can now make these two large files automatically within moments. For example, if we have 8000 audio files, we would otherwise have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically if we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
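The two file formats can be illustrated with a small sketch. It is a simplified stand-in for the project's training file creator, which appears in the code listing at the end of the report. The line pattern, a sentence between `<S>` and `</S>` followed by the audio name in parentheses, follows that listing. The class name, method names and sample paths are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class TrainingFilesSketch {

    /** fileids entry: the audio path relative to the root folder, without ".wav". */
    static String fileId(String relativeWavPath) {
        return relativeWavPath.endsWith(".wav")
                ? relativeWavPath.substring(0, relativeWavPath.length() - 4)
                : relativeWavPath;
    }

    /** transcript entry: sentence between markers, then the audio name in parentheses. */
    static String transcriptLine(String sentence, String wavName) {
        return "<S> " + sentence.trim() + " </S> (" + fileId(wavName) + ")";
    }

    /** Pairs each audio file with a corpus sentence, cycling through the corpus. */
    static List<String> transcript(List<String> wavNames, List<String> sentences) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("corpus is empty");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size());
            lines.add(transcriptLine(sentence, wavNames.get(i)));
        }
        return lines;
    }
}
```

Cycling through the corpus assumes that every speaker recorded the same sentences in the same order, which is how the project's listing pairs audio files with sentences.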

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click, right click, etc.
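One way to connect recognized text to desktop actions is a lookup table from the recognized command to an action. The sketch below is only an illustration under assumptions: the report does not list the actual Bengali command words, so the tests use placeholder strings, and the class, enum and method names are hypothetical. In the real application each action would call a method of the command activator shown in the code listing.

```java
import java.util.HashMap;
import java.util.Map;

class CommandDispatchSketch {

    /** Actions named after the tasks mentioned above; NONE means no match. */
    enum Action { OPEN_MY_COMPUTER, LEFT_CLICK, RIGHT_CLICK, NONE }

    private final Map<String, Action> commands = new HashMap<>();

    /** Registers the text the recognizer returns for a command. */
    void register(String spokenText, Action action) {
        commands.put(spokenText.trim(), action);
    }

    /** Maps a recognition result to an action, or NONE if it is not a command. */
    Action resolve(String recognizedText) {
        if (recognizedText == null) {
            return Action.NONE;
        }
        Action action = commands.get(recognizedText.trim());
        return action == null ? Action.NONE : action;
    }
}
```

Keeping the lookup separate from the code that drives the mouse and keyboard makes it possible to test the mapping without a display.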


Chapter 5

LIMITATION & FUTURE WORK

Limitation
Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time. As a result, we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or has pronunciation problems. We created a program that generates the pronunciation of a Bengali word automatically, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate phoneme set.
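The report does not say what caused the encoding problem. One common cause in Java programs that handle Bengali text is reading or writing files with the platform default character set. The sketch below shows, as an assumption about a possible remedy rather than a diagnosis, how a text file can be written and read with an explicit UTF-8 character set. All names in it are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

class Utf8Sketch {

    /** Writes the lines as UTF-8, independent of the platform default. */
    static void writeLines(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    /** Reads all lines as UTF-8. */
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    /** Writes one line to a temporary file and reads it back. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("corpus", ".txt");
            try {
                writeLines(tmp, Collections.singletonList(text));
                return readLines(tmp).get(0);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```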

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for making the dictionary file, the fileids file and the transcription file. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that a normal user can use. With this automatic trainer, a user will be able to train any sentence with the corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected through this website we will enrich our recognition application.


Chapter 6

CONCLUSION & REFERENCES

Conclusion
References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages. Our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. But if we can better meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles


Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ)    IPA
ক    K
খ    KH
গ    G
ঘ    GH
ঙ    NG
চ    C
ছ    CH
জ    J
ঝ    JH
ঞ    NIO
ট    T
ঠ    TH
ড    D
ঢ    DH
ণ    N
ত    TA
থ    TO
দ    DA
ধ    DO
ন    N
প    P
ফ    PH
ব    B
ভ    BH
ম    M
য    Z
র    R
ল    L
শ    SH
ষ    SH
স    S
হ    H
ড়    RA
ঢ়    RH
য়    Y
ৎ    T``
     NG
     ^

Bangla Phoneme    IPA
     AA
     I
     II
     U
     UU
     RI
     E
     OI
     O
     OU

Bangla Phoneme ( )    IPA
ব-    W
য-    Y
র-    R
ম-    M
      RR

Bangla Phoneme (নাম্বার)    IPA
শূন্য    0
এক      1
দুই      2
তিন     3
চার     4
পাঁচ     5
ছয়      6
সাত     7
আট      8
নয়      9


Bangla Phoneme (যুক্তবর্ণ)    IPA

KK, KT, KT, KTR, KW, KM, KY, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR, GN, GDH, NGM, CC, CCH, CCHW, CCHR, CNG,
CY, GN, GNY, GW, GM, GY, GR, GL, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR, GN, GDH, NGM, CC, CCH, CCHW, CCHR,
CNG, CY, JJ, JJW, JJH, GG, JW, JY, JR, NC, NCH, NJ, NJH, TT, TT, TTW, TTH, TN, TW, TM, TMY, TY, TR, THW, THY, THR,
DG, DGH, DD, DDW, DDH, DW, DV, DM, DY, DR, NM, NY, NS, PT, PT, PN, PP, PY, PR, PL, PS, FR, FL, BJ, BD, BDH,
BB, BY, BR, BL, LT, LD, LDH, LP, LB, LV, LM, LY, LL, SHC, SHCH, SHT, SHN, SHW, SHM, SHY, SHR, SHL, SHK, SHKR, SHT, SF,
SW, SM, SY, SR, SL, SKL, HN, HN, HW, HM, HY, HR, HL, HRRI, GHN, GHY, GHR, NK, NKY, NGKKH, NGKH, NGG, NGGY, NGGH, NGGHY, NGGHR,
TW, TM, TY, TR, DD, DY, DR, DHY, DHR, NT, NTH, ND, NDY, NDR, NDH, NN, NW, NM, NY, DHN, DHW, DHM, DHY, DHR, NT, NTH,
ND, NT, NTW, NTY, NTR, NTH, ND, NDY, NDW, NDR, NDH, NDHY, NDHR, NN, NW, VY, VR, VL, MTH, MN, MP, MPR, MF, MB, MV, MVR,
MM, MY, MR, ML, ZY, RRK, RRKY, LK, LG, SHTY, SHTR, SHTH, SHTHY, SHN, SHP, SHPR, SPHY, SHW, SHM, SK, SKR, ST, STR, SKH, ST, STW,
STY, STH, STHY, SN, SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we trained and recognized only 50 of them.

No    Sentence


1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information


CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Note: the path separators in the string literals were lost from this
// listing; "/" is assumed throughout.
public class FileidsCreator {

    // Folder and file names found at each level of the training directory tree.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // k is never reset: dirTreeLevel3 keeps growing across folders,
                // so k always points at the first file not yet written.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // replace() treats ".wav" literally; replaceAll() would read "." as a regex wildcard
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collects directory names (levels 1 and 2) or file names (level 3) found under path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………………

ARRAY

…………………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names that share one prefix and differ in their number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep the digits only
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Folder name without its number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
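The sorting step above rebuilds every name from the prefix of the first entry, so it only works when all folders share one prefix. As an alternative sketch, not the project's code, the names can be sorted directly by the number they contain with a comparator, which keeps each original name intact. The class name and the sample folder names in the test are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NumericSuffixSort {

    /** Extracts the digits of a name as a number; names without digits count as 0. */
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a copy of the names ordered by their embedded number. */
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return copy;
    }
}
```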

…………………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences of the corpus, one per line of the corpus file.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file name in data.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // Start again from the first sentence when the corpus is exhausted.
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> ("
                        + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per entry.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word, then its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is a consonant (ids 13 to 48 in banglatab).
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

…………………………………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is missing from the listing; the hasanta (U+09CD),
    // which joins consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record the positions that are not part of a conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // Look up the pronunciation of every search string in the database.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

…………………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        giveCommand(); // giveCommand() is not included in this listing
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

/** Maps each recognized command to a mouse, keyboard or process action. */
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // The Bengali string literals in this listing did not survive in this copy;
        // they appear below as empty or blank strings.
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        giveCommand("");

        String comString = "";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");

        // loop the recognition until the program exits
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    /** Runs the desktop action that matches the recognized text. */
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


ABSTRACT

This report presents an overview of Automatic Speech Recognition (ASR) for our mother tongue, Bangla. It begins with an introduction to speech recognition technology, then explains how such systems work and the level of accuracy that can be expected. The purpose of human speech is not only to convey words from one person to another but also to make the listener understand the meaning behind the spoken words. Speech recognition systems have made dramatic performance leaps in the recent past. The aim of this project is to develop software that recognizes human speech with the help of the CMU Sphinx speech recognition API.


Literature Survey

Today speech technology plays an important role in many applications and has moved from research to commercial use. Many human-machine interfaces have been built and deployed, for example in telephone food ordering, airport information, ticketing and restaurant reservation systems. For this reason we selected this field for our project. Moreover, most languages have a speech recognition system, but our mother tongue Bangla has no proper one; this is the main reason for choosing this topic. Most early research was done using Artificial Neural Networks (ANN), but since we use an HMM-based technique, the HMM-based and related research we studied is summarized below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August 2010): We studied this thesis report over one week and acquired a lot of knowledge about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the report so clearly; it is very helpful for new students who want to work in this field. [9]

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore, India): From this review we learned a great deal about the types of speech recognition, the approaches to speech recognition, and related topics. [1]

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC): We studied the past work and, to the best of our knowledge, this is the first reported attempt to recognize Bangla speech using the HMM technique. Most of the steps we followed to build our speech recognition system come from this publication. From it we learned how to improve the quality of the input audio signal through a noise elimination process and an end detection algorithm, how the features of a sound are extracted, which parameters are stored in the feature files, and the algorithm for creating HMM models. [8]

Bengali Segmented Automated Speech Recognition (Department of Computer Science and Engineering, BRAC University): From this thesis report we learned about vowel and consonant phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden Markov Model. [6]

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU); Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan): From these works we learned the techniques used to recognize letters, vowels and consonants. They gave us the basic steps towards a recognizer and the common steps needed to build a fully functioning one. [7]


Chapter 1

INTRODUCTION

INTRODUCTION

HISTORY OF SPEECH RECOGNITION

TYPES OF SPEECH RECOGNITION

TERMS AND CONCEPTS


1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. It is a broad term, also known as computer speech recognition, that covers understanding a voice by computer and performing any required task; in principle such a system can recognize almost anybody's speech. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as "mouse up". We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying "Music". For this reason many scientists and researchers are working on speech recognition.

Most languages of the world have speech recognizers of their own, but our mother tongue Bengali does not. A few small research projects on Bengali speech recognition have been carried out, but without significant results. Implementing a continuous speech recognizer for Bengali is the main goal of this project. Developing a full-blown continuous speech recognizer is, however, a huge task for a short span of time. We therefore chose a domain-based continuous speech recognizer, which covers a conversation on the university admission process.

Throughout the work we studied different tools and chose CMU Sphinx4 as the speech recognition API because it is open source software, distributed under a BSD-style license, and has high accuracy. Many high-quality, widely used software packages are available for this kind of work, but they are costly and require commercial licenses. The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent. [9]

1.2 History of Speech Recognition

Although AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. In the 1960s researchers therefore turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that used discrete speech, that is, verbal stimuli punctuated by small pauses. In the 1970s work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to them for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

Because it satisfies the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs, speech recognition is used in a wide range of applications. According to figures from Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $53 billion by 2008. Recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, more and more companies can be expected to move toward speech recognition, and the industry is likely to take its place as a leader in the technology sector. [1]

1936: AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.

1970: The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.

1971: DARPA established

1982: Dragon Systems was founded.

1984: SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.

1995: Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.

1997: Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.

1998: Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.

2000: Lernout & Hauspie acquired Dragon Systems for approximately $460 million.

2003: ScanSoft ships Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be divided into classes according to the types of utterances they are able to recognize. These classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). "Isolated utterance" might be a better name for this class. Put simply, isolated words are single words such as "me", "you" or "go".

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterance systems) are similar to isolated word systems but allow separate utterances to be run together with a minimal pause between them, for example "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent 2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his or her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his or her voice. Speaker-dependent software works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following basic terms and concepts are fundamental to speech recognition, and it is important to understand them well. [9][10]

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word" and "I'd like to run Microsoft Word" are all possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input, in other words a lack of silence, the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the utterance ends.
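The start and end rule described above can be sketched with a simple energy threshold. This is only an illustration under stated assumptions, not the endpointer used by Sphinx4: the class name UtteranceSegmenter, the per-frame energy values, the threshold and the minimum number of silent frames are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical energy-based utterance segmenter (not the Sphinx4 endpointer). */
final class UtteranceSegmenter {

    /**
     * Returns {startFrame, endFrameExclusive} pairs, one per detected utterance.
     * A frame counts as speech when its energy is above the threshold; an
     * utterance ends after minSilenceFrames consecutive silent frames.
     */
    static List<int[]> segment(double[] frameEnergy, double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1;     // first frame of the current utterance, -1 while in silence
        int silentRun = 0;  // consecutive silent frames seen inside an utterance
        for (int i = 0; i < frameEnergy.length; i++) {
            boolean speech = frameEnergy[i] > threshold;
            if (start < 0) {
                if (speech) {            // lack of silence: an utterance begins
                    start = i;
                    silentRun = 0;
                }
            } else if (speech) {
                silentRun = 0;           // a short pause inside the utterance is over
            } else if (++silentRun >= minSilenceFrames) {
                utterances.add(new int[] {start, i - silentRun + 1});
                start = -1;              // enough silence: the utterance has ended
            }
        }
        if (start >= 0) {                // the signal ended while still in an utterance
            utterances.add(new int[] {start, frameEnergy.length - silentRun});
        }
        return utterances;
    }

    public static void main(String[] args) {
        double[] energy = {0, 0, 5, 6, 0, 7, 0, 0, 0, 4, 0};
        for (int[] u : segment(energy, 1.0, 2)) {
            System.out.println("utterance frames " + u[0] + " to " + (u[1] - 1));
        }
    }
}
```

A single silent frame inside an utterance is treated as a pause rather than a boundary, which is why the minimum silence length is a separate parameter.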

1.4.2 Pronunciation

The word "pronunciation" is familiar from learning any language. In any language, pronunciation refers to the sounds that are produced to make meaning. Some aspects of speech go beyond the individual sounds and make a language unique: phrasing, stress, intonation, timing and rhythm. The voice is then projected to communicate what the speaker wants to say. Together with cultural nuances, gestures and local expressions, the way you speak immediately tells the people around you something about yourself. A person who is just learning a new language may be tempted to avoid speaking in public, but that leads to social isolation. It may not seem fair, but people can be judged by the way they speak and be seen as uneducated, incompetent or lacking knowledge, because the listener reacts to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech recognition (SR) system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; an entry can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR system is trained by having the speaker repeat standard or common phrases while the system adjusts its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also help speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, an ASR system with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable, and accuracy is perhaps the most widely used measurement. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes" and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no" and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
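Since accuracy "can be calculated in several ways", one common word-level calculation is sketched here: the reference and recognized sentences are aligned with the minimum number of word substitutions, deletions and insertions, and accuracy is (N - errors) / N for a reference of N words. This is an illustrative assumption, not necessarily the measure used in this project; the class name WordAccuracy and the English sample phrases are hypothetical.

```java
/** Hypothetical word-level accuracy calculation based on edit distance. */
final class WordAccuracy {

    /** Minimum number of word substitutions, deletions and insertions. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;   // delete every reference word
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;   // insert every recognized word
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitute = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** Word accuracy = (N - errors) / N, where N is the number of reference words. */
    static double accuracy(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return (ref.length - editDistance(ref, hyp)) / (double) ref.length;
    }

    public static void main(String[] args) {
        System.out.println(accuracy("open my computer", "open computer"));
    }
}
```

Note that this measure scores the transcript only. The "nope" versus "no" case above would count as a word error here even though the dialog succeeded, so task-level success has to be judged separately.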

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation. The dictionary sometimes also includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines in which each observation in a sequence is accompanied by a state transition, and each state emits an output symbol. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

In other words, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In speech recognition, after the voice is recorded it is divided into many frames, which must be processed in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word to be recognized. In a database known as the linguist model we store the reference values of states, phonemes and words in order to compare them with the observed data (the voice). By applying HMM we construct a statistical model for each phone, whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on the state itself and the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite complicated, we do not go into much detail here; more can be found in Appendix A.
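Finding "the sequence of states that has the maximum probability" is commonly done with the Viterbi algorithm. The sketch below is a minimal illustration of that algorithm on a toy two-state model, not the Sphinx4 decoder: the class name Viterbi and all probability values are assumptions made for the example, and log probabilities are used to avoid numeric underflow.

```java
import java.util.Arrays;

/** Hypothetical Viterbi decoder for a small discrete HMM (log-probability form). */
final class Viterbi {

    /**
     * @param obs   observation symbol indices, one per frame (at least one)
     * @param start start[s] is the probability of beginning in state s
     * @param trans trans[p][s] is the probability of moving from state p to state s
     * @param emit  emit[s][o] is the probability that state s emits symbol o
     * @return the most probable state sequence
     */
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = start.length;
        int frames = obs.length;
        double[][] score = new double[frames][n];  // best log probability ending in each state
        int[][] back = new int[frames][n];         // predecessor state that achieved it
        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < frames; t++) {
            for (int s = 0; s < n; s++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < n; p++) {
                    double candidate = score[t - 1][p] + Math.log(trans[p][s]);
                    if (candidate > bestScore) {
                        bestScore = candidate;
                        best = p;
                    }
                }
                score[t][s] = bestScore + Math.log(emit[s][obs[t]]);
                back[t][s] = best;
            }
        }
        int last = 0;                              // best final state
        for (int s = 1; s < n; s++) {
            if (score[frames - 1][s] > score[frames - 1][last]) last = s;
        }
        int[] path = new int[frames];
        path[frames - 1] = last;
        for (int t = frames - 1; t > 0; t--) {     // follow the back pointers
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        System.out.println(Arrays.toString(decode(new int[] {0, 1, 2}, start, trans, emit)));
    }
}
```

In a real recognizer the emission term comes from acoustic feature vectors rather than a small symbol table, but the search over state sequences follows the same pattern.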

Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty assigned when a sequence or collection of words is seen. A language model is used to restrict the word search: it defines which words could follow previously recognized words, and it significantly restricts the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by a finite state automaton, sometimes with weights.
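The idea that an n-gram model holds "statistics of word sequences" can be illustrated with a tiny bigram model that estimates P(next | previous) from counts. This is a minimal sketch under stated assumptions: the class name BigramModel is hypothetical, the English training sentences are placeholders for a real corpus, and no smoothing or back-off is applied.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical maximum-likelihood bigram model without smoothing. */
final class BigramModel {
    private final Map<String, Integer> contextCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Counts every adjacent word pair, with <s> and </s> marking the sentence ends. */
    void train(String sentence) {
        String[] w = ("<s> " + sentence.trim() + " </s>").split("\\s+");
        for (int i = 0; i + 1 < w.length; i++) {
            contextCounts.merge(w[i], 1, Integer::sum);
            bigramCounts.merge(w[i] + " " + w[i + 1], 1, Integer::sum);
        }
    }

    /** P(next | prev) = count(prev next) / count(prev); 0 if prev was never seen. */
    double probability(String prev, String next) {
        int context = contextCounts.getOrDefault(prev, 0);
        if (context == 0) return 0.0;
        return bigramCounts.getOrDefault(prev + " " + next, 0) / (double) context;
    }

    public static void main(String[] args) {
        BigramModel lm = new BigramModel();
        lm.train("open the file");
        lm.train("open the browser");
        lm.train("close the file");
        System.out.println("P(file | the) = " + lm.probability("the", "file"));
    }
}
```

A word pair that never occurred in training gets probability zero, which is how such a model strips improbable words from the search; practical toolkits soften this with back-off weights.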

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to create several files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application. Domain based means a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model. Put simply, the corpus is the collection of sentences that we want our machine to recognize. For our project we collected a set of important sentences according to our domain. Some sentences from our project are as follows:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record an audio file for each sentence of the corpus in .wav or .sph format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit depth (bits per sample): 16

• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format

A 16 kHz sample rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample quantizes each sample to one of 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor. Each sound file is in .wav format and is named using the speaker id and the sentence id. For example, the audio file

01_01.wav

stands for Speaker Id 01, Sentence Id 01.

When collecting audio from a speaker, we saved the following personal information about the speaker:

Name, age, gender and the environment in which the audio was collected

We also noted down other information:

Environmental conditions of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or a generator)

Technical details of the device (PC, microphone)

Date and time of recording
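The recording parameters and the naming scheme above can be checked automatically before training. The following is a minimal sketch using the standard javax.sound.sampled API; the class name RecordingCheck and its method names are hypothetical, and the file name is assumed to follow the speakerId_sentenceId.wav pattern described above.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical pre-training check for recorded audio files. */
final class RecordingCheck {

    /** True when the format is 16 kHz, 16 bits per sample, mono. */
    static boolean matchesTrainingFormat(AudioFormat format) {
        return format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    /** Splits a name such as "01_01.wav" into {speakerId, sentenceId}. */
    static String[] parseIds(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        String[] parts = base.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return parts;
    }

    public static void main(String[] args) throws IOException, UnsupportedAudioFileException {
        if (args.length == 0) {
            System.out.println("usage: java RecordingCheck <speakerId_sentenceId.wav>");
            return;
        }
        File wav = new File(args[0]);
        AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();
        String[] ids = parseIds(wav.getName());
        System.out.println("speaker " + ids[0] + ", sentence " + ids[1]
                + ", format ok: " + matchesTrainingFormat(format));
    }
}
```

Running such a check over every file in the recording directory catches a wrong sample rate or a stereo recording before it reaches the trainer.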

2.1.3 Dictionary file

The dictionary file is the list of words taken from our corpus file, each followed by its pronunciation, such as AA P NA KE. For this work we need software that turns the word list into a pronunciation file. A grapheme-to-phoneme (G2P) tool has been developed by CRBLP, but it gives the pronunciation in the Unicode IPA system, whereas we need ASCII format. For our project we therefore developed our own software, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The file extension is .dic and the name of our dictionary file is sbsbsr.dic. Some of its contents are:

AA CH E
AA P N AA K E
AA P N I
AA M I
I CCHA U K
…

Note:

All phonemes are in capital letters, such as AA P N I

File format is .dic

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur in the words; for example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line and duplicates are not allowed. It can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A
AA
B
BH
C
CCHA
CH
…

Note:

All phones are written in capital letters

File format is .phone

File encoding is UTF-8 without BOM

An entry cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

All phonemes in the .dic file are present in the phone file, without repetition
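
The step from dictionary file to phone file can be sketched as follows. This is an illustrative Java example, not the program written for the project: the class name PhoneListBuilder is hypothetical, and it assumes that every dictionary line consists of a word followed by its phonemes, separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch: collects the distinct phones used in a dictionary.
class PhoneListBuilder {
    // Each line is assumed to look like "<word> <phone> <phone> ...".
    static List<String> phonesFrom(List<String> dictionaryLines) {
        // A TreeSet removes duplicates and keeps the phones sorted.
        TreeSet<String> phones = new TreeSet<String>();
        phones.add("SIL"); // the silence phone is always part of the list
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the word itself, so the phones start at index 1.
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i].toUpperCase(Locale.ROOT));
            }
        }
        return new ArrayList<String>(phones);
    }
}
```

Writing the returned list one entry per line, followed by a final blank line, would give a file that satisfies the notes listed above.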


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text. Its format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. It requires a corpus, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line, although the newer version also supports Unicode text files. After uploading this file and clicking the compile button, the tool returns a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, because the pronunciation dictionary is built as described above. The tool works best for small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
…
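
To make the field layout concrete, the following Java sketch parses one unigram line of an ARPA file. It is an illustration only: the class ArpaUnigram is hypothetical, and it relies on the ARPA convention that both numeric fields are base-10 logarithms and that the back-off field may be absent.

```java
// Hypothetical sketch: one unigram entry of an ARPA language model.
class ArpaUnigram {
    final double logProb;    // first field: log10 probability
    final String word;       // second field: the word
    final double logBackoff; // third field: log10 back-off weight, 0 if absent

    ArpaUnigram(double logProb, String word, double logBackoff) {
        this.logProb = logProb;
        this.word = word;
        this.logBackoff = logBackoff;
    }

    // Parses a line such as "-1.7709 word -0.2861".
    static ArpaUnigram parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2 || fields.length > 3) {
            throw new IllegalArgumentException("not a unigram line: " + line);
        }
        double backoff = fields.length == 3 ? Double.parseDouble(fields[2]) : 0.0;
        return new ArpaUnigram(Double.parseDouble(fields[0]), fields[1], backoff);
    }

    // Converts the stored log10 value back to a plain probability.
    double probability() {
        return Math.pow(10.0, logProb);
    }
}
```

With this reading, a first field of -1.7709 corresponds to a probability of roughly 0.017.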

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for Sphinx4. We used a Linux environment to obtain the DMP file from the .lm file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project the command is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers say in the audio files. In this file each utterance is written exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project; here the project name is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
…

Note:

File format is .transcription

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
…
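
The transcription and fileids files run in parallel: each utterance contributes one line to each file. The Java sketch below shows how a single pair of lines could be formed. It is not the project's own generator; the class TrainingLines and its method names are hypothetical, and the tags <s> and </s> follow the usual Sphinx transcription convention.

```java
// Hypothetical sketch: builds one transcription line and one fileids line.
class TrainingLines {
    // Transcription line: the sentence between <s> and </s>,
    // followed by the utterance ID in parentheses.
    static String transcriptionLine(String sentence, String utteranceId) {
        return "<s> " + sentence.trim() + " </s> (" + utteranceId + ")";
    }

    // Fileids line: the audio path without its extension.
    static String fileIdLine(String relativePath) {
        int dot = relativePath.lastIndexOf('.');
        int slash = relativePath.lastIndexOf('/');
        // Only strip a dot that belongs to the file name, not to a folder.
        return dot > slash ? relativePath.substring(0, dot) : relativePath;
    }

    public static void main(String[] args) {
        System.out.println(transcriptionLine("example sentence", "01_01"));
        System.out.println(fileIdLine("speaker/01_01.wav"));
    }
}
```

Generating both lines from the same list of recordings keeps the two files in the same order, which the trainer relies on.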

2.1.9 Filler file

The filler file contains the user's definitions of any background noise that occurs in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: sphinxbase and sphinxtrain. We downloaded them from the CMU Sphinx web site. Before installing these two packages, some dependencies must be installed on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As already mentioned, setting up the trainer requires two packages, sphinxbase and sphinxtrain. After downloading them we decompressed them into a folder in Linux. Any folder can be used; we used the Linux root folder. After decompressing the packages they can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which sphinxtrain creates the trained files, i.e. the acoustic model. First we entered the root directory where the installed sphinxbase and sphinxtrain folders are placed and created a folder there, which we named "sbsbsr". We then opened a terminal, changed to the sbsbsr folder and created a project task for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Next we copied the files created in the data preparation part: the .dic, .filler, .phone, transcription, .fileids, .lm and .lm.DMP files go into the "etc" folder and the collected audio goes into the "wav" folder. We then changed some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed are listed below.

Parameter                   Value Before     Value After
$CFG_WAVFILE_EXTENSION      sph              wav
$CFG_WAVFILE_TYPE           nist/mswav/raw   raw
$CFG_FEATURE                2s_c_d_dd        1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                1
$CFG_STATESPERHMM           6                3
$CFG_N_TIED_STATES          100              100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert the collected raw speech audio into .mfc feature files. To do so we opened the project directory in the Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

We are then ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we executed the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx, Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we downloaded this tool from the CMU Sphinx site. We then went to the root directory where sphinxbase and sphinxtrain are installed and extracted the downloaded Pocketsphinx archive there. After extracting it we installed the software with the following commands in the Linux terminal:

./configure
make

After installing the software we went into our project folder and executed a command from the terminal to set up the folder structure for decoding:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

We then put some test audio in the "wav" directory, because Pocketsphinx recognizes from audio files, and copied the test fileids and transcription files into the "etc" folder. To decode, i.e. to test our model on audio with Pocketsphinx, we first need feature files for the audio files. We created them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we executed the main decoding command in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The test results can be found in the "result" folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 itself and the Eclipse IDE. After installing the Eclipse IDE on Windows we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file to a location on the Windows machine. We then created a new Java project in Eclipse and wrote a Java file with the help of the demo application from CMU Sphinx. After that we added the following files from our previous (training) project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

.dic

.lm.DMP

.filler

We created a config.xml file in the Eclipse project to pass the configuration to the recognizer and to tell it where the required model files are placed. This config.xml can be written with the help of the config file in the Sphinx4 demo application. We also added four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building the project we ran it and tested it with live voice input from a microphone. The result for our model trained on ten sentences is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built with Sphinx4 using the Java language. We built some applications with Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room and a common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All five experiments use the trained data set. Percent correct, Error and Accuracy are percentages.

Experiment  Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error   Accuracy
01          5         4     1       1025         975      98      95.12            9.56    90.44
02          3         3     0       615          591      39      96.10            6.34    93.66
03          3         3     0       615          601      22      97.72            3.58    96.42
04          5         2     3       1025         938      184     91.51            17.95   82.05
05          3         0     3       615          545      125     88.62            20.33   79.67

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44
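
The three result columns of the Pocket Sphinx table are consistent with simple ratios over the total word count, reading the reported figures as percentages (for example 95.12 percent correct in Experiment 01). The Java sketch below reproduces them. It is an illustration inferred from the reported numbers, not the scoring code used in the project, and the class name RecognitionScore is hypothetical. Note that the error count can exceed total words minus correct words, presumably because inserted words are also counted as errors.

```java
import java.util.Locale;

// Hypothetical sketch of the scoring arithmetic behind the result table.
class RecognitionScore {
    // Share of reference words that were recognized correctly, in percent.
    static double percentCorrect(int totalWords, int correct) {
        return 100.0 * correct / totalWords;
    }

    // Errors relative to the number of reference words, in percent.
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    // Accuracy is what remains after subtracting the error rate.
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Figures of Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.println(String.format(Locale.ROOT,
                "correct %.2f, error %.2f, accuracy %.2f",
                percentCorrect(1025, 975), errorRate(1025, 98), accuracy(1025, 98)));
    }
}
```

The average accuracy is then the arithmetic mean of the five per-experiment accuracy values.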

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device is a microphone in all six experiments.

Experiment  User type  Speakers  Environment        Speaker type  Words  Correct  Errors  Percent correct  Error  Accuracy
01          Trained    2         Closed Room        Male          156    154      2       98               2      98
02          Untrained  3         Lab Room           Male          130    120      10      90               10     90
03          Untrained  3         University Campus  Male          140    122      18      82               18     82
04          Untrained  2         University Campus  Female        108    95       13      87               13     87
05          Trained    3         University floor   Female        126    115      11      89               11     89
06          Trained    2         Closed Room        Male          128    127      1       99               1      99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83

3.3.2 Input Type: Audio

All five experiments use the trained data set.

Experiment  Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error   Accuracy
01          3         3     0       210          194      16      85.71            15.71   85.71
02          3         2     1       210          188      22      77.14            22      77.14
03          3         2     1       210          182      28      72.38            28      72.38
04          3         2     1       210          194      16      84.44            16      84.44
05          3         1     2       210          184      26      74.76            26      74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a .dic file, which holds all training words and their pronunciations. While training acoustic models experimentally we had to write these pronunciations several times. Over time we came up with the idea of an automatic pronunciation (phonetic translation) maker, and this software is the implementation of that idea. First we built a database in which all phonemes and their corresponding letters are stored. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and a phoneme is not defined for every letter. We therefore defined some phonemes ourselves for certain consonants and conjunct letters. After building this database we wrote the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2 Dictionary files with phonetic translation
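
The core of such a translator is a table lookup from letters to phoneme codes. The Java sketch below illustrates the idea under strong simplifying assumptions: it is not the project's translator, the class PhoneticLookup is hypothetical, the table holds only three consonants taken from the chart in the appendix (the Bengali letters ka, ma and la, mapped to K, M and L), and vowels, vowel signs and conjuncts are ignored. Letters are written as Unicode escapes so that the source is independent of the file encoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: letter-by-letter lookup of phoneme codes.
class PhoneticLookup {
    private static final Map<Character, String> TABLE = new HashMap<Character, String>();
    static {
        TABLE.put('\u0995', "K"); // Bengali letter ka
        TABLE.put('\u09AE', "M"); // Bengali letter ma
        TABLE.put('\u09B2', "L"); // Bengali letter la
    }

    // Returns the space-separated codes; unknown letters become "?".
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String code = TABLE.get(word.charAt(i));
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(code != null ? code : "?");
        }
        return out.toString();
    }
}
```

Marking unknown letters with "?" instead of dropping them makes gaps in the letter database visible in the generated dictionary.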

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcription files. These two files contain the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create these two files manually. With our software these two large files can be created automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths into the fileids file, and 8000 audio file names with their corresponding sentences into the transcription file. Now both files are created automatically when the root folder of the audio files and the sentence corpus file are given to the software as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific tasks. The system has been built on a small data set because of time constraints. We selected the domain of university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced some problems when creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. The system recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or has pronunciation problems. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once we have an accurate phoneme set.

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for creating the dictionary file, the fileids file and the transcription file. We want to make a user-friendly stand-alone GUI application for writing in Bengali, and we intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on a developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer a user will be able to train any sentence with the corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use this audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but for our mother tongue, Bengali, this field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing the Bengali language. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and audio from many speakers recorded with low noise. No speech recognizer is 100% accurate yet, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.

References

[1] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009.

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9


Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available only 50 sentences were used for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators did not survive extraction; "/" is assumed throughout.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                        root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

………………………………………………………
ARRAY
………………………………………………………
package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
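The sortList method above orders folder names by the number embedded in them, so that a name ending in 10 comes after one ending in 2. A minimal sketch of the same idea using a comparator is given here for illustration. The class name NumericSuffixSort and the sample names in the usage note are hypothetical, and unlike the listing this version keeps the original names instead of rebuilding them from a shared prefix.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NumericSuffixSort {
    // Extracts the digits of a name as an integer; names without digits sort first.
    static int numberIn(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the embedded number, leaving the input untouched.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }
}
```

For example, calling sortByNumber on the names s10, s2 and s1 gives the order s1, s2, s10, whereas a plain lexicographic sort would place s10 before s2.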

………………………………………………………
TRANSCRIPT CREATOR
package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
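The transcript writer above pairs each corpus sentence with the name of a recording, with the file extension removed, and wraps the sentence in sentence markers. The following sketch isolates that formatting step so it can be checked without touching the file system. The class name TranscriptLine and the sample sentence are hypothetical, and the closing marker is assumed to be the slash form of the opening one, since the slash did not survive in the listing.

```java
final class TranscriptLine {
    // Strips a trailing ".wav" (case-insensitive) to obtain the utterance id.
    static String utteranceId(String wavName) {
        return wavName.replaceAll("(?i)\\.wav$", "");
    }

    // Builds one transcript entry: <S> sentence </S> (utteranceId)
    static String format(String sentence, String wavName) {
        return "<S> " + sentence.trim() + " </S> (" + utteranceId(wavName) + ")";
    }
}
```

Anchoring the pattern at the end of the name and escaping the dot avoids removing the letters "wav" elsewhere in a file name, which an unescaped ".wav" pattern could do.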

………………………………………………………
FILE OPERATOR
package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            // Path separators did not survive extraction; "/" is assumed.
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            // The position of the separators in this path is a best guess.
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

………………………………………………………
PHONETIC TRANSLATION
package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb
package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                    rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

………………………………………………………
PRONOUNCIATION GENARATOR
package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal was lost in extraction; the Bengali hasanta (U+09CD) is an assumption.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                        ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " " + "serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
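The pronunciation generator above first locates conjuncts, then looks up each resulting unit in the banglatab table. As a simplified illustration of the first step only, the sketch below splits a word into lookup units and keeps consonants that are joined by a hasanta together. It is not the exact procedure of the listing: the class name ConjunctSplitter is hypothetical, the joining character is assumed to be the Bengali hasanta (U+09CD) because the literal did not survive in the listing, and the inherent-vowel insertion and the special handling of র are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {
    // Bengali hasanta (virama); assumed to be the conjunct-identifying character.
    static final char HASANTA = '\u09CD';

    // Splits a word into units; consonants linked by a hasanta stay in one unit.
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            current.append(ch);
            boolean nextIsHasanta = i + 1 < word.length() && word.charAt(i + 1) == HASANTA;
            // Close the unit unless we are on a hasanta or one follows immediately.
            if (ch != HASANTA && !nextIsHasanta) {
                units.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            units.add(current.toString()); // word ended on a hasanta
        }
        return units;
    }
}
```

For the word বিশ্ব this yields three units: ব, the vowel sign ি, and the conjunct শ্ব. Each unit could then be used as a key for the pronunciation lookup.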

………………………………………………………
SBSASR_MAIN
package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        // The Bengali sample sentences inside these literals did not survive extraction.
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber
package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until the last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application
package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate ("sokrio")
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA
package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        // The Bengali sample sentence inside the second literal did not survive extraction.
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase inside this literal did not survive extraction.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
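The giveCommand method above compares the recognized text against a command phrase and then calls the matching CommandActivator method. When many commands are supported, one way to avoid a long chain of comparisons is a lookup table from phrase to action. The sketch below is only an illustration of that design: the class CommandDispatcher, its method names and the English phrase in the usage note are hypothetical and not part of the project code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class CommandDispatcher {
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    // Registers an action under a recognized phrase (trimmed, case-insensitive).
    void register(String phrase, Runnable action) {
        actions.put(normalize(phrase), action);
    }

    // Runs the action for the phrase; returns false when no command matches.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(normalize(recognizedText));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    private static String normalize(String s) {
        return s == null ? "" : s.trim().toLowerCase(Locale.ROOT);
    }
}
```

A phrase such as "copy" could be registered with an action that calls the copy method of CommandActivator, and the recognition loop would then call dispatch with each recognized result. A false return value leaves room for telling the user that the command was not understood.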

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Literature Survey

Today speech technology plays an important role in many applications. It has moved from
research to commercial application. Many human-machine interfaces have been developed and
are now used in telephone food-ordering systems, airport information systems, ticketing
systems, restaurant reservation systems and so on. For this reason we selected this field
for our project. Moreover, many languages have a speech recognition system, but our mother
tongue, Bangla, has no proper one; this is the main reason for selecting this topic. In the
early era most research work was done using Artificial Neural Networks (ANN), but since we
use an HMM-based technique, some HMM-based and related research is mentioned below.

Implementation of Speech Recognition System for Bangla (Shammur Absar Chowdhury, August
2010): We studied this thesis report over one week and acquired a great deal of knowledge
about speech recognition. We are very thankful to Shammur Absar Chowdhury for writing the
thesis report so clearly; it is very helpful for new students who want to work in this
field [9].

Speech Recognition by Machine: A Review (M. A. Anusuya and S. K. Katti, Department of
Computer Science and Engineering, Sri Jaya Chamarajendra College of Engineering, Mysore,
India): From this review we learned a great deal about the types of speech recognition,
the approaches to speech recognition, etc. [1]

Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and
Application Perspective (Md. Abul Hasnat, Jabir Mowla and Mumit Khan, BRAC): We studied the
past works and, to the best of our knowledge, this work is the first reported attempt to
recognize Bangla speech using the HMM technique, so we took most of our guidance on the
steps to build a speech recognition system from this publication. From it we learned how to
improve the quality of the audio signal given as input through a noise elimination process
and an end-point detection algorithm. We also learned how the features of a sound are
extracted, which parameters are stored in the feature files, and the algorithm for creating
HMM models [8].

Bengali segmented automated speech recognition (Department of Computer Science and
Engineering, BRAC University): From this thesis report we learned about vowel and consonant
phonemes, vowel and consonant phoneme clusters, voiced and non-voiced stops, and the Hidden
Markov Model [6].

Recognition of Spoken Letters in Bangla (Abul Hasanat, Md. Rezaul Karim, Md. Shahidur
Rahman and Md. Zafar Iqbal, SUST); Extraction of Bangla Vowel and Representation in the
Vowel Space (Syed Akhter Hossain, East West; M. Lutfar Rahman, DU; and Farruk Ahmed, NSU);
Acoustic Analysis of Bangla Consonants (Firoj Alam, S. M. Murtoza Habib and Mumit Khan):
From these works we learned the techniques used to recognize letters, vowels and
consonants. Here we found the basic steps towards a recognizer and the common steps needed
to build a fully functioning recognizer [7].

Chapter 1

INTRODUCTION

INTRODUCTION
HISTORY OF SPEECH RECOGNITION
TYPES OF SPEECH RECOGNITION
TERMS AND CONCEPTS

1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal,
captured by a microphone or a telephone, into a set of words. It is a broad term: such a
system aims to recognize almost anybody's speech. It is also known as computer speech
recognition, meaning that the computer understands a voice and performs the required task.
Put simply, speech recognition is the process of converting spoken input to text, and it is
therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as
voice recognition, is software technology that lets the user control computer functions and
dictate text by voice. For example, a person can move the cursor with a voice command such
as "mouse up". We can control application functions such as opening a file menu, create
documents such as letters or reports, or start a media player by saying "Music". For this
reason many scientists and researchers are working on speech recognition. Many of the
world's languages have speech recognizers of their own, but our mother tongue, Bengali,
does not yet have a mature one. Some small research works have been carried out on Bengali
speech recognition, but without great results. Implementing a continuous speech recognizer
for Bengali is the main goal of this project. However, developing a full-blown continuous
speech recognizer is a huge task for a short span of time. We therefore selected a
domain-based continuous speech recognizer, which covers a conversation about the university
admission process. Throughout the work we tried to learn about different tools, and we
chose CMU Sphinx4 as the speech recognition API because it is open-source software,
distributed under a BSD-style license, and it has high accuracy. Many high-quality, widely
used commercial packages are also available for this work, but they are costly. The
ultimate goal of ASR research is to allow a computer to recognize in real time, with 100%
accuracy, all words that are intelligibly spoken by any person, independent of vocabulary
size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

AT&T Bell Laboratories developed a primitive device that could recognize speech in the
1940s, but researchers knew that the widespread use of speech recognition would depend on
the ability to accurately and consistently perceive subtle and complex verbal input. Thus,
in the 1960s, researchers turned their focus towards a series of smaller goals that would
aid in developing the larger speech recognition system. As a first step, developers created
a device that would use discrete speech: verbal stimuli punctuated by small pauses. In the
1970s, work began on continuous speech recognition, which does not require the user to
pause between words. This technology became functional during the 1980s and is still being
developed and refined today. Speech recognition systems have become so advanced and
mainstream that business and health care professionals are turning to speech recognition
solutions for everything from providing telephone support to writing medical reports.
Technological advances have made speech recognition software and devices more functional
and user friendly, with most contemporary products performing tasks with over 90 percent
accuracy.

Speech recognition is used in a wide range of applications, satisfying the needs of
consumers and businesses by simplifying customer interaction, increasing efficiency and
reducing operating costs. Furthermore, according to figures from Allied Business
Intelligence (ABI), the increased popularity of speech recognition will push revenues from
$677 million in 2002 to an estimated $5.3 billion by 2008. Recent advances in speech
recognition software are creating a dynamic environment, since this technology appeals to
anyone who needs or wants a hands-free approach to computing tasks. As the merger of large
vocabularies and continuous recognition continues, look for more and more companies to move
toward speech recognition, and watch the industry take its place as a leader in the
technology sector [1].

1936  AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970  The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971  DARPA established
1982  Dragon Systems was founded.
1984  SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995  Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997  Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998  Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000  Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003  ScanSoft ships Dragon NaturallySpeaking 7 Medical, lowering healthcare costs through highly accurate speech recognition.

Table 1.2 History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types
of utterances they are able to recognize. These classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have
quiet (lack of an audio signal) on both sides of the sample window. They accept a single
word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where
they require the speaker to wait between utterances (usually doing processing during the
pauses). "Isolated utterance" might be a better name for this class. Put simply, isolated
words are single words such as "me", "you", "go", etc.

132 Connected Words Connected word systems (or more correctly connected

utterances) are similar to isolated words but allows separate utterances to be run-together

with a minimal pause between them Such as- I eat rice

133 Continuous Speech Continuous speech recognizers allow users to speak almost

naturally while the computer determines the content (Basically its computer

dictation) Recognizers with continuous speech capabilities are some of the most

difficult to create because they utilize special methods to determine utterance boundaries

134 Spontaneous Speech At a basic level it can be thought of as speech that is natural

sounding and not rehearsed An ASR system with spontaneous speech ability should be able

to handle a variety of natural speech features such as words being run together ums and

ahs and even slight stutters

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his or her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker-dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his or her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.

1.3.7 Overview of Speech Recognition System

Fig 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word", or "I'd like to run Microsoft Word" are all examples of possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
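The start and end detection described above can be sketched with a simple energy threshold. This is only a minimal illustration and not the method used inside the Sphinx engine; the function name find_utterances, the threshold value, and the min_silence parameter are assumptions chosen for the example.

```python
def find_utterances(energies, threshold=0.1, min_silence=3):
    """Return (start, end) frame index pairs, with end exclusive.

    energies    -- one energy value per audio frame (hypothetical input)
    threshold   -- frames at or above this value count as speech
    min_silence -- number of consecutive silent frames that ends an utterance
    """
    utterances = []
    start = None      # frame where the current utterance began
    silent_run = 0    # consecutive silent frames seen inside an utterance
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i          # lack of silence: an utterance begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                # enough silence: the utterance ended before the silent run
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # the audio ended while an utterance was still open
        utterances.append((start, len(energies) - silent_run))
    return utterances
```

A short pause inside an utterance (fewer than min_silence frames) does not split it, which mirrors the idea that only "a certain amount of silence" signals the end.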

1.4.2 Pronunciation

The word "pronunciation" is familiar from learning any language. In any language, pronunciation pertains to the sounds that are produced to make meaning. Some aspects of speech go beyond the individual sounds and make a language unique: phrasing, stress, intonation, timing, and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures, and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent, or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes", and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no", and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation. The dictionary sometimes also includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English commonly uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

A Hidden Markov Model (HMM) can be seen as a finite state machine in which, for each observation in a sequence, there is a state transition and, for each state, there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

Put another way, an HMM is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable ones. In the speech recognition process, after the voice is recorded it is divided into many frames, which we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model we store the reference values of states, phonemes, and words in order to compare them with the observed data (voice). By applying an HMM we construct a statistical model for each phone, whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and on the previous state. The goal of a speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite involved, we do not go into detail here; more can be found in Appendix A.
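Finding the state sequence with the maximum probability is commonly done with the Viterbi algorithm. The sketch below is a minimal, generic version on a toy model; the state names ("SIL", "AA"), the observation symbols, and all probabilities are made-up values for illustration and are not taken from our trained model.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its probability) for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # choose the predecessor that maximizes the path probability
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # backtrack from the most probable final state
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, probability


# Toy model with hypothetical numbers
states = ("SIL", "AA")
start_p = {"SIL": 0.6, "AA": 0.4}
trans_p = {"SIL": {"SIL": 0.7, "AA": 0.3}, "AA": {"SIL": 0.4, "AA": 0.6}}
emit_p = {
    "SIL": {"quiet": 0.5, "mid": 0.4, "loud": 0.1},
    "AA": {"quiet": 0.1, "mid": 0.3, "loud": 0.6},
}
path, probability = viterbi(("quiet", "mid", "loud"), states, start_p, trans_p, emit_p)
```

For this toy input the best path is SIL, SIL, AA. A real recognizer works with continuous acoustic features and log probabilities rather than the small discrete tables used here.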

Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words could follow previously recognized words, and it helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
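As an illustration of the n-gram idea, the following sketch estimates bigram probabilities by simple counting. It is a minimal example without smoothing or back-off; the function names and the three-sentence English corpus are assumptions made for the illustration, not the model built for this project.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams, adding <s> and </s> sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


# Hypothetical toy corpus
corpus = ["I eat rice", "I eat fish", "you eat rice"]
unigrams, bigrams = train_bigram(corpus)
```

With this corpus, "eat" always follows "I", so P(eat | I) is 1, while "rice" follows "eat" in two of three cases. Word pairs that never occur get probability 0, which is how such a model removes improbable words from the search.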

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to prepare some important files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

• Corpus
• Audio files
• Dictionary file
• Phone file
• Language Model file (lm format)
• Language Model file (DMP format)
• Transcription file
• Fileids file
• Filler file

2.1.1 Corpus

The corpus is simply a list of sentences used to train the language model; in other words, it is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences according to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio for this corpus in .wav or .sph format. During the recording sessions the following parameters of the wave files were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit rate (bits per sample): 16
• Channel: mono (single channel)
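The recording parameters listed above can be checked automatically before training. The sketch below uses Python's standard wave module; the function name check_wav and its default arguments are assumptions for this illustration and are not part of the Sphinx tools.

```python
import wave


def check_wav(path, rate=16000, sample_width=2, channels=1):
    """Return a list of mismatches between a WAV file and the expected format.

    sample_width is given in bytes, so 2 means 16 bits per sample.
    An empty list means the file matches all three parameters.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sampling rate is {wav.getframerate()} Hz, expected {rate} Hz"
            )
        if wav.getsampwidth() != sample_width:
            problems.append(
                f"{8 * wav.getsampwidth()} bits per sample, expected {8 * sample_width}"
            )
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channels, expected {channels}"
            )
    return problems
```

Running such a check over every file in the audio folder catches recordings saved with the wrong settings before they reach the trainer.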

Fig 2.1.2: Audio File Recording Format

For this work a 16 kHz sampling rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample quantizes each sample to one of 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor. The sound files are in .wav format, and each .wav file is named using the speaker ID and the sentence ID. For example, an audio file of our project is

01_01.wav

which stands for Speaker ID 01, Sentence ID 01.

When we collected audio from a speaker, we also saved personal information about the speaker:

• Name
• Age
• Gender
• Environment in which the audio was collected

Some other information was also noted down:

• Environmental conditions of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or a generator)
• Technical details of the devices (PC, microphone)
• Date and time of the recording

2.1.3 Dictionary file

The dictionary file is simply the list of words taken from our corpus file, together with the pronunciation of each word, such as "AA P NA KE". For this we need software that produces the pronunciation for each word in the dictionary. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. We therefore developed our own software, D2P (Dictionary to Pronunciation), which generates the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The file format is .dic. Our dictionary file is named sbsbsr.dic, and some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

• All phonemes are in capital letters, such as AA P N I
• The file format is .dic
• The file encoding is UTF-8 without BOM
• A word cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, "AA P N I" consists of 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

• All phones are in capital letters
• The file format is .phone
• The file encoding is UTF-8 without BOM
• A phone cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)
• The silence phoneme "SIL" is also included in the phone file
• All phonemes in the .dic file are present in the phone file without repetition
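The rules above suggest how the phone file can be derived from the dictionary. The following is a minimal sketch of that idea and not the actual program written for this project; the function name phones_from_dictionary is an assumption, and the romanized words in the usage example are placeholders for the Bengali entries of the real dictionary.

```python
def phones_from_dictionary(dic_lines):
    """Collect the unique phones used in lines of the form 'WORD PH1 PH2 ...'."""
    phones = {"SIL"}                 # the silence phone is always included
    for line in dic_lines:
        fields = line.split()
        if len(fields) < 2:          # skip blank or malformed lines
            continue
        phones.update(fields[1:])    # everything after the word is a phone
    return sorted(phones)            # a set guarantees there are no duplicates


def phone_file_text(phones):
    """One phone per line, ending with a final newline."""
    return "\n".join(phones) + "\n"


# Hypothetical dictionary lines (placeholder words, phones as in the examples above)
example = ["apni AA P N I", "ami AA M I", ""]
phone_list = phones_from_dictionary(example)
```

Using a set makes the "no repetition" rule hold by construction, and sorting gives a stable file that is easy to compare between runs.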

2.1.5 Language Model file lm format

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text. Its format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2-, and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line, although the newer version also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here the only file used is the LM file, as the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
…

2.1.6 Language Model file DMP format

We also need the language model file in DMP format for training in Sphinx4. We used a Linux environment to obtain the DMP file from the lm file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project the command is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file, the dialogue of each speaker is noted exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that represents the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named after the project. Here the project name is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
…

Note:

• The file format is .transcription
• The file encoding is UTF-8 without BOM
• A sentence cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is named sbsbsr_train.fileids and the testing file is named sbsbsr_test.fileids.

For example:

sbsbsr_trainsanjoysanjoy101_01

sbsbsr_ train sanjoysanjoy101_02

sbsbsr_ train sanjoysanjoy101_03

…
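Because every transcription line ends with the same file ID that appears in the fileids file, both files can be produced from one list. The sketch below illustrates this under assumptions: the function name build_training_files, the audio_dir argument, the folder name speaker01, and the English sentences are hypothetical, and the exact folder layout used in our project is not reproduced here.

```python
def build_training_files(utterances, audio_dir=""):
    """Build the text of the fileids and transcription files.

    utterances -- list of (file_id, sentence) pairs, e.g. ("01_01", "i eat rice")
    audio_dir  -- optional folder prefix for the fileids entries
    """
    fileids, transcripts = [], []
    for file_id, sentence in utterances:
        path = f"{audio_dir}/{file_id}" if audio_dir else file_id
        fileids.append(path)                                   # no .wav extension
        transcripts.append(f"<s> {sentence} </s> ({file_id})")
    # each file ends with a newline after its last entry
    return "\n".join(fileids) + "\n", "\n".join(transcripts) + "\n"


# Hypothetical input
fileids_text, transcription_text = build_training_files(
    [("01_01", "i eat rice"), ("01_02", "you go")], audio_dir="speaker01"
)
```

Generating both files in the same loop keeps the two lists in the same order, which is what the trainer relies on when matching audio to transcripts.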

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recordings. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s>, and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

• <s> symbolizes "beginning of speech"
• </s> symbolizes "end of speech"
• <sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two pieces of software: sphinxbase and sphinxtrain. We downloaded them from the CMU Sphinx web site. Before installing them, we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two pieces of software: sphinxbase and sphinxtrain. After downloading them, we decompressed them into a folder in Linux. It can be any folder; we used the Linux root folder. After decompressing, the software can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which sphinxtrain will create the trained files, or acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed, and create a folder there. We named the folder "sbsbsr". After creating the folder, we open a terminal and go to the sbsbsr folder. Then we create a project task for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav", "model_parameters", etc.

Now it is time to copy the files created in the data preparation part. We copy the dic, filler, phone, transcription, fileids, lm, and DMP files into the "etc" folder, and our collected audio into the "wav" folder. Next we need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed in sphinx_train.cfg are listed below.

Parameter                    Value Before        Value After
$CFG_WAVFILE_EXTENSION       sph                 wav
$CFG_WAVFILE_TYPE            nist, mswav, raw    raw
$CFG_FEATURE                 2s_c_d_dd           1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                   1
$CFG_STATESPERHMM            6                   3
$CFG_N_TIED_STATES           100                 100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes, the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process, we first have to convert our collected raw speech audio data into mfc (feature) files. To do so, we opened the project directory in the Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. For this task we stay in the project directory and execute the following command in the terminal:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. After downloading it, we go to the root directory where we installed sphinxbase and sphinxtrain, and extract the downloaded Pocketsphinx archive there. After extracting, we install the software with the following commands in the Linux terminal:

./configure
make

After installing the software, we go into our project folder and execute a command from the terminal to create the folder structure for testing:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some testing audio in the "wav" directory, because Pocketsphinx recognizes from audio file input. We also copy the test fileids and transcription files into the "etc" folder. To decode, or test, our model on audio with Pocketsphinx, we first need the feature files of the audio files. We create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding, or testing, in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of the test is placed in the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. We need two pieces of software to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on the Windows system. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we need to add the following files from our previous project to the Eclipse project [11]:

• the "sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
• dic
• lm.DMP
• filler

We create a config.xml file in the Eclipse project to give the recognizer its configuration and to say where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files (js.jar, jsapi.jar, sphinx4.jar, and tags.jar) from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project, we run it and can test with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room, a common room, etc. We completed our testing using audio inputs from six test speakers. For the live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

All experiments use the trained data set. Percent correct, error, and accuracy are given in percent.

Experiment   Speakers   Male   Female   Total Words   Correct   Errors   Percent Correct   Error   Accuracy
01           5          4      1        1025          975       98       95.12             9.56    90.44
02           3          3      0        615           591       39       96.10             6.34    93.66
03           3          3      0        615           601       22       97.72             3.58    96.42
04           5          2      3        1025          938       184      91.51             17.95   82.05
05           3          0      3        615           545       125      88.62             20.33   79.67

Table 3.2.1: Experimental details with Results for Pocket Sphinx
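The percentages in Table 3.2.1 are consistent with the simple definitions sketched below: percent correct is the share of correctly recognized words, error is the number of errors relative to the total number of words, and accuracy is 100 minus the error. This is our reading of the reported numbers, and the function name word_scores is an assumption for this illustration. Since the correct and error counts sum to more than the total, the error count presumably includes inserted words.

```python
def word_scores(total_words, correct, errors):
    """Return (percent correct, error, accuracy), each rounded to two decimals."""
    percent_correct = 100.0 * correct / total_words
    error = 100.0 * errors / total_words
    accuracy = 100.0 - error
    return round(percent_correct, 2), round(error, 2), round(accuracy, 2)


# Counts of Experiment 01 in Table 3.2.1
scores = word_scores(1025, 975, 98)
print(scores)
```

For Experiment 01 this prints (95.12, 9.56, 90.44), matching the first row of the table; the other rows can be checked the same way.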

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device is a microphone in all experiments.

Experiment   User Type   Speakers   Environment         Speaker Type   Words   Correct Words   Errors   Percent of Correct   Errors   Accuracy
01           Trained     2          Closed Room         Male           156     154             2        98                   2        98
02           Untrained   3          Lab Room            Male           130     120             10       90                   10       90
03           Untrained   3          University Campus   Male           140     122             18       82                   18       82
04           Untrained   2          University Campus   Female         108     95              13       87                   13       87
05           Trained     3          University floor    Female         126     115             11       89                   11       89
06           Trained     2          Closed Room         Male           128     127             1        99                   1        99

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83

3.3.2 Input Type: Audio

All experiments use the trained data set.

Experiment   Speakers   Male   Female   Total Words   Correct   Errors   Percent Correct   Error   Accuracy
01           3          3      0        210           194       16       85.71             15.71   85.71
02           3          2      1        210           188       22       77.14             22      77.14
03           3          2      1        210           182       28       72.38             28      72.38
04           3          2      1        210           194       16       84.44             16      84.44
05           3          1      2        210           184       26       74.76             26      74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. The main objective of this application is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

For training the acoustic model we need a file called dic. This file holds all training words and their pronunciations. While experimenting with acoustic model training we had to write these pronunciations by hand several times. Over time we thought about an automatic pronunciation, or phonetic translation, maker, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and not every letter has a defined phoneme. For that reason we defined some phonemes ourselves for certain consonants and conjunct letters. After making this database, we built the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 412 Dictionary files with phonetic translation
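The letter-to-phoneme lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: it replaces the project's database with a small in-memory map, the class name PhonemeLookupSketch and its methods are hypothetical, and it handles only six consonants taken from the Unicode to IPA chart in the appendix. It does not handle vowels, the inherent vowel, or conjuncts, which the real translator has to deal with. Letters are written as Unicode escapes so that the source file compiles regardless of its encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: maps single Bengali consonants to phoneme symbols. */
final class PhonemeLookupSketch {

    // A tiny stand-in for the phoneme database; symbols follow the appendix chart.
    private static final Map<Character, String> PHONEMES = new LinkedHashMap<>();
    static {
        PHONEMES.put('\u0995', "K"); // ka
        PHONEMES.put('\u09A8', "N"); // na
        PHONEMES.put('\u09AC', "B"); // ba
        PHONEMES.put('\u09AE', "M"); // ma
        PHONEMES.put('\u09B0', "R"); // ra
        PHONEMES.put('\u09B2', "L"); // la
    }

    /** Returns the space-separated phoneme sequence for a word. */
    static String translate(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char letter = word.charAt(i);
            String phoneme = PHONEMES.get(letter);
            if (phoneme == null) {
                throw new IllegalArgumentException("No phoneme defined for: " + letter);
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phoneme);
        }
        return sb.toString();
    }

    /** Builds one dictionary line: the word followed by its phonemes. */
    static String dictionaryLine(String word) {
        return word + " " + translate(word);
    }
}
```

For example, PhonemeLookupSketch.translate("\u0995\u09B2\u09AE") (the letters ক, ল, ম) returns "K L M". Failing loudly on an unknown letter is a deliberate choice in this sketch, since a silently skipped letter would put a wrong pronunciation into the dictionary.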

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program, both files had to be created by hand: with 8000 audio files, for example, we would have to write 8000 audio file paths into the fileids file and 8000 audio file names with their corresponding sentences into the transcript file. The software now creates both files automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
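The generation step described above can be sketched as two small functions that work on lists of names instead of a real directory tree. This is a simplified illustration, not the project's implementation: the class TrainingFilesSketch and its method names are hypothetical, and the transcript line layout, a sentence wrapped in sentence markers followed by the utterance id in parentheses, is assumed to follow the usual CMU SphinxTrain convention. As in the project code in the appendix, the corpus is reused from the start when there are more audio files than sentences.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: builds fileids and transcript lines from file names. */
final class TrainingFilesSketch {

    /** Drops a trailing ".wav" extension, if present. */
    static String stripWav(String name) {
        return name.endsWith(".wav") ? name.substring(0, name.length() - 4) : name;
    }

    /** One fileids line per audio file: the folder plus the name without extension. */
    static List<String> fileIds(String folder, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (String name : wavNames) {
            lines.add(folder + "/" + stripWav(name));
        }
        return lines;
    }

    /** One transcript line per audio file; sentences repeat cyclically. */
    static List<String> transcript(List<String> sentences, List<String> wavNames) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("The sentence corpus is empty");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size()).trim();
            lines.add("<s> " + sentence + " </s> (" + stripWav(wavNames.get(i)) + ")");
        }
        return lines;
    }
}
```

With the file names s1.wav, s2.wav and s3.wav in a folder called speaker_02 and a two-sentence corpus, fileIds produces speaker_02/s1 as its first line, and the third transcript line reuses the first sentence.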

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click, and right click.
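One way to connect recognized text to desktop actions is a lookup table from command phrase to action. The sketch below is an assumption about how such a dispatcher could look, not the code of our application: the class CommandDispatcherSketch is hypothetical, and the phrases used with it are placeholders, because the actual Bengali command words are not listed in this report.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: maps recognized command phrases to actions. */
final class CommandDispatcherSketch {

    private final Map<String, Runnable> actions = new HashMap<>();

    /** Registers an action under a command phrase. */
    void register(String phrase, Runnable action) {
        actions.put(normalize(phrase), action);
    }

    /** Runs the action for the recognized text; returns false if none is registered. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(normalize(recognizedText));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // Trims the text and collapses repeated whitespace so that small
    // differences in the recognizer output still match a registered phrase.
    private static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }
}
```

In an application like ours, each registered action would wrap one method of the CommandActivator class from the appendix, and the text passed to dispatch would be the recognizer's best result. Returning false for an unknown phrase lets the caller ignore misrecognized speech instead of triggering a wrong action.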

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for newly admitted students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not accurately defined and we do not know the exact number of phonemes in Bengali: different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and sometimes fails when the speech is slow or the pronunciation is unclear. We created a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciations, we can expect good output from this software only once we have an accurate phoneme set.
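The exact cause of the encoding problem is not identified in this report. One plausible contributor, assuming the word list is stored as UTF-8, is that the project code reads text through an InputStreamReader without naming a charset, so the platform default is used. The sketch below shows how the word list could be read and written with an explicit charset. The class Utf8WordReaderSketch and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reads and writes word lists with an explicit UTF-8 charset. */
final class Utf8WordReaderSketch {

    /** Reads one word per line, trimming whitespace and skipping empty lines. */
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }

    /** Writes the given lines as UTF-8, replacing any existing file. */
    static void writeLines(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

Naming the charset on both the reading and the writing side keeps Bengali text intact even when the platform default is a single-byte encoding. The database connection would need the same care, which the connection URL in the appendix already attempts through its Unicode parameters.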

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase the data size to create a complete model, and we plan to improve its recognition accuracy and enlarge its vocabulary. We have already developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate this system with document applications so that a Bengali sentence can be written just by uttering it. We want to build a voice-response application in which the user asks the software a question and it gives the predefined answer. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. In this report we discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and audio from many speakers with low noise. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, M. U. Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005-2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal S. O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No. Sentence

1 শুভ সকাল

2 ধন্যবাদ

3 আমি আপনাকে কি সাহায্য করতে পারি

4 আমি কিছু তথ্য জানতে এসেছি

5 কি বলুন

6 ভর্তি বিষয়ে

7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক

8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভর্তি চলছে

10 এ বিভাগে কি কি সুবিধা আছে

11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে

12 যেমন

13 দএএ কম্পিউটার ল্যাব আছে

14 ও আচ্ছা

15 একটি শুধু বিজ্ঞান বিভাগের জন্য

16 আর আরেকটি

17 সব এএএএএএএ জন্য

18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে

19 বত্রিশটি করে

20 আর কিছু

21 কতজন শিক্ষক আছেন এ বিভাগে

22 প্রায় এএএএ জন

23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ

24 প্রায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রুমেল এম এস রাহমান পীর

27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি

28 অবশ্যই

29 দানবীর মিস্টার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাত্র ক্যাম্পাস

32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সুবিধা আছে

35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে

36 লাইব্রেরী কি আছে

37 অবশ্যই একটা বড় লাইব্রেরী আছে

38 ক্যান্টিন আছে

39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে

40 ভর্তির শেষ তারিখ কবে

41 এ মাসের পাঁচ তারিখ

42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস

44 তিন চার এএএ পাঁচ এএএ নিয়ে

45 আপনাদের বছরে কত সেমিস্টার

46 তিন সেমিস্টার

47 তাহলে তো মোট এএএ সেমিস্টার

48 জি হ্যাঁ

49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়

50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators were lost in extraction; forward slashes are assumed.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues where it stopped.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    static int si = 0;

    // Collects directory names (levels 1 and 2) or file names (level 3).
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "name12" by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep only the digits
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The folder name without its number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file; sentences repeat cyclically.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                // The closing marker is assumed to be </S>; the slash was lost in extraction.
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            // Path separators and the file extension dot are assumed.
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            // The original file name reads "sbs_asr_train4dic"; this split is assumed.
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word, then its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and the connection, reporting any failure.
    private static void closeQuietly(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
    }

    // Returns true if the letter is a consonant (ids 13 to 48 in banglatab).
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
        return isBBorno;
    }
} // class end

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal was lost in extraction; the hasanta (U+09CD), which joins
    // the letters of a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        // Record the positions before, at, and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record every position that is not part of a conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Append letters while the recorded positions are adjacent.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the phoneme of every letter or conjunct in turn.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // The file name reads "sbsbsrconfigxml" in the original; the dots are assumed.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        giveCommand(); // not defined in this listing
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences did not survive extraction.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        // The file name reads "sbsbsr_transcriber_configxml" in the original; the dot is assumed.
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    // Presses the given keys in order, then releases them in the same order.
    private void pressKeys(int... keyCodes) throws AWTException {
        Robot robot = new Robot();
        for (int keyCode : keyCodes) {
            robot.keyPress(keyCode);
        }
        for (int keyCode : keyCodes) {
            robot.keyRelease(keyCode);
        }
    }

    // Presses and releases a mouse button the given number of times.
    private void click(int buttonMask, int times) throws AWTException {
        Robot robot = new Robot();
        for (int n = 0; n < times; n++) {
            robot.mousePress(buttonMask);
            robot.mouseRelease(buttonMask);
        }
    }

    public void leftClick() throws AWTException {
        click(InputEvent.BUTTON1_MASK, 1);
    }

    public void rightClick() throws AWTException {
        click(InputEvent.BUTTON3_MASK, 1);
    }

    public void doubleClick() throws AWTException {
        click(InputEvent.BUTTON1_MASK, 2);
    }

    public void copy() throws AWTException {
        pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        pressKeys(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        pressKeys(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        pressKeys(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        pressKeys(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        pressKeys(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        pressKeys(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_D);
    }

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

Bengali Speech Recognition

89

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

public void openNotepad() throws AWTException IOException

ProcessBuilder proc=new ProcessBuilder(notepadexe)

Process p=procstart()

public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

Bengali Speech Recognition

90

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Compares the recognized text with a known command and triggers the matching action.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------


Chapter 1

INTRODUCTION

INTRODUCTION

HISTORY OF SPEECH RECOGNITION

TYPES OF SPEECH RECOGNITION

TERMS AND CONCEPTS


1.1 Introduction

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. It is also known as computer speech recognition: the computer understands a voice and performs the required task. Put simply, speech recognition is the process of converting spoken input to text, and it is therefore sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is a software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as “mouse up”. We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying “Music”.

For this reason many scientists and researchers are working on speech recognition. Many of the world's major languages have speech recognizers of their own, but our mother tongue, Bengali, is not yet well served. Some small research works have been carried out on Bengali speech recognition, but without a significant outcome. Implementing a continuous speech recognizer for Bengali is our main goal throughout the project work. Developing a full-blown continuous speech recognizer, however, is a huge task for a short span of time. As a result we have selected a domain-based continuous speech recognizer, whose domain is a conversation on the university admission process.

Throughout the whole period of work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open source software, released under a Berkeley Software Distribution (BSD) style license, and it offers good accuracy. Many high-quality and widely used commercial products are available for this work, but they are costly.

The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

While AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech, that is, verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

Speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs. According to figures provided by the industry analyst Allied Business Intelligence (ABI), the increased popularity of speech recognition was projected to push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As large vocabularies and continuous recognition continue to merge, more and more companies are expected to move toward speech recognition, and the industry is expected to take its place as a leader in the technology sector [1].

Year    Event
1936    AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970    The HMM approach to speech and voice recognition was invented by Lenny Baum of Princeton University.
1971    DARPA established the Speech Understanding Research (SUR) program.
1982    Dragon Systems was founded.
1984    SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995    Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech and voice recognition technology was available to consumers.
1997    Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998    Microsoft invested $45 million to be able to use speech and voice recognition technology in its systems.
2000    Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003    ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2 History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are the following [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have Listen/Not-Listen states, where they require the speaker to wait between utterances (usually doing processing during the pauses). Isolated Utterance might be a better name for this class. Put simply, isolated words are single words such as Me, You, Go, etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterance systems) are similar to isolated word systems, but allow separate utterances to be run together with a minimal pause between them, such as “I eat rice”.

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features, such as words being run together, “ums” and “ahs”, and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his or her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his or her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig. 1.3.7 Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, “Word”, “Microsoft Word” or “I’d like to run Microsoft Word” are all examples of possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is listening for speech input. When the engine detects audio input, in other words a lack of silence, the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
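The silence-based boundaries described above can be illustrated with a minimal sketch. It assumes 16-bit samples, a fixed frame size and a simple mean-amplitude threshold; the class name UtteranceDetector and all parameters are hypothetical, and this is not the endpointer that Sphinx4 actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative energy-based utterance detector (an assumption, not Sphinx4's endpointer). */
class UtteranceDetector {

    /**
     * Returns {startFrame, endFrame} pairs, with endFrame exclusive.
     * A frame counts as speech when its mean absolute amplitude exceeds the threshold;
     * an utterance ends once minSilenceFrames consecutive silent frames follow it.
     */
    static List<int[]> findUtterances(short[] samples, int frameSize,
                                      double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int frames = samples.length / frameSize;
        int start = -1;     // first frame of the current utterance, -1 if none is open
        int silentRun = 0;  // consecutive silent frames seen inside the open utterance
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                sum += Math.abs(samples[i]);
            }
            boolean speech = sum / frameSize > threshold;
            if (speech) {
                if (start < 0) {
                    start = f;          // lack of silence: an utterance begins
                }
                silentRun = 0;
            } else if (start >= 0 && ++silentRun >= minSilenceFrames) {
                // enough silence after speech: the utterance ends at the last speech frame
                utterances.add(new int[] {start, f - silentRun + 1});
                start = -1;
                silentRun = 0;
            }
        }
        if (start >= 0) {
            utterances.add(new int[] {start, frames - silentRun});
        }
        return utterances;
    }
}
```

A larger minSilenceFrames value tolerates short pauses inside a sentence, at the cost of a slower response after the speaker stops.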

1.4.2 Pronunciation

The word pronunciation usually comes up in connection with learning a language. In any language, pronunciation pertains to the sounds that are produced to make meaning. Some aspects of speech go beyond the individual sounds and make a language unique: phrasing, stress, intonation, timing and rhythm. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what they are trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in US English: “thee” and “thuh”.
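How an engine might store several pronunciations for one word can be sketched as follows. The sketch assumes the CMU dictionary convention of marking alternative pronunciations with a numbered suffix such as THE(2); the class name PronunciationLexicon and the sample phone strings are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative pronunciation lexicon: one word may map to several phone sequences. */
class PronunciationLexicon {
    private final Map<String, List<String[]>> entries = new LinkedHashMap<>();

    /** Parses one dictionary line such as "THE DH AH" or "THE(2) DH IY". */
    void addLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            return; // blank line or a word without phones
        }
        // Strip the "(2)", "(3)", ... marker so that variants share one key.
        String word = parts[0].replaceAll("\\(\\d+\\)$", "");
        String[] phones = Arrays.copyOfRange(parts, 1, parts.length);
        entries.computeIfAbsent(word, k -> new ArrayList<>()).add(phones);
    }

    /** Returns every known pronunciation of the word, or an empty list. */
    List<String[]> lookup(String word) {
        return entries.getOrDefault(word, Collections.emptyList());
    }
}
```

After adding the lines "THE DH AH" and "THE(2) DH IY", a lookup of "THE" returns both phone sequences, so the decoder can match either spoken form.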

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, general-purpose speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the speech recognition (SR) system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word. Entries can be as long as a sentence or two. Smaller vocabularies can have as few as 1 or 2 recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said “yes”, the engine returned “yes”, and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said “nope”, the engine returned “no”, and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
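One common way to quantify accuracy at the word level is the word error rate (WER): the word-level edit distance between the reference text and the recognized text, divided by the number of reference words. The sketch below is a minimal illustration of that measure, not the evaluation code used in this project; the class and method names are hypothetical. Note that it measures exact word matches only, so the “nope”/“no” dialog above would count as a word error even though the task succeeded.

```java
/** Illustrative word error rate (WER) computation based on word-level edit distance. */
class WordAccuracy {

    /** Minimum number of word substitutions, insertions and deletions turning ref into hyp. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) {
            d[i][0] = i; // delete all remaining reference words
        }
        for (int j = 0; j <= hyp.length; j++) {
            d[0][j] = j; // insert all remaining hypothesis words
        }
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** WER = edit distance / number of reference words. */
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return (double) editDistance(ref, hyp) / ref.length;
    }
}
```

Word accuracy is then often reported as 1 minus the WER.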

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system written in ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where, for each unit of the observation sequence, there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore “hidden” from the outside, hence the name Hidden Markov Model.

In other words, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after our voice is recorded it is divided into many frames that we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model we store the reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying an HMM we construct a statistical model for each phone, whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is very complicated, we do not go into much detail here; more can be found in Appendix A.
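Finding the state sequence with the maximum probability is usually done with the Viterbi algorithm. The following is a minimal sketch for a discrete HMM, working in log space to avoid numerical underflow; the class name, the array layout and the toy model used with it are assumptions for illustration and do not reproduce the Sphinx4 decoder.

```java
/** Illustrative Viterbi decoder for a discrete HMM (log-space scores). */
class Viterbi {

    /**
     * @param obs   observation symbol index per time step
     * @param start start[s]   = P(first state is s)
     * @param trans trans[p][s] = P(next state is s | current state is p)
     * @param emit  emit[s][o]  = P(observing o | state s)
     * @return the most probable state sequence
     */
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int steps = obs.length;
        int states = start.length;
        double[][] score = new double[steps][states]; // best log-probability ending in state s at time t
        int[][] back = new int[steps][states];        // predecessor state on that best path

        for (int s = 0; s < states; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < steps; t++) {
            for (int s = 0; s < states; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < states; p++) {
                    double v = score[t - 1][p] + Math.log(trans[p][s]);
                    if (v > best) {
                        best = v;
                        arg = p;
                    }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = arg;
            }
        }

        // Pick the best final state, then follow the back-pointers to the start.
        int last = 0;
        for (int s = 1; s < states; s++) {
            if (score[steps - 1][s] > score[steps - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[steps];
        path[steps - 1] = last;
        for (int t = steps - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

In a real recognizer the observations are continuous feature vectors rather than symbol indices, and the emission probabilities come from the trained acoustic model, but the search for the best path follows the same idea.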

Fig. 1.4.10 Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words could follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by a finite state automaton, sometimes with weights.
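The idea of n-gram statistics can be illustrated with a small bigram model that estimates P(next word | previous word) from counts. This is a sketch under simplifying assumptions (maximum-likelihood estimates, no smoothing or back-off, whitespace tokenization); the class name BigramModel is hypothetical, and real toolkits produce smoothed models.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative bigram language model with maximum-likelihood estimates (no smoothing). */
class BigramModel {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    /** Adds one training sentence, wrapped in <s> ... </s> sentence markers. */
    void train(String sentence) {
        String[] w = ("<s> " + sentence.trim() + " </s>").split("\\s+");
        for (int i = 0; i < w.length; i++) {
            unigrams.merge(w[i], 1, Integer::sum);
            if (i > 0) {
                bigrams.merge(w[i - 1] + " " + w[i], 1, Integer::sum);
            }
        }
    }

    /** P(next | prev) = count(prev next) / count(prev); 0 if prev was never seen. */
    double probability(String prev, String next) {
        int context = unigrams.getOrDefault(prev, 0);
        if (context == 0) {
            return 0.0;
        }
        return (double) bigrams.getOrDefault(prev + " " + next, 0) / context;
    }
}
```

Trained on “i eat rice” and “i eat fish”, the model gives P(eat | i) = 1.0 and P(rice | eat) = 0.5, which is how the decoder learns that “eat” is a far more probable continuation of “i” than any other word in this tiny domain.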

1.5 Overview of the Full System

Figure 1.5 Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to make some important files that are required for training and also for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means a particular topic containing a small amount of data. We have selected fifty sentences for our recognition application, and the files below are created based on these data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (.DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model; put simply, it is the collection of sentences that we want our machine to recognize. For our project we have collected some important sentences according to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record the audio for this corpus in (.wav) or (.sph) format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit depth (bits per sample): 16

• Channel: mono (single channel)

Fig. 2.1.2 Audio File Recording Format

A 16 kHz sample rate was chosen because it provides more accurate high-frequency information than lower rates, and 16 bits per sample divides the amplitude range into 65536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor and saved the sound files in wav format. Each wav file is named using the speaker id and the sentence id. For example, an audio file of our project is

01_01.wav, which stands for

Speaker Id: 01, Sentence Id: 01

When collecting audio from a speaker we also saved personal information about the speaker:

Name, Age, Gender, Audio collection environment

Some other information was noted down as well:

Environmental condition of recording (for example classroom condition, number of students present, sources of noise such as a fan or generator sound, etc.)

Technical details of the devices (PC, microphone)

Date and time of recording
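The recording parameters and the file naming scheme above can be checked programmatically. The sketch below uses the standard javax.sound.sampled.AudioFormat class; the helper class RecordingSpec and its methods are hypothetical, and signed little-endian PCM is an assumption, since the text does not state the byte order.

```java
import javax.sound.sampled.AudioFormat;

/** Illustrative helper for the recording format and the speaker_sentence naming scheme. */
class RecordingSpec {

    /** 16 kHz, 16 bits per sample, mono; signed little-endian PCM is assumed. */
    static final AudioFormat TARGET = new AudioFormat(16000f, 16, 1, true, false);

    /** True when a format has the sample rate, bit depth and channel count used for the corpus. */
    static boolean conforms(AudioFormat format) {
        return format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    /** Number of distinct amplitude values for a given bit depth, e.g. 65536 for 16 bits. */
    static int quantizationLevels(int bitsPerSample) {
        return 1 << bitsPerSample;
    }

    /** Splits a name such as "01_01.wav" into {speakerId, sentenceId}. */
    static String[] parseName(String fileName) {
        String base = fileName.replaceFirst("\\.(wav|sph)$", "");
        String[] ids = base.split("_");
        if (ids.length != 2) {
            throw new IllegalArgumentException("Expected <speaker>_<sentence>: " + fileName);
        }
        return ids;
    }
}
```

For example, parseName("01_01.wav") yields speaker id "01" and sentence id "01", and quantizationLevels(16) gives the 65536 values mentioned above.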

2.1.3 Dictionary file

The dictionary file is the list of words obtained from our corpus file, each followed by its pronunciation, such as AA P NA KE. For this work we need software that produces the pronunciation file from the word list. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. As a result, for our project we have developed a tool, D2P (Dictionary to Pronunciation), which gives us the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some contents of the dictionary file for our project are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (.dic)

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, “AA P N I” contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is (.phone)

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme “SIL” is also included in the phone file

All phonemes in the dic file are present in the phone file, without repetition
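The rules above suggest how such a dic-to-phone program could work. The following is only a sketch of the idea and not the program written for this project: the class name PhoneListBuilder is hypothetical, and it assumes each dictionary line consists of a word followed by its whitespace-separated phones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Illustrative generator of the phone list from dictionary lines. */
class PhoneListBuilder {

    /**
     * Collects every phone used in the dictionary, without repetition and in sorted order,
     * and always includes the silence phone SIL.
     */
    static List<String> buildPhoneList(List<String> dictionaryLines) {
        TreeSet<String> phones = new TreeSet<>();
        phones.add("SIL");
        for (String line : dictionaryLines) {
            String[] parts = line.trim().split("\\s+");
            // parts[0] is the word itself; the remaining fields are its phones
            for (int i = 1; i < parts.length; i++) {
                phones.add(parts[i]);
            }
        }
        return new ArrayList<>(phones);
    }
}
```

Writing the returned list one phone per line, followed by a final blank line, gives a file in the layout described in the notes.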


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text, in the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or more precisely utterances) that the recognition system is expected to be able to handle. The corpus needs to be an ASCII text file, although the newer version also supports Unicode text files, with one sentence per line. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, as the pronunciation dictionary is built as stated above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is (.lm). Some contents of the language model file for our project are:

-2.0719 -0.2626

-2.0719 -0.2861

-2.0719 -0.2973

-1.7709 -0.2936

-2.0719 -0.2626

-1.7709 এ -0.2861

-2.0719 -0.2626

-2.0719 ও -0.2973

…
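Each n-gram line of an ARPA file can be read as a base-10 log probability, the n words, and an optional base-10 log back-off weight. The sketch below parses one such line; the class name ArpaEntry is hypothetical, the n-gram order is passed in explicitly, and header lines such as \data\ or \1-grams: are assumed to be handled elsewhere.

```java
import java.util.Arrays;

/** Illustrative parser for a single n-gram line of an ARPA language model file. */
class ArpaEntry {
    final double logProb;  // log10 probability (first field)
    final String ngram;    // the n words of the entry, separated by single spaces
    final double backoff;  // log10 back-off weight, 0.0 when the field is absent

    ArpaEntry(double logProb, String ngram, double backoff) {
        this.logProb = logProb;
        this.ngram = ngram;
        this.backoff = backoff;
    }

    /** Parses a line holding an n-gram of the given order, e.g. order 1 for unigram lines. */
    static ArpaEntry parse(String line, int order) {
        String[] f = line.trim().split("\\s+");
        if (f.length != order + 1 && f.length != order + 2) {
            throw new IllegalArgumentException("Not a " + order + "-gram line: " + line);
        }
        double logProb = Double.parseDouble(f[0]);
        String ngram = String.join(" ", Arrays.copyOfRange(f, 1, order + 1));
        double backoff = (f.length == order + 2) ? Double.parseDouble(f[order + 1]) : 0.0;
        return new ArpaEntry(logProb, ngram, backoff);
    }

    /** Converts the stored log10 value back to a probability. */
    double probability() {
        return Math.pow(10.0, logProb);
    }
}
```

For a unigram line with the value -1.7709 in the first field, probability() returns about 0.017, that is, the word accounts for roughly 1.7% of the unigram probability mass.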

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used the Linux environment to obtain the DMP file from the language model file in lm format, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project the command is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP


2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file the speaker's dialogue is noted exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named after the project; here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> … </s> (01_01)

<s> … </s> (01_02)

<s> … </s> (01_03)

<s> … </s> (01_04)

…

Note:

File format is (.transcription)

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the wav or sph extension. There are two fileids files, one for training and the other for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_trainsanjoysanjoy101_01

sbsbsr_ train sanjoysanjoy101_02

sbsbsr_ train sanjoysanjoy101_03

…
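Because the transcription and fileids files must list the same utterances in the same order, it can help to generate both entries from the same speaker id and sentence id. The following is a minimal sketch under that assumption; the class TrainingLists, its method names and the directory argument are hypothetical and do not describe the exact folder layout used in this project.

```java
/** Illustrative generator of matching transcription and fileids entries. */
class TrainingLists {

    /** Builds a transcription line: <s> sentence </s> (speaker_sentence). */
    static String transcriptionLine(String sentence, String speakerId, String sentenceId) {
        return "<s> " + sentence.trim() + " </s> (" + speakerId + "_" + sentenceId + ")";
    }

    /** Builds a fileids line: the audio path relative to the wav folder, without extension. */
    static String fileIdLine(String directory, String speakerId, String sentenceId) {
        return directory + "/" + speakerId + "_" + sentenceId;
    }
}
```

For speaker 01 and sentence 02, transcriptionLine ends with (01_02) and fileIdLine ends with 01_02, so the two files stay aligned line by line.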

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL

<sil> SIL

</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes “beginning of speech”

</s> symbolizes “end of speech”

<sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages, sphinxbase and sphinxtrain, which we obtained from the CMU Sphinx web site. Before installing these two packages we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two packages, SphinxBase and SphinxTrain. After downloading them we decompressed them into a folder in Linux. Any folder can be used; we used the Linux root folder. After decompressing the packages we installed them with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which SphinxTrain writes the trained files (the acoustic model). First we entered the root directory where the installed sphinxbase and sphinxtrain folders are placed and created a folder there named "sbsbsr". We then opened a terminal, changed into the sbsbsr folder, and created a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Next we copied the files created in the data preparation part. The dic, filler, phone, transcript, fileids, lm and lm.DMP files go into the "etc" folder, and the collected audio goes into the "wav" folder. We then changed some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed are listed below.

Parameter                  Value Before      Value After
$CFG_WAVFILE_EXTENSION     sph               wav
$CFG_WAVFILE_TYPE          nist/mswav/raw    raw
$CFG_FEATURE               2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES   8                 1
$CFG_STATESPERHMM          6                 3
$CFG_N_TIED_STATES         100               100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert the collected raw speech audio into mfc feature files. To do this we opened the project directory in the Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all the wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to run the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The trained model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download the tool from the CMU Sphinx site and extract it in the root directory where sphinxbase and sphinxtrain are installed. After extracting, we install it with the following commands in the Linux terminal:

./configure
make

After installing the software we go into our project folder and run a command from the terminal to create the folder structure for testing:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx recognizes from audio file input. We also copy the test fileids and transcript files into the "etc" folder. To decode (test) our model from audio with Pocketsphinx we first need feature files for the audio files. We create them with the following command:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main decoding command:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio with the help of our trained acoustic model. The test result is written to the "result" folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Testing with the Sphinx4 decoder requires two pieces of software: Sphinx4 and the Eclipse IDE. After installing Eclipse on Windows we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file. Then we created a new Java project in Eclipse and wrote a Java file with the help of the demo application from CMU Sphinx. After that we added the following files from our training project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

.dic

.lm.DMP

.filler

We created a config.xml file in the Eclipse project to configure the recognizer and to tell it where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We also need to add four Java jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building, we run the project and can test with live voice input from a microphone. The result for our model trained on ten sentences is:

Figure 2.2.5.2 Testing with Sphinx4

We can build various applications in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments such as an open room, a closed room, the university lab room and the common room. We completed our testing using audio inputs from six test speakers. For live testing we used a microphone in the different environments [9]. We ran our tests with two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All five experiments used the trained data set.

Experiment     Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
Experiment 01  5         4     1       1025         975      98      95.12                9.56       90.44
Experiment 02  3         3     0       615          591      39      96.10                6.34       93.66
Experiment 03  3         3     0       615          601      22      97.72                3.58       96.42
Experiment 04  5         2     3       1025         938      184     91.51                17.95      82.05
Experiment 05  3         0     3       615          545      125     88.62                20.33      79.67

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
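The percentages in Table 3.2.1 are consistent with the following arithmetic: percent correct = correct words / total words, error = errors / total words, and accuracy = 100% - error. For Experiment 01 this gives 975/1025 = 95.12%, 98/1025 = 9.56% and 100 - 9.56 = 90.44%. The error count can exceed the difference between total and correct words, presumably because inserted words are also counted as errors; that reading is an assumption, as are the class and method names in the Java sketch below.

```java
import java.util.Locale;

/** Hypothetical helper that reproduces the arithmetic behind Table 3.2.1. */
final class RecognitionScore {

    /** Share of reference words that were recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    /** Errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy as used in the table: 100% minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    /** Arithmetic mean, used here for the average accuracy over experiments. */
    static double average(double... values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Experiment 01 of Table 3.2.1: 1025 words, 975 correct, 98 errors.
        System.out.println(String.format(Locale.ROOT, "correct  %.2f", percentCorrect(1025, 975)));
        System.out.println(String.format(Locale.ROOT, "error    %.2f", errorRate(1025, 98)));
        System.out.println(String.format(Locale.ROOT, "accuracy %.2f", accuracy(1025, 98)));
    }
}
```

Averaging the five accuracy values of the table (90.44, 93.66, 96.42, 82.05 and 79.67) with average gives 88.448, which matches the reported average of 88.44% when the last digit is dropped.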

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device was a microphone in every experiment.

Experiment     User Type  Speakers  Environment        Speaker Type  Words  Correct Words  Errors  Percent Correct (%)  Errors (%)  Accuracy (%)
Experiment 01  Trained    2         Closed Room        Male          156    154            2       98                   2           98
Experiment 02  Untrained  3         Lab Room           Male          130    120            10      90                   10          90
Experiment 03  Untrained  3         University Campus  Male          140    122            18      82                   18          82
Experiment 04  Untrained  2         University Campus  Female        108    95             13      87                   13          87
Experiment 05  Trained    3         University floor   Female        126    115            11      89                   11          89
Experiment 06  Trained    2         Closed Room        Male          128    127            1       99                   1           99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All five experiments used the trained data set.

Experiment     Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct (%)  Error  Accuracy (%)
Experiment 01  3         3     0       210          194      16      85.71                15.71  85.71
Experiment 02  3         2     1       210          188      22      77.14                22     77.14
Experiment 03  3         2     1       210          182      28      72.38                28     72.38
Experiment 04  3         2     1       210          194      16      84.44                16     84.44
Experiment 05  3         1     2       210          184      26      74.76                26     74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a file called dic, which holds all training words and their pronunciations. While training acoustic models experimentally we had to write these pronunciations by hand several times. Over time we thought about a tool that generates the pronunciation (phonetic translation) automatically, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and not every letter has a defined phoneme. For that reason we defined phonemes ourselves for some consonants and conjunct letters. After building this database we wrote the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2 Dictionary files with phonetic translation
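The lookup idea behind the phonetic translator can be sketched in Java as follows. This is a simplified illustration, not the project's implementation: it replaces the database with a small in-memory map holding a few letter-to-phoneme pairs taken from the Unicode to IPA chart in the appendix, the names PhoneticSketch and translate are hypothetical, and it ignores vowels, vowel signs and conjunct letters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: maps single Bengali consonants to phoneme codes. */
final class PhoneticSketch {

    // Code point -> phoneme code, copied from the appendix chart (subset only).
    private static final Map<Integer, String> PHONES = new HashMap<>();
    static {
        PHONES.put(0x0995, "K"); // ক
        PHONES.put(0x09A8, "N"); // ন
        PHONES.put(0x09AE, "M"); // ম
        PHONES.put(0x09B0, "R"); // র
        PHONES.put(0x09B2, "L"); // ল
        PHONES.put(0x09B8, "S"); // স
    }

    /** Returns the space-separated phoneme codes for the letters of a word. */
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); ) {
            int codePoint = word.codePointAt(i);
            String phone = PHONES.get(codePoint);
            if (phone == null) {
                throw new IllegalArgumentException(
                        "No phoneme defined for U+" + Integer.toHexString(codePoint).toUpperCase());
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(phone);
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0995 U+09AE is the word "কম"; each letter is looked up separately.
        System.out.println(translate("\u0995\u09AE"));
    }
}
```

A dictionary line would then consist of the word followed by its phoneme codes. Handling conjunct letters would need a longest-match lookup over several code points, which this sketch leaves out.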

4.1.3 Training File Creator

Training the acoustic model also requires the fileids and transcript files. These two files contain the training audio file paths and their corresponding sentences. Before we wrote this program we had to create these two files manually. With our software we can now generate these two large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically if we give the software the root folder name of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
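The two kinds of output line can be illustrated with a short Java sketch. It is a simplified stand-in for the training file creator, not its actual code: the class and method names are hypothetical, and it assumes that a fileids entry is the relative audio path without its extension and that a transcript line has the form "<s> sentence </s> (utterance_id)".

```java
import java.util.Locale;

/** Hypothetical sketch of the lines written to the fileids and transcript files. */
final class TrainingLineSketch {

    /** Turns a relative audio path into a fileids entry by dropping .wav or .sph. */
    static String toFileId(String relativeAudioPath) {
        String path = relativeAudioPath.replace('\\', '/'); // Sphinx control files use forward slashes
        String lower = path.toLowerCase(Locale.ROOT);
        for (String extension : new String[] {".wav", ".sph"}) {
            if (lower.endsWith(extension)) {
                return path.substring(0, path.length() - extension.length());
            }
        }
        return path;
    }

    /** Builds one transcript line for a sentence and the audio file that contains it. */
    static String transcriptLine(String sentence, String audioFileName) {
        return "<s> " + sentence.trim() + " </s> (" + toFileId(audioFileName) + ")";
    }

    public static void main(String[] args) {
        System.out.println(toFileId("sbsbsr_train/sanjoy/sanjoy1/01_01.wav"));
        System.out.println(transcriptLine("sentence text", "01_01.wav"));
    }
}
```

Stripping the extension with endsWith rather than a regular expression avoids treating the dot as a wildcard, so a name that merely contains the letters "wav" is left unchanged.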

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.
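One way to structure such a command application is a table that maps each recognized phrase to an action. The Java sketch below shows only this dispatch step under stated assumptions: the class VoiceCommandSketch, its methods and the phrase used in main are hypothetical, and the recognizer that produces the text is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: dispatches recognized phrases to registered actions. */
final class VoiceCommandSketch {

    private final Map<String, Runnable> actions = new HashMap<>();

    /** Associates a spoken phrase with the action to run when it is recognized. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action for the recognized text; returns false for unknown phrases. */
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        VoiceCommandSketch commands = new VoiceCommandSketch();
        // Placeholder phrase and action; a real application would register Bengali words.
        commands.register("left click", () -> System.out.println("left click requested"));
        commands.dispatch("left click");
    }
}
```

Keeping the phrase table separate from the actions means new commands can be added without touching the recognition code.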

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific tasks. The system was built on a small data set because of time constraints. We selected the domain of university admission information for newly arriving students, with 185 sentences. It was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed for more accurate results. We also faced problems creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, since different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and it sometimes fails when the speaker talks slowly or mispronounces words. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate phoneme set.

Future Works

We have implemented Bengali speech recognition for a small data size. In the future we will increase the data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for making the dictionary, fileids and transcription files. We want to make a user-friendly stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depends on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer the user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to it. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use this audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages. For our mother tongue, Bengali, work in this field is still at a beginner level. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and audio from many speakers with low noise. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we trained the recognizer on only 50 of them.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Names found at each level of the audio tree: speaker folders,
    // session folders and audio files.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k only visits the newly listed files.
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                        root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Strip only a trailing ".wav"; an unescaped dot would match any character.
                    targetPath = targetPath.replaceAll("\\.wav$", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists one directory level: folders go to level 1 or 2, files go to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

………………………………………………………………

ARRAY

………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "name10", "name2" by their numeric suffix.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep only the digits
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Folder name without its number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences read from the corpus file, one per line.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // Strip only a trailing ".wav"; an unescaped dot would match any character.
                wavRemove = wavRemove.replaceAll("\\.wav$", "");
                String leadTrailspaceRemoved =
                    inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

……………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one "word pronunciation" line per input word
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    // load the JDBC driver once
    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if ch is a Bengali consonant (rows 13 to 48 of banglatab)
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48)
                    isBBorno = true;
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

……………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is missing from the listing; the hasanta
    // (U+09CD), which joins Bengali conjuncts, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions before, at and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i)
                            break;
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                        - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // look up the pronunciation of every search string in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

……………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Simulates mouse and keyboard input for each recognized voice command
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    // Maps the recognized text to a desktop action
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }


1.1 Introduction

Automatic Speech Recognition (ASR), in terms of machinery, is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. It is a broad term, which means it can recognize almost anybody's speech. It is also known as computer speech recognition, which means understanding voice by the computer and performing any required task. Put simply, speech recognition is the process of converting spoken input to text, and it is thus sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the cursor with a voice command such as "mouse up". We can control application functions such as opening a file menu, create documents such as letters or reports, or start a media player by saying "Music". For this reason, many scientists and researchers are working on speech recognition.

Most of the languages in the world have speech recognizers of their own, but our mother tongue, Bengali, is not yet enriched with one. A few small research works have been carried out on Bengali speech recognition, but they have not produced a great outcome. Implementing a continuous speech recognizer for Bengali is our main goal throughout the project work. However, developing a full-blown continuous speech recognizer is a huge task within a short span of time. As a result, we have selected a domain-based continuous speech recognizer, which covers a conversation on the university admission process.

Throughout the whole period of work we tried to learn about different tools, and we chose CMU Sphinx4 as the speech recognition API because it is open source software and it has high accuracy. Many high-quality and widely used commercial packages are available for this work, but they are costly and proprietary, whereas Sphinx4 is distributed under a permissive Berkeley Software Distribution (BSD) style license. The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent [9].

1.2 History of Speech Recognition

While AT&T Bell Laboratories developed a primitive device that could recognize speech in the 1940s, researchers knew that the widespread use of speech recognition would depend on the ability to accurately and consistently perceive subtle and complex verbal input. Thus, in the 1960s, researchers turned their focus towards a series of smaller goals that would aid in developing the larger speech recognition system. As a first step, developers created a device that would use discrete speech: verbal stimuli punctuated by small pauses. In the 1970s, work began on continuous speech recognition, which does not require the user to pause between words. This technology became functional during the 1980s and is still being developed and refined today. Speech recognition systems have become so advanced and mainstream that business and health care professionals are turning to speech recognition solutions for everything from providing telephone support to writing medical reports. Technological advances have made speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

According to figures provided by industry, speech recognition is used in a wide range of applications, satisfying the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition will push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, look for more and more companies to move toward speech recognition, and watch the industry take its place as a leader in the technology sector [1].

Year   Event
1936   AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970   The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971   DARPA established the Speech Understanding Research (SUR) program.
1982   Dragon Systems was founded.
1984   SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995   Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997   Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998   Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000   Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003   ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2: History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes by describing what types of utterances they have the ability to recognize. These classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). "Isolated utterance" might be a better name for this class. Put simply, isolated words are single words such as "me", "you", "go", etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterances) are similar to isolated words, but allow separate utterances to be run together with a minimal pause between them, such as "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, it can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses can't ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.

1.3.7 Overview of Speech Recognition System

Fig 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word, or it can be a series of words. For example, "Word", "Microsoft Word", or "I'd like to run Microsoft Word" are all examples of possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is listening for speech input. When the engine detects audio input, in other words a lack of silence, the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
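The start and end rule described above can be sketched in a few lines of Java. This is a minimal illustration, not the endpointer that Sphinx4 uses: the class name `UtteranceDetector`, the per-frame energy array, the fixed energy threshold and the `minSilenceFrames` parameter are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a sequence of frame energies into utterances.
class UtteranceDetector {

    /** Returns {startFrame, endFrameExclusive} pairs, one per detected utterance. */
    static List<int[]> detect(double[] frameEnergy, double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1;     // first speech frame of the current utterance, -1 while in silence
        int silenceRun = 0; // consecutive silent frames since the last speech frame

        for (int t = 0; t < frameEnergy.length; t++) {
            boolean speech = frameEnergy[t] > threshold;
            if (speech) {
                if (start < 0) start = t; // lack of silence: an utterance begins
                silenceRun = 0;
            } else if (start >= 0) {
                silenceRun++;
                if (silenceRun >= minSilenceFrames) {
                    // enough trailing silence: the utterance ended at the last speech frame
                    utterances.add(new int[] {start, t - silenceRun + 1});
                    start = -1;
                    silenceRun = 0;
                }
            }
        }
        if (start >= 0) {
            // audio ended while an utterance was still open
            utterances.add(new int[] {start, frameEnergy.length - silenceRun});
        }
        return utterances;
    }
}
```

A short pause inside a phrase (fewer than `minSilenceFrames` silent frames) does not split the utterance, which is why the amount of silence required matters as much as the threshold itself.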

1.4.2 Pronunciation

You have heard the word pronunciation when it pertains to learning any language. What is pronunciation, and what are some of the fundamental aspects of this important part of learning English? In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech that go beyond the individual sounds and make the language unique: phrasing, stress, intonation, timing and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language, it would be easy to avoid speaking in public, but that is not the best choice because you do not want to experience social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what you are trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".
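As a rough sketch of how a recognizer can hold several pronunciations for one word, the Java class below parses dictionary lines in the style used by the CMU pronouncing dictionary, where an alternate pronunciation is written with a numbered suffix such as `the(2)`. The class name `PronunciationDictionary` and its methods are hypothetical names for this illustration, and the phone strings are example entries only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory pronunciation dictionary: word -> list of phone strings.
class PronunciationDictionary {
    private final Map<String, List<String>> entries = new LinkedHashMap<>();

    /** Adds one line of the form "word PH1 PH2 ..." or "word(2) PH1 PH2 ...". */
    void addLine(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) return; // skip blank or malformed lines
        // strip the "(n)" marker so all variants share one key
        String word = parts[0].replaceAll("\\(\\d+\\)$", "");
        entries.computeIfAbsent(word, w -> new ArrayList<>()).add(parts[1]);
    }

    /** Returns every known pronunciation of the word, or an empty list. */
    List<String> pronunciations(String word) {
        return entries.getOrDefault(word, Collections.emptyList());
    }
}
```

For instance, after `addLine("the DH AH")` and `addLine("the(2) DH IY")`, a call to `pronunciations("the")` returns both phone strings, corresponding to "thuh" and "thee".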

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry doesn't have to be a single word. Entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it recognizes utterances. The performance of a speech recognition system is measurable. Perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes", and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no", yet the NO action was executed? Should that be considered a successful dialog? The answer to that question is yes, because the desired result was achieved.
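One common way to calculate accuracy at the word level, offered here as an illustration and not necessarily the measure used later in this work, is the word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. The class name `WordAccuracy` and its methods are hypothetical.

```java
// Hypothetical helper computing word error rate with a Levenshtein alignment.
class WordAccuracy {

    /** Minimum number of word substitutions, deletions and insertions. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // delete all reference words
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insert all hypothesis words
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** WER = edit distance / number of reference words (reference must be non-empty). */
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return (double) editDistance(ref, hyp) / ref.length;
    }
}
```

Word accuracy is then `1 - wordErrorRate(reference, hypothesis)`. Note that this string-level measure would count "nope" recognized as "no" as an error, even though, as argued above, the dialog succeeded; task-level success has to be measured separately.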

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.


1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where, for each observation in the sequence, there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

Stated another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable ones. In the speech recognition process, after the voice is recorded it is divided into many frames, which we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model, we store reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying HMMs, we construct a statistical model for each phone, whose states are assigned probabilities by comparison with the reference values. The probability of each state depends on the state itself and on the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is very involved, we do not go into detail here; more can be found in Appendix A.
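The search for the most probable state sequence is commonly carried out with the Viterbi algorithm. The following is a minimal Python sketch on a toy two-state model. The state names, observation symbols and all probabilities are invented for illustration; they are not taken from our trained models, and a real recognizer works with continuous acoustic features rather than discrete symbols.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, best_probability) for a discrete HMM."""
    # best[t][s] = probability of the most likely path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # choose the predecessor state that maximizes the path probability
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # trace back from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, best[-1][last]


# Toy model: every number below is hypothetical.
STATES = ["s1", "s2"]
START_P = {"s1": 0.6, "s2": 0.4}
TRANS_P = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
EMIT_P = {
    "s1": {"a": 0.5, "b": 0.4, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.3, "c": 0.6},
}

if __name__ == "__main__":
    path, prob = viterbi(["a", "b", "c"], STATES, START_P, TRANS_P, EMIT_P)
    print(path, prob)
```

For the observation sequence a, b, c this toy model selects the path s1, s1, s2. Practical decoders work with log probabilities to avoid numerical underflow on long utterances.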


Fig 1.4.10 Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search: it defines which words could follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by a finite state automaton, sometimes with weights.
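As an illustration of the statistics an n-gram model holds, the following Python sketch estimates bigram probabilities from a handful of sentences. It is a simplified example under stated assumptions: the function names are hypothetical, the estimate is a plain maximum-likelihood ratio, and there is no smoothing or back-off, which practical language models do use.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])  # every word that can start a bigram
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


if __name__ == "__main__":
    uni, bi = train_bigram(["open file", "open door", "close door"])
    print(bigram_prob(uni, bi, "open", "door"))
```

A word pair that never occurs in the training sentences gets probability zero here, which is how such a model strips improbable words from the search.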


1.5 Overview of the Full System

Figure 1.5 Overview of the Full System Model


Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment


2.1 Data Preparation

We have to prepare several files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application, where domain-based means a particular topic covering a small amount of data. We selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

• Corpus
• Audio files
• Dictionary file
• Phone file
• Language Model file (.lm format)
• Language Model file (.DMP format)
• Transcription file
• Fileids file
• Filler file

2.1.1 Corpus

The corpus is simply a list of sentences used to train the language model; in other words, it is the collection of sentences that we want our machine to recognize. For our project we collected sentences according to our domain. Some sentences from our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record the audio for it in .wav or .sph format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)
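A recording can be checked against these three parameters before it is added to the training set. The sketch below uses Python's standard wave module; the function name and its default arguments are illustrative assumptions, not part of our tool chain.

```python
import wave


def check_wav_format(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV file matches the expected recording parameters."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_width_bytes  # 2 bytes = 16 bits
            and wav.getnchannels() == channels
        )
```

Running such a check over every file in the audio folder catches recordings made with the wrong sampling rate or in stereo before training starts.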


Fig 2.1.2 Audio File Recording Format

For this work a 16 kHz sample rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample divides each sample's amplitude range into 65536 possible values. After recording, the audio was split into one file per sentence manually using recording software; for our project we used the WavePad sound editor. Sound files are saved in .wav format, and each file is named using the speaker ID and sentence ID. For example, an audio file of our project named 01_01.wav stands for:

Speaker ID: 01, Sentence ID: 01
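The naming convention can be decoded mechanically. The following Python sketch splits a file name into its two IDs; the function name is a hypothetical helper introduced only for this example.

```python
import os


def parse_audio_name(filename):
    """Split a name like '01_01.wav' into (speaker_id, sentence_id)."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    speaker_id, sentence_id = stem.split("_", 1)
    return speaker_id, sentence_id
```

The IDs are kept as strings so that leading zeros, as in 01, are preserved.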

When we collected audio from a speaker, we saved personal information about the speaker, such as:

Name, age, gender, and the environment in which the audio was collected

Some other information was also noted down, such as:

• Environmental condition of the recording (for example classroom condition, number of students present, sources of noise such as a fan or generator sound, etc.)
• Technical details of the device (PC, microphone)
• Date and time of recording

2.1.3 Dictionary file

The dictionary file is simply the list of words taken from our corpus file, each followed by its pronunciation, such as "AA P NA KE". For this work we need software that produces the pronunciation file from the word list. A grapheme-to-phoneme (G2P) tool has been developed by CRBLP, but it gives pronunciations in the Unicode IPA system, whereas we need ASCII format. As a result, for our project we developed a tool, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The dictionary file has the extension .dic; ours is named sbsbsr.dic, and some of its contents are:

AA CH E
AA P N AA K E
AA P N I
AA M I
I CCHA U K
…

Note:

• All phonemes are in capital letters, such as AA P N I
• File extension is .dic
• File encoding is UTF-8 without BOM
• A word cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words; for example, "AA P N I" contains four phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.
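The conversion from dictionary to phone list is simple enough to sketch in a few lines of Python. This is not the program written for the project, only an illustration of the idea; the function names are hypothetical, and each dictionary line is assumed to have the form "WORD PH1 PH2 ...".

```python
def phones_from_dictionary(dic_lines):
    """Collect the unique phones used in dictionary lines of the form 'WORD PH1 PH2 ...'."""
    phones = {"SIL"}  # the silence phone is always included
    for line in dic_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones.update(fields[1:])  # the first field is the word itself
    return sorted(phones)


def write_phone_file(phones, path):
    """Write one phone per line as UTF-8 without BOM, ending with a newline."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(phones) + "\n")
```

Using a set guarantees that no phone is repeated, and sorting makes the output stable from run to run.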

For our project the file is named sbsbsr.phone. Some of its contents are:

A
AA
B
BH
C
CCHA
CH
…

Note:

• All phones are in capital letters
• File extension is .phone
• File encoding is UTF-8 without BOM
• A phone cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)
• The silence phoneme "SIL" is also included in the phone file
• All phonemes in the .dic file are present in the phone file without repetition


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text, in the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line, although newer versions also support Unicode text. Upload this file and click the compile button; this gives a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, as the pronunciation dictionary should be built as stated above. The tool works best for small domains. For our project the file is named sbsbsr.lm, and some of its contents are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
…

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used a Linux environment to produce the DMP file from the .lm file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project the command is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP


2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in each audio file. In this file, the speaker's dialogue is written down exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project; here the project name is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
…

Note:

• File extension is .transcription
• File encoding is UTF-8 without BOM
• A sentence cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database; it is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"
</s> symbolizes "end of speech"
<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: SphinxBase and SphinxTrain. We obtained them from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies on the Ubuntu distribution of Linux, such as Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted, setting up the trainer requires two packages, SphinxBase and SphinxTrain. After downloading them, we decompressed them into a folder in Linux; it can be any folder, and we used the Linux root folder. After decompressing the packages we can install them with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

We have now created the system environment for training. Next we create a project folder in which SphinxTrain will write the trained files, that is, the acoustic model. First we go to the root directory where the installed sphinxbase and sphinxtrain folders are placed, and create a folder there; we named it "sbsbsr". After creating the folder we open a terminal, change into the sbsbsr folder, and create a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Now it is time to copy the files created in the data preparation part. We copy the .dic, .filler, .phone, .transcription, .fileids, .lm and .lm.DMP files into the "etc" folder, and our collected audio into the "wav" folder. We then change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed are listed below.

Parameter                    Value Before      Value After
$CFG_WAVFILE_EXTENSION       sph               wav
$CFG_WAVFILE_TYPE            nist/mswav/raw    raw
$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                 1
$CFG_STATESPERHMM            6                 3
$CFG_N_TIED_STATES           100               100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is complete.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio into .mfc feature files. To do so we opened the project directory in the Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. After downloading it, we go to the root directory where we installed sphinxbase and sphinxtrain, and extract the downloaded Pocketsphinx archive there. After extracting it we install the software with the following commands in the Linux terminal:

./configure
make

After installing the software we go into our project folder and execute a command from the terminal to set up the folder structure for testing:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx recognizes from audio files as input. We also copy the test fileids and transcription files into the "etc" folder. To decode, that is, to test our model on audio with Pocketsphinx, we first need feature files for the audio files. We can make these with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main decoding command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio with the help of our trained acoustic model. The results of testing are written to the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on the Windows machine. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the following files from our training project to the Eclipse project [11]:

• the "sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
• .dic
• .lm.DMP
• .filler

We create a config.xml file in the Eclipse project to pass the configuration to the recognizer and to say where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project we run it and can test with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2 Testing with Sphinx4

We can build various applications with the help of Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION


3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and a common room. We completed our testing using audio inputs from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment  Data set          Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error    Accuracy
01          Trained data set  5         4     1       1025         975      98      95.12%           9.56%    90.44%
02          Trained data set  3         3     0       615          591      39      96.10%           6.34%    93.66%
03          Trained data set  3         3     0       615          601      22      97.72%           3.58%    96.42%
04          Trained data set  5         2     3       1025         938      184     91.51%           17.95%   82.05%
05          Trained data set  3         0     3       615          545      125     88.62%           20.33%   79.67%

Table 3.2.1 Experimental details with Results for Pocket Sphinx
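The three percentages reported for Pocket Sphinx are consistent with the relations sketched below in Python: percent correct is the share of correctly recognized words, the error rate is the number of errors over the total words, and accuracy is 100 minus the error rate. The function name is hypothetical. Because correct words plus errors exceeds the total in every row, the error count appears to include inserted words as well as substitutions and deletions; that reading is an assumption drawn from the numbers, not something stated by the decoder output shown here.

```python
def recognition_scores(total_words, correct_words, errors):
    """Return (percent_correct, error_rate, accuracy), all in percent."""
    percent_correct = 100.0 * correct_words / total_words
    error_rate = 100.0 * errors / total_words
    accuracy = 100.0 - error_rate
    return percent_correct, error_rate, accuracy


if __name__ == "__main__":
    # Experiment 01: 1025 words, 975 correct, 98 errors
    pc, er, acc = recognition_scores(1025, 975, 98)
    print(round(pc, 2), round(er, 2), round(acc, 2))
```

With the Experiment 01 figures this reproduces the reported 95.12, 9.56 and 90.44.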


Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment  User type  Speakers  Environment        Speaker type  Input device  Words  Correct  Errors  Percent correct  Error  Accuracy
01          Trained    2         Closed Room        Male          Microphone    156    154      2       98%              2%     98%
02          Untrained  3         Lab Room           Male          Microphone    130    120      10      90%              10%    90%
03          Untrained  3         University Campus  Male          Microphone    140    122      18      82%              18%    82%
04          Untrained  2         University Campus  Female        Microphone    108    95       13      87%              13%    87%
05          Trained    3         University floor   Female        Microphone    126    115      11      89%              11%    89%
06          Trained    2         Closed Room        Male          Microphone    128    127      1       99%              1%     99%

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live


Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%
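The average is the arithmetic mean of the six per-experiment accuracies for live microphone input. A short Python sketch of the calculation, with a hypothetical function name:

```python
def average_accuracy(accuracies):
    """Arithmetic mean of per-experiment accuracies (in percent)."""
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Accuracies of Experiments 01 to 06 with live microphone input
    print(round(average_accuracy([98, 90, 82, 87, 89, 99]), 2))
```

This is an unweighted mean: each experiment counts equally, regardless of how many words it contained.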


3.3.2 Input Type: Audio

Experiment  Data set          Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error    Accuracy
01          Trained data set  3         3     0       210          194      16      85.71%           15.71%   85.71%
02          Trained data set  3         2     1       210          188      22      77.14%           22%      77.14%
03          Trained data set  3         2     1       210          182      28      72.38%           28%      72.38%
04          Trained data set  3         2     1       210          194      16      84.44%           16%      84.44%
05          Trained data set  3         1     2       210          184      26      74.76%           26%      74.76%

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio


Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%


Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application


4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

For training the acoustic model we need a file with the extension .dic, in which all training words and their pronunciations are placed. We wrote these pronunciations by hand several times while training acoustic models experimentally. Over time we thought about an automatic pronunciation, or phonetic translation, maker, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various theses and papers [1] [9] [10] to make this database. However, various sources define phonemes in different ways, and not every letter's phoneme is defined. That is why we defined some phonemes ourselves for some consonants and conjunct letters. After making this database we built the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.
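The core of such a translator is a table lookup from letters to phone symbols. The Python sketch below illustrates the idea with a handful of consonant pairs copied from the chart in the appendix. It is a deliberately simplified illustration, not the software we built: the function names are hypothetical, and vowels, vowel signs, conjunct letters and the inherent vowel are all ignored.

```python
# A few letter-to-phone pairs copied from the consonant chart in the appendix.
LETTER_TO_PHONE = {
    "ক": "K",
    "খ": "KH",
    "গ": "G",
    "ম": "M",
    "ল": "L",
    "ব": "B",
    "র": "R",
}


def to_phones(word, table=LETTER_TO_PHONE):
    """Map each known letter of a word to its phone; unknown characters are skipped."""
    return [table[ch] for ch in word if ch in table]


def dictionary_line(word, table=LETTER_TO_PHONE):
    """Format one dictionary entry: the word followed by its space-separated phones."""
    return word + " " + " ".join(to_phones(word, table))
```

A complete translator needs the full letter table and has to match conjuncts before single letters, since a conjunct maps to its own phone sequence.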

Fig 4.1.2 Dictionary files with phonetic translation

4.1.3 Training File Creator

For training the acoustic model we also need the fileids and transcription files. These two files contain the training audio file paths and their corresponding sentences. Before creating this program we had to make these two files manually, but with our software we can make these two large files automatically within moments. For example, if we have 8000 audio files, then we have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create these two files automatically by giving the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
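The pairing of audio files with sentences can be sketched as follows in Python. This is an illustration of the idea rather than our program: the function name is hypothetical, the file IDs are assumed to be given already without extension, and the sentences are assumed to be listed in the same order as the audio files.

```python
def build_training_files(file_ids, sentences):
    """Return (fileids_lines, transcription_lines) for parallel lists of IDs and sentences."""
    if len(file_ids) != len(sentences):
        raise ValueError("each audio file needs exactly one sentence")
    fileids_lines = list(file_ids)
    transcription_lines = []
    for file_id, sentence in zip(file_ids, sentences):
        utterance_id = file_id.split("/")[-1]  # drop any folder prefix
        transcription_lines.append(f"<s> {sentence.strip()} </s> ({utterance_id})")
    return fileids_lines, transcription_lines
```

Each transcription line wraps the sentence in the <s> and </s> tags and ends with the utterance ID in parentheses, matching the transcription file format described in the data preparation chapter.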

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks, such as opening My Computer, left click and right click.


Chapter 5

LIMITATION & FUTURE WORK

LIMITATION

FUTURE WORK


Limitation

In our project we have limitations in some specific tasks. Because of time constraints, the system was built on a small amount of data. We selected a domain, university admission information for newcomer students, with 185 sentences. But it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and the exact number of phonemes in Bengali is not known; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and sometimes fails when the speaker speaks slowly or has pronunciation problems. We created a program to generate the pronunciation of a Bengali word automatically, but it does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciations, we can expect good output from this software once an accurate phoneme set is available.

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for making the dictionary, fileids and transcription files. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on a developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with the corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which a user asks the software a question and the software gives the predefined answer to it. Making a good recognition application requires a lot of audio, so we want to develop a website for collecting audio from people; using this audio we will enrich our recognition application.


Chapter 6

CONCLUSION & REFERENCES

CONCLUSION

REFERENCES


Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. So we tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have tried to discuss our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. Making a good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ)  IPA
ক  K
খ  KH
গ  G
ঘ  GH
ঙ  NG
চ  C
ছ  CH
জ  J
ঝ  JH
ঞ  NIO
ট  T
ঠ  TH
ড  D
ঢ  DH
ণ  N
ত  TA
থ  TO
দ  DA
ধ  DO
ন  N
প  P
ফ  PH
ব  B
ভ  BH
ম  M
য  Z
র  R
ল  L
শ  SH
ষ  SH
স  S
হ  H
ড়  RA
ঢ়  RH
য়  Y
ৎ  T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার)  IPA
শূন্য  0
এক  1
দুই  2
তিন  3
চার  4
পাঁচ  5
ছয়  6
সাত  7
আট  8
নয়  9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we recognized only 50 of them.

No  Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps every file seen so far; k marks the first entry not yet written.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // replace() treats ".wav" literally; replaceAll() would read "." as a regex wildcard.
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Level 1 and 2 collect sub-directory names; any other level collects file names.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "name1", "name10", "name2" by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; also avoids get(0) on an empty list
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep only the digits
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The non-numeric prefix is taken from the first name and reused for every entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
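The sorting step above orders folder names by the number they contain, so that a name ending in 10 does not come before one ending in 2. The same idea can be sketched more compactly with a comparator. This is only an illustration: the class name NumericSuffixSort, its method names, and the sample names "s1", "s2", "s10" are assumptions and are not part of the project code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NumericSuffixSort {

    // Returns the digits contained in a name as an int; names without digits get -1.
    static int numberOf(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the numeric part of each name.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberOf));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical folder names, used only for illustration.
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

Unlike the listing above, this sketch keeps each original name intact instead of rebuilding it from the prefix of the first entry, so it also works when the folders do not share one prefix.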

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Corpus sentences, one per line, in file order.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file name in data.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // Start again at the first sentence once the corpus is exhausted.
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
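Each transcript line pairs one corpus sentence with the name of its recording, without the .wav extension. The sketch below isolates that formatting step so the expected line shape is easy to see. The class name TranscriptLine and the file name utt1.wav are hypothetical; the tag spelling follows the listing above, whereas CMU Sphinx tutorials usually write the sentence markers in lowercase.

```java
class TranscriptLine {

    // Builds one transcript line: sentence markers around the text, then the file id in parentheses.
    static String format(String sentence, String wavFileName) {
        String fileId = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - ".wav".length())
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    public static void main(String[] args) {
        // "utt1.wav" is a made-up recording name used only for illustration.
        System.out.println(format("  ek dui ", "utt1.wav"));
    }
}
```

Keeping the formatting in a small method of its own makes it possible to check the output for a single sentence before a whole training set is generated.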

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished pronunciation dictionary in one pass.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}
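The translation step turns each Bangla word into the phone codes of the Unicode to IPA chart. A much reduced sketch of that lookup is given below. It is an assumption-laden illustration, not the project's generator: the class name PhoneLookup is hypothetical, only six chart entries are included, and conjuncts and the inherent vowel are ignored entirely.

```java
import java.util.HashMap;
import java.util.Map;

class PhoneLookup {

    // A few consonant entries copied from the chart; Unicode escapes avoid source-encoding issues.
    static final Map<Character, String> PHONES = new HashMap<>();
    static {
        PHONES.put('\u0995', "K");  // Bangla letter ka
        PHONES.put('\u0996', "KH"); // Bangla letter kha
        PHONES.put('\u0997', "G");  // Bangla letter ga
        PHONES.put('\u09AE', "M");  // Bangla letter ma
        PHONES.put('\u09B0', "R");  // Bangla letter ra
        PHONES.put('\u09B2', "L");  // Bangla letter la
    }

    // Looks up each letter and joins the codes with spaces; letters not in the table are skipped.
    static String toPhones(String word) {
        StringBuilder phones = new StringBuilder();
        for (char letter : word.toCharArray()) {
            String code = PHONES.get(letter);
            if (code == null) {
                continue;
            }
            if (phones.length() > 0) {
                phones.append(' ');
            }
            phones.append(code);
        }
        return phones.toString();
    }

    public static void main(String[] args) {
        // Letters ka, ma, la in sequence.
        System.out.println(toPhones("\u0995\u09AE\u09B2"));
    }
}
```

An in-memory table like this avoids one database query per letter, which may be worth considering when the word list grows, although the project itself reads the codes from a MySQL table.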

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
    }

    // Returns true when the letter's row id falls in the consonant range (13 to 48) of banglatab.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
        return isBBorno;
    }

    // Closes the statement and the connection, reporting but not rethrowing failures.
    private static void closeQuietly(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    char ConjuctsIdentifyCharacter = '\u09CD'; // hasanta, the sign that joins consonants into a conjunct
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        // Record the positions before, at and after every hasanta.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record every position that is not part of a conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // Collect consecutive recorded positions into one conjunct string.
                    // Only positions recorded for this word are read, so stale entries are never used.
                    while (serialityTrace + 1 < NoOfConjuctsPosition) {
                        int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                        if (diffbi > 1) {
                            break; // the next position belongs to a later conjunct
                        }
                        // diffbi == 0 is a position shared by two adjacent hasantas; it is already appended.
                        if (diffbi == 1 && i + 1 < BanglaWordLength) {
                            i++;
                            tempString += Character.toString(BanglaWord.charAt(i));
                        }
                        serialityTrace++;
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                } // conjuct adding end
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the pronunciation of every letter or conjunct in turn.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                if (rs.next()) {
                    String pro = rs.getString("pro");
                    phoneticTrans = phoneticTrans + pro + " ";
                }
                rs.close();
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    // Presses the given keys in order, then releases them in the same order.
    private void typeKeys(int... keyCodes) throws AWTException {
        Robot robot = new Robot();
        for (int keyCode : keyCodes) {
            robot.keyPress(keyCode);
        }
        for (int keyCode : keyCodes) {
            robot.keyRelease(keyCode);
        }
    }

    // Presses and releases a mouse button the given number of times.
    private void click(int buttonMask, int times) throws AWTException {
        Robot robot = new Robot();
        for (int n = 0; n < times; n++) {
            robot.mousePress(buttonMask);
            robot.mouseRelease(buttonMask);
        }
    }

    // Opens a URL with the default browser on Windows.
    private void openUrl(String theUrl) throws IOException {
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void leftClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 1); }

    public void rightClick() throws AWTException { click(InputEvent.BUTTON3_MASK, 1); }

    public void doubleClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 2); }

    public void copy() throws AWTException { typeKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_C); }

    public void paste() throws AWTException { typeKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_V); }

    public void delete() throws AWTException { typeKeys(KeyEvent.VK_DELETE); }

    public void selectAll() throws AWTException { typeKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_A); }

    public void up() throws AWTException { typeKeys(KeyEvent.VK_PAGE_UP); }

    public void down() throws AWTException { typeKeys(KeyEvent.VK_PAGE_DOWN); }

    public void previousPage() throws AWTException { typeKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_UP); }

    public void nextPage() throws AWTException { typeKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_DOWN); }

    public void openNewFile() throws AWTException { typeKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_N); }

    public void openHere() throws AWTException { typeKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_O); }

    public void close() throws AWTException { typeKeys(KeyEvent.VK_ALT, KeyEvent.VK_F4); }

    public void startMenu() throws AWTException { typeKeys(KeyEvent.VK_WINDOWS); }

    public void refresh() throws AWTException { typeKeys(KeyEvent.VK_F5); }

    public void help() throws AWTException { typeKeys(KeyEvent.VK_F1); }

    public void showDesktop() throws AWTException { typeKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_D); }

    public void openMyComputer() throws AWTException { typeKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_E); }

    // activate
    public void enter() throws AWTException { typeKeys(KeyEvent.VK_ENTER); }

    // next window | previous window
    public void altTab() throws AWTException { typeKeys(KeyEvent.VK_ALT, KeyEvent.VK_TAB); }

    // next tab | previous tab
    public void ctlTab() throws AWTException { typeKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_TAB); }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException { openUrl("http://www.google.com"); }

    public void openFacebook() throws AWTException, IOException { openUrl("http://www.facebook.com"); }

    public void openYahoo() throws AWTException, IOException { openUrl("http://www.yahoo.com"); }

    public void openTechtunes() throws AWTException, IOException { openUrl("http://www.techtunes.com.bd"); }

    public void openProthomAlo() throws AWTException, IOException { openUrl("http://www.prothom-alo.com"); }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that matches the recognized text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
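A chain of equals() tests grows quickly as more spoken commands are added. One possible alternative, sketched here under stated assumptions, is a lookup table from the recognized phrase to an action. The class name CommandDispatcher and the phrase "right click" are hypothetical placeholders; the project's actual Bangla command phrases are not reproduced, and the action only prints a message instead of driving java.awt.Robot.

```java
import java.util.HashMap;
import java.util.Map;

class CommandDispatcher {

    // Maps a recognized phrase to the action that should run for it.
    private final Map<String, Runnable> commands = new HashMap<>();

    void register(String phrase, Runnable action) {
        commands.put(phrase.trim(), action);
    }

    // Runs the action for the recognized text; returns false when no command matches.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder phrase and action, for illustration only.
        dispatcher.register("right click", () -> System.out.println("right click requested"));
        System.out.println(dispatcher.dispatch(" right click "));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

With such a table, each recognizer result would need a single dispatch call, and an unmatched result could be reported back to the speaker instead of being silently ignored.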


speech recognition software and devices more functional and user friendly, with most contemporary products performing tasks with over 90 percent accuracy.

According to figures provided by the industry, speech recognition satisfies the needs of consumers and businesses by simplifying customer interaction, increasing efficiency and reducing operating costs, and it is therefore used in a wide range of applications. Furthermore, according to Allied Business Intelligence (ABI), the increased popularity of speech recognition was expected to push revenues from $677 million in 2002 to an estimated $5.3 billion by 2008. Indeed, recent advances in speech recognition software are creating a dynamic environment, since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks. As the merger of large vocabularies and continuous recognition continues, look for more and more companies to move toward speech recognition, and watch the industry take its place as a leader in the technology sector [1].

Year  Event
1936  AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
1970  The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
1971  DARPA established the Speech Understanding Research (SUR) program.
1982  Dragon Systems was founded.
1984  SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
1995  Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.
1997  Dragon introduced NaturallySpeaking, the first continuous speech dictation software available.
1998  Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems.
2000  Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
2003  ScanSoft shipped Dragon NaturallySpeaking 7 Medical, which lowers healthcare costs through highly accurate speech recognition.

Table 1.2 History of Speech Recognition

1.3 Types of Speech Recognition

Speech recognition systems can be separated into different classes according to the types of utterances they are able to recognize. These classes are as follows [1]:

1.3.1 Isolated Words: Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept a single word or a single utterance at a time. These systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). "Isolated Utterance" might be a better name for this class. Put simply, isolated words are single words such as "Me", "You", "Go", etc.

1.3.2 Connected Words: Connected word systems (or more correctly, connected utterance systems) are similar to isolated word systems, but allow separate utterances to be run together with a minimal pause between them, such as "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create, because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, this can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features, such as words being run together, "ums" and "ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker-dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This means it is the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word", or "I'd like to run Microsoft Word" are all examples of possible utterances. In other words, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
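The idea of delimiting utterances by silence can be sketched as follows. This is only an illustrative energy-based detector, not the method used inside the Sphinx engines; the function name, the frame length, the amplitude threshold, and the number of quiet frames that end an utterance are all assumptions chosen for the example.

```python
def find_utterances(samples, frame_len=160, threshold=500.0, min_silence_frames=3):
    """Return (start, end) sample indices of utterances.

    An utterance starts at the first frame whose mean absolute amplitude
    exceeds `threshold` and ends once `min_silence_frames` consecutive
    quiet frames follow it. All parameter values are illustrative.
    """
    utterances = []
    start = None      # sample index where the current utterance began
    quiet_run = 0     # consecutive quiet frames seen inside an utterance
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(abs(s) for s in frame) / frame_len
        if energy > threshold:
            if start is None:
                start = i * frame_len          # lack of silence: utterance begins
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_frames:
                # enough silence: the utterance ended where the quiet run began
                end = (i - quiet_run + 1) * frame_len
                utterances.append((start, end))
                start, quiet_run = None, 0
    if start is not None:
        # audio ended before enough silence was seen
        utterances.append((start, (n_frames - quiet_run) * frame_len))
    return utterances
```

With 16 kHz audio, a frame of 160 samples corresponds to 10 ms, so three quiet frames is a very short pause; a practical system would likely require a longer silence.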

1.4.2 Pronunciation

The word pronunciation usually comes up in the context of learning a language. In any language, pronunciation pertains to the sounds that are produced to make meaning. Some aspects of speech go beyond the individual sounds and make a language unique: phrasing, stress, intonation, timing, and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures, and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because it leads to social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent, or lacking knowledge, all because the listener is reacting to the pronunciation and not to what the speaker is trying to communicate.

The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.


1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word. Entries can be as long as a sentence or two. Smaller vocabularies can have as few as 1 or 2 recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes", and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no", and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
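One of the several ways to calculate accuracy is at the word level, by counting how many word substitutions, insertions, and deletions separate the recognized text from the reference text. The sketch below is a generic edit-distance calculation given for illustration; the function names are hypothetical and it is not taken from the Sphinx scoring tools.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,           # deletion of a reference word
                           cur[j - 1] + 1,        # insertion of an extra word
                           prev[j - 1] + cost))   # match or substitution
        prev = cur
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy in percent: 100 * (N - errors) / N, N = reference words."""
    n = len(reference.split())
    return 100.0 * (n - word_errors(reference, hypothesis)) / n
```

Note that a text-level measure like this would count "nope" recognized as "no" as an error, even though the dialog succeeded; which measure is appropriate depends on the application.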

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation, sometimes including syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where, for each observation in a sequence, there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

In other words, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after the voice is recorded it is divided into many frames, which must be processed in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model, we store the reference values of states, phonemes, and words in order to compare them with the observed data (voice). By applying HMMs, we construct a statistical model for each phone, whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and on the previous state. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is very complicated, we do not go into much detail here. More information can be found in Appendix A.
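Finding the state sequence with the maximum probability is commonly done with the Viterbi algorithm. The following is a minimal sketch on a toy discrete HMM, given only to make the idea concrete; the state names, observation symbols, and probabilities in the usage note are invented for the example, and real recognizers work with log probabilities and continuous acoustic features rather than a handful of symbols.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # best[s] = (probability, path) of the best path that ends in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # choose the predecessor state that maximizes the path probability
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])
```

For example, with two hypothetical states "A" and "B", where the model always starts in "A", "A" mostly emits "x", "B" mostly emits "y", and "B" can never return to "A", the observation sequence x, y, y is best explained by the path A, B, B.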

Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words could follow previously recognized words, and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
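To illustrate what "statistics of word sequences" means, the sketch below estimates bigram probabilities from a small corpus by simple counting. It is an assumption-laden toy: the function names are hypothetical, the English sentences stand in for the project corpus, and no smoothing or back-off is applied, unlike the ARPA models produced by real toolkits.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams, using <s> and </s> as sentence markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])                  # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))   # adjacent word pairs
    return contexts, bigrams


def bigram_prob(contexts, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]
```

A decoder can use such probabilities to discard word sequences that never, or only rarely, occur in the domain.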

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to create some important files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

• Corpus
• Audio files
• Dictionary file
• Phone file
• Language Model file (.lm format)
• Language Model file (DMP format)
• Transcription file
• Fileids file
• Filler file

2.1.1 Corpus

The corpus is a list of sentences that is used to train the language model. Put simply, the corpus is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences according to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record the audio for this corpus in (.wav) or (.sph) format. During the recording sessions, the following parameters of the wave files were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit rate (bits per sample): 16
• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format

For this work a 16 kHz sampling rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample divides the amplitude of each sample into 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor. Each sound file is in wav format and is named using the speaker ID and the sentence ID. For example, an audio file of our project is:

01_01.wav, which stands for

Speaker ID: 01, Sentence ID: 01

When we collected audio from a speaker, we saved the personal information of this speaker, such as:

• Name
• Age
• Gender
• Audio collection environment

Some other information was also noted down, such as:

• Environmental condition of recording (for example, classroom condition, number of students present, sources of noise like a fan or a generator's sound, etc.)
• Technical details of the device (PC, microphone)
• Date and time of recording
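Because every recording has to follow the same format, it can be useful to check the files automatically before training. The sketch below uses Python's standard wave module to compare a file against the parameters listed above (16 kHz, 16 bits per sample, mono); the function name is hypothetical and this check was not part of the original project.

```python
import wave


def check_wav(path, rate=16000, sample_width=2, channels=1):
    """Return a list of mismatches against the expected recording format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sampling rate {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getsampwidth() != sample_width:      # width is given in bytes
            problems.append(
                f"{8 * wav.getsampwidth()} bits per sample, expected {8 * sample_width}")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channels, expected {channels}")
    return problems
```

An empty list means the file matches the expected format; for instance, check_wav("01_01.wav") could be run over every file in the recording folder.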

2.1.3 Dictionary file

The dictionary file is simply the list of words obtained from our corpus file, together with the pronunciation of each word, such as "AA P NA KE". For this work we need software that produces the pronunciations for the words in the dictionary. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. As a result, for our project we developed a tool called D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some contents of the dictionary file for our project are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

• All phonemes are in capital letters, such as AA P N I
• File format is (.dic)
• File encoding is UTF-8 without BOM
• A word cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)
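Some of the rules in the note above can be checked mechanically. The following is a small illustrative validator, not the D2P tool itself; it assumes each entry has the form "WORD PHONE PHONE ..." separated by whitespace, and the function name and messages are hypothetical.

```python
def check_dictionary(lines):
    """Return a list of rule violations found in dictionary entries.

    Each non-empty line is assumed to be: WORD PHONE PHONE ...
    """
    problems, seen = [], set()
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue                      # skip blank lines, e.g. the final one
        word, phones = parts[0], parts[1:]
        if word in seen:
            problems.append(f"line {number}: repeated word {word!r}")
        seen.add(word)
        if not phones:
            problems.append(f"line {number}: no pronunciation for {word!r}")
        if any(p != p.upper() for p in phones):
            problems.append(f"line {number}: phones must be in capital letters")
    return problems
```

The lines could be read with open(path, encoding="utf-8"), which matches the UTF-8 encoding required for the file.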

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

• All phones are in capital letters
• File format is (.phone)
• File encoding is UTF-8 without BOM
• An entry cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)
• The silence phoneme "SIL" is also included in the phone file
• All phonemes in the dic file are present in the phone file, without repetition
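The small program mentioned above is not reproduced in this report. As a rough sketch of what such a program might do, the function below collects the unique phones from dictionary entries and adds SIL; the function name and the assumption that entries have the form "WORD PHONE PHONE ..." are ours.

```python
def phones_from_dictionary(dic_lines):
    """Collect the unique phones used in dictionary entries, plus SIL."""
    phones = {"SIL"}                  # the silence phone is always listed
    for line in dic_lines:
        parts = line.split()
        phones.update(parts[1:])      # parts[0] is the word itself
    return sorted(phones)
```

Writing the result with one phone per line, followed by an extra blank line, would give a file that satisfies the rules in the note above.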


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this, a corpus is needed, which in this case means a set of sentences (or more precisely, utterances) that the recognition system is expected to be able to handle. The corpus needs to be an ASCII text file with one sentence per line, although the newer version also supports Unicode text files. Upload this file and click the compile button. This produces a set of lexical (pronunciation dictionary) and language modeling files. Only the LM file is used here, as the pronunciation dictionary is built as described above. The tool works best for small domains. For our project the file name is sbsbsr.lm and the file format is (.lm). Some contents of the language model file for our project are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973

…

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used a Linux environment to convert the language model file from lm format to DMP format, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP


2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file, the speech of the speaker is noted exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID, which identifies the utterance. This file is known as the transcription file, and there are two types: one is used to train the system and the other to test it. The files are named after the project. Here the name of the project is "sbsbsr", so the training file name is sbsbsr_train.transcription and the test file name is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> … </s> (01_01)
<s> … </s> (01_02)
<s> … </s> (01_03)
<s> … </s> (01_04)

…

Note:

• File format is (.transcription)
• File encoding is UTF-8 without BOM
• A sentence cannot be repeated
• A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the wav or sph extension. There are two fileids files, one for training and the other for testing. The name of the training file for our project is sbsbsr_train.fileids and the name of the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03

…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

• <s> symbolizes "beginning of speech"
• </s> symbolizes "end of speech"
• <sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: SphinxBase and SphinxTrain. We obtained them from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies on the Ubuntu distribution of Linux, such as Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As already noted, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading the software, we decompressed it into a folder in Linux. It can be any folder; we used the Linux root folder. After decompressing the packages, they can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd Sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

Next we created the system environment for training. We created a project folder in which SphinxTrain creates the trained files, that is, the acoustic model. First we entered the root directory where our installed SphinxBase and SphinxTrain folders are placed, and created a folder there. We named the folder "sbsbsr". After creating the folder, we opened a terminal and changed to the created folder sbsbsr. Then we created a project task for SphinxTrain with the following command in the terminal:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav", "model_parameters", etc.

The next step is to copy the files that we created in the data preparation part. We copied the dic, filler, phone, transcription, fileids, lm and DMP files into the "etc" folder and our collected audio into the "wav" folder. We then changed some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed in sphinx_train.cfg are listed below.

Parameter                    Value Before        Value After
$CFG_WAVFILE_EXTENSION       sph                 wav
$CFG_WAVFILE_TYPE            nist, mswav, raw    raw
$CFG_FEATURE                 2s_c_d_dd           1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                   1
$CFG_STATESPERHMM            6                   3
$CFG_N_TIED_STATES           100                 100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes, the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process, we first have to convert our collected raw speech audio data into mfc files. To do this, we opened the project directory in the Linux terminal and ran the feature extraction command. The command we executed for this task is [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. For this task we stay in the project directory in the terminal and execute the following command:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we have to download this tool from the CMU Sphinx site. After downloading it, we go to the root directory where we installed sphinxbase and sphinxtrain, and extract the downloaded pocketsphinx file there. After extracting, we install the software with the following commands in the Linux terminal:

./configure

make

After installing the software, we go into our project folder and execute a command from the terminal to create the folder structure. For this task the command is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some testing audio in the "wav" directory, because pocketsphinx recognizes from audio files as input. We also have to copy the test fileids and transcription files into the "etc" folder. To decode, that is, to test our model on audio with pocketsphinx, we first need feature files for the audio files. We can create these with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding or testing in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of the test can be found in the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file to a location on Windows. Then we created a new Java project in Eclipse and wrote a Java file with the help of the demo application from CMU Sphinx. After that, we added the files listed below from our previous project to the Eclipse project [11]:

• "sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
• dic
• lm.DMP
• filler

We created a config.xml file in the Eclipse project to describe the configuration to the recognizer and to say where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We also need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project, we run it and can test it with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built with the help of Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room, a common room, etc. We completed our testing using audio inputs from six test speakers. For the live testing we used a microphone in different environments [9]. We completed our tests using two different kinds of decoder:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment No   Data Set   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error     Accuracy
Experiment 01   Trained    5 (4/1)                  1025          975       98       95.12%            9.56%     90.44%
Experiment 02   Trained    3 (3/0)                  615           591       39       96.10%            6.34%     93.66%
Experiment 03   Trained    3 (3/0)                  615           601       22       97.72%            3.58%     96.42%
Experiment 04   Trained    5 (2/3)                  1025          938       184      91.51%            17.95%    82.05%
Experiment 05   Trained    3 (0/3)                  615           545       125      88.62%            20.33%    79.67%

Table 3.2.1: Experimental details with Results for Pocket Sphinx
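The three percentages in the table are consistent with a simple calculation from the word counts: percent correct is the share of correct words, error is the share of errors, and accuracy is 100 minus the error. The short sketch below reproduces the figures of Experiment 01; the function name is ours, and the remark that errors can exceed the number of incorrect words (for instance because inserted words are counted as well) is an interpretation of the counts rather than something stated in the tool output.

```python
def score(total_words, correct, errors):
    """Return (percent correct, error, accuracy), all in percent."""
    percent_correct = 100.0 * correct / total_words
    error = 100.0 * errors / total_words
    accuracy = 100.0 - error
    return percent_correct, error, accuracy


# Experiment 01: 1025 words, 975 correct, 98 errors
pc, err, acc = score(1025, 975, 98)
print(f"{pc:.2f} {err:.2f} {acc:.2f}")   # prints: 95.12 9.56 90.44
```

The same calculation gives 88.62, 20.33 and 79.67 for Experiment 05 (615 words, 545 correct, 125 errors).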

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment No   User Type   Speakers   Environment         Speaker Type   Input Device   Words   Correct Words   Errors   Correct (%)   Errors (%)   Accuracy (%)
Experiment 01   Trained     2          Closed Room         Male           Microphone     156     154             2        98            2            98
Experiment 02   Untrained   3          Lab Room            Male           Microphone     130     120             10       90            10           90
Experiment 03   Untrained   3          University Campus   Male           Microphone     140     122             18       82            18           82
Experiment 04   Untrained   2          University Campus   Female         Microphone     108     95              13       87            13           87
Experiment 05   Trained     3          University floor    Female         Microphone     126     115             11       89            11           89
Experiment 06   Trained     2          Closed Room         Male           Microphone     128     127             1        99            1            99

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment No   Data Set   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error    Accuracy
Experiment 01   Trained    3 (3/0)                  210           194       16       85.71%            15.71    85.71%
Experiment 02   Trained    3 (2/1)                  210           188       22       77.14%            22       77.14%
Experiment 03   Trained    3 (2/1)                  210           182       28       72.38%            28       72.38%
Experiment 04   Trained    3 (2/1)                  210           194       16       84.44%            16       84.44%
Experiment 05   Trained    3 (1/2)                  210           184       26       74.76%            26       74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. The main objective of this application is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a dictionary file (.dic) that lists every training word together with its pronunciation. While training the acoustic model experimentally, we had to write these pronunciations by hand several times. Over time we considered building an automatic pronunciation (phonetic translation) generator, and this software implements that idea. First we built a database that stores every letter and its corresponding phoneme. We consulted the IPA chart and several theses and papers [1] [9] [10] to build this database. However, the sources define phonemes in different ways, and not every letter has a defined phoneme, so we defined the phonemes for some consonants and conjunct letters ourselves. With this database in place, we wrote the phonetic translator, which produces the phonetic translation of a Bengali word by looking up its letters in the database.

Fig 4.1.2 Dictionary files with phonetic translation
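The lookup idea behind the phonetic translator can be sketched in a few lines. This is a simplified illustration, not the project's implementation: it replaces the database with an in-memory map, takes a handful of consonant symbols from the Unicode to IPA chart in the appendix, and ignores vowels and conjuncts. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PhonemeLookupSketch {

    // A few consonant entries copied from the appendix chart.
    private static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K");   // Bengali letter KA (U+0995)
        PHONEMES.put('\u0996', "KH");  // KHA (U+0996)
        PHONEMES.put('\u0997', "G");   // GA (U+0997)
        PHONEMES.put('\u09AC', "B");   // BA (U+09AC)
        PHONEMES.put('\u09AE', "M");   // MA (U+09AE)
        PHONEMES.put('\u09B2', "L");   // LA (U+09B2)
    }

    /** Returns the space-separated phoneme symbols of the letters found in the map. */
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phoneme = PHONEMES.get(word.charAt(i));
            if (phoneme == null) {
                continue; // letter missing from this small table: skipped
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(phoneme);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "\u0995\u09B2\u09AE"; // KA + LA + MA
        // A dictionary line has the form: word, then its phoneme sequence.
        System.out.println(word + " " + translate(word));
    }
}
```

A complete translator would also need rules for inherent vowels and conjuncts, which is where the hand-defined phonemes mentioned above come in.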

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program, we had to create both files manually. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. With our software, both files are created automatically within moments once the root folder of the audio files and the sentence corpus file are given as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
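The two file formats can be illustrated with a short sketch. It assumes the usual SphinxTrain conventions (a fileids entry is the relative audio path without extension; a transcript entry is the sentence between sentence markers followed by the utterance name in parentheses). The folder layout, class name, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrainingFilesSketch {

    /** fileids entry: the relative audio path without its ".wav" extension. */
    static String fileId(String relativePath) {
        return relativePath.endsWith(".wav")
                ? relativePath.substring(0, relativePath.length() - ".wav".length())
                : relativePath;
    }

    /** transcript entry: sentence between markers, then the file name without extension. */
    static String transcriptLine(String sentence, String relativePath) {
        String id = fileId(relativePath);
        String utterance = id.substring(id.lastIndexOf('/') + 1);
        return "<s> " + sentence.trim() + " </s> (" + utterance + ")";
    }

    /** Pairs each audio file with a corpus sentence, cycling through a non-empty corpus. */
    static List<String> transcript(List<String> audioPaths, List<String> corpus) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < audioPaths.size(); i++) {
            lines.add(transcriptLine(corpus.get(i % corpus.size()), audioPaths.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical folder layout and placeholder sentences.
        List<String> audio = Arrays.asList(
                "sbs_asr_train/speaker_02/s1.wav",
                "sbs_asr_train/speaker_02/s2.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        for (String path : audio) {
            System.out.println(fileId(path));
        }
        for (String line : transcript(audio, corpus)) {
            System.out.println(line);
        }
    }
}
```

Cycling through the corpus mirrors the case where every speaker records the same sentence list in the same order.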

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click, right click, etc.
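One way to connect recognized words to desktop actions is a lookup table from the recognized text to an action. The sketch below is an assumption about how such a dispatcher could look, not the project's code: the command word is a hypothetical placeholder, and the action only counts invocations where a real application would call java.awt.Robot.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatchSketch {

    private final Map<String, Runnable> actions = new HashMap<>();

    /** Binds a spoken command word to an action. */
    void register(String spokenWord, Runnable action) {
        actions.put(spokenWord, action);
    }

    /** Runs the action bound to the recognized text; returns false if no command matches. */
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        int[] leftClicks = {0};
        // "bam" is a hypothetical command word; a real action would drive java.awt.Robot.
        dispatcher.register("bam", () -> leftClicks[0]++);
        System.out.println(dispatcher.dispatch("bam"));     // known command
        System.out.println(dispatcher.dispatch("unknown")); // no match
    }
}
```

Keeping the table separate from the recognizer loop makes it easy to add commands without touching the recognition code.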


Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for newly admitted students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined: different researchers report different numbers of phonemes, so we do not know the exact number of phonemes in Bengali. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, but it sometimes fails when the speaker speaks slowly or mispronounces words. We wrote a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once the phoneme set is settled.
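The report does not identify the cause of the encoding problem. One common source of such problems in Java, offered here only as an assumption, is reading Bengali text with the platform default charset instead of UTF-8. The sketch below reads a word list with the charset stated explicitly; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Utf8WordReader {

    /** Reads one word per line as UTF-8, trimming whitespace and skipping blank lines. */
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Round trip through a temporary file to show that the text survives unchanged.
        Path file = Files.createTempFile("words", ".txt");
        Files.write(file, "\u0995\u09B2\u09AE\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readWords(file));
        Files.delete(file);
    }
}
```

The same explicit charset would have to be used wherever the words are written back out and in the database connection settings.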

Future Works

We have implemented Bengali speech recognition for a small data set. In the future we will enlarge the data set to build a complete model, and we plan to improve the recognition accuracy and extend the vocabulary. We have already developed software that creates the dictionary, fileids, and transcription files. We want to build a user-friendly, stand-alone GUI application for writing in Bengali, and we also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate. With this trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate the system into document applications, so that a Bengali sentence can be written simply by uttering it. We want to build a voice response application, in which the user asks the software a question and the software gives the predefined answer. A good recognition application needs a large amount of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. In this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. However, if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Ho Chi Minh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Directory names found at each level below the training root;
    // level 3 holds the audio file names.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // k is not reset: dirTreeLevel3 accumulates every audio file seen so far
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Adds the entries of a folder to the list that belongs to the given level.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

..........................................................

ARRAY

..........................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names of the form <prefix><number> by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
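The same ordering can also be obtained without rebuilding the names, by sorting the original strings with a comparator on their numeric part. This is an alternative sketch, not the project's code; the class and method names are hypothetical, and names without digits are treated as number 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NumericSuffixSort {

    /** Extracts the digits of a name as a number (0 when the name has no digits). */
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a copy of the names ordered by their numeric part. */
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return copy;
    }

    public static void main(String[] args) {
        // Plain string sorting would put "speaker10" before "speaker2".
        System.out.println(sorted(Arrays.asList("speaker10", "speaker2", "speaker1")));
    }
}
```

Because the original strings are kept, this variant also works when the folder names do not all share one prefix.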

..........................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences of the corpus, one per line.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

..........................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

..........................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per input word: the word, then its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

..........................................................

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // True when the letter is a consonant, i.e. its id in banglatab lies in 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

..........................................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Hasanta (U+09CD), the sign that joins consonants into a conjunct
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at, and after every conjunct sign.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record the positions that do not belong to any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the pronunciation of every letter or conjunct in turn.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
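The core step of the generator, splitting a word into single letters and conjunct clusters, can be shown in isolation. The sketch below is a simplified illustration under stated assumptions: it treats the hasanta (U+09CD) as the only conjunct marker, works in memory without the database, and omits the special handling of র and the inherent vowel. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctUnitsSketch {

    private static final char HASANTA = '\u09CD'; // Bengali sign virama

    /** Splits a word into lookup units: single characters and hasanta-joined clusters. */
    static List<String> units(String word) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            current.append(c);
            // The cluster continues while this character is a hasanta
            // or the next character is one.
            boolean joinsNext = c == HASANTA
                    || (i + 1 < word.length() && word.charAt(i + 1) == HASANTA);
            if (!joinsNext) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString()); // word ended on a hasanta
        }
        return out;
    }

    public static void main(String[] args) {
        // BA + vowel sign I + SHA + hasanta + BA
        String word = "\u09AC\u09BF\u09B6\u09CD\u09AC";
        System.out.println(units(word));
    }
}
```

Each returned unit would then be looked up in the letter-to-phoneme table, so a conjunct resolves to one entry such as the ones listed in the Unicode to IPA chart.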

..........................................................

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

..........................................................

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case
        // the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Maps each recognized voice command to a simulated mouse or keyboard action.
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    // The methods below open a URL in the default browser (Windows only).
    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "…";
        System.out.println("comString " + comString + " length " + comString.length() + "\n");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" + "…\n");
    }

    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals("…")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
        // …
    }
}


"Isolated Utterance" might be a better name for this class. Simply put, isolated words are single words such as "me", "you", "go", etc.

1.3.2 Connected Words: Connected word systems (or, more correctly, connected utterances) are similar to isolated words but allow separate utterances to be run together with a minimal pause between them, such as "I eat rice".

1.3.3 Continuous Speech: Continuous speech recognizers allow users to speak almost naturally while the computer determines the content (basically, it is computer dictation). Recognizers with continuous speech capabilities are some of the most difficult to create because they must use special methods to determine utterance boundaries.

1.3.4 Spontaneous Speech: At a basic level, it can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.

Based on the speaker, there are two types of speech recognition:

1. Speaker-dependent
2. Speaker-independent

1.3.5 Speaker-dependent: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. Most desktop dictation systems, such as IBM ViaVoice, are speaker-dependent. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first train the software by speaking to it so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

1.3.6 Speaker-independent: Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-independent. Speaker-independent software is more commonly found in telephone applications. It is designed to recognize anyone's voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software.


1.3.7 Overview of Speech Recognition System

Fig 1.3.7 Overview of Speech Recognition System

1.4 Terms and Concepts

The following are some basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words. For example, "Word", "Microsoft Word" or "I'd like to run Microsoft Word" are all examples of possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence in speech recognition is almost as important as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine is listening for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
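As a rough illustration of how silence can mark utterance boundaries, the following Java sketch splits a sequence of per-frame energy values into utterances. It is only a minimal sketch and not the method used by the Sphinx front end: the class name, the energy threshold and the required number of silent frames are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: energy-based utterance detection (not the Sphinx endpointer).
class UtteranceSegmenter {

    /**
     * Returns one {startFrame, endFrameExclusive} pair per detected utterance.
     * A frame counts as speech when its energy is above the threshold; an
     * utterance ends after minSilenceFrames consecutive silent frames.
     */
    static List<int[]> segment(double[] frameEnergy, double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1;     // first frame of the current utterance, -1 while in silence
        int silentRun = 0;  // consecutive silent frames seen inside the utterance
        for (int i = 0; i < frameEnergy.length; i++) {
            boolean speech = frameEnergy[i] > threshold;
            if (start < 0) {
                if (speech) {
                    start = i;      // lack of silence: an utterance begins
                    silentRun = 0;
                }
            } else if (speech) {
                silentRun = 0;      // short pause was part of the utterance
            } else if (++silentRun >= minSilenceFrames) {
                utterances.add(new int[] {start, i - silentRun + 1});
                start = -1;
                silentRun = 0;
            }
        }
        if (start >= 0) {           // audio ended while still inside an utterance
            utterances.add(new int[] {start, frameEnergy.length - silentRun});
        }
        return utterances;
    }
}
```

A short pause inside a sentence does not end the utterance here; only a silent run of at least minSilenceFrames frames does, which mirrors the "certain amount of silence" described above.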

1.4.2 Pronunciation

You have heard the word pronunciation in connection with learning any language. What is pronunciation, and what are some of the fundamental aspects of this important part of learning English? In any language, pronunciation pertains to the sounds that are produced to make meaning. There are aspects of speech beyond the individual sounds that make a language unique: phrasing, stress, intonation, timing and rhythm. Your voice is then projected to communicate what you want to say. Add to that cultural nuances, gestures and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it would be easy to avoid speaking in public, but that is not the best choice because you do not want to experience social isolation. It does not seem fair, but people can be judged by the way they speak and can be seen as uneducated, incompetent or lacking knowledge, all because the listener is reacting to the pronunciation and not to what you are trying to communicate.

The speech recognition engine uses all sorts of data, statistical models and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word "the" has at least two pronunciations in US English: "thee" and "thuh".

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. For this reason, speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; entries can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, or how well it recognizes utterances. The performance of a speech recognition system is measurable, and perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said "yes", the engine returned "yes" and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope", the engine returned "no", and the NO action was executed? Should that be considered a successful dialog? The answer to that question is yes, because the desired result was achieved.

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes also includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines where, for each observation in the sequence, there is a state transition, and for each state there is an output symbol emission. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

In other words, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after our voice is recorded, it is divided into many frames that we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model we store the reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying HMMs we construct a statistical model for each phone, whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on the state itself and on the previous one. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite involved, we do not go into detail here. To learn more, see Appendix A.
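The search for the state sequence with maximum probability is commonly carried out with the Viterbi algorithm. The Java sketch below is a generic textbook version on a toy two-state model, given only as an illustration; it is not the Sphinx decoder, and the class name and all probabilities in the usage example are made-up assumptions. It expects at least one observation and works with log probabilities to avoid numerical underflow.

```java
import java.util.Arrays;

// Hypothetical illustration of Viterbi decoding for a discrete HMM.
class ViterbiSketch {

    /**
     * Returns the most probable state sequence for the given observations.
     * start[s]    : probability of starting in state s
     * trans[p][s] : probability of moving from state p to state s
     * emit[s][o]  : probability of observing symbol o in state s
     */
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = start.length;
        int t = obs.length;
        double[][] score = new double[t][n]; // best log-probability of a path ending in s at time i
        int[][] back = new int[t][n];        // predecessor state on that best path

        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int i = 1; i < t; i++) {
            for (int s = 0; s < n; s++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < n; p++) {
                    double candidate = score[i - 1][p] + Math.log(trans[p][s]);
                    if (candidate > bestScore) {
                        bestScore = candidate;
                        best = p;
                    }
                }
                score[i][s] = bestScore + Math.log(emit[s][obs[i]]);
                back[i][s] = best;
            }
        }

        // Pick the best final state and follow the back-pointers.
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[t - 1][s] > score[t - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[t];
        path[t - 1] = last;
        for (int i = t - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        // prints [0, 0, 1]
        System.out.println(Arrays.toString(decode(new int[] {0, 1, 2}, start, trans, emit)));
    }
}
```

In a real recognizer the observations are continuous feature vectors and the emission probabilities come from Gaussian mixtures, but the dynamic-programming structure is the same.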

Fig 1.4.10 Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty incurred when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words could follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
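To make the idea of n-gram statistics concrete, the following Java sketch estimates bigram probabilities from a few sentences by simple counting. It is a minimal illustration only: the class name and the English sample sentences are assumptions, and it applies no smoothing or back-off, unlike the language models produced by real toolkits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: maximum-likelihood bigram model without smoothing.
class BigramModel {
    private final Map<String, Integer> unigram = new HashMap<>();
    private final Map<String, Integer> bigram = new HashMap<>();

    /** Counts words and adjacent word pairs; <s> and </s> mark sentence boundaries. */
    void train(String[] sentences) {
        for (String sentence : sentences) {
            String[] w = ("<s> " + sentence.trim() + " </s>").split("\\s+");
            for (int i = 0; i < w.length; i++) {
                unigram.merge(w[i], 1, Integer::sum);
                if (i > 0) {
                    bigram.merge(w[i - 1] + " " + w[i], 1, Integer::sum);
                }
            }
        }
    }

    /** P(next | prev) = count(prev next) / count(prev); 0 when prev was never seen. */
    double probability(String prev, String next) {
        int prevCount = unigram.getOrDefault(prev, 0);
        if (prevCount == 0) {
            return 0.0;
        }
        return bigram.getOrDefault(prev + " " + next, 0) / (double) prevCount;
    }

    public static void main(String[] args) {
        BigramModel lm = new BigramModel();
        lm.train(new String[] {"i eat rice", "i eat fish", "you go"});
        System.out.println(lm.probability("eat", "rice")); // prints 0.5
    }
}
```

A word pair with probability zero, such as "i go" in this toy corpus, is exactly the kind of candidate the decoder can strip from the search.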

1.5 Overview of the Full System

Figure 1.5 Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to prepare some important files that are required for training and also for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files listed below are created from this data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is simply the list of sentences used to train the language model; in other words, it is the collection of sentences that we want our machine to recognize. For our project we collected sentences relevant to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio for this corpus in .wav or .sph format. During the recording sessions, the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit rate (bits per sample): 16

• Channel: mono (single channel)
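These recording parameters can be written down and checked in Java with the standard javax.sound.sampled.AudioFormat class. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and signed little-endian PCM is assumed because that is the usual encoding of 16-bit WAV data.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper describing the recording format used for the corpus.
class RecordingFormat {

    /** 16 kHz, 16 bits per sample, mono; signed little-endian PCM is an assumption. */
    static AudioFormat target() {
        return new AudioFormat(16000f, 16, 1, true, false);
    }

    /** True when a format has the sampling rate, sample size and channel count listed above. */
    static boolean matches(AudioFormat format) {
        return format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    /** Uncompressed data rate in bytes per second. */
    static int bytesPerSecond(AudioFormat format) {
        return (int) (format.getSampleRate() * format.getChannels()
                * format.getSampleSizeInBits() / 8);
    }

    /** Number of distinct amplitude values for a given sample size, i.e. 2^bits. */
    static long quantizationLevels(int bits) {
        return 1L << bits;
    }
}
```

With these settings one second of audio takes 32,000 bytes, and each 16-bit sample can take one of 65,536 values.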

Fig 2.1.2 Audio File Recording Format

For this work a 16 kHz sampling rate was chosen because it preserves more high-frequency information (up to 8 kHz), and 16 bits per sample quantizes each sample into 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor and saved each sound file in .wav format. Each .wav file is named using the speaker ID and the sentence ID. For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID: 01, Sentence ID: 01
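The naming convention can be decoded mechanically. The following Java sketch, with a hypothetical class name, splits a file name of the form speaker_sentence.wav into its two IDs; it assumes exactly one underscore in the name, as in the example above.

```java
// Hypothetical helper for the speakerId_sentenceId.wav naming convention.
class RecordingName {
    final String speakerId;
    final String sentenceId;

    private RecordingName(String speakerId, String sentenceId) {
        this.speakerId = speakerId;
        this.sentenceId = sentenceId;
    }

    /** Parses names such as "01_01.wav"; the .wav extension is optional. */
    static RecordingName parse(String fileName) {
        String base = fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - ".wav".length())
                : fileName;
        String[] parts = base.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected speaker_sentence, got: " + fileName);
        }
        return new RecordingName(parts[0], parts[1]);
    }
}
```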

When we collected audio from a speaker, we also saved personal information about the speaker:

Name, age, gender, and the environment in which the audio was collected

Some other information was noted down as well:

Environmental conditions of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or generator)

Technical details of the devices (PC, microphone)

Date and time of the recording

2.1.3 Dictionary file

The dictionary file is simply the list of words taken from our corpus file, together with the pronunciation of each word, such as "AA P NA KE". For this work we need software that converts the word list into a pronunciation file. A grapheme-to-phoneme (G2P) tool has been developed by CRBLP, but it gives the pronunciation in the Unicode IPA system, whereas we need ASCII format. As a result, for our project we developed a program, D2P (Dictionary to Pronunciation), which generates the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some contents of the dictionary file for our project are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (.dic)

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is .phone

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

All phonemes in the .dic file are present in the phone file without repetition
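The conversion from dictionary to phone list can be sketched in a few lines of Java. This is not the program written for the project, only an assumed reconstruction of the idea: each dictionary line is taken to be a word followed by its phonemes separated by whitespace, and the class name and the sample entries in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: derive the phone list from dictionary lines.
class PhoneListBuilder {

    /**
     * Collects every phoneme used in the dictionary exactly once, in sorted order.
     * Each line is assumed to look like: WORD PH1 PH2 ... (whitespace separated).
     * The silence phone SIL is always included.
     */
    static List<String> phones(List<String> dictionaryLines) {
        TreeSet<String> phones = new TreeSet<>();
        phones.add("SIL");
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");
            for (int i = 1; i < fields.length; i++) { // field 0 is the word itself
                phones.add(fields[i].toUpperCase());
            }
        }
        return new ArrayList<>(phones);
    }
}
```

For example, the two assumed entries "apni AA P N I" and "ami AA M I" yield the phones AA, I, M, N, P and SIL, each written once. Using a sorted set enforces the no-duplicates rule automatically.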

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be in the form of an ASCII text file with one sentence per line; the newer version of the tool also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here the only file used is the LM file, as the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 … -0.2626

-2.0719 … -0.2861

-2.0719 … -0.2973

-1.7709 … -0.2936

-2.0719 … -0.2626

-1.7709 এ -0.2861

-2.0719 … -0.2626

-2.0719 ও -0.2973

…

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used a Linux environment to produce the DMP file from the .lm file, with the following command in a Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file, the speech of the speaker is written down exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are two types of transcription file: one is used to train the system and the other to test it. The files are named after the project; here the name of the project is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> … </s> (01_01)

<s> … </s> (01_02)

<s> … </s> (01_03)

<s> … </s> (01_04)

…

Note:

File format is .transcription

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
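A transcription line can be assembled from a sentence and its file ID as in the following Java sketch. The class name is hypothetical and the English sentence in the usage note is only a stand-in for a corpus sentence; the line layout follows the <s> ... </s> (fileid) convention described in this section.

```java
// Hypothetical helper that formats one line of a transcription file.
class TranscriptLine {

    /** Wraps the sentence in <s> ... </s> and appends the file ID in parentheses. */
    static String format(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }
}
```

For instance, format("I eat rice", "01_01") gives the line <s> I eat rice </s> (01_01).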

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and one for testing. The name of the training file for our project is sbsbsr_train.fileids and the name of the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01

sbsbsr_train/sanjoy/sanjoy1/01_02

sbsbsr_train/sanjoy/sanjoy1/01_03

…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL

<sil> SIL

</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: SphinxBase and SphinxTrain. We obtained them from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in a Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading the software, we decompressed it into a folder in Linux; it can be any folder. We did it in the Linux root folder. After decompressing the packages, we can install them with the following commands in a terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd Sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which SphinxTrain will create the trained files, i.e. the acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed. There we created a folder named "sbsbsr". After creating the folder, we open a terminal and change to the sbsbsr folder. Then we create a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav", "model_parameters", etc.

Now it is time to copy the files created in the data preparation part. We have to copy the .dic, .filler, .phone, .transcription, .fileids, .lm and .lm.DMP files into the "etc" folder and our collected audio into the "wav" folder. Next we need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. We changed the following parameters in sphinx_train.cfg:

Parameter                   Value Before      Value After
$CFG_WAVFILE_EXTENSION      sph               wav
$CFG_WAVFILE_TYPE           nist/mswav/raw    raw
$CFG_FEATURE                2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                 1
$CFG_STATESPERHMM           6                 3
$CFG_N_TIED_STATES          100               100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio data into .mfc feature files. To do this we opened the project directory in a Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. For this task we stay in the project directory and execute the following command in the terminal:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. After downloading it, we go to the root directory where sphinxbase and sphinxtrain are installed and extract the downloaded Pocketsphinx archive there. After extracting, we install the software with the following commands in a Linux terminal:

./configure

make

After installing the software, we go into our project folder and execute a command from the terminal to set up the folder structure for testing. The command is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx takes audio files as input. We also have to copy the test fileids and transcription files into the "etc" folder. To decode, i.e. test our model on audio with Pocketsphinx, we first need feature files for the audio files. We can create them with the following command in a Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding (testing) in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of the test can be found in the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. We need two pieces of software to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on the Windows machine. Then we create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that we need to add the following files from our training project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

.dic

.lm.DMP

.filler

We create a config.xml file in the Eclipse project to configure the recognizer and tell it where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We also need to add four Java jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project, we run it and can test it with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built some applications using Sphinx4, which will be discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room and a common room. Testing with recorded audio used inputs from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment  Data set  Speakers  Male  Female  Total words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)
01          Trained   5         4     1       1025         975      98      95.12                9.56       90.44
02          Trained   3         3     0       615          591      39      96.10                6.34       93.66
03          Trained   3         3     0       615          601      22      97.72                3.58       96.42
04          Trained   5         2     3       1025         938      184     91.51                17.95      82.05
05          Trained   3         0     3       615          545      125     88.62                20.33      79.67

Table 3.2.1: Experimental details with results for Pocket Sphinx
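The three percentages reported for Pocket Sphinx can be reproduced from the word counts. The following is a minimal sketch, assuming that "Errors" counts substitutions, deletions and insertions together (which would explain why correct words plus errors can exceed the total) and that accuracy is 100 minus the error percentage. The class and method names are illustrative and are not part of the project code.

```java
import java.util.Locale;

public class WordAccuracy {

    /** Percentage of reference words that were recognized correctly. */
    public static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    /** Error percentage relative to the number of reference words. */
    public static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy taken as 100 minus the error percentage (assumption stated above). */
    public static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Counts of Experiment 01 with Pocket Sphinx
        int totalWords = 1025, correctWords = 975, errors = 98;
        System.out.println(String.format(Locale.ROOT, "Percent correct = %.2f",
                percentCorrect(totalWords, correctWords)));   // 95.12
        System.out.println(String.format(Locale.ROOT, "Error = %.2f",
                errorRate(totalWords, errors)));              // 9.56
        System.out.println(String.format(Locale.ROOT, "Accuracy = %.2f",
                accuracy(totalWords, errors)));               // 90.44
    }
}
```

With the counts of Experiment 01 (1025 words, 975 correct, 98 errors) the sketch gives 95.12, 9.56 and 90.44, matching the reported values.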

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment  User type  Speakers  Environment        Speaker type  Input device  Words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)
01          Trained    2         Closed Room        Male          Microphone    156    154      2       98                   2          98
02          Untrained  3         Lab Room           Male          Microphone    130    120      10      90                   10         90
03          Untrained  3         University Campus  Male          Microphone    140    122      18      82                   18         82
04          Untrained  2         University Campus  Female        Microphone    108    95       13      87                   13         87
05          Trained    3         University floor   Female        Microphone    126    115      11      89                   11         89
06          Trained    2         Closed Room        Male          Microphone    128    127      1       99                   1          99

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%
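The average for live input is consistent with a plain arithmetic mean of the six per-experiment accuracy values (98, 90, 82, 87, 89 and 99). A minimal sketch of that calculation follows; the class and method names are illustrative only.

```java
import java.util.Locale;

public class AverageAccuracy {

    /** Arithmetic mean of the given accuracy values. */
    public static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Accuracy of the six Sphinx4 live-input experiments
        double[] liveAccuracy = {98, 90, 82, 87, 89, 99};
        System.out.println(String.format(Locale.ROOT, "Average Accuracy = %.2f",
                mean(liveAccuracy)));   // 90.83
    }
}
```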


3.3.2 Input Type: Audio

Experiment  Data set  Speakers  Male  Female  Total words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)
01          Trained   3         3     0       210          194      16      85.71                15.71      85.71
02          Trained   3         2     1       210          188      22      77.14                22         77.14
03          Trained   3         2     1       210          182      28      72.38                28         72.38
04          Trained   3         2     1       210          194      16      84.44                16         84.44
05          Trained   3         1     2       210          184      26      74.76                26         74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary file (.dic) that lists every training word together with its pronunciation. While experimenting with acoustic model training we had to write these pronunciations by hand several times. Over time this led us to the idea of an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we built a database that stores all phonemes and their corresponding letters. We drew on the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and phonemes are not defined for every letter. We therefore defined some phonemes for certain consonants and conjunct letters ourselves. With this database in place, we built the phonetic translator, which produces the phonetic translation of Bengali words by looking them up in the database.

Fig 4.1.2: Dictionary files with phonetic translation
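The lookup idea behind the phonetic translator can be sketched as follows. This is a simplified illustration, not the project's implementation: it uses an in-memory map instead of the database, covers only a handful of letters, and ignores conjuncts and the inherent vowel. The consonant labels follow the chart in the appendix; the label AA for the vowel sign aa-kar is an assumption, and all class and method names are hypothetical. Letters are written as Unicode escapes so the source file does not depend on the editor's encoding.

```java
import java.util.HashMap;
import java.util.Map;

public class PhoneticLookupSketch {

    // Letter-to-phone table (a tiny subset, for illustration only)
    private static final Map<Character, String> PHONES = new HashMap<>();
    static {
        PHONES.put('\u0995', "K");   // ka
        PHONES.put('\u0996', "KH");  // kha
        PHONES.put('\u0997', "G");   // ga
        PHONES.put('\u09A8', "N");   // na
        PHONES.put('\u09AE', "M");   // ma
        PHONES.put('\u09B2', "L");   // la
        PHONES.put('\u09BE', "AA");  // aa-kar (vowel sign); label assumed
    }

    /** Maps each letter of the word to its phone label, separated by spaces. */
    public static String toPhones(String word) {
        StringBuilder phones = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phone = PHONES.get(word.charAt(i));
            if (phone == null) {
                throw new IllegalArgumentException("No phone defined for letter at index " + i);
            }
            if (phones.length() > 0) {
                phones.append(' ');
            }
            phones.append(phone);
        }
        return phones.toString();
    }

    /** Builds one dictionary line: the word followed by its phones. */
    public static String dictionaryLine(String word) {
        return word + " " + toPhones(word);
    }

    public static void main(String[] args) {
        // The word "naam" (name): na + aa-kar + ma
        System.out.println(dictionaryLine("\u09A8\u09BE\u09AE"));
    }
}
```

A map keeps the sketch self-contained; a database table, as used in the project, makes it easier to extend the letter list without recompiling.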

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files contain the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files manually. With the software, these two large files can be generated automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically once the root folder of the audio files and the sentence corpus file are given to the software as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
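The two generated line formats can be illustrated with a short sketch. It assumes the usual SphinxTrain layout: a fileids entry is the audio path without the .wav extension, and a transcript entry is the sentence wrapped in sentence markers followed by the utterance id in parentheses (lowercase markers are used here, following the SphinxTrain tutorial convention). The class name, method names and sample paths are hypothetical.

```java
public class TrainingFilesSketch {

    /** Fileids entry: the relative audio path with '/' separators and no ".wav". */
    public static String fileId(String relativeWavPath) {
        String path = relativeWavPath.replace('\\', '/');
        return path.endsWith(".wav") ? path.substring(0, path.length() - 4) : path;
    }

    /** Transcript entry: "<s> sentence </s> (utteranceId)". */
    public static String transcriptLine(String sentence, String wavPath) {
        String id = fileId(wavPath);
        int slash = id.lastIndexOf('/');
        String utteranceId = slash >= 0 ? id.substring(slash + 1) : id;
        return "<s> " + sentence.trim() + " </s> (" + utteranceId + ")";
    }

    public static void main(String[] args) {
        String wav = "sbs_asr_train/speaker_1/file_1.wav"; // hypothetical path
        System.out.println(fileId(wav));
        System.out.println(transcriptLine(" shuvo sokal ", wav));
    }
}
```

Generating both lines from the same path keeps the utterance ids in the two files aligned, which the trainer relies on.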

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, the application performs some common tasks such as opening My Computer, left click and right click.
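One simple way to connect recognized words to actions is a lookup table from command word to handler. The sketch below illustrates only this dispatch step under that assumption; it is not the project's code, the romanized command words are placeholders, and the handlers merely print what a real application would do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VoiceCommandSketch {

    // Command word -> action to run when the recognizer returns that word
    private static final Map<String, Runnable> ACTIONS = new LinkedHashMap<>();

    /** Registers an action for a recognized command word. */
    public static void register(String commandWord, Runnable action) {
        ACTIONS.put(commandWord, action);
    }

    /** Runs the action for the recognized text; returns false if none is registered. */
    public static boolean dispatch(String recognizedText) {
        Runnable action = ACTIONS.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        // Placeholder command words; a real application would use Bengali words
        register("kholo", () -> System.out.println("open My Computer"));
        register("bam", () -> System.out.println("left click"));
        register("dan", () -> System.out.println("right click"));

        dispatch("bam");
        if (!dispatch("ojana")) {
            System.out.println("unknown command");
        }
    }
}
```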

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small amount of data. We selected university admission information for newly admitted students as the domain, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and the exact number of phonemes in Bengali is unknown; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, but it sometimes fails when the speech is slow or the pronunciation is poor. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not yet work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once the phoneme set is accurately known.
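The text does not say what the encoding problem is. One common cause in Java, assumed here rather than confirmed by the source, is reading or writing Bengali text with the platform default charset instead of UTF-8. The sketch below shows a word list being written and read back with an explicit UTF-8 charset; the class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Utf8WordListSketch {

    /** Reads non-empty, trimmed lines from a UTF-8 encoded file. */
    public static List<String> readWords(Path file) {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    /** Writes the words to a temporary UTF-8 file and reads them back. */
    public static List<String> roundTrip(List<String> words) {
        try {
            Path tmp = Files.createTempFile("words", ".txt");
            Files.write(tmp, words, StandardCharsets.UTF_8);
            List<String> result = readWords(tmp);
            Files.delete(tmp);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "naam" written with Unicode escapes so the source stays ASCII-only
        List<String> words = Arrays.asList("\u09A8\u09BE\u09AE");
        System.out.println(roundTrip(words).equals(words)); // true when the text survives intact
    }
}
```

If the database connection is also involved, its character encoding has to be UTF-8 as well; the connection URLs in the project code already request this.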

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the amount of data in order to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for creating the dictionary file, the fileids file and the transcription file. We want to build a user-friendly, stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. At present, training and creating a model still depend on the developer, so we plan to build an automatic trainer that can be used by an ordinary user. With this trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate the system into various document applications so that a Bengali sentence can be written simply by uttering it. We want to build a voice response application in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a large amount of audio, so we want to develop a website for collecting audio from people and use the collected audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. Building a good and complete recognition application requires many improvements, such as a large training database, many speakers and audio with low noise. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শনয 0

এক 1

দই 2

তিন 3

চার 4

পাচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of the short time available only 50 of them were used for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

// Punctuation lost in extraction has been restored. The '/' path separators and
// the dots in the package and file names are assumed.
package sbs.BSR.trainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates the files of all folders, so k keeps counting
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collects sub-folder names (levels 1 and 2) and file names (level 3)
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

........................................

ARRAY

........................................

package sbs.BSR.trainingfilescreator; // dot positions assumed

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "name10", "name2" by their numeric suffix
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

........................................

TRANSCRIPT CREATOR

package sbs.BSR.trainingfilescreator; // dot positions assumed

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads all sentences of the corpus file into inputLines
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

........................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list; path separators and file-name dots are assumed
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary; path separators and file-name dots are assumed
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

........................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line (word followed by its pronunciation) per input word
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    // Punctuation in the URL was lost in extraction and is restored by assumption
    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is a consonant (ids 13 to 48 in table banglatab)
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

........................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Braces and punctuation were lost in extraction; the nesting below is reconstructed.
public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal was lost in extraction; the hasanta sign is assumed
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions of every conjunct: previous letter, joining sign, next letter
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record the positions that do not belong to a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " " + "serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the pronunciation of every letter or conjunct in the database
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

…………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "…";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "…\n" +
                "…\n" +
                "…\n" +
                "…\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "…";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "…\n");
    }

    // Maps a recognized Bengali command to the corresponding desktop action
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals("…")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------


1.3.7 Overview of Speech Recognition System

Fig. 1.3.7: Overview of Speech Recognition System

1.4 Terms and Concepts

The following basic terms and concepts are fundamental to speech recognition, and a good understanding of them is important [9][10].

1.4.1 Utterances

An utterance is something you say. It can be one word or a series of words; for example, “Word”, “Microsoft Word”, and “I’d like to run Microsoft Word” are all possible utterances. More precisely, an utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence is almost as important in speech recognition as what is spoken, because silence delineates the start and end of an utterance. The speech recognition engine listens for speech input. When the engine detects audio input (in other words, a lack of silence), the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.
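The silence-based endpointing described above can be sketched as follows. This is only a minimal illustration, not the detector used by the speech engine in this project: the class name SilenceEndpointer, the frame size, the amplitude threshold, and the minimum number of silent frames are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical energy-based endpointer: splits a sample stream into utterances at silences. */
class SilenceEndpointer {

    /**
     * Returns {startFrame, endFrameExclusive} pairs, one per utterance.
     * A frame counts as speech when its mean absolute amplitude exceeds the threshold;
     * an utterance ends once minSilenceFrames consecutive silent frames follow it.
     */
    static List<int[]> findUtterances(short[] samples, int frameSize,
                                      double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int frames = samples.length / frameSize;
        int start = -1;     // first frame of the current utterance, or -1 if none is open
        int silentRun = 0;  // consecutive silent frames seen inside the current utterance
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int j = 0; j < frameSize; j++) {
                sum += Math.abs(samples[f * frameSize + j]);
            }
            boolean speech = sum / frameSize > threshold;
            if (speech) {
                if (start < 0) {
                    start = f;          // lack of silence: an utterance begins
                }
                silentRun = 0;
            } else if (start >= 0 && ++silentRun >= minSilenceFrames) {
                // enough silence: the utterance ended where the silent run began
                utterances.add(new int[] {start, f - silentRun + 1});
                start = -1;
                silentRun = 0;
            }
        }
        if (start >= 0) {
            // the audio ended while an utterance was still open
            utterances.add(new int[] {start, frames - silentRun});
        }
        return utterances;
    }
}
```

Each returned pair marks one stream of speech between two periods of silence, which is the unit that would be handed to the recognizer.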

1.4.2 Pronunciation

The word pronunciation comes up whenever a language is being learned. What is pronunciation, and what are the fundamental aspects of this important part of learning a language such as English? In any language, pronunciation refers to the sounds that are produced to make meaning. Some aspects of speech go beyond the individual sounds and make a language unique: phrasing, stress, intonation, timing, and rhythm. The voice is then projected to communicate what the speaker wants to say. Add to that cultural nuances, gestures, and local expressions, and the way you speak immediately tells the people around you something about yourself. When you are just learning a new language it is tempting to avoid speaking in public, but that is not the best choice because it leads to social isolation. It may not seem fair, but people are judged by the way they speak and can be seen as uneducated, incompetent, or lacking knowledge, all because the listener is reacting to the pronunciation and not to what is being communicated.

The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the engine uses to process a word is its pronunciation, which represents what the engine thinks the word should sound like. A word can have multiple pronunciations associated with it. For example, the word “the” has at least two pronunciations in US English: “thee” and “thuh”.

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly, so speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike in a normal dictionary, an entry does not have to be a single word; it can be as long as a sentence or two. Small vocabularies can have as few as one or two recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable, and accuracy is perhaps the most widely used measurement. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said “yes”, the engine returned “yes”, and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said “nope”, the engine returned “no”, and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
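One common way to calculate accuracy at the word level is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. The sketch below illustrates that calculation; it is not the scoring code used in this project, the class and method names are hypothetical, and it assumes a non-empty reference sentence.

```java
/** Hypothetical word-error-rate calculator based on the Levenshtein distance over words. */
class WordErrorRate {

    /** Minimum number of word substitutions, insertions and deletions turning ref into hyp. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) {
            d[i][0] = i;    // delete every remaining reference word
        }
        for (int j = 0; j <= hyp.length; j++) {
            d[0][j] = j;    // insert every remaining hypothesis word
        }
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitute = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** Word error rate of a recognized sentence against the reference sentence. */
    static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return (double) editDistance(ref, hyp) / ref.length;
    }
}
```

Word accuracy is then 1 minus the word error rate. Note that this measure is stricter than the task-level view above: the “nope”/“no” dialog counts as one word error even though the desired action was executed.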

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). For English, a transcription system that uses ASCII letters is used, whereas Bangla uses Unicode letters.

1.4.10 HMM

A Hidden Markov Model can be seen as a finite state machine in which each observation in a sequence corresponds to a state transition, and each state emits an output symbol. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore “hidden” from the outside, hence the name Hidden Markov Model.

Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after the voice is recorded it is divided into many frames, which must be processed to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model, we store reference values for states, phonemes, and words to compare with the observed data (voice). By applying HMMs, we construct a statistical model of each phone whose states are assigned probabilities by comparison with the reference values. The probability of each state depends on itself and the previous state. The goal of the speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite involved, we do not go into detail here; more can be found in Appendix A.

Fig. 1.4.10: Applying Hidden Markov Model on Speech Recognition
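Finding the state sequence with the maximum probability is usually done with the Viterbi algorithm. The following is a minimal sketch for a small discrete HMM, given only to make the idea concrete: the class name and array layout are assumptions, and a real recognizer such as Sphinx works on continuous acoustic features with a far larger search space.

```java
/** Hypothetical Viterbi decoder for a small discrete HMM, working in log probabilities. */
class Viterbi {

    /**
     * @param init  init[s]     = P(first state is s)
     * @param trans trans[r][s] = P(next state is s | current state is r)
     * @param emit  emit[s][o]  = P(observation o | state s)
     * @param obs   observation indices, one per frame (must not be empty)
     * @return the most probable state sequence
     */
    static int[] decode(double[] init, double[][] trans, double[][] emit, int[] obs) {
        int n = init.length;
        int t = obs.length;
        double[][] score = new double[t][n]; // best log probability of a path ending in s at time k
        int[][] back = new int[t][n];        // predecessor state on that best path
        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(init[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int k = 1; k < t; k++) {
            for (int s = 0; s < n; s++) {
                int best = 0;
                for (int r = 1; r < n; r++) {
                    if (score[k - 1][r] + Math.log(trans[r][s])
                            > score[k - 1][best] + Math.log(trans[best][s])) {
                        best = r;
                    }
                }
                score[k][s] = score[k - 1][best] + Math.log(trans[best][s])
                        + Math.log(emit[s][obs[k]]);
                back[k][s] = best;
            }
        }
        // pick the best final state, then follow the back-pointers
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[t - 1][s] > score[t - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[t];
        path[t - 1] = last;
        for (int k = t - 1; k > 0; k--) {
            path[k - 1] = back[k][path[k]];
        }
        return path;
    }
}
```

Because each state's score depends only on the previous frame, the search is linear in the number of frames rather than exponential, which is what makes the maximum-probability path practical to compute.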

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty assigned when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words can follow previously recognized words and significantly restricts the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
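To make the n-gram idea concrete, the sketch below estimates bigram probabilities from a handful of sentences by simple counting. It is an illustration only: the class name is hypothetical, the sample sentences in the usage note are invented, and the estimate is a plain maximum-likelihood ratio without the smoothing and back-off that real toolkits apply.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical maximum-likelihood bigram model built from whitespace-separated sentences. */
class BigramModel {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    /** Counts the words and adjacent word pairs of one sentence, with <s> and </s> markers. */
    void addSentence(String sentence) {
        String[] w = ("<s> " + sentence.trim() + " </s>").split("\\s+");
        for (int i = 0; i < w.length; i++) {
            unigrams.merge(w[i], 1, Integer::sum);
            if (i > 0) {
                bigrams.merge(w[i - 1] + " " + w[i], 1, Integer::sum);
            }
        }
    }

    /** P(next | prev) = count(prev next) / count(prev); 0 when prev was never seen. */
    double probability(String prev, String next) {
        int prevCount = unigrams.getOrDefault(prev, 0);
        if (prevCount == 0) {
            return 0.0;
        }
        return (double) bigrams.getOrDefault(prev + " " + next, 0) / prevCount;
    }
}
```

After adding the sentences “open notepad”, “open browser”, and “close notepad”, the model gives “notepad” a probability of 0.5 after “open” and gives “browser” a probability of 0 after “close”, so the decoder would not consider that word there.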

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to prepare several files that are required for training and for testing. As mentioned earlier, our project is a domain-based recognition application; domain-based means a particular topic with a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

• Corpus
• Audio files
• Dictionary file
• Phone file
• Language Model file (.lm format)
• Language Model file (DMP format)
• Transcription file
• Fileids file
• Filler file

2.1.1 Corpus

The corpus is simply a list of sentences used to train the language model; in other words, it is the collection of sentences that we want our machine to recognize. For our project we collected a set of important sentences from our domain. Some sentences from our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio for it in (.wav) or (.sph) format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)

Fig. 2.1.2: Audio File Recording Format

A 16 kHz sampling rate was chosen because it captures frequencies up to 8 kHz and therefore preserves high-frequency information more accurately than lower rates, and 16 bits per sample divides the amplitude range into 65,536 possible values. After recording, the audio was split into one file per sentence manually using recording software; for our project we used the WavePad sound editor. Each sound file is in .wav format and is named using the speaker ID and the sentence ID. For example, the audio file

01_01.wav

stands for Speaker ID 01, Sentence ID 01.

When we collected audio from a speaker, we saved the personal information of that speaker:

• Name, age, gender, and the environment in which the audio was collected

Some other information was also noted down:

• Environmental conditions of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or a generator)
• Technical details of the devices (PC, microphone)
• Date and time of recording
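The recording parameters and the naming scheme above can be checked programmatically before training. The following is a rough sketch using the standard Java Sound API; the class RecordingFormat and its method names are hypothetical and are not part of the project's code.

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper that checks a recording against the parameters listed above. */
class RecordingFormat {

    /** True when the format is 16 kHz, 16 bits per sample, mono. */
    static boolean matchesProjectFormat(AudioFormat format) {
        return format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    /** Number of distinct amplitude values for a given bit depth, e.g. 65536 for 16 bits. */
    static int quantizationLevels(int bitsPerSample) {
        return 1 << bitsPerSample;
    }

    /** Splits a name such as "01_01.wav" into {speakerId, sentenceId}. */
    static String[] speakerAndSentence(String fileName) {
        String base = fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
        String[] parts = base.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <speaker>_<sentence>: " + fileName);
        }
        return parts;
    }
}
```

For a file on disk, the format can be obtained with AudioSystem.getAudioInputStream(file).getFormat() and passed to matchesProjectFormat.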

2.1.3 Dictionary file

The dictionary file is simply the list of words obtained from our corpus file, together with the pronunciation of each word, such as AA P NA KE. This requires software that turns a word list into a pronunciation file. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. We therefore developed our own software, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The file format is (.dic), and our dictionary file is named sbsbsr.dic. Some pronunciation entries from our dictionary file are:

AA CH E
AA P N AA K E
AA P N I
AA M I
I CCHA U K
…

Note:

• All phonemes are in capital letters, such as AA P N I.
• The file format is (.dic).
• The file encoding is UTF-8 without BOM.
• A word cannot be repeated.
• A blank line is required at the end of the file (i.e. an extra line).

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, “AA P N I” contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A
AA
B
BH
C
CCHA
CH
…

Note:

• All phones are in capital letters.
• The file format is (.phone).
• The file encoding is UTF-8 without BOM.
• A phone cannot be repeated.
• A blank line is required at the end of the file (i.e. an extra line).
• The silence phoneme “SIL” is also included in the phone file.
• All phonemes in the .dic file are present in the phone file, without repetition.
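The conversion from dictionary entries to a phone list can be sketched as below. This is not the program written for the project, only a small illustration of the rules in the notes above; the class name is hypothetical, and it assumes each dictionary line has the form "word PH1 PH2 ..." with fields separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical builder that derives the phone list from dictionary lines. */
class PhoneListBuilder {

    /**
     * Returns the sorted, de-duplicated phones used in the dictionary, plus SIL.
     * The first field of every line is the word itself and is skipped.
     */
    static List<String> phonesFrom(List<String> dictionaryLines) {
        TreeSet<String> phones = new TreeSet<>();
        phones.add("SIL"); // the silence phone must always be present
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i].toUpperCase(Locale.ROOT));
            }
        }
        return new ArrayList<>(phones);
    }
}
```

Writing the returned list one phone per line, followed by a final blank line, gives a file in the format described above.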

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text in the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2-, and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. It requires a corpus, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus originally had to be an ASCII text file, but newer versions also support Unicode text, with one sentence per line. Upload this file and click the compile button; the tool returns a set of lexical (pronunciation dictionary) and language modeling files. Only the LM file is used here, since the pronunciation dictionary is built as described above. The tool works best for small domains. For our project the file name is sbsbsr.lm and the file format is (.lm). Some contents of the language model file for our project are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
…

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used a Linux environment to convert the language model file from .lm format to DMP format, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project the command is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file, each utterance is written exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named after the project; here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> … </s> (01_01)
<s> … </s> (01_02)
<s> … </s> (01_03)
<s> … </s> (01_04)
…

Note:

• The file format is (.transcription).
• The file encoding is UTF-8 without BOM.
• A sentence cannot be repeated.
• A blank line is required at the end of the file (i.e. an extra line).
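The line layout described above is simple enough to generate automatically from a list of sentences and file IDs. The helper below is a small sketch of that layout only; the class name is hypothetical and the sentence text in the usage note is a placeholder, not taken from the project corpus.

```java
/** Hypothetical formatter for one line of a transcription file. */
class TranscriptLine {

    /** Wraps the sentence in <s> ... </s> and appends the utterance ID in parentheses. */
    static String format(String sentence, String utteranceId) {
        return "<s> " + sentence.trim() + " </s> (" + utteranceId + ")";
    }
}
```

For instance, TranscriptLine.format("word1 word2", "01_01") yields the line <s> word1 word2 </s> (01_01).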

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_trainsanjoysanjoy101_01

sbsbsr_ train sanjoysanjoy101_02

sbsbsr_ train sanjoysanjoy101_03

…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise that occurs in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s>, and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

• <s> symbolizes “beginning of speech”
• </s> symbolizes “end of speech”
• <sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: sphinxbase and sphinxtrain. We obtained them from the CMU Sphinx web site. Before installing these two packages, some dependencies must be installed on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two software packages, sphinxbase and sphinxtrain. After downloading them, we decompressed them into a folder in Linux; it can be any folder, and we used the Linux root folder. After decompressing, the packages can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which sphinxtrain will create the trained files, or acoustic model. First we entered the root directory where the installed sphinxbase and sphinxtrain folders are placed and created a folder there named “sbsbsr”. We then opened a terminal, changed to the sbsbsr folder, and created a project task for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as “etc”, “wav”, and “model_parameters”.

Next, the files created in the data preparation part are copied into place. The dictionary, filler, phone, transcription, fileids, .lm, and DMP language model files go into the “etc” folder, and the collected audio goes into the “wav” folder. Some training parameters must then be changed in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is created. We changed the following parameters in sphinx_train.cfg:

Parameter                   Value Before      Value After
$CFG_WAVFILE_EXTENSION      sph               wav
$CFG_WAVFILE_TYPE           nist/mswav/raw    raw
$CFG_FEATURE                2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                 1
$CFG_STATESPERHMM           6                 3
$CFG_N_TIED_STATES          100               100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

224 Training the Acoustic Model

In training process first we have to convert our collected raw speech audio data into mfc files

Thatrsquos why again we opened project directory in Linux terminal and ran the feature extraction

command in terminal Command we executed for this task is [14]

perl scripts_plmake_featspl -ctl etcsbsbsr_trainfileids

This command converts all wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

We are now ready to run the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, while Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First, we download Pocketsphinx from the CMU Sphinx site and extract the downloaded archive into the root directory where sphinxbase and sphinxtrain are installed. We then install the software with the following commands in the Linux terminal:

./configure

make

After installing the software, we go into our project folder and run the following command from the terminal to create the required folder structure:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

We then put some test audio in the "wav" directory, because Pocketsphinx recognizes speech from audio files, and copy the test fileids and transcript files into the "etc" folder. Before decoding, we need feature files for the test audio, which we create with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that, we execute the main decoding (testing) command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The test results are written to the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed for this: Sphinx4 and the Eclipse IDE. After installing Eclipse on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. We then create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that, we add the following files from our training project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

dic

lm.DMP

filler

We create a config.xml file in the Eclipse project to configure the recognizer and to tell it where the required model files are located. This config.xml can be based on the config file of the Sphinx4 demo application. We also add four Java jar files from the sphinx4/lib directory to our project: js.jar, jsapi.jar, sphinx4.jar, and tags.jar. The Java project is then ready to build and run. After building the project, we run it and test it with live voice input from a microphone. For our model trained on ten sentences, the result is shown below.

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room, and a common room. We completed our testing using audio input from six test speakers. For live testing, we used a microphone in different environments [9]. We carried out the tests with two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment  Data set  Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error    Accuracy
01          Trained   5         4     1       1025         975      98      95.12%           9.56%    90.44%
02          Trained   3         3     0       615          591      39      96.10%           6.34%    93.66%
03          Trained   3         3     0       615          601      22      97.72%           3.58%    96.42%
04          Trained   5         2     3       1025         938      184     91.51%           17.95%   82.05%
05          Trained   3         0     3       615          545      125     88.62%           20.33%   79.67%

Table 3.2.1: Experimental details with Results for Pocket Sphinx

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
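The three figures reported for each Pocket Sphinx experiment can be derived from the word counts. The following is a minimal sketch, not part of the project code; the class and method names are illustrative. It assumes that percent correct is the share of correctly recognized words, that error is the number of errors divided by the total number of words, and that accuracy is 100 minus that error rate. Under this assumption, the error count can exceed the number of words that were not recognized correctly, presumably because inserted words are also counted as errors.

```java
import java.util.Locale;

public class RecognitionMetrics {

    /** Share of reference words that were recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    /** Number of errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy as assumed in this sketch: 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // {total words, correct, errors} for the five Pocket Sphinx experiments.
        int[][] experiments = {
            {1025, 975, 98}, {615, 591, 39}, {615, 601, 22},
            {1025, 938, 184}, {615, 545, 125}
        };
        double sum = 0.0;
        for (int n = 0; n < experiments.length; n++) {
            int[] e = experiments[n];
            double acc = accuracy(e[0], e[2]);
            sum += acc;
            System.out.printf(Locale.ROOT,
                "Experiment %02d: correct %.2f%%, error %.2f%%, accuracy %.2f%%%n",
                n + 1, percentCorrect(e[0], e[1]), errorRate(e[0], e[2]), acc);
        }
        System.out.printf(Locale.ROOT, "Average accuracy: %.2f%%%n",
            sum / experiments.length);
    }
}
```

Because the average here is taken over unrounded values, it can differ from the reported average in the last digit.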

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment  User type  Speakers  Environment        Speaker type  Words  Correct  Errors  Percent correct  Error  Accuracy
01          Trained    2         Closed Room        Male          156    154      2       98%              2%     98%
02          Untrained  3         Lab Room           Male          130    120      10      90%              10%    90%
03          Untrained  3         University Campus  Male          140    122      18      82%              18%    82%
04          Untrained  2         University Campus  Female        108    95       13      87%              13%    87%
05          Trained    3         University floor   Female        126    115      11      89%              11%    89%
06          Trained    2         Closed Room        Male          128    127      1       99%              1%     99%

Input device for all experiments: Microphone

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment  Data set  Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error    Accuracy
01          Trained   3         3     0       210          194      16      85.71%           15.71%   85.71%
02          Trained   3         2     1       210          188      22      77.14%           22%      77.14%
03          Trained   3         2     1       210          182      28      72.38%           28%      72.38%
04          Trained   3         2     1       210          194      16      84.44%           16%      84.44%
05          Trained   3         1     2       210          184      26      74.76%           26%      74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a file called dic, which contains all training words and their pronunciations. While experimenting with acoustic model training, we had to write these pronunciations by hand several times. This led us to the idea of an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First, we built a database that stores all phonemes and their corresponding letters, using the IPA chart and various thesis papers [1] [9] [10] as sources. However, different sources define phonemes in different ways, and phonemes are not defined for every letter. We therefore defined phonemes ourselves for some consonants and conjunct letters. With this database in place, we built the phonetic translator, which produces the phonetic translation of Bengali words by looking them up in the database.

Fig 4.1.2: Dictionary files with phonetic translation
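The letter-to-phoneme lookup described in this section can be illustrated with a small sketch. It is not the project's implementation: the project stores the mapping in a database, whereas this sketch assumes an in-memory map, and the class and method names are illustrative. The eight entries are taken from the consonant part of the Unicode to IPA Chart in the appendices. Vowel signs, inherent vowels, and conjuncts are deliberately ignored here; unmapped characters are simply skipped.

```java
import java.util.HashMap;
import java.util.Map;

public class PhonemeLookupSketch {

    // A few consonants and their phoneme symbols, as listed in the appendix chart.
    static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K");   // ka
        PHONEMES.put('\u0996', "KH");  // kha
        PHONEMES.put('\u0997', "G");   // ga
        PHONEMES.put('\u09A8', "N");   // na
        PHONEMES.put('\u09AC', "B");   // ba
        PHONEMES.put('\u09AE', "M");   // ma
        PHONEMES.put('\u09B0', "R");   // ra
        PHONEMES.put('\u09B2', "L");   // la
    }

    /** Returns the space-separated phonemes of all mapped letters in the word. */
    static String toPhonemes(String word) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phoneme = PHONEMES.get(word.charAt(i));
            if (phoneme == null) {
                continue; // letter not covered by this reduced table
            }
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(phoneme);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The letters ka, la, ma in sequence.
        System.out.println(toPhonemes("\u0995\u09B2\u09AE"));
    }
}
```

A dictionary line would then consist of the word followed by this phoneme string. A real translator additionally has to handle vowels and conjunct letters, which is where the differing phoneme definitions mentioned above become relevant.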

4.1.3 Training File Creator

Training the acoustic model also requires the fileids and transcript files. These two files contain the paths of the training audio files and their corresponding sentences. Before we created this program, we had to write both files by hand. With our software, we can generate these two large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in fileids, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically when we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
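The line formats of the two files can be sketched as follows. This is a minimal illustration rather than the project's own training file creator, which is listed in the appendices. It assumes the usual SphinxTrain convention that a fileids line is the audio path without its extension and that a transcript line wraps the sentence in <s> and </s> followed by the utterance name in parentheses. The folder name, file names, and sentences are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.List;

public class TrainingFileSketch {

    /** Drops a trailing ".wav" so the entry names the utterance, not the file. */
    static String stripWav(String fileName) {
        return fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
    }

    /** One fileids line: speaker folder plus file name without extension. */
    static String fileidsLine(String speakerDir, String wavName) {
        return speakerDir + "/" + stripWav(wavName);
    }

    /** One transcript line: the sentence between the markers, then the utterance name. */
    static String transcriptLine(String sentence, String wavName) {
        return "<s> " + sentence.trim() + " </s> (" + stripWav(wavName) + ")";
    }

    public static void main(String[] args) {
        // Hypothetical speaker folder, file names, and placeholder sentences.
        String speakerDir = "speaker_1";
        List<String> wavNames = Arrays.asList("s1_001.wav", "s1_002.wav");
        List<String> sentences = Arrays.asList("sentence one", "sentence two");
        for (int i = 0; i < wavNames.size(); i++) {
            System.out.println(fileidsLine(speakerDir, wavNames.get(i)));
            System.out.println(transcriptLine(sentences.get(i), wavNames.get(i)));
        }
    }
}
```

Generating both lines from the same file name keeps the order of the two files in step, which matters when thousands of entries are produced automatically.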

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, it performs some common tasks such as opening My Computer, left click, and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific areas. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for new students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, but it sometimes fails when the speaker speaks slowly or mispronounces words. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once an accurate phoneme set is available.
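The report does not state the cause of the encoding problem. One common source of such problems with Bengali text in Java is reading or writing files with the platform default character set. The following sketch shows, under that assumption only, how a word list can be read and written with an explicit UTF-8 charset. The class and method names are illustrative and not part of the project code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Utf8WordListSketch {

    /** Reads non-empty, trimmed lines from a file that is decoded as UTF-8. */
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }

    /** Writes one word per line, encoded as UTF-8. */
    static void writeWords(Path file, List<String> words) throws IOException {
        Files.write(file, words, StandardCharsets.UTF_8);
    }

    /** Writes the words to a temporary file and checks that they read back unchanged. */
    static boolean roundTrips(List<String> words) {
        try {
            Path tmp = Files.createTempFile("words", ".dic");
            try {
                writeWords(tmp, words);
                return readWords(tmp).equals(words);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two short Bengali letter sequences written with Unicode escapes.
        System.out.println(roundTrips(Arrays.asList("\u0995\u09B2\u09AE", "\u09AC\u09B2")));
    }
}
```

Fixing the charset on both the reading and the writing side makes the result independent of the operating system settings, which is relevant here because training was done on Linux and testing on Windows.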

Future Works

We have implemented Bengali speech recognition for a small data set. In the future, we will increase the size of our data to create a complete model, and we plan to improve its ability to recognize speech accurately and to extend its vocabulary. We have already developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly stand-alone GUI application for writing in Bengali, and we also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate this system into document applications so that Bengali sentences can be written simply by speaking them. We want to build a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use this audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. In this report we have discussed our objectives, the tools we used, and the speech recognition process. Our tools are still at a preliminary level. Building a good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. However, if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.


References

[1] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman, and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal S. O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti, and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Folder names found at the first two levels below the training root,
    // and the audio file names found at the third level.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        // Path separators were lost in the printed listing; "/" is assumed here.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int i = 0, j = 0, k = 0;
        while (dirTreeLevel1.size() > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            while (dirTreeLevel2.size() > j) {
                String path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing across folders, so k continues
                // from the first file that has not been written yet.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, dirTreeRootName + "sbs_asr_train.fileids");
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            transcriptCreator.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Collects sub-folder names (levels 1 and 2) or file names (level 3) of a folder.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a folder
        }
        int numofL_I = 0;
        while (listOfFiles.length > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        int end = UserDir.lastIndexOf('/');
        int start = UserDir.lastIndexOf('/', end - 1);
        return UserDir.substring(start + 1, end);
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter(path, true));
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

....................................

ARRAY

....................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as name1, name2, ..., name10 by their number.
    // All names are assumed to share the prefix of the first entry.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep only the digits
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Folder name without its number.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-zA-Z ]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

....................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences of the corpus, one per line.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            br.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file name in data.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter(path, true));
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            while (data.size() > i) {
                // Start again from the first sentence for the next speaker.
                if (lineStart == lineEnd) {
                    lineStart = 0;
                }
                String wavRemove = data.get(i).replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString().trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

....................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        try {
            // Path separators were lost in the printed listing; "/" is assumed here.
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str + " " + str.length());
            }
            br.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output dic file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

....................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password="
            + "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and the connection, reporting any failure.
    private static void close(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            close(stmt, con);
        }
    }

    // Returns true if the letter is stored as a consonant (ids 13 to 48) in banglatab.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            close(stmt, con);
        }
        return isBBorno;
    }
} // class end

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Hasanta (U+09CD) joins consonants into a conjunct. The literal was lost
    // in this listing, so the value used here is an assumption.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions around every conjunct marker: the letter
        // before it, the marker itself, and the letter after it.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Append consecutive conjunct positions to one search string.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // Look up the pronunciation of every search string and join the results.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        // The Bengali sample sentences were lost in this listing; "…" marks each gap.
        System.out.println("Sample sentences:\n" +
                "…\n" +
                "…\n" +
                "…\n" +
                "…\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Maps recognized commands to mouse and keyboard actions through java.awt.Robot.
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    // The methods below open a URL in the default browser (Windows only).
    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        // The Bengali sample sentence was lost in this listing; "…" marks the gap.
        System.out.println("Sample sentences:\n" +
                "…\n");
    }

    // Runs the desktop action that matches the recognized text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command text was lost in this listing; "…" marks the gap.
        if (CompareText.equals("…")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
        // …
    }
}

recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in US English: “thee” and “thuh”.

1.4.3 Grammars

Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly, so speech engines usually have a very large grammar.

1.4.4 Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry does not have to be a single word; an entry can be as long as a sentence or two. Smaller vocabularies can have as few as one or two recognized utterances (e.g. “Wake Up”), while very large vocabularies can have a hundred thousand or more.

1.4.5 Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy. Training can also be used by speakers who have difficulty speaking or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

1.4.6 Accuracy

The ability of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances. The performance of a speech recognition system is measurable, and accuracy is perhaps the most widely used measurement. It is typically a quantitative measurement and can be calculated in several ways. This measurement is useful in validating application design. For example, if the user said “yes”, the engine returned “yes”, and the YES action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said “nope”, the engine returned “no”, and the NO action was executed? Should that be considered a successful dialog? The answer is yes, because the desired result was achieved.
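One common way to put a number on recognition accuracy is the word error rate (WER): the word-level edit distance between what was said and what the engine returned, divided by the number of words in the reference. The sketch below is illustrative only and is not part of the project code; the class and method names are assumptions. Note that a strict word-level measure like this one would count the “nope”/“no” case as an error even though the dialog succeeded.

```java
// Illustrative sketch (not from the thesis code): word error rate via edit distance.
class WordErrorRate {

    /** Minimum number of word substitutions, insertions and deletions turning ref into hyp. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // delete all reference words
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insert all hypothesis words
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[ref.length][hyp.length];
    }

    /** WER = edit distance / number of reference words. */
    static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        // One of four reference words is missing from the hypothesis.
        System.out.println(wer("open the new file", "open new file")); // prints 0.25
    }
}
```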

1.4.7 A Language Dictionary

Accepted words in the language are mapped to sequences of sound units representing their pronunciation; the mapping sometimes also includes syllabification and stress.

1.4.8 A Filler Dictionary

Non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines in which each observation in the sequence is accompanied by a state transition, and each state emits an output symbol. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore “hidden” from the outside, hence the name Hidden Markov Model.

Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after our voice is recorded it is divided into many frames, which we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model we store the reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying HMMs, we construct a statistical model for each phone whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on itself and the previous one. The goal of a speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite involved, we do not go into detail here; more information can be found in Appendix A.
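Finding the state sequence with the maximum probability is usually done with the Viterbi algorithm. The following is a minimal sketch of that idea for a discrete HMM, not the decoder used by Sphinx; the class name, the parameter layout and the toy numbers in `main` are assumptions chosen for illustration.

```java
import java.util.Arrays;

// Illustrative sketch (not from the thesis code): Viterbi decoding for a small discrete HMM.
class ViterbiSketch {

    /**
     * Returns the most probable state sequence for the observations.
     * start[s] is the initial probability of state s, trans[r][s] the probability
     * of moving from r to s, and emit[s][o] the probability that s emits symbol o.
     */
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = start.length, T = obs.length;
        if (T == 0) return new int[0];
        double[][] score = new double[T][n]; // best log-probability of a path ending in s at time t
        int[][] back = new int[T][n];        // predecessor of s on that best path
        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < T; t++) {
            for (int s = 0; s < n; s++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int r = 0; r < n; r++) {
                    double v = score[t - 1][r] + Math.log(trans[r][s]);
                    if (v > bestScore) { bestScore = v; best = r; }
                }
                score[t][s] = bestScore + Math.log(emit[s][obs[t]]);
                back[t][s] = best;
            }
        }
        // Pick the best final state, then follow the back-pointers.
        int last = 0;
        for (int s = 1; s < n; s++) if (score[T - 1][s] > score[T - 1][last]) last = s;
        int[] path = new int[T];
        path[T - 1] = last;
        for (int t = T - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        System.out.println(Arrays.toString(decode(new int[]{0, 1, 2}, start, trans, emit))); // prints [0, 0, 1]
    }
}
```

Working in log-probabilities avoids numeric underflow when the observation sequence is long, which matters for speech, where an utterance spans many frames.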

Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which word could follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by finite state automata, sometimes with weights.
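As a rough illustration of the “statistics of word sequences” held by an n-gram model, the sketch below counts bigrams in a few sentences and estimates the probability of a word given the previous word. It is a simplified maximum-likelihood estimate without the smoothing and back-off that real toolkits apply; the class name and the sample sentences are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not from the thesis code): a maximum-likelihood bigram model.
class BigramSketch {
    private final Map<String, Integer> historyCount = new HashMap<>();
    private final Map<String, Integer> bigramCount = new HashMap<>();

    /** Adds one sentence, wrapped in the sentence markers <s> and </s>. */
    void addSentence(String sentence) {
        String prev = "<s>";
        for (String w : (sentence + " </s>").trim().split("\\s+")) {
            historyCount.merge(prev, 1, Integer::sum);
            bigramCount.merge(prev + " " + w, 1, Integer::sum);
            prev = w;
        }
    }

    /** Estimate of P(next | prev); 0 if prev was never seen as a history. */
    double probability(String prev, String next) {
        int seen = historyCount.getOrDefault(prev, 0);
        if (seen == 0) return 0.0;
        return (double) bigramCount.getOrDefault(prev + " " + next, 0) / seen;
    }

    public static void main(String[] args) {
        BigramSketch lm = new BigramSketch();
        lm.addSentence("open file");
        lm.addSentence("open browser");
        System.out.println(lm.probability("open", "file")); // prints 0.5
    }
}
```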

1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY Data Preparation

Setting up the System Environment

2.1 Data Preparation

We have to prepare several files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application; domain-based means a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is simply a list of sentences used to train the language model; in other words, it is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences according to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio files for it in .wav or .sph format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit depth (bits per sample): 16

• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format

For this work a 16 kHz sampling rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample divides the amplitude range into 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software; for our project we used the WavePad sound editor. Each sound file is in .wav format and is named using the speaker ID and the sentence ID. For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID 01, Sentence ID 01
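The naming scheme and the recording parameters can be checked programmatically before training. The sketch below is not part of the project code; it assumes file names of the form speaker_sentence.wav as described above, and the class and method names are hypothetical. It uses only the standard `javax.sound.sampled.AudioFormat` class.

```java
import javax.sound.sampled.AudioFormat;

// Illustrative sketch (not from the thesis code): checks on recording names and formats.
class RecordingCheck {

    /** Splits a name such as "01_01.wav" into {speakerId, sentenceId}. */
    static String[] parseName(String fileName) {
        String base = fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
        String[] parts = base.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected <speaker>_<sentence>.wav: " + fileName);
        }
        return parts;
    }

    /** True if the format is 16 kHz, 16 bits per sample, mono. */
    static boolean matchesRecordingFormat(AudioFormat format) {
        return format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    public static void main(String[] args) {
        String[] ids = parseName("01_01.wav");
        System.out.println("Speaker " + ids[0] + ", sentence " + ids[1]);
        // 16 kHz, 16-bit, mono, signed, little-endian PCM
        System.out.println(matchesRecordingFormat(new AudioFormat(16000f, 16, 1, true, false)));
    }
}
```

For a real file, the format could be obtained with `AudioSystem.getAudioFileFormat(file).getFormat()` and passed to the same check.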

When we collected audio from a speaker, we also saved the personal information of that speaker, such as:

Name, age, gender, and the environment in which the audio was collected

Some other information was noted down as well:

Environmental conditions of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or a generator)

Technical details of the devices (PC, microphone)

Date and time of recording

2.1.3 Dictionary file

The dictionary file is simply the list of words that we get from our corpus file, together with the pronunciation of each word, such as AA P NA KE. For this work we need software that produces a pronunciation file from a word list. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need an ASCII format. We therefore developed our own software, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The file format is .dic, and our dictionary file is named sbsbsr.dic. Some contents of the dictionary file for our project are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is .dic

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words. For example, “AA P N I” contains four phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is .phone

File encoding is UTF-8 without BOM

A phone cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme “SIL” is also included in the phone file

All phonemes in the .dic file are present in the phone file, without repetition
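The step from the .dic file to the phone file can be sketched as follows. This is not the program written for the project, only a minimal illustration of the same idea: it assumes each dictionary line has the form “WORD PH1 PH2 ...”, and the class name and the placeholder words w1 and w2 are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch (not from the thesis code): derive the phone list from dictionary lines.
class PhoneListSketch {

    /** Collects the distinct phones, sorted, from lines of the form "WORD PH1 PH2 ...". */
    static TreeSet<String> collectPhones(List<String> dicLines) {
        TreeSet<String> phones = new TreeSet<>(); // a set keeps each phone only once
        phones.add("SIL");                        // the silence phone is always listed
        for (String line : dicLines) {
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the word itself; the remaining fields are its phones
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i]);
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        List<String> dic = Arrays.asList("w1 AA P N I", "w2 AA M I");
        for (String phone : collectPhones(dic)) {
            System.out.println(phone); // one phone per line, as the phone file requires
        }
    }
}
```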

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to be able to handle. The corpus needs to be in the form of an ASCII text file with one sentence per line; the newer version of the tool also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here the only file used is the LM file, since the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626

-2.0719 -0.2861

-2.0719 -0.2973

-1.7709 -0.2936

-2.0719 -0.2626

-1.7709 এ -0.2861

-2.0719 -0.2626

-2.0719 ও -0.2973

…
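To make the field layout concrete, the sketch below parses one n-gram line of an ARPA file: a base-10 log probability, the n-gram itself, and an optional back-off weight. It is an illustration rather than project code; the class name is hypothetical, and the sample line assumes the usual ARPA decimal notation.

```java
import java.util.Arrays;

// Illustrative sketch (not from the thesis code): one n-gram entry of an ARPA language model.
class ArpaEntry {
    final double logProb; // log10 probability (first field)
    final String ngram;   // the word or word sequence
    final double backoff; // log10 back-off weight (last field), 0.0 if absent

    ArpaEntry(double logProb, String ngram, double backoff) {
        this.logProb = logProb;
        this.ngram = ngram;
        this.backoff = backoff;
    }

    /** Parses a line holding an n-gram of the given order (1 = unigram, 2 = bigram, ...). */
    static ArpaEntry parse(String line, int order) {
        String[] f = line.trim().split("\\s+");
        if (f.length != order + 1 && f.length != order + 2) {
            throw new IllegalArgumentException("Unexpected field count: " + line);
        }
        double logProb = Double.parseDouble(f[0]);
        String ngram = String.join(" ", Arrays.copyOfRange(f, 1, order + 1));
        double backoff = f.length == order + 2 ? Double.parseDouble(f[order + 1]) : 0.0;
        return new ArpaEntry(logProb, ngram, backoff);
    }

    /** Converts the stored log10 value back to a plain probability. */
    double probability() {
        return Math.pow(10.0, logProb);
    }

    public static void main(String[] args) {
        ArpaEntry e = ArpaEntry.parse("-1.7709 এ -0.2861", 1);
        System.out.println(e.ngram + " p=" + e.probability() + " backoff=" + e.backoff);
    }
}
```

The n-gram order is passed in explicitly because a word may itself look like a number, so the field count alone cannot tell a back-off weight from the last word.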

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for training in Sphinx4. We used a Linux environment to convert the language model file from .lm format to DMP format, with the following command in a Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In the transcription file, each utterance is written exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project; here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> … </s> (01_01)

<s> … </s> (01_02)

<s> … </s> (01_03)

<s> … </s> (01_04)

…

Note:

File format is .transcription

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
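The line layout described above can be generated mechanically from the corpus and the file IDs. The following sketch only illustrates that layout and is not the project's tooling; the class name and the placeholder sentences are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (not from the thesis code): build transcription file content.
class TranscriptionSketch {

    /** Formats one line as "<s> sentence </s> (fileId)". */
    static String line(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    /** Builds the whole file; every line, including the last, ends with a newline. */
    static String build(Map<String, String> sentencesByFileId) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : sentencesByFileId.entrySet()) {
            sb.append(line(e.getValue(), e.getKey())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> corpus = new LinkedHashMap<>(); // keeps insertion order
        corpus.put("01_01", "first sentence");
        corpus.put("01_02", "second sentence");
        System.out.print(build(corpus));
    }
}
```

When writing the result to disk, the file would need to be saved as UTF-8 without a BOM, for example with `Files.write(path, content.getBytes(StandardCharsets.UTF_8))`.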

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the .wav or .sph extension. There are two fileids files, one for training and one for testing. The training file for our project is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For Example

sbsbsr_trainsanjoysanjoy101_01

sbsbsr_ train sanjoysanjoy101_02

sbsbsr_ train sanjoysanjoy101_03

…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise present in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL

<sil> SIL

</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped onto a phone called SIL.

<s> symbolizes “beginning of speech”

</s> symbolizes “end of speech”

<sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: sphinxbase and sphinxtrain. We obtained them from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies in the Ubuntu distribution of Linux, such as Perl and a C compiler (gcc) [14].

We installed these dependencies with the following commands in a Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two packages, sphinxbase and sphinxtrain. After downloading them, we decompressed them into a folder in Linux; it can be any folder, and we used the Linux root folder. After decompressing the packages, we installed them with the following commands in a terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which sphinxtrain will create the trained files, i.e. the acoustic model. First we go to the root directory where the installed sphinxbase and sphinxtrain folders are placed, and create a folder there; we named it “sbsbsr”. After creating the folder, we open a terminal and change into the sbsbsr folder. Then we create a project task for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as “etc”, “wav” and “model_parameters”.

Now it is time to copy the files that we created in the data preparation part. We copy the .dic, .filler, .phone, .transcription, .fileids, .lm and DMP files into the “etc” folder and our collected audio into the “wav” folder. Next we change some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is created. The parameters we changed are listed below:

Parameter                   Value Before        Value After
$CFG_WAVFILE_EXTENSION      sph                 wav
$CFG_WAVFILE_TYPE           nist, mswav, raw    raw
$CFG_FEATURE                2s_c_d_dd           1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                   1
$CFG_STATESPERHMM           6                   3
$CFG_N_TIED_STATES          100                 100

Table 2.2.3: Configuration of sphinx_train.cfg

Bengali Speech Recognition

30

By these changes we have finished the project folder setup
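The parameter changes above can also be applied programmatically. The following is a minimal sketch, assuming that sphinx_train.cfg stores each setting as a Perl-style assignment such as `$CFG_STATESPERHMM = 6;`. The class `TrainConfigPatcher` and its `apply` method are hypothetical helpers written for illustration; they are not part of SphinxTrain or of our project code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SphinxTrain or of the project code.
class TrainConfigPatcher {

    // Returns a copy of the config lines in which every "$NAME = ...;" assignment
    // named in overrides is replaced by "$NAME = <new value>;".
    // Any trailing comment on a replaced line is dropped.
    static List<String> apply(List<String> lines, Map<String, String> overrides) {
        List<String> patched = new ArrayList<String>();
        for (String line : lines) {
            String result = line;
            String trimmed = line.trim();
            for (Map.Entry<String, String> entry : overrides.entrySet()) {
                String variable = "$" + entry.getKey();
                // Require "=" right after the name so that "$CFG_FEATURE"
                // does not also match a longer variable name.
                if (trimmed.startsWith(variable)
                        && trimmed.substring(variable.length()).trim().startsWith("=")) {
                    result = variable + " = " + entry.getValue() + ";";
                    break;
                }
            }
            patched.add(result);
        }
        return patched;
    }

    public static void main(String[] args) {
        // Sample lines written in the assumed Perl assignment style.
        List<String> config = Arrays.asList(
                "$CFG_WAVFILE_EXTENSION = 'sph';",
                "$CFG_STATESPERHMM = 6;",
                "$CFG_FINAL_NUM_DENSITIES = 8;");

        // New values taken from the "Value After" column of the table.
        Map<String, String> overrides = new LinkedHashMap<String, String>();
        overrides.put("CFG_WAVFILE_EXTENSION", "'wav'");
        overrides.put("CFG_STATESPERHMM", "3");
        overrides.put("CFG_FINAL_NUM_DENSITIES", "1");

        for (String line : apply(config, overrides)) {
            System.out.println(line);
        }
    }
}
```

The caller passes each new value exactly as it should appear on the right-hand side, including quotes for string values, so the helper does not need to know which settings are numeric.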

2.2.4 Training the Acoustic Model

The first step of training is to convert the collected raw speech audio into MFC feature files. To do this we opened the project directory in the Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

We are now ready to run the main training command, which creates the acoustic model. Staying in the project directory, we executed the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we downloaded Pocketsphinx from the CMU Sphinx site and extracted it in the root directory where sphinxbase and sphinxtrain are installed. After extracting, we installed it with the following commands in the Linux terminal:

./configure

make

After installing the software, we went into our project folder and executed a command from the terminal to set up the required folder structure:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx recognizes speech from audio files given as input. We also copied the test fileids and transcript files into the "etc" folder. To decode, that is, to test our model on audio with Pocketsphinx, we first need feature files for the audio files. These are created with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we executed the main command for decoding:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The test results are written to the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 itself and the Eclipse IDE. After installing the Eclipse IDE in Windows, we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file to a location of our choice. We then created a new Java project in Eclipse and wrote a Java file based on the demo application from CMU Sphinx. After that, we added the following files from our training project to the Eclipse project [11]:

the "sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

the .dic file

the .lm.DMP file

the .filler file

We created a config.xml file in the Eclipse project to configure the recognizer and to tell it where the required model files are located. This config.xml can be written with the help of the config file in the Sphinx4 demo application. We also added four Java jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building the project, we ran it and tested it with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built several applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room and a common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We carried out the tests with two different decoders:

1. Pocketsphinx

2. Sphinx4

3.2 Test Results with Pocketsphinx

All five experiments used the trained data set.

Experiment  Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct  Error   Accuracy
01          5         4     1       1025         975      98      95.12%           9.56%   90.44%
02          3         3     0       615          591      39      96.10%           6.34%   93.66%
03          3         3     0       615          601      22      97.72%           3.58%   96.42%
04          5         2     3       1025         938      184     91.51%           17.95%  82.05%
05          3         0     3       615          545      125     88.62%           20.33%  79.67%

Table 3.2.1 Experimental details with results for Pocketsphinx

Chart 3.2.2 Experiment results with Pocketsphinx

Average Accuracy = 88.44%
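The percentages reported for Pocketsphinx follow directly from the word counts. The sketch below reproduces them for Experiment 01 (1025 words, 975 correct, 98 errors). It assumes that the error count also includes inserted words, which is consistent with the fact that Errors exceeds Total Words minus Correct, and that accuracy is 100% minus the error percentage. The class and method names are illustrative only.

```java
// Illustrative helper; the class and method names are assumptions,
// not part of the decoder or of the project code.
class RecognitionScore {

    // Share of the reference words that were recognized correctly, in percent.
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    // Errors as a percentage of the reference words.
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    // Accuracy is what remains after subtracting the error percentage.
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01 with Pocketsphinx: 1025 words, 975 correct, 98 errors.
        int total = 1025;
        int correct = 975;
        int errors = 98;
        System.out.println("Percent correct: " + percentCorrect(total, correct));
        System.out.println("Error:           " + errorRate(total, errors));
        System.out.println("Accuracy:        " + accuracy(total, errors));
    }
}
```

Rounded to two decimals, these values match the 95.12%, 9.56% and 90.44% reported for Experiment 01.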

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device was a microphone in all six experiments.

Experiment  User Type  Speakers  Environment        Speaker Type  Words  Correct  Errors  Percent Correct  Error  Accuracy
01          Trained    2         Closed Room        Male          156    154      2       98%              2%     98%
02          Untrained  3         Lab Room           Male          130    120      10      90%              10%    90%
03          Untrained  3         University Campus  Male          140    122      18      82%              18%    82%
04          Untrained  2         University Campus  Female        108    95       13      87%              13%    87%
05          Trained    3         University floor   Female        126    115      11      89%              11%    89%
06          Trained    2         Closed Room        Male          128    127      1       99%              1%     99%

Table 3.3.1.1 Experimental details with results for Sphinx4 live input

Chart 3.3.1.2 Experiment results with Sphinx4 live input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All five experiments used the trained data set.

Experiment  Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct  Error   Accuracy
01          3         3     0       210          194      16      85.71%           15.71%  85.71%
02          3         2     1       210          188      22      77.14%           22%     77.14%
03          3         2     1       210          182      28      72.38%           28%     72.38%
04          3         2     1       210          194      16      84.44%           16%     84.44%
05          3         1     2       210          184      26      74.76%           26%     74.76%

Table 3.3.2.1 Experimental details with results for Sphinx4 audio input

Chart 3.3.2.2 Experiment results with Sphinx4 audio input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a file with the .dic extension, which lists all training words together with their pronunciations. While training acoustic models experimentally, we had to write these pronunciations by hand several times. Over time we thought of an automatic pronunciation, or phonetic translation, generator, and this software is the implementation of that idea. First we built a database that stores all phonemes and their corresponding letters. We consulted the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and not every letter has a defined phoneme. For this reason we defined phonemes ourselves for some consonants and conjunct letters. We then built the phonetic translator, which gives the phonetic translation of Bengali words with the help of this database.

Fig 4.1.2 Dictionary files with phonetic translation
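The letter-to-phoneme lookup at the core of the translator can be illustrated with a much simplified sketch. It uses an in-memory map instead of our database and covers only a few consonants with the symbols listed in the Unicode to IPA chart in the appendix (for example ক as K, ল as L and ম as M). Vowels, vowel signs and conjunct letters are deliberately left out, so the output is not a complete pronunciation. The class `PhonemeLookup` and its method are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration only: a handful of consonants, no vowels or conjuncts.
class PhonemeLookup {

    private static final Map<Character, String> CONSONANTS = new HashMap<Character, String>();

    static {
        // Bengali letters are written as Unicode escapes so that the file
        // compiles regardless of the source encoding.
        CONSONANTS.put('\u0995', "K");   // ka
        CONSONANTS.put('\u0996', "KH");  // kha
        CONSONANTS.put('\u0997', "G");   // ga
        CONSONANTS.put('\u09AE', "M");   // ma
        CONSONANTS.put('\u09B0', "R");   // ra
        CONSONANTS.put('\u09B2', "L");   // la
        CONSONANTS.put('\u09B8', "S");   // sa
        CONSONANTS.put('\u09B9', "H");   // ha
    }

    // Maps each known letter of the word to its phoneme symbol, separated by spaces.
    // Characters that are not in the table are skipped.
    static String toPhonemes(String word) {
        StringBuilder symbols = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String symbol = CONSONANTS.get(word.charAt(i));
            if (symbol == null) {
                continue;
            }
            if (symbols.length() > 0) {
                symbols.append(' ');
            }
            symbols.append(symbol);
        }
        return symbols.toString();
    }

    public static void main(String[] args) {
        String word = "\u0995\u09B2\u09AE"; // the letters ka, la, ma
        System.out.println(toPhonemes(word)); // prints: K L M
    }
}
```

A table-driven lookup keeps the phoneme definitions separate from the code, which matters here because different sources define the Bengali phoneme set differently and the table may need to change.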

4.1.3 Training File Creator

Training the acoustic model also requires the fileids and transcript files. These two files contain the paths of the training audio files and their corresponding sentences. Before we wrote this program, we had to create both files manually. With the software we can now generate these two large files automatically within moments. For example, if we have 8000 audio files, we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically when we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
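As a rough illustration of what the creator writes, the sketch below builds one fileids entry and one transcript entry for a single audio file. The folder and file names (speaker_1, session_1, a0001.wav) are made up for the example, and the sentence markers follow the pattern used in the transcriptCreator class in the code appendix. The class `TrainingFileLines` is a hypothetical helper, not the project code itself.

```java
// Hypothetical helper showing the shape of one line in each training file.
class TrainingFileLines {

    // Removes a trailing ".wav" extension, if present.
    static String stripWav(String fileName) {
        return fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
    }

    // One fileids entry: the path of the audio file relative to the
    // training root, without the file extension.
    static String fileidsLine(String root, String speakerDir, String sessionDir, String wavName) {
        return root + "/" + speakerDir + "/" + sessionDir + "/" + stripWav(wavName);
    }

    // One transcript entry: the sentence between sentence markers,
    // followed by the audio file name (without extension) in parentheses.
    static String transcriptLine(String sentence, String wavName) {
        return "<S> " + sentence.trim() + " </S> (" + stripWav(wavName) + ")";
    }

    public static void main(String[] args) {
        // All names below are illustrative.
        System.out.println(fileidsLine("sbs_asr_train", "speaker_1", "session_1", "a0001.wav"));
        System.out.println(transcriptLine("  first sentence of the corpus ", "a0001.wav"));
    }
}
```

Both lines are derived from the same file name, which keeps the entries of the two files in step when they are generated for thousands of recordings.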

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in several specific areas. To keep the work within the available time, the system was built on a small data set. We selected the domain of university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list has not been defined precisely and the exact number of phonemes in Bengali is not known; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, but it sometimes fails when the speaker speaks slowly or has pronunciation problems. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once the phoneme set is accurately defined.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the size of our data to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for creating the dictionary, fileids and transcription files. We want to build a user-friendly, stand-alone GUI application for writing in Bengali, and we intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on a developer, so we plan to build an automatic trainer that an ordinary user can operate. With this trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate the system into document applications so that Bengali sentences can be written simply by uttering them. We want to build a voice response application, in which a user asks the software a question and the software returns a predefined answer. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools for recognizing the Bengali language. Throughout this report we have discussed our objectives, the various tools we used and the process of speech recognition. Our tools are still at a preliminary level. Building a good and complete recognition application requires many improvements, such as a large training database, audio from many speakers and low-noise recordings. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Walks the training audio tree (root / level 1 / level 2 / wav files) and
// writes one fileids entry per audio file, then creates the transcript file.
// Note: quotes and path separators were lost in the extracted listing; they
// are restored here on the assumption of Windows paths for local files and
// forward slashes for fileids entries.
public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates every file seen so far; k continues
                // from the first file that has not been written yet.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // replace() treats ".wav" literally, unlike replaceAll().
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists the entries of path: directories go to the level 1 or level 2
    // list, plain files go to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator,
    // e.g. "sbs_asr_train" for "C:\trainnign_file\sbs_asr_train\".
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………………………

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names of the form <letters><number> by their numeric part,
    // so that e.g. a name ending in 10 comes after a name ending in 2.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort
        }
        // Keep only the digits of each name.
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The non-numeric prefix is taken from the first name and reused for all.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

…………………………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line, into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file name in data. Sentences are
    // taken from the corpus in order and start again from the first sentence
    // when the corpus is exhausted.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // replace() treats ".wav" literally, unlike replaceAll().
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

// Note: quotes and path separators were lost in the extracted listing and
// are restored here by assumption.
public class FileOperator {

    // Reads the input word list, one word per line, and returns the trimmed words.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    // Reads the input words, generates a pronunciation for each one and
    // writes "word pronunciation" lines to the dictionary file.
    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    // Load the MySQL JDBC driver once when the class is loaded.
    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is stored in banglatab with an id in the
    // range 13 to 48, which the project uses for consonants.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter= '" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // hasanta (U+09CD); assumed, the literal is missing from the listing
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions k-1, k, k+1 around every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition: ");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                        ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // look up the pronunciation of every search string in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
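The conjunct handling in the listing above keys on a single marker character between two consonants. The following is a minimal sketch of that grouping step only, assuming the marker is the Bengali hasanta (U+09CD); the class name ConjunctSplitter and the method split are hypothetical, and the special handling of র and the database lookup from the listing are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {
    // Assumption: the conjunct marker is the Bengali hasanta (virama), U+09CD.
    static final char HASANTA = '\u09CD';

    /** Splits a word into lookup units; letters joined by a hasanta stay together. */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            // A hasanta, or the letter right after one, belongs to the current unit.
            boolean joinsPrevious = c == HASANTA
                    || (i > 0 && word.charAt(i - 1) == HASANTA);
            if (!joinsPrevious && current.length() > 0) {
                units.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            units.add(current.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // Six code units: U+0987 U+099A U+09CD U+099B U+09C1 U+0995
        System.out.println(split("\u0987\u099A\u09CD\u099B\u09C1\u0995"));
    }
}
```

Each returned unit could then be used as one search string for the pronunciation table, in the way SearchStrings is used in the listing.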

………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
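The giveCommand method above compares the recognized text against one command phrase at a time. As a hedged sketch of the same dispatch step, the lookup can be held in a table that maps each phrase to an action. The class name CommandDispatcher is hypothetical, the phrases are ASCII placeholders (the Bengali command strings are not recoverable from the listing), and the actions here only record their names instead of driving java.awt.Robot.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatcher {
    private final Map<String, Runnable> actions = new HashMap<>();

    /** Binds a spoken phrase to an action. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action bound to the recognized text; returns false if none matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        StringBuilder log = new StringBuilder();
        // Placeholder phrases; a real table would use the recognizer's Bengali output.
        dispatcher.register("right click", () -> log.append("rightClick;"));
        dispatcher.register("copy", () -> log.append("copy;"));
        dispatcher.dispatch("copy");
        dispatcher.dispatch("unknown phrase");
        System.out.println(log);
    }
}
```

In the application, each registered action would wrap one CommandActivator method, so adding a command means adding one table entry rather than another if branch.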


1.4.8 A Filler Dictionary

In the filler dictionary, non-speech sounds are mapped to corresponding non-speech or speech-like sound units.

1.4.9 Phone

A phone is a way of representing the pronunciation of words in terms of sound units. The standard system for representing phones is the International Phonetic Alphabet (IPA). English uses a transcription system based on ASCII letters, whereas Bangla uses Unicode letters.

1.4.10 HMM

Hidden Markov Models can be seen as finite state machines in which there is a state transition for each observation in the sequence and an output symbol emission for each state. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation is generated according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the outside, hence the name Hidden Markov Model.

Put another way, a Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In the speech recognition process, after the voice is recorded it is divided into many frames, which we need to process in order to generate the sentence in text form. Each frame is represented as a state, a group of states is represented as a phoneme, and a group of phonemes is represented as a word that we need to recognize. In a database known as the linguist model we store reference values of states, phonemes and words in order to compare them with the observed data (voice). By applying HMMs we construct a statistical model for each phone, whose states are assigned specific probabilities in comparison with the reference values. The probability of each state depends on the state itself and on the previous one. The goal of a speech recognition system is to find the sequence of states that has the maximum probability. Because HMM theory is quite involved, we do not go into detail here; more information is given in Appendix A.
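Finding the state sequence with the maximum probability is commonly done with the Viterbi algorithm. The following is a small illustrative sketch, not the decoder used by CMU Sphinx: the class name ViterbiSketch and the toy probability tables in main are assumptions made for the example, and the observation sequence is assumed to be non-empty.

```java
import java.util.Arrays;

final class ViterbiSketch {
    /**
     * Returns the most probable state sequence for the observations.
     * start[s], trans[from][to] and emit[s][symbol] hold probabilities.
     */
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = start.length;
        int t = obs.length;
        double[][] score = new double[t][n]; // best log-probability ending in state s at step i
        int[][] back = new int[t][n];        // predecessor state on that best path
        for (int s = 0; s < n; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int i = 1; i < t; i++) {
            for (int s = 0; s < n; s++) {
                int bestPrev = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < n; p++) {
                    double candidate = score[i - 1][p] + Math.log(trans[p][s]);
                    if (candidate > best) {
                        best = candidate;
                        bestPrev = p;
                    }
                }
                score[i][s] = best + Math.log(emit[s][obs[i]]);
                back[i][s] = bestPrev;
            }
        }
        // Pick the best final state, then follow the back-pointers.
        int last = 0;
        for (int s = 1; s < n; s++) {
            if (score[t - 1][s] > score[t - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[t];
        path[t - 1] = last;
        for (int i = t - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        System.out.println(Arrays.toString(decode(new int[] {0, 1, 2}, start, trans, emit)));
    }
}
```

Log-probabilities are used so that long sequences of small probabilities do not underflow; in a recognizer the observations would be acoustic feature frames rather than the three discrete symbols used here.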

Fig 1.4.10 Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability or penalty taken when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words could follow previously recognized words and helps to significantly restrict the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite state language models, which define speech sequences by a finite state automaton, sometimes with weights.
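To make the n-gram idea concrete, the sketch below estimates bigram probabilities P(next | previous) from raw counts. It is an illustration under simplifying assumptions, not the method used by the CMU tools: there is no smoothing or back-off, the class name BigramSketch is hypothetical, and the English sentences in main are placeholders for corpus sentences.

```java
import java.util.HashMap;
import java.util.Map;

final class BigramSketch {
    private final Map<String, Integer> history = new HashMap<>(); // count of each word as a predecessor
    private final Map<String, Integer> pairs = new HashMap<>();   // count of each adjacent word pair

    /** Counts adjacent word pairs in one sentence, with <s> and </s> as boundary markers. */
    void addSentence(String sentence) {
        String text = sentence.trim();
        if (text.isEmpty()) {
            return;
        }
        String[] words = ("<s> " + text + " </s>").split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            history.merge(words[i], 1, Integer::sum);
            pairs.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
    }

    /** Maximum-likelihood estimate of P(next | previous); 0 if previous was never seen. */
    double probability(String previous, String next) {
        int seen = history.getOrDefault(previous, 0);
        if (seen == 0) {
            return 0.0;
        }
        return pairs.getOrDefault(previous + " " + next, 0) / (double) seen;
    }

    public static void main(String[] args) {
        BigramSketch lm = new BigramSketch();
        lm.addSentence("open the file");
        lm.addSentence("open the door");
        lm.addSentence("close the door");
        System.out.println(lm.probability("the", "door"));
    }
}
```

A word pair that never occurs in the corpus gets probability zero here, which is how such a model restricts the search to likely continuations; practical models soften this with back-off weights.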

1.5 Overview of the Full System

Figure 1.5 Overview of the Full System Model

Chapter 2

METHODOLOGY: Data Preparation, Setting up the System Environment

2.1 Data Preparation

We have to prepare some important files that are required for training and for testing. As already mentioned, our project is a domain-based recognition application. Domain-based means that it covers a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below are created from these data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model; put simply, it is the collection of sentences that we want our machine to recognize. For our project we collected some important sentences according to our domain. Some sentences of our project are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to collect the audio files for this corpus in (.wav) or (.sph) format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit depth (bits per sample): 16

• Channel: mono (single channel)

Fig 2.1.2 Audio File Recording Format

For this work a 16 kHz sampling rate was chosen because it captures high-frequency information more accurately, and 16 bits per sample quantizes the amplitude of each sample into 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software. For our project we used the WavePad sound editor and saved the sound files in wav format, where each wav file is named using the speaker id and the sentence id. For example, an audio file of our project is

01_01.wav, which stands for

Speaker Id: 01, Sentence Id: 01
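The recording parameters and the naming scheme can be checked programmatically before training. The sketch below is an assumption-laden illustration: the class RecordingCheck and its method names are hypothetical, while AudioFormat is the standard javax.sound.sampled class.

```java
import javax.sound.sampled.AudioFormat;

final class RecordingCheck {
    /** True when the format is 16 kHz, 16 bits per sample, mono. */
    static boolean matchesTrainingFormat(AudioFormat format) {
        return format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    /** Splits a name such as "01_01.wav" into {speakerId, sentenceId}. */
    static String[] parseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot >= 0 ? fileName.substring(0, dot) : fileName;
        String[] parts = base.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected speaker_sentence: " + fileName);
        }
        return parts;
    }

    public static void main(String[] args) {
        // 16 kHz, 16-bit, mono, signed, little-endian
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        String[] ids = parseName("01_01.wav");
        System.out.println(matchesTrainingFormat(format) + " speaker=" + ids[0]
                + " sentence=" + ids[1]);
    }
}
```

For a real recording, the format could be obtained with AudioSystem.getAudioInputStream(file).getFormat() and passed to the same check.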

When we collected audio from a speaker, we saved the personal information of this speaker, such as:

Name, Age, Gender, Audio collection environment

Some other information was also noted down:

Environmental conditions of the recording (for example classroom conditions, number of students present, sources of noise such as a fan or a generator, etc.)

Technical details of the device (PC, microphone)

Date and time of recording

2.1.3 Dictionary file

The dictionary file is the list of words that we get from our corpus file, together with the pronunciation of each word, such as AA P NA KE. For this work we need software that produces the pronunciation of each word in the list. A grapheme-to-phoneme (G2P) tool has been developed by CRBLP, which gives pronunciations in the Unicode IPA system, but we need ASCII format. As a result, for our project we developed a software tool, D2P (Dictionary to Pronunciation), which produces the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some contents of the dictionary file for our project are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (.dic)

File encoding is UTF-8 without BOM

Words cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
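The rules in the note above lend themselves to an automatic check. The sketch below is a hypothetical helper (the class name DicCheck is not part of the project) that assumes each dictionary line has the form "word PH1 PH2 ..." and reports repeated words, missing pronunciations and phonemes that are not in capital letters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DicCheck {
    /** Returns a description of every rule violation found in the dictionary lines. */
    static List<String> validate(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> seenWords = new HashSet<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // the trailing blank line is allowed
            }
            String[] fields = line.trim().split("\\s+");
            String word = fields[0];
            if (!seenWords.add(word)) {
                problems.add("repeated word: " + word);
            }
            if (fields.length < 2) {
                problems.add("no pronunciation: " + word);
            }
            for (int i = 1; i < fields.length; i++) {
                if (!fields[i].equals(fields[i].toUpperCase(Locale.ROOT))) {
                    problems.add("phoneme not in capitals: " + fields[i]);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // \u0986\u09AE\u09BF is a Bengali word used only as sample input.
        System.out.println(validate(Arrays.asList("\u0986\u09AE\u09BF AA M I", "")));
    }
}
```

An empty result means the lines satisfy the three checked rules; file encoding and the trailing blank line still have to be ensured when the file is written.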

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words; for example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone on each line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is (.phone)

File encoding is UTF-8 without BOM

Phones cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

All phonemes in the dic file are present in the phone file without repetition
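The small program mentioned above is not reproduced in this report. The following is a sketch of how such a generator could work, assuming dictionary lines of the form "word PH1 PH2 ..."; the class name PhoneListSketch is hypothetical and file reading and writing are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class PhoneListSketch {
    /** Collects every distinct phoneme from the dictionary lines, plus SIL, in sorted order. */
    static List<String> phones(List<String> dicLines) {
        Set<String> phones = new TreeSet<>(); // a set removes duplicates, a tree keeps them sorted
        phones.add("SIL");
        for (String line : dicLines) {
            String[] fields = line.trim().split("\\s+");
            for (int i = 1; i < fields.length; i++) { // fields[0] is the word itself
                phones.add(fields[i]);
            }
        }
        return new ArrayList<>(phones);
    }

    public static void main(String[] args) {
        // w1 and w2 stand in for dictionary words.
        for (String phone : phones(Arrays.asList("w1 AA P N I", "w2 AA M I"))) {
            System.out.println(phone);
        }
    }
}
```

Writing the returned list one phone per line, followed by a blank line, gives a file that follows the rules listed in the note.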

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text. The format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to be able to handle. The corpus needs to be in the form of an ASCII text file, although the newer version also supports Unicode text files, with one sentence per line. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here the only file used is the LM file, as the pronunciation dictionary should be built as stated above. The tool is best for small domains. For our project the file name is sbsbsr.lm and the file format is (.lm). Some contents of the language model file for our project are:

-2.0719 -0.2626

-2.0719 -0.2861

-2.0719 -0.2973

-1.7709 -0.2936

-2.0719 -0.2626

-1.7709 এ -0.2861

-2.0719 -0.2626

-2.0719 ও -0.2973

…
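In the ARPA format each n-gram line holds a base-10 log probability, the words of the n-gram, and optionally a base-10 back-off weight. The sketch below parses one such line; the class name ArpaEntry is hypothetical and the word "hello" is a placeholder, since the Bengali words are not reproduced here.

```java
import java.util.Arrays;

final class ArpaEntry {
    final double logProb; // log10 probability (first field)
    final String ngram;   // the words of the n-gram
    final double backoff; // log10 back-off weight (last field), 0 when absent

    private ArpaEntry(double logProb, String ngram, double backoff) {
        this.logProb = logProb;
        this.ngram = ngram;
        this.backoff = backoff;
    }

    /** Parses "logProb w1 ... wN [backoff]" where N is the n-gram order. */
    static ArpaEntry parse(String line, int order) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < order + 1) {
            throw new IllegalArgumentException("too few fields: " + line);
        }
        double logProb = Double.parseDouble(fields[0]);
        String ngram = String.join(" ", Arrays.copyOfRange(fields, 1, order + 1));
        double backoff = fields.length > order + 1
                ? Double.parseDouble(fields[order + 1])
                : 0.0;
        return new ArpaEntry(logProb, ngram, backoff);
    }

    /** Converts the stored log10 value back to a plain probability. */
    double probability() {
        return Math.pow(10.0, logProb);
    }

    public static void main(String[] args) {
        ArpaEntry entry = parse("-2.0719 hello -0.2626", 1);
        System.out.println(entry.ngram + " " + entry.probability());
    }
}
```

A log probability of -2.0719 corresponds to a probability of a little under one percent, which is what one would expect for a single word in a vocabulary of roughly a hundred words.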

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used a Linux environment to obtain the DMP-format language model file from the lm-format file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file the dialogue of the speaker is noted exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are basically two types of transcription file: one is used to train the system and the other for testing. The files are named using the project name; here the name of the project is "sbsbsr", so the training file name is sbsbsr_train.transcription and the test file name is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)

<s> </s> (01_02)

<s> </s> (01_03)

<s> </s> (01_04)

…

Note:

File format is (.transcription)

File encoding is UTF-8 without BOM

Sentences cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
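A transcription line can be assembled mechanically from a sentence and its speaker and sentence numbers. The sketch below illustrates the line format described above; the class name TranscriptionLine and the sample sentence are hypothetical.

```java
final class TranscriptionLine {
    /** Builds a file id such as "01_02" from the speaker and sentence numbers. */
    static String fileId(int speaker, int sentence) {
        return String.format("%02d_%02d", speaker, sentence);
    }

    /** Formats one transcription line as "<s> sentence </s> (fileId)". */
    static String format(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    public static void main(String[] args) {
        // "hello world" stands in for a corpus sentence.
        System.out.println(format("hello world", fileId(1, 2)));
    }
}
```

Generating the lines from the same speaker and sentence numbers that name the audio files keeps the transcription and the recordings in step.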

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the wav or sph extension. There are two fileids files, one for training and the other for testing. The name of the training file for our project is sbsbsr_train.fileids and the name of the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01

sbsbsr_train/sanjoy/sanjoy1/01_02

sbsbsr_train/sanjoy/sanjoy1/01_03

…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL

<sil> SIL

</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped onto a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training part on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: SphinxBase and SphinxTrain. We obtained them from the CMU Sphinx web site. Before installing these two packages on the Ubuntu distribution of Linux, we first need to install some dependencies, such as Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As already noted, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading the software we decompressed it into a folder in Linux; it can be any folder. We did it in the Linux root folder. After decompressing the packages we can install them with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd Sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

Having created the system environment for training, we created a project folder in which SphinxTrain will create the trained files, i.e. the acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed. Here we created a folder named "sbsbsr". After creating the folder, we open a terminal and go to the created folder sbsbsr. Then we created a project task for SphinxTrain with the following command in the terminal:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command from the terminal creates various folders in sbsbsr, such as "etc", "wav", "model_parameters", etc.

Now it is time to copy the files that we created in the data preparation part. We have to copy the dic, filler, phone, transcription, fileids, lm and lm.DMP files into the "etc" folder and our collected audio into the "wav" folder. We then need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed in sphinx_train.cfg are listed below:

Parameter                    Value Before      Value After
$CFG_WAVFILE_EXTENSION       sph               wav
$CFG_WAVFILE_TYPE            nist/mswav/raw    raw
$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                 1
$CFG_STATESPERHMM            6                 3
$CFG_N_TIED_STATES           100               100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio data into mfc files. To do so we again opened the project directory in the Linux terminal and ran the feature extraction command. The command we executed for this task is [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converted all wav files into mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command in the Linux terminal, which creates the acoustic model. For this task we stay in the project directory and execute the following command:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer; Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First, we downloaded Pocketsphinx from the CMU Sphinx site and extracted the archive into the root directory where sphinxbase and sphinxtrain are installed. We then installed it with the following commands in a Linux terminal:

./configure

make

After installation, we went to the project folder and ran the following command to create the folder structure:

pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

We then put some test audio in the "wav" directory, because here Pocketsphinx takes audio files as input, and copied the test fileids and transcript files into the "etc" folder. Decoding audio with Pocketsphinx first requires feature files for the test audio, which the following command creates:


perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

We then ran the main decoding command:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The results are written to the "result" folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed for this: Sphinx4 and the Eclipse IDE. After installing Eclipse, we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file. We then created a new Java project in Eclipse and wrote a Java class based on the demo application from CMU Sphinx. After that, we added the following files from our training project to the Eclipse project [11]:

the "sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

the .dic file

the .lm.DMP file

the .filler file

We created a config.xml file in the Eclipse project to configure the recognizer and to tell it where the required model files are located. This config.xml can be based on the configuration file of the Sphinx4 demo application. We also added four jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4/lib directory to the project. The Java project is then ready to build and run, and it can be tested with live voice input from a microphone. The result for our model trained on ten sentences is shown below.

Figure 2.2.5.2 Testing with Sphinx4

Sphinx4 makes it possible to build various applications in Java. We built some applications with Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and the common room. Testing used audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests with two different decoders:

1. Pocketsphinx

2. Sphinx4

3.2 Test Results with Pocketsphinx

All five experiments used the trained data set.

Experiment  Speakers (Male/Female)  Total words  Correct  Errors  Percent correct  Error    Accuracy
01          5 (4/1)                 1025         975      98      95.12%           9.56%    90.44%
02          3 (3/0)                 615          591      39      96.10%           6.34%    93.66%
03          3 (3/0)                 615          601      22      97.72%           3.58%    96.42%
04          5 (2/3)                 1025         938      184     91.51%           17.95%   82.05%
05          3 (0/3)                 615          545      125     88.62%           20.33%   79.67%

Table 3.2.1 Experimental details with Results for Pocketsphinx

Chart 3.2.2 Experiment Results with Pocketsphinx

Average Accuracy = 88.44%
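The percentages reported for the Pocketsphinx experiments can be reproduced from the word counts. The following is a minimal Java sketch, assuming that the error count includes insertions (in every experiment it exceeds total words minus correct words) and that accuracy is 100 minus the error percentage. The class and method names are illustrative and are not part of the project code.

```java
// Sketch only: word-level scores computed from the counts in the result table.
final class WordScore {
    private WordScore() {}

    /** Percentage of reference words that were recognized correctly. */
    static double percentCorrect(int correct, int totalWords) {
        return 100.0 * correct / totalWords;
    }

    /** Errors as a percentage of the reference words. */
    static double errorRate(int errors, int totalWords) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy taken as 100 minus the error percentage (assumption, see lead-in). */
    static double accuracy(int errors, int totalWords) {
        return 100.0 - errorRate(errors, totalWords);
    }

    /** Arithmetic mean, used for the average accuracy over all experiments. */
    static double average(double... values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.printf("correct=%.2f%% error=%.2f%% accuracy=%.2f%%%n",
                percentCorrect(975, 1025), errorRate(98, 1025), accuracy(98, 1025));
    }
}
```

For Experiment 01 (1025 words, 975 correct, 98 errors) these formulas agree with the reported 95.12%, 9.56% and 90.44% after rounding to two decimals.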


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device in all six experiments was a microphone.

Experiment  User type  Speakers  Environment        Speaker type  Words  Correct  Errors  Percent correct  Errors  Accuracy
01          Trained    2         Closed Room        Male          156    154      2       98%              2       98%
02          Untrained  3         Lab Room           Male          130    120      10      90%              10      90%
03          Untrained  3         University Campus  Male          140    122      18      82%              18      82%
04          Untrained  2         University Campus  Female        108    95       13      87%              13      87%
05          Trained    3         University floor   Female        126    115      11      89%              11      89%
06          Trained    2         Closed Room        Male          128    127      1       99%              1       99%

Table 3.3.1.1 Experimental Details with Results for Sphinx4 Live

Chart 3.3.1.2 Experiment Results with Sphinx4 Live Input

Average Accuracy = 90.83%


3.3.2 Input Type: Audio

All five experiments used the trained data set.

Experiment  Speakers (Male/Female)  Total words  Correct  Errors  Percent correct  Error   Accuracy
01          3 (3/0)                 210          194      16      85.71%           15.71   85.71%
02          3 (2/1)                 210          188      22      77.14%           22      77.14%
03          3 (2/1)                 210          182      28      72.38%           28      72.38%
04          3 (2/1)                 210          194      16      84.44%           16      84.44%
05          3 (1/2)                 210          184      26      74.76%           26      74.76%

Table 3.3.2.1 Experimental Details with Results for Sphinx4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application based on the Sphinx4 demo application. Its purpose is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary (.dic) file that lists every training word together with its pronunciation. While training acoustic models experimentally, we had to write these pronunciations by hand several times, which led us to build an automatic pronunciation (phonetic translation) generator. First, we built a database that stores the phonemes and their corresponding letters, drawing on the IPA chart and several papers [1] [9] [10]. However, the sources define phonemes in different ways, and not every letter has a defined phoneme, so we defined the phonemes of some consonants and conjunct letters ourselves. With this database in place, we built the phonetic translator, which produces the phonetic translation of Bengali words.

Fig 4.1.2 Dictionary file with phonetic translation
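The letter-by-letter lookup behind the phonetic translator can be sketched as follows. This is a simplified Java illustration, assuming a small in-memory table in place of the project's database; it holds only five consonants taken from the Unicode to IPA chart in the appendix and ignores inherent vowels, vowel signs and conjuncts. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a tiny in-memory stand-in for the letter-to-phoneme database.
final class PhonemeLookupSketch {
    private static final Map<Character, String> PHONEMES = new HashMap<>();

    static {
        // Consonant entries copied from the chart in the appendix.
        PHONEMES.put('\u0995', "K");  // ka
        PHONEMES.put('\u0996', "KH"); // kha
        PHONEMES.put('\u09AE', "M");  // ma
        PHONEMES.put('\u09B2', "L");  // la
        PHONEMES.put('\u09B8', "S");  // sa
    }

    private PhonemeLookupSketch() {}

    /** Returns the space-separated phonemes of the letters found in the table. */
    static String translate(String word) {
        StringBuilder phones = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phone = PHONEMES.get(word.charAt(i));
            if (phone == null) {
                continue; // letters missing from the table are skipped in this sketch
            }
            if (phones.length() > 0) {
                phones.append(' ');
            }
            phones.append(phone);
        }
        return phones.toString();
    }

    /** Formats one dictionary line: the word, a space, then its phonemes. */
    static String dictionaryLine(String word) {
        return word + " " + translate(word);
    }
}
```

With this table, the three letters ক, ল, ম map to the phoneme sequence K L M. A full translator would also have to handle the inherent vowel and the conjunct letters, which this sketch leaves out.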

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program, both files had to be made by hand: with 8000 audio files, for example, we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files can be created automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
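One entry of each file can be sketched as follows. This is a minimal Java illustration, assuming the usual SphinxTrain layout: a fileids entry is the recording path without the .wav extension, and a transcript entry is the sentence between sentence markers followed by the utterance name in parentheses. The spelling of the markers follows the transcriptCreator listing in the appendix; the class name and the sample folder and file names are hypothetical.

```java
// Sketch only: builds single lines of the fileids and transcript files.
final class TrainingFileLines {
    private TrainingFileLines() {}

    /** Drops a trailing ".wav" extension, if present. */
    static String stripWav(String fileName) {
        return fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
    }

    /** One fileids entry: folder and file name of the recording, without extension. */
    static String fileidsLine(String speakerDir, String fileName) {
        return speakerDir + "/" + stripWav(fileName);
    }

    /** One transcript entry: marked sentence followed by the utterance name. */
    static String transcriptLine(String sentence, String fileName) {
        return "<S> " + sentence.trim() + " </S> (" + stripWav(fileName) + ")";
    }

    public static void main(String[] args) {
        // "speaker_1" and "file_1.wav" are made-up names for illustration.
        System.out.println(fileidsLine("speaker_1", "file_1.wav"));
        System.out.println(transcriptLine(" hello ", "file_1.wav"));
    }
}
```

Generating both lines from the same file name keeps the two files aligned, which is the property that is hard to maintain when thousands of entries are written by hand.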

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs some common tasks such as opening My Computer, left click and right click.
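The mapping from a recognized word to a desktop action can be sketched as a lookup table. This is a hypothetical Java illustration, not the project's implementation: the command words and the actions are placeholders, and the recognizer that supplies the text is assumed to exist elsewhere.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: dispatches recognized text to a registered action.
final class VoiceCommandSketch {
    private final Map<String, Runnable> actions = new HashMap<>();

    /** Associates a command word with the action to run when it is recognized. */
    void register(String word, Runnable action) {
        actions.put(word, action);
    }

    /** Runs the action for the recognized text; returns false for unknown commands. */
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        VoiceCommandSketch commands = new VoiceCommandSketch();
        // Placeholder words and actions; a real application would perform the click here.
        commands.register("left", () -> System.out.println("left click"));
        commands.register("right", () -> System.out.println("right click"));
        commands.dispatch("left");
    }
}
```

Keeping the commands in a table means that a new voice command only needs one more registration, without changes to the recognition loop.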

Chapter 5

LIMITATION & FUTURE WORK

LIMITATION

FUTURE WORK

Limitation

Our project has limitations in some specific areas. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for newcomer students, with 185 sentences, but it was difficult to collect a large amount of audio from 16 speakers in a short span of time. As a result, we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not accurately defined and the exact number of phonemes in Bengali is not known: different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or mispronounces words. We created a program that automatically generates the pronunciation of a Bengali word, but it does not yet work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we expect good output from this software once the phoneme set is settled.
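The report does not say what the encoding problem is. One possible cause, assumed here and not confirmed by the project, is that Bengali word lists are read and written with the platform default charset instead of UTF-8. A minimal Java sketch of reading and writing a word list with an explicit charset follows; the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch only: word-list I/O with an explicit UTF-8 charset.
final class Utf8WordList {
    private Utf8WordList() {}

    /** Writes one word per line, encoded as UTF-8. */
    static void writeWords(Path file, List<String> words) {
        try {
            Files.write(file, words, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads all lines of the file, decoding them as UTF-8. */
    static List<String> readWords(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the words to a temporary file and reads them back. */
    static List<String> roundTrip(List<String> words) {
        try {
            Path tmp = Files.createTempFile("words", ".dic");
            try {
                writeWords(tmp, words);
                return readWords(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the words survive such a round trip unchanged, the file handling is not the source of the problem and the database connection or the console output would be the next places to check.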

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will enlarge the data set to build a complete model, and we plan to improve the recognition accuracy and extend the vocabulary. We have already developed software for creating the dictionary, fileids and transcription files. We want to build a user-friendly, stand-alone GUI application for writing Bengali, and we intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on a developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate the system into document applications, so that a Bengali sentence can be written just by uttering it. We want to build a voice response application in which the user asks the software a question and the software gives the predefined answer. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

CONCLUSION

REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database, many speakers and low-noise audio. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.


References

[1] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নামবার) IPA

শনয 0

এক 1

দই 2

তিন 3

চার 4

পাচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of the limited time available only 50 of them were used.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {
    // Folder names found at level 1 and level 2, and audio file names at level 3.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues from the previous folder.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            transcriptCreator.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists the folder at the given path: sub-folder names go to level 1 or 2, file names to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at the given path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

......................................................................

ARRAY

......................................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {
    // Sorts folder names such as "name2", "name10" by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int i = 0;
        // Keep only the digits of every name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The name without its number is taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

......................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

......................................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {
    // Reads the input word list, one word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

......................................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {
    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per input word: the word, a space, its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

......................................................................

PRODB

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {
    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and the connection, reporting any failure.
    private static void close(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            close(stmt, con);
        }
    }

    // Returns true when the letter's id in banglatab lies in the range 13 to 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            close(stmt, con);
        }
        return isBBorno;
    }
} // class end

......................................................................

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

Bengali Speech Recognition

80

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

Bengali Speech Recognition

81

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Bengali Speech Recognition

82

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

…………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which
        // case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Maps a recognized sentence to a desktop command
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


Fig 1.4.10: Applying Hidden Markov Model on Speech Recognition

1.4.11 Language Model

The language model describes the likelihood, probability, or penalty incurred when a sequence or collection of words is seen. A language model is used to restrict the word search. It defines which words can follow previously recognized words and significantly narrows the matching process by stripping out words that are not probable. The most common language models are n-gram language models, which contain statistics of word sequences, and finite-state language models, which define speech sequences by a finite-state automaton, sometimes with weights.
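To make the n-gram idea concrete, the following is a minimal sketch of a bigram model that estimates how likely one word is to follow another from raw counts. It is an illustration only, not the language model used in this project: the class name BigramSketch, the unsmoothed maximum-likelihood estimate, and the English placeholder sentences are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy bigram model: P(next | prev) = count(prev next) / count(prev as a history). */
class BigramSketch {
    private final Map<String, Integer> historyCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Counts every adjacent word pair of one sentence, and each word that has a successor. */
    void addSentence(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            historyCounts.merge(words[i], 1, Integer::sum);
            bigramCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
    }

    /** Maximum-likelihood estimate without smoothing; 0 for an unseen history or pair. */
    double probability(String prev, String next) {
        int history = historyCounts.getOrDefault(prev, 0);
        if (history == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(prev + " " + next, 0) / (double) history;
    }

    public static void main(String[] args) {
        BigramSketch lm = new BigramSketch();
        // placeholder corpus, not taken from the project corpus
        lm.addSentence("open the file");
        lm.addSentence("open the window");
        lm.addSentence("close the file");
        System.out.println(lm.probability("the", "file"));
        System.out.println(lm.probability("the", "door"));
    }
}
```

A word pair that never occurs in the corpus gets probability zero here, which is how such a model prunes improbable continuations during the search. Practical toolkits add smoothing and back-off so that unseen pairs are penalized rather than excluded outright.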


1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

Several files are required for training and for testing. As mentioned earlier, our project is a domain-based recognition application, where domain-based means a particular topic containing a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

• Corpus
• Audio files
• Dictionary file
• Phone file
• Language Model file (.lm format)
• Language Model file (.DMP format)
• Transcription file
• Fileids file
• Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model. Put simply, it is the collection of sentences that we want our machine to recognize. For our project we collected a set of important sentences from our domain. Some of these sentences are as follows:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record the audio for this corpus in (.wav) or (.sph) format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format

For this work a 16 kHz sampling rate was chosen because it preserves high-frequency information more accurately (it can represent frequencies up to 8 kHz), and 16 bits per sample divides the amplitude range into 65,536 possible values. After recording, the audio was split manually into one file per sentence using recording software. For our project we used the WavePad sound editor. Each sound file is saved in .wav format and named using the speaker ID and the sentence ID.

For example, an audio file of our project is

01_01.wav, which stands for

Speaker ID 01, Sentence ID 01
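The two numbers behind this recording format can be checked with a few lines of Java. The sketch below is illustrative only: the class name RecordingFormatSketch is hypothetical, and the choice of signed, little-endian PCM is an assumption (it is the usual layout of .wav files, but the thesis does not state it).

```java
import javax.sound.sampled.AudioFormat;

class RecordingFormatSketch {
    /** Number of distinct amplitude values that fit in the given bits per sample. */
    static long quantizationLevels(int bitsPerSample) {
        return 1L << bitsPerSample;
    }

    /** Highest frequency a sampling rate can represent (half the rate, the Nyquist limit). */
    static float nyquistHz(float sampleRateHz) {
        return sampleRateHz / 2f;
    }

    /** 16 kHz, 16-bit, mono; signed little-endian PCM is an assumption of this sketch. */
    static AudioFormat recordingFormat() {
        return new AudioFormat(16000f, 16, 1, true, false);
    }

    public static void main(String[] args) {
        AudioFormat format = recordingFormat();
        System.out.println(quantizationLevels(format.getSampleSizeInBits()));
        System.out.println(nyquistHz(format.getSampleRate()));
    }
}
```

With 16 bits per sample the first value is 2^16 = 65,536, and a 16 kHz sampling rate gives a limit of 8,000 Hz.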

When we collected audio from a speaker, we also saved personal information about that speaker, such as:

Name, Age, Gender, Audio collection environment

Some other information was also noted down:

• Environmental conditions of the recording (for example, classroom conditions, number of students present, and sources of noise such as a fan or generator)
• Technical details of the device (PC, microphone)
• Date and time of the recording

2.1.3 Dictionary file

The dictionary file is the list of words taken from our corpus file, each followed by its pronunciation, such as - AA P NA KE. Producing it requires software that turns a word list into a pronunciation file. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. We therefore developed our own software, D2P (Dictionary to Pronunciation), which generates the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The file format is (.dic). The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

• All phonemes are in capital letters, such as AA P N I
• File format is (.dic)
• File encoding is UTF-8 without BOM
• A word cannot be repeated
• A blank line is required at the end of the file (i.e., an extra line)
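Two of the rules in the note (capital-letter phonemes and no repeated words) are easy to check mechanically. The sketch below shows one possible check; it is not part of the D2P software, the class name DictionaryCheckSketch is hypothetical, and the romanized head words in the example stand in for the Bengali words of the real dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class DictionaryCheckSketch {
    /** Returns a description of every rule violation; an empty list means both checks pass. */
    static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> seenWords = new HashSet<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // the trailing blank line is allowed
            }
            String[] fields = line.trim().split("\\s+");
            String word = fields[0];
            if (!seenWords.add(word)) {
                problems.add("repeated word: " + word);
            }
            // every field after the word is one phoneme of its pronunciation
            for (int i = 1; i < fields.length; i++) {
                if (!fields[i].equals(fields[i].toUpperCase(Locale.ROOT))) {
                    problems.add("phoneme not in capitals: " + fields[i]);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // placeholder entries; the real file uses Bengali head words
        List<String> lines = Arrays.asList("apni AA P N I", "ami AA M I", "apni AA p N I", "");
        for (String problem : check(lines)) {
            System.out.println(problem);
        }
    }
}
```

The encoding and trailing-blank-line rules concern the file on disk rather than its parsed lines, so they are left out of this sketch.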

2.1.4 Phone file

The phone file is the list of phonemes that occur in the words. For example, “AA P N I” contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. It can be generated by a small program written for this project, which takes the .dic file as input and produces the phone file as output.

For our project the file name is sbsbsr.phone. Some of its contents are:

A

AA

B

BH

C

CCHA

CH

…

Note:

• All phones are in capital letters
• File format is .phone
• File encoding is UTF-8 without BOM
• A phone cannot be repeated
• A blank line is required at the end of the file (i.e., an extra line)
• The silence phoneme “SIL” is also included in the phone file
• All phonemes in the .dic file are present in the phone file without repetition
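The program that derives the phone file from the dictionary is not listed in this report. As a rough sketch of the idea, the code below collects every distinct phone from the pronunciation part of each dictionary line and adds SIL. The class name PhoneListSketch, the alphabetical ordering, and the romanized head words are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class PhoneListSketch {
    /** Distinct phones of all dictionary lines plus SIL, in alphabetical order. */
    static List<String> phones(List<String> dictionaryLines) {
        TreeSet<String> unique = new TreeSet<>();
        unique.add("SIL"); // the silence phone must always be present
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");
            // field 0 is the word itself; the remaining fields are its phones
            for (int i = 1; i < fields.length; i++) {
                unique.add(fields[i]);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // placeholder entries; the real file uses Bengali head words
        List<String> lines = Arrays.asList("apni AA P N I", "ami AA M I", "");
        for (String phone : phones(lines)) {
            System.out.println(phone); // one phone per line, as the trainer expects
        }
    }
}
```

Using a set guarantees the "no repetition" rule, and building the list from the dictionary itself guarantees that every phoneme used in the .dic file appears in the phone file.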


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text. Its format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2-, and 3-grams along with their likelihood (the first field, a base-10 logarithm) and a back-off factor (the third field). To build this file the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. It requires a corpus, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line, although the newer version also supports Unicode text files. Upload this file and click the compile button. This produces a set of lexical (pronunciation dictionary) and language modeling files. Only the LM file is used here, because the pronunciation dictionary is built as described above. The tool works best for small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
…
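The fields of an ARPA unigram line are stored as base-10 logarithms, so they must be exponentiated to read them as ordinary probabilities. The sketch below parses one such line. The class name ArpaUnigramSketch is hypothetical, and the token "word" in the example is a placeholder for a Bengali word of the corpus.

```java
class ArpaUnigramSketch {
    /** Converts a base-10 log value, as stored in an ARPA file, to a linear value. */
    static double toLinear(double log10Value) {
        return Math.pow(10.0, log10Value);
    }

    /**
     * Parses "logprob word backoff" and returns {probability, back-off weight}.
     * A missing back-off field is treated as a weight of 1 (log value 0).
     */
    static double[] parse(String line) {
        String[] fields = line.trim().split("\\s+");
        double probability = toLinear(Double.parseDouble(fields[0]));
        double backoff = fields.length > 2 ? toLinear(Double.parseDouble(fields[2])) : 1.0;
        return new double[] {probability, backoff};
    }

    public static void main(String[] args) {
        double[] values = parse("-2.0719 word -0.2626");
        System.out.println("probability = " + values[0]);
        System.out.println("back-off weight = " + values[1]);
    }
}
```

A log value of -2.0719 corresponds to a probability of roughly 1/118, and -1.7709 to roughly twice that, which is what one would expect for words seen once and twice in a small corpus.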

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for Sphinx4. We used a Linux environment to produce the DMP-format language model file from the .lm-format file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project the command is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file the speaker's dialogue is noted exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that represents the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named after the project. Here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
…

Note:

• File format is .transcription
• File encoding is UTF-8 without BOM
• A sentence cannot be repeated
• A blank line is required at the end of the file (i.e., an extra line)
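Because every transcription line follows the same pattern, the lines can be generated from the sentence text and the speaker and sentence numbers. The following sketch shows the pattern only. The class name TranscriptionLineSketch and the romanized example sentence are hypothetical and do not come from the project corpus.

```java
import java.util.Locale;

class TranscriptionLineSketch {
    /** File ID built from zero-padded speaker and sentence numbers, e.g. 01_01. */
    static String fileId(int speaker, int sentence) {
        return String.format(Locale.ROOT, "%02d_%02d", speaker, sentence);
    }

    /** Wraps the utterance in <s> ... </s> and appends the file ID in parentheses. */
    static String line(String utterance, String fileId) {
        return "<s> " + utterance.trim() + " </s> (" + fileId + ")";
    }

    public static void main(String[] args) {
        // placeholder sentence; the real file contains the Bengali text of each utterance
        System.out.println(line("apni kemon achen", fileId(1, 1)));
    }
}
```

The same file ID, without the tags and the text, is what goes into the matching line of the fileids file, which keeps the two files aligned.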

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s>, and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes “beginning of speech”
</s> symbolizes “end of speech”
<sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. Training the recognition engine from CMU Sphinx requires two software packages, SphinxBase and SphinxTrain, which we downloaded from the CMU Sphinx web site. Before installing them, some dependencies must first be installed on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl
sudo apt-get install perl

GCC
sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading them we decompressed them into a folder in Linux. Any folder can be used; we used the Linux root folder. After decompressing the packages, they can be installed with the following commands in the terminal [14]:

Sphinxbase
cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain
cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which SphinxTrain will create the trained files, that is, the acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed and create a folder there. We named this folder “sbsbsr”. After creating the folder we open a terminal and change to the created folder sbsbsr. We then create a project task for SphinxTrain with the following command in the terminal:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as “etc”, “wav”, and “model_parameters”.

Now it is time to copy the files created in the data preparation part. We copy the .dic, .filler, .phone, .transcription, .fileids, .lm, and .lm.DMP files into the “etc” folder and our collected audio into the “wav” folder. We then need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is created. The parameters we changed in sphinx_train.cfg are listed below.

Parameter                    Value Before        Value After
$CFG_WAVFILE_EXTENSION       sph                 wav
$CFG_WAVFILE_TYPE            nist, mswav, raw    raw
$CFG_FEATURE                 2s_c_d_dd           1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                   1
$CFG_STATESPERHMM            6                   3
$CFG_N_TIED_STATES           100                 100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is complete.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio data into .mfc feature files. To do this we opened the project directory in the Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the “feat” directory under the project folder “sbsbsr”.

We are now ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx, Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. We then go to the root directory where sphinxbase and sphinxtrain are installed and extract the downloaded Pocketsphinx archive there. After extracting it, we install the software with the following commands in the Linux terminal:

./configure
make

After installing the software we go into our project folder and execute a command from the terminal to create the folder structure for decoding. The command is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

We then put some test audio in the “wav” directory, because Pocketsphinx takes an audio file as input. We also copy the test fileids and transcription files into the “etc” folder. To decode (test) our model on audio with Pocketsphinx, we first need feature files for the audio files. They are created with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main decoding command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The test result is written to the “result” folder under the project folder. For our model trained on fifty sentences the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on the Windows machine. We then create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that we add the following files from our previous project to the Eclipse project [11]:

• “sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters
• .dic
• .lm.DMP
• .filler

We create a config.xml file in the Eclipse project to pass the configuration to the recognizer and to tell it where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files (js.jar, jsapi.jar, sphinx4.jar, and tags.jar) from the sphinx4/lib directory to our project. Our Java project is then ready to build and run. After building the project we run it and can test with live voice input from the microphone. For our model trained on ten sentences the result is:

Figure 2.2.5.2: Testing with Sphinx4

We can build various applications in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room, and a common room. We completed our testing using audio input from six test speakers. For the live testing we used a microphone in different environments [9]. We carried out our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment     Data set  Speakers  Male  Female  Total words  Correct  Errors  Percent correct  Error   Accuracy
Experiment 01  Trained   5         4     1       1025         975      98      95.12%           9.56%   90.44%
Experiment 02  Trained   3         3     0       615          591      39      96.10%           6.34%   93.66%
Experiment 03  Trained   3         3     0       615          601      22      97.72%           3.58%   96.42%
Experiment 04  Trained   5         2     3       1025         938      184     91.51%           17.95%  82.05%
Experiment 05  Trained   3         0     3       615          545      125     88.62%           20.33%  79.67%

Table 3.2.1: Experimental details with Results for Pocket Sphinx


Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
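The percentages reported for Pocket Sphinx follow from the word counts of each experiment. The sketch below reproduces them under the assumption that percent correct is correct words over total words, the error rate is errors over total words, and accuracy is 100 minus the error rate. This reading is inferred from the numbers themselves. The class name RecognitionScoreSketch is hypothetical.

```java
class RecognitionScoreSketch {
    /** Share of reference words that were recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correct) {
        return 100.0 * correct / totalWords;
    }

    /** Errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy taken as 100 minus the error rate (an assumption of this sketch). */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    /** Arithmetic mean of the given values. */
    static double average(double... values) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // total words and errors of Experiments 01 to 05 with Pocket Sphinx
        int[] total = {1025, 615, 615, 1025, 615};
        int[] errors = {98, 39, 22, 184, 125};
        double[] accuracies = new double[total.length];
        for (int i = 0; i < total.length; i++) {
            accuracies[i] = accuracy(total[i], errors[i]);
            System.out.printf("Experiment %02d accuracy: %.2f%n", i + 1, accuracies[i]);
        }
        System.out.printf("Average accuracy: %.2f%n", average(accuracies));
    }
}
```

In every experiment the error count is larger than the number of words that were not recognized correctly (for Experiment 01, 98 errors against 1025 - 975 = 50 missed words). This suggests that inserted words are also counted as errors, which would explain why accuracy is lower than percent correct.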


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment     User type  Speakers  Environment        Speaker type  Input device  Words  Correct  Errors  Percent correct  Error  Accuracy
Experiment 01  Trained    2         Closed Room        Male          Microphone    156    154      2       98%              2%     98%
Experiment 02  Untrained  3         Lab Room           Male          Microphone    130    120      10      90%              10%    90%
Experiment 03  Untrained  3         University Campus  Male          Microphone    140    122      18      82%              18%    82%
Experiment 04  Untrained  2         University Campus  Female        Microphone    108    95       13      87%              13%    87%
Experiment 05  Trained    3         University floor   Female        Microphone    126    115      11      89%              11%    89%
Experiment 06  Trained    2         Closed Room        Male          Microphone    128    127      1       99%              1%     99%

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83

3.3.2 Input Type: Audio

Experiment      Data set   Speakers   Male   Female   Total words   Correct   Errors   Percent correct (%)   Error    Accuracy (%)
Experiment 01   Trained    3          3      0        210           194       16       85.71                 15.71    85.71
Experiment 02   Trained    3          2      1        210           188       22       77.14                 22       77.14
Experiment 03   Trained    3          2      1        210           182       28       72.38                 28       72.38
Experiment 04   Trained    3          2      1        210           194       16       84.44                 16       84.44
Experiment 05   Trained    3          1      2        210           184       26       74.76                 26       74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a dictionary file (.dic) that lists every training word together with its pronunciation. While experimenting with acoustic model training we wrote these pronunciations by hand several times. Over time we decided to build an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we built a database that stores every letter and its corresponding phoneme. We used the IPA chart and several theses [1] [9] [10] to build this database. However, the sources define phonemes in different ways, and phonemes are not defined for every letter. We therefore defined the phonemes for some consonants and conjunct letters ourselves. With the database in place, we built the phonetic translator, which produces the phonetic translation of a Bengali word by looking up its letters in the database.

Fig 4.1.2 Dictionary files with phonetic translation
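The lookup idea can be sketched as follows. This is a simplified illustration, not the project's translator: an in-memory map stands in for the database table, only a few consonant rows from the Unicode to IPA chart in the appendix are included, and inherent vowels and conjunct letters are not handled. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the letter-to-phoneme database table.
final class PhonemeLookup {

    private static final Map<Character, String> PHONEMES = new HashMap<>();

    static {
        // A few consonant rows copied from the appendix chart (letter -> phoneme).
        PHONEMES.put('\u0995', "K"); // Bengali letter ka
        PHONEMES.put('\u0997', "G"); // Bengali letter ga
        PHONEMES.put('\u09A8', "N"); // Bengali letter na
        PHONEMES.put('\u09AC', "B"); // Bengali letter ba
        PHONEMES.put('\u09AE', "M"); // Bengali letter ma
        PHONEMES.put('\u09B2', "L"); // Bengali letter la
    }

    // Returns the space-separated phonemes of a word. Letters that are missing
    // from the table are skipped here; a real dictionary builder should report them.
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phoneme = PHONEMES.get(word.charAt(i));
            if (phoneme == null) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(phoneme);
        }
        return out.toString();
    }

    // One line of a dictionary file: the word followed by its phonemes.
    static String dictionaryLine(String word) {
        return word + " " + translate(word);
    }

    public static void main(String[] args) {
        // The three letters ka, la, ma written with Unicode escapes.
        System.out.println(dictionaryLine("\u0995\u09B2\u09AE"));
    }
}
```

A map keeps the sketch self-contained; with a database the same `translate` loop would issue one lookup per letter or per conjunct.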

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files list the training audio file paths and their corresponding sentences. Before we wrote this program we had to create both files manually. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. With our software both files are created automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
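The line shapes of the two files can be sketched as below, assuming the usual SphinxTrain conventions: a fileids entry is the audio path without its extension, and a transcript line has the form `<s> sentence </s> (utterance name)`. The class name, the paths and the sentences are placeholders, not the project's data. Sentences are reused cyclically when there are more recordings than corpus lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of how fileids and transcript lines are derived.
final class TrainingFileLines {

    // A fileids entry is the relative audio path without the ".wav" extension.
    static String fileId(String relativeWavPath) {
        return relativeWavPath.endsWith(".wav")
                ? relativeWavPath.substring(0, relativeWavPath.length() - 4)
                : relativeWavPath;
    }

    // The utterance name is the last path component of the file id.
    static String utteranceName(String fileId) {
        return fileId.substring(fileId.lastIndexOf('/') + 1);
    }

    static String transcriptLine(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + utteranceName(fileId) + ")";
    }

    // Pairs recording i with corpus sentence i, wrapping around at the end of the corpus.
    static List<String> buildTranscript(List<String> wavPaths, List<String> corpus) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavPaths.size(); i++) {
            String sentence = corpus.get(i % corpus.size());
            lines.add(transcriptLine(sentence, fileId(wavPaths.get(i))));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder paths and sentences for illustration only.
        List<String> wavs = Arrays.asList("train/spk1/u1.wav", "train/spk1/u2.wav", "train/spk1/u3.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        for (String wav : wavs) {
            System.out.println(fileId(wav));
        }
        for (String line : buildTranscript(wavs, corpus)) {
            System.out.println(line);
        }
    }
}
```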

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click, right click, etc.
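One way to connect recognizer output to desktop actions is a lookup table from recognized text to an action. The sketch below is an illustration under assumptions: the class name and the command strings are hypothetical, and the actions only record what they would do. In the real application an action would drive the desktop, for example through `java.awt.Robot`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dispatcher from recognized text to an action.
final class CommandDispatcher {

    private final Map<String, Runnable> actions = new HashMap<>();

    // Binds an action to the exact text the recognizer is expected to return.
    void register(String recognizedText, Runnable action) {
        actions.put(recognizedText.trim(), action);
    }

    // Runs the bound action; returns false when the text is not a known command.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder command words; the application uses Bengali words instead.
        dispatcher.register("left click", () -> log.add("LEFT_CLICK"));
        dispatcher.register("right click", () -> log.add("RIGHT_CLICK"));

        dispatcher.dispatch("left click");
        dispatcher.dispatch("something else"); // ignored, not a command
        System.out.println(log);
    }
}
```

Keeping the table separate from the actions means a new command is one `register` call and unknown recognizer output is simply ignored.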

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected one domain, university admission information for newly arriving students, with 185 sentences. It was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not accurately defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or has pronunciation problems. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once the phonemes are accurately defined.
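The project listings read and write Bengali text through `InputStreamReader` and `FileWriter` without naming a character set, so the platform default is used. Whether that is the cause of the encoding problem mentioned above is an assumption. The sketch below only shows how a word list can be written and read back as UTF-8 explicitly; the class name and the temporary file are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that always uses UTF-8 instead of the platform default.
final class Utf8Files {

    static void writeLines(Path file, List<String> lines) {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> readLines(Path file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Writes the lines to a temporary file and reads them back.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("words", ".txt");
            try {
                writeLines(tmp, lines);
                return readLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three Bengali letters written with Unicode escapes.
        List<String> words = Arrays.asList("\u0995\u09B2\u09AE");
        System.out.println(roundTrip(words).equals(words));
    }
}
```

For a database, the connection must use a matching character set as well, which the listings already request through `characterEncoding=UTF-8` in the JDBC URL.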

Future Works

We have implemented Bengali speech recognition for a small data set. In the future we will increase the data size to create a complete model, and we plan to improve recognition accuracy and enlarge the vocabulary. We have already developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate this system into document applications, so that a Bengali sentence can be written just by uttering it. We want to build a voice response application, in which the user asks the software a question and the software gives the predefined answer to it. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. Building a good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review," (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection," Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition," Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition," Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals," Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition," BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla," banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective," BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla," BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System," Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition," Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech," IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges," BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide," 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System," 2007.

APPENDICES

Speaker Profile

Speaker ID   Name      Age   Gender   District       Environment   Institution/Other
02           Bappy     23    Male     Sylhet         Closed Room   Leading University
03           Bijoy     23    Male     Moulovibazar   Closed Room   MC College
04           Dola      24    Female   Chittagong     Department    Leading University
05           Falguny   24    Female   Sylhet         Open Space    Leading University
06           Lovely    20    Female   Sylhet         Class Room    Lotifa Shofi Chowdhury Mohila College
07           Mazed     23    Male     Moulovibazar   Closed Room   Leading University
08           Moni      23    Female   Sylhet         Lab           Leading University
09           Pinku     23    Male     Sylhet         Closed Room   Sylhet Govt College
10           Polash    24    Male     Feni           Cafeteria     Leading University
11           Pritom    20    Male     Sylhet         Closed Room   Leading University
12           Razib     23    Male     Sylhet         Cafeteria     Leading University
13           Rumi      22    Female   Sylhet         Lab           Leading University
14           Sanjoy    23    Male     Sylhet         Closed Room   Leading University
15           Shimul    23    Male     Sylhet         Closed Room   Leading University
16           Sumit     22    Male     Shunamgonj     Closed Room   Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Walks the speaker and session folders under the training root and writes
// one fileids entry per audio file, then creates the transcript file.
public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int i = 0, j = 0, k = 0;
        while (dirTreeLevel1.size() > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            while (dirTreeLevel2.size() > j) {
                String path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates over all folders, so k is never reset
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // literal replace: replaceAll would treat "." as a regex wildcard
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, dirTreeRootName + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            transcriptCreator.CreateTransCriptFile(dirTreeLevel3,
                    dirTreeRootName + "sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Lists one folder: directories go to level 1 or 2, files go to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sorts folder names of the form <prefix><number> by their numeric part.
public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int[] sortint = new int[unsortList.size()];
        int i = 0;
        while (unsortList.size() > i) {
            // keep only the digits of each name
            String str = unsortList.get(i).replaceAll("[^\\d]", "");
            sortint[i] = Integer.valueOf(str);
            i++;
        }
        Arrays.sort(sortint);
        // the alphabetic prefix is taken from the first name and reused for all
        String folNameWONum = unsortList.get(0).replaceAll("[^a-zA-Z]", "");
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file; sentences repeat from the
    // start of the corpus when there are more files than sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) {
                    lineStart = 0;
                }
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output .dic file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    // Builds the dictionary file: each input word followed by its pronunciation.
    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password="
            + "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        // try-with-resources closes the result set, statement and connection
        try (Connection con = getConnection();
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("Select * from employees "
                     + "where EmployeeID=1001")) {
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        }
    }

    // True when the letter's id in banglatab lies in the consonant range 13 to 48.
    public static boolean isBanjonBorno(char ch) {
        boolean isBBorno = false;
        try (Connection con = getConnection();
             PreparedStatement stmt = con.prepareStatement(
                     "Select id from banglatab where letter = ?")) {
            // a bound parameter avoids quoting problems with the letter
            stmt.setString(1, String.valueOf(ch));
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    int id = rs.getInt("id");
                    if (id >= 13 && id <= 48) {
                        isBBorno = true;
                    }
                } else {
                    System.out.println("No Specified Record");
                }
            }
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PronounciationGenarator {

    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // hasanta (U+09CD), the sign that joins consonants into a conjunct
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        // record the positions around every hasanta: k-1, k and k+1
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that are not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = Prodb.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = Prodb.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        // look up the phoneme of every collected letter or conjunct
        String phoneticTrans = "";
        String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
        String connectionUser = "root";
        String connectionPassword = "";
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            try (Connection conn = DriverManager.getConnection(connectionUrl,
                         connectionUser, connectionPassword);
                 PreparedStatement stmt = conn.prepareStatement(
                         "SELECT pro FROM banglatab where letter = ?")) {
                i = 0;
                while (ssi > i) {
                    stmt.setString(1, SearchStrings[i]);
                    try (ResultSet rs = stmt.executeQuery()) {
                        rs.next();
                        String pro = rs.getString("pro");
                        phoneticTrans = phoneticTrans + pro + " ";
                    }
                    i++;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        giveCommand(); // giveCommand() is not included in this listing
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


1.5 Overview of the Full System

Figure 1.5: Overview of the Full System Model

Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

To train and test the recognizer, we first have to prepare a set of files. As mentioned earlier, our project is a domain-based recognition application; domain-based means that it covers one particular topic with a small amount of data. We selected fifty sentences for the application, and the files below were created from them. The required files are:

Corpus
Audio files
Dictionary file
Phone file
Language model file (.lm format)
Language model file (DMP format)
Transcription file
Fileids file
Filler file

2.1.1 Corpus

The corpus is the list of sentences used to train the language model. Put simply, it is the collection of sentences that we want the machine to recognize. For our project we collected sentences relevant to our domain. Some of these sentences are the following:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio for it in .wav or .sph format. The following parameters were maintained for every wave file throughout the recording sessions:

• Sampling rate: 16 kHz
• Bits per sample: 16
• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format

A 16 kHz sampling rate was chosen because it captures high-frequency information more accurately, and 16 bits per sample quantizes each sample into 65,536 possible values. After recording, the audio was split manually into one file per sentence using the WavePad sound editor and saved in .wav format. Each .wav file is named using the speaker ID and the sentence ID. For example, the audio file

01_01.wav

stands for Speaker ID 01, Sentence ID 01.

When we collected audio from a speaker, we also saved the speaker's personal information (name, age, gender) and the environment in which the audio was collected. The following additional information was also noted:

Environmental conditions of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or generator)
Technical details of the devices (PC, microphone)
Date and time of recording
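
The recording parameters and the naming scheme described in this section can be checked programmatically. The following is a minimal sketch using the standard javax.sound.sampled API. The class name RecordingCheck and its method names are assumptions made for illustration and are not part of the project code.

```java
import javax.sound.sampled.AudioFormat;

final class RecordingCheck {

    /** Returns true if the format is 16 kHz, 16 bits per sample, mono. */
    static boolean matchesTrainingFormat(AudioFormat format) {
        return Math.round(format.getSampleRate()) == 16000
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    /** Splits a name such as "01_01.wav" into {speakerId, sentenceId}. */
    static String[] parseFileName(String name) {
        String base = name.endsWith(".wav")
                ? name.substring(0, name.length() - 4)
                : name;
        String[] parts = base.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected speaker_sentence: " + name);
        }
        return parts;
    }

    public static void main(String[] args) {
        // 16 kHz, 16-bit, mono, signed, little-endian PCM
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        System.out.println("Format accepted: " + matchesTrainingFormat(format));

        String[] ids = parseFileName("01_01.wav");
        System.out.println("Speaker " + ids[0] + ", sentence " + ids[1]);
    }
}
```

For a file on disk, the AudioFormat could be obtained with AudioSystem.getAudioFileFormat(file).getFormat() before calling matchesTrainingFormat.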

2.1.3 Dictionary file

The dictionary file is the list of words obtained from the corpus, each followed by its pronunciation, for example AA P NA KE. Producing it requires software that converts a word list into pronunciations. CRBLP has developed a grapheme-to-phoneme (G2P) tool, but it gives pronunciations in Unicode IPA, whereas we need ASCII. We therefore developed our own software, D2P (Dictionary to Pronunciation), which generates the pronunciation file from the word list. Our dictionary file contains 128 words and has the extension .dic; it is named sbsbsr.dic. Some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I
The file extension is .dic
The file encoding is UTF-8 without BOM
A word cannot be repeated
A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur in the words. For example, "AA P N I" contains four phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and duplicates are not allowed. It can be generated by a small program written for this project, which takes the .dic file as input and produces the phone file as output. For our project the file is named sbsbsr.phone. Some of its contents are:

A
AA
B
BH
C
CCHA
CH

…

Note:

All phones are in capital letters
The file extension is .phone
The file encoding is UTF-8 without BOM
A phone cannot be repeated
A blank line is required at the end of the file (i.e. an extra line)
The silence phoneme "SIL" is also included in the phone file
Every phoneme in the .dic file is present in the phone file, without repetition
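
The derivation of the phone file from the dictionary file can be sketched as follows. This is not the program written for the project; it is a minimal illustration that assumes each dictionary line has the form "word PHONE PHONE ...", and the class name PhoneListBuilder is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

final class PhoneListBuilder {

    /**
     * Collects every phone that appears in the dictionary lines, adds the
     * silence phone SIL, and returns the phones sorted and without duplicates.
     */
    static List<String> buildPhoneList(List<String> dictionaryLines) {
        TreeSet<String> phones = new TreeSet<>();
        phones.add("SIL");
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the word itself; the remaining fields are its phones
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i].toUpperCase(Locale.ROOT));
            }
        }
        return new ArrayList<>(phones);
    }

    public static void main(String[] args) {
        // "word1" and "word2" stand in for the Bengali headwords
        List<String> dictionary = Arrays.asList("word1 AA P N I", "word2 AA M I");
        for (String phone : buildPhoneList(dictionary)) {
            System.out.println(phone); // one phone per line, as the trainer expects
        }
    }
}
```

A sorted set is used so that the "no repetition" rule holds by construction. Writing the result to sbsbsr.phone, followed by the required blank line, is left out of the sketch.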

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text in the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used. lmtool is a web-based tool that lets users quickly compile the text-based components needed by an ASR decoder. It requires a corpus, which here means the set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line, although the newer version also supports Unicode text. After uploading this file and clicking the compile button, the tool returns a set of lexical (pronunciation dictionary) and language modeling files. We use only the LM file, because the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file is named sbsbsr.lm. Some of its contents are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973

…
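
To make the field layout of an ARPA n-gram line concrete, the following sketch parses one line into its log probability, its words and its back-off weight. It assumes the usual ARPA convention that both numbers are base-10 logarithms and that the back-off field may be absent on highest-order n-grams. The class name ArpaEntry is hypothetical.

```java
import java.util.Arrays;

final class ArpaEntry {
    final double logProb;  // base-10 log probability (first field)
    final String ngram;    // the words of the n-gram (middle fields)
    final double backoff;  // base-10 log back-off weight (last field), 0.0 if absent

    ArpaEntry(double logProb, String ngram, double backoff) {
        this.logProb = logProb;
        this.ngram = ngram;
        this.backoff = backoff;
    }

    /** Parses one n-gram line of the given order (1 = unigram, 2 = bigram, ...). */
    static ArpaEntry parse(String line, int order) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < order + 1) {
            throw new IllegalArgumentException("Too few fields: " + line);
        }
        double logProb = Double.parseDouble(fields[0]);
        String ngram = String.join(" ", Arrays.copyOfRange(fields, 1, order + 1));
        double backoff = fields.length > order + 1
                ? Double.parseDouble(fields[order + 1])
                : 0.0;
        return new ArpaEntry(logProb, ngram, backoff);
    }

    /** Converts the stored log10 value back to a probability. */
    double probability() {
        return Math.pow(10.0, logProb);
    }

    public static void main(String[] args) {
        // Unigram line for the Bengali letter E (U+098F), written as an escape
        ArpaEntry entry = parse("-1.7709 \u098F -0.2861", 1);
        System.out.printf("P = %.4f, back-off = %.4f%n",
                entry.probability(), entry.backoff);
    }
}
```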

2.1.6 Language Model file (DMP format)

We also need the language model in DMP format for use with Sphinx4. We used a Linux environment to convert the .lm file to DMP format, with the following command in the terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project the command is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers say in the audio files. In this file, each utterance is written exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project. Our project is named "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)

…

Note:

The file extension is .transcription
The file encoding is UTF-8 without BOM
A sentence cannot be repeated
A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is named sbsbsr_train.fileids and the testing file is named sbsbsr_test.fileids.

For example:

sbsbsr_trainsanjoysanjoy101_01

sbsbsr_ train sanjoysanjoy101_02

sbsbsr_ train sanjoysanjoy101_03

helliphellip

2.1.9 Filler file

The filler file contains the user's definitions of any background noise that occurs in the recorded data. It is a dictionary in which non-speech sounds are mapped to the corresponding non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of them must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"
</s> symbolizes "end of speech"
<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages, SphinxBase and SphinxTrain, which we downloaded from the CMU Sphinx web site. Before installing them, some dependencies have to be installed on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14]. We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires SphinxBase and SphinxTrain. After downloading the packages, we decompressed them into a folder in Linux. Any folder can be used; we used the Linux root folder. After decompressing the packages, they can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the training environment in place, we created a project folder in which SphinxTrain writes the trained files, that is, the acoustic model. First we go to the root directory where the installed SphinxBase and SphinxTrain folders are located and create a folder there. We named the folder "sbsbsr". We then open a terminal, change into the sbsbsr folder, and create a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Next we copy in the files created during data preparation. The .dic, .filler, .phone, .transcription, .fileids, .lm and .lm.DMP files go into the "etc" folder, and the collected audio goes into the "wav" folder. Finally, some training parameters have to be changed in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is set up. We changed the following parameters:

Parameter                    Value Before        Value After
$CFG_WAVFILE_EXTENSION       sph                 wav
$CFG_WAVFILE_TYPE            nist, mswav, raw    raw
$CFG_FEATURE                 2s_c_d_dd           1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                   1
$CFG_STATESPERHMM            6                   3
$CFG_N_TIED_STATES           100                 100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is complete.

2.2.4 Training the Acoustic Model

The first step in training is to convert the collected raw speech audio into .mfc feature files. To do this we opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all the .wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

We are now ready to run the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing

We tested our model with two recognizers from CMU Sphinx, Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download Pocketsphinx from the CMU Sphinx site and extract it in the root directory where SphinxBase and SphinxTrain are installed. We then install it with the following commands in the Linux terminal:

./configure
make

After installing the software, we go into our project folder and execute the following command from the terminal to create the folder structure for decoding:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

We then put some test audio in the "wav" directory, because Pocketsphinx recognizes speech from audio files, and copy the test fileids and transcription files into the "etc" folder. To decode (test) with Pocketsphinx we first need feature files for the audio files. They can be created with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main decoding command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The test results are written to the "result" folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed for this, Sphinx4 and the Eclipse IDE. After installing Eclipse on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. Then we create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that we add the following files from our training project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
.dic
.lm.DMP
.filler

We create a config.xml file in the Eclipse project to configure the recognizer and to tell it where the required model files are located. This config.xml can be written with the help of the config file in the Sphinx4 demo application. We also need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building the project, we run it and test it with live voice input from the microphone. The result for our model trained on ten sentences is:

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and the common room. Testing with audio input used recordings from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

All experiments use the trained data set. Percent correct, error and accuracy are given in percent.

Experiment   Speakers (male/female)   Total words   Correct   Errors   Percent correct   Error    Accuracy
01           5 (4/1)                  1025          975       98       95.12             9.56     90.44
02           3 (3/0)                  615           591       39       96.10             6.34     93.66
03           3 (3/0)                  615           601       22       97.72             3.58     96.42
04           5 (2/3)                  1025          938       184      91.51             17.95    82.05
05           3 (0/3)                  615           545       125      88.62             20.33    79.67

Table 3.2.1: Experimental details with Results for Pocket Sphinx

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
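
The three figures reported for each Pocket Sphinx experiment follow from the word counts. The sketch below shows the arithmetic under the assumption that percent correct = correct / total, error = errors / total and accuracy = 100 - error, which reproduces the values of Experiment 01. The error count can exceed total minus correct, presumably because inserted words are also counted as errors. The class name RecognitionMetrics is hypothetical.

```java
import java.util.Locale;

final class RecognitionMetrics {

    /** Percentage of reference words that were recognized correctly. */
    static double percentCorrect(int correct, int totalWords) {
        return 100.0 * correct / totalWords;
    }

    /** Word error rate in percent. */
    static double errorRate(int errors, int totalWords) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy in percent, defined as 100 minus the error rate. */
    static double accuracy(int errors, int totalWords) {
        return 100.0 - errorRate(errors, totalWords);
    }

    public static void main(String[] args) {
        // Experiment 01: 1025 words, 975 correct, 98 errors
        int total = 1025, correct = 975, errors = 98;
        System.out.println(String.format(Locale.ROOT,
                "Percent correct = %.2f, Error = %.2f, Accuracy = %.2f",
                percentCorrect(correct, total),
                errorRate(errors, total),
                accuracy(errors, total)));
    }
}
```

For Experiment 01 this gives 95.12, 9.56 and 90.44, matching the reported values.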

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device is a microphone in all experiments. Percent correct, error and accuracy are given in percent.

Experiment   User type   Speakers   Environment         Speaker type   Words   Correct   Errors   Percent correct   Error   Accuracy
01           Trained     2          Closed Room         Male           156     154       2        98                2       98
02           Untrained   3          Lab Room            Male           130     120       10       90                10      90
03           Untrained   3          University Campus   Male           140     122       18       82                18      82
04           Untrained   2          University Campus   Female         108     95        13       87                13      87
05           Trained     3          University floor    Female         126     115       11       89                11      89
06           Trained     2          Closed Room         Male           128     127       1        99                1       99

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All experiments use the trained data set.

Experiment   Speakers (male/female)   Total words   Correct   Errors   Percent correct   Error    Accuracy
01           3 (3/0)                  210           194       16       85.71             15.71    85.71
02           3 (2/1)                  210           188       22       77.14             22       77.14
03           3 (2/1)                  210           182       28       72.38             28       72.38
04           3 (2/1)                  210           194       16       84.44             16       84.44
05           3 (1/2)                  210           184       26       74.76             26       74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Application

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a .dic file, which holds all training words and their pronunciations. While training acoustic models experimentally, we had to write these pronunciations many times. Over time we thought of building an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we made a database that stores all phonemes and their corresponding letters. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and phonemes are not defined for every letter. We therefore defined some phonemes for certain consonants and conjunct letters ourselves. After building this database, we made the phonetic translator, which uses it to give the phonetic translation of Bengali words.

Fig 4.1.2: Dictionary files with phonetic translation
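
The letter-by-letter lookup behind the phonetic translator can be sketched with an in-memory table in place of the database. This is a simplified illustration, not the project's implementation: the class name PhoneticSketch is hypothetical, the four mappings are assumed from the dictionary entry "AA P N I", and letters without an entry are simply skipped. Inherent vowels and conjunct letters are not handled.

```java
import java.util.HashMap;
import java.util.Map;

final class PhoneticSketch {

    // Letter-to-phoneme table; stands in for the database of letters and phonemes.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0986', "AA"); // BENGALI LETTER AA
        TABLE.put('\u09AA', "P");  // BENGALI LETTER PA
        TABLE.put('\u09A8', "N");  // BENGALI LETTER NA
        TABLE.put('\u09BF', "I");  // BENGALI VOWEL SIGN I
    }

    /** Returns the space-separated phonemes for the letters of the word. */
    static String translate(String word) {
        StringBuilder result = new StringBuilder();
        for (char letter : word.toCharArray()) {
            String phoneme = TABLE.get(letter);
            if (phoneme == null) {
                continue; // no phoneme defined for this letter
            }
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(phoneme);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The four letters above, written as Unicode escapes
        System.out.println(translate("\u0986\u09AA\u09A8\u09BF"));
    }
}
```

Writing the letters as Unicode escapes keeps the example independent of the source file encoding, which matters here because encoding problems are one of the difficulties reported for this tool.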

4.1.3 Training File Creator

Training the acoustic model also requires the fileids and transcription files. These two files contain the paths of the training audio files and their corresponding sentences. Before creating this program we had to make these two files manually; with it we can generate these two large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create these two files automatically by giving the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
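
The core of such a generator is the formatting of one line per audio file. The sketch below is an assumption-based illustration rather than the project's software: the class name TrainingFileSketch, the folder name speaker01 and the placeholder sentence are hypothetical, and the line formats follow the transcription and fileids conventions described in Chapter 2.

```java
final class TrainingFileSketch {

    /** Builds a file ID such as "01_01" from a speaker and a sentence number. */
    static String fileId(int speaker, int sentence) {
        return String.format("%02d_%02d", speaker, sentence);
    }

    /** Formats one transcription line: the sentence in silence tags, then the file ID. */
    static String transcriptionLine(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    /** Formats one fileids line: the audio path without its extension. */
    static String fileIdsLine(String relativeFolder, String fileId) {
        return relativeFolder + "/" + fileId;
    }

    public static void main(String[] args) {
        String[] corpus = {"first sentence", "second sentence"}; // placeholder corpus
        int speaker = 1;
        for (int i = 0; i < corpus.length; i++) {
            String id = fileId(speaker, i + 1);
            System.out.println(transcriptionLine(corpus[i], id));
            System.out.println(fileIdsLine("speaker01", id));
        }
    }
}
```

In a full tool the two kinds of lines would be written to separate files, each ending with the required blank line.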

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.
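
One way to connect recognized text to desktop actions is a lookup table from phrases to actions. The sketch below illustrates that idea only. It is not the project's giveCommand method, the class name CommandDispatchSketch is hypothetical, and an English placeholder phrase stands in for the Bengali command.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatchSketch {

    private final Map<String, Runnable> commands = new HashMap<>();

    /** Associates a spoken phrase with the action to perform. */
    void register(String phrase, Runnable action) {
        commands.put(phrase.trim(), action);
    }

    /** Runs the action for the recognized text; returns false if none is registered. */
    boolean dispatch(String recognizedText) {
        String key = recognizedText == null ? "" : recognizedText.trim();
        Runnable action = commands.get(key);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        // In the real application the action would call a CommandActivator method.
        dispatcher.register("right click", () -> System.out.println("right click performed"));
        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

A table of this kind keeps the recognition loop unchanged when new commands are added, since each new command is one more register call.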

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small amount of data. We selected one domain, university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time. As a result, we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced some problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly; it sometimes fails when the speech is slow or the pronunciation is poor. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciation output, we can expect good output from this software once an accurate phoneme set is available.
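The cause of the encoding problem is not stated here. One common source of such problems in Java is reading or writing Bengali text with the platform default charset, so the sketch below shows, purely as an assumption about a possible remedy, how a corpus file can be read and written with an explicit UTF-8 charset. The class `Utf8CorpusSketch` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

public class Utf8CorpusSketch {

    // Reads all lines of a text file, decoding it explicitly as UTF-8.
    public static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // Writes the lines to a text file, encoding them explicitly as UTF-8.
    public static void writeLines(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Writes the text to a temporary file and checks that it reads back unchanged.
    public static boolean roundTrips(String text) {
        try {
            Path tmp = Files.createTempFile("corpus", ".txt");
            try {
                writeLines(tmp, Collections.singletonList(text));
                return readLines(tmp).equals(Collections.singletonList(text));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first corpus sentence, written with Unicode escapes.
        System.out.println(roundTrips("\u09B6\u09C1\u09AD \u09B8\u0995\u09BE\u09B2"));
    }
}
```

The same idea applies to the database side, where the connection string has to request Unicode handling as well.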

Future Works

We have implemented Bengali speech recognition for a small data size. In future, we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for making the dictionary file, the fileids file, and the transcription file. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, which is why we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice-response application in which a user asks the software a question and the software gives a predefined answer to it. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use the collected audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages. But for our mother tongue, Bengali, work in this field is still at a beginner level. So we tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the various tools we used, and the process of speech recognition. Our tools are still at a preliminary level. Making a good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we trained and recognized only 50 of them.

No Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Walks <root>/<level 1>/<level 2>/<audio files>, appends one entry per audio
// file to the fileids file, and then creates the transcript file.
// The punctuation inside the string literals was lost in extraction; the path
// splits and separators below are a best reconstruction.
public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing across folders, so k carries on
                // from the last file that was already written.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }

        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            transcriptCreator.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Lists one folder. Sub-folders are collected at levels 1 and 2,
    // audio files at level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // not a directory, or it could not be read
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (Level == 3 && listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

......................................................................

ARRAY

......................................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sorts folder names of the form <letters><number> by their numeric part.
public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }

        // Keep only the digits of every name.
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }

        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);

        // Folder name without its number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

......................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file. The corpus is reused from its
    // first sentence whenever all sentences have been written once.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) {
                    lineStart = 0;
                }
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

......................................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

// Reads the input word list and writes the finished dictionary file.
// The path separators were lost in extraction; the splits are a best guess.
public class FileOperator {

    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        try {
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

......................................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

// Reads every input word, asks the pronunciation generator for its phonetic
// translation, and writes "word translation" lines to the dictionary file.
public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password="
            + "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() throws SQLException {
        return DriverManager.getConnection(DBURL);
    }

    public static void showEmployee() {
        // try-with-resources closes the result set, statement and connection.
        try (Connection con = getConnection();
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "Select * from employees " + "where EmployeeID=1001")) {
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        }
    }

    // Returns true when the letter is stored in banglatab with an id
    // between 13 and 48, the range this method treats as consonants.
    public static boolean isBanjonBorno(char ch) {
        boolean isBBorno = false;
        try (Connection con = getConnection();
             PreparedStatement stmt = con.prepareStatement(
                     "Select id from banglatab where letter = ?")) {
            stmt.setString(1, String.valueOf(ch));
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    int id = rs.getInt("id");
                    if (id >= 13 && id <= 48) {
                        isBBorno = true;
                    }
                } else {
                    System.out.println("No Specified Record");
                }
            }
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        }
        return isBBorno;
    }
} // class end

......................................................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal was lost in extraction. The hasanta (U+09CD),
    // which joins consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();

        // Record the positions before, at, and after every conjunct marker.
        k = 0;
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }

        // Record the positions that are not part of a conjunct.
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = Prodb.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = Prodb.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel.
                    if (tempc && tempc2) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Collect the characters of the conjunct while the
                    // recorded positions stay adjacent.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi
                                + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                } // conjuct adding end
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // Look up the phoneme of every collected letter or conjunct.
        String phoneticTrans = "";
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            try (Connection conn = DriverManager.getConnection(
                         connectionUrl, connectionUser, connectionPassword);
                 PreparedStatement stmt = conn.prepareStatement(
                         "SELECT pro FROM banglatab where letter = ?")) {
                i = 0;
                while (ssi > i) {
                    stmt.setString(1, SearchStrings[i]);
                    try (ResultSet rs = stmt.executeQuery()) {
                        // Letters without a database entry are skipped.
                        if (rs.next()) {
                            String pro = rs.getString("pro");
                            phoneticTrans = phoneticTrans + pro + " ";
                        }
                    }
                    i++;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return phoneticTrans;
    }
}

......................................................................

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // The dots of the file name were lost in extraction; this form
            // follows the naming of the Sphinx-4 demo configurations.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString = " + comString + " length = "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo. The sample sentences themselves
    // did not survive extraction.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n"
                + "\n"
                + "\n"
                + "\n"
                + "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException {
    // Ctrl+N
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_N);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_N);
}

public void openHere() throws AWTException {
    // Ctrl+O
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_O);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_O);
}

public void close() throws AWTException {
    // Alt+F4
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_F4);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_F4);
}

public void startMenu() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
}

public void refresh() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_F5);
    robot.keyRelease(KeyEvent.VK_F5);
}

public void help() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_F1);
    robot.keyRelease(KeyEvent.VK_F1);
}

public void showDesktop() throws AWTException {
    // Windows+D
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyPress(KeyEvent.VK_D);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_D);
}

public void openMyComputer() throws AWTException {
    // Windows+E
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyPress(KeyEvent.VK_E);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_E);
}

// sokrio (activate)
public void enter() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ENTER);
    robot.keyRelease(KeyEvent.VK_ENTER);
}

// porer window | ager window (next window | previous window)
public void altTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_TAB);
}

// porer tab | ager tab (next tab | previous tab)
public void ctlTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_TAB);
}

public void openNotepad() throws AWTException, IOException {
    ProcessBuilder proc = new ProcessBuilder("notepad.exe");
    Process p = proc.start();
}

public void openBrowser() throws AWTException, IOException {
    // Opens the URL with the default Windows protocol handler
    String theUrl = "http://www.google.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openFacebook() throws AWTException, IOException {
    String theUrl = "http://www.facebook.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openYahoo() throws AWTException, IOException {
    String theUrl = "http://www.yahoo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openTechtunes() throws AWTException, IOException {
    String theUrl = "http://www.techtunes.com.bd";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openProthomAlo() throws AWTException, IOException {
    String theUrl = "http://www.prothom-alo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // Robot robot = new Robot();
        // robot.delay(2000);
        // giveCommand("bangla command");
        // robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // String comString = "";
        // System.out.println("comString: " + comString + " length: " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" + "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


Chapter 2

METHODOLOGY

Data Preparation

Setting up the System Environment

2.1 Data Preparation

Several files are required for training and for testing, and they must be prepared first. As mentioned earlier, our project is a domain-based recognition application. Domain-based means that it covers a particular topic with a small amount of data. We selected fifty sentences for our recognition application, and the files below were created from these data. The required files are:

Corpus
Audio files
Dictionary file
Phone file
Language Model file (.lm format)
Language Model file (.DMP format)
Transcription file
Fileids file
Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model. Put simply, it is the collection of sentences that we want our machine to recognize. For our project we collected sentences relevant to our domain. Some sentences from our project are as follows:

…

2.1.2 Audio files

After collecting the corpus, the next step is to record audio for it in .wav or .sph format. The following parameters of the wave file were maintained throughout the recording sessions:

• Sampling rate of the audio: 16 kHz
• Bit depth (bits per sample): 16
• Channel: mono (single channel)

Fig 2.1.2 Audio File Recording Format

A 16 kHz sampling rate was chosen because it captures high-frequency information (up to 8 kHz) more accurately than lower rates, and 16 bits per sample divides the amplitude of each sample into 65536 possible values. After recording, the audio was split into one file per sentence manually using recording software. For our project we used the WavePad sound editor and saved the sound files in .wav format, where each .wav file is named using the speaker ID and the sentence ID.

For example, an audio file of our project is 01_01.wav, which stands for:

Speaker Id 01, Sentence Id 01
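The naming convention above can be decoded mechanically. The following is a minimal Java sketch, assuming every name has the form speakerId_sentenceId.wav. The class AudioFileName and its parse method are hypothetical helpers written for illustration and are not part of the project's code.

```java
final class AudioFileName {
    final String speakerId;
    final String sentenceId;

    private AudioFileName(String speakerId, String sentenceId) {
        this.speakerId = speakerId;
        this.sentenceId = sentenceId;
    }

    // Splits a name such as "01_01.wav" into speaker ID and sentence ID.
    static AudioFileName parse(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String stem = dot >= 0 ? fileName.substring(0, dot) : fileName;
        String[] parts = stem.split("_");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected speakerId_sentenceId: " + fileName);
        }
        return new AudioFileName(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        AudioFileName name = parse("01_01.wav");
        System.out.println("Speaker Id " + name.speakerId + ", Sentence Id " + name.sentenceId);
        // 16 bits per sample give 2^16 distinct amplitude values
        System.out.println("Levels per sample: " + (1 << 16));
    }
}
```

Keeping the IDs as strings preserves the leading zeros used in the file names.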

When we collected audio from a speaker, we also saved the personal information of that speaker, such as:

Name, Age, Gender, Audio collection environment

Some other information was also noted down:

Environmental condition of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or a generator)
Technical details of the devices (PC, microphone)
Date and time of the recording

2.1.3 Dictionary file

The dictionary file is the list of words taken from our corpus file, each followed by its pronunciation, such as AA P N AA K E. Producing it requires software that converts a word list into pronunciations. CRBLP has developed a grapheme-to-phoneme (G2P) tool, but it gives pronunciations in the Unicode IPA system, whereas we need ASCII format. For our project we therefore developed our own software, D2P (Dictionary to Pronunciation), which generates the pronunciation file from the word list. Our dictionary file contains 128 words. The file format is .dic. The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I
File format is .dic
File encoding is UTF-8 without BOM
A word cannot be repeated
A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur in the words. For example, “AA P N I” contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters
File format is .phone
File encoding is UTF-8 without BOM
A phone cannot be repeated
A blank line is required at the end of the file (i.e. an extra line)
The silence phoneme “SIL” is also included in the phone file
All phonemes in the .dic file are present in the phone file without repetition
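The rules above suggest how the phone file can be derived from the dictionary. The following is a minimal Java sketch of that idea, not the program written for the project. It assumes each dictionary line is a word followed by its phones separated by whitespace. The class name PhoneListBuilder and the placeholder words w1 and w2 are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class PhoneListBuilder {
    // Collects every distinct phone used in the dictionary lines, plus SIL.
    static Set<String> collectPhones(List<String> dictionaryLines) {
        Set<String> phones = new TreeSet<>(); // sorted, no duplicates
        phones.add("SIL");                    // the silence phone is always required
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the word; the remaining fields are its phones
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i]);
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("w1 AA P N I", "w2 AA M I");
        // One phone per line, as the phone file requires
        for (String phone : collectPhones(dictionary)) {
            System.out.println(phone);
        }
    }
}
```

A sorted set is used so that duplicates are dropped automatically and the output order is stable between runs.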


2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text in the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their log probability (the first field) and a back-off factor (the third field). To build this file, the CMU lmtool is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for an ASR decoder. It requires a corpus, which in this case means the set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line; the newer version also supports Unicode text. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. We use only the LM file, since the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973
…

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for use with Sphinx4. We used a Linux environment to convert the language model file from .lm format to DMP format, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In the transcription file, each utterance is written exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project. Here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
…

Note:

File format is .transcription
File encoding is UTF-8 without BOM
A sentence cannot be repeated
A blank line is required at the end of the file (i.e. an extra line)

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
…

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes “beginning of speech”
</s> symbolizes “end of speech”
<sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages, sphinxbase and sphinxtrain, which we downloaded from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies in the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two packages, sphinxbase and sphinxtrain. After downloading them, we decompressed them into a folder in Linux. It can be any folder; we used the Linux root folder. After decompressing, the packages can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

Next we created the system environment for training: a project folder in which sphinxtrain will create the trained files, i.e. the acoustic model. First we go to the root directory where the installed sphinxbase and sphinxtrain folders are placed and create a folder there. We named the folder “sbsbsr”. After creating the folder, we open a terminal and change to the sbsbsr folder. Then we create a project task for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as “etc”, “wav” and “model_parameters”.

Now it is time to copy the files created in the data preparation part. We copy the .dic, .filler, .phone, transcription, .fileids, .lm and .lm.DMP files into the “etc” folder and the collected audio into the “wav” folder. We then change some training parameters in the sphinx_train.cfg file, which was created automatically in the “etc” folder when the project task was created. The parameters we changed are listed below.

Parameter                   Value Before      Value After
$CFG_WAVFILE_EXTENSION      sph               wav
$CFG_WAVFILE_TYPE           nist/mswav/raw    raw
$CFG_FEATURE                2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                 1
$CFG_STATESPERHMM           6                 3
$CFG_N_TIED_STATES          100               100

Table 2.2.3 Configuration of sphinx_train.cfg

With these changes, the project folder setup is complete.

2.2.4 Training the Acoustic Model

In the training process, we first have to convert the collected raw speech audio into .mfc feature files. To do this we opened the project directory in the Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the “feat” directory under the project folder “sbsbsr”.

Now we are ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx, Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. We then go to the root directory where sphinxbase and sphinxtrain are installed and extract the downloaded Pocketsphinx archive there. After extracting, we install the software with the following commands in the Linux terminal:

./configure
make

After installing the software, we go into our project folder and execute a command from the terminal to create the folder structure for decoding. The command is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the “wav” directory, because Pocketsphinx takes audio files as input. We also copy the test fileids and transcription files into the “etc” folder. To decode (test) audio with Pocketsphinx, we first need feature files for the audio files. We create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding (testing) in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio using our trained acoustic model. The result of the test can be found in the “result” folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE in Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere in Windows. Then we create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that we add the files listed below from our training project to the Eclipse project [11]:

“sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters
.dic
.lm.DMP
.filler

We create a config.xml file in the Eclipse project to give the recognizer its configuration and to tell it where the required model files are placed. This config.xml can be based on the config file in the Sphinx4 demo application. We also add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project, we run it and test it with live voice input from the microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built on Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and the common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment 01
Details: Using trained data set; Number of speakers: 5 (Male: 4, Female: 1)
Results: Total words: 1025; Correct: 975; Errors: 98; Total percent correct = 95.12%; Error = 9.56%; Accuracy = 90.44%

Experiment 02
Details: Using trained data set; Number of speakers: 3 (Male: 3, Female: 0)
Results: Total words: 615; Correct: 591; Errors: 39; Total percent correct = 96.10%; Error = 6.34%; Accuracy = 93.66%

Experiment 03
Details: Using trained data set; Number of speakers: 3 (Male: 3, Female: 0)
Results: Total words: 615; Correct: 601; Errors: 22; Total percent correct = 97.72%; Error = 3.58%; Accuracy = 96.42%

Experiment 04
Details: Using trained data set; Number of speakers: 5 (Male: 2, Female: 3)
Results: Total words: 1025; Correct: 938; Errors: 184; Total percent correct = 91.51%; Error = 17.95%; Accuracy = 82.05%

Experiment 05
Details: Using trained data set; Number of speakers: 3 (Male: 0, Female: 3)
Results: Total words: 615; Correct: 545; Errors: 125; Total percent correct = 88.62%; Error = 20.33%; Accuracy = 79.67%

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
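The three percentages reported for each Pocket Sphinx experiment are consistent with simple ratios over the total word count. The following Java sketch reproduces the figures of Experiment 01 (1025 total words, 975 correct, 98 errors). The formulas are inferred from the reported numbers rather than taken from the decoder's documentation, and the class name RecognitionScore is hypothetical.

```java
import java.util.Locale;

final class RecognitionScore {
    // Share of reference words that were recognized correctly, in percent.
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    // Errors relative to the number of reference words, in percent.
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    // Accuracy is what remains after subtracting the error rate.
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        int total = 1025, correct = 975, errors = 98;
        System.out.println(String.format(Locale.ROOT,
                "Percent correct = %.2f, Error = %.2f, Accuracy = %.2f",
                percentCorrect(total, correct),
                errorRate(total, errors),
                accuracy(total, errors)));
    }
}
```

The error count can exceed the number of words that were not recognized correctly (here 98 against 50), which is why accuracy is lower than percent correct.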


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment 01
Details: User type: Trained; Number of speakers: 2; Environment: Closed Room; Speaker type: Male; Input device: Microphone
Results: Number of words: 156; Correct words: 154; Errors: 2; Percent of correct: 98%; Errors: 2%; Accuracy: 98%

Experiment 02
Details: User type: Untrained; Number of speakers: 3; Environment: Lab Room; Speaker type: Male; Input device: Microphone
Results: Number of words: 130; Correct words: 120; Errors: 10; Percent of correct: 90%; Errors: 10%; Accuracy: 90%

Experiment 03
Details: User type: Untrained; Number of speakers: 3; Environment: University Campus; Speaker type: Male; Input device: Microphone
Results: Number of words: 140; Correct words: 122; Errors: 18; Percent of correct: 82%; Errors: 18%; Accuracy: 82%

Experiment 04
Details: User type: Untrained; Number of speakers: 2; Environment: University Campus; Speaker type: Female; Input device: Microphone
Results: Number of words: 108; Correct words: 95; Errors: 13; Percent of correct: 87%; Errors: 13%; Accuracy: 87%

Experiment 05
Details: User type: Trained; Number of speakers: 3; Environment: University floor; Speaker type: Female; Input device: Microphone
Results: Number of words: 126; Correct words: 115; Errors: 11; Percent of correct: 89%; Errors: 11%; Accuracy: 89%

Experiment 06
Details: User type: Trained; Number of speakers: 2; Environment: Closed Room; Speaker type: Male; Input device: Microphone
Results: Number of words: 128; Correct words: 127; Errors: 1; Percent of correct: 99%; Errors: 1%; Accuracy: 99%

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment 01
Details: Using trained data set; Number of speakers: 3 (Male: 3, Female: 0)
Results: Total words: 210; Correct: 194; Errors: 16; Total percent correct = 85.71%; Error = 15.71%; Accuracy = 85.71%

Experiment 02
Details: Using trained data set; Number of speakers: 3 (Male: 2, Female: 1)
Results: Total words: 210; Correct: 188; Errors: 22; Total percent correct = 77.14%; Error = 22%; Accuracy = 77.14%

Experiment 03
Details: Using trained data set; Number of speakers: 3 (Male: 2, Female: 1)
Results: Total words: 210; Correct: 182; Errors: 28; Total percent correct = 72.38%; Error = 28%; Accuracy = 72.38%

Experiment 04
Details: Using trained data set; Number of speakers: 3 (Male: 2, Female: 1)
Results: Total words: 210; Correct: 194; Errors: 16; Total percent correct = 84.44%; Error = 16%; Accuracy = 84.44%

Experiment 05
Details: Using trained data set; Number of speakers: 3 (Male: 1, Female: 2)
Results: Total words: 210; Correct: 184; Errors: 26; Total percent correct = 74.76%; Error = 26%; Accuracy = 74.76%

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application based on the Sphinx4 demo application. Its objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a .dic file, which holds all training words and their pronunciations. While training acoustic models experimentally, we had to write these pronunciations by hand several times. Over time we came up with the idea of an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and phonemes are not defined for every letter. We therefore defined some phonemes ourselves for certain consonants and conjunct letters. After making this database, we built the phonetic translator, which gives the phonetic translation of Bengali words using the database.

Fig 4.1.2 Dictionary files with phonetic translation
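The core of such a translator is a lookup from letters to phone labels. The following Java sketch illustrates only that lookup, using six consonant entries copied from the chart in the appendix. It is not the project's software: it ignores vowels, vowel signs and conjunct letters, so its output is not a complete pronunciation. The class name PhoneticTranslatorSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PhoneticTranslatorSketch {
    // Letter-to-phone table; the Bengali letters are written as Unicode
    // escapes so that the source file stays plain ASCII.
    private static final Map<Character, String> PHONES = new HashMap<>();
    static {
        PHONES.put('\u0995', "K");  // ka
        PHONES.put('\u0996', "KH"); // kha
        PHONES.put('\u0997', "G");  // ga
        PHONES.put('\u09A8', "N");  // na
        PHONES.put('\u09AE', "M");  // ma
        PHONES.put('\u09B2', "L");  // la
    }

    // Replaces each letter by its phone label, separated by single spaces.
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (char letter : word.toCharArray()) {
            String phone = PHONES.get(letter);
            if (phone == null) {
                throw new IllegalArgumentException(
                        "No phone defined for U+" + Integer.toHexString(letter).toUpperCase());
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(phone);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // ka + ma + la
        System.out.println(translate("\u0995\u09AE\u09B2"));
    }
}
```

Rejecting unknown letters instead of skipping them makes gaps in the table visible, which matters because, as noted above, not every letter has an agreed phoneme.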

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcription files. These two files contain the paths of the training audio files and their corresponding sentences. Before creating this program we had to make these two files manually, but with our software we can generate these two large files automatically within moments. For example, if we have 8000 audio files, we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create these two files automatically by giving the software the root folder name of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
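The two output lines that such a tool has to produce for each audio file can be sketched as follows. This is a minimal Java illustration, assuming the fileids entry is the folder path plus the file name without its extension and the transcription entry has the form <s> sentence </s> (file id). The class and method names are hypothetical and are not taken from the project's software.

```java
final class TrainingFileCreator {
    // Removes a trailing extension such as ".wav" from a file name.
    static String stripExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 ? fileName.substring(0, dot) : fileName;
    }

    // One line of the fileids file: folder path plus file name without extension.
    static String fileIdLine(String rootFolder, String fileName) {
        return rootFolder + "/" + stripExtension(fileName);
    }

    // One line of the transcription file: the sentence between silence tags,
    // followed by the file ID in parentheses.
    static String transcriptionLine(String sentence, String fileName) {
        return "<s> " + sentence.trim() + " </s> (" + stripExtension(fileName) + ")";
    }

    public static void main(String[] args) {
        String[] sentences = {"first sentence", "second sentence"}; // placeholder corpus
        for (int i = 0; i < sentences.length; i++) {
            String fileName = String.format("01_%02d.wav", i + 1);
            System.out.println(fileIdLine("sbsbsr_train", fileName));
            System.out.println(transcriptionLine(sentences[i], fileName));
        }
    }
}
```

Both lines are derived from the same file name, which keeps the fileids and transcription files in step with each other.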

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.
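One way to connect recognized text to actions is a lookup table from command phrases to actions. The following Java sketch shows that pattern under stated assumptions. The CommandDispatcher class is hypothetical, and the actions here only print a message, whereas a real application would call key-press routines such as those of the CommandActivator class in the source listing.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatcher {
    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a recognized phrase with the action to run for it.
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action for the recognized text; returns false if none is registered.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // "sokrio" is the command word that appears next to enter() in the listing
        dispatcher.register("sokrio", () -> System.out.println("press Enter"));
        if (!dispatcher.dispatch("sokrio")) {
            System.out.println("unknown command");
        }
    }
}
```

Compared with a chain of if statements, a table lets new commands be added without touching the dispatch logic.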


Chapter 5

LIMITATION & FUTURE WORK

LIMITATION

FUTURE WORK

Limitation

Our project has limitations in some specific tasks. The system was built on a small amount of data because of time constraints. We selected a domain, university admission information for newly arrived students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are also needed to get more accurate results. We faced some problems in creating the dictionary file as well, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker says them loudly and clearly, and it sometimes fails when the speaker talks slowly or has pronunciation problems. We created a program to generate the pronunciation of a Bengali word automatically, but this software does not work properly because of an encoding problem. Accurate phonemes are a prerequisite for good pronunciations, so with an accurate phoneme set we can expect good output from this software.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have already developed software for making the dictionary, fileids and transcription files. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that Bengali sentences can be written just by uttering them. We want to make a voice response application, in which the user asks the software a question and the software gives a predefined answer. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use this audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

CONCLUSION

REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. Making a good and complete recognition application requires many improvements, such as a large training database and audio from many speakers with low noise. No speech recognizer is 100% accurate yet, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.



APPENDICES

Speaker Profile

Speaker ID | Name | Age | Gender | District | Environment | Institution/Other
02 | Bappy | 23 | Male | Sylhet | Closed Room | Leading University
03 | Bijoy | 23 | Male | Moulovibazar | Closed Room | MC College
04 | Dola | 24 | Female | Chittagong | Department | Leading University
05 | Falguny | 24 | Female | Sylhet | Open Space | Leading University
06 | Lovely | 20 | Female | Sylhet | Class Room | Lotifa Shofi Chowdhury Mohila College
07 | Mazed | 23 | Male | Moulovibazar | Closed Room | Leading University
08 | Moni | 23 | Female | Sylhet | Lab | Leading University
09 | Pinku | 23 | Male | Sylhet | Closed Room | Sylhet Govt College
10 | Polash | 24 | Male | Feni | Cafeteria | Leading University
11 | Pritom | 20 | Male | Sylhet | Closed Room | Leading University
12 | Razib | 23 | Male | Sylhet | Cafeteria | Leading University
13 | Rumi | 22 | Female | Sylhet | Lab | Leading University
14 | Sanjoy | 23 | Male | Sylhet | Closed Room | Leading University
15 | Shimul | 23 | Male | Sylhet | Closed Room | Leading University
16 | Sumit | 22 | Male | Shunamgonj | Closed Room | Madan Muhon College

Fig 7.1 Speaker Profiles


Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we have recognized only 50 of them.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51-185 (sentence text not recoverable)

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators are missing in the printed listing; "/" is assumed throughout.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates every audio file, so k keeps counting across folders
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // literal replace, so "." is not treated as a regex wildcard
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collects directory names (levels 1 and 2) or file names (level 3) found under path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last directory in a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names of the form <letters><number> by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep digits only
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file; corpus lines repeat cyclically per speaker.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list (one word per line) that needs pronunciations.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            // Path separators are missing in the printed listing; "/" is assumed.
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary (word plus pronunciation per line).
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one "word pronunciation" line per input word
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and connection, reporting (not throwing) any SQLException.
    private static void closeQuietly(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
    }

    // A letter counts as a consonant (banjon borno) when its id in banglatab is 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal is missing in the printed listing; the hasanta (U+09CD) is assumed,
    // since it is the character that joins consonants into a conjunct.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        // record the positions before, at, and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // record every position that is not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel between them
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // consecutive conjunct positions belong to the same conjunct
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the ASCII pronunciation of every letter or conjunct
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences are missing in the printed listing.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    // Presses the given keys in order, then releases them in the same order.
    private void pressKeys(int... keyCodes) throws AWTException {
        Robot robot = new Robot();
        for (int code : keyCodes) {
            robot.keyPress(code);
        }
        for (int code : keyCodes) {
            robot.keyRelease(code);
        }
    }

    // Presses and releases a mouse button the given number of times.
    private void click(int buttonMask, int times) throws AWTException {
        Robot robot = new Robot();
        for (int n = 0; n < times; n++) {
            robot.mousePress(buttonMask);
            robot.mouseRelease(buttonMask);
        }
    }

    // Opens a URL in the default browser (Windows only).
    private void openUrl(String theUrl) throws IOException {
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void leftClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 1); }

    public void rightClick() throws AWTException { click(InputEvent.BUTTON3_MASK, 1); }

    public void doubleClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 2); }

    public void copy() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_C); }

    public void paste() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_V); }

    public void delete() throws AWTException { pressKeys(KeyEvent.VK_DELETE); }

    public void selectAll() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_A); }

    public void up() throws AWTException { pressKeys(KeyEvent.VK_PAGE_UP); }

    public void down() throws AWTException { pressKeys(KeyEvent.VK_PAGE_DOWN); }

    public void previousPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_UP); }

    public void nextPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_DOWN); }

    public void openNewFile() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_N); }

    public void openHere() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_O); }

    public void close() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_F4); }

    public void startMenu() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS); }

    public void refresh() throws AWTException { pressKeys(KeyEvent.VK_F5); }

    public void help() throws AWTException { pressKeys(KeyEvent.VK_F1); }

    public void showDesktop() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_D); }

    public void openMyComputer() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_E); }

    // activate
    public void enter() throws AWTException { pressKeys(KeyEvent.VK_ENTER); }

    // next window | previous window
    public void altTab() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_TAB); }

    // next tab | previous tab
    public void ctlTab() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_TAB); }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException { openUrl("http://www.google.com"); }

    public void openFacebook() throws AWTException, IOException { openUrl("http://www.facebook.com"); }

    public void openYahoo() throws AWTException, IOException { openUrl("http://www.yahoo.com"); }

    public void openTechtunes() throws AWTException, IOException { openUrl("http://www.techtunes.com.bd"); }

    public void openProthomAlo() throws AWTException, IOException { openUrl("http://www.prothom-alo.com"); }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences are missing in the printed listing.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command string is missing in the printed listing.
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
        // The printed listing ends here; the remaining commands are not shown.
    }
}


2.1 Data Preparation

We have to prepare several files that are required for both training and testing. As already mentioned, our project is a domain-based recognition application. Domain-based means that the system covers one particular topic with a small amount of data. We selected fifty sentences for our recognition application, and the files listed below are created from this data. The required files are:

Corpus

Audio files

Dictionary file

Phone file

Language Model file (.lm format)

Language Model file (.DMP format)

Transcription file

Fileids file

Filler file

2.1.1 Corpus

The corpus is a list of sentences used to train the language model. Put simply, it is the collection of sentences that we want our machine to recognize. For our project we collected important sentences from our domain. Some sentences of our project are as follows:

…
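Every distinct word of the corpus later needs an entry in the dictionary file, so the word list can be derived from the corpus mechanically. The following is a minimal sketch of that step, not the code used in this project: the class name CorpusVocabulary, the method uniqueWords, and the placeholder sentences are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class CorpusVocabulary {

    /** Returns the distinct whitespace-separated words of all sentences, sorted. */
    public static List<String> uniqueWords(List<String> sentences) {
        TreeSet<String> words = new TreeSet<>();
        for (String sentence : sentences) {
            for (String word : sentence.trim().split("\\s+")) {
                if (!word.isEmpty()) { // an empty sentence yields one empty token
                    words.add(word);
                }
            }
        }
        return new ArrayList<>(words);
    }

    public static void main(String[] args) {
        // Placeholder sentences; a real run would read the corpus file instead.
        List<String> corpus = Arrays.asList("w1 w2 w3", "w2 w4", "  w1  w4 ");
        for (String word : uniqueWords(corpus)) {
            System.out.println(word);
        }
    }
}
```

A sorted set is used here only so that the output order is stable between runs; the order of the words does not matter for the idea itself.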

2.1.2 Audio files

After collecting the corpus, the next step is to record the audio for this corpus in (.wav) or (.sph) format. During the recording sessions the following parameters of the wave file were maintained throughout:

• Sampling rate of the audio: 16 kHz

• Bit depth (bits per sample): 16

• Channel: mono (single channel)

Fig 2.1.2: Audio File Recording Format

For this work a 16 kHz sampling rate was chosen because it captures high-frequency information more accurately, and 16 bits per sample allows each sample to take one of 65536 possible values. After recording, the audio was split manually into one file per sentence using recording software. For our project we used the WavePad sound editor and saved each sound file in .wav format, naming each file by speaker ID and sentence ID.

For example, an audio file of our project named 01_01.wav stands for:

Speaker ID: 01, Sentence ID: 01
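The naming convention above can be checked mechanically. The following is a minimal sketch, assuming file names always have the form speaker ID, underscore, sentence ID, plus an optional extension; the class name UtteranceName is hypothetical and is not part of the project code.

```java
/** Hypothetical helper: splits a name such as "01_01.wav" into speaker and sentence IDs. */
final class UtteranceName {
    final String speakerId;
    final String sentenceId;

    private UtteranceName(String speakerId, String sentenceId) {
        this.speakerId = speakerId;
        this.sentenceId = sentenceId;
    }

    static UtteranceName parse(String fileName) {
        // Drop the extension, if there is one.
        int dot = fileName.lastIndexOf('.');
        String base = dot >= 0 ? fileName.substring(0, dot) : fileName;

        // Assumption: exactly one underscore separates the two IDs.
        String[] parts = base.split("_");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                    "Expected <speaker>_<sentence>, got: " + fileName);
        }
        return new UtteranceName(parts[0], parts[1]);
    }
}
```

Rejecting malformed names early is one way to catch recordings that were saved under the wrong name before they reach the trainer.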

When we collected audio from a speaker, we also saved that speaker's personal information, such as:

Name, age, gender, and the environment in which the audio was collected

Some other information was also noted down, such as:

Environmental condition of the recording (for example classroom condition, number of students present, and sources of noise such as a fan or a generator)

Technical details of the device (PC, microphone)

Date and time of the recording

2.1.3 Dictionary file

The dictionary file is simply the list of words obtained from our corpus file, each followed by its pronunciation, such as AA P NA KE. For this work we need software that produces the pronunciation file from the word list. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. We therefore developed our own software, D2P (Dictionary to Pronunciation), which generates the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The format of the dictionary file is (.dic). The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, such as AA P N I

File format is (.dic)

File encoding is UTF-8 without BOM

A word cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

2.1.4 Phone file

The phone file is the list of phonemes that occur within the words; for example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. This file can be generated by a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

File format is (.phone)

File encoding is UTF-8 without BOM

An entry cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

All phonemes in the .dic file are present in the phone file without repetition
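The rules above suggest how such a generator could work. The sketch below is not the program written for this project; it is an illustration that assumes each dictionary line holds a word followed by its phones, separated by whitespace. The class name PhoneListBuilder and its methods are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: derives the phone list from dictionary lines "WORD PH1 PH2 ...". */
final class PhoneListBuilder {

    /** Collects the distinct phones, upper-cased and sorted, always including SIL. */
    static Set<String> collectPhones(List<String> dictionaryLines) {
        Set<String> phones = new TreeSet<>(); // sorted and free of duplicates
        phones.add("SIL");                    // the silence phone must be present
        for (String line : dictionaryLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;                     // skip the blank line at the end of the file
            }
            String[] fields = trimmed.split("\\s+");
            for (int i = 1; i < fields.length; i++) { // fields[0] is the word itself
                phones.add(fields[i].toUpperCase(Locale.ROOT));
            }
        }
        return phones;
    }

    /** Renders one phone per line; every line, including the last, ends with a newline. */
    static String render(Set<String> phones) {
        StringBuilder sb = new StringBuilder();
        for (String phone : phones) {
            sb.append(phone).append('\n');
        }
        return sb.toString();
    }
}
```

A sorted set gives the "one phone per line, no repetition" property directly, so no separate de-duplication pass is needed.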

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text in the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. It requires a corpus, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line; the newer version also supports Unicode text files. Upload this file and click the compile button. This produces a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, since the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626

-2.0719 -0.2861

-2.0719 -0.2973

-1.7709 -0.2936

-2.0719 -0.2626

-1.7709 এ -0.2861

-2.0719 -0.2626

-2.0719 ও -0.2973

…
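To make the two numeric fields concrete: in the ARPA format the first field is a base-10 logarithm of the probability, so it can be converted back with a power of ten. The following sketch parses a single unigram line under that assumption; the class name ArpaEntry is hypothetical and the placeholder word in the usage note is not taken from the project's model.

```java
/** Hypothetical sketch: one unigram line of an ARPA file, "logprob word backoff". */
final class ArpaEntry {
    final double logProb; // log10 probability (first field)
    final String word;    // the unigram itself (second field)
    final double backoff; // log10 back-off weight (third field)

    private ArpaEntry(double logProb, String word, double backoff) {
        this.logProb = logProb;
        this.word = word;
        this.backoff = backoff;
    }

    static ArpaEntry parseUnigram(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields, got: " + line);
        }
        return new ArpaEntry(Double.parseDouble(fields[0]), fields[1],
                Double.parseDouble(fields[2]));
    }

    /** Converts the stored log10 value back to a plain probability. */
    double probability() {
        return Math.pow(10.0, logProb);
    }
}
```

For a line such as "-2.0719 word -0.2626", probability() is a little under 0.0085, that is, slightly less than one percent.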

2.1.6 Language Model file (.DMP format)

We also need the language model file in DMP format for Sphinx4. We used a Linux environment to obtain the DMP-format language model file from the .lm-format file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file each utterance is written exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file. There are basically two transcription files: one is used to train the system and the other to test it. The files are named after the project; here the project name is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> … </s> (01_01)
<s> … </s> (01_02)
<s> … </s> (01_03)
<s> … </s> (01_04)

…

Note:

File format is (.transcription)

File encoding is UTF-8 without BOM

A sentence cannot be repeated

A blank line is required at the end of the file (i.e. an extra line)
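A single transcription entry can be assembled as in the sketch below, which assumes the layout described above: start tag, sentence, end tag, then the file ID in parentheses. The class name TranscriptionLine is hypothetical.

```java
/** Hypothetical sketch: formats one entry of a transcription file. */
final class TranscriptionLine {

    /** Returns "<s> sentence </s> (fileId)" for one utterance. */
    static String format(String sentence, String fileId) {
        // Trim the sentence so that exactly one space separates it from each tag.
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }
}
```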

2.1.8 Fileids file

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and one for testing. The training file for our project is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01

sbsbsr_train/sanjoy/sanjoy1/01_02

sbsbsr_train/sanjoy/sanjoy1/01_03

……
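Producing a fileids entry from an audio path amounts to dropping the extension. The sketch below assumes relative paths with forward or backward slashes; the class name FileIdEntry and the path in the usage note are hypothetical.

```java
/** Hypothetical sketch: turns an audio file path into a fileids entry. */
final class FileIdEntry {

    /** Drops a trailing .wav or .sph extension (case-insensitive) and normalizes slashes. */
    static String fromAudioPath(String relativePath) {
        String normalized = relativePath.replace('\\', '/');
        int start = normalized.length() - 4;
        // regionMatches returns false when start is negative, so short names are safe.
        boolean isWav = normalized.regionMatches(true, start, ".wav", 0, 4);
        boolean isSph = normalized.regionMatches(true, start, ".sph", 0, 4);
        return (isWav || isSph) ? normalized.substring(0, start) : normalized;
    }
}
```

For instance, FileIdEntry.fromAudioPath("speaker/01_01.wav") yields "speaker/01_01".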

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped onto a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages, SphinxBase and SphinxTrain, which we obtained from the CMU Sphinx web site. Before installing these two packages we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two packages, SphinxBase and SphinxTrain. After downloading the packages we decompressed them into a folder in Linux; it can be any folder, and we used the Linux root folder. After decompressing the packages we can install them with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd Sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

With the system environment ready for training, we created a project folder in which SphinxTrain will create the trained files, that is, the acoustic model. First we go to the root directory where the installed sphinxbase and sphinxtrain folders are placed and create a folder there, which we named "sbsbsr". After creating the folder we open a terminal, change into the sbsbsr folder, and create a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Now it is time to copy the files created in the data preparation part. We copy the dic, filler, phone, transcription, fileids, lm and lm.DMP files into the "etc" folder and our collected audio into the "wav" folder. Next we change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed are listed below:

Parameter                    Value Before      Value After
$CFG_WAVFILE_EXTENSION       sph               wav
$CFG_WAVFILE_TYPE            nist/mswav/raw    raw
$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                 1
$CFG_STATESPERHMM            6                 3
$CFG_N_TIED_STATES           100               100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio into .mfc feature files. To do this we opened the project directory in the Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

Executing this command trains the acoustic model. The model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. We then go to the root directory where we installed sphinxbase and sphinxtrain and extract the downloaded pocketsphinx archive there. After extracting it we install the software with the following commands in the Linux terminal:

./configure

make

After installing the software we go into our project folder and execute a command from the terminal to create the required folder structure. The command is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because pocketsphinx takes audio files as input for recognition. We also copy the test fileids and transcription files into the "etc" folder. To decode, that is, to test our model on audio with pocketsphinx, we first need feature files for the audio files. We create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding, or testing, in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The test results are written to the "result" folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on Windows. Then we create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that we add the following files from our previous project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

dic

lm.DMP

filler

We create a config.xml file in the Eclipse project to give the recognizer its configuration and to tell it where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project we run it and can test with live voice input from a microphone. The result for our model trained on ten sentences is:

Figure 2.2.5.2: Testing with Sphinx4

We can build various applications in Java with the help of Sphinx4. We built some applications using Sphinx4 that will be discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room and a common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All experiments use the trained data set.

Experiment   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error    Accuracy
01           5 (4/1)                  1025          975       98       95.12%            9.56%    90.44%
02           3 (3/0)                  615           591       39       96.10%            6.34%    93.66%
03           3 (3/0)                  615           601       22       97.72%            3.58%    96.42%
04           5 (2/3)                  1025          938       184      91.51%            17.95%   82.05%
05           3 (0/3)                  615           545       125      88.62%            20.33%   79.67%

Table 3.2.1: Experimental details with Results for Pocket Sphinx

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
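The three percentages reported for each Pocket Sphinx experiment are consistent with the following relations: percent correct is the share of correct words, error is the share of errors, and accuracy is 100 minus the error. The sketch below reproduces the Experiment 01 figures under that reading; the class name RecognitionScore is hypothetical. The error count can exceed the number of words that were not recognized correctly, presumably because inserted words are also counted as errors.

```java
/** Hypothetical sketch: the percentages reported for each recognition experiment. */
final class RecognitionScore {

    /** Share of reference words recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    /** Errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy as used in the result tables: 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    /** Rounds to two decimal places for reporting. */
    static double round2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }
}
```

With 1025 total words, 975 correct words and 98 errors, the rounded values are 95.12, 9.56 and 90.44.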

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device is a microphone in all experiments.

Experiment   User Type   Speakers   Environment         Speaker Type   Words   Correct   Errors   Percent Correct   Errors (%)   Accuracy
01           Trained     2          Closed Room         Male           156     154       2        98%               2%           98%
02           Untrained   3          Lab Room            Male           130     120       10       90%               10%          90%
03           Untrained   3          University Campus   Male           140     122       18       82%               18%          82%
04           Untrained   2          University Campus   Female         108     95        13       87%               13%          87%
05           Trained     3          University floor    Female         126     115       11       89%               11%          89%
06           Trained     2          Closed Room         Male           128     127       1        99%               1%           99%

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All experiments use the trained data set.

Experiment   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error   Accuracy
01           3 (3/0)                  210           194       16       85.71%            15.71   85.71%
02           3 (2/1)                  210           188       22       77.14%            22      77.14%
03           3 (2/1)                  210           182       28       72.38%            28      72.38%
04           3 (2/1)                  210           194       16       84.44%            16      84.44%
05           3 (1/2)                  210           184       26       74.76%            26      74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. The main objective of this application is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a file called dic, which holds all training words and their pronunciations. We wrote these pronunciations by hand several times while training acoustic models experimentally. Over time we thought of building an automatic pronunciation, or phonetic translation, generator, and this software is the implementation of that idea. First we built a database that stores all phonemes and their corresponding letters. We drew on the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and not every letter's phoneme is defined. We therefore defined some phonemes ourselves for certain consonants and conjunct letters. After building this database we wrote the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2: Dictionary files with phonetic translation

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcription files. These two files contain the training audio file paths and their corresponding sentences. Before creating this program we had to write these two files manually, but with our software we can generate these two large files automatically within moments. For example, if we have 8000 audio files, we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create these two files automatically by giving the software the root folder name of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has some limitations in specific tasks. Because of time constraints, the system was built on a small amount of data. We selected a domain, university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced some problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and the exact number of phonemes in Bengali is not known; different researchers report different numbers of phonemes. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails to recognize sentences accurately because of slow speech or pronunciation problems. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate phoneme set.

Future Works

We have implemented Bengali speech recognition for a small data size. In the future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for creating the dictionary, fileids and transcription files. We want to build a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that can be used by an ordinary user. With this automatic trainer a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications so that Bengali sentences can be written just by uttering them. We want to build a voice response application, in which a user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. Using the audio collected through this website we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools for recognizing the Bengali language. Throughout this report we have discussed our objectives, the various tools we used and the process of speech recognition. Our developed tools are still at a preliminary level. Building a good and complete recognition application requires many improvements, such as a large training database and audio from many speakers with low noise. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID   Name      Age   Gender   District       Environment   Institution/Other
02           Bappy     23    Male     Sylhet         Closed Room   Leading University
03           Bijoy     23    Male     Moulovibazar   Closed Room   MC College
04           Dola      24    Female   Chittagong     Department    Leading University
05           Falguny   24    Female   Sylhet         Open Space    Leading University
06           Lovely    20    Female   Sylhet         Class Room    Lotifa Shofi Chowdhury Mohila College
07           Mazed     23    Male     Moulovibazar   Closed Room   Leading University
08           Moni      23    Female   Sylhet         Lab           Leading University
09           Pinku     23    Male     Sylhet         Closed Room   Sylhet Govt College
10           Polash    24    Male     Feni           Cafeteria     Leading University
11           Pritom    20    Male     Sylhet         Closed Room   Leading University
12           Razib     23    Male     Sylhet         Cafeteria     Leading University
13           Rumi      22    Female   Sylhet         Lab           Leading University
14           Sanjoy    23    Male     Sylhet         Closed Room   Leading University
15           Shimul    23    Male     Sylhet         Closed Room   Leading University
16           Sumit     22    Male     Shunamgonj     Closed Room   Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (বযজঞনবরণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নামবার) IPA

শনয 0

এক 1

দই 2

তিন 3

চার 4

পাচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যকতবরণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 sentences for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 73 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Root of the training tree: root\speaker\session\audio files
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 is never cleared, so k keeps counting across folders.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Adds sub-folder names to the level 1 or level 2 list and file names to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path is missing or is not a folder
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

………………………………………………………………

ARRAY

………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names of the form <letters><number> by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort
        }
        int i = 0;
        // Keep only the digits of each name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The alphabetic prefix is taken from the first name and reused for all names.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
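The sorting step above rebuilds every name from the prefix of the first entry. A minimal sketch of an alternative is given here, assuming folder names end in a number (for example sanjoy1, sanjoy2, sanjoy10); the class name NumericSuffixSort is hypothetical and is not part of the project code. It keeps the original names and only changes their order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {

    // Extracts all digits of a name as one integer; names without digits count as 0.
    static int suffix(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by numeric part; the input list is left untouched.
    public static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                return Integer.compare(suffix(a), suffix(b));
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("sanjoy10");
        names.add("sanjoy2");
        names.add("sanjoy1");
        System.out.println(sort(names));
    }
}
```

A plain alphabetical sort would place sanjoy10 before sanjoy2, which is why the numeric part is compared as an integer.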

………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file, one sentence per line, into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file: <S> sentence </S> (fileId)
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // Start again from the first sentence once the corpus is exhausted.
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> "
                        + "(" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished pronunciation dictionary.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)

eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

return isBBorno

class end

………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

………………………………………………………………

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

public void openNotepad() throws AWTException IOException

ProcessBuilder proc=new ProcessBuilder(notepadexe)

Process p=procstart()

public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

Fig 212 Audio File Recording Format

For this work a 16 kHz sample rate was chosen because it preserves more high-frequency information than lower rates, and 16 bits per sample allows each sample to take one of 65,536 possible values. After recording, the audio was split manually into one file per sentence using the WavePad sound editor. Each sentence is stored as a .wav file named by speaker ID and sentence ID.

For example, an audio file in our project is

01_01.wav

which stands for

Speaker ID 01, Sentence ID 01
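The naming convention can be checked mechanically. The following is a minimal sketch, assuming every file name has the form speaker ID, underscore, sentence ID, followed by .wav; the class and method names (AudioFileName, parse) are hypothetical and not part of the project code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AudioFileName {

    // Assumed pattern: digits, underscore, digits, then ".wav" (for example "01_01.wav").
    private static final Pattern NAME = Pattern.compile("(\\d+)_(\\d+)\\.wav");

    // Returns {speakerId, sentenceId}, or null when the name does not follow the pattern.
    public static String[] parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] ids = parse("01_01.wav");
        System.out.println("Speaker ID " + ids[0] + ", Sentence ID " + ids[1]);
    }
}
```

Keeping the IDs as strings preserves the leading zeros used in the file names.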

When we collected audio from a speaker, we also saved the speaker's personal information:

Name, age, gender and the environment in which the audio was collected

We also noted some other information:

Environmental conditions of the recording (for example classroom conditions, number of students present, and sources of noise such as a fan or a generator)

Technical details of the devices (PC, microphone)

Date and time of recording

2.1.3 Dictionary file

The dictionary file is simply the list of words taken from our corpus file, each followed by its pronunciation, such as AA P NA KE. Producing it requires software that converts a word list into a pronunciation file. A grapheme-to-phoneme (G2P) tool developed by CRBLP gives pronunciations in the Unicode IPA system, but we need ASCII format. For our project we therefore developed our own software, D2P (Dictionary to Pronunciation), which generates the pronunciation file from the dictionary file. Our dictionary file contains 128 words. The file extension is .dic. The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, for example AA P N I

The file extension is .dic

The file encoding is UTF-8 without BOM

A word must not be repeated

A blank line is required at the end of the file (i.e. an extra line)
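The rules in this note can be checked before training. The sketch below is only an illustration: it assumes each dictionary line is a word followed by space-separated phonemes, and the class name DicChecker and the romanized placeholder words are hypothetical. It does not check the encoding or the final blank line.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DicChecker {

    // Returns the problems found; an empty list means every line passed.
    public static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<String>();
        Set<String> seen = new HashSet<String>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // the trailing blank line is allowed
            }
            String[] parts = trimmed.split("\\s+");
            String word = parts[0];
            if (!seen.add(word)) {
                problems.add("repeated word: " + word);
            }
            if (parts.length < 2) {
                problems.add("no phonemes: " + word);
            }
            // Every phoneme must consist of capital letters only.
            for (int i = 1; i < parts.length; i++) {
                if (!parts[i].matches("[A-Z]+")) {
                    problems.add("bad phoneme " + parts[i] + " in " + word);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        lines.add("ami AA M I");    // placeholder word, not taken from sbsbsr.dic
        lines.add("apni AA P N I"); // placeholder word, not taken from sbsbsr.dic
        lines.add("");
        System.out.println(check(lines));
    }
}
```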

2.1.4 Phone file

The phone file is the list of phonemes that occur in the words; for example, "AA P N I" contains 4 phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and no duplicates are allowed. It can be generated by a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some of its contents are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters

The file extension is .phone

The file encoding is UTF-8 without BOM

A phone must not be repeated

A blank line is required at the end of the file (i.e. an extra line)

The silence phoneme "SIL" is also included in the phone file

Every phoneme in the .dic file is present in the phone file, without repetition
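The conversion from dictionary to phone list can be sketched as follows. This is a minimal illustration and not the program written for the project: it assumes each dictionary line is a word followed by space-separated phonemes, and the class name PhoneListBuilder is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PhoneListBuilder {

    // Collects every distinct phoneme of the dictionary lines, plus SIL, in sorted order.
    public static List<String> build(List<String> dicLines) {
        TreeSet<String> phones = new TreeSet<String>();
        phones.add("SIL"); // the silence phone must always be present
        for (String line : dicLines) {
            String[] parts = line.trim().split("\\s+");
            // parts[0] is the word itself; the phonemes start at index 1.
            for (int i = 1; i < parts.length; i++) {
                phones.add(parts[i]);
            }
        }
        return new ArrayList<String>(phones);
    }

    public static void main(String[] args) {
        List<String> dic = new ArrayList<String>();
        dic.add("w1 AA P N I"); // placeholder words, not taken from sbsbsr.dic
        dic.add("w2 AA M I");
        for (String phone : build(dic)) {
            System.out.println(phone);
        }
    }
}
```

A sorted set is used so that duplicates are dropped automatically and the output order is stable between runs.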

2.1.5 Language Model file (.lm format)

A language model assigns a probability to a piece of unseen text based on some training data. The language model file is plain text in the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for an ASR decoder. It requires a corpus, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus must be a text file with one sentence per line; earlier versions required ASCII, but the newer version also supports Unicode text. Upload this file and click the compile button to obtain a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, because the pronunciation dictionary is built as described above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file extension is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626

-2.0719 -0.2861

-2.0719 -0.2973

-1.7709 -0.2936

-2.0719 -0.2626

-1.7709 এ -0.2861

-2.0719 -0.2626

-2.0719 ও -0.2973

…
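To make the field layout concrete, the sketch below parses a single n-gram line of an ARPA file. It assumes the usual layout (a base-10 log probability, then the n words, then an optional back-off weight) and that the values in the listing above have lost their decimal points, so that -17709 stands for -1.7709. The class name ArpaLine and the placeholder word are hypothetical.

```java
public class ArpaLine {

    public final double logProb; // base-10 log probability (first field)
    public final String ngram;   // the n words, separated by single spaces
    public final double backoff; // base-10 log back-off weight, 0.0 when absent

    private ArpaLine(double logProb, String ngram, double backoff) {
        this.logProb = logProb;
        this.ngram = ngram;
        this.backoff = backoff;
    }

    // Parses one n-gram line; order is n (1, 2 or 3).
    public static ArpaLine parse(String line, int order) {
        String[] f = line.trim().split("\\s+");
        if (f.length < order + 1) {
            throw new IllegalArgumentException("too few fields: " + line);
        }
        StringBuilder words = new StringBuilder();
        for (int i = 1; i <= order; i++) {
            if (i > 1) {
                words.append(' ');
            }
            words.append(f[i]);
        }
        // The highest-order n-grams carry no back-off field.
        double backoff = f.length > order + 1 ? Double.parseDouble(f[order + 1]) : 0.0;
        return new ArpaLine(Double.parseDouble(f[0]), words.toString(), backoff);
    }

    // Converts the stored log value back to a probability.
    public double probability() {
        return Math.pow(10.0, logProb);
    }

    public static void main(String[] args) {
        ArpaLine unigram = parse("-1.7709 w -0.2861", 1); // "w" is a placeholder word
        System.out.println(unigram.ngram + " " + unigram.probability());
    }
}
```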

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for training in Sphinx4. We used a Linux environment to convert the language model file from .lm format to DMP format, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription file

A transcript is needed to represent what the speakers say in the audio files. Each line of the file contains the speaker's utterance exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project; here the project name is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)

<s> </s> (01_02)

<s> </s> (01_03)

<s> </s> (01_04)

…

Note:

The file extension is .transcription

The file encoding is UTF-8 without BOM

A sentence must not be repeated

A blank line is required at the end of the file (i.e. an extra line)
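One transcription line can be assembled as in the sketch below. It assumes the layout described in this section (start tag, sentence, end tag, then the file ID in parentheses); the class name TranscriptLine and the placeholder sentence are hypothetical.

```java
public class TranscriptLine {

    // Builds "<s> sentence </s> (fileId)" from a sentence and an audio file name.
    public static String format(String sentence, String audioFileName) {
        String fileId = audioFileName;
        // The file ID is the audio file name without its .wav extension.
        if (fileId.endsWith(".wav")) {
            fileId = fileId.substring(0, fileId.length() - ".wav".length());
        }
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    public static void main(String[] args) {
        // "sentence text" stands in for one line of the corpus.
        System.out.println(format("  sentence text ", "01_01.wav"));
    }
}
```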

2.1.8 Fileids File

The fileids files contain the paths of all audio files, without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
……
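As a rough sketch of how one fileids entry can be derived from an audio file path, the Java snippet below normalizes the separators and drops the .wav or .sph extension. The class name FileidsEntry and the directory layout used in the comment are assumptions made for illustration only.

```java
import java.util.Locale;

// Hypothetical helper: turns a relative audio path into a fileids entry.
final class FileidsEntry {

    // Example (assumed layout): "sbsbsr_train\sanjoy\sanjoy1\01_01.wav"
    // becomes "sbsbsr_train/sanjoy/sanjoy1/01_01".
    static String fromPath(String relativePath) {
        String normalized = relativePath.replace('\\', '/');
        int slash = normalized.lastIndexOf('/');
        int dot = normalized.lastIndexOf('.');
        // Strip the extension only when the dot belongs to the file name.
        if (dot > slash) {
            String ext = normalized.substring(dot + 1).toLowerCase(Locale.ROOT);
            if (ext.equals("wav") || ext.equals("sph")) {
                normalized = normalized.substring(0, dot);
            }
        }
        return normalized;
    }
}
```

Paths that carry no audio extension are returned unchanged apart from the separator normalization.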

2.1.9 Filler File

The filler file contains the user's definitions of the background noises that occur in the recorded database. It is a dictionary in which non-speech sounds are mapped to their corresponding non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and must be present in the filler dictionary. At least one of them must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"
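The two requirements stated above (all three special words present, at least one of them mapped to SIL) can be checked with a few lines of Java. This is only a sketch under the assumption that each filler line has the form "word phone"; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker for the filler dictionary rules described above.
final class FillerDictionaryCheck {

    // Parses lines of the form "<word> PHONE"; other lines are ignored.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> entries = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                entries.put(parts[0], parts[1]);
            }
        }
        return entries;
    }

    // True when <s>, </s> and <sil> are all present and at least one
    // of them is mapped to the phone SIL.
    static boolean isValid(Map<String, String> entries) {
        String[] required = {"<s>", "</s>", "<sil>"};
        boolean anyMappedToSil = false;
        for (String word : required) {
            String phone = entries.get(word);
            if (phone == null) {
                return false;
            }
            if (phone.equals("SIL")) {
                anyMappedToSil = true;
            }
        }
        return anyMappedToSil;
    }
}
```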

2.2 Setting up the System Environment

2.2.1 Software Requirements

We carried out the training on the Linux operating system. Training the CMU Sphinx recognition engine requires two software packages, SphinxBase and SphinxTrain, which we downloaded from the CMU Sphinx web site. Before installing them on the Ubuntu distribution of Linux, we first had to install their dependencies, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires SphinxBase and SphinxTrain. After downloading the packages we decompressed them into a folder; any folder can be used, and we chose the Linux root folder. The packages can then be installed with the following commands in the terminal [14]:

SphinxBase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

SphinxTrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment in place, we created a project folder in which SphinxTrain stores the trained files (the acoustic model). First we went to the root directory where the installed SphinxBase and SphinxTrain folders are located and created a folder there named "sbsbsr". We then opened a terminal, changed to the sbsbsr folder and created a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

This command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Next we copied in the files created in the data preparation part: the dic, filler, phone, transcription, fileids, lm and lm.DMP files go into the "etc" folder, and the collected audio goes into the "wav" folder. Finally we changed some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder along with the project task. The parameters we changed are listed below.

Parameter                   Value Before      Value After
$CFG_WAVFILE_EXTENSION      sph               wav
$CFG_WAVFILE_TYPE           nist/mswav/raw    raw
$CFG_FEATURE                2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                 1
$CFG_STATESPERHMM           6                 3
$CFG_N_TIED_STATES          100               100

Table 2.2.3 Configuration of sphinx_train.cfg


With these changes the project folder setup is complete.

2.2.4 Training the Acoustic Model

The first step of training is to convert the collected raw speech audio into .mfc feature files. To do this we opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all the wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

We are now ready to run the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, while Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

We first downloaded Pocketsphinx from the CMU Sphinx site and extracted it into the root directory where SphinxBase and SphinxTrain are installed. We then installed it with the following commands in the Linux terminal:

./configure
make

After installation we went into our project folder and ran the following command to set up the folder structure for testing:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

We then put some test audio into the "wav" directory, because Pocketsphinx takes audio files as input, and copied the test fileids and transcription files into the "etc" folder. Decoding audio with Pocketsphinx requires feature files for the audio files, which we created with the following command:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we ran the main decoding command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The results are written to the "result" folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Testing with the Sphinx4 decoder requires two pieces of software: Sphinx4 and the Eclipse IDE. After installing Eclipse on Windows, we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file to a folder. We then created a new Java project in Eclipse and wrote a Java file based on the demo application from CMU Sphinx. Next we added the following files from our training project to the Eclipse project [11]:

the "sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
the .dic file
the .lm.DMP file
the .filler file

We created a config.xml file in the Eclipse project to configure the recognizer and tell it where the required model files are located. This file can be written with the help of the config file in the Sphinx4 demo application. We also added four jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4/lib directory to our project. The Java project is then ready to build and run, and can be tested with live voice input from a microphone. The result for our model trained on ten sentences is:

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built several such applications, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and the common room. Testing with audio input was completed using recordings of six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All five experiments used the trained data set.

Experiment  Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          5         4     1       1025         975      98      95.12                9.56       90.44
02          3         3     0       615          591      39      96.10                6.34       93.66
03          3         3     0       615          601      22      97.72                3.58       96.42
04          5         2     3       1025         938      184     91.51                17.95      82.05
05          3         0     3       615          545      125     88.62                20.33      79.67

Table 3.2.1 Experimental details with Results for Pocket Sphinx


Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
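The percentages reported for Pocket Sphinx are consistent with the usual word-level formulas: percent correct = correct words / total words, error = errors / total words, and accuracy = 100 - error. The Java sketch below spells these out; it is an illustration only, the class and method names are hypothetical, and the assumption that "Errors" is the total error count used by the decoder's scoring is ours.

```java
import java.util.Locale;

// Hypothetical helper reproducing the word-level metrics of the results table.
final class RecognitionMetrics {

    // Share of reference words that were recognized correctly, in percent.
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    // Error count relative to the number of reference words, in percent.
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    // Accuracy is taken here as 100 minus the error rate.
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01 with Pocket Sphinx: 1025 words, 975 correct, 98 errors.
        System.out.println(String.format(Locale.ROOT, "%.2f %.2f %.2f",
                percentCorrect(1025, 975), errorRate(1025, 98), accuracy(1025, 98)));
    }
}
```

For Experiment 01 these formulas give 95.12, 9.56 and 90.44 after rounding to two decimals, matching the reported values.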


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device was a microphone in all six experiments.

Experiment  User Type  Speakers  Environment        Speaker Type  Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          Trained    2         Closed Room        Male          156    154      2       98                   2          98
02          Untrained  3         Lab Room           Male          130    120      10      90                   10         90
03          Untrained  3         University Campus  Male          140    122      18      82                   18         82
04          Untrained  2         University Campus  Female        108    95       13      87                   13         87
05          Trained    3         University floor   Female        126    115      11      89                   11         89
06          Trained    2         Closed Room        Male          128    127      1       99                   1          99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live


Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%


3.3.2 Input Type: Audio

All five experiments used the trained data set.

Experiment  Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          3         3     0       210          194      16      85.71                15.71      85.71
02          3         2     1       210          188      22      77.14                22         77.14
03          3         2     1       210          182      28      72.38                28         72.38
04          3         2     1       210          194      16      84.44                16         84.44
05          3         1     2       210          184      26      74.76                26         74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio


Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary (.dic) file, which lists all training words together with their pronunciations. While training acoustic models experimentally we had to write these pronunciations by hand several times, which led us to the idea of an automatic pronunciation (phonetic translation) generator; this software is the implementation of that idea. We first built a database in which all phonemes and their corresponding letters are stored, with the help of the IPA chart and various thesis papers [1] [9] [10]. However, different sources define phonemes in different ways, and phonemes are not defined for every letter. We therefore defined some phonemes ourselves for certain consonants and conjunct letters. Using this database, the phonetic translator produces the phonetic translation of Bengali words.

Fig 4.1.2 Dictionary files with phonetic translation
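The letter-to-phoneme lookup described above can be illustrated with a small Java sketch. It is not the project's translator: the class name is hypothetical, only a handful of consonants from the Unicode to IPA chart in the appendix are included, and vowel signs and conjuncts are simply skipped. Unicode escapes are used for the Bengali letters so that the source file does not depend on the compiler's encoding setting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified letter-to-phoneme lookup.
final class PhoneticTranslatorSketch {

    private static final Map<Character, String> PHONES = new HashMap<>();
    static {
        // A few consonants taken from the appendix chart.
        PHONES.put('\u0995', "K");   // ka
        PHONES.put('\u0996', "KH");  // kha
        PHONES.put('\u0997', "G");   // ga
        PHONES.put('\u09AC', "B");   // ba
        PHONES.put('\u09AE', "M");   // ma
        PHONES.put('\u09B0', "R");   // ra
        PHONES.put('\u09B2', "L");   // la
        PHONES.put('\u09B8', "S");   // sa
    }

    // Maps each known letter to its phoneme code; unknown characters
    // (vowel signs, conjunct markers, unmapped letters) are skipped.
    static String translate(String word) {
        List<String> phones = new ArrayList<>();
        for (char c : word.toCharArray()) {
            String phone = PHONES.get(c);
            if (phone != null) {
                phones.add(phone);
            }
        }
        return String.join(" ", phones);
    }
}
```

A complete translator would also need rules for vowels, vowel signs and the conjunct letters listed in the appendix, which this sketch deliberately leaves out.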

4.1.3 Training File Creator

Training the acoustic model also requires the fileids and transcript files. These two files contain the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files manually. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. With our software both of these large files can be generated automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs some common tasks such as opening My Computer, left click and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific areas. Because of time constraints the system was built on a small data set. We selected the domain of university admission information for new students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, but sometimes fails when the speech is slow or the pronunciation is poor. We created a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once the phonemes are accurately defined.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will enlarge the data set to build a complete model, and we plan to improve the recognition accuracy and extend the vocabulary. We have already developed software for creating the dictionary, fileids and transcription files. We want to build a user-friendly, stand-alone GUI application for writing Bengali, and we also intend to develop a complete desktop command application for Bengali. Training and creating a model still depends on the developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate the system into various document applications, so that Bengali sentences can be written simply by uttering them. In addition, we want to build a voice response application in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a large amount of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. In this report we have discussed our objectives, the tools we used and the speech recognition process. Our tools are still at a preliminary level. Building a good and complete recognition application requires many improvements, such as a large training database, audio from many speakers and low-noise recordings. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal, S. O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ, consonants) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (numbers) IPA

শনয 0

এক 1

দই 2

তিন 3

চার 4

পাচ 5

ছয় 6

সাত 7

আট 8

নয় 9


Bangla Phoneme (যুক্তবর্ণ, conjunct letters) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we trained the recognizer on only 50 of them.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Speaker folders, session folders and audio file names found under the root.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // The root path must end with a separator; forward slashes also work on Windows.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates all files, so k keeps counting across folders.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Drop the .wav extension, as required by the fileids format.
                    targetPath = targetPath.replaceAll("\\.wav$", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists one directory level: folders go to level 1 or 2, audio files to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                // Only collect files at the deepest level, so that stray files
                // in the upper folders are not treated as audio.
                if (Level == 3 && listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

……………………………………………………………………

ARRAY

……………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "sanjoy10", "sanjoy2" by their numeric suffix.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        // Nothing to sort; also avoids reading element 0 of an empty list below.
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int i = 0;
        // Keep only the digits of each name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The name without its number is taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

……………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        // Read as UTF-8 explicitly; the platform default may corrupt Bengali text.
        BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript entry per audio file, cycling through the corpus.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            // Append in UTF-8, the encoding required for the transcription file.
            BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(path, true), StandardCharsets.UTF_8));
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll("\\.wav$", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

…………………………………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct


i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2


SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null


Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

…………………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
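As more spoken commands are added, a chain of if statements in giveCommand grows with them. One possible alternative, sketched here under the assumption that each recognized phrase maps to exactly one action, is a lookup table from phrase to action. The class name CommandTable and the English phrases are placeholders for illustration only. To stay runnable without a display, the sketch records the chosen action instead of calling CommandActivator.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandTable {

    private final Map<String, Runnable> actions = new HashMap<>();
    private String lastAction = "none";

    public CommandTable() {
        // Placeholder phrases. In the application each entry would call the
        // matching CommandActivator method, e.g. rightClick(), wrapped in a
        // try/catch because those methods throw AWTException.
        actions.put("right click", () -> lastAction = "rightClick");
        actions.put("left click", () -> lastAction = "leftClick");
        actions.put("open notepad", () -> lastAction = "openNotepad");
    }

    // Runs the action registered for the recognized text.
    // Returns false when the phrase is not a known command.
    public boolean giveCommand(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public String getLastAction() {
        return lastAction;
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable();
        System.out.println(table.giveCommand("right click") + " " + table.getLastAction());
        System.out.println(table.giveCommand("something else"));
    }
}
```

With this layout a new command is one extra entry in the table, and unknown phrases are reported through the return value rather than silently ignored.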


dictionary file contains 128 words. The dictionary file uses the .dic format. The name of our dictionary file is sbsbsr.dic, and some of its contents are:

AA CH E AA P N AA K E AA P N I AA M I I CCHA U K

…

Note:

All phonemes are in capital letters, for example AA P N I.

The file format is .dic.

The file encoding is UTF-8 without BOM.

A word cannot be repeated.

A blank line is required at the end of the file (i.e. an extra line).

2.1.4 Phone file

The phone file is the list of phonemes that occur in the words of the dictionary. For example, “AA P N I” contains four phonemes. It is a simple text file that tells the trainer which phonemes are part of the training set. The file has one phone per line, and duplicates are not allowed. This file can be generated using a small program written for this project, which takes the .dic file as input and gives the phone file as output.

For our project the file name is sbsbsr.phone. Some contents of the phone file for our project are:

A

AA

B

BH

C

CCHA

CH

…

Note:

All phones are in capital letters.

The file format is .phone.

The file encoding is UTF-8 without BOM.

A phone cannot be repeated.

A blank line is required at the end of the file (i.e. an extra line).

The silence phoneme “SIL” is also included in the phone file.

All phonemes in the .dic file are present in the phone file without repetition.
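The small program that derives the phone file from the dictionary is not listed in this section. A minimal sketch of the idea in Java follows, assuming each dictionary line holds a word followed by its space-separated phonemes. The class name PhoneListBuilder and the romanized sample words are illustrative, not the project's actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PhoneListBuilder {

    // Collects every distinct phoneme from dictionary lines of the form
    // "WORD PH1 PH2 ...". A TreeSet keeps the result sorted and duplicate-free.
    public static Set<String> extractPhones(List<String> dicLines) {
        Set<String> phones = new TreeSet<String>();
        for (String line : dicLines) {
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the word itself; the remaining fields are phonemes
            for (int i = 1; i < fields.length; i++) {
                phones.add(fields[i].toUpperCase());
            }
        }
        phones.add("SIL"); // the silence phone must also be listed
        return phones;
    }

    public static void main(String[] args) {
        // Hypothetical romanized words stand in for the Bengali entries.
        List<String> dic = Arrays.asList("apni AA P N I", "ami AA M I");
        for (String phone : extractPhones(dic)) {
            System.out.println(phone); // one phone per line
        }
    }
}
```

Writing the returned set to a file, one entry per line and followed by a blank line, would give a phone file that satisfies the notes listed for this format.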


2.1.5 Language Model file (lm format)

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text. Its format is the commonly used ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. To do this a corpus is needed, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line, although the newer version of the tool also supports Unicode text files. Upload this file and click the compile button. This gives a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, because the pronunciation dictionary should be built as stated above. The tool works best for small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719 -0.2626
-2.0719 -0.2861
-2.0719 -0.2973
-1.7709 -0.2936
-2.0719 -0.2626
-1.7709 এ -0.2861
-2.0719 -0.2626
-2.0719 ও -0.2973

…
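The values in an ARPA file are base-10 logarithms, so a probability is recovered by raising 10 to the first field. The following Java sketch reads one unigram line, assuming the three-field layout described in this section (log probability, word, back-off weight). ArpaLine is a hypothetical helper written for illustration and is not part of the Sphinx tools.

```java
public class ArpaLine {

    public final double logProb;  // first field: log10 probability
    public final String word;     // second field: the word of the unigram
    public final double backoff;  // third field: log10 back-off weight

    private ArpaLine(double logProb, String word, double backoff) {
        this.logProb = logProb;
        this.word = word;
        this.backoff = backoff;
    }

    // Parses a unigram line such as "-1.7709 WORD -0.2861".
    public static ArpaLine parseUnigram(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new ArpaLine(Double.parseDouble(fields[0]), fields[1],
                Double.parseDouble(fields[2]));
    }

    // Converts the stored log10 value back to a plain probability.
    public double probability() {
        return Math.pow(10.0, logProb);
    }

    public static void main(String[] args) {
        // \u098F is the Bengali letter that appears in the sample entries.
        ArpaLine entry = parseUnigram("-1.7709 \u098F -0.2861");
        System.out.println(entry.word + " " + entry.probability());
    }
}
```

A log probability of -1.7709 corresponds to a probability of roughly 0.017, which is the order of magnitude one would expect for a single word in a small vocabulary.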

2.1.6 Language Model file (DMP format)

We also need the language model file in DMP format for training in Sphinx4. We used a Linux environment to produce the DMP file from the lm file, with the following command in the Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP


2.1.7 Transcription file

A transcript is needed to represent what the speakers are saying in the audio files. In this file each utterance is written down exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project. Here the project name is “sbsbsr”, so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)

…

Note:

The file format is .transcription.

The file encoding is UTF-8 without BOM.

A sentence cannot be repeated.

A blank line is required at the end of the file (i.e. an extra line).

2.1.8 Fileids file

The fileids files contain the names of all audio files, without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03

…
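Because every transcription line ends with a file ID and every fileids line names one audio file, the two files have to stay in step. A minimal Java sketch of such a consistency check follows. It assumes the “<s> text </s> (id)” layout and assumes that the ID equals the last path component of the corresponding fileids entry. TranscriptCheck and the sample strings are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TranscriptCheck {

    // Matches "<s> some words </s> (utteranceId)"; group 2 captures the ID.
    private static final Pattern LINE =
            Pattern.compile("^<s>\\s+(.*?)\\s*</s>\\s*\\(([^)]+)\\)$");

    // Returns the utterance ID of a transcription line, or null if malformed.
    public static String utteranceId(String transcriptLine) {
        Matcher m = LINE.matcher(transcriptLine.trim());
        return m.matches() ? m.group(2) : null;
    }

    // Returns the last path component of a fileids entry.
    public static String baseName(String fileId) {
        return fileId.substring(fileId.lastIndexOf('/') + 1);
    }

    // Returns the index of the first line pair that does not agree, or -1
    // when every transcription ID matches its fileids entry.
    public static int firstMismatch(List<String> transcript, List<String> fileIds) {
        int common = Math.min(transcript.size(), fileIds.size());
        for (int i = 0; i < common; i++) {
            String id = utteranceId(transcript.get(i));
            if (id == null || !id.equals(baseName(fileIds.get(i)))) {
                return i;
            }
        }
        // A length difference means the first unpaired line is the mismatch.
        return transcript.size() == fileIds.size() ? -1 : common;
    }

    public static void main(String[] args) {
        List<String> transcript = Arrays.asList(
                "<s> first sentence </s> (01_01)",
                "<s> second sentence </s> (01_02)");
        List<String> fileIds = Arrays.asList("speaker1/01_01", "speaker1/01_02");
        System.out.println(firstMismatch(transcript, fileIds));
    }
}
```

Running a check of this kind before training makes it easier to find a missing or reordered line than reading the trainer's log afterwards.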

2.1.9 Filler file

The filler file contains the user's definitions of any background noise occurring in the recorded database. It is a dictionary in which non-speech sounds are mapped to the corresponding non-speech sound units. For our project this file is named sbsbsr.filler.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped onto a phone called SIL.

<s> symbolizes “beginning of speech”

</s> symbolizes “end of speech”

<sil> symbolizes “silence in speech”

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: SphinxBase and SphinxTrain. We collected them from the CMU Sphinx web site. Before installing these two packages we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading them we decompressed them into a folder in Linux. It can be any folder; we used the Linux root folder. After decompressing, the packages can be installed with the following commands in the terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which SphinxTrain will create the trained files, that is, the acoustic model. First we go to the root directory where the installed sphinxbase and sphinxtrain folders are placed and create a folder there. We named the folder “sbsbsr”. After creating the folder we open a terminal and change into the sbsbsr folder. Then we create a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as “etc”, “wav” and “model_parameters”.

Now it is time to copy the files created in the data preparation part. We copy the dic, filler, phone, transcription, fileids, lm and lm.DMP files into the “etc” folder and the collected audio into the “wav” folder. Next we change some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is created. The parameters we changed in sphinx_train.cfg are listed below.

Parameter                    Value Before      Value After
$CFG_WAVFILE_EXTENSION       sph               wav
$CFG_WAVFILE_TYPE            nist/mswav/raw    raw
$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                 1
$CFG_STATESPERHMM            6                 3
$CFG_N_TIED_STATES           100               100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is complete.

2.2.4 Training the Acoustic Model

In the training process we first have to convert the collected raw speech audio into mfc feature files. To do this we open the project directory in the Linux terminal and run the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all wav files into mfc files in the “feat” directory under the project folder “sbsbsr”.

Now we are ready to execute the main training command, which creates the acoustic model. Still in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. We then go to the root directory where sphinxbase and sphinxtrain are installed and extract the downloaded pocketsphinx archive there. After extracting, we install the software with the following commands in the Linux terminal:

./configure
make

After installing the software we go into our project folder and execute a command from the terminal to create the folder structure. The command is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio into the “wav” directory, because pocketsphinx takes audio files as input. We also copy the test fileids and transcription files into the “etc” folder. To decode, that is, to test our model on audio with pocketsphinx, we first need feature files for the audio files. We create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main command for decoding, or testing, in the Linux terminal:

perl scripts_pl/decode/slave.pl

Executing this command decodes the speech in the input audio with the help of our trained acoustic model. The result of the test can be found in the “result” folder under the project folder. For our model trained on fifty sentences the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE in Windows we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere in Windows. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the following files from our training project to the Eclipse project [11]:

The “sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters

dic

lm.DMP

filler

We create a config.xml file in the Eclipse project to pass the configuration to the recognizer and to state where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We also add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. Now our Java project is ready to build and run. After building the project we run it and can test it with live voice input from a microphone. For our model trained on ten sentences the result is:

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built with Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and the common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment No   Data set   Speakers   Male   Female   Total words   Correct   Errors   Percent correct   Error     Accuracy
Experiment 01   Trained    5          4      1        1025          975       98       95.12%            9.56%     90.44%
Experiment 02   Trained    3          3      0        615           591       39       96.10%            6.34%     93.66%
Experiment 03   Trained    3          3      0        615           601       22       97.72%            3.58%     96.42%
Experiment 04   Trained    5          2      3        1025          938       184      91.51%            17.95%    82.05%
Experiment 05   Trained    3          0      3        615           545       125      88.62%            20.33%    79.67%

Table 3.2.1: Experimental details with results for Pocket Sphinx


Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
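The percentages in Table 3.2.1 can be reproduced from the raw word counts. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes that the error count may include insertions (correct words plus errors exceed the total in the table), which is why accuracy is computed as 100 minus the error rate rather than taken from the percent correct.

```java
import java.util.Locale;

/** Word-level scoring from raw counts (hypothetical helper, not part of the project code). */
final class WordScore {

    /** Share of the reference words that were recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correct) {
        return 100.0 * correct / totalWords;
    }

    /** Errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy as used in the tables: 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    /** Formats a percentage with two decimals, independent of the system locale. */
    static String fmt(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        // Experiment 01 of the Pocket Sphinx table: 1025 words, 975 correct, 98 errors.
        System.out.println("Percent correct = " + fmt(percentCorrect(1025, 975)));
        System.out.println("Error           = " + fmt(errorRate(1025, 98)));
        System.out.println("Accuracy        = " + fmt(accuracy(1025, 98)));
    }
}
```

For Experiment 01 this gives 95.12, 9.56 and 90.44, matching the first row of the table.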


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment     User Type  Speakers  Environment        Speaker Type  Input Device  Words  Correct  Errors  Percent Correct  Error  Accuracy
Experiment 01  Trained    2         Closed Room        Male          Microphone    156    154      2       98%              2%     98%
Experiment 02  Untrained  3         Lab Room           Male          Microphone    130    120      10      90%              10%    90%
Experiment 03  Untrained  3         University Campus  Male          Microphone    140    122      18      82%              18%    82%
Experiment 04  Untrained  2         University Campus  Female        Microphone    108    95       13      87%              13%    87%
Experiment 05  Trained    3         University Floor   Female        Microphone    126    115      11      89%              11%    89%
Experiment 06  Trained    2         Closed Room        Male          Microphone    128    127      1       99%              1%     99%

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live


Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%


3.3.2 Input Type: Audio

Experiment     Data Set  Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct  Error   Accuracy
Experiment 01  Trained   3         3     0       210          194      16      85.71%           15.71%  85.71%
Experiment 02  Trained   3         2     1       210          188      22      77.14%           22%     77.14%
Experiment 03  Trained   3         2     1       210          182      28      72.38%           28%     72.38%
Experiment 04  Trained   3         2     1       210          194      16      84.44%           16%     84.44%
Experiment 05  Trained   3         1     2       210          184      26      74.76%           26%     74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio


Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%


Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application


4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a file called .dic, which contains all training words and their pronunciations. While training acoustic models experimentally, we had to write these pronunciations by hand several times. Over time we thought about an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First, we made a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various thesis papers [1] [9] [10] to make this database. However, different sources define phonemes in different ways, and phonemes are not defined for every letter. That is why we defined some phonemes ourselves for certain consonants and conjunct letters. After making this database, we built the phonetic translator, which gives the phonetic translation of Bengali words with the help of this database.

Fig 4.1.2: Dictionary files with phonetic translation
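The lookup idea behind the phonetic translator can be sketched with an in-memory table instead of a database. This is a simplified illustration under stated assumptions: the class name is hypothetical, only a handful of consonants from the Unicode to IPA chart in the appendix are included, and vowels, the inherent vowel and conjuncts are not handled at all.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal letter-to-phoneme lookup (hypothetical sketch, consonants only). */
final class PhonemeLookupSketch {

    // A few letter/phoneme pairs taken from the appendix chart.
    private static final Map<Character, String> TABLE = new LinkedHashMap<>();
    static {
        TABLE.put('\u0995', "K");  // ক
        TABLE.put('\u0996', "KH"); // খ
        TABLE.put('\u0997', "G");  // গ
        TABLE.put('\u09AE', "M");  // ম
        TABLE.put('\u09B0', "R");  // র
        TABLE.put('\u09B2', "L");  // ল
    }

    /** Maps each known letter to its phoneme; letters missing from the table are skipped. */
    static String translate(String word) {
        StringBuilder phonemes = new StringBuilder();
        for (char letter : word.toCharArray()) {
            String phoneme = TABLE.get(letter);
            if (phoneme == null) {
                continue;
            }
            if (phonemes.length() > 0) {
                phonemes.append(' ');
            }
            phonemes.append(phoneme);
        }
        return phonemes.toString();
    }

    /** Builds one dictionary line: the word followed by its phoneme sequence. */
    static String dictionaryLine(String word) {
        return word + " " + translate(word);
    }

    public static void main(String[] args) {
        // The three letters ক, ল, ম written with Unicode escapes.
        System.out.println(dictionaryLine("\u0995\u09B2\u09AE"));
    }
}
```

A database-backed version would replace the static table with a query per letter or conjunct, which is the approach the project listing in the appendix takes.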

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcript files. These two files contain the training audio file paths and their corresponding sentences. Before creating this program we had to make these two files manually, but with our software we can generate these two large files automatically within moments. For example, if we have 8000 audio files, then we have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. Now we can create these two files automatically by providing the root folder name of the audio files and the sentence corpus file to this software as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
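The generation of the two training files can be sketched as follows. The names here are hypothetical and the sketch works on in-memory lists rather than a real folder tree. It assumes a non-empty corpus, the lower-case `<s>` and `</s>` sentence markers of the usual SphinxTrain transcript convention, and that corpus sentences repeat cyclically when there are more audio files than sentences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds fileids and transcript lines from file names and corpus sentences (hypothetical sketch). */
final class TrainingFilesSketch {

    /** Removes a trailing ".wav" extension, if present. */
    static String stripWav(String fileName) {
        return fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
    }

    /** One fileids line per audio file: relative directory plus file name without extension. */
    static List<String> fileids(String relativeDir, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (String name : wavNames) {
            lines.add(relativeDir + "/" + stripWav(name));
        }
        return lines;
    }

    /** One transcript line per audio file; sentences wrap around when the corpus is shorter. */
    static List<String> transcript(List<String> sentences, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size()).trim();
            lines.add("<s> " + sentence + " </s> (" + stripWav(wavNames.get(i)) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> wavs = Arrays.asList("s1.wav", "s2.wav", "s3.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        fileids("speaker_02", wavs).forEach(System.out::println);
        transcript(corpus, wavs).forEach(System.out::println);
    }
}
```

Writing the returned lines to the fileids and transcript files is then a plain file-output step.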

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click, right click, etc.
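A command application of this kind usually maps each recognized phrase to an action. The following sketch shows that dispatch step only. It is an assumption-laden illustration: the class name and the English placeholder phrases are hypothetical (the real application would register the Bengali command words of its grammar), and the actions merely record what would be done instead of controlling the desktop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Maps recognized phrases to actions (hypothetical sketch with placeholder phrases). */
final class VoiceCommandSketch {

    private final Map<String, Runnable> actions = new HashMap<>();
    private final List<String> performed = new ArrayList<>();

    VoiceCommandSketch() {
        // Placeholder phrases; each action only records its name.
        actions.put("open my computer", () -> performed.add("OPEN_MY_COMPUTER"));
        actions.put("left click", () -> performed.add("LEFT_CLICK"));
        actions.put("right click", () -> performed.add("RIGHT_CLICK"));
    }

    /** Runs the action registered for the recognized text; returns false for unknown commands. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    /** Names of the actions executed so far, in order. */
    List<String> performed() {
        return performed;
    }

    public static void main(String[] args) {
        VoiceCommandSketch app = new VoiceCommandSketch();
        System.out.println(app.dispatch("left click"));
        System.out.println(app.performed());
    }
}
```

In a Sphinx4-based program, the text passed to `dispatch` would be the recognizer's best result for each utterance.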


Chapter 5

LIMITATION & FUTURE WORK

LIMITATION

FUTURE WORK


Limitation

Our project has some limitations in specific tasks. The system has been built on a small data set because of time constraints. We selected the domain of university admission information for newly admitted students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers within a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced some problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and sometimes fails when the speaker speaks slowly or has pronunciation problems. We created a program to generate the pronunciation of a Bengali word automatically, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate phoneme set.

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for making the dictionary, fileids, and transcription files. We want to make a user-friendly, stand-alone GUI application for writing in Bengali, and we intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications so that Bengali sentences can be written just by uttering them. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.


Chapter 6

CONCLUSION & REFERENCES

CONCLUSION

REFERENCES


Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages. But work on our mother tongue, Bengali, is still at a beginner level in this field. So we tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the various tools we used, and the process of speech recognition. Our developed tools are still at a preliminary level. Making a good and complete recognition application requires many improvements, such as a large training database, audio from many speakers, low-noise recordings, etc. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, “Speech Recognition by Machine: A Review”, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, “A Review of Speech Recognition with Sphinx Engine in Language Detection”, Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, “An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition”, Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, “Bengali Segmented Automatic Speech Recognition”, BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, “Recognition of Spoken Letters in Bangla”, banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, “Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective”, BRACU, 2007.

[9] Shammur Absar Chowdhury, “Implementation of Speech Recognition System for Bangla”, BRACU, August 2010.

[10] Qqbal S. O. Shahzad, “Speech Recognition System”, Iqra University, March 2009.

[11] Tran Viet Khai, “Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition”, Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech”, IEEE, 2004.

[13] M. S. Islam, “Research on Bangla Language Processing in Bangladesh: Progress and Challenges”, BUET, 2009.

[14] P. Foster, T. Schalk, “Speech Recognition: The Complete Practical Reference Guide”, 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, “Introduction to Arabic Speech Recognition Using CMUSphinx System”, 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9


Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but only 50 sentences were used for recognition because of the short time available.

No. Sentence

1 শুভ সকাল

2 ধন্যবাদ

3 আমি আপনাকে কি সাহায্য করতে পারি

4 আমি কিছু তথ্য জানতে এসেছি

5 কি বলুন

6 ভর্তি বিষয়ে

7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক

8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভর্তি চলছে

10 এ বিভাগে কি কি সুবিধা আছে

11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে

12 যেমন

13 দএএ কম্পিউটার ল্যাব আছে

14 ও আচ্ছা

15 একটি শুধু বিজ্ঞান বিভাগের জন্য

16 আর আরেকটি

17 সব এএএএএএএ জন্য

18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে

19 বত্রিশটি করে

20 আর কিছু

21 কতজন শিক্ষক আছেন এ বিভাগে

22 প্রায় এএএএ জন

23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ

24 প্রায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি

28 অবশ্যই

29 দানবীর মিস্টার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাত্র ক্যাম্পাস

32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সুবিধা আছে

35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে

36 লাইব্রেরী কি আছে

37 অবশ্যই একটা বড় লাইব্রেরী আছে

38 ক্যান্টিন আছে

39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে

40 ভর্তির শেষ তারিখ কবে

41 এ মাসের পাঁচ তারিখ

42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস

44 তিন চার এএএ পাঁচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিস্টার

46 তিন সেমিস্টার

47 তাহলে তো মোট এএএ সেমিস্টার

48 জি হ্যাঁ

49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়

50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        // Walk root/level1/level2 and append one fileids line per audio file
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues where the previous folder stopped
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Literal replace; replaceAll would treat "." as a regex wildcard
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists a folder: directories go to the level-1 or level-2 list, files to the level-3 list
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1)
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                else if (Level == 2)
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
            }
            if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

……………………………………………………………………

ARRAY

……………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as name1, name2, ..., name10 by their numeric part
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of every name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The non-numeric prefix is taken from the first name and reused for all names
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
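The same numeric ordering can also be obtained with a comparator, which keeps every original name intact instead of rebuilding names from the prefix of the first element. This is an alternative sketch, not the project's method; the class and method names are hypothetical, and names without digits are assumed to sort as 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sorts names by the number embedded in them (hypothetical alternative sketch). */
final class FolderSortSketch {

    /** Extracts the digits of a name as an int; names without digits count as 0. */
    static int numericPart(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a new list ordered by numeric part; the input list is left unchanged. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(FolderSortSketch::numericPart));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList("speaker10", "speaker2", "speaker1");
        System.out.println(sortByNumber(folders));
    }
}
```

A plain alphabetical sort would place "speaker10" before "speaker2", which is the problem the numeric comparison avoids.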

……………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads all corpus sentences into inputLines
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file; sentences restart from the top when exhausted
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // Literal replace; replaceAll would treat "." as a regex wildcard
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

……………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per entry
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

……………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // A letter counts as a consonant (banjon borno) when its row id lies between 13 and 48
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter= '" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48)
                    isBBorno = true;
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

……………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Conjunct marker: hasanta (U+09CD), assumed
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();

        // Record the positions around every conjunct marker: previous letter, marker, next letter
        k = 0;
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }

        // Collect the positions that are not part of any conjunct
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two adjacent consonants: insert the vowel letter between them
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i)
                            break;
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // Append the following characters while the recorded positions stay adjacent
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }

        // Look up the pronunciation of every collected letter or conjunct in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n"
                + "\n"
                + "\n"
                + "\n"
                + "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n"
                + "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


2.1.5 Language Model File (.lm Format)

A language model assigns a probability to a piece of unseen text, based on some training data. The language model file is plain text. It uses the common ARPA format, which is standard in speech recognition research. It lists 1-, 2- and 3-grams along with their likelihood (the first field) and a back-off factor (the third field). To build this file, the CMU lmtoolkit is used. lmtool is a web-based tool that allows users to quickly compile the text-based components needed for using an ASR decoder. It requires a corpus, which in this case means a set of sentences (or, more precisely, utterances) that the recognition system is expected to handle. The corpus needs to be an ASCII text file with one sentence per line, although the newer version of the tool also supports Unicode text files. Upload this file and click the compile button. This produces a set of lexical (pronunciation dictionary) and language modeling files. Here only the LM file is used, as the pronunciation dictionary is built as stated above. The tool is best suited to small domains. For our project the file name is sbsbsr.lm and the file format is .lm. Some contents of the language model file for our project are:

-2.0719  -0.2626
-2.0719  -0.2861
-2.0719  -0.2973
-1.7709  -0.2936
-2.0719  -0.2626
-1.7709 এ -0.2861
-2.0719  -0.2626
-2.0719 ও -0.2973
…
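As a rough illustration of the ARPA layout described above, the following Java sketch splits one n-gram line into its log probability, its words and its optional back-off weight. The class name ArpaLine and the sample line are assumptions made for this example; they are not part of the project code, and the decimal points are assumed to sit in the usual ARPA positions.

```java
import java.util.Arrays;

/** Hypothetical helper (not from the project): one n-gram entry of an ARPA file. */
final class ArpaLine {
    final double logProb;   // first field: log10 probability
    final String[] words;   // the n-gram itself (1 to 3 words here)
    final double backoff;   // last field: back-off weight, 0.0 if absent

    private ArpaLine(double logProb, String[] words, double backoff) {
        this.logProb = logProb;
        this.words = words;
        this.backoff = backoff;
    }

    /**
     * Parses one entry line. The order argument is the n-gram order of the
     * section the line was read from (1, 2 or 3).
     */
    static ArpaLine parse(String line, int order) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < order + 1 || fields.length > order + 2) {
            throw new IllegalArgumentException("Malformed n-gram line: " + line);
        }
        double logProb = Double.parseDouble(fields[0]);
        String[] words = Arrays.copyOfRange(fields, 1, 1 + order);
        // The back-off weight is optional; highest-order n-grams omit it.
        double backoff = fields.length == order + 2
                ? Double.parseDouble(fields[order + 1]) : 0.0;
        return new ArpaLine(logProb, words, backoff);
    }

    public static void main(String[] args) {
        // "ami" is a romanized placeholder word, not taken from the corpus.
        ArpaLine entry = parse("-1.7709 ami -0.2861", 1);
        System.out.println(entry.words[0] + " logProb=" + entry.logProb
                + " backoff=" + entry.backoff);
    }
}
```

A reader for a whole file would additionally have to track the section markers (such as \1-grams:) to know which order to pass in; that part is left out here.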

2.1.6 Language Model File (DMP Format)

We also need the language model file in DMP format for Sphinx4. We used a Linux environment to obtain the DMP file from the language model file in .lm format, with the following command in a Linux terminal:

sphinx_lm_convert -i model.lm -o model.DMP

Here model.lm is the name of the language model file in .lm format and model.DMP is the name of the language model file in DMP format. For our project it is:

sphinx_lm_convert -i sbsbsr.lm -o sbsbsr.DMP

2.1.7 Transcription File

A transcript is needed to represent what the speakers are saying in the audio files. In this file, each utterance is written exactly as it was recorded, enclosed in silence tags (starting tag <s>, ending tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file, and there are two of them: one is used to train the system and the other to test it. The files are named after the project. Here the project name is "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s>  </s> (01_01)
<s>  </s> (01_02)
<s>  </s> (01_03)
<s>  </s> (01_04)
…

Note:

The file format is .transcription. The file encoding is UTF-8 without BOM.

A sentence cannot be repeated.

A blank line is required at the end of the file (i.e. an extra line).

2.1.8 Fileids File

The fileids files contain the names of all audio files without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is named sbsbsr_train.fileids and the testing file is named sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
……

2.1.9 Filler File

The filler file contains the user's definitions of any background noise occurring in the recording database. It is a dictionary in which non-speech sounds are mapped to corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"
</s> symbolizes "end of speech"
<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. Training the recognition engine from CMU Sphinx requires two software packages, SphinxBase and SphinxTrain. We collected them from the CMU Sphinx web site. Before installing these two packages, we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in a Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two software packages, SphinxBase and SphinxTrain. After downloading them, we decompressed them into a folder in Linux. It can be any folder; we used the Linux root folder. After decompressing the packages, they can be installed with the following commands in a terminal [14]:

Sphinxbase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

Sphinxtrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which SphinxTrain creates the trained files, i.e. the acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed. There we created a folder named "sbsbsr". After creating the folder, we open a terminal and change to the sbsbsr folder. Then we created a project task for SphinxTrain with the following command in the terminal:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Next we copy the files created in the data preparation part. The .dic, .filler, .phone, .transcription, .fileids, .lm and .lm.DMP files go into the "etc" folder, and the collected audio goes into the "wav" folder. We then need to change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed in sphinx_train.cfg are listed below.

Parameter                    Value Before      Value After
$CFG_WAVFILE_EXTENSION       sph               wav
$CFG_WAVFILE_TYPE            nist/mswav/raw    raw
$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                 1
$CFG_STATESPERHMM            6                 3
$CFG_N_TIED_STATES           100               100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process, we first have to convert the collected raw speech audio data into .mfc files. To do so, we opened the project directory in a Linux terminal again and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. We then go to the root directory where sphinxbase and sphinxtrain are installed and extract the downloaded Pocketsphinx archive there. After extracting it, we install the software with the following commands in a Linux terminal:

./configure
make

After installing the software, we go into our project folder and execute a command from the terminal to create the required folder structure:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx takes audio files as input. We also copy the test fileids and transcription files into the "etc" folder. To decode, i.e. to test our model on audio with Pocketsphinx, we first need feature files for the audio files. We create them with the following command in a Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main decoding command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio with the help of our trained acoustic model. The test results can be found in the "result" folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing the Eclipse IDE on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. Then we create a new Java project in Eclipse and write a Java file with the help of the demo application from CMU Sphinx. After that we add the following files from our previous project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
.dic
.lm.DMP
.filler

We create a config.xml file in the Eclipse project to pass the configuration to the recognizer and to state where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files, js.jar, jsapi.jar, sphinx4.jar and tags.jar, from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building the project, we run it and can test with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and a common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

All five experiments use the trained data set.

Experiment   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error     Accuracy
01           5 (4/1)                  1025          975       98       95.12%            9.56%     90.44%
02           3 (3/0)                  615           591       39       96.10%            6.34%     93.66%
03           3 (3/0)                  615           601       22       97.72%            3.58%     96.42%
04           5 (2/3)                  1025          938       184      91.51%            17.95%    82.05%
05           3 (0/3)                  615           545       125      88.62%            20.33%    79.67%

Table 3.2.1: Experimental details with results for Pocket Sphinx
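The Pocket Sphinx percentages appear to follow from the word counts as sketched below: percent correct is the number of correct words over the total, error is the number of errors over the total, and accuracy is 100 minus the error rate. This reading is inferred from the reported numbers rather than stated in the text, and the class and method names are illustrative only.

```java
import java.util.Locale;

/** Hypothetical helper (not from the project) for the scores reported in the table. */
final class RecognitionScore {

    /** Percentage of the reference words that were recognized correctly. */
    static double percentCorrect(int totalWords, int correct) {
        return 100.0 * correct / totalWords;
    }

    /** Errors relative to the total number of words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Word accuracy: 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01 with Pocket Sphinx: 1025 words, 975 correct, 98 errors.
        System.out.println(String.format(Locale.ROOT,
                "correct=%.2f%% error=%.2f%% accuracy=%.2f%%",
                percentCorrect(1025, 975), errorRate(1025, 98), accuracy(1025, 98)));
    }
}
```

The error count can exceed the difference between total and correct words (98 versus 50 in Experiment 01), which would be consistent with insertions being counted as errors; this is an assumption about how the decoder scores its output.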

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device is a microphone in all six experiments.

Experiment   User Type   Speakers   Environment         Speaker Type   Words   Correct   Errors   Percent Correct   Errors   Accuracy
01           Trained     2          Closed Room         Male           156     154       2        98%               2%       98%
02           Untrained   3          Lab Room            Male           130     120       10       90%               10%      90%
03           Untrained   3          University Campus   Male           140     122       18       82%               18%      82%
04           Untrained   2          University Campus   Female         108     95        13       87%               13%      87%
05           Trained     3          University Floor    Female         126     115       11       89%               11%      89%
06           Trained     2          Closed Room         Male           128     127       1        99%               1%       99%

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All five experiments use the trained data set.

Experiment   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct   Error     Accuracy
01           3 (3/0)                  210           194       16       85.71%            15.71%    85.71%
02           3 (2/1)                  210           188       22       77.14%            22%       77.14%
03           3 (2/1)                  210           182       28       72.38%            28%       72.38%
04           3 (2/1)                  210           194       16       84.44%            16%       84.44%
05           3 (1/2)                  210           184       26       74.76%            26%       74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a file called .dic. This file holds all training words and their pronunciations. While training the acoustic model experimentally, we had to write these pronunciations many times. Over time we thought of an automatic pronunciation, or phonetic translation, maker, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various thesis papers [1] [9] [10] to make this database. However, various sources define phonemes in different ways, and not every letter's phoneme is defined. That is why we defined some phonemes ourselves for some consonants and conjunct letters. After making this database, we built the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2: Dictionary files with phonetic translation
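The lookup step of such a translator can be sketched as follows. This is a simplified illustration and not the project's implementation: a small in-memory map stands in for the database, the romanized phoneme symbols are made up for the example, and conjunct handling is left out. The only rule kept is that an inherent vowel is added between two consecutive consonants.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a letter-to-phoneme lookup (symbols are assumptions). */
final class PhoneticSketch {
    private static final String INHERENT_VOWEL = "O";
    private static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K");   // Bengali letter ka
        PHONEMES.put('\u09AE', "M");   // Bengali letter ma
        PHONEMES.put('\u09B2', "L");   // Bengali letter la
        PHONEMES.put('\u09BE', "AA");  // Bengali vowel sign aa
    }

    /** True for code points in the Bengali consonant range (ka to ha). */
    static boolean isConsonant(char c) {
        return c >= '\u0995' && c <= '\u09B9';
    }

    /** Maps each letter to its phoneme; letters missing from the table are skipped. */
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            append(out, PHONEMES.get(c));
            boolean hasNext = i + 1 < word.length();
            // Two consecutive consonants: insert the inherent vowel between them.
            if (hasNext && isConsonant(c) && isConsonant(word.charAt(i + 1))) {
                append(out, INHERENT_VOWEL);
            }
        }
        return out.toString();
    }

    private static void append(StringBuilder out, String phoneme) {
        if (phoneme == null) {
            return;
        }
        if (out.length() > 0) {
            out.append(' ');
        }
        out.append(phoneme);
    }

    public static void main(String[] args) {
        // ka + vowel sign aa + la
        System.out.println(translate("\u0995\u09BE\u09B2"));
    }
}
```

Unicode escapes are used instead of literal Bengali characters so that the sketch compiles regardless of the source file encoding.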

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcription files. These two files contain the training audio file paths and their corresponding sentences. Before creating this program, we had to make these two files manually. With our software we can now generate these two large files automatically within moments. For example, if we have 8000 audio files, then we have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create these two files automatically by providing the root folder name of the audio files and the sentence corpus file to this software as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
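A minimal sketch of such a generator is given below. It assumes, hypothetically, that the audio file paths and the sentences are already available as two parallel lists, whereas the project's tool reads them from a root audio folder and a corpus file. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: builds fileids and transcription lines from parallel lists. */
final class TrainingFileSketch {

    /** One fileids line per audio file: the path without the .wav extension. */
    static List<String> fileIds(List<String> audioPaths) {
        List<String> lines = new ArrayList<>();
        for (String path : audioPaths) {
            lines.add(path.endsWith(".wav")
                    ? path.substring(0, path.length() - 4) : path);
        }
        return lines;
    }

    /** One transcription line per utterance, with silence tags and the file ID. */
    static List<String> transcription(List<String> audioPaths, List<String> sentences) {
        if (audioPaths.size() != sentences.size()) {
            throw new IllegalArgumentException("Need exactly one sentence per audio file");
        }
        List<String> ids = fileIds(audioPaths);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            String id = ids.get(i);
            // The utterance ID is the file name without its directory part.
            String name = id.substring(id.lastIndexOf('/') + 1);
            lines.add("<s> " + sentences.get(i) + " </s> (" + name + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("sanjoy/01_01.wav", "sanjoy/01_02.wav");
        List<String> sentences = Arrays.asList("first sentence", "second sentence");
        fileIds(paths).forEach(System.out::println);
        transcription(paths, sentences).forEach(System.out::println);
    }
}
```

Writing the two lists to disk as UTF-8 without BOM, with a trailing blank line, would complete the tool.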

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks, such as opening My Computer, left click and right click.
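One way to organize such an application is a lookup table from recognized text to actions, sketched here. The command words ("dan click", "bam click") are hypothetical romanized placeholders, not the Bengali commands used in the project, and the actions only record their names instead of driving java.awt.Robot, so the sketch runs without a display.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: dispatches recognized text to registered actions. */
final class CommandDispatcherSketch {
    private final Map<String, Runnable> commands = new HashMap<>();
    final List<String> log = new ArrayList<>();   // names of the actions that ran

    CommandDispatcherSketch() {
        // In a real application these would call methods such as rightClick().
        commands.put("dan click", () -> log.add("rightClick"));
        commands.put("bam click", () -> log.add("leftClick"));
    }

    /** Runs the action for the recognized text; returns false if nothing matches. */
    boolean dispatch(String recognizedText) {
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcherSketch dispatcher = new CommandDispatcherSketch();
        dispatcher.dispatch("dan click");
        dispatcher.dispatch("unknown words");
        System.out.println(dispatcher.log);
    }
}
```

Compared with a chain of equals() checks, a table keeps each command and its action in one place and makes unrecognized input an explicit case.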

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time. As a result, we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced some problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and it sometimes fails to recognize them accurately when the speaker speaks slowly or mispronounces words. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate set of phonemes.

Future Works

We have implemented Bengali speech recognition for a small data set. In the future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for making the dictionary, fileids and transcription files. We want to make a user-friendly, stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with the corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to it. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected through this website, we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of
ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other
languages, but work on our mother tongue, Bengali, is still at a beginner level in this field.
We therefore tried to learn about the field and to develop some tools for recognizing the
Bengali language. Throughout this report we have discussed our objectives, the tools we used
and the process of speech recognition. Our tools are still at a preliminary level. A good and
complete recognition application requires many improvements, such as a large training database
and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we
can meet the requirements of a good recognizer and train our system more accurately, the
results will be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS)
International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition
with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information
Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing,
Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition",
BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition
of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech
Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla",
BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic
Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous
Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and
Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide",
1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using
CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ)    IPA
ক    K
খ    KH
গ    G
ঘ    GH
ঙ    NG
চ    C
ছ    CH
জ    J
ঝ    JH
ঞ    NIO
ট    T
ঠ    TH
ড    D
ঢ    DH
ণ    N
ত    TA
থ    TO
দ    DA
ধ    DO
ন    N
প    P
ফ    PH
ব    B
ভ    BH
ম    M
য    Z
র    R
ল    L
শ    SH
ষ    SH
স    S
হ    H
ড়    RA
ঢ়    RH
য়    Y
ৎ    T``
     NG
     ^

Bangla Phoneme    IPA
     AA
     I
     II
     U
     UU
     RI
     E
     OI
     O
     OU

Bangla Phoneme ( )    IPA
ব-   W
য-   Y
র-   R
ম-   M
     RR

Bangla Phoneme (নাম্বার)    IPA
শূন্য   0
এক    1
দুই    2
তিন   3
চার    4
পাঁচ    5
ছয়    6
সাত   7
আট    8
নয়    9

Bangla Phoneme (যুক্তবর্ণ)    IPA

KK  KT  KT  KTR  KW  KM  KY  KR  KL  KKH  KKHW  KKHN  KKHM  KKHY  KS  KHY  KHR  GN  GDH  NGM  CC  CCH  CCHW  CCHR  CNG
CY  GN  GNY  GW  GM  GY  GR  GL  KR  KL  KKH  KKHW  KKHN  KKHM  KKHY  KS  KHY  KHR  GN  GDH  NGM  CC  CCH  CCHW  CCHR
CNG  CY  JJ  JJW  JJH  GG  JW  JY  JR  NC  NCH  NJ  NJH  TT  TT  TTW  TTH  TN  TW  TM  TMY  TY  TR  THW  THY  THR
DG  DGH  DD  DDW  DDH  DW  DV  DM  DY  DR  NM  NY  NS  PT  PT  PN  PP  PY  PR  PL  PS  FR  FL  BJ  BD  BDH
BB  BY  BR  BL  LT  LD  LDH  LP  LB  LV  LM  LY  LL  SHC  SHCH  SHT  SHN  SHW  SHM  SHY  SHR  SHL  SHK  SHKR  SHT  SF
SW  SM  SY  SR  SL  SKL  HN  HN  HW  HM  HY  HR  HL  HRRI  GHN  GHY  GHR  NK  NKY  NGKKH  NGKH  NGG  NGGY  NGGH  NGGHY  NGGHR
TW  TM  TY  TR  DD  DY  DR  DHY  DHR  NT  NTH  ND  NDY  NDR  NDH  NN  NW  NM  NY  DHN  DHW  DHM  DHY  DHR  NT  NTH
ND  NT  NTW  NTY  NTR  NTH  ND  NDY  NDW  NDR  NDH  NDHY  NDHR  NN  NW  VY  VR  VL  MTH  MN  MP  MPR  MF  MB  MV  MVR
MM  MY  MR  ML  ZY  RRK  RRKY  LK  LG  SHTY  SHTR  SHTH  SHTHY  SHN  SHP  SHPR  SPHY  SHW  SHM  SK  SKR  ST  STR  SKH  ST  STW
STY  STH  STHY  SN  SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time only 50 of them have
been recognized.

No  Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Path separators did not survive in this listing; "/" is assumed throughout.
public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Literal replace: the regex form would treat "." as "any character".
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Level 1 and 2 collect sub-directory names; any other level collects file names.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at the given path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
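The three nested loops of FileidsCreator can also be expressed with java.nio.file. The following is only a sketch under stated assumptions, not the code used in this project: the class FileidsSketch and the method listFileIds are hypothetical names, every .wav file below the root is assumed to become one fileids entry written relative to the parent of the root, and entries are sorted as plain strings rather than by the embedded number.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileidsSketch {

    // Returns one entry per .wav file below root, relative to root's parent,
    // with "/" separators and without the ".wav" extension, sorted as strings.
    public static List<String> listFileIds(Path root) throws IOException {
        Path abs = root.toAbsolutePath();
        Path base = abs.getParent() != null ? abs.getParent() : abs;
        try (Stream<Path> paths = Files.walk(abs)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The root directory is taken from the command line.
        for (String id : listFileIds(Paths.get(args[0]))) {
            System.out.println(id);
        }
    }
}
```

Because the walk is recursive, the sketch does not depend on the tree being exactly three levels deep.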

........................................................................

ARRAY

........................................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
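SortArrayList orders folder names by the number they contain and then rebuilds each name from the prefix of the first element, which works only when every name shares that prefix. A minimal alternative sketch, assuming nothing beyond the standard library, sorts the original names with a comparator instead; the class NumericSuffixSort and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {

    // Extracts the digits of a name as an int; names without digits sort first.
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the number embedded in each name.
    public static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        // Prints [s1, s2, s10]; a plain string sort would put s10 before s2.
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```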

........................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads every line of the corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // Literal replace: the regex form would treat "." as "any character".
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
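The line format written by CreateTransCriptFile can be isolated in a small helper, which makes it easy to check without touching the file system. This is a sketch only: TranscriptLine and format are hypothetical names, the lowercase tags follow the description of the transcription file in this report, and the sentence in the usage example is a romanized placeholder.

```java
public class TranscriptLine {

    // Builds one transcription entry of the form "<s> text </s> (fileId)".
    public static String format(String sentence, String wavName) {
        String fileId = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - ".wav".length())
                : wavName;
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    public static void main(String[] args) {
        // Prints: <s> shuvo shokal </s> (s1)
        System.out.println(format("  shuvo shokal ", "s1.wav"));
    }
}
```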

........................................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

........................................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
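isBanjonBorno opens a database connection for every character it tests. The consonant test can probably also be done on the Unicode code point alone. The sketch below is not the project's method: BanglaConsonants and isConsonant are hypothetical names, and it assumes that the 36 consonants of the chart are the letters in the block U+0995 to U+09B9 plus ৎ (U+09CE), ড় (U+09DC), ঢ় (U+09DD) and য় (U+09DF) in their precomposed forms.

```java
public class BanglaConsonants {

    // True for Bengali consonant letters, judged by code point only.
    // The few unassigned code points inside U+0995..U+09B9 also return true;
    // they do not occur in valid Bengali text.
    public static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')
                || ch == '\u09CE'   // khanda ta
                || ch == '\u09DC'   // rra
                || ch == '\u09DD'   // rha
                || ch == '\u09DF';  // yya
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // ka: a consonant
        System.out.println(isConsonant('\u0986')); // independent vowel aa
    }
}
```

A letter typed as a base consonant followed by a separate nukta (U+09BC) is not covered by this check.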

........................................................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is missing from the listing; the hasanta (U+09CD),
    // which joins the letters of a conjunct, is assumed.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions around every hasanta: previous letter, hasanta, next letter.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition: ");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // Look up the pronunciation of every unit in the banglatab table.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
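The core of getPronouciation is the segmentation of a word into lookup units, where letters joined by a hasanta form one conjunct. A compact sketch of that segmentation step alone is given below. It is an illustration under assumptions, not the project's algorithm: ConjunctSplitter and split are hypothetical names, the conjunct marker is assumed to be U+09CD, and neither the special handling of র nor the insertion of the inherent vowel is reproduced.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctSplitter {

    private static final char HASANTA = '\u09CD';

    // Splits a word into lookup units: characters linked by HASANTA form one
    // unit (a conjunct); every other character is a unit of its own.
    public static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // Absorb each "HASANTA + following character" pair into the unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // Five characters in, three units out: the last three form one conjunct.
        System.out.println(split("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

Each unit could then be looked up in the chart of Fig 7.2, for example through an in-memory map instead of one database query per unit.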

........................................................................

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the programm exits.
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
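One way to connect recognized text to methods such as those of CommandActivator is a lookup table from phrase to action, which avoids a long chain of if statements as the command set grows. The following is a sketch under assumptions, not the dispatch code of this project: CommandDispatcher, register and dispatch are hypothetical names, and the action in the usage example only prints a message so that the sketch runs without a display.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommandDispatcher {

    // Maps a recognized phrase to the action that should run for it.
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    public void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action registered for the phrase; returns false if none matches.
    public boolean dispatch(String recognizedText) {
        String key = recognizedText == null ? "" : recognizedText.trim();
        Runnable action = actions.get(key);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder action; a real one would call a CommandActivator method.
        dispatcher.register("copy", () -> System.out.println("copy requested"));
        System.out.println(dispatcher.dispatch(" copy "));   // a registered phrase
        System.out.println(dispatcher.dispatch("unknown"));  // no action registered
    }
}
```

Since the CommandActivator methods throw the checked AWTException, a registered action would have to catch it inside the lambda.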

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

Bengali Speech Recognition

91

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();  // needs a String argument, so it is left disabled here

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }


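    // Compares the recognized text with a known command phrase and runs the matching action.
    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals("")) { // the Bengali command phrase is missing from this listing
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }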


2.1.7 Transcription file

A transcript is needed to represent what the speakers say in each audio file. Each utterance is written exactly as it was recorded, enclosed in silence tags (start tag <s>, end tag </s>) and followed by the file ID that identifies the utterance. This file is known as the transcription file. There are two transcription files: one is used to train the system and the other to test it. The files are named after the project. Our project is called "sbsbsr", so the training file is sbsbsr_train.transcription and the test file is sbsbsr_test.transcription. Some contents of the transcription file for our project are:

<s> </s> (01_01)
<s> </s> (01_02)
<s> </s> (01_03)
<s> </s> (01_04)
…

Note:

The file extension is .transcription. The file encoding is UTF-8 without BOM.

A sentence cannot be repeated.

A blank line is required at the end of the file (i.e. an extra line).
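The line format described above can be sketched in Java as follows. This is only an illustration, not the project's own tool: the class name TranscriptionWriter and its method names are hypothetical, and the "blank line at the end" rule is interpreted here as every line, including the last, ending with a newline.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of SphinxTrain or of the thesis code.
public class TranscriptionWriter {

    /** Formats one utterance as "<s> sentence </s> (fileId)". */
    static String formatLine(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    /**
     * Writes one line per utterance. Files.newBufferedWriter with UTF_8
     * emits no byte-order mark, and every line (including the last one)
     * is terminated by '\n'.
     */
    static void write(Path target, List<String> sentences, List<String> fileIds)
            throws IOException {
        if (sentences.size() != fileIds.size()) {
            throw new IllegalArgumentException("need one file ID per sentence");
        }
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int i = 0; i < sentences.size(); i++) {
                out.write(formatLine(sentences.get(i), fileIds.get(i)));
                out.write('\n');
            }
        }
    }
}
```

Calling formatLine with a sentence and the ID 01_01 gives a line of the same shape as the entries listed above.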

2.1.8 Fileids file

The fileids file lists the path of every audio file without the .wav or .sph extension. There are two fileids files, one for training and one for testing. For our project the training file is sbsbsr_train.fileids and the testing file is sbsbsr_test.fileids.

For example:

sbsbsr_train/sanjoy/sanjoy1/01_01
sbsbsr_train/sanjoy/sanjoy1/01_02
sbsbsr_train/sanjoy/sanjoy1/01_03
……
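As a small illustration of the rule that entries carry no .wav or .sph extension, the following Java sketch turns a relative audio path into a fileids entry. The class and method names are hypothetical, and the sketch assumes forward slashes are wanted in the output.

```java
// Hypothetical helper for illustration only.
public class FileIds {

    /** Returns the path with '/' separators and without a trailing .wav or .sph extension. */
    static String toFileId(String relativePath) {
        String p = relativePath.replace('\\', '/');
        int dot = p.lastIndexOf('.');
        int slash = p.lastIndexOf('/');
        if (dot > slash) {
            String ext = p.substring(dot + 1).toLowerCase();
            if (ext.equals("wav") || ext.equals("sph")) {
                return p.substring(0, dot);
            }
        }
        return p; // no recognised audio extension, so the path is left unchanged
    }
}
```

Only the final extension is removed, so a folder or file name that happens to contain the letters "wav" is not altered.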

2.1.9 Filler file

The filler file contains the user's definitions of any background noise present in the recorded database. It is a dictionary in which non-speech sounds are mapped to the corresponding non-speech sound units. This file is named sbsbsr.filler for our project.

For example:

<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped to a phone called SIL.

<s> symbolizes "beginning of speech"

</s> symbolizes "end of speech"

<sil> symbolizes "silence in speech"

2.2 Setting up the System Environment

2.2.1 Software Requirements

We did the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: sphinxbase and sphinxtrain. We collected them from the CMU Sphinx web site. Before installing these two packages we first need to install some dependencies on the Ubuntu distribution of Linux, namely Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in a Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two packages, sphinxbase and sphinxtrain. After downloading them we decompressed them into a folder in Linux. It can be any folder; we used the Linux root folder. After decompressing, the packages can be installed with the following commands in a terminal [14]:

Sphinxbase

cd sphinxbase

sudo ./configure

sudo make

sudo make install

Sphinxtrain

cd sphinxtrain

sudo ./configure

sudo make

sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which sphinxtrain will create the trained files, i.e. the acoustic model. First we enter the root directory where the installed sphinxbase and sphinxtrain folders are placed and create a folder there. We named the folder "sbsbsr". After creating the folder we open a terminal and change into sbsbsr. Then we create a project task for sphinxtrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates various folders in sbsbsr, such as "etc", "wav" and "model_parameters".

Now it is time to copy the files created in the data preparation part. We copy the .dic, .filler, .phone, .transcription, .fileids, .lm and .lm.DMP files into the "etc" folder and our collected audio into the "wav" folder. Next we change some training parameters in the sphinx_train.cfg file, which is created automatically in the "etc" folder when the project task is created. The parameters we changed in sphinx_train.cfg are listed below:

Parameter                   Value Before        Value After
$CFG_WAVFILE_EXTENSION      sph                 wav
$CFG_WAVFILE_TYPE           nist, mswav, raw    raw
$CFG_FEATURE                2s_c_d_dd           1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                   1
$CFG_STATESPERHMM           6                   3
$CFG_N_TIED_STATES          100                 100

Table 2.2.3: Configuration of sphinx_train.cfg

With these changes the project folder setup is finished.

2.2.4 Training the Acoustic Model

In the training process we first have to convert our collected raw speech audio into .mfc feature files. To do this we opened the project directory in a Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

Executing this command converts all .wav files into .mfc files in the "feat" directory under the project folder "sbsbsr".

Now we are ready to execute the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The model files are placed in the "model_parameters" directory under the project folder "sbsbsr".

2.2.5 Testing part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download this tool from the CMU Sphinx site. After downloading it we go to the root directory where we installed sphinxbase and sphinxtrain and extract the downloaded Pocketsphinx archive there. After extracting, we install the software with the following commands in a Linux terminal:

./configure

make

After installing the software we go into our project folder and execute a command from the terminal to create the folder structure for training. The command is:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the "wav" directory, because Pocketsphinx takes audio files as input. We also copy the test fileids and transcription files into the "etc" folder. To decode (test) audio with Pocketsphinx we first need feature files for the audio files, which we create with the following command in a Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main decoding command in the terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio with the help of our trained acoustic model. The result of the test is written to the "result" folder under the project folder. For our model trained on fifty sentences the result is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We use the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test our model with the Sphinx4 decoder: Sphinx4 and the Eclipse IDE. After installing Eclipse on Windows we download Sphinx4 from the CMU Sphinx site and extract the zip file anywhere on the system. Then we create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that we add the following files from our training project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters

.dic

.lm.DMP

.filler

We create a config.xml file in the Eclipse project that gives the recognizer its configuration and tells it where the required model files are placed. This config.xml can be based on the config file in the Sphinx4 demo application. We also add four Java jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building the project we run it and test it with live voice input from a microphone. For our model trained on ten sentences the result is:

Figure 2.2.5.2: Testing with Sphinx4

We can build various applications in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING &
PERFORMANCE
EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments such as an open room, a closed room, the university lab room and the common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment  Data Set  Speakers (Male/Female)  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          Trained   5 (4/1)                 1025         975      98      95.12                9.56       90.44
02          Trained   3 (3/0)                 615          591      39      96.10                6.34       93.66
03          Trained   3 (3/0)                 615          601      22      97.72                3.58       96.42
04          Trained   5 (2/3)                 1025         938      184     91.51                17.95      82.05
05          Trained   3 (0/3)                 615          545      125     88.62                20.33      79.67

Table 3.2.1: Experimental details with results for Pocket Sphinx

Chart 3.2.2: Experiment results with Pocket Sphinx

Average Accuracy = 88.44%
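The three percentages reported for each Pocket Sphinx experiment follow from the word counts. The Java sketch below recomputes them for Experiment 01 (1025 total words, 975 correct, 98 errors). It is a minimal illustration with hypothetical class and method names, and it assumes that the error count may also include inserted words, which would explain why correct words and errors do not add up to the total.

```java
import java.util.Locale;

// Hypothetical helper for checking the reported figures.
public class RecognitionMetrics {

    /** Share of reference words recognised correctly, in percent. */
    static double percentCorrect(int totalWords, int correct) {
        return 100.0 * correct / totalWords;
    }

    /** Errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy is reported as 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01 of the Pocket Sphinx tests.
        int total = 1025, correct = 975, errors = 98;
        System.out.println(String.format(Locale.ROOT,
                "correct=%.2f error=%.2f accuracy=%.2f",
                percentCorrect(total, correct),
                errorRate(total, errors),
                accuracy(total, errors)));
        // prints: correct=95.12 error=9.56 accuracy=90.44
    }
}
```

The average accuracy is the plain mean of the five accuracy values.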


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment  User Type  Speakers  Environment        Speaker Type  Input Device  Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          Trained    2         Closed Room        Male          Microphone    156    154      2       98                   2          98
02          Untrained  3         Lab Room           Male          Microphone    130    120      10      90                   10         90
03          Untrained  3         University Campus  Male          Microphone    140    122      18      82                   18         82
04          Untrained  2         University Campus  Female        Microphone    108    95       13      87                   13         87
05          Trained    3         University floor   Female        Microphone    126    115      11      89                   11         89
06          Trained    2         Closed Room        Male          Microphone    128    127      1       99                   1          99

Table 3.3.1.1: Experimental details with results for Sphinx4 live input

Chart 3.3.1.2: Experiment results with Sphinx-4 live input

Average Accuracy = 90.83%


3.3.2 Input Type: Audio

Experiment  Data Set  Speakers (Male/Female)  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          Trained   3 (3/0)                 210          194      16      85.71                15.71      85.71
02          Trained   3 (2/1)                 210          188      22      77.14                22         77.14
03          Trained   3 (2/1)                 210          182      28      72.38                28         72.38
04          Trained   3 (2/1)                 210          194      16      84.44                16         84.44
05          Trained   3 (1/2)                 210          184      26      74.76                26         74.76

Table 3.3.2.1: Experimental details with results for Sphinx4 audio input

Chart 3.3.2.2: Experiment results with Sphinx-4 audio input

Average Accuracy = 78.88%

Chapter 4

APPLICATION &
DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a file with the .dic extension, which contains all training words and their pronunciations. While training the acoustic model experimentally we had to write these pronunciations by hand several times. Over time we considered building an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we made a database that stores all phonemes and their corresponding letters. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and phonemes are not defined for every letter. For this reason we defined phonemes ourselves for some consonants and conjunct letters. After building this database we made the phonetic translator, which gives the phonetic translation of Bengali words with the help of the database.

Fig 4.1.2: Dictionary files with phonetic translation
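The lookup idea behind the phonetic translator can be sketched in Java as a table from letters to phone symbols. This is not the project's implementation: the class name is hypothetical, the table holds only a handful of consonants taken from the Unicode to IPA chart in the appendix, and vowel signs and conjunct letters are simply skipped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily reduced sketch of a letter-to-phone lookup.
public class PhoneticSketch {

    private static final Map<Character, String> PHONES = new HashMap<>();
    static {
        PHONES.put('\u0995', "K");  // Bengali letter ka
        PHONES.put('\u0996', "KH"); // Bengali letter kha
        PHONES.put('\u0997', "G");  // Bengali letter ga
        PHONES.put('\u09A8', "N");  // Bengali letter na
        PHONES.put('\u09AC', "B");  // Bengali letter ba
        PHONES.put('\u09AE', "M");  // Bengali letter ma
        PHONES.put('\u09B0', "R");  // Bengali letter ra
        PHONES.put('\u09B2', "L");  // Bengali letter la
        PHONES.put('\u09B8', "S");  // Bengali letter sa
    }

    /** Returns the space-separated phones of all letters found in the table. */
    static String translate(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            String phone = PHONES.get(c);
            if (phone == null) {
                continue; // letter, vowel sign or conjunct mark not covered by this sketch
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phone);
        }
        return sb.toString();
    }
}
```

A dictionary line would then consist of the word followed by the returned phone string. A usable translator additionally needs the vowel and conjunct tables and rules for the inherent vowel.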

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcription files. These two files contain the paths of the training audio files and their corresponding sentences. Before creating this program we had to make these two files manually, but with our software we can generate these two large files automatically within moments. For example, if we have 8000 audio files then we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcription file. Now we can create both files automatically by giving the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription file

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.
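One way to connect recognized phrases to actions is a lookup table from phrase to action, sketched below in Java. The class CommandDispatcher and its methods are hypothetical and are shown only to illustrate the idea. The project code in the appendix compares the recognized text against each phrase in turn.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a phrase-to-action table.
public class CommandDispatcher {

    private final Map<String, Runnable> actions = new HashMap<>();

    /** Registers the action to run when the given phrase is recognized. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action registered for the recognized text and reports whether one was found. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false; // unknown phrase: nothing is executed
        }
        action.run();
        return true;
    }
}
```

With such a table a new voice command is added by one register call, and an unknown phrase is reported to the caller instead of being silently ignored.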

Chapter 5

LIMITATION &
FUTURE WORK

Limitation

In our project we have limitations in some specific tasks. Because of time constraints, the system has been built on a small amount of data. We selected a domain, university admission information for new students, with 185 sentences. But it was difficult to collect a lot of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced some problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali, as different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or mispronounces words. We created a program to generate the pronunciation of a Bengali word automatically, but this software does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate phoneme set.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the size of our data to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for generating the dictionary, fileids and transcription files. We want to make a user-friendly stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION &
REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. In this report we have discussed our objectives, the various tools we used and the process of speech recognition. Our tools are still at a preliminary level. Making a good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal S. O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শনয 0

এক 1

দই 2

তিন 3

চার 4

পাচ 5

ছয় 6

সাত 7

আট 8

নয় 9


Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we used only 50 of them for recognition.

No. Sentence


1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Names found at each level of the tree: speaker folders, session folders, audio files.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators are assumed to be "/" throughout this listing.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                        root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);


                    // Drop the extension; a literal replace avoids treating "." as a regex wildcard.
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collects folder names (levels 1 and 2) or file names (level 3) found under path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {


                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip


ARRAY

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "name2", "name10" by their numeric suffix.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of each name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // the common name prefix is taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

........................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus
    // sentences when there are more audio files than sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> "
                        + "(" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
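The transcript line format produced by CreateTransCriptFile can be illustrated in isolation. The following is a minimal sketch rather than the project's code: the class name `TranscriptLineSketch`, its two methods and the sample sentences are hypothetical, the corpus is assumed to be non-empty, and the lowercase `<s>` and `</s>` markers follow the convention described for the filler dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the thesis code.
final class TranscriptLineSketch {

    // Builds one transcript line of the form "<s> text </s> (fileId)".
    static String toTranscriptLine(String sentence, String wavFileName) {
        // Strip a trailing ".wav" to obtain the file id.
        String fileId = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    // Pairs each audio file with a corpus sentence, starting again from the
    // first sentence when there are more recordings than sentences.
    // Assumes the corpus list is not empty.
    static List<String> toTranscript(List<String> corpus, List<String> wavFiles) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavFiles.size(); i++) {
            String sentence = corpus.get(i % corpus.size());
            lines.add(toTranscriptLine(sentence, wavFiles.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample data is made up for the demonstration.
        List<String> corpus = Arrays.asList("  ami bhat khai ", "tumi kemon acho");
        List<String> wavFiles = Arrays.asList("s1.wav", "s2.wav", "s3.wav");
        for (String line : toTranscript(corpus, wavFiles)) {
            System.out.println(line);
        }
    }
}
```

Unlike the listing, the sketch removes the extension only when the name really ends in ".wav", which avoids the regular-expression wildcard in `replaceAll(".wav", "")`.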

........................................................................

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)


public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

........................................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    // Reads the input words, generates a pronunciation for each one and
    // writes the "word pronunciation" lines to the dictionary file.
    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)


eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try


conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())


return isBBorno

class end

........................................................................

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct


i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2


SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null


Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

........................................................................

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream


import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else


Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;

import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in
        // which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Simulates mouse and keyboard input for the recognized voice commands.
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))


allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )


private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()


<s> SIL
<sil> SIL
</s> SIL

Note that the words <s>, </s> and <sil> are treated as special words and are required to be present in the filler dictionary. At least one of these must be mapped onto a phone called SIL.

<s> symbolizes “beginning of speech”

</s> symbolizes “end of speech”

<sil> symbolizes “silence in speech”
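The two rules stated above for the filler dictionary can be checked mechanically. The following is a minimal sketch under stated assumptions: the class name `FillerDictionaryCheck` and its methods are hypothetical, and each dictionary line is assumed to hold a word followed by whitespace and a phone.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker for illustration; not part of the thesis code.
final class FillerDictionaryCheck {

    // The three special words that must be present.
    static final List<String> REQUIRED = Arrays.asList("<s>", "</s>", "<sil>");

    // Parses "word phone" lines into a word-to-phone map; blank or
    // incomplete lines are skipped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> entries = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                entries.put(parts[0], parts[1]);
            }
        }
        return entries;
    }

    // True when all special words are present and at least one of them is
    // mapped to the phone SIL.
    static boolean isValid(List<String> lines) {
        Map<String, String> entries = parse(lines);
        if (!entries.keySet().containsAll(REQUIRED)) {
            return false;
        }
        for (String word : REQUIRED) {
            if ("SIL".equals(entries.get(word))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> filler = Arrays.asList("<s> SIL", "<sil> SIL", "</s> SIL");
        System.out.println("filler dictionary valid: " + isValid(filler));
    }
}
```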

2.2 Setting up the System Environment

2.2.1 Software Requirements

We carried out the training on the Linux operating system. To train the recognition engine from CMU Sphinx we need two software packages: SphinxBase and SphinxTrain. We downloaded both from the CMU Sphinx web site. Before installing them on the Ubuntu distribution of Linux, we first need to install their dependencies, such as Perl and a C compiler (gcc) [14].

We installed these two dependencies with the following commands in the Linux terminal:

Perl

sudo apt-get install perl

GCC

sudo apt-get install gcc

2.2.2 Trainer Setup

As noted above, setting up the trainer requires two software packages: SphinxBase and SphinxTrain. After downloading them, we decompressed them into a folder in Linux. Any folder will do; we used the Linux root folder. Once decompressed, the packages can be installed with the following commands in the terminal [14]:

SphinxBase

cd sphinxbase
sudo ./configure
sudo make
sudo make install

SphinxTrain

cd Sphinxtrain
sudo ./configure
sudo make
sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which SphinxTrain produces the trained files, that is, the acoustic model. First we go to the root directory that holds the installed sphinxbase and sphinxtrain folders and create a new folder there, which we named “sbsbsr”. We then open a terminal, change to the sbsbsr folder and create a project task for SphinxTrain with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates several folders in sbsbsr, such as “etc”, “wav” and “model_parameters”.

Next we copy in the files created in the data preparation part: the dic, filler, phone, transcript, fileids, lm and lm.DMP files go into the “etc” folder, and the collected audio goes into the “wav” folder. Finally, we change some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is set up. The parameters we changed are listed below.

Parameter                   Value Before      Value After
$CFG_WAVFILE_EXTENSION      sph               wav
$CFG_WAVFILE_TYPE           nist/mswav/raw    raw
$CFG_FEATURE                2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES    8                 1
$CFG_STATESPERHMM           6                 3
$CFG_N_TIED_STATES          100               100

Table 2.2.3: Configuration of sphinx_train.cfg


With these changes, the project folder setup is complete.
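The parameter changes listed for sphinx_train.cfg can also be applied programmatically. The following is a minimal sketch, assuming that each setting sits on its own line in the form `$NAME = value;` (the usual Perl assignment style); the class name `TrainConfigSketch` and the method `apply` are hypothetical, and any trailing comment on a rewritten line is dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the thesis code.
final class TrainConfigSketch {

    // Rewrites every line of the form "$NAME = ...;" whose NAME appears in
    // overrides. Values are inserted verbatim, so string values must carry
    // their own quotes.
    static String apply(String cfgText, Map<String, String> overrides) {
        String result = cfgText;
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            Pattern line = Pattern.compile(
                    "(?m)^\\s*\\$" + Pattern.quote(entry.getKey()) + "\\s*=.*$");
            String replacement = "$" + entry.getKey() + " = " + entry.getValue() + ";";
            result = line.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(replacement));
        }
        return result;
    }

    public static void main(String[] args) {
        // New values taken from the table of changed parameters.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("CFG_WAVFILE_EXTENSION", "'wav'");
        overrides.put("CFG_WAVFILE_TYPE", "'raw'");
        overrides.put("CFG_FEATURE", "\"1s_c_d_dd\"");
        overrides.put("CFG_FINAL_NUM_DENSITIES", "1");
        overrides.put("CFG_STATESPERHMM", "3");
        overrides.put("CFG_N_TIED_STATES", "100");

        // A made-up two-line excerpt standing in for the real file.
        String sample = "$CFG_WAVFILE_EXTENSION = 'sph';\n$CFG_STATESPERHMM = 6;\n";
        System.out.print(apply(sample, overrides));
    }
}
```

In practice the file content would be read from and written back to etc/sphinx_train.cfg; editing the file by hand, as described above, achieves the same result.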

2.2.4 Training the Acoustic Model

The first step of training is to convert the collected raw speech audio into mfc feature files. To do this, we opened the project directory in the Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all the wav files into mfc files in the “feat” directory under the project folder “sbsbsr”.

We are now ready to run the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.
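The two training steps can be chained from Java when they need to be repeated often. The following is a minimal sketch, assuming that Perl is on the PATH and that the working directory is a project folder created by the setup script; the class name `TrainingRunSketch` and its methods are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the thesis code.
final class TrainingRunSketch {

    // The feature extraction command followed by the main training command.
    static List<List<String>> commands(String task) {
        return Arrays.asList(
                Arrays.asList("perl", "scripts_pl/make_feats.pl",
                        "-ctl", "etc/" + task + "_train.fileids"),
                Arrays.asList("perl", "scripts_pl/RunAll.pl"));
    }

    // Runs the commands in order inside projectDir and stops at the first
    // command that exits with a non-zero status.
    static void runAll(File projectDir, String task)
            throws IOException, InterruptedException {
        for (List<String> command : commands(task)) {
            Process process = new ProcessBuilder(command)
                    .directory(projectDir)
                    .inheritIO()
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException(
                        "Command failed with exit code " + exitCode + ": " + command);
            }
        }
    }

    public static void main(String[] args) {
        // Only prints the commands; call runAll(new File("sbsbsr"), "sbsbsr")
        // from inside a prepared project folder to execute them.
        for (List<String> command : commands("sbsbsr")) {
            System.out.println(String.join(" ", command));
        }
    }
}
```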

2.2.5 Testing Part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, while Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we download Pocketsphinx from the CMU Sphinx site. We then go to the root directory where sphinxbase and sphinxtrain are installed, extract the downloaded Pocketsphinx archive there, and install it with the following commands in the Linux terminal:

./configure
make

After installing the software, we go into our project folder and execute a command from the terminal to create the folder structure for testing:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the “wav” directory, because Pocketsphinx takes audio files as input for recognition. We also copy the test fileids and transcript files into the “etc” folder. Before decoding, that is, testing our model on audio with Pocketsphinx, we need feature files for the audio files. We create them with the following command in the Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we execute the main decoding command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The test results are written to the “result” folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Testing with the Sphinx4 decoder requires two pieces of software: Sphinx4 and the Eclipse IDE. After installing Eclipse on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. Then we create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that we add the following files from our previous project to the Eclipse project [11]:

“sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters
dic
lm.DMP
filler

We create a config.xml file in the Eclipse project to describe the configuration to the recognizer and to tell it where the required model files are located. This config.xml can be written with the help of the config file in the Sphinx4 demo application. We also need to add four Java jar files from the sphinx4/lib directory to our project: js.jar, jsapi.jar, sphinx4.jar and tags.jar. The Java project is then ready to build and run. After building the project, we run it and test it with live voice input from the microphone. The result for our model trained on ten sentences is:

Figure 2.2.5.2: Testing with Sphinx4

Sphinx4 makes it possible to build various applications in the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and the common room. The tests used audio input from six test speakers. For live testing we used a microphone in the different environments [9]. We completed our tests with two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment   Data Set   Speakers   Male   Female   Total Words   Correct   Errors   Percent Correct (%)   Error (%)   Accuracy (%)
01           Trained    5          4      1        1025          975       98       95.12                 9.56        90.44
02           Trained    3          3      0        615           591       39       96.10                 6.34        93.66
03           Trained    3          3      0        615           601       22       97.72                 3.58        96.42
04           Trained    5          2      3        1025          938       184      91.51                 17.95       82.05
05           Trained    3          0      3        615           545       125      88.62                 20.33       79.67

Table 3.2.1: Experimental Details with Results for Pocket Sphinx

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
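The figures in the Pocket Sphinx results can be reproduced from the word counts. The following is a minimal sketch, assuming the definitions that are consistent with the reported values: percent correct is correct words over total words, error is errors over total words, and accuracy is 100 minus the error. The class name `RecognitionMetricsSketch` and its methods are hypothetical.

```java
// Hypothetical helper for illustration; not part of the thesis code.
final class RecognitionMetricsSketch {

    // Share of reference words recognized correctly, in percent.
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    // Errors relative to the number of reference words, in percent.
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    // Accuracy defined as 100 minus the error rate.
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    // Arithmetic mean of the given values.
    static double mean(double... values) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.printf("Percent correct = %.2f%n", percentCorrect(1025, 975));
        System.out.printf("Error = %.2f%n", errorRate(1025, 98));
        System.out.printf("Accuracy = %.2f%n", accuracy(1025, 98));
        // Mean of the five reported accuracy values.
        System.out.printf("Average accuracy = %.3f%n",
                mean(90.44, 93.66, 96.42, 82.05, 79.67));
    }
}
```

Because the error count can include inserted words, correct words and errors need not add up to the total, which is why accuracy is lower than percent correct. The mean of the five accuracy values is 88.448, which the thesis reports as 88.44.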


33 Test Results with Sphinx4

331 Input Type Microphone

Experiment No. / Details / Results

Experiment 01
Details: User type: Trained; Number of speakers: 2; Environment: Closed Room; Speaker type: Male; Input device: Microphone; Number of words: 156; Correct words: 154; Errors: 2
Results: Percent correct: 98; Errors: 2; Accuracy: 98

Experiment 02
Details: User type: Untrained; Number of speakers: 3; Environment: Lab Room; Speaker type: Male; Input device: Microphone; Number of words: 130; Correct words: 120; Errors: 10
Results: Percent correct: 90; Errors: 10; Accuracy: 90

Experiment 03
Details: User type: Untrained; Number of speakers: 3; Environment: University Campus; Speaker type: Male; Input device: Microphone; Number of words: 140; Correct words: 122; Errors: 18
Results: Percent correct: 82; Errors: 18; Accuracy: 82

Experiment 04
Details: User type: Untrained; Number of speakers: 2; Environment: University Campus; Speaker type: Female; Input device: Microphone; Number of words: 108; Correct words: 95; Errors: 13
Results: Percent correct: 87; Errors: 13; Accuracy: 87

Experiment 05
Details: User type: Trained; Number of speakers: 3; Environment: University floor; Speaker type: Female; Input device: Microphone; Number of words: 126; Correct words: 115; Errors: 11
Results: Percent correct: 89; Errors: 11; Accuracy: 89

Experiment 06
Details: User type: Trained; Number of speakers: 2; Environment: Closed Room; Speaker type: Male; Input device: Microphone; Number of words: 128; Correct words: 127; Errors: 1
Results: Percent correct: 99; Errors: 1; Accuracy: 99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%
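The average is the arithmetic mean of the six per-experiment accuracies in Table 3.3.1.1. A minimal sketch of that calculation follows; the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustrative helper; names are hypothetical, not from the thesis code.
class AverageAccuracy {

    // Arithmetic mean of the per-experiment accuracies (NaN for an empty array).
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Accuracy of Experiments 01 to 06 with live microphone input.
        double[] liveInput = {98, 90, 82, 87, 89, 99};
        System.out.printf(Locale.ROOT, "Average accuracy = %.2f%%%n", mean(liveInput)); // 90.83%
    }
}
```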

3.3.2 Input Type: Audio

Experiment No. / Details / Results

Experiment 01 (using trained data set)
Details: Number of speakers: 3 (Male: 3, Female: 0)
Results: Total words: 210; Correct: 194; Errors: 16; Total percent correct = 85.71%; Error = 15.71; Accuracy = 85.71%

Experiment 02 (using trained data set)
Details: Number of speakers: 3 (Male: 2, Female: 1)
Results: Total words: 210; Correct: 188; Errors: 22; Total percent correct = 77.14%; Error = 22; Accuracy = 77.14%

Experiment 03 (using trained data set)
Details: Number of speakers: 3 (Male: 2, Female: 1)
Results: Total words: 210; Correct: 182; Errors: 28; Total percent correct = 72.38%; Error = 28; Accuracy = 72.38%

Experiment 04 (using trained data set)
Details: Number of speakers: 3 (Male: 2, Female: 1)
Results: Total words: 210; Correct: 194; Errors: 16; Total percent correct = 84.44%; Error = 16; Accuracy = 84.44%

Experiment 05 (using trained data set)
Details: Number of speakers: 3 (Male: 1, Female: 2)
Results: Total words: 210; Correct: 184; Errors: 26; Total percent correct = 74.76%; Error = 26; Accuracy = 74.76%

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary file (.dic) that lists every training word together with its pronunciation. While experimenting with acoustic model training we wrote these pronunciations by hand several times, which led us to build an automatic pronunciation (phonetic translation) generator. First, we built a database that stores each phoneme with its corresponding letter. We used the IPA chart and several theses [1] [9] [10] to build this database. However, the sources define phonemes in different ways, and some letters have no defined phoneme at all. We therefore defined phonemes ourselves for some consonants and conjunct letters. Using this database, the phonetic translator produces the phonetic translation of Bengali words.

Fig 4.1.2 Dictionary files with phonetic translation
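The lookup idea can be sketched as follows. This is a simplified illustration and not the project's implementation: it uses an in-memory map instead of the database, covers only a handful of consonants whose symbols appear in the Unicode to IPA chart in the appendix, and ignores inherent vowels and conjuncts. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of a letter-to-phoneme lookup; names are hypothetical.
class PhoneticLookupSketch {

    // A few consonant mappings copied from the appendix chart.
    static final Map<Character, String> PHONEMES = new LinkedHashMap<>();
    static {
        PHONEMES.put('\u0995', "K");  // ক
        PHONEMES.put('\u0996', "KH"); // খ
        PHONEMES.put('\u0997', "G");  // গ
        PHONEMES.put('\u09A8', "N");  // ন
        PHONEMES.put('\u09AC', "B");  // ব
        PHONEMES.put('\u09AE', "M");  // ম
        PHONEMES.put('\u09B0', "R");  // র
        PHONEMES.put('\u09B2', "L");  // ল
    }

    // Maps each known letter to its phoneme symbol; unknown characters are skipped.
    static String toPhonemes(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            String phoneme = PHONEMES.get(c);
            if (phoneme == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phoneme);
        }
        return sb.toString();
    }

    // One dictionary line: the word followed by its phoneme sequence.
    static String dictionaryLine(String word) {
        return word + " " + toPhonemes(word);
    }

    public static void main(String[] args) {
        // কলম, written with escapes so the source compiles under any file encoding.
        System.out.println(dictionaryLine("\u0995\u09B2\u09AE"));
    }
}
```

A full translator would also need rules for vowel signs, the inherent vowel between consonants, and conjunct letters, which is where the differing phoneme definitions mentioned above come into play.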

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files manually; with it, we can generate these two large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically when we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
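The two output formats can be illustrated with a short sketch. It assumes the usual SphinxTrain conventions: a fileids entry is the audio path without its .wav extension, and a transcript line wraps the sentence in <s> and </s> followed by the utterance name in parentheses. The class name, method names and sample file names are hypothetical. Sentences are reused cyclically when there are more audio files than corpus lines, as the project listing in the appendix appears to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of fileids and transcript line generation; names are hypothetical.
class TrainingFilesSketch {

    // fileids entry: the relative audio path without the ".wav" extension.
    static String fileId(String relativeWavPath) {
        return relativeWavPath.replaceAll("\\.wav$", "");
    }

    // Transcript line in the assumed SphinxTrain form: <s> sentence </s> (utterance)
    static String transcriptLine(String sentence, String wavFileName) {
        return "<s> " + sentence.trim() + " </s> (" + fileId(wavFileName) + ")";
    }

    // Pairs every audio file with a corpus sentence, cycling through the corpus.
    static List<String> transcript(List<String> sentences, List<String> wavFileNames) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavFileNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size());
            lines.add(transcriptLine(sentence, wavFileNames.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("sentence one", "sentence two");
        List<String> wavs = Arrays.asList("s1.wav", "s2.wav", "s3.wav");
        System.out.println(fileId("speaker_02/s1.wav"));
        for (String line : transcript(corpus, wavs)) {
            System.out.println(line);
        }
    }
}
```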

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click, and right click.
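One way to connect recognized text to actions is a lookup table from command phrase to handler, sketched below. The class name, the registration API and the sample command word are all hypothetical; in the real application the handlers would call methods such as those of the CommandActivator class listed in the appendix, while this sketch only prints a message so that it runs without a display.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a phrase-to-action dispatcher; names and command words are hypothetical.
class VoiceCommandSketch {

    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a recognized phrase with the action it should trigger.
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action for the recognized text; returns false for unknown commands.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        VoiceCommandSketch commands = new VoiceCommandSketch();
        // "\u09AC\u09BE\u09AE" is বাম ("left"), a made-up command word for this example.
        commands.register("\u09AC\u09BE\u09AE", () -> System.out.println("left click"));
        System.out.println(commands.dispatch("\u09AC\u09BE\u09AE")); // runs the action, then prints true
        System.out.println(commands.dispatch("unknown"));            // prints false
    }
}
```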

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected one domain, university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, but it sometimes fails when the speech is slow or the pronunciation is poor. We created a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once the phoneme set is settled.
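The exact cause of the encoding problem is not stated. One common cause in Java programs that read and write Bengali text is relying on the platform default character set; assuming that is what happened here, a possible remedy is to name UTF-8 explicitly on both the read and the write side. The sketch below shows the idea with a temporary file; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

// Sketch of explicit UTF-8 file I/O; names are hypothetical.
class Utf8IoSketch {

    // Writes one line to a temporary file and reads it back, both as UTF-8.
    static String roundTrip(String line) {
        try {
            Path tmp = Files.createTempFile("dic", ".txt");
            try {
                Files.write(tmp, Collections.singletonList(line), StandardCharsets.UTF_8);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8).get(0);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String entry = "\u0995\u09B2\u09AE K L M"; // কলম with a sample pronunciation
        // true when the Bengali text survives the write/read cycle unchanged
        System.out.println(entry.equals(roundTrip(entry)));
    }
}
```

If the database connection or the console is the component that mangles the text, the same principle applies there, but the setting to change is different.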

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will enlarge the data set to create a complete model, and we plan to improve recognition accuracy and extend the vocabulary. We have already developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate; with it, the user will be able to train any sentence with its corresponding audio. We also want to integrate the system into document applications so that Bengali sentences can be written just by uttering them. We want to build a voice response application, in which a user asks the software a question and the software gives the predefined answer. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results should be sufficient to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal S. O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID | Name | Age | Gender | District | Environment | Institution/Other
02 | Bappy | 23 | Male | Sylhet | Closed Room | Leading University
03 | Bijoy | 23 | Male | Moulovibazar | Closed Room | MC College
04 | Dola | 24 | Female | Chittagong | Department | Leading University
05 | Falguny | 24 | Female | Sylhet | Open Space | Leading University
06 | Lovely | 20 | Female | Sylhet | Class Room | Lotifa Shofi Chowdhury Mohila College
07 | Mazed | 23 | Male | Moulovibazar | Closed Room | Leading University
08 | Moni | 23 | Female | Sylhet | Lab | Leading University
09 | Pinku | 23 | Male | Sylhet | Closed Room | Sylhet Govt. College
10 | Polash | 24 | Male | Feni | Cafeteria | Leading University
11 | Pritom | 20 | Male | Sylhet | Closed Room | Leading University
12 | Razib | 23 | Male | Sylhet | Cafeteria | Leading University
13 | Rumi | 22 | Female | Sylhet | Lab | Leading University
14 | Sanjoy | 23 | Male | Sylhet | Closed Room | Leading University
15 | Shimul | 23 | Male | Sylhet | Closed Room | Leading University
16 | Sumit | 22 | Male | Shunamgonj | Closed Room | Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of limited time only 50 of them were used for recognition.

No. Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // NOTE: punctuation inside the string literals (path separators, file
    // extensions) did not survive in the printed listing and is assumed here.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues with the new files only
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Collects directory names (levels 1 and 2) or file names (level 3) found in path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {

    // Sorts names that share a prefix and end in a number by that number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        // Keep only the digits of every name.
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Name prefix without the number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // NOTE: punctuation inside the file paths did not survive in the printed
    // listing; the separators and extensions below are assumed.

    // Reads the input words, one per line.
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per input word: the word and its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() throws SQLException {
        return DriverManager.getConnection(DBURL);
    }

    public static void showEmployee() {
        String sql = "Select * from employees" + " where EmployeeID=1001";
        try (Connection con = getConnection();
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        }
    }

    // Returns true when the letter is stored in banglatab with an id from 13 to 48,
    // the range this class treats as consonants (banjonborno).
    public static boolean isBanjonBorno(char ch) {
        boolean isBBorno = false;
        String sql = "Select id from banglatab where letter = ?";
        try (Connection con = getConnection();
             PreparedStatement stmt = con.prepareStatement(sql)) {
            stmt.setString(1, String.valueOf(ch));
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    int id = rs.getInt("id");
                    if (id >= 13 && id <= 48) {
                        isBBorno = true;
                    }
                } else {
                    System.out.println("No Specified Record");
                }
            }
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct


i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2


SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null


Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // The dots in the resource name were lost in the printed listing; assumed here.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand(); (called in the listing, but its definition is not included)

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The sample sentences themselves did not survive in the printed listing.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");

        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    // Presses the given keys in order, then releases them in the same order.
    private void pressKeys(int... keyCodes) throws AWTException {
        Robot robot = new Robot();
        for (int keyCode : keyCodes) robot.keyPress(keyCode);
        for (int keyCode : keyCodes) robot.keyRelease(keyCode);
    }

    // Presses and releases the given mouse button the given number of times.
    private void click(int buttonMask, int times) throws AWTException {
        Robot robot = new Robot();
        for (int i = 0; i < times; i++) {
            robot.mousePress(buttonMask);
            robot.mouseRelease(buttonMask);
        }
    }

    // Opens the URL through the Windows URL protocol handler.
    private void openUrl(String theUrl) throws IOException {
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void leftClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 1); }
    public void rightClick() throws AWTException { click(InputEvent.BUTTON3_MASK, 1); }
    public void doubleClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 2); }

    public void copy() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_C); }
    public void paste() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_V); }
    public void delete() throws AWTException { pressKeys(KeyEvent.VK_DELETE); }
    public void selectAll() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_A); }

    public void up() throws AWTException { pressKeys(KeyEvent.VK_PAGE_UP); }
    public void down() throws AWTException { pressKeys(KeyEvent.VK_PAGE_DOWN); }
    public void previousPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_UP); }
    public void nextPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_DOWN); }

    public void openNewFile() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_N); }
    public void openHere() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_O); }
    public void close() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_F4); }
    public void startMenu() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS); }
    public void refresh() throws AWTException { pressKeys(KeyEvent.VK_F5); }
    public void help() throws AWTException { pressKeys(KeyEvent.VK_F1); }
    public void showDesktop() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_D); }
    public void openMyComputer() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_E); }

    // activate
    public void enter() throws AWTException { pressKeys(KeyEvent.VK_ENTER); }

    // next window | previous window
    public void altTab() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_TAB); }

    // next tab | previous tab
    public void ctlTab() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_TAB); }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException { openUrl("http://www.google.com"); }
    public void openFacebook() throws AWTException, IOException { openUrl("http://www.facebook.com"); }
    public void openYahoo() throws AWTException, IOException { openUrl("http://www.yahoo.com"); }
    public void openTechtunes() throws AWTException, IOException { openUrl("http://www.techtunes.com.bd"); }
    public void openProthomAlo() throws AWTException, IOException { openUrl("http://www.prothom-alo.com"); }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length " + comString.length() + "\n");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n"
                + "\n");
    }

    // Compares the recognized text with a Bengali command word and runs the matching action.
    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


sudo make

sudo make install

2.2.3 Project Folder Setup

With the system environment for training in place, we created a project folder in which SphinxTrain writes the trained files, that is, the acoustic model. First we went to the root directory that contains the installed sphinxbase and sphinxtrain folders and created a folder there named “sbsbsr”. We then opened a terminal, changed into the sbsbsr folder, and created a SphinxTrain project task with the following command:

../sphinxtrain/scripts_pl/setup_SphinxTrain.pl -task sbsbsr

Executing this command creates several folders inside sbsbsr, such as “etc”, “wav”, and “model_parameters”.

Next we copied in the files created in the data preparation part. The dic, filler, phone, transcript, fileids, lm, and lm.DMP files go into the “etc” folder, and the collected audio goes into the “wav” folder. We then changed some training parameters in the sphinx_train.cfg file, which is created automatically in the “etc” folder when the project task is created. The parameters we changed are listed below.

Parameter                    Value Before      Value After
$CFG_WAVFILE_EXTENSION       sph               wav
$CFG_WAVFILE_TYPE            nist/mswav/raw    raw
$CFG_FEATURE                 2s_c_d_dd         1s_c_d_dd
$CFG_FINAL_NUM_DENSITIES     8                 1
$CFG_STATESPERHMM            6                 3
$CFG_N_TIED_STATES           100               100

Table 2.2.3: Configuration of sphinx_train.cfg

These changes complete the project folder setup.

2.2.4 Training the Acoustic Model

The first step of training is to convert the collected raw speech audio into mfc feature files. We opened the project directory in a Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts all wav files into mfc files in the “feat” directory under the project folder “sbsbsr”.

We are then ready to run the main training command, which creates the acoustic model. Staying in the project directory, we execute:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.

2.2.5 Testing Part

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, and Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First we downloaded Pocketsphinx from the CMU Sphinx site and extracted it in the root directory where sphinxbase and sphinxtrain are installed. We then installed it with the following commands in a Linux terminal:

./configure
make

After installation we went into our project folder and ran a command from the terminal to set up the folder structure for decoding:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Then we put some test audio in the “wav” directory, because Pocketsphinx takes audio files as input, and copied the test fileids and transcript files into the “etc” folder. Decoding with Pocketsphinx requires feature files for the test audio, which we created with:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that we ran the main decoding command:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio with the help of our trained acoustic model. The test results are written to the “result” folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1: Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 itself and the Eclipse IDE. After installing Eclipse on Windows, we downloaded Sphinx4 from the CMU Sphinx site and extracted the zip file. We then created a new Java project in Eclipse and wrote a Java file based on the demo application from CMU Sphinx. After that we added the following files from our training project to the Eclipse project [11]:

- the “sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters
- dic
- lm.DMP
- filler

We created a config.xml file in the Eclipse project to configure the recognizer and to tell it where the required model files are placed. This config.xml can be written with the help of the config file in the Sphinx4 demo application. We also added four jar files, js.jar, jsapi.jar, sphinx4.jar, and tags.jar, from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building the project we ran it and tested it with live voice input from a microphone. The result for our model trained on ten sentences is shown below.

Figure 2.2.5.2: Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room, and the common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocketsphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

All five experiments use the trained data set.

Experiment  Speakers (Male/Female)  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          5 (4/1)                 1025         975      98      95.12                9.56       90.44
02          3 (3/0)                 615          591      39      96.10                6.34       93.66
03          3 (3/0)                 615          601      22      97.72                3.58       96.42
04          5 (2/3)                 1025         938      184     91.51                17.95      82.05
05          3 (0/3)                 615          545      125     88.62                20.33      79.67

Table 3.2.1: Experimental details with Results for Pocket Sphinx

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
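The percentages in the table above follow from the word counts. The sketch below is one way to reproduce them, assuming that percent correct is the share of reference words recognized correctly, that the error figure is the error count divided by the total word count, and that accuracy is 100 minus that error figure. These formulas are inferred from the reported numbers, not stated in the report, and the class and method names are illustrative. Correct plus errors exceeds the total word count in every experiment, presumably because inserted words are counted as errors.

```java
import java.util.Locale;

/** Sketch: recomputes the Pocket Sphinx percentages from raw word counts. */
final class RecognitionScore {

    /** Share of reference words recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correct) {
        return 100.0 * correct / totalWords;
    }

    /** Errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy taken as the complement of the error rate (an inferred definition). */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Counts from Experiment 01 of the Pocket Sphinx table.
        int totalWords = 1025, correct = 975, errors = 98;
        System.out.printf(Locale.ROOT,
                "correct=%.2f%% error=%.2f%% accuracy=%.2f%%%n",
                percentCorrect(totalWords, correct),
                errorRate(totalWords, errors),
                accuracy(totalWords, errors));
        // prints: correct=95.12% error=9.56% accuracy=90.44%
    }
}
```

With the counts of Experiment 02 (615 words, 591 correct, 39 errors) the same methods give 96.10, 6.34, and 93.66, matching the second row of the table.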


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

The input device is a microphone in all six experiments.

Experiment  User Type  Speakers  Environment        Speaker Type  Words  Correct  Errors  Percent Correct (%)  Errors (%)  Accuracy (%)
01          Trained    2         Closed Room        Male          156    154      2       98                   2           98
02          Untrained  3         Lab Room           Male          130    120      10      90                   10          90
03          Untrained  3         University Campus  Male          140    122      18      82                   18          82
04          Untrained  2         University Campus  Female        108    95       13      87                   13          87
05          Trained    3         University Floor   Female        126    115      11      89                   11          89
06          Trained    2         Closed Room        Male          128    127      1       99                   1           99

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All five experiments use the trained data set.

Experiment  Speakers (Male/Female)  Total Words  Correct  Errors  Percent Correct (%)  Error (%)  Accuracy (%)
01          3 (3/0)                 210          194      16      85.71                15.71      85.71
02          3 (2/1)                 210          188      22      77.14                22         77.14
03          3 (2/1)                 210          182      28      72.38                28         72.38
04          3 (2/1)                 210          194      16      84.44                16         84.44
05          3 (1/2)                 210          184      26      74.76                26         74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a file called dic, which contains all training words and their pronunciations. We wrote these pronunciations by hand several times while training acoustic models experimentally. Over time we began to think about generating the pronunciations, or phonetic translations, automatically, and this software is the implementation of that idea. First we built a database in which all phonemes and their corresponding letters are stored. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and phonemes are not defined for all letters. For this reason we defined some phonemes ourselves for certain consonants and conjunct letters. With this database in place, we built the phonetic translator, which gives the phonetic translation of Bengali words by looking up our database.

Fig 4.1.2: Dictionary file with phonetic translation
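The lookup described above can be sketched as a letter-to-phoneme table applied one character at a time. This is a minimal illustration, not the project's translator: it covers only a few consonants taken from the Unicode to IPA chart in the appendix, it ignores inherent vowels, vowel signs, and conjunct letters, and the class and method names are hypothetical. The Bengali letters are written as Unicode escapes so that the source file does not depend on the editor's encoding.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: builds a dictionary line by looking up each letter in a phoneme table. */
final class PhoneticSketch {

    private static final Map<Character, String> PHONES = new HashMap<>();

    static {
        // A few consonant entries from the appendix chart (letter code point, symbol).
        PHONES.put('\u0995', "K");   // ka
        PHONES.put('\u0996', "KH");  // kha
        PHONES.put('\u0997', "G");   // ga
        PHONES.put('\u09AC', "B");   // ba
        PHONES.put('\u09AE', "M");   // ma
        PHONES.put('\u09B0', "R");   // ra
        PHONES.put('\u09B2', "L");   // la
        PHONES.put('\u09B8', "S");   // sa
    }

    /** Returns the word followed by one phoneme symbol per letter, separated by spaces. */
    static String dictionaryLine(String word) {
        StringBuilder line = new StringBuilder(word);
        for (char letter : word.toCharArray()) {
            String phone = PHONES.get(letter);
            if (phone == null) {
                throw new IllegalArgumentException("No phoneme defined for: " + letter);
            }
            line.append(' ').append(phone);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // The three letters ka, la, ma.
        System.out.println(dictionaryLine("\u0995\u09B2\u09AE"));
    }
}
```

Failing loudly on an unmapped letter is a deliberate choice in this sketch: a silently skipped letter would put a wrong pronunciation into the dictionary file and degrade training without any warning.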

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files contain the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files manually; with the software we can generate these two large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically when we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids file

Fig 4.1.3.2: Transcription File
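The two kinds of lines that such a tool writes can be sketched as follows. The sketch assumes the usual SphinxTrain conventions: a fileids entry is the audio path relative to the audio root without the file extension, and a transcript entry wraps the sentence in `<s>` and `</s>` followed by the utterance id in parentheses. The folder layout (`wav/speaker_02/s01.wav`) and all names are hypothetical and are not taken from the project's FileidsCreator.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: formats one fileids entry and one transcript entry for an audio file. */
final class TrainingFileLines {

    /** fileids entry: path relative to the audio root, forward slashes, no extension. */
    static String fileId(Path audioRoot, Path wavFile) {
        String relative = audioRoot.relativize(wavFile).toString().replace('\\', '/');
        int dot = relative.lastIndexOf('.');
        return dot < 0 ? relative : relative.substring(0, dot);
    }

    /** transcript entry: the sentence between <s> and </s>, then the utterance id. */
    static String transcriptLine(String sentence, String fileId) {
        String utteranceId = fileId.substring(fileId.lastIndexOf('/') + 1);
        return "<s> " + sentence.trim() + " </s> (" + utteranceId + ")";
    }

    public static void main(String[] args) {
        Path root = Paths.get("wav");                           // hypothetical audio root
        Path wav = Paths.get("wav", "speaker_02", "s01.wav");   // hypothetical recording
        String id = fileId(root, wav);
        System.out.println(id);
        System.out.println(transcriptLine("sentence text from the corpus file", id));
    }
}
```

A complete tool would walk the audio root, sort the files, and pair each one with its sentence from the corpus file; how file names map to sentence numbers depends on the naming scheme used when recording.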

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks, such as opening My Computer, left click, and right click.
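One way to connect recognized text to desktop actions is a lookup table from command phrase to action, sketched below. This is an illustrative design rather than the project's code: the class name is hypothetical, the phrases are English placeholders for the Bengali command words, and the actions only print a message where the real application would call methods such as `rightClick()` or `openMyComputer()` on its `CommandActivator`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: maps recognized phrases to actions and runs the matching one. */
final class CommandDispatcher {

    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    /** Registers the action to run when the recognizer returns this phrase. */
    void register(String phrase, Runnable action) {
        actions.put(normalize(phrase), action);
    }

    /** Runs the matching action; returns false when the phrase is not a known command. */
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(normalize(recognizedText));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    /** Trims the text and collapses repeated whitespace so that matching is exact. */
    private static String normalize(String text) {
        return text == null ? "" : text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder phrases; the application itself uses Bengali command words.
        dispatcher.register("right click", () -> System.out.println("right click"));
        dispatcher.register("open my computer", () -> System.out.println("open my computer"));
        System.out.println(dispatcher.dispatch(" open  my computer "));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

Keeping the phrase table separate from the actions means a new command needs one `register` call instead of another `if` branch, and an unrecognized utterance is reported to the caller instead of being ignored.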

Chapter 5

LIMITATION & FUTURE WORK

Limitation
Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for new students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali: different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or mispronounces words. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate set of phonemes.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the size of our data to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for creating the dictionary, fileids, and transcription files. We want to make a user-friendly, stand-alone GUI application for writing in Bengali, and we intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate. With this automatic trainer the user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to make a voice response application in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected through this website we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion
References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.


References

[1] M. A. Anusuya and S. K. Katti, “Speech Recognition by Machine: A Review,” International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu’tasem Jarrah, Mohd Taib Wahid, “A Review of Speech Recognition with Sphinx Engine in Language Detection,” Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner and B. Juang, “Fundamentals of Speech Recognition,” Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, “An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition,” Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals,” Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, “Bengali Segmented Automatic Speech Recognition,” BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman, and Md. Zafar Iqbal, “Recognition of Spoken Letters in Bangla,” banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, “Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective,” BRACU, 2007.

[9] Shammur Absar Chowdhury, “Implementation of Speech Recognition System for Bangla,” BRACU, August 2010.

[10] Qqbal SO Shahzad, “Speech Recognition System,” Iqra University, March 2009.

[11] Tran Viet Khai, “Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition,” Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech,” IEEE, 2004.

[13] M. S. Islam, “Research on Bangla Language Processing in Bangladesh: Progress and Challenges,” BUET, 2009.

[14] P. Foster, T. Schalk, “Speech Recognition: The Complete Practical Reference Guide,” 1993, ISBN 0936648392.

[15] H. Satori, M. Harti, and N. Chenfour, “Introduction to Arabic Speech Recognition Using CMUSphinx System,” 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short project duration we used 50 of them for recognition.

No Sentence

1 শুভ সকাল

2 ধন্যবাদ

3 আমি আপনাকে কি সাহায্য করতে পারি

4 আমি কিছু তথ্য জানতে এসেছি

5 কি বলুন

6 ভর্তি বিষয়ে

7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক

8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভর্তি চলছে

10 এ বিভাগে কি কি সুবিধা আছে

11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে

12 যেমন

13 দএএ কম্পিউটার ল্যাব আছে

14 ও আচ্ছা

15 একটি শুধু বিজ্ঞান বিভাগের জন্য

16 আর আরেকটি

17 সব এএএএএএএ জন্য

18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে

19 বত্রিশটি করে

20 আর কিছু

21 কতজন শিক্ষক আছেন এ বিভাগে

22 প্রায় এএএএ জন

23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ

24 প্রায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি

28 অবশ্যই

29 দানবীর মিস্টার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাত্র ক্যাম্পাস

32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সুবিধা আছে

35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে

36 লাইব্রেরী কি আছে

37 অবশ্যই একটা বড় লাইব্রেরী আছে

38 ক্যান্টিন আছে

39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে

40 ভর্তির শেষ তারিখ কবে

41 এ মাসের পাঁচ তারিখ

42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস

44 তিন চার এএএ পাঁচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিস্টার

46 তিন সেমিস্টার

47 তাহলে তো মোট এএএ সেমিস্টার

48 জি হ্যাঁ

49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়

50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()

int lineCounts = 0

String dirTreeRootName = Ctrainnign_filesbs_asr_train

String root = getRootFromPath(dirTreeRootName)

listdir(dirTreeRootName1)

Collectionssort(dirTreeLevel1)

int sizeOfdirTreeLevel1 = dirTreeLevel1size()

int i = 0j=0k=0

while(sizeOfdirTreeLevel1gti)

String path2 = dirTreeRootName+dirTreeLevel1get(i)

dirTreeLevel2clear()

listdir(path22)

dirTreeLevel2 = sortObjsortList(dirTreeLevel2)

int sizeOfdirTreeLevel2 = dirTreeLevel2size()

String path3=

while(sizeOfdirTreeLevel2gtj)

path3 = path2++dirTreeLevel2get(j)+

listdir(path33)

while(dirTreeLevel3size()gtk)

String targetPath =

root++dirTreeLevel1get(i)++dirTreeLevel2get(j)++dirTreeLevel3get(k)

targetPath = targetPathreplaceAll(wav )

writIntoFile(targetPathCtrainnign_filesbs_asr_train+sbs_asr_trainfileids)

lineCounts++

k++

j++

file listing end

j=0

i++

transcriptCreator tcObj = new transcriptCreator()

try

tcObjreadCorpus(Ctrainnign_filecorpustxt)

tcObjCreateTransCriptFile(dirTreeLevel3

Ctrainnign_filesbs_asr_trainsbs_asr_traintranscript)

catch (FileNotFoundException e)

eprintStackTrace()

int si=0

public static void listdir(String pathint Level)

File folder = new File(path)

File[] listOfFiles = folderlistFiles()

int numofL_I = 0

int numOfL = listOfFileslength

while(numOfLgtnumofL_I)

if (listOfFiles[numofL_I]isDirectory())

if(Level == 1)

dirTreeLevel1add(listOfFiles[numofL_I]getName())

else if(Level == 2)

dirTreeLevel2add(listOfFiles[numofL_I]getName())

else

if (listOfFiles[numofL_I]isFile())

dirTreeLevel3add(listOfFiles[numofL_I]getName())

numofL_I++

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

public static void writIntoFile(String dataString path)

try

FileWriter fstream = new FileWriter(pathtrue)

BufferedWriter out = new BufferedWriter(fstream)

outwrite(data+n)

outclose()

catch(IOException e)

eprintStackTrace()

ARRAY

package sbsBSRtrainingfilescreator

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of each folder name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        // sort the extracted numbers as integers
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // rebuild the names in numeric order
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
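The listing above orders numbered folder names by extracting their digits, sorting them as integers, and re-attaching the prefix of the first entry. A minimal alternative sketch, assuming each name contains a single decimal number, compares the names by that number directly. The class name NaturalNameSort and the sample names in the usage note are illustrative and do not come from the project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of the project code): orders names such as
// "speaker2" before "speaker10" by comparing the number embedded in the name.
final class NaturalNameSort {

    // Returns all digits of the name parsed as one integer, or 0 if there are none.
    static int numericPart(String name) {
        String digits = name.replaceAll("[^\\d]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list sorted by numeric part; the input list is left unchanged.
    static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NaturalNameSort::numericPart));
        return sorted;
    }
}
```

For example, NaturalNameSort.sort applied to speaker10, speaker2, speaker1 yields speaker1, speaker2, speaker10. Because each name is kept intact rather than rebuilt, folders with different prefixes are not renamed.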

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javaioInputStreamReader

import javautilArrayList

import javautilList

public class transcriptCreator

static ArrayListltObjectgt inputLines=new ArrayListltObjectgt()

public void readCorpus(String path) throws FileNotFoundException

FileInputStream fstream = new FileInputStream(path)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String strLine

try

while ((strLine = brreadLine()) = null)

inputLinesadd(strLine)

inclose()

catch (Exception e)Catch exception if any

Systemerrprintln(Error + egetMessage())

int inputLineSize = inputLinessize()

int i = 0

while(inputLineSizegti)

i++

public static void CreateTransCriptFile(ListltStringgt dataString path)

try

FileWriter fstream = new FileWriter(pathtrue)

int numOfwriting = datasize()

int i = 0

int lineStart = 0

int lineEnd = inputLinessize()

BufferedWriter out = new BufferedWriter(fstream)

while(numOfwritinggti)

if(lineStart == lineEnd) lineStart = 0

String wavRemove = dataget(i)toString()

wavRemove = wavRemovereplaceAll(wav )

String leadTrailspaceRemoved =

inputLinesget(lineStart)toString()

leadTrailspaceRemoved = leadTrailspaceRemovedtrim()

String pattern = ltSgt +leadTrailspaceRemoved+ ltSgt

(+wavRemove+)

outwrite(pattern+n)

lineStart++

i++

Close the output stream

outclose()

catch(IOException e)

eprintStackTrace()

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

PHONETIC TRANSLATION

package ptpack

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one dictionary line (word followed by its pronunciation) per input word
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)

eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

return isBBorno

class end

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans
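The first half of getPronouciation groups the characters around each conjunct marker so that a conjunct can be looked up in the database as a single unit. A minimal sketch of that grouping idea follows. It assumes the marker is the Bengali hasanta (U+09CD); the marker character in the listing did not survive extraction, so this is an assumption, and the class ConjunctSplitter is hypothetical. The listing's special handling of one consonant and its insertion of the inherent vowel between two consonants are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a Bengali word into lookup units, keeping
// consonants joined by hasanta (U+09CD) together as one conjunct unit.
final class ConjunctSplitter {

    static final char HASANTA = '\u09CD'; // assumed conjunct marker

    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            // Extend the unit while the next char is a hasanta followed by another char.
            while (i + 2 < word.length() && word.charAt(i + 1) == HASANTA) {
                unit.append(HASANTA).append(word.charAt(i + 2));
                i += 2;
            }
            units.add(unit.toString());
            i++;
        }
        return units;
    }
}
```

Each returned unit could then be used as the key of one pronunciation lookup, which is the role SearchStrings plays in the listing.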

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

public void openNotepad() throws AWTException IOException

ProcessBuilder proc=new ProcessBuilder(notepadexe)

Process p=procstart()

public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()
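giveCommand compares the recognized text against a fixed phrase and then calls the matching CommandActivator method. As the number of commands grows, a lookup table can replace a chain of comparisons. The following is a minimal sketch of that idea, not the project's implementation; the class CommandDispatcher and the phrase used in the usage note are hypothetical, since the Bengali command strings did not survive extraction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps recognized phrases to actions so that adding a
// command means registering one entry instead of adding another if-branch.
final class CommandDispatcher {

    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a recognized phrase with the action to run for it.
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action for the recognized text; returns false if none is registered.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

In the application, an entry could wrap a CommandActivator call such as rightClick() in a Runnable that handles AWTException, registered under a placeholder phrase such as "right click". The recognition loop would then call dispatch(resultText) once per result.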

With these changes, the project folder setup is complete.

2.2.4 Training the Acoustic Model

The first step of training is to convert the collected raw speech audio into feature (.mfc) files. To do this, we opened the project directory in a Linux terminal and ran the feature extraction command [14]:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_train.fileids

This command converts every .wav file into an .mfc file in the “feat” directory under the project folder “sbsbsr”.
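The control file passed with -ctl lists one utterance per line as a path relative to the audio directory, without the .wav extension. A minimal sketch of deriving such entries from a list of relative audio paths is shown here. The class FileIdLister and the sample paths in the usage note are hypothetical, and collecting the paths from disk is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: turns relative .wav paths into fileids entries
// (forward slashes, no extension), skipping files that are not .wav.
final class FileIdLister {

    static List<String> toFileIds(List<String> relativePaths) {
        List<String> ids = new ArrayList<>();
        for (String path : relativePaths) {
            // Normalize Windows separators so the list also works on Linux.
            String normalized = path.replace('\\', '/');
            if (normalized.endsWith(".wav")) {
                ids.add(normalized.substring(0, normalized.length() - ".wav".length()));
            }
        }
        Collections.sort(ids);
        return ids;
    }
}
```

For instance, the paths speaker1\s2.wav and speaker1/s1.wav would become the entries speaker1/s1 and speaker1/s2, one per line of the fileids file.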

We are now ready to run the main training command, which creates the acoustic model. Staying in the project directory, we execute the following command in the terminal:

perl scripts_pl/RunAll.pl

This command trains the acoustic model. The resulting model files are placed in the “model_parameters” directory under the project folder “sbsbsr”.

2.2.5 Testing

We tested our model with two recognizers from CMU Sphinx: Pocketsphinx and Sphinx4. Pocketsphinx is a lightweight recognizer, while Sphinx4 is an adjustable, modifiable recognizer.

2.2.5.1 Testing with Pocketsphinx

First, we download Pocketsphinx from the CMU Sphinx site and extract the archive into the root directory where sphinxbase and sphinxtrain are installed. We then install it with the following commands in a Linux terminal:

./configure

make

After installing the software, we go into our project folder and run the following command from the terminal to create the folder structure:

../pocketsphinx/scripts/setup_sphinx.pl -task sbsbsr

Because Pocketsphinx decodes from audio files, we put some test audio in the “wav” directory and copy the test fileids and transcript files into the “etc” folder. Decoding also requires feature files for the test audio, which we generate with the following command in a Linux terminal:

perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids
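Each line of the transcript file pairs a sentence with its utterance id in the form <s> sentence </s> (id). This is the same pattern the transcriptCreator listing in the appendix builds. A minimal sketch of formatting one such line, assuming the id is the audio file name without its .wav extension; the class TranscriptLine and the sample sentence in the usage note are hypothetical.

```java
// Hypothetical helper: formats one transcript line for a sentence and the
// name of the audio file that contains it.
final class TranscriptLine {

    static String format(String sentence, String wavFileName) {
        // Drop a trailing ".wav" so only the utterance id remains.
        String id = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - ".wav".length())
                : wavFileName;
        return "<s> " + sentence.trim() + " </s> (" + id + ")";
    }
}
```

Calling TranscriptLine.format("word1 word2", "s1.wav") gives the line <s> word1 word2 </s> (s1). The ids must match the entries of the corresponding fileids file, line by line.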

After that, we run the main decoding command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the speech in the input audio using our trained acoustic model. The test results are written to the “result” folder under the project folder. For our model trained on fifty sentences, the result is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We used the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 itself and the Eclipse IDE. After installing Eclipse on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. We then create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that, we add the following files from our previous project to the Eclipse project [11]:

The “sbsbsr.cd_cont_100” folder from sbsbsr/model_parameters

The .dic file

The .lm.DMP file

The .filler file

We create a config.xml file in the Eclipse project to describe the configuration to the recognizer and to tell it where the required model files are located. This file can be based on the config file in the Sphinx4 demo application. We also add four jar files (js.jar, jsapi.jar, sphinx4.jar, and tags.jar) from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building it, we run the project and test it with live voice input from the microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2 Testing with Sphinx4

Sphinx4 can be used to build various applications in Java. We built some applications with it, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, a university lab room, and a common room. Testing used audio input from six test speakers. For live testing, we used a microphone in the different environments [9]. We completed our tests with two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment No | Details | Results

Experiment 01 | Using trained data set; Number of speakers: 5 (Male: 4, Female: 1) | Total words: 1025; Correct: 975; Errors: 98; Total percent correct = 95.12%; Error = 9.56%; Accuracy = 90.44%

Experiment 02 | Using trained data set; Number of speakers: 3 (Male: 3, Female: 0) | Total words: 615; Correct: 591; Errors: 39; Total percent correct = 96.10%; Error = 6.34%; Accuracy = 93.66%

Experiment 03 | Using trained data set; Number of speakers: 3 (Male: 3, Female: 0) | Total words: 615; Correct: 601; Errors: 22; Total percent correct = 97.72%; Error = 3.58%; Accuracy = 96.42%

Experiment 04 | Using trained data set; Number of speakers: 5 (Male: 2, Female: 3) | Total words: 1025; Correct: 938; Errors: 184; Total percent correct = 91.51%; Error = 17.95%; Accuracy = 82.05%

Experiment 05 | Using trained data set; Number of speakers: 3 (Male: 0, Female: 3) | Total words: 615; Correct: 545; Errors: 125; Total percent correct = 88.62%; Error = 20.33%; Accuracy = 79.67%

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
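The figures reported for Pocket Sphinx are consistent with the usual word-level definitions: percent correct is correct words divided by total words, error is errors divided by total words, and accuracy is 100 minus the error percentage. The source does not state these formulas, so the following is a sketch under that assumption; the class WordScore is hypothetical.

```java
// Hypothetical helper: recomputes the word-level scores of one experiment
// from its raw counts. All results are percentages.
final class WordScore {

    static double percentCorrect(int correct, int totalWords) {
        return 100.0 * correct / totalWords;
    }

    static double errorRate(int errors, int totalWords) {
        return 100.0 * errors / totalWords;
    }

    static double accuracy(int errors, int totalWords) {
        return 100.0 - errorRate(errors, totalWords);
    }

    // Rounds to two decimal places, as the values are reported in the table.
    static double round2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }
}
```

With the Experiment 01 counts (1025 total words, 975 correct, 98 errors), these functions give 95.12, 9.56, and 90.44 after rounding, matching the reported values.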

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment No | Details | Results

Experiment 01 | User type: Trained; Number of speakers: 2; Environment: Closed room; Speaker type: Male; Input device: Microphone | Number of words: 156; Correct words: 154; Errors: 2; Percent correct: 98%; Errors: 2; Accuracy: 98%

Experiment 02 | User type: Untrained; Number of speakers: 3; Environment: Lab room; Speaker type: Male; Input device: Microphone | Number of words: 130; Correct words: 120; Errors: 10; Percent correct: 90%; Errors: 10; Accuracy: 90%

Experiment 03 | User type: Untrained; Number of speakers: 3; Environment: University campus; Speaker type: Male; Input device: Microphone | Number of words: 140; Correct words: 122; Errors: 18; Percent correct: 82%; Errors: 18; Accuracy: 82%

Experiment 04 | User type: Untrained; Number of speakers: 2; Environment: University campus; Speaker type: Female; Input device: Microphone | Number of words: 108; Correct words: 95; Errors: 13; Percent correct: 87%; Errors: 13; Accuracy: 87%

Experiment 05 | User type: Trained; Number of speakers: 3; Environment: University floor; Speaker type: Female; Input device: Microphone | Number of words: 126; Correct words: 115; Errors: 11; Percent correct: 89%; Errors: 11; Accuracy: 89%

Experiment 06 | User type: Trained; Number of speakers: 2; Environment: Closed room; Speaker type: Male; Input device: Microphone | Number of words: 128; Correct words: 127; Errors: 1; Percent correct: 99%; Errors: 1; Accuracy: 99%

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment     Data Set  Speakers  Male  Female  Total Words  Correct  Errors  Percent Correct  Error  Accuracy
Experiment 01  Trained   3         3     0       210          194      16      85.71            15.71  85.71
Experiment 02  Trained   3         2     1       210          188      22      77.14            22     77.14
Experiment 03  Trained   3         2     1       210          182      28      72.38            28     72.38
Experiment 04  Trained   3         2     1       210          194      16      84.44            16     84.44
Experiment 05  Trained   3         1     2       210          184      26      74.76            26     74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize whole sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary file with the extension .dic, which lists every training word together with its pronunciation. While training the acoustic model experimentally we had to write these pronunciations by hand several times, which led us to build an automatic pronunciation (phonetic translation) generator. First we made a database that stores all phonemes and their corresponding letters. We built this database with the help of the IPA chart and various thesis papers [1] [9] [10]. However, different sources define the phonemes in different ways, and phonemes are not defined for every letter. We therefore defined the phonemes for some consonants and conjunct letters ourselves. The phonetic translator then uses this database to produce the phonetic translation of a Bengali word.

Fig 4.1.2: Dictionary files with phonetic translation
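The letter-to-phoneme lookup described in this section can be sketched in Java as follows. This is a deliberately simplified illustration, not the project's translator: an in-memory map stands in for the database, only six consonants from the Unicode to IPA chart in the appendix are included, and inherent vowels, vowel signs and conjunct letters are ignored. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PhoneticLookupSketch {

    // A few letter-to-phoneme pairs copied from the Unicode to IPA chart.
    private static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K"); // Bengali letter ka
        PHONEMES.put('\u0997', "G"); // Bengali letter ga
        PHONEMES.put('\u09A8', "N"); // Bengali letter na
        PHONEMES.put('\u09AE', "M"); // Bengali letter ma
        PHONEMES.put('\u09B0', "R"); // Bengali letter ra
        PHONEMES.put('\u09B2', "L"); // Bengali letter la
    }

    /** Builds one dictionary line: the word followed by its space-separated phonemes. */
    static String dictionaryLine(String word) {
        StringBuilder line = new StringBuilder(word);
        for (char letter : word.toCharArray()) {
            String phoneme = PHONEMES.get(letter);
            if (phoneme == null) {
                // Letters outside the small sample map are reported instead of guessed.
                throw new IllegalArgumentException(
                        "No phoneme defined for U+" + Integer.toHexString(letter).toUpperCase());
            }
            line.append(' ').append(phoneme);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // The sample word consists of the letters ka, la and ma.
        System.out.println(dictionaryLine("\u0995\u09B2\u09AE"));
    }
}
```

For the sample word the line ends in the phonemes K L M. A complete translator would also have to insert inherent vowels between consonants and treat conjuncts as single units.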

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files contain the paths of the training audio files and their corresponding sentences. Before writing this program we had to create both files by hand. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are generated automatically within moments when the root folder of the audio files and the sentence corpus file are given to the software as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
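The generation of the two training files can be sketched as below. The sketch works on in-memory lists rather than on a folder tree, and it assumes the usual SphinxTrain layout: one audio path without the .wav extension per fileids line, and one "<s> sentence </s> (file id)" line per transcript entry. As in the project code, the corpus is reused from the start when there are more audio files than sentences. The class name, method names and sample file names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrainingFilesSketch {

    /** Strips a trailing ".wav" so the entry can be used as a file id. */
    static String toFileId(String wavPath) {
        return wavPath.endsWith(".wav")
                ? wavPath.substring(0, wavPath.length() - 4)
                : wavPath;
    }

    /** One fileids line per audio file. */
    static List<String> fileidsLines(List<String> wavPaths) {
        List<String> lines = new ArrayList<>();
        for (String path : wavPaths) {
            lines.add(toFileId(path));
        }
        return lines;
    }

    /** One transcript line per audio file; the corpus is reused cyclically. */
    static List<String> transcriptLines(List<String> wavNames, List<String> corpus) {
        if (corpus.isEmpty()) {
            throw new IllegalArgumentException("corpus must contain at least one sentence");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = corpus.get(i % corpus.size()).trim();
            lines.add("<s> " + sentence + " </s> (" + toFileId(wavNames.get(i)) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical file names and placeholder sentences.
        List<String> wavs = Arrays.asList("s1.wav", "s2.wav", "s3.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        fileidsLines(wavs).forEach(System.out::println);
        transcriptLines(wavs, corpus).forEach(System.out::println);
    }
}
```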

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, the application performs some common tasks such as opening My Computer, left click and right click.

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific areas. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for newly arriving students, with a corpus of 185 sentences. It was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and it sometimes fails when the speaker talks slowly or pronounces words poorly. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once the phoneme set is accurately defined.
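The report does not say where the encoding problem arises. One common cause in Java programs, offered here only as a possibility, is reading or writing Bengali text with the platform default charset instead of UTF-8. The sketch below uses standard java.nio calls to write and read a word list with an explicit UTF-8 charset; the class name, method names and temporary file are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class Utf8WordListSketch {

    /** Writes one word per line, always encoded as UTF-8. */
    static void writeWords(Path file, List<String> words) throws IOException {
        Files.write(file, words, StandardCharsets.UTF_8);
    }

    /** Reads the lines back, always decoding them as UTF-8. */
    static List<String> readWords(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        // Two short Bengali words given as Unicode escapes.
        List<String> words = Arrays.asList("\u0995\u09B2\u09AE", "\u09A8\u09BE\u09AE");
        writeWords(file, words);
        // The round trip preserves the text regardless of the platform default charset.
        System.out.println(readWords(file).equals(words));
        Files.delete(file);
    }
}
```

If the pronunciations are stored in a database, the connection also has to be configured for Unicode, as the project's JDBC URLs already attempt with their useUnicode and characterEncoding parameters.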

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the data size to create a complete model, and we plan to improve its recognition accuracy and enlarge its vocabulary. We have also developed software for generating the dictionary file, the fileids file and the transcription file. We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that Bengali sentences can be written just by uttering them. We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is 100% accurate yet, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available only 50 sentences were used for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Path separators were lost in this listing; "/" is assumed throughout.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates every audio file; k continues across folders
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists sub-folders (levels 1 and 2) or audio files (level 3) of the given path
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "name1", "name10", "name2" by their numeric suffix
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep the digits only
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, reusing the corpus from the start when it runs out
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Path separators were lost in this listing; "/" is assumed.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is stored as a consonant (ids 13 to 48) in the banglatab table
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal was lost in this listing; the hasanta (U+09CD), which joins conjunct letters, is assumed.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions before, at and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // record the positions that are not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the pronunciation of every letter or conjunct found above
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // The dots in the resource name were lost in this listing; this spelling is assumed.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the programm exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


perl scripts_pl/make_feats.pl -ctl etc/sbsbsr_test.fileids

After that, we execute the main decoding (testing) command in the Linux terminal:

perl scripts_pl/decode/slave.pl

This command decodes the input audio using our trained acoustic model. The test results are written to the "result" folder under the project folder. The result for our model trained on fifty sentences is shown below.

Figure 2.2.5.1 Testing with Pocketsphinx

2.2.5.2 Testing with Sphinx4

Sphinx4 is an adjustable, modifiable recognizer written in Java. We use the Sphinx4 Java library to test our trained model on the Windows 7 operating system. Two pieces of software are needed to test the model with the Sphinx4 decoder: Sphinx4 itself and the Eclipse IDE. After installing Eclipse on Windows, we download Sphinx4 from the CMU Sphinx site and extract the zip file to any location. We then create a new Java project in Eclipse and write a Java file based on the demo application from CMU Sphinx. After that, we add the following files from our previous project to the Eclipse project [11]:

"sbsbsr.cd_cont_100" folder from sbsbsr/model_parameters
.dic
.lm.DMP
.filler

We create a config.xml file in the Eclipse project to configure the recognizer and to tell it where the required model files are located. This config.xml can be written with the help of the config file in the Sphinx4 demo application. We also need to add four jar files (js.jar, jsapi.jar, sphinx4.jar, tags.jar) from the sphinx4/lib directory to our project. The Java project is then ready to build and run. After building the project, we run it and test it with live voice input from the microphone. The result for our model trained on ten sentences is shown below.

Figure 2.2.5.2 Testing with Sphinx4

Various applications can be built in Java with the help of Sphinx4. We built several applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room, and the common room. We completed our testing using audio input from six test speakers. For live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx
2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment No   Details                  Speakers  Male  Female  Total Words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)
Experiment 01   Using Trained Data Set   5         4     1       1025         975      98      95.12                9.56       90.44
Experiment 02   Using Trained Data Set   3         3     0       615          591      39      96.10                6.34       93.66
Experiment 03   Using Trained Data Set   3         3     0       615          601      22      97.72                3.58       96.42
Experiment 04   Using Trained Data Set   5         2     3       1025         938      184     91.51                17.95      82.05
Experiment 05   Using Trained Data Set   3         0     3       615          545      125     88.62                20.33      79.67

Table 3.2.1 Experimental details with Results for Pocket Sphinx
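The percentages in Table 3.2.1 can be reproduced from the word counts. The following is a minimal sketch in Java, assuming that the error count also includes inserted words (which would explain why Errors can exceed Total Words minus Correct) and that Accuracy is 100 minus the error rate. The class and method names are illustrative and are not part of the project code.

```java
/** Word-level scores as reported in Table 3.2.1 (illustrative names, not project code). */
final class WordScores {

    /** Share of reference words that were recognized correctly, in percent. */
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    /** Errors relative to the number of reference words, in percent. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Assumption: accuracy is defined as 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Counts of Experiment 01 in Table 3.2.1.
        int total = 1025, correct = 975, errors = 98;
        System.out.printf("Percent correct = %.2f%n", percentCorrect(total, correct));
        System.out.printf("Error = %.2f%n", errorRate(total, errors));
        System.out.printf("Accuracy = %.2f%n", accuracy(total, errors));
    }
}
```

Under these assumptions the counts of Experiment 01 give the same three percentages as the table row, and the other four rows can be checked the same way.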

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment No   User Type   Speakers  Environment        Speaker Type  Input Device  Words  Correct  Errors  Percent correct (%)  Errors (%)  Accuracy (%)
Experiment 01   Trained     2         Closed Room        Male          Microphone    156    154      2       98                   2           98
Experiment 02   Untrained   3         Lab Room           Male          Microphone    130    120      10      90                   10          90
Experiment 03   Untrained   3         University Campus  Male          Microphone    140    122      18      82                   18          82
Experiment 04   Untrained   2         University Campus  Female        Microphone    108    95       13      87                   13          87
Experiment 05   Trained     3         University floor   Female        Microphone    126    115      11      89                   11          89
Experiment 06   Trained     2         Closed Room        Male          Microphone    128    127      1       99                   1           99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment No   Details                  Speakers  Male  Female  Total Words  Correct  Errors  Percent correct (%)  Error (%)  Accuracy (%)
Experiment 01   Using Trained Data Set   3         3     0       210          194      16      85.71                15.71      85.71
Experiment 02   Using Trained Data Set   3         2     1       210          188      22      77.14                22         77.14
Experiment 03   Using Trained Data Set   3         2     1       210          182      28      72.38                28         72.38
Experiment 04   Using Trained Data Set   3         2     1       210          194      16      84.44                16         84.44
Experiment 05   Using Trained Data Set   3         1     2       210          184      26      74.76                26         74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a file called .dic, which contains all training words and their pronunciations. We wrote these pronunciations by hand several times while training acoustic models experimentally. Over time we began to think about generating pronunciations (phonetic translations) automatically, and this software implements that idea. First, we built a database that stores all phonemes and their corresponding letters. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and phonemes are not defined for every letter. For that reason we defined some phonemes ourselves for some consonants and conjunct letters. After building the database, we wrote the phonetic translator, which gives the phonetic translation of Bengali words using this database.

Fig 4.1.2 Dictionary files with phonetic translation
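The letter-by-letter lookup described above can be illustrated with a small sketch. The project code queries a MySQL table (banglatab, with the columns letter and pro); the sketch below swaps that database for an in-memory map so that it runs on its own. The class name, the choice to skip unmapped characters, and the five sample entries (taken from the Unicode to IPA chart in the appendix) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative letter-to-phoneme lookup; an in-memory stand-in for the database table. */
final class PhoneticSketch {

    private static final Map<String, String> LETTER_TO_PHONEME = new HashMap<>();
    static {
        // Bengali letters are written as Unicode escapes so the source stays ASCII.
        LETTER_TO_PHONEME.put("\u0995", "K"); // ka
        LETTER_TO_PHONEME.put("\u09B2", "L"); // la
        LETTER_TO_PHONEME.put("\u09AE", "M"); // ma
        LETTER_TO_PHONEME.put("\u09B8", "S"); // sa
        LETTER_TO_PHONEME.put("\u09AC", "B"); // ba
    }

    /** Returns the space-separated phonemes of a word; unmapped characters are skipped. */
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); ) {
            int cp = word.codePointAt(i);
            i += Character.charCount(cp);
            String pro = LETTER_TO_PHONEME.get(new String(Character.toChars(cp)));
            if (pro == null) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(pro);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The three letters ka, la, ma.
        System.out.println(translate("\u0995\u09B2\u09AE"));
    }
}
```

A full translator would also need entries for vowel signs and conjunct letters, which is where the differing phoneme definitions mentioned above come into play.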

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcript files. These two files contain the paths of the training audio files and their corresponding sentences. Before writing this program, we had to create these two files manually. With our software, we can now create these two large files automatically within moments. For example, if we have 8000 audio files, we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now we can create both files automatically by giving the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
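As an illustration of what the two generated files contain, the sketch below builds one fileids line and one transcript line for a single audio file. It assumes the usual SphinxTrain layout, in which a fileids entry is the audio path without its extension and a transcript entry is the sentence wrapped in <s> and </s> followed by the file name in parentheses. The class name and the folder and file names in the example are hypothetical.

```java
/** Illustrative builders for one fileids line and one transcript line. */
final class TrainingFilesSketch {

    /** Drops a trailing ".wav" from a file name, if present. */
    static String baseName(String wavName) {
        return wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - 4)
                : wavName;
    }

    /** fileids entry: root/level1/level2/file without the extension. */
    static String fileId(String root, String level1, String level2, String wavName) {
        return root + "/" + level1 + "/" + level2 + "/" + baseName(wavName);
    }

    /** transcript entry: the sentence between <s> and </s>, then the file name in parentheses. */
    static String transcriptLine(String sentence, String wavName) {
        return "<s> " + sentence + " </s> (" + baseName(wavName) + ")";
    }

    public static void main(String[] args) {
        // Hypothetical folder and file names.
        System.out.println(fileId("sbs_asr_train", "speaker_02", "s1", "a_001.wav"));
        System.out.println(transcriptLine("SENTENCE TEXT", "a_001.wav"));
    }
}
```

Writing one such pair of lines per audio file, in the same order, keeps the fileids and transcript files aligned, which is the property the manual procedure made hard to maintain for thousands of files.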

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click, and right click.
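One way to connect recognized text to desktop actions is a lookup table from command phrase to action. The sketch below is a hypothetical design, not the project's implementation (which compares the recognized text inside a giveCommand method); the class name and the placeholder phrase are assumptions, and the action is a plain Runnable so that the example runs without a display.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative phrase-to-action dispatcher for recognized voice commands. */
final class CommandDispatchSketch {

    private final Map<String, Runnable> actions = new HashMap<>();

    /** Associates a recognized phrase with the action to run. */
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    /** Runs the action registered for the phrase; returns false if none matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        // "COMMAND PHRASE" is a placeholder for a Bengali command word.
        dispatcher.register("COMMAND PHRASE", () -> System.out.println("action executed"));
        System.out.println(dispatcher.dispatch("COMMAND PHRASE"));
        System.out.println(dispatcher.dispatch("something else"));
    }
}
```

In the real application each action would call a method of CommandActivator; because those methods throw the checked AWTException, the call would have to be wrapped in a try/catch inside the Runnable.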

Chapter 5

LIMITATION & FUTURE WORK

Limitation
Future Work

Limitation

Our project has limitations in some specific tasks. The system was built on a small data set because of time constraints. We selected the domain of university admission information for newly arriving students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not defined precisely and we do not know the exact number of phonemes in Bengali; different researchers report different numbers of phonemes. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and it sometimes fails to recognize them accurately because of slow speaking or pronunciation problems. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since accurate phonemes are a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate set of phonemes.

Future Works

We have implemented Bengali speech recognition for a small data size. In the future we will increase the data size to create a complete model, and we plan to improve its capability to recognize speech more accurately and to enlarge its vocabulary. We have also developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications so that Bengali sentences can be written just by uttering them. We want to build a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected from this website we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion
References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our developed tools are still at a preliminary level. Building a good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009
[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012
[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993
[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000
[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978
[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006
[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002
[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007
[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010
[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009
[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008
[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004
[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009
[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392
[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ, consonants) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার, numbers) IPA

শূন্য 0
এক 1
দুই 2
তিন 3
চার 4
পাঁচ 5
ছয় 6
সাত 7
আট 8
নয় 9

Bangla Phoneme (যুক্তবর্ণ, conjunct letters) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we recognized only 50 of them.

No Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কমপিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কমপিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কমপিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                            root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j)
                            + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists a folder: sub-folder names go to level 1 or 2, file names go to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved =
                    inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)


eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try


conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())


return isBBorno

class end


PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct


i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2


SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null


Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans


SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream


import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else


Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException


Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)


robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)


public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()


robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

public void openNotepad() throws AWTException IOException

ProcessBuilder proc=new ProcessBuilder(notepadexe)

Process p=procstart()

public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)


public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))


allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )


private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()


We create a config.xml file in the Eclipse project to tell the recognizer its configuration and where the required model files are placed. This config.xml can be created with the help of the config file in the Sphinx4 demo application. We need to add four Java jar files (js.jar, jsapi.jar, sphinx4.jar and tags.jar) from the sphinx4 lib directory to our project. Our Java project is then ready to build and run. After building the project we run it and can test it with live voice input from a microphone. For our model trained on ten sentences, the result is:

Figure 2.2.5.2: Testing with Sphinx4

We can build various applications with Sphinx4 using the Java language. We built some applications using Sphinx4, which are discussed later.

Chapter 3

TESTING & PERFORMANCE EVALUATION

3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and a common room. We completed our testing using audio inputs from six test speakers. For the live testing we used a microphone in different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

Experiment      Data set   Speakers (male/female)   Total words   Correct   Errors   Percent correct   Error     Accuracy
Experiment 01   Trained    5 (4/1)                  1025          975       98       95.12%            9.56%     90.44%
Experiment 02   Trained    3 (3/0)                  615           591       39       96.10%            6.34%     93.66%
Experiment 03   Trained    3 (3/0)                  615           601       22       97.72%            3.58%     96.42%
Experiment 04   Trained    5 (2/3)                  1025          938       184      91.51%            17.95%    82.05%
Experiment 05   Trained    3 (0/3)                  615           545       125      88.62%            20.33%    79.67%

Table 3.2.1: Experimental details with Results for Pocket Sphinx
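The percentages in Table 3.2.1 are consistent with the usual word-level scores: percent correct is the share of correct words, the error rate is the number of errors over the total words, and accuracy is 100 minus the error rate. The following is a minimal sketch of that arithmetic, assuming these definitions; the class and method names are hypothetical and are not part of the project code.

```java
import java.util.Locale;

/** Sketch of the word-level scores, assuming the usual definitions. */
class RecognitionScores {

    /** Percent of the reference words that were recognized correctly. */
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    /** Error rate in percent: number of errors over total words. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy in percent: 100 minus the error rate. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    public static void main(String[] args) {
        // Experiment 01 of Table 3.2.1: 1025 words, 975 correct, 98 errors.
        System.out.println(String.format(Locale.ROOT, "%.2f %.2f %.2f",
                percentCorrect(1025, 975),
                errorRate(1025, 98),
                accuracy(1025, 98)));
        // prints "95.12 9.56 90.44"
    }
}
```

With the counts of Experiment 01 this reproduces the three values in the first row of the table.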

Chart 3.2.2: Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment      User type   Speakers   Speaker type   Environment         Words   Correct   Errors   Percent correct   Error   Accuracy
Experiment 01   Trained     2          Male           Closed Room         156     154       2        98%               2%      98%
Experiment 02   Untrained   3          Male           Lab Room            130     120       10       90%               10%     90%
Experiment 03   Untrained   3          Male           University Campus   140     122       18       82%               18%     82%
Experiment 04   Untrained   2          Female         University Campus   108     95        13       87%               13%     87%
Experiment 05   Trained     3          Female         University floor    126     115       11       89%               11%     89%
Experiment 06   Trained     2          Male           Closed Room         128     127       1        99%               1%      99%

Input device in all experiments: Microphone

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment      Data set   Speakers (male/female)   Total words   Correct   Errors   Percent correct   Error     Accuracy
Experiment 01   Trained    3 (3/0)                  210           194       16       85.71%            15.71%    85.71%
Experiment 02   Trained    3 (2/1)                  210           188       22       77.14%            22%       77.14%
Experiment 03   Trained    3 (2/1)                  210           182       28       72.38%            28%       72.38%
Experiment 04   Trained    3 (2/1)                  210           194       16       84.44%            16%       84.44%
Experiment 05   Trained    3 (1/2)                  210           184       26       74.76%            26%       74.76%

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

For training the acoustic model we need a dictionary file (.dic). This file contains all training words together with their pronunciations. While training the acoustic model experimentally, we had to write these pronunciations by hand several times. Over time we thought of an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we made a database in which all phonemes and their corresponding letters are stored. We took help from the IPA chart and various thesis papers [1] [9] [10] to make this database. However, various sources define the phonemes in different ways, and a phoneme is not defined for every letter. That is why we defined the phonemes for some consonants and conjunct letters ourselves. After making this database we built the phonetic translator, which gives the phonetic translation of Bengali words with the help of this database.

Fig 4.1.2: Dictionary files with phonetic translation
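The lookup idea behind the phonetic translator can be sketched as follows. This is only an illustration under stated assumptions: the letter-to-phoneme table is a tiny in-memory map instead of the project database, the phoneme symbols (K, L, M, A, O) are hypothetical, and conjunct letters are not handled. The rule of inserting the inherent vowel between two consecutive consonants follows the behaviour of the project's pronunciation generator.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: per-letter phoneme lookup with an inherent vowel between consonants. */
class PhoneticSketch {

    // Hypothetical letter-to-phoneme table (the project keeps this in a database).
    static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K"); // BENGALI LETTER KA
        PHONEMES.put('\u09B2', "L"); // BENGALI LETTER LA
        PHONEMES.put('\u09AE', "M"); // BENGALI LETTER MA
        PHONEMES.put('\u09BE', "A"); // BENGALI VOWEL SIGN AA
    }

    // Assumed symbol for the inherent vowel.
    static final String INHERENT_VOWEL = "O";

    /** True for code points in the Unicode range KA..HA of the Bengali block. */
    static boolean isConsonant(char ch) {
        return ch >= '\u0995' && ch <= '\u09B9';
    }

    /** Returns the space-separated phonemes of a word; unknown letters are skipped. */
    static String pronounce(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            String phoneme = PHONEMES.get(ch);
            if (phoneme == null) {
                continue; // letter missing from the table
            }
            append(sb, phoneme);
            boolean nextIsConsonant =
                    i + 1 < word.length() && isConsonant(word.charAt(i + 1));
            if (isConsonant(ch) && nextIsConsonant) {
                append(sb, INHERENT_VOWEL);
            }
        }
        return sb.toString();
    }

    /** One line of the dictionary file: the word followed by its phonemes. */
    static String dictionaryLine(String word) {
        return word + " " + pronounce(word);
    }

    private static void append(StringBuilder sb, String phoneme) {
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(phoneme);
    }

    public static void main(String[] args) {
        String kolom = "\u0995\u09B2\u09AE"; // KA LA MA
        System.out.println(dictionaryLine(kolom)); // phoneme part: K O L O M
    }
}
```

Writing the letters as Unicode escapes keeps the source file independent of the editor encoding, which matters because the project reports encoding problems with this tool.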

4.1.3 Training File Creator

For training the acoustic model we also need a fileids file and a transcript file. These two files contain the training audio file paths and their corresponding sentences. Before creating this program we had to make these two files manually, but with our software we can generate both files automatically within moments. For example, if we have 8000 audio files, we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now we can create these two files automatically by providing the root folder name of the audio files and the sentence corpus file to the software as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
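The two generated files can be illustrated with a small sketch. It assumes the usual SphinxTrain layout, in which each fileids line is an audio path without the .wav extension and each transcript line has the form `<s> sentence </s> (utterance_id)`. The class name, folder name, file names and sentences below are placeholders, and the sentences are reused cyclically when there are more audio files than corpus lines, as in the project's transcript creator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: building fileids and transcript lines from audio file names. */
class TrainingFilesSketch {

    /** Drops a trailing ".wav" so the name can serve as an utterance id. */
    static String stripWav(String fileName) {
        return fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
    }

    /** One fileids line per audio file: folder path plus the name without extension. */
    static List<String> fileIds(String folder, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (String name : wavNames) {
            lines.add(folder + "/" + stripWav(name));
        }
        return lines;
    }

    /** One transcript line per audio file; corpus sentences are reused cyclically. */
    static List<String> transcript(List<String> wavNames, List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size()).trim();
            lines.add("<s> " + sentence + " </s> (" + stripWav(wavNames.get(i)) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder inputs; real runs would list the audio folder and read the corpus file.
        List<String> wavs = Arrays.asList("s1.wav", "s2.wav", "s3.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        fileIds("sbs_asr_train/speaker_1", wavs).forEach(System.out::println);
        transcript(wavs, corpus).forEach(System.out::println);
    }
}
```

Keeping the file listing and the line formatting separate makes the formatting easy to check without touching the file system.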

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.
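One way to connect the recognized text to desktop actions is a lookup table from command phrases to actions. The sketch below is an assumption about how such a dispatcher could look and is not the project's actual code. The command phrases are English stand-ins for the Bengali command words, and each action only records its name, where the real application would call `java.awt.Robot` or start a process.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: mapping recognized command phrases to actions. */
class CommandDispatchSketch {

    private final Map<String, Runnable> commands = new HashMap<>();

    /** Names of the actions that were triggered, in order. */
    final List<String> performed = new ArrayList<>();

    CommandDispatchSketch() {
        // Placeholder phrases; the real application would use Bengali words here.
        commands.put("left click", () -> performed.add("leftClick"));
        commands.put("right click", () -> performed.add("rightClick"));
        commands.put("open my computer", () -> performed.add("openMyComputer"));
    }

    /** Runs the action for the recognized text; returns false if no command matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        dispatcher.dispatch("open my computer");
        dispatcher.dispatch("unknown words");
        System.out.println(dispatcher.performed); // prints "[openMyComputer]"
    }
}
```

A table of this kind keeps the recognition loop free of long if-else chains, and a new command only needs one more entry.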

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in several specific areas. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for new students, with 185 sentences. It was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems in creating the dictionary file, because our Bengali phoneme list is not accurately defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or mispronounces words. We created a program to generate the pronunciation of a Bengali word automatically, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciation, we can expect good output from this software only once the phoneme set is accurate.
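The exact cause of the encoding problem is not recorded here. One frequent source of such problems in Java is reading or writing Bengali text with the platform default charset instead of UTF-8. The sketch below assumes the word list is stored as UTF-8 and names the charset explicitly on both sides; the class name Utf8Lines is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative UTF-8 file helpers for Bengali word lists. */
final class Utf8Lines {

    /** Reads all lines as UTF-8 and trims surrounding whitespace. */
    static List<String> read(Path file) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line.trim());
            }
        }
        return lines;
    }

    /** Writes one entry per line as UTF-8. */
    static void write(Path file, List<String> lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("words", ".txt");
        List<String> words = new ArrayList<String>();
        words.add("\u0986\u09AE\u09BF"); // the Bengali word "ami" (I), written with escapes
        write(tmp, words);
        System.out.println(read(tmp).equals(words));
        Files.delete(tmp);
    }
}
```

If the database connection also handles Bengali text, its character encoding setting has to agree with the files as well.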

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the amount of data to build a complete model, and we plan to improve recognition accuracy and enlarge the vocabulary. We have also developed software for creating the dictionary file, the fileids file and the transcription file. We want to build a user-friendly stand-alone GUI application for writing Bengali, and we intend to develop a complete desktop command application for Bengali. Training and model creation still depend on the developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate this system into document applications so that a Bengali sentence can be written just by uttering it. We want to build a voice response application in which the user asks a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools for recognizing the Bengali language. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMU Sphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^
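The consonant rows above can be held in a simple lookup table. The sketch below is an illustration only: it loads six of the rows, writes the Bengali letters as Unicode escapes, and handles plain consonants without vowel signs or conjuncts. The class name ConsonantChart and the method toCodes are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative lookup from a Bengali consonant to its chart code. */
final class ConsonantChart {
    private static final Map<Character, String> CODES = new LinkedHashMap<Character, String>();

    static {
        CODES.put('\u0995', "K");  // ka
        CODES.put('\u0996', "KH"); // kha
        CODES.put('\u0997', "G");  // ga
        CODES.put('\u09AE', "M");  // ma
        CODES.put('\u09B0', "R");  // ra
        CODES.put('\u09B2', "L");  // la
    }

    /** Maps each known consonant to its code; characters not in the table are skipped. */
    static String toCodes(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String code = CODES.get(word.charAt(i));
            if (code == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ka + la + ma
        System.out.println(toCodes("\u0995\u09B2\u09AE")); // prints "K L M"
    }
}
```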

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0
এক 1
দুই 2
তিন 3
চার 4
পাঁচ 5
ছয় 6
সাত 7
আট 8
নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we trained and recognized only 50 of them.

No Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        // Level 1: speaker folders directly under the root.
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            // Level 2: numbered sub-folders, sorted by their number.
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                // Level 3: the audio files; k carries on so each file is written once.
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds sub-folder names to the level 1 or level 2 list and file names to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {

    // Sorts folder names that share one prefix (taken from the first name) by their number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort
        }
        // Keep only the digits of each name.
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Folder name without its number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
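The numeric sort matters because a plain string sort puts a folder such as s10 before s2. The sketch below shows the same idea with a Comparator, which leaves the original names untouched instead of rebuilding them from a prefix. It is an alternative illustration, not the project's code; the class name NumericNameSort and the sample folder names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative numeric ordering of names such as s1, s2, s10. */
final class NumericNameSort {

    /** Returns the digits of a name as a number, or 0 when the name has no digits. */
    static int numberIn(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a copy of the list ordered by the number embedded in each name. */
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, new Comparator<String>() {
            public int compare(String a, String b) {
                return Integer.compare(numberIn(a), numberIn(b));
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("s10", "s2", "s1"))); // prints [s1, s2, s10]
    }
}
```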

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // Start again from the first sentence after the last one.
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees "
                    + "where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter is a consonant, i.e. its id in banglatab is between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab "
                    + "where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Hasanta (U+09CD), the sign that joins consonants into a conjunct.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();

        // Record the positions around every hasanta: the letter before, the sign, the letter after.
        k = 0;
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }

        // Record the positions that are not part of any conjunct.
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // Collect consecutive positions into one conjunct string.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                } // conjuct adding end
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // Look up the pronunciation of every collected letter or conjunct.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
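The conjunct handling above depends on finding the hasanta sign (U+09CD) that joins consonants. The sketch below is a simplified illustration of that idea, not the project's algorithm: it splits a word into lookup units so that consonants joined by a hasanta stay together. The class name ConjunctSplitter is hypothetical, and the sample word is written with Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative split of a Bengali word into single characters and conjunct clusters. */
final class ConjunctSplitter {
    static final char HASANTA = '\u09CD';

    static List<String> split(String word) {
        List<String> units = new ArrayList<String>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Extend the unit while a hasanta is followed by another character.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // sha, ka + hasanta + ta, i-sign: the word "shakti"
        List<String> units = split("\u09B6\u0995\u09CD\u09A4\u09BF");
        System.out.println(units.size()); // prints 3
    }
}
```

Each unit can then be looked up once, whether it is a single letter or a whole conjunct.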

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

Bengali Speech Recognition

88

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

Bengali Speech Recognition

89

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

// Voice commands: "porer window" (next window) | "ager window" (previous window)
public void altTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_TAB);
}

// Voice commands: "porer tab" (next tab) | "ager tab" (previous tab)
public void ctlTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_TAB);
}

// Start Notepad as a separate process
public void openNotepad() throws AWTException, IOException {
    ProcessBuilder proc = new ProcessBuilder("notepad.exe");
    Process p = proc.start();
}

// Open the default browser on the given page (Windows only)
public void openBrowser() throws AWTException, IOException {
    String theUrl = "http://www.google.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openFacebook() throws AWTException, IOException {
    String theUrl = "http://www.facebook.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openYahoo() throws AWTException, IOException {
    String theUrl = "http://www.yahoo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openTechtunes() throws AWTException, IOException {
    String theUrl = "http://www.techtunes.com.bd";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openProthomAlo() throws AWTException, IOException {
    String theUrl = "http://www.prothom-alo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" + "\n");
    }

    // Maps the recognized text to a desktop action
    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }


3.1 Testing and Performance Evaluation

We tested our model in various environments, such as an open room, a closed room, the university lab room and a common room. For the tests with recorded audio we used input from six test speakers. For live testing we used a microphone in the different environments [9]. We completed our tests using two different decoders:

1. Pocket Sphinx

2. Sphinx4

3.2 Test Results with Pocket Sphinx

All experiments use the trained data set.

Experiment No   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct (%)   Error (%)   Accuracy (%)
Experiment 01   5 (4/1)                  1025          975       98       95.12                 9.56        90.44
Experiment 02   3 (3/0)                  615           591       39       96.10                 6.34        93.66
Experiment 03   3 (3/0)                  615           601       22       97.72                 3.58        96.42
Experiment 04   5 (2/3)                  1025          938       184      91.51                 17.95       82.05
Experiment 05   3 (0/3)                  615           545       125      88.62                 20.33       79.67

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
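The percentages in the Pocket Sphinx results table are consistent with the usual word-level scoring: percent correct is the share of reference words recognized correctly, the error figure is the number of errors divided by the number of reference words, and accuracy is 100 minus that error figure. The sketch below reproduces the Experiment 01 values under that assumption. The class and method names are illustrative and are not part of our project code.

```java
import java.util.Locale;

// Illustrative helper (not from the project source): word-level scoring
// as it appears to be used in the Pocket Sphinx results table.
class RecognitionScore {

    /** Percentage of reference words that were recognized correctly. */
    static double percentCorrect(int totalWords, int correctWords) {
        return 100.0 * correctWords / totalWords;
    }

    /** Errors as a percentage of the reference words. */
    static double errorRate(int totalWords, int errors) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy taken as 100 minus the error percentage. */
    static double accuracy(int totalWords, int errors) {
        return 100.0 - errorRate(totalWords, errors);
    }

    /** Formats a percentage with two decimals, independent of the system locale. */
    static String format(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        // Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.println("Percent correct = " + format(percentCorrect(1025, 975)));
        System.out.println("Error = " + format(errorRate(1025, 98)));
        System.out.println("Accuracy = " + format(accuracy(1025, 98)));
    }
}
```

The error count (98) is larger than the number of words that were not recognized correctly (1025 - 975 = 50), presumably because inserted words are also counted as errors. This is why accuracy is lower than percent correct.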

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Input device: Microphone (all experiments)

Experiment No   User Type   Speakers   Speaker Type   Environment         Words   Correct Words   Errors   Percent of Correct (%)   Errors (%)   Accuracy (%)
Experiment 01   Trained     2          Male           Closed Room         156     154             2        98                       2            98
Experiment 02   Untrained   3          Male           Lab Room            130     120             10       90                       10           90
Experiment 03   Untrained   3          Male           University Campus   140     122             18       82                       18           82
Experiment 04   Untrained   2          Female         University Campus   108     95              13       87                       13           87
Experiment 05   Trained     3          Female         University floor    126     115             11       89                       11           89
Experiment 06   Trained     2          Male           Closed Room         128     127             1        99                       1            99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

All experiments use the trained data set.

Experiment No   Speakers (Male/Female)   Total Words   Correct   Errors   Percent Correct (%)   Error (%)   Accuracy (%)
Experiment 01   3 (3/0)                  210           194       16       85.71                 15.71       85.71
Experiment 02   3 (2/1)                  210           188       22       77.14                 22          77.14
Experiment 03   3 (2/1)                  210           182       28       72.38                 28          72.38
Experiment 04   3 (2/1)                  210           194       16       84.44                 16          84.44
Experiment 05   3 (1/2)                  210           184       26       74.76                 26          74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary (.dic) file that lists every training word with its pronunciation. While training acoustic models experimentally, we wrote these pronunciations by hand several times. Over time we saw the need for an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we built a database that stores all phonemes with their corresponding letters. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and phonemes are not defined for every letter. We therefore defined some phonemes ourselves for some consonants and conjunct letters. With this database in place, the phonetic translator produces the phonetic translation of Bengali words.

Fig 4.1.2 Dictionary files with phonetic translation
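The lookup idea behind the phonetic translator can be sketched as follows. This is a simplified illustration and not our actual implementation: it replaces the database with an in-memory map, it contains only a few consonants from the Unicode to IPA chart in the appendix, and it ignores vowels, vowel signs and conjuncts. The class name PhoneticLookupSketch and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a letter-to-phoneme lookup. The map holds only a few
// consonants taken from the appendix chart; it stands in for the database.
class PhoneticLookupSketch {

    private static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K");   // Bengali letter KA
        PHONEMES.put('\u0996', "KH");  // Bengali letter KHA
        PHONEMES.put('\u0997', "G");   // Bengali letter GA
        PHONEMES.put('\u09AC', "B");   // Bengali letter BA
        PHONEMES.put('\u09AE', "M");   // Bengali letter MA
        PHONEMES.put('\u09B2', "L");   // Bengali letter LA
    }

    /** Maps each known letter to its phoneme; unknown characters are skipped. */
    static String translate(String word) {
        StringBuilder phonemes = new StringBuilder();
        for (char letter : word.toCharArray()) {
            String phoneme = PHONEMES.get(letter);
            if (phoneme != null) {
                if (phonemes.length() > 0) {
                    phonemes.append(' ');
                }
                phonemes.append(phoneme);
            }
        }
        return phonemes.toString();
    }

    public static void main(String[] args) {
        String word = "\u0995\u09B2\u09AE"; // the letters KA, LA, MA
        // A dictionary line is the word followed by its phoneme sequence.
        System.out.println(word + " " + translate(word));
    }
}
```

A complete translator also has to handle the inherent vowel, vowel signs and conjunct letters, which is where the database and the personally defined phonemes come in.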

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files by hand. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now the software creates these two large files automatically within moments when it is given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
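As a rough illustration of what the training file creator writes, the sketch below builds one fileids line and one transcript line per audio file. It assumes the layout commonly used for SphinxTrain input: a fileids entry is the audio path without the .wav extension, and a transcript entry is the sentence between <s> and </s> followed by the utterance name in parentheses. The folder names, file names and romanized sentences are hypothetical examples and are not taken from our corpus.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the two training files; not the project code.
class TrainingFilesSketch {

    /** Fileids entry: the audio path relative to the root, without ".wav". */
    static String fileId(String relativeWavPath) {
        return relativeWavPath.endsWith(".wav")
                ? relativeWavPath.substring(0, relativeWavPath.length() - 4)
                : relativeWavPath;
    }

    /** Transcript entry: the sentence between <s> and </s>, then the utterance name. */
    static String transcriptLine(String sentence, String wavName) {
        return "<s> " + sentence.trim() + " </s> (" + fileId(wavName) + ")";
    }

    public static void main(String[] args) {
        // Hypothetical folder layout and sentences, for illustration only.
        String[] wavPaths = {"speaker_1/set_1/utt_001.wav", "speaker_1/set_1/utt_002.wav"};
        String[] sentences = {"shuvo sokal", "dhonnobad"};

        List<String> fileids = new ArrayList<>();
        List<String> transcript = new ArrayList<>();
        for (int i = 0; i < wavPaths.length; i++) {
            fileids.add(fileId(wavPaths[i]));
            String wavName = wavPaths[i].substring(wavPaths[i].lastIndexOf('/') + 1);
            transcript.add(transcriptLine(sentences[i], wavName));
        }
        fileids.forEach(System.out::println);
        transcript.forEach(System.out::println);
    }
}
```

Because both files are produced in the same loop, line n of the fileids file and line n of the transcript file always refer to the same recording, which is the property that is hard to maintain when the files are written by hand.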

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click and right click.
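One way to connect recognized text to desktop actions is a lookup table from command phrase to action, sketched below. This is an illustration of the idea and not the structure of our application. The class CommandDispatchSketch is hypothetical, and the actions only print a message here; in the real application they would trigger key presses or mouse clicks through java.awt.Robot.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: maps a recognized phrase to an action.
class CommandDispatchSketch {

    private final Map<String, Runnable> commands = new HashMap<>();

    /** Registers the action to run when the given phrase is recognized. */
    void register(String phrase, Runnable action) {
        commands.put(phrase, action);
    }

    /** Runs the action for the recognized text; returns false if none is registered. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        // Romanized command phrases used as examples.
        dispatcher.register("sokrio", () -> System.out.println("press Enter"));
        dispatcher.register("porer window", () -> System.out.println("press Alt+Tab"));

        dispatcher.dispatch("porer window");
        if (!dispatcher.dispatch("unknown phrase")) {
            System.out.println("no command for this phrase");
        }
    }
}
```

With a table like this, adding a new voice command means registering one more entry instead of extending a chain of if statements.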

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific tasks. The system was built on a small data set because of time constraints. We selected the domain of university admission information for new students, with 185 sentences. It was difficult, however, to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results.

We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali: different researchers report different numbers.

The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, and it sometimes fails when the speech is slow or the pronunciation is poor.

We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once we have an accurate set of phonemes.
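We have not pinned down the exact cause of the encoding problem. One common cause with Bengali text in Java is reading a file with the platform default character set instead of the encoding in which the file was saved. The sketch below shows how a word list could be read with an explicit encoding. It assumes the input file is saved as UTF-8, and the class Utf8WordListSketch is a hypothetical example that is not part of our project code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: read a word list with an explicit UTF-8 decoder
// instead of relying on the platform default character set.
class Utf8WordListSketch {

    /** Returns the non-empty, trimmed lines of the file, decoded as UTF-8. */
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file with one Bengali word, then read it back.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, "\u0995\u09B2\u09AE\n".getBytes(StandardCharsets.UTF_8));
        List<String> words = readWords(tmp);
        System.out.println(words.size() + " word(s), first has " + words.get(0).length() + " characters");
        Files.delete(tmp);
    }
}
```

The same care is needed when writing the dictionary file, so that the phonetic translator and the trainer agree on the encoding.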

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the data size to build a complete model, and we plan to improve the recognition accuracy and enlarge the vocabulary. We have also developed software for creating the dictionary file, the fileids file and the transcription file.

We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that can be used by an ordinary user. With this trainer a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it.

We want to make a voice response application, in which the user asks the software a question and the software gives the predefined answer to that question.

A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected through this website we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but for our mother tongue, Bengali, this field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing the Bengali language. In this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and audio with low noise from many speakers. No speech recognizer is 100% accurate yet, but if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID   Name      Age   Gender   District       Environment   Institution/Other
02           Bappy     23    Male     Sylhet         Closed Room   Leading University
03           Bijoy     23    Male     Moulovibazar   Closed Room   MC College
04           Dola      24    Female   Chittagong     Department    Leading University
05           Falguny   24    Female   Sylhet         Open Space    Leading University
06           Lovely    20    Female   Sylhet         Class Room    Lotifa Shofi Chowdhury Mohila College
07           Mazed     23    Male     Moulovibazar   Closed Room   Leading University
08           Moni      23    Female   Sylhet         Lab           Leading University
09           Pinku     23    Male     Sylhet         Closed Room   Sylhet Govt College
10           Polash    24    Male     Feni           Cafeteria     Leading University
11           Pritom    20    Male     Sylhet         Closed Room   Leading University
12           Razib     23    Male     Sylhet         Cafeteria     Leading University
13           Rumi      22    Female   Sylhet         Lab           Leading University
14           Sanjoy    23    Male     Sylhet         Closed Room   Leading University
15           Shimul    23    Male     Sylhet         Closed Room   Leading University
16           Sumit     22    Male     Shunamgonj     Closed Room   Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but we used only 50 of them for recognition because of the short time available.

No. Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 73 Corpus about University Admission Information


CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Folder names of the first and second level, and audio file names of the third level
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues with the newly listed files
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }

        // Write the transcript for all audio files that were listed
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds the folders (level 1 and 2) or files (level 3) found in path to the matching list
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as name1, name2, name10 by their number
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }

        // Keep only the digits of every name
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }

        // Sort the numbers
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);

        // Rebuild the names from the prefix of the first folder and the sorted numbers
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
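The sorting above rebuilds every name from the prefix of the first folder, so it assumes that all folders share the same prefix. A comparator-based variant, sketched here as a possible alternative and not as part of the project, keeps the original names and only compares their numbers. The class NumericSuffixSortSketch and the folder names in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative alternative: order folder names by the number they contain.
class NumericSuffixSortSketch {

    /** The digits contained in the name, read as one number; 0 if there are none. */
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a new list sorted by number; the input list is not modified. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSortSketch::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        // Plain alphabetical sorting would put "set10" before "set2".
        System.out.println(sortByNumber(Arrays.asList("set10", "set2", "set1")));
        // prints [set1, set2, set10]
    }
}
```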

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences of the corpus, one per line
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads all corpus sentences into inputLines
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file; the corpus sentences are reused in a cycle
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per entry
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}


PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    // Load the MySQL driver once
    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is stored as a consonant (ids 13 to 48) in the banglatab table
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

........................................................................

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is missing from the printed listing; the hasanta
    // sign (U+09CD), which joins consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the three positions (consonant, joining sign, consonant)
        // that belong to each conjunct.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two adjacent consonants: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                                ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // Look up the phoneme of every collected letter or conjunct in the database.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
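The conjunct-grouping step of the listing above can be illustrated with a much smaller sketch. It assumes that a conjunct is written as consonant + hasanta (U+09CD) + consonant, which is how Unicode encodes Bengali conjuncts. The names `ConjunctSplitter` and `split` are hypothetical and are not part of the project code, and the sketch does not reproduce the special handling of র or the inherent-vowel insertion.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctSplitter {
    // Hasanta (virama) sign, U+09CD: joins consonants into a conjunct.
    static final char HASANTA = (char) 0x09CD;

    /** Splits a word into lookup units: single characters or whole conjunct clusters. */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Extend the unit while a hasanta is followed by another character.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // The word "bishwo": BA, vowel sign I, SHA, hasanta, BA.
        String word = new String(new char[] {0x09AC, 0x09BF, 0x09B6, 0x09CD, 0x09AC});
        System.out.println(split(word).size()); // three units: BA, sign I, SHA+hasanta+BA
    }
}
```

Each returned unit could then be looked up in a letter-to-phoneme table in the same way the listing queries `banglatab`.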

........................................................................

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------


Experiment No   Data Set   Speakers   Male   Female   Total Words   Correct   Errors   Percent Correct   Error    Accuracy
Experiment 01   Trained    5          4      1        1025          975       98       95.12%            9.56%    90.44%
Experiment 02   Trained    3          3      0        615           591       39       96.10%            6.34%    93.66%
Experiment 03   Trained    3          3      0        615           601       22       97.72%            3.58%    96.42%
Experiment 04   Trained    5          2      3        1025          938       184      91.51%            17.95%   82.05%
Experiment 05   Trained    3          0      3        615           545       125      88.62%            20.33%   79.67%

Table 3.2.1 Experimental details with Results for Pocket Sphinx

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44%
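The percentages in Table 3.2.1 can be reproduced from the raw counts. The following is a minimal sketch and not the scoring tool used in the experiments. It assumes the relations that the tabulated values appear to follow: percent correct = correct / total words, error = errors / total words, and accuracy = 100 − error. The class and method names are hypothetical.

```java
import java.util.Locale;

class RecognitionMetrics {
    /** Percentage of the total words that were recognized correctly. */
    static double percentCorrect(int correct, int totalWords) {
        return 100.0 * correct / totalWords;
    }

    /** Errors expressed as a percentage of the total words. */
    static double errorRate(int errors, int totalWords) {
        return 100.0 * errors / totalWords;
    }

    /** Accuracy as used in the table: 100 minus the error percentage. */
    static double accuracy(int errors, int totalWords) {
        return 100.0 - errorRate(errors, totalWords);
    }

    /** Arithmetic mean of several accuracy values. */
    static double average(double... values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Experiment 01: 1025 words, 975 correct, 98 errors.
        System.out.printf(Locale.ROOT, "%.2f %.2f %.2f%n",
                percentCorrect(975, 1025), errorRate(98, 1025), accuracy(98, 1025));
        // prints "95.12 9.56 90.44"
    }
}
```

Averaging the five accuracy values with `average(90.44, 93.66, 96.42, 82.05, 79.67)` gives 88.448, which the report states as 88.44.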


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Input device for all experiments: Microphone

Experiment No   User Type   Speakers   Environment         Speaker Type   Words   Correct   Errors   Percent Correct (%)   Errors (%)   Accuracy (%)
Experiment 01   Trained     2          Closed Room         Male           156     154       2        98                    2            98
Experiment 02   Untrained   3          Lab Room            Male           130     120       10       90                    10           90
Experiment 03   Untrained   3          University Campus   Male           140     122       18       82                    18           82
Experiment 04   Untrained   2          University Campus   Female         108     95        13       87                    13           87
Experiment 05   Trained     3          University floor    Female         126     115       11       89                    11           89
Experiment 06   Trained     2          Closed Room         Male           128     127       1        99                    1            99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%


3.3.2 Input Type: Audio

Experiment No   Data Set   Speakers   Male   Female   Total Words   Correct   Errors   Percent Correct (%)   Error   Accuracy (%)
Experiment 01   Trained    3          3      0        210           194       16       85.71                 15.71   85.71
Experiment 02   Trained    3          2      1        210           188       22       77.14                 22      77.14
Experiment 03   Trained    3          2      1        210           182       28       72.38                 28      72.38
Experiment 04   Trained    3          2      1        210           194       16       84.44                 16      84.44
Experiment 05   Trained    3          1      2        210           184       26       74.76                 26      74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%


Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a dictionary file (.dic), which lists every training word together with its pronunciation. While training the acoustic model experimentally, we had to write these pronunciations by hand several times. This led us to the idea of an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First, we built a database that stores all phonemes and their corresponding letters. We used the IPA chart and several thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and not every letter has a defined phoneme. For that reason we defined the phonemes of some consonants and conjunct letters ourselves. With the database in place, we built the phonetic translator, which produces the phonetic translation of Bengali words by looking them up in this database.

Fig 4.1.2 Dictionary files with phonetic translation
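The lookup idea behind the phonetic translator can be sketched as follows. This is a simplified illustration, not the project code: it assumes an in-memory map in place of the MySQL table, contains only three consonants taken from the Unicode to IPA chart in the appendix, and ignores conjuncts and inherent vowels. The names `PhoneticLookup` and `dictionaryLine` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PhoneticLookup {
    // Letter-to-phoneme table; a small in-memory stand-in for the database.
    static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put((char) 0x0995, "K"); // ক
        PHONEMES.put((char) 0x09B2, "L"); // ল
        PHONEMES.put((char) 0x09AE, "M"); // ম
    }

    /** Builds one dictionary line: the word followed by its phonemes. */
    static String dictionaryLine(String word) {
        StringBuilder line = new StringBuilder(word);
        for (char c : word.toCharArray()) {
            String phoneme = PHONEMES.get(c);
            if (phoneme == null) {
                throw new IllegalArgumentException(
                        "No phoneme defined for U+" + String.format("%04X", (int) c));
            }
            line.append(' ').append(phoneme);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // The three letters ক, ল, ম in sequence.
        String word = new String(new char[] {0x0995, 0x09B2, 0x09AE});
        System.out.println(dictionaryLine(word)); // the word followed by "K L M"
    }
}
```

Failing loudly on an unknown letter is a deliberate choice in this sketch, since a silently skipped letter would produce a wrong pronunciation in the dictionary file.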

4.1.3 Training File Creator

To train the acoustic model we also need the fileids and transcription files. These two files contain the paths of the training audio files and their corresponding sentences. Before we wrote this program, we had to create both files manually. With our software we can now generate these two large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcription file. Now both files are created automatically when we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
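As an illustration of what the training file creator has to produce, the sketch below builds one fileids line and one transcription line for a single audio file. It assumes the usual SphinxTrain conventions: a fileids entry is the audio path without its extension, and a transcription entry is the sentence wrapped in `<s>` and `</s>` followed by the file name in parentheses. The class name, the example path `sanjoy/s01.wav` and the placeholder sentence are hypothetical.

```java
class TrainingFileLines {
    /** Fileids entry: the relative audio path with forward slashes and no ".wav". */
    static String fileidsLine(String relativeWavPath) {
        String path = relativeWavPath.replace('\\', '/');
        return path.endsWith(".wav") ? path.substring(0, path.length() - 4) : path;
    }

    /** Transcription entry: "<s> sentence </s> (fileName)". */
    static String transcriptionLine(String relativeWavPath, String sentence) {
        String id = fileidsLine(relativeWavPath);
        String fileName = id.substring(id.lastIndexOf('/') + 1);
        return "<s> " + sentence.trim() + " </s> (" + fileName + ")";
    }

    public static void main(String[] args) {
        System.out.println(fileidsLine("sanjoy\\s01.wav"));
        // prints "sanjoy/s01"
        System.out.println(transcriptionLine("sanjoy\\s01.wav", "SHUVO SOKAL"));
        // prints "<s> SHUVO SOKAL </s> (s01)"
    }
}
```

Walking the audio root folder and pairing each file with its sentence from the corpus file would then amount to calling these two methods once per file.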

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click, right click, etc.
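One way to connect recognized text to desktop actions is a lookup table from command phrases to actions. The sketch below is an assumption about how such a dispatcher could look and is not the project's `giveCommand` method. The class name `CommandDispatcher` and the phrase used in the usage note are hypothetical, and the actions are plain `Runnable` objects so that the example runs without a display.

```java
import java.util.HashMap;
import java.util.Map;

class CommandDispatcher {
    private final Map<String, Runnable> actions = new HashMap<>();

    /** Associates a spoken phrase with the action to run when it is recognized. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action for the recognized text; returns false if no command matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        dispatcher.register("right click", () -> System.out.println("right click requested"));
        dispatcher.dispatch("right click");
    }
}
```

In the real application each registered action would wrap a method that simulates the corresponding mouse or keyboard event, and the phrases would be the Bengali command words.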


Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small amount of data. We selected the domain of university admission information for newly arriving students, with 185 sentences. It was difficult, however, to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali: different researchers report different numbers of phonemes. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, but it sometimes fails when the speaker talks slowly or mispronounces words. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciation output, we can expect good results from this software once an accurate set of phonemes is available.
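The report does not say what caused the encoding problem. One frequent cause in Java programs that handle Bengali text is writing bytes in one character encoding and reading them back in another. The sketch below only demonstrates that effect under this assumption. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    /** Encodes text as UTF-8 and decodes the bytes with the given charset. */
    static String roundTrip(String text, Charset readCharset) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, readCharset);
    }

    public static void main(String[] args) {
        String letter = String.valueOf((char) 0x0995); // the letter ক
        // Same charset on both sides: the letter survives.
        System.out.println(roundTrip(letter, StandardCharsets.UTF_8).equals(letter));
        // Mismatched charset: the letter is corrupted.
        System.out.println(roundTrip(letter, StandardCharsets.ISO_8859_1).equals(letter));
    }
}
```

Passing an explicit charset such as `StandardCharsets.UTF_8` wherever text is read or written avoids depending on the platform default.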

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the amount of data in order to build a complete model, and we plan to improve the system so that it recognizes speech more accurately and covers a larger vocabulary. We have also developed software for creating the dictionary file, the fileids file and the transcription file. We want to build a user-friendly, stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. At present, training and creating a model still depend on the developer, so we plan to build an automatic trainer that can be used by an ordinary user. With this automatic trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to build a voice response application as well: the user asks the software a question, and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected through this website we will enrich our recognition application.


Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools for recognizing the Bengali language. Throughout this report we have discussed our objectives, the various tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database, many speakers, and audio with low noise. No speech recognizer is 100% accurate yet. However, if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputingnet, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID   Name      Age   Gender   District       Environment   Institution/Other
02           Bappy     23    Male     Sylhet         Closed Room   Leading University
03           Bijoy     23    Male     Moulovibazar   Closed Room   MC College
04           Dola      24    Female   Chittagong     Department    Leading University
05           Falguny   24    Female   Sylhet         Open Space    Leading University
06           Lovely    20    Female   Sylhet         Class Room    Lotifa Shofi Chowdhury Mohila College
07           Mazed     23    Male     Moulovibazar   Closed Room   Leading University
08           Moni      23    Female   Sylhet         Lab           Leading University
09           Pinku     23    Male     Sylhet         Closed Room   Sylhet Govt College
10           Polash    24    Male     Feni           Cafeteria     Leading University
11           Pritom    20    Male     Sylhet         Closed Room   Leading University
12           Razib     23    Male     Sylhet         Cafeteria     Leading University
13           Rumi      22    Female   Sylhet         Lab           Leading University
14           Sanjoy    23    Male     Sylhet         Closed Room   Leading University
15           Shimul    23    Male     Sylhet         Closed Room   Leading University
16           Sumit     22    Male     Shunamgonj     Closed Room   Madan Muhon College

Fig 7.1 Speaker Profiles


Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51-185 [sentence text not recoverable from the source; only isolated fragments survive]

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()

int lineCounts = 0

String dirTreeRootName = Ctrainnign_filesbs_asr_train

String root = getRootFromPath(dirTreeRootName)

listdir(dirTreeRootName1)

Collectionssort(dirTreeLevel1)

int sizeOfdirTreeLevel1 = dirTreeLevel1size()

int i = 0j=0k=0

while(sizeOfdirTreeLevel1gti)

String path2 = dirTreeRootName+dirTreeLevel1get(i)

dirTreeLevel2clear()

listdir(path22)

dirTreeLevel2 = sortObjsortList(dirTreeLevel2)

int sizeOfdirTreeLevel2 = dirTreeLevel2size()

String path3=

while(sizeOfdirTreeLevel2gtj)

path3 = path2++dirTreeLevel2get(j)+

listdir(path33)

while(dirTreeLevel3size()gtk)

String targetPath =

root++dirTreeLevel1get(i)++dirTreeLevel2get(j)++dirTreeLevel3get(k)

targetPath = targetPathreplaceAll(wav )

writIntoFile(targetPathCtrainnign_filesbs_asr_train+sbs_asr_trainfileids)

lineCounts++

k++

j++

file listing end

j=0

i++

transcriptCreator tcObj = new transcriptCreator()

try

tcObjreadCorpus(Ctrainnign_filecorpustxt)

tcObjCreateTransCriptFile(dirTreeLevel3

Ctrainnign_filesbs_asr_trainsbs_asr_traintranscript)

catch (FileNotFoundException e)

eprintStackTrace()

int si=0

public static void listdir(String pathint Level)

File folder = new File(path)

File[] listOfFiles = folderlistFiles()

int numofL_I = 0

int numOfL = listOfFileslength

while(numOfLgtnumofL_I)

if (listOfFiles[numofL_I]isDirectory())

if(Level == 1)

dirTreeLevel1add(listOfFiles[numofL_I]getName())

else if(Level == 2)

dirTreeLevel2add(listOfFiles[numofL_I]getName())

else

if (listOfFiles[numofL_I]isFile())

dirTreeLevel3add(listOfFiles[numofL_I]getName())

numofL_I++

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

public static void writIntoFile(String dataString path)

try

FileWriter fstream = new FileWriter(pathtrue)

BufferedWriter out = new BufferedWriter(fstream)

outwrite(data+n)

outclose()

catch(IOException e)

eprintStackTrace()

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

ARRAY

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

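package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "name2", "name10" by their numeric part.
    // All names are assumed to share the non-numeric prefix of the first entry.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        // Nothing to sort; also avoids get(0) on an empty list below.
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int i = 0;
        // Keep only the digits of each name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Non-numeric part of the first name, reused as the common prefix.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // Rebuild the names in ascending numeric order.
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}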
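The numeric ordering that sortList aims for can also be obtained without rebuilding the names, by sorting the original strings with a comparator on their numeric part. The following is only a sketch of that alternative, not the code used in the project; the class name, method names and the folder names in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: sorts names by the number they contain and keeps the
// original strings intact. All names here are hypothetical.
class NumericNameSortSketch {

    /** Returns the digits of the name as an int, or 0 if it has none. */
    static int number(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a new list ordered by the numeric part of each name. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSortSketch::number));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList("speaker_10", "speaker_2", "speaker_1");
        System.out.println(sortByNumber(folders));
    }
}
```

Because the original strings are kept, this variant does not depend on every name sharing the prefix of the first entry.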

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences of the corpus, one per line, in file order.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Appends one transcript line per audio file name in data:
    // <S> sentence </S> (fileNameWithoutExtension)
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // Start again from the first sentence once the corpus is used up.
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // Literal replace, so that "." is not treated as a regex wildcard.
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word, then its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)

eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

return isBBorno

class end

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which
        // case the recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

Chart 3.2.2 Experiment Results with Pocket Sphinx

Average Accuracy = 88.44

3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Input Device: Microphone (all experiments)

Experiment No  User Type  Number of Speaker  Environment        Speaker Type  Number of Words  Correct Words  Errors  Percent of Correct  Errors  Accuracy
Experiment 01  Trained    2                  Closed Room        Male          156              154            2       98                  2       98
Experiment 02  Untrained  3                  Lab Room           Male          130              120            10      90                  10      90
Experiment 03  Untrained  3                  University Campus  Male          140              122            18      82                  18      82
Experiment 04  Untrained  2                  University Campus  Female        108              95             13      87                  13      87
Experiment 05  Trained    3                  University floor   Female        126              115            11      89                  11      89
Experiment 06  Trained    2                  Closed Room        Male          128              127            1       99                  1       99

Table 3.3.1.1 Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2 Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83
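The average above is the arithmetic mean of the six accuracy values reported for the microphone experiments. The short sketch below only repeats that calculation; the class and method names are ours for illustration and are not part of the project code.

```java
// Illustrative check of the reported average; names are hypothetical.
class AccuracyAverage {

    /** Returns the arithmetic mean of the given accuracy percentages. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Accuracy values of Experiments 01 to 06 (microphone input).
        double[] accuracies = {98, 90, 82, 87, 89, 99};
        System.out.printf("Average accuracy = %.2f%n", mean(accuracies));
    }
}
```

The six values sum to 545, and 545 / 6 is approximately 90.83, which agrees with the reported average.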

3.3.2 Input Type: Audio

Details: Using Trained Data Set (all experiments)

Experiment No  Number of Speaker  Male  Female  Total Words  Correct  Errors  Total Percent correct  Error  Accuracy
Experiment 01  3                  3     0       210          194      16      85.71                  15.71  85.71
Experiment 02  3                  2     1       210          188      22      77.14                  22     77.14
Experiment 03  3                  2     1       210          182      28      72.38                  28     72.38
Experiment 04  3                  2     1       210          194      16      84.44                  16     84.44
Experiment 05  3                  1     2       210          184      26      74.76                  26     74.76

Table 3.3.2.1 Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize spoken sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a dictionary file (.dic) that lists every training word together with its pronunciation. While experimenting with acoustic model training we wrote these pronunciations by hand several times. Over time this led us to think about generating the pronunciations, or phonetic translations, automatically, and this software implements that idea. First we built a database that stores all phonemes and their corresponding letters. We used the IPA chart and several thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and phonemes are not defined for every letter. For that reason we defined the phonemes for some consonants and conjunct letters ourselves. With the database in place, we wrote the phonetic translator, which produces the phonetic translation of a Bengali word by looking up its letters in the database.

Fig 4.1.2 Dictionary files with phonetic translation
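The lookup idea behind the phonetic translator can be sketched in a few lines of Java. This is a simplified illustration under our own assumptions, not the project code: an in-memory map stands in for the database, the three entries and their phoneme labels are examples only, and conjunct letters and the inherent vowel are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a letter-to-phoneme lookup. The table entries and the
// phoneme labels are illustrative assumptions, not the project's database.
class PhoneticLookupSketch {

    private static final Map<String, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put("\u0995", "K");  // Bengali letter ka
        PHONEMES.put("\u09B2", "L");  // Bengali letter la
        PHONEMES.put("\u09BE", "AA"); // Bengali vowel sign aa
    }

    /** Translates a word letter by letter; unknown letters become "?". */
    static String translate(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); ) {
            int cp = word.codePointAt(i);
            String letter = new String(Character.toChars(cp));
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(PHONEMES.getOrDefault(letter, "?"));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "\u0995\u09BE\u09B2"; // ka + aa + la
        // One dictionary line: the word followed by its phonemes.
        System.out.println(word + " " + translate(word));
    }
}
```

The letters are written as Unicode escapes so that the sketch compiles regardless of the source file encoding.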

4.1.3 Training File Creator

To train the acoustic model we also need a fileids file and a transcript file. These two files hold the paths of the training audio files and the sentence spoken in each of them. Before we wrote this program we had to create both files by hand; with it we can generate them automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically once we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
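The pairing of audio files with corpus sentences can be sketched as follows. This is an illustration under stated assumptions rather than the project code: it builds the lines in memory instead of writing files, it uses the lowercase `<s> ... </s> (file_id)` form commonly used for SphinxTrain transcripts, and the folder name, file names and sentences are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: builds the lines of a fileids file and a transcript file.
// Paths and sentences are hypothetical examples.
class TrainingFilesSketch {

    /** Drops a trailing ".wav" from a path or file name. */
    static String stripWav(String path) {
        return path.endsWith(".wav") ? path.substring(0, path.length() - 4) : path;
    }

    /** Returns the part of the path after the last '/'. */
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** One fileids line per audio file: the relative path without extension. */
    static List<String> fileidsLines(List<String> wavPaths) {
        List<String> lines = new ArrayList<>();
        for (String p : wavPaths) {
            lines.add(stripWav(p));
        }
        return lines;
    }

    /** One transcript line per audio file, cycling through the corpus sentences. */
    static List<String> transcriptLines(List<String> wavPaths, List<String> sentences) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("corpus must contain at least one sentence");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavPaths.size(); i++) {
            String sentence = sentences.get(i % sentences.size()).trim();
            String id = stripWav(baseName(wavPaths.get(i)));
            lines.add("<s> " + sentence + " </s> (" + id + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> wavs = Arrays.asList("speaker_1/s1.wav", "speaker_1/s2.wav");
        List<String> corpus = Arrays.asList("sentence one", "sentence two");
        fileidsLines(wavs).forEach(System.out::println);
        transcriptLines(wavs, corpus).forEach(System.out::println);
    }
}
```

Cycling through the corpus with the modulo operator mirrors the case where each speaker records the same list of sentences, so the sentence index restarts for every speaker folder.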

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click and right click.
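One simple way to connect recognized text to desktop actions is a lookup table from command phrase to action. The sketch below shows that idea under our own assumptions: the class name and the English placeholder phrases are hypothetical (the real application listens for Bengali words), and the actions only print a message instead of driving the mouse or keyboard.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: maps a recognized phrase to an action. The phrases are
// English placeholders; the actions just print what they would do.
class CommandDispatchSketch {

    private final Map<String, Runnable> actions = new HashMap<>();

    /** Registers an action for a command phrase. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action for the recognized text; returns false if none matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        dispatcher.register("left click", () -> System.out.println("left click"));
        dispatcher.register("open my computer", () -> System.out.println("open My Computer"));

        if (!dispatcher.dispatch("open my computer")) {
            System.out.println("Unknown command");
        }
    }
}
```

With a table like this, adding a new voice command means registering one more phrase and action, without extending a chain of string comparisons.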

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in several specific areas. Because of time constraints, the system
was built on a small amount of data. We selected the domain of university admission
information for newly admitted students, with 185 sentences. However, it was difficult to
collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of
the 185 sentences for training. More speakers are needed to get more accurate results. We also
faced problems in creating the dictionary file, because the Bengali phoneme list has not been
defined precisely and the exact number of phonemes in Bengali is not known; different
researchers report different numbers. The performance of the system depends on the speaker's
pronunciation, the environment and the microphone. It recognizes sentences accurately when the
speaker utters them loudly and clearly, but it sometimes fails when the speaker speaks slowly
or mispronounces words. We created a program that automatically generates the pronunciation of
a Bengali word, but this software does not yet work properly because of an encoding problem.
Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good
output from this software only once the number of phonemes is settled.
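The report does not say what the encoding problem was. One common cause, assumed here purely for illustration, is reading a UTF-8 text file with the platform default charset, which can corrupt Bengali characters before they reach the database lookup. The sketch below reads a corpus with an explicit UTF-8 charset; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf8CorpusReaderSketch {

    // Read all non-empty lines of a file, decoding it explicitly as UTF-8
    // instead of relying on the platform default charset.
    public static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    lines.add(trimmed);
                }
            }
        }
        return lines;
    }

    // Demonstration helper: write text to a temporary UTF-8 file,
    // read it back with readLines and delete the file again.
    public static List<String> roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("corpus", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If this assumption applies, the database connection would also need a matching character encoding, which the project's connection URL already requests.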

Future Works

We have implemented Bengali speech recognition for a small data set. In the future we will
increase the amount of data to build a complete model, and we plan to improve the system's
ability to recognize speech accurately and to enlarge its vocabulary. We have also developed
software for creating the dictionary file, the fileids file and the transcription file. We
want to build a user-friendly stand-alone GUI application for writing Bengali text. We also
intend to develop a complete desktop command application for Bengali. Training and creating a
model still depend on the developer, so we plan to build an automatic trainer that an ordinary
user can use. With this trainer, a user will be able to train any sentence with its
corresponding audio. We also want to integrate the system into various document applications
so that a Bengali sentence can be written just by uttering it. We want to build a voice
response application in which the user asks the software a question and the software gives the
predefined answer to it. A good recognition application needs a lot of audio, so we want to
develop a website for collecting audio from people and use that audio to enrich our
recognition application.


Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR
research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages,
but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried
to learn about this field and to develop some tools for recognizing Bengali speech. In this
report we have discussed our objectives, the tools we used and the process of speech
recognition. Our tools are still at a preliminary stage. A good and complete recognition
application requires many improvements, such as a large training database, audio from many
speakers and low-noise recordings. No speech recognizer is yet 100% accurate, but if we can
meet the requirements of a good recognizer and train our system more accurately, the results
should be sufficient to achieve our goal.


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009

[2] Morched Derbali, MU Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 71 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ)   IPA
ক    K
খ    KH
গ    G
ঘ    GH
ঙ    NG
চ    C
ছ    CH
জ    J
ঝ    JH
ঞ    NIO
ট    T
ঠ    TH
ড    D
ঢ    DH
ণ    N
ত    TA
থ    TO
দ    DA
ধ    DO
ন    N
প    P
ফ    PH
ব    B
ভ    BH
ম    M
য    Z
র    R
ল    L
শ    SH
ষ    SH
স    S
হ    H
ড়    RA
ঢ়    RH
য়    Y
ৎ    T``
     NG
     ^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার)   IPA
শূন্য   0
এক    1
দুই    2
তিন   3
চার   4
পাঁচ   5
ছয়    6
সাত   7
আট    8
নয়    9


Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 72 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available only 50 of them were used for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 73 Corpus about University Admission Information


CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Directory tree: root / level 1 folders / level 2 folders / audio files
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // Write one fileids line per newly listed audio file
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collect sub-folder names (levels 1 and 2) or file names (level 3)
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                } else {
                }
            }
            if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Return the name of the last folder in a path that ends with a separator
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Append one line to the given file
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

............................................................


ARRAY

............................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sort folder names such as "name12" by their numeric suffix
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of every name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Folder name without the number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

............................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Read every sentence of the corpus file into inputLines
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Write one transcript line per audio file, restarting at the first
    // sentence whenever the end of the corpus is reached
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

............................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Read the input word list, one word per line
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Write the finished dictionary text to the output file
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

............................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one "word pronunciation" line per input word
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    // Load the JDBC driver once
    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // A letter is a consonant if its id in banglatab lies between 13 and 48
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

............................................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Hasanta (virama), the character that joins the letters of a conjunct
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every hasanta
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record the positions that are not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // Append characters while the recorded positions stay adjacent
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // Look up the pronunciation of every search string in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

............................................................

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

public void close() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_F4);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_F4);
}

public void startMenu() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
}

public void refresh() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_F5);
    robot.keyRelease(KeyEvent.VK_F5);
}

public void help() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_F1);
    robot.keyRelease(KeyEvent.VK_F1);
}

public void showDesktop() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyPress(KeyEvent.VK_D);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_D);
}

public void openMyComputer() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_WINDOWS);
    robot.keyPress(KeyEvent.VK_E);
    robot.keyRelease(KeyEvent.VK_WINDOWS);
    robot.keyRelease(KeyEvent.VK_E);
}

// sokrio (activate)
public void enter() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ENTER);
    robot.keyRelease(KeyEvent.VK_ENTER);
}

// porer window | ager window (next window | previous window)
public void altTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_TAB);
}

// porer tab | ager tab (next tab | previous tab)
public void ctlTab() throws AWTException {
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_TAB);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_TAB);
}
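The shortcut methods in this listing all repeat the same press-then-release pattern. The following is a minimal sketch of how that pattern could be shared, not the project's own code: the class name KeyChord and its methods are hypothetical. It presses the keys in the given order and releases them in reverse order, a common convention for modifier combinations. The plan method only describes the event sequence, so it can be checked without a display.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project.
class KeyChord {

    // Describes the events for a chord: every key is pressed in order,
    // then released in reverse order.
    static List<String> plan(int... keyCodes) {
        List<String> events = new ArrayList<>();
        for (int code : keyCodes) {
            events.add("press:" + code);
        }
        for (int n = keyCodes.length - 1; n >= 0; n--) {
            events.add("release:" + keyCodes[n]);
        }
        return events;
    }

    // Sends the chord through an existing Robot (requires a graphical desktop).
    static void send(Robot robot, int... keyCodes) {
        for (int code : keyCodes) {
            robot.keyPress(code);
        }
        for (int n = keyCodes.length - 1; n >= 0; n--) {
            robot.keyRelease(keyCodes[n]);
        }
    }

    public static void main(String[] args) {
        // Ctrl+A, as used by selectAll(); only prints the planned events.
        System.out.println(plan(KeyEvent.VK_CONTROL, KeyEvent.VK_A));
    }
}
```

With such a helper, a method like selectAll() could reduce to a single call, for example KeyChord.send(robot, KeyEvent.VK_CONTROL, KeyEvent.VK_A), and one Robot instance could be reused instead of creating a new one per command.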

public void openNotepad() throws AWTException, IOException {
    ProcessBuilder proc = new ProcessBuilder("notepad.exe");
    Process p = proc.start();
}

public void openBrowser() throws AWTException, IOException {
    String theUrl = "http://www.google.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openFacebook() throws AWTException, IOException {
    String theUrl = "http://www.facebook.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openYahoo() throws AWTException, IOException {
    String theUrl = "http://www.yahoo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openTechtunes() throws AWTException, IOException {
    String theUrl = "http://www.techtunes.com.bd";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

public void openProthomAlo() throws AWTException, IOException {
    String theUrl = "http://www.prothom-alo.com";
    Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        giveCommand("…");

        // loop the recognition until the program exits
        String comString = "…";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals("…")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


3.3 Test Results with Sphinx4

3.3.1 Input Type: Microphone

Experiment No  User Type  Speakers  Environment        Speaker Type  Input Device  Words  Correct  Errors  Percent Correct  Errors (%)  Accuracy
Experiment 01  Trained    2         Closed Room        Male          Microphone    156    154      2       98%              2%          98%
Experiment 02  Untrained  3         Lab Room           Male          Microphone    130    120      10      90%              10%         90%
Experiment 03  Untrained  3         University Campus  Male          Microphone    140    122      18      82%              18%         82%
Experiment 04  Untrained  2         University Campus  Female        Microphone    108    95       13      87%              13%         87%
Experiment 05  Trained    3         University floor   Female        Microphone    126    115      11      89%              11%         89%
Experiment 06  Trained    2         Closed Room        Male          Microphone    128    127      1       99%              1%          99%

Table 3.3.1.1: Experimental Details with Results for Sphinx 4 Live

Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%
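The average above is the arithmetic mean of the six accuracy values reported for the live-input experiments (98, 90, 82, 87, 89, 99). A short sketch of that calculation follows; the class and method names are illustrative only and do not come from the project.

```java
import java.util.Locale;

// Illustrative only: averages the reported per-experiment accuracies.
class AccuracyAverage {

    // Arithmetic mean of the given values.
    static double mean(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Accuracy (%) of Experiments 01 to 06 with microphone input.
        double avg = mean(98, 90, 82, 87, 89, 99);
        // Prints "Average accuracy = 90.83%"
        System.out.println(String.format(Locale.ROOT, "Average accuracy = %.2f%%", avg));
    }
}
```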


3.3.2 Input Type: Audio

Experiment No  Data Set  Speakers  Male  Female  Total Words  Correct  Errors  Total Percent Correct  Error  Accuracy
Experiment 01  Trained   3         3     0       210          194      16      85.71                  15.71  85.71
Experiment 02  Trained   3         2     1       210          188      22      77.14                  22     77.14
Experiment 03  Trained   3         2     1       210          182      28      72.38                  28     72.38
Experiment 04  Trained   3         2     1       210          194      16      84.44                  16     84.44
Experiment 05  Trained   3         1     2       210          184      26      74.76                  26     74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio

Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx-4 demo application. Its purpose is to recognize sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary (.dic) file that lists every training word together with its pronunciation. While training acoustic models experimentally, we had to write these pronunciations by hand several times. This led us to think about generating the pronunciations, or phonetic translations, automatically, and this software is the implementation of that idea. First, we built a database that stores all phonemes and their corresponding letters. We used the IPA chart and various thesis papers [1] [9] [10] to build this database. However, different sources define phonemes in different ways, and not every letter has a defined phoneme. For that reason, we defined the phonemes for some consonants and conjunct letters ourselves. We then built the phonetic translator, which uses this database to produce the phonetic translation of Bengali words.

Fig 4.1.2: Dictionary files with phonetic translation
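The letter-to-phoneme lookup described in this section can be sketched as follows. This is a simplified illustration, not the project's implementation: it uses an in-memory map in place of the database, includes only a few consonant entries taken from the Unicode to IPA chart in the appendix, and ignores vowel signs and conjuncts. The class name PhonemeLookup is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: in-memory table instead of the project's database.
class PhonemeLookup {

    static final Map<Character, String> TABLE = new HashMap<>();
    static {
        // A few consonants from the appendix chart, written as Unicode escapes.
        TABLE.put('\u0995', "K");   // KA
        TABLE.put('\u0996', "KH");  // KHA
        TABLE.put('\u0997', "G");   // GA
        TABLE.put('\u09A8', "N");   // NA
        TABLE.put('\u09AC', "B");   // BA
        TABLE.put('\u09AE', "M");   // MA
        TABLE.put('\u09B0', "R");   // RA
        TABLE.put('\u09B2', "L");   // LA
    }

    // Maps each known letter to its phoneme; letters without an entry are skipped.
    static String translate(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            String phoneme = TABLE.get(c);
            if (phoneme == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phoneme);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // KA + LA + MA
        System.out.println(translate("\u0995\u09B2\u09AE"));
    }
}
```

A dictionary line would then consist of the word followed by its phoneme string, which is the layout the .dic file uses.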

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files list the paths of the training audio files and their corresponding sentences. Before we created this program, we had to write both files manually. With our software, we can now generate these two large files automatically within moments. For example, if we have 8000 audio files, we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically when we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
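As a rough illustration of the two line formats, the sketch below builds one fileids entry and one transcript line from an audio file name and its sentence. The transcript layout follows the pattern used in the project's transcript creator listing (sentence wrapped in <S> and </S>, followed by the file name without the .wav extension in parentheses). The class name and the sample file name are hypothetical.

```java
// Hypothetical helper showing the line formats of the two training files.
class TrainingLines {

    // A fileids entry is the audio path without its ".wav" extension.
    static String fileId(String wavPath) {
        if (wavPath.endsWith(".wav")) {
            return wavPath.substring(0, wavPath.length() - ".wav".length());
        }
        return wavPath;
    }

    // A transcript line pairs the sentence with the file id in parentheses.
    static String transcriptLine(String sentence, String wavName) {
        return "<S> " + sentence.trim() + " </S> (" + fileId(wavName) + ")";
    }

    public static void main(String[] args) {
        System.out.println(fileId("speaker02/set1/s1.wav"));
        System.out.println(transcriptLine(" sample sentence ", "s1.wav"));
    }
}
```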

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click, right click, etc.
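One way to connect recognized text to actions is a lookup table from command phrase to action. The sketch below is an assumption about how such a dispatcher could look, not the project's code; the class name CommandDispatcher and the phrases used in the example are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatcher: maps a recognized phrase to an action.
class CommandDispatcher {

    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a spoken phrase with the action to run.
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action for the recognized text; returns false if none matches.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        dispatcher.register("refresh", () -> System.out.println("F5 would be sent here"));
        System.out.println(dispatcher.dispatch("refresh"));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

Compared with a chain of equals checks, adding a command then only requires one more register call.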

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected a domain, university admission information for new students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or pronounces words poorly. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once we have an accurate phoneme list.

Future Works

We have implemented Bengali speech recognition for a small data set. In the future we will increase the data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to extend its vocabulary. We have also developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly, stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate. With this trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate this system into various document applications, so that a Bengali sentence can be written just by uttering it. We want to build a voice response application, in which the user asks the software a question and the software gives a predefined answer. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people. With the audio collected through this website, we will enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of research in the field of ASR is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages. For our mother tongue, Bengali, however, this field is still at a beginner level. We therefore tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be sufficient to achieve our goal.

References

[1] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0
এক 1
দুই 2
তিন 3
চার 4
পাঁচ 5
ছয় 6
সাত 7
আট 8
নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
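The directory traversal in FileidsCreator is written by hand for a fixed three-level folder layout. As a hedged alternative, the sketch below uses java.nio.file.Files.walk to collect every .wav file under a root folder, at any depth, and turn each one into a fileids-style entry (relative path, forward slashes, no extension). The class name FileIdsSketch is hypothetical, and unlike the original listing the entries here are relative to the root rather than prefixed with the root folder name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical alternative to the hand-written three-level directory listing.
class FileIdsSketch {

    // Returns one entry per .wav file below root: relative path, '/' separators,
    // extension removed, in lexicographic order.
    static List<String> collect(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The ordering here is plain string ordering, so folder names with numeric suffixes would still need a numeric comparison if their order matters.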

..........................................................

ARRAY

..........................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                    folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
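SortArrayList orders folder names by the number they contain and then rebuilds each name from the alphabetic part of the first entry, which assumes every folder shares the same prefix. A minimal sketch of the same numeric ordering with a Comparator is given below; it keeps each original name intact. The class name NumericSuffixSort and the sample names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: sorts names by the digits they contain.
class NumericSuffixSort {

    // Extracts the digits of a name as a number; names without digits count as 0.
    static int number(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a sorted copy; the input list is left unchanged.
    static List<String> sort(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericSuffixSort::number));
        return copy;
    }
}
```

For example, sorting speaker10, speaker2, speaker1 this way yields speaker1, speaker2, speaker10, whereas plain string sorting would place speaker10 before speaker2.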

..........................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> ("
                        + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

..........................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}
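FileOperator reads and writes text without naming a character encoding, so InputStreamReader and FileWriter fall back to the platform default. If the word list is stored as UTF-8 and the platform default is a different encoding, Bengali text can be decoded incorrectly. Whether this is the cause of the encoding problem mentioned in the Limitation section is not stated in the report; the sketch below only shows one way to make the encoding explicit. The class name Utf8Lines is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: file access with an explicit UTF-8 encoding.
class Utf8Lines {

    // Reads all lines as UTF-8 and trims surrounding whitespace.
    static List<String> read(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line.trim());
            }
        }
        return lines;
    }

    // Writes the text as UTF-8, replacing any existing file content.
    static void write(Path file, String data) throws IOException {
        Files.write(file, data.getBytes(StandardCharsets.UTF_8));
    }
}
```

The try-with-resources block also closes the reader when an exception occurs, which the original listing does not guarantee.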

..........................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
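isBanjonBorno opens a database connection for every character it checks. A possible lighter-weight variant, sketched below under stated assumptions, tests the character's Unicode code point instead. It assumes that "consonant" means the Bengali letters KA to HA (U+0995 to U+09B9) plus RRA, RHA and YYA (U+09DC, U+09DD, U+09DF); the exact letters covered by ids 13 to 48 in the project's banglatab table are not shown in the report and may differ. The class name BanglaLetters is hypothetical.

```java
// Hypothetical sketch: consonant test by Unicode range instead of a database query.
class BanglaLetters {

    // Assumption: consonants are U+0995..U+09B9 (KA..HA) plus RRA, RHA and YYA.
    // The range contains a few unassigned code points, which never occur in text.
    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')
                || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // KA
        System.out.println(isConsonant('\u0985')); // vowel A
    }
}
```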

..........................................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

Bengali Speech Recognition

79

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two adjacent consonants: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Append letters while the recorded positions stay consecutive.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the pronunciation of every search string and join them with spaces.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace();
            }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace();
            }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        return phoneticTrans;
    }
}

…………………………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Each method simulates one mouse or keyboard action through java.awt.Robot.
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // "porer window" | "ager window" (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // "porer tab" | "ager tab" (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                // pass the recognized text on as a command
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }


Chart 3.3.1.2: Experiment Results with Sphinx-4 Live Input

Average Accuracy = 90.83%

3.3.2 Input Type: Audio

Experiment No   Details                  Speakers (Male/Female)   Total Words   Correct   Errors   Percent correct (%)   Error   Accuracy (%)
Experiment 01   Using Trained Data Set   3 (3/0)                  210           194       16       85.71                 15.71   85.71
Experiment 02   Using Trained Data Set   3 (2/1)                  210           188       22       77.14                 22      77.14
Experiment 03   Using Trained Data Set   3 (2/1)                  210           182       28       72.38                 28      72.38
Experiment 04   Using Trained Data Set   3 (2/1)                  210           194       16       84.44                 16      84.44
Experiment 05   Using Trained Data Set   3 (1/2)                  210           184       26       74.76                 26      74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx-4 Audio


Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%
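As a quick check, the average can be recomputed from the five per-experiment accuracy values of the audio-input table, read as percentages with two decimals (85.71, 77.14, 72.38, 84.44 and 74.76). The sketch below is only an illustration; the class and method names are hypothetical and are not part of the project code.

```java
// Illustrative recomputation of the average accuracy; names are hypothetical.
class AccuracyAverage {

    // Arithmetic mean of the given accuracy percentages.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Accuracy of Experiments 01 to 05 with audio input.
        double[] accuracies = {85.71, 77.14, 72.38, 84.44, 74.76};
        System.out.printf("Average accuracy = %.3f%%%n", mean(accuracies));
    }
}
```

The five values sum to 394.43, so the mean is about 78.886, which agrees with the reported figure of 78.88 when the third decimal is dropped.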


Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx-4 demo application. Its purpose is to recognize spoken sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a .dic file, which lists every training word together with its pronunciation. While training acoustic models experimentally we had to write these pronunciations by hand several times, so we decided to build an automatic pronunciation (phonetic translation) generator, and this software is the result. First, we built a database that stores all phonemes and their corresponding letters, drawing on the IPA chart and several theses [1] [9] [10]. However, different sources define the phonemes differently, and some letters have no defined phoneme at all, so we defined phonemes ourselves for some consonants and conjunct letters. With this database in place, we wrote the phonetic translator, which uses the database to produce the phonetic translation of Bengali words.

Fig. 4.1.2: Dictionary files with phonetic translation
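The lookup idea behind the phonetic translator can be sketched as follows. This is a simplified stand-in, not the project's implementation: it replaces the database with an in-memory map, holds only six letter-to-phoneme pairs copied from the Unicode to IPA chart in the appendix, and ignores conjuncts and vowel signs. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the database-backed phonetic translator.
class PhoneticSketch {

    // Letter-to-phoneme pairs taken from the Unicode to IPA chart (consonants only).
    static final Map<Character, String> PHONEMES = new LinkedHashMap<>();
    static {
        PHONEMES.put('\u0995', "K"); // ka
        PHONEMES.put('\u09A8', "N"); // na
        PHONEMES.put('\u09AC', "B"); // ba
        PHONEMES.put('\u09AE', "M"); // ma
        PHONEMES.put('\u09B0', "R"); // ra
        PHONEMES.put('\u09B2', "L"); // la
    }

    // Returns the space-separated phonemes of a word; letters without an entry are skipped.
    static String translate(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phoneme = PHONEMES.get(word.charAt(i));
            if (phoneme == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phoneme);
        }
        return sb.toString();
    }

    // One dictionary line: the word followed by its phonemes.
    static String dictionaryLine(String word) {
        return word + " " + translate(word);
    }
}
```

For example, `PhoneticSketch.translate("\u0995\u09B2\u09AE")` yields `K L M`. Writing the letters as `\u` escapes keeps the source file ASCII-only, which sidesteps source-file encoding issues.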

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files by hand. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file and 8000 audio file names with their corresponding sentences in the transcript file. Now the software generates both files automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig. 4.1.3.1: Fileids files with phonetic translation

Fig. 4.1.3.2: Transcription File
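The generation of the two training files can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: it assumes the common SphinxTrain layout, in which each fileids line is an audio path without its extension and each transcript line has the form `<s> sentence </s> (file name)`. The class name, method names and sample path are hypothetical.

```java
// Hypothetical sketch of building one fileids line and one transcript line.
class TrainingFilesSketch {

    // fileids entry: the relative audio path with forward slashes and without ".wav".
    static String fileId(String relativeWavPath) {
        String p = relativeWavPath.replace('\\', '/');
        return p.endsWith(".wav") ? p.substring(0, p.length() - 4) : p;
    }

    // transcript entry: the sentence between <s> and </s>, followed by the file name in parentheses.
    static String transcriptLine(String sentence, String relativeWavPath) {
        String id = fileId(relativeWavPath);
        String name = id.substring(id.lastIndexOf('/') + 1);
        return "<s> " + sentence.trim() + " </s> (" + name + ")";
    }

    public static void main(String[] args) {
        String wav = "sbs_asr_train/speaker_1/file_1.wav"; // made-up path
        System.out.println(fileId(wav));
        System.out.println(transcriptLine("first sentence of the corpus", wav));
    }
}
```

Applying these two methods to every audio file under the root folder, paired with the matching corpus sentence, gives the full content of both files.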

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click, and right click.
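One way to connect recognized text to actions is a lookup table from command phrase to action. The sketch below illustrates that idea only; it is not the project's dispatch code, and the phrases used with it are placeholders, since the real application listens for Bengali command words. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: dispatch recognized text to an action through a map.
class CommandDispatchSketch {

    private final Map<String, Runnable> commands = new HashMap<>();

    // Associates a command phrase with the action to run when it is recognized.
    void register(String phrase, Runnable action) {
        commands.put(phrase, action);
    }

    // Runs the action registered for the recognized text; returns false if there is none.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        dispatcher.register("right click", () -> System.out.println("right click requested"));
        dispatcher.dispatch("right click");
    }
}
```

In a full application each registered action would call the matching mouse or keyboard method, and `dispatch` would be called with the text returned by the recognizer. Adding a command then means adding one `register` call instead of another branch in a chain of `if` statements.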


Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in several specific areas. To save time, the system was built on a small data set. We selected the domain of university admission information for new students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali: different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, but it sometimes fails when the speech is slow or poorly pronounced. We wrote a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciation, we can expect good output from this software only once the phonemes are accurately known.

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will enlarge the data set to build a complete model, and we plan to improve the recognizer's accuracy and extend its vocabulary. We have already developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly, stand-alone GUI application for writing Bengali, and we intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate; with it, the user will be able to train any sentence with its corresponding audio. We also want to integrate the system into document applications so that a Bengali sentence can be written simply by uttering it. We want to build a voice response application, in which the user asks the software a question and the software gives the predefined answer. A good recognition application needs a large amount of audio, so we want to develop a website for collecting audio from the public and use that audio to enrich our recognition application.


Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results should be sufficient to achieve our goal.


References

[1] M.A. Anusuya & S.K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, M.U. Tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L.R. Rabiner and R.W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal, S.O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID   Name      Age   Gender   District       Environment   Institution/Other
02           Bappy     23    Male     Sylhet         Closed Room   Leading University
03           Bijoy     23    Male     Moulovibazar   Closed Room   MC College
04           Dola      24    Female   Chittagong     Department    Leading University
05           Falguny   24    Female   Sylhet         Open Space    Leading University
06           Lovely    20    Female   Sylhet         Class Room    Lotifa Shofi Chowdhury Mohila College
07           Mazed     23    Male     Moulovibazar   Closed Room   Leading University
08           Moni      23    Female   Sylhet         Lab           Leading University
09           Pinku     23    Male     Sylhet         Closed Room   Sylhet Govt. College
10           Polash    24    Male     Feni           Cafeteria     Leading University
11           Pritom    20    Male     Sylhet         Closed Room   Leading University
12           Razib     23    Male     Sylhet         Cafeteria     Leading University
13           Rumi      22    Female   Sylhet         Lab           Leading University
14           Sanjoy    23    Male     Sylhet         Closed Room   Leading University
15           Shimul    23    Male     Sylhet         Closed Room   Leading University
16           Sumit     22    Male     Shunamgonj     Closed Room   Madan Muhon College

Fig. 7.1: Speaker Profiles


Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9


Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig. 7.2: Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of the limited time only 50 of them were used for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig. 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        // Walk root/level1/level2 and write one fileids line per audio file.
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                            root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // fileids entries carry no file extension
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists the entries of `path`: directory names go to the level-1 or
    // level-2 list, file names go to the level-3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

    // Appends one line of text to the file at `path`.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

........................................

ARRAY

........................................

package sbsBSRtrainingfilescreator

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names made of a common prefix followed by a number
    // according to the numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of each name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Name prefix without the number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // Rebuild the names in numeric order.
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

........................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    // Sentences of the corpus, one entry per line of the corpus file.
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file: "<S> sentence </S> (fileName)".
    // The corpus sentences are reused from the start when they run out.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

........................................

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)


public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

........................................

PHONETIC TRANSLATION

package ptpack

import javautilArrayList

public class PhoneticTranslation


public static void main(String[] args)

PronounciationGenarator pgObj = new PronounciationGenarator()

FileOperator foObj = new FileOperator()

ArrayListltObjectgt inputStrings = new ArrayListltObjectgt()

inputStrings = foObjgetStrings()

String pro =

Systemoutprintln(in phonetic translation)

int i = 0

String is =

String fileImage =

while(inputStringssize()gti)

is = inputStringsget(i)toString()trim()

pro = pgObjgetPronouciation(is)

pro = protrim()

fileImage = fileImage+is+ +pro+n

i++

Systemoutprintln(fileImage)

foObjcreateFile(fileImage)

main end

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)


eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try


conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())


return isBBorno

class end

........................................

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct


i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2


SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null


Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

........................................

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream


import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else


Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Each method simulates one mouse or keyboard action with java.awt.Robot.
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))


allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )


private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()


3.3.2 Input Type: Audio

Experiment     Data set          Speakers (Male/Female)  Total words  Correct  Errors  Percent correct  Error  Accuracy
Experiment 01  Trained data set  3 (3/0)                 210          194      16      85.71            15.71  85.71
Experiment 02  Trained data set  3 (2/1)                 210          188      22      77.14            22     77.14
Experiment 03  Trained data set  3 (2/1)                 210          182      28      72.38            28     72.38
Experiment 04  Trained data set  3 (2/1)                 210          194      16      84.44            16     84.44
Experiment 05  Trained data set  3 (1/2)                 210          184      26      74.76            26     74.76

Table 3.3.2.1: Experimental Details with Results for Sphinx 4 Audio


Chart 3.3.2.2: Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%
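
As a check on the arithmetic, the average can be recomputed from the accuracy values of Experiments 01 to 05, read here as 85.71, 77.14, 72.38, 84.44 and 74.76 percent. The sketch below is only illustrative; the class and method names are hypothetical and not part of the project code.

```java
import java.util.Locale;

final class AverageAccuracy {
    // Arithmetic mean of the given percentages.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Accuracy (%) of Experiments 01 to 05 from the results table.
        double[] accuracy = {85.71, 77.14, 72.38, 84.44, 74.76};
        System.out.println(String.format(Locale.ROOT,
                "Average accuracy = %.3f%%", mean(accuracy)));
    }
}
```

The five values sum to 394.43, so the mean is 78.886, which matches the reported 78.88 when cut to two decimals.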


Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application


4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx-4 demo application. Its purpose is to recognize spoken sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

To train the acoustic model we need a dictionary (dic) file that lists every training word together with its pronunciation. While training the acoustic model experimentally, we had to write these pronunciations by hand several times. This led us to think about generating pronunciations automatically, and this software is the implementation of that idea. First we built a database that stores every letter with its corresponding phoneme. We used the IPA chart and several theses and papers [1] [9] [10] to build it. However, different sources define the phonemes in different ways, and some letters have no defined phoneme at all. For that reason we defined the phonemes of some consonants and conjunct letters ourselves. With the database in place, we wrote the phonetic translator, which uses the database to produce the phonetic translation of Bengali words.

Fig 4.1.2: Dictionary files with phonetic translation
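
The lookup step of the translator can be sketched as follows. This is a simplified illustration and not the project code: it replaces the database with an in-memory map, covers only four consonants taken from the chart in the appendix, and ignores conjuncts and vowels, which the real translator has to handle. The class and method names are hypothetical, and a single space between the word and its phonemes is assumed for the dictionary line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PhonemeLookupSketch {
    // Letter-to-phoneme table; a stand-in for the database of the translator.
    private static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K"); // Bengali letter ka
        PHONEMES.put('\u09AE', "M"); // Bengali letter ma
        PHONEMES.put('\u09B2', "L"); // Bengali letter la
        PHONEMES.put('\u09A8', "N"); // Bengali letter na
    }

    // Returns the phonemes of `word` separated by single spaces.
    // A letter missing from the table is reported instead of being skipped.
    static String translate(String word) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            String phoneme = PHONEMES.get(word.charAt(i));
            if (phoneme == null) {
                throw new IllegalArgumentException(
                        "No phoneme defined for letter at index " + i);
            }
            parts.add(phoneme);
        }
        return String.join(" ", parts);
    }

    // One dictionary line: the word, a space, then its phonemes.
    static String dictionaryLine(String word) {
        return word + " " + translate(word);
    }

    public static void main(String[] args) {
        // The three letters ka, ma, la written with Unicode escapes.
        System.out.println(dictionaryLine("\u0995\u09AE\u09B2"));
    }
}
```

Writing the letters as Unicode escapes keeps the source file independent of the editor's character encoding, which matters for Bengali text.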

4.1.3 Training File Creator

To train the acoustic model we also need a fileids file and a transcript file. These two files hold the paths of the training audio files and the sentence spoken in each one. Before we wrote this program we had to create both files by hand. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now the software creates both files within moments when we give it the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
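
The two outputs can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the project code: the class name, method names, and the sample file names are hypothetical, and the sentence markers follow the transcript line format used in the appendix listing, that is, a sentence between <S> and </S> followed by the file name in parentheses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrainingFilesSketch {
    // Strips a trailing ".wav" so the name can serve as an utterance id.
    static String stripWav(String fileName) {
        return fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
    }

    // One fileids line per audio file: its path without the extension.
    static List<String> fileidsLines(List<String> wavPaths) {
        List<String> lines = new ArrayList<>();
        for (String path : wavPaths) {
            lines.add(stripWav(path));
        }
        return lines;
    }

    // One transcript line per audio file: "<S> sentence </S> (fileName)".
    // Sentences are reused from the start if there are fewer sentences than files.
    static List<String> transcriptLines(List<String> wavNames, List<String> sentences) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("The sentence corpus is empty");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size()).trim();
            lines.add("<S> " + sentence + " </S> (" + stripWav(wavNames.get(i)) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical file names and placeholder sentences.
        List<String> names = Arrays.asList("s1.wav", "s2.wav", "s3.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        for (String line : transcriptLines(names, corpus)) {
            System.out.println(line);
        }
    }
}
```

Taking the sentence index modulo the corpus size has the same effect as resetting the corpus position to zero when it reaches the end.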

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, it performs some common tasks such as opening My Computer, left click, right click, etc.
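
One way to connect recognized text to actions is a lookup table from phrase to action. The sketch below is illustrative and not the project code: the class name is hypothetical, the phrases are English placeholders for the Bengali command words, and the actions only record their names where the real application would simulate a mouse or keyboard event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CommandDispatcherSketch {
    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a recognized phrase with the action to run.
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action registered for `recognizedText`.
    // Returns false if the text is null or no action matches.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CommandDispatcherSketch dispatcher = new CommandDispatcherSketch();
        // Placeholder phrases; a real setup would register Bengali command words.
        dispatcher.register("left click", () -> log.add("leftClick"));
        dispatcher.register("right click", () -> log.add("rightClick"));
        dispatcher.dispatch("right click");
        dispatcher.dispatch("unknown phrase");
        System.out.println(log);
    }
}
```

A table keeps the list of commands in one place, so adding a command means adding one entry instead of another branch in a chain of string comparisons.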


Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work


Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected a domain, university admission information for newly admitted students, with 185 sentences. It was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results.

We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers.

The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or mispronounces words.

We created a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate phoneme set.

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase the data size to create a complete model, and we plan to improve the recognition accuracy and enlarge the vocabulary. We have also developed software for making the dictionary, fileids, and transcription files.

We want to make a user-friendly stand-alone GUI application for writing in Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to make an automatic trainer that an ordinary user can use. With this trainer a user will be able to train any sentence with its corresponding audio.

We also want to integrate this system into various document applications so that a Bengali sentence can be written just by uttering it. We want to make a voice response application, in which a user asks the software a question and the software gives the predefined answer to it.

A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.


Chapter 6

CONCLUSION & REFERENCES

Conclusion

References


Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. So we tried to learn about this field and to develop some tools to recognize the Bengali language. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a big training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results of the system will be good enough to achieve our goal.


References

[1] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles


Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ, consonants)   IPA
ক    K
খ    KH
গ    G
ঘ    GH
ঙ    NG
চ    C
ছ    CH
জ    J
ঝ    JH
ঞ    NIO
ট    T
ঠ    TH
ড    D
ঢ    DH
ণ    N
ত    TA
থ    TO
দ    DA
ধ    DO
ন    N
প    P
ফ    PH
ব    B
ভ    BH
ম    M
য    Z
র    R
ল    L
শ    SH
ষ    SH
স    S
হ    H
ড়    RA
ঢ়    RH
য়    Y
ৎ    T``
     NG
     ^

Bangla Phoneme   IPA
AA, I, II, U, UU, RI, E, OI, O, OU

Bangla Phoneme ( )   IPA
ব-   W
য-   Y
র-   R
ম-   M
     RR

Bangla Phoneme (নাম্বার, numbers)   IPA
শূন্য   0
এক     1
দুই     2
তিন    3
চার    4
পাঁচ    5
ছয়     6
সাত    7
আট     8
নয়     9

Bangla Phoneme (যুক্তবর্ণ, conjuncts)   IPA
KK, KT, KT, KTR, KW, KM, KY, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR, GN, GDH, NGM, CC, CCH, CCHW, CCHR, CNG,
CY, GN, GNY, GW, GM, GY, GR, GL, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR, GN, GDH, NGM, CC, CCH, CCHW, CCHR,
CNG, CY, JJ, JJW, JJH, GG, JW, JY, JR, NC, NCH, NJ, NJH, TT, TT, TTW, TTH, TN, TW, TM, TMY, TY, TR, THW, THY, THR,
DG, DGH, DD, DDW, DDH, DW, DV, DM, DY, DR, NM, NY, NS, PT, PT, PN, PP, PY, PR, PL, PS, FR, FL, BJ, BD, BDH,
BB, BY, BR, BL, LT, LD, LDH, LP, LB, LV, LM, LY, LL, SHC, SHCH, SHT, SHN, SHW, SHM, SHY, SHR, SHL, SHK, SHKR, SHT, SF,
SW, SM, SY, SR, SL, SKL, HN, HN, HW, HM, HY, HR, HL, HRRI, GHN, GHY, GHR, NK, NKY, NGKKH, NGKH, NGG, NGGY, NGGH, NGGHY, NGGHR,
TW, TM, TY, TR, DD, DY, DR, DHY, DHR, NT, NTH, ND, NDY, NDR, NDH, NN, NW, NM, NY, DHN, DHW, DHM, DHY, DHR, NT, NTH,
ND, NT, NTW, NTY, NTR, NTH, ND, NDY, NDW, NDR, NDH, NDHY, NDHR, NN, NW, VY, VR, VL, MTH, MN, MP, MPR, MF, MB, MV, MVR,
MM, MY, MR, ML, ZY, RRK, RRKY, LK, LG, SHTY, SHTR, SHTH, SHTHY, SHN, SHP, SHPR, SPHY, SHW, SHM, SK, SKR, ST, STR, SKH, ST, STW,
STY, STH, STHY, SN, SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we ran recognition on only 50 of them.

No  Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators did not survive in the printed listing; "/" is assumed throughout.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues with the newly listed files
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // literal replace, so the dot is not treated as a regex wildcard
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Lists one directory: sub-folders go to level 1 or 2, plain files go to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names that share one alphabetic prefix by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list below
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep digits only
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
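The sorting step above rebuilds each folder name from a shared prefix plus a number. As a minimal alternative sketch (not the project's code), the same numeric ordering can be obtained with a comparator that leaves the original names untouched. The class name `NumericSuffixSort` and the folder names used here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NumericSuffixSort {

    // Returns the digits found in a name as an int, or 0 when the name has no digits.
    static int numericPart(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by numeric part; the input list is not modified.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numericPart));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical folder names; plain string sorting would put "speaker_10" before "speaker_2".
        List<String> folders = Arrays.asList("speaker_10", "speaker_2", "speaker_1");
        System.out.println(sortByNumber(folders));
    }
}
```

Because the names are never rebuilt, this variant also works when folders do not share a single prefix.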

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the sentence corpus, one sentence per line, into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one word per line.
    // Path separators did not survive in the printed listing; "/" is assumed.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary (word plus pronunciation per line).
    public void createFile(String finalData) {
        try {
            // The position of the separators in this file name is assumed.
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one "word pronunciation" line per input word
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter is a consonant (banjon borno), i.e. its id is in 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal was lost in the printed listing; the hasanta (U+09CD),
    // which joins Bengali consonants into conjuncts, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions before, at and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // record the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel between them
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // look up the pronunciation of every letter or conjunct in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command string compared here did not survive in the printed listing,
        // and the listing ends after this first command.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


Chart 3.3.2.2 Experiment Results with Sphinx-4 Audio Input

Average Accuracy = 78.88%


Chapter 4

APPLICATION & DEVELOPING

Review of Developed Application


4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx-4 demo application. Its purpose is to recognize spoken sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary (.dic) file that lists every training word together with its pronunciation. While training acoustic models experimentally we had to write these pronunciations by hand several times, so we decided to build a tool that generates the phonetic translation automatically. First we built a database that stores every letter together with its corresponding phoneme. We drew on the IPA chart and several theses and papers [1] [9] [10] to build this database. However, the sources define the phonemes in different ways, and not every letter has a defined phoneme. We therefore defined the phonemes of some consonants and conjunct letters ourselves. Using this database, the phonetic translator produces the phonetic translation of Bengali words.

Fig 4.1.2 Dictionary file with phonetic translation
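The letter-to-phoneme lookup described in this section can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the project's implementation: it swaps the database for an in-memory map, the names PhonemeLookupSketch and toDictionaryLine are hypothetical, and only a handful of consonant entries from the Unicode to IPA chart in the appendix are included. It does not insert an inherent vowel between consonants and does not handle conjuncts, and letters without an entry are rejected rather than guessed.

```java
import java.util.HashMap;
import java.util.Map;

public class PhonemeLookupSketch {

    // A few letter-to-phoneme pairs copied from the consonant chart in the appendix.
    // Letters are written as Unicode escapes so the file compiles under any source encoding.
    static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K");   // ka
        PHONEMES.put('\u0996', "KH");  // kha
        PHONEMES.put('\u0997', "G");   // ga
        PHONEMES.put('\u09AC', "B");   // ba
        PHONEMES.put('\u09AE', "M");   // ma
        PHONEMES.put('\u09B0', "R");   // ra
        PHONEMES.put('\u09B2', "L");   // la
    }

    // Builds one dictionary line: the word, a space, then its phonemes separated by spaces.
    static String toDictionaryLine(String word) {
        StringBuilder phones = new StringBuilder();
        for (char letter : word.toCharArray()) {
            String phone = PHONEMES.get(letter);
            if (phone == null) {
                // Hypothetical policy: fail loudly instead of inventing a phoneme.
                throw new IllegalArgumentException("No phoneme defined for U+"
                        + Integer.toHexString(letter).toUpperCase());
            }
            if (phones.length() > 0) {
                phones.append(' ');
            }
            phones.append(phone);
        }
        return word + " " + phones;
    }

    public static void main(String[] args) {
        // ka followed by la
        System.out.println(toDictionaryLine("\u0995\u09B2"));
    }
}
```

Failing on an unknown letter is a deliberate choice in this sketch: since the sources disagree on the phoneme inventory, a missing entry is better reported than silently skipped.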

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files by hand. For example, with 8000 audio files we had to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now the software creates both files automatically within moments, given the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids file

Fig 4.1.3.2 Transcription file
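As a rough illustration of what the training file creator produces, the sketch below builds the lines of both files in memory. It is a simplified sketch, not the project's code: the class and method names are hypothetical, the folder layout is reduced to root/speaker/file, and the sentences are ASCII placeholders. The "<s> sentence </s> (id)" layout is the one commonly used for SphinxTrain transcription files, and ids are written without the .wav extension.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrainingFilesSketch {

    // Removes a trailing ".wav" so the id matches the extension-less form used in both files.
    static String fileId(String wavName) {
        return wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - ".wav".length())
                : wavName;
    }

    // One fileids entry per recording: root/speaker/id
    static List<String> fileidsLines(String root, String speaker, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (String wav : wavNames) {
            lines.add(root + "/" + speaker + "/" + fileId(wav));
        }
        return lines;
    }

    // One transcript entry per recording. If there are more recordings than
    // sentences, the corpus is reused from the top.
    static List<String> transcriptLines(List<String> sentences, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String sentence = sentences.get(i % sentences.size()).trim();
            lines.add("<s> " + sentence + " </s> (" + fileId(wavNames.get(i)) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> wavs = Arrays.asList("s1_001.wav", "s1_002.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        fileidsLines("sbs_asr_train", "speaker_1", wavs).forEach(System.out::println);
        transcriptLines(corpus, wavs).forEach(System.out::println);
    }
}
```

Keeping line generation separate from file writing makes it easy to check the output for a few recordings before generating thousands of entries.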

4.1.4 Command Application

We built a simple voice command application. It uses Bengali words as voice commands to perform common tasks such as opening My Computer, left click, and right click.
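One way to connect recognized text to desktop actions is a lookup table from command phrase to action, sketched below. This is an assumption-laden illustration rather than the project's code: the class CommandDispatchSketch and its methods are hypothetical, the command phrases are English placeholders standing in for the Bengali command words, and the actions only print a message where a real application would call java.awt.Robot.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandDispatchSketch {

    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> actions = new HashMap<>();

    void register(String spokenCommand, Runnable action) {
        actions.put(spokenCommand.trim(), action);
    }

    // Runs the action for the recognized text; returns false if the text is not a known command.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        // Placeholder phrases and actions; a real action would drive java.awt.Robot.
        dispatcher.register("right click", () -> System.out.println("right click requested"));
        dispatcher.register("left click", () -> System.out.println("left click requested"));

        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("something else"));
    }
}
```

Compared with a chain of if statements, a table lets new commands be added without touching the recognition loop, and an unrecognized phrase is reported back to the caller instead of being ignored.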

Chapter 5

LIMITATION & FUTURE WORK

Limitation
Future Work

Limitation

Our project has limitations in some specific areas. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for newly admitted students, with 185 sentences. It was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or pronounces words poorly. We wrote a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software only once the phoneme set is settled.
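The report does not say what the encoding problem was. One common cause with Bengali text in Java is reading or writing files with the platform default charset instead of UTF-8. The sketch below shows file access with an explicit charset; it is an assumption about a likely fix, not a diagnosis of the project's bug, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Utf8WordListSketch {

    // Reads one word per line, decoding the file as UTF-8 regardless of the platform default.
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }

    // Writes the lines back out, again as UTF-8.
    static void writeLines(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("words", ".dic");
        // Two Bengali words written as Unicode escapes (ka + la, ba + la).
        writeLines(tmp, Arrays.asList("\u0995\u09B2", "\u09AC\u09B2"));
        System.out.println(readWords(tmp).size() + " words read");
        Files.delete(tmp);
    }
}
```

If the pronunciations are stored in a database, the connection and the table also need to use a Unicode character set, otherwise a lookup by Bengali letter can fail even when the files are read correctly.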

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will enlarge the data set to build a complete model, and we plan to improve recognition accuracy and extend the vocabulary. We have already developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly, stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on a developer, so we plan to build an automatic trainer that an ordinary user can operate. With this trainer, a user will be able to train any sentence with its corresponding audio. We also want to integrate the system into document applications, so that a Bengali sentence can be written just by speaking it. We want to build a voice response application in which the user asks the software a question and the software gives the predefined answer to that question. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion
References

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about the field and to develop some tools for recognizing Bengali. In this report we have discussed our objectives, the tools we used, and the speech recognition process. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate. But if we can meet the requirements of a good recognizer and train our system more accurately, the results should be good enough to achieve our goal.


References

[1] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review," International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection," Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition," Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition," Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals," Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition," BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla," banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective," BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla," BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System," Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition," Bo Xuan Tu, Ho Chi Minh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech," IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges," BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide," 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System," 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ, consonants)  IPA
ক  K
খ  KH
গ  G
ঘ  GH
ঙ  NG
চ  C
ছ  CH
জ  J
ঝ  JH
ঞ  NIO
ট  T
ঠ  TH
ড  D
ঢ  DH
ণ  N
ত  TA
থ  TO
দ  DA
ধ  DO
ন  N
প  P
ফ  PH
ব  B
ভ  BH
ম  M
য  Z
র  R
ল  L
শ  SH
ষ  SH
স  S
হ  H
ড়  RA
ঢ়  RH
য়  Y
ৎ  T``
   NG
   ^

Bangla Phoneme  IPA
   AA
   I
   II
   U
   UU
   RI
   E
   OI
   O
   OU

Bangla Phoneme ( )  IPA
ব-  W
য-  Y
র-  R
ম-  M
    RR

Bangla Phoneme (নাম্বার, numbers)  IPA
শূন্য  0
এক  1
দুই  2
তিন  3
চার  4
পাঁচ  5
ছয়  6
সাত  7
আট  8
নয়  9

Bangla Phoneme (যুক্তবর্ণ, conjuncts)  IPA

KK, KT, KT, KTR, KW, KM, KY, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR, GN, GDH, NGM, CC, CCH, CCHW, CCHR, CNG,
CY, GN, GNY, GW, GM, GY, GR, GL, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR, GN, GDH, NGM, CC, CCH, CCHW, CCHR,
CNG, CY, JJ, JJW, JJH, GG, JW, JY, JR, NC, NCH, NJ, NJH, TT, TT, TTW, TTH, TN, TW, TM, TMY, TY, TR, THW, THY, THR,
DG, DGH, DD, DDW, DDH, DW, DV, DM, DY, DR, NM, NY, NS, PT, PT, PN, PP, PY, PR, PL, PS, FR, FL, BJ, BD, BDH,
BB, BY, BR, BL, LT, LD, LDH, LP, LB, LV, LM, LY, LL, SHC, SHCH, SHT, SHN, SHW, SHM, SHY, SHR, SHL, SHK, SHKR, SHT, SF,
SW, SM, SY, SR, SL, SKL, HN, HN, HW, HM, HY, HR, HL, HRRI, GHN, GHY, GHR, NK, NKY, NGKKH, NGKH, NGG, NGGY, NGGH, NGGHY, NGGHR,
TW, TM, TY, TR, DD, DY, DR, DHY, DHR, NT, NTH, ND, NDY, NDR, NDH, NN, NW, NM, NY, DHN, DHW, DHM, DHY, DHR, NT, NTH,
ND, NT, NTW, NTY, NTR, NTH, ND, NDY, NDW, NDR, NDH, NDHY, NDHR, NN, NW, VY, VR, VL, MTH, MN, MP, MPR, MF, MB, MV, MVR,
MM, MY, MR, ML, ZY, RRK, RRKY, LK, LG, SHTY, SHTR, SHTH, SHTHY, SHN, SHP, SHPR, SPHY, SHW, SHM, SK, SKR, ST, STR, SKH, ST, STW,
STY, STH, STHY, SN, SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we used only 50 of them for recognition.

No.  Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51-185  (sentence text not legible in this copy)

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // k keeps running across folders because dirTreeLevel3 accumulates every file
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

……………………………………………………………………

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "s10", "s2" by their numeric part. The result is rebuilt
    // from the letters of the first entry followed by each sorted number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
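The sorting step above rebuilds every name from the letters of the first entry plus a sorted number, so it only works when all folders share one prefix and carry no zero padding or underscores. A comparator-based alternative that keeps the original names intact might look like the following sketch. It is not part of the project; the class and method names are hypothetical, and names without digits are assumed to sort first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericNameSortSketch {

    // Reads all digits of a name as one number; names without digits get -1 and sort first.
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a sorted copy; the input list and the original names are left unchanged.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSortSketch::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

Because the sort is stable and works on a copy, folders with equal numbers keep their listing order and the caller's list is not modified.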

……………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Loads every line of the sentence corpus into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> ("
                        + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

……………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

……………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one "word pronunciation" line per input word.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

……………………………………………………………………

PRODB

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is stored in banglatab with an id in the consonant range 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

……………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal did not survive in this copy. U+09CD (hasanta), the
    // sign that joins consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at, and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the phoneme of every collected letter or conjunct.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

……………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");

            Result result = recognizer.recognize();

            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

……………………………………………………………………

SBSBSR TRANSCRIBER

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

    // Ctrl+C
    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    // Ctrl+V
    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    // Ctrl+A
    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    // Page Up
    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    // Page Down
    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    // Ctrl+Page Up
    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    // Ctrl+Page Down
    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    // Ctrl+N
    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    // Ctrl+O
    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    // Alt+F4
    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    // Windows+D
    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    // Windows+E
    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    // The methods below open a URL in the default browser (Windows only).
    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand(...): the Bengali argument is missing from the source

        // loop the recognition until the program exits
        String comString = ""; // the Bengali text is missing from the source
        System.out.println("comString " + comString + " length " + comString.length() + "\n");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        // The Bengali sample sentences are missing from the source.
        System.out.println("Sample sentences:\n" + "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException, IOException {
        // The Bengali command word compared here is missing from the source.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}

Chapter 4

APPLICATION & DEVELOPING

Review of Developed Applications

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the Sphinx4 demo application. Its purpose is to recognize whole sentences, which is the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary file (.dic) that lists every training word together with its pronunciation. While training the acoustic model experimentally we wrote these pronunciations by hand several times, which led us to think about generating them automatically; this software implements that idea. First we built a database that stores all phonemes and their corresponding letters, drawing on the IPA chart and several theses and papers [1] [9] [10]. However, the sources define phonemes in different ways, and some letters have no phoneme defined at all, so we defined the phonemes for some consonants and conjunct letters ourselves. Using this database, the phonetic translator produces the phonetic translation of Bengali words.

Fig 4.1.2: Dictionary file with phonetic translation
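
As a rough illustration of this lookup-based translation, a minimal Java sketch could look like the following. It is not the project's actual translator: the class name `PhoneticSketch`, the in-memory map (standing in for the database) and the method names are assumptions made for this example. Only three letter-to-phone pairs are included, taken from the consonant chart in the appendix, and the sketch ignores conjuncts, vowel signs and the inherent vowel.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps each Bengali letter to a phone label and
// builds one dictionary (.dic) line per word.
class PhoneticSketch {

    // Assumed in-memory table; the project keeps these pairs in a database.
    private static final Map<Character, String> PHONES = new LinkedHashMap<>();
    static {
        PHONES.put('\u0995', "K"); // Bengali letter KA
        PHONES.put('\u09AE', "M"); // Bengali letter MA
        PHONES.put('\u09B2', "L"); // Bengali letter LA
    }

    // Returns the space-separated phone labels of a word.
    // Letters that are not in the table are skipped.
    static String translate(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            String phone = PHONES.get(word.charAt(i));
            if (phone == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phone);
        }
        return sb.toString();
    }

    // One dictionary line: the word followed by its phones.
    static String dictionaryLine(String word) {
        return word + " " + translate(word);
    }

    public static void main(String[] args) {
        System.out.println(dictionaryLine("\u0995\u09AE\u09B2"));
    }
}
```

The letters are written as Unicode escapes so that the example does not depend on the encoding of the source file.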

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program we had to create both files by hand; now we can generate them automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. With this software we only provide the root folder of the audio files and the sentence corpus file as input, and both files are created automatically.

Fig 4.1.3.1: Fileids files with phonetic translation

Fig 4.1.3.2: Transcription File
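
The two kinds of lines that such a tool writes can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the folder and file names in the usage are made up, and the `<s> ... </s> (file)` layout follows the usual SphinxTrain transcript convention rather than being copied from our files.

```java
// Hypothetical sketch of the line formats written by a training file creator.
class TrainingFilesSketch {

    // Removes a trailing ".wav" extension, if present.
    static String stripWav(String name) {
        return name.endsWith(".wav") ? name.substring(0, name.length() - 4) : name;
    }

    // A fileids line: the audio path relative to the root folder, without extension.
    static String fileidsLine(String speakerDir, String wavName) {
        return speakerDir + "/" + stripWav(wavName);
    }

    // A transcript line: the sentence between sentence markers,
    // followed by the audio file name in parentheses.
    static String transcriptLine(String sentence, String wavName) {
        return "<s> " + sentence.trim() + " </s> (" + stripWav(wavName) + ")";
    }

    public static void main(String[] args) {
        System.out.println(fileidsLine("speaker_02", "s1.wav"));
        System.out.println(transcriptLine("shubho sokal", "s1.wav"));
    }
}
```

Writing one such pair of lines per audio file, in the same order, keeps the fileids and transcript files aligned.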

4.1.4 Command Application

We built a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click, and right click.
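
One way to connect recognized text to actions is a lookup table from command phrase to action. The sketch below is a hypothetical illustration, not the code of our application: the class `CommandDispatcherSketch` and its methods are assumed names, and the English phrase in the usage stands in for a Bengali command word. In a real application the registered action would call something like a mouse or keyboard helper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: dispatches a recognized phrase to a registered action.
class CommandDispatcherSketch {

    private final Map<String, Runnable> commands = new HashMap<>();

    // Associates a command phrase with the action to run for it.
    void register(String phrase, Runnable action) {
        commands.put(phrase.trim(), action);
    }

    // Runs the action for the recognized text.
    // Returns false when the text is null or matches no command.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcherSketch dispatcher = new CommandDispatcherSketch();
        dispatcher.register("left click", () -> System.out.println("left click requested"));
        dispatcher.dispatch("left click");
    }
}
```

Compared with a chain of `if` statements, a table like this lets new commands be added without touching the recognition loop.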

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Future Work

Limitation

Our project has limitations in several specific areas. Because of time constraints, the system was built on a small data set. We chose the domain of university admission information for new students, with 185 sentences. It was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to get more accurate results. We also faced problems when creating the dictionary file: the Bengali phoneme list is not precisely established, and we do not know the exact number of phonemes in Bengali, since different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or mispronounces words. We wrote a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Accurate phonemes are a prerequisite for good pronunciations, so with an accurate phoneme set we can expect good output from this software.
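
The report does not say exactly what the encoding problem was. One common cause in Java, offered here only as a possibility, is decoding text with the platform default charset: the file-reading code in the appendix builds its `InputStreamReader` without naming a charset. The sketch below shows the alternative of decoding explicitly as UTF-8. The class and method names are hypothetical, and it assumes the word list is stored as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: decodes raw bytes as UTF-8 regardless of the
// platform default charset and returns the non-empty, trimmed lines.
class Utf8Sketch {

    static List<String> linesFromUtf8(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\\R")) { // \R matches any line break
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] raw = "\u0995\u09AE\u09B2\n".getBytes(StandardCharsets.UTF_8);
        // Each Bengali letter is one Java char once the bytes are decoded correctly.
        System.out.println(linesFromUtf8(raw).get(0).length());
    }
}
```

When reading from a file, the same idea applies by passing `StandardCharsets.UTF_8` as the second argument of `InputStreamReader`.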

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase the data size to create a complete model, and we plan to improve recognition accuracy and enlarge the vocabulary. We have already developed software for creating the dictionary file, the fileids file, and the transcription file. We want to build a user-friendly stand-alone GUI application for writing Bengali, and we intend to develop a complete desktop command application for Bengali. Training and creating a model still depends on the developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate this system into document applications so that a Bengali sentence can be written just by uttering it. We want to build a voice response application in which the user asks the software a question and the software gives the predefined answer to it. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

References

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools to recognize Bengali. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application needs many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is 100% accurate yet, but if we can meet the requirements of a good recognizer and train our system more accurately, the results will be good enough to achieve our goal.

References

[1] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", International Journal of Computer Science and Information Security (IJCSIS), Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal S. O. Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of the short time available we recognized only 50 of them.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()

int lineCounts = 0

String dirTreeRootName = Ctrainnign_filesbs_asr_train

String root = getRootFromPath(dirTreeRootName)

listdir(dirTreeRootName1)

Collectionssort(dirTreeLevel1)

int sizeOfdirTreeLevel1 = dirTreeLevel1size()

int i = 0j=0k=0

while(sizeOfdirTreeLevel1gti)

String path2 = dirTreeRootName+dirTreeLevel1get(i)

dirTreeLevel2clear()

listdir(path22)

dirTreeLevel2 = sortObjsortList(dirTreeLevel2)

int sizeOfdirTreeLevel2 = dirTreeLevel2size()

String path3=

while(sizeOfdirTreeLevel2gtj)

path3 = path2++dirTreeLevel2get(j)+

listdir(path33)

while(dirTreeLevel3size()gtk)

String targetPath =

root++dirTreeLevel1get(i)++dirTreeLevel2get(j)++dirTreeLevel3get(k)

targetPath = targetPathreplaceAll(wav )

writIntoFile(targetPathCtrainnign_filesbs_asr_train+sbs_asr_trainfileids)

lineCounts++

k++

j++

file listing end

j=0

i++

transcriptCreator tcObj = new transcriptCreator()

try

tcObjreadCorpus(Ctrainnign_filecorpustxt)

tcObjCreateTransCriptFile(dirTreeLevel3

Ctrainnign_filesbs_asr_trainsbs_asr_traintranscript)

catch (FileNotFoundException e)

eprintStackTrace()

int si=0

public static void listdir(String pathint Level)

File folder = new File(path)

File[] listOfFiles = folderlistFiles()

int numofL_I = 0

int numOfL = listOfFileslength

while(numOfLgtnumofL_I)

if (listOfFiles[numofL_I]isDirectory())

if(Level == 1)

dirTreeLevel1add(listOfFiles[numofL_I]getName())

else if(Level == 2)

dirTreeLevel2add(listOfFiles[numofL_I]getName())

else

if (listOfFiles[numofL_I]isFile())

dirTreeLevel3add(listOfFiles[numofL_I]getName())

numofL_I++

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

public static void writIntoFile(String dataString path)

try

FileWriter fstream = new FileWriter(pathtrue)

BufferedWriter out = new BufferedWriter(fstream)

outwrite(data+n)

outclose()

catch(IOException e)

eprintStackTrace()

…………………………………………………………………

ARRAY

…………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names that share one prefix and end in a number (e.g. name1, name2, name10)
    // by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }

        // Keep only the digits of each name.
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }

        // Convert the digit strings to integers and sort them.
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);

        // Name prefix without the number, taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");

        // Rebuild the names in numeric order.
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}

…………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javaioInputStreamReader

import javautilArrayList

import javautilList

public class transcriptCreator

static ArrayListltObjectgt inputLines=new ArrayListltObjectgt()

public void readCorpus(String path) throws FileNotFoundException

FileInputStream fstream = new FileInputStream(path)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String strLine

try

while ((strLine = brreadLine()) = null)

inputLinesadd(strLine)

inclose()

catch (Exception e)Catch exception if any

Systemerrprintln(Error + egetMessage())

int inputLineSize = inputLinessize()

int i = 0

while(inputLineSizegti)

i++

public static void CreateTransCriptFile(ListltStringgt dataString path)

try

FileWriter fstream = new FileWriter(pathtrue)

int numOfwriting = datasize()

int i = 0

int lineStart = 0

int lineEnd = inputLinessize()

BufferedWriter out = new BufferedWriter(fstream)

while(numOfwritinggti)

if(lineStart == lineEnd) lineStart = 0

String wavRemove = dataget(i)toString()

wavRemove = wavRemovereplaceAll(wav )

String leadTrailspaceRemoved =

inputLinesget(lineStart)toString()

leadTrailspaceRemoved = leadTrailspaceRemovedtrim()

String pattern = ltSgt +leadTrailspaceRemoved+ ltSgt

(+wavRemove+)

outwrite(pattern+n)

lineStart++

i++

Close the output stream

outclose()

catch(IOException e)

eprintStackTrace()

…………………………………………………………………

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

…………………………………………………………………

PHONETIC TRANSLATION

package ptpack

import javautilArrayList

public class PhoneticTranslation

public static void main(String[] args)

PronounciationGenarator pgObj = new PronounciationGenarator()

FileOperator foObj = new FileOperator()

ArrayListltObjectgt inputStrings = new ArrayListltObjectgt()

inputStrings = foObjgetStrings()

String pro =

Systemoutprintln(in phonetic translation)

int i = 0

String is =

String fileImage =

while(inputStringssize()gti)

is = inputStringsget(i)toString()trim()

pro = pgObjgetPronouciation(is)

pro = protrim()

fileImage = fileImage+is+ +pro+n

i++

Systemoutprintln(fileImage)

foObjcreateFile(fileImage)

main end

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)

eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

return isBBorno

class end

…………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

Bengali Speech Recognition

80

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

Bengali Speech Recognition

81

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Bengali Speech Recognition

82

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {
    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {
    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate (sokrio)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

4.1 Review of Some Developed Recognition Applications

We developed four applications: a dictation application, a phonetic translator, a training file creator, and a desktop command application.

4.1.1 Dictation Application

We wrote this application with the help of the sphinx4 demo application. Its main objective is to recognize sentences, which is also the main objective of this project.

4.1.2 Phonetic Translator

Training the acoustic model requires a dictionary file (.dic) that lists every training word together with its pronunciation. While training the acoustic model experimentally, we had to write these pronunciations by hand several times. Over time we began to think about an automatic pronunciation (phonetic translation) generator, and this software is the implementation of that idea. First we built a database in which all phonemes and their corresponding letters are stored. We drew on the IPA chart and several thesis papers [1] [9] [10] to build this database. However, different sources define the phonemes in different ways, and phonemes are not defined for every letter. We therefore defined some phonemes ourselves for certain consonant and conjunct letters. After building the database, we wrote the phonetic translator, which uses it to produce the phonetic translation of Bengali words.

Fig 4.1.2 Dictionary files with phonetic translation
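The letter-by-letter lookup described above can be sketched without a database. This is a minimal illustration under stated assumptions, not the project's implementation: the class name `PhoneticSketch`, the small in-memory map, the symbol `A` for the inherent vowel, and the Unicode range used to detect consonants are all assumptions made for the example (the project stores its mapping in a database table and uses row ids to detect consonants). Conjuncts are not handled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhoneticSketch {
    // Hypothetical in-memory stand-in for the letter-to-phoneme table.
    private static final Map<Character, String> PHONEMES = new HashMap<>();
    static {
        PHONEMES.put('\u0995', "K");  // letter ka
        PHONEMES.put('\u09B2', "L");  // letter la
        PHONEMES.put('\u09AE', "M");  // letter ma
        PHONEMES.put('\u09BE', "AA"); // vowel sign aa
    }

    // Assumed symbol for the inherent vowel between two consonants.
    private static final String INHERENT_VOWEL = "A";

    // Assumption: treat the block from ka (U+0995) to ha (U+09B9) as consonants.
    public static boolean isConsonant(char ch) {
        return ch >= '\u0995' && ch <= '\u09B9';
    }

    // Maps each letter to its phoneme and inserts the inherent vowel
    // whenever a consonant is directly followed by another consonant.
    public static String translate(String word) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            String phoneme = PHONEMES.get(ch);
            if (phoneme == null) {
                throw new IllegalArgumentException("No phoneme for code point " + (int) ch);
            }
            out.add(phoneme);
            boolean hasNext = i + 1 < word.length();
            if (hasNext && isConsonant(ch) && isConsonant(word.charAt(i + 1))) {
                out.add(INHERENT_VOWEL);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // ka + la + ma
        System.out.println(translate("\u0995\u09B2\u09AE"));
    }
}
```

A dictionary line would then be the word followed by this phoneme string. Letters outside the assumed range, such as the dotted consonants, would need their own entries and a wider consonant test.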

4.1.3 Training File Creator

Training the acoustic model also requires a fileids file and a transcript file. These two files hold the paths of the training audio files and their corresponding sentences. Before we wrote this program, we had to create both files manually. With the software, we can generate these two large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically when we give the software the root folder of the audio files and the sentence corpus file as input.

Fig 4.1.3.1 Fileids files with phonetic translation

Fig 4.1.3.2 Transcription File
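The two line formats can be illustrated with a short sketch. It assumes, following the project's code listing, that a fileids entry is the audio path without the .wav extension and that a transcript line has the form `<S> sentence </S> (file name)`. The class and method names are hypothetical.

```java
public class TrainingFilesSketch {
    // Turns a relative audio path into a fileids entry:
    // forward slashes, no ".wav" extension.
    public static String toFileId(String relativePath) {
        String p = relativePath.replace('\\', '/');
        return p.endsWith(".wav") ? p.substring(0, p.length() - 4) : p;
    }

    // Builds one transcript line for a sentence and its audio file name.
    public static String toTranscriptLine(String sentence, String wavName) {
        String id = toFileId(wavName);
        String baseName = id.substring(id.lastIndexOf('/') + 1);
        return "<S> " + sentence.trim() + " </S> (" + baseName + ")";
    }

    public static void main(String[] args) {
        System.out.println(toFileId("sbs_asr_train\\speaker_2\\s1.wav"));
        System.out.println(toTranscriptLine("  hello world ", "s1.wav"));
    }
}
```

The extension is removed with `endsWith` and `substring` rather than `replaceAll(".wav", "")`, because in a regular expression the dot matches any character and could also remove text such as "xwav" from the middle of a name.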

4.1.4 Command Application

We made a simple voice command application. Using Bengali words as voice commands, it performs common tasks such as opening My Computer, left click, right click, etc.
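One way to connect recognized text to actions is a lookup table from command phrase to action. The sketch below is illustrative only: the class name is hypothetical, the English keys stand in for the Bengali command words, and the actions simply record their names instead of driving `java.awt.Robot`, so the example runs on a machine without a display.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommandDispatchSketch {
    // Records which actions ran; a real application would call Robot here.
    public final List<String> log = new ArrayList<>();
    private final Map<String, Runnable> commands = new HashMap<>();

    public CommandDispatchSketch() {
        // Placeholder phrases; the real keys would be the recognized Bengali words.
        commands.put("left click", () -> log.add("leftClick"));
        commands.put("right click", () -> log.add("rightClick"));
        commands.put("open my computer", () -> log.add("openMyComputer"));
    }

    // Runs the action registered for the recognized text, if any.
    public boolean dispatch(String recognizedText) {
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch d = new CommandDispatchSketch();
        System.out.println(d.dispatch("right click"));
        System.out.println(d.log);
    }
}
```

Compared with a chain of `if (text.equals(...))` tests, a table keeps each command in one place and makes unknown input easy to detect.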

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected the domain of university admission information for newly arriving students, with 185 sentences. It was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not defined precisely and we do not know the exact number of phonemes in Bengali; different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, but it sometimes fails when the speaker speaks slowly or mispronounces words. We wrote a program to generate the pronunciation of a Bengali word automatically, but it does not work properly because of an encoding problem. Since an accurate phoneme set is a prerequisite for good pronunciations, we can expect good output from this software once we have an accurate phoneme set.
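The report does not say where the encoding problem arises. One common cause in Java programs that handle Bengali text is reading or writing with the platform default charset, which happens when `FileWriter` or `InputStreamReader` is used without naming a charset. Assuming that is the cause, the sketch below reads text as UTF-8 explicitly. The class and method names are hypothetical, and an in-memory stream stands in for the corpus file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8ReadSketch {
    // Reads all lines from the stream, decoding the bytes as UTF-8
    // regardless of the platform default charset.
    public static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Three Bengali letters (ka, la, ma) encoded as UTF-8 bytes.
        byte[] bytes = "\u0995\u09B2\u09AE\n".getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(bytes));
        System.out.println(lines.get(0).length());
    }
}
```

The same idea applies when writing: wrapping a `FileOutputStream` in an `OutputStreamWriter` with `StandardCharsets.UTF_8` keeps the output independent of the machine's default encoding.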

Future Works

We have implemented Bengali speech recognition for a small data size. In future we will increase our data size to create a complete model, and we plan to improve its ability to recognize speech accurately and to enlarge its vocabulary. We have also developed software for creating the dictionary, fileids, and transcription files. We want to build a user-friendly stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate this system into various document applications so that a Bengali sentence can be written just by uttering it. We want to build a voice response application, in which the user asks the software a question and the software gives the predefined answer. A good recognition application needs a lot of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level in this field. We therefore tried to learn about this field and to develop some tools to recognize Bengali. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, its results should be sufficient to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUS Sphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Pnoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we used only 50 of them for recognition.

No  Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // write one fileids line per audio file found so far
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                            root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath, "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds directory names to the level 1 or level 2 list and file names to the level 3 list
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a backslash
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line of data to the file at the given path
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………………

ARRAY

…………………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of every name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // name prefix without the number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                    folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
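The listing above sorts folder names by their numeric part and then rebuilds each name from a shared prefix and the sorted number. An alternative, sketched here with hypothetical names, is to sort the original strings with a comparator on the numeric part, so the names are never rebuilt. It assumes each name contains at most one number that fits in an `int`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSortSketch {
    // Extracts the digits of a name as a number; names without digits count as 0.
    public static int numericPart(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the numeric part of each name.
    public static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSortSketch::numericPart));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("speaker_10", "speaker_2", "speaker_1");
        System.out.println(sortByNumber(names));
    }
}
```

Because the original strings are kept, names with separators such as an underscore or with different prefixes come back unchanged, whereas rebuilding the name from a stripped prefix would alter them.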

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

PHONETIC TRANSLATION

package ptpack

import javautilArrayList

public class PhoneticTranslation

public static void main(String[] args)

PronounciationGenarator pgObj = new PronounciationGenarator()

FileOperator foObj = new FileOperator()

ArrayListltObjectgt inputStrings = new ArrayListltObjectgt()

inputStrings = foObjgetStrings()

String pro =

Systemoutprintln(in phonetic translation)

int i = 0

String is =

String fileImage =

while(inputStringssize()gti)

is = inputStringsget(i)toString()trim()

pro = pgObjgetPronouciation(is)

pro = protrim()

fileImage = fileImage+is+ +pro+n

i++

Systemoutprintln(fileImage)

foObjcreateFile(fileImage)

main end

Prodb

package ptpack

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

this program, we had to make these two files manually. With our software, we can now generate both of these large files automatically within moments. For example, with 8000 audio files we would have to write 8000 audio file paths in the fileids file, and 8000 audio file names with their corresponding sentences in the transcript file. Now both files are created automatically when the root folder name of the audio files and the sentence corpus file are given to the software as input.

Fig. 4.13.1: Fileids file with phonetic translation

Fig. 4.13.2: Transcription File
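
The automatic generation described above can be sketched as follows. This is a minimal illustration, not the project's FileidsCreator or transcriptCreator code: the class name TrainingFileSketch, its method names, and the sample paths and sentences are assumptions made for this example. It produces one fileids entry and one transcription line per audio file, cycles through the sentence corpus when there are more audio files than sentences, and writes each line in the "<S> sentence </S> (name)" layout used by the transcript code in the appendix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrainingFileSketch {

    // Drop a trailing ".wav" so the relative path can be used as a fileids entry.
    static String toFileId(String relativeWavPath) {
        return relativeWavPath.endsWith(".wav")
                ? relativeWavPath.substring(0, relativeWavPath.length() - 4)
                : relativeWavPath;
    }

    // The utterance name is the last path component of the file id.
    static String utteranceId(String fileId) {
        return fileId.substring(fileId.lastIndexOf('/') + 1);
    }

    // One fileids entry per audio file.
    static List<String> fileIds(List<String> wavPaths) {
        List<String> ids = new ArrayList<>();
        for (String path : wavPaths) {
            ids.add(toFileId(path));
        }
        return ids;
    }

    // One transcription line per audio file; sentences repeat cyclically.
    static List<String> transcript(List<String> wavPaths, List<String> sentences) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("sentence corpus is empty");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavPaths.size(); i++) {
            String sentence = sentences.get(i % sentences.size()).trim();
            String id = utteranceId(toFileId(wavPaths.get(i)));
            lines.add("<S> " + sentence + " </S> (" + id + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical inputs; real paths would come from walking the audio root folder.
        List<String> wavs = Arrays.asList("speaker_02/s1.wav", "speaker_02/s2.wav");
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        fileIds(wavs).forEach(System.out::println);
        transcript(wavs, corpus).forEach(System.out::println);
    }
}
```

Keeping the two generators as pure functions over lists makes it easy to check the output before anything is written to disk.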

4.14 Command Application

We built a simple voice command application. Using Bengali words as voice commands, this application performs some common tasks such as opening My Computer, left click, and right click.
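
One way to connect recognized text to desktop actions is a lookup table from command phrase to action. The sketch below is an illustration under stated assumptions, not the project's implementation: the class CommandDispatchSketch and its methods are hypothetical, the English phrases stand in for the Bengali command words, and the actions only print a message where the real application would call its java.awt.Robot based methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CommandDispatchSketch {

    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> commands = new LinkedHashMap<>();

    void register(String phrase, Runnable action) {
        commands.put(phrase.trim(), action);
    }

    // Runs the matching action; returns false when the phrase is unknown.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        // Placeholder phrases and actions, assumed for this example only.
        dispatcher.register("open my computer",
                () -> System.out.println("would open My Computer"));
        dispatcher.register("right click",
                () -> System.out.println("would right-click"));
        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

A table like this keeps the recognizer loop short: adding a command means registering one more entry rather than adding another if branch.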

Chapter 5

LIMITATION & FUTURE WORK


Limitation

Our project has limitations in some specific tasks. Because of time constraints, the system was built on a small data set. We selected a domain, university admission information for newly admitted students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results.

We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali; different researchers report different numbers.

The performance of the system depends on the speaker's pronunciation, the environment, and the microphone. It recognizes sentences accurately when the speaker speaks loudly and clearly, and it sometimes fails when the speaker speaks slowly or mispronounces words.

We created a program that automatically generates the pronunciation of a Bengali word, but it does not work properly because of an encoding problem. Accurate phonemes are a prerequisite for good pronunciation, so with an accurate phoneme set we can expect good output from this software.
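
The report does not say what caused the encoding problem. A frequent cause in Java programs of this kind is decoding Bengali text with the platform default charset instead of UTF-8, and the sketch below assumes that this is the issue. It also shows how consonants and conjunct joins can be detected from Unicode code points alone, without a database lookup. The class BengaliCharSketch and its methods are hypothetical names for this illustration, not part of the project code.

```java
import java.nio.charset.StandardCharsets;

final class BengaliCharSketch {

    // The hasanta (virama) sign joins two consonants into a conjunct.
    static final char HASANTA = '\u09CD';

    // Bengali consonants lie in U+0995 (ka) to U+09B9 (ha), plus U+09DC, U+09DD
    // and U+09DF. Character.isLetter rejects the unassigned code points in the range.
    static boolean isConsonant(char ch) {
        boolean inMainRange = ch >= '\u0995' && ch <= '\u09B9' && Character.isLetter(ch);
        return inMainRange || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }

    // Counts hasanta signs that sit directly between two consonants.
    static int countConjunctJoins(String word) {
        int joins = 0;
        for (int i = 1; i + 1 < word.length(); i++) {
            if (word.charAt(i) == HASANTA
                    && isConsonant(word.charAt(i - 1))
                    && isConsonant(word.charAt(i + 1))) {
                joins++;
            }
        }
        return joins;
    }

    // Decode bytes explicitly as UTF-8 rather than with the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ka + hasanta + ssa, written with escapes so the source file encoding does not matter.
        String conjunct = "\u0995\u09CD\u09B7";
        byte[] raw = conjunct.getBytes(StandardCharsets.UTF_8);
        System.out.println(countConjunctJoins(decodeUtf8(raw)));
    }
}
```

The same explicit charset can be passed to InputStreamReader when reading the word list, and the JDBC connection would need a matching UTF-8 setting for database lookups to agree.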

Future Works

We have implemented Bengali speech recognition for a small data set. In future we will increase the data size to build a complete model, and we plan to improve recognition accuracy and enlarge the vocabulary. We have also developed software for creating the dictionary file, the fileids file, and the transcription file.

We want to build a user-friendly, stand-alone GUI application for writing Bengali, and we intend to develop a complete desktop command application for Bengali. Training and model creation still depend on the developer, so we plan to build an automatic trainer that an ordinary user can use to train any sentence with its corresponding audio. We also want to integrate this system into document applications so that a Bengali sentence can be written simply by uttering it.

We want to build a voice-response application in which the user asks the software a question and the software gives the predefined answer to it. A good recognition application needs a large amount of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES


Conclusion

Speech is the primary and most convenient means of communication between people. A lot of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali. Throughout this report we have discussed our objectives, the tools we used, and the process of speech recognition. Our tools are still at a preliminary level. A good, complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we meet the requirements of a good recognizer and train our system more accurately, the results should be sufficient to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, “Speech Recognition by Machine: A Review”, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, “A Review of Speech Recognition with Sphinx Engine in Language Detection”, Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2012.

[3] L. Rabiner & B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, “An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition”, Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, “Bengali Segmented Automatic Speech Recognition”, BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, “Recognition of Spoken Letters in Bangla”, banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, “Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective”, BRACU, 2007.

[9] Shammur Absar Chowdhury, “Implementation of Speech Recognition System for Bangla”, BRACU, August 2010.

[10] Qqbal SO Shahzad, “Speech Recognition System”, Iqra University, March 2009.

[11] Tran Viet Khai, “Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition”, Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech”, IEEE, 2004.

[13] M. S. Islam, “Research on Bangla Language Processing in Bangladesh: Progress and Challenges”, BUET, 2009.

[14] P. Foster, T. Schalk, “Speech Recognition: The Complete Practical Reference Guide”, 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, “Introduction to Arabic Speech Recognition Using CMUSphinx System”, 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig. 7.1: Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig. 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we trained and recognized only 50 of them.

No. Sentence

1 শুভ সকাল

2 ধন্যবাদ

3 আমি আপনাকে কি সাহায্য করতে পারি

4 আমি কিছু তথ্য জানতে এসেছি

5 কি বলুন

6 ভর্তি বিষয়ে

7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক

8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভর্তি চলছে

10 এ বিভাগে কি কি সুবিধা আছে

11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে

12 যেমন

13 দএএ কম্পিউটার ল্যাব আছে

14 ও আচ্ছা

15 একটি শুধু বিজ্ঞান বিভাগের জন্য

16 আর আরেকটি

17 সব এএএএএএএ জন্য

18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে

19 বত্রিশটি করে

20 আর কিছু

21 কতজন শিক্ষক আছেন এ বিভাগে

22 প্রায় এএএএ জন

23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ

24 প্রায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রুমেল এম এস রাহমান পীর

27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি

28 অবশ্যই

29 দানবীর মিস্টার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাত্র ক্যাম্পাস

32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সুবিধা আছে

35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে

36 লাইব্রেরী কি আছে

37 অবশ্যই একটা বড় লাইব্রেরী আছে

38 ক্যান্টিন আছে

39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে

40 ভর্তির শেষ তারিখ কবে

41 এ মাসের পাঁচ তারিখ

42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস

44 তিন চার এএএ পাঁচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিস্টার

46 তিন সেমিস্টার

47 তাহলে তো মোট এএএ সেমিস্টার

48 জি হ্যাঁ

49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়

50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                            root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Level 1 and 2 collect sub-directory names; every plain file goes to dirTreeLevel3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            }
            if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
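The directory walk performed by FileidsCreator can also be expressed with java.nio. The following is only a minimal sketch under stated assumptions, not the project's code: the names FileIdsSketch and collectFileIds are hypothetical, every .wav file below the training root is assumed to become one fileids entry, and each entry is written relative to the parent of the root, with forward slashes and without the extension. Entries are sorted lexicographically here, which differs from the numeric ordering done by SortArrayList.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FileIdsSketch {

    // Hypothetical helper: lists every .wav file under root as a fileids entry.
    static List<String> collectFileIds(Path root) throws IOException {
        Path abs = root.toAbsolutePath();
        // Entries start with the root directory name, so relativize against its parent.
        Path base = abs.getParent() != null ? abs.getParent() : abs;
        try (Stream<Path> walk = Files.walk(abs)) {
            return walk
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Each returned string can then be written as one line of the fileids file.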

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "s10", "s2" by their numeric part and rebuilds them
    // with the non-numeric prefix of the first element.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                    folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
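The same numeric ordering can be obtained without rebuilding the names. This is a hedged sketch rather than the project's method: NumericNameSort, numberIn and sortByNumber are hypothetical names, and names without any digit are assumed to sort as 0. Because the original strings are kept, entries with different prefixes are not rewritten to the prefix of the first entry.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NumericNameSort {

    // Extracts all digits of a name as one number; names without digits count as 0 (assumption).
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a sorted copy; the input list is left unchanged.
    static List<String> sortByNumber(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericNameSort::numberIn));
        return copy;
    }
}
```

For example, a list holding "s10", "s2" and "s1" would be expected to come back in the order s1, s2, s10.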

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
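The line format produced by CreateTransCriptFile can be isolated in one small function. The sketch below is illustrative only: TranscriptLineSketch and transcriptLine are hypothetical names, and the sentence-marker spelling simply mirrors the listing above. It removes the extension with a plain suffix check instead of a regular expression, so a name that merely contains the letters "wav" is left intact.

```java
final class TranscriptLineSketch {

    // Builds one transcript line: sentence between markers, then the file id in parentheses.
    static String transcriptLine(String sentence, String wavFileName) {
        String id = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - ".wav".length())
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }
}
```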

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter's id in banglatab falls in the consonant range 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
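isBanjonBorno opens a database connection for every character it checks. As an alternative that needs no database, the consonant test could be done on Unicode code points. This is a sketch under assumptions, not the project's method: ConsonantCheckSketch and isBengaliConsonant are hypothetical names, and the set treated as consonants (U+0995 to U+09B9, plus U+09DC, U+09DD, U+09DF and U+09CE) is an assumption about what ids 13 to 48 of banglatab contain.

```java
final class ConsonantCheckSketch {

    // Assumed consonant set: the main Bengali consonant range plus four letters outside it.
    static boolean isBengaliConsonant(char ch) {
        // Character.isLetter filters out unassigned code points inside the range.
        boolean inMainRange = ch >= '\u0995' && ch <= '\u09B9' && Character.isLetter(ch);
        boolean extra = ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF' || ch == '\u09CE';
        return inMainRange || extra;
    }
}
```

Vowels, vowel signs and non-Bengali characters fall outside these ranges and are reported as non-consonants.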

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    char ConjuctsIdentifyCharacter = '\u09CD'; // hasanta, which joins consonants into a conjunct
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every hasanta.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " " + "serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                                ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // Look up the pronunciation of every letter or conjunct in the banglatab table.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
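The conjunct detection in getPronouciation rests on one idea: a hasanta (U+09CD) glues the consonant before it to the consonant after it. That idea can be shown in isolation with a much shorter routine. The sketch below is an illustration, not the project's algorithm: ConjunctSplitSketch and clusters are hypothetical names, it keeps the hasanta inside each cluster string, and it does not reproduce the special handling of র or the insertion of the inherent vowel.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitSketch {

    static final char HASANTA = '\u09CD';

    // Splits a word into single characters and hasanta-joined conjunct clusters.
    static List<String> clusters(String word) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            current.append(c);
            // The cluster continues if this is a hasanta or the next character is one.
            boolean joinsNext = c == HASANTA
                    || (i + 1 < word.length() && word.charAt(i + 1) == HASANTA);
            if (!joinsNext) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString()); // word ended on a hasanta
        }
        return out;
    }
}
```

Each returned cluster could then serve as one lookup key for a pronunciation table.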

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Each method replays one mouse or keyboard action through java.awt.Robot.
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // "porer window | ager window" (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // "porer tab | ager tab" (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSRJAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that matches the recognized text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
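As the number of spoken commands grows, a chain of equals checks in giveCommand becomes hard to maintain. One possible alternative is a lookup table from recognized phrase to action. The following is a sketch under assumptions, not the project's code: CommandDispatchSketch, register and dispatch are hypothetical names, and the phrases are whatever strings the recognizer returns. In the real application each action would call a CommandActivator method such as rightClick().

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatchSketch {

    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a recognized phrase with the action to run for it.
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action for the recognized text; returns false when no command matches.
    boolean dispatch(String recognized) {
        if (recognized == null) {
            return false;
        }
        Runnable action = actions.get(recognized.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

Adding a new voice command then only requires one more register call, and unrecognized text is reported to the caller instead of being silently ignored.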

Chapter 5

LIMITATION & FUTURE WORK

Limitation

Our project has limitations in several specific tasks. The system was built on a small data set because of time constraints. We selected the domain of university admission information for newly admitted students, with 185 sentences. However, it was difficult to collect a large amount of audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for training. More speakers are needed to obtain more accurate results. We also faced problems in creating the dictionary file, because the Bengali phoneme list is not precisely defined and we do not know the exact number of phonemes in Bengali: different researchers report different numbers. The performance of the system depends on the speaker's pronunciation, the environment and the microphone. It recognizes sentences accurately when the speaker utters them loudly and clearly, but it sometimes fails when the speech is slow or the pronunciation is unclear. We created a program that automatically generates the pronunciation of a Bengali word, but this software does not work properly because of an encoding problem. Since an accurate phoneme inventory is a prerequisite for good pronunciation generation, we can expect good output from this software once the phonemes are accurately defined.
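The report does not say where the encoding problem arises. One common cause with Bengali text in Java is reading a file with the platform default charset, which is what an InputStreamReader created without a charset argument does. The sketch below shows how a word list could be read with an explicit UTF-8 charset; it is an assumption about a possible remedy, not a confirmed fix, and the names Utf8WordReader and readWords are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Utf8WordReader {

    // Reads one trimmed word per line, decoding the file explicitly as UTF-8.
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }
}
```

The same care would be needed on output, for example by writing the dictionary file through Files.newBufferedWriter with StandardCharsets.UTF_8.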

Future Works

We have implemented Bengali speech recognition on a small data set. In future we will enlarge the data set to build a complete model, and we plan to improve recognition accuracy and extend the vocabulary. We have also developed software for generating the dictionary file, the fileids file and the transcription file. We want to build a user-friendly stand-alone GUI application for writing Bengali text, and we intend to develop a complete desktop command application for Bengali. Training and creating a model still depend on the developer, so we plan to build an automatic trainer that an ordinary user can operate. With this trainer, a user will be able to train any sentence with the corresponding audio. We also want to integrate the system into document applications, so that a Bengali sentence can be written just by uttering it. We want to build a voice-response application in which the user asks a question and the software gives the predefined answer to it. A good recognition application needs a large amount of audio, so we want to develop a website for collecting audio from people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but for our mother tongue, Bengali, the field is still at a beginner level. We therefore tried to learn about this field and to develop some tools for recognizing Bengali speech. Throughout this report we have discussed our objectives, the tools we used and the process of speech recognition. Our tools are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and audio from many speakers recorded with low noise. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results should be good enough to achieve our goal.

References

[1] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah and Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla and Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster and T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ)    IPA
ক    K
খ    KH
গ    G
ঘ    GH
ঙ    NG
চ    C
ছ    CH
জ    J
ঝ    JH
ঞ    NIO
ট    T
ঠ    TH
ড    D
ঢ    DH
ণ    N
ত    TA
থ    TO
দ    DA
ধ    DO
ন    N
প    P
ফ    PH
ব    B
ভ    BH
ম    M
য    Z
র    R
ল    L
শ    SH
ষ    SH
স    S
হ    H
ড়    RA
ঢ়    RH
য়    Y
ৎ    T``
     NG
     ^

Bangla Phoneme    IPA
     AA
     I
     II
     U
     UU
     RI
     E
     OI
     O
     OU

Bangla Phoneme ( )    IPA
ব-   W
য-   Y
র-   R
ম-   M
     RR

Bangla Phoneme (নাম্বার)    IPA
শূন্য   0
এক    1
দুই    2
তিন   3
চার   4
পাঁচ   5
ছয়    6
সাত   7
আট    8
নয়    9

Bangla Phoneme (যুক্তবর্ণ)    IPA

KK, KT, KT, KTR, KW, KM, KY, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR, GN, GDH, NGM,
CC, CCH, CCHW, CCHR, CNG, CY, GN, GNY, GW, GM, GY, GR, GL, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY,
KS, KHY, KHR, GN, GDH, NGM, CC, CCH, CCHW, CCHR, CNG, CY, JJ, JJW, JJH, GG, JW, JY, JR, NC,
NCH, NJ, NJH, TT, TT, TTW, TTH, TN, TW, TM, TMY, TY, TR, THW, THY, THR, DG, DGH, DD, DDW,
DDH, DW, DV, DM, DY, DR, NM, NY, NS, PT, PT, PN, PP, PY, PR, PL, PS, FR, FL, BJ,
BD, BDH, BB, BY, BR, BL, LT, LD, LDH, LP, LB, LV, LM, LY, LL, SHC, SHCH, SHT, SHN, SHW,
SHM, SHY, SHR, SHL, SHK, SHKR, SHT, SF, SW, SM, SY, SR, SL, SKL, HN, HN, HW, HM, HY, HR,
HL, HRRI, GHN, GHY, GHR, NK, NKY, NGKKH, NGKH, NGG, NGGY, NGGH, NGGHY, NGGHR, TW, TM, TY, TR, DD, DY,
DR, DHY, DHR, NT, NTH, ND, NDY, NDR, NDH, NN, NW, NM, NY, DHN, DHW, DHM, DHY, DHR, NT, NTH,
ND, NT, NTW, NTY, NTR, NTH, ND, NDY, NDW, NDR, NDH, NDHY, NDHR, NN, NW, VY, VR, VL, MTH, MN,
MP, MPR, MF, MB, MV, MVR, MM, MY, MR, ML, ZY, RRK, RRKY, LK, LG, SHTY, SHTR, SHTH, SHTHY, SHN,
SHP, SHPR, SPHY, SHW, SHM, SK, SKR, ST, STR, SKH, ST, STW, STY, STH, STHY, SN, SP

Fig 7.2 Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of time constraints only 50 of them
were used for training and recognition.

No  Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Walks the three-level training directory and writes one fileids entry per audio file.
public class FileidsCreator {
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing across folders, so k is never reset
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds sub-directory names to the list for the given level and file names to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            }
            if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at the given path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………

ARRAY

…………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {
    // Orders folder names by the number they contain. The names are rebuilt from the
    // letters of the first element, so all names must share the same prefix.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
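The sortList method orders folder names by the number embedded in them, but it rebuilds every name from the letters of the first element, so it only works when all folders share one prefix. A minimal alternative sketch, assuming each name contains at least one digit, sorts the original strings with a comparator instead. The class name NumericNameSort and the folder names in the comment are hypothetical and not part of the project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NumericNameSort {
    // Keeps only the digits of a name and parses them, e.g. "speaker_12" gives 12.
    // Assumption: every name contains at least one digit.
    static int numberIn(String name) {
        return Integer.parseInt(name.replaceAll("\\D", ""));
    }

    // Returns a new list ordered by the embedded number; the input list is not modified.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSort::numberIn));
        return sorted;
    }
}
```

Because the original strings are kept, folders with different prefixes survive the sort unchanged.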

…………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {
    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Loads every line of the corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {
    // Reads the input word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {
    // Builds the dictionary: one line per input word followed by its pronunciation.
    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {
    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter's id in banglatab lies in the consonant range 13 to 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48)
                    isBBorno = true;
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
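isBanjonBorno opens a database connection for every character it tests. As a hedged alternative, which is not the method used in this project, the same question can be answered from the Unicode code point alone, assuming the input is standard Unicode Bengali text. The class name BengaliChars is hypothetical, the ranges are taken from the Unicode Bengali block, and the result may differ for a few characters from the id range 13 to 48 of the banglatab table, whose contents are not listed in this report.

```java
final class BengaliChars {
    // True for Bengali consonant letters. The main run from KA (0995) to HA (09B9)
    // includes a few unassigned code points, which do not occur in valid text.
    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')
                || ch == '\u09CE'   // khanda ta
                || ch == '\u09DC'   // rra
                || ch == '\u09DD'   // rha
                || ch == '\u09DF';  // yya
    }
}
```

A check of this kind needs no database round trip, which matters because the pronunciation generator calls the test twice for almost every character of every word.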

…………………………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {
    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    char ConjuctsIdentifyCharacter = '\u09CD'; // hasanta, the sign that joins conjuncts
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        // record the position before, at, and after every conjunct sign
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel between them
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        // look up the pronunciation of every unit and join them with spaces
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
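getPronouciation tracks conjunct positions in fixed-size index arrays before it groups the characters. The grouping step alone can be sketched more compactly, assuming that conjuncts are joined by the hasanta sign U+09CD and leaving out the special handling of র and the insertion of the inherent vowel. The class name ConjunctSplitter is hypothetical and the sketch is not the project's algorithm.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {
    private static final char HASANTA = '\u09CD'; // assumed conjunct marker

    // Splits a word into lookup units: single characters, or consonant clusters
    // joined by hasanta (for example ja + hasanta + nya becomes one unit).
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // Absorb every "hasanta + next character" pair that follows.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }
}
```

Each returned unit could then be looked up in the letter table in the same way as the entries of SearchStrings.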

…………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {
    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {
    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Turns a recognized command into mouse clicks, key presses, or program launches.
public class CommandActivator {
    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {
    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand(bangla command);
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that matches the recognized Bengali command.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) { // the Bengali command phrase is not legible here
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
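giveCommand compares the recognized text against one phrase per if statement. A table-driven sketch, assuming each phrase maps to exactly one action, is given here as one possible way to keep the list of commands manageable. The class CommandTable is hypothetical, and in the project each registered action would wrap a CommandActivator call such as rightClick() and handle its AWTException.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandTable {
    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a recognized phrase with the action to run for it.
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action registered for the phrase; returns false when there is none.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

Adding a new voice command then means one register call instead of another branch, and an unrecognized phrase is reported through the return value rather than silently ignored.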


Limitation

Our project has limitations in several specific areas. The system was built on a small data set
because of time constraints. We selected one domain, university admission information for newly
admitted students, with 185 sentences. However, it was difficult to collect a large amount of
audio from 16 speakers in a short span of time, so we selected 50 of the 185 sentences for
training. More speakers are needed to obtain more accurate results. We also faced problems in
creating the dictionary file, because our Bengali phoneme list is not defined accurately and we
do not know the exact number of phonemes in Bengali, as different researchers report different
numbers. The performance of the system depends on the speaker's pronunciation, the environment,
and the microphone. It recognizes sentences accurately when the speaker utters them loudly and
clearly, and it sometimes fails when the speech is slow or the pronunciation is poor. We created
a program that generates the pronunciation of a Bengali word automatically, but this software
does not work properly because of an encoding problem. Because an accurate phoneme set is a
prerequisite for good pronunciations, we can expect good output from this software only once we
have an accurate phoneme inventory.
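The encoding problem mentioned above is a common failure when Bengali text is read with the platform default charset. The following is a minimal sketch of reading a word list with an explicit charset. It assumes the word list is stored as UTF-8, which the report does not state; the class name Utf8WordReader and the method readWords are hypothetical and are not part of the project code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8WordReader {

    /** Reads one word per line, decoding the bytes explicitly as UTF-8 (assumed encoding). */
    static List<String> readWords(InputStream in) {
        List<String> words = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Two Bengali words, written as Unicode escapes so the source file encoding does not matter.
        byte[] utf8 = "\u0986\u09AE\u09BF\n\u09A4\u09C1\u09AE\u09BF\n"
                .getBytes(StandardCharsets.UTF_8);
        List<String> words = readWords(new ByteArrayInputStream(utf8));
        System.out.println(words.size() + " words read");
    }
}
```

Passing the charset to InputStreamReader keeps the decoding independent of the operating system's default, which on many Windows installations is not UTF-8.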

Future Works

We have implemented Bengali speech recognition for a small data set. In the future we will
increase the data size to create a complete model, and we plan to improve its recognition
accuracy and enlarge its vocabulary. We have also developed software for making the dictionary
file, the fileids file, and the transcription file. We want to build a user-friendly,
stand-alone GUI application for writing Bengali. We also intend to develop a complete desktop
command application for Bengali. Training and creating a model still depend on the developer,
so we plan to build an automatic trainer that an ordinary user can operate. With this trainer,
a user will be able to train any sentence with its corresponding audio. We also want to
integrate this system into various document applications so that Bengali sentences can be
written just by uttering them. We want to build a voice-response application, in which a user
asks the software a question and the software gives the predefined answer. A good recognition
application needs a lot of audio, so we want to develop a website for collecting audio from
people and use that audio to enrich our recognition application.

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal
of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other
languages, but for our mother tongue, Bengali, the field is still at a beginner level. We
therefore tried to learn about this field and to develop some tools for recognizing Bengali.
Throughout this report we have discussed our objectives, the tools we used, and the process of
speech recognition. Our tools are still at a preliminary level. A good and complete recognition
application requires many improvements, such as a large training database and low-noise audio
from many speakers. No speech recognizer is yet 100% accurate. However, if we can meet the
requirements of a good recognizer and train our system more accurately, the results should be
good enough to achieve our goal.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus has 185 sentences in total, but because of the short time available we have used
only 50 of them for recognition.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates every file seen so far; k continues where it stopped
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds the entries of 'path' to the list for the given tree level:
    // directories for levels 1 and 2, plain files for level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at 'path'.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
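The directory walk above can also be written with java.nio.file. The following is a minimal sketch, not the project's implementation: it collects every .wav file under a training root and prints one entry per file, relative to the root's parent and without the extension. The class name FileIdsSketch, the helper demoTree, and the sample folder names are hypothetical. Entries are sorted as plain strings here, which differs from the numeric folder sort used in the project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileIdsSketch {

    /** One entry per .wav file under root, relative to root's parent, extension removed. */
    static List<String> fileIds(Path root) {
        Path absRoot = root.toAbsolutePath();
        Path base = absRoot.getParent(); // assumes root is not a filesystem root
        try (Stream<Path> paths = Files.walk(absRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".wav"))
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway tree: speaker1/s1.wav, speaker1/notes.txt, speaker2/s2.wav. */
    static Path demoTree() {
        try {
            Path root = Files.createTempDirectory("sbs_asr_train");
            Files.createDirectories(root.resolve("speaker1"));
            Files.createDirectories(root.resolve("speaker2"));
            Files.createFile(root.resolve("speaker1").resolve("s1.wav"));
            Files.createFile(root.resolve("speaker1").resolve("notes.txt"));
            Files.createFile(root.resolve("speaker2").resolve("s2.wav"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String id : fileIds(demoTree())) {
            System.out.println(id);
        }
    }
}
```

Files.walk visits the tree to any depth, so the three separate level lists are not needed in this variant.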

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "s1", "s2", "s10" by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of each name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // the non-numeric prefix is taken from the first name and reused for all names
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
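The same numeric ordering can be expressed with a comparator, which keeps each original name intact instead of rebuilding it from a shared prefix. This is a minimal sketch under the assumption that every name contains at most a short run of digits; the class name NumericNameSort and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NumericNameSort {

    /** Extracts the digits of a name as a number; names without digits count as 0. */
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a new list ordered by the numeric part of each name. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSort::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        // prints [s1, s2, s10]
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

Because the comparator only decides the order, names with different prefixes survive unchanged, and an empty input list simply yields an empty result.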

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file; corpus lines are reused cyclically.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> ("
                        + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream(
                    "C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter(
                    "C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if 'ch' is a Bengali consonant (banjonborno),
    // i.e. its id in table banglatab lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal was lost in extraction; the Bengali hasanta (U+09CD),
    // which joins consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions around every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // record the positions that do not belong to any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // look up the pronunciation of every search string in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
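The conjunct-grouping step of the generator can be illustrated in isolation, without the database. The following is a simplified sketch, not the project's algorithm: it assumes that a conjunct is any run of letters joined by the hasanta (U+09CD), and it groups such runs into one lookup unit while every other character becomes its own unit. The class name ConjunctSplitter and the method split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctSplitter {

    /** Bengali hasanta (virama), the sign that joins consonants into a conjunct. */
    static final char HASANTA = '\u09CD';

    /** Splits a word into lookup units; letters joined by a hasanta stay together. */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // absorb every following "hasanta + letter" pair into the same unit
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // U+09AC U+09BF U+09B6 U+09CD U+09AC is the word "বিশ্ব";
        // it splits into three units, the last one being the conjunct.
        List<String> units = split("\u09AC\u09BF\u09B6\u09CD\u09AC");
        System.out.println(units.size() + " units");
    }
}
```

Each resulting unit could then be looked up in a pronunciation table, which replaces the index arrays with a single pass over the word.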

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(
                    SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

Bengali Speech Recognition

88

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

Bengali Speech Recognition

89

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

    // Launches Notepad as a separate process (Windows only).
    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    // The methods below open a URL in the default browser through the
    // Windows URL protocol handler.
    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
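Most of the shortcut methods in CommandActivator repeat the same press-then-release sequence. A minimal sketch of how that pattern could be factored into one helper is given here. The names `KeySink`, `KeyChord`, `pressChord` and `record` are hypothetical and are not part of the project code. The sketch releases the keys in reverse order, which is a common convention, whereas the listing releases the modifier key first.

```java
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over java.awt.Robot so the chord logic can run without a display. */
interface KeySink {
    void keyPress(int keyCode);
    void keyRelease(int keyCode);
}

/** Hypothetical helper: presses the keys in order, then releases them in reverse order. */
final class KeyChord {
    private KeyChord() {}

    static void pressChord(KeySink sink, int... keyCodes) {
        for (int code : keyCodes) {
            sink.keyPress(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            sink.keyRelease(keyCodes[i]);
        }
    }

    /** Records the events as text instead of sending them to the operating system. */
    static List<String> record(int... keyCodes) {
        final List<String> events = new ArrayList<>();
        pressChord(new KeySink() {
            @Override public void keyPress(int keyCode) { events.add("press " + keyCode); }
            @Override public void keyRelease(int keyCode) { events.add("release " + keyCode); }
        }, keyCodes);
        return events;
    }

    public static void main(String[] args) {
        // Ctrl+C as a chord; with a real Robot the sink would forward to
        // robot.keyPress(...) and robot.keyRelease(...).
        System.out.println(record(KeyEvent.VK_CONTROL, KeyEvent.VK_C));
    }
}
```

With such a helper, a method like copy() would reduce to a single call that passes the two key codes, and the chord logic could be checked without moving the real keyboard focus.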

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()
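giveCommand compares the recognized text against one command string per branch. A sketch of the same idea as a lookup table is given here. The class `CommandTable`, its method names and the command strings "copy" and "paste" are placeholders chosen for illustration; the Bangla command strings of the project are not reproduced.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical dispatch table from recognized text to an action. */
final class CommandTable {
    private final Map<String, Runnable> actions = new HashMap<>();

    /** Associates a spoken phrase with the action to run when it is recognized. */
    void register(String spokenText, Runnable action) {
        actions.put(spokenText.trim(), action);
    }

    /** Runs the action for the recognized text; returns false if no command matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable();
        StringBuilder log = new StringBuilder();
        // In the real application the actions would call CommandActivator methods.
        table.register("copy", () -> log.append("copy;"));
        table.register("paste", () -> log.append("paste;"));
        table.dispatch("copy");
        table.dispatch("unknown words");
        System.out.println(log);
    }
}
```

Adding a new voice command then means registering one more entry rather than adding another if branch.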

Chapter 6

CONCLUSION & REFERENCES

Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese and other languages, but work on our mother tongue, Bengali, is still at a beginner level. We therefore set out to learn about this field and to develop some tools for recognizing the Bengali language. In this report we have discussed our objectives, the tools we used and the process of speech recognition. The tools we developed are still at a preliminary level. Building a good and complete recognition application requires many improvements, such as a large training database and low-noise audio from many speakers. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, the results should be good enough to achieve our goal.

References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signal", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and application perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh city, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUS Sphinx System", 2007.

APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^
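The consonant rows of the chart can be held in a lookup table. The sketch below is only an illustration under stated assumptions: the class `ConsonantChart` and the method `toSymbols` are hypothetical names, only seven rows of the chart (ক, খ, গ, ম, র, ল, স) are entered, and the inherent vowel and conjunct handling of the project's pronunciation generator are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup of phoneme symbols for a few consonants of the chart. */
final class ConsonantChart {
    private static final Map<Character, String> SYMBOLS = new HashMap<>();

    static {
        SYMBOLS.put('\u0995', "K");   // ka
        SYMBOLS.put('\u0996', "KH");  // kha
        SYMBOLS.put('\u0997', "G");   // ga
        SYMBOLS.put('\u09AE', "M");   // ma
        SYMBOLS.put('\u09B0', "R");   // ra
        SYMBOLS.put('\u09B2', "L");   // la
        SYMBOLS.put('\u09B8', "S");   // sa
    }

    /** Maps each letter to its chart symbol and joins the symbols with spaces. */
    static String toSymbols(String letters) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < letters.length(); i++) {
            String symbol = SYMBOLS.get(letters.charAt(i));
            if (symbol == null) {
                throw new IllegalArgumentException("No chart entry for U+"
                        + Integer.toHexString(letters.charAt(i)).toUpperCase());
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(symbol);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // ka followed by ma
        System.out.println(toSymbols("\u0995\u09AE"));
    }
}
```

The project itself keeps the chart in a database table; an in-memory map like this one is simply an easier form to experiment with.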

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the limited time available only 50 of them were used for recognition.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()

int lineCounts = 0

String dirTreeRootName = Ctrainnign_filesbs_asr_train

String root = getRootFromPath(dirTreeRootName)

listdir(dirTreeRootName1)

Collectionssort(dirTreeLevel1)

int sizeOfdirTreeLevel1 = dirTreeLevel1size()

int i = 0j=0k=0

while(sizeOfdirTreeLevel1gti)

String path2 = dirTreeRootName+dirTreeLevel1get(i)

dirTreeLevel2clear()

listdir(path22)

dirTreeLevel2 = sortObjsortList(dirTreeLevel2)

int sizeOfdirTreeLevel2 = dirTreeLevel2size()

String path3=

while(sizeOfdirTreeLevel2gtj)

path3 = path2++dirTreeLevel2get(j)+

listdir(path33)

while(dirTreeLevel3size()gtk)

String targetPath =

root++dirTreeLevel1get(i)++dirTreeLevel2get(j)++dirTreeLevel3get(k)

targetPath = targetPathreplaceAll(wav )

writIntoFile(targetPathCtrainnign_filesbs_asr_train+sbs_asr_trainfileids)

lineCounts++

k++

j++

file listing end

j=0

i++

transcriptCreator tcObj = new transcriptCreator()

try

tcObjreadCorpus(Ctrainnign_filecorpustxt)

tcObjCreateTransCriptFile(dirTreeLevel3

Ctrainnign_filesbs_asr_trainsbs_asr_traintranscript)

catch (FileNotFoundException e)

eprintStackTrace()

int si=0

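    // Lists one directory level: sub-directory names go to dirTreeLevel1 or
    // dirTreeLevel2 (depending on Level), plain file names go to dirTreeLevel3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            // path does not exist or is not a directory
            return;
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }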

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

    // Appends one line of text to the file at the given path.
    public static void writIntoFile(String data, String path) {
        try {
            // true = append mode, so each call adds a line instead of overwriting
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names of the form <prefix><number> by their numeric part.
    // All names are assumed to share the prefix of the first element.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int i = 0;
        // keep only the digits of each name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number, taken from the first element
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // rebuild the names in ascending numeric order
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
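sortList rebuilds every name from the prefix of the first element, so it relies on all folder names sharing that prefix. A sketch of an alternative that orders the original names by their numeric part with a comparator is shown here. The class `NumericNameSort`, its methods and the sample names are hypothetical and are not part of the project code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper that sorts names such as "speaker2" by the number they contain. */
final class NumericNameSort {
    private NumericNameSort() {}

    /** Returns the digits in the name as an int; a name without digits counts as 0. */
    static int numberIn(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a sorted copy; the input list and the names themselves are left unchanged. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSort::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList("speaker10", "speaker2", "speaker1");
        System.out.println(sortByNumber(folders));
    }
}
```

Because the names are never rebuilt, this variant also works when folders carry different prefixes or contain characters that the prefix pattern would strip.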

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file, one sentence per line, into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file: <S> sentence </S> (file id)
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // start again from the first sentence when the corpus is exhausted
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // literal replacement, so the dot is not treated as a regex wildcard
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
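Each line written by CreateTransCriptFile pairs one corpus sentence with the id of one audio file. The sketch below isolates that formatting step so it can be checked without touching the file system. The class `TranscriptLine`, the method `format` and the sample file name are hypothetical; the sentence markers follow the listing above.

```java
/** Hypothetical helper that builds a single transcript line. */
final class TranscriptLine {
    private TranscriptLine() {}

    /** Formats one line as: <S> sentence </S> (file id without the .wav extension). */
    static String format(String sentence, String wavFileName) {
        String fileId = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - ".wav".length())
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    public static void main(String[] args) {
        // "utt_01.wav" is a made-up file name used only for this demonstration.
        System.out.println(format("  hello world ", "utt_01.wav"));
    }
}
```

Stripping the extension only at the end of the name avoids altering file names that happen to contain the letters "wav" elsewhere.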

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and the connection, reporting any failure.
    private static void closeQuietly(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
    }

    // Returns true when the letter's id in banglatab lies in the
    // consonant range 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
        return isBBorno;
    }
} // class end
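isBanjonBorno opens a database connection for every character it tests. As an illustration only, the same question can be answered from the Unicode code point of the letter. This is a swapped-in technique, not the project's method, and the set it accepts is an assumption: the Bengali block range from ka (U+0995) to ha (U+09B9), the letters rra, rha and yya (U+09DC, U+09DD, U+09DF) and khanda ta (U+09CE). It is not guaranteed to match ids 13 to 48 of the banglatab table. The class and method names are hypothetical.

```java
/** Hypothetical, database-free consonant test based on Unicode code points. */
final class BanglaLetters {
    private BanglaLetters() {}

    static boolean isConsonant(char ch) {
        // ka .. ha; Character.isLetter filters out unassigned code points in the range
        boolean inMainRange = ch >= '\u0995' && ch <= '\u09B9' && Character.isLetter(ch);
        // rra, rha, yya and khanda ta lie outside the main range
        boolean isExtra = ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF' || ch == '\u09CE';
        return inMainRange || isExtra;
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // ka
        System.out.println(isConsonant('\u0986')); // the independent vowel aa
    }
}
```

A check of this kind removes one database round trip per character, at the cost of fixing the consonant set in code instead of in the table.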

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans
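getPronouciation first marks the positions around every conjunct marker and then joins the marked letters into one search string. The value of ConjuctsIdentifyCharacter is not shown in the listing; the sketch below assumes it is the hasanta (U+09CD), the sign that joins consonants in Bangla text. The class `ConjunctSplitter` and the method `clusters` are hypothetical, and the sketch covers only the grouping step, not the inherent-vowel insertion or the database lookup.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter that keeps letters joined by a hasanta in one cluster. */
final class ConjunctSplitter {
    /** Assumed conjunct marker: BENGALI SIGN VIRAMA (hasanta). */
    static final char HASANTA = '\u09CD';

    private ConjunctSplitter() {}

    static List<String> clusters(String word) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            // a character stays in the current cluster if it is the marker
            // itself or directly follows the marker
            boolean joinsPrevious = ch == HASANTA
                    || (i > 0 && word.charAt(i - 1) == HASANTA);
            if (!joinsPrevious && current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
            current.append(ch);
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // ba, i-kar, sha + hasanta + ba
        System.out.println(clusters("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

Each resulting cluster is either a single character or a whole conjunct, which is the unit the pronunciation table is searched by.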

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsrconfig.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which
        // case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    // Mouse commands: a click is a press followed by a release at the
    // current pointer position.
    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    // Keyboard commands: each method sends one shortcut to the focused window.
    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

Bengali Speech Recognition

88

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

Bengali Speech Recognition

89

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

public void openNotepad() throws AWTException IOException

ProcessBuilder proc=new ProcessBuilder(notepadexe)

Process p=procstart()

public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

Bengali Speech Recognition

90

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)
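Most methods in CommandActivator repeat the same press-and-release pattern. The following is only a sketch of how that pattern could be factored out, not part of the project code. The class name KeyChord and its two methods are hypothetical. It presses the keys in the given order and releases them in reverse order, which is the usual convention for shortcuts.

```java
import java.awt.Robot;
import java.util.ArrayList;
import java.util.List;

final class KeyChord {

    /** Lists the steps for a shortcut: press keys in order, release in reverse order. */
    static List<String> sequence(int... keyCodes) {
        List<String> steps = new ArrayList<>();
        for (int code : keyCodes) {
            steps.add("press " + code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            steps.add("release " + keyCodes[i]);
        }
        return steps;
    }

    /** Sends the shortcut through an existing Robot (requires a desktop session). */
    static void send(Robot robot, int... keyCodes) {
        for (int code : keyCodes) {
            robot.keyPress(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            robot.keyRelease(keyCodes[i]);
        }
    }

    public static void main(String[] args) {
        // 17 and 67 are the values of KeyEvent.VK_CONTROL and KeyEvent.VK_C
        System.out.println(sequence(17, 67));
    }
}
```

With such a helper, copy() would reduce to a single call such as KeyChord.send(robot, KeyEvent.VK_CONTROL, KeyEvent.VK_C).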

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // Robot robot = new Robot();
        // robot.delay(2000);
        // giveCommand(bangla command);
        // robot.delay(3000);
        // printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // String comString = "";
        // System.out.println("comString: " + comString + " length: " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Maps a recognized Bangla phrase to a desktop action
    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


Conclusion

Speech is the primary and most convenient means of communication between people. A great deal of ASR research is being carried out for English, Hindi, Urdu, Arabic, Japanese, and other languages, but work on our mother tongue, Bengali, is still at a beginner level. We therefore set out to learn about this field and to develop some tools for recognizing Bengali speech. Throughout this report we have discussed our objectives, the tools we used, and the speech recognition process.

The tools we developed are still at a preliminary level. A good and complete recognition application requires many improvements, such as a large training database and recordings from many speakers with low noise. No speech recognizer is yet 100% accurate, but if we can meet the requirements of a good recognizer and train our system more accurately, its results should be good enough to achieve our goal.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt. College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ, consonants)   IPA
ক    K
খ    KH
গ    G
ঘ    GH
ঙ    NG
চ    C
ছ    CH
জ    J
ঝ    JH
ঞ    NIO
ট    T
ঠ    TH
ড    D
ঢ    DH
ণ    N
ত    TA
থ    TO
দ    DA
ধ    DO
ন    N
প    P
ফ    PH
ব    B
ভ    BH
ম    M
য    Z
র    R
ল    L
শ    SH
ষ    SH
স    S
হ    H
ড়    RA
ঢ়    RH
য়    Y
ৎ    T``

NG

^
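The consonant chart above can be read as a lookup table from a Bangla letter to its phone code. The following is a minimal sketch of such a lookup, assuming a hypothetical class PhoneChart that holds only six rows of the chart. It is not the project's implementation, which stores the table in a database, and it ignores vowels and conjuncts entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PhoneChart {

    // A few rows copied from the consonant chart; the full chart has many more.
    private static final Map<Character, String> PHONES = new LinkedHashMap<>();
    static {
        PHONES.put('\u0995', "K");   // ক
        PHONES.put('\u0996', "KH");  // খ
        PHONES.put('\u0997', "G");   // গ
        PHONES.put('\u09AE', "M");   // ম
        PHONES.put('\u09B2', "L");   // ল
        PHONES.put('\u09B8', "S");   // স
    }

    /** Converts each known letter to its phone code, separated by spaces. */
    static String toPhones(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            String phone = PHONES.get(c);
            if (phone == null) {
                continue; // letters outside this partial table are skipped
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phone);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPhones("\u0995\u09AE\u09B2")); // prints "K M L"
    }
}
```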

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার, numbers)   IPA
শূন্য    0
এক     1
দুই     2
তিন    3
চার    4
পাঁচ    5
ছয়     6
সাত    7
আট     8
নয়     9

Bangla Phoneme (যুক্তবর্ণ, conjuncts) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

The corpus contains 185 sentences in total, but because of the short time available we recognized only 50 of them.

No.  Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Walks the training directory tree (three levels deep) and writes the .fileids
// list and the transcript file used for acoustic model training.
public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates every file seen so far; k continues from the last file written
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // literal replace, so the dot is not treated as a regex wildcard
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds sub-directory names to the level 1 or level 2 list, and file names to the level 3 list
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names by their numeric part. All names are expected to share
    // the same alphabetic prefix, which is taken from the first entry.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list
        }
        int i = 0;
        // keep only the digits of each name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // rebuild the names as prefix + number, in ascending numeric order
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
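SortArrayList orders folder names by their numeric part, so that a folder numbered 2 comes before one numbered 10. The same ordering can be sketched with a comparator, which keeps the original names intact instead of rebuilding them from a prefix. This is an alternative illustration only. The class name NumericSuffixSort and the folder names s1, s2, s10 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NumericSuffixSort {

    /** Extracts the digits of a name as a number; names without digits count as 0. */
    static int numericPart(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a copy of the names ordered by their numeric part. */
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericSuffixSort::numericPart));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("s10", "s2", "s1"))); // prints [s1, s2, s10]
    }
}
```

A plain Collections.sort would place s10 before s2 because it compares the names character by character.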

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file into inputLines, one sentence per line
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus
    // sentences when there are more files than sentences
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
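Each transcript line produced by CreateTransCriptFile pairs one corpus sentence with the name of an audio file, with the file extension removed. The sketch below shows that line format in isolation. The class name TranscriptLine and the file name s1_01.wav are hypothetical, and the uppercase tags simply follow the listing.

```java
final class TranscriptLine {

    /** Builds "<S> sentence </S> (fileId)" from a sentence and a .wav file name. */
    static String format(String sentence, String wavName) {
        String fileId = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - ".wav".length())
                : wavName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    public static void main(String[] args) {
        // prints "<S> hello world </S> (s1_01)"
        System.out.println(format("  hello world ", "s1_01.wav"));
    }
}
```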

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one word per line
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished pronunciation dictionary
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    // Builds the dictionary: one "word pronunciation" line per input word
    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees "
                    + "where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is a consonant, i.e. its id in banglatab is between 13 and 48
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab "
                    + "where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // hasanta (U+09CD), the sign that joins consonants into a conjunct
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions before, at, and after every hasanta
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition:");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // record every position that is not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel between them
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // collect consecutive conjunct positions into one search string
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // look up the phone code of every search string in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                // skip entries that have no row in the table
                if (rs.next()) {
                    String pro = rs.getString("pro");
                    phoneticTrans = phoneticTrans + pro + " ";
                }
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
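The generator above locates conjuncts by recording the positions around each joining sign. The same idea can be sketched more compactly by scanning the word once and attaching a joining sign and the letter after it to the preceding unit. This is a simplified illustration under the assumption that the joining sign is the hasanta (U+09CD). The class name ConjunctSplitter is hypothetical, and the sketch does not reproduce the special handling of র or the database lookup.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {

    private static final char HASANTA = '\u09CD';

    /** Splits a word into units, keeping consonant + hasanta + consonant together. */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean joins = c == HASANTA && current.length() > 0 && i + 1 < word.length();
            if (joins) {
                // extend the current unit with the hasanta and the following letter
                current.append(c).append(word.charAt(i + 1));
                i++;
            } else {
                if (current.length() > 0) {
                    units.add(current.toString());
                }
                current = new StringBuilder().append(c);
            }
        }
        if (current.length() > 0) {
            units.add(current.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // বিজ্ঞান is split into five units, the third being the conjunct জ্ঞ
        System.out.println(split("\u09AC\u09BF\u099C\u09CD\u099E\u09BE\u09A8"));
    }
}
```

Each unit could then be looked up in the phone table as one search string.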

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // String comString = "";
        // System.out.println("comString: " + comString + " length: " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // "porer window | ager window" (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // "porer tab | ager tab" (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
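Most methods of CommandActivator repeat the same press-then-release pattern. The following is only a sketch of how that pattern could be factored into one helper. The names KeyCombo, tap and trace are hypothetical and do not appear in the project; the helper takes the press and release actions as parameters so that it can be exercised without a real keyboard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class KeyCombo {

    /**
     * Presses the given key codes in order, then releases them in reverse
     * order, which is the usual way to send a shortcut such as Ctrl+C.
     */
    public static void tap(IntConsumer press, IntConsumer release, int... keyCodes) {
        for (int code : keyCodes) {
            press.accept(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            release.accept(keyCodes[i]);
        }
    }

    /** Records the order of events instead of sending them to the desktop. */
    public static List<String> trace(int... keyCodes) {
        List<String> events = new ArrayList<>();
        tap(code -> events.add("press " + code),
            code -> events.add("release " + code),
            keyCodes);
        return events;
    }
}
```

With a java.awt.Robot instance, the copy command could then be written as KeyCombo.tap(robot::keyPress, robot::keyRelease, KeyEvent.VK_CONTROL, KeyEvent.VK_C). Note that the original methods release the modifier key first; releasing in reverse order is a design choice of this sketch.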

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // Allocate the recognizer.
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // Start the microphone, or exit the program if this is not possible.
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);

        printInstructions();
        // giveCommand();

        // Loop the recognition until the program exits.
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase compared here is not legible in the source.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
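The giveCommand method compares the recognized text against a command phrase and then calls one method of CommandActivator. As the number of commands grows, a lookup table may be easier to maintain than a chain of if statements. The sketch below shows one possible shape; CommandDispatcher, register and dispatch are hypothetical names, and the phrase used in the usage note is a placeholder, not one of the project's Bengali commands.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommandDispatcher {

    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    /** Registers the action to run when exactly this phrase is recognized. */
    public void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the matching action; returns false when the phrase is unknown. */
    public boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

A command could be registered as dispatcher.register("right click", action), where the action wraps the CommandActivator call. Because those methods throw the checked AWTException, the lambda would need its own try/catch.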


References

[1] M. A. Anusuya & S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[2] Morched Derbali, Mu'tasem Jarrah, Mohd Taib Wahid, "A Review of Speech Recognition with Sphinx Engine in Language Detection", Journal of Theoretical and Applied Information Technology, Vol. 40, No. 2, 2005 - 2012.

[3] L. Rabiner & B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.

[4] Daniel Jurafsky and James H. Martin, "An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2000.

[5] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.

[6] A. K. M. Mahmudul Hoque, "Bengali Segmented Automatic Speech Recognition", BRACU, 2006.

[7] Abul Hasanat, Md. Rezaul Karim, Md. Shahidur Rahman and Md. Zafar Iqbal, "Recognition of Spoken Letters in Bangla", banglacomputing.net, 2002.

[8] Md. Abul Hasnat, Jabir Mowla, Mumit Khan, "Isolated and Continuous Bangla Speech Recognition: Implementation, Performance and Application Perspective", BRACU, 2007.

[9] Shammur Absar Chowdhury, "Implementation of Speech Recognition System for Bangla", BRACU, August 2010.

[10] Qqbal SO Shahzad, "Speech Recognition System", Iqra University, March 2009.

[11] Tran Viet Khai, "Sphinx4 Adaptation to Vietnamese Language: Vietnamese Automatic Digit Recognition", Bo Xuan Tu, Hochiminh City, Vietnam, 2008.

[12] Sadaoki Furui, "Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech", IEEE, 2004.

[13] M. S. Islam, "Research on Bangla Language Processing in Bangladesh: Progress and Challenges", BUET, 2009.

[14] P. Foster, T. Schalk, "Speech Recognition: The Complete Practical Reference Guide", 1993, ISBN 0936648392.

[15] H. Satori, M. Harti and N. Chenfour, "Introduction to Arabic Speech Recognition Using CMUSphinx System", 2007.


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles


Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ)  IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^

Bangla Phoneme  IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( )  IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার)  IPA

শূন্য  0

এক  1

দুই  2

তিন  3

চার  4

পাঁচ  5

ছয়  6

সাত  7

আট  8

নয়  9


Bangla Phoneme (যুক্তবর্ণ)  IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP
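The chart can be read as a lookup table from a Bangla letter to its phoneme symbol. As a rough illustration only, the sketch below encodes six consonant rows taken from the chart (K, KH, G, GH, M, L) in a map and transcribes a word letter by letter. The class name PhonemeChart and the method transcribe are hypothetical, and the sketch deliberately ignores vowel signs, inherent vowels and conjuncts, which the project handles separately.

```java
import java.util.HashMap;
import java.util.Map;

public class PhonemeChart {

    // A few consonant rows copied from the chart, keyed by Unicode code point.
    private static final Map<Character, String> CONSONANTS = new HashMap<>();
    static {
        CONSONANTS.put('\u0995', "K");   // ka
        CONSONANTS.put('\u0996', "KH");  // kha
        CONSONANTS.put('\u0997', "G");   // ga
        CONSONANTS.put('\u0998', "GH");  // gha
        CONSONANTS.put('\u09AE', "M");   // ma
        CONSONANTS.put('\u09B2', "L");   // la
    }

    /** Returns the symbols of all known letters, separated by single spaces. */
    public static String transcribe(String word) {
        StringBuilder out = new StringBuilder();
        for (char ch : word.toCharArray()) {
            String symbol = CONSONANTS.get(ch);
            if (symbol == null) {
                continue; // letters outside this small table are skipped
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(symbol);
        }
        return out.toString();
    }
}
```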

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we recognized only 50 of them.

No  Sentence


1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);

        // Level 1: one directory per speaker, sorted by name.
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;

        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";

            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates every file seen so far, so k keeps counting.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Literal replace: a regex ".wav" would also match other characters.
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }

        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    /** Adds sub-directory names (levels 1 and 2) or file names (level 3) to the matching list. */
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    /** Returns the last directory name of a path that ends with a backslash. */
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    /** Appends one line to the given file. */
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
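FileidsCreator walks a three-level directory tree by hand and writes one extension-free relative path per audio file. The same listing can be sketched with the java.nio.file API, as below. This is an illustration under assumptions, not the project's code: FileIdsSketch and collect are hypothetical names, the directory may have any depth, and the result is sorted alphabetically rather than by the numeric folder order used in the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileIdsSketch {

    /**
     * Lists every .wav file under root and returns its path relative to
     * root's parent directory, without the extension, using '/' separators.
     * Assumes root has a parent directory.
     */
    public static List<String> collect(Path root) throws IOException {
        Path absoluteRoot = root.toAbsolutePath();
        Path base = absoluteRoot.getParent();
        try (Stream<Path> walk = Files.walk(absoluteRoot)) {
            return walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Each returned string corresponds to one line of a .fileids file, for example sbs_asr_train/speaker_1/s1/a for a hypothetical recording a.wav.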

........................................................................

ARRAY

........................................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    /**
     * Sorts folder names such as "name1", "name10", "name2" by their number.
     * All names are assumed to share the prefix of the first element.
     */
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list
        }

        // Keep only the digits of every name.
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }

        // Convert the digits to integers and sort them.
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);

        // Rebuild the names from the common prefix and the sorted numbers.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                    folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
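SortArrayList orders names by the number they contain and then rebuilds each name from the prefix of the first element, which only works when every name shares that prefix. A comparator gives the same numeric ordering while leaving the original names untouched. The sketch below shows the idea; NumericSuffixSort, numberIn and sort are hypothetical names that are not part of the project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {

    /** Returns the digits of a name as a number; names without digits give -1. */
    static long numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Long.parseLong(digits);
    }

    /** Returns a copy sorted by embedded number, with ties broken alphabetically. */
    public static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingLong(NumericSuffixSort::numberIn)
                .thenComparing(Comparator.<String>naturalOrder()));
        return sorted;
    }
}
```

Unlike a plain alphabetical sort, this places a name ending in 2 before one ending in 10.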


........................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    /** Reads the corpus file and stores one sentence per line in inputLines. */
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    /**
     * Writes one transcript line per audio file. The corpus sentences are
     * reused cyclically: after the last sentence, the next file starts
     * again at the first sentence.
     */
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // Literal replace: a regex ".wav" would also match other characters.
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> "
                        + "(" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream.
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
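Each line written by CreateTransCriptFile pairs one corpus sentence with the identifier of the audio file that contains it. The sketch below isolates that formatting step so that it can be checked on its own. TranscriptLine and format are hypothetical names; the lowercase s tags follow the convention commonly used for SphinxTrain transcripts, whereas the listing above writes uppercase tags.

```java
public class TranscriptLine {

    /**
     * Builds one transcript line of the form
     * "<s> sentence </s> (fileId)" from a sentence and a file name.
     */
    public static String format(String sentence, String wavFileName) {
        // Strip the extension only when it is really a trailing ".wav".
        String fileId = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - ".wav".length())
                : wavFileName;
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }
}
```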

........................................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    /** Reads the word list, one trimmed word per line. */
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    /** Writes the finished pronunciation dictionary. */
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

........................................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");

        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word, a space, its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    /** Closes the statement and the connection, reporting any SQL error. */
    private static void close(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            close(stmt, con);
        }
    }

    /** Returns true if the letter is a consonant, i.e. its id in banglatab is 13 to 48. */
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            close(stmt, con);
        }
        return isBBorno;
    }
} // class end
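isBanjonBorno opens a database connection for every character it classifies. For the plain question "is this letter a consonant?", the Unicode code chart for Bengali may be enough on its own. The sketch below is an alternative under that assumption and is not the project's method: BengaliLetters and isConsonant are hypothetical names, and the test uses the Unicode ranges of the Bengali block rather than the ids stored in the banglatab table.

```java
public class BengaliLetters {

    /**
     * Returns true for Bengali consonant letters: the range from ka (U+0995)
     * to ha (U+09B9), plus khanda ta (U+09CE) and the letters rra (U+09DC),
     * rha (U+09DD) and yya (U+09DF).
     */
    public static boolean isConsonant(char ch) {
        if (ch >= 0x0995 && ch <= 0x09B9) {
            // The range contains a few unassigned code points; isLetter rejects them.
            return Character.isLetter(ch);
        }
        return ch == 0x09CE || ch == 0x09DC || ch == 0x09DD || ch == 0x09DF;
    }
}
```

Vowels, vowel signs and the hasanta (U+09CD) all fall outside these ranges, so they are reported as non-consonants.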

........................................................................

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct


i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2


SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null


Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans
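The generator above first records the positions around every conjunct marker and then groups the joined letters into one search string. A compact way to express that grouping is sketched below. It assumes that the marker character, whose literal is not legible in the listing, is the Bengali hasanta (U+09CD); ConjunctSplitter and split are hypothetical names, and the special handling of the letter ra in the original is not reproduced.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctSplitter {

    // Assumption: conjuncts are marked by the hasanta sign between consonants.
    private static final char HASANTA = '\u09CD';

    /**
     * Splits a word into units. Letters joined by a hasanta stay together
     * as one conjunct unit; every other character becomes its own unit.
     */
    public static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            current.append(ch);
            boolean joinsNext = ch == HASANTA
                    || (i + 1 < word.length() && word.charAt(i + 1) == HASANTA);
            if (!joinsNext) {
                units.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            units.add(current.toString()); // word ended with a hasanta
        }
        return units;
    }
}
```

Each resulting unit can then be looked up as one key, which is what the SearchStrings array is used for in the listing.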

........................................................................

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // Allocate the recognizer.
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // Start the microphone, or exit the program if this is not possible.
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // Loop the recognition until the program exits.
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}


Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
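Most methods of CommandActivator press a group of keys and then release them. The following is a minimal sketch of that pattern, not part of the project: the class `KeyChord` and its method `sequence` are hypothetical names introduced here. It only computes the order of events (releases in reverse order of the presses, a common convention for modifier shortcuts), so it can be checked without a display; each event would then be passed to `Robot.keyPress` or `Robot.keyRelease`.

```java
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class KeyChord {

    /** Returns "press <code>" for every key in order, then "release <code>" in reverse order. */
    static List<String> sequence(int... keyCodes) {
        List<String> events = new ArrayList<>();
        for (int code : keyCodes) {
            events.add("press " + code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            events.add("release " + keyCodes[i]);
        }
        return events;
    }

    public static void main(String[] args) {
        // The Ctrl+C chord used by copy()
        System.out.println(sequence(KeyEvent.VK_CONTROL, KeyEvent.VK_C));
    }
}
```

With such a helper, each shortcut method would reduce to one line that names its keys, and a single Robot instance could be shared instead of creating one per call.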

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))


allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )


private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()
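The giveCommand method compares the recognized text with a command phrase and then calls the matching CommandActivator method. As the number of commands grows, a lookup table is one way to keep this readable. The sketch below is an assumption about how that could look, not the project's code: `CommandDispatcher`, `register` and `dispatch` are hypothetical names, and the English phrase stands in for the Bangla phrases used in the project.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatcher, for illustration only.
final class CommandDispatcher {

    private final Map<String, Runnable> commands = new HashMap<>();

    /** Associates a spoken phrase with an action. */
    void register(String phrase, Runnable action) {
        commands.put(phrase.trim(), action);
    }

    /** Runs the action registered for the phrase; returns false if none matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder phrase; a real action would call a CommandActivator method.
        dispatcher.register("right click", () -> System.out.println("rightClick() would run here"));
        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```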


APPENDICES

Speaker Profile

Speaker ID  Name     Age  Gender  District      Environment  Institution/Other
02          Bappy    23   Male    Sylhet        Closed Room  Leading University
03          Bijoy    23   Male    Moulovibazar  Closed Room  MC College
04          Dola     24   Female  Chittagong    Department   Leading University
05          Falguny  24   Female  Sylhet        Open Space   Leading University
06          Lovely   20   Female  Sylhet        Class Room   Lotifa Shofi Chowdhury Mohila College
07          Mazed    23   Male    Moulovibazar  Closed Room  Leading University
08          Moni     23   Female  Sylhet        Lab          Leading University
09          Pinku    23   Male    Sylhet        Closed Room  Sylhet Govt College
10          Polash   24   Male    Feni          Cafeteria    Leading University
11          Pritom   20   Male    Sylhet        Closed Room  Leading University
12          Razib    23   Male    Sylhet        Cafeteria    Leading University
13          Rumi     22   Female  Sylhet        Lab          Leading University
14          Sanjoy   23   Male    Sylhet        Closed Room  Leading University
15          Shimul   23   Male    Sylhet        Closed Room  Leading University
16          Sumit    22   Male    Shunamgonj    Closed Room  Madan Muhon College

Fig 7.1 Speaker Profiles

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ) IPA

ক K

খ KH

গ G

ঘ GH

ঙ NG

চ C

ছ CH

জ J

ঝ JH

ঞ NIO

ট T

ঠ TH

ড D

ঢ DH

ণ N

ত TA

থ TO

দ DA

ধ DO

ন N

প P

ফ PH


ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^
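The chart above is in effect a lookup table from Bangla letters to phoneme symbols. A minimal sketch of such a lookup follows, assuming a hypothetical class `PhonemeChart` that holds only a handful of rows copied from the chart; the project itself reads the mapping from a database table. Letters are written as Unicode escapes so that the example does not depend on the source file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory excerpt of the chart, for illustration only.
final class PhonemeChart {

    private static final Map<Character, String> CHART = new LinkedHashMap<>();

    static {
        CHART.put('\u0995', "K");   // ka
        CHART.put('\u0996', "KH");  // kha
        CHART.put('\u0997', "G");   // ga
        CHART.put('\u09AE', "M");   // ma
        CHART.put('\u09B2', "L");   // la
    }

    /** Maps every character to its symbol; characters missing from the excerpt become "?". */
    static String transliterate(String word) {
        StringBuilder sb = new StringBuilder();
        for (char ch : word.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(CHART.getOrDefault(ch, "?"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("\u0995\u09AE\u09B2"));
    }
}
```

This per-letter lookup ignores inherent vowels and conjuncts, which the pronunciation generator in the code section handles separately.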

Bangla Phoneme IPA

AA

I

II

U

UU

RI


E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total; because of the short time available, only 50 of them have been recognized.

No Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে


40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()

int lineCounts = 0

String dirTreeRootName = Ctrainnign_filesbs_asr_train

String root = getRootFromPath(dirTreeRootName)

listdir(dirTreeRootName1)

Collectionssort(dirTreeLevel1)

int sizeOfdirTreeLevel1 = dirTreeLevel1size()

int i = 0j=0k=0

while(sizeOfdirTreeLevel1gti)

String path2 = dirTreeRootName+dirTreeLevel1get(i)

dirTreeLevel2clear()

listdir(path22)

dirTreeLevel2 = sortObjsortList(dirTreeLevel2)

int sizeOfdirTreeLevel2 = dirTreeLevel2size()

String path3=

while(sizeOfdirTreeLevel2gtj)

path3 = path2++dirTreeLevel2get(j)+

listdir(path33)

while(dirTreeLevel3size()gtk)

String targetPath =

root++dirTreeLevel1get(i)++dirTreeLevel2get(j)++dirTreeLevel3get(k)


targetPath = targetPathreplaceAll(wav )

writIntoFile(targetPathCtrainnign_filesbs_asr_train+sbs_asr_trainfileids)

lineCounts++

k++

j++

file listing end

j=0

i++

transcriptCreator tcObj = new transcriptCreator()

try

tcObjreadCorpus(Ctrainnign_filecorpustxt)

tcObjCreateTransCriptFile(dirTreeLevel3

Ctrainnign_filesbs_asr_trainsbs_asr_traintranscript)

catch (FileNotFoundException e)

eprintStackTrace()

int si=0

public static void listdir(String pathint Level)

File folder = new File(path)

File[] listOfFiles = folderlistFiles()

int numofL_I = 0

int numOfL = listOfFileslength

while(numOfLgtnumofL_I)

if (listOfFiles[numofL_I]isDirectory())

if(Level == 1)

dirTreeLevel1add(listOfFiles[numofL_I]getName())

else if(Level == 2)

dirTreeLevel2add(listOfFiles[numofL_I]getName())

else


if (listOfFiles[numofL_I]isFile())

dirTreeLevel3add(listOfFiles[numofL_I]getName())

numofL_I++

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

public static void writIntoFile(String dataString path)

try

FileWriter fstream = new FileWriter(pathtrue)

BufferedWriter out = new BufferedWriter(fstream)

outwrite(data+n)

outclose()

catch(IOException e)

eprintStackTrace()
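FileidsCreator walks three directory levels by hand and writes one entry per recording, each entry being the path from the training root without the .wav extension. The same listing can be sketched with `java.nio.file.Files.walk`. This is an illustrative alternative, not the project's code: the class `FileIdsSketch` and its method `fileIds` are hypothetical, and the result is sorted as plain strings, which differs from the numeric ordering applied by SortArrayList.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical alternative to the hand-written directory walk, for illustration only.
final class FileIdsSketch {

    /** Lists every .wav file under root, relative to root's parent, without the extension. */
    static List<String> fileIds(Path root) throws IOException {
        Path abs = root.toAbsolutePath().normalize();
        // Entries start with the root folder's own name, as in the listing above.
        Path base = abs.getParent() != null ? abs.getParent() : abs;
        try (Stream<Path> paths = Files.walk(abs)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String id : fileIds(root)) {
            System.out.println(id);
        }
    }
}
```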

…………………………………………………………

ARRAY

…………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names that share one prefix (for example s1, s2, s10) by their number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        // nothing to sort; also avoids get(0) on an empty list below
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int i = 0;
        // keep only the digits of every name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // the name without its number is taken from the first entry only
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // rebuild the names in ascending numeric order
        while (unsortList.size() > i) {
            String requiredString =
                    folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
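SortArrayList orders names such as s1, s2, s10 by their numeric part and rebuilds every name from the prefix of the first entry, so it relies on all names sharing one prefix. A comparator-based sketch that sorts the original strings without rebuilding them is shown below. The class `NumericSuffixSort` and its methods are hypothetical names used for illustration, and treating names without digits as -1 is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator-based variant, for illustration only.
final class NumericSuffixSort {

    /** Returns the digits of a name as a number; names without digits count as -1. */
    static long numberOf(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1L : Long.parseLong(digits);
    }

    /** Returns a copy of the names ordered by their numeric part; the names stay unchanged. */
    static List<String> sort(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingLong(NumericSuffixSort::numberOf));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList("s10", "s2", "s1")));
    }
}
```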

…………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file, one sentence per line, into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Appends one transcript line per recording name in data to the file at path.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // start again from the first sentence when the corpus is exhausted
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // literal replacement, so that "." is not read as a regex wildcard
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> ("
                        + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
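The transcript creator pairs every recording with a corpus sentence and wraps back to the first sentence once the corpus is exhausted. The pairing can be stated compactly with a modulo index, as in the sketch below. `TranscriptSketch` and its methods are hypothetical names for illustration; the line format follows the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical restatement of the pairing rule, for illustration only.
final class TranscriptSketch {

    /** Formats one line as "<S> sentence </S> (fileId)", dropping a trailing ".wav". */
    static String transcriptLine(String sentence, String wavName) {
        String fileId = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - ".wav".length())
                : wavName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    /** Recording i is paired with sentence (i mod corpus size). */
    static List<String> transcript(List<String> sentences, List<String> wavNames) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("corpus is empty");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            lines.add(transcriptLine(sentences.get(i % sentences.size()), wavNames.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("first sentence", "second sentence");
        List<String> wavs = Arrays.asList("f1.wav", "f2.wav", "f3.wav");
        for (String line : transcript(sentences, wavs)) {
            System.out.println(line);
        }
    }
}
```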

…………………………………………………………

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)


public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

…………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one dictionary line per input word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    // load the JDBC driver once, when the class is first used
    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter is stored in banglatab with an id from 13 to 48,
    // the rows this method treats as banjon borno (consonants).
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter= '" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
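isBanjonBorno opens a database connection for every character it checks. Where only the consonant test is needed, the check can also be approximated from the Unicode code point alone. The sketch below is an alternative technique, not the authors' method: `BanglaLetters.isConsonant` is a hypothetical name, and it is an assumption that the banglatab rows with ids 13 to 48 cover the same letters as these ranges, so the two checks may disagree on individual letters.

```java
// Hypothetical Unicode-range check, for illustration only.
final class BanglaLetters {

    /** True for Bengali consonant letters: U+0995..U+09B9 (assigned letters), U+09CE, U+09DC, U+09DD, U+09DF. */
    static boolean isConsonant(char ch) {
        if (ch >= '\u0995' && ch <= '\u09B9') {
            // isLetter filters out the unassigned code points inside the range
            return Character.isLetter(ch);
        }
        return ch == '\u09CE' || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // ka
        System.out.println(isConsonant('\u0986')); // independent vowel aa
    }
}
```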

…………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++


ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct


i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2


SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null


Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans
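The pronunciation generator first locates the conjuncts of a word, then looks up every conjunct or single letter in banglatab. The grouping step can be sketched on its own, under the assumption that the conjunct-identifying character is the hasanta (U+09CD), whose literal is not legible in the listing. `ConjunctSplitter` and `units` are hypothetical names, and the special handling of র in the original is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical grouping step, for illustration only.
final class ConjunctSplitter {

    private static final char HASANTA = '\u09CD';

    /** Splits a word into units, keeping letter + hasanta + letter runs together as one unit. */
    static List<String> units(String word) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // extend the unit while a hasanta is followed by another character
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            out.add(unit.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // ba, vowel sign i, sha + hasanta + ba
        System.out.println(units("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

Each unit returned this way corresponds to one search string for the letter-to-phoneme lookup.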

…………………………………………………………

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream


import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else


Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        // use the audio file given on the command line, else the bundled recording
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException


Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

Bengali Speech Recognition

88

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

Bengali Speech Recognition

89

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

public void openNotepad() throws AWTException IOException

ProcessBuilder proc=new ProcessBuilder(notepadexe)

Process p=procstart()

public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

Bengali Speech Recognition

90

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(
                    SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // Allocate the recognizer.
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // Start the microphone, or exit the program if this is not possible.
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // Loop the recognition until the program exits.
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase is illegible in this copy of the listing.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
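
The giveCommand method above compares the recognized text with one command phrase after another. A minimal sketch of the same idea as a lookup table is given below. The class name CommandTable, the method names, and the English phrases are hypothetical stand-ins (the Bengali command phrases are not legible in the listing); only the key combinations are taken from CommandActivator.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.util.HashMap;
import java.util.Map;

// Sketch only: maps a recognized phrase to the key codes to press.
// The phrases are placeholders, not the thesis's Bengali commands.
class CommandTable {
    private static final Map<String, int[]> CHORDS = new HashMap<>();
    static {
        CHORDS.put("copy", new int[] {KeyEvent.VK_CONTROL, KeyEvent.VK_C});
        CHORDS.put("paste", new int[] {KeyEvent.VK_CONTROL, KeyEvent.VK_V});
        CHORDS.put("select all", new int[] {KeyEvent.VK_CONTROL, KeyEvent.VK_A});
        CHORDS.put("refresh", new int[] {KeyEvent.VK_F5});
    }

    /** Returns the key codes for a phrase, or an empty array if it is unknown. */
    static int[] chordFor(String phrase) {
        int[] chord = CHORDS.get(phrase.trim());
        return chord == null ? new int[0] : chord.clone();
    }

    /** Presses the keys in order, then releases them in the same order. */
    static boolean run(Robot robot, String phrase) {
        int[] chord = chordFor(phrase);
        for (int key : chord) {
            robot.keyPress(key);
        }
        for (int key : chord) {
            robot.keyRelease(key);
        }
        return chord.length > 0;
    }
}
```

With such a table, adding a command means adding one entry rather than one more if-branch and one more method.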

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Unicode to IPA Chart

Bangla Phoneme (ব্যঞ্জনবর্ণ)    IPA
ক               K
খ               KH
গ               G
ঘ               GH
ঙ               NG
চ               C
ছ               CH
জ               J
ঝ               JH
ঞ               NIO
ট               T
ঠ               TH
ড               D
ঢ               DH
ণ               N
ত               TA
থ               TO
দ               DA
ধ               DO
ন               N
প               P
ফ               PH
ব               B
ভ               BH
ম               M
য               Z
র               R
ল               L
শ               SH
ষ               SH
স               S
হ               H
ড়               RA
ঢ়               RH
য়               Y
ৎ               T``
[glyph lost]    NG
[glyph lost]    ^

Bangla Phoneme                  IPA
[glyph lost]    AA
[glyph lost]    I
[glyph lost]    II
[glyph lost]    U
[glyph lost]    UU
[glyph lost]    RI
[glyph lost]    E
[glyph lost]    OI
[glyph lost]    O
[glyph lost]    OU

Bangla Phoneme ( )              IPA
ব-              W
য-              Y
র-              R
ম-              M
[glyph lost]    RR

Bangla Phoneme (নাম্বার)         IPA
শূন্য            0
এক              1
দুই              2
তিন             3
চার             4
পাঁচ             5
ছয়              6
সাত             7
আট              8
নয়              9

Bangla Phoneme (যুক্তবর্ণ)       IPA
[glyphs lost; IPA codes in their original order]
KK, KT, KT, KTR, KW, KM, KY, KR, KL, KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR,
GN, GDH, NGM, CC, CCH, CCHW, CCHR, CNG, CY, GN, GNY, GW, GM, GY, GR, GL, KR, KL,
KKH, KKHW, KKHN, KKHM, KKHY, KS, KHY, KHR, GN, GDH, NGM, CC, CCH, CCHW, CCHR,
CNG, CY, JJ, JJW, JJH, GG, JW, JY, JR, NC, NCH, NJ, NJH, TT, TT, TTW, TTH, TN,
TW, TM, TMY, TY, TR, THW, THY, THR, DG, DGH, DD, DDW, DDH, DW, DV, DM, DY, DR,
NM, NY, NS, PT, PT, PN, PP, PY, PR, PL, PS, FR, FL, BJ, BD, BDH, BB, BY, BR, BL,
LT, LD, LDH, LP, LB, LV, LM, LY, LL, SHC, SHCH, SHT, SHN, SHW, SHM, SHY, SHR,
SHL, SHK, SHKR, SHT, SF, SW, SM, SY, SR, SL, SKL, HN, HN, HW, HM, HY, HR, HL,
HRRI, GHN, GHY, GHR, NK, NKY, NGKKH, NGKH, NGG, NGGY, NGGH, NGGHY, NGGHR, TW,
TM, TY, TR, DD, DY, DR, DHY, DHR, NT, NTH, ND, NDY, NDR, NDH, NN, NW, NM, NY,
DHN, DHW, DHM, DHY, DHR, NT, NTH, ND, NT, NTW, NTY, NTR, NTH, ND, NDY, NDW, NDR,
NDH, NDHY, NDHR, NN, NW, VY, VR, VL, MTH, MN, MP, MPR, MF, MB, MV, MVR, MM, MY,
MR, ML, ZY, RRK, RRKY, LK, LG, SHTY, SHTR, SHTH, SHTHY, SHN, SHP, SHPR, SPHY,
SHW, SHM, SK, SKR, ST, STR, SKH, ST, STW, STY, STH, STHY, SN, SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we recognized only 50 of them.

No.  Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train"
                            + "/sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
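
FileidsCreator builds each file id by joining the directory names and dropping the .wav extension with replaceAll. If the pattern passed to replaceAll is the plain string .wav, the dot acts as a regular-expression wildcard, so a name such as swav01.wav would lose more than its extension. The sketch below shows one way to strip only a trailing extension. The class name FileIds, its method names, and the sample file names are illustrative assumptions, not part of the project.

```java
// Sketch only: builds a file id and removes just the trailing ".wav".
class FileIds {
    /** Removes a trailing ".wav"; returns the name unchanged otherwise. */
    static String stripWav(String name) {
        return name.endsWith(".wav")
                ? name.substring(0, name.length() - ".wav".length())
                : name;
    }

    /** Joins the path parts with "/" and drops the extension of the file name. */
    static String fileId(String root, String level1, String level2, String wavName) {
        return root + "/" + level1 + "/" + level2 + "/" + stripWav(wavName);
    }

    public static void main(String[] args) {
        // Regex version: "." matches any character, so "swav" is removed too.
        System.out.println("swav01.wav".replaceAll(".wav", ""));
        // Suffix version: only the extension is removed.
        System.out.println(fileId("sbs_asr_train", "speaker1", "s1", "swav01.wav"));
    }
}
```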

......................................................................

ARRAY

......................................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
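
SortArrayList orders folder names by the number they contain, so that a folder numbered 2 comes before one numbered 10, and then rebuilds each name from the prefix of the first folder. A minimal sketch of the same ordering with a comparator, which keeps the original names untouched, is shown below. The class name NumericSuffixSort and the sample folder names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: sorts names by the digits they contain, keeping the names as-is.
class NumericSuffixSort {
    /** Returns the number formed by the digits in the name, or 0 if there are none. */
    static int numberIn(String name) {
        String digits = name.replaceAll("[^\\d]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a new list ordered by the embedded number. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }
}
```

Because the names themselves are sorted rather than regenerated, folders with different prefixes keep their own prefix, and an empty list does not cause an error.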

......................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> ("
                        + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
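
CreateTransCriptFile pairs every recording with a corpus sentence, restarting from the first sentence once the corpus is exhausted, and writes one line per recording in the form sentence-start tag, sentence, sentence-end tag, file id in parentheses. The sketch below isolates those two steps. The class name TranscriptLine, the method names, and the sample values are hypothetical; the tag spelling follows the listing above.

```java
import java.util.List;

// Sketch only: formats one transcript line and picks the sentence for a recording.
class TranscriptLine {
    /** Builds "<S> sentence </S> (fileId)" with the ".wav" suffix removed. */
    static String format(String sentence, String wavName) {
        String id = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - 4)
                : wavName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }

    /** Sentence for the i-th recording when the corpus is reused in order. */
    static String sentenceFor(List<String> corpus, int i) {
        return corpus.get(i % corpus.size());
    }
}
```

The modulo in sentenceFor has the same effect as resetting lineStart to 0 when it reaches lineEnd.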

......................................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

......................................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}
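
PhoneticTranslation writes one dictionary line per word: the word, a space, and its phone codes. As a rough illustration of that output, the sketch below looks the codes up in a small in-memory table instead of the MySQL table used by the project. The class name DictionaryLines and the five-letter table are assumptions; the codes are copied from the consonant rows of the Unicode to IPA chart, and conjuncts and vowels are deliberately ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a tiny letter-to-code table standing in for the database lookup.
class DictionaryLines {
    private static final Map<Character, String> CODES = new HashMap<>();
    static {
        CODES.put('\u0995', "K"); // ক
        CODES.put('\u0997', "G"); // গ
        CODES.put('\u09A8', "N"); // ন
        CODES.put('\u09AE', "M"); // ম
        CODES.put('\u09B2', "L"); // ল
    }

    /** Returns the space-separated codes of the letters found in the table. */
    static String pronounce(String word) {
        StringBuilder sb = new StringBuilder();
        for (char ch : word.toCharArray()) {
            String code = CODES.get(ch);
            if (code == null) {
                continue; // letters outside this small table are skipped
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(code);
        }
        return sb.toString();
    }

    /** Returns one dictionary line: the word, a space, and its codes. */
    static String dictionaryLine(String word) {
        return word + " " + pronounce(word);
    }
}
```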

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
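
isBanjonBorno opens a database connection for every character and treats ids 13 to 48 of banglatab as consonants. Assuming those ids correspond to the Bengali consonant letters, the same test can be sketched without a database by checking Unicode code points. This is a substitute technique, not the project's method, and the class name BengaliLetters is hypothetical.

```java
// Sketch only: consonant test by Unicode code point instead of a table lookup.
class BengaliLetters {
    /** Returns true for Bengali consonant letters (ক to হ, ৎ, ড়, ঢ়, য়). */
    static boolean isConsonant(char ch) {
        // U+0995..U+09B9 covers ক to হ; unassigned code points in that
        // range are excluded by the isLetter check.
        boolean inMainRange = ch >= '\u0995' && ch <= '\u09B9'
                && Character.isLetter(ch);
        return inMainRange
                || ch == '\u09CE'  // ৎ
                || ch == '\u09DC'  // ড়
                || ch == '\u09DD'  // ঢ়
                || ch == '\u09DF'; // য়
    }
}
```

A check like this avoids one database round trip per character, at the cost of hard-coding the letter ranges.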

......................................................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    char ConjuctsIdentifyCharacter = '\u09CD'; // hasanta
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition:");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " "
                                + "serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                                ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where "
                        + "letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
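
getPronouciation first records the positions around each conjunct-identifying character and then glues the neighbouring letters into one search string. A compact sketch of that grouping step is given below, assuming the identifying character is the hasanta (U+09CD). It is not the project's algorithm: the special handling of র and the inserted inherent vowel are left out, and the class name ConjunctSplitter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: groups letters joined by hasanta into single units.
class ConjunctSplitter {
    private static final char HASANTA = '\u09CD';

    /** Splits a word into units; letters linked by hasanta stay together. */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            // Extend the unit while the next character is a hasanta that is
            // itself followed by another character.
            while (i + 2 < word.length() && word.charAt(i + 1) == HASANTA) {
                unit.append(word.charAt(i + 1)).append(word.charAt(i + 2));
                i += 2;
            }
            units.add(unit.toString());
            i++;
        }
        return units;
    }
}
```

For the word বিজ্ঞান this yields five units, with জ্ঞ kept together as one conjunct that can be looked up in the chart as a whole.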

......................................................................

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(
                    SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // Allocate the recognizer.
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // Start the microphone, or exit the program if this is not possible.
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // Loop the recognition until the program exits.
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // Allocate the resources necessary for the recognizer.
        recognizer.allocate();

        // Configure the audio input for the recognizer.
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}


public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

Bengali Speech Recognition

91

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

Bengali Speech Recognition

92

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

ব B

ভ BH

ম M

য Z

র R

ল L

শ SH

ষ SH

স S

হ H

ড় RA

ঢ় RH

য় Y

ৎ T``

NG

^
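The chart above pairs each Bangla letter with a phoneme code. A minimal sketch of how such a chart could be held in memory and applied letter by letter is given here. The class name `PhonemeTable` and the method `transcribe` are hypothetical, only a handful of rows from the chart are loaded, and the project code itself reads these pairs from a database table rather than from a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a few rows of the letter-to-code chart kept in a map.
final class PhonemeTable {
    static final Map<Character, String> CODES = new LinkedHashMap<>();
    static {
        CODES.put('\u09AC', "B");   // ব
        CODES.put('\u09AD', "BH");  // ভ
        CODES.put('\u09AE', "M");   // ম
        CODES.put('\u09B0', "R");   // র
        CODES.put('\u09B2', "L");   // ল
        CODES.put('\u09B8', "S");   // স
        CODES.put('\u09B9', "H");   // হ
    }

    // Returns the space-separated codes of every character that has a row;
    // characters without a row are skipped.
    static String transcribe(String word) {
        StringBuilder sb = new StringBuilder();
        for (char ch : word.toCharArray()) {
            String code = CODES.get(ch);
            if (code != null) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(code);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(transcribe("\u09AC\u09B2")); // the letters ব and ল
    }
}
```

Vowel signs and conjuncts would need their own rows; this sketch covers only single consonant letters.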

Bangla Phoneme IPA

AA

I

II

U

UU

RI

E

OI

O

OU

Bangla Phoneme ( ) IPA

ব- W

য- Y

র- R

ম- M

RR

Bangla Phoneme (নাম্বার) IPA

শূন্য 0

এক 1

দুই 2

তিন 3

চার 4

পাঁচ 5

ছয় 6

সাত 7

আট 8

নয় 9

Bangla Phoneme (যুক্তবর্ণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 72 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the limited time available we recognized only 50 of them.

No Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51-185 [The text of sentences 51 to 185 is not legible in this copy; only isolated word fragments survive.]

Fig 73 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Names found at each level of the training tree: level 1 and level 2
    // hold directory names, level 3 holds the recorded file names.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing across directories, so k is never reset
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Collects the entries of one directory into the list for the given level.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at the given path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
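The file-id listing done by the class above can also be sketched with a single directory walk. This is an illustration under assumptions, not the project's method: the class `FileIdsSketch`, the method `fileIds`, and the `demo` directory layout are hypothetical, and the result is sorted alphabetically rather than by the numeric order that `SortArrayList` produces.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: one line per .wav file, written relative to the root
// directory name and without the extension.
final class FileIdsSketch {

    static List<String> fileIds(Path root) throws IOException {
        String rootName = root.getFileName().toString();
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> rootName + "/" + root.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - ".wav".length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Builds a tiny throwaway tree (train/spk1/s1) and lists it.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempDirectory("fileids");
            Path root = tmp.resolve("train");
            Path session = Files.createDirectories(root.resolve("spk1").resolve("s1"));
            Files.createFile(session.resolve("a.wav"));
            Files.createFile(session.resolve("notes.txt"));
            return fileIds(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```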

......................................................................

ARRAY

......................................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "s10", "s2" by their numeric part and rebuilds
    // them from the alphabetic prefix of the first entry.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
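The numeric ordering that the class above builds by hand can be sketched with a comparator. The names `NumericNameSort`, `numberIn` and `sortByNumber` are hypothetical. Unlike `sortList`, this version keeps every original name intact instead of rebuilding names from the prefix of the first entry, so it also tolerates mixed prefixes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order names by the digits they contain.
final class NumericNameSort {

    // All digits of the name read as one number; 0 when the name has none.
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSort::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByNumber(List.of("s10", "s2", "s1")));
    }
}
```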

......................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per recording; the corpus sentences are
    // reused from the start once the last sentence has been written.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
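The pairing of corpus sentences with recordings in the class above can be summarised in a few lines. This is a sketch, assuming the sentence list wraps around when there are more recordings than sentences, as the `lineStart` reset does. The names `TranscriptSketch` and `transcript` are hypothetical, and the line layout simply mirrors the `pattern` string built above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one transcript line per recording, cycling through
// the corpus sentences.
final class TranscriptSketch {

    static List<String> transcript(List<String> sentences, List<String> wavNames) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            String id = wavNames.get(i).replaceAll("\\.wav$", "");
            String text = sentences.get(i % sentences.size()).trim();
            lines.add("<S> " + text + " </S> (" + id + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        transcript(List.of(" a b ", "c"), List.of("f1.wav", "f2.wav", "f3.wav"))
                .forEach(System.out::println);
    }
}
```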

......................................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per entry.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

......................................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // A letter counts as a consonant (banjon borno) when its row id in
    // banglatab lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
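The consonant test above opens a database connection for every character. As an alternative technique, not the one used in the project, the same question can be approximated from the Unicode code point alone. The sketch below assumes that the consonant letters of the Bengali block (U+0995 to U+09B9, plus U+09DC, U+09DD and U+09DF) correspond to rows 13 to 48 of `banglatab`; that correspondence is an assumption and has not been checked against the table. The class and method names are hypothetical.

```java
// Hypothetical sketch: consonant check by Unicode range instead of a query.
final class ConsonantCheck {

    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')   // ক through হ
                || ch == '\u09DC'                   // ড়
                || ch == '\u09DD'                   // ঢ়
                || ch == '\u09DF';                  // য়
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // ক
        System.out.println(isConsonant('\u0985')); // অ
    }
}
```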

......................................................................

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    char ConjuctsIdentifyCharacter = '\u09CD'; // hasanta, the sign that joins consonants
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the position of every hasanta together with its two neighbours.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the phoneme code of every unit and join the codes with spaces.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
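The core idea of the class above is to cut a word into lookup units, keeping letters joined by a hasanta together as one conjunct. A simplified sketch of only that segmentation step follows. The names `ConjunctSplitter` and `units` are hypothetical, and the sketch leaves out two things the original does: the special handling of র and the insertion of the inherent vowel অ between two plain consonants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a word into units, keeping hasanta-joined
// letters together as one conjunct.
final class ConjunctSplitter {
    static final char HASANTA = '\u09CD';

    static List<String> units(String word) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            current.append(ch);
            // The unit continues if this is a hasanta or the next character is one.
            boolean joins = ch == HASANTA
                    || (i + 1 < word.length() && word.charAt(i + 1) == HASANTA);
            if (!joins) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // The word বিশ্ব: ব, the vowel sign ি, then the conjunct শ + hasanta + ব
        System.out.println(units("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

Each unit can then be looked up in the letter-to-code table in turn.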

......................................................................

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
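Most methods of the class above repeat the same press-and-release pattern with different keys. A sketch of a shared helper is shown below. The names `KeyChord` and `chord` are hypothetical, and the helper takes the press and release actions as parameters so that it can be tried without a display. Note one deliberate difference: it releases keys in reverse order, whereas the original methods release them in the order they were pressed.

```java
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical sketch: press the keys in order, then release them in reverse.
final class KeyChord {

    static void chord(IntConsumer press, IntConsumer release, int... keys) {
        for (int key : keys) {
            press.accept(key);
        }
        for (int i = keys.length - 1; i >= 0; i--) {
            release.accept(keys[i]);
        }
    }

    // Records the calls instead of sending them to the operating system.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        chord(k -> log.add("press " + k), k -> log.add("release " + k),
                KeyEvent.VK_CONTROL, KeyEvent.VK_C);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a real `java.awt.Robot` instance named `robot`, a copy command would then read `KeyChord.chord(robot::keyPress, robot::keyRelease, KeyEvent.VK_CONTROL, KeyEvent.VK_C)`.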

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that belongs to the recognized command text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bangla command phrase compared here is missing from this copy.
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 51: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

51

E

OI

O

OU

Bangla Phoneme ( )   IPA
ব-   W
য-   Y
র-   R
ম-   M
RR

Bangla Phoneme (নাম্বার)   IPA
শূন্য   0
এক   1
দুই   2
তিন   3
চার   4
পাঁচ   5
ছয়   6
সাত   7
আট   8
নয়   9

Bangla Phoneme (যুক্তবর্ণ)   IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG


CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR


CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we have recognized only 50 of them.

No   Sentence
1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Walks the three-level training directory tree and appends one line per
// audio file (without the .wav extension) to the .fileids file.
// Path separators were lost in extraction; "/" is assumed throughout.
public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // k is never reset: dirTreeLevel3 keeps growing, so k always
                // points at the first file of the folder just listed.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll("\\.wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists one directory. Sub-directory names go to the level 1 or level 2
    // list; plain files go to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………………………

ARRAY

…………………………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names by the number they contain.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of each name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // Every name is rebuilt from the alphabetic part of the first name.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
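The same ordering can be obtained with a comparator. The following is only a minimal sketch, not the project's code: the class name `NumericSuffixSort` and the sample names in the usage note are hypothetical, and it assumes every folder name contains at least one digit. Unlike `sortList`, it keeps each original name intact instead of rebuilding it from the first name's letters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NumericSuffixSort {

    // Extracts the digits of a name and parses them (assumes at least one digit).
    static int numberOf(String name) {
        return Integer.parseInt(name.replaceAll("[^\\d]", ""));
    }

    // Returns a new list ordered by the numeric part of each name.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberOf));
        return sorted;
    }
}
```

For example, `NumericSuffixSort.sortByNumber(Arrays.asList("s10", "s2", "s1"))` orders the names by 1, 2, 10 rather than alphabetically.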

…………………………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file, one sentence per line, into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Appends one transcript line per audio file name in data:
    // <S> sentence </S> (fileNameWithoutWav)
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                // Start again from the first sentence once the corpus is exhausted.
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll("\\.wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
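Each line written by `CreateTransCriptFile` pairs one corpus sentence with one utterance id. As a minimal sketch of that formatting step in isolation (the class name `TranscriptLine`, the method name, and the sample file name are hypothetical and do not come from the project):

```java
final class TranscriptLine {

    // Builds one transcript entry: the trimmed sentence wrapped in sentence
    // markers, followed by the audio file name without .wav in parentheses.
    static String format(String sentence, String wavFileName) {
        String id = wavFileName.replaceAll("\\.wav$", "");
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }
}
```

Keeping the formatting in one small method makes it easy to check a single line by hand before generating the whole transcript file.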

…………………………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

// File paths below lost their separators in extraction; the split shown is assumed.
public class FileOperator {

    // Reads the input word list, one trimmed word per entry.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build the dictionary text: one "word pronunciation" entry per line.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    // Load the MySQL JDBC driver once.
    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter's id in the banglatab table lies
    // between 13 and 48, i.e. when it is a consonant (banjon borno).
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
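`isBanjonBorno` opens a database connection for every character it tests. If the intent is simply to recognise Bengali consonant letters, the check could be done on Unicode code points instead. The sketch below rests on that assumption: the contents of `banglatab` are not shown in this report, so it is not known whether ids 13 to 48 match exactly these letters, and the class name `BanglaLetters` is hypothetical.

```java
final class BanglaLetters {

    // True for Bengali consonant code points: U+0995..U+09B9 (ka to ha),
    // plus khanda ta (U+09CE) and the letters rra, rha, yya (U+09DC, U+09DD, U+09DF).
    static boolean isConsonant(char ch) {
        if (ch >= '\u0995' && ch <= '\u09B9') {
            // The range contains a few unassigned code points; isLetter skips them.
            return Character.isLetter(ch);
        }
        return ch == '\u09CE' || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }
}
```

Vowels such as U+0985 and signs such as the hasanta (U+09CD) fall outside these ranges and return false.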

…………………………………………………………………………………………………

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal was lost in extraction; the hasanta (U+09CD), which joins
    // consonants into a conjunct, is assumed.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // For every conjunct marker, record the position before it,
        // its own position and the position after it.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that do not belong to any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    // A leading র is looked up separately from the letter after the marker.
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Append characters while the recorded positions are adjacent.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the phoneme string of every unit and join them with spaces.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
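The core of `getPronouciation` is splitting a word into lookup units, where letters joined by the conjunct marker form one unit. The following is a simplified sketch of that idea only, under two assumptions: that the marker is the hasanta (U+09CD), and that a conjunct is any run of "letter, hasanta, letter". It leaves out the special handling of র and the insertion of the inherent vowel, and the class name `ConjunctSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {

    static final char HASANTA = '\u09CD'; // Bengali virama, joins consonants

    // Splits a word into lookup units: single characters, or whole
    // conjuncts (letter + hasanta + letter, possibly repeated).
    static List<String> split(String word) {
        List<String> units = new ArrayList<String>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Keep absorbing "hasanta + next character" pairs.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }
}
```

For the word বিশ্ব this yields three units: ব, the vowel sign ি, and the conjunct শ্ব, each of which could then be looked up in the phoneme table.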

…………………………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences did not survive extraction.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Maps each desktop command to simulated mouse or keyboard input.
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    // The methods below open a URL in the default browser (Windows only).
    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
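Most methods of `CommandActivator` repeat the same press-then-release pattern. A small helper could express a shortcut once. This is a sketch under stated assumptions rather than the project's code: `KeyChord` and `KeySink` are hypothetical names, and the helper releases keys in reverse order of pressing, whereas the listing releases them in the order they were pressed.

```java
import java.awt.event.KeyEvent;

final class KeyChord {

    // Minimal abstraction over java.awt.Robot so the key ordering can be
    // checked without a display.
    interface KeySink {
        void keyPress(int keyCode);
        void keyRelease(int keyCode);
    }

    // Presses the keys in order, then releases them in reverse order.
    static void send(KeySink sink, int... keyCodes) {
        for (int code : keyCodes) {
            sink.keyPress(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            sink.keyRelease(keyCodes[i]);
        }
    }

    // Example: the copy shortcut, Ctrl+C.
    static void copy(KeySink sink) {
        send(sink, KeyEvent.VK_CONTROL, KeyEvent.VK_C);
    }
}
```

To drive the real desktop, a `KeySink` implementation would forward both calls to a `java.awt.Robot` instance through its `keyPress(int)` and `keyRelease(int)` methods.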

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n ");
    }

    // Runs the desktop action that matches the recognized text.
    // The Bengali command string compared here was lost in extraction.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
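`giveCommand` compares the recognized text against a command phrase and then calls the matching action. With many commands, a lookup table avoids a long chain of `if` statements. The sketch below is illustrative only: `CommandDispatcher` is a hypothetical class, and the phrase in the usage note is an English placeholder because the Bengali command strings are not recoverable from this listing.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatcher {

    private final Map<String, Runnable> actions = new HashMap<String, Runnable>();

    // Associates a recognized phrase with the action to run.
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action registered for the phrase; returns false when
    // the text is null or no action matches.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

A phrase such as "right click" could be registered with an action that calls `rightClick()` on a `CommandActivator`; because that method throws the checked `AWTException`, the action would need to catch it inside `run()`.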

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 52: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

52

Bangla Pnoneme (যকতবরণ) IPA

KK

KT

KT

KTR

KW

KM

KY

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

CNG

Bengali Speech Recognition

53

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

Bengali Speech Recognition

54

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

Bengali Speech Recognition

55

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we have recognized only 50 of them.

No.  Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Windows-style path separators are assumed throughout.
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates every file seen so far; k continues from the last written entry
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                        root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll("\\.wav", "");
                    writIntoFile(targetPath, "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Lists one directory level: sub-directories go to level 1 or 2, plain files to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

...................................................................

ARRAY

...................................................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "s10", "s2" by their numeric part and rebuilds them
    // from the letter prefix of the first element.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
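The SortArrayList class above rebuilds every name from the letter prefix of the first element, so it relies on all folders sharing one prefix. As a minimal sketch of an alternative, assuming folder names such as "s1", "s2", "s10", the original names can be ordered directly with a comparator on the embedded number. The class name NumericSuffixSort and its methods are hypothetical and not part of the project code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the project code.
final class NumericSuffixSort {

    // Returns the digits contained in a name as a number; names without digits get -1.
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the embedded number; the names themselves are unchanged.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

Because the names are never rebuilt, folders with different prefixes keep their own spelling, at the cost of ignoring the prefix when ordering.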

...................................................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file; corpus lines are reused cyclically.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll("\\.wav", "");
                String leadTrailspaceRemoved =
                    inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
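The pairing of corpus sentences with recordings in CreateTransCriptFile can be illustrated without any file access. The following is only a sketch under stated assumptions: the class TranscriptLines and its methods are hypothetical, the sentence markers are copied from the listing above, and the corpus is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the project code.
final class TranscriptLines {

    // Formats one transcript entry: sentence markers around the text, utterance id in parentheses.
    static String line(String sentence, String wavName) {
        String id = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - 4)
                : wavName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }

    // Pairs each recording with a corpus sentence, wrapping around when the corpus is shorter.
    static List<String> build(List<String> corpus, List<String> wavNames) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < wavNames.size(); i++) {
            out.add(line(corpus.get(i % corpus.size()), wavNames.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        List<String> wavs = Arrays.asList("s1.wav", "s2.wav", "s3.wav");
        for (String l : build(corpus, wavs)) {
            System.out.println(l);
        }
    }
}
```

The modulo index plays the same role as resetting lineStart to 0 in the listing: once every corpus sentence has been used, the next recording is matched with the first sentence again.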

...................................................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one trimmed entry per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary in one pass.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

...................................................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its phone sequence.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                    rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // A letter is a consonant (banjon borno) when its id in banglatab lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter= '" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

...................................................................

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The hasanta sign joins consonants into a conjunct.
    char ConjuctsIdentifyCharacter = '্';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at, and after every hasanta.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that do not belong to any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                        ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // Look up the pronunciation of every collected unit in the banglatab table.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
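The core idea of the listing above, grouping consonants joined by the hasanta sign (U+09CD) into one lookup unit, can be shown without the database. This is a simplified sketch, not the project's algorithm: the class ConjunctSplitter is hypothetical, it does not insert the inherent vowel, and it does not reproduce the special handling of 'র'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project code.
final class ConjunctSplitter {

    static final char HASANTA = '\u09CD';

    // Splits a word into lookup units; characters joined by hasanta stay together.
    static List<String> split(String word) {
        List<String> units = new ArrayList<String>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Extend the unit while a hasanta follows and another character comes after it.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // The word for "science": ba, i-sign, ja, hasanta, nya, aa-sign, na
        System.out.println(split("\u09AC\u09BF\u099C\u09CD\u099E\u09BE\u09A8"));
    }
}
```

Each returned unit corresponds to one row that would be looked up in a letter-to-pronunciation table such as banglatab; a conjunct of three consonants simply passes through the inner loop twice.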

...................................................................

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    // Runs the desktop action that matches the recognized text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
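The giveCommand method above compares the recognized text with one command phrase per if statement. As a hedged sketch of how many phrases could be handled, a lookup table from phrase to action avoids a long chain of comparisons. The class CommandDispatcher, its methods, and the phrase "right click" are hypothetical; the real command phrases of the project are Bengali sentences.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the project code.
final class CommandDispatcher {

    private final Map<String, Runnable> actions = new LinkedHashMap<String, Runnable>();

    // Associates a recognized phrase with the action to run for it.
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action for the recognized text; returns false when no phrase matches.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        dispatcher.register("right click", () -> System.out.println("right click requested"));
        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

Since the CommandActivator methods declare AWTException, each registered action would have to catch that exception inside its lambda, because Runnable cannot throw checked exceptions.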

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 53: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

53

CY

GN

GNY

GW

GM

GY

GR

GL

KR

KL

KKH

KKHW

KKHN

KKHM

KKHY

KS

KHY

KHR

GN

GDH

NGM

CC

CCH

CCHW

CCHR

Bengali Speech Recognition

54

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

Bengali Speech Recognition

55

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

Bengali Speech Recognition

56

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

Bengali Speech Recognition

57

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

Bengali Speech Recognition

58

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

Bengali Speech Recognition

59

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

Bengali Speech Recognition

60

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

Bengali Speech Recognition

61

STY

STH

STHY

SN

SP

Corpus

Our total Sentences is 185 but we have recognized 50 sentences for short time duration

No Sentence

Fig 72 Unicode to IPA Chart

Bengali Speech Recognition

62

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

Bengali Speech Recognition

63

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

Bengali Speech Recognition

64

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

Bengali Speech Recognition

65

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 73 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing across folders, so k is never reset
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // literal replace: strip the ".wav" extension
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Level 1 and 2 collect sub-folder names; plain files always go to dirTreeLevel3
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

……………………………………………………………………

ARRAY

……………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "s10", "s2" by their numeric part
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of every name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without its number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
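The numeric ordering done by SortArrayList can also be sketched with a comparator, which keeps every original name intact instead of rebuilding it from the first entry's prefix. This is only a minimal sketch under assumptions: the class name NumericSuffixSort and the sample folder names are hypothetical and are not part of the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {

    // Reads the digits of a name as an int; a name without digits counts as 0.
    public static int numberOf(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by numeric value; the input list is not modified.
    public static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberOf));
        return sorted;
    }

    public static void main(String[] args) {
        // hypothetical folder names, for illustration only
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

With a plain string sort, "s10" would come before "s2"; comparing the numeric part avoids that.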

……………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Loads every line of the corpus file into inputLines
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one "<S> sentence </S> (fileid)" line per audio file,
    // starting again from the first corpus line when the corpus is exhausted
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // literal replace: strip the ".wav" extension
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
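The pairing of corpus sentences with audio files in CreateTransCriptFile can be sketched without any file I/O, which makes the line format easier to check. This is a minimal sketch under assumptions: the class TranscriptLines, its method names, and the sample inputs are hypothetical; only the "<S> ... </S> (id)" layout and the wrap-around over the corpus follow the listing above.

```java
import java.util.ArrayList;
import java.util.List;

public class TranscriptLines {

    // Builds one transcript entry: the sentence between markers, then the utterance id.
    public static String line(String sentence, String wavFileName) {
        String id = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }

    // Assigns corpus sentences to audio files in order, cycling through the
    // corpus again when there are more files than sentences.
    public static List<String> build(List<String> corpus, List<String> wavFiles) {
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < wavFiles.size(); i++) {
            lines.add(line(corpus.get(i % corpus.size()), wavFiles.get(i)));
        }
        return lines;
    }
}
```

For example, a two-sentence corpus and three recordings yield three lines, with the third recording paired with the first sentence again.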

……………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one trimmed word per line
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished pronunciation dictionary
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

……………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // one dictionary line per word: the word followed by its phoneme string
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // A letter is a consonant (banjon borno) when its id in banglatab lies between 13 and 48
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

……………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    char ConjuctsIdentifyCharacter = '\u09CD'; // hasanta, the sign that joins consonants
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions before, at and after every hasanta
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition:");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that are not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " " + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the phoneme of every letter or conjunct in turn
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
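The conjunct-grouping idea in the generator, where consonants joined by a hasanta are looked up as one unit, can be illustrated without the position arrays or the database. This is a minimal sketch under assumptions: the class ConjunctSplitter and its method are hypothetical, the special handling of র in the project code is left out, and the inherent-vowel insertion is not reproduced.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctSplitter {

    // Bengali hasanta (virama), U+09CD: joins two consonants into a conjunct.
    static final char HASANTA = '\u09CD';

    // Splits a word into lookup units: single characters, or whole conjunct
    // clusters of the form consonant + hasanta + consonant (+ hasanta + consonant ...).
    public static List<String> split(String word) {
        List<String> units = new ArrayList<String>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // extend the unit while a hasanta follows and another character comes after it
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }
}
```

For the word বিজ্ঞান this gives five units, with জ্ঞ kept together as one entry, so a table keyed by letters and conjuncts can be queried once per unit.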

……………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
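Most methods of CommandActivator repeat the same press-then-release pattern for a key combination. That pattern could be factored into one helper, sketched here under assumptions: the class KeyChord and its play method are hypothetical, and the press and release actions are passed in as callbacks so the helper can be exercised without a display. Unlike the listing, this sketch releases the keys in reverse order of pressing, which is a common convention rather than the project's behaviour.

```java
import java.util.function.IntConsumer;

public class KeyChord {

    // Presses the keys in the given order, then releases them in reverse order.
    // The callbacks decide what "press" and "release" mean.
    public static void play(IntConsumer press, IntConsumer release, int... keys) {
        for (int key : keys) {
            press.accept(key);
        }
        for (int i = keys.length - 1; i >= 0; i--) {
            release.accept(keys[i]);
        }
    }
}
```

With a java.awt.Robot instance named robot, a copy command would then read `KeyChord.play(robot::keyPress, robot::keyRelease, KeyEvent.VK_CONTROL, KeyEvent.VK_C)`.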

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that matches the recognized text
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
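The giveCommand method compares the recognized text with a command phrase and then calls the matching CommandActivator method. With many commands, a lookup table can replace a long chain of comparisons. The following is a minimal sketch under assumptions: the class CommandTable, its methods, and the phrase "right click" are hypothetical placeholders, not the project's actual command phrases.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandTable {

    private final Map<String, Runnable> actions = new HashMap<String, Runnable>();

    // Associates a spoken phrase with the action to run when it is recognized.
    public void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action for the recognized text; returns false when no phrase matches.
    public boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

Because Runnable cannot throw checked exceptions, an action that calls a CommandActivator method would need to catch AWTException or IOException inside the lambda.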

CNG

CY

JJ

JJW

JJH

GG

JW

JY

JR

NC

NCH

NJ

NJH

TT

TT

TTW

TTH

TN

TW

TM

TMY

TY

TR

THW

THY

THR

DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Corpus

Our total Sentences is 185 but we have recognized 50 sentences for short time duration

No Sentence

Fig 72 Unicode to IPA Chart

Bengali Speech Recognition

62

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

Bengali Speech Recognition

63

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

Bengali Speech Recognition

64

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

Bengali Speech Recognition

65

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

Bengali Speech Recognition

66

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

Bengali Speech Recognition

67

178

179 ও

180

181

182

183

184

185

Fig 73 Corpus about University Admission Information

Bengali Speech Recognition

68

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()


........................................

ARRAY

........................................

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names of the form <prefix><number> by their number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list
        }
        int i = 0;
        // Keep only the digits of every name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The prefix is taken from the first name and reused for all names.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
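The sorter above rebuilds every name from one shared prefix, so it only works when all folders carry the same prefix. As a hedged alternative, the sketch below orders the original names directly by the number they contain. The class name `NumericSuffixSort` and the sample folder names are assumptions made for illustration; they are not part of the project code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not from the project): orders names such as
// "speaker2" and "speaker10" by the number they contain.
final class NumericSuffixSort {

    // Extracts all digits of a name as an int; names without digits count as 0.
    static int numberOf(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list sorted by numeric value; the input list is left untouched.
    static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberOf));
        return sorted;
    }
}
```

Calling `NumericSuffixSort.sort(...)` on `speaker10`, `speaker2`, `speaker1` places `speaker10` last, which a plain alphabetical sort would not do.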


........................................

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file; the corpus is reused from
    // its first sentence again once its last sentence has been written.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll("\\.wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
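The transcript writer pairs each audio file with a corpus sentence and starts again from the first sentence when the corpus runs out. A minimal sketch of that pairing, kept free of file I/O so it can be checked in isolation, might look as follows. The class name `TranscriptLines`, its method names, and the lower-case `<s>` markers (the form SphinxTrain transcripts normally use) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the project): builds transcript lines in memory.
final class TranscriptLines {

    // Formats one line as "<s> sentence </s> (utteranceId)".
    static String format(String sentence, String wavFileName) {
        String id = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<s> " + sentence.trim() + " </s> (" + id + ")";
    }

    // Pairs the i-th audio file with sentence (i mod corpus size).
    static List<String> build(List<String> corpus, List<String> wavNames) {
        if (corpus.isEmpty()) {
            throw new IllegalArgumentException("corpus must not be empty");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            lines.add(format(corpus.get(i % corpus.size()), wavNames.get(i)));
        }
        return lines;
    }
}
```

Using `endsWith` instead of a regular expression removes only a trailing `.wav`, so a file name that merely contains the letters "wav" is left intact.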

........................................

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            // '/' separators assumed
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished pronunciation dictionary.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

........................................

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // One dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and the connection, reporting any SQL error.
    private static void closeQuietly(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
    }

    // Returns true when the letter is a consonant (banjon borno), i.e. when
    // its id in the banglatab table lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
        return isBBorno;
    }
} // class end

........................................

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Assumed to be the hasanta (U+09CD), which joins consonants into a conjunct.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the position before, at, and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that do not belong to any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // Look up the pronunciation of every unit in the banglatab table.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
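The generator first groups the letters of a word into lookup units, keeping consonants that are linked by a conjunct marker together. The sketch below shows that grouping step on its own, under the assumption that the marker is the hasanta (U+09CD). It is a simplification, not the project's method: it does not give the letter র any special treatment and inserts no inherent vowel. The class name `ConjunctSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the project): splits a Bengali word into
// lookup units, keeping hasanta-joined consonants in one unit.
final class ConjunctSplitter {

    // Hasanta, code point U+09CD (assumed conjunct marker).
    private static final char HASANTA = '\u09CD';

    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            // A character stays in the current unit when it is the marker
            // itself or when it directly follows the marker.
            boolean joinsPrevious = c == HASANTA
                    || (i > 0 && word.charAt(i - 1) == HASANTA);
            if (!joinsPrevious && current.length() > 0) {
                units.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            units.add(current.toString());
        }
        return units;
    }
}
```

For the word বিজ্ঞান this yields five units, with জ্ঞ kept together as one conjunct; each unit could then be looked up as a single key.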

........................................

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(
                    SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in
        // which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    // Presses the given keys in order, then releases them in the same order.
    private void pressKeys(int... keys) throws AWTException {
        Robot robot = new Robot();
        for (int key : keys) {
            robot.keyPress(key);
        }
        for (int key : keys) {
            robot.keyRelease(key);
        }
    }

    // Presses and releases a mouse button the given number of times.
    private void click(int buttonMask, int times) throws AWTException {
        Robot robot = new Robot();
        for (int n = 0; n < times; n++) {
            robot.mousePress(buttonMask);
            robot.mouseRelease(buttonMask);
        }
    }

    // Opens a URL with the default handler on Windows.
    private void openUrl(String theUrl) throws IOException {
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void leftClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 1); }

    public void rightClick() throws AWTException { click(InputEvent.BUTTON3_MASK, 1); }

    public void doubleClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 2); }

    public void copy() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_C); }

    public void paste() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_V); }

    public void delete() throws AWTException { pressKeys(KeyEvent.VK_DELETE); }

    public void selectAll() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_A); }

    public void up() throws AWTException { pressKeys(KeyEvent.VK_PAGE_UP); }

    public void down() throws AWTException { pressKeys(KeyEvent.VK_PAGE_DOWN); }

    public void previousPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_UP); }

    public void nextPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_DOWN); }

    public void openNewFile() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_N); }

    public void openHere() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_O); }

    public void close() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_F4); }

    public void startMenu() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS); }

    public void refresh() throws AWTException { pressKeys(KeyEvent.VK_F5); }

    public void help() throws AWTException { pressKeys(KeyEvent.VK_F1); }

    public void showDesktop() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_D); }

    public void openMyComputer() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_E); }

    // "sokrio" (activate)
    public void enter() throws AWTException { pressKeys(KeyEvent.VK_ENTER); }

    // "porer window | ager window" (next window | previous window)
    public void altTab() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_TAB); }

    // "porer tab | ager tab" (next tab | previous tab)
    public void ctlTab() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_TAB); }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException { openUrl("http://www.google.com"); }

    public void openFacebook() throws AWTException, IOException { openUrl("http://www.facebook.com"); }

    public void openYahoo() throws AWTException, IOException { openUrl("http://www.yahoo.com"); }

    public void openTechtunes() throws AWTException, IOException { openUrl("http://www.techtunes.com.bd"); }

    public void openProthomAlo() throws AWTException, IOException { openUrl("http://www.prothom-alo.com"); }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(
                    SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that belongs to the recognized command text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
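The listing breaks off after the first command test. When many spoken commands have to be mapped to desktop actions, a lookup table is often easier to maintain than a long chain of `if` statements. The following is a minimal sketch of that idea, not the project's implementation; the class name `CommandDispatcher` and the phrase used in the usage note are placeholders, since the Bengali command phrases are not available here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the project): maps recognized phrases to actions.
final class CommandDispatcher {

    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    // Registers an action under a phrase; surrounding whitespace is ignored.
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action for the recognized text.
    // Returns false when the text is null or matches no registered phrase.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

A phrase would be registered once, for example `dispatcher.register("right click", ...)`, and the recognition loop would then call `dispatcher.dispatch(resultText)`. Because the `CommandActivator` methods declare checked exceptions, each action would need to catch them inside its `Runnable`.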


DG

DGH

DD

DDW

DDH

DW

DV

DM

DY

DR

NM

NY

NS

PT

PT

PN

PP

PY

PR

PL

PS

FR

FL

BJ

BD

BDH


BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF


SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR


TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH


ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR


MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW


STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we have recognized only 50 of them.

No Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // '/' separators assumed
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        // Walk root/level1/level2 and write one fileids entry per audio file.
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing, so k continues where it stopped.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll("\\.wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/"
                            + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Adds sub-folder names to the list of the given level and file names
    // to dirTreeLevel3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf('/');
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at the given path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
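Each fileids entry is the path of one audio file relative to the training folder's parent, written without the `.wav` extension. As a hedged illustration, the same entry can be derived with `java.nio.file.Path` instead of manual index arithmetic. The class name `FileIds` and the folder names in the usage note are assumptions, not taken from the project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the project): derives a fileids entry.
final class FileIds {

    // Returns the path of wav relative to baseDir, with '/' separators
    // and without a trailing ".wav" extension.
    static String entryFor(Path baseDir, Path wav) {
        String relative = baseDir.relativize(wav).toString().replace('\\', '/');
        return relative.endsWith(".wav")
                ? relative.substring(0, relative.length() - 4)
                : relative;
    }
}
```

For a base folder `data` and the file `data/sbs_asr_train/speaker1/s1/utt1.wav`, the call `FileIds.entryFor(...)` gives `sbs_asr_train/speaker1/s1/utt1`.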

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

Bengali Speech Recognition

71

ARRAY

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

package sbsBSRtrainingfilescreator

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilComparator

import javautilList

public class SortArrayList

public ListltStringgt sortList(ListltStringgt unsortList)

ListltStringgt mysortList = new ArrayListltStringgt()

int i = 0

while(unsortListsize()gti)

String str = unsortListget(i)

str = strreplaceAll([^d] )

mysortListadd(str)

i++

int[] sortint = new int[mysortListsize()]

i = 0

while(unsortListsize()gti)

sortint[i] = IntegervalueOf(mysortListget(i))

i++

Arrayssort(sortint)

String folNameWONum = unsortListget(0)replaceAll([^a-z ^A-Z])

mysortListclear()

i = 0

while(unsortListsize()gti)

String requiredString =

folNameWONum+StringvalueOf(sortint[i])

mysortListadd(requiredString)

i++

return mysortList

Bengali Speech Recognition

72

…………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
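The transcript creator pairs each corpus sentence with one recording name. The line format it writes can be sketched in isolation as below. This is only an illustration: the class name TranscriptLine and the method format are hypothetical, and the sketch removes a trailing ".wav" with a suffix check instead of the regular expression used in the listing, where the dot matches any character.

```java
public class TranscriptLine {

    // Builds one transcript line: "<S> sentence </S> (recordingName)".
    // Hypothetical helper; it mirrors the pattern string built in the listing.
    public static String format(String sentence, String wavFileName) {
        String id = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }

    public static void main(String[] args) {
        // A transliterated sentence is used so the example does not depend on file encoding.
        System.out.println(format("  shuvo sokal ", "speaker1_1.wav"));
    }
}
```

The sentence is trimmed before it is wrapped, as in the listing, so stray spaces in the corpus file do not reach the transcript.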

…………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

…………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is missing from the source; the hasanta (U+09CD) is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions that belong to a conjunct
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // record the positions that do not belong to a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the pronunciation of every letter or conjunct
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
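The first step of the pronunciation generator, marking which character positions of a word belong to a conjunct, can be tried on its own without the database. The sketch below is a simplified illustration and not the listing itself: the class name ConjunctFinder and the method conjunctPositions are hypothetical, and it assumes the conjunct marker is the hasanta (U+09CD). For every hasanta at index k it marks k-1, k and k+1, as the listing does, but collects them in a sorted set so that overlapping positions in three-letter conjuncts are stored once.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class ConjunctFinder {

    // Assumed conjunct marker: Bengali hasanta (virama).
    static final char HASANTA = '\u09CD';

    // Returns the indices of all characters that take part in a conjunct.
    public static SortedSet<Integer> conjunctPositions(String word) {
        SortedSet<Integer> positions = new TreeSet<>();
        for (int k = 0; k < word.length(); k++) {
            if (word.charAt(k) == HASANTA) {
                if (k > 0) positions.add(k - 1);                 // consonant before the hasanta
                positions.add(k);                                // the hasanta itself
                if (k + 1 < word.length()) positions.add(k + 1); // consonant after it
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // "dhonyobad" (thank you), written with Unicode escapes: dha, na, hasanta, ya, ba, aa-kar, da
        String word = "\u09A7\u09A8\u09CD\u09AF\u09AC\u09BE\u09A6";
        System.out.println(conjunctPositions(word)); // prints [1, 2, 3]
    }
}
```

Every index that is not in the returned set is a non-conjunct position and can be looked up letter by letter.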

…………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        // the Bengali sample sentences are missing from the source
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        // the Bengali sample sentences are missing from the source
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // the Bengali command text inside the quotes is missing from the source
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
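Only the first branch of giveCommand survives in the listing. One way to organise many such branches is a lookup table from the recognized sentence to an action, sketched below. This is an alternative arrangement and not the code of the project: the class CommandDispatcher, its methods and the phrase "right click" are hypothetical placeholders, and the action only prints a message where the real application would call a CommandActivator method.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandDispatcher {

    // Maps a recognized sentence to the action it should trigger.
    private final Map<String, Runnable> actions = new HashMap<>();

    public void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action registered for the sentence; returns false if none is known.
    public boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder phrase; the real system would register the Bengali command sentences.
        dispatcher.register("right click", () -> System.out.println("rightClick() would run here"));
        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("something else"));
    }
}
```

With this layout a new voice command needs one extra register call instead of another if block.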

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

BB

BY

BR

BL

LT

LD

LDH

LP

LB

LV

LM

LY

LL

SHC

SHCH

SHT

SHN

SHW

SHM

SHY

SHR

SHL

SHK

SHKR

SHT

SF

SW

SM

SY

SR

SL

SKL

HN

HN

HW

HM

HY

HR

HL

HRRI

GHN

GHY

GHR

NK

NKY

NGKKH

NGKH

NGG

NGGY

NGGH

NGGHY

NGGHR

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the short time available we have recognized only 50 of them.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
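Each line of the fileids file is a path of the form root/level1/level2/recording, where root is the last folder of the training directory and the ".wav" extension is dropped. The sketch below shows how one such entry can be built from plain strings. It is an illustration under assumptions: the class FileIdEntry, its two methods and the sample folder names are hypothetical, and "/" is assumed as the path separator.

```java
public class FileIdEntry {

    // Returns the last folder name of a directory path such as "C:/a/b/" -> "b".
    public static String lastComponent(String dir) {
        String trimmed = dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }

    // Builds one fileids entry without the ".wav" extension.
    public static String entry(String rootDir, String level1, String level2, String wavName) {
        String name = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - 4)
                : wavName;
        return lastComponent(rootDir) + "/" + level1 + "/" + level2 + "/" + name;
    }

    public static void main(String[] args) {
        // Hypothetical speaker and session folder names.
        System.out.println(entry("C:/trainnign_file/sbs_asr_train/", "speaker1", "s1", "rec1.wav"));
    }
}
```

Checking the suffix with endsWith avoids the regular expression ".wav", whose dot would also match other characters.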

…………………………………………………………

ARRAY

…………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of every folder name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // the folder name without its number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
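The purpose of this class is to order folder names by their number, so that a name ending in 2 comes before one ending in 10, which plain string sorting does not give. A shorter way to get the same ordering is to sort the original names with a comparator on the extracted number, as sketched below. This is an alternative sketch and not the project code: the class NumericSuffixSort and the sample names are hypothetical, and names without digits are treated as 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {

    // Extracts the digits of a name as a number; names without digits count as 0.
    static int number(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the number inside each name.
    public static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericSuffixSort::number));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("s10", "s2", "s1"))); // prints [s1, s2, s10]
    }
}
```

Because the original strings are kept, the folders do not all have to share the same prefix.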

Bengali Speech Recognition

72

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javaioInputStreamReader

import javautilArrayList

import javautilList

public class transcriptCreator

static ArrayListltObjectgt inputLines=new ArrayListltObjectgt()

public void readCorpus(String path) throws FileNotFoundException

FileInputStream fstream = new FileInputStream(path)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String strLine

try

while ((strLine = brreadLine()) = null)

inputLinesadd(strLine)

inclose()

catch (Exception e)Catch exception if any

Systemerrprintln(Error + egetMessage())

int inputLineSize = inputLinessize()

int i = 0

while(inputLineSizegti)

i++

public static void CreateTransCriptFile(ListltStringgt dataString path)

try

Bengali Speech Recognition

73

FileWriter fstream = new FileWriter(pathtrue)

int numOfwriting = datasize()

int i = 0

int lineStart = 0

int lineEnd = inputLinessize()

BufferedWriter out = new BufferedWriter(fstream)

while(numOfwritinggti)

if(lineStart == lineEnd) lineStart = 0

String wavRemove = dataget(i)toString()

wavRemove = wavRemovereplaceAll(wav )

String leadTrailspaceRemoved =

inputLinesget(lineStart)toString()

leadTrailspaceRemoved = leadTrailspaceRemovedtrim()

String pattern = ltSgt +leadTrailspaceRemoved+ ltSgt

(+wavRemove+)

outwrite(pattern+n)

lineStart++

i++

Close the output stream

outclose()

catch(IOException e)

eprintStackTrace()

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

Bengali Speech Recognition

74

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

PHONETIC TRANSLATION

package ptpack

import javautilArrayList

public class PhoneticTranslation

Bengali Speech Recognition

75

public static void main(String[] args)

PronounciationGenarator pgObj = new PronounciationGenarator()

FileOperator foObj = new FileOperator()

ArrayListltObjectgt inputStrings = new ArrayListltObjectgt()

inputStrings = foObjgetStrings()

String pro =

Systemoutprintln(in phonetic translation)

int i = 0

String is =

String fileImage =

while(inputStringssize()gti)

is = inputStringsget(i)toString()trim()

pro = pgObjgetPronouciation(is)

pro = protrim()

fileImage = fileImage+is+ +pro+n

i++

Systemoutprintln(fileImage)

foObjcreateFile(fileImage)

main end

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    // Load the JDBC driver once, when the class is first used.
    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                    rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter is a consonant (banjon borno),
    // i.e. when its id in the banglatab table lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter= '" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
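The consonant test in Prodb opens a database connection for every character. As a rough alternative, the same question can be answered from the Unicode code point alone. The sketch below is only an illustration under stated assumptions: the class name BengaliLetters is hypothetical, and the ranges follow the Unicode Bengali block, so the result may not match ids 13 to 48 of the banglatab table for every letter.

```java
/** Hypothetical helper: classifies a character without querying the database. */
final class BengaliLetters {
    private BengaliLetters() {}

    /**
     * Returns true if ch falls in the Bengali consonant ranges of Unicode.
     * The range U+0995..U+09B9 also spans a few unassigned code points,
     * which never occur in real text.
     */
    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')   // ka .. ha
            || ch == '\u09CE'                       // khanda ta
            || ch == '\u09DC' || ch == '\u09DD'     // rra, rha
            || ch == '\u09DF';                      // yya
    }
}
```

A call such as BengaliLetters.isConsonant(word.charAt(i)) could then replace pdobj.isBanjonBorno(word.charAt(i)) where the table and the Unicode ranges agree.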

…………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal was lost in the source; the hasanta (U+09CD), which joins
    // the letters of a conjunct, is inferred from the logic below.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that do not belong to any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                        ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // Look up the pronunciation of every unit and join the results.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
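The index bookkeeping in the listing above is hard to follow, so a compact sketch of the central idea may help: a word is cut into lookup units, and every "hasanta + following letter" pair is glued to the letter before it so that a conjunct stays in one piece. This is not the project's code. The class name ConjunctSplitter is hypothetical, the conjunct marker is assumed to be the hasanta (U+09CD), and the special handling of র and the inserted inherent vowel are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a Bengali word into single letters and conjuncts. */
final class ConjunctSplitter {
    static final char HASANTA = '\u09CD';

    private ConjunctSplitter() {}

    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Absorb every "hasanta + next letter" pair into the current unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }
}
```

Each returned unit could then be looked up in a pronunciation table in the same way SearchStrings is used above. A hasanta at the very end of a word is returned as a unit of its own.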

…………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = ""; // the Bengali literal was lost in the source
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences were lost in the source.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
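Most methods of CommandActivator repeat the same press and release pattern for a different key combination. A small helper could express that pattern once. The sketch below is an illustration only: the class KeyChord and its methods are hypothetical names, and, unlike the listing, it releases the keys in reverse order of pressing, which is a design choice rather than something taken from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical helper: presses keys in order and releases them in reverse order. */
final class KeyChord {
    private KeyChord() {}

    static void tap(IntConsumer press, IntConsumer release, int... keyCodes) {
        for (int code : keyCodes) {
            press.accept(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            release.accept(keyCodes[i]);
        }
    }

    /** Records the steps as text instead of sending them, useful for checking the order. */
    static List<String> trace(int... keyCodes) {
        List<String> log = new ArrayList<>();
        tap(code -> log.add("press " + code), code -> log.add("release " + code), keyCodes);
        return log;
    }
}
```

With a java.awt.Robot instance named robot, a call such as KeyChord.tap(robot::keyPress, robot::keyRelease, KeyEvent.VK_CONTROL, KeyEvent.VK_C) would send Ctrl+C, since Robot.keyPress and Robot.keyRelease both take a single int key code.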

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand(bangla command);
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = ""; // the Bengali literal was lost in the source
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences were lost in the source.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    // Maps the recognized text to a desktop command.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command word was lost in the source.
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
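When the number of voice commands grows, a chain of equals tests in giveCommand becomes hard to maintain. One possible arrangement is a lookup table from the recognized phrase to the action. The sketch below only illustrates that idea: CommandDispatcher is a hypothetical class, and the English phrase in the usage note stands in for the Bengali command words, which did not survive in the listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps recognized phrases to actions. */
final class CommandDispatcher {
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    /** Registers an action for a phrase and returns this object for chaining. */
    CommandDispatcher register(String phrase, Runnable action) {
        actions.put(normalize(phrase), action);
        return this;
    }

    /** Runs the action for the recognized text; returns false if none is registered. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(normalize(recognizedText));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // Trim and collapse inner whitespace so small spacing differences do not matter.
    private static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }
}
```

A dispatcher could be filled once, for example with register("right click", action), and then called with dispatcher.dispatch(resultText) inside the recognition loop. Because the CommandActivator methods declare AWTException, each one would need a small wrapper that catches the exception before it can be used as a Runnable.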

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

SW SM SY SR SL SKL HN HN HW HM HY HR HL HRRI GHN GHY GHR NK NKY NGKKH NGKH NGG NGGY NGGH NGGHY NGGHR
TW TM TY TR DD DY DR DHY DHR NT NTH ND NDY NDR NDH NN NW NM NY DHN DHW DHM DHY DHR NT NTH
ND NT NTW NTY NTR NTH ND NDY NDW NDR NDH NDHY NDHR NN NW VY VR VL MTH MN MP MPR MF MB MV MVR
MM MY MR ML ZY RRK RRKY LK LG SHTY SHTR SHTH SHTHY SHN SHP SHPR SPHY SHW SHM SK SKR ST STR SKH ST STW
STY STH STHY SN SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time we used only 50 of them for recognition.

No Sentence
1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বএরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও


77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব


110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়


145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর


178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information


CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // The path separators were lost in the source; "/" is assumed throughout.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // k is never reset, so only the files added by the last listdir call are written.
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                        root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Level 1 and 2 collect directory names; plain files go into dirTreeLevel3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf('/');
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
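Each line of the fileids file is the path of one recording relative to the training directory, with forward slashes and without the .wav extension. The sketch below shows one way to derive such an entry with java.nio.file instead of string concatenation. It is an illustration under assumptions: FileIds is a hypothetical class, and the directory and file names in the usage note are invented.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: turns the path of a recording into a fileids entry. */
final class FileIds {
    private FileIds() {}

    static String toFileId(Path baseDir, Path wavFile) {
        // Path of the recording relative to baseDir, always with forward slashes.
        String relative = baseDir.relativize(wavFile).toString().replace('\\', '/');
        // Drop a trailing ".wav"; other extensions and upper-case ".WAV" are left as they are.
        return relative.endsWith(".wav")
            ? relative.substring(0, relative.length() - 4)
            : relative;
    }
}
```

For a recording stored as train/sbs_asr_train/speaker1/s1/utt01.wav and a base directory of train, the entry would be sbs_asr_train/speaker1/s1/utt01.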

…………………………………………………………

ARRAY

…………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as name1, name2, ... name10 by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }
        int i = 0;
        // Keep only the digits of every name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The name without its number is taken from the first entry.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
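SortArrayList rebuilds every folder name from the prefix of the first entry, which only works when all names share that prefix. A comparator on the numeric part can sort the original names directly. The sketch below is an illustration: NumericSuffixOrder is a hypothetical class, the folder names in the usage note are invented, and the digits of a name are assumed to fit into an int.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: orders names such as s1, s2, s10 by the number they contain. */
final class NumericSuffixOrder {
    private NumericSuffixOrder() {}

    /** Returns the digits of the name as a number, or -1 if the name has no digits. */
    static int number(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    /** Returns a sorted copy; the input list is left unchanged. */
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericSuffixOrder::number));
        return copy;
    }
}
```

Sorting the names s10, s2 and s1 this way yields s1, s2, s10, whereas plain string sorting would place s10 before s2.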

…………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per recording; the corpus sentences are
    // reused from the beginning once the end of the corpus is reached.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved =
                    inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S>"
                    + " (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
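The pairing rule used by CreateTransCriptFile can be stated compactly: recording number i receives corpus sentence number i modulo the corpus size, so the sentences repeat in order for every pass over the corpus. The sketch below restates that rule without file access. TranscriptLines is a hypothetical class, and the lower-case sentence markers follow the usual SphinxTrain transcript convention rather than the upper-case markers of the listing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds transcript lines from sentences and recording names. */
final class TranscriptLines {
    private TranscriptLines() {}

    static String line(String sentence, String fileId) {
        return "<s> " + sentence.trim() + " </s> (" + fileId + ")";
    }

    static List<String> build(List<String> sentences, List<String> wavNames) {
        List<String> out = new ArrayList<>();
        if (sentences.isEmpty()) {
            return out; // nothing to pair the recordings with
        }
        for (int i = 0; i < wavNames.size(); i++) {
            String fileId = wavNames.get(i).replace(".wav", "");
            // Wrap around to the first sentence after the last one.
            out.add(line(sentences.get(i % sentences.size()), fileId));
        }
        return out;
    }
}
```

With two sentences and three recordings, the third recording is paired with the first sentence again.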

…………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            // The path separators were lost in the source; "/" is assumed.
            FileInputStream fstream = new
                FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}


SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // The string literal assigned here was lost in extraction.
        String comString = "";
        System.out.println("comString " + comString + " length " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        // The Bengali sample sentences printed here did not survive extraction.
        System.out.println("Sample sentences:\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resources necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    // Presses the given keys in order, then releases them in the same order.
    private void pressKeys(int... keyCodes) throws AWTException {
        Robot robot = new Robot();
        for (int keyCode : keyCodes) {
            robot.keyPress(keyCode);
        }
        for (int keyCode : keyCodes) {
            robot.keyRelease(keyCode);
        }
    }

    // Clicks the given mouse button the given number of times.
    private void click(int buttonMask, int times) throws AWTException {
        Robot robot = new Robot();
        for (int n = 0; n < times; n++) {
            robot.mousePress(buttonMask);
            robot.mouseRelease(buttonMask);
        }
    }

    // Opens a URL in the default browser (Windows only).
    private void openUrl(String theUrl) throws IOException {
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void leftClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 1); }

    public void rightClick() throws AWTException { click(InputEvent.BUTTON3_MASK, 1); }

    public void doubleClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 2); }

    public void copy() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_C); }

    public void paste() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_V); }

    public void delete() throws AWTException { pressKeys(KeyEvent.VK_DELETE); }

    public void selectAll() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_A); }

    public void up() throws AWTException { pressKeys(KeyEvent.VK_PAGE_UP); }

    public void down() throws AWTException { pressKeys(KeyEvent.VK_PAGE_DOWN); }

    public void previousPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_UP); }

    public void nextPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_DOWN); }

    public void openNewFile() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_N); }

    public void openHere() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_O); }

    public void close() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_F4); }

    public void startMenu() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS); }

    public void refresh() throws AWTException { pressKeys(KeyEvent.VK_F5); }

    public void help() throws AWTException { pressKeys(KeyEvent.VK_F1); }

    public void showDesktop() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_D); }

    public void openMyComputer() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_E); }

    // "sokrio" (activate)
    public void enter() throws AWTException { pressKeys(KeyEvent.VK_ENTER); }

    // "porer window | ager window" (next window | previous window)
    public void altTab() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_TAB); }

    // "porer tab | ager tab" (next tab | previous tab)
    public void ctlTab() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_TAB); }

    public void openNotepad() throws AWTException, IOException {
        new ProcessBuilder("notepad.exe").start();
    }

    public void openBrowser() throws AWTException, IOException { openUrl("http://www.google.com"); }

    public void openFacebook() throws AWTException, IOException { openUrl("http://www.facebook.com"); }

    public void openYahoo() throws AWTException, IOException { openUrl("http://www.yahoo.com"); }

    public void openTechtunes() throws AWTException, IOException { openUrl("http://www.techtunes.com.bd"); }

    public void openProthomAlo() throws AWTException, IOException { openUrl("http://www.prothom-alo.com"); }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // The following test lines appear to have been commented out in the
        // listing; their comment markers were lost in extraction.
        // Robot robot = new Robot();
        // robot.delay(2000);
        // giveCommand("bangla command");
        // robot.delay(3000);

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // The string literal assigned here was lost in extraction.
        String comString = "";
        System.out.println("comString " + comString + " length " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        // The Bengali sample sentences printed here did not survive extraction.
        System.out.println("Sample sentences:\n");
    }

    private static void giveCommand(String CompareText) throws AWTException, IOException {
        // The Bengali command string compared here was lost in extraction.
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
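
The listing above stops after the first command comparison. As a minimal sketch of how recognized text could be mapped to actions without a long if-chain, the hypothetical `CommandDispatcher` below keeps a table from phrase to action. The class name, the `register` and `dispatch` methods and the English phrase "right click" are assumptions made for illustration and are not part of the project code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps a recognized phrase to the action it triggers.
final class CommandDispatcher {
    private final Map<String, Runnable> actions = new HashMap<String, Runnable>();

    // Registers an action under a phrase; surrounding whitespace is ignored.
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action registered for the phrase; returns false if there is none.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        dispatcher.register("right click", new Runnable() {
            public void run() {
                System.out.println("right click requested");
            }
        });
        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

In the project, each registered action would presumably wrap a `CommandActivator` call and handle its checked `AWTException` inside `run()`.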

TW

TM

TY

TR

DD

DY

DR

DHY

DHR

NT

NTH

ND

NDY

NDR

NDH

NN

NW

NM

NY

DHN

DHW

DHM

DHY

DHR

NT

NTH

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of limited time recognition was carried out on only 50 of them.

No. Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators were lost in extraction; forward slashes are assumed
        // in every path below.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates across folders, so k keeps counting.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // replace() is used so that ".wav" is matched literally.
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }

        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Adds the sub-folders of path to the list for the given level (1 or 2)
    // and every plain file to dirTreeLevel3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder in a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line of data to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………

ARRAY

…………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as a common prefix followed by a number into
    // numeric order. The prefix is taken from the first element.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList;
        }

        // Keep only the digits of every name.
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }

        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);

        // Rebuild the names from the prefix and the sorted numbers.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
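
The sorter above rebuilds every name from the prefix of the first element, which only works when all folders share one prefix. A minimal alternative sketch, assuming folder names such as "s1", "s2", "s10" (hypothetical names, not taken from the project), sorts the original strings by the number they contain and leaves the names untouched. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical alternative to SortArrayList: sort names by their embedded number.
final class NumericSuffixSort {

    // Returns the digits of a name as a number; names without digits give -1.
    static int numberIn(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a sorted copy; the input list is not modified.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        Collections.sort(sorted, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(numberIn(a), numberIn(b));
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

A plain `Collections.sort` on the strings would place "s10" before "s2", which is presumably why the project sorts on the numeric part.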

…………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads every line of the corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file. The corpus sentences are
    // reused from the start once the last sentence has been written.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) {
                    lineStart = 0;
                }
                String wavRemove = data.get(i).toString();
                // replace() is used so that ".wav" is matched literally.
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> ("
                        + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
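
To make the transcript format concrete, the sketch below builds a single line from a sentence and an audio file name, mirroring the sentence markers and the parenthesised file id used in the listing. The class name `TranscriptLine`, its methods and the sample inputs are assumptions made for illustration.

```java
// Hypothetical helper showing the shape of one transcript line.
final class TranscriptLine {

    // Removes a trailing ".wav" extension, if present.
    static String stripWav(String fileName) {
        return fileName.endsWith(".wav")
                ? fileName.substring(0, fileName.length() - ".wav".length())
                : fileName;
    }

    // Wraps the trimmed sentence in markers and appends the file id.
    static String format(String sentence, String wavFileName) {
        return "<S> " + sentence.trim() + " </S> (" + stripWav(wavFileName) + ")";
    }

    public static void main(String[] args) {
        // Sample inputs are made up for this example.
        System.out.println(format("  shuvo sokal ", "s1.wav"));
    }
}
```

Stripping only a trailing extension avoids touching a ".wav" that happens to occur in the middle of a file name.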

…………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Path separators were lost in extraction; forward slashes are assumed
    // in the paths below.

    // Reads the word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        try {
            FileInputStream fstream =
                    new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word and its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and the connection, reporting any failure.
    private static void close(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            close(stmt, con);
        }
    }

    // Returns true if the letter is a consonant (banjonborno), that is, if
    // its id in the banglatab table lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            close(stmt, con);
        }
        return isBBorno;
    }
} // class end
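
The consonant test above opens a database connection for every character. As a hedged alternative sketch, the same kind of question can be answered from Unicode code points alone. The ranges below are taken from the Bengali block of the Unicode standard; whether they match ids 13 to 48 of the `banglatab` table exactly is an assumption, and the class name `BengaliLetters` is hypothetical.

```java
// Hypothetical, database-free consonant check based on Unicode code points.
final class BengaliLetters {

    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')  // ka .. ha
                || ch == '\u09CE'                  // khanda ta
                || ch == '\u09DC'                  // rra
                || ch == '\u09DD'                  // rha
                || ch == '\u09DF';                 // yya
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // the consonant ka
        System.out.println(isConsonant('\u0985')); // the vowel a
    }
}
```

A check of this kind needs no connection handling, at the cost of fixing the letter classes in code instead of in the table.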

…………………………………………………………………………………

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;

    // The character literal was lost in extraction; the hasanta (U+09CD), the
    // sign that joins the consonants of a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';

    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();

        // Reset the shared state so the method can be called for several words.
        cpi = 0;
        ncpi = 0;
        k = 0;
        java.util.Arrays.fill(ConjunctsPosition, 0);

        // Mark every conjunct: the letter before the hasanta, the hasanta
        // itself and the letter after it.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;

        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }

        // Collect the positions that are not part of any conjunct.
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // not part of a conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two adjacent consonants: insert the inherent vowel.
                    if (tempc && tempc2) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // part of a conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    // A leading 'র' is looked up separately from the letter
                    // that follows the hasanta.
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // Append letters while the marked positions stay consecutive.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }

        // Look up the pronunciation of every search string in the database.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
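
The core idea of the generator, keeping the letters of a conjunct together as one lookup key, can be shown in isolation. The sketch below is a simplified illustration and not the project's algorithm: it assumes the joining character is the hasanta (U+09CD), and it leaves out the special handling of a leading 'র' and the insertion of the inherent vowel. The class name `ConjunctSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified splitter: letters joined by a hasanta stay together.
final class ConjunctSplitter {
    static final char HASANTA = '\u09CD';

    static List<String> split(String word) {
        List<String> units = new ArrayList<String>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Absorb every "hasanta + next letter" pair into the current unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // ba, i-sign, sha, hasanta, ba
        System.out.println(split("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

Each resulting unit corresponds to one candidate value for the `letter` column that the generator queries.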

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // String comString = "";
        // System.out.println("comString " + comString + " length " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException, IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
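The giveCommand method compares the recognized text against one phrase per if branch. A lookup table is one alternative way to organise that mapping. The sketch below is only an illustration under stated assumptions: the class name CommandDispatcher, the English placeholder phrases, and the use of Runnable are hypothetical and not part of the project code, and the actions merely record their names instead of driving java.awt.Robot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps a recognized phrase to an action.
class CommandDispatcher {
    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a phrase with the action to run when it is recognized.
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action for the phrase; returns false when nothing matches.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder phrases; the real system would register Bengali commands.
        dispatcher.register("copy", () -> log.add("copy"));
        dispatcher.register("paste", () -> log.add("paste"));
        System.out.println(dispatcher.dispatch("copy"));    // true
        System.out.println(dispatcher.dispatch("unknown")); // false
        System.out.println(log);                            // [copy]
    }
}
```

With such a table, adding a command means one register call, for example binding a phrase to a CommandActivator method, rather than another if branch.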

ND

NT

NTW

NTY

NTR

NTH

ND

NDY

NDW

NDR

NDH

NDHY

NDHR

NN

NW

VY

VR

VL

MTH

MN

MP

MPR

MF

MB

MV

MVR

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2 Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the limited time available we recognized only 50 of them.

No. Sentence

1 শুভ সকাল
2 ধন্যবাদ
3 আমি আপনাকে কি সাহায্য করতে পারি
4 আমি কিছু তথ্য জানতে এসেছি
5 কি বলুন
6 ভর্তি বিষয়ে
7 এএএএ কোন বিভাগে ভর্তি হতে ইচ্ছুক
8 কম্পিউটার বিজ্ঞান এ এএএএএএএএ বিভাগে
9 এ বিভাগে ভর্তি চলছে
10 এ বিভাগে কি কি সুবিধা আছে
11 বিভিন্ন ধরনের ল্যাব সুবিধা আছে
12 যেমন
13 দএএ কম্পিউটার ল্যাব আছে
14 ও আচ্ছা
15 একটি শুধু বিজ্ঞান বিভাগের জন্য
16 আর আরেকটি
17 সব এএএএএএএ জন্য
18 প্রতি এএএএএএ কতগুলো কম্পিউটার আছে
19 বত্রিশটি করে
20 আর কিছু
21 কতজন শিক্ষক আছেন এ বিভাগে
22 প্রায় এএএএ জন
23 মোট কতজন ছাত্রছাত্রী এ এএএএএএ
24 প্রায় এএএএএ জন
25 এ বিভাগেএ এএএএএএএএ এএএ এএ
26 রুমেল এম এস রাহমান পীর
27 বিশ্ব বিদ্যালয়ের প্রতিষ্ঠাতা কে জানতে পারি
28 অবশ্যই
29 দানবীর মিস্টার রাগিব আলী
30 আপনাদের কি আর কোন শাখা আছে
31 না সিলেট এই একমাত্র ক্যাম্পাস
32 এএএএএএএএএএএএএ কবে স্থাপিত হয়েছে
33 এএএ এএএএএ এএ সালে
34 আর কি কি সুবিধা আছে
35 হার্ডওয়্যার সার্কিট ও রসায়নের ল্যাব আছে
36 লাইব্রেরী কি আছে
37 অবশ্যই একটা বড় লাইব্রেরী আছে
38 ক্যান্টিন আছে
39 খুবই উন্নতমানের একটি ক্যান্টিনও আছে
40 ভর্তির শেষ তারিখ কবে
41 এ মাসের পাঁচ তারিখ
42 ক্লাস শুরু হবে এ মাসের দশ তারিখ হতে
43 কোন কোন তলা নিয়ে বিশ্ব বিদ্যালয় ক্যাম্পাস
44 তিন চার এএএ পাঁচ এএএ নিয়ে
45 আপনাদের বছরে কত সেমিস্টার
46 তিন সেমিস্টার
47 তাহলে তো মোট এএএ সেমিস্টার
48 জি হ্যাঁ
49 এ বিভাগে মোট কত ক্রেডিট পড়ানো হয়
50 এএএএ এএএএএএএ ক্রেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // literal replace: the dot must not act as a regex wildcard
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }

        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Collects directory names (levels 1 and 2) or file names (level 3) under path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names by the number embedded in them.
    // Assumes every name shares the alphabetic prefix of the first entry.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
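SortArrayList orders the folders by the number in their names, so that a folder numbered 10 follows 2 instead of 1. A comparator can express the same ordering without rebuilding the names from a prefix and a number. The following is a minimal sketch, not the project's code: the class NumericNameSort and the sample names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: sorts names by the digits they contain.
class NumericNameSort {
    // Returns the digits of a name as a number; names without digits get -1.
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a sorted copy; the input list is left untouched.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSort::numberIn));
        return sorted;
    }

    public static void main(String[] args) {
        // Sample folder names are made up for this example.
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1"))); // [s1, s2, s10]
    }
}
```

Because the original strings are kept, a zero-padded name such as s01 would survive unchanged, whereas rebuilding the name from the parsed integer would turn it into s1.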

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the corpus file line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // literal replace: the dot must not act as a regex wildcard
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
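Each transcript line pairs a corpus sentence with the name of its audio file in parentheses, and the sentence index wraps around when there are more recordings than sentences. The sketch below isolates that formatting step so it can be checked without touching the disk. The class TranscriptLines and the sample file names are hypothetical, and a non-empty corpus is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds transcript lines in memory.
class TranscriptLines {
    // Formats one line: sentence between <S> markers, utterance id in parentheses.
    static String transcriptLine(String sentence, String wavFileName) {
        String id = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }

    // Pairs each audio file with a sentence, cycling through the corpus.
    static List<String> build(List<String> corpus, List<String> wavNames) {
        if (corpus.isEmpty()) {
            throw new IllegalArgumentException("corpus must not be empty");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            lines.add(transcriptLine(corpus.get(i % corpus.size()), wavNames.get(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample sentences and file names are made up for this example.
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        List<String> wavs = Arrays.asList("a_1.wav", "a_2.wav", "b_1.wav");
        for (String line : build(corpus, wavs)) {
            System.out.println(line);
        }
    }
}
```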

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per entry.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter's row id in banglatab falls in the consonant range 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
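isBanjonBorno opens a database connection for every character just to learn whether it is a consonant. As an alternative technique, and not the method used in this project, the same question can usually be answered from the character's Unicode code point. The sketch below assumes that rows 13 to 48 of banglatab correspond to the Bengali consonant letters; the class name BengaliLetters is hypothetical, and characters such as ৎ or ড় may be classified differently from the table.

```java
// Hypothetical helper: classifies Bengali letters by Unicode code point.
class BengaliLetters {
    static boolean isConsonant(char ch) {
        // The consonants from KA (U+0995) to HA (U+09B9) form one contiguous range.
        if (ch >= '\u0995' && ch <= '\u09B9') {
            // Character.isLetter is false for the unassigned code points in the range.
            return Character.isLetter(ch);
        }
        // KHANDA TA, RRA, RHA and YYA are encoded outside that range.
        return ch == '\u09CE' || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // KA, a consonant
        System.out.println(isConsonant('\u0985')); // A, an independent vowel
    }
}
```

A check of this kind needs no database round trip, which matters because getPronouciation calls the consonant test for each pair of neighbouring letters.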

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans
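getPronouciation first records the positions around every conjunct marker and then joins those letters into a single lookup string for the banglatab table. The grouping step on its own can be sketched as follows. This is an illustration under assumptions, not the project's code: it assumes the marker ConjuctsIdentifyCharacter is the hasanta sign (U+09CD), which the listing does not show, and it omits the special handling of র as well as the inherent vowel inserted between two consonants. The class name ConjunctSplitter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups the letters of a Bengali word into lookup units.
class ConjunctSplitter {
    static final char HASANTA = '\u09CD'; // assumed conjunct marker

    // Letters joined by hasanta stay together; every other character is its own unit.
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i++));
            // Absorb each following "hasanta + letter" pair into the current unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // The word "bigyan" (science) written with escapes: BA, I sign, JA, hasanta, NYA, AA sign, NA.
        List<String> units = split("\u09AC\u09BF\u099C\u09CD\u099E\u09BE\u09A8");
        System.out.println(units.size()); // 5
    }
}
```

Each unit can then be looked up once, so a conjunct such as জ্ঞ maps to a single pronunciation entry instead of three separate characters.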

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // String comString = "";
        // System.out.println("comString " + comString + " length " + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resources necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

Bengali Speech Recognition

89

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

public void openNotepad() throws AWTException IOException

ProcessBuilder proc=new ProcessBuilder(notepadexe)

Process p=procstart()

public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

Bengali Speech Recognition

90

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

Bengali Speech Recognition

91

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

Bengali Speech Recognition

92

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 60: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

60

MM

MY

MR

ML

ZY

RRK

RRKY

LK

LG

SHTY

SHTR

SHTH

SHTHY

SHN

SHP

SHPR

SPHY

SHW

SHM

SK

SKR

ST

STR

SKH

ST

STW

STY

STH

STHY

SN

SP

Fig 7.2: Unicode to IPA Chart

Corpus

Our corpus contains 185 sentences in total, but because of the limited time available we have recognized only 50 of them.

No.  Sentence

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3: Corpus of University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Walks the training directory tree (speaker / session / wav file) and writes
// the .fileids and transcript files used for acoustic model training.
public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators are reconstructed; the root must end with a separator.
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing across folders, so k is never reset.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Literal replace: a regex ".wav" would also match e.g. "xwav".
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Adds the sub-directories of path to level 1 or 2, or its files to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ARRAY

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sorts folder names such as s1, s2, ..., s10 by their numeric part.
public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort
        }
        int i = 0;
        // Keep only the digits of every name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // All names are assumed to share the alphabetic prefix of the first one.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
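The sorting step above rebuilds every folder name from the prefix of the first entry. As an illustration only, and not the code used in our project, the same numeric ordering can be sketched with a comparator that leaves the original names untouched. The class name NumericSuffixSort and the rule that names without digits sort first are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders names such as "s1", "s2", "s10" by their digits.
final class NumericSuffixSort {

    // Returns the digits of a name as an int; names without digits get -1
    // so that they sort first (an assumption of this sketch).
    static int numberOf(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a sorted copy; the input list is not modified.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberOf));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

Because the names themselves are kept, folders with different prefixes would survive the sort unchanged, which the prefix-rebuilding approach does not guarantee.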

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Reads the corpus sentences and pairs them with the recorded wav files.
public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Loads every line of the corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per wav file; the corpus is reused from the
    // first sentence whenever its end is reached.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
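To make the transcript format concrete, the line-building step of the transcript creator can be isolated as below. This is a minimal sketch and not part of the project code: the class name TranscriptLine and its method are hypothetical, and the sentence markers are copied from the listing above.

```java
// Hypothetical helper that builds a single transcript line.
final class TranscriptLine {

    // Wraps the trimmed sentence in sentence markers and appends the file id,
    // i.e. the wav file name without its ".wav" extension.
    static String format(String sentence, String wavFileName) {
        String fileId = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    public static void main(String[] args) {
        System.out.println(format("  hello world ", "s1.wav"));
    }
}
```

Each line of the transcript file therefore carries one corpus sentence and the id of the recording that contains it, in the same order as the entries of the fileids file.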

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

// Reads the word list and writes the generated pronunciation dictionary.
public class FileOperator {

    // Returns every trimmed line of the input word list.
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        try {
            FileInputStream fstream = new FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished dictionary text to the output .dic file.
    public void createFile(String finalData) {
        try {
            // Path separators are reconstructed; the file name is uncertain.
            BufferedWriter out = new BufferedWriter(new FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

// Builds the pronunciation dictionary: one "word pronunciation" line per word.
public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

PRODB

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Database access for the letter-to-pronunciation table (banglatab).
public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    // Closes the statement and the connection, reporting any SQL error.
    private static void closeQuietly(Statement stmt, Connection con) {
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                System.err.println("SQLException: " + e.getMessage());
            }
        }
    }

    // Connection test left over from development.
    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
    }

    // Returns true if ch is a Bengali consonant (banjon borno), that is,
    // if its id in banglatab lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            closeQuietly(stmt, con);
        }
        return isBBorno;
    }
} // class end
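The consonant test above opens a database connection for every character. As a possible alternative, which is not what our project does, the same question can be answered from the Unicode code point alone. The sketch below assumes that the 36 letters with ids 13 to 48 in banglatab correspond to the consonant range of the Unicode Bengali block; we have not checked this assumption against the table, and the class name BengaliLetters is hypothetical.

```java
// Hypothetical helper: consonant test based on Unicode ranges instead of a query.
final class BengaliLetters {

    // True for the main consonant range (U+0995 to U+09B9), for khanda ta
    // (U+09CE) and for the additional consonants U+09DC, U+09DD and U+09DF.
    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')
                || ch == '\u09CE'
                || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // the letter ka
        System.out.println(isConsonant('\u0985')); // the vowel a
    }
}
```

A range check of this kind needs no database round trip, but it follows the Unicode classification and may differ from the ids stored in the table.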

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Splits a Bengali word into letters and conjuncts and looks up the
// pronunciation of each unit in the banglatab table.
public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal was lost in extraction; the hasanta (U+09CD), which joins
    // consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        cpi = 0;
        ncpi = 0;
        // Record the positions before, at and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    // A leading র is looked up separately from the next letter.
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    // Collect consecutive positions into one conjunct string.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }

        // Look up the pronunciation of every unit and join them with spaces.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

// Live recognizer: listens to the microphone and prints what was recognized.
public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();

        // loop the recognition until the program exits
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo. The Bengali sample sentences of
    // the original listing were not preserved.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n");
    }
}

SBSBSR TRANSCRIBER

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

// Offline recognizer: decodes a recorded wav file instead of the microphone.
public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

DESKTOP COMMAND APPLICATION

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Turns a recognized command into mouse clicks, keyboard shortcuts or
// program launches on a Windows desktop.
public class CommandActivator {

    // Presses and releases the given mouse button.
    private void click(int buttonMask) throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(buttonMask);
        robot.mouseRelease(buttonMask);
    }

    // Presses the given keys in order, then releases them in the same order.
    private void pressKeys(int... keyCodes) throws AWTException {
        Robot robot = new Robot();
        for (int keyCode : keyCodes) {
            robot.keyPress(keyCode);
        }
        for (int keyCode : keyCodes) {
            robot.keyRelease(keyCode);
        }
    }

    // Opens a URL with the default handler of Windows.
    private void openUrl(String theUrl) throws IOException {
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void leftClick() throws AWTException { click(InputEvent.BUTTON1_MASK); }

    public void rightClick() throws AWTException { click(InputEvent.BUTTON3_MASK); }

    public void doubleClick() throws AWTException {
        click(InputEvent.BUTTON1_MASK);
        click(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_C); }

    public void paste() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_V); }

    public void delete() throws AWTException { pressKeys(KeyEvent.VK_DELETE); }

    public void selectAll() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_A); }

    public void up() throws AWTException { pressKeys(KeyEvent.VK_PAGE_UP); }

    public void down() throws AWTException { pressKeys(KeyEvent.VK_PAGE_DOWN); }

    public void previousPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_UP); }

    public void nextPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_DOWN); }

    public void openNewFile() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_N); }

    public void openHere() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_O); }

    public void close() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_F4); }

    public void startMenu() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS); }

    public void refresh() throws AWTException { pressKeys(KeyEvent.VK_F5); }

    public void help() throws AWTException { pressKeys(KeyEvent.VK_F1); }

    public void showDesktop() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_D); }

    public void openMyComputer() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_E); }

    // activate
    public void enter() throws AWTException { pressKeys(KeyEvent.VK_ENTER); }

    // next window | previous window
    public void altTab() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_TAB); }

    // next tab | previous tab
    public void ctlTab() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_TAB); }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException { openUrl("http://www.google.com"); }

    public void openFacebook() throws AWTException, IOException { openUrl("http://www.facebook.com"); }

    public void openYahoo() throws AWTException, IOException { openUrl("http://www.yahoo.com"); }

    public void openTechtunes() throws AWTException, IOException { openUrl("http://www.techtunes.com.bd"); }

    public void openProthomAlo() throws AWTException, IOException { openUrl("http://www.prothom-alo.com"); }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

// Live recognizer that executes each recognized sentence as a desktop command.
public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // Manual test of the command path before recognition starts.
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);

        printInstructions();

        // loop the recognition until the program exits
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // Debugging aid, assumed to have been commented out:
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo. The Bengali sample sentences of
    // the original listing were not preserved.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n");
    }

    // Executes the desktop command that matches the recognized text. The
    // Bengali command phrase compared here was not preserved in the listing.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


STY

STH

STHY

SN

SP

Corpus

Our corpus contains 185 sentences in total, but because of the short time available only 50 of them were used for recognition.

No Sentence

Fig 7.2 Unicode to IPA Chart

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

178

179 ও

180

181

182

183

184

185

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    // Path separators and file-name dots in the string literals below are assumed.
    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:\\trainnign_file\\sbs_asr_train\\";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "\\" + dirTreeLevel2.get(j) + "\\";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                        root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath, "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
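
The directory walk in the listing above can also be sketched with java.nio. This is only an illustration, not the project's code: the class name FileIdsSketch and both method names are hypothetical, the ids are written relative to the training root (the listing additionally prefixes the root folder name), and the result is sorted lexicographically rather than by the numeric folder suffix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: builds one file id per .wav file under a training root.
final class FileIdsSketch {

    // Converts an audio path into a '/'-separated id without the .wav extension.
    static String toFileId(Path root, Path wav) {
        String rel = root.relativize(wav).toString().replace('\\', '/');
        return rel.endsWith(".wav") ? rel.substring(0, rel.length() - 4) : rel;
    }

    // Walks the whole tree below root and returns the sorted ids of all .wav files.
    static List<String> collectFileIds(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".wav"))
                .map(p -> toFileId(root, p))
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

Files.walk visits every level, so the fixed three-level loop of the listing is not needed in this variant.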

…………………………………

ARRAY

…………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
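
The listing above sorts folder names by their numeric part so that, for example, s2 comes before s10. A minimal sketch of the same idea with a Comparator follows. The class and method names are hypothetical, and unlike the listing it keeps the original names instead of rebuilding them from the first element's letters, so it does not require every name to share one prefix.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders names by the number embedded in them.
final class NumericSuffixSort {

    // Extracts the digits of a name as an int; names without digits count as 0.
    static int number(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a sorted copy; the input list is left unchanged.
    static List<String> sort(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericSuffixSort::number));
        return copy;
    }
}
```

A plain Collections.sort would place "s10" before "s2", which is why the numeric key is needed.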

…………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved =
                    inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
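
The transcript format produced above pairs each recording with a corpus sentence and starts again from the first sentence once the corpus is exhausted. A small sketch of that pairing, without file I/O, is given below. The class and method names are hypothetical, the corpus is assumed to be non-empty, and the sentence markers are written as in the listing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: formats transcript lines for a list of recordings.
final class TranscriptSketch {

    // Builds one line: start marker, trimmed sentence, end marker, id in parentheses.
    static String line(String sentence, String wavName) {
        String id = wavName.endsWith(".wav")
            ? wavName.substring(0, wavName.length() - 4)
            : wavName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }

    // Pairs recording i with sentence i modulo the corpus size (corpus must not be empty).
    static List<String> lines(List<String> corpus, List<String> wavNames) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < wavNames.size(); i++) {
            out.add(line(corpus.get(i % corpus.size()), wavNames.get(i)));
        }
        return out;
    }
}
```

The modulo index plays the role of resetting lineStart to 0 in the listing, so several speakers reading the same corpus are transcribed in the same sentence order.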

…………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // File-name dots and path separators in the literals below are assumed.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                    rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48)
                    isBBorno = true;
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
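
In the listing above, isBanjonBorno opens a database connection for every character it checks. As an alternative illustration only, the consonant test can be done on Unicode code points without a database. This is not the method used in the project: the class name is hypothetical, and the set of letters accepted here may differ from rows 13 to 48 of the banglatab table, whose contents are not shown in this report.

```java
// Hypothetical helper: database-free test for Bengali consonant letters.
final class BengaliLetters {

    // True for the consonants KA..HA (U+0995..U+09B9, skipping unassigned code points),
    // KHANDA TA (U+09CE) and the letters RRA, RHA, YYA (U+09DC, U+09DD, U+09DF).
    static boolean isConsonant(char ch) {
        if (ch >= '\u0995' && ch <= '\u09B9') {
            return Character.isDefined(ch);
        }
        return ch == '\u09CE' || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }
}
```

Vowel letters such as U+0985 and dependent vowel signs such as U+09BE fall outside these ranges and are rejected.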

…………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character is not legible in the source; hasanta (U+09CD) is assumed.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                        ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
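
The core step of the listing above is grouping the characters of a word so that consonants joined by a conjunct marker are looked up as one unit. A compact sketch of that grouping is shown below. It assumes the marker is hasanta (U+09CD), which the extracted listing does not show explicitly; the class and method names are hypothetical, and the special handling of the letter র and the insertion of the inherent vowel অ are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a Bengali word into lookup units, keeping conjuncts together.
final class ConjunctSplitSketch {

    static final char HASANTA = '\u09CD'; // assumed conjunct marker

    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // While a hasanta follows and another character comes after it, extend the unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(HASANTA).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }
}
```

For the five-character word U+09AC U+09BF U+09B6 U+09CD U+09AC, this yields three units, the last of which is the conjunct formed by the final three characters. Each unit could then be looked up in a pronunciation table in one query.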

…………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
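
Each method of CommandActivator is meant to be triggered by one recognized phrase. One possible way to connect the two, sketched here under stated assumptions, is a lookup table from phrase to action instead of a chain of equals() tests. The class name and the phrases are hypothetical (the Bengali command phrases are not legible in this report), and because the CommandActivator methods throw checked exceptions, each registered action would have to wrap its call in a try/catch block.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps recognized phrases to actions.
final class CommandDispatchSketch {

    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a phrase with the action to run when it is recognized.
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action for the trimmed text; returns false if no phrase matches.
    boolean dispatch(String recognizedText) {
        String key = recognizedText == null ? "" : recognizedText.trim();
        Runnable action = actions.get(key);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

With such a table, adding a command means registering one more entry, and an unrecognized phrase is reported by the return value rather than silently ignored.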

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The command phrase compared here is not legible in the source.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 62: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

62

1 শভ সকাল

2 ধনযবাদ

3 আমি আপনাকে কি সাহাযয করতে পারি

4 আমি কিছ তথয জানতে এসেছি

5 কি বলন

6 ভরতি বিষয়ে

7 এএএএ কোন বিভাগে ভরতি হতে ইচছক

8 কমপিউটার বিজঞান এ এএএএএএএএ বিভাগে

9 এ বিভাগে ভরতি চলছে

10 এ বিভাগে কি কি সবিধা আছে

11 বিভিনন ধরনের লযাব সবিধা আছে

12 যেমন

13 দএএ কমপিউটার লযাব আছে

14 ও আচছা

15 একটি শধ বিজঞান বিভাগের জনয

16 আর আরেকটি

17 সব এএএএএএএ জনয

18 পরতি এএএএএএ কতগলো কমপিউটার আছে

19 বতরিশটি করে

20 আর কিছ

21 কতজন শিকষক আছেন এ বিভাগে

22 পরায় এএএএ জন

23 মোট কতজন ছাতরছাতরী এ এএএএএএ

24 পরায় এএএএএ জন

25 এ বিভাগেএ এএএএএএএএ এএএ এএ

26 রমেল এম এস রাহমান পীর

27 বিশব বিদযালয়ের পরতিষঠাতা কে জানতে পারি

28 অবশযই

29 দানবীর মিসটার রাগিব আলী

30 আপনাদের কি আর কোন শাখা আছে

31 না সিলেট এই একমাতর কযামপাস

32 এএএএএএএএএএএএএ কবে সথাপিত হয়েছে

33 এএএ এএএএএ এএ সালে

34 আর কি কি সবিধা আছে

35 হারডওয়যার সারকিট ও রসায়নের লযাব আছে

36 লাইবরেরী কি আছে

37 অবশযই একটা বড় লাইবরেরী আছে

38 কযানটিন আছে

39 খবই উননতমানের একটি কযানটিনও আছে

Bengali Speech Recognition

63

40 ভরতির শেষ তারিখ কবে

41 এ মাসের পাচ তারিখ

42 কলাস শর হবে এ মাসের দশ তারিখ হতে

43 কোন কোন তলা নিয়ে বিশব বিদযালয় কযামপাস

44 তিন চার এএএ পাচ এএএ নিয়ে

45 আপনাদের বএরে কত সেমিসটার

46 তিন সেমিসটার

47 তাহলে তো মোট এএএ সেমিসটার

48 জি হযা

49 এ বিভাগে মোট কত করেডিট পড়ানো হয়

50 এএএএ এএএএএএএ করেডিট

51 ইউ

52

53 কত

54

55 কত

56 উপর

57

58 এক

59

60 আর

61 কত

62 এক

63

64

65 আর

67 ও

68

69 কত

70 এক

71 কত

72

73 রকম

74

75 আর

76 ও

Bengali Speech Recognition

64

77 সব একশত

78

79 উপর

80

81 আর কত

82 এ আর এ

83 এ

84

85 এক

86

87 আর সবসময়

88

89 ও এ

90

91 এ আর এ

92 এ আর এ

93

94 আর কম

95 আর এ

96

97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

Bengali Speech Recognition

65

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

Bengali Speech Recognition

66

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

Bengali Speech Recognition

67

178

179 ও

180

181

182

183

184

185

Fig 73 Corpus about University Admission Information

Bengali Speech Recognition

68

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    // Levels 1 and 2 hold directory names; level 3 accumulates every audio file name found.
    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path punctuation was lost in extraction; the separators below are restored by assumption.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // k is never reset: dirTreeLevel3 keeps growing, so k always points at the new files.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Strip only a trailing ".wav" (the dot is escaped so it is not a wildcard).
                    targetPath = targetPath.replaceAll("\\.wav$", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Adds the sub-directories of path to the list for the given level, and plain files to level 3.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
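Each line that FileidsCreator appends to the .fileids file is the path of one recording relative to the training root, with the .wav extension removed. The following is a minimal sketch of that mapping only, not the project's code; the class name FileIdSketch and the sample directory and file names are hypothetical.

```java
public class FileIdSketch {

    /** Builds one fileids entry: root/level1/level2/name, with a trailing ".wav" removed. */
    static String toFileId(String root, String level1, String level2, String wavName) {
        String name = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - 4)
                : wavName;
        return root + "/" + level1 + "/" + level2 + "/" + name;
    }

    public static void main(String[] args) {
        // Hypothetical speaker, session and file names.
        System.out.println(toFileId("sbs_asr_train", "speaker1", "s2", "utt_001.wav"));
    }
}
```

Checking the suffix with endsWith avoids the regular-expression pitfall in which an unescaped dot matches any character.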

………………………………………………………………………………

ARRAY

………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names such as s1, s2, s10 by their numeric part. The letter prefix of the
    // first name is reused for every entry, so all names must share the same prefix.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", ""); // keep the digits only
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
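SortArrayList orders folder names by their numeric part so that, for example, s10 comes after s2. An alternative sketch is given here, under the assumption that each name contains a single short run of digits: a Comparator sorts the original names directly, so they do not have to be rebuilt from a shared prefix. The class name NumericSuffixSort and the sample names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {

    /** Returns the digits of a name as an int; names without digits count as 0. */
    static int numberIn(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a copy of names ordered by the number embedded in each name. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                return Integer.compare(numberIn(a), numberIn(b));
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("s10");
        names.add("s2");
        names.add("s1");
        System.out.println(sortByNumber(names));
    }
}
```

Because the names themselves are sorted, folders with different prefixes or with underscores keep their exact spelling.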

………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads every line of the corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        if (inputLines.isEmpty()) {
            return; // no corpus loaded, so there is nothing to pair with the audio files
        }
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll("\\.wav$", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
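The transcript file pairs each recording with a corpus sentence, wrapping the sentence in sentence markers and appending the file id in parentheses; when there are more recordings than sentences, the sentences are reused from the start. A minimal sketch of those two rules follows. The class name TranscriptLineSketch and the romanized sample text are hypothetical stand-ins for the Bengali corpus.

```java
import java.util.Arrays;
import java.util.List;

public class TranscriptLineSketch {

    /** Formats one transcript line in the same shape as CreateTransCriptFile. */
    static String transcriptLine(String sentence, String wavName) {
        String id = wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - 4)
                : wavName;
        return "<S> " + sentence.trim() + " </S> (" + id + ")";
    }

    /** Picks the sentence for the n-th recording, wrapping around the corpus. */
    static String sentenceFor(int fileIndex, List<String> corpus) {
        return corpus.get(fileIndex % corpus.size());
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("first sentence", "second sentence");
        for (int i = 0; i < 3; i++) {
            System.out.println(transcriptLine(sentenceFor(i, corpus), "utt_" + i + ".wav"));
        }
    }
}
```

The modulo index expresses the wrap-around that the listing implements by resetting lineStart to 0.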

………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Path punctuation was lost in extraction; the separators below are restored by assumption.

    // Reads the word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished pronunciation dictionary.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    // Builds the dictionary: one line per input word, "word pronunciation".
    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if ch is a Bengali consonant (banjon borno), i.e. its id in
    // the banglatab table lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

………………………………………………………………………………

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal was lost in extraction; it is assumed here to be the
    // hasanta (U+09CD), the sign that joins consonants into a conjunct.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        // Reset the per-word state so the same object can be reused for every word.
        cpi = 0;
        ncpi = 0;
        k = 0;
        Arrays.fill(ConjunctsPosition, 0);
        Arrays.fill(NoConjunctsPosition, 0);

        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions around every hasanta: the letter before, the sign, the letter after.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record every position that is not part of a conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2; // skip the hasanta
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Collect consecutive conjunct positions into one search string.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace = "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) = "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        // Look up the pronunciation of every search string in the banglatab table.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                if (rs.next()) { // skip letters that have no entry in the table
                    String pro = rs.getString("pro");
                    phoneticTrans = phoneticTrans + pro + " ";
                }
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
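For words without conjuncts, the core rule of getPronouciation is simple: each letter is replaced by its phone, and when a consonant is directly followed by another consonant the inherent vowel অ is inserted between them. This rule can be sketched with an in-memory table in place of the MySQL banglatab lookup. The class name InherentVowelSketch, the three-letter table and the phone symbols k, m, l and o are hypothetical; the project's real symbols are in its database and do not appear in this listing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class InherentVowelSketch {

    static final char INHERENT_VOWEL = '\u0985'; // Bengali letter A
    static final Map<Character, String> PHONES = new HashMap<Character, String>();
    static final Set<Character> CONSONANTS = new HashSet<Character>();

    static {
        // Hypothetical phone symbols, for illustration only.
        PHONES.put('\u0995', "k");       // ka
        PHONES.put('\u09AE', "m");       // ma
        PHONES.put('\u09B2', "l");       // la
        PHONES.put(INHERENT_VOWEL, "o");
        CONSONANTS.add('\u0995');
        CONSONANTS.add('\u09AE');
        CONSONANTS.add('\u09B2');
    }

    /** Maps a conjunct-free word to space-separated phones. */
    static String pronounce(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            append(out, PHONES.get(c));
            boolean hasNext = i + 1 < word.length();
            if (hasNext && CONSONANTS.contains(c) && CONSONANTS.contains(word.charAt(i + 1))) {
                append(out, PHONES.get(INHERENT_VOWEL));
            }
        }
        return out.toString();
    }

    /** Appends a phone with a separating space; unknown letters (null) are skipped. */
    static void append(StringBuilder out, String phone) {
        if (phone == null) {
            return;
        }
        if (out.length() > 0) {
            out.append(' ');
        }
        out.append(phone);
    }

    public static void main(String[] args) {
        System.out.println(pronounce("\u0995\u09AE\u09B2")); // ka + ma + la
    }
}
```

Keeping the table in memory also avoids opening a database connection for every letter, which the listing does through isBanjonBorno.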

………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        // The Bengali sample sentences on these lines were lost in extraction.
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

// Turns recognized commands into mouse clicks, key presses and program launches (Windows only).
public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // "porer window | ager window" (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // "porer tab | ager tab" (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        // printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // Test code, assumed to be commented out in the original:
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        // The Bengali sample sentence on this line was lost in extraction.
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that matches the recognized text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase compared here was lost in extraction.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
        // The listing ends here in the source; the remaining commands are not shown.
    }
}
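The giveCommand method compares the recognized text with one command phrase per if statement. With roughly thirty actions in CommandActivator, a lookup table from phrase to action is one way to keep that mapping manageable. The sketch below shows the idea and is not the project's implementation; the class name CommandDispatchSketch and the English phrases are hypothetical placeholders for the Bengali commands, and the action only records its name so the example runs without java.awt.Robot.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandDispatchSketch {

    private final Map<String, Runnable> actions = new HashMap<String, Runnable>();

    /** Associates a spoken phrase with the action to run when it is recognized. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action for the recognized text; returns false if no command matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        final StringBuilder log = new StringBuilder();
        // In the real application the action would call CommandActivator.rightClick().
        dispatcher.register("right click", new Runnable() {
            public void run() {
                log.append("rightClick");
            }
        });
        System.out.println(dispatcher.dispatch("right click") + " " + log);
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

Returning a boolean lets the recognition loop tell the user when an utterance did not match any command.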


97 আর

98

99

100

101 আর

102

103

104

105

106

107 আরও

108

109 সব

Bengali Speech Recognition

65

110

111

112

113 ও সব

114

115 হয়

116

117

118 আর

119 -ই

হয়

120

121 হয়

122

123 ও হয়

124

125 -

126 হয়

127

128

129 এ

হয়

130 সব

131

132

133

134

135

136 ওহ

137

হয়

138 হয়

139 বছর হয়

140

141

142

143

144 হয়

Bengali Speech Recognition

66

145 এখনও

146

147

148

149

150

151

152

153 একর

154

155

156

157

158

159 ভবন

160

161

162

163

164

165 এক বৎসর

166

167

168

169 সময়

170

171 এখন

172 পর

173

174 পর

175

176 পর

177 এক পর

Bengali Speech Recognition

67

178

179 ও

180

181

182

183

184

185

Fig 73 Corpus about University Admission Information

Bengali Speech Recognition

68

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()

int lineCounts = 0

String dirTreeRootName = Ctrainnign_filesbs_asr_train

String root = getRootFromPath(dirTreeRootName)

listdir(dirTreeRootName1)

Collectionssort(dirTreeLevel1)

int sizeOfdirTreeLevel1 = dirTreeLevel1size()

int i = 0j=0k=0

while(sizeOfdirTreeLevel1gti)

String path2 = dirTreeRootName+dirTreeLevel1get(i)

dirTreeLevel2clear()

listdir(path22)

dirTreeLevel2 = sortObjsortList(dirTreeLevel2)

int sizeOfdirTreeLevel2 = dirTreeLevel2size()

String path3=

while(sizeOfdirTreeLevel2gtj)

path3 = path2++dirTreeLevel2get(j)+

listdir(path33)

while(dirTreeLevel3size()gtk)

String targetPath =

root++dirTreeLevel1get(i)++dirTreeLevel2get(j)++dirTreeLevel3get(k)

Bengali Speech Recognition

69

targetPath = targetPathreplaceAll(wav )

writIntoFile(targetPathCtrainnign_filesbs_asr_train+sbs_asr_trainfileids)

lineCounts++

k++

j++

file listing end

j=0

i++

transcriptCreator tcObj = new transcriptCreator()

try

tcObjreadCorpus(Ctrainnign_filecorpustxt)

tcObjCreateTransCriptFile(dirTreeLevel3

Ctrainnign_filesbs_asr_trainsbs_asr_traintranscript)

catch (FileNotFoundException e)

eprintStackTrace()

int si=0

public static void listdir(String pathint Level)

File folder = new File(path)

File[] listOfFiles = folderlistFiles()

int numofL_I = 0

int numOfL = listOfFileslength

while(numOfLgtnumofL_I)

if (listOfFiles[numofL_I]isDirectory())

if(Level == 1)

dirTreeLevel1add(listOfFiles[numofL_I]getName())

else if(Level == 2)

dirTreeLevel2add(listOfFiles[numofL_I]getName())

else

Bengali Speech Recognition

70

if (listOfFiles[numofL_I]isFile())

dirTreeLevel3add(listOfFiles[numofL_I]getName())

numofL_I++

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

public static void writIntoFile(String dataString path)

try

FileWriter fstream = new FileWriter(pathtrue)

BufferedWriter out = new BufferedWriter(fstream)

outwrite(data+n)

outclose()

catch(IOException e)

eprintStackTrace()

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

Bengali Speech Recognition

71

ARRAY

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

package sbsBSRtrainingfilescreator

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilComparator

import javautilList

public class SortArrayList

public ListltStringgt sortList(ListltStringgt unsortList)

ListltStringgt mysortList = new ArrayListltStringgt()

int i = 0

while(unsortListsize()gti)

String str = unsortListget(i)

str = strreplaceAll([^d] )

mysortListadd(str)

i++

int[] sortint = new int[mysortListsize()]

i = 0

while(unsortListsize()gti)

sortint[i] = IntegervalueOf(mysortListget(i))

i++

Arrayssort(sortint)

String folNameWONum = unsortListget(0)replaceAll([^a-z ^A-Z])

mysortListclear()

i = 0

while(unsortListsize()gti)

String requiredString =

folNameWONum+StringvalueOf(sortint[i])

mysortListadd(requiredString)

i++

return mysortList

Bengali Speech Recognition

72

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javaioInputStreamReader

import javautilArrayList

import javautilList

public class transcriptCreator

static ArrayListltObjectgt inputLines=new ArrayListltObjectgt()

public void readCorpus(String path) throws FileNotFoundException

FileInputStream fstream = new FileInputStream(path)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String strLine

try

while ((strLine = brreadLine()) = null)

inputLinesadd(strLine)

inclose()

catch (Exception e)Catch exception if any

Systemerrprintln(Error + egetMessage())

int inputLineSize = inputLinessize()

int i = 0

while(inputLineSizegti)

i++

public static void CreateTransCriptFile(ListltStringgt dataString path)

try

Bengali Speech Recognition

73

FileWriter fstream = new FileWriter(pathtrue)

int numOfwriting = datasize()

int i = 0

int lineStart = 0

int lineEnd = inputLinessize()

BufferedWriter out = new BufferedWriter(fstream)

while(numOfwritinggti)

if(lineStart == lineEnd) lineStart = 0

String wavRemove = dataget(i)toString()

wavRemove = wavRemovereplaceAll(wav )

String leadTrailspaceRemoved =

inputLinesget(lineStart)toString()

leadTrailspaceRemoved = leadTrailspaceRemovedtrim()

String pattern = ltSgt +leadTrailspaceRemoved+ ltSgt

(+wavRemove+)

outwrite(pattern+n)

lineStart++

i++

Close the output stream

outclose()

catch(IOException e)

eprintStackTrace()

…………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                    rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
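`isBanjonBorno` opens a database connection for every character it classifies. As a point of comparison only, the sketch below answers the same question ("is this character a Bengali consonant?") from Unicode code points, with no database. This is a different technique from the project's `banglatab` lookup: the class name `BengaliLetters` is hypothetical, and the code-point ranges come from the Unicode Bengali block, not from the project's id range 13 to 48, whose exact contents are not shown in the listing.

```java
// Hypothetical alternative for illustration; the project itself queries MySQL.
class BengaliLetters {

    // True for Bengali consonant letters in the Unicode Bengali block:
    // U+0995 (ka) through U+09B9 (ha), plus khanda ta (U+09CE) and the
    // precomposed letters rra (U+09DC), rha (U+09DD) and yya (U+09DF).
    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')
                || ch == '\u09CE'
                || ch == '\u09DC'
                || ch == '\u09DD'
                || ch == '\u09DF';
    }
}
```

A range check like this could also serve as a cache in front of the database call, so that repeated characters do not trigger repeated queries.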

…………………………………………………………………………………

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // hasanta (U+09CD); the literal is missing from the source and is inferred
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions before, at and after every hasanta
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that do not belong to a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                        ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the pronunciation of every unit and join the results
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
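The generator above first groups the characters of a word into lookup units, where consonants joined by a hasanta form a single conjunct, and only then queries the pronunciation table. The sketch below shows that grouping step on its own, in a simplified form: it assumes the conjunct marker is the hasanta (U+09CD), and it leaves out the inherent-vowel insertion and the special handling of র that the project code performs. The class `ConjunctSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration of the grouping step only.
class ConjunctSplitter {

    static final char HASANTA = '\u09CD';

    // Splits a word into units; characters linked by a hasanta stay together.
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Absorb "hasanta + next character" pairs into the current unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }
}
```

For a word made of ba, ka, hasanta, ta and the aa sign, this yields three units: ba, the conjunct ka-hasanta-ta, and the vowel sign. Each unit could then be looked up once in the pronunciation table.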

…………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    // (the Bengali sample sentences are missing from the source)
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which
        // case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
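Each method above replays one keyboard or mouse shortcut. One possible way to connect recognized phrases to such actions is a lookup table instead of a chain of `equals` checks. The sketch below assumes a hypothetical `CommandDispatcher` class and uses placeholder phrases, since the project's Bengali command strings are not reproduced in this listing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatcher for illustration; not part of the original project.
class CommandDispatcher {

    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a recognized phrase with the action it should trigger.
    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action registered for the phrase; returns false if none exists.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

Because the `CommandActivator` methods declare `AWTException`, a registration would need to wrap the call, for example in a lambda that catches the exception and reports it. With a table like this, adding a new spoken command means adding one `register` call.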

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        // Robot robot = new Robot();
        // robot.delay(2000);
        // giveCommand("bangla command");
        // robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    // (the Bengali sample sentences are missing from the source)
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // the Bengali command string is missing from the source
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}


Fig 7.3: Corpus about University Admission Information (sentences 77 to 185)

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath =
                        root + "/" + dirTreeLevel1.get(i) + "/" + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
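The file-id creator above walks a three-level directory tree by hand and writes one extension-less relative path per recording. As a sketch only, the same list can be produced with `java.nio.file.Files.walk`, which handles any nesting depth. The class `FileIds` is hypothetical, and the result is sorted lexicographically, which differs from the numeric folder ordering that `SortArrayList` provides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical alternative for illustration; not part of the original project.
class FileIds {

    // Returns one id per .wav file under root, as a '/'-separated path that
    // starts with the root folder's own name and has the extension removed.
    static List<String> list(Path root) throws IOException {
        Path abs = root.toAbsolutePath();
        Path parent = abs.getParent();
        Path base = (parent != null) ? parent : abs;
        try (Stream<Path> paths = Files.walk(abs)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".wav"))
                    .map(p -> base.relativize(p).toString().replace('\\', '/'))
                    .map(s -> s.substring(0, s.length() - 4))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The returned list could then be written out line by line, in the same way `writIntoFile` appends entries.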

…………………………………………………………………………………

ARRAY

…………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // keep only the digits of every folder name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // rebuild the names from the first entry's prefix and the sorted numbers
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
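`SortArrayList` orders folder names by their numeric part, so that a folder numbered 10 follows the one numbered 9 instead of following 1. A comparator-based sketch of the same idea is shown below. Unlike the listing, it keeps every original name intact instead of rebuilding names from the first entry's prefix. The class `NumericNameSort` and the sample names in the usage note are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of the original project.
class NumericNameSort {

    // Extracts the digits of a name as a number; names without digits get -1.
    static int numberIn(String name) {
        String digits = name.replaceAll("[^\\d]", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the numeric part of each name.
    static List<String> sort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSort::numberIn));
        return sorted;
    }
}
```

For example, sorting the names `s10`, `s2`, `s1` gives `s1`, `s2`, `s10`, and an empty input list simply yields an empty result.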

Bengali Speech Recognition

72

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javaioInputStreamReader

import javautilArrayList

import javautilList

public class transcriptCreator

static ArrayListltObjectgt inputLines=new ArrayListltObjectgt()

public void readCorpus(String path) throws FileNotFoundException

FileInputStream fstream = new FileInputStream(path)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String strLine

try

while ((strLine = brreadLine()) = null)

inputLinesadd(strLine)

inclose()

catch (Exception e)Catch exception if any

Systemerrprintln(Error + egetMessage())

int inputLineSize = inputLinessize()

int i = 0

while(inputLineSizegti)

i++

public static void CreateTransCriptFile(ListltStringgt dataString path)

try

Bengali Speech Recognition

73

FileWriter fstream = new FileWriter(pathtrue)

int numOfwriting = datasize()

int i = 0

int lineStart = 0

int lineEnd = inputLinessize()

BufferedWriter out = new BufferedWriter(fstream)

while(numOfwritinggti)

if(lineStart == lineEnd) lineStart = 0

String wavRemove = dataget(i)toString()

wavRemove = wavRemovereplaceAll(wav )

String leadTrailspaceRemoved =

inputLinesget(lineStart)toString()

leadTrailspaceRemoved = leadTrailspaceRemovedtrim()

String pattern = ltSgt +leadTrailspaceRemoved+ ltSgt

(+wavRemove+)

outwrite(pattern+n)

lineStart++

i++

Close the output stream

outclose()

catch(IOException e)

eprintStackTrace()

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

Bengali Speech Recognition

74

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

PHONETIC TRANSLATION

package ptpack

import javautilArrayList

public class PhoneticTranslation

Bengali Speech Recognition

75

public static void main(String[] args)

PronounciationGenarator pgObj = new PronounciationGenarator()

FileOperator foObj = new FileOperator()

ArrayListltObjectgt inputStrings = new ArrayListltObjectgt()

inputStrings = foObjgetStrings()

String pro =

Systemoutprintln(in phonetic translation)

int i = 0

String is =

String fileImage =

while(inputStringssize()gti)

is = inputStringsget(i)toString()trim()

pro = pgObjgetPronouciation(is)

pro = protrim()

fileImage = fileImage+is+ +pro+n

i++

Systemoutprintln(fileImage)

foObjcreateFile(fileImage)

main end

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)

Bengali Speech Recognition

76

eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

Bengali Speech Recognition

77

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

Bengali Speech Recognition

78

return isBBorno

class end

…………………………………

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Hasanta (U+09CD) joins the consonants of a conjunct; the literal is missing in the source text.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        k = 0;
        cpi = 0;
        // record the positions before, at and after every hasanta
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that do not belong to any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser, connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // look up the pronunciation of every search string
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                if (rs.next()) { // skip letters that have no entry in the table
                    String pro = rs.getString("pro");
                    phoneticTrans = phoneticTrans + pro + " ";
                }
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
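The conjunct handling in getPronouciation is hard to follow from the index arrays alone. The following is a minimal sketch of the same idea, assuming that a conjunct is a run of characters joined by the hasanta sign (U+09CD). The class ConjunctSplitter and the method splitUnits are hypothetical names and are not part of the project code; the special treatment of র in the listing is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctSplitter {
    // Hasanta (virama); assumed to be the conjunct marker.
    static final char HASANTA = '\u09CD';

    // Splits a word into lookup units: single characters, or whole conjuncts.
    static List<String> splitUnits(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            current.append(c);
            boolean nextIsHasanta =
                    i + 1 < word.length() && word.charAt(i + 1) == HASANTA;
            // A unit ends when this character is not a hasanta
            // and is not followed by one.
            if (c != HASANTA && !nextIsHasanta) {
                units.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            units.add(current.toString()); // word ended on a hasanta
        }
        return units;
    }
}
```

Each returned unit could then be used as one search string for the banglatab lookup.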

…………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = ""; // the Bengali text is missing from the source
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences are missing from the source text.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    // Presses the given keys in order, then releases them in the same order.
    private void pressKeys(int... keys) throws AWTException {
        Robot robot = new Robot();
        for (int key : keys) robot.keyPress(key);
        for (int key : keys) robot.keyRelease(key);
    }

    // Presses and releases the given mouse button the given number of times.
    private void click(int buttonMask, int times) throws AWTException {
        Robot robot = new Robot();
        for (int i = 0; i < times; i++) {
            robot.mousePress(buttonMask);
            robot.mouseRelease(buttonMask);
        }
    }

    // Opens the URL with the default Windows protocol handler.
    private void openUrl(String theUrl) throws IOException {
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void leftClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 1); }
    public void rightClick() throws AWTException { click(InputEvent.BUTTON3_MASK, 1); }
    public void doubleClick() throws AWTException { click(InputEvent.BUTTON1_MASK, 2); }

    public void copy() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_C); }
    public void paste() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_V); }
    public void delete() throws AWTException { pressKeys(KeyEvent.VK_DELETE); }
    public void selectAll() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_A); }

    public void up() throws AWTException { pressKeys(KeyEvent.VK_PAGE_UP); }
    public void down() throws AWTException { pressKeys(KeyEvent.VK_PAGE_DOWN); }
    public void previousPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_UP); }
    public void nextPage() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_PAGE_DOWN); }

    public void openNewFile() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_N); }
    public void openHere() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_O); }
    public void close() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_F4); }
    public void startMenu() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS); }
    public void refresh() throws AWTException { pressKeys(KeyEvent.VK_F5); }
    public void help() throws AWTException { pressKeys(KeyEvent.VK_F1); }
    public void showDesktop() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_D); }
    public void openMyComputer() throws AWTException { pressKeys(KeyEvent.VK_WINDOWS, KeyEvent.VK_E); }

    // activate
    public void enter() throws AWTException { pressKeys(KeyEvent.VK_ENTER); }

    // next window | previous window
    public void altTab() throws AWTException { pressKeys(KeyEvent.VK_ALT, KeyEvent.VK_TAB); }

    // next tab | previous tab
    public void ctlTab() throws AWTException { pressKeys(KeyEvent.VK_CONTROL, KeyEvent.VK_TAB); }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException { openUrl("http://www.google.com"); }
    public void openFacebook() throws AWTException, IOException { openUrl("http://www.facebook.com"); }
    public void openYahoo() throws AWTException, IOException { openUrl("http://www.yahoo.com"); }
    public void openTechtunes() throws AWTException, IOException { openUrl("http://www.techtunes.com.bd"); }
    public void openProthomAlo() throws AWTException, IOException { openUrl("http://www.prothom-alo.com"); }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = ""; // the Bengali text is missing from the source
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase compared here is missing from the source text.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
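As the number of spoken commands grows, a chain of equals checks in giveCommand becomes hard to maintain. One possible alternative, sketched here under the assumption that each recognized phrase maps to exactly one action, is a lookup table from phrase to action. The class CommandDispatcher and its methods are hypothetical and do not appear in the project; the actions in a real setup would call CommandActivator methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CommandDispatcher {
    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action registered for the phrase; returns false if none matches.
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

With such a table, adding a command would mean one register call instead of another if branch.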

Fig 7.3 Corpus about University Admission Information (corpus entries 110 to 185)

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // the root path must end with a separator
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 accumulates all files, so k keeps counting across folders
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds sub-folders to the level 1 or level 2 list and files to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the name of the last folder of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the given file.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………

ARRAY

…………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names of the form <name><number> by their number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort
        }
        int i = 0;
        // keep only the digits of every name
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // folder name without the number, taken from the first entry
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                    folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
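The sortList method above rebuilds each name from a shared prefix and its number, which assumes that every folder uses the same prefix. A shorter variant, sketched here as an illustration only, sorts the original names by the number they contain and leaves the names themselves unchanged. The class NumericSuffixSort and its methods are hypothetical names, not part of the project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NumericSuffixSort {
    // Extracts the digits of a name as a number; names without digits count as 0.
    static int numberIn(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the number inside each name.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberIn));
        return sorted;
    }
}
```

A plain Collections.sort would place a folder numbered 10 before one numbered 2; comparing the extracted numbers avoids that.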

…………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads every line of the corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus lines.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved =
                        inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> ("
                        + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
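The line format produced by CreateTransCriptFile can be isolated into a small helper, which makes it easy to check without touching the file system. This is a minimal sketch that assumes the same "<S> sentence </S> (fileid)" layout as the listing; the class TranscriptLine and its methods are hypothetical names introduced for illustration.

```java
class TranscriptLine {
    // Drops a trailing ".wav" extension to obtain the file id.
    static String fileId(String wavName) {
        return wavName.replaceAll("\\.wav$", "");
    }

    // Builds one transcript line from a corpus sentence and an audio file name.
    static String format(String sentence, String wavName) {
        return "<S> " + sentence.trim() + " </S> (" + fileId(wavName) + ")";
    }
}
```

The escaped pattern "\\.wav$" only matches a literal ".wav" at the end of the name, whereas an unescaped ".wav" pattern would also match any character followed by "wav".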

…………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary to disk.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        // try-with-resources closes the result set, statement and connection
        try (Connection con = getConnection();
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("Select * from employees "
                     + "where EmployeeID=1001")) {
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        }
    }

    // Returns true if the character is a consonant (banjon borno),
    // that is, if its id in banglatab lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        int id = 0;
        boolean isBBorno = false;
        try (Connection con = getConnection();
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("Select id from banglatab "
                     + "where letter='" + ch + "'")) {
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        }
        return isBBorno;
    }
} // class end

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

PRONOUNCIATION GENARATOR

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class PronounciationGenarator

Prodb pdobj = new Prodb()

String BanglaWord =

int BanglaWordLength = 0

int ConjunctsPosition[] = new int[20]

int NoConjunctsPosition[] = new int[20]

int cpi=0ncpi=0k=0

char ConjuctsIdentifyCharacter =

int calltimes = 1

public String getPronouciation(String bw)

BanglaWord = bw

BanglaWordLength = BanglaWordlength()

while(kltBanglaWordLength)

k++

k=0

while(BanglaWordLengthgtk)

if(BanglaWordcharAt(k)==ConjuctsIdentifyCharacter)

ConjunctsPosition[cpi] = k-1

cpi++

Bengali Speech Recognition

79

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

Bengali Speech Recognition

80

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

Bengali Speech Recognition

81

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Bengali Speech Recognition

82

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

Bengali Speech Recognition

83

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else

Bengali Speech Recognition

84

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50

import educmusphinxfrontendutilAudioFileDataSource

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

import javaxsoundsampledUnsupportedAudioFileException

import javaawtAWTException

import javaioFile

import javaioIOException

import javanetURL

public class Transcriber

public static void main(String[] args) throws IOException



[Corpus listing, entries 145 to 185. Only fragments of the Bengali text survive: 145 এখনও; 153 একর; 159 ভবন; 165 এক বৎসর; 169 সময়; 171 এখন; 172 পর; 174 পর; 176 পর; 177 এক পর; 179 ও]

Fig. 7.3: Corpus about University Admission Information


CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators and dots were lost in the source text; "/" and "." are assumed here and below.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    targetPath = targetPath.replaceAll(".wav", "");
                    writIntoFile(targetPath, "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Collects directory names (levels 1 and 2) or file names (level 3) found under path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last directory name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………………………………
ARRAY
…………………………………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
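The sortList method above orders folder names by their numeric part, then rebuilds every name from the letters of the first element, so it appears to rely on all names sharing one prefix. The following is a minimal sketch of an alternative that sorts with a comparator and keeps each original name intact. The class name, the method names and the sample names s1, s2 and s10 are illustrative assumptions, not part of the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericNameSort {

    // Returns the digits contained in a name as an int; names without digits count as 0.
    static int numericPart(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by numeric part; the original names are kept unchanged.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        sorted.sort(Comparator.comparingInt(NumericNameSort::numericPart));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical folder names; a plain string sort would place s10 before s2.
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

Because the comparator only reads the number, mixed prefixes survive the sort, and an empty input list simply yields an empty result.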

…………………………………………………………………………………………………………
TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads every line of the corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus lines.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
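Each transcript line written by the listing above wraps a corpus sentence in sentence markers and ends with the audio file name, without its extension, in parentheses. A minimal sketch of that line format follows. The class name, the method names and the sample sentence and file name are hypothetical. It also assumes the pattern passed to replaceAll in the listing is the plain string .wav, in which case the dot acts as a regular-expression wildcard, so this sketch strips only a trailing extension instead.

```java
public class TranscriptLine {

    // Strips a trailing ".wav" only; other occurrences of "wav" in the name are kept.
    static String fileId(String wavName) {
        return wavName.endsWith(".wav")
                ? wavName.substring(0, wavName.length() - 4)
                : wavName;
    }

    // Builds one transcript line: sentence markers around the text, then the file id.
    static String transcriptLine(String sentence, String wavName) {
        return "<S> " + sentence.trim() + " </S> (" + fileId(wavName) + ")";
    }

    public static void main(String[] args) {
        // Hypothetical sentence and file name, for illustration only.
        System.out.println(transcriptLine("  hello world ", "s1_001.wav"));
    }
}
```

Keeping the file id logic in one small method means the fileids file and the transcript file can share it, so the two cannot drift apart.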

…………………………………………………………………………………………………………
FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one trimmed word per entry.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………………………………
PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter's id in banglatab lies in the consonant range 13 to 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

…………………………………………………………………………………………………………
PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is missing from the source text; the hasanta (U+09CD),
    // which joins Bengali conjuncts, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the position of each conjunct marker and of its two neighbours.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                } // conjuct adding end
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the pronunciation of each letter or conjunct in turn.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
                rs.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

…………………………………………………………………………………………………………
SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // The dots in the resource name were lost in the source text; this spelling is assumed.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = ""; // the string literal is missing from the source text
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences are missing from the source text.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }

        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
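Most methods in the listing above repeat one pattern: press a modifier, press a key, then release both. A minimal sketch of a shared helper for that pattern is given below. KeyCombo, KeySink, pressCombo and record are hypothetical names introduced for illustration, and the KeySink interface is an assumption that lets the logic run without a display; in the application it would forward to the keyPress and keyRelease methods of java.awt.Robot.

```java
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.List;

public class KeyCombo {

    // Hypothetical abstraction over java.awt.Robot so the logic can run without a display.
    interface KeySink {
        void keyPress(int keyCode);
        void keyRelease(int keyCode);
    }

    // Presses the keys in the given order, then releases them in reverse order.
    static void pressCombo(KeySink sink, int... keyCodes) {
        for (int code : keyCodes) {
            sink.keyPress(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            sink.keyRelease(keyCodes[i]);
        }
    }

    // Records the events as text instead of sending them to the operating system.
    static List<String> record(int... keyCodes) {
        final List<String> events = new ArrayList<String>();
        pressCombo(new KeySink() {
            public void keyPress(int keyCode) { events.add("press " + keyCode); }
            public void keyRelease(int keyCode) { events.add("release " + keyCode); }
        }, keyCodes);
        return events;
    }

    public static void main(String[] args) {
        // Ctrl+C, as in the copy() method of the listing.
        System.out.println(record(KeyEvent.VK_CONTROL, KeyEvent.VK_C));
    }
}
```

Releasing in reverse order keeps the modifier held until the letter key is up, which mirrors how a person types a shortcut. The listing releases the modifier first, which also works for these shortcuts.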

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // The dots in the resource name were lost in the source text; this spelling is assumed.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = ""; // the string literal is missing from the source text
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences are missing from the source text.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Maps a recognized phrase to a desktop action.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase compared here is missing from the source text.
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 67: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

67

[Figure: corpus entries 178 to 185]

Fig 7.3 Corpus about University Admission Information

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FileidsCreator {

    static List<String> dirTreeLevel1 = new ArrayList<String>();
    static List<String> dirTreeLevel2 = new ArrayList<String>();
    static List<String> dirTreeLevel3 = new ArrayList<String>();
    static List<String> tempList = new ArrayList<String>();

    public static void main(String[] args) {
        SortArrayList sortObj = new SortArrayList();
        int lineCounts = 0;
        // Path separators are missing from the printed listing; "/" is assumed throughout.
        String dirTreeRootName = "C:/trainnign_file/sbs_asr_train/";
        String root = getRootFromPath(dirTreeRootName);
        listdir(dirTreeRootName, 1);
        Collections.sort(dirTreeLevel1);
        int sizeOfdirTreeLevel1 = dirTreeLevel1.size();
        int i = 0, j = 0, k = 0;
        while (sizeOfdirTreeLevel1 > i) {
            String path2 = dirTreeRootName + dirTreeLevel1.get(i);
            dirTreeLevel2.clear();
            listdir(path2, 2);
            dirTreeLevel2 = sortObj.sortList(dirTreeLevel2);
            int sizeOfdirTreeLevel2 = dirTreeLevel2.size();
            String path3 = "";
            while (sizeOfdirTreeLevel2 > j) {
                path3 = path2 + "/" + dirTreeLevel2.get(j) + "/";
                listdir(path3, 3);
                // dirTreeLevel3 keeps growing across folders; k continues from the last written file.
                while (dirTreeLevel3.size() > k) {
                    String targetPath = root + "/" + dirTreeLevel1.get(i) + "/"
                            + dirTreeLevel2.get(j) + "/" + dirTreeLevel3.get(k);
                    // Literal replace, so the "." is not treated as a regex wildcard.
                    targetPath = targetPath.replace(".wav", "");
                    writIntoFile(targetPath,
                            "C:/trainnign_file/sbs_asr_train/" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            } // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:/trainnign_file/corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                    "C:/trainnign_file/sbs_asr_train/sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        int si = 0;
    }

    // Collects sub-directory names (levels 1 and 2) or file names (level 3) found under path.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles == null) {
            return; // path does not exist or is not a directory
        }
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the last folder name of a path that ends with a separator.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("/");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '/') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………………………

ARRAY

…………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts folder names such as "name1", "name10", "name2" by their numeric part.
    // Every returned name is rebuilt from the letters of the first entry plus the sorted number.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        if (unsortList.isEmpty()) {
            return mysortList; // nothing to sort; avoids get(0) on an empty list
        }
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
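SortArrayList rebuilds every folder name from the letters of the first entry, so it only works when all folders share one prefix. The block below is a minimal alternative sketch, not the project's code: it sorts the original names by their numeric part with a Comparator and leaves the names untouched. The names NumericSuffixSort, numericPart and sortByNumber are illustrative assumptions, and folder names are assumed to contain at most a few digits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NumericSuffixSort {

    // Digits of a name such as "s12" as an int; 0 when the name has no digits.
    static int numericPart(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by numeric part; the input list is not modified.
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numericPart));
        return sorted;
    }

    public static void main(String[] args) {
        // prints [s1, s2, s10]
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

A plain Collections.sort would place "s10" before "s2"; comparing on the parsed number avoids that.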

…………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads every line of the corpus file into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per audio file, cycling through the corpus
    // lines again when there are more files than sentences.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                // Literal replace, so the "." is not treated as a regex wildcard.
                wavRemove = wavRemove.replace(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
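The transcript line built in CreateTransCriptFile pairs one corpus sentence with one audio file name. As a small illustration of that format only, the sketch below isolates the string handling into two pure functions. The class and method names (TranscriptLine, utteranceId, format) and the sample file name are assumptions introduced here, and the sentence tags follow the listing above.

```java
final class TranscriptLine {

    // Drops a trailing ".wav" from a file name; other names are returned unchanged.
    static String utteranceId(String wavFileName) {
        if (wavFileName.endsWith(".wav")) {
            return wavFileName.substring(0, wavFileName.length() - ".wav".length());
        }
        return wavFileName;
    }

    // Builds "<S> sentence </S> (utteranceId)" with the sentence trimmed.
    static String format(String sentence, String wavFileName) {
        return "<S> " + sentence.trim() + " </S> (" + utteranceId(wavFileName) + ")";
    }

    public static void main(String[] args) {
        // prints <S> amar nam </S> (s1_001)
        System.out.println(format("  amar nam  ", "s1_001.wav"));
    }
}
```

Using endsWith instead of a regular expression removes only the extension, so a name that merely contains the letters "wav" is left intact.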

…………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one trimmed word per entry.
    // Path separators are missing from the printed listing; "/" and "." are assumed.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the finished pronunciation dictionary.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // One dictionary line per word: the word, a space, then its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when ch is a Bengali consonant (banjon borno),
    // that is, when its id in table banglatab lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

…………………………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is missing from the printed listing. The hasanta
    // (U+09CD), which joins consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Record every position that is not part of a conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Append characters while the recorded positions stay adjacent.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // Look up the pronunciation of every collected unit in table banglatab.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
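getPronouciation tracks conjunct positions through index arrays before building its lookup strings. The core idea, grouping the characters around a joining mark into one unit, can be sketched more compactly as below. This is a simplified illustration and not the project's algorithm: it assumes the joining mark is the hasanta (U+09CD), it ignores the special handling of র and the inserted অ, and the names ConjunctSplitter and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {

    // Assumed conjunct marker: Bengali hasanta (virama).
    static final char HASANTA = '\u09CD';

    // Splits a word into units; characters linked by a hasanta stay together.
    static List<String> split(String word) {
        List<String> units = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            current.append(c);
            boolean nextIsHasanta = i + 1 < word.length() && word.charAt(i + 1) == HASANTA;
            // Keep collecting while this character is, or is followed by, a hasanta.
            if (c != HASANTA && !nextIsHasanta) {
                units.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            units.add(current.toString()); // word ended on a hasanta
        }
        return units;
    }

    public static void main(String[] args) {
        // The five characters U+09AC U+09BF U+09B6 U+09CD U+09AC form three units.
        System.out.println(split("\u09AC\u09BF\u09B6\u09CD\u09AC").size());
    }
}
```

Each returned unit could then serve as one lookup key, in the same way the listing fills SearchStrings.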

…………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // The Bengali text assigned to comString is missing from the printed listing.
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences are missing from the printed listing.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // "porer window" | "ager window" (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // "porer tab" | "ager tab" (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        // giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        // The Bengali text assigned to comString is missing from the printed listing.
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences are missing from the printed listing.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Maps a recognized phrase to a desktop action.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase compared here is missing from the printed listing.
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
        // The printed listing ends at this point.
    }
}
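giveCommand compares the recognized text with one command phrase per if branch. With many commands, a lookup table from phrase to action is an alternative worth considering. The block below is a hedged sketch of that idea only: CommandDispatcher, register and dispatch are hypothetical names, the phrases are placeholders, and in the real application each Runnable would wrap a CommandActivator call.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatcher {

    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> actions = new HashMap<String, Runnable>();

    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    // Runs the action registered for the phrase; returns false when none matches.
    boolean dispatch(String recognizedText) {
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // Placeholder phrase and action; a real entry would call a CommandActivator method.
        dispatcher.register("right click", () -> System.out.println("right click requested"));
        dispatcher.dispatch("right click");
    }
}
```

Adding a command then means adding one register call instead of another if branch, and an unmatched phrase is reported through the return value.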

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 68: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

68

CODE OF OUR PROJECT

package sbsBSRtrainingfilescreator

import javaioBufferedWriter

import javaioFile

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilList

public class FileidsCreator

static ListltStringgt dirTreeLevel1= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel2= new ArrayListltStringgt()

static ListltStringgt dirTreeLevel3= new ArrayListltStringgt()

static ListltStringgt tempList = new ArrayListltStringgt()

public static void main(String[] args)

SortArrayList sortObj = new SortArrayList()

int lineCounts = 0

String dirTreeRootName = Ctrainnign_filesbs_asr_train

String root = getRootFromPath(dirTreeRootName)

listdir(dirTreeRootName1)

Collectionssort(dirTreeLevel1)

int sizeOfdirTreeLevel1 = dirTreeLevel1size()

int i = 0j=0k=0

while(sizeOfdirTreeLevel1gti)

String path2 = dirTreeRootName+dirTreeLevel1get(i)

dirTreeLevel2clear()

listdir(path22)

dirTreeLevel2 = sortObjsortList(dirTreeLevel2)

int sizeOfdirTreeLevel2 = dirTreeLevel2size()

String path3=

while(sizeOfdirTreeLevel2gtj)

path3 = path2++dirTreeLevel2get(j)+

listdir(path33)

while(dirTreeLevel3size()gtk)

String targetPath =

root++dirTreeLevel1get(i)++dirTreeLevel2get(j)++dirTreeLevel3get(k)

Bengali Speech Recognition

69

targetPath = targetPathreplaceAll(wav )

writIntoFile(targetPathCtrainnign_filesbs_asr_train+sbs_asr_trainfileids)

lineCounts++

k++

j++

file listing end

j=0

i++

transcriptCreator tcObj = new transcriptCreator()

try

tcObjreadCorpus(Ctrainnign_filecorpustxt)

tcObjCreateTransCriptFile(dirTreeLevel3

Ctrainnign_filesbs_asr_trainsbs_asr_traintranscript)

catch (FileNotFoundException e)

eprintStackTrace()

int si=0

public static void listdir(String pathint Level)

File folder = new File(path)

File[] listOfFiles = folderlistFiles()

int numofL_I = 0

int numOfL = listOfFileslength

while(numOfLgtnumofL_I)

if (listOfFiles[numofL_I]isDirectory())

if(Level == 1)

dirTreeLevel1add(listOfFiles[numofL_I]getName())

else if(Level == 2)

dirTreeLevel2add(listOfFiles[numofL_I]getName())

else

Bengali Speech Recognition

70

if (listOfFiles[numofL_I]isFile())

dirTreeLevel3add(listOfFiles[numofL_I]getName())

numofL_I++

public static String getRootFromPath(String UserDir)

String root = null

int count = 0

int[] indexes = new int[2]

int i = 0

i = indexes[1] = UserDirlastIndexOf()

i=i-1

while(igt0)

if(UserDircharAt(i)==)

indexes[0] = i

break

i--

root = UserDirsubstring(indexes[0]+1 indexes[1])

return root

public static void writIntoFile(String dataString path)

try

FileWriter fstream = new FileWriter(pathtrue)

BufferedWriter out = new BufferedWriter(fstream)

outwrite(data+n)

outclose()

catch(IOException e)

eprintStackTrace()

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

Bengali Speech Recognition

71

ARRAY

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

package sbsBSRtrainingfilescreator

import javautilArrayList

import javautilArrays

import javautilCollections

import javautilComparator

import javautilList

public class SortArrayList

public ListltStringgt sortList(ListltStringgt unsortList)

ListltStringgt mysortList = new ArrayListltStringgt()

int i = 0

while(unsortListsize()gti)

String str = unsortListget(i)

str = strreplaceAll([^d] )

mysortListadd(str)

i++

int[] sortint = new int[mysortListsize()]

i = 0

while(unsortListsize()gti)

sortint[i] = IntegervalueOf(mysortListget(i))

i++

Arrayssort(sortint)

String folNameWONum = unsortListget(0)replaceAll([^a-z ^A-Z])

mysortListclear()

i = 0

while(unsortListsize()gti)

String requiredString =

folNameWONum+StringvalueOf(sortint[i])

mysortListadd(requiredString)

i++

return mysortList

Bengali Speech Recognition

72

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javaioInputStreamReader

import javautilArrayList

import javautilList

public class transcriptCreator

static ArrayListltObjectgt inputLines=new ArrayListltObjectgt()

public void readCorpus(String path) throws FileNotFoundException

FileInputStream fstream = new FileInputStream(path)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String strLine

try

while ((strLine = brreadLine()) = null)

inputLinesadd(strLine)

inclose()

catch (Exception e)Catch exception if any

Systemerrprintln(Error + egetMessage())

int inputLineSize = inputLinessize()

int i = 0

while(inputLineSizegti)

i++

public static void CreateTransCriptFile(ListltStringgt dataString path)

try

Bengali Speech Recognition

73

FileWriter fstream = new FileWriter(pathtrue)

int numOfwriting = datasize()

int i = 0

int lineStart = 0

int lineEnd = inputLinessize()

BufferedWriter out = new BufferedWriter(fstream)

while(numOfwritinggti)

if(lineStart == lineEnd) lineStart = 0

String wavRemove = dataget(i)toString()

wavRemove = wavRemovereplaceAll(wav )

String leadTrailspaceRemoved =

inputLinesget(lineStart)toString()

leadTrailspaceRemoved = leadTrailspaceRemovedtrim()

String pattern = ltSgt +leadTrailspaceRemoved+ ltSgt

(+wavRemove+)

outwrite(pattern+n)

lineStart++

i++

Close the output stream

outclose()

catch(IOException e)

eprintStackTrace()

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            // path separators were lost in the source listing and are reconstructed here
            FileInputStream fstream = new
                FileInputStream("C:\\trainnign_file\\testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            // path separators were lost in the source listing and are reconstructed here
            BufferedWriter out = new BufferedWriter(new
                FileWriter("C:\\trainnign_file\\sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // build one dictionary line per input word: "<word> <pronunciation>"
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
        "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
        "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                    rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when the letter's id in banglatab lies in the consonant range 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                + " where letter= '" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

…………………………………………………………………
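Prodb.isBanjonBorno opens a database connection for every character it checks. If the only goal is to decide whether a character is a Bengali consonant, the test could instead be done on Unicode code points without any database. The sketch below is a swapped-in technique, not the thesis's banglatab lookup: the class name BengaliChars is hypothetical, and whether it agrees with the id range 13..48 for every letter depends on table contents that are not shown in this listing.

```java
final class BengaliChars {
    private BengaliChars() {}

    /** Hasanta (virama), the sign that joins consonants into a conjunct. */
    static final char HASANTA = '\u09CD';

    /** True for Bengali consonant letters, judged by Unicode code point only. */
    static boolean isConsonant(char ch) {
        return (ch >= '\u0995' && ch <= '\u09B9')   // ka .. ha (a few code points in this range are unassigned)
            || ch == '\u09CE'                        // khanda ta
            || ch == '\u09DC'                        // rra
            || ch == '\u09DD'                        // rha
            || ch == '\u09DF';                       // yya
    }
}
```

Because the check is a pure function, it can be called once per character inside the pronunciation loop without the cost of a query.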

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // hasanta (U+09CD); the literal was lost in the source listing and is reconstructed here
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // record the positions around every hasanta: previous letter, hasanta, next letter
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // collect the positions that are not part of any conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consonants in a row: insert the inherent vowel between them
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                        Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                        ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                        + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                            + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                            + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // look up the pronunciation of every search string and join the results
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                    + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

…………………………………………………………………
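The conjunct handling in getPronouciation relies on locating the hasanta sign and grouping the letters on either side of it into one lookup string. A simplified way to express that grouping is sketched below. It is an illustration under stated assumptions, not the thesis algorithm: the class name ConjunctSplitter is hypothetical, it does not special-case র, and it does not insert the inherent vowel between consecutive consonants.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {
    private ConjunctSplitter() {}

    private static final char HASANTA = '\u09CD';

    /** Splits a word into lookup units, keeping letters joined by hasanta together. */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            // Extend the unit while the next character is a hasanta followed by a letter.
            while (i + 2 < word.length() && word.charAt(i + 1) == HASANTA) {
                unit.append(word.charAt(i + 1)).append(word.charAt(i + 2));
                i += 2;
            }
            units.add(unit.toString());
            i++;
        }
        return units;
    }
}
```

For the five-character word বিশ্ব this yields three units: ব, the vowel sign ি, and the conjunct শ্ব. Each unit could then be looked up once in the pronunciation table.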

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = ""; // string literal not recoverable from the source listing
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The sample sentences themselves are not recoverable from the source listing.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n" +
            "\n" +
            "\n" +
            "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;

import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
            cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
            ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
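Most methods of CommandActivator repeat the same pattern: create a Robot, press a sequence of keys, then release them. That pattern could be factored into one helper. The sketch below is an assumption-laden illustration rather than the thesis code: the class name KeyChord and its methods are hypothetical, and it releases keys in the reverse of the press order, whereas the listing above releases the modifier key first.

```java
import java.awt.Robot;

final class KeyChord {
    private KeyChord() {}

    /** Returns the keys in reverse order, which is the order used for release. */
    static int[] releaseOrder(int... keys) {
        int[] reversed = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            reversed[i] = keys[keys.length - 1 - i];
        }
        return reversed;
    }

    /** Presses all keys in the given order, then releases them in reverse order. */
    static void press(Robot robot, int... keys) {
        for (int key : keys) {
            robot.keyPress(key);
        }
        for (int key : releaseOrder(keys)) {
            robot.keyRelease(key);
        }
    }
}
```

With such a helper, copy() would reduce to KeyChord.press(new Robot(), KeyEvent.VK_CONTROL, KeyEvent.VK_C), and a single Robot instance could be shared across commands instead of being created on every call.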

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // The following test calls appear in the listing without comment markers;
        // they are assumed to have been commented out.
        // Robot robot = new Robot();
        // robot.delay(2000);
        // giveCommand("bangla command");
        // robot.delay(3000);

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = ""; // string literal not recoverable from the source listing
        System.out.println("comString " + comString + " length "
            + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The sample sentences themselves are not recoverable from the source listing.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
            "\n");
    }

    // Maps a recognized phrase to a desktop action. The Bengali command string in the
    // comparison is not recoverable from the source listing, and the listing ends
    // after the first branch.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
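The giveCommand method compares the recognized text against fixed phrases and calls the matching CommandActivator method. One way to keep that mapping in a single place is a lookup table from phrase to action. The sketch below is hypothetical throughout: the class CommandDispatcher, its methods, and the English placeholder phrases are assumptions, since the original Bengali command strings did not survive in the listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CommandDispatcher {
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    /** Associates a spoken phrase with the action to run when it is recognized. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action registered for the phrase; returns false if none matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

Because the CommandActivator methods declare AWTException, each registered Runnable would need to wrap its call in a try/catch block. The recognition loop would then call dispatch(resultText) and could report unmatched phrases when it returns false.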


                    targetPath = targetPath.replaceAll(".wav", "");
                    // path separators were lost in the source listing and are reconstructed here
                    writIntoFile(targetPath, "C:\\trainnign_file\\sbs_asr_train\\" + "sbs_asr_train.fileids");
                    lineCounts++;
                    k++;
                }
                j++;
            }
            // file listing end
            j = 0;
            i++;
        }
        transcriptCreator tcObj = new transcriptCreator();
        try {
            tcObj.readCorpus("C:\\trainnign_file\\corpus.txt");
            tcObj.CreateTransCriptFile(dirTreeLevel3,
                "C:\\trainnign_file\\sbs_asr_train\\sbs_asr_train.transcript");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    int si = 0;

    // Adds directory names to the level 1 or level 2 list and file names to the level 3 list.
    public static void listdir(String path, int Level) {
        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();
        int numofL_I = 0;
        int numOfL = listOfFiles.length;
        while (numOfL > numofL_I) {
            if (listOfFiles[numofL_I].isDirectory()) {
                if (Level == 1) {
                    dirTreeLevel1.add(listOfFiles[numofL_I].getName());
                } else if (Level == 2) {
                    dirTreeLevel2.add(listOfFiles[numofL_I].getName());
                }
            } else {
                if (listOfFiles[numofL_I].isFile()) {
                    dirTreeLevel3.add(listOfFiles[numofL_I].getName());
                }
            }
            numofL_I++;
        }
    }

    // Returns the path component between the last two separators.
    // The separator character was lost in the source listing; a backslash is assumed.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line to the file at the given path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

…………………………………………………………………

ARRAY

…………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names such as "name10", "name2" by their numeric part,
    // then rebuilds each name from the alphabetic prefix of the first entry.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString =
                folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
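SortArrayList orders names by the number embedded in them and then rebuilds every name from the prefix of the first entry, which only works when all names share that prefix. A comparator-based variant keeps the original names intact. The sketch below is an alternative under stated assumptions, not the thesis method: the class name NumericNameSort is hypothetical, and names without digits are treated as number 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NumericNameSort {
    private NumericNameSort() {}

    /** Extracts the digits of a name as an int; names without digits count as 0. */
    static int number(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a new list ordered by embedded number; the input list is not modified. */
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(NumericNameSort::number));
        return copy;
    }
}
```

Sorting the names "s10", "s2", "s1" this way gives "s1", "s2", "s10", where a plain string sort would place "s10" before "s2".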


URL audioURL

if (argslength gt 0)

audioURL = new File(args[0])toURI()toURL()

else

audioURL = TranscriberclassgetResource(sanjoy_falguni_dola_s20wav)

Bengali Speech Recognition

85

URL configURL = TranscriberclassgetResource(sbsbsr_transcriber_configxml)

ConfigurationManager cm = new ConfigurationManager(configURL)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

allocate the resource necessary for the recognizer

recognizerallocate()

configure the audio input for the recognizer

AudioFileDataSource dataSource = (AudioFileDataSource)

cmlookup(audioFileDataSource)

dataSourcesetAudioFile(audioURL null)

Loop until last utterance in the audio file has been decoded in which case the

recognizer will return null

Result result

while ((result = recognizerrecognize())= null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(resultText)

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

Bengali Speech Recognition

88

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

Bengali Speech Recognition

89

robotkeyPress(KeyEventVK_ENTER)

robotkeyRelease(KeyEventVK_ENTER)

porer window | ager window

public void altTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_TAB)

porer tab | ager tab

public void ctlTab() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_TAB)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_TAB)

public void openNotepad() throws AWTException IOException

ProcessBuilder proc=new ProcessBuilder(notepadexe)

Process p=procstart()

public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

Bengali Speech Recognition

90

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

Bengali Speech Recognition

91

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

Bengali Speech Recognition

92

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 70: Thesis Paper of my Bachelor Degree

            if (listOfFiles[numofL_I].isFile()) {
                dirTreeLevel3.add(listOfFiles[numofL_I].getName());
            }
            numofL_I++;
        }
    }

    // Returns the name of the folder that contains the last path component.
    public static String getRootFromPath(String UserDir) {
        String root = null;
        int count = 0;
        int[] indexes = new int[2];
        int i = 0;
        // The separator literal was lost in extraction; a backslash is assumed
        // because the other listings use Windows paths.
        i = indexes[1] = UserDir.lastIndexOf("\\");
        i = i - 1;
        while (i > 0) {
            if (UserDir.charAt(i) == '\\') {
                indexes[0] = i;
                break;
            }
            i--;
        }
        root = UserDir.substring(indexes[0] + 1, indexes[1]);
        return root;
    }

    // Appends one line of text to the file at the given path.
    public static void writIntoFile(String data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(data + "\n");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
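The backward scan in getRootFromPath can also be written with two lastIndexOf calls. The following is only a minimal sketch and not the thesis code: the class name PathNames, the method parentFolderName and the example path are hypothetical, and the separator is passed in explicitly because the original literal did not survive extraction.

```java
final class PathNames {
    private PathNames() {}

    /**
     * Returns the name of the folder that contains the last path component,
     * for example "speaker1" for C:\data\speaker1\file.wav.
     * Returns "" when the path has no usable separator.
     */
    static String parentFolderName(String path, char separator) {
        int last = path.lastIndexOf(separator);
        if (last <= 0) {
            return ""; // no separator, or nothing in front of it
        }
        // lastIndexOf yields -1 if there is no earlier separator,
        // so previous + 1 then points at the start of the string.
        int previous = path.lastIndexOf(separator, last - 1);
        return path.substring(previous + 1, last);
    }

    public static void main(String[] args) {
        System.out.println(parentFolderName("C:\\data\\speaker1\\file.wav", '\\'));
    }
}
```

Unlike the listing, this version does not drop the first character when the path contains only one separator.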

…………………………………………………………………………………

ARRAY

…………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    // Sorts names of the form <prefix><number> by their numeric part.
    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        // Keep only the digits of every name.
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        // The prefix is taken from the first name only.
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        // Rebuild the names in ascending numeric order.
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
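The same numeric ordering can be obtained with a Comparator, which keeps the original names instead of rebuilding them from the prefix of the first element. This is a hedged sketch for comparison only; the class name NumericSuffixSort and the folder names in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NumericSuffixSort {
    private NumericSuffixSort() {}

    /** Reads the digits of a name as an int; names without digits count as 0. */
    static int numberOf(String name) {
        String digits = name.replaceAll("\\D", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** Returns a new list ordered by the numeric part of each name. */
    static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList("speaker10", "speaker2", "speaker1");
        System.out.println(sortByNumber(folders));
    }
}
```

A plain string sort would place "speaker10" before "speaker2", which is why the digits are compared as integers.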

…………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    // Reads the text corpus line by line into inputLines.
    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        // Loop body is empty in the listing (apparently a removed debug print).
        while (inputLineSize > i) {
            i++;
        }
    }

    // Writes one transcript line per recording, cycling through the corpus lines.
    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                // The slash of the closing tag is assumed; it was lost in extraction.
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
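The line format produced by CreateTransCriptFile can be isolated in a small helper, which makes it easier to check without touching the file system. The sketch below is an illustration under assumptions: the class TranscriptLines, the method transcriptLine and the sample file name are hypothetical, and the closing tag is written as </S> on the assumption that its slash was lost from the listing.

```java
final class TranscriptLines {
    private TranscriptLines() {}

    /** Builds one transcript line: the sentence between tags, then the file id in brackets. */
    static String transcriptLine(String sentence, String wavFileName) {
        // Strip only a trailing ".wav"; the listing uses replaceAll(".wav", ""),
        // where the dot is a regex wildcard and can remove more than intended.
        String fileId = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    public static void main(String[] args) {
        System.out.println(transcriptLine("  amar sonar bangla ", "sanjoy_1.wav"));
    }
}
```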

…………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the word list, one trimmed word per line.
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            // Path separators were lost in extraction; forward slashes are assumed.
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the output file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // One dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // A letter is a consonant (banjonborno) if its row id in banglatab is 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
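isBanjonBorno opens a database connection for every character it tests. As a possible alternative, and not the method used in the thesis, a consonant test can be sketched directly on Unicode code points. The class BengaliLetters and the method isConsonant are hypothetical names, and the set of letters accepted here may not match rows 13 to 48 of banglatab exactly.

```java
final class BengaliLetters {
    private BengaliLetters() {}

    /**
     * Returns true for Bengali consonant letters.
     * U+0995 (ka) to U+09B9 (ha) is the main consonant range; the isLetter
     * check skips the unassigned code points inside that range.
     */
    static boolean isConsonant(char ch) {
        boolean inMainRange = ch >= '\u0995' && ch <= '\u09B9' && Character.isLetter(ch);
        // Khanda ta (U+09CE) and the letters rra, rha, yya (U+09DC, U+09DD, U+09DF)
        // lie outside the main range.
        boolean extra = ch == '\u09CE' || ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
        return inMainRange || extra;
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // ka
        System.out.println(isConsonant('\u0985')); // the vowel a
    }
}
```

The trade-off is that the database table remains the single place where letters and pronunciations are maintained, while the code-point test avoids one query per character.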

…………………………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal was lost in extraction. The Bengali hasanta (U+09CD) is
    // assumed, since it is the sign that joins the consonants of a conjunct.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    // Append characters while the recorded positions are adjacent.
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // Look up the pronunciation of every search string in banglatab.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
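The core idea of the listing above, grouping the characters of a conjunct into one lookup string, can be shown in isolation. The sketch below is a simplified illustration and not the thesis algorithm: it assumes that conjuncts are marked by the hasanta sign (U+09CD), it does not insert the inherent vowel, and it has no special case for র. The class name ConjunctSplitter and the method split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctSplitter {
    private static final char HASANTA = '\u09CD'; // assumed conjunct marker

    private ConjunctSplitter() {}

    /** Splits a word into lookup units; a conjunct stays together as one unit. */
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // Absorb every "hasanta + following character" pair into the same unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // The word is written with escapes: ba, i-kar, sha, hasanta, ba.
        System.out.println(split("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

Each returned unit corresponds to one value that would be looked up in the letter column of banglatab.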

…………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

// The two CORBA imports are unused; org.omg.CORBA was removed from the JDK in Java 11.
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // Dots in the resource name were lost in extraction; this placement is assumed.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = ""; // original literal lost in extraction
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The sample sentences themselves did not survive extraction.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which
        // case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
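Most methods of CommandActivator differ only in the keys they press. One way to reduce the repetition, shown here as a sketch rather than as the thesis design, is to store each shortcut as an array of key codes and send it with a single helper. The class KeyChords, its methods and the English command names are hypothetical; the key codes are the ones used in the listing.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.util.HashMap;
import java.util.Map;

final class KeyChords {
    private static final Map<String, int[]> CHORDS = new HashMap<>();

    static {
        CHORDS.put("copy", new int[] {KeyEvent.VK_CONTROL, KeyEvent.VK_C});
        CHORDS.put("paste", new int[] {KeyEvent.VK_CONTROL, KeyEvent.VK_V});
        CHORDS.put("close", new int[] {KeyEvent.VK_ALT, KeyEvent.VK_F4});
        CHORDS.put("refresh", new int[] {KeyEvent.VK_F5});
    }

    private KeyChords() {}

    /** Returns a copy of the key codes for a command, or an empty array if unknown. */
    static int[] chordFor(String name) {
        int[] keys = CHORDS.get(name);
        return keys == null ? new int[0] : keys.clone();
    }

    /** Presses the keys in order, then releases them in reverse order. */
    static void send(Robot robot, int[] keys) {
        for (int key : keys) {
            robot.keyPress(key);
        }
        for (int i = keys.length - 1; i >= 0; i--) {
            robot.keyRelease(keys[i]);
        }
    }
}
```

A call such as KeyChords.send(new Robot(), KeyChords.chordFor("copy")) would then replace the body of copy(). Note that the listing releases the keys in the same order in which it pressed them, whereas this sketch releases them in reverse order.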

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

// The two CORBA imports are unused; org.omg.CORBA was removed from the JDK in Java 11.
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // Dots in the resource name were lost in extraction; this placement is assumed.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = ""; // original literal lost in extraction
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // The next two lines appear to be leftover test code and are
                // assumed to have been commented out in the original.
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The sample sentences themselves did not survive extraction.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    // Runs the desktop action that matches the recognized text.
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command literal was lost in extraction; "" is a placeholder.
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------


ARRAY

…………………………………………………………………………………………………

package sbsBSRtrainingfilescreator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortArrayList {

    public List<String> sortList(List<String> unsortList) {
        List<String> mysortList = new ArrayList<String>();
        int i = 0;
        while (unsortList.size() > i) {
            String str = unsortList.get(i);
            str = str.replaceAll("[^\\d]", "");
            mysortList.add(str);
            i++;
        }
        int[] sortint = new int[mysortList.size()];
        i = 0;
        while (unsortList.size() > i) {
            sortint[i] = Integer.valueOf(mysortList.get(i));
            i++;
        }
        Arrays.sort(sortint);
        String folNameWONum = unsortList.get(0).replaceAll("[^a-z ^A-Z]", "");
        mysortList.clear();
        i = 0;
        while (unsortList.size() > i) {
            String requiredString = folNameWONum + String.valueOf(sortint[i]);
            mysortList.add(requiredString);
            i++;
        }
        return mysortList;
    }
}
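The SortArrayList listing orders names such as a folder prefix followed by a number by stripping the digits out, sorting them, and gluing the prefix back on. As a hedged alternative, the same ordering can be sketched with a Comparator, which keeps each original name intact. The class name NumericSuffixSort and its methods are hypothetical and are not part of the thesis code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NumericSuffixSort {

    // Extracts the digits embedded in a name such as "s12"; names without digits count as 0.
    static int numberOf(String name) {
        String digits = name.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Returns a new list ordered by the embedded number; the input list is not modified.
    public static List<String> sortByNumber(List<String> names) {
        List<String> sorted = new ArrayList<String>(names);
        sorted.sort(Comparator.comparingInt(NumericSuffixSort::numberOf));
        return sorted;
    }

    public static void main(String[] args) {
        // Plain string sorting would put "s10" before "s2"; numeric sorting does not.
        System.out.println(sortByNumber(Arrays.asList("s10", "s2", "s1")));
    }
}
```

Unlike the listing, this sketch does not assume that every entry shares the prefix of the first entry.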

…………………………………………………………………………………………………

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class transcriptCreator {

    static ArrayList<Object> inputLines = new ArrayList<Object>();

    public void readCorpus(String path) throws FileNotFoundException {
        FileInputStream fstream = new FileInputStream(path);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        try {
            while ((strLine = br.readLine()) != null) {
                inputLines.add(strLine);
            }
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
        int inputLineSize = inputLines.size();
        int i = 0;
        while (inputLineSize > i) {
            i++;
        }
    }

    public static void CreateTransCriptFile(List<String> data, String path) {
        try {
            FileWriter fstream = new FileWriter(path, true);
            int numOfwriting = data.size();
            int i = 0;
            int lineStart = 0;
            int lineEnd = inputLines.size();
            BufferedWriter out = new BufferedWriter(fstream);
            while (numOfwriting > i) {
                if (lineStart == lineEnd) lineStart = 0;
                String wavRemove = data.get(i).toString();
                wavRemove = wavRemove.replaceAll(".wav", "");
                String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
                leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
                String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
                out.write(pattern + "\n");
                lineStart++;
                i++;
            }
            // Close the output stream
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
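The transcript creator writes one line per recording: the sentence wrapped in start and end markers, followed by the audio file name without its extension in parentheses. The following is a minimal sketch of that single formatting step, assuming the same markers as the listing. The class name TranscriptLine, the method format, and the sample sentence and file name are hypothetical.

```java
public class TranscriptLine {

    // Builds "<S> sentence </S> (fileId)", dropping a trailing ".wav" from the file name.
    public static String format(String sentence, String wavFileName) {
        String fileId = wavFileName.replaceAll("\\.wav$", "");
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }

    public static void main(String[] args) {
        System.out.println(format("  ami bhat khai  ", "speaker1_s20.wav"));
    }
}
```

The sketch escapes the dot and anchors the pattern at the end of the name, so only a literal ".wav" suffix is removed.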

…………………………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}

…………………………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " + rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

…………………………………………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal was lost in extraction; the hasanta (U+09CD),
    // which joins Bengali consonants into conjuncts, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at, and after each conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }
        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the pronunciation of each letter or conjunct in turn.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}

…………………………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // Any dots inside the resource name other than ".xml" were lost in extraction.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsrconfig.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    // The Bengali sample sentences inside the string literals were lost in extraction.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            // Any dots inside the resource name other than ".xml" were lost in extraction.
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsrconfig.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // Left disabled here: run as-is, these two lines would open
                // My Computer after every recognized utterance.
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo.
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n ");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // The Bengali command phrase inside the quotes was lost in extraction.
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
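The giveCommand method above compares the recognized text against a command phrase and then calls the matching CommandActivator method. With many commands, a lookup table may be easier to maintain than a chain of if statements. The following is a minimal sketch of that idea, not the thesis implementation. The class CommandDispatcher, its methods, and the English phrase "right click" are hypothetical stand-ins, since the Bengali command phrases do not survive in the listing.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandDispatcher {

    // Maps a recognized phrase to the action that should run for it.
    private final Map<String, Runnable> actions = new HashMap<String, Runnable>();

    public void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action registered for the recognized text.
    // Returns false when the text is null or no command matches.
    public boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        // In the real application the action would call a CommandActivator method.
        dispatcher.register("right click", () -> System.out.println("rightClick() called"));
        System.out.println(dispatcher.dispatch("right click"));
        System.out.println(dispatcher.dispatch("unknown phrase"));
    }
}
```

Because CommandActivator methods throw AWTException, each registered action would need to catch that exception itself when wired to the real class.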

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 72: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

72

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

TRANSCRIPT CREATOR

package sbsBSRtrainingfilescreator

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileNotFoundException

import javaioFileWriter

import javaioIOException

import javaioInputStreamReader

import javautilArrayList

import javautilList

public class transcriptCreator

static ArrayListltObjectgt inputLines=new ArrayListltObjectgt()

public void readCorpus(String path) throws FileNotFoundException

FileInputStream fstream = new FileInputStream(path)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String strLine

try

while ((strLine = brreadLine()) = null)

inputLinesadd(strLine)

inclose()

catch (Exception e)Catch exception if any

Systemerrprintln(Error + egetMessage())

int inputLineSize = inputLinessize()

int i = 0

while(inputLineSizegti)

i++

public static void CreateTransCriptFile(ListltStringgt dataString path)

try

Bengali Speech Recognition

73

FileWriter fstream = new FileWriter(pathtrue)

int numOfwriting = datasize()

int i = 0

int lineStart = 0

int lineEnd = inputLinessize()

BufferedWriter out = new BufferedWriter(fstream)

while(numOfwritinggti)

if(lineStart == lineEnd) lineStart = 0

String wavRemove = dataget(i)toString()

wavRemove = wavRemovereplaceAll(wav )

String leadTrailspaceRemoved =

inputLinesget(lineStart)toString()

leadTrailspaceRemoved = leadTrailspaceRemovedtrim()

String pattern = ltSgt +leadTrailspaceRemoved+ ltSgt

(+wavRemove+)

outwrite(pattern+n)

lineStart++

i++

Close the output stream

outclose()

catch(IOException e)

eprintStackTrace()

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

FILE OPERATOR

package ptpack

import javaioBufferedReader

import javaioBufferedWriter

import javaioDataInputStream

import javaioFileInputStream

import javaioFileWriter

import javaioInputStreamReader

import javautilArrayList

public class FileOperator

SuppressWarnings(null)

Bengali Speech Recognition

74

public ArrayListltObjectgt getStrings()

ArrayListltObjectgt allInputStrings=new ArrayListltObjectgt()

int aisI = 0

try

FileInputStream fstream = new

FileInputStream(Ctrainnign_filetestInputdic)

DataInputStream in = new DataInputStream(fstream)

BufferedReader br = new BufferedReader(new InputStreamReader(in))

String str

while ((str = brreadLine()) = null)

str = strtrim()

allInputStringsadd(str)

Systemoutprintln(strtrim()+ +strlength())

inclose()

catch (Exception e)

Systemerrprintln(e)

return allInputStrings

public void createFile(String finalData)

try

BufferedWriter out = new BufferedWriter(new

FileWriter(Ctrainnign_filesbs_asr_train4dic))

outwrite(finalData)

outclose()

catch (Exception e)

Systemerrprintln(e)

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

PHONETIC TRANSLATION

package ptpack

import javautilArrayList

public class PhoneticTranslation

Bengali Speech Recognition

75

public static void main(String[] args)

PronounciationGenarator pgObj = new PronounciationGenarator()

FileOperator foObj = new FileOperator()

ArrayListltObjectgt inputStrings = new ArrayListltObjectgt()

inputStrings = foObjgetStrings()

String pro =

Systemoutprintln(in phonetic translation)

int i = 0

String is =

String fileImage =

while(inputStringssize()gti)

is = inputStringsget(i)toString()trim()

pro = pgObjgetPronouciation(is)

pro = protrim()

fileImage = fileImage+is+ +pro+n

i++

Systemoutprintln(fileImage)

foObjcreateFile(fileImage)

main end

Prodb

package ptpack

import javasqlConnection

import javasqlDriverManager

import javasqlResultSet

import javasqlSQLException

import javasqlStatement

public class Prodb

private static final String DBURL =

jdbcmysqllocalhost3306bsruser=rootamppassword= +

ampuseUnicode=trueampcharacterEncoding=UTF-8

private static final String DBDRIVER = commysqljdbcDriver

static

try

ClassforName(DBDRIVER)newInstance()

catch (Exception e)

Bengali Speech Recognition

76

eprintStackTrace()

private static Connection getConnection()

Connection connection = null

try

connection = DriverManagergetConnection(DBURL)

catch (Exception e)

eprintStackTrace()

return connection

public static void showEmployee()

Connection con = getConnection()

Statement stmt =null

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select from employees

+ where EmployeeID=1001)

if (rsnext())

Systemoutprintln(EmployeeID +

rsgetInt(EmployeeID))

Systemoutprintln(Name + rsgetString(Name))

Systemoutprintln(Office + rsgetString(Office))

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

Bengali Speech Recognition

77

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

public static boolean isBanjonBorno(char ch)

Connection con = getConnection()

Statement stmt =null

int id=0

boolean isBBorno = false

try

stmt = concreateStatement()

ResultSet rs = stmtexecuteQuery(Select id from banglatab

+ where letter= +ch+)

if (rsnext())

id = rsgetInt(id)

if(idgt=13 ampamp idlt=48)

isBBorno = true

else

Systemoutprintln(No Specified Record)

rsclose()

catch(SQLException ex)

Systemerrprintln(SQLException + exgetMessage())

finally

if (stmt = null)

try

stmtclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

if (con = null)

try

conclose()

catch (SQLException e)

Systemerrprintln(SQLException + egetMessage())

Bengali Speech Recognition

78

return isBBorno

class end

…………………………………………………………………………………

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Character that marks a conjunct; assumed to be the hasanta sign (U+09CD)
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        // Reset the scan indices so the same object can process several words
        cpi = 0;
        ncpi = 0;
        k = 0;

        // Record the positions before, at and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;

        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }

        // Collect the positions that do not belong to any conjunct
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // non-conjunct character
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + ", serialityTrace = "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + ", BanglaWord.charAt(i) = "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }

        // Look up the pronunciation of every search string in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                // Skip search strings that have no entry instead of failing
                if (rs.next()) {
                    String pro = rs.getString("pro");
                    phoneticTrans = phoneticTrans + pro + " ";
                }
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
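The grouping step of the listing above can be sketched more compactly. This is only an illustration under stated assumptions: the conjunct marker is taken to be the hasanta sign (U+09CD), and the class and method names (`ConjunctSplitter`, `split`) are hypothetical. It does not reproduce the special handling of র or the insertion of the inherent vowel অ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a Bengali word into lookup units, keeping conjuncts together. */
final class ConjunctSplitter {
    // Assumption: conjuncts are joined by the hasanta sign (U+09CD).
    static final char HASANTA = '\u09CD';

    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            // A character stays in the current unit if it is the marker itself
            // or if it directly follows the marker.
            boolean joinsPrevious = ch == HASANTA
                    || (i > 0 && word.charAt(i - 1) == HASANTA);
            if (!joinsPrevious && current.length() > 0) {
                units.add(current.toString());
                current.setLength(0);
            }
            current.append(ch);
        }
        if (current.length() > 0) {
            units.add(current.toString());
        }
        return units;
    }
}
```

Each returned unit could then be used as one search string for the `banglatab` lookup, in place of the position arrays.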

…………………………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
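Most methods of `CommandActivator` repeat the same press-and-release pattern. A minimal sketch of how that pattern could be factored out is shown below. The names `KeySink`, `KeyChord` and `RobotKeySink` are hypothetical and not part of the thesis code. Unlike the listing, this sketch releases the keys in reverse order, so the modifier key is released last.

```java
import java.awt.Robot;

/** Hypothetical abstraction over key events, so the chord logic can run without a display. */
interface KeySink {
    void keyPress(int keyCode);
    void keyRelease(int keyCode);
}

/** Adapter that forwards key events to a java.awt.Robot. */
final class RobotKeySink implements KeySink {
    private final Robot robot;

    RobotKeySink(Robot robot) {
        this.robot = robot;
    }

    public void keyPress(int keyCode) {
        robot.keyPress(keyCode);
    }

    public void keyRelease(int keyCode) {
        robot.keyRelease(keyCode);
    }
}

final class KeyChord {
    /** Presses the keys in the given order, then releases them in reverse order. */
    static void send(KeySink sink, int... keyCodes) {
        for (int code : keyCodes) {
            sink.keyPress(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            sink.keyRelease(keyCodes[i]);
        }
    }
}
```

With such a helper, `copy()` would reduce to `KeyChord.send(new RobotKeySink(new Robot()), KeyEvent.VK_CONTROL, KeyEvent.VK_C)`.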

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
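`giveCommand` compares the recognized text with a command phrase and calls the matching `CommandActivator` method. When many phrases have to be supported, a lookup table is one possible alternative to a chain of `if` statements. The sketch below is an assumption-laden illustration: `CommandDispatcher` is a hypothetical class, and the phrases to register are not taken from the thesis, because the Bengali command strings did not survive in the listing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical dispatcher: maps a recognized phrase to an action. */
final class CommandDispatcher {
    private final Map<String, Runnable> actions = new HashMap<>();

    /** Registers an action for a phrase; surrounding whitespace is ignored. */
    void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    /** Runs the action registered for the phrase; returns false if none matches. */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

Because the `CommandActivator` methods declare `AWTException`, each registered action would need to catch that exception itself, for example inside a small lambda wrapper.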


    FileWriter fstream = new FileWriter(path, true);
    int numOfwriting = data.size();
    int i = 0;
    int lineStart = 0;
    int lineEnd = inputLines.size();
    BufferedWriter out = new BufferedWriter(fstream);
    while (numOfwriting > i) {
        // Start again from the first sentence when all sentences have been used
        if (lineStart == lineEnd) lineStart = 0;
        String wavRemove = data.get(i).toString();
        // Literal replacement; replaceAll would treat "." as a regex wildcard
        wavRemove = wavRemove.replace(".wav", "");
        String leadTrailspaceRemoved = inputLines.get(lineStart).toString();
        leadTrailspaceRemoved = leadTrailspaceRemoved.trim();
        String pattern = "<S> " + leadTrailspaceRemoved + " </S> (" + wavRemove + ")";
        out.write(pattern + "\n");
        lineStart++;
        i++;
    }
    // Close the output stream
    out.close();
} catch (IOException e) {
    e.printStackTrace();
}
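The loop above writes one transcription line per audio file. A small sketch of the line format in isolation follows. The class name `TranscriptLine` is hypothetical, and the `<S>` and `</S>` tags are reconstructed from the damaged listing, so they should be read as an assumption.

```java
/** Hypothetical helper that builds one transcription line for one audio file. */
final class TranscriptLine {
    /** Returns "<S> sentence </S> (fileId)", where fileId is the file name without ".wav". */
    static String format(String sentence, String wavFileName) {
        String fileId = wavFileName.endsWith(".wav")
                ? wavFileName.substring(0, wavFileName.length() - 4)
                : wavFileName;
        return "<S> " + sentence.trim() + " </S> (" + fileId + ")";
    }
}
```

Stripping the extension with `endsWith` and `substring` only touches a trailing `.wav`, whereas a global replacement would also alter the same text in the middle of a file name.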

…………………………………………………………………………………

FILE OPERATOR

package ptpack;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class FileOperator {

    // Reads the input word list, one trimmed word per line
    @SuppressWarnings("null")
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new
                    FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary to the output file
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new
                    FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}
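`FileOperator` reads the word list with the platform default character encoding. Since the input contains Bengali text, reading it explicitly as UTF-8 may be more robust. The following is a minimal sketch using `java.nio.file.Files`; the class name `WordListReader` is hypothetical, and unlike the listing above it skips blank lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for a word list stored as UTF-8 text, one word per line. */
final class WordListReader {
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                words.add(trimmed);
            }
        }
        return words;
    }
}
```

`Files.readAllLines` opens and closes the file itself, so no explicit `close()` call is needed.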

…………………………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the character is a Bengali consonant (banjon borno),
    // that is, if its id in banglatab lies between 13 and 48
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
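`isBanjonBorno` opens a database connection for every character it tests. As a possible alternative, the consonant test could be done directly on Unicode code points. This is a different technique from the one used in the thesis, and whether the Unicode set matches ids 13 to 48 of `banglatab` exactly is an assumption. The class name `BengaliLetters` is hypothetical.

```java
/** Hypothetical helper: classifies Bengali characters by Unicode code point. */
final class BengaliLetters {
    /** Returns true for Bengali consonant letters. */
    static boolean isConsonant(char ch) {
        // Character.isLetter filters out the unassigned code points inside the range
        boolean inMainRange = ch >= '\u0995' && ch <= '\u09B9'
                && Character.isLetter(ch);      // ক to হ
        return inMainRange
                || ch == '\u09CE'               // ৎ
                || ch == '\u09DC'               // ড়
                || ch == '\u09DD'               // ঢ়
                || ch == '\u09DF';              // য়
    }
}
```

A check of this kind needs no database round trip, which matters because `getPronouciation` performs the test twice for almost every character of a word.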


Page 74: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

74

    // Reads every line of the input word list and returns the trimmed lines.
    public ArrayList<Object> getStrings() {
        ArrayList<Object> allInputStrings = new ArrayList<Object>();
        int aisI = 0;
        try {
            FileInputStream fstream = new FileInputStream("C:/trainnign_file/testInput.dic");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                str = str.trim();
                allInputStrings.add(str);
                System.out.println(str.trim() + " " + str.length());
            }
            in.close();
        } catch (Exception e) {
            System.err.println(e);
        }
        return allInputStrings;
    }

    // Writes the generated dictionary text to the training dictionary file.
    public void createFile(String finalData) {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/trainnign_file/sbs_asr_train4.dic"));
            out.write(finalData);
            out.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
} // end of class FileOperator
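The file helpers above rely on the platform default character encoding, which can corrupt Bengali text on machines whose default is not UTF-8. The following is a minimal sketch, not the project's own code, of reading and writing a dictionary file with an explicit UTF-8 encoding. The class name DictionaryFileSketch and its method names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the thesis code.
public class DictionaryFileSketch {

    // Reads every line of the file as UTF-8 and trims surrounding whitespace.
    public static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            words.add(line.trim());
        }
        return words;
    }

    // Writes the whole dictionary text as UTF-8, replacing any existing content.
    public static void writeDictionary(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real dictionary path.
        Path tmp = Files.createTempFile("sketch", ".dic");
        writeDictionary(tmp, " \u0986\u09AE\u09BF \n\u09AC\u09BE\u0982\u09B2\u09BE\n");
        System.out.println(readWords(tmp));
        Files.delete(tmp);
    }
}
```

Passing the path as a parameter, instead of hard-coding it, also lets the same helper serve both the input word list and the generated dictionary.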

………………………………………………………………

PHONETIC TRANSLATION

package ptpack;

import java.util.ArrayList;

public class PhoneticTranslation {

    public static void main(String[] args) {
        PronounciationGenarator pgObj = new PronounciationGenarator();
        FileOperator foObj = new FileOperator();
        ArrayList<Object> inputStrings = new ArrayList<Object>();
        inputStrings = foObj.getStrings();
        String pro = "";
        System.out.println("in phonetic translation");
        int i = 0;
        String is = "";
        String fileImage = "";
        // Build one dictionary line per word: the word followed by its pronunciation.
        while (inputStrings.size() > i) {
            is = inputStrings.get(i).toString().trim();
            pro = pgObj.getPronouciation(is);
            pro = pro.trim();
            fileImage = fileImage + is + " " + pro + "\n";
            i++;
        }
        System.out.println(fileImage);
        foObj.createFile(fileImage);
    } // main end
}

Prodb

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Prodb {

    private static final String DBURL =
            "jdbc:mysql://localhost:3306/bsr?user=root&password=" +
            "&useUnicode=true&characterEncoding=UTF-8";
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";

    static {
        try {
            Class.forName(DBDRIVER).newInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true if the letter's id in banglatab falls in the consonant range 13..48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end

………………………………………………………………

PRONOUNCIATION GENARATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The character literal is not legible in the source; hasanta (U+09CD) is assumed.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at, and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect the positions that are not part of any conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjuct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if nonconjuct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consecutive consonants: insert the inherent vowel between them.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjuct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace = "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) = "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjuct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // Look up the pronunciation of every search string in the banglatab table.
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
                rs.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) {
                e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) {
                e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
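The pronunciation generator above finds conjuncts by scanning the word for a marker character and recording the positions around it. As a rough illustration of the same idea, the sketch below groups a word into lookup units in a single pass. It is not the project's algorithm: it assumes the marker is the Bengali hasanta (U+09CD), it does not special-case র, and it does not insert the inherent vowel between consonants. The class name ConjunctSplitSketch and the method splitUnits are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of the thesis code.
public class ConjunctSplitSketch {

    // Bengali hasanta (virama); assumed to be the conjunct marker.
    static final char HASANTA = '\u09CD';

    // Splits a word into units. A run of the form "X hasanta Y hasanta Z ..."
    // becomes one unit; every other character is a unit of its own.
    public static List<String> splitUnits(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // Extend the unit while a hasanta is followed by another character.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // The letters ba, i-kar, sha, hasanta, ba written as Unicode escapes.
        System.out.println(splitUnits("\u09AC\u09BF\u09B6\u09CD\u09AC"));
    }
}
```

Each unit returned by such a pass could then be used as one key for the letter-to-pronunciation lookup, in place of the fixed-size position arrays.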

………………………………………………………………

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which case the
        // recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // "porer window" | "ager window" (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // "porer tab" | "ager tab" (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                // Map the recognized text to a desktop command.
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
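The command method above compares the recognized text against one phrase per if branch. A lookup table is one way to keep such a mapping manageable as the number of commands grows. The sketch below illustrates that design and is not the project's implementation: the class CommandDispatchSketch is hypothetical, the phrases are the transliterations that appear in the comments of CommandActivator, and the actions only record their names so that the example runs without java.awt.Robot or a display.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of the thesis code.
public class CommandDispatchSketch {

    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    public void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action registered for the phrase; returns false if there is none.
    public boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CommandDispatchSketch dispatcher = new CommandDispatchSketch();
        // In a real application these lambdas would call CommandActivator methods.
        dispatcher.register("porer window", () -> log.add("altTab"));
        dispatcher.register("sokrio", () -> log.add("enter"));
        dispatcher.dispatch("sokrio");
        System.out.println(log);
    }
}
```

With this layout, adding a command is a single register call, and an unrecognized phrase is reported through the return value instead of being silently ignored.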


public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

Bengali Speech Recognition

90

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

Bengali Speech Recognition

91

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

Bengali Speech Recognition

92

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()


            e.printStackTrace();
        }
    }

    private static Connection getConnection() {
        Connection connection = null;
        try {
            connection = DriverManager.getConnection(DBURL);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void showEmployee() {
        Connection con = getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select * from employees"
                    + " where EmployeeID=1001");
            if (rs.next()) {
                System.out.println("EmployeeID: " +
                        rs.getInt("EmployeeID"));
                System.out.println("Name: " + rs.getString("Name"));
                System.out.println("Office: " + rs.getString("Office"));
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
    }

    // Returns true when ch is a consonant (banjon borno), i.e. its id in
    // banglatab lies between 13 and 48.
    public static boolean isBanjonBorno(char ch) {
        Connection con = getConnection();
        Statement stmt = null;
        int id = 0;
        boolean isBBorno = false;
        try {
            stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("Select id from banglatab"
                    + " where letter='" + ch + "'");
            if (rs.next()) {
                id = rs.getInt("id");
                if (id >= 13 && id <= 48) {
                    isBBorno = true;
                }
            } else {
                System.out.println("No Specified Record");
            }
            rs.close();
        } catch (SQLException ex) {
            System.err.println("SQLException: " + ex.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
            if (con != null) {
                try {
                    con.close();
                } catch (SQLException e) {
                    System.err.println("SQLException: " + e.getMessage());
                }
            }
        }
        return isBBorno;
    }
} // class end
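The consonant test above goes to the database for every character. As a rough, database-free sketch of the same idea, the check can be done on Unicode code points instead. This is a substitute technique, not the method used in the project: the class name BengaliLetters and the exact set of letters treated as consonants are assumptions, since the listing does not say which letters ids 13 to 48 of banglatab cover.

```java
final class BengaliLetters {
    private BengaliLetters() {}

    /**
     * Returns true if ch is a Bengali consonant letter.
     * Assumption: "consonant" means the letters in U+0995..U+09B9 plus the
     * three nukta forms U+09DC, U+09DD and U+09DF; the project table may differ.
     */
    static boolean isConsonant(char ch) {
        if (ch >= '\u0995' && ch <= '\u09B9') {
            // The range contains a few unassigned code points; isLetter rejects them.
            return Character.isLetter(ch);
        }
        return ch == '\u09DC' || ch == '\u09DD' || ch == '\u09DF';
    }

    public static void main(String[] args) {
        System.out.println(isConsonant('\u0995')); // KA, a consonant
        System.out.println(isConsonant('\u0985')); // A, a vowel
    }
}
```

A check of this kind avoids opening a connection per character, which matters because getPronouciation calls the test twice for almost every letter of a word.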

............................................................

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // Character that marks a conjunct; assumed to be the hasanta (U+09CD).
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;
        // Record the positions before, at and after every conjunct marker.
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }
        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;
        // Collect every position that is not part of a conjunct.
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // Two consonants in a row: insert the inherent vowel.
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] =
                            Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                            ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + " serialityTrace "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + " BanglaWord.charAt(i) "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace] -
                                ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            // Look up the pronunciation of every search string in turn.
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where"
                        + " letter = '" + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (rs != null) rs.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            try {
                if (stmt != null) stmt.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            try {
                if (conn != null) conn.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        return phoneticTrans;
    }
}
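The generator above works in two stages: it first cuts the word into search strings, keeping each conjunct together, and then looks every search string up in banglatab and joins the results with spaces. The following is a simplified sketch of that two-stage idea, not a reimplementation of the listing: it leaves out the inherent vowel insertion and the special handling of র, it replaces the MySQL table with an in-memory map, and the class name PhoneSketch and the phone symbols (SH, K T, I) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PhoneSketch {
    // Hasanta (U+09CD), assumed to be the character that marks a conjunct.
    static final char HASANTA = '\u09CD';

    private PhoneSketch() {}

    /** Splits a word into lookup units; a hasanta glues its neighbours into one unit. */
    static List<String> units(String word) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder().append(word.charAt(i));
            i++;
            // Absorb every following "hasanta + letter" pair into the same unit.
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            out.add(unit.toString());
        }
        return out;
    }

    /** Looks each unit up in the table and joins the phones with single spaces. */
    static String transcribe(String word, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        for (String unit : units(word)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Units missing from the table are marked instead of failing.
            out.append(table.getOrDefault(unit, "?"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>(); // hypothetical entries
        table.put("\u09B6", "SH");
        table.put("\u0995\u09CD\u09A4", "K T");
        table.put("\u09BF", "I");
        System.out.println(transcribe("\u09B6\u0995\u09CD\u09A4\u09BF", table));
    }
}
```

Loading the table once into a map would also remove the per-unit query from the loop; whether that suits the project depends on how large banglatab is.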

............................................................

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        // allocate the resource necessary for the recognizer
        recognizer.allocate();
        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);
        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}
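The transcriber takes its audio either from the first command-line argument or from a bundled resource. A minimal sketch of that selection step, pulled out into a helper so it can be checked without Sphinx-4 on the classpath, might look as follows. The class name AudioSource and the method resolve are hypothetical, and wrapping the checked exception is a choice made here, not something the listing does.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;

final class AudioSource {
    private AudioSource() {}

    /**
     * Returns the URL of the audio file named by args[0], or the given
     * fallback (for example a bundled resource) when no argument is present.
     */
    static URL resolve(String[] args, URL fallback) {
        if (args.length == 0) {
            return fallback;
        }
        try {
            // Same conversion as in the listing: File -> URI -> URL.
            return new File(args[0]).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(args, null));
    }
}
```

Going through toURI() before toURL() is what makes paths with spaces or non-ASCII characters safe, which is relevant for recordings stored under Bengali file names.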

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate ("sokrio")
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window ("porer window | ager window")
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab ("porer tab | ager tab")
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec
                ("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
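Almost every keyboard method in CommandActivator repeats the same pattern: press the keys of a shortcut in order, then release them. As a sketch of how that pattern could be factored out, the helper below takes the press and release operations as parameters, so it can be driven by a java.awt.Robot in the application and by a simple recorder when checking the order of events. The class name KeyChord is hypothetical, and releasing in reverse order is a choice made here; the listing releases the modifier key first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

final class KeyChord {
    private KeyChord() {}

    /**
     * Presses the given key codes in order and releases them in reverse
     * order, so that a modifier such as Ctrl is held for the whole chord.
     */
    static void tap(IntConsumer press, IntConsumer release, int... keys) {
        for (int key : keys) {
            press.accept(key);
        }
        for (int i = keys.length - 1; i >= 0; i--) {
            release.accept(keys[i]);
        }
    }

    public static void main(String[] args) {
        // Record the events instead of sending them, to show the order.
        List<String> log = new ArrayList<>();
        // 17 and 67 are the values of KeyEvent.VK_CONTROL and KeyEvent.VK_C.
        tap(k -> log.add("press " + k), k -> log.add("release " + k), 17, 67);
        System.out.println(log);
    }
}
```

With a real robot the call would read KeyChord.tap(robot::keyPress, robot::keyRelease, KeyEvent.VK_CONTROL, KeyEvent.VK_C), which would let a single Robot instance serve all shortcut methods.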

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new
                    ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }
        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();
        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();
        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // the Bengali command literal is missing here
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
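giveCommand compares the recognized text with one command phrase per if statement. As the command set grows, a lookup table from phrase to action is one way to keep this manageable. The sketch below illustrates that idea only: the class name CommandTable and the romanized phrase "dan click" are hypothetical stand-ins, since the Bengali command strings are not preserved in this listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CommandTable {
    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    void register(String phrase, Runnable action) {
        actions.put(phrase, action);
    }

    /**
     * Runs the action registered for the recognized text.
     * Returns false when the text is null or matches no command.
     */
    boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable();
        // Hypothetical phrase; a real entry would call CommandActivator.rightClick().
        table.register("dan click", () -> System.out.println("right click"));
        System.out.println(table.dispatch("dan click"));
        System.out.println(table.dispatch("something else"));
    }
}
```

Because the CommandActivator methods declare AWTException and IOException, each registered action would need to catch those exceptions itself or be wrapped in a small adapter.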


public void openBrowser() throws AWTException IOException

String theUrl = httpwwwgooglecom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openFacebook() throws AWTException IOException

String theUrl = httpwwwfacebookcom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openYahoo() throws AWTException IOException

String theUrl = httpwwwyahoocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

Bengali Speech Recognition

90

public void openTechtunes() throws AWTException IOException

String theUrl = httpwwwtechtunescombd

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

public void openProthomAlo() throws AWTException IOException

String theUrl = httpwwwprothom-alocom

RuntimegetRuntime()exec

(rundll32 urldllFileProtocolHandler + theUrl)

SBSBSRJAVA

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaioBufferedWriter

import javaioFileWriter

import javaioIOException

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args) throws IOException AWTException

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

Bengali Speech Recognition

91

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

Bengali Speech Recognition

92

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------


        return isBBorno;
    }
} // class end

........................................................................

PRONUNCIATION GENERATOR

package ptpack;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PronounciationGenarator {

    Prodb pdobj = new Prodb();
    String BanglaWord = "";
    int BanglaWordLength = 0;
    int ConjunctsPosition[] = new int[20];
    int NoConjunctsPosition[] = new int[20];
    int cpi = 0, ncpi = 0, k = 0;
    // The literal is not legible in this listing; hasanta (U+09CD), the sign
    // that joins Bengali consonants into a conjunct, is assumed here.
    char ConjuctsIdentifyCharacter = '\u09CD';
    int calltimes = 1;

    public String getPronouciation(String bw) {
        BanglaWord = bw;
        BanglaWordLength = BanglaWord.length();
        while (k < BanglaWordLength) {
            k++;
        }
        k = 0;

        // record the positions before, at and after every conjunct marker
        while (BanglaWordLength > k) {
            if (BanglaWord.charAt(k) == ConjuctsIdentifyCharacter) {
                ConjunctsPosition[cpi] = k - 1;
                cpi++;
                ConjunctsPosition[cpi] = k;
                cpi++;
                ConjunctsPosition[cpi] = k + 1;
                cpi++;
            }
            k++;
        }
        int NoOfConjuctsPosition = cpi;
        k = 0;
        System.out.println("\nConjunctsPosition");
        while (cpi > k) {
            System.out.print(ConjunctsPosition[k] + " ");
            k++;
        }

        int wc[] = {0, 1, 2, 4, 5, 6};
        int i = 0, trace = 0;
        cpi = 0;
        ncpi = 0;

        // collect every position that is not part of a conjunct
        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) {
                NoConjunctsPosition[ncpi] = i;
                ncpi++;
            }
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ncpi > i) {
            i++;
        }

        // matching serially and making conjunct
        i = 0;
        cpi = 0;
        ncpi = 0;
        int ai = 0;
        String SearchStrings[] = new String[100];
        int ssi = 0;
        int serialityTrace = 0;
        String tempString = "";
        boolean tempc, tempc2;

        while (BanglaWordLength > i) {
            while (NoOfConjuctsPosition > cpi) {
                if (ConjunctsPosition[cpi] == i) {
                    trace = 1;
                    break;
                }
                cpi++;
            }
            if (trace == 0) { // if non-conjunct
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                if (BanglaWordLength != (i + 1)) {
                    calltimes++;
                    tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                    tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                    // two consecutive consonants: insert the inherent vowel
                    if (tempc == true && tempc2 == true) {
                        SearchStrings[ssi] = Character.toString('অ');
                        ssi++;
                    }
                }
            } else { // if conjunct
                tempString = Character.toString(BanglaWord.charAt(i));
                if (BanglaWord.charAt(i) == 'র') {
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                    i += 2;
                    SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                    ssi++;
                } else {
                    while (NoOfConjuctsPosition > serialityTrace) {
                        if (ConjunctsPosition[serialityTrace] == i) {
                            break;
                        }
                        serialityTrace++;
                    }
                    int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                    System.out.println("BanglaWord = " + BanglaWord + " "
                            + BanglaWord.length());
                    while (diffbi <= 1) {
                        System.out.println("diffbi = " + diffbi + ", serialityTrace = "
                                + serialityTrace);
                        i++;
                        System.out.println("i = " + i + ", BanglaWord.charAt(i) = "
                                + BanglaWord.charAt(i));
                        tempString += Character.toString(BanglaWord.charAt(i));
                        serialityTrace++;
                        diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                                - ConjunctsPosition[serialityTrace + 1]);
                    }
                    SearchStrings[ssi] = tempString;
                    ssi++;
                }
            } // conjunct adding end
            cpi = 0;
            i++;
            trace = 0;
        }
        i = 0;
        while (ssi > i) {
            i++;
        }

        // look up the pronunciation of every search string in the database
        String phoneticTrans = "";
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String connectionUrl =
                    "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
            String connectionUser = "root";
            String connectionPassword = "";
            conn = DriverManager.getConnection(connectionUrl, connectionUser,
                    connectionPassword);
            stmt = conn.createStatement();
            i = 0;
            while (ssi > i) {
                rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                        + SearchStrings[i] + "'");
                rs.next();
                String pro = rs.getString("pro");
                phoneticTrans = phoneticTrans + pro + " ";
                i++;
                rs.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
        }
        return phoneticTrans;
    }
}
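The grouping step of the listing above (a consonant, the conjunct marker, and the following consonant are looked up as one unit) can be illustrated with a much smaller sketch. This is not the code used in the project. The class name ConjunctSplitter and the method split are hypothetical, the marker is assumed to be hasanta (U+09CD), and the special handling of 'র' and the inherent-vowel insertion are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, shown only to illustrate the grouping idea.
final class ConjunctSplitter {

    // Assumption: hasanta (U+09CD) is the character that marks a conjunct.
    static final char HASANTA = '\u09CD';

    // Splits a word into lookup units; letters joined by hasanta stay together.
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            StringBuilder unit = new StringBuilder();
            unit.append(word.charAt(i));
            i++;
            // absorb "hasanta + next letter" pairs into the current unit
            while (i + 1 < word.length() && word.charAt(i) == HASANTA) {
                unit.append(word.charAt(i)).append(word.charAt(i + 1));
                i += 2;
            }
            units.add(unit.toString());
        }
        return units;
    }

    public static void main(String[] args) {
        // five code units: BA, KA, HASANTA, TA, AA-sign
        System.out.println(split("\u09AC\u0995\u09CD\u09A4\u09BE"));
    }
}
```

Each returned unit corresponds to one entry of SearchStrings in the listing, that is, one value that would be matched against the letter column of the pronunciation table.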

........................................................................

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = ""; // the Bengali literal is not legible in this listing
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until last utterance in the audio file has been decoded, in which
        // case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec(
                "rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
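Most methods of CommandActivator repeat the same press and release pattern. As a possible simplification, and not as part of the submitted project, the pattern could be factored into one helper. In the sketch below the names KeyChord, KeySink, tap and trace are hypothetical. The keys are released in reverse order of pressing, which differs slightly from the listing, where the modifier key is released first.

```java
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that sends a key combination to any key sink.
final class KeyChord {

    // Minimal view of java.awt.Robot, so the logic can run without a display.
    interface KeySink {
        void keyPress(int keyCode);
        void keyRelease(int keyCode);
    }

    // Presses the keys in the given order and releases them in reverse order.
    static void tap(KeySink sink, int... keyCodes) {
        for (int code : keyCodes) {
            sink.keyPress(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            sink.keyRelease(keyCodes[i]);
        }
    }

    // Records the events instead of sending them; useful for checking the order.
    static List<String> trace(int... keyCodes) {
        List<String> events = new ArrayList<>();
        tap(new KeySink() {
            public void keyPress(int code) { events.add("press " + code); }
            public void keyRelease(int code) { events.add("release " + code); }
        }, keyCodes);
        return events;
    }

    public static void main(String[] args) {
        // the key sequence of copy(): Ctrl + C
        System.out.println(trace(KeyEvent.VK_CONTROL, KeyEvent.VK_C));
    }
}
```

With a real Robot, a KeySink whose two methods forward to robot.keyPress and robot.keyRelease would reduce copy() to a single call such as KeyChord.tap(sink, KeyEvent.VK_CONTROL, KeyEvent.VK_C).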

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = ""; // the Bengali literal is not legible in this listing
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        // the Bengali command phrase is not legible in this listing
        if (CompareText.equals("")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
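The listing shows giveCommand with a single comparison. When many spoken phrases have to be mapped to actions, a lookup table may be easier to maintain than a chain of if statements. The following is only a sketch under that assumption. CommandTable, Action, register and dispatch are hypothetical names, and the phrase "copy" stands in for a Bengali command word that is not legible in the listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical table that maps a recognized phrase to an action.
final class CommandTable {

    // An action may throw, as the CommandActivator methods declare AWTException.
    interface Action {
        void run() throws Exception;
    }

    private final Map<String, Action> actions = new LinkedHashMap<>();

    // Registers an action for a phrase and returns this table for chaining.
    CommandTable register(String phrase, Action action) {
        actions.put(phrase, action);
        return this;
    }

    // Runs the action of the recognized phrase; returns false if none matches.
    boolean dispatch(String recognized) {
        if (recognized == null) {
            return false;
        }
        Action action = actions.get(recognized.trim());
        if (action == null) {
            return false;
        }
        try {
            action.run();
        } catch (Exception e) {
            throw new IllegalStateException("command failed: " + recognized, e);
        }
        return true;
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable()
                .register("copy", () -> System.out.println("copy requested"));
        System.out.println(table.dispatch("copy"));
        System.out.println(table.dispatch("unknown phrase"));
    }
}
```

In the recognition loop, the text returned by getBestResultNoFiller() would be passed to dispatch, and each registered action would call one method of CommandActivator.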

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 79: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

79

ConjunctsPosition[cpi] = k

cpi++

ConjunctsPosition[cpi] = k+1

cpi++

k++

int NoOfConjuctsPosition = cpi

k = 0

Systemoutprintln(nConjunctsPosition)

while(cpigtk)

Systemoutprint(ConjunctsPosition[k]+ )

k++

int wc[] = 012456

int i=0trace=0

cpi = 0

ncpi = 0

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0)

NoConjunctsPosition[ncpi]=i

ncpi++

cpi=0

i++

trace = 0

i=0

while(ncpigti)

i++

matching serially and making conjuct

Bengali Speech Recognition

80

i = 0

cpi = 0

ncpi = 0

int ai = 0

String SearchStrings[] = new String[100]

int ssi = 0

int serialityTrace =0

String tempString =

boolean tempctempc2

while(BanglaWordLengthgti)

while(NoOfConjuctsPositiongtcpi)

if(ConjunctsPosition[cpi]==i)

trace = 1

break

cpi++

if (trace==0) if nonconjuct

SearchStrings[ssi] = CharactertoString(BanglaWordcharAt(i))

ssi++

if(BanglaWordLength=(i+1))

calltimes++

tempc = pdobjisBanjonBorno(BanglaWordcharAt(i))

tempc2 = pdobjisBanjonBorno(BanglaWordcharAt(i+1))

if(tempc==true ampamp tempc2==true)

SearchStrings[ssi] = CharactertoString(অ)

ssi++

else if conjuct

tempString = CharactertoString(BanglaWordcharAt(i))

if(BanglaWordcharAt(i)==র)

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

i+=2

Bengali Speech Recognition

81

SearchStrings[ssi] =

CharactertoString(BanglaWordcharAt(i))

ssi++

else

while(NoOfConjuctsPositiongtserialityTrace)

if(ConjunctsPosition[serialityTrace]==i)

break

serialityTrace++

int diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

Systemoutprintln(BanglaWord = +BanglaWord+

+BanglaWordlength())

while(diffbilt=1)

Systemoutprintln(diffbi = +diffbi+ + serialityTrace

+serialityTrace)

i++

Systemoutprintln(i = +i+ BanglaWordcharAt(i)

+BanglaWordcharAt(i))

tempString += CharactertoString(BanglaWordcharAt(i))

serialityTrace++

diffbi = Mathabs(ConjunctsPosition[serialityTrace]-

ConjunctsPosition[serialityTrace+1])

SearchStrings[ssi] = tempString

ssi++

conjuct adding end

cpi=0

i++

trace = 0

i=0

while(ssigti)

i++

String phoneticTrans =

Connection conn = null

Bengali Speech Recognition

82

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

Bengali Speech Recognition

83

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else

Bengali Speech Recognition

84

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50

import educmusphinxfrontendutilAudioFileDataSource

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

import javaxsoundsampledUnsupportedAudioFileException

import javaawtAWTException

import javaioFile

import javaioIOException

import javanetURL

public class Transcriber

public static void main(String[] args) throws IOException

UnsupportedAudioFileException AWTException

URL audioURL

if (argslength gt 0)

audioURL = new File(args[0])toURI()toURL()

else

audioURL = TranscriberclassgetResource(sanjoy_falguni_dola_s20wav)

Bengali Speech Recognition

85

URL configURL = TranscriberclassgetResource(sbsbsr_transcriber_configxml)

ConfigurationManager cm = new ConfigurationManager(configURL)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

allocate the resource necessary for the recognizer

recognizerallocate()

configure the audio input for the recognizer

AudioFileDataSource dataSource = (AudioFileDataSource)

cmlookup(audioFileDataSource)

dataSourcesetAudioFile(audioURL null)

Loop until last utterance in the audio file has been decoded in which case the

recognizer will return null

Result result

while ((result = recognizerrecognize())= null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(resultText)

Desktop Command Application

package SBSBSRCMDAPPS12

import javaawtAWTException

import javaawtRobot

import javaawteventInputEvent

import javaawteventKeyEvent

import javaioIOException

public class CommandActivator

public void leftClick() throws AWTException

Bengali Speech Recognition

86

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void rightClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON3_MASK)

robotmouseRelease(InputEventBUTTON3_MASK)

public void doubleClick() throws AWTException

Robot robot = new Robot()

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

robotmousePress(InputEventBUTTON1_MASK)

robotmouseRelease(InputEventBUTTON1_MASK)

public void copy() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_C)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_C)

public void paste() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_V)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_V)

public void delete() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_DELETE)

robotkeyRelease(KeyEventVK_DELETE)

public void selectAll() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_A)

robotkeyRelease(KeyEventVK_CONTROL)

Bengali Speech Recognition

87

robotkeyRelease(KeyEventVK_A)

public void up() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void down() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void previousPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_UP)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_UP)

public void nextPage() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_PAGE_DOWN)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_PAGE_DOWN)

public void openNewFile() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_N)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_N)

public void openHere() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_CONTROL)

robotkeyPress(KeyEventVK_O)

robotkeyRelease(KeyEventVK_CONTROL)

robotkeyRelease(KeyEventVK_O)

Bengali Speech Recognition

88

public void close() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_ALT)

robotkeyPress(KeyEventVK_F4)

robotkeyRelease(KeyEventVK_ALT)

robotkeyRelease(KeyEventVK_F4)

public void startMenu() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_WINDOWS)

public void refresh() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F5)

robotkeyRelease(KeyEventVK_F5)

public void help() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_F1)

robotkeyRelease(KeyEventVK_F1)

public void showDesktop() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_D)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_D)

public void openMyComputer() throws AWTException

Robot robot = new Robot()

robotkeyPress(KeyEventVK_WINDOWS)

robotkeyPress(KeyEventVK_E)

robotkeyRelease(KeyEventVK_WINDOWS)

robotkeyRelease(KeyEventVK_E)

sokrio

public void enter() throws AWTException

Robot robot = new Robot()

Bengali Speech Recognition

89



    // Split the Bengali word into lookup units: single letters or conjunct clusters.
    i = 0;
    cpi = 0;
    ncpi = 0;
    int ai = 0;
    String SearchStrings[] = new String[100];
    int ssi = 0;
    int serialityTrace = 0;
    String tempString = "";
    boolean tempc, tempc2;
    while (BanglaWordLength > i) {
        // Does a conjunct start at position i?
        while (NoOfConjuctsPosition > cpi) {
            if (ConjunctsPosition[cpi] == i) {
                trace = 1;
                break;
            }
            cpi++;
        }
        if (trace == 0) { // if nonconjuct
            SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
            ssi++;
            if (BanglaWordLength != (i + 1)) {
                calltimes++;
                tempc = pdobj.isBanjonBorno(BanglaWord.charAt(i));
                tempc2 = pdobj.isBanjonBorno(BanglaWord.charAt(i + 1));
                // Two consonants in a row: insert the inherent vowel between them.
                if (tempc == true && tempc2 == true) {
                    SearchStrings[ssi] = Character.toString('অ');
                    ssi++;
                }
            }
        } else { // if conjuct
            tempString = Character.toString(BanglaWord.charAt(i));
            if (BanglaWord.charAt(i) == 'র') {
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
                i += 2;
                SearchStrings[ssi] = Character.toString(BanglaWord.charAt(i));
                ssi++;
            } else {
                while (NoOfConjuctsPosition > serialityTrace) {
                    if (ConjunctsPosition[serialityTrace] == i) {
                        break;
                    }
                    serialityTrace++;
                }
                int diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                        - ConjunctsPosition[serialityTrace + 1]);
                System.out.println("BanglaWord = " + BanglaWord + " " + BanglaWord.length());
                // Collect every character that belongs to the same conjunct cluster.
                while (diffbi <= 1) {
                    System.out.println("diffbi = " + diffbi + " serialityTrace " + serialityTrace);
                    i++;
                    System.out.println("i = " + i + " BanglaWord.charAt(i) " + BanglaWord.charAt(i));
                    tempString += Character.toString(BanglaWord.charAt(i));
                    serialityTrace++;
                    diffbi = Math.abs(ConjunctsPosition[serialityTrace]
                            - ConjunctsPosition[serialityTrace + 1]);
                }
                SearchStrings[ssi] = tempString;
                ssi++;
            }
        } // conjuct adding end
        cpi = 0;
        i++;
        trace = 0;
    }
    i = 0;
    while (ssi > i) {
        i++;
    }

    // Look up the pronunciation of every unit in the MySQL table banglatab.
    String phoneticTrans = "";
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    try {
        Class.forName("com.mysql.jdbc.Driver").newInstance();
        String connectionUrl =
                "jdbc:mysql://localhost:3306/bsr?useUnicode=yes&characterEncoding=UTF-8";
        String connectionUser = "root";
        String connectionPassword = "";
        conn = DriverManager.getConnection(connectionUrl, connectionUser,
                connectionPassword);
        stmt = conn.createStatement();
        i = 0;
        while (ssi > i) {
            rs = stmt.executeQuery("SELECT pro FROM banglatab where letter = '"
                    + SearchStrings[i] + "'");
            rs.next();
            String pro = rs.getString("pro");
            phoneticTrans = phoneticTrans + pro + " ";
            i++;
        }
        rs.close();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try { if (rs != null) rs.close(); } catch (SQLException e) { e.printStackTrace(); }
        try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
        try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
    }
    return phoneticTrans;
}
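The non-conjunct branch of the listing above inserts the inherent vowel অ whenever two consonants stand next to each other. The following is a minimal sketch of that single rule, not the thesis implementation. The names InherentVowelSketch, isConsonant and split are hypothetical. isConsonant only approximates pdobj.isBanjonBorno by testing the Unicode range U+0995 to U+09B9, and conjunct handling and the database lookup are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class InherentVowelSketch {
    // Inherent vowel, U+0985, written as an escape so the file compiles in any encoding.
    static final String INHERENT_VOWEL = "\u0985";

    // Assumption: a stand-in for pdobj.isBanjonBorno that treats the block
    // U+0995..U+09B9 as consonants. Letters outside that block are ignored.
    static boolean isConsonant(char c) {
        return c >= '\u0995' && c <= '\u09B9';
    }

    // Splits a word into single-character lookup units and inserts the
    // inherent vowel between two directly adjacent consonants.
    static List<String> split(String word) {
        List<String> units = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            units.add(String.valueOf(word.charAt(i)));
            boolean hasNext = i + 1 < word.length();
            if (hasNext && isConsonant(word.charAt(i)) && isConsonant(word.charAt(i + 1))) {
                units.add(INHERENT_VOWEL);
            }
        }
        return units;
    }
}
```

Under these assumptions, split applied to the word কলম yields the units ক, অ, ল, অ, ম, while কি stays as ক followed by its vowel sign.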

SBSASR_MAIN

package SBSBSRS50;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        // The Bengali sample sentences are missing in the source text.
        System.out.println("Sample sentences:\n" +
                "\n" +
                "\n" +
                "\n" +
                "\n\n");
    }
}

Sbsbsr Transcriber

package SBSBSRS50;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.awt.AWTException;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Transcriber {

    public static void main(String[] args) throws IOException,
            UnsupportedAudioFileException, AWTException {
        URL audioURL;
        if (args.length > 0) {
            audioURL = new File(args[0]).toURI().toURL();
        } else {
            audioURL = Transcriber.class.getResource("sanjoy_falguni_dola_s20.wav");
        }
        URL configURL = Transcriber.class.getResource("sbsbsr_transcriber_config.xml");
        ConfigurationManager cm = new ConfigurationManager(configURL);
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        // allocate the resource necessary for the recognizer
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource)
                cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(audioURL, null);

        // Loop until the last utterance in the audio file has been decoded,
        // in which case the recognizer will return null.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            String resultText = result.getBestResultNoFiller();
            System.out.println(resultText);
        }
    }
}

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // activate
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // next window | previous window
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // next tab | previous tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    // The open* methods hand the URL to the Windows default browser.
    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
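Most methods of CommandActivator press a few keys and then release them. The sketch below shows one possible way to factor that pattern into a single helper that releases the keys in reverse order, even if a press fails halfway. KeyChordSketch, KeySink and pressChord are hypothetical names and not part of the thesis code. KeySink mirrors the keyPress(int) and keyRelease(int) methods of java.awt.Robot so that the helper can be tried without a display.

```java
final class KeyChordSketch {
    // Assumption: a minimal abstraction over java.awt.Robot, which offers
    // keyPress(int) and keyRelease(int) with the same signatures.
    interface KeySink {
        void keyPress(int keyCode);
        void keyRelease(int keyCode);
    }

    // Presses the given keys in order, then releases the ones that were
    // actually pressed in reverse order.
    static void pressChord(KeySink sink, int... keyCodes) {
        int pressed = 0;
        try {
            for (int code : keyCodes) {
                sink.keyPress(code);
                pressed++;
            }
        } finally {
            for (int i = pressed - 1; i >= 0; i--) {
                sink.keyRelease(keyCodes[i]);
            }
        }
    }
}
```

With a real Robot, the sink would simply forward both calls to robot.keyPress and robot.keyRelease, and copy() could then be written as pressChord(sink, KeyEvent.VK_CONTROL, KeyEvent.VK_C).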

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        giveCommand(/* Bengali command text missing in source */ "");

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString " + comString + " length "
                + comString.length() + "\n");
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        // The Bengali sample sentence is missing in the source text.
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    /** Runs the desktop action that belongs to the recognized command text. */
    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(/* Bengali command text missing in source */ "")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
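giveCommand compares the recognized text with one command string per if statement. As the command list grows, a lookup table may be easier to maintain. The following is a minimal sketch of that idea under stated assumptions: CommandDispatchSketch, Action, register and dispatch are hypothetical names, and the thesis code itself uses plain if statements.

```java
import java.util.HashMap;
import java.util.Map;

final class CommandDispatchSketch {
    // Assumption: an action may throw checked exceptions, as the
    // CommandActivator methods do (AWTException, IOException).
    interface Action {
        void run() throws Exception;
    }

    private final Map<String, Action> actions = new HashMap<>();

    // Associates a recognized phrase with the action to run for it.
    void register(String phrase, Action action) {
        actions.put(phrase, action);
    }

    // Runs the action registered for the recognized text.
    // Returns false when no command matches.
    boolean dispatch(String recognizedText) {
        String key = recognizedText == null ? "" : recognizedText.trim();
        Action action = actions.get(key);
        if (action == null) {
            return false;
        }
        try {
            action.run();
        } catch (Exception e) {
            throw new IllegalStateException("Command failed: " + key, e);
        }
        return true;
    }
}
```

A caller could register each Bengali command word once, for example register(word, () -> new CommandActivator().rightClick()), and then pass every recognizer result to dispatch.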


------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

Page 82: Thesis Paper of my Bachelor Degree

Bengali Speech Recognition

82

Statement stmt = null

ResultSet rs = null

try

ClassforName(commysqljdbcDriver)newInstance()

String connectionUrl =

jdbcmysqllocalhost3306bsruseUnicode=yesampcharacterEncoding=UTF-8

String connectionUser = root

String connectionPassword =

conn = DriverManagergetConnection(connectionUrl connectionUser

connectionPassword)

stmt = conncreateStatement()

i=0

while (ssigti)

rs = stmtexecuteQuery(SELECT pro FROM banglatab where

letter = +SearchStrings[i]+)

rsnext()

String pro = rsgetString(pro)

phoneticTrans = phoneticTrans+pro+

i++

rsclose()

catch (Exception e)

eprintStackTrace()

finally

try if (rs = null) rsclose() catch (SQLException e)

eprintStackTrace()

try if (stmt = null) stmtclose() catch (SQLException e)

eprintStackTrace()

try if (conn = null) connclose() catch (SQLException e)

eprintStackTrace()

return phoneticTrans

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

SBSASR_MAIN

package SBSBSRS50

import orgomgCORBAportableInputStream

import orgomgCORBAportableOutputStream

Bengali Speech Recognition

83

import educmusphinxfrontendutilMicrophone

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

public class SBSBSR

public static void main(String[] args)

ConfigurationManager cm

if (argslength gt 0)

cm = new ConfigurationManager(args[0])

else

cm = new

ConfigurationManager(SBSBSRclassgetResource(sbsbsrconfigxml))

allocate the recognizer

Systemoutprintln(Loading)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

recognizerallocate()

start the microphone or exit the program if this is not possible

Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

else

Bengali Speech Recognition

84

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n +

n +

n +

nn)

Sbsbsr Transcriber

package SBSBSRS50

import educmusphinxfrontendutilAudioFileDataSource

import educmusphinxrecognizerRecognizer

import educmusphinxresultResult

import educmusphinxutilpropsConfigurationManager

import javaxsoundsampledUnsupportedAudioFileException

import javaawtAWTException

import javaioFile

import javaioIOException

import javanetURL

public class Transcriber

public static void main(String[] args) throws IOException

UnsupportedAudioFileException AWTException

URL audioURL

if (argslength gt 0)

audioURL = new File(args[0])toURI()toURL()

else

audioURL = TranscriberclassgetResource(sanjoy_falguni_dola_s20wav)

Bengali Speech Recognition

85

URL configURL = TranscriberclassgetResource(sbsbsr_transcriber_configxml)

ConfigurationManager cm = new ConfigurationManager(configURL)

Recognizer recognizer = (Recognizer) cmlookup(recognizer)

allocate the resource necessary for the recognizer

recognizerallocate()

configure the audio input for the recognizer

AudioFileDataSource dataSource = (AudioFileDataSource)

cmlookup(audioFileDataSource)

dataSourcesetAudioFile(audioURL null)

Loop until last utterance in the audio file has been decoded in which case the

recognizer will return null

Result result

while ((result = recognizerrecognize())= null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(resultText)

Desktop Command Application

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.io.IOException;

public class CommandActivator {

    public void leftClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // sokrio (activate)
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // porer window | ager window (next window | previous window)
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // porer tab | ager tab (next tab | previous tab)
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}

SBSBSR.java

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);

        printInstructions();
        giveCommand("");

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");
            Result result = recognizer.recognize();
            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                CommandActivator obj = new CommandActivator();
                obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    /** Prints out what to say for this demo. */
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
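
The giveCommand listing compares the recognized text with a command phrase and then calls the matching CommandActivator method. As a sketch only, and not the authors' implementation, the same idea can be expressed as a lookup table from phrase to action. The class name CommandDispatcher and its methods register and dispatch are hypothetical, and the phrase "porer tab" is borrowed from the comments in the CommandActivator listing purely as an illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps recognized command phrases to actions. */
class CommandDispatcher {

    // phrase (whitespace-normalized) -> action to run
    private final Map<String, Runnable> commands = new LinkedHashMap<>();

    /** Registers an action for a command phrase. */
    public void register(String phrase, Runnable action) {
        commands.put(normalize(phrase), action);
    }

    /**
     * Runs the action registered for the recognized text, if any.
     * Returns true when a command matched, false otherwise.
     */
    public boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = commands.get(normalize(recognizedText));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // Trim the ends and collapse inner runs of whitespace to one space.
    private static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }
}
```

With such a table, each new command becomes one register call rather than another if branch. Because the CommandActivator methods declare AWTException, an action that calls them would need to catch that exception inside the Runnable.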




Microphone microphone = (Microphone) cmlookup(microphone)

if (microphonestartRecording())

Systemoutprintln(Cannot start microphone)

recognizerdeallocate()

Systemexit(1)

Robot robot = new Robot()

robotdelay(2000)

giveCommand(bangla command)

robotdelay(3000)

printInstructions()

giveCommand()

loop the recognition until the programm exits

String comString =

Systemoutprintln(comString + comString + length

+comStringlength()+n)

while (true)

Systemoutprintln(Start speaking Press Ctrl-C to quitn)

Result result = recognizerrecognize()

if (result = null)

String resultText = resultgetBestResultNoFiller()

Systemoutprintln(You said + resultText +n)

giveCommand(resultText)

CommandActivator obj = new CommandActivator()

objopenMyComputer()

else

Systemoutprintln(I cant hear what you saidn)

Prints out what to say for this demo

private static void printInstructions()

Systemoutprintln(Sample sentencesn +

n )

Bengali Speech Recognition

92

private static void giveCommand(String CompareText) throws AWTException

IOException

if(CompareTextequals( ))

CommandActivator obj = new CommandActivator()

objrightClick()

------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    // Right mouse button: press and release once
    public void rightClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON3_MASK);
        robot.mouseRelease(InputEvent.BUTTON3_MASK);
    }

    // Left mouse button: press and release twice
    public void doubleClick() throws AWTException {
        Robot robot = new Robot();
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
        robot.mousePress(InputEvent.BUTTON1_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_MASK);
    }

    // Ctrl+C
    public void copy() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_C);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_C);
    }

    // Ctrl+V
    public void paste() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_V);
    }

    // Delete
    public void delete() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_DELETE);
        robot.keyRelease(KeyEvent.VK_DELETE);
    }

    // Ctrl+A
    public void selectAll() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_A);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_A);
    }

    // Page Up
    public void up() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    // Page Down
    public void down() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    // Ctrl+Page Up
    public void previousPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_UP);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_UP);
    }

    // Ctrl+Page Down
    public void nextPage() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_PAGE_DOWN);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_PAGE_DOWN);
    }

    // Ctrl+N
    public void openNewFile() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_N);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_N);
    }

    // Ctrl+O
    public void openHere() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_O);
    }

    // Alt+F4
    public void close() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_F4);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_F4);
    }

    // Windows key
    public void startMenu() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
    }

    // F5
    public void refresh() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F5);
        robot.keyRelease(KeyEvent.VK_F5);
    }

    // F1
    public void help() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_F1);
        robot.keyRelease(KeyEvent.VK_F1);
    }

    // Windows+D
    public void showDesktop() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_D);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_D);
    }

    // Windows+E
    public void openMyComputer() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_WINDOWS);
        robot.keyPress(KeyEvent.VK_E);
        robot.keyRelease(KeyEvent.VK_WINDOWS);
        robot.keyRelease(KeyEvent.VK_E);
    }

    // "sokrio" (activate): Enter
    public void enter() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
    }

    // "porer window" | "ager window" (next window | previous window): Alt+Tab
    public void altTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_ALT);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_ALT);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // "porer tab" | "ager tab" (next tab | previous tab): Ctrl+Tab
    public void ctlTab() throws AWTException {
        Robot robot = new Robot();
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_TAB);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.keyRelease(KeyEvent.VK_TAB);
    }

    // Launch Notepad as a separate process
    public void openNotepad() throws AWTException, IOException {
        ProcessBuilder proc = new ProcessBuilder("notepad.exe");
        Process p = proc.start();
    }

    // Open a URL through the Windows URL protocol handler
    public void openBrowser() throws AWTException, IOException {
        String theUrl = "http://www.google.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openFacebook() throws AWTException, IOException {
        String theUrl = "http://www.facebook.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openYahoo() throws AWTException, IOException {
        String theUrl = "http://www.yahoo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openTechtunes() throws AWTException, IOException {
        String theUrl = "http://www.techtunes.com.bd";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }

    public void openProthomAlo() throws AWTException, IOException {
        String theUrl = "http://www.prothom-alo.com";
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + theUrl);
    }
}
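Most keyboard methods in the listing above repeat one pattern: create a Robot, press the keys, release them. The following is a minimal sketch of how that pattern could be factored into a single helper. The class name KeyChord and its methods are hypothetical and are not part of the thesis code. Unlike the listing, which releases keys in the order they were pressed, this sketch releases them in reverse order so that a modifier such as Ctrl is let go last; that ordering is a design choice of the sketch.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the thesis code.
public class KeyChord {

    // Builds the event sequence for a key combination.
    // A positive value means "press this key code",
    // a negative value means "release this key code".
    static List<Integer> sequence(int... keys) {
        List<Integer> steps = new ArrayList<>();
        for (int key : keys) {
            steps.add(key);                       // press in the given order
        }
        for (int i = keys.length - 1; i >= 0; i--) {
            steps.add(-keys[i]);                  // release in reverse order
        }
        return steps;
    }

    // Replays the sequence through java.awt.Robot (needs a display).
    static void send(int... keys) throws AWTException {
        Robot robot = new Robot();
        for (int step : sequence(keys)) {
            if (step > 0) {
                robot.keyPress(step);
            } else {
                robot.keyRelease(-step);
            }
        }
    }
}
```

With such a helper, a method like copy() could reduce to a single call, for example KeyChord.send(KeyEvent.VK_CONTROL, KeyEvent.VK_C).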

SBSBSR.JAVA

package SBSBSRCMDAPPS12;

import java.awt.AWTException;
import java.awt.Robot;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.omg.CORBA.portable.InputStream;
import org.omg.CORBA.portable.OutputStream;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SBSBSR {

    public static void main(String[] args) throws IOException, AWTException {
        ConfigurationManager cm;

        // use the configuration file given on the command line, if any
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(SBSBSR.class.getResource("sbsbsr.config.xml"));
        }

        // allocate the recognizer
        System.out.println("Loading...");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        Robot robot = new Robot();
        robot.delay(2000);
        giveCommand("bangla command");
        robot.delay(3000);
        printInstructions();
        // giveCommand();

        // loop the recognition until the program exits
        String comString = "";
        System.out.println("comString: " + comString + " length: "
                + comString.length() + "\n");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");

            Result result = recognizer.recognize();

            if (result != null) {
                String resultText = result.getBestResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
                giveCommand(resultText);
                // CommandActivator obj = new CommandActivator();
                // obj.openMyComputer();
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    // Prints out what to say for this demo
    private static void printInstructions() {
        System.out.println("Sample sentences:\n" +
                "\n");
    }

    private static void giveCommand(String CompareText) throws AWTException,
            IOException {
        if (CompareText.equals(" ")) {
            CommandActivator obj = new CommandActivator();
            obj.rightClick();
        }
    }
}
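The listing of giveCommand shows a single branch that compares the recognized text with one phrase and calls one CommandActivator method. As the number of commands grows, a lookup table can replace a long chain of if statements. The following is a minimal sketch of that idea. The class CommandTable, its methods, and the phrase "right click" are hypothetical placeholders and do not come from the thesis grammar; actions are plain Runnable objects so that the example runs without a display or microphone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatch table, not part of the thesis code.
public class CommandTable {

    // Maps a recognized phrase to the action it should trigger.
    private final Map<String, Runnable> actions = new HashMap<>();

    // Registers an action under a phrase (surrounding spaces are ignored).
    public void register(String phrase, Runnable action) {
        actions.put(phrase.trim(), action);
    }

    // Runs the action registered for the recognized text.
    // Returns false when the text is null or no action is registered.
    public boolean dispatch(String recognizedText) {
        if (recognizedText == null) {
            return false;
        }
        Runnable action = actions.get(recognizedText.trim());
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

In the recognition loop, each registered Runnable would wrap one CommandActivator call and handle its checked AWTException or IOException, and dispatch(resultText) would take the place of the comparison chain.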
