Final Neww

7/31/2019 Final Neww


CHAPTER 1

INTRODUCTION

There has been a tremendous rise in wireless communication over the last two decades, with more than 2 billion mobile users worldwide. Owing to the widespread growth of cellular networks and the drastic reduction in call rates and in the price of low-end handsets, mobile usage has percolated to all sections of society. Any mobile phone that has a messaging facility and supports the common AT commands can be used in this system. The Nokia 6610 was chosen because it supports AT commands. Most mobile manufacturers, such as Siemens, Motorola, LG and Samsung, also provide AT command capability.

SMS is a store-and-forward way of transmitting messages to and from mobiles. Each short message should be no longer than 160 characters (text/binary). Because SMS uses signaling channels instead of dedicated data channels for transmission and reception, messages can be sent and received simultaneously with voice, fax and data services over the GSM network. The major advantages of SMS are that the sender is notified when the message is delivered at the destination, and that the SMSC (Short Message Service Centre) keeps trying to deliver the message for the specified validity period if the network is busy or the user is outside the coverage area.

A system has been developed for remote control of various electrical devices from a mobile phone through spoken commands. The system offers several attractive features:

Control from anywhere in the world where cellular coverage is available

Acknowledgement of command execution from the system to the user

Use of spoken commands from the user for control

Ease of implementation and cost-effectiveness

1.1 Overview

On the user side, a microphone translates the voice signal into an electrical signal. The microphone is connected to the HM2007, a voice recognition module.


In this approach, predetermined phrases are selected for the various commands. Mel cepstrum features are extracted from the spoken words for recognition. The Mel cepstrum exploits auditory principles as well as the discriminating property of the cepstrum, and has proven to be one of the most successful feature representations in speech-related recognition tasks [1, 2]. The spoken words are isolated and, after feature extraction, recognized. A Learning Vector Quantization (LVQ) neural network recognizes the words used in the command. A text message is generated if all spoken words are identified and match the specified format. This message is transmitted as an SMS to the control-system mobile using AT commands [3].

On the control side, the system mobile is connected to an AVR microcontroller-based system through an RS-232C cable. The process block consists of 8 digital output ports, 8 digital input ports and one analog input port. The number of digital inputs, digital outputs and analog inputs can be varied to suit the application. At present, LEDs indicate the status of the digital output ports, DIP switches change the status of the digital input ports, and a potential divider varies the analog input voltage.

1.2 AT COMMANDS

An extensive list of mobile-related AT commands is now available for activities such as sending SMS, using GPRS services, sending fax, controlling speaker volume and indicating battery status [4-6]. An AT command is sent to the mobile through the serial port as a text string that begins with the characters A and T, followed by the specific command string, and is executed on receipt of a carriage return. The mobile sends result codes to the Terminal Equipment (TE) to indicate the response after the command has executed. A text message is sent through the mobile using the CMGS command. The CNMI command configures how the TE is notified when an incoming SMS message is received from the network.

On receipt of an SMS message, its words are checked against a predetermined format, which includes a password and either the desired device ON/OFF commands or a status query. After interpreting a valid control message, the microcontroller carries out the specified tasks and then sends an SMS to a pre-specified mobile number, either acknowledging that the command was fulfilled or reporting an error during its execution. A variety of further commands is available, such as storing predefined messages directly in phone memory, sending a stored message at the appropriate time by calling its message number depending on present conditions, storing incoming SMS in phone memory, and deleting a message after execution


of the command. However, these features were deliberately left unused so that the system can be adapted easily to any mobile model with limited AT command support. In our case, every incoming SMS message is routed directly to the microcontroller (TE), and every outgoing text message is sent directly by the microcontroller to the designated mobile number without being stored in the memory of the control-system mobile.
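The exchange described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text-mode commands AT+CMGF, AT+CMGS and AT+CNMI are standard GSM AT commands, but the CNMI parameter values accepted vary between handsets, and the message format "PASSWORD DEVn ON|OFF" / "PASSWORD STATUS" as well as the function names are hypothetical, since the report does not give the exact format.

```python
CTRL_Z = b"\x1a"  # ends the message body after AT+CMGS in text mode

# Asks the phone to forward incoming SMS straight to the TE instead of
# storing it (mt=2). Supported parameter values differ between handsets.
ROUTE_INCOMING_TO_TE = b"AT+CNMI=1,2,0,0,0\r"


def sms_send_sequence(number, text):
    """Return the byte strings the TE writes, in order, to send one SMS."""
    if len(text) > 160:
        raise ValueError("a single SMS carries at most 160 characters")
    return [
        b"AT+CMGF=1\r",                           # select text mode
        f'AT+CMGS="{number}"\r'.encode("ascii"),  # open a message to number
        text.encode("ascii") + CTRL_Z,            # body, ended by Ctrl-Z
    ]


def parse_control_message(text, password):
    """Parse a control SMS in the hypothetical format described above.

    Returns ("SET", device, is_on), ("STATUS", None, None),
    or None when the password or the format is wrong.
    """
    words = text.strip().upper().split()
    if not words or words[0] != password.upper():
        return None
    if words[1:] == ["STATUS"]:
        return ("STATUS", None, None)
    if (len(words) == 3 and words[1].startswith("DEV")
            and words[1][3:].isdigit() and words[2] in ("ON", "OFF")):
        device = int(words[1][3:])
        if 1 <= device <= 8:  # 8 digital output ports in the process block
            return ("SET", device, words[2] == "ON")
    return None
```

Each string returned by sms_send_sequence would be written to the serial port in turn, waiting for the phone's result code or prompt between writes.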


CHAPTER 2

SPEECH RECOGNITION

2.1 Design Introduction

With the arrival of the era of globalization, networking, information and digital technology, the demand for highly reliable identity verification is growing. An efficient means to this end is to authenticate users through biometric methods. Among existing biometric methods, voice biometrics can be an affordable and accurate authentication technology, and it has already been widely and successfully employed. The voiceprint, as a basic human physiological characteristic, is unique and is difficult to counterfeit, imitate or replace. As a non-contact identification technology, voice recognition is being accepted by users.

Voice authentication refers to the process of accepting or rejecting the identity claim of a speaker on the basis of individual information present in the speech waveform. It has received increasing attention over the past two decades as a convenient, user-friendly way of replacing (or supplementing) standard password-type matching. The authentication procedure asks the user to pronounce a random sequence of digits. After the speech is captured and voice features are extracted, the individual voice characteristics are generated by a registration algorithm. The central processing unit decides whether the received features match the stored voiceprint of the customer the speaker claims to be, and grants authentication accordingly.

2.2 Voice Recognition Technology Principle

Voice recognition, also known as speaker recognition, has two categories: speaker identification and speaker verification. Speaker identification determines which one of a group of people is speaking, i.e. "one-out-of-many selection", whereas speaker verification determines whether a specified person is speaking, i.e. "one-to-one recognition".

According to the speech material used, voice recognition can be divided into text-dependent and text-independent technology. A text-dependent voice recognition system requires the speaker to pronounce a prescribed text, so that each person's individual sound profile model can be established accurately. During recognition, the speaker must pronounce the same prescribed text to achieve the best effect. A text-independent recognition system does not require fixed content; it is relatively difficult to model, but is convenient for the user and can be applied to a wide range of uses.

Voiceprint recognition is an application based on the physiological and behavioral characteristics of the speaker's voice and linguistic patterns. Unlike speech recognition, voiceprint recognition does not depend on the content of the speech. Rather, the unique features of the voice are analyzed to identify the speaker. The unique features are extracted from voice samples and converted to digital symbols, which are then stored as that person's characteristic template. The template is stored in a computer database, a smart card or a bar-coded card. User authentication is processed inside the recognition system to determine whether the features match.

2.3 Classification of Speech Recognition System

Speech recognition systems have different design and performance requirements depending on the point of view and the scope of the application. They can be classified as follows:

Isolated-word, connected-word and continuous speech recognition, and conversational speech understanding systems

Large-vocabulary and small-vocabulary systems

Speaker-specific (speaker-dependent) and non-specific (speaker-independent) speech recognition systems

CHAPTER 3

DYNAMIC TIME WARPING

3.1 Principle

A distance measurement between time series is needed to determine the similarity between time series and for time series classification. Euclidean distance is an efficient distance measurement that can be used. The Euclidean distance between two time series of equal length is the square root of the sum of the squared differences between each nth point in one time series and the nth point in the other. The main disadvantage of using Euclidean distance for time series data is that its results can be very unintuitive. If two time series are identical, but one is shifted slightly along the time axis, then Euclidean distance may consider them to be very different from each other. Dynamic time warping (DTW) was introduced to overcome this limitation and give intuitive distance measurements between time series by ignoring both global and local shifts in the time dimension.

Problem Formulation. The dynamic time warping problem is stated as follows: given two time series X and Y, of lengths |X| and |Y|,

\[ X = x_1, x_2, \ldots, x_i, \ldots, x_{|X|} \]

\[ Y = y_1, y_2, \ldots, y_j, \ldots, y_{|Y|} \]

construct a warp path W

\[ W = w_1, w_2, \ldots, w_K, \qquad \max(|X|, |Y|) \le K < |X| + |Y| \]

where K is the length of the warp path and the kth element of the warp path is

\[ w_k = (i, j) \]

where i is an index from time series X, and j is an index from time series Y. The warp path must start at the beginning of each time series at w_1 = (1, 1) and finish at the end of both time series at w_K = (|X|, |Y|). This ensures that every index of both time series is used in the warp path. There is also a constraint on the warp path that forces i and j to be monotonically increasing along the warp path, which is why the lines representing the warp path in Figure 1 do not overlap. Every index of each time series must be used. Stated more formally:

\[ w_k = (i, j), \quad w_{k+1} = (i', j'), \qquad i \le i' \le i + 1, \quad j \le j' \le j + 1 \]

The optimal warp path is the minimum-distance warp path, where the distance of a warp path W is

\[ \mathrm{Dist}(W) = \sum_{k=1}^{K} \mathrm{Dist}(w_{ki}, w_{kj}) \]

Dist(W) is the distance (typically Euclidean distance) of warp path W, and Dist(w_{ki}, w_{kj}) is the distance between the two data point indexes (one from X and one from Y) in the kth element of the warp path.
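The formulation above can be illustrated with a minimal dynamic-programming sketch in Python. It is not the report's implementation: it assumes one-dimensional series, uses the absolute difference as the point-to-point distance, and fills the full |X| by |Y| cost table without any windowing. The function names are illustrative.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Return (distance, warp_path) for two 1-D series.

    warp_path is a list of 1-based index pairs (i, j), starting at (1, 1)
    and ending at (len(x), len(y)).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimum warp distance between x[:i] and y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance i only
                                 cost[i][j - 1],      # advance j only
                                 cost[i - 1][j - 1])  # advance both
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i, j))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# The same pulse, shifted by one sample along the time axis.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
print(euclidean(a, b))  # 2.0: Euclidean distance penalises the shift
print(dtw(a, b)[0])     # 0.0: DTW absorbs the shift
```

The two series differ only by a shift, so the warp path can align every sample with an equal one, which is exactly the behaviour the section motivates.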

3.2 Mel-Frequency Cepstral Coefficients

The Mel-cepstrum exploits auditory principles as well as decorrelating property of the

cepstrum. In MFCC implementation, triangular filters are used. These filters follow Mel scale

whereby band edges and corner frequencies are linear for low frequencies (



Finally, the discrete cosine transform (DCT) of the log filter-bank energies is taken to obtain the MFCCs:

\[ C_{mel}(k) = \sum_{l=1}^{N} \log\{E_{mel}(l)\} \cos\left[ k \left( l - \tfrac{1}{2} \right) \frac{\pi}{N} \right] \]

where log{E_mel(l)} are the log filter-bank energies, C_mel(k) is the kth MFCC and N is the number of filters. Performance has been observed to be reasonably good with 24 filters in the MFCC implementation [14]. If there are n frames in a word and 12 MFCCs are computed for each frame, we get a feature vector of length 12 × n. However, the number of frames (n) varies from word to word, which in turn changes the length of the feature vector. To obtain a feature vector of constant length, the n values of each Mel-frequency cepstral coefficient are converted into 10 values using a resampling technique. Thus, for each word, a constant-length feature vector of 120 (12 × 10) elements is obtained. Principal Component Analysis (PCA) is then carried out on the MFCC data thus obtained. PCA transforms the input data so that the elements of the input vectors are uncorrelated.
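The last two steps, the DCT of the log filter-bank energies and the conversion to a constant-length vector, can be sketched as follows. Two things are assumptions here: the DCT is written in its common form C(k) = sum over l of log E(l) cos[k (l - 1/2) pi / N], and the resampling is done by linear interpolation, since the report does not name the resampling technique it used. The function names are illustrative.

```python
import math


def mfcc_from_log_energies(log_energies, num_coeffs=12):
    """DCT of the N log filter-bank energies, returning C(1)..C(num_coeffs)."""
    n = len(log_energies)
    return [
        sum(e * math.cos(k * (l - 0.5) * math.pi / n)
            for l, e in enumerate(log_energies, start=1))
        for k in range(1, num_coeffs + 1)
    ]


def resample(values, target=10):
    """Linearly interpolate a sequence onto `target` evenly spaced points."""
    n = len(values)
    if n == 1:
        return [values[0]] * target
    out = []
    for t in range(target):
        pos = t * (n - 1) / (target - 1)  # fractional index into values
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def word_features(frames, target=10):
    """frames: n lists of MFCCs (one list per frame).

    Returns a flat vector of len(frames[0]) * target values, so 12 MFCCs
    per frame give 120 values whatever the number of frames n.
    """
    num_coeffs = len(frames[0])
    features = []
    for c in range(num_coeffs):
        track = [frame[c] for frame in frames]  # one coefficient over time
        features.extend(resample(track, target))
    return features
```

A word with 7 frames and a word with 23 frames both yield 120 features, which is what allows a classifier with a fixed input size to be used afterwards.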

3.3 LVQ Classifier:

The LVQ is an algorithm for learning classifiers from labeled data samples. It models the

discrimination function defined by the set of labeled codebook vectors and the nearest

neighborhood search between the codebook and data. In classification, a data point xi is assigned

to a class according to the class label of the closest codebook vector. The training algorithm

involves an iterative gradient update of the winner unit [15, 16].

The winner unit w_c is defined by

\[ c = \arg\min_k \| x_i - w_k \| \]

The update equation for the winner unit w_c, defined by the nearest neighbor and a data sample x(t), is

\[ w_c(t+1) = w_c(t) \pm \alpha(t) \, [x(t) - w_c(t)] \]

where the sign depends on whether the data sample is correctly classified (+) or misclassified (-), and \alpha(t) is the learning rate, which must decrease monotonically in time.
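The update rule can be sketched as a basic LVQ1 training loop. This is an illustrative sketch, not the report's classifier: the linear decay of alpha(t), the initial value alpha0 and the number of epochs are assumptions chosen only to satisfy the requirement that the learning rate decreases monotonically.

```python
import math


def nearest(codebook, x):
    """Index c of the codebook vector closest to x (the winner unit)."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(codebook[k][0], x))


def train_lvq1(codebook, samples, epochs=20, alpha0=0.3):
    """codebook: list of [vector, label]; samples: list of (vector, label).

    Moves the winner towards the sample when the labels agree (+) and
    away from it when they disagree (-). The codebook is updated in place.
    """
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for x, label in samples:
            alpha = alpha0 * (1 - step / total)  # decreases monotonically
            step += 1
            c = nearest(codebook, x)
            w, w_label = codebook[c]
            sign = 1.0 if w_label == label else -1.0
            codebook[c][0] = [wi + sign * alpha * (xi - wi)
                              for wi, xi in zip(w, x)]
    return codebook


def classify(codebook, x):
    """Class label of the closest codebook vector."""
    return codebook[nearest(codebook, x)][1]
```

In the report's setting, x would be the 120-element feature vector of a word and the labels would be the words of the command vocabulary.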



CHAPTER 4

Hardware Implementations

4.1 Transmitter Part


On the transmitter side, a microphone translates the voice signal into an electrical signal. The microphone is connected to the HM2007, a voice recognition module.

In this approach, predetermined phrases are selected for the various commands. Mel cepstrum features are extracted from the spoken words for recognition. The Mel cepstrum exploits auditory principles as well as the discriminating property of the cepstrum, and has proven to be one of the most successful feature representations in speech-related recognition tasks [1, 2]. The spoken words are isolated and, after feature extraction, recognized. A Learning Vector Quantization (LVQ) neural network recognizes the words used in the command. A text message is generated if all spoken words are identified and match the specified format. This message is transmitted as an SMS to the control-system mobile using AT commands.

Figure: BLOCK DIAGRAM OF TRANSMITTER

4.1.1 NEURAL NETWORK FOR SPEECH RECOGNITION:

4.1.1.1 Overview of Neural Networks:

Artificial neural networks are computers whose architecture is modeled

after the brain. They typically consist of many hundreds of simple processing units

which are wired together in a complex communication network. Each unit

or node is a simplified model of a real neuron which fires (sends off a new signal) if

it receives a sufficiently strong input signal from the other nodes to which it is

connected. The strength of these connections may be varied in order for the

network to perform different tasks corresponding to different patterns of node firing

activity. This structure is very different from traditional computers.
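A single node of this kind can be sketched as a weighted sum followed by a threshold. The weights and the threshold below are illustrative values, not taken from the report; they only show how changing the connection strengths changes the task the same unit performs.

```python
def fires(inputs, weights, threshold):
    """Return True when the weighted input sum reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation >= threshold


# With weak connections the node fires only when both inputs are active.
print(fires([1, 1], [0.6, 0.6], 1.0))  # True
print(fires([1, 0], [0.6, 0.6], 1.0))  # False

# With stronger connections one active input is enough.
print(fires([1, 0], [1.2, 1.2], 1.0))  # True
```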

4.1.1.2 Fundamentals of Neural Network:

There are many different types of neural networks, but they all have four

basic attributes:

A set of processing units;

A set of connections;

A computing procedure;

A training procedure.

4.1.1.3 Artificial neural networks from the viewpoint of speech recognition:

Artificial neural networks (ANNs) are systems consisting of interconnected computational nodes that work somewhat similarly to human neurons. Neural networks can be used, for example, to approximate functions or to classify data into classes, which in the speech recognition domain can be phonemes, sub-phoneme units, syllables or words. The ability to learn by adapting the strengths of inter-neuron connections (synapses) is a fundamental property of artificial neural networks. Speech recognition has been another proving ground for neural networks.

Researchers quickly achieved excellent results in such basic tasks as voiced/unvoiced discrimination (Watrous 1988), phoneme recognition (Waibel et al, 1989), and spoken digit recognition (Franzini et al, 1989). However, in 1990, when this thesis was proposed, it still remained to be seen whether neural networks could support a large-vocabulary, speaker-independent, continuous speech recognition system. Of the two types of variability in speech, acoustic and temporal, the former is more naturally posed as a static pattern matching problem that is amenable to neural networks; therefore we use neural networks for acoustic modeling, while we rely on conventional Hidden Markov Models for temporal modeling.

4.1.2 HM2007 FUNCTIONING:

The HM2007 is a CMOS voice recognition LSI (Large Scale Integration) circuit. The chip contains an analog front end, voice analysis, recognition, and system control functions. It may be used in stand-alone mode or connected to a CPU.

Pin configuration of HM2007

The functioning of the HM2007 IC involves the following steps:

4.1.2.1 Speech Acquisition:


Speech acquisition is easily implemented with the HM2007 IC. During speech acquisition, speech samples are obtained from the speaker in real time and stored in memory for preprocessing. Speech acquisition requires a microphone coupled with an analog-to-digital converter (ADC) that has the proper amplification to receive the speech signal, sample it, and convert it into digital speech. The system passes the analog speech through a transducer, amplifies it, and sends it through an ADC. The received samples are stored in RAM. The microphone input port with the audio codec receives the signal, amplifies it, and converts it into 8-bit PCM digital samples; the HM2007 is clocked by a 3.57 MHz crystal. The HM2007 IC requires initial configuration, or training, of words, which is performed using a programming board. In the training process, the user trains the IC by speaking words into the microphone and assigning a particular value to each word. For example, the word "hello" can be assigned the value 02 or 05. The IC can then be connected to a microcontroller for further functions.

4.1.2.2 Speech Preprocessing:

Preprocessing reduces the amount of processing required in later

stages. Generally, preprocessing involves taking the speech samples as input, blocking the samples

into frames, and returning a unique pattern for each sample, as described in the following steps.

1. The system must identify useful or significant samples from the speech signal. To accomplish

this goal, the system divides the speech samples into overlapped frames.

2. The system checks the frames for voice activity using endpoint detection and energy threshold

calculations.

3. The speech samples are passed through a pre-emphasis filter.
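The three preprocessing steps above can be sketched in Python as follows. The frame size, hop, energy threshold and pre-emphasis coefficient are illustrative assumptions; the report does not give the values used, and this sketch is not the internal algorithm of the HM2007.

```python
def split_frames(samples, size, hop):
    """Block the samples into overlapping frames (hop < size gives overlap)."""
    return [samples[s:s + size]
            for s in range(0, len(samples) - size + 1, hop)]


def energy(frame):
    """Short-time energy of one frame."""
    return sum(v * v for v in frame)


def endpoints(frame_list, threshold):
    """Indices of the first and last frame whose energy exceeds threshold.

    Returns None when no frame contains voice activity.
    """
    active = [i for i, f in enumerate(frame_list) if energy(f) > threshold]
    return (active[0], active[-1]) if active else None


def pre_emphasis(samples, coeff=0.95):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - coeff * samples[n - 1]
                           for n in range(1, len(samples))]


# Silence, a burst of "speech", silence.
signal = [0] * 40 + [10, -10] * 20 + [0] * 40
frames = split_frames(signal, size=20, hop=10)
first, last = endpoints(frames, threshold=50)
voiced = signal[first * 10:last * 10 + 20]  # samples spanned by active frames
emphasised = pre_emphasis(voiced)
```

Only the frames between the detected endpoints are passed on, which is how preprocessing reduces the work required in later stages.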

4.1.2.3 Training the IC:

An important part of speech-to-text conversion using pattern

recognition is training. Training involves creating a pattern representative of the features of a class

using one or more test patterns that correspond to speech sounds of the same class. A model

commonly used for speech recognition is the HMM, which is a statistical model used for modeling


an unknown system using an observed output sequence. The keypad and digital display are used to

communicate with and program the HM2007 chip.

The keypad is made up of 12 normally open momentary contact switches. When the circuit

is turned on, 00 is on the digital display, the red LED (READY) is lit and the circuit waits for a

command.

4.2 Receiver Part

On the control side, i.e. the receiver side, the system mobile is connected to an AVR microcontroller-based system through an RS-232C cable. The process block consists of 8 digital output ports, 8 digital input ports and one analog input port. The number of digital inputs, digital outputs and analog inputs can be varied to suit the application.

Figure: BLOCK DIAGRAM OF THE RECEIVER SECTION

4.2.1 Overview of Serial Communication

Computers can transfer data in two ways: parallel and serial. In parallel data transfers, often 8 or more

lines (wire conductors) are used to transfer data to a device that is only a few feet away. Examples of parallel

data transfer are printers and hard disks; each uses cables with many wire strips. Although in such cases a lot of


data can be transferred in a short amount of time by using many wires in parallel, the distance cannot be great.

To transfer to a device located many meters away, the serial method is used. In serial communication, the data

is sent one bit at a time, in contrast to parallel communication, in which the data is sent a byte or more at a time.

Serial communication between the 89S52 and a peripheral is the topic of this section.

If data is to be transferred on the telephone line, it must be converted from 0s and 1s to audio tones,

which are sinusoidal-shaped signals. A peripheral device called a modem, which stands for

modulator/demodulator, performs this conversion.

Serial data communication uses two methods, asynchronous and synchronous. The synchronous

method transfers a block of data at a time, while the asynchronous method transfers a single byte at a time.

In data transmission if the data can be transmitted and received, it is a duplex transmission. This is in

contrast to simplex transmissions such as with printers, in which the computer only sends data. Duplex

transmissions can be half or full duplex, depending on whether or not the data transfer can be simultaneous. If

data is transmitted one way at a time, it is referred to as half duplex. If the data can go both ways at the same

time, it is full duplex. Of course, full duplex requires two wire conductors for the data lines, one for

transmission and one for reception, in order to transfer and receive data simultaneously.

Asynchronous serial communication and data framing

The data coming in at the receiving end of the data line in a serial data transfer is all 0s and 1s; it is

difficult to make sense of the data unless the sender and receiver agree on a set of rules, a protocol, on how the

data is packed, how many bits constitute a character, and when the data begins and ends.

Start and stop bits

Asynchronous serial data communication is widely used for character-oriented transmissions, while block-oriented data transfers use the synchronous method. In the asynchronous method, each character, such as an ASCII character, is placed between a start bit and one or more stop bits. This is called framing. The start bit is always one bit, but there can be one or two stop bits. The start bit is always a 0 (low) and the stop bits are 1 (high).
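The framing just described can be sketched in Python. The sketch assumes 8 data bits, no parity bit, and least-significant-bit-first ordering, which is the usual UART convention; the function names are illustrative.

```python
def frame_byte(value, stop_bits=1):
    """Return the line bits for one character.

    Layout: start bit (0), 8 data bits sent LSB first, stop bit(s) (1).
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    if stop_bits not in (1, 2):
        raise ValueError("stop_bits must be 1 or 2")
    data = [(value >> i) & 1 for i in range(8)]
    return [0] + data + [1] * stop_bits


def unframe(bits):
    """Recover the byte from a frame, checking the start and stop bits."""
    if len(bits) < 10 or bits[0] != 0 or any(b != 1 for b in bits[9:]):
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


print(frame_byte(ord("A")))  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

The ASCII character "A" is 0x41 (binary 01000001), so after the start bit the line carries 1, 0, 0, 0, 0, 0, 1, 0 and then the stop bit.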

Data transfer rate

The rate of data transfer in serial data communication is stated in bps (bits per second). Another widely used term for bps is baud rate. However, the baud and bps rates are not necessarily equal. This is because baud rate is modem terminology and is defined as the number of signal changes per second. In modems, a single change of signal sometimes transfers several bits of data. As far as the conductor wire is concerned, the baud rate and bps are the same, and for this reason we use bps and baud interchangeably.

The data transfer rate of a given computer system depends on the communication ports incorporated into that system. For example, the early IBM PC/XT could transfer data at rates of 100 to 9600 bps. More recent Pentium-based PCs transfer data at rates as high as 56K bps. It must be noted that in asynchronous serial data communication, the baud rate is generally limited to 100,000 bps.
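The relation between baud, bps and useful character throughput can be worked through with a short sketch. The frame layout (1 start bit, 8 data bits, 1 stop bit) and the example figures are assumptions for illustration only.

```python
def bit_rate(baud, bits_per_symbol=1):
    """bps = signal changes per second x bits carried by each change."""
    return baud * bits_per_symbol


def chars_per_second(bps, data_bits=8, stop_bits=1):
    """Characters per second once start and stop bits are accounted for."""
    return bps / (1 + data_bits + stop_bits)


def transfer_time(num_chars, bps, data_bits=8, stop_bits=1):
    """Seconds needed to send num_chars framed characters."""
    return num_chars / chars_per_second(bps, data_bits, stop_bits)


print(bit_rate(2400, bits_per_symbol=4))  # 9600: a modem at 2400 baud
print(chars_per_second(9600))             # 960.0 characters per second
print(transfer_time(160, 9600))           # one 160-character SMS body
```

On a direct wire each signal change carries one bit, so baud and bps coincide; framing overhead then means a 9600 bps link moves 960 characters per second, not 1200.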


The 8051 has serial communication capability built in, thereby making fast data transfer possible using only a few wires. The PC uses RS232 as its serial communication standard.

4.2.2 RS232 Standards

To allow compatibility among data communication equipment made by various manufacturers, an interfacing standard called RS232 was set by the Electronics Industries Association (EIA) in 1960. In 1963 it was modified and called RS232A. RS232B and RS232C were issued in 1965 and 1969, respectively. Today, RS232 is the most widely used serial I/O interfacing standard. It is used in PCs and numerous other types of equipment. However, since the standard was set long before the advent of the TTL logic family, its input and output voltage levels are not TTL compatible. In RS232, a 1 is represented by -3 to -25 V, while a 0 is +3 to +25 V, making -3 to +3 V undefined. For this reason, to connect any RS232 device to a microcontroller system we must use a voltage converter such as the MAX232 to convert the TTL logic levels to the RS232 voltage levels, and vice versa. MAX232 IC chips are commonly referred to as line drivers.

RS232 pins


The RS232 cable connector is commonly referred to as the DB-25 connector. In labeling, DB-25P refers to the plug connector (male) and DB-25S to the socket connector (female). Since not all the pins are used in PC cables, IBM introduced the DB-9 version of the serial I/O standard, which uses only 9 pins, as shown in the table.

DB-9 pin connector

1 2 3 4 5

6 7 8 9

(Out of computer and exposed end of cable)

Pin Functions:

Pin   Description
1     Data carrier detect (DCD)
2     Received data (RXD)
3     Transmitted data (TXD)
4     Data terminal ready (DTR)
5     Signal ground (GND)
6     Data set ready (DSR)
7     Request to send (RTS)
8     Clear to send (CTS)
9     Ring indicator (RI)

Note: DCD, DSR, RTS and CTS are active low pins.

The method used by RS-232 for communication allows a simple connection of three lines, namely Tx, Rx and Ground.

TXD: carries data from the DTE to the DCE

RXD: carries data from the DCE to the DTE

SG: signal ground

4.2.3 8051 connection to RS232:

Figure: Connection of the embedded controller (TXD, RXD) through the MAX232 to DB-9 pins 2, 3 and 5 (GND)

The RS232 standard is not TTL compatible; therefore, it requires a line driver such as the MAX232 chip to convert RS232 voltage levels to TTL levels, and vice versa.

The 8051 has two pins that are used specifically for transmitting and receiving data serially. These two pins are called TXD and RXD and are part of port 3 (P3.1 and P3.0, respectively). Pin 11 of the 8051 is designated TXD and pin 10 RXD. These pins are TTL compatible; therefore, they require a line driver to make them RS232 compatible. One such line driver is the MAX232 chip.

The MAX232 converts from RS232 voltage levels to TTL voltage levels, and vice versa. One advantage of the MAX232 chip is that it uses a +5 V power source, which is the same as the source voltage for the 8051. In other words, a single +5 V power supply can power both the 8051 and the MAX232, with no need for additional power supplies. The MAX232 has two sets of line drivers for transmitting and receiving data. The line drivers used for TXD are called T1 and T2, while the line drivers for RXD are designated R1 and R2. In many applications only one of each is used.

4.2.4 MAX-232

Logic Signal Voltage


Serial RS-232 (V.24) communication works with voltages (-15V ... -3V to transmit a binary '1' and +3V ... +15V to transmit a binary '0') that are not compatible with today's computer logic voltages. Classic TTL computer logic, on the other hand, operates between 0V ... +5V (roughly 0V ... +0.8V, referred to as low, for binary '0', and +2V ... +5V, referred to as high, for binary '1'). Modern low-power logic operates in the range of 0V ... +3.3V or even lower.

So the maximum RS-232 signal levels are far too high for today's computer logic electronics, and the negative RS-232 voltage cannot be handled at all by the computer logic. Therefore, to receive serial data from an RS-232 interface, the voltage has to be reduced and the 0 and 1 voltage levels inverted. In the other direction (sending data from some logic over RS-232), the low logic voltage has to be "bumped up", and a negative voltage has to be generated, too.

RS-232          TTL             Logic
--------------------------------------------------------
-15V ... -3V    +2V ... +5V     1
+3V ... +15V    0V ... +0.8V    0
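The table can be expressed as a small Python model of the level conversion. It is an idealised sketch: the threshold values come from the table above, while the output swing of 10 V, the 5 V supply and the function names are illustrative assumptions, not measured behaviour of any particular IC.

```python
def rs232_to_bit(volts):
    """Logic value on an RS-232 line, or None in the undefined region."""
    if volts <= -3.0:
        return 1  # negative voltage carries a binary 1
    if volts >= 3.0:
        return 0  # positive voltage carries a binary 0
    return None


def ttl_to_bit(volts):
    """Logic value of a TTL level, or None between the two thresholds."""
    if 2.0 <= volts <= 5.0:
        return 1
    if 0.0 <= volts <= 0.8:
        return 0
    return None


def driver(volts_ttl, swing=10.0):
    """TTL in, RS-232 out: the level is shifted and inverted."""
    bit = ttl_to_bit(volts_ttl)
    if bit is None:
        return None
    return -swing if bit else swing


def receiver(volts_rs232, vcc=5.0):
    """RS-232 in, TTL out: the level is reduced and inverted."""
    bit = rs232_to_bit(volts_rs232)
    if bit is None:
        return None
    return vcc if bit else 0.0


print(driver(5.0))            # -10.0: a TTL high leaves as a negative voltage
print(receiver(driver(0.2)))  # 0.0: a TTL low survives the round trip
```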

All this can be done with conventional analog electronics, e.g. a particular power supply and a couple of transistors, or the once popular 1488 (transmitter) and 1489 (receiver) ICs. However, for more than a decade it has been standard in amateur electronics to do the necessary signal level conversion with an integrated circuit (IC) from the MAX232 family (typically a MAX232A or some clone). In fact, it is hard to find RS-232 circuitry in amateur electronics without a MAX232A or some clone.

The MAX232 & MAX232A


Figure: A MAX232 integrated circuit

The MAX232 from Maxim was the first IC which contains in one package the necessary drivers (two) and receivers (also two) to adapt the RS-232 signal voltage levels to TTL logic. It became popular because it needs just one supply voltage (+5V) and generates the necessary RS-232 voltage levels (approx. -10V and +10V) internally. This greatly simplified the design of circuitry. Circuit designers no longer needed to design and build a power supply with three voltages (e.g. -12V, +5V, and +12V), but could just provide one +5V power supply, e.g. with the help of a simple 78x05 voltage regulator.

The MAX232 has a successor, the MAX232A. The ICs are almost identical, however, the MAX232A is

much more often used than the original MAX232, and the MAX232A only needs external capacitors 1/10th the

capacity of what the original MAX232 needs.

It should be noted that the MAX232(A) is just a driver/receiver. It does not generate the necessary RS-232 sequence of marks and spaces with the right timing, it does not decode the RS-232 signal, and it does not provide serial/parallel conversion. All it does is convert signal voltage levels. Generating serial data with the right timing and decoding serial data has to be done by additional circuitry, e.g. by a 16550 UART or by one of the small microcontrollers (e.g. Atmel AVR, Microchip PIC) that are becoming more and more popular.

The MAX232 and MAX232A were once rather expensive ICs, but today they are cheap. It has also helped that many companies now produce clones (e.g. Sipex). These clones sometimes need different external circuitry, e.g. the capacitances of the external capacitors vary. It is recommended to check the data sheet of the particular manufacturer of an IC instead of relying on Maxim's original data sheet.


The original manufacturer (and now some clone manufacturers, too) offers a large series of similar ICs, with different numbers of receivers and drivers, voltages, built-in or external capacitors, etc. For example, the MAX232 and MAX232A need external capacitors for the internal voltage pump, while the MAX233 has these capacitors built in. Because of its internal capacitors, the MAX233 is also between three and ten times more expensive in electronics shops than the MAX232A. It is also more difficult to get than the garden-variety MAX232A.

A similar IC, the MAX3232 is nowadays available for low-power 3V logic.

MAX232(A) DIP Package

No.  Name   Purpose                              Signal voltage                         Capacitor (MAX232)  Capacitor (MAX232A)

1    C1+    + connector for capacitor C1         capacitor rated at least 16 V          1 µF                100 nF

2    V+     output of voltage pump               +10 V, capacitor rated at least 16 V   1 µF to VCC         100 nF to VCC

3    C1-    - connector for capacitor C1         capacitor rated at least 16 V          1 µF                100 nF

4    C2+    + connector for capacitor C2         capacitor rated at least 16 V          1 µF                100 nF

5    C2-    - connector for capacitor C2         capacitor rated at least 16 V          1 µF                100 nF

6    V-     output of voltage pump / inverter    -10 V, capacitor rated at least 16 V   1 µF to GND         100 nF to GND

7    T2out  Driver 2 output                      RS-232

8    R2in   Receiver 2 input                     RS-232

9    R2out  Receiver 2 output                    TTL

10   T2in   Driver 2 input                       TTL

11   T1in   Driver 1 input                       TTL

12   R1out  Receiver 1 output                    TTL

13   R1in   Receiver 1 input                     RS-232

14   T1out  Driver 1 output                      RS-232

15   GND    Ground                               0 V                                    1 µF to VCC         100 nF to VCC

16   VCC    Power supply                         +5 V                                   see above           see above

V+ (pin 2) is also connected to VCC via a capacitor (C3). V- (pin 6) is connected to GND via a capacitor (C4).

GND (pin 15) and VCC (pin 16) are also connected by a capacitor (C5), placed as close as possible to the pins.

A Typical Application

The MAX232(A) has two receivers (which convert from RS-232 to TTL voltage levels) and two drivers (which convert

from TTL logic to RS-232 voltage levels). This means only two of the RS-232 signals can be converted in each

direction. The old MC1488/1489 combo provided four drivers and receivers.
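As a rough illustration of what the drivers and receivers do, the following Python sketch models the level conversion in an idealized way. It assumes the +10 V / -10 V pump outputs listed in the pin table, and it also assumes the usual RS-232 convention (not stated in the text) that a logic 1 is sent as a negative voltage and a logic 0 as a positive voltage, with nominal thresholds of +3 V and -3 V. The function names are hypothetical.

```python
# Idealized model of MAX232 level conversion (illustrative only).
# Assumptions: drivers swing to about +/-10 V (the V+ / V- pump outputs),
# RS-232 polarity is inverted relative to TTL, and the receiver uses the
# nominal RS-232 thresholds of +3 V / -3 V.

V_PUMP = 10.0           # magnitude of the pump output in volts (from the pin table)
RS232_THRESHOLD = 3.0   # nominal RS-232 decision threshold in volts (assumption)


def driver(ttl_bit: int) -> float:
    """TTL logic level (0 or 1) -> idealized RS-232 line voltage."""
    if ttl_bit not in (0, 1):
        raise ValueError("ttl_bit must be 0 or 1")
    # Logic 1 is negative on the line, logic 0 is positive.
    return -V_PUMP if ttl_bit == 1 else +V_PUMP


def receiver(line_voltage: float) -> int:
    """Idealized RS-232 line voltage -> TTL logic level (0 or 1)."""
    if line_voltage >= RS232_THRESHOLD:
        return 0
    if line_voltage <= -RS232_THRESHOLD:
        return 1
    raise ValueError("voltage lies in the undefined region between the thresholds")
```

A bit passed through a driver and then a receiver comes back unchanged, which is the whole job of the IC: only the voltage levels differ on the cable side.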


Typically, one driver/receiver pair of the MAX232 is used for TX and RX, and the second pair for CTS and RTS.

There are not enough drivers/receivers in the MAX232 to also connect the DTR, DSR, and DCD signals.

Usually these signals can be omitted, e.g. when communicating with a PC's serial interface. If the DTE really

requires these signals, either a second MAX232 is needed, or some other IC from the MAX232 family can be

used (if it can be found in consumer electronic shops at all). An alternative for DTR/DSR is also given below.

The circuitry is completed by connecting five capacitors to the IC as follows. The MAX232 needs 1.0 µF

capacitors, the MAX232A needs 0.1 µF capacitors. MAX232 clones show similar differences. It is

recommended to consult the corresponding data sheet. Capacitors rated for at least 16 V should be used. If

electrolytic or tantalum capacitors are used, the polarity has to be observed. The first pin listed in the following

table is always the one to which the positive pole of the capacitor should be connected.


Capacitor   + Pin   - Pin   Remark

C1          1       3

C2          4       5

C3          2       16

C4          GND     6       This looks non-intuitive, but because pin 6 is at -10 V, GND gets the + connector, and not the -.

C5          16      GND

The 5V power supply is connected to

+5V: Pin 16

GND: Pin 15
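The capacitor wiring above can be captured in a small lookup table. The Python sketch below is only an illustration: the dictionary layout and the function name are hypothetical, while the pin numbers, capacitor values, and 16 V rating are transcribed from the tables above.

```python
# Capacitor wiring for the MAX232 / MAX232A, transcribed from the tables above.
# Each entry maps a capacitor to its (+ pin, - pin); GND is pin 15, VCC is pin 16.
CAPACITOR_PINS = {
    "C1": (1, 3),
    "C2": (4, 5),
    "C3": (2, 16),
    "C4": ("GND", 6),    # pin 6 sits at -10 V, so GND is the positive side
    "C5": (16, "GND"),
}

# Required capacitance in nanofarads for each IC variant.
CAPACITANCE_NF = {"MAX232": 1000, "MAX232A": 100}

MIN_VOLTAGE_RATING_V = 16


def wiring(variant: str) -> list[str]:
    """Return one human-readable wiring line per capacitor."""
    value_nf = CAPACITANCE_NF[variant]
    return [
        f"{name}: {value_nf} nF, >= {MIN_VOLTAGE_RATING_V} V, + to {plus}, - to {minus}"
        for name, (plus, minus) in CAPACITOR_PINS.items()
    ]


for line in wiring("MAX232A"):
    print(line)
```

For a clone IC, the values in `CAPACITANCE_NF` would have to be replaced by the ones from that manufacturer's data sheet.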

The output of the VT pin is high only while a valid transmission is being received; otherwise it is always low.

Output type: There are two types of output to select from:

Momentary type: The data outputs follow the encoder during a valid transmission and then reset.

Latch type: The data outputs follow the encoder during a valid transmission and are then latched in that state.

4.3 Microcontroller AT89S52

4.3.1 Overview

The AT89S52 is a member of the MCS-51/52 family equipped with an internal 8 Kbyte Flash

EPROM (Erasable Programmable Read-Only Memory), which allows the program memory to be

reprogrammed. Atmel designed the AT89S52 to be compatible with the standard 80C51 instruction set and pin

layout.

AT89S52 Microcontroller Features :

An 8-bit CPU (Central Processing Unit).

256 bytes of internal RAM (Random Access Memory).

Four I/O ports of eight bits each.

Internal oscillator and timing circuits.

Three 16-bit timer/counters.

Six interrupt vectors (two external interrupts and four internal ones: Timer 0, Timer 1, Timer 2, and the serial port).

A serial port with a full-duplex UART (Universal Asynchronous Receiver Transmitter).

Hardware support for multiplication, division, and Boolean operations.

8 Kbyte of Flash EPROM for program memory.

The shortest instruction execution time (one machine cycle) is 0.5 µs at a 24 MHz clock frequency.

With a 12 MHz clock, the instruction execution time is 1 µs.
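The two timing figures above follow from the fact that a machine cycle of the classic MCS-51 core takes 12 oscillator periods. That factor is a property of the core rather than something stated in the list, and the function name below is hypothetical; the sketch simply reproduces the arithmetic.

```python
CLOCKS_PER_MACHINE_CYCLE = 12  # classic MCS-51 core (assumption, not from the list above)


def machine_cycle_us(clock_hz: float) -> float:
    """Duration of one machine cycle in microseconds."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return CLOCKS_PER_MACHINE_CYCLE / clock_hz * 1e6


for mhz in (12, 24):
    print(f"{mhz} MHz -> {machine_cycle_us(mhz * 1e6):.2f} us per machine cycle")
```

Instructions that need more than one machine cycle take a corresponding multiple of these times.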


4.3.2 Pin Configuration

The AT89S52 microcontroller has 40 pins and runs from a single 5 V power supply. The 40-pin layout is

illustrated as follows.

Figure: AT89S52 microcontroller pin diagram

The function of each AT89S52 pin is:

Pins 1 to 8 (Port 1) form an 8-bit bidirectional parallel port that can be used for general purposes.

Pin 9 is the reset pin; reset is active when the pin is held high.

Pins 10 to 17 (Port 3) also serve the following alternate functions:

P3.0 (10): RXD (serial port data receive)

P3.1 (11): TXD (serial port data transmit)

P3.2 (12): INT0 (external interrupt 0 input, active low)

P3.3 (13): INT1 (external interrupt 1 input, active low)

P3.4 (14): T0 (external input of timer/counter 0)

P3.5 (15): T1 (external input of timer/counter 1)

P3.6 (16): WR (write, active low), the control signal for writing data from Port 0 to external data memory and input/output devices.

P3.7 (17): RD (read, active low), the control signal for reading data from external data memory and input/output devices into Port 0.

Pin 18 (XTAL2) is the output of the oscillator amplifier and is connected to the crystal.

Pin 19 (XTAL1) is the input to the high-gain oscillator amplifier and is also connected to the crystal.

Pin 20 (Vss) is connected to 0 V, i.e. the ground of the circuit.

Pins 21 to 28 (Port 2) form an 8-bit bidirectional parallel port. This port outputs the high-order address byte when external memory is accessed.

Pin 29 (PSEN, Program Store Enable, active low) is the read signal used to fetch program code from external memory (ROM/EPROM) into the microcontroller.

Pin 30 (ALE, Address Latch Enable) latches the address when external memory is accessed. This pin also functions as PROG (active low), which is activated when the on-chip flash program memory is being programmed.

Pin 31 (EA, External Access) selects the program memory to be used: the internal program memory (EA = Vcc) or external program memory (EA = Vss). It also serves as Vpp (programming supply voltage) when the on-chip flash memory is being programmed.

Pins 32 to 39 (Port 0) form an 8-bit bidirectional parallel port, which also functions as the multiplexed address/data bus for accessing external program and data memory.

Pin 40 (Vcc) is connected to +5 V as the supply for the microcontroller.

All single-chip devices in the MCS-51 family have separate address spaces for program and data memory. This separation

allows data memory to be accessed with 8-bit addresses. Even so, 16-bit data memory

addresses can be generated through the DPTR register (Data Pointer Register).

Program memory can only be read and cannot be written during program execution, because it is stored in the EPROM. In the

AT89S52, 8 Kbyte of this EPROM is available on chip.


Figure: AT89S52 microcontroller memory


CHAPTER 5

Transmission Of Digitized Speech Over Wireless Network

5.1 Overview Of GSM Modem

A GSM modem is a specialized type of modem which accepts a SIM card, and operates

over a subscription to a mobile operator, just like a mobile phone. From the mobile operator

perspective, a GSM modem looks just like a mobile phone.

When a GSM modem is connected to a computer, this allows the computer to use the GSM

modem to communicate over the mobile network. While these GSM modems are most frequently

used to provide mobile internet connectivity, many of them can also be used for sending and

receiving SMS and MMS messages.

For the purpose of this project, the term GSM modem is used as a generic term to refer to

any modem that supports one or more of the protocols in the GSM evolutionary family, including

the 2.5G technologies GPRS and EDGE, as well as the 3G technologies WCDMA, UMTS,

HSDPA and HSUPA.

A GSM modem exposes an interface that allows applications such as NowSMS to send and

receive messages over the modem interface. The mobile operator charges for this message sending

and receiving as if it was performed directly on a mobile phone. To perform these tasks, a GSM

modem must support an extended AT command set for sending/receiving SMS messages, as

defined in the ETSI GSM 07.05 and 3GPP TS 27.005 specifications.
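To make the AT command interface more concrete, the sketch below builds the byte sequences for sending one SMS in text mode. AT+CMGF and AT+CMGS are commands from the specifications named above; the function name, the phone number, and the message text are hypothetical, and the code only constructs the bytes rather than talking to a real modem.

```python
CTRL_Z = b"\x1a"  # terminates the message body after AT+CMGS


def build_sms_commands(number: str, text: str) -> list[bytes]:
    """Return the byte strings to write to the modem, in order.

    A real program would wait for "OK" after the first command and for
    the "> " prompt after AT+CMGS before sending the message body.
    """
    if len(text) > 160:
        raise ValueError("a single text-mode SMS is limited to 160 characters")
    return [
        b"AT+CMGF=1\r",                           # select SMS text mode
        f'AT+CMGS="{number}"\r'.encode("ascii"),  # destination address
        text.encode("ascii") + CTRL_Z,            # message body, then Ctrl+Z
    ]


# Hypothetical number and command text, for illustration only.
for chunk in build_sms_commands("+10000000000", "ALPHA DEVICE SIX ON"):
    print(chunk)
```

Keeping the command construction separate from the serial I/O makes it possible to check the sequence without a modem attached.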

GSM modems can be a quick and efficient way to get started with SMS, because a special

subscription to an SMS service provider is not required. In most parts of the world, GSM modems

are a cost effective solution for receiving SMS messages, because the sender is paying for the

message delivery.


5.2 Overview of SIM 300

For the purpose of our project, we use a SIM 300 GSM modem at both the

transmitter and the receiver side.

5.2.1 FEATURES

COMPLETE GSM MODEM

HANDLES VOICE / DATA / SMS / FAX

DUAL BAND 900 / 1800 MHz GSM TRANSMISSION

ACCEPTS STANDARD SIM CARD

CAN BE USED ON STANDARD GSM NETWORK

RS232 INTERFACE

USES STANDARD AT COMMANDS

SUPPORTS CLASS 1 FAX COMMANDS

DATA TRANSMISSION UP TO 14400 BAUD


CHAPTER 6

RESULTS

Fig. 4 shows the speech signal waveform for the spoken command phrase "Alpha Device Six On"

and its energy. Fig. 5 shows the plot of the MFCC coefficients {Cmel(1), Cmel(2), Cmel(3), Cmel(4)}

for the spoken word "Alpha". Fifty samples of each spoken word are stored, of which 25 are

used for training and the remaining 25 for testing. Thus a database of 650 word samples is used for

experimentation. The accuracy of correct recognition for the various words in spoken commands with

principal component analysis (PCA) is shown in Table II. The accuracy for the words "on" and

"one" is relatively low because of the phonetic similarity of these words. However, these words can

easily be discriminated by their different positions in the spoken command phrase.
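The dataset figures and the position argument can be checked with a short sketch. The vocabulary size of 13 is inferred here from 650 / 50 and is not stated in the text. The four-slot phrase layout (name, "device", number, action) is an assumption based on the example phrase "Alpha Device Six On", and the function name is hypothetical.

```python
SAMPLES_PER_WORD = 50
TRAIN_PER_WORD = 25
TOTAL_SAMPLES = 650

num_words = TOTAL_SAMPLES // SAMPLES_PER_WORD   # distinct words in the vocabulary
train_total = num_words * TRAIN_PER_WORD        # samples used for training
test_total = TOTAL_SAMPLES - train_total        # samples left for testing

# Assumed phrase layout: only some words are legal in each slot, so a
# recognizer that confuses "on" and "one" can be corrected by position.
NUMBER_SLOT = 2   # third word, e.g. "six"
ACTION_SLOT = 3   # fourth word, e.g. "on"


def resolve_on_one(word: str, slot: int) -> str:
    """Map an ambiguous "on"/"one" result to the word that is valid in this slot."""
    if word in ("on", "one"):
        if slot == NUMBER_SLOT:
            return "one"
        if slot == ACTION_SLOT:
            return "on"
    return word
```

With this rule, a recognizer output of "one" in the action slot is read as "on", which is why the lower per-word accuracy need not hurt the accuracy of the whole command.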

FIGURE: Plot of spoken phrase alpha device six on and its energy


FIGURE: Plot of MFCC coefficients for spoken word alpha

CHAPTER 7


CONCLUSION AND FUTURE ASPECTS

In the near future, speech recognition will become the method of choice for controlling

appliances, toys, tools, computers and robotics. There is a huge commercial market waiting for this

technology to mature.

This project demonstrates in detail the construction of a stand-alone trainable

speech recognition circuit that may be interfaced to control just about anything electrical, such as

appliances, robots, test instruments, VCRs, TVs, etc. With suitable modifications the project can

be extended to various industrial automation tasks. Controlling and commanding an appliance (computer,

VCR, TV, security system, etc.) by speaking to it makes the device easier to use, while increasing the

efficiency and effectiveness of working with that device.

At its most basic level, speech recognition allows the user to perform parallel tasks (i.e. while hands and

eyes are busy elsewhere) while continuing to work with the computer or appliance. Remote

control of devices and retrieval of information about the present status of inputs using spoken

commands have been successfully demonstrated. There is scope for a lot of improvement depending

upon user requirements, such as the inclusion of a greater number of commands, the selection of

suitable sensors for the measurement of analog parameters, etc. This approach can easily be extended to

develop many exciting products, from remote process control to high-end security solutions. It can

prove to be a great boon to blind or physically handicapped persons due to its capability for remote

control through speech commands.


ACCURACY OF WORD RECOGNITION:

However, this method is not suitable for time-critical applications, as the message transfer time to the

destination is variable. This problem can be alleviated to a certain extent by adding the time at which the

device should respond to the message and sending the SMS well in advance of the

scheduled event. The software can be modified in this case to check whether timing information is

present and schedule the event accordingly. Alternatively, it is recommended to use data calls

(Fax/GPRS) or DTMF-based calls for an immediate response from the system, with suitable

modifications. The accuracy of the spoken command recognition system is about 98% (much better

than our previous work [17]). Moreover, the spoken phrase can be extended to carry out additional

tasks, like adding a time duration for the Device ON/OFF condition. Further, adding a speaker

verification feature can enhance the security level. Presently a PC is used for generating the text

message from the voice command. For dedicated applications, the PC can be replaced by a DSP processor or

FPGA-based system, at a higher initial development cost.
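One way to carry timing information in the message, as suggested above, is sketched below. The message format ("<command> AT HH:MM") is purely hypothetical, since the text does not define one, and so are the function name and the example command.

```python
from datetime import time
from typing import Optional, Tuple


def parse_command(sms_text: str) -> Tuple[str, Optional[time]]:
    """Split an SMS into the command and an optional execution time.

    Hypothetical format: "<command> AT HH:MM", e.g. "ALPHA DEVICE SIX ON AT 18:30".
    Messages without the " AT HH:MM" suffix carry no timing information,
    so the returned time is None and the command can be executed at once.
    """
    command, sep, stamp = sms_text.strip().upper().rpartition(" AT ")
    if not sep:
        # rpartition leaves the whole string in the last field when " AT " is absent.
        return stamp, None
    hours, minutes = stamp.split(":")
    return command, time(int(hours), int(minutes))


print(parse_command("Alpha Device Six On at 18:30"))
print(parse_command("Alpha Device Six On"))
```

The receiving software would compare the returned time with its own clock and either act immediately or queue the command until the scheduled moment.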

CHAPTER 8

BIBLIOGRAPHY:


1. S. B. Davis and P. Mermelstein, "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences," IEEE Transactions on Acoustics, Speech & Signal Processing, vol. ASSP-28, Aug. 1980, pp. 357-366.

2. Thomas F. Quatieri, "Discrete-Time Speech Signal Processing: Principles and Practice," Pearson Education (Singapore) Pvt. Ltd., Indian Branch, Delhi, India, 2004.

3. N. P. Jawarkar, Vasif Ahmed & R. D. Thakare, "Remote world-wide control through SMS using Nokia Mobile," IETE Journal of Education, vol. 46, no. 4, Oct.-Dec. 2005, pp. 165-170.

4. http://forum.nokia.com, "AT Commands Set for Nokia GSM and WCDMA products," Version 1.2, July 2005.

5. http://www.atmel.com/avr

6. G. M. White and R. B. Neely, "Speech Recognition Experiments with Linear Prediction, Bandpass Filtering & Dynamic Programming," IEEE Transactions on Acoustics, Speech & Signal Processing, vol. ASSP-24(2), 1976, pp. 183-188.

7. L. R. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition," Pearson Education (Singapore) Pvt. Ltd., 2005.

8. S. Umesh, L. Cohen and D. Nelson, "Frequency warping and the Mel scale," IEEE Signal Processing Letters, vol. 9, no. 3, March 2001, pp. 104-107.

9. J. Hollmen, V. Tresp and O. Simula, "A Learning Vector Quantization Algorithm for Probabilistic Models," Proceedings of EUSIPCO 2000, X European Signal Processing Conference, vol. II, pp. 721-724.


10. T. Kohonen, "Improved Versions of Learning Vector Quantization," International Joint Conference on Neural Networks, San Diego, CA, 1990, vol. I, pp. 545-550.

11. N. N. Katugampala, K. T. Al-Naimi, S. Villette and A. M. Kondoz, "Real Time Data Transmission over GSM Voice Channel for Secure Voice & Data Applications," University of Surrey, United Kingdom.
