Final Neww

  • 7/31/2019 Final Neww

    1/37

    CHAPTER 1

    INTRODUCTION

    There has been a tremendous rise in wireless communication over the last two decades, with more
    than 2 billion mobile users worldwide. Owing to the widespread growth of cellular networks, a
    drastic reduction in call rates, and the availability of low-end handsets, mobile usage has
    percolated to all sections of society. Any mobile phone with a messaging facility and support for
    common AT commands can be used in this system. The Nokia 6610 was chosen because it supports AT
    commands. Most mobile manufacturers, such as Siemens, Motorola, LG, and Samsung, also provide AT
    command capability.

    SMS is a store-and-forward way of transmitting messages to and from mobiles. Each short
    message can be no longer than 160 characters of 7-bit text (or 140 bytes of binary data). Since
    SMS uses signaling channels instead of dedicated data channels for transmission and reception,
    these messages can be sent and received simultaneously with voice, fax, and data services over
    the GSM network. The major advantages of SMS are that the sender can be notified when the message
    is delivered at the destination, and that the SMSC keeps trying to deliver the message for the
    specified validity period if the network is busy or the user is outside the coverage area.

    A system has been developed for remote control of various electrical devices from a mobile
    phone through spoken commands. The system offers several attractive features:

    Control from anywhere in the world where cellular coverage is available

    Acknowledgement of command execution from the system to the user

    Use of spoken commands from the user for control

    Ease of implementation and cost effectiveness

    1.1 Overview

    On the user side, a microphone translates the voice signal into an electrical signal. The
    microphone is connected to the HM2007, a voice recognition module.

    In this approach, predetermined phrases are selected for the various commands. Mel cepstrum
    features are extracted from the spoken words for recognition. The Mel cepstrum exploits auditory
    principles as well as the discriminating property of the cepstrum, and has proven to be one of
    the most successful feature representations in speech-related recognition tasks [1, 2]. The
    spoken words are isolated and recognized after feature extraction. A Learning Vector Quantization
    neural network is used to recognize the words in a command. A text message is generated if all
    spoken words are identified as per the specified format. This message is transmitted as an SMS
    to the control system mobile using AT commands [3].

    On the control side, the system mobile is connected to an AVR microcontroller based system
    through an RS-232C cable. The process block consists of 8 digital output ports, 8 digital input
    ports, and one analog input port. The number of digital inputs, digital outputs, and analog
    inputs can be varied as per the needs of the application. Presently, LEDs indicate the status of
    the digital output ports, DIP switches change the status of the digital input ports, and a
    potential divider is provided to vary the analog input voltage.

    1.2 AT COMMANDS

    An extensive list of mobile-related AT commands is now available for carrying out various
    activities such as sending SMS, using GPRS services, sending fax, controlling speaker volume,
    and indicating battery status [4-6]. Each AT command is sent through the serial port to the
    mobile as a text string that begins with the characters "AT", followed by the specific command
    string, and is executed on receipt of a carriage return. The mobile sends result codes to the
    Terminal Equipment (TE) to indicate the response after the command is executed. A text message
    is sent from the mobile using the CMGS command. The CNMI command is used to notify the TE when
    an incoming SMS message is received from the network.
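    The command framing described above can be sketched as follows. This is a minimal
    illustration, not the project's firmware: the helper name build_sms_commands is hypothetical,
    and the sketch assumes the phone accepts text mode (AT+CMGF=1, a standard command that the
    report does not mention explicitly). It only builds the byte strings that would be written to
    the serial port; no serial library is used.

```python
CTRL_Z = b"\x1a"  # in text mode, Ctrl-Z terminates the message body


def build_sms_commands(number, text):
    """Return the byte strings to write to the phone, in order (hypothetical helper)."""
    if len(text) > 160:
        raise ValueError("a single SMS holds at most 160 characters")
    return [
        b"AT+CMGF=1\r",                            # assumption: select SMS text mode
        f'AT+CMGS="{number}"\r'.encode("ascii"),   # start a message to this number
        text.encode("ascii") + CTRL_Z,             # message body, ended by Ctrl-Z
    ]


if __name__ == "__main__":
    for chunk in build_sms_commands("+10000000000", "DEVICE 1 ON"):
        print(chunk)
```

    Each command ends with a carriage return because the phone executes a command only when it
    receives one; in practice the sender would also wait for the phone's result code between steps.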

    On receipt of an SMS message, the text words are checked against a predetermined format, which
    includes a password and the desired device ON/OFF command or status query. After interpreting a
    valid control message, the microcontroller carries out the specified task and then sends an SMS
    to a pre-specified mobile number, either acknowledging fulfillment of the command or reporting
    an error during its execution. A variety of further commands is available, such as storing
    predefined messages in phone memory, sending a stored message at the appropriate time by calling
    its message number depending on present conditions, storing incoming SMS in phone memory, and
    deleting a message after the command is executed. It was decided not to use these features, to
    ensure easy adaptation to any mobile model with limited AT command interpretation capability.
    So in our case, any incoming SMS message is routed directly to the microcontroller (TE), and any
    outgoing text message is sent directly by the microcontroller to the designated mobile number
    without being stored in the memory of the control system mobile phone.
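    The format check on an incoming message can be illustrated with a short sketch. The report
    does not give the exact message layout, so the three-word format "<password> <device number>
    <ON|OFF|STATUS>" and the function name parse_control_message are assumptions made only for
    this example.

```python
def parse_control_message(text, password):
    """Return (device, action) for a valid message, or None if it is rejected.

    Assumed (hypothetical) format: "<password> <device number> <ON|OFF|STATUS>".
    """
    words = text.split()
    if len(words) != 3:
        return None
    if words[0] != password:          # password check is case-sensitive here
        return None
    device, action = words[1], words[2].upper()
    if not device.isdigit() or action not in ("ON", "OFF", "STATUS"):
        return None
    return int(device), action


if __name__ == "__main__":
    print(parse_control_message("abcd 3 on", "abcd"))
    print(parse_control_message("wrong 3 on", "abcd"))
```

    A rejected message (None) would correspond to the error report that the microcontroller sends
    back by SMS, while a valid tuple would drive the matching output port or status query.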

    CHAPTER 2

    SPEECH RECOGNITION

    2.1 Design Introduction

    With the arrival of the globalization, networking, information, and digital eras, the demand
    for highly reliable identity verification is growing. An efficient means to this end is to
    authenticate users through biometric methods. Among the existing biometric methods, voice
    biometrics can be an affordable and accurate authentication technology that has already been
    successfully and widely employed. The voiceprint, as a basic human physiological characteristic,
    plays a unique role because it is difficult to counterfeit, imitate, or replace. As a
    non-contact identification technology, voice recognition is readily accepted by users.

    Voice authentication refers to the process of accepting or rejecting the identity claim
    of a speaker on the basis of individual information present in the speech waveform. It
    has received increasing attention over the past two decades as a convenient, user-friendly way
    of replacing (or supplementing) standard password-type matching. The authentication procedure
    asks the user to pronounce a random sequence of digits. After the speech is captured and voice
    features are extracted, the individual voice characteristics are generated by the registration
    algorithm. The central processing unit decides whether the received features match the stored
    voiceprint of the customer the speaker claims to be, and grants authentication accordingly.

    2.2 Voice Recognition Technology Principle

    Voice recognition, also known as speaker recognition, has two categories:
    speaker identification and speaker verification. Speaker identification determines which one
    of a group of people is speaking, i.e. "one-out-of-many selection", and speaker verification
    determines whether a specified person is speaking, i.e. "one-to-one recognition".

    According to the speech material used, voice recognition can be divided into text-dependent
    and text-independent technology. A text-dependent voice recognition system requires the speaker
    to pronounce a prescribed text, so that each person's individual sound profile model can be
    established accurately. During recognition, the speaker must also pronounce the prescribed text
    to achieve the best effect. A text-independent recognition system does not require fixed word
    content; it is relatively difficult to model, but is convenient for the user and can be applied
    to a wide range of uses. Voiceprint recognition is an application based on the physiological and
    behavioral characteristics of the speaker's voice and linguistic patterns. Unlike speech
    recognition, voiceprint recognition is independent of the content of the speech. Rather, the
    unique features of the voice are analyzed to identify the speaker. The unique features are
    extracted from voice samples and converted to digital symbols, and these symbols are stored as
    that person's character template. This template is stored in a computer database, a smart card,
    or a bar-coded card. User authentication is processed inside the recognition system, which
    decides whether or not the voice matches the template.

    2.3 Classification of Speech Recognition System

    Speech recognition systems have different design and performance requirements depending on the
    point of view and the scope of the application. They can be classified as follows:

    Isolated-word, connected-word, and continuous speech recognition systems, and speech
    understanding and conversation systems

    Large-vocabulary and small-vocabulary systems

    Speaker-specific and non-specific speech recognition systems

    CHAPTER 3

    DYNAMIC TIME WARPING

    3.1 Principle

    A distance measurement between time series is needed to determine the similarity between time
    series and for time series classification. Euclidean distance is an efficient distance
    measurement that can be used. The Euclidean distance between two time series is computed from
    the sum of the squared differences between each nth point in one time series and the nth point
    in the other. The main disadvantage of using Euclidean distance for time series data is that its
    results can be very unintuitive. If two time series are identical, but one is shifted slightly
    along the time axis, then Euclidean distance may consider them to be very different from each
    other. Dynamic time warping (DTW) was introduced to overcome this limitation and give intuitive
    distance measurements between time series by ignoring both global and local shifts in the time
    dimension.

    Problem Formulation. The dynamic time warping problem is stated as follows:
    Given two time series X and Y, of lengths |X| and |Y|,

    $$X = x_1, x_2, \ldots, x_i, \ldots, x_{|X|}$$

    $$Y = y_1, y_2, \ldots, y_j, \ldots, y_{|Y|}$$

    construct a warp path W

    $$W = w_1, w_2, \ldots, w_K, \qquad \max(|X|, |Y|) \le K < |X| + |Y|$$

    where K is the length of the warp path and the kth element of the warp path is

    $$w_k = (i, j)$$

    where i is an index from time series X, and j is an index from time series Y. The warp path must
    start at the beginning of each time series at w_1 = (1, 1) and finish at the end of both time
    series at w_K = (|X|, |Y|). This ensures that every index of both time series is used in the
    warp path. There is also a constraint on the warp path that forces i and j to be monotonically
    increasing in the warp path, which is why the lines representing the warp path in Figure 1 do
    not overlap. Every index of each time series must be used. Stated more formally:

    $$w_k = (i, j), \quad w_{k+1} = (i', j'), \qquad i \le i' \le i + 1, \quad j \le j' \le j + 1$$

    The optimal warp path is the minimum-distance warp path, where the distance of a warp path W is

    $$\mathrm{Dist}(W) = \sum_{k=1}^{K} \mathrm{Dist}(w_{ki}, w_{kj})$$

    Dist(W) is the distance (typically Euclidean distance) of warp path W, and Dist(w_{ki}, w_{kj})
    is the distance between the two data point indexes (one from X and one from Y) in the kth
    element of the warp path.
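    The minimum-distance warp path is usually found by dynamic programming. The sketch below is a
    minimal illustration for one-dimensional series, assuming the absolute difference as the
    point-to-point distance; the function name dtw_distance is chosen for this example only. It
    returns the distance of the optimal warp path rather than the path itself.

```python
def dtw_distance(x, y):
    """Distance of the optimal warp path between two 1-D sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimum warp-path distance between x[:i] and y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])        # assumed point distance
            # each index advances by 0 or 1 per step, so the path stays monotonic
            cost[i][j] = d + min(cost[i - 1][j],       # advance in x only
                                 cost[i][j - 1],       # advance in y only
                                 cost[i - 1][j - 1])   # advance in both
    return cost[n][m]


if __name__ == "__main__":
    # The second series is the first one shifted by one sample.
    print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3]))
```

    The shifted pair in the usage example has a DTW distance of zero, which is the behaviour the
    text contrasts with Euclidean distance.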

    3.2 Mel-Frequency Cepstral Coefficients

    The Mel-cepstrum exploits auditory principles as well as decorrelating property of the

    cepstrum. In MFCC implementation, triangular filters are used. These filters follow Mel scale

    whereby band edges and corner frequencies are linear for low frequencies (

    Finally, the discrete cosine transform (DCT) of the log filter bank energies is taken to obtain
    the MFCCs:

    $$C_{mel}(k) = \sum_{l=1}^{N} \log\{E_{mel}(l)\} \cos\left[\frac{k\,(l - 0.5)\,\pi}{N}\right]$$

    where log{E_mel(l)} are the log filter bank energies, C_mel(k) is the kth MFCC, and N is the
    number of filters. Performance has been observed to be reasonably good with 24 filters in the
    MFCC implementation [14]. If there are n frames in a word and 12 MFCCs are computed for each
    frame, we get a feature vector of length 12 × n. However, the number of frames (n) varies from
    word to word, which in turn changes the length of the feature vector. To obtain a feature vector
    of constant length, the n values of each Mel frequency cepstral coefficient are converted into
    10 values using a resampling technique. Thus, for each word, a constant-length feature vector of
    120 (12 × 10) elements is obtained. Principal Component Analysis (PCA) is then carried out on
    the MFCC data. PCA transforms the input data so that the elements of the input vectors are
    uncorrelated.
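    The fixed-length feature vector can be illustrated with a short sketch. The report does not
    say which resampling technique is used, so linear interpolation is assumed here, and the
    function names resample and word_feature_vector are hypothetical.

```python
def resample(values, target_len=10):
    """Resample a sequence to target_len points by linear interpolation (assumed method)."""
    n = len(values)
    if n == 1:
        return [float(values[0])] * target_len
    out = []
    for t in range(target_len):
        pos = t * (n - 1) / (target_len - 1)   # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def word_feature_vector(frames, target_len=10):
    """frames: n rows of 12 MFCCs each. Returns a vector of 12 * target_len values."""
    vector = []
    for k in range(len(frames[0])):
        track = [frame[k] for frame in frames]   # the n values of coefficient k
        vector.extend(resample(track, target_len))
    return vector


if __name__ == "__main__":
    frames = [[float(i + k) for k in range(12)] for i in range(25)]   # dummy MFCCs
    print(len(word_feature_vector(frames)))
```

    Whatever the number of frames in the word, the result has 12 × 10 = 120 elements, which is the
    constant-length vector described above.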

    3.3 LVQ Classifier:

    LVQ is an algorithm for learning classifiers from labeled data samples. It models the
    discrimination function defined by a set of labeled codebook vectors and a nearest-neighbor
    search between the codebook and the data. In classification, a data point x_i is assigned
    to a class according to the class label of the closest codebook vector. The training algorithm
    involves an iterative gradient update of the winner unit [15, 16].

    The winner unit w_c is defined by

    $$c = \arg\min_k \lVert x_i - w_k \rVert$$

    The update equation for the winner unit w_c, defined by the nearest neighbor and a data sample
    x(t), is

    $$w_c(t+1) = w_c(t) \pm \alpha(t)\,[x(t) - w_c(t)]$$

    where the sign depends on whether the data sample is correctly classified (+) or misclassified
    (-), and alpha(t) is the learning rate, which must decrease monotonically in time.
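    The winner search and the update rule above can be sketched in a few lines. This is a minimal
    LVQ1-style illustration with plain Python lists; the function names nearest and lvq1_step are
    chosen for this example, and the schedule for alpha is left to the caller.

```python
def nearest(codebook, x):
    """Index c of the codebook vector closest to x (the winner unit)."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def lvq1_step(codebook, labels, x, x_label, alpha):
    """Apply one update to the winner unit in place and return its index."""
    c = nearest(codebook, x)
    # move toward the sample if the winner's label is correct, away otherwise
    sign = 1.0 if labels[c] == x_label else -1.0
    codebook[c] = [w + sign * alpha * (a - w) for w, a in zip(codebook[c], x)]
    return c


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    labels = ["a", "b"]
    lvq1_step(codebook, labels, [0.2, 0.0], "a", alpha=0.5)
    print(codebook)
```

    In training, lvq1_step would be called repeatedly over the labeled samples while alpha is
    reduced from one pass to the next.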

    CHAPTER 4

    Hardware Implementations

    4.1 Transmitter Part

    On the transmitter side, a microphone translates the voice signal into an electrical signal.
    The microphone is connected to the HM2007, a voice recognition module.

    In this approach, predetermined phrases are selected for the various commands. Mel cepstrum
    features are extracted from the spoken words for recognition. The Mel cepstrum exploits auditory
    principles as well as the discriminating property of the cepstrum, and has proven to be one of
    the most successful feature representations in speech-related recognition tasks [1, 2]. The
    spoken words are isolated and recognized after feature extraction. A Learning Vector Quantization
    neural network is used to recognize the words in a command. A text message is generated if all
    spoken words are identified as per the specified format. This message is transmitted as an SMS
    to the control system mobile using AT commands.

    Figure: BLOCK DIAGRAM OF TRANSMITTER

    4.1.1 NEURAL NETWORK FOR SPEECH RECOGNITION:

    4.1.1.1 Overview of Neural Networks:

    Artificial neural networks are computers whose architecture is modeled

    after the brain. They typically consist of many hundreds of simple processing units

    which are wired together in a complex communication network. Each unit

    or node is a simplified model of a real neuron which fires (sends off a new signal) if

    it receives a sufficiently strong input signal from the other nodes to which it is

    connected. The strength of these connections may be varied in order for the

    network to perform different tasks corresponding to different patterns of node firing

    activity. This structure is very different from traditional computers.
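    The firing behaviour of a single node can be illustrated with a simple threshold unit. This
    is a deliberately simplified sketch; the function name neuron_fires and the numeric weights
    are illustrative assumptions, not values from the project.

```python
def neuron_fires(inputs, weights, threshold):
    """A node fires when its weighted input sum reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation >= threshold


if __name__ == "__main__":
    weights = [0.6, 0.6]                             # connection strengths
    print(neuron_fires([1, 1], weights, 1.0))        # both inputs active
    print(neuron_fires([1, 0], weights, 1.0))        # only one input active
```

    Changing the weights changes which input patterns make the node fire, which is the sense in
    which varying connection strengths lets the network perform different tasks.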

    4.1.1.2 Fundamentals of Neural Network:

    There are many different types of neural networks, but they all have four

    basic attributes:

    A set of processing units;

    A set of connections;

    A computing procedure;

    A training procedure.

    4.1.1.3 Artificial neural networks from the viewpoint of speech recognition:

    Artificial neural networks (ANNs) are systems consisting of interconnected computational
    nodes that work somewhat like human neurons. Neural networks can be used, for example, to
    approximate functions or to classify data into similar classes, which in the speech recognition
    domain can be phonemes, sub-phoneme units, syllables, or words. The ability to learn by adapting
    the strengths of inter-neuron connections (synapses) is a fundamental property of artificial
    neural networks. Speech recognition has been another proving ground for neural networks.

    Researchers quickly achieved excellent results in such basic tasks as voiced/unvoiced
    discrimination (Watrous 1988), phoneme recognition (Waibel et al, 1989), and spoken digit
    recognition (Franzini et al, 1989). However, in 1990, when this thesis was proposed, it still
    remained to be seen whether neural networks could support a large-vocabulary,
    speaker-independent, continuous speech recognition system. Of the two types of variability in
    speech, acoustic and temporal, the former is more naturally posed as a static pattern matching
    problem that is amenable to neural networks; therefore we use neural networks for acoustic
    modeling, while we rely on conventional Hidden Markov Models for temporal modeling.

    4.1.2 HM2007 FUNCTIONING:

    The HM2007 is a CMOS voice recognition LSI (Large Scale Integration) circuit. The chip
    contains an analog front end, voice analysis, regulation, and system control functions. The chip
    may be used in stand-alone mode or connected to a CPU.

    Pin configuration of HM2007

    The functioning of the HM2007 IC involves the following steps:

    4.1.2.1 Speech Acquisition:

    Speech acquisition can be implemented easily with the HM2007 IC. During speech acquisition,
    speech samples are obtained from the speaker in real time and stored in memory for
    preprocessing. Speech acquisition requires a microphone coupled with an analog-to-digital
    converter (ADC) that has the proper amplification to receive the speech signal, sample it, and
    convert it into digital speech. The system passes the analog speech through a transducer,
    amplifies it, and sends it through an ADC. The received samples are stored in RAM. The
    microphone input port with the audio codec receives the signal, amplifies it, and converts it
    into 8-bit PCM digital samples at a sampling rate of 3.57 MHz. The HM2007 IC requires initial
    configuration, or training, of words, which is performed using a programming board. In the
    training process, the user trains the IC by speaking words into the microphone and assigning a
    particular value to each word. For example, the word "hello" can be assigned the value 02 or 05.
    The IC can later be connected to a microcontroller for further functions.

    4.1.2.2 Speech Preprocessing:

    Preprocessing reduces the amount of processing required in later

    stages. Generally, preprocessing involves taking the speech samples as input, blocking the samples

    into frames, and returning a unique pattern for each sample, as described in the following steps.

    1. The system must identify useful or significant samples from the speech signal. To accomplish

    this goal, the system divides the speech samples into overlapped frames.

    2. The system checks the frames for voice activity using endpoint detection and energy threshold

    calculations.

    3. The speech samples are passed through a pre-emphasis filter.
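    The three steps above can be sketched as small helper functions. This is an illustration
    only: the function names, the frame length, the hop size, the energy threshold, and the
    pre-emphasis coefficient of 0.97 are all assumptions, not values taken from the HM2007.

```python
def split_into_frames(samples, frame_len, hop):
    """Step 1: block the samples into overlapped frames (hop < frame_len)."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def active_frames(frames, energy_threshold):
    """Step 2: keep frames whose mean energy reaches the threshold."""
    return [f for f in frames
            if sum(v * v for v in f) / len(f) >= energy_threshold]


def pre_emphasis(samples, coeff=0.97):
    """Step 3: first-order filter y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[i] - coeff * samples[i - 1]
                           for i in range(1, len(samples))]


if __name__ == "__main__":
    signal = [0.0] * 8 + [1.0, -1.0] * 4      # silence followed by a burst
    frames = split_into_frames(signal, frame_len=4, hop=2)
    voiced = active_frames(frames, energy_threshold=0.4)
    print(len(frames), len(voiced))
    print(pre_emphasis(voiced[0]))
```

    Frames of pure silence are discarded by the energy check, so later stages only process the
    part of the recording that actually contains speech.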

    4.1.2.3 Training the IC:

    An important part of speech-to-text conversion using pattern recognition is training.
    Training involves creating a pattern representative of the features of a class using one or
    more test patterns that correspond to speech sounds of the same class. A model commonly used
    for speech recognition is the HMM, a statistical model used for modeling an unknown system from
    an observed output sequence. The keypad and digital display are used to communicate with and
    program the HM2007 chip.

    The keypad is made up of 12 normally open momentary contact switches. When the circuit
    is turned on, "00" appears on the digital display, the red LED (READY) is lit, and the circuit
    waits for a command.

    4.2 Receiver Part

    On the control side, i.e. the receiver side, the system mobile is connected to an AVR
    microcontroller based system through an RS-232C cable. The process block consists of 8 digital
    output ports, 8 digital input ports, and one analog input port. The number of digital inputs,
    digital outputs, and analog inputs can be varied as per the needs of the application.

    Figure: BLOCK DIAGRAM OF THE RECEIVER SECTION

    4.2.1 Overview of Serial Communication

    Computers can transfer data in two ways: parallel and serial. In parallel data transfers,
    often 8 or more lines (wire conductors) are used to transfer data to a device that is only a few
    feet away. Examples of devices that use parallel data transfer are printers and hard disks; each
    uses cables with many wire strips. Although in such cases a lot of data can be transferred in a
    short amount of time by using many wires in parallel, the distance cannot be great. To transfer
    data to a device located many meters away, the serial method is used. In serial communication,
    the data is sent one bit at a time, in contrast to parallel communication, in which the data is
    sent a byte or more at a time. Serial communication between the 89S52 and a peripheral is the
    topic of this section.

    If data is to be transferred on the telephone line, it must be converted from 0s and 1s to audio tones,

    which are sinusoidal-shaped signals. A peripheral device called a modem, which stands for

    modulator/demodulator, performs this conversion.

    Serial data communication uses two methods, asynchronous and synchronous. The synchronous

    method transfers a block of data at a time, while the asynchronous method transfers a single byte at a time.

    In data transmission if the data can be transmitted and received, it is a duplex transmission. This is in

    contrast to simplex transmissions such as with printers, in which the computer only sends data. Duplex

    transmissions can be half or full duplex, depending on whether or not the data transfer can be simultaneous. If

    data is transmitted one way at a time, it is referred to as half duplex. If the data can go both ways at the same

    time, it is full duplex. Of course, full duplex requires two wire conductors for the data lines, one for

    transmission and one for reception, in order to transfer and receive data simultaneously.

    Asynchronous serial communication and data framing

    The data coming in at the receiving end of the data line in a serial data transfer is all 0s and 1s; it is

    difficult to make sense of the data unless the sender and receiver agree on a set of rules, a protocol, on how the

    data is packed, how many bits constitute a character, and when the data begins and ends.

    Start and stop bits

    Asynchronous serial data communication is widely used for character-oriented transmissions,
    while block-oriented data transfers use the synchronous method. In the asynchronous method, each
    character, such as an ASCII character, is placed between a start bit and one or two stop bits.
    This is called framing. The start bit is always a single bit and is always 0 (low); the stop
    bit or bits are always 1 (high).
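    Framing can be illustrated by building the bit sequence for one character. The sketch below
    assumes 8 data bits sent least significant bit first, which is the usual UART convention but
    is not stated in the text; the function name frame_byte is chosen for this example.

```python
def frame_byte(value, stop_bits=1):
    """Return the line bits for one character: start bit, 8 data bits, stop bit(s)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    if stop_bits not in (1, 2):
        raise ValueError("stop_bits must be 1 or 2")
    data = [(value >> i) & 1 for i in range(8)]   # assumption: LSB is sent first
    return [0] + data + [1] * stop_bits           # start bit is 0, stop bits are 1


if __name__ == "__main__":
    print(frame_byte(ord("A")))   # ASCII 'A' = 0x41
```

    With one stop bit, every 8-bit character occupies 10 bit times on the line, which matters when
    converting a bit rate into a character rate.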

    Data transfer rate

    The rate of data transfer in serial data communication is stated in bps (bits per second).
    Another widely used term for bps is baud rate. However, the baud and bps rates are not
    necessarily equal. This is because baud rate is modem terminology and is defined as the number
    of signal changes per second. In modems, a single change of signal sometimes transfers several
    bits of data. As far as the conductor wire is concerned, the baud rate and bps are the same, and
    for this reason we use bps and baud interchangeably.

    The data transfer rate of a given computer system depends on the communication ports
    incorporated into that system. For example, the early IBM PC/XT could transfer data at rates of
    100 to 9600 bps. In recent years, however, Pentium-based PCs transfer data at rates as high as
    56K bps. It must be noted that in asynchronous serial data communication, the baud rate is
    generally limited to 100,000 bps.
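    As a small worked example, the bit rate can be converted into a character rate once the frame
    size is known. The sketch assumes one start bit, 8 data bits, and one stop bit by default; the
    function name chars_per_second is hypothetical.

```python
def chars_per_second(bps, data_bits=8, stop_bits=1):
    """Characters per second for asynchronous framing with one start bit."""
    bits_per_char = 1 + data_bits + stop_bits
    return bps / bits_per_char


if __name__ == "__main__":
    print(chars_per_second(9600))   # 10 bits per character
```

    At 9600 bps with this framing, 960 characters can be transferred per second, so a full
    160-character SMS text crosses the serial link in about a sixth of a second.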

    The 8051 has serial communication capability built into it, thereby making fast data transfer
    possible using only a few wires. The PC uses RS232 as its serial communication standard.

    4.2.2 RS232 Standards

    To allow compatibility among data communication equipment made by various manufacturers, an
    interfacing standard called RS232 was set by the Electronics Industries Association (EIA) in
    1960. In 1963 it was modified and called RS232A. RS232B and RS232C were issued in 1965 and 1969,
    respectively. Today, RS232 is the most widely used serial I/O interfacing standard. It is used
    in PCs and numerous types of equipment. However, since the standard was set long before the
    advent of the TTL logic family, its input and output voltage levels are not TTL compatible. In
    RS232, a 1 is represented by -3 to -25V, while a 0 bit is +3 to +25V, making -3 to +3
    undefined. For this reason, to connect any RS232 device to a microcontroller system we must use
    voltage converters such as the MAX232 to convert the TTL logic levels to the RS232 voltage
    levels, and vice versa. MAX232 IC chips are commonly referred to as line drivers.

    RS232 pins

    The RS232 cable connector is commonly referred to as the DB-25 connector. In labeling, DB-25P
    refers to the plug connector (male) and DB-25S to the socket connector (female). Since not all
    the pins are used in PC cables, IBM introduced the DB-9 version of the serial I/O standard,
    which uses only 9 pins, as shown in the table.

    DB-9 pin connector

    1 2 3 4 5

    6 7 8 9

    (Out of computer and exposed end of cable)

    Pin Functions:

    Pin   Description
    1     Data carrier detect (DCD)
    2     Received data (RXD)
    3     Transmitted data (TXD)
    4     Data terminal ready (DTR)
    5     Signal ground (GND)
    6     Data set ready (DSR)
    7     Request to send (RTS)
    8     Clear to send (CTS)
    9     Ring indicator (RI)

    Note: DCD, DSR, RTS and CTS are active low pins.

    The method used by RS-232 for communication allows for a simple connection of three lines,
    namely Tx, Rx, and Ground.

    TXD: carries data from the DTE to the DCE

    RXD: carries data from the DCE to the DTE

    SG: signal ground

    4.2.3 8051 connection to RS232:

    Figure: Embedded controller (RXD, TXD) connected through the MAX232 to connector pins 2, 3
    and 5 (GND)

    The RS232 standard is not TTL compatible; therefore, it requires a line driver such as the
    MAX232 chip to convert RS232 voltage levels to TTL levels, and vice versa.

    The 8051 has two pins that are used specifically for transmitting and receiving data
    serially. These two pins, TXD and RXD, are part of port 3 (P3.1 and P3.0, respectively). Pin 11
    of the 8051 is designated as TXD and pin 10 as RXD. These pins are TTL compatible; therefore,
    they require a line driver to make them RS232 compatible.

    One advantage of the MAX232 chip is that it uses a +5V power source, which is the same as the
    source voltage for the 8051. In other words, a single +5V power supply can power both the 8051
    and the MAX232, with no need for additional power supplies. The MAX232 has two sets of line
    drivers for transmitting and receiving data. The line drivers used for TXD are called T1 and T2,
    while the line drivers for RXD are designated R1 and R2. In many applications only one of each
    is used.

    4.2.4 MAX-232

    Logic Signal Voltage

    Serial RS-232 (V.24) communication works with voltages (-15V ... -3V to transmit a binary '1'
    and +3V ... +15V to transmit a binary '0') which are not compatible with today's computer logic
    voltages. Classic TTL computer logic, on the other hand, operates between 0V ... +5V (roughly
    0V ... +0.8V is low, for binary '0', and +2V ... +5V is high, for binary '1'). Modern low-power
    logic operates in the range of 0V ... +3.3V or even lower.

    So, the maximum RS-232 signal levels are far too high for today's computer logic electronics,
    and the negative RS-232 voltage cannot be handled at all by the computer logic. Therefore, to
    receive serial data from an RS-232 interface, the voltage has to be reduced and the 0 and 1
    voltage levels inverted. In the other direction (sending data from some logic over RS-232), the
    low logic voltage has to be "bumped up", and a negative voltage has to be generated, too.

    RS-232            TTL               Logic
    --------------------------------------------------------
    -15V ... -3V      +2V ... +5V       1
    +3V ... +15V      0V ... +0.8V      0
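    The voltage ranges in the table can be expressed as a small lookup. This sketch only models
    the mapping from an RS-232 line voltage to a logic value; it is not a model of the MAX232
    itself, and the function name rs232_to_logic is chosen for this example.

```python
def rs232_to_logic(volts):
    """Map an RS-232 line voltage to a logic value, or None if it is undefined."""
    if -15.0 <= volts <= -3.0:
        return 1          # negative voltage carries a binary '1'
    if 3.0 <= volts <= 15.0:
        return 0          # positive voltage carries a binary '0'
    return None           # between -3V and +3V (or out of range): undefined


if __name__ == "__main__":
    for v in (-10.0, 10.0, 0.0):
        print(v, rs232_to_logic(v))
```

    The inversion is visible directly: the more negative level is the logic 1, which is why a
    receiver must both shift and invert the signal before it reaches TTL logic.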

    All this can be done with conventional analog electronics, e.g. a particular power supply and
    a couple of transistors, or the once popular 1488 (transmitter) and 1489 (receiver) ICs.
    However, for more than a decade it has been standard in amateur electronics to do the necessary
    signal level conversion with an integrated circuit (IC) from the MAX232 family (typically a
    MAX232A or some clone). In fact, it is hard to find RS-232 circuitry in amateur electronics
    without a MAX232A or some clone.

    The MAX232 & MAX232A

    Figure: A MAX232 integrated circuit

    The MAX232 from Maxim was the first IC to contain, in one package, the necessary drivers (two)
    and receivers (also two) to adapt the RS-232 signal voltage levels to TTL logic. It became
    popular because it needs just one supply voltage (+5V) and generates the necessary RS-232
    voltage levels (approx. -10V and +10V) internally. This greatly simplified the design of
    circuitry. Circuit designers no longer needed to design and build a power supply with three
    voltages (e.g. -12V, +5V, and +12V), but could provide just one +5V power supply, e.g. with the
    help of a simple 78x05 voltage regulator.

    The MAX232 has a successor, the MAX232A. The ICs are almost identical, however, the MAX232A is

    much more often used than the original MAX232, and the MAX232A only needs external capacitors 1/10th the

    capacity of what the original MAX232 needs.

    It should be noted that the MAX232(A) is just a driver/receiver. It does not generate the necessary
    RS-232 sequence of marks and spaces with the right timing, it does not decode the RS-232 signal, and it does not
    provide a serial/parallel conversion. All it does is convert signal voltage levels. Generating serial data with
    the right timing and decoding serial data has to be done by additional circuitry, e.g. by a 16550 UART or one of
    the small microcontrollers (e.g. Atmel AVR, Microchip PIC) that are becoming more and more popular.
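The framing work that is left to the UART or microcontroller can be sketched as follows. The sketch assumes 8N1 framing (one start bit, eight data bits sent least significant bit first, no parity, one stop bit) and a 9600 baud rate; both are common choices but are not stated in the text, and the function names are illustrative.

```python
from typing import List


def uart_frame_8n1(byte: int) -> List[int]:
    """Return the logic levels of one 8N1 frame: start bit, 8 data bits (LSB first), stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]  # start bit is 0 (space), stop bit is 1 (mark)


def bit_time_us(baud: int) -> float:
    """Duration of one bit in microseconds at the given baud rate."""
    return 1_000_000 / baud


frame = uart_frame_8n1(ord("A"))  # 'A' is 0x41
print(frame)
print(round(bit_time_us(9600), 1))
```

Each value in the returned list is a logic level; the level converter then only has to translate these levels into RS-232 voltages, one bit time after another.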

    The MAX232 and MAX232A were once rather expensive ICs, but today they are cheap. It has also helped
    that many companies now produce clones (e.g. Sipex). These clones sometimes need different external circuitry,
    e.g. the capacitances of the external capacitors vary. It is recommended to check the data sheet of the particular
    manufacturer of an IC instead of relying on Maxim's original data sheet.


    The original manufacturer (and now some clone manufacturers, too) offers a large series of similar ICs with
    different numbers of receivers and drivers, voltages, built-in or external capacitors, etc. For example, the MAX232 and
    MAX232A need external capacitors for the internal voltage pump, while the MAX233 has these capacitors
    built in. Because of its internal capacitors, the MAX233 is also between three and ten times more expensive in
    electronics shops than the MAX232A. It is also more difficult to get the MAX233 than the garden-variety MAX232A.

    A similar IC, the MAX3232 is nowadays available for low-power 3V logic.

    MAX232(A) DIP Package

    No.  Name   Purpose                             Signal voltage                                  Capacitor (MAX232)  Capacitor (MAX232A)
    1    C1+    + connector for capacitor C1        capacitor should withstand at least 16V         1µF                 100nF
    2    V+     output of voltage pump              +10V, capacitor should withstand at least 16V   1µF to VCC          100nF to VCC
    3    C1-    - connector for capacitor C1        capacitor should withstand at least 16V         1µF                 100nF
    4    C2+    + connector for capacitor C2        capacitor should withstand at least 16V         1µF                 100nF
    5    C2-    - connector for capacitor C2        capacitor should withstand at least 16V         1µF                 100nF
    6    V-     output of voltage pump / inverter   -10V, capacitor should withstand at least 16V   1µF to GND          100nF to GND
    7    T2out  Driver 2 output                     RS-232
    8    R2in   Receiver 2 input                    RS-232
    9    R2out  Receiver 2 output                   TTL
    10   T2in   Driver 2 input                      TTL
    11   T1in   Driver 1 input                      TTL
    12   R1out  Receiver 1 output                   TTL
    13   R1in   Receiver 1 input                    RS-232
    14   T1out  Driver 1 output                     RS-232
    15   GND    Ground                              0V                                              1µF to VCC          100nF to VCC
    16   VCC    Power supply                        +5V                                             see above           see above

    V+ (pin 2) is also connected to VCC via a capacitor (C3). V- (pin 6) is connected to GND via a capacitor (C4).
    GND (pin 15) and VCC (pin 16) are also connected by a capacitor (C5), placed as close as possible to the pins.

    A Typical Application

    The MAX232(A) has two receivers (which convert from RS-232 to TTL voltage levels) and two drivers (which convert
    from TTL logic to RS-232 voltage levels). This means only two of the RS-232 signals can be converted in each
    direction. The old MC1488/MC1489 combo provided four drivers and four receivers.


    Typically one driver/receiver pair of the MAX232 is used for TX and RX, and the second pair for CTS and RTS.

    There are not enough drivers/receivers in the MAX232 to also connect the DTR, DSR, and DCD signals.

    Usually these signals can be omitted when e.g. communicating with a PC's serial interface. If the DTE really

    requires these signals either a second MAX232 is needed, or some other IC from the MAX232 family can be

    used (if it can be found in consumer electronic shops at all). An alternative for DTR/DSR is also given below.

    The circuitry is completed by connecting five capacitors to the IC as follows. The MAX232 needs 1.0µF
    capacitors, the MAX232A needs 0.1µF capacitors. MAX232 clones show similar differences. It is
    recommended to consult the corresponding data sheet. Capacitors rated for at least 16V should be used. If
    electrolytic or tantalum capacitors are used, the polarity has to be observed. The first pin listed in the following
    table is always the one to which the plus pole of the capacitor should be connected.


    Capacitor  + Pin  - Pin  Remark
    C1         1      3
    C2         4      5
    C3         2      16
    C4         GND    6      This looks non-intuitive, but because pin 6 is at -10V, GND gets the + connector, and not the -.
    C5         16     GND

    The 5V power supply is connected to

    +5V: Pin 16

    GND: Pin 15
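The polarity rule behind the C3 to C5 rows of the capacitor table can be checked with a small sketch. The node voltages are the approximate values given in the pin table (V+ about +10V, V- about -10V); the names and the data layout are illustrative assumptions. C1 and C2 are left out because they are the switched capacitors of the voltage pump and do not sit between two fixed potentials.

```python
# Approximate steady-state node voltages taken from the pin table.
NODE_VOLTS = {
    "V+ (pin 2)": 10.0,
    "VCC (pin 16)": 5.0,
    "GND (pin 15)": 0.0,
    "V- (pin 6)": -10.0,
}

# (capacitor, plus-side node, minus-side node) as listed in the capacitor table.
WIRING = [
    ("C3", "V+ (pin 2)", "VCC (pin 16)"),
    ("C4", "GND (pin 15)", "V- (pin 6)"),
    ("C5", "VCC (pin 16)", "GND (pin 15)"),
]


def polarity_ok(plus_node: str, minus_node: str) -> bool:
    """A polarised capacitor is wired correctly if its + side is at the higher potential."""
    return NODE_VOLTS[plus_node] > NODE_VOLTS[minus_node]


for name, plus_node, minus_node in WIRING:
    drop = NODE_VOLTS[plus_node] - NODE_VOLTS[minus_node]
    print(f"{name}: + on {plus_node}, - on {minus_node}, {drop:.0f}V across, ok={polarity_ok(plus_node, minus_node)}")
```

For C4 the check confirms the remark in the table: GND is at a higher potential than pin 6, so the plus pole goes to GND.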

    The output of the VT pin is high only when the transmission is valid; otherwise it is always low.

    Output type: There are two types of output to select from:

    Momentary type: The data outputs follow the encoder during a valid transmission and then reset.

    Latch type: The data outputs follow the encoder during a valid transmission and are then latched in that state until the next valid transmission occurs.

    4.3 Microcontroller AT89S52

    4.3.1 Overview

    The AT89S52 is a member of the MCS-51 family with 8 Kbytes of internal Flash
    EPROM (erasable and programmable read-only memory), which allows the program memory to be
    reprogrammed. Atmel designed the AT89S52 to be compatible with the standard 80C51 instruction set and pin
    layout.

    AT89S52 microcontroller features:

    An 8-bit CPU (Central Processing Unit).

    256 bytes of internal RAM (Random Access Memory).

    Four I/O ports of eight bits each.

    An internal oscillator and timing circuits.

    Three 16-bit timer/counters.

    Six interrupt vectors (two external interrupts and four internal ones: three timers and the serial port).

    A full-duplex serial port (UART, Universal Asynchronous Receiver Transmitter).

    Hardware support for multiplication, division, and Boolean operations.

    8 Kbytes of Flash EPROM for program memory.

    The fastest instructions execute in one machine cycle, which takes 0.5 µs at a 24 MHz clock frequency.

    At a 12 MHz clock frequency, a machine cycle takes 1 µs.
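The two timing figures in the list follow from the classic MCS-51 timing, in which one machine cycle takes 12 oscillator periods. That divisor is an assumption about the standard timing mode and is not stated in the text; the function name is illustrative.

```python
CLOCKS_PER_MACHINE_CYCLE = 12  # classic MCS-51 timing; assumed, not stated in the text


def machine_cycle_us(clock_hz: int) -> float:
    """Duration of one machine cycle in microseconds for a given oscillator frequency."""
    return CLOCKS_PER_MACHINE_CYCLE * 1_000_000 / clock_hz


print(machine_cycle_us(24_000_000))  # 24 MHz crystal
print(machine_cycle_us(12_000_000))  # 12 MHz crystal
```

The same function gives the cycle time for any other crystal, which is useful when software delay loops or timer reload values have to be calculated.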


    4.3.2 Pin Configuration

    The AT89S52 microcontroller has 40 pins and runs from a single 5 V power supply. The 40 pins are
    illustrated as follows.

    Figure: AT89S52 Microcontroller pin diagram

    The function of each AT89S52 pin is:

    Pins 1 to 8 (Port 1) form an 8-bit bidirectional parallel port that can be used for general purposes.

    Pin 9 is the reset pin; reset is active when the pin is driven high.

    Pins 10 to 17 (Port 3) form an 8-bit bidirectional parallel port with the following alternate functions:

    P3.0 (10): RXD (serial port data receiver)

    P3.1 (11): TXD (serial port data sender)

    P3.2 (12): INT0 (external interrupt 0 input, active low)


    P3.3 (13): INT1 (external interrupt 1 input, active low)

    P3.4 (14): T0 (external input of timer/counter 0)

    P3.5 (15): T1 (external input of timer/counter 1)

    P3.6 (16): WR (write, active low), the control signal for writing data from Port 0 to external data memory
    and input/output devices.

    P3.7 (17): RD (read, active low), the control signal for reading data from external data memory and
    input/output devices into Port 0.

    Pin 18 (XTAL2) is the output of the oscillator amplifier and is connected to the crystal.

    Pin 19 (XTAL1) is the high-gain input of the oscillator amplifier and is connected to the crystal.

    Pin 20 (Vss) is connected to 0 V, the circuit ground.

    Pins 21 to 28 (Port 2) form an 8-bit bidirectional parallel port. This port sends the high-order address byte
    when external memory is accessed.

    Pin 29 (PSEN, Program Store Enable, active low) is the signal used to read program code from
    external program memory (ROM/EPROM) into the microcontroller.

    Pin 30 (ALE, Address Latch Enable) latches the address when external memory is accessed. This pin also
    functions as PROG (active low), which is activated when the on-chip Flash memory is programmed.

    Pin 31 (EA, External Access) selects the program memory to be used: internal program memory (EA = Vcc)
    or external program memory (EA = Vss). It also serves as Vpp (programming supply voltage) when the
    on-chip Flash memory is programmed.

    Pins 32 to 39 (Port 0) form an 8-bit bidirectional parallel port, which also functions as the multiplexed
    address/data bus for accessing external program and data memory.

    Pin 40 (Vcc) is connected to +5 V, the supply for the microcontroller.

    All single-chip microcontrollers in the MCS-51 family have separate address spaces for program and data
    memory. This separation allows data memory to be accessed with 8-bit addresses. Even so,
    16-bit data memory addresses can be generated through the DPTR register (Data Pointer).
    Program memory can only be read, not written, because it is stored in the EPROM. On the
    AT89S52, 8 Kbytes of this memory are available on the chip.
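The sizes of the address ranges mentioned above can be worked out with a few lines of arithmetic. This is only an illustration of the numbers; the helper name is an assumption, and the on-chip program memory is assumed to start at address 0, as is usual for the MCS-51 family.

```python
def address_range(bits: int) -> range:
    """All addresses reachable with an address of the given width."""
    return range(0, 2 ** bits)


direct = address_range(8)     # 8-bit data memory address
via_dptr = address_range(16)  # 16-bit address held in DPTR
flash = range(0, 8 * 1024)    # 8 Kbytes of on-chip program memory

print(len(direct), hex(direct[-1]))
print(len(via_dptr), hex(via_dptr[-1]))
print(len(flash), hex(flash[-1]))
```

An 8-bit address reaches 256 locations, a 16-bit DPTR address reaches 65536 locations, and the 8 Kbytes of on-chip program memory end at address 0x1FFF.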


    Figure: AT89S52 Microcontroller memory


    CHAPTER 5

    Transmission Of Digitized Speech Over Wireless Network

    5.1 Overview Of GSM Modem

    A GSM modem is a specialized type of modem which accepts a SIM card, and operates

    over a subscription to a mobile operator, just like a mobile phone. From the mobile operator

    perspective, a GSM modem looks just like a mobile phone.

    When a GSM modem is connected to a computer, this allows the computer to use the GSM

    modem to communicate over the mobile network. While these GSM modems are most frequently

    used to provide mobile internet connectivity, many of them can also be used for sending and

    receiving SMS and MMS messages.

    For the purpose of this project, the term GSM modem is used as a generic term to refer to

    any modem that supports one or more of the protocols in the GSM evolutionary family, including

    the 2.5G technologies GPRS and EDGE, as well as the 3G technologies WCDMA, UMTS,

    HSDPA and HSUPA.

    A GSM modem exposes an interface that allows applications such as NowSMS to send and

    receive messages over the modem interface. The mobile operator charges for this message sending

    and receiving as if it was performed directly on a mobile phone. To perform these tasks, a GSM

    modem must support an extended AT command set for sending/receiving SMS messages, as

    defined in the ETSI GSM 07.05 and 3GPP TS 27.005 specifications.
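A minimal sketch of the text-mode command sequence for sending one SMS is given here. `AT+CMGF=1` (select text mode) and `AT+CMGS` (send message) are commands from the SMS AT command set; the helper name, the placeholder phone number and the message text are assumptions made for illustration. The sketch only builds the byte strings; writing them to the serial port and waiting for the modem's responses are left out.

```python
CTRL_Z = b"\x1a"  # terminates the message body in text mode


def sms_text_mode_commands(number: str, text: str) -> list:
    """Return the byte strings to write to the modem, in order, to send one SMS in text mode."""
    if len(text) > 160:
        raise ValueError("a single SMS carries at most 160 characters")
    return [
        b"AT+CMGF=1\r",                                  # select text mode
        b'AT+CMGS="' + number.encode("ascii") + b'"\r',  # destination number; the modem answers with a prompt
        text.encode("ascii") + CTRL_Z,                   # message body, then Ctrl-Z to send
    ]


# Placeholder number and command text, for illustration only.
for chunk in sms_text_mode_commands("+10000000000", "ALPHA DEVICE SIX ON"):
    print(chunk)
```

In a real system each string would be written to the modem's serial port, and the program would wait for the modem's reply before sending the next one.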

    GSM modems can be a quick and efficient way to get started with SMS, because a special

    subscription to an SMS service provider is not required. In most parts of the world, GSM modems

    are a cost effective solution for receiving SMS messages, because the sender is paying for the

    message delivery.


    5.2 Overview of SIM 300

    For the purpose of our project we are using a GSM modem SIM 300 both at the

    transmitter and receiver side.

    5.2.1 Features

    COMPLETE GSM MODEM

    HANDLES VOICE / DATA / SMS / FAX

    DUAL BAND 900 / 1800 MHz GSM TRANSMISSION

    ACCEPTS STANDARD SIM CARD

    CAN BE USED ON STANDARD GSM NETWORK

    RS232 INTERFACE

    USES STANDARD AT COMMANDS

    SUPPORTS CLASS 1 FAX COMMANDS

    DATA TRANSMISSION UP TO 14400 BAUD


    CHAPTER 6

    RESULTS

    Fig. 4 shows the speech signal waveform for the spoken command phrase "Alpha Device Six On"
    and its energy. Fig. 5 shows the plot of the MFCC coefficients {Cmel(1), Cmel(2), Cmel(3), Cmel(4)}
    for the spoken word "Alpha". Fifty samples of each spoken word are stored, of which 25 are
    used for training and the remaining 25 for testing. Thus a database of 650 samples of words is used for
    experimentation. The accuracy of correct recognition for the various words in the spoken commands with
    principal component analysis (PCA) is shown in Table II. The accuracy for the words "on" and
    "one" is relatively lower because of the phonetic similarity of these words. However, these words can
    easily be discriminated by their different utterance positions in the spoken command phrase.
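The position-based discrimination of "on" and "one" can be sketched as follows. The slot layout is a hypothetical grammar inferred from the single example phrase "Alpha Device Six On" (device number in the third position, action in the fourth); the actual command grammar of the system is not given in the text.

```python
# Hypothetical slot grammar inferred from the example phrase "Alpha Device Six On":
# position 2 holds a device number, position 3 holds the action.
NUMBER_SLOT, ACTION_SLOT = 2, 3
CONFUSABLE = {"on", "one"}  # the phonetically similar pair named in the text


def resolve_by_position(words):
    """Replace a confusable word by the one that is valid for its position in the phrase."""
    resolved = []
    for position, word in enumerate(w.lower() for w in words):
        if word in CONFUSABLE:
            if position == NUMBER_SLOT:
                word = "one"
            elif position == ACTION_SLOT:
                word = "on"
        resolved.append(word)
    return resolved


# A recogniser that confuses the pair is corrected by the slot the word appears in.
print(resolve_by_position(["Alpha", "Device", "Six", "One"]))
print(resolve_by_position(["Alpha", "Device", "On", "Off"]))
```

Because the two words can never occupy the same slot in such a grammar, a misrecognition between them does not change the meaning of the command.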

    FIGURE: Plot of spoken phrase alpha device six on and its energy


    FIGURE: Plot of MFCC coefficients for the spoken word "alpha"

    CHAPTER 7


    CONCLUSION AND FUTURE ASPECTS

    In the near future, speech recognition will become the method of choice for controlling

    appliances, toys, tools, computers and robotics. There is a huge commercial market waiting for this

    technology to mature.

    This project demonstrates in detail the construction of a stand-alone trainable
    speech recognition circuit that may be interfaced to control just about anything electrical, such as
    appliances, robots, test instruments, VCRs, TVs, etc. With suitable modifications the project can
    be extended to various industrial automation tasks. Controlling and commanding an appliance (computer,
    VCR, TV, security system, etc.) by speaking to it makes the appliance easier to use, while increasing the
    efficiency and effectiveness of working with that device.

    At its most basic level, speech recognition allows the user to perform parallel tasks (i.e. while hands and
    eyes are busy elsewhere) while continuing to work with the computer or appliance. Remote
    control of devices and retrieval of information relating to the present status of inputs using spoken
    commands have been successfully demonstrated. There is scope for a lot of improvement depending
    upon the user requirements, such as the inclusion of a greater number of desired commands, the selection of
    suitable sensors for the measurement of analog parameters, etc. This approach can easily be extended to
    develop many exciting products, from remote process control to high-end security solutions. It can
    prove to be a great boon to blind or physically handicapped persons due to its capability for remote
    control through speech commands.


    ACCURACY OF WORD RECOGNITION:

    However, this method is not suitable for time-critical applications, as the message transfer time to the
    destination is variable. This problem can be alleviated to a certain extent by adding the time at which the
    device should respond to the message and sending the SMS well in advance of the
    scheduled event. The software can be modified in this case to check whether timing information is
    present and schedule the event accordingly. Alternatively, it is recommended to use data calls
    (Fax/GPRS) or DTMF-based calls, with suitable modifications, for an immediate response from the system.
    The accuracy of the spoken command recognition system is about 98% (much better
    than our previous work [17]). Moreover, the spoken phrase can be extended to carry out additional
    tasks, such as adding a time duration for the device ON/OFF condition. Further, adding a speaker
    verification feature can enhance the security level. Presently a PC is used to generate the text
    message from the voice command. For dedicated applications, the PC can be replaced by a DSP processor or
    FPGA-based system, at a higher initial development cost.
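The check for optional timing information described above could look like the following sketch. The message format, a command optionally followed by "AT HH:MM", is purely hypothetical and is not the format used by the system; the function name is also an assumption.

```python
import re
from datetime import time
from typing import Optional, Tuple

# Hypothetical message format: "<command> AT HH:MM", where the "AT HH:MM" suffix is optional.
_TIMED = re.compile(r"^(?P<command>.+?)\s+AT\s+(?P<hour>\d{1,2}):(?P<minute>\d{2})$", re.IGNORECASE)


def split_command_and_time(message: str) -> Tuple[str, Optional[time]]:
    """Return (command, scheduled time); the time is None when the command should run at once."""
    match = _TIMED.match(message.strip())
    if match is None:
        return message.strip(), None
    scheduled = time(int(match["hour"]), int(match["minute"]))  # raises ValueError for e.g. 25:00
    return match["command"], scheduled


print(split_command_and_time("ALPHA DEVICE SIX ON AT 18:30"))
print(split_command_and_time("ALPHA DEVICE SIX ON"))
```

A message without a time suffix is executed immediately, while a message with one can be queued until the given time, which hides the variable SMS delivery delay as long as the message is sent early enough.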

    CHAPTER 8

    BIBLIOGRAPHY:


    1. S. B. Davis and P. Mermelstein, "Comparison of Parametric Representations for Monosyllabic
    Word Recognition in Continuously Spoken Sentences", IEEE Transactions on Acoustics, Speech &
    Signal Processing, vol. ASSP-28, Aug. 1980, pp. 357-366.

    2. Thomas F. Quatieri, "Discrete-Time Speech Signal Processing: Principles and Practice",
    Pearson Education (Singapore) Pvt. Ltd., Indian Branch, Delhi, India, 2004.

    3. N. P. Jawarkar, Vasif Ahmed & R. D. Thakare, "Remote world-wide control through SMS using
    Nokia Mobile", IETE Journal of Education, vol. 46, no. 4, Oct-Dec 2005, pp. 165-170.

    4. http://forum.nokia.com, "AT Commands Set for Nokia
    GSM and WCDMA products", Version 1.2, July 2005.

    5. http://www.atmel.com/avr

    6. G. M. White and R. B. Neely, "Speech Recognition Experiments with Linear Prediction,
    Bandpass Filtering & Dynamic Programming", IEEE Transactions on Acoustics, Speech & Signal
    Processing, vol. ASSP-24(2), 1976, pp. 183-188.

    7. L. R. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition", Pearson Education
    (Singapore) Pvt. Ltd., 2005.

    8. S. Umesh, L. Cohen and D. Nelson, "Frequency warping and Mel scale", IEEE Signal
    Processing Letters, vol. 9, no. 3, March 2001, pp. 104-107.

    9. J. Hollmen, V. Tresp and O. Simula, "A Learning Vector Quantization Algorithm for Probabilistic
    Models", Proceedings of EUSIPCO 2000, X European Signal Processing Conference, Volume II,
    pp. 721-724.


    10. T. Kohonen, "Improved versions of Learning Vector Quantization", International Joint
    Conference on Neural Networks, San Diego, CA, 1990, pp. 545-550.

    11. N. N. Katugampala, K. T. Al-Naimi, S. Villette and A. M. Kondoz, "Real Time Data Transmission
    over GSM Voice Channel for Secure Voice & Data Applications", University of Surrey, United
    Kingdom.