Speech Recog Report - For Merge


  • 8/4/2019 Speech Recog Report - For Merge


    PART-I

    Preliminary Investigation


    1.1 Introduction

    This report is a system analysis and design project that studies Global
    Positioning System (GPS) software receiver technology. In this project we
    studied how a GPS receiver works and how it processes the received signal
    to obtain the desired time and position. We start with GPS and its various
    components, its processing steps, and the receiver tracking system. Such a
    system makes it possible to track the location of any object that carries
    a GPS receiver. The receiver converts the incoming signal to digital form,
    and the processing draws on many models and theories that make GPS work.

    GPS is used in a large number of areas, for example mobile phone tracking,
    vehicle tracking systems, information services using automated calls,
    defence, and robotics. It facilitates human-computer interaction and gives
    devices a way to make use of satellite communication.

    The ultimate goal of the technology is a system that can determine time
    and location with 100% accuracy. Even after years of research in this
    area, the best GPS software applications still cannot determine location
    with 100% accuracy. Some applications achieve over 95% accuracy in
    position when environmental factors are constant.

    Computer software that tracks the location of real-world objects enables
    the user to work with the data received from the satellites.


    2. Objective

    To study the Global Positioning System and the hardware components and
    software it uses. In this project our aim is to study:

    Working of a mobile phone GPS receiver

    Hardware components of GPS

    Software used for a GPS receiver

    Algorithms used by the software


    3. Problem definition

    Software GPS receivers can provide full access to the baseband signal
    processing inside the receiver channels. Thus, they have become a key
    component when investigating and developing advanced GPS signal
    processing techniques.

    In this presentation, a pure software GPS receiver developed in the PLAN
    group of the University of Calgary is considered. It consists of receivers
    that decode the signals from the satellites.

    The receiver performs the following tasks:

    Selecting one or more satellites

    Acquiring GPS signals

    Measuring and tracking

    Recovering navigation data


    4. Working of GPS

    For those who are unfamiliar with the term, GPS stands for Global
    Positioning System, and is a way of locating a receiver in
    three-dimensional space anywhere on the Earth, and even in orbit about it.

    GPS is arguably one of the most important inventions of our time, and has
    so many different applications that many technologies and ways of working
    are continually being improved in order to make the most of it.

    To understand exactly why it is so useful and important, we should first
    look at how GPS works and, more importantly, at the technological
    achievements that have driven the development of this fascinating
    positioning system.

    4.1 Signals

    In order for GPS to work, a network of satellites was placed into orbit
    around planet Earth, each broadcasting a specific signal, much like a
    normal radio signal. This signal can be received by a low-cost,
    low-technology aerial, even though the signal is very weak.

    Rather than carrying an actual radio or television program, the signals
    that are broadcast by the satellites carry data that is passed from the
    aerial, decoded, and used by the GPS software.


    The information is specific enough that the GPS software can identify the
    satellite and its location in space, and calculate the time that the
    signal took to travel from the satellite to the GPS receiver.

    Using different signals from different satellites, the GPS software is
    able to calculate the position of the receiver. The principle is very
    similar to that used in orienteering: if you can identify three places on
    your map, take a bearing to each of them, and draw three lines on the map,
    then you will find out where you are on the map.

    The lines will intersect, and, depending on the accuracy of the bearings,
    the triangle that they form where they intersect will approximate your
    position, within a margin of error.

    GPS software performs a similar kind of exercise, using the known
    positions of the satellites in space, and measuring the time that the
    signal has taken to travel from each satellite to Earth.

    The result of the trilateration (the term used when distances are used
    instead of bearings) of at least three satellites, assuming that the
    clocks are all synchronized, enables the software to calculate, within a
    margin of error, where the device is located in terms of its latitude
    (north-south), longitude (east-west), and distance from the center of the
    Earth.
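
    The trilateration step can be sketched in a few lines of Python. This is
    only an illustration under stated assumptions, not the method of any
    particular receiver: it assumes perfectly synchronized clocks, so that
    range equals the speed of light times the travel time, and it solves the
    range equations with a Gauss-Newton iteration. The function name
    `trilaterate`, the satellite coordinates, and the use of NumPy are all
    choices made for this example.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def trilaterate(sat_positions, travel_times, iterations=20):
    """Estimate the receiver position from signal travel times.

    Assumes all clocks are synchronized, so range = c * travel time.
    sat_positions: (n, 3) satellite coordinates in metres, n >= 3.
    travel_times:  (n,) signal travel times in seconds.
    """
    sats = np.asarray(sat_positions, dtype=float)
    ranges = SPEED_OF_LIGHT * np.asarray(travel_times, dtype=float)
    position = np.zeros(3)  # initial guess: centre of the Earth
    for _ in range(iterations):
        offsets = position - sats                   # satellite -> current guess
        predicted = np.linalg.norm(offsets, axis=1)
        residuals = predicted - ranges              # range mismatch per satellite
        jacobian = offsets / predicted[:, None]     # d(predicted) / d(position)
        step, *_ = np.linalg.lstsq(jacobian, -residuals, rcond=None)
        position = position + step
        if np.linalg.norm(step) < 1e-6:             # converged to about a micrometre
            break
    return position


if __name__ == "__main__":
    # Hypothetical satellite positions (metres, Earth-centred frame).
    satellites = np.array([
        [0.0, 0.0, 2.66e7],
        [2.3e7, 0.0, 1.33e7],
        [-1.15e7, 1.99e7, 1.33e7],
        [-1.15e7, -1.99e7, 1.33e7],
    ])
    receiver = np.array([1.2e6, -0.8e6, 6.2e6])
    times = np.linalg.norm(satellites - receiver, axis=1) / SPEED_OF_LIGHT
    print(trilaterate(satellites, times))
```

    The example uses four satellites so that the least-squares solution is
    unambiguous; with exactly three, the spheres generally meet in two points
    and the starting guess decides which one is found.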


    4.2 Timing & Correction

    In a perfect world, the accuracy should be absolute, but there are many
    different factors which prevent this. Principally, it is impossible to
    ensure that the clocks are all synchronized.

    Since the satellites each contain atomic clocks which are extremely
    accurate, and certainly accurate with respect to each other, we can
    assume that most of the problem lies with the clock inside the GPS unit
    itself.

    Keeping the cost of the technology down to a minimum is a key part of the
    success of any consumer device, and it is simply not possible to fit each
    GPS unit with an atomic clock costing tens of thousands of dollars.
    Luckily, the designers of the system designed GPS to work whether the
    receiver's clock is accurate or not.

    There are a few solutions. However, the solution that was chosen uses a
    fourth satellite to provide a cross-check in the trilateration process.
    Since trilateration from three signals should pinpoint the location
    exactly, a fourth signal measured with a wrong receiver clock will not
    agree with it; that is, it will not intersect with the calculated
    location.

    This indicates to the GPS software that there is a discrepancy, and so it

    performs an additional calculation to find a value that it can use to adjust all

    the signals so that the four lines intersect.


    Usually, this is as simple as subtracting the same amount (a second, for
    example) from each of the calculated travel times of the signals. In this
    way the GPS software can also update its own internal clock, which means
    that we have not only an accurate positioning device but also, in effect,
    an atomic clock in the palm of our hands.
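
    The clock correction can be illustrated by treating the receiver clock
    error as a fourth unknown next to the three position coordinates. The
    sketch below is an assumption-laden example rather than the procedure of
    a specific receiver: the function name `solve_position_and_clock`, the
    satellite coordinates, and the Gauss-Newton solver are chosen for
    illustration, and effects such as atmospheric delay are ignored.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def solve_position_and_clock(sat_positions, measured_times, iterations=20):
    """Solve for receiver position and receiver clock error.

    measured_times are travel times as seen by the receiver: the true
    travel times plus one unknown clock error common to all satellites.
    At least four satellites are needed (three coordinates plus the clock).
    Returns (position in metres, clock error in seconds).
    """
    sats = np.asarray(sat_positions, dtype=float)
    pseudoranges = SPEED_OF_LIGHT * np.asarray(measured_times, dtype=float)
    state = np.zeros(4)  # x, y, z and the clock error expressed in metres
    for _ in range(iterations):
        offsets = state[:3] - sats
        distances = np.linalg.norm(offsets, axis=1)
        residuals = distances + state[3] - pseudoranges
        # Each row: unit vector for the position part, 1 for the clock part.
        jacobian = np.hstack([offsets / distances[:, None],
                              np.ones((len(sats), 1))])
        step, *_ = np.linalg.lstsq(jacobian, -residuals, rcond=None)
        state = state + step
        if np.linalg.norm(step) < 1e-6:
            break
    return state[:3], state[3] / SPEED_OF_LIGHT


if __name__ == "__main__":
    # Hypothetical satellite positions (metres, Earth-centred frame).
    satellites = np.array([
        [0.0, 0.0, 2.66e7],
        [2.3e7, 0.0, 1.33e7],
        [-1.15e7, 1.99e7, 1.33e7],
        [-1.15e7, -1.99e7, 1.33e7],
    ])
    receiver = np.array([1.2e6, -0.8e6, 6.2e6])
    clock_error = 1e-3  # receiver clock is one millisecond off
    times = np.linalg.norm(satellites - receiver, axis=1) / SPEED_OF_LIGHT
    position, error = solve_position_and_clock(satellites, times + clock_error)
    print(position, error)
```

    Subtracting the recovered clock error from every measured travel time is
    the common adjustment the text describes, and the same value can be used
    to correct the receiver's own clock.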

    4.3 Mapping

    Knowing where the device is in space is one thing, but it is fairly useless

    information without something to compare it with. Thus, the mapping part

    of any GPS software is very important; it is how the GPS works out possible
    routes, and allows the user to plan trips in advance.

    In fact, it is often the mapping data which elevates the price of the GPS

    solution; it must be accurate and updated reasonably frequently. There are,

    however, several kinds of map, and each is intended for different users,

    with different needs.

    Road users, for example, require that their mapping data contains accurate

    information about the road network in the region that they will be traveling

    in, but will not require detailed information about the lie of the land; they
    do not really worry about the height of hills and so forth.

    On the other hand, hiking GPS users might wish to have a detailed map of

    the terrain, rivers, hills and so forth, and perhaps tracks and trails, but not

    roads. They might also like to adorn their map with specific icons of things
    that they find along the way and wish to keep a record of, not to mention
    waypoints: locations to make for on their general route.

    Finally, marine users need very specific information relating to the sea
    bed, navigable channels, and other pieces of maritime data that enable
    them to navigate safely. Of course, the sea itself is reasonably
    featureless, but underneath quite some detail is needed to be sure that
    the boat will not become grounded.

    Fishermen also use marine GPS to locate themselves and track the movement
    of shoals of fish, both in real time and to predict where they will be the
    next day. The advent of GPS fixing has also meant that co-operative
    fishing has become much easier, where there are several boats all relaying
    their locations to each other while they locate the best fishing waters.

    Special kinds of marine GPS, known as fishfinders, also combine several
    functions in one to help fishermen. A fishfinder comprises GPS and also
    sonar, along with advanced tracking functions and storage for various
    kinds of fishing and maritime information.

    5. Requirements of GPS

    5.1 Hardware components

    Antenna

    RF Board

    RF Front End


    RF/IF down-conversion board (with FPGA)

    DSP Board

    DSP

    5.2 Software components

    Firmware

    RF Board FPGA

    DSP Board FPGA

    SW

    Signal Processing SW

    Navigation SW

    Hardware components:


    2: Architecture of Signal Tap

    Antenna

    The GPS antenna combines a planar antenna and a frequency converter,
    which translates the high-frequency phase-modulated spread spectrum
    signal of the GPS system to an intermediate frequency. This way a
    standard coaxial cable (e.g. RG58) can be used for the connection with
    the GPS clock, and a distance of up to 300 meters (with RG58) or even 700
    meters (with a low-loss cable type like RG213) between receiver and
    antenna is possible without an additional amplifier.

    Ambient temperature: -40 ... 65 °C

    Warranty: three-year warranty

    RoHS status of the product: this product is fully RoHS compliant.

    WEEE status of the product: this product is handled as a B2B category
    product. In order to secure a WEEE compliant waste disposal it has to be
    returned to the manufacturer. Any transportation expenses for returning
    this product (at

    2 RF BOARD

    RF board stands for radio-frequency printed circuit board. The operating
    frequency of an RF board is normally between 300 MHz and 3 GHz, or much
    higher, so an ordinary FR4 board cannot meet the requirements and a
    special material is needed to achieve the high frequency; boards of this
    kind are called RF boards. An RF board has excellent high-frequency
    performance because of the low dielectric tolerance and low loss of its
    material.

    RF boards are ideal for applications with higher operating frequency
    requirements. Right now, we normally use the following material:

    The fabrication process is similar to that for FR4, but because of the
    material characteristics it is much harder to metalize the through holes
    (copper plating), and the other process steps are also more complex than
    for FR4, so unique handling methods and experienced workers are needed.

    3 RF FRONT END:

    In a radio receiver circuit, the RF front end is a generic term for all the

    circuitry between the antenna and the first intermediate frequency (IF)


    stage. It consists of all the components in the receiver that process the

    signal at the original incoming radio frequency (RF), before it is converted

    to a lower intermediate frequency (IF). In microwave and satellite receivers

    it is often called the low-noise block (LNB) or low-noise downconverter

    (LND) and is often located at the antenna, so that the signal from the

    antenna can be transferred to the rest of the receiver at the more easily

    handled intermediate frequency.

    For most super-heterodyne architectures, the RF front end consists of:

    An impedance matching circuit to match the input impedance of the
    receiver with the antenna, so the maximum power is transferred from the
    antenna;

    A 'gentle' band-pass filter (BPF) to reduce input noise and image
    frequency response;

    An RF amplifier, often called the low-noise amplifier (LNA). Its primary
    responsibility is to increase the sensitivity of the receiver by
    amplifying weak signals without contaminating them with noise, so they
    are above the noise level in succeeding stages. It must have a very low
    noise figure (NF).

    The mixer, which mixes the incoming signal with the signal from a local
    oscillator (LO) to convert the signal to the intermediate frequency (IF).
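
    The effect of the mixer stage can be checked numerically. The sketch
    below is an idealised illustration only: it models the mixer as a plain
    multiplier, uses scaled-down toy frequencies instead of real GPS carrier
    values, and the function names `mixer_spectrum` and `strongest_frequency`
    are made up for this example. Multiplying two tones produces components
    at the difference and at the sum of their frequencies; the difference
    term is the intermediate frequency that later filtering would keep.

```python
import numpy as np


def mixer_spectrum(f_rf, f_lo, sample_rate, duration):
    """Multiply an RF tone by a local-oscillator tone.

    Returns (frequencies, magnitudes) of the spectrum of the product.
    """
    n = int(round(sample_rate * duration))
    t = np.arange(n) / sample_rate
    rf = np.cos(2 * np.pi * f_rf * t)   # incoming signal
    lo = np.cos(2 * np.pi * f_lo * t)   # local oscillator
    mixed = rf * lo                     # ideal multiplying mixer
    magnitudes = np.abs(np.fft.rfft(mixed)) / n
    frequencies = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    return frequencies, magnitudes


def strongest_frequency(frequencies, magnitudes, f_min, f_max):
    """Return the frequency of the largest spectral peak in [f_min, f_max]."""
    band = (frequencies >= f_min) & (frequencies <= f_max)
    return frequencies[band][np.argmax(magnitudes[band])]


if __name__ == "__main__":
    # Toy values: 200 Hz "RF" tone, 150 Hz local oscillator, 1 kHz sampling.
    freqs, mags = mixer_spectrum(200.0, 150.0, 1000.0, 1.0)
    print("difference term:", strongest_frequency(freqs, mags, 0.0, 100.0))
    print("sum term:", strongest_frequency(freqs, mags, 100.0, 500.0))
```

    With these toy values the low band peaks at the 50 Hz difference and the
    high band at the 350 Hz sum.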

    RF/IF DOWN CONVERSION:


    The LBC-4000 L-Band IF to 70 MHz IF (140 MHz optional) indoor converter
    is a 1RU 19-inch chassis with two front-panel-accessible up converter or
    down converter modules. It contains two diode OR-ed internal power
    supplies for increased reliability, and microprocessor-based Monitor &
    Control (M&C) functions.

    The LBC-4000 up converter module translates a 70 MHz IF input signal (140
    MHz optional) up to a user-selected frequency at L-Band (950 to 2000
    MHz). The L-Band output can drive the input of the Comtech EF Data
    MBT-4000 block up converter or other RF equipment with an L-Band input.

    The LBC-4000 down converter module translates an L-Band (950 to 2000 MHz)
    IF input signal down to a user-selected frequency in the 70 MHz (140 MHz
    optional) IF band. The LBC-4000 can be locked to an internal reference or
    an external 5 or 10 MHz reference signal. The LBC-4000 is an excellent
    choice for interfacing legacy 70 or 140 MHz equipment to quad-band or
    tri-band block converters.

    DSP BOARD:

    DSP boards or digital signal processor computer boards are central to the

    implementation of high-performance industrial systems. They collect and

    process digital data from many sources, and distribute the results to other


    elements of the system. There are three main sources of data in a real

    system: signals (in and out from the DSP processor), messages to

    communicate with system controllers, and messages to communicate with

    other DSP boards. Important features of DSP boards include a fast

    processor and good communication channels as DSP boards need to collect

    and distribute data from/to many different sources.

    Computer backplane or bus choices for DSP boards include PCI, ISA or
    EISA, PCMCIA, PC/104, Mac PCI, Sun SBus, PMC bus, PXI bus, Multibus, STD
    bus, VME bus, VXI or MXI bus, and the DT-Connect I and II interface. PCI
    is a local bus system designed for high-end computer systems. ISA is a
    standard for I/O buses that was set back in 1984 when IBM was the
    standard. PCMCIA devices (PC Cards) are credit-card-sized peripherals
    predominantly used in laptop computers. PC/104 gets its name from the
    desktop personal computers designed by IBM (PCs), and from the number of
    pins used to connect the cards together (104). Mac PCI is a local bus
    standard developed by the Intel Corporation. Designed by Sun in 1989, the
    SBus board was the standard I/O interconnect for Sun computers, which
    typically run under the Solaris or SunOS flavor of the UNIX operating
    system. The PMC bus is actually a form factor, not a bus: it is
    electrically the same as the PCI bus, but the shape of the card and the
    bus connectors are different. PXI is a superset of CompactPCI and adds
    timing and triggering functions, imposes requirements for documenting
    environmental tests, and establishes a standard Windows-based software
    framework. The STD bus is often referred to as the "Blue Collar Bus"
    because of its rugged design and small size; it was originally designed
    for factory and industrial environments. It uses a 16-bit architecture.
    The VME bus is a 32-bit bus used in industrial, commercial and military
    applications. Motorola developed the VME standard, with others, in the
    late 1970s. DT-Connect I and II is Data Translation's DT-Connect
    interface.

    Important processor or DSP performance specifications to consider for DSP

    boards include number of processors, clock speed, floating point

    performance, integer performance, operations, maximum addressable

    memory, and operating temperature. General features and options to

    consider when looking for DSP boards include real-time clock, interrupt

    controller, memory management unit, dual port memory, and direct

    memory access. Communications options include serial I/O ports, parallel
    I/O ports, on-board A/D converter, and on-board D/A converter. Some DSP
    boards can accept daughter boards and some DSP boards are daughter
    boards. An important environmental parameter to consider when searching
    for DSP boards is the operating temperature.

    DSP:-

    Digital signal processing algorithms typically require a large number of

    mathematical operations to be performed quickly and repetitively on a set

    of data. Signals (perhaps from audio or video sensors) are constantly

    converted from analog to digital, manipulated digitally, and then converted

    again to analog form, as diagrammed below. Many DSP applications have

    constraints on latency; that is, for the system to work, the DSP operation
    must be completed within some fixed time, and deferred (or batch)
    processing is not viable.

    A simple digital processing system

    Most general-purpose microprocessors and operating systems can execute
    DSP algorithms successfully, but are not suitable for use in portable devices

    such as mobile phones and PDAs because of power supply and space

    constraints. A specialized digital signal processor, however, will tend to

    provide a lower-cost solution, with better performance, lower latency, and

    no requirements for specialized cooling or large batteries.

    The architecture of a digital signal processor is optimized specifically for

    digital signal processing. Most also support some of the features as an

    applications processor or microcontroller, since signal processing is rarely

    the only task of a system. Some useful features for optimizing DSP

    algorithms are outlined below.

    SOFTWARE COMPONENTS:-

    FIRMWARE:

    Firmware is software that is embedded in hardware. You can update the
    firmware in most GPS receivers. Firmware is the software that controls
    how hardware works and responds to inputs. It's called firmware instead
    of software because users generally aren't supposed to play around with
    it. But you're not just any old user, are you? Almost all electronic
    hardware contains some form of firmware. A television remote control
    contains firmware that controls what signals are sent via IR depending on
    what button is pressed. A cell phone contains a lot of firmware
    controlling cell access, phone books, security, and much, much more.

    A GPS contains a lot of firmware controlling many of the key functions of

    the device (as shown in Figure 6-1):

    Reception of satellite data

    Decoding of positional information

    Processing of data

    Conversion of data into different formats

    Interpretation and display of information

    External communication with devices

    Storing and managing route/waypoint data

    RF BOARD FPGA:-

    The FPGA (Field-Programmable Gate Array) implementation of an adaptive
    filter for narrowband interference excision in Global Positioning Systems
    is described. The algorithm implemented is a delayed LMS (Least Mean
    Squares) adaptive algorithm improved by incorporating a leakage factor,
    rounding, and constant resetting of the filter weights. This was
    necessary as the original adaptive algorithm had stability problems: the
    filter weights did not remain fixed, and tended to drift until they
    overflowed, causing the filter response to degrade. Each model was first
    tested in Simulink, implemented in VHDL (VHSIC Hardware Description
    Language), and then downloaded to an FPGA board for final testing.
    Experimental measurements of anti-jam margins were obtained.
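
    The role of the leakage factor can be illustrated with a small
    floating-point model. This is a sketch under explicit assumptions and not
    the FPGA design described here: it implements a plain leaky LMS linear
    predictor, without the pipeline delay of the delayed LMS, without
    fixed-point rounding, and without the periodic weight reset. The function
    name `leaky_lms_excision` and all parameter values are chosen for the
    example.

```python
import numpy as np


def leaky_lms_excision(samples, num_taps=16, step_size=1e-4, leakage=1e-3):
    """Suppress narrowband interference with a leaky-LMS linear predictor.

    The filter predicts each sample from the previous num_taps samples.
    A narrowband interferer is predictable and gets subtracted; the
    wideband, noise-like part is not, so it stays in the error signal,
    which is returned as the cleaned output together with the weights.
    """
    x = np.asarray(samples, dtype=float)
    weights = np.zeros(num_taps)
    output = np.zeros_like(x)
    for n in range(num_taps, len(x)):
        past = x[n - num_taps:n][::-1]        # most recent sample first
        error = x[n] - weights @ past         # prediction error = output
        # Leaky LMS update: the (1 - step_size * leakage) factor pulls the
        # weights towards zero and counteracts slow weight drift.
        weights = ((1.0 - step_size * leakage) * weights
                   + step_size * error * past)
        output[n] = error
    return output, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    count = 20_000
    wideband = rng.standard_normal(count)                         # noise-like signal
    jammer = 10.0 * np.sin(2 * np.pi * 0.05 * np.arange(count))   # strong tone
    received = wideband + jammer
    cleaned, _ = leaky_lms_excision(received)
    half = count // 2
    print("input power: ", np.mean(received[half:] ** 2))
    print("output power:", np.mean(cleaned[half:] ** 2))
```

    In this model a larger leakage value keeps the weights more tightly
    bounded but also biases them away from their optimum.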

    Single-channel adaptive filtering techniques have been shown to be
    effective for mitigating multiple narrowband interferences to GPS systems
    (Robert, 1999; Landry et al., 1997). Since they can be seamlessly
    inserted between the existing GPS antenna and receiver, they offer a
    cost-effective solution that involves minimum system disruption. However,
    to become a fully practical solution, the size and power demands of their
    hardware implementation should be minimised. FPGAs (Field-Programmable
    Gate Arrays) offer the potential for achieving the goals of small size,
    weight and power consumption, and in this paper the implementation of an
    adaptive filter using an FPGA device is described.

    In Section 2 an experimental system, termed mini-

    GISMO, is described and an overview of the system

    architecture is presented. The use of interpolation and

    decimation filters within the FPGA is also described.

    The main adaptive algorithm implemented is the delayed LMS (Least Mean
    Squares) adaptive algorithm (Haykin, 2002). As discussed in Section 3,
    this algorithm is well suited to FPGA implementations. However,
    particularly in the presence of strong interferences, the original
    adaptive algorithm had stability problems (Sethares et al., 1986), as on
    convergence the filter weights did not remain fixed, and tended to drift
    until they overflowed, causing the filter response to degrade. In Section
    4 it is shown that incorporating a leakage term (Nascimento et al., 1999)
    and rounding instead of truncating resulted in the weights remaining near
    the optimal values. However, this solution introduced memory effects,
    which produced a second null when the interference frequency was changed.
    Resetting the weights every second removed this problem and appeared to
    have the least stability effects, as a short pulse in the output every
    second didn't cause any undesirable results in this algorithm. Also, the
    bit allocations were optimised to reduce the quantisation error. By
    reducing the quantisation noise power, a smaller leakage factor is
    required to stabilise the adaptive algorithm, resulting in a slower drift
    of the weight towards

    DIGITAL SIGNAL :-

    Digital signal processing has traditionally been done using enhanced

    microprocessors. While the high volume of generic product provides a low cost

    solution, the performance falls seriously short for many applications. Until recently,

    the only alternatives were to develop custom hardware (typically board level or

    ASIC designs), buy expensive fixed-function processors (e.g. an FFT chip), or use

    an array of microprocessors.

    Signal processing:

    The antenna preamplifier of a GPS receiver generally converts the
    incoming signal (see Figure 1 below) to a signal of a lower frequency.
    This intermediate frequency is obtained by mixing the incoming signal
    with a pure sinusoidal signal generated by the local oscillator (the
    quartz "clock"). The resulting beat frequency is the difference between
    the original (Doppler-shifted) received carrier frequency and the local
    oscillator frequency. The intermediate or beat frequency is then
    processed by the signal tracking e

    NAVIGATIONAL SIGNAL PROCESSING:

    Digital signal processing is the processing of digitised discrete-time
    sampled signals. Processing is done by general-purpose computers or by
    digital circuits such as ASICs, field-programmable gate arrays or
    specialized digital signal processors (DSP chips). Typical arithmetical
    operations include fixed-point and floating-point, real-valued and
    complex-valued, multiplication and addition. Other typical operations
    supported by the hardware are circular buffers and look-up tables.
    Examples of algorithms are the fast Fourier transform (FFT), finite
    impulse response (FIR) filter, infinite impulse response (IIR) filter,
    and adaptive filters such as the Wiener and Kalman filters.
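
    Two of the items above, the FIR filter and the circular buffer, fit
    together naturally and can be sketched in plain Python. The class name
    `CircularBufferFIR` and the moving-average taps in the usage note are
    illustrative assumptions; a DSP chip would perform the same
    multiply-accumulate loop with hardware support for the wrap-around
    addressing.

```python
class CircularBufferFIR:
    """FIR filter that keeps its input history in a circular buffer."""

    def __init__(self, taps):
        self.taps = list(taps)
        self.history = [0.0] * len(self.taps)
        self.newest = 0  # index of the most recently written sample

    def process(self, sample):
        """Push one input sample and return one filtered output sample."""
        size = len(self.taps)
        self.newest = (self.newest + 1) % size
        self.history[self.newest] = sample     # overwrite the oldest sample
        total = 0.0
        for k, tap in enumerate(self.taps):
            # taps[k] multiplies the input from k samples ago; the modulo
            # wraps the index around instead of shifting the whole buffer.
            total += tap * self.history[(self.newest - k) % size]
        return total


if __name__ == "__main__":
    moving_average = CircularBufferFIR([0.25, 0.25, 0.25, 0.25])
    print([moving_average.process(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0]])
```

    With four taps of 0.25 the filter is a four-point moving average, so the
    inputs 1, 2, 3, 4, 5 produce 0.25, 0.75, 1.5, 2.5, 3.5.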


    - Statistical signal processing: analyzing and extracting information from signals and noise based on their stochastic properties
    - Audio signal processing: for electrical signals representing sound, such as speech or music
    - Speech signal processing: for processing and interpreting spoken words
    - Image processing: in digital cameras, computers, and various imaging systems
    - Video processing: for interpreting moving pictures
    - Array processing: for processing signals from arrays of sensors
    - Time-frequency signal processing: for processing non-stationary signals [3]
    - Filtering: used in many fields to process signals

    Software based receiver:

    Global Navigation Satellite Systems (GNSS) have become a necessary tool for navigation and positioning in both civilian and military applications. The Global Positioning System (GPS) is a satellite-based navigation system. It computes the range from the receiver to multiple satellites by multiplying the time a GPS signal takes to travel from each satellite to the receiver by the velocity of light. GPS is already widely used in both the civilian and military communities for positioning, navigation, timing and other position-related applications, and it has proved its reliability, availability and good accuracy for many applications. For this reason, other regions are planning their own systems: Europe is going to launch a new satellite-based navigation system called Galileo, and there is also a proposal to launch the Quasi-Zenith Satellite System for navigation in Japan.
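    The range computation described above can be sketched in a few lines. This is only an illustration: the function name and the 70 ms travel time are assumptions chosen for the example, and a real receiver must also correct for clock errors and atmospheric delays, which are ignored here.

```python
# Speed of light in vacuum, in metres per second (exact by definition).
SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_from_travel_time(travel_time_s):
    """Range in metres = signal travel time (s) x velocity of light (m/s)."""
    return travel_time_s * SPEED_OF_LIGHT_M_S


# Hypothetical travel time of 70 ms from satellite to receiver.
example_range_m = range_from_travel_time(0.070)
print(round(example_range_m / 1000.0), "km")
```

    A travel time of a few tens of milliseconds corresponds to a range of roughly twenty thousand kilometres, which is why even a microsecond of timing error matters (about 300 m).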

    It is necessary to simulate and analyze new signal structures for the development of new satellite-based navigation systems. In the research community, many researchers come up with


    new ideas and algorithms for improving GPS accuracy by mitigating or minimizing various types of errors and effects such as multipath. However, it is quite difficult to implement user-developed algorithms in current hardware-based GPS receivers, because these receivers contain ASICs that provide the least user flexibility. Thus, software-based GPS receivers are needed, at least in the research community, for easy and quick implementation, simulation and analysis of algorithms, parameters and threshold values. Since CPU processing power is increasing at reduced cost, it is now possible to build real-time software-based GPS receivers, at least for static or low-dynamic environments. As predicted by Moore's Law, CPU power keeps increasing; if this trend continues, it will become possible to develop real-time software-based GPS receivers for all environments. In this paper, we briefly introduce the architecture of a software-based GPS receiver (SGR) and its signal processing technique, and give some examples of simulation using an SGR.

    2 SOFTWARE-BASED GPS RECEIVER ARCHITECTURE

    The architecture of a conventional GPS receiver is shown in Figure 1. It consists of an RF front-end and a signal processor, both built on IC chips. The outputs of the signal processor are either displayed directly on the receiver display unit or fed to a PC for further processing or integration with other devices. Since the signal processing is all done inside the hardware chips, users have limited ability to change parameters or install new algorithms. Figure 2 shows the architecture of a software-based GPS receiver (SGR). It consists of an RF front-end device, which is still a hardware component; the rest of the signal processing is done in a high-level programming language such as C/C++ or Matlab. Comparing Figure 1 and Figure 2, the only difference is the replacement of the hardware signal-processing components by software. The RF front-end is still needed because present CPUs cannot process the signal directly from the antenna at about 1.5 GHz. Figure 3 shows the merits and demerits of hardware-based and software-based receivers. A hardware-based receiver is the fastest in signal processing but has the least flexibility, whereas a software-based receiver has the highest flexibility but the slowest processing speed. FPGA-based receivers are a compromise between the two.


    GPS SIGNAL PROCESSING:


    The L1-band GPS signal is transmitted at about 1.5 GHz (1575.42 MHz). Since the receiver cannot process the signal directly at this frequency, the RF front-end device down-converts it to a much lower frequency of about 4 MHz, called the Intermediate Frequency (IF). During this conversion the signal is also digitized (A/D conversion) at a resolution of 1 bit, 2 bits or more and sampled at some frequency, e.g. 16 MHz. The down-converted signal is used for all further processing. The first task of signal processing is to identify the visible satellites by finding each satellite's code phase and Doppler frequency. The code phase gives the beginning of the C/A code.

    Since the satellites are moving all the time (and the receiver may also be moving), there is always some Doppler frequency shift. The rough estimation of code phase and Doppler frequency is called acquisition. For acquisition, we generate the C/A code for a satellite and modulate it onto the carrier wave. This receiver-generated signal is then correlated with the incoming signal, and the correlation value is evaluated to decide whether the satellite is visible. If the satellite is judged visible, its code phase and Doppler frequency are noted. Once acquisition is completed successfully, we know which satellites are visible at that time.
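    The acquisition search described above can be sketched as a serial search over Doppler bins and code phases. This is a simplified illustration under several assumptions that are not in the original text: the spreading code is a random +/-1 stand-in of 127 chips (not a real 1023-chip C/A Gold code), the signal is complex baseband with one sample per chip rather than real IF samples, and all function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def make_code(length, seed):
    """Stand-in spreading code of random +/-1 chips (not a real C/A Gold code)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def simulate_signal(code, code_phase, doppler_hz, fs_hz, noise_std, seed):
    """One code period of noisy complex samples, one sample per chip."""
    rng = random.Random(seed)
    n = len(code)
    samples = []
    for k in range(n):
        chip = code[(k - code_phase) % n]
        carrier = cmath.exp(2j * math.pi * doppler_hz * k / fs_hz)
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        samples.append(chip * carrier + noise)
    return samples


def acquire(samples, code, fs_hz, doppler_bins_hz):
    """Serial search over Doppler bins and code phases.

    Returns (peak_power, code_phase, doppler_hz) of the strongest correlation.
    """
    n = len(code)
    best = (-1.0, None, None)
    for doppler in doppler_bins_hz:
        # Remove the trial carrier once per Doppler bin.
        wiped = [s * cmath.exp(-2j * math.pi * doppler * k / fs_hz)
                 for k, s in enumerate(samples)]
        for phase in range(n):
            # Correlate with the local code replica shifted by `phase` chips.
            corr = sum(wiped[k] * code[(k - phase) % n] for k in range(n))
            power = abs(corr) ** 2
            if power > best[0]:
                best = (power, phase, doppler)
    return best


FS_HZ = 127_000.0                            # 127 chips in 1 ms (hypothetical)
code = make_code(127, seed=1)
signal = simulate_signal(code, code_phase=40, doppler_hz=2000.0,
                         fs_hz=FS_HZ, noise_std=0.5, seed=2)
bins = [-5000 + 1000 * i for i in range(11)]  # 1000 Hz search step
peak, phase, doppler = acquire(signal, code, FS_HZ, bins)
print(phase, doppler)
```

    Practical software receivers usually replace the inner loop with an FFT-based circular correlation, which tests all code phases of one Doppler bin at once; the serial form is shown here only because it follows the description most directly.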

    In the next step, we track the visible satellites continuously to fine-tune the code phase and Doppler frequency. This process is called tracking. The tracking process removes the C/A code and carrier wave from the GPS signal, so the remaining signal represents navigation data and some noise. From this output we can extract the navigation data parameters needed to compute the pseudorange from the receiver to the satellite. Please refer to [2] for details on GPS signal processing. Figure 4 (a) shows raw GPS data collected from the antenna and down-converted to IF. This data looks just like noise, and no information can be obtained from it until acquisition and tracking are performed, because the GPS signal level is below the noise level. Figure 4 (b) shows the result of acquisition on the raw data of Figure 4 (a). The acquisition output shows the code phase (the beginning point of the C/A code) and the Doppler frequency. Figure 5 shows tracking results. The


    tracking result extracts navigation data bits as shown in Figure 6, which are simply the

    sequence


    SGR AS RESEARCH AND SIMULATION TOOL

    We mentioned earlier that an SGR is much more flexible than a conventional receiver. We discuss and give some examples of how this flexibility is used to extract information that is otherwise not available from a conventional GPS receiver. Figure 7 shows some of the fundamental parameters of signal processing in an SGR. The IF frequency and sampling frequency are fixed for a particular front-end device. By changing these two values, we can use the same software tool with different types of front-end device that acquire the GPS signal from the antenna. Below we discuss some of these flexibilities point by point.


    Weak Signal Processing:

    The Doppler frequency search step, the code-period integration time for acquisition, the noise bandwidth and the code-period integration time for tracking all depend on the signal quality. If the signal level is normal, we can use a 1000 Hz Doppler frequency step and a 1 ms code-period integration time for acquisition.

    However, if the signal is weak, we need to reduce the Doppler frequency search step and increase the code-period integration time in acquisition. For example, if we integrate raw data for 3 ms for acquisition, we need to reduce the Doppler frequency search step to 300 Hz. This increases the processing time but helps in detecting weak signals. We also need to increase the integration time in the tracking loop. Changing parameter values in this way is not possible in a conventional GPS receiver. Figure 8 shows an example of increasing the integration time from 1 ms to 3 ms. With 1 ms integration, the correlation peak is not clear enough to decide on satellite visibility. With 3 ms integration, we see a very clear correlation peak and can decide that the satellite is detected.

    Figure 7: Basic parameters that can be changed by a user in an SGR for various types of signal processing and simulation.

    Figure 8: (a) Signal acquisition using 1 ms integration time. The result is not clear and shows multiple peaks. (b) Signal acquisition using 3 ms integration time with the same data as in (a). The correlation peak is now quite clear, and a decision can be made regarding the visibility of the satellite.
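    The trade-off between integration time and Doppler search step can be illustrated with a small sketch. It assumes a step of roughly 1/T for an integration time T, which is only a rule of thumb inferred from the figures quoted above (1000 Hz at 1 ms, about 300 Hz at 3 ms); the function name and the +/-5000 Hz search range are hypothetical.

```python
def doppler_bins(search_range_hz, integration_ms):
    """Doppler bins covering +/- search_range_hz with a step of 1/T.

    T is the integration time; the 1/T step is an assumed rule of thumb.
    """
    step_hz = 1000.0 / integration_ms
    # Number of whole steps on each side of zero, computed with integers
    # to avoid floating-point rounding at the edge of the range.
    half = (search_range_hz * integration_ms) // 1000
    return [i * step_hz for i in range(-half, half + 1)]


bins_1ms = doppler_bins(5000, 1)   # 1000 Hz step
bins_3ms = doppler_bins(5000, 3)   # about 333 Hz step
print(len(bins_1ms), len(bins_3ms))
```

    With three times as many bins and three times as many samples per correlation, the search cost grows by roughly a factor of nine, which is the price paid for the clearer correlation peak.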

    Multipath Mitigation Technique

    In spite of continuing improvements in GPS receiver and antenna technology, multipath has remained a major source of error in GPS positioning. To minimize this error, we need to understand multipath behaviour and the corresponding signal characteristics. We can analyze the signal using various types of correlators (narrow, wide, etc.) by defining the chip delay (listed in Figure 7) between the early and late correlators, and compute the correlation peak for every code period. Without multipath, the correlation peak would be a perfect triangle. With multipath, the two sides of the triangle are neither symmetrical nor straight: the shape and amplitude of the triangle are deformed by the amount of multipath and by noise. Thus, by analysing the shape of the correlation peak we can estimate the amount of multipath and develop a technique to minimize or mitigate it. In this regard, we are conducting research using left-hand and right-hand circularly polarized GPS antennas to analyze how the reflected signal (which accounts for multipath) affects the correlation peak. Refer to [1] for details of this experiment. Figure 9 shows a correlation peak obtained by processing the raw GPS signal shown in Figure 4.

    Figure 9: Correlation peak computed from the raw GPS signal for 0.5 chip delay. The peak is not a perfect triangle due to the effect of multipath.
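    The deformation of the correlation triangle can be illustrated with an idealized model. The sketch below assumes an ideal triangular autocorrelation of one chip half-width and a single reflected signal that arrives in phase with the direct signal; the relative amplitude (0.5) and extra delay (0.25 chip) are hypothetical values, and the function names are not from any receiver library.

```python
def triangle(tau_chips):
    """Ideal code autocorrelation: a triangle of height 1, +/-1 chip wide."""
    return max(0.0, 1.0 - abs(tau_chips))


def correlation(tau_chips, mp_amplitude=0.0, mp_delay_chips=0.0):
    """Direct-path triangle plus one in-phase, delayed reflection."""
    return triangle(tau_chips) + mp_amplitude * triangle(tau_chips - mp_delay_chips)


def early_minus_late(tau_chips, spacing_chips, mp_amplitude=0.0, mp_delay_chips=0.0):
    """Difference between the early and late correlator outputs."""
    early = correlation(tau_chips - spacing_chips / 2, mp_amplitude, mp_delay_chips)
    late = correlation(tau_chips + spacing_chips / 2, mp_amplitude, mp_delay_chips)
    return early - late


# Early/late spacing of 0.5 chip, evaluated at the true code phase (tau = 0).
clean = early_minus_late(0.0, 0.5)
distorted = early_minus_late(0.0, 0.5, mp_amplitude=0.5, mp_delay_chips=0.25)
print(clean, distorted)
```

    Without multipath the early and late outputs balance at the true code phase; with the reflection they no longer do, so a tracking loop that drives this difference to zero settles at a biased code phase. Comparing different correlator spacings on the same data is one way to study this effect in an SGR.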

    Remote Sensing using GPS Signal :

    Recently, GPS signals have been used for remote sensing. GPS signals are transmitted in two different bands at about 1.2 GHz and 1.5 GHz, which is similar to microwave remote sensing. They are transmitted with right-hand circular polarization. When the signal is reflected by an object, its polarization may change from right-hand to left-hand and vice versa. Thus, by observing the reflected signal with two antennas, one right-hand and one left-hand polarized, we can predict the type of object that reflects the GPS signal. Using this technique, soil moisture and wind velocity have been estimated. Refer to [3] for details on this research. This type of analysis needs a software-based receiver so that the received signal can be processed with different parameter values using our own algorithms. The reflected signals are much weaker than the direct signal, so a conventional receiver cannot be used. We also need to compute many intermediate values, such as the shape of the correlation peak and its amplitude, rather than the position of the GPS antenna itself. This is possible only in software-based receivers.

    Besides the analyses and simulations listed above, we need a software-based receiver for analyzing noise and interference (jamming), simulating new codes, studying the limitation of navigation data length, and many other things. In the current GPS signal, the navigation data bit length is limited to 20 ms. This imposes a restriction on data integration beyond 20 ms during the tracking process.


    However, for tracking a very weak signal, we do need to integrate over a longer data period. Thus we need to see what happens if we change the navigation data length from 20 ms to something else in a new design. Alternatively, we can have a data-less component in one of the phases of the signal, which is implemented in the forthcoming new GPS signals. This assists the receiver in processing weak signals and makes it capable of indoor positioning. All of this can be simulated with a software-based receiver. In an SGR, we can also generate different types of signals for interference analysis, which helps us understand how signals of different types and strengths affect GPS signal processing. For example, we can simulate the effect of a TV signal on GPS, or analyze the effect of other GNSS signals on GPS and vice versa. These, again, are possible only in software-based receivers.

    part -11


    Flow chart of gps working:


    Models for gps:


    SIGNAL ANALYSIS TOOL:


    Some speech recognizers can dynamically adjust to the voice of a speaker and often can store adaptation data for that voice for future use. The speaker data may also include lists of words more often spoken by the user.

    Speech Recognizer configuration:

    It holds some standard settings and functions for the recognizer.

    Lexicons: A lexicon holds the pronunciations of the words referenced by the grammar.

    Other speech processing capabilities: These include language recognition, speaker identification and speaker verification. These capabilities may be associated with the recognizer.

    1.7.3 Speech Recognizer :-

    It is software which performs the tasks involved in speech recognition. Speech recognizer software may be available as a free product or may have to be bought. This software varies from platform to platform.

    e.g., for Windows:

    Dragon NaturallySpeaking, Microsoft Speech Recognition, VoiceAssist for Windows from Creative Labs.

    For LINUX:

    IN CUBE Pure Speech


    Myers HMM Software

    For integrated circuit and dedicated hardware:

    Speech commander Voice control system Recognition

    1.8 Applications

    Speech recognition is an emerging technology in computer science. It has some weaknesses, but despite them it is used in many areas to solve problems. These are listed below:

    - Playing back simple information: In many call centres, customers require quick information and do not actually want to speak to a live operator. Speech recognition is useful for providing such quick information.
    - Call steering: With speech recognition, callers can choose a self-service route or alternatively say what they want and be directed to the correct department or individual.
    - Defence uses: Speech recognition is also used in defence applications, where it allows an action to be performed quickly by voice rather than by pressing buttons or using other input methods.
    - Artificial intelligence: Speech recognition is used in many applications of artificial intelligence and is most useful in robotics for interacting with robots and machines. In fact, speech recognition is a part of artificial intelligence.
    - Hands-free computing: Speech recognition provides a user interface in which the user interacts with the computer by dictation.
    - Language learning: A person who wants to learn a new language can use a speech recognition system.
    - People with disabilities: People with physical disabilities can benefit from speech recognition systems. They are especially useful for people who have difficulty using their hands or who are paralysed.
    - Court reporting: Replacing the court reporter with a computer.


    1.9 Feasibility Study

    Economic feasibility:

    To design a speech recognition system we require the following:

    1. Microphone (Rs. 600 - 5000)
    2. Sound card (Rs. 1200 - 25000)
    3. Computer (min. 400 MHz processor, Rs. 15000 or above)
    4. Good programmers.

    At the lowest prices this comes to Rs. 16800 (600 + 1200 + 15000) plus the programmers' pay, so the system is economically feasible.

    Social Feasibility:

    The computer can be kept decent by adding only vocabulary that is socially acceptable.

    Technical feasibility:

    - The microphones available today are sufficient.
    - The processing speed of today's processors is more than enough.
    - The sound cards available can perform A/D conversion very efficiently.

    PART -II

    System Analysis


    2.1 Components of speech recognition system:

    FIG 2.1 : components of speech recognition system

    Speech recognition can be done by:

    - representation,
    - modelling and
    - searching

    Three models are used to recognize speech and to match the correct word. These models are:

    - Acoustic Model
    - Lexical Model
    - Language Model

    2.1.1 Acoustic Model:

    In this type of model, a stored pattern of representation is kept for each word. The pattern obtained after processing the input is matched against these stored patterns, and the stored pattern with the minimum acoustic difference is selected. Every processed pattern has an associated probability of occurring in speech, and the word with the maximum probability is chosen. This type of model uses pattern matching for recognizing speech.
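    The pattern-matching idea can be sketched as a nearest-template search. This is a minimal illustration, not the method of any particular recognizer: the three-element feature vectors, the word list and the use of Euclidean distance as the "acoustic difference" are all assumptions made for the example.

```python
import math


def acoustic_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, templates):
    """Return (word, distance) for the stored pattern closest to the input."""
    return min(((word, acoustic_distance(features, pattern))
                for word, pattern in templates.items()),
               key=lambda pair: pair[1])


# Hypothetical stored patterns, one per word.
TEMPLATES = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
    "stop": [0.5, 0.5, 0.9],
}

word, dist = recognize([0.85, 0.15, 0.35], TEMPLATES)
print(word)
```

    Real utterances vary in length, so practical systems align the input with each template (for example by dynamic time warping) before measuring the difference; the fixed-length vectors here avoid that step for brevity.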

    2.1.2 Lexical Model:-

    Lexical means related to words or a dictionary. The lexical model is a neural-network-based approach to modelling the lexicon of the language with a limited amount of training data. The training data is necessarily a database of the language together with the phone set of the language. The neural network learns how the phones of the language vary with different instances of context. The trained network is capable of recognizing the pronunciation of a word given its native phonetic composition.


    Example:

    Consider the following words:

    Word        Phone sequence
    START       S-T-AA-R-TD
    STARTING    S-T-AA-R-DX-IX-NG
    STARTED     S-T-AA-R-DX-IX-DD
    STARTUP     S-T-AA-R-T-AX-PD
    START-UP    S-T-AA-R-T-AX-PD

    FIG 2.2 Lexical Tree Structure of above words
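    A lexical tree for the words above can be sketched as a prefix tree (trie) over phone sequences, so that words sharing the prefix S-T-AA-R share the same nodes. The dictionary-based node layout and the function names below are assumptions made for illustration, not the structure used by any particular recognizer.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree in which each node is one phone."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones.split("-"):
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []})
        node["words"].append(word)   # word ends at this node
    return root


def count_nodes(node):
    """Number of phone nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


LEXICON = {
    "START": "S-T-AA-R-TD",
    "STARTING": "S-T-AA-R-DX-IX-NG",
    "STARTED": "S-T-AA-R-DX-IX-DD",
    "STARTUP": "S-T-AA-R-T-AX-PD",
    "START-UP": "S-T-AA-R-T-AX-PD",
}

tree = build_lexical_tree(LEXICON)
total_phones = sum(len(p.split("-")) for p in LEXICON.values())
print(count_nodes(tree), "nodes for", total_phones, "phones")
```

    Because the common prefixes are stored once, the search has to evaluate far fewer phone nodes than the total number of phones in the word list.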


    2.1.3 Language Model :-

    The language model attempts to convey the behaviour of the language. It aims to predict the occurrence of specific word sequences possible in the language. From the perspective of the recognition system, the language model helps narrow down the search space for a valid combination of words. Most speech recognition systems use stochastic language models (SLMs). SLMs use the N-gram language model, in which the probability of occurrence of a word is assumed to depend only on the previous N-1 words.

    Language models help a speech recognizer figure out how likely a word sequence is, independent of the acoustics. Many candidates can be eliminated, and other words can be given higher probabilities. This lets the recognizer make the right guess when two different sentences sound the same.

    For example:

    It's fun to recognize speech?

    It's fun to wreck a nice beach?
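    How an N-gram model separates the two sentences above can be sketched with a bigram model (N = 2). The three-sentence training corpus is hypothetical and far too small for real use, and add-one smoothing is an assumption chosen here so that unseen word pairs such as "to wreck" still get a small non-zero probability.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts; return counts and vocabulary size."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def log_probability(sentence, model):
    """Log-probability of a sentence under the bigram model (add-one smoothing)."""
    bigrams, contexts, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:]))


# Hypothetical training corpus.
CORPUS = [
    "it's fun to recognize speech",
    "we recognize speech every day",
    "a nice beach is fun",
]

model = train_bigram(CORPUS)
score_speech = log_probability("it's fun to recognize speech", model)
score_beach = log_probability("it's fun to wreck a nice beach", model)
print(score_speech > score_beach)
```

    Both sentences may match the acoustics equally well, but the model has seen "recognize speech" and has never seen "to wreck", so the first sentence receives the higher score and would be preferred.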

    Another type of language model is the Hidden Markov Model (see Appendix A).


    2.2 Flow chart of the System

    Working of the Speech Recognition System

    FIG 2.3 Flow Chart of the System


    In the matching and comparison step, we may obtain two or more candidate units (words, phones or utterances, depending on the approach in use). These matched units are stored in memory, and the various models are applied to select the appropriate unit, which forms the recognized output. Depending on the result of the matching and comparison step, the corresponding action can be performed.


    2.3 Data Flow Diagrams For Speech Recognition:

    2.3.1 Level 0 DFD:

    FIG 2.4 : Level 0 Data Flow diagram for speech Recognition


    2.3.2 Level 1 DFD:

    FIG 2.5 : Level 1 Data Flow Diagram for Speech Recognition


    2.3.3 Level 2 DFD:


    FIG 2.6 : Level 2 Data Flow Diagram for Pattern matching


    2.4 Training data types:-

    The speech grammar can be designed in different ways, which vary in the size of the grammar and the accuracy with which speech is to be recognized. These models are:

    1. Whole-word models:

    Features of whole words are stored in the grammar, so when the features of the sound signal are extracted, whole-word features are calculated and compared. This type of model is suitable for small-vocabulary recognition. A high accuracy rate can be attained with whole-word models.

    2. Phone models:

    Here the small set of speech sounds that can be distinguished by the speakers of a particular language is used for the speech grammar. This is suitable for large-vocabulary recognition. The accuracy rate for this model is very low.

    3. Syllable models:

    In this model, units larger than a phone are used for feature extraction. This model can be used for a large language grammar with a high accuracy rate.


    PART-III

    System Design


    3.1 Interface Design :

    This project report is a case study of an existing speech recognition system. In this project we have taken as reference Dragon NaturallySpeaking, a software package for speech recognition on Windows. The latest version of this software is version 10.0, which has the following interface design:

    Icon Design:

    FIG 3.1 : Startup Interface Design:

    FIG 3.2 : Training Interface Design:


    3.2 Using the interface of Dragon NaturallySpeaking:

    When you start Dragon NaturallySpeaking, you have to perform the following steps:

    - Create a user profile for a user. The interface is as shown in FIG 3.3.
    - After creating a user, if you have selected training, dictate a short text into the specified microphone. This helps the software recognize the user.
    - The software then prompts the user to check the microphone by dictating a little text.
    - Finally, the software creates some user files and prepares itself for first use.


    3.3 Utilities of Dragon NaturallySpeaking software:

    3.3.1 Tasks performed by Dragon NaturallySpeaking software:

    Dragon NaturallySpeaking performs the following tasks:

    - Speech-to-text conversion
    - Built-in commands that perform certain tasks
    - Dictation

    3.3.2 Additional tools in Dragon NaturallySpeaking software:

    - Add a new user
    - Add a new command
    - Manage users (creating, deleting, other changes)
    - Train a user

    This software is 97% accurate, which is why it is the most widely used speech recognition package.

    3.4 Technical Features of Dragon NaturallySpeaking:

    - Sampling: 512 samples (16 kHz sampling rate)
    - 30 ms window for frequency-domain analysis
    - It is programmed in the C language and uses a Hidden Markov Model and Viterbi search
    - It contains the following basic files:
        - mdef.c: definition of basic phones on the basis of HMM, in the form of a matrix
        - dict.c: the pronunciation dictionary
        - lextree.c: lexical tree search
        - hmm.h: implementation of HMM using Viterbi search

    PART-IV

    Appendices


    Appendix A

    Hidden Markov Model (HMM):

    An HMM is a statistical model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states. Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting and gesture recognition. The input and output of an HMM are:

    Input: A sequence of feature vectors.

    Output: The words with the highest probability of having been spoken.

    An HMM has the following four elements:

    - States (words, phones or syllables)
    - State transition probabilities
    - Symbol emission probabilities
    - Observations (features of the signal)

    In an HMM we find the most probable states (words) on the basis of the observations (audio input).


    FIG A.1 Probabilistic parameters of a hidden Markov model (example)

    x: states
    y: possible observations
    a: state transition probabilities
    b: output probabilities

    The diagram below shows the general architecture of an instantiated HMM. Each oval shape represents a random variable that can adopt any of a number of values. The random variable x(t) is the hidden state at time t (with the model from the above diagram, x(t) ∈ {x1, x2, x3}). The random variable y(t) is the observation at time t (y(t) ∈ {y1, y2, y3, y4}). The arrows in the diagram (often called a trellis diagram) denote conditional dependencies.

    From the diagram, it is clear that the conditional probability distribution of the hidden variable x(t) at time t, given the values of the hidden variable x at all times, depends only on the value of the hidden variable x(t − 1). This is called the Markov property. Similarly, the value of the observed variable y(t) depends only on the value of the hidden variable x(t) (both at time t).

    FIG A.2

    There are three main functions in an HMM:

    1. Evaluation:

    Given the observation sequence O and the model λ, how do we efficiently compute P(O | λ), the probability of the observation sequence given the model?

    - Enumerate all possible state sequences S of length T
    - Sum up the probabilities of all these sequences


    Probability of path S (calculate for all paths):

    State sequence probability.
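    The evaluation step can be sketched in two ways: by enumerating every state sequence as described above, and by the forward algorithm, which gives the same probability with far less work. The two-state model and all of its probabilities below are hypothetical values chosen for illustration; only the x/y naming follows the figure legend.

```python
from itertools import product

# Hypothetical model parameters (start, transition, emission probabilities).
STATES = ("x1", "x2")
START = {"x1": 0.6, "x2": 0.4}
TRANS = {"x1": {"x1": 0.7, "x2": 0.3}, "x2": {"x1": 0.4, "x2": 0.6}}
EMIT = {"x1": {"y1": 0.5, "y2": 0.4, "y3": 0.1},
        "x2": {"y1": 0.1, "y2": 0.3, "y3": 0.6}}


def path_probability(path, obs):
    """Joint probability of one state sequence and the observations."""
    prob = START[path[0]] * EMIT[path[0]][obs[0]]
    for prev, cur, o in zip(path, path[1:], obs[1:]):
        prob *= TRANS[prev][cur] * EMIT[cur][o]
    return prob


def evaluate_by_enumeration(obs):
    """Sum the probabilities of all state sequences of length T."""
    return sum(path_probability(path, obs)
               for path in product(STATES, repeat=len(obs)))


def evaluate_forward(obs):
    """Forward algorithm: same result, one pass over the observations."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: EMIT[s][o] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())


observations = ["y1", "y2", "y3"]
p_enum = evaluate_by_enumeration(observations)
p_forward = evaluate_forward(observations)
print(p_enum, p_forward)
```

    Enumeration visits every one of the N^T state sequences (N states, T observations), while the forward pass needs only about N x N x T multiplications, which is why it is the usual choice for sequences of realistic length.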

    2. Decode:

    - Find the sequence of hidden states that most probably generated an observed sequence
    - Given the parameters of the model and a particular output sequence, find the state sequence that is most likely to have generated that output sequence
    - This requires finding a maximum over all possible state sequences
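    The decoding step is commonly solved with the Viterbi algorithm, which finds the maximum over all state sequences without enumerating them. The sketch below uses a hypothetical two-state model; the probabilities are illustrative values only, and the function returns both the best path and its probability.

```python
# Hypothetical model parameters (start, transition, emission probabilities).
STATES = ("x1", "x2")
START = {"x1": 0.6, "x2": 0.4}
TRANS = {"x1": {"x1": 0.7, "x2": 0.3}, "x2": {"x1": 0.4, "x2": 0.6}}
EMIT = {"x1": {"y1": 0.5, "y2": 0.4, "y3": 0.1},
        "x2": {"y1": 0.1, "y2": 0.3, "y3": 0.6}}


def viterbi(obs):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability, path) of the best sequence ending in state s.
    best = {s: (START[s] * EMIT[s][obs[0]], [s]) for s in STATES}
    for o in obs[1:]:
        step = {}
        for s in STATES:
            # Pick the best predecessor state for s.
            prob, path = max(
                ((best[r][0] * TRANS[r][s], best[r][1]) for r in STATES),
                key=lambda item: item[0])
            step[s] = (prob * EMIT[s][o], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


probability, path = viterbi(["y1", "y2", "y3"])
print(path, probability)
```

    At each time step only the best path into each state is kept, so the work grows linearly with the length of the observation sequence instead of exponentially.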

    3. Learning:

    - Adjust the model parameters to maximize the joint probability
    - First make an initial guess of the parameters (which may be entirely wrong)
    - Refine it by assessing its worth, attempting to reduce the errors it produces when fitted to the given data
    - Feed in sample speech data along with the phonemes of the spoken words


    Appendix B

    Digitizing the Analog signal:

    The signal must be in digital form so that the computer can process it. An analog signal is converted to a digital signal using the following steps:

    The bandlimited signal is first sampled, converting the analog signal

    into a discrete-time, continuous-amplitude signal.

    The amplitude of each sample is quantized into 2^n levels, where n is

    the number of bits used to represent a sample.

    The discrete amplitude levels are represented, or encoded, as distinct

    binary words of n bits each.

    This process is shown in the following figure:

    FIG B.1 : Block Diagram for digitizing an analog signal
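
    The three steps (sampling, quantization into 2^n levels, and encoding into n-bit words) can be sketched as follows. This is an illustrative sketch only: the function names, the uniform quantizer, the input range of -1 to 1 and the 1 Hz test tone are assumptions and are not taken from the report.

```python
import math


def sample(signal, fs, duration):
    """Sample a continuous-time function at fs samples per second."""
    count = int(round(fs * duration))
    return [signal(k / fs) for k in range(count)]


def quantize(samples, n_bits, v_min=-1.0, v_max=1.0):
    """Map each sample to one of 2**n_bits uniform level indices."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / levels
    indices = []
    for x in samples:
        idx = int((x - v_min) / step)
        indices.append(min(max(idx, 0), levels - 1))  # clip to the valid range
    return indices


def encode(indices, n_bits):
    """Represent each level index as a binary word of n_bits bits."""
    return [format(i, "0{}b".format(n_bits)) for i in indices]


# Assumed example: a 1 Hz sine sampled at 8 Hz for one second, 3 bits/sample.
samples = sample(lambda t: math.sin(2 * math.pi * 1.0 * t), fs=8, duration=1.0)
words = encode(quantize(samples, n_bits=3), n_bits=3)
```

    Increasing n_bits makes the levels finer and so reduces the quantization error, at the cost of more bits per sample.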

    The process of converting a continuous-time, continuous-amplitude signal to a

    discrete-time, continuous-amplitude signal is called Sampling.

    The process of converting a discrete-time, continuous-amplitude signal to a

    discrete-time, discrete-amplitude signal is called Quantization.

    Sampling is done by multiplying the input signal with a periodic train of

    unit-amplitude pulses, as shown:

    FIG B.2 : Sampling Analog Signal

    Sampling is carried out at a sampling frequency of at least 2F_M, where F_M is the

    maximum frequency component of the input signal. A signal sampled at this rate can be

    accurately reconstructed at the receiver end.
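
    The effect of sampling below 2F_M can be checked numerically. In the sketch below the tone frequencies and the helper names are assumptions chosen for illustration: a 3 Hz cosine sampled at only 4 Hz yields exactly the same samples as a 1 Hz cosine, so the two cannot be told apart after sampling.

```python
import math


def sample_cosine(freq_hz, fs_hz, count):
    """Return `count` samples of cos(2*pi*freq*t) taken at rate fs."""
    return [math.cos(2 * math.pi * freq_hz * k / fs_hz) for k in range(count)]


def nyquist_rate(max_freq_hz):
    """Minimum sampling frequency 2 * F_M for a band-limited signal."""
    return 2 * max_freq_hz


# Assumed example: a 3 Hz tone needs at least 6 Hz; 4 Hz is too slow.
undersampled = sample_cosine(3.0, 4.0, 8)   # 3 Hz tone sampled at 4 Hz
alias = sample_cosine(1.0, 4.0, 8)          # 1 Hz tone sampled at 4 Hz
```

    Sampling the same two tones at 8 Hz, which is above the 6 Hz required for the 3 Hz tone, gives different sample sequences, so the original frequency is preserved.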

    Quantization of the signal is done using a step size that is kept very

    small for small signal values and is increased as the signal value increases, as shown:

    FIG B.3 Quantization of Signal

    Appendix C

    History

    The foundation of speech recognition lies in the Turing test proposed by

    Alan Turing (1950). The Turing test was designed to determine whether a computer can

    think. It involved three participants: one computer and

    two humans. The participants were separated from one another by a wall

    and communicated with each other. One of the human participants was an

    interrogator, and the remaining two each tried to prove that they were the human and the other was

    not. This test led many developers to carry out research on speech

    recognition.

    AT&T Bell Laboratories developed a primitive device that could recognize

    speech in the 1950s.

    In the 1960s, researchers turned their focus towards a series of smaller

    goals that would aid in developing the larger speech recognition system. As

    a first step, developers created a device that would use discrete speech.

    In the 1970s, continuous speech recognition, which does not require the

    user to pause between words, began. This technology became functional

    during the 1980s and is still being developed and refined today.

    Technological advances have made speech recognition software and devices

    more functional and user friendly. Today speech recognition achieves an

    accuracy of more than 90%. The error rates of various types of recognition

    are:

    FIG C.1 : Error rates of different types of speech recognition

    Conclusion

    In this project we discussed the basic concepts that are used in speech

    recognition. Most speech recognition engines work in a similar manner. The

    following can be concluded from the study:-

    Knowledge of the language and of its linguistics is

    required.

    Most speech recognition packages use the Hidden Markov Model

    to implement speech recognition.

    Recognized speech can be used in many ways, e.g., speech-to-text

    conversion, speech production, language learning, information

    extraction, etc.

    This project allows us to differentiate between the accuracies that can

    be achieved by applying different models.

    We can specify the best hardware and software requirements for

    speech recognition. Dragon NaturallySpeaking software can be used

    efficiently with the following specification:

    Intel Pentium IV processor with a speed of 1.5 GHz or above.

    1 GB of RAM.

    Windows XP or a later version of Windows.

    Creative microphone with an ambient noise removal technique.

    Intel sound card with a 16 kHz sample rate and a signal-to-noise

    ratio of 100 dB.

    Bibliography

    [1] Speech and Language Processing, 2nd edition, by Jurafsky & Martin.

    [2] Schaum's Outlines: Discrete Mathematics, 3rd edition, by Seymour Lipschutz and Marc Lipson.

    [3] Principles of digital communication by Taub and Schilling.

    [4] http://www.faqs.org/docs/Linux-HOWTO/Speech-Recognition-HOWTO.html

    [5] http://en.wikipedia.org/wiki/Speech_recognition

    [6] http://cslu.cse.ogi.edu/HLTsurvey/ch1node4.html

    [7] www.ee.ic.ac.uk/hp/staff/pnaylor/notes/recog.pdf

    Index

    A

    acoustic model 22

    artificial intelligence 17

    B

    Bibliography 43

    C

    call steering 17

    Comparison 10

    court reporting 18

    D

    defence uses 17

    digitizing 40

    discourse analysis 5

    Dragon NaturallySpeaking 32, 34

    E

    economical feasibility 19

    F

    feature extraction 8

    Filtering 10

    Framing 8

    H

    History 43

    hidden Markov model 36-39

    K

    knowledge based approach 10

    L

    language learning 18

    language model 23

    lexical model 22

    Lexicon 16

    M

    matching 10

    Memory 14

    Microphone 13

    Morphology 5

    N

    neural network approach 11

    P

    pattern matching approach 10

    people with disabilities 18

    phone model 30

    phonetics 5

    Phonology 5

    play back of information 17

    Pragmatics 5

    pre filtering 7

    probabilistic model 11

    processor 14

    Q

    quantization 41

    S

    sampling 41

    semantics 5

    social feasibility 19

    sound card 13

    spectral features 9

    speech grammar 15

    speech recognizer 16

    syllable model 30

    Syntax 5

    T

    technical feasibility 19

    temporal features 9

    W

    whole word model 30

    windowing 8

    word detection 7
