Parallel Structure of Decoder in Automatic Speech Recognition


  • 8/13/2019 Parallel Structure of Decoder in Automatic Speech Recognition

    1/24

    PARALLEL STRUCTURE OF DECODER IN AUTOMATIC SPEECH RECOGNITION SYSTEM

    STUDENT: VO QUOC VIET

    SUPERVISORS: DR. DANG TRONG TRINH, DR. HOANG TRANG


    CONTENT

    1. Introduction

    2. Literature review

    3. Methodology and System description

    4. Design specification

    5. Implementation

    6. Test plan

    7. Discussion

    8. Conclusion and future work


    1. INTRODUCTION

    Image sources:
    http://zagg-blog.s3.amazonaws.com/community/blog/wp-content/uploads/2012/11/Screen-Shot-2012-11-28-at-11.19.31-AM.png
    http://www.bmwblog.com/wp-content/uploads/Siri_VoiceControl1.png
    http://www.blogcdn.com/www.engadget.com/media/2012/04/lg-voice-control.jpg

    Applications: automation control, smart interactive TV, voice control in smartphones

    Smartphones need to connect to a server via the internet

    Some applications do not need an extremely large vocabulary but need a short processing time or real-time control

    Some portable devices need a small hardware design with low power consumption

    => Need a parallel structure for the decoder


    2. LITERATURE REVIEW

    Pipeline structure of decoder [3]

    The controller is very complicated


    2. LITERATURE REVIEW

    Parallel structure of decoder [2]

    Processing element

    Consumes many resources


    3. METHODOLOGY AND SYSTEM DESCRIPTION

    Diagram of decoder [1]

    [Figure: block diagram of the decoder. Blocks: FLASH (model data), RAM (input data), Data RAM controller (DRC, 12 x 26), Gaussian Calculation Unit (GCU) with calculation units CU1 to CU8, Viterbi searching with registers REG, REGTMP 1, REGTMP 2 and Final Reg, Result. The GCU passes Log (bj(Ot)) to Log (bj(Ot+12)) to the Viterbi searching block.]

    Gaussian Calculation Unit (GCU) => output probability calculation

    Data RAM controller (DRC)

    Pipeline Viterbi searching
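The output probability calculation assigned to the GCU can be illustrated in software. The following is a minimal sketch, assuming one diagonal-covariance Gaussian per state and 26-dimensional feature vectors; the function name `log_output_prob` and the example values are hypothetical and do not come from the slides.

```python
import math

def log_output_prob(frame, mean, var):
    """Log of a diagonal-covariance Gaussian density, i.e. log b_j(O_t).

    frame, mean and var are equal-length sequences (one entry per
    feature dimension). This is an illustrative model, not the
    hardware datapath.
    """
    assert len(frame) == len(mean) == len(var)
    # Constant term: -0.5 * sum(log(2 * pi * var_p))
    const = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    # Data term: sum((o_p - mu_p)^2 / var_p)
    dist = sum((o - m) ** 2 / v for o, m, v in zip(frame, mean, var))
    return const - 0.5 * dist

DIM = 26                      # assumed feature dimension
frame = [0.0] * DIM           # hypothetical feature vector
mean = [0.0] * DIM            # hypothetical state mean
var = [1.0] * DIM             # hypothetical state variance
score = log_output_prob(frame, mean, var)
```

Working in the log domain turns the product over feature dimensions into a sum, which is what makes an adder and multiplier chain sufficient for this step.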


    4. DESIGN SPECIFICATION

    Factor                                         Specification
    Technology                                     90 nm
    Vdd                                            2.5 V - 3 V
    Power consumption                              1 mW
    Area                                           100 nm2
    Recognition accuracy                           85% for 50 words
    Frequency                                      100 MHz
    Number of transistors                          50,000
    Maximum number of states                       16
    Maximum number of mixture components           8
    Maximum number of parallel calculation units   8
    Maximum decoding time for one word             About 0.221 s
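As a quick check on the timing figures, the clock frequency and the maximum decoding time together imply a cycle budget per word. The snippet below is only this arithmetic; the variable names are hypothetical.

```python
CLOCK_HZ = 100_000_000      # 100 MHz, from the specification
DECODE_TIME_S = 0.221       # maximum decoding time for one word

# Number of clock cycles available to decode one word.
cycles_per_word = round(CLOCK_HZ * DECODE_TIME_S)
print(cycles_per_word)
```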


    5. IMPLEMENTATION

    Software system

    [Figure: block diagram of the software system. Blocks: Speech, MFCC extraction, Training and creating model, Check model, Convert model, FLASH memory, Software ASR (TEST, GENERATE Test file.txt), Recognition.]


    5. IMPLEMENTATION

    Software system simulation result with 400 voice samples and 20 models. Each model corresponds to one word.

    [Figure: simulation result grid with both axes numbered 1 to 20.]


    5. IMPLEMENTATION

    Hardware design

    Data RAM controller

    [Figure: block diagram of the Data RAM controller (DRC, 8 x 26). Blocks: FLASH for model, RAM for MFCC vectors, Model RAM controller, MFCC RAM controller (16 x 26 x 16b), Mean RAM (8 x 26 x 16b), Sigma RAM (8 x 26 x 16b). Signals: control signal, i_model_data, O_model_addr, model_addr, model_finish, i_mfcc_data, O_mfcc_addr, mfcc_addr, mem_mfcc_finish, O_mfcc_data1 to O_mfcc_data8.]


    5. IMPLEMENTATION

    Hardware design

    Data RAM controller => Write operation

    [Figure: write operation; data values shown include 5555, E38E, F0F0 and 0F0F.]


    5. IMPLEMENTATION

    Hardware design

    Data RAM controller => Read operation


    5. IMPLEMENTATION

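    Calculation unit

    [Figure: datapath of calculation unit CU1, drawn as + X X + followed by X Reg. Inputs: frame xt, model parameters, control signal. Outputs: Log {bj(ot)}, overflow. Components: 16-bit adder, 16-bit multiplier, 26-bit multiplier, 52-bit adder.]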
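The adder, multiplier, multiplier, adder chain of a calculation unit can be modelled with integer arithmetic. The sketch below assumes the stages are: subtract the mean, multiply by a scale factor, square, then accumulate, with an overflow flag on the 52-bit accumulator. That ordering, the scale-factor input and the names `fits_signed` and `cu_accumulate` are assumptions made for illustration; the slides only give the component widths.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed integer of this width."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit

def cu_accumulate(frame, mean, scale, acc_bits=52):
    """Accumulate sum(((o - mu) * s)^2) and report accumulator overflow.

    frame, mean and scale are sequences of integers (fixed-point values
    in a real design). Returns (accumulated value, overflow flag).
    """
    acc = 0
    overflow = False
    for o, m, s in zip(frame, mean, scale):
        diff = o - m                 # first adder stage (assumed)
        scaled = diff * s            # first multiplier stage (assumed)
        squared = scaled * scaled    # second multiplier stage (assumed)
        acc += squared               # final adder acting as accumulator
        if not fits_signed(acc, acc_bits):
            overflow = True
    return acc, overflow

total, ovf = cu_accumulate([3, 5], [1, 2], [2, 1])
```

Here `total` is (2*2)^2 + (3*1)^2 = 25 and no overflow is flagged.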


    5. IMPLEMENTATION

    Gaussian Calculation Unit and Viterbi searching modules

    [Figure: the GCU contains calculation units CU1 to CU8 (each drawn as + X X + with X Reg) under a common control signal. Inputs: Frame Ot, Frame Ot+1, ..., Frame Ot+6, Frame Ot+7. Outputs: Log (bj(Ot)), Log (bj(Ot+1)), ..., Log (bj(Ot+6)), Log (bj(Ot+7)). The Viterbi searching module takes Log (bj(Ot)) to Log (bj(Ot+12)), holds values for states j and j-1 at frames t, t+1, ..., t+12 in registers REG, REGTMP 1, REGTMP 2 and Final Reg, and outputs the result.]
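The Viterbi searching step can be sketched for a left-to-right word model, where state j is reached either from itself or from state j-1, matching the j and j-1 labels in the diagram. This is a minimal software sketch under that assumption; `viterbi_left_to_right` and its argument names are hypothetical.

```python
NEG_INF = float("-inf")

def viterbi_left_to_right(log_b, log_self, log_next):
    """Best-path log score of a left-to-right HMM.

    log_b[t][j] : log output probability of state j at frame t
    log_self[j] : log probability of staying in state j
    log_next[j] : log probability of moving from state j to j + 1
                  (the last entry is unused)
    """
    n_states = len(log_self)
    delta = [NEG_INF] * n_states
    delta[0] = log_b[0][0]           # the path must start in state 0
    for t in range(1, len(log_b)):
        new = [NEG_INF] * n_states
        for j in range(n_states):
            stay = delta[j] + log_self[j]
            move = delta[j - 1] + log_next[j - 1] if j > 0 else NEG_INF
            new[j] = max(stay, move) + log_b[t][j]
        delta = new
    return delta[-1]                 # score of ending in the last state

best = viterbi_left_to_right(
    log_b=[[-1.0, -5.0], [-2.0, -1.0]],
    log_self=[-0.5, -0.1],
    log_next=[-0.7, NEG_INF],
)
```

For isolated-word recognition, this score would be computed once per word model and the model with the highest score selected.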


    6. TEST PLAN

    Module: Data RAM Controller (DRC)
      Input:  Data from text file
      Output: Store input data in internal register banks and export to 8 x 16 output ports
      Method: Built-in self test

    Module: Completed system with Log-add
      Input:  Data from text file generated by Matlab
      Output: The index of the model
      Method: The result will be compared with the value from Matlab

    Module: Completed system without Log-add
      Input:  Data from text file generated by Matlab
      Output: The index of the model
      Method: The result will be compared with the value from Matlab

    Module: FPGA test
      Input:  Model parameters; data of all feature vectors from Matlab
      Output: Display result on 7-segment LED
      Method: The result will be compared with the value from Matlab
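The test plan distinguishes a system with Log-add from one without. The slides do not define the operation, so the sketch below assumes it means combining two log-domain likelihoods as log(exp(a) + exp(b)), and that the variant without Log-add keeps only the larger term. Both function names are hypothetical.

```python
import math

def log_add(a, b):
    """Exact log(exp(a) + exp(b)), computed without overflow."""
    hi, lo = (a, b) if a >= b else (b, a)
    if lo == float("-inf"):
        return hi                     # adding zero probability
    return hi + math.log1p(math.exp(lo - hi))

def log_add_max(a, b):
    """Assumed 'without Log-add' variant: keep only the dominant term."""
    return max(a, b)

exact = log_add(-1.0, -3.0)
approx = log_add_max(-1.0, -3.0)
```

The approximation error is at most log 2, reached when both terms are equal, which is why dropping the correction term is a common hardware simplification.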


    7. DISCUSSION AND TIMELINE

    Characteristics of English

    Some suffixes such as s, ed, t or d

    One word may have many syllables

    Difficult to detect words


    8. CONCLUSION AND FUTURE WORK

    Software system => implemented successfully for the parallel Viterbi algorithm with offline recognition

    The first two hardware sub-modules were verified successfully

    Future work

    Improve the word detection function

    GCU and Viterbi searching sub-modules


    THANK YOU


    REFERENCES

    [1] Kazuhiro Nakamura, Ryo Shimazaki, Masatoshi Yamamoto, Kazuyoshi Takagi and Naofumi Takagi, "A VLSI architecture for output probability computations of HMM based recognition systems," in VLSI, Rijeka, Croatia, InTech, 2010, pp. 274-284.

    [2] Yoshizawa S., Wada N., Hayasaka N. and Miyanaga Y., "Scalable architecture for word HMM-based speech recognition and VLSI implementation in complete system," Circuits and Systems, vol. 53, no. 1, pp. 70-77, 2006.

    [2] Wei H., Cheong F. C., Chiu S. C. and Kong P. P., "A Speech Recognizer with Selectable Model Parameters," in ISCAS, 2005.

    [3] Bok-Gue P., Koon-Shik C. and Jun-dong C., "Low power VLSI architecture of Viterbi scorer for HMM-based isolated word recognition," in Quality Electronic Design, 2002.


    2. LITERATURE REVIEW

    ASR built as an embedded system


    3. SYSTEM DESCRIPTION

    a. Computing flow in Viterbi algorithm [1]

    Partial probability (1):

    \[ \ln \delta_t(j) = \max_{1 \le i \le N} \left[ \ln \delta_{t-1}(i) + \ln a_{ij} \right] + \ln b_j(o_t), \quad 2 \le t \le T;\ 1 \le j \le N \]

    Output probability (2):

    [Equation for \ln b_j(o_t); only the symbols C, X, M, max and min survive in the extracted text.]

    BFPP: Block Frame Parallel processing

    [4]

    Computing flow: 2-1-2-1-2-1-2-1..

    Computing flow: {2-2-2-2}-1-1-1-1-{2-2-2-2-2}-1-1-1-1-1
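The two computing flows differ only in how the output probability step (2) and the partial probability step (1) are interleaved. The sketch below generates such an ordering for a given block size, assuming fixed-size blocks of frames; a block size of 1 gives the alternating 2-1-2-1 flow and a larger block gives the grouped flow. The function name `bfpp_schedule` is hypothetical.

```python
def bfpp_schedule(n_frames, block):
    """List the order of operations for block-wise frame processing.

    Each entry is (step, t): step 2 is the output probability for
    frame t, step 1 is the partial (Viterbi) probability for frame t.
    """
    order = []
    for start in range(0, n_frames, block):
        frames = range(start, min(start + block, n_frames))
        order += [(2, t) for t in frames]   # whole block of output probabilities
        order += [(1, t) for t in frames]   # then the matching Viterbi updates
    return order

flow_alternating = "-".join(str(s) for s, _ in bfpp_schedule(4, 1))
flow_blocked = "-".join(str(s) for s, _ in bfpp_schedule(8, 4))
```

Grouping the output probability computations lets several calculation units work on different frames at the same time, since step 2 for one frame does not depend on step 1 of the previous frame.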


    3. SYSTEM DESCRIPTION

    Flow chart and diagram of entire decoder

    [Figure: two flow charts and the decoder block diagram. The first flow chart runs from Start to End through four loops (Loop A to Loop D) with counters p, j, t and v checked against P, N, T and V, and computes LogbjOt, LogbjOt+1, ..., LogbjOt+M-1. The second flow chart runs from Start to Finish through Loop A to Loop C: it initializes RegAjj, Regtmp and Regfinal to 0, uses Procedure 1 and Procedure 2 with counters i, j and t, advances the frame index with t = t + M, and stores Regfinal [j] = RegAjj[N]. The block diagram shows FLASH, RAM, the Data RAM controller (DRC, 12 x 26), the GCU with CU1 to CU8 (output probability), Viterbi searching (partial probability) and Result.]


    5. IMPLEMENTATION

    Fail to recognize

    Recognize successfully


    5. IMPLEMENTATION

    Hardware design

    Data RAM controller

    Test sequence:
    Test register bank 0 & 1: W_en = R_en = 1
    Test register bank 2 & 3: W_en = R_en = 1
    Test register bank 4 & 5: W_en = R_en = 1
    Test register bank 6 & 7: W_en = R_en = 1
    Test register bank 0 & 1: W_en = 0, R_en = 1

    [Figure: testbench for the RAM controller. Blocks: Testbench, Test_case.txt, Model.out, Virtual RAM, Store Address, RAM controller, Comparator. Signals: ADDR, input data, control signal (W_en/R_en), output data, expected data, Error_signal.]
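The self-checking idea behind this testbench (write data, read it back, compare with the expected data, raise an error signal on mismatch) can be modelled in a few lines. This is a behavioural sketch only, not the actual testbench; the class `RegisterBankModel` and the function `run_test` are hypothetical, and the data values are the ones shown on the write operation slide.

```python
class RegisterBankModel:
    """Tiny behavioural stand-in for a bank of 16-bit registers."""

    def __init__(self, size):
        self.cells = [0] * size

    def access(self, addr, data, w_en, r_en):
        """Write when w_en is set, then return the cell when r_en is set."""
        if w_en:
            self.cells[addr] = data & 0xFFFF    # keep 16 bits
        return self.cells[addr] if r_en else None

def run_test(bank, vectors):
    """Write each (addr, data) pair, read it back, count mismatches."""
    errors = 0
    for addr, data in vectors:
        out = bank.access(addr, data, w_en=True, r_en=True)
        if out != (data & 0xFFFF):              # comparator raising an error
            errors += 1
    return errors

vectors = [(0, 0x5555), (1, 0xE38E), (2, 0xF0F0), (3, 0x0F0F)]
bank = RegisterBankModel(4)
errors = run_test(bank, vectors)
# Read-only pass (W_en = 0, R_en = 1): stored data must be unchanged.
readback = bank.access(0, 0, w_en=False, r_en=True)
```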