Evaluation of multi user system of voice interaction using grammars(slide share)

  • Published on
    25-Jul-2015

  • View
    104

  • Download
    1

Embed Size (px)

Transcript

  • Evaluation of Multi-user System of Voice Interaction Using GrammarsElizabete Munzlinger, Fabricio da Silva Soares, and Carlos Henrique Quartucci Forster{bety, p2p, forster}@ita.br ITA Instituto Tecnolgico de Aeronutica EEC-I Engenharia Eletrnica e Computao Informtica Diviso de Cincia da Computao

  • AgendIntroductionGrammar DesignTests and Results of AccuracyConclusion

  • IntroductionExaustive trainingFig. 1. Train the system to recognize ones voice through the exhaustive reading of texts

  • Several contextsIntroductionFig. 2. Systems which particular application and contexts

  • 1onPort [4], Action [true]IntroductionDomotic systemFig. 3. Prototype of Domotic system

  • Domotic systemIntroductionFig. 3. Prototype of Domotic system

  • Grammar DesignGrammar treeMainRule2Rule3Rule1Rule1Rule4Rule5Rule6Terminal symbolsTerminal symbolsTerminal symbolsRule8Rule7......Terminal symbolsFig. 4. The grammar tree composed by nodes

  • Grammar DesignGrammar in Java Speech Grammar Formatgrammar br.ita.domovox;public = [] [] [] []; = [] [] []; = ; = [] [] [] [] [] []; = [] [] ; = []; = [] [] [] [] []; = [] []; = [] [] [] [] []; = [] | []; = [] | []; = por favor | faz favor | por gentileza | por obsquio | faa a gentileza | faa o favor | fazer o favor | fazer a gentileza; = pc | computador | notebook | mquina | sistema | domovox | sistema domovox | sistema de voz | sistema de fala | meu | cara | bicho | mano | maluco; = eu | tu | ele | ela | ns | vs | eles | elas | voc | vocs | mim | gente; = [] | [] | [] | [] | [] | []; = quero | queres | quer | queremos | quereis | querem | querendo; = desejo | desejas | deseja | desejamos | desejais | desejam | desejando; = preciso | precisas | precisa | precisamos | precisais | precisam | precisando; = necessito | necessitas | necessita | necessitamos | necessitais | necessitam | necessitando; = vou | vais | vai | vamos | vo; = pode | podes; = | ; = (ligar | ligue | ativar | ative | ascender | ascenda) {true}; = (desligar | desligue | desativar | desative | apagar | apague) {false} = [] | []; = o | a | os | as; = esse | essa | este | esta | aquele | aquela | aquilo | todos | todos os | todas as | tudo; = | | | | | ; = (tudo | dispositivos | aparelhos) {0}; = (luz | lmpada) {1}; = (ventilador | aparelho ventilador) {2}; = (tv | tev | televiso | televisor | aparelho de tv | aparelho televisor) {3}; = (abajur | luminria | candelabro) {4}; = (outros) {5}; = j | agora | nesse momento | nesse minuto | nesse segundo | agora mesmo; = aqui | a | l | ambiente | quarto | sala | pea | lugar | casa | apartamento | ap; = meu | minha | meus | minhas | nosso | nossa | nossos | nossas | vosso | vossa | vossos | vossas | dele | dela | deles | delas | desse | dessa | desses | dessas | nesse | nessa | nesses | nessas; = que | da | de | do | mesmo | para | pra | momento | mandando | tambm | inclusive | estou | a | | | | hum | mas | pode;

  • Grammar DesignComputational resourcesGraph. 1. Graphic of allocation and processing of the structure of the grammar

  • Grammar DesignRedesign of grammarComandoAoObjetoComplemento*FalsoVerdadeiroPorta 01Porta 02 ...ligar, ativar, acender...desligar, desativar, apagar ...1, luz, lmpada...Por favor, eu voc, sistema, do, preciso, meu, pode, de, quarto, a, o...2, TV, televisor, televiso...Fig. 5. The new grammar tree

  • Grammar Design100%50%Graph. 2. Graphic of allocation and processing of the structure of the grammar

    Computational resources

  • Representation of grammar

    Grammar DesignFig. 6. Grammar represented through a state machine with a recursivity rule

  • Grammar DesignAccepted commandsTable 1. Examples of simple and complex commands based in the rules of grammar

  • Tests and Results of AccuracyDomotic systemFig. 7. Comparison between registered spoken words and the log system

    registeredlogged

  • Tests and Results of AccuracyGeneral rates of acceptationGraph. 3. Rates of acceptation of all commands

  • Tests and Results of AccuracyRates of acceptation by simple and complex commands Graph. 4. Rates of definite articles acceptation by simple and complex commands

  • Tests and Results of AccuracyRates of acceptation by numbers from 1 to 32Graph. 5. Rates of acceptation by numbers from 1 to 32

  • Tests and Results of AccuracyRates of errors in the numbers recognition

    Highest rates of error:21, 27 and 31Mistook words with similar sound:2120 eu3130 a eu30 a vou30 a o30 a os30 aqui os30 aqui eu30 eu30 emThis happened in 70% of the cases

  • ConclusionBehavior of a voice interface systemDesign of grammarExperiments with usersRedesign of grammar

  • ReferencesBurstein, A., Stolzle, A., Brodersen, R.W.: Using Speech Recognition in a Personal Communications System. In: Communications, 1992. ICC 92, Conference record, SUPERCOMM/ICC 92, IEEE, Los Alamitos (1992)Pfaff, G.E.: User Interface Management Systems, p. 72. Springer, New York (1985)Seneff, S.: TINA: A Natural Language System for Spoken Language Applications. Comput. Linguist. 18, 6186 (1992)Sun Microsystems Ltd, Java Speech API Programmers Guide Version 1.0, [online at], http://java.sun.com/products/javamedia/speech/Vieira, R., Lima, V.L.: Lingstica Computacional: Princpios e Aplicaes. In: JAIA ENIA, 2001, Fortaleza (2001)

  • Evaluation of Multi-user System of Voice Interaction Using GrammarsElizabete Munzlinger, Fabricio da Silva Soares, and Carlos Henrique Quartucci Forster{bety, p2p, forster}@ita.br ITA Instituto Tecnolgico de Aeronutica EEC-I Engenharia Eletrnica e Computao Informtica Diviso de Cincia da Computao

    *Hello, we are from the Technological Institute of aeronautics. Our works name is: Evaluation of Multi-user System of Voice Interaction Using Grammars.*In this paper we shows an experimental study about the design of grammars for a voice interface system. The influence of the grammar design on the behavior of the voice recognition system regarding accuracy and computational cost is assessed through tests. With the re-design of a grammar we show that those characteristics can be expressively improved.

    Este artigo apresenta um estudo experimental sobre o projeto de gramticas para um sistema de interface por voz. A influncia da gramtica no comportamento do sistema de reconhecimento de voz quanto exatido e desempenho so apresentados atravs de testes. Apresentamos uma nova proposta quando a composio de regras da gramtica*Many speech recognition systems need every new user to train the system to recognize ones voice through the exhaustive reading of texts. This training is necessary because these systems often use extended vocabularies of words. It is desirable to have a system independent of the training and able to recognize the same words when spoken by different voices, with different accents.

    Muitos sistemas de reconhecimento de fala necessitam que cada usurio treine o sistema para reconhecer sua fala atravs da leitura exaustiva de textos. A necessidade desse treinamento devida ao extenso vocabulrio de palavras que esses sistemas podem utilizar [1]. desejvel que um sistema independente de treinamento seja capaz de reconhecer as mesmas palavras quando pronunciadas por diferentes vozes, com diferentes sotaques [5].*Applications that use recognized commands dont need such extended vocabulary, which can be restricted to the needs of the particular application and contexts. Example: sciences, computation, economy, astronomy, domotics, aeronautics. By the use of grammars associated to the application a limit of possible words to every context is determined. The right design of a grammar can make the application become a multi-user system.

    Aplicaes que utilizam reconhecimento de comandos no necessitam de um vocabulrio to abrangente, podendo este ser restrito s necessidades do aplicativo. Atravs do uso de gramticas associadas ao aplicativo, imposto um limite de possveis palavras para cada contexto. O uso correto de gramticas pode capacitar aplicativos a se tornarem multi-usurio. *The grammar was used in a prototype of Domotic system that controls up to 32 devices through voice recognition. The system uses the parallel port of the computer and is connected to an electronic circuit that activates the devices. For the Automatic Speech Recognition system, IBM Via Voice was chosen because its acceptance of Brazilian Portuguese. The Domotic application was developed in Java and uses IBM Java Speech Technology API.

    As gramticas foram aplicadas em um prottipo de sistema Domtico que controla at 32 dispositivos atravs de interface de voz. O sistema utiliza a porta paralela do computador e est interligado a um Circuito Eletrnico projetado para tal controle. Foi utilizado como sistema de ASR o IBM Via Voice porque admite o Portugus Brasileiro. O aplicativo Domtico desenvolvido na linguagem Java e utiliza a API IBM Java Speech Technology.*The grammar was used in a prototype of Domotic system that controls up to 32 devices through voice recognition. The system uses the parallel port of the computer and is connected to an electronic circuit that activates the devices. For the Automatic Speech Recognition system, IBM Via Voice was chosen because its acceptance of Brazilian Portuguese. The Domotic application was developed in Java and uses IBM Java Speech Technology API.

    As gramticas foram aplicadas em um prottipo de sistema Domtico que controla at 32 dispositivos atravs de interface de voz. O sistema utiliza a porta paralela do computador e est interligado a um Circuito Eletrnico projetado para tal controle. Foi utilizado como sistema de ASR o IBM Via Voice porque admite o Portugus Brasileiro. O aplicativo Domtico desenvolvido na linguagem Java e utiliza a API IBM Java Speech Technology.*A grammar is built from a set of sentences separate by production rules and structured as a tree composed by nodes. The nodes of the grammar are contained in a static structure describing a hierarchy of nodes from the main node and a set of nodes dependent on it. In two-dimensional disposition it is possible to see the possibilities of connections between the levels of the tree following its hierarchy until reaching the terminal symbols.

    Uma gramtica construda a partir de um conjunto de sentenas separadas por regras e estruturadas em forma de rvore composta por ns. Os ns da gramtica so contidos em uma estrutura esttica que descreve uma hierarquia de ns a partir do n pai, e um conjunto de ns dependentes deste. Em disposio bidimensional podem-se observar as possibilidades de ligaes permissveis entre os nveis da rvore seguindo a sua hierarquia, at alcanar os smbolos terminais.*At first we designed a grammar for general use, (by systems with several contexts), based on the morphological analysis used in the sentences of the Portuguese language, and made of many rules that determines, for example, verbs, subjects, treatments, pronouns and articles. Thus the rule, that defines an article, comprises other two sub-rules, for definite articles and indefinite articles. In the end the grammar has a total of 64 sub-rules and 167 terminal symbols.

    Inicialmente foi projetada uma gramtica para uso geral (por sistemas de diversos contextos) baseada na anlise morfolgica aplicada nas oraes da lngua portuguesa, consistindo em muitas regras que definem, por exemplo, verbo, sujeito, tratamento, pronomes e artigos. Assim, a regra que define artigo foi composta por outras duas sub-regras, para artigos definidos e artigos indefinidos. Ao final, a gramtica totalizou 64 sub-regras com um total de 167 smbolos terminais.*It was noticed that this complex grammar lowers the performance of the recognition system making it impossible to execute the application. It took at least 980 Mega Bytes of memory and 100% of CPU occupancy during 1 minute for allocation and processing of the structure of the grammar. Therefore, there were no computational resources remaining to analyze any sentence. To solve this problem, the grammar was restructured with changes to the rules composition resulting in the new tree.

    Observou-se que esta gramtica complexa, torna baixo o desempenho do sistema de reconhecimento de voz ao ponto de impossibilitar a execuo do aplicativo. Para alocao da estrutura da gramtica, foram ocupados em mdia 980 MB de memria, com 100% de uso de CPU por at 1 min. Desse modo no restaram recursos computacionais para anlise de sentena alguma.Para resolver esse problema, a gramtica passou ento por uma reestruturao com modificaes na composio das regras resultando na rvore apresentada na Figura 1.*The main node of the grammar is the rule command, that is composed by the sub-rules, complement, action and object. The rule action has two sub-rules, true and false, that controls the activation condition of the devices in the Domotic system. The rule object, has one sub-rule for each one of the 32 devices to be controlled, and also must return the value of accepting of just one of its rules. The sub-rule complemet has no value of acceptation and contains 165 terminal symbols extracted from the 35 sub-rules morphologically separated beforehand. Using this grammar, the consumption of memory went down to an average of 423 Mega Bytes, and the duration of total use of the CPU was less than one second.

    Como n pai tem a regra comando que composta pelas sub-regras complemento, ao e objeto. A regra ao contm duas sub-regras, verdadeiro e falso, que controlam o estado de funcionamento dos dispositivos no sistema Domtico. A regra objeto contm uma sub-regra para cada um dos 32 dispositivos a serem controlados, e tambm dever retornar obrigatoriamente o valor de aceitao de somente uma de suas sub-regras. A sub-regra complemento no possui valor de aceitao e contm 165 smbolos terminais extrados das 35 sub-regras antes separas morfologicamente. Para esta gramtica o consumo de memria apresentou mdia de 423 MB para alocao da estrutura da gramtica e pico de 100% de uso de CPU por menos de 1 segundo. *The main node of the grammar is the rule command, that is composed by the sub-rules, complement, action and object. The rule action has two sub-rules, true and false, that controls the activation condition of the devices in the Domotic system. The rule object, has one sub-rule for each one of the 32 devices to be controlled, and also must return the value of accepting of just one of its rules. The sub-rule complemet has no value of acceptation and contains 165 terminal symbols extracted from the 35 sub-rules morphologically separated beforehand. Using this grammar, the consumption of memory went down to an average of 423 Mega Bytes, and the duration of total use of the CPU was less than one second.

    Como n pai tem a regra comando que composta pelas sub-regras complemento, ao e objeto. A regra ao contm duas sub-regras, verdadeiro e falso, que controlam o estado de funcionamento dos dispositivos no sistema Domtico. A regra objeto contm uma sub-regra para cada um dos 32 dispositivos a serem controlados, e tambm dever retornar obrigatoriamente o valor de aceitao de somente uma de suas sub-regras. A sub-regra complemento no possui valor de aceitao e contm 165...