
Connected word recognition enrollment method


tude, phase, and their time of birth and death. These parameters are transmitted and then used to control multiple sine-wave generators that reproduce the input signal. Output speech is said to be indistinguishable from the input. One would think music might be equally well handled.--DLR

4,821,326

43.72.Gy NON-AUDIBLE SPEECH GENERATION METHOD AND APPARATUS

Norman MacLeod, assignor to Macrowave Technology Corporation 11 April 1989 (Class 381/51); filed 16 November 1987

This interesting device allows silently mouthed speech articulations to be detected and transmitted such that the speech sounds may be produced by a speech synthesizer at the receiving end. It should be useful where the talker does not want to be overheard, as well as in high ambient noise conditions. An ultrasonic pulse generator with spectral output ranging from 15 kHz to over 100 kHz is placed against the throat, where it excites the vocal tract. An ultrasonic transducer worn as a normal headset mike picks up vocal tract-induced amplitude modulations of the high-pitched source that are transmitted, then down-shifted in frequency at the receiver to an audible signal.--DLR

4,850,022

43.72.Gy SPEECH SIGNAL PROCESSING SYSTEM

Masaaki Honda and Takehiro Moriya, assignors to Nippon Telegraph and Telephone Public Corporation

18 July 1989 (Class 381/36); filed in Japan 21 March 1984

Many recent attempts to improve the quality of voice transmission using a linear prediction vocoder have concentrated on methods of encoding an excitation signal derived from the LPC residual. The device described here detects the residual signal's phase spectrum and then passes the residual through a phase-equalizing filter such that the signal is shifted toward zero phase. This has the effect of concentrating the signal energy near points corresponding to the fundamental period. The concentration of energy allows a more efficient encoding of the excitation.--DLR
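The energy-concentrating effect of shifting a signal toward zero phase can be illustrated with a small sketch. It is an idealized, single-frame illustration and not the patented filter: it assumes the whole phase spectrum is simply discarded in the DFT domain, and the names `dft`, `idft`, `equalize_phase`, and `peak_energy_fraction` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each output sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def equalize_phase(residual):
    """Keep the magnitude spectrum and set every phase to zero (assumed ideal case)."""
    return idft([abs(c) for c in dft(residual)])


def peak_energy_fraction(x):
    """Fraction of the total energy carried by the single largest sample."""
    total = sum(v * v for v in x)
    return max(v * v for v in x) / total


# A made-up, phase-dispersed "residual" frame covering one pitch period.
residual = [0.0, 1.0, 0.5, -0.5, 0.25, 0.0, 0.0, 0.0]
equalized = equalize_phase(residual)
print(peak_energy_fraction(residual), peak_energy_fraction(equalized))
```

The total energy is unchanged, but a larger share of it falls on one sample, which is the property that makes the excitation cheaper to encode.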

4,783,806

43.72.Ne SPEECH RECOGNITION APPARATUS

Kazuo Nakamura and Tadao Norjiri, assignors to Nippondenso Company

8 November 1988 (Class 381/43); filed in Japan 22 January 1986

This patent describes a pattern matching algorithm for use in a speech recognizer. As an input utterance is collected, it is divided into short-duration partial patterns. These partial patterns are compared with each stored reference pattern, first being normalized in duration and then subjected to a dynamic time warping operation to determine a partial match score. Both the descriptive text and the claims section of the patent include detailed formulas covering the methods of computing and combining the partial match scores to achieve the overall recognition score.--DLR
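A minimal sketch of the general scheme follows: partial patterns are normalized in duration, compared by dynamic time warping, and the partial scores are combined. The patent's own formulas are not reproduced here. The sketch assumes one-dimensional features, an equal split of utterance and reference into parts, and a plain sum of partial scores; the names `split`, `normalize_duration`, `dtw_distance`, and `recognize` are hypothetical.

```python
def split(seq, n_parts):
    """Cut a sequence into n_parts consecutive chunks (assumes len(seq) >= n_parts)."""
    size = len(seq) / n_parts
    return [seq[int(i * size):int((i + 1) * size)] for i in range(n_parts)]


def normalize_duration(pattern, length):
    """Linearly resample a pattern to a fixed number of points (length >= 2)."""
    if len(pattern) == 1:
        return [float(pattern[0])] * length
    out = []
    for k in range(length):
        pos = k * (len(pattern) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pattern) - 1)
        frac = pos - lo
        out.append(pattern[lo] * (1 - frac) + pattern[hi] * frac)
    return out


def dtw_distance(a, b):
    """Accumulated absolute-difference cost of the best time-warped alignment."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def recognize(utterance, references, n_parts=2, norm_len=4):
    """Return (label, score) of the reference with the lowest summed partial score."""
    best = (None, float("inf"))
    for label, ref in references.items():
        score = sum(
            dtw_distance(normalize_duration(u, norm_len), normalize_duration(r, norm_len))
            for u, r in zip(split(utterance, n_parts), split(ref, n_parts))
        )
        if score < best[1]:
            best = (label, score)
    return best


references = {"rise": [0, 1, 2, 3, 4, 5], "fall": [5, 4, 3, 2, 1, 0]}
label, score = recognize([0, 0, 1, 2, 3, 4, 5, 5], references)
print(label)
```

Summing the partial scores is only the simplest possible combination rule; a weighted combination would fit the same structure.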

4,783,807

43.72.Ne SYSTEM AND METHOD FOR SOUND RECOGNITION WITH FEATURE SELECTION SYNCHRONIZED TO VOICE PITCH

John Marley, Scottsdale, AZ 8 November 1988 (Class 381/43); filed 27 August 1984

This method of extracting acoustic features for use in speech recognition begins by locating the maximum peak amplitude of the speech waveform for each fundamental period during voiced sounds. The durations are then measured from the time of the amplitude peak to the first two positive-going crossings of an adjustable threshold level. These two duration measures are used along with certain silence detection conditions to classify the phonetic quality in what is essentially a mapping of the articulatory space of tongue and lip shapes.--DLR
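The two duration measurements can be sketched as follows for a single fundamental period. This is a rough illustration under stated assumptions: the period boundaries are taken as already known, durations are returned in samples, and the function name `peak_to_crossing_durations` and the sample values are hypothetical. The silence detection and phonetic classification steps are not shown.

```python
def peak_to_crossing_durations(period, threshold):
    """Locate the amplitude peak, then measure the distance (in samples) from the
    peak to the first two positive-going crossings of the threshold."""
    peak = max(range(len(period)), key=lambda i: period[i])
    durations = []
    for i in range(peak + 1, len(period)):
        # Positive-going crossing: previous sample below, current at or above.
        if period[i - 1] < threshold <= period[i]:
            durations.append(i - peak)
            if len(durations) == 2:
                break
    return peak, durations


# One made-up pitch period of a voiced waveform.
period = [0, 5, 1, -2, 1, 3, -1, 2, 0]
peak, durations = peak_to_crossing_durations(period, threshold=0.5)
print(peak, durations)  # prints: 1 [3, 6]
```

Dividing each duration by the sampling rate gives the times in seconds. Fewer than two values are returned when the period contains fewer than two such crossings.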

4,783,808

43.72.Ne CONNECTED WORD RECOGNITION ENROLLMENT METHOD

George R. Doddington and Michael L. McMahan, assignors to Texas Instruments, Incorporated

8 November 1988 (Class 381/43); filed 25 April 1986

A method is described here for combining speech recognition templates from words or phrases spoken in isolation to arrive at templates for the same items as spoken together in normal, connected speech. Utterances must be available containing the items spoken together in a connected manner. The isolated templates are used to locate the items in the connected utterance, at which point a new connected template may be produced from the analyzed material. The patent also discusses using a speech synthesizer to regenerate speech from recognition templates as a way to monitor the quality of the template materials.--DLR

4,783,809

43.72.Ne AUTOMATIC SPEECH RECOGNIZER FOR REAL TIME OPERATION

Stephen C. Glinski, assignor to AT&T Bell Laboratories 8 November 1988 (Class 381/43); filed 7 November 1984

This patent covers a method of matching frames of an input utterance with multiple candidate pathways through a set of stored reference frames. The method appears to be a sort of multipath dynamic time warping technique described in terms of multiple "index levels." These levels are used as a housekeeping mechanism to maintain the various pathways being warped for time alignment.--DLR

4,825,384

43.72.Ne SPEECH RECOGNIZER

Atsushi Sakurai, assignor to Canon Kabushiki Kaisha 25 April 1989 (Class 364/513.5); filed in Japan 27 August 1981

The idea of canceling an interfering signal and leaving a cleaner copy of the desired signal is well known in telephony circles. Perhaps this is its first application within the context of a speech recognition system. The goal here is to eliminate from the speech input signal interfering sounds from an attached speech synthesis device. The method of cancellation described in this embodiment consists of a variable delay followed by a linear prediction inverse filter that is controlled by the synthesized speech parameter data.--DLR
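A toy sketch may help show why a linear prediction inverse filter driven by the synthesizer's own parameters suppresses the synthesized sound. It is not the patented circuit. It assumes an all-pole synthesizer with fixed, known coefficients and models the path from synthesizer to microphone as a pure delay. With fixed coefficients the alignment provided by the variable delay is trivial, so that step is not modeled. The names `synthesize`, `delay`, and `inverse_filter` are hypothetical.

```python
def synthesize(excitation, coeffs):
    """All-pole synthesis: s[n] = e[n] + sum_k coeffs[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def delay(signal, samples):
    """Pure delay standing in for the path from synthesizer to microphone."""
    return [0.0] * samples + list(signal)


def inverse_filter(signal, coeffs):
    """Inverse (all-zero) filter: r[n] = s[n] - sum_k coeffs[k-1] * s[n-k]."""
    out = []
    for n, s in enumerate(signal):
        acc = s
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * signal[n - k]
        out.append(acc)
    return out


def energy(x):
    return sum(v * v for v in x)


coeffs = [1.2, -0.5]                 # assumed stable two-pole synthesizer
excitation = [1.0] + [0.0] * 15      # a single excitation pulse
picked_up = delay(synthesize(excitation, coeffs), 3)
residual = inverse_filter(picked_up, coeffs)
print(energy(picked_up), energy(residual))
```

In this idealized case the ringing of the synthesis filter is removed and only the delayed excitation pulse remains. A talker's speech, which does not match the synthesizer's coefficients, would not be cancelled in this way, although it would be spectrally shaped by the inverse filter.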

491 J. Acoust. Soc. Am. 89(1), Jan. 1991; 0001-4966/91/01491-01 $00.80; © 1991 Acoust. Soc. Am.; Patent Reviews 491
