

3-3 Study of Speech Communication Mechanisms based on the Understanding of Human Biological Functions

Kiyoshi HONDA
Advanced Telecommunications Research Institute International

Human speech acts as a vehicle of thought, communication, emotional state, and physical identity; these functions together contribute to our life and social organization. The value of speech lies in its versatility, which derives from innate human biological abilities. If machinery emerged that handled all of the speech functions, it would deserve human-equivalent significance. With the evolution of spoken-language automata as a prospective goal, we have launched a research project to examine the biological functions underlying human speech in three areas: form and function of the speech organs, biophysical modeling of speech production mechanisms, and functional imaging of speech-related pathways.

Keywords Speech, Biological functions, Individuality, Speaker normalization, Magnetic resonance imaging

1 Introduction

Human speech plays a significant role in our daily life and social organization as a vehicle of information exchange. When we consider the functional aspects of speech, we can first recognize it as a medium of thought, often referred to as “internal speech.” This symbolic speech, not expressed in sound, is used for reasoning and mental writing, and is fundamental to intellectual activity and cultural transmission. Speech is also widely recognized as a tool of communication, and its value is inestimable: if telecommunication were somehow interrupted, we can readily guess that extreme confusion would arise in our society. Speech acts reflect emotional states through tonal changes and play an important role in building human relationships. In addition, speech functions as a means of identifying talkers and as a source of information about physical condition. More specifically, we can guess who a person is, and how that person is, even from coughing or other non-speech sounds.

Such a variety of speech functions is the subject of extensive investigation. If machinery were to emerge that could technologically handle all human speech functions, it would have value equivalent to that of humans, as an artificial organism. Speech research must aim at understanding the basic biological functions of humans and at creating machinery capable of performing a variety of speech functions.

2 Issues in speech-communication research

2.1 Brief history of speech research

Speech research has spanned a period of 300 years, originating in an experiment on vocal cord vibration in 1700 in France. In the late 18th century, von Kempelen analyzed the mechanisms of producing vowels and consonants, and built a mechanical speech synthesizer. In the 19th century, Wheatstone and Helmholtz developed vowel-production theories and found vocal cavity resonance, or the formant, to be the characteristic that determines vowel color[1]. At that time, J. Müller was conducting research on phonation mechanisms using a model of the vocal folds, and he is now generally credited as the first person to advocate the source-filter theory. His theory led, in the 20th century, to Chiba and Kajiyama’s vowel research based on X-ray images and to Fant’s research on the acoustic theory of speech production[2]. This acoustic theory was gradually simplified, and the vocal tract is now commonly regarded as a single tube. Although such simplification leads to a general understanding, it is unrealistic to assume such simple mechanisms in the speech production process. It may be time to reexamine the biological mechanism by which speech is produced.

2.2 Issues to be investigated

While human speech mechanisms have in many ways been clarified over the past 300 years, many issues remain to be investigated. We discuss the following two issues as tasks to be solved in the 21st century.

The first topic is talker characteristics, or the individual information conveyed by speech sounds. Tone color varies from person to person, just as the appearance of individual faces varies. Individuality in face and speech is an aspect of one’s physical identity that cannot be modified by intention. In order to develop future biometric technologies, knowledge of the physical factors causing talker characteristics must be established. Since we have little evidence on this matter, it may be reasonable to begin with the most likely sources, such as the gross shape and the side-branches of the vocal tract. As shown in Fig. 1, the vocal tract has two side-branches, i.e., the nasal cavity and the piriform fossa, and their characteristics are reflected in the detail of the spectral envelope[3]. Furthermore, the length and shape of the vocal tract have been hypothesized to be factors determining talker characteristics, and it is further speculated that the length of the pharyngeal cavity influences the formants[4].
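As a rough illustration of how a side-branch can leave a trace in the spectral envelope, a closed side-branch is often idealized as a quarter-wavelength resonator that introduces a spectral dip (antiresonance) near c/(4l), where l is the branch length. The sketch below uses that idealization only; the speed of sound and the branch lengths are illustrative assumptions, not values measured in this study, and the function name is hypothetical.

```python
# Idealized sketch: a closed side-branch treated as a quarter-wavelength resonator.
# SPEED_OF_SOUND_M_S and the branch lengths below are assumed, illustrative values.

SPEED_OF_SOUND_M_S = 350.0  # assumed speed of sound in warm, humid air


def side_branch_zero_hz(branch_length_m: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Lowest antiresonance (spectral zero) of a closed side-branch of given length."""
    if branch_length_m <= 0:
        raise ValueError("branch length must be positive")
    return c / (4.0 * branch_length_m)


if __name__ == "__main__":
    # Hypothetical branch lengths, chosen only to show the trend.
    for length_cm in (1.5, 2.0, 2.5):
        zero_hz = side_branch_zero_hz(length_cm / 100.0)
        print(f"branch {length_cm:.1f} cm -> first zero near {zero_hz:.0f} Hz")
```

In this idealization the dip frequency depends only on the branch length, so a longer branch places the dip lower in the spectrum.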

Fig. 1 Fine structure of the speech organ (left) and an acoustic tube model (right)

Journal of the Communications Research Laboratory Vol.48 No.3 2001

Second is the “exchangeability” of vowels. There is a marked difference in vowel color between adults and children, due to differences in body size. However, we know that speech comprehension is maintained in parent-child communication, without awareness of this difference. Such vowel exchangeability is referred to as “vowel normalization” and has long been debated; however, it remains unresolved in speech-perception theories to date. The vowel formants vary widely, in inverse proportion to the length of the vocal tract. The vowels /a/ in adults and /o/ in children are distributed within nearly the same formant area, yet they are easy to identify. How we categorize vowels across speakers is an important issue to be investigated in speech recognition research. However, so far we have no physiological evidence of vowel normalization mechanisms.
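The inverse proportionality between formants and vocal tract length can be illustrated with the textbook uniform-tube idealization (closed at the glottis, open at the lips), whose resonances are F_n = (2n - 1)c / (4L). The following sketch assumes that idealization. The adult and child lengths and the speed of sound are illustrative assumptions, the function names are hypothetical, and the length-ratio rescaling at the end is a deliberately naive stand-in, not a model of how listeners actually normalize vowels.

```python
# Sketch under a uniform-tube assumption (closed at glottis, open at lips).
# Lengths and speed of sound are assumed values for illustration only.

SPEED_OF_SOUND_M_S = 350.0  # assumed


def uniform_tube_formants(length_m: float, count: int = 3,
                          c: float = SPEED_OF_SOUND_M_S) -> list[float]:
    """Resonances F_n = (2n - 1) * c / (4 * L) of a uniform tube."""
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


def rescale_by_length(formants_hz: list[float], source_length_m: float,
                      target_length_m: float) -> list[float]:
    """Naive normalization: scale formants as if the tract had the target length."""
    factor = source_length_m / target_length_m
    return [f * factor for f in formants_hz]


if __name__ == "__main__":
    adult_length_m = 0.175  # hypothetical adult vocal tract length
    child_length_m = 0.105  # hypothetical child vocal tract length

    adult = uniform_tube_formants(adult_length_m)
    child = uniform_tube_formants(child_length_m)
    print("adult formants (Hz):", [round(f) for f in adult])
    print("child formants (Hz):", [round(f) for f in child])

    # Every child formant is higher by the same factor, L_adult / L_child.
    print("child/adult ratios:", [round(fc / fa, 3) for fc, fa in zip(child, adult)])

    # Rescaling the child formants to the adult length recovers the adult values.
    rescaled = rescale_by_length(child, child_length_m, adult_length_m)
    print("rescaled child (Hz):", [round(f) for f in rescaled])
```

A single scale factor is enough here only because both tubes are uniform; real vocal tracts differ in shape as well as length, which is part of why the normalization problem remains open.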

2.3 Biological foundations of speech

The above two issues show that speech offers exchangeability as a communication tool while still clearly conveying individuality. Such individual identification appears to have arisen through the evolutionary process, from the protomorphic stage onward. In animals, vocal sounds act as a means of individual identification and message communication. In particular, vocal identification between parents and offspring has been investigated in birds and mammals. In birds, auditory function is known to be more dominantly involved in individual identification than visual function. Studies of animal vocal behavior could provide useful knowledge for understanding the biological functions of speech in humans.

3 Speech research using magnetic resonance imaging (MRI)

The Biological Speech Science project at ATR is conducting morphological measurement of the speech organs, modeling of speech-production mechanisms, and functional brain imaging during speech production, with a focus on the biological aspects of speech communication. A few findings obtained by MRI are introduced below.

3.1 Morphological measurement of the speech organ

The shape of the vocal tract has previously been studied by X-ray imaging. MRI has enabled the first three-dimensional imaging of the vocal tract, and the accuracy of measurement has been ensured by a dental imaging technique. Fig. 2 (a) shows an image of the vocal tract extracted from the MRI data during production of the vowel /e/. As illustrated by this figure, the vocal tract, long considered a simple tube, consists of large and small branches. Recent technical advances have also enabled three-dimensional motion imaging[5]. Fig. 2 (b) shows the time course of the vocal tract area function extracted from three-dimensional motion imaging during production of the Japanese vowels /a/, /i/, /u/, /e/, and /o/. Thus, MRI makes a significant contribution to the reevaluation of the acoustic theory of speech production and to the elucidation of factors involved in speech individuality.

Fig. 2 Three-dimensional features of the vocal tract (a, left) and temporal sequence of the area function associated with the voiced vowels /a/, /i/, /u/, /e/, and /o/ (b)
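To illustrate how an area function of the kind extracted from MRI relates to formants, the sketch below treats the vocal tract as a chain of short lossless tube sections and multiplies their transmission (chain) matrices from glottis to lips. With an ideal open end at the lips (zero pressure), the resonances are the frequencies at which the lower-right element of the product vanishes. This is a textbook simplification and not the method used in the project: it ignores losses, lip radiation, and the branches of the vocal tract, and the area values, physical constants, and function names are hypothetical rather than taken from measured data.

```python
import math

SPEED_OF_SOUND_M_S = 350.0  # assumed
AIR_DENSITY_KG_M3 = 1.14    # assumed (warm, humid air)


def chain_d_element(freq_hz, areas_m2, section_length_m,
                    sound_speed=SPEED_OF_SOUND_M_S, rho=AIR_DENSITY_KG_M3):
    """Lower-right element of the glottis-to-lips chain matrix.

    Each lossless section has the matrix [[cos, j*Z*sin], [j*sin/Z, cos]]
    with Z = rho * c / area. The product keeps the form
    [[m11, j*m12], [j*m21, m22]] with real entries, so real arithmetic suffices.
    """
    k = 2.0 * math.pi * freq_hz / sound_speed
    cos_kl = math.cos(k * section_length_m)
    sin_kl = math.sin(k * section_length_m)
    m11, m12, m21, m22 = 1.0, 0.0, 0.0, 1.0  # identity matrix
    for area in areas_m2:  # sections ordered from glottis to lips
        z = rho * sound_speed / area
        s11, s12, s21, s22 = cos_kl, z * sin_kl, sin_kl / z, cos_kl
        m11, m12, m21, m22 = (m11 * s11 - m12 * s21,
                              m11 * s12 + m12 * s22,
                              m21 * s11 + m22 * s21,
                              m22 * s22 - m21 * s12)
    return m22


def formants_from_area_function(areas_m2, section_length_m,
                                max_hz=4000.0, step_hz=5.0):
    """Find resonances as sign changes of the chain-matrix element, refined by bisection."""
    formants = []
    f_lo = step_hz
    d_lo = chain_d_element(f_lo, areas_m2, section_length_m)
    while f_lo < max_hz:
        f_hi = f_lo + step_hz
        d_hi = chain_d_element(f_hi, areas_m2, section_length_m)
        if d_lo != 0.0 and d_lo * d_hi <= 0.0:
            lo, hi, d_at_lo = f_lo, f_hi, d_lo
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                d_mid = chain_d_element(mid, areas_m2, section_length_m)
                if d_at_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_at_lo = mid, d_mid
            formants.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return formants


if __name__ == "__main__":
    section_length_m = 0.005  # assumed 5 mm sections
    # Hypothetical area functions (m^2), glottis first.
    uniform = [3.0e-4] * 34
    a_like = [1.0e-4] * 17 + [7.0e-4] * 17  # narrow back cavity, wide front cavity

    for name, areas in (("uniform", uniform), ("/a/-like two-tube", a_like)):
        found = formants_from_area_function(areas, section_length_m)
        print(name, [round(f) for f in found[:3]])
```

For a uniform area function the procedure reduces to the quarter-wavelength resonances of a single tube, which gives a convenient sanity check before trying non-uniform shapes.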

3.2 Brain activity during speech production

The cerebral cortex has been investigated in detail in relation to brain activity during speech production and hearing. From the standpoint of biological functions, it is necessary to extend the measurement area to include the brain stem. Functional MRI (fMRI) faces several technical problems in imaging the brain stem. We investigated which orientation of the imaging plane is suitable for recording brainstem activity during speech production. Fig. 3 shows a result obtained during spontaneous speech production. As shown in the figure, activity of the general motor circuitry, such as the thalamus, pons, and cerebellum, is visible. In anticipation of speech perception research, we are also planning to investigate a method for reducing the noise generated by the scanner, with the aim of using MRI to visualize the neuronal pathways involved in the production and perception of speech.

References

1 T. Chiba and M. Kajiyama, "The Vowel: Its Nature and Structure", Tokyo, Tokyo-Kaiseikan, 1941.

2 G. Fant, "Acoustic Theory of Speech Production", The Hague, Mouton, 1960.

3 K. Honda, "Biological foundations of speech", Gengo no Kagaku, Vol. 2, Tokyo, Iwanami, 1998.

4 K. Honda, "Facial appearance and vocal quality in man", J. Acoust. Soc. Jpn, 57, 308-313, 2001.

5 H. Takemoto, K. Honda, S. Masaki, Y. Shimada, I. Fujimoto, S. Takano, and K. Takeo, "Three-dimensional motion imaging of speech organs using a synchronized MRI sampling method", Proceedings of Acoust. Soc. Jpn, pp.251-252, Mar. 2001.

Kiyoshi HONDA, M.D., D. Ms.

Project Leader, Biological Speech Science Project, Information Sciences Division, ATR International

Speech Science

Fig. 3 Active areas in the cerebral cortex (left) and activities of the motor pathway in the brain stem (right) during speech production