
Human Interface Studies for the Communication between Human and Systems

Hideyuki Sawada

Faculty of Engineering Kagawa University

Abstract - The human interface technologies and their applications for computer intelligence and human-robot communication will be presented. Speech, vision and tactile sensations will be specifically addressed, together with recent results of speech signal processing, human motion tracking and humanoid robotics studies. Assistive technologies for supporting handicapped and aged people will also be introduced.

1. Introduction

The importance of the human-machine interface is widely acknowledged in accordance with the development of computers and associated technologies. The faster the processing speed becomes, the wider the gap between the computing ability and the accessibility of humans to the computer becomes. In human-to-human communication, we are able to communicate with each other by using not only verbal media but also the five senses: vision, audition, olfaction, gustation and tactile sensation. Information transmitted through the five senses is especially able to affect our emotions and feelings directly, making for smooth communication. Conventional computers and systems, however, are not able to sense the communication media employed in the human world. Although a variety of sensing devices and equipment has been developed for the measurement of physical data, the data processing ability of a computer is quite different from that of humans. If a computer is able to understand human communication media and react to humans as humans do, it will open a new world where humans and computers live together in mutual prosperity.

In this paper, the human interface technologies and their applications for computer intelligence and human-robot communication will be introduced. Speech, vision and tactile sensations will be specifically addressed, together with recent results of speech signal processing, human motion tracking and humanoid robotics studies.


1.1 Human Interface Studies

Nowadays computer science has developed dramatically, and has contributed widely to control engineering, theoretical calculation and entertainment, not only in expert engineering fields but also in daily human life.

The theoretical aspect of computer science has been constructed on the basis of C. E. Shannon's information theory within the framework of information processing, which contributed to the development of various software and hardware applications having procedural or grammatical structures. Based on the results of such applications to logical matters, computer science has extended its research targets to illogical matters such as human behavior, the human mind and human artistic activities.

Human interface studies deal with the illogical aspects of human activities and present new systems, which contribute to bridging the gap between a machine and the human who uses or interacts with it, as shown in Fig. 1.

1.2 Speech, Vision and Tactile Sensations

The author has been paying attention to the information transmitted through speech, vision, gestures and tactile sensations, and has constructed multimodal systems that deal with these media as a human does. In the presentation, a talking robot which speaks and sings like a human, a robotic arm system which tracks a particular sound among multiple sounds, a gesture recognition system, a face tracking and recognition system, and a tactile display which presents various tactile sensations will be introduced with demonstrations.

In this paper, the human interface technologies and their applications for computer intelligence and human-robot communication are introduced. Several application systems, which accept and react to human communication media, are also presented together with the demonstrations.

Fig. 1 Concept of Human Interface (a human gives intuitive actions through vision, audition and the body to an object, machine or computer via the human interface, and receives convincing reactions)

2. Active Tracking of Particular Person Using Visual and Auditory Information

A human has various sensory perceptions, and effectively uses them in communication. Auditory and visual functions especially play an important role in recognizing someone to talk to and understanding the conversation. In vocal communication, we are able to detect the position of a sound source in 3D space, extract a particular sound from mixed sounds, and recognize who is talking. In addition, we are able to detect a particular person by recognizing body features and individual gestures. By realizing this mechanism with a computer, new applications will be presented which can be utilized in communication with humans. We are working on the identification of a particular person using microphones and optical motion trackers. This section describes the development of an information fusion system and how to deal with multiple data obtained by different sensors.

2.1 System Configuration


A typical configuration of our video conference room is shown in Figure 2. A sound system (A) and a video system (B) provide the locations of users and the identity of the speakers. The computer (C) fuses the information and provides the speaker's location. Finally, a camera focuses on the particular speaker and sends the video to the computer (D).

2.2 Sensing Systems

A. Optical Motion Tracking System

In the project, two reflective markers are fixed to each user's head and body. The system imports a set of frames from the motion capture system to extract the marker locations, and then groups the markers in pairs in order to compute the barycenter of each head.
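A minimal sketch of this pairing step is given below; the pairing criterion (a simple nearest-neighbour distance threshold) and all variable names are assumptions made for illustration, not details of the original motion capture software.

```python
import numpy as np

def head_barycenters(markers, pair_distance=0.25):
    """Group 3D marker positions into pairs assumed to belong to the same head
    and return the barycenter (midpoint) of each pair.

    markers: (N, 3) array of marker coordinates in metres.
    pair_distance: assumed maximum distance between the two markers on one head.
    """
    markers = np.asarray(markers, dtype=float)
    unused = list(range(len(markers)))
    centers = []
    while len(unused) >= 2:
        i = unused.pop(0)
        # pair the marker with its closest remaining neighbour, if close enough
        dists = [np.linalg.norm(markers[i] - markers[j]) for j in unused]
        k = int(np.argmin(dists))
        if dists[k] <= pair_distance:
            j = unused.pop(k)
            centers.append((markers[i] + markers[j]) / 2.0)
    return np.array(centers)

# Example: one frame with two users, two head markers each
frame = [[0.00, 1.70, 0.0], [0.10, 1.72, 0.0],
         [1.50, 1.65, 2.0], [1.58, 1.66, 2.0]]
print(head_barycenters(frame))
```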

B. Robotic Auditory Sensing System

We have been also working to develop a robotic auditory system, which tracks a 3D location of a sound source by using sound characteristics for the identification of a particular person. The microphone array shown in Figure 3 inputs 5 sound signals simultaneously to identify a speaker and the location.

The sound source location is estimated by using the phase differences and sound pressure differences among the 5 microphones [2]. Mel-cepstrum coefficients are used for sound identification since they represent the sound characteristics.
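The direction estimation from phase (arrival-time) differences can be illustrated with a reduced two-microphone sketch; the sampling rate, microphone spacing and the use of plain cross-correlation are assumptions for illustration, not the implementation described in [2].

```python
import numpy as np

def estimate_delay(sig_a, sig_b, fs):
    """Return the arrival-time delay (seconds) of sig_b relative to sig_a,
    found as the lag that maximizes their cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = np.argmax(corr) - (len(sig_a) - 1)
    return lag / fs

def direction_from_delay(delay, mic_spacing, c=343.0):
    """Convert a time difference into an incidence angle (radians)
    for two microphones separated by mic_spacing metres."""
    s = np.clip(delay * c / mic_spacing, -1.0, 1.0)
    return np.arcsin(s)

# Example: broadband signal arriving 14 samples later at the second microphone
fs = 48000
rng = np.random.default_rng(0)
sig = rng.standard_normal(2048)
delayed = np.roll(sig, 14)                      # ~0.29 ms delay at 48 kHz
tau = estimate_delay(sig, delayed, fs)
print(np.degrees(direction_from_delay(tau, mic_spacing=0.2)))   # ~30 degrees
```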

2.3 Identification of Particular Person

The information fusion system bundles and controls the sensing units distributed in the environment to track a person using visual and auditory information.

The video-based 3D acquisition system gives the precise location of users; however, it is difficult to identify who is speaking. The sound system is able to provide the direction of the sound source and to identify the speaker's voice signature; however, it is difficult to distinguish one speaker among multiple users in the presence of environmental noise. In this study, the identification of users and the estimation of their positions are executed by combining the two approaches. Figure 4 shows the identification algorithm of a particular person. It is difficult to run all the functions on one computer, so we are constructing a distributed system consisting of different computers having various functions. In this study, we analyze these different steps, and present our solution based on OSGi frameworks running on each computer.
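The decision logic of the fusion step might be sketched as follows. This is a simplified, single-process illustration of matching an audio-derived speaker identity and direction against the visually tracked positions; it is not the OSGi-based distributed implementation the study describes, and all names and the tolerance value are assumptions.

```python
import numpy as np

def fuse(tracked_positions, sound_direction_deg, voice_id, tolerance_deg=15.0):
    """Pick the tracked user whose bearing (seen from the microphone array
    at the origin) is closest to the estimated sound direction.

    tracked_positions: dict {user_id: (x, z)} from the motion capture system.
    sound_direction_deg: horizontal sound direction from the auditory system.
    voice_id: speaker identity estimated from mel-cepstrum features.
    Returns (user_id, position, voice_id) for the speaking person, or None.
    """
    best, best_err = None, tolerance_deg
    for uid, (x, z) in tracked_positions.items():
        bearing = np.degrees(np.arctan2(x, z))
        err = abs(bearing - sound_direction_deg)
        if err < best_err:
            best, best_err = uid, err
    if best is None:
        return None
    return best, tracked_positions[best], voice_id

# Example: two tracked users; the sound arrives from roughly 30 degrees
users = {"A": (1.0, 1.7), "B": (-0.8, 2.0)}
print(fuse(users, sound_direction_deg=30.0, voice_id="speaker_2"))
```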

2.4 Conclusion of Tracking Techniques

In this section, an information fusion system was introduced, together with its application to the identification of a particular person in a group of people.

Fig. 2 Video Conference Room

Fig. 3 Microphone Array

Fig. 4 Flow of Active Tracking (optical motion tracking supplies marker positions and body features from a human body model; auditory sensing supplies the estimated sound position and MFCC-based sound features from the input sound signal; feature vector selection and template matching then determine who the user is and the location of the user)

3. Image-based Gesture Recognition for Intuitive User Interface

In studies of virtual reality and mixed reality, the development of input devices and user interfaces is important for the interaction with virtual objects and environments. However, a mouse and a joystick are commonly used in present systems. If we could manipulate a virtual object intuitively by natural gestures, without any sensors or devices, the system would contribute to a greater sense of reality. In this paper, we propose a technique in which a user moves his hand in front of a camera to manipulate a virtual object, and the movement is recognized in real time to establish interactive communication.


3.1 System Configuration

The system consists of a web camera, a computer and a display as shown in Figure 5. The user's gestures are captured by the camera in front of him with a size of 320*240 pixels, and the gestural actions are recognized in real time. The user is able to manipulate a virtual object by his gestures and to intuitively interact with the three-dimensional virtual environment.

Fig. 5 System configuration (a web camera connected to a PC captures the user's gestures for manipulating a virtual object)

3.2 Image processing for gesture recognition

A. Gesture recognition algorithm

The gesture recognition algorithm is constructed to identify characteristic motions given by the user's arm motions. In this study, we try to keep the system as simple as possible, so that a user can operate the system and interact with it in real time.

The image processing algorithm is described by referring to Figure 6. For each image input from the web camera, the system applies spatial averaging to the image I(w,h,t) at time t. The brightness values of two adjacent averaged images B(x,y,t) and B(x,y,t-1) are compared, and the motion areas are extracted to obtain a binary image F(w,h) based on a threshold value, as shown in Figure 6 (a).
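A minimal version of this averaging and differencing step is sketched below; the kernel size, the threshold value and the use of NumPy arrays are assumptions for illustration rather than the original implementation.

```python
import numpy as np

def motion_mask(frame_prev, frame_curr, threshold=20, kernel=3):
    """Return a binary image F(w,h) marking pixels whose brightness changed
    between two consecutive spatially averaged grayscale frames."""

    def box_blur(img, k):
        # simple spatial averaging with a k x k box filter
        pad = k // 2
        padded = np.pad(img.astype(float), pad, mode="edge")
        out = np.zeros(img.shape, dtype=float)
        for dy in range(k):
            for dx in range(k):
                out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        return out / (k * k)

    b_prev = box_blur(frame_prev, kernel)
    b_curr = box_blur(frame_curr, kernel)
    # pixels whose averaged brightness changed more than the threshold
    return (np.abs(b_curr - b_prev) > threshold).astype(np.uint8)
```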

Then, a morphological filter as presented in Equation 1 is applied to the image F(w,h) to remove small noise. Several white blocks still remain, and the system finds the largest block, which is regarded as the area of the moving hand. Equation 2 shows the formula to compute the size of the white blocks.

F(i,j) =
\begin{cases}
1, & \text{if } \sum_{n=-1}^{1}\sum_{m=-1}^{1} F(i+n,\, j+m) \ge 6 \\
0, & \text{if } \sum_{n=-1}^{1}\sum_{m=-1}^{1} F(i+n,\, j+m) \le 3
\end{cases}
\qquad (1)

L(x,y) = \max\left\{ \sum_{i}\sum_{j} F(i,j) \;:\; (i,j) \text{ in the white block containing } (x,y) \right\}
\qquad (2)
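A direct reading of Equations (1) and (2) can be sketched as follows; the connected-component labelling used to evaluate the block sizes (here via scipy.ndimage) is an assumed implementation detail, not necessarily what the original system uses.

```python
import numpy as np
from scipy import ndimage

def morphological_filter(F):
    """Equation (1): set a pixel to 1 when at least 6 pixels in its 3x3
    neighbourhood are white, and to 0 when at most 3 are white."""
    kernel = np.ones((3, 3), dtype=int)
    neighbour_sum = ndimage.convolve(F.astype(int), kernel,
                                     mode="constant", cval=0)
    out = F.copy()
    out[neighbour_sum >= 6] = 1
    out[neighbour_sum <= 3] = 0
    return out

def largest_block(F):
    """Equation (2): among the connected white blocks, return a mask of the
    one with the largest number of pixels (the assumed moving-hand area)."""
    labels, n = ndimage.label(F)
    if n == 0:
        return np.zeros_like(F)
    sizes = ndimage.sum(F, labels, index=range(1, n + 1))
    return (labels == (int(np.argmax(sizes)) + 1)).astype(np.uint8)
```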

Fig. 6 Flow of image processing: (a) motion area detection, (b) barycenter calculation

After finding the largest block, the system computes a barycenter G(gx, gy) in the image F(w,h) as presented in Figure 6 (b), and calculates the speed and the direction of the moving hand by referring to the image sequence.
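The barycenter and the resulting motion estimate can be sketched as below; the frame interval and variable names are assumptions for illustration.

```python
import numpy as np

def barycenter(F):
    """Barycenter G(gx, gy) of the white pixels in the binary image F."""
    ys, xs = np.nonzero(F)
    if len(xs) == 0:
        return None
    return float(xs.mean()), float(ys.mean())

def hand_motion(prev_center, curr_center, dt=1 / 30):
    """Speed (pixels/s) and direction (radians) of the moving hand,
    computed from the barycenters of two consecutive frames."""
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    return np.hypot(dx, dy) / dt, np.arctan2(dy, dx)
```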

B. Experiments

An experiment was conducted to validate the algorithm for extracting human hand motions. A subject made a circular motion as shown in Figure 7 (a), and the motion was measured as presented in Figure 7 (b). Small noise along the trajectory is still observed, but the characteristics of the circular motion are well recognized. The execution time for one frame is less than 20 milliseconds while transferring images through USB using a notebook PC (Intel Pentium M 1.6GHz, 760MB RAM), and we found that the algorithm works in real time.

Fig. 7 Experiment of gesture tracking: (a) gestural motion, (b) measurement results

3.3 System applications

To evaluate the gestural interface as a means of manipulating a computational system, a 3D table-tennis game played with hand gestures was constructed as shown in Figure 8. A ball with a striped pattern bounces around in a virtual 3D space, in which a wall of stacked blocks is situated in the center. A user stands in front of the computer display to see the virtual space, and at the moment the ball comes back to the entrance face, he strikes the ball with a hitting motion of the hand. The ball bounces back when it hits the wall, and if the ball hits a block, the block disappears.

The system recognizes the gestural trajectory, the direction and the speed of the user's hand actions, and these are mapped to the reaction of the ball bouncing back, so that the user plays virtual tennis by interacting with the 3D space in real time. If he gives a stroking gesture to the left, the ball returns to the left. By giving an arched motion, he is able to put spin on the ball according to the amount of the stroke. If he strokes quickly, the ball bounces back with higher speed. The user has to control the direction and speed of the ball to make a targeted block disappear, because if the ball's speed is not high enough, the block does not disappear.
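One possible way to map the recognized stroke onto the ball's reaction is shown below; the scaling constants and the spin model are purely illustrative assumptions and are not taken from the game described above.

```python
import numpy as np

def ball_reaction(speed, direction, arch_amount, k_speed=0.01, k_spin=0.5):
    """Translate a recognized hand stroke into a returned ball velocity.

    speed: hand speed in pixels/s; direction: stroke direction in radians;
    arch_amount: signed curvature of the stroke, used here as spin.
    """
    vx = np.cos(direction) * speed * k_speed   # sideways component of the return
    vz = -abs(speed) * k_speed                 # always returned into the scene
    spin = k_spin * arch_amount
    return np.array([vx, 0.0, vz]), spin
```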

We conducted an experiment to evaluate the application system. Nine subjects, all of whom had more than 5 years of computer experience and were accustomed to using a conventional mouse and keyboard, played the 3D table-tennis game and evaluated it from the viewpoints of the user interface and the interaction with a virtual object by answering a questionnaire. The questions to the subjects were:

A) Could you easily understand the manipulation?
B) Were you easily accustomed to the manipulation employing gestures?
C) Could you control the ball well in 3D space?

The evaluation results are shown in Table 1. The gestural manipulation was evaluated almost favorably. In particular, most of the users pointed out that the manipulation was intuitive and easy to understand, and evaluated it positively. On the other hand, several opinions pointed out the difficulty of controlling the direction and speed of the ball based on the recognition of gestures. The improvement of the recognition ability and a flexible association of the user's actions with the CG outputs should be examined further in a future system.


Fig. 8 Gesture-manipulated table-tennis

Table 1. Result of the questionnaire

Question   Max   Min   Ave
A          5     3     4.5
B          4     2     3.4
C          3     2     2.6

3.4 Conclusions

We developed a vision-based gestural interface for the interactive manipulation of a virtual object, and introduced its application to a virtual tennis game with which a user is able to play tennis by gestural actions without using any sensors or special devices. The interface was evaluated favorably through a questionnaire, and several problems were also identified. In the next stage, a more flexible interface system employing gestures will be developed by integrating it with other modalities such as audition and tactile sensations.

4. Micro-vibration Actuators and the Presentation of Tactile Sensation to Human Skin

Humans are able to communicate with each other by using not only verbal media but also the five senses. However, few devices are available for presenting tactile information together with vision and sound. This section introduces the development of a tactile display using a shape-memory alloy (SMA) thread, and describes the presentation of tactile sensations to human skin.

4.1 Design of Tactile Display

We developed a micro-vibration actuator electrically driven by periodic signals generated by current-control circuits for tactile information transmission [3]. Figure 9 shows the actuator, which is composed of a 5 mm-long SMA thread with a diameter of 0.05 mm so that it can easily be attached to body surfaces. Figure 10 shows the tactile display, which arranges 6 actuators in a 3 x 3 matrix.

Fig. 9 Vibration actuator (SMA thread with insulator and ball)

Fig. 10 Tactile display

4.2 Presentation of Tactile Sensations to a Narrow Area on Different Body Surfaces

An experiment was conducted on the presentation of tactile sensations to different body surfaces of an able-bodied subject, to examine the possibility of presenting information not only to the palm but also to other body locations. A palm and the back of a hand have different skin structures of tactile receptors. The stimulus area was therefore set as shown in Figure 11, where 100 spots were arranged in 4.5 x 18 mm. The vibratory stimuli were generated by setting the frequency to 50 Hz and the duty ratio to 1:20, and the subject reported the sensations when the stimuli were applied with the vibration actuator.
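The periodic drive signal used here (50 Hz with a 1:20 duty ratio) can be expressed as a simple pulse train. The sketch below only generates waveform samples and assumes a generic digital output rather than the actual current-control circuit; interpreting "1:20" as the ON fraction of each period is also an assumption.

```python
import numpy as np

def sma_drive_signal(freq_hz=50.0, duty_ratio=1 / 20, fs=10000, duration=0.1):
    """Generate a rectangular pulse train: within each period the output is
    ON for duty_ratio of the period and OFF for the rest."""
    t = np.arange(0, duration, 1 / fs)
    phase = (t * freq_hz) % 1.0          # position within the current period
    return (phase < duty_ratio).astype(float)

signal = sma_drive_signal()              # 0.1 s of the 50 Hz, 1:20 waveform
print(signal.sum() / len(signal))        # ON fraction, approximately 0.05
```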


The results are shown in Figure 12. On the palm, all the stimuli were uniformly perceived as mechanical vibratory sensations. On the back of the hand, on the other hand, a localization of vibratory sensations was found, which reflects the different distributions of tactile receptors. It is also expected that by giving vibratory stimuli with different frequencies, other localizations of different sensations would be found, which we will study in a further experiment.

4.3 Presentation of Texture Sensation

Two SMA actuators are able to generate Apparent Movement (AM) and Phantom Sensation (PS), and "rubbing" or "stroking" sensations are perceived through combinations of AM and PS [3]. By driving 8 actuators, various texture sensations can be presented [4]. An experiment was conducted with 9 subjects to examine the texture sensations generated by the display. Eight real objects, namely felt, a towel, a sponge, a handkerchief, cardboard, rubber, paper and a smooth wooden board, were prepared, and each subject answered which object had the tactile texture closest to the sensation given by the display.
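Apparent movement can be sketched as two actuators driven with overlapping bursts whose onsets are shifted in time; the onset delay, burst length and carrier shape below are illustrative assumptions, not the drive conditions reported in [3].

```python
import numpy as np

def apparent_movement(freq_hz=50.0, burst=0.4, onset_delay=0.2, fs=10000):
    """Two drive waveforms whose onsets are offset so that the vibration
    appears to move from actuator 1 toward actuator 2."""
    duration = burst + onset_delay
    t = np.arange(0, duration, 1 / fs)
    carrier = (np.sin(2 * np.pi * freq_hz * t) > 0).astype(float)  # 50 Hz pulses
    env1 = ((t >= 0) & (t < burst)).astype(float)                  # first actuator
    env2 = ((t >= onset_delay) & (t < onset_delay + burst)).astype(float)  # second
    return carrier * env1, carrier * env2
```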

Figure 13 shows the result. With 50 Hz, most subjects perceived rough textures like felt, a towel, or a sponge. On the other hand, with a frequency of 100 Hz, smooth objects such as a handkerchief, cardboard, or a smooth wooden board were perceived.

4.4 Conclusions

In the paper, different sensations generated by the tactile display with SMA were introduced. By driving actuators randomly, various texture sensations were also perceived. In the next stage, we will investigate further suitable conditions for presenting texture sensations and "real" touch feelings, together with the physiological mechanism of human tactile sensations.


5. Reproduction of Articulatory Motion by a Talking Robot and a Proposal of an Interactive Speech-Training System

Spoken language is one of the most important means of information transmission in human-to-human communication. However, people with speech disorders, such as cerebral palsy patients and hearing-impaired people, have difficulty communicating in daily life because their speech is unclear. Speech training is therefore necessary for smooth communication.

Cerebral palsy patients and hearing-impaired people receive speech training from speech-language-hearing therapists (STs). In this training, however, there are several problems: patients cannot see the inside of their own mouths, they forget the vocal tract shapes after a long vacation such as the summer holidays, and the number of STs is far too small for the number of patients. A means by which patients can visually confirm their own vocal tract shape, and which can substitute for an ST, is therefore desired. For these reasons, we aim at a robot with which patients can train while confirming the vocal tract shape by themselves.

In this study, since the talking robot we have developed can estimate and reproduce the vocal tract shape in the mouth through autonomous vocalization learning, we analyze the causes of the unclear speech of cerebral palsy patients and construct an interactive speech-training system for hearing-impaired people.

5.1 Configuration of the Talking Robot

The configuration of the talking robot is shown in Figure 14. The system consists of an air pump, artificial vocal cords, a nasal cavity, a resonance tube, a microphone and a sound analyzer, which correspond to the human lungs, vocal cords, nasal cavity, vocal tract and auditory system, respectively. The airflow supplied by the air pump vibrates the artificial vocal cords and generates a source sound wave. By deforming the resonance tube, which is made of silicone rubber, through the vertical motion of stainless-steel rods, the acoustic resonance characteristics are changed and arbitrary speech sounds are generated [5].

5.2 Autonomous Acquisition of Vocalization

The robot autonomously acquires its vocalization motions using a self-organizing neural network (SONN).

Fig. 11 Stimulus area (4.5 mm x 18 mm)

Fig. 12 Sensation maps: (a) on the palm, (b) on the back of the hand

Fig. 13 Tendency according to frequency (felt, towel, sponge, handkerchief, cardboard, rubber, paper, smooth wood)

As shown in Figure 15, the learning uses self-organizing map (SOM) learning in the upward direction and three-layer perceptron learning by backpropagation in the downward direction. In the learning phase, the upward learning maps the sounds randomly vocalized by the robot onto a feature map. Then, the downward learning associates the acoustic parameters extracted from the input sounds with the motor-control parameters. After the learning, when a target voice is input, the estimated vocal tract shape is obtained from the output layer according to the pattern in the competitive layer, and the corresponding sound is generated. Since the system uses a SOM for the upward learning, a topological structure can be formed on the map with a small number of learning iterations: sounds with similar features are mapped close to each other and dissimilar sounds far apart, so that unlearned sounds can also be estimated. After the SONN learning, five able-bodied subjects each uttered the five Japanese vowels, and their features were mapped onto the feature map. The mapping results are shown in Figure 16.
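The SONN structure described above, a SOM for the upward mapping and a three-layer perceptron trained by backpropagation for the downward mapping, can be sketched as follows. The map size, feature dimensions, random placeholder data and the use of scikit-learn's MLPRegressor are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor   # assumed stand-in for the 3-layer perceptron

class SimpleSOM:
    """Minimal self-organizing map for the upward (sound -> feature map) learning."""
    def __init__(self, rows=24, cols=24, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((rows, cols, dim)) * 0.1
        self.rows, self.cols = rows, cols

    def winner(self, x):
        d = np.linalg.norm(self.w - x, axis=2)
        return np.unravel_index(np.argmin(d), d.shape)

    def train(self, data, epochs=20, lr0=0.5, sigma0=6.0):
        grid = np.stack(np.meshgrid(np.arange(self.rows), np.arange(self.cols),
                                    indexing="ij"), axis=-1)
        for e in range(epochs):
            lr = lr0 * (1 - e / epochs)
            sigma = sigma0 * (1 - e / epochs) + 1.0
            for x in data:
                win = np.array(self.winner(x))
                h = np.exp(-np.sum((grid - win) ** 2, axis=2) / (2 * sigma ** 2))
                self.w += lr * h[..., None] * (x - self.w)
        return self

# Upward learning: map randomly vocalized sounds (placeholder acoustic vectors)
rng = np.random.default_rng(1)
sound_features = rng.standard_normal((200, 16))   # placeholder acoustic parameters
motor_params = rng.standard_normal((200, 8))      # placeholder motor-control parameters
som = SimpleSOM().train(sound_features)

# Downward learning: associate acoustic parameters with motor-control parameters
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(sound_features, motor_params)

# After learning: a target voice is located on the map and a motor-control
# (vocal tract shape) estimate is produced by the perceptron
target = sound_features[0]
print(som.winner(target), mlp.predict(target[None, :])[0][:3])
```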


Fig. 14 Configuration of the talking robot (system controller, low-pass filter, air pump, air buffer, airflow, resonance tube, nasal cavity, lips, microphone, phoneme-motor controller, pitch-motor controller)

Fig. 15 Structure of the SONN (an auditory system supplies sound parameters to the input layer of a three-layered perceptron whose output layer produces motor-control parameters for speech generation, linked to a feature map formed by self-organizing learning)

Fig. 16 Mapping of the five Japanese vowels of able-bodied speakers after SONN learning

The mapping results show that the feature points of the respective vowels are categorized: vowels with similar acoustic features are mapped close to each other and dissimilar ones far apart, indicating that the topological structure of the acoustic features has been acquired.

5.3 Analysis of the Causes of Unclear Speech of Cerebral Palsy Patients

Based on the above results, we used the unclear voices of cerebral palsy patients as input and verified what vocal tract shapes were reproduced. Voice data of the five vowels from six cerebral palsy patients were used in the experiment. An example of the results is shown in Figure 17: only the /i/ of patient voice a was plotted between the /u/ and /o/ regions, far from the vowel regions of able-bodied speakers. Listening to the voice, it sounded like a muffled /i/, close to /u/. The reproduced vocal tract shape was intermediate between those of the able-bodied /u/ and /o/. In other words, the speech becomes unclear because the articulatory motion is not performed well. This experiment also showed that the vocal tract shape can be estimated and reproduced even from unclear speech.


Fig. 17 Feature map and vocal tract shape: (a) feature map after the experiment, (b) reproduced vocal tract shape

5.4 Interactive Speech-Training Experiment for Hearing-Impaired People

In the preceding experiment, the vocal tract shape could be estimated and reproduced from unclear speech. We therefore applied the talking robot as an interactive speech-training device and conducted interactive speech-training experiments with hearing-impaired people. The subjects were 19 students of the Kagawa Prefectural School for the Deaf: 12 from the senior high school division and 7 from the junior high school division. The training targets the five Japanese vowels. In the training, the ideal vocal tract shape of a vowel is first presented to the subject as the target shape. The subject then compares the vocal tract shape estimated by the robot from his or her own utterance with the target shape, and vocalizes again while consciously correcting the differing parts. In addition to the comparison of vocal tract shapes, speech training using the feature map was also conducted. The features of the speaker's voice are displayed as feature points on the map; by comparing the position of the feature point shown when the hearing-impaired subject vocalizes with the target vowel region, a close position indicates that the utterance is good. Figure 18 shows an example in which the training succeeded with a small number of trials and an example in which the training did not succeed.
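The feedback used in the feature-map training, how close the learner's feature point falls to the target vowel region, can be expressed as a simple distance on the map grid; the region centers and the distance threshold below are assumed values for illustration only.

```python
import numpy as np

# Assumed centers of the target vowel regions on a 24 x 24 feature map
VOWEL_CENTERS = {"a": (20, 4), "i": (3, 20), "u": (4, 4), "e": (12, 20), "o": (18, 12)}

def training_feedback(feature_point, target_vowel, good_distance=3.0):
    """Return the grid distance between the learner's mapped feature point and
    the target vowel region; a small distance indicates a good utterance."""
    center = np.array(VOWEL_CENTERS[target_vowel], dtype=float)
    d = float(np.linalg.norm(np.array(feature_point, dtype=float) - center))
    return d, d <= good_distance

print(training_feedback((18, 6), "a"))   # (2.83..., True)
```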



Fig. 18 Results of speech training: (a) success with a small number of trials, (b) unsuccessful training

The training results show that most of the subjects who trained successfully were senior high school students. Speech training is also conducted at the school for the deaf, and the senior high school students have more experience of such training than the junior high school students, which is considered to be the reason why they could train well. This indicates that the achievement level of speech training rises as the training is repeated, and that the visual presentation used in the experiment, displaying the voice feature points and the robot's vocal tract shape, is useful.

5.5 Summary and Future Work

In this study, we proposed an interactive speech-training device so that people with speech disorders can train while confirming the vocal tract shape by themselves. SONN learning was proposed for the talking robot to autonomously acquire the vocalization motions of the five Japanese vowels; by using a SOM for the upward learning, unknown sounds can be estimated even with a small amount of learning. After the SONN learning, we used the unclear speech of cerebral palsy patients as input and verified the reproduction of the vocal tract shapes, and vocal tract shapes corresponding to the unclear speech were obtained. We then constructed an interactive speech-training system for hearing-impaired people by applying the talking robot. The experimental results showed that the achievement level of speech training rises as the training is repeated, and that the visual presentation of voice feature points and the robot's vocal tract shape is useful. We will continue research toward the practical use of the speech-training device.

6. Summary

We have introduced human interface and information/communication technologies on which the authors are currently working as technologies that connect humans with machines and systems. In our laboratory, research is conducted mainly on speech and sound, vision, tactile sensation and behavior recognition, with three keywords: next-generation interfaces, human-centered systems and human-assistive technologies. For speech and sound, we study a talking robot that autonomously acquires vocalization and a system that distinguishes and tracks voices and sounds. As a realization of human visual information processing, we study face recognition and tracking and the identification of a particular person by facial-expression extraction. Furthermore, we are constructing a bidirectional multimodal communication system that integrates sound, images and gestures. Although the research themes are diverse, our goal is to understand and reproduce human ambiguity, flexibility and sensitivity with computers, aiming at new computer systems that humans can use without being conscious of the machines and systems. In the lecture, studies on the talking robot, sound tracking, gesture recognition and the tactile display were introduced with video demonstrations, and a perspective on integrated media technologies was given. These studies are still at the basic research stage, but we will continue to work toward their realization in the near future.


http://www.eng.kagawa-u.ac.jp/~sawada/

(References)

[1] Khairunizam Wan and Hideyuki Sawada: "3D Measurement of human upper body for gesture recognition", International Symposium on Optomechatronic Technologies, Vol. 6718, 67180I-1 - 8, 2007.

[2] Hideyuki Sawada and Toshiya Takechi: "A Robotic Auditory System that Interacts with Musical Sounds and Human Voices", Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 11, No. 10, pp. 1177-1183, 2007.

[3] Y. Mizukami and H. Sawada: "Tactile information transmission by apparent movement phenomenon using shape-memory alloy device", International Journal on Disability and Human Development, Vol. 5, No. 3, pp. 277-284, 2006.

[4] Y. Mizukami and H. Sawada: "A tactile display using micro-vibration actuators and the presentation of tactile sensations by controlling the probability density of driving signals" (in Japanese), Proceedings of Interaction 2008, Information Processing Society of Japan, pp. 195-202, 2008.

[5] Hideyuki Sawada: "Talking Robot and the Autonomous Acquisition of Vocalization and Singing Skill", Chapter 22 in Robust Speech Recognition and Understanding, pp. 385-404, June 2007, ISBN: 978-3-902613-08-0.
