8/2/2019 Mobile Ani Matt Ion
http://slidepdf.com/reader/full/mobile-ani-matt-ion 1/26
Mobile Facial Animation
ACKNOWLEDGEMENT
I take this opportunity to thank the Almighty for keeping me on the right path and
the immense blessing towards the successful completion of my Seminar.
I wish to express my sincere gratitude to Smt. Geetha Ranjin, H.O.D., Department of
Electronics and Communication Engineering, for her expert guidance, constant
encouragement, and valuable suggestions towards the completion of this Seminar.
I am also grateful to my Staff-in-Charge Mr. Ranjith Ram and Mr. Vinod Kumar,
Department of Electronics and Communication Engineering, for always being there to hand
out invaluable pieces of advice.
Last of all, I thank all my teachers and friends who extended every possible assistance
they could.
ROSHITH. P
Govt. College of Engg., Kannur 2005
ABSTRACT
Three-dimensional facial model coding can be employed in various mobile
applications to provide an enhanced user experience. Instead of directly encoding the data
using conventional coding techniques such as MPEG-2, a one-time 3D computer model of
the caller is transmitted at the beginning of the telephone call. Thereafter, capturing 3D
movements and mimicry parameters with the camera is all that is required to continually see
and hear a true-to-life, synchronized caller on the display. The 3D models are
interchangeable, which means that one person can be displayed on the screen with the
movements of another. The technique is suitable for use in conjunction with various mobile
networks, from GSM to UMTS. What is less clear, however, is the sensitivity of the
3D-coded data to channel errors.
CONTENTS

Chapter 1. INTRODUCTION
Chapter 2. SYSTEM OVERVIEW
Chapter 3. FACIAL ANIMATION AND SPECIFICATION
    3.1 MPEG-4 standard
    3.2 Face animation parameters
    3.3 Facial animation parameter units
    3.4 Face feature points
    3.5 MPEG-4 facial animation delivery
Chapter 4. CODING OF FAPs
    4.1 Arithmetic coding of FAPs
    4.2 DCT coding of FAPs
    4.3 Interpolation and extrapolation
Chapter 5. SYSTEM ARCHITECTURE
Chapter 6. CHANNEL MODELS FOR FAP
    6.1 GPRS
    6.2 EDGE
    6.3 Results
Chapter 7. ERRORS IN MOBILE FACIAL ANIMATION
Chapter 8. APPLICATIONS
    8.1 Embodied agents in spoken dialogue systems
    8.2 Language training with talking heads
    8.3 Synthetic faces as aids in communication
Chapter 9. CONCLUSION
REFERENCES
CHAPTER 1
INTRODUCTION
Facial animation and virtual human technology in computer graphics has made
considerable advances over the past decades and has become a research topic attracting an
increasing number of commercial applications, such as mobile platforms,
telecommunications, tele-presence via the Internet, and digital entertainment. A number of
mobile applications may benefit from the enhancement that 3-D video can bring, including
message services and e-commerce.

Despite the possible advantages of such technologies, the effect of the mobile link on
3-D video has not been considered in the design of its syntax. Another issue is delivering the
coded bit stream over the wireless network: the required bandwidth should be as narrow as
possible.

MPEG-4 is the first international standard for real-time multimedia
communication that covers natural and synthetic audio, video, and 3D graphics. Face
models are defined in MPEG-4 through BIFS; within BIFS, FAP coding provides a very low
bit rate for face models.

To deliver these services, the possible channel models are GPRS and EDGE.
However, the coded data is sensitive to errors, so error resilience must be considered.

The next chapters give an overview of the relevant parts of FAP technology and the
coding of FAPs, and discuss different mobile network technologies. This is followed by
results obtained when FAP data is delivered through GPRS and EDGE channels, and a
comparison of their channel errors. Other noticeable issues in this technology and the
applications of facial animation on mobile terminals are also discussed.
CHAPTER 2
SYSTEM OVERVIEW
Figure 2.1 System overview
Mobile facial animation can be described using the above block diagram.
Using a projection camera, the 3D input surfaces or facial models are produced, and facial
animation techniques track the movements of the face. The MPEG-4 FAP encoder
encodes the high-resolution facial models, and the resulting data stream is transmitted over
the wireless network. GPRS and EDGE channel models are preferred here because of their
data rates and bandwidth; of the two, EDGE offers higher data rates but is more sensitive to
channel errors. At the receiver, the data stream is received using the same protocol stack as
in the transmitter, but in the inverse order. It is then decoded using the MPEG-4 FAP
decoder, and the face model is reconstructed.
CHAPTER 3
FACIAL ANIMATION AND SPECIFICATION
3.1 MPEG-4 STANDARD
The MPEG-4 Systems standard:

1) Contains a method of representing and encoding 3-D scenes, called the Binary
Format for Scenes (BIFS).
2) Is based on the Virtual Reality Modeling Language (VRML), which
specifies a language for describing 3D scenes.
3) Through BIFS, provides a method for compressing VRML-type data and animating
the 3-D objects in a scene.
4) Within BIFS, provides Facial Animation Parameter (FAP) encoding.
5) FAP encoding specifically allows the representation, animation, and binary encoding
of facial models. This can be performed using techniques of varying complexity.

Like the VRML standard (3), MPEG-4 BIFS describes 3-D scenes using a series of
nodes. Nodes can describe various scene aspects including object shape, rotation, and
translation. They can also contain other nodes; it is common for scenes to contain
hierarchical node trees. Although scenes are described using VRML-type structures, BIFS
includes a number of features that are not present in VRML:

• Data streaming
• Scene updates
• Compression of scene data

A combination of the first two features allows elements within scenes to be
animated. BIFS also allows scenes to be displayed as the data arrives at the client, while
VRML requires the whole scene to be downloaded before anything is shown.
3.2 FACE ANIMATION PARAMETERS
In an effort to standardize face model parameterization, originally for the purposes
of efficient model-based coding of moving images, the MPEG consortium developed the
MPEG-4 facial animation standard. This standard defines 68 Facial Animation Parameters
(FAPs) and 84 facial feature points. The facial feature points are well-defined landmark
points on the human face. The FAPs have been designed to be independent of any particular
facial model; in other words, essential facial gestures and visual speech derived from a
particular performer will produce good results on other faces unknown at the time the
encoding takes place.

The 68 parameters are categorized into 10 groups related to parts of the face (Table
3.1). FAPs represent a complete set of basic facial actions including head motion and tongue,
eye, and mouth control. They allow the representation of natural facial expressions and can
also be used to define facial action units. Exaggerated values permit the definition of
actions that are normally not possible for humans, but are desirable for cartoon-like
characters.
The FAP set contains two high-level FAPs and 66 low-level FAPs. The high-level
FAPs are visemes and expressions (FAP group 1). A viseme is a visual correlate of a
phoneme. Only 14 static visemes that are clearly distinguishable are included in the standard
set. In order to allow for coarticulation of speech and mouth movement, transitions from
one viseme to the next are defined by blending the two visemes with a weighting factor.
Similarly, the expression parameter defines 6 high-level facial expressions such as joy and
sadness (Figure 3.1). In contrast to visemes, facial expressions are animated with a value
defining the excitation of the expression, and two facial expressions can be blended with a
weighting factor. Since expressions are high-level animation parameters, they allow
animating unknown models with high subjective quality.
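The blending of two visemes (or two facial expressions) with a weighting factor can be sketched as a simple weighted combination. Representing a viseme as a vector of mouth-FAP displacements, and the particular values used, are illustrative assumptions, not part of the standard.

```python
def blend(a, b, weight):
    """Blend two visemes (or two facial expressions), each given as a
    list of FAP displacements, with a single weighting factor in [0, 1].
    weight = 1.0 yields pure 'a'; weight = 0.0 yields pure 'b'."""
    assert len(a) == len(b)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]

# Transition from a closed-lips viseme toward an open-jaw viseme
# (hypothetical mouth-FAP displacement vectors):
closed = [0.0, 0.0, 0.0]
open_ = [0.8, 0.5, 0.3]
halfway = blend(closed, open_, 0.5)  # midway through the transition
```

Sweeping the weight from 1.0 to 0.0 over a few frames produces the smooth viseme-to-viseme transition described above.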
Figure 3.1 Facial Expressions
Table 3.1: FAP groups
Group Number of FAPs
1: Visemes and expressions 2
2: Jaw, chin, inner lowerlip, cornerlips, midlip 16
3: Eyeballs, pupils, eyelids 12
4: Eyebrow 8
5: Cheeks 4
6: Tongue 5
7: Head rotation 3
8: Outer lip positions 10
9: Nose 4
10: Ears 4
3.3 FACIAL ANIMATION PARAMETER UNITS
The MPEG-4 standard uses the set of parameters (FAPs) already explained. As
noted, FAPs are defined independently of any face model. This is accomplished by defining
each parameter in a normalized space, referred to as FAP Units (FAPUs). In a given system,
FAPUs are computed by measuring the distances between key feature points on the neutral
high-resolution model. The figure below shows these key standard measurements: the Eye
Separation (ES), Iris Diameter (IRISD), Eye-Nose Separation (ENS), Mouth-Nose
Separation (MNS), and the Mouth Width (MW). FAPUs derived from these measurements
can be scaled and adjusted to produce a "visual volume" best suited for the target face.

Figure 3.3.1: MPEG-4 FAPUs measured on a person's face.
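A rough sketch of the FAPU computation and its use for denormalizing transmitted FAP values follows. The measured distances and the example FAP value are illustrative; the division by 1024 and the 1e-5 radian angle unit follow the usual MPEG-4 FAPU definitions.

```python
def fapus(es, irisd, ens, mns, mw):
    """Compute FAP Units from key distances measured on the neutral
    face. Each distance-based FAPU is the measured distance divided by
    1024, so FAP amplitudes are expressed in 1/1024ths of a facial
    distance; the angle unit (AU) is 1e-5 radian."""
    return {
        "ES0":    es / 1024.0,     # eye separation unit
        "IRISD0": irisd / 1024.0,  # iris diameter unit
        "ENS0":   ens / 1024.0,    # eye-nose separation unit
        "MNS0":   mns / 1024.0,    # mouth-nose separation unit
        "MW0":    mw / 1024.0,     # mouth width unit
        "AU":     1e-5,            # angle unit, in radians
    }

def denormalize(fap_value, fapu):
    """Convert a transmitted (normalized) FAP value into a displacement
    in the target model's own units."""
    return fap_value * fapu

# Illustrative measurements in millimetres on a neutral face:
u = fapus(es=65.0, irisd=12.0, ens=40.0, mns=20.0, mw=55.0)
jaw_drop_mm = denormalize(300, u["MNS0"])  # a jaw FAP of 300 MNS units
```

Because the same FAP stream is denormalized with each target model's own FAPUs, identical parameters animate different faces in proportion to their geometry.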
3.4 FACE FEATURE POINTS
An MPEG-4 compliant face model is a 3D mesh that includes 84 well-defined
Feature Points (Figure 3.4.2). The 68 Facial Animation Parameters describe the
displacements of the feature points, together with global head rotations around the x, y, and
z axes. It is up to the animation player to reconstruct realistic movements for the remaining
points of the face model. The FAPs are normalized (in FAPUs) so that the same stream can
be used to animate different face models.

Figure 3.4.1: An MPEG-4 compliant face model, the dots representing the 84 Feature
Points (left). An example of varying FAPs 4, 5, 6, and 12 to describe mouth opening
(right).

Some feature points, like the ones along the hairline, are not affected by FAPs. They
are required for defining the shape of a proprietary face model using feature points. Feature
points are arranged in groups such as cheeks, eyes, and mouth (Table 3.1). The location of
these feature points has to be known for any MPEG-4 compliant face model.
Figure 3.4.2: Facial Feature Points

3.5 MPEG-4 FACIAL ANIMATION DELIVERY
This discussion aims at analyzing the issues in the transmission of MPEG-4
compliant facial animation streams over lossy packet networks, such as wireless LANs, the
Internet, or third-generation mobile networks. Many web-based applications now exploit
three-dimensional, animated virtual characters to enrich their user interfaces. However, the
HTTP/TCP protocols used for animation transport in the majority of existing systems fail to
guarantee fast interaction over a wide range of network conditions. The use of unreliable,
connectionless transport protocols, such as the Real-time Transport Protocol (RTP) over
UDP, for the delivery of multimedia content has been proposed in order to reduce end-to-end
latency and improve robustness against network congestion.

The MPEG-4 standard allows for the encoding and representation of a wide range of
natural and synthetic audio and video sources. A major difference with previous multimedia
standards lies in its object-based approach, in which a scene is composed of several Audio-
Visual Objects, each of them represented through an elementary bit stream.

One such object is the Face Object, a three-dimensional face model (either human-
or cartoon-like) that may be animated by a set of Facial Animation Parameters (FAPs).
Because of the complexity of implementing a complete MPEG-4 Systems architecture, a
common approach in web-based applications is that of directly carrying a single
elementary stream over the lightweight RTP protocol. Face models require a very low bit
rate, so a model-based, variable-length, predictive encoding is used. MPEG-4 employs
highly efficient arithmetic and DCT (Discrete Cosine Transform) coding algorithms to
reduce temporal redundancy in FAP streams. Bit rates as low as 2 kbps can be achieved;
thus, the frame size becomes comparable with the size of RTP/UDP/IP headers. The use of
these algorithms also means that the loss or late arrival of a single packet may destroy a
significant amount of information, hence requiring the use of error resilience and/or
concealment techniques. Finally, the specific bit stream syntax often requires a significant
amount of look-ahead in the decoding process: if a packet is lost or corrupted, the decoding
process is interrupted up to the next reference frame. This work investigates the resulting
effects on bandwidth.
CHAPTER 4
CODING OF FAPs
One key issue in making use of FAP technology is how FAP parameters are
obtained ready for encoding. The standard MPEG-4 FAP encoder software uses text-based
FAP files as input. These text-based files contain various parameters specifying how the
face moves; from them, the binary encoded FAP stream is produced. Generation of these FAP
files may be achieved in two ways: manually, or automatically by employing image
processing algorithms. A number of image processing techniques have been proposed
which are capable of identifying and tracking facial features.

For coding facial animation parameters, MPEG-4 provides two tools. Coding of
quantized and temporally predicted FAPs with an arithmetic coder allows FAPs to be coded
with only a small delay. Using a discrete cosine transform (DCT) to encode a sequence of
FAPs introduces significant delay but achieves higher coding efficiency.
4.1 ARITHMETIC CODING OF FAPs

The figure below shows the block diagram for encoding FAPs. The first set of FAP
values, FAP(i)0 at time instant 0, is coded in intra mode. The value of an FAP at time instant
k, FAP(i)k, is predicted using the previously decoded value FAP(i)k-1. The prediction error
e is quantized using a quantization step size that is specified for each FAP, multiplied by a
quantization parameter FAP_QUANT with 0 ≤ FAP_QUANT < 9. FAP_QUANT is identical
for all FAP values of one time instant k. Using the FAP-dependent quantization step size
together with FAP_QUANT ensures that quantization errors are subjectively evenly
distributed between different FAPs. The quantized prediction error e is arithmetically
encoded using a separate adaptive probability model for each FAP. Since the encoding of
the current FAP value depends only on one previously coded FAP value, this coding scheme
allows for low-delay communication. At the decoder, the received data is arithmetically
decoded, dequantized, and added to the previously decoded value in order to recover the
encoded FAP value.
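The prediction and quantization loop of this tool can be sketched as follows. The adaptive arithmetic coder itself is omitted (it is an assumption-free placeholder here): the function returns the integer symbols that would be fed to it, together with the decoder-side reconstruction, so that the predict-from-decoded-value behaviour is visible.

```python
def encode_fap_track(values, step, fap_quant):
    """Predictive quantization of one FAP track. The first value is
    coded in intra mode; each later value is predicted from the
    previously *decoded* value, and the prediction error is quantized
    with step * FAP_QUANT. Returns the integer symbols an arithmetic
    coder would encode, plus the decoder-side reconstruction."""
    q = step * fap_quant
    symbols, decoded = [], []
    prev = 0.0
    for k, v in enumerate(values):
        pred = 0.0 if k == 0 else prev    # intra mode for frame 0
        e = round((v - pred) / q)         # quantized prediction error
        symbols.append(e)
        rec = pred + e * q                # what the decoder recovers
        decoded.append(rec)
        prev = rec                        # predict from the decoded value
    return symbols, decoded

# A slowly varying FAP track, quantized with step 1 and FAP_QUANT = 2:
symbols, decoded = encode_fap_track([0.0, 10.0, 20.0, 18.0], step=1.0, fap_quant=2)
```

Predicting from the decoded (not the original) value keeps encoder and decoder in lockstep, so quantization errors do not accumulate.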
In order to avoid transmitting all FAPs for every frame, the encoder can transmit a
mask indicating the groups for which FAP values are transmitted. The encoder can also
specify for which FAPs within a group values will be transmitted. This allows the encoder
to send incomplete sets of FAPs to the decoder.
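The group mask idea can be illustrated with a simple bitmask; the exact bit ordering and layout in the MPEG-4 bit stream are assumptions here, not the standard's syntax.

```python
def group_mask(active_groups, n_groups=10):
    """Build a per-group transmission mask: bit i set means FAP values
    for group i+1 are present in this frame."""
    mask = 0
    for g in active_groups:
        mask |= 1 << (g - 1)
    return mask

def groups_in(mask, n_groups=10):
    """Decoder side: recover which groups are present."""
    return [g for g in range(1, n_groups + 1) if mask >> (g - 1) & 1]

# Transmit only visemes/expressions (group 1), jaw/lips (2), outer lips (8):
m = group_mask([1, 2, 8])
```

Ten bits per frame is enough to skip entire untouched face regions, which is where much of the bit rate saving comes from.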
Figure 4.1.1 Block diagram of the encoder using arithmetic coding for FAPs.
4.2 DCT CODING OF FAPs

The second coding tool provided for coding FAPs is the discrete cosine transform,
applied to 16 consecutive FAP values (Figure 4.2.1). This introduces a significant delay
into the coding and decoding process; hence, this coding method is mainly useful for
applications where animation parameter streams are retrieved from a database. This coder
replaces the arithmetic coder described above. After computing the DCT of 16 consecutive
values of one FAP, DC and AC coefficients are coded differently. Whereas the DC value is
coded predictively, using the previous DC coefficient as the prediction, the AC coefficients
are coded directly. The AC coefficients and the prediction error of the DC coefficient are
linearly quantized. The quantizer step size can be controlled, but the ratio between the
quantizer step sizes of the DC and AC coefficients is fixed. The quantized AC coefficients
are encoded with one variable-length code word (VLC) defining the number of zero
coefficients prior to the next non-zero coefficient, and one VLC for the amplitude of this
non-zero coefficient. The handling of the decoded FAPs is not changed.
Figure 4.2.1 Block diagram of the FAP encoder using DCT. DC coefficients are
predictively coded. AC coefficients are directly coded.
4.3 INTERPOLATION AND EXTRAPOLATION

The encoder may allow the decoder to extrapolate the values of some FAPs from the
transmitted FAPs. Alternatively, the decoder can specify the interpolation rules using FAP
interpolation tables (FIT). A FIT allows a smaller set of FAPs to be sent during a facial
animation. This small set can then be used to determine the values of other FAPs, using a
rational polynomial mapping between parameters. For example, the top inner lip FAPs can
be sent and then used to determine the top outer lip FAPs. The inner lip FAPs would be
mapped to the outer lip FAPs using a rational polynomial function that is specified in the
FIT.
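A FIT mapping can be sketched as a rational polynomial in one driving FAP; the coefficients below are invented for illustration and are not taken from the standard.

```python
def outer_lip_from_inner(inner, num_coeffs, den_coeffs):
    """Evaluate one FAP Interpolation Table entry: a transmitted
    (inner-lip) FAP drives an untransmitted (outer-lip) FAP through a
    rational polynomial numerator/denominator, each given as
    coefficient lists in ascending powers."""
    num = sum(c * inner ** i for i, c in enumerate(num_coeffs))
    den = sum(c * inner ** i for i, c in enumerate(den_coeffs))
    return num / den

# Outer lip follows the inner lip at ~90% amplitude with slight
# saturation at large openings (illustrative coefficients):
outer = outer_lip_from_inner(100.0, num_coeffs=[0.0, 0.9], den_coeffs=[1.0, 0.001])
```

Sending only the inner-lip FAPs plus a small table of coefficients is cheaper than sending both lip groups every frame, which is the point of the FIT mechanism.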
The decoder can extrapolate values of unspecified FAPs in order to create a more
complete set of FAPs. The standard is vague in specifying how the decoder is supposed to
extrapolate FAP values. For example, if only FAPs for the left half of a face are
transmitted, the corresponding FAPs of the right side have to be set such that the face moves
symmetrically; if the encoder only specifies motion of the inner lip (FAP group 2), the
motion of the outer lip (FAP group 8) has to be extrapolated. Letting the decoder
extrapolate FAP values may create unexpected results unless FAP interpolation functions
are defined.
CHAPTER 5
SYSTEM ARCHITECTURE
On the transmitting machine (TX), an uncompressed FAP file is encoded in real time.
A dedicated hardware motion capture system could also serve as the FAP source. The
transmitter is responsible for applying the desired encoding parameters and implementing
the packetization policy.
Figure 5.1 System Architecture
On the receiving terminal (RX), a network buffer is used to compensate for jitter and
out-of-order arrival of packets. As soon as a sufficient quantity of packets is received
(typically 12 packets, or about 1 second; the exact number may be adjusted to fit network
conditions), the decoder starts processing the received stream. After reassembling the bit
stream, the receiver must detect and hide network errors from the animation player. As soon
as an error or packet loss is detected, the decoder starts a search for the next reference frame
in the bit stream. When this is reached, the decoding process restarts.
Another issue is the generation of 3-D face models that resemble the speaker. FAP
data can be decoded and applied either to default facial models on the end user's terminal, or
to models downloaded for a particular session that more accurately represent the speaker.
Various methods have been proposed to produce 3-D models of human faces from camera
images.
To find an effective compromise between bandwidth and video quality, the choice
of encoding and packetization parameters must take into account the characteristics of the
channel on which the animation is transmitted. The channel should therefore provide
enough bandwidth.
CHAPTER 6
CHANNEL MODELS FOR FAP
For FAP transmission, the suitable channel models are the GPRS channel and the
EDGE (Enhanced Data rates for GSM and TDMA/136 Evolution) channel. The following
table gives a relevant comparison of different networks. From it, it is clear that high
bandwidth is needed for low error rates, which GPRS and EDGE can provide.
Table 6.1: Comparison of errors in different application environments

Application environment       | Packet loss | IFD | Frames/RTP packet | Animation bitrate | Buffering for error concealment
Uni-directional applications  | <5%         | 5-7 | 2                 | 5 kbps            | ~500 ms
Interactive (low delay)       | <5%         | 3-5 | 2                 | 6 kbps            | ~200 ms
Wireless/mobile networks      | 10-15%      | 1-3 | 2                 | 9 kbps            | ~120 ms
6.1 GPRS
GPRS is a wireless packet-based network architecture using GSM radio systems.
The original design of GPRS was driven by non-real-time requirements. Nevertheless,
the adaptive multi-slot capability of GPRS, which allows dynamic allocation of
timeslots to a given terminal, provides enough bandwidth for the support of a limited set of
multimedia-enabled services. Furthermore, the native support of the IP protocol allows
simple interfacing of current IP/RTP-based multimedia applications, such as facial
animation streaming, to a GPRS network.

For the GPRS channel model, the propagation conditions were those specified in
GSM 05.05 as TU50 Ideal Frequency Hopping at 900 MHz. The TU50 channel model
represents the multi-path propagation conditions found in typical urban environments. Four
channel coding schemes are specified for GPRS, three of which were employed here. The
frames are convolutionally coded at different rates: when p output symbols are produced for
each input symbol, the convolutional code rate is 1/p; more generally, when k input symbols
are shifted in using k shift registers and v output symbols are produced, the code rate is k/v.
The schemes
used for GPRS are labeled CS-1, CS-2, and CS-3, and respectively correspond to
convolutional code rates of 1/2, 2/3, and 3/4.

The figure below shows the results of GPRS simulations performed using the various
channel coding schemes at a number of C/I ratios. PSNR values above 45 dB generally
indicate very infrequent error bursts. Values between 40 and 45 dB indicate more frequent
errors, but overall quality is likely to be acceptable to many users. Taking this as a guide, it
is clear that acceptable quality is achievable using all of the channel coding schemes tested.
However, relatively high C/I ratios are required with CS-3, making the use of this scheme
undesirable.
Figure 6.1.1: PSNR results for FAP transmission over a GPRS channel at 11 frames per
second
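The PSNR measure used in these plots can be computed as follows; the pixel values in the usage example are illustrative, not taken from the simulations.

```python
import math

def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference rendering
    and the error-impaired rendering, computed over flat sample
    sequences (e.g. pixel values of the decoded face display)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return float('inf')  # identical renderings
    return 10.0 * math.log10(peak ** 2 / mse)

# A short burst of small pixel errors still keeps PSNR well above the
# ~45 dB "very infrequent errors" threshold discussed above:
ref = [100.0] * 1000
deg = [100.0] * 990 + [103.0] * 10
quality = psnr(ref, deg)
```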
6.2 EDGE
Beyond GPRS, EDGE (Enhanced Data Rates for GSM Evolution) is a generation-2.5
air interface which represents a step towards UMTS. It provides higher data rates than
GPRS and introduces a new modulation technique called eight-phase shift keying (8-PSK),
which allows much higher bit rates and automatically adapts to radio conditions. EDGE
shares its available bandwidth among users on one carrier in a sector; this bandwidth ranges
from several tens of kbps to 384 kbps, depending on conditions such as propagation,
interference, and traffic load.

The network chooses a maximum number of retransmissions that may be attempted
for each link layer segment. Link adaptation is used in EDGE so that the system can select
the most efficient modulation and coding scheme for each mobile based on its current
channel conditions. EDGE uses 8 different channel coding schemes, some of which are
based on convolutional coding with differing error correction capabilities.

For the EDGE channel model, the propagation conditions were again those specified
in GSM 05.05 with ideal frequency hopping. However, for this model the mobile terminal
speed was set to 3 km/hr. Eight joint modulation-coding schemes are specified, which make
use of two different modulation schemes and various convolutional coding rates.
Modulation is either GMSK, as used in GSM and GPRS, or 8-PSK, which gives higher data
rates. Two GMSK schemes were used here: MCS-1 and MCS-2 correspond to convolutional
code rates of 0.53 and 0.66. Two 8-PSK schemes were also tested: MCS-5 and MCS-6
correspond to convolutional code rates of 0.37 and 0.49. The other modulation-coding
schemes resulted in the transmitted data being subjected to error rates too high to consider
for transmission of FAPs.
Results with the EDGE channel model are shown in the figure below. The results show
that transmission of FAPs using the 8-PSK modulation scheme is likely to result in
unacceptable quality unless the C/I ratio is greater than 18 dB. Even with GMSK
modulation, acceptable-quality decoding of FAPs may only realistically be possible using
MCS-1, unless a C/I ratio greater than 15 dB can be guaranteed. In terms of error rates,
EDGE provides a more hostile environment to multimedia than GPRS.
Figure 6.2.1: PSNR results of FAP transmission over an EDGE channel at 11 frames per
second.
6.3 RESULTS

Two kinds of error effect were observed in the simulations:

1. Freezing of the animation: corrupted data is detected before it is displayed. The
display freezes while the decoder searches for the next resync code.
2. Catastrophic display of corrupted data: corrupted data is not detected before it is
displayed. This leads to highly obvious, "catastrophic" errors being visible in the
decoder display (see figure below).
CHAPTER 7
ERRORS IN MOBILE FACIAL ANIMATION
A barrier to the introduction of FAP technology to mobile devices is computational
complexity. This is an issue for both the encoding and decoding terminals. At the decoder,
the 3-D model must be reconstructed and rendered. Fortunately, MPEG-4 FAP models are
relatively simple compared to many modern 3-D applications, and can be rendered on
relatively cheap, low-power hardware.

Producing a compressed FAP bit stream from a text-based FAP file consumes very
little processing power. However, some of the image processing algorithms required to
produce the FAP file parameters are complex. This does not necessarily prohibit the
use of FAP encoding in mobile devices: if the application is not real-time, the
processing could be performed in the background by the mobile device.
Predictive coding means that errors encountered in one P-frame propagate to
following P-frames. This makes the regular insertion of I-frames vital for combating the
effects of channel errors. Channel errors also cause loss of synchronization, whose effects
are commonly limited through the insertion of resynchronization code words. For MPEG-4
natural video coding, resync code words are inserted at the beginning of every frame, and
also at regular locations within each frame when the error resilience modes are used.
However, because FAPs can be compressed down to such low bit rates, it would be
inappropriate to insert lengthy resync code words at such a frequency.

In the absence of effective error detection and concealment algorithms, resync code
words were inserted before every I-frame. Undetected errors can often cause very serious
problems in the displayed output. P-frames following such serious errors would not improve
the quality, and can therefore be skipped.
At the receiver, error resilience can be achieved through two complementary
mechanisms: early error detection and interpolation-based concealment. The first guarantees
that packet losses and bit stream errors are signaled to the concealment module as soon as
possible. The second is responsible for recovering missing or corrupted frames.
Typically, an error is detected only indirectly, after some frames, and perfect
localization of the error is often impossible. It is thus very important to detect the error as
early as possible. With this in mind, the bit stream syntax was analyzed in order to pinpoint
the places where an error could be detected.

Based on several assumptions that hold for most on-line transmissions, optional
checks were introduced for the values of certain fields of the stream. While these fields,
like the gender bit, coding type, and object mask, are theoretically unconstrained, in practice
they are not supposed to change during a single session.
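A minimal sketch of this invariant-field check follows; the field names mirror those mentioned above, but the dictionary-based "header" is a stand-in for real bit stream parsing.

```python
class InvariantFieldChecker:
    """Early error detection via fields that should not change during a
    session (e.g. the gender bit, coding type, object mask): record the
    first-seen value of each field, flag any later deviation."""
    def __init__(self):
        self.expected = {}

    def check(self, header):
        """Return True if the header is consistent with the session so
        far; False signals a likely bit stream error to the concealment
        module."""
        for field, value in header.items():
            if field not in self.expected:
                self.expected[field] = value  # learn from first frame
            elif self.expected[field] != value:
                return False                  # invariant violated
        return True
```

Because the check fires on the first inconsistent field rather than on a failed decode, errors are signaled to the concealment module several frames earlier than they otherwise would be.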
On one hand, the need for an error concealment module is increased by the fact that
the loss of a single P-frame prevents the correct decoding of the following ones. On the
other hand, given that the facial animation parameters represent 1-D displacements of the
Feature Points, and that loss bursts are typically comparable with the length of a phoneme
(during which the mouth position, or viseme, does not vary significantly), the use of error
concealment techniques based on interpolation proves effective in reconstructing FAP
trajectories.
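The interpolation-based concealment can be sketched as a straight-line fill over each loss burst; representing lost frames as None is an implementation convenience here, not part of the standard.

```python
def conceal(track):
    """Linear interpolation over loss bursts in one FAP trajectory.
    `track` is a list of FAP values with None marking lost frames;
    bursts are assumed short (about a phoneme), so a straight line
    between the surrounding good values is a reasonable fill."""
    out = list(track)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                      # find the end of the burst
            left = out[i - 1] if i > 0 else (out[j] if j < len(out) else 0.0)
            right = out[j] if j < len(out) else left
            gap = j - i + 1
            for k in range(i, j):
                t = (k - i + 1) / gap       # fraction of the way across
                out[k] = left + t * (right - left)
            i = j
        else:
            i += 1
    return out
```

For bursts shorter than a viseme the interpolated trajectory is visually close to the lost one, which is why this simple scheme works as well as the text suggests.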
Different software implementations are being developed for MPEG-4 FAP encoding
and decoding on mobile platforms, but it is necessary for them to add the following
functionality:

• Error-resilient decoding: when errors are detected, the decoder freezes the display
and searches for the next resync code. There is no error concealment built in; it has
to be added.
• Regular insertion of resync codes: a 32-bit resync code specified in the MPEG-4
standard is inserted before every I-frame to limit the effects of synchronization loss.
• Output of decoded visual data to file: the displayed output is written to a series of
bitmap files to aid quality evaluation and comparison of test results.
CHAPTER 8
APPLICATIONS
8.1 EMBODIED AGENTS IN SPOKEN DIALOGUE SYSTEMS
Using facial animation, we can create talking heads that deliver services
through mobile phones. In one such system, users were able to ask the talking heads on
their mobile phones questions about available services; examples of the services are
timetables for trains, and accommodation and location of hotels. The system may use a
graphical interface.

Other than providing lip movements to accompany the synthesized voice output, the
head was capable of deictic movements: when information (e.g. a timetable) was presented
somewhere in the graphical interface, the face would look and turn towards that location on
the screen, thereby guiding the user's attention.
Figure 8.1.1 Talking head for service assistance
8.2 LANGUAGE TRAINING WITH TALKING HEADS

Using a multimedia communication device, facial animation can be used as a
language training tool. Rather than aiming at building a fixed set of speech training
applications, this work focused on integrating a number of relevant technologies into an
interactive, easy-to-use environment, making it possible for teachers, parents, and other
interested parties to construct applications involving multimodal speech technology. Using
the graphical user interface (GUI), users could select different views of the face and tongue.
Figure 8.2.1: Software tool for language training (remote assistance); talking head with
animated mouth.
8.3 SYNTHETIC FACES AS AIDS IN COMMUNICATION
This application aims at a communication device that, in a speaker-independent
fashion, translates telephone-quality speech signals into visible articulatory motion in a
synthetic talking head, with sufficient accuracy to provide significant speech-reading support
to the hearing-impaired user, improving his or her ability to communicate over mobile
phones. Two factors make this difficult:

I. The device has to be speaker independent.
II. It has to work in real time, with no more than about 100 ms delay.

Efforts are ongoing to solve this problem.
CHAPTER 9
CONCLUSION
FAP coding provides a method of supplying animated 3-D representations of
speakers at very low bandwidths. Although the processing power involved in acquiring FAP
information appropriate for encoding may be challenging for mobile terminals, trading
quality for complexity may produce feasible solutions. Simulations carried out using the
GPRS and EDGE channel models revealed that FAP-coded streams are reasonably robust to
error when compared to conventionally coded video. However, certain channel errors
produce highly disturbing effects that indicate the need for efficient error detection and
concealment schemes. Investigation of more advanced resynchronization code insertion
schemes is also recommended.