8/2/2019 Mobile Ani Matt Ion
http://slidepdf.com/reader/full/mobile-ani-matt-ion 1/26
Mobile Facial Animation
ACKNOWLEDGEMENT
I take this opportunity to thank the Almighty for keeping me on the right path and
the immense blessing towards the successful completion of my Seminar.
I wish to express my sincere gratitude to Smt. Geetha Ranjin, H.O.D., Department of
Electronics and Communication Engineering, for her expert guidance, constant
encouragement, and valuable suggestions towards the completion of this Seminar.
I am also grateful to my Staff-in-Charge Mr. Ranjith Ram and Mr. Vinod Kumar,
Department of Electronics and Communication Engineering, for always being there to hand
out invaluable pieces of advice.
Last of all, I thank all my teachers and friends who extended every possible assistance
they could.
ROSHITH. P
Govt. College of Engg., Kannur 2005
ABSTRACT
Three-dimensional facial model coding can be employed in various mobile
applications to provide an enhanced user experience. Instead of directly encoding the data
using conventional coding techniques such as MPEG-2, a one-time 3D computer model of
the caller is transmitted at the beginning of the telephone call. Thereafter, capturing 3D
movements and mimicry parameters with the camera is all that is required to continually see
and hear a true-to-life, synchronized caller on the display. The 3D models are
interchangeable, which means that one person can be displayed on the screen with the
movements of another. The technique is suitable for use in conjunction with various mobile
networks, from GSM to UMTS. What is less clear, however, is the sensitivity of the
3D-coded data to channel errors.
CONTENTS

Chapter 1. INTRODUCTION
Chapter 2. SYSTEM OVERVIEW
Chapter 3. FACIAL ANIMATION AND SPECIFICATION
    3.1 MPEG-4 standard
    3.2 Face animation parameters
    3.3 Facial animation parameter units
    3.4 Face feature points
    3.5 MPEG-4 facial animation delivery
Chapter 4. CODING OF FAPs
    4.1 Arithmetic coding of FAPs
    4.2 DCT coding of FAPs
    4.3 Interpolation and extrapolation
Chapter 5. SYSTEM ARCHITECTURE
Chapter 6. CHANNEL MODELS FOR FAP
    6.1 GPRS
    6.2 EDGE
    6.3 Results
Chapter 7. ERRORS IN MOBILE FACIAL ANIMATION
Chapter 8. APPLICATIONS
    8.1 Embodied agents in spoken dialogue systems
    8.2 Language training with talking heads
    8.3 Synthetic faces as aids in communication
Chapter 9. CONCLUSION
REFERENCES
CHAPTER 1
INTRODUCTION
Facial animation and virtual human technology in computer graphics has made
considerable advances over the past decades and has become a research topic attracting an
increasing number of commercial applications, such as mobile platforms,
telecommunications, tele-presence via the Internet, and digital entertainment. A number of
mobile applications may benefit from the enhancement that 3-D video can bring, including
message services and e-commerce.

Despite the possible advantages of such technologies, the effect of the mobile link on
3-D video has not been considered in the design of its syntax. Another issue is delivering the
coded bit stream over the wireless network: the required bandwidth should be as narrow as
possible.

MPEG-4 is the first international standard for real-time multimedia
communication that covers natural and synthetic audio, video, and 3D graphics. Face
models are defined in MPEG-4 through BIFS; within BIFS, FAP coding provides a very low
bit rate for face models.

To deliver these services, the possible channel models are GPRS and EDGE.
However, the coded data is sensitive to errors, so error resilience must be considered.

The next chapters give an overview of the relevant parts of FAP technology and the
coding of FAPs, and discuss different mobile network technologies. This is followed by
results obtained when FAP data is delivered through GPRS and EDGE channels, and a
comparison of their channel errors. Other noticeable issues in this technology and the
applications of facial animation on mobile terminals are also discussed.
CHAPTER 2
SYSTEM OVERVIEW
Figure 2.1 System overview
Mobile facial animation can be described using the above block diagram.
Using a projection camera, the 3D input surfaces or facial models are produced, and facial
animation techniques track the movements of the face. The MPEG-4 FAP encoder
encodes the high-resolution facial models, and the resulting data stream is transmitted over
the wireless network. GPRS and EDGE channel models are preferred here because of their
data rates and bandwidth; of the two, EDGE offers higher data rates but is more sensitive to
channel errors. At the receiver, the data stream is received using the same protocol stack as
in the transmitter, but in the inverse order. It is then decoded using the MPEG-4 FAP
decoder, and the face model is reconstructed.
CHAPTER 3
FACIAL ANIMATION AND SPECIFICATION
3.1 MPEG-4 STANDARD
The MPEG-4 Systems standard:

1) Contains a method of representing and encoding 3-D scenes, called the Binary
Format for Scenes (BIFS).
2) Is based on the Virtual Reality Modeling Language (VRML), which
specifies a language for describing 3D scenes.
3) Through BIFS, provides a method for compressing VRML-type data and animating
the 3-D objects in a scene.
4) Within BIFS, provides Facial Animation Parameter (FAP) encoding.
5) FAP encoding specifically allows the representation, animation, and binary encoding
of facial models. This can be performed using techniques of varying complexity.

Like the VRML standard (3), MPEG-4 BIFS describes 3-D scenes using a series of
nodes. Nodes can describe various scene aspects including object shape, rotation, and
translation. They can also contain other nodes; it is common for scenes to contain
hierarchical node trees. Although scenes are described using VRML-type structures, BIFS
includes a number of features that are not present in VRML:

• Data streaming
• Scene updates
• Compression of scene data

A combination of the first two features allows elements within scenes to be
animated. BIFS also allows scenes to be displayed as the data arrives at the client, while
VRML requires the whole scene to be downloaded before anything is shown.
3.2 FACE ANIMATION PARAMETERS
In an effort to standardize face model parameterization, originally for the purposes
of efficient model-based coding of moving images, the MPEG consortium developed the
MPEG-4 facial animation standard. This standard defines 68 Facial Animation Parameters
(FAPs) and 84 facial feature points. The facial feature points are well-defined landmark
points on the human face. The FAPs have been designed to be independent of any particular
facial model; in other words, essential facial gestures and visual speech derived from a
particular performer will produce good results on other faces unknown at the time the
encoding takes place.

The 68 parameters are categorized into 10 groups related to parts of the face (Table
3.1). FAPs represent a complete set of basic facial actions including head motion and tongue,
eye, and mouth control. They allow the representation of natural facial expressions and can
also be used to define facial action units. Exaggerated values permit the definition of
actions that are normally not possible for humans, but are desirable for cartoon-like
characters.
The FAP set contains two high-level FAPs and 66 low-level FAPs. The high-level
FAPs are visemes and expressions (FAP group 1). A viseme is a visual correlate of a
phoneme. Only 14 static visemes that are clearly distinguishable are included in the standard
set. In order to allow for coarticulation of speech and mouth movement, transitions from
one viseme to the next are defined by blending the two visemes with a weighting factor.
Similarly, the expression parameter defines 6 high-level facial expressions such as joy and
sadness (Figure 3.1). In contrast to visemes, facial expressions are animated with a value
defining the excitation of the expression, and two facial expressions can be blended with a
weighting factor. Since expressions are high-level animation parameters, they allow
animating unknown models with high subjective quality.
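The blending of two visemes (or two facial expressions) with a weighting factor can be sketched as a simple weighted combination. Representing a viseme as a vector of mouth-FAP displacements, and the particular values used, are illustrative assumptions, not part of the standard.

```python
def blend(a, b, weight):
    """Blend two visemes (or two facial expressions), each given as a
    list of FAP displacements, with a single weighting factor in [0, 1].
    weight = 1.0 yields pure 'a'; weight = 0.0 yields pure 'b'."""
    assert len(a) == len(b)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]

# Transition from a closed-lips viseme toward an open-jaw viseme
# (hypothetical mouth-FAP displacement vectors):
closed = [0.0, 0.0, 0.0]
open_ = [0.8, 0.5, 0.3]
halfway = blend(closed, open_, 0.5)  # midway through the transition
```

Sweeping the weight from 1.0 to 0.0 over a few frames produces the smooth viseme-to-viseme transition described above.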
Figure 3.1 Facial Expressions
Table 3.1: FAP groups
Group Number of FAPs
1: Visemes and expressions 2
2: Jaw, chin, inner lowerlip, cornerlips, midlip 16
3: Eyeballs, pupils, eyelids 12
4: Eyebrow 8
5: Cheeks 4
6: Tongue 5
7: Head rotation 3
8: Outer lip positions 10
9: Nose 4
10: Ears 4
3.3 FACIAL ANIMATION PARAMETER UNITS
The MPEG-4 standard uses the set of parameters (FAPs) already explained. As
noted, FAPs are defined independently of any face model. This is accomplished by defining
each parameter in a normalized space, referred to as FAP Units (FAPUs). In a given system,
FAPUs are computed by measuring the distances between key feature points on the neutral
high-resolution model. The figure below shows these key standard measurements: the Eye
Separation (ES), Iris Diameter (IRISD), Eye-Nose Separation (ENS), Mouth-Nose
Separation (MNS), and the Mouth Width (MW). FAPUs derived from these measurements
can be scaled and adjusted to produce a "visual volume" best suited for the target face.

Figure 3.3.1: MPEG-4 FAPUs measured on a person's face.
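A rough sketch of the FAPU computation and its use for denormalizing transmitted FAP values follows. The measured distances and the example FAP value are illustrative; the division by 1024 and the 1e-5 radian angle unit follow the usual MPEG-4 FAPU definitions.

```python
def fapus(es, irisd, ens, mns, mw):
    """Compute FAP Units from key distances measured on the neutral
    face. Each distance-based FAPU is the measured distance divided by
    1024, so FAP amplitudes are expressed in 1/1024ths of a facial
    distance; the angle unit (AU) is 1e-5 radian."""
    return {
        "ES0":    es / 1024.0,     # eye separation unit
        "IRISD0": irisd / 1024.0,  # iris diameter unit
        "ENS0":   ens / 1024.0,    # eye-nose separation unit
        "MNS0":   mns / 1024.0,    # mouth-nose separation unit
        "MW0":    mw / 1024.0,     # mouth width unit
        "AU":     1e-5,            # angle unit, in radians
    }

def denormalize(fap_value, fapu):
    """Convert a transmitted (normalized) FAP value into a displacement
    in the target model's own units."""
    return fap_value * fapu

# Illustrative measurements in millimetres on a neutral face:
u = fapus(es=65.0, irisd=12.0, ens=40.0, mns=20.0, mw=55.0)
jaw_drop_mm = denormalize(300, u["MNS0"])  # a jaw FAP of 300 MNS units
```

Because the same FAP stream is denormalized with each target model's own FAPUs, identical parameters animate different faces in proportion to their geometry.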
3.4 FACE FEATURE POINTS
An MPEG-4 compliant face model is a 3D mesh that includes 84 well-defined
Feature Points (Figure 3.4.2). The 68 Facial Animation Parameters describe the
displacements of the feature points, together with global head rotations around the x, y, and
z axes. It is up to the animation player to reconstruct realistic movements for the remaining
points of the face model. The FAPs are normalized (in FAPUs) so that the same stream can
be used to animate different face models.

Figure 3.4.1: An MPEG-4 compliant face model, the dots representing the 84 Feature
Points (left). An example of varying FAPs 4, 5, 6, and 12 to describe mouth opening
(right).

Some feature points, like the ones along the hairline, are not affected by FAPs. They
are required for defining the shape of a proprietary face model using feature points. Feature
points are arranged in groups such as cheeks, eyes, and mouth (Table 3.1). The location of
these feature points has to be known for any MPEG-4 compliant face model.
Figure 3.4.2: Facial Feature Points

3.5 MPEG-4 FACIAL ANIMATION DELIVERY
This discussion aims at analyzing the issues in the transmission of MPEG-4
compliant facial animation streams over lossy packet networks, such as wireless LANs, the
Internet, or third-generation mobile networks. Many web-based applications now exploit
three-dimensional, animated virtual characters to enrich their user interfaces. However, the
HTTP/TCP protocols used for animation transport in the majority of existing systems fail to
guarantee fast interaction over a wide range of network conditions. The use of unreliable,
connectionless transport protocols, such as the Real-time Transport Protocol (RTP) over
UDP, for the delivery of multimedia content has been proposed in order to reduce end-to-end
latency and improve robustness against network congestion.

The MPEG-4 standard allows for the encoding and representation of a wide range of
natural and synthetic audio and video sources. A major difference with previous multimedia
standards lies in its object-based approach, in which a scene is composed of several Audio-
Visual Objects, each of them represented through an elementary bit stream.

One such object is the Face Object, a three-dimensional face model (either human-
or cartoon-like) that may be animated by a set of Facial Animation Parameters (FAPs).
Because of the complexity of implementing a complete MPEG-4 Systems architecture, a
common approach in web-based applications is that of directly carrying a single
elementary stream over the lightweight RTP protocol. Face models require a very low bit
rate, so a model-based, variable-length, predictive encoding is used. MPEG-4 employs
highly efficient arithmetic and DCT (Discrete Cosine Transform) coding algorithms to
reduce temporal redundancy in FAP streams. Bit rates as low as 2 kbps can be achieved;
thus, the frame size becomes comparable with the size of RTP/UDP/IP headers. The use of
these algorithms also means that the loss or late arrival of a single packet may destroy a
significant amount of information, hence requiring the use of error resilience and/or
concealment techniques. Finally, the specific bit stream syntax often requires a significant
amount of look-ahead in the decoding process: if a packet is lost or corrupted, the decoding
process is interrupted up to the next reference frame. This work investigates the resulting
effects on bandwidth.
CHAPTER 4
CODING OF FAPs
One key issue in making use of FAP technology is how FAP parameters are
obtained ready for encoding. The standard MPEG-4 FAP encoder software uses text-based
FAP files as input. These text-based files contain various parameters specifying how the
face moves; from them, the binary encoded FAP stream is produced. Generation of these FAP
files may be achieved in two ways: manually, or automatically by employing image
processing algorithms. A number of image processing techniques have been proposed
which are capable of identifying and tracking facial features.

For coding facial animation parameters, MPEG-4 provides two tools. Coding of
quantized and temporally predicted FAPs with an arithmetic coder allows FAPs to be coded
with only a small delay. Using a discrete cosine transform (DCT) to encode a sequence of
FAPs introduces significant delay but achieves higher coding efficiency.
4.1 ARITHMETIC CODING OF FAPs

The figure below shows the block diagram for encoding FAPs. The first set of FAP
values, FAP(i)0 at time instant 0, is coded in intra mode. The value of an FAP at time instant
k, FAP(i)k, is predicted using the previously decoded value FAP(i)k-1. The prediction error
e is quantized using a quantization step size that is specified for each FAP, multiplied by a
quantization parameter FAP_QUANT with 0 ≤ FAP_QUANT < 9. FAP_QUANT is identical
for all FAP values of one time instant k. Using the FAP-dependent quantization step size
together with FAP_QUANT ensures that quantization errors are subjectively evenly
distributed between different FAPs. The quantized prediction error e is arithmetically
encoded using a separate adaptive probability model for each FAP. Since the encoding of
the current FAP value depends only on one previously coded FAP value, this coding scheme
allows for low-delay communication. At the decoder, the received data is arithmetically
decoded, dequantized, and added to the previously decoded value in order to recover the
encoded FAP value.
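The prediction and quantization loop of this tool can be sketched as follows. The adaptive arithmetic coder itself is omitted (it is an assumption-free placeholder here): the function returns the integer symbols that would be fed to it, together with the decoder-side reconstruction, so that the predict-from-decoded-value behaviour is visible.

```python
def encode_fap_track(values, step, fap_quant):
    """Predictive quantization of one FAP track. The first value is
    coded in intra mode; each later value is predicted from the
    previously *decoded* value, and the prediction error is quantized
    with step * FAP_QUANT. Returns the integer symbols an arithmetic
    coder would encode, plus the decoder-side reconstruction."""
    q = step * fap_quant
    symbols, decoded = [], []
    prev = 0.0
    for k, v in enumerate(values):
        pred = 0.0 if k == 0 else prev    # intra mode for frame 0
        e = round((v - pred) / q)         # quantized prediction error
        symbols.append(e)
        rec = pred + e * q                # what the decoder recovers
        decoded.append(rec)
        prev = rec                        # predict from the decoded value
    return symbols, decoded

# A slowly varying FAP track, quantized with step 1 and FAP_QUANT = 2:
symbols, decoded = encode_fap_track([0.0, 10.0, 20.0, 18.0], step=1.0, fap_quant=2)
```

Predicting from the decoded (not the original) value keeps encoder and decoder in lockstep, so quantization errors do not accumulate.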
In order to avoid transmitting all FAPs for every frame, the encoder can transmit a
mask indicating the groups for which FAP values are transmitted. The encoder can also
specify for which FAPs within a group values will be transmitted. This allows the encoder
to send incomplete sets of FAPs to the decoder.
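The group mask idea can be illustrated with a simple bitmask; the exact bit ordering and layout in the MPEG-4 bit stream are assumptions here, not the standard's syntax.

```python
def group_mask(active_groups, n_groups=10):
    """Build a per-group transmission mask: bit i set means FAP values
    for group i+1 are present in this frame."""
    mask = 0
    for g in active_groups:
        mask |= 1 << (g - 1)
    return mask

def groups_in(mask, n_groups=10):
    """Decoder side: recover which groups are present."""
    return [g for g in range(1, n_groups + 1) if mask >> (g - 1) & 1]

# Transmit only visemes/expressions (group 1), jaw/lips (2), outer lips (8):
m = group_mask([1, 2, 8])
```

Ten bits per frame is enough to skip entire untouched face regions, which is where much of the bit rate saving comes from.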
Figure 4.1.1 Block diagram of the encoder using arithmetic coding for FAPs.
4.2 DCT CODING OF FAPs

The second coding tool provided for coding FAPs is the discrete cosine transform,
applied to 16 consecutive FAP values (Figure 4.2.1). This introduces a significant delay
into the coding and decoding process; hence, this coding method is mainly useful for
applications where animation parameter streams are retrieved from a database. This coder
replaces the arithmetic coder described above. After computing the DCT of 16 consecutive
values of one FAP, DC and AC coefficients are coded differently. Whereas the DC value is
coded predictively, using the previous DC coefficient as the prediction, the AC coefficients
are coded directly. The AC coefficients and the prediction error of the DC coefficient are
linearly quantized. The quantizer step size can be controlled, but the ratio between the
quantizer step sizes of the DC and AC coefficients is fixed. The quantized AC coefficients
are encoded with one variable-length code word (VLC) defining the number of zero
coefficients prior to the next non-zero coefficient, and one VLC for the amplitude of this
non-zero coefficient. The handling of the decoded FAPs is not changed.
Figure 4.2.1 Block diagram of the FAP encoder using DCT. DC coefficients are
predictively coded. AC coefficients are directly coded.
4.3 INTERPOLATION AND EXTRAPOLATION

The encoder may allow the decoder to extrapolate the values of some FAPs from the
transmitted FAPs. Alternatively, the decoder can specify the interpolation rules using FAP
interpolation tables (FIT). A FIT allows a smaller set of FAPs to be sent during a facial
animation. This small set can then be used to determine the values of other FAPs, using a
rational polynomial mapping between parameters. For example, the top inner lip FAPs can
be sent and then used to determine the top outer lip FAPs. The inner lip FAPs would be
mapped to the outer lip FAPs using a rational polynomial function that is specified in the
FIT.
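A FIT mapping can be sketched as a rational polynomial in one driving FAP; the coefficients below are invented for illustration and are not taken from the standard.

```python
def outer_lip_from_inner(inner, num_coeffs, den_coeffs):
    """Evaluate one FAP Interpolation Table entry: a transmitted
    (inner-lip) FAP drives an untransmitted (outer-lip) FAP through a
    rational polynomial numerator/denominator, each given as
    coefficient lists in ascending powers."""
    num = sum(c * inner ** i for i, c in enumerate(num_coeffs))
    den = sum(c * inner ** i for i, c in enumerate(den_coeffs))
    return num / den

# Outer lip follows the inner lip at ~90% amplitude with slight
# saturation at large openings (illustrative coefficients):
outer = outer_lip_from_inner(100.0, num_coeffs=[0.0, 0.9], den_coeffs=[1.0, 0.001])
```

Sending only the inner-lip FAPs plus a small table of coefficients is cheaper than sending both lip groups every frame, which is the point of the FIT mechanism.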
The decoder can extrapolate values of unspecified FAPs in order to create a more
complete set of FAPs. The standard is vague in specifying how the decoder is supposed to
extrapolate FAP values. For example, if only FAPs for the left half of a face are
transmitted, the corresponding FAPs of the right side have to be set such that the face moves
symmetrically; if the encoder only specifies motion of the inner lip (FAP group 2), the
motion of the outer lip (FAP group 8) has to be extrapolated. Letting the decoder
extrapolate FAP values may create unexpected results unless FAP interpolation functions
are defined.
CHAPTER 5
SYSTEM ARCHITECTURE
On the transmitting machine (TX), an uncompressed FAP file is encoded in real time.
A dedicated hardware motion capture system could also serve as the FAP source. The
transmitter is responsible for applying the desired encoding parameters and implementing
the packetization policy.
Figure 5.1 System Architecture
On the receiving terminal (RX), a network buffer is used to compensate for jitter and
out-of-order arrival of packets. As soon as a sufficient quantity of packets is received
(typically 12 packets, or about 1 second; the exact number may be adjusted to fit network
conditions), the decoder starts processing the received stream. After reassembling the bit
stream, the receiver must detect and hide network errors from the animation player. As soon
as an error or packet loss is detected, the decoder starts a search for the next reference frame
in the bit stream. When this is reached, the decoding process restarts.
Another issue is the generation of 3-D face models that resemble the speaker. FAP
data can be decoded and applied either to default facial models on the end user's terminal, or
to models downloaded for a particular session that more accurately represent the speaker.
Various methods have been proposed to produce 3-D models of human faces from camera
images.
To find an effective compromise between bandwidth and video quality, the choice
of encoding and packetization parameters must take into account the characteristics of the
channel on which the animation is transmitted. The channel should therefore provide
enough bandwidth.
CHAPTER 6
CHANNEL MODELS FOR FAP
For FAP transmission, the suitable channel models are the GPRS channel and the
EDGE (Enhanced Data rates for GSM and TDMA/136 Evolution) channel. The following
table gives a relevant comparison of different networks. From it, it is clear that high
bandwidth is needed for low error rates, which GPRS and EDGE can provide.
Table 6.1: Comparison of errors in different application environments

Application environment       | Packet loss | IFD | Frames/RTP packet | Animation bitrate | Buffering for error concealment
Uni-directional applications  | <5%         | 5-7 | 2                 | 5 kbps            | ~500 ms
Interactive (low delay)       | <5%         | 3-5 | 2                 | 6 kbps            | ~200 ms
Wireless/mobile networks      | 10-15%      | 1-3 | 2                 | 9 kbps            | ~120 ms
6.1 GPRS
GPRS is a wireless packet-based network architecture using GSM radio systems.
The original design of GPRS was driven by non-real-time requirements. Nevertheless,
the adaptive multi-slot capability of GPRS, which allows dynamic allocation of
timeslots to a given terminal, provides enough bandwidth for the support of a limited set of
multimedia-enabled services. Furthermore, the native support of the IP protocol allows
simple interfacing of current IP/RTP-based multimedia applications, such as facial
animation streaming, to a GPRS network.

For the GPRS channel model, the propagation conditions were those specified in
GSM 05.05 as TU50 Ideal Frequency Hopping at 900 MHz. The TU50 channel model
represents the multi-path propagation conditions found in typical urban environments. Four
channel coding schemes are specified for GPRS, three of which were employed here. The
frames are convolutionally coded at different rates: when p output symbols are produced for
each input symbol, the convolutional code rate is 1/p; more generally, when k input symbols
are shifted in using k shift registers and v output symbols are produced, the code rate is k/v.
The schemes
used for GPRS are labeled CS-1, CS-2, and CS-3, and respectively correspond to
convolutional code rates of 1/2, 2/3, and 3/4.

The figure below shows the results of GPRS simulations performed using the various
channel coding schemes at a number of C/I ratios. PSNR values above 45 dB generally
indicate very infrequent error bursts. Values between 40 and 45 dB indicate more frequent
errors, but overall quality is likely to be acceptable to many users. Taking this as a guide, it
is clear that acceptable quality is achievable using all of the channel coding schemes tested.
However, relatively high C/I ratios are required with CS-3, making the use of this scheme
undesirable.
Figure 6.1.1: PSNR results for FAP transmission over a GPRS channel at 11 frames per
second
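The PSNR measure used in these plots can be computed as follows; the pixel values in the usage example are illustrative, not taken from the simulations.

```python
import math

def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference rendering
    and the error-impaired rendering, computed over flat sample
    sequences (e.g. pixel values of the decoded face display)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return float('inf')  # identical renderings
    return 10.0 * math.log10(peak ** 2 / mse)

# A short burst of small pixel errors still keeps PSNR well above the
# ~45 dB "very infrequent errors" threshold discussed above:
ref = [100.0] * 1000
deg = [100.0] * 990 + [103.0] * 10
quality = psnr(ref, deg)
```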
6.2 EDGE
Beyond GPRS, EDGE (Enhanced Data Rates for GSM Evolution) is a generation-2.5
air interface which represents a step towards UMTS. It provides higher data rates than
GPRS and introduces a new modulation technique called eight-phase shift keying (8-PSK),
which allows much higher bit rates and automatically adapts to radio conditions. EDGE
shares its available bandwidth among users on one carrier in a sector; this bandwidth ranges
from several tens of kbps to 384 kbps, depending on conditions such as propagation,
interference, and traffic load.

The network chooses a maximum number of retransmissions that may be attempted
for each link layer segment. Link adaptation is used in EDGE so that the system can select
the most efficient modulation and coding scheme for each mobile based on its current
channel conditions. EDGE uses 8 different channel coding schemes, some of which are
based on convolutional coding with differing error correction capabilities.

For the EDGE channel model, the propagation conditions were again those specified
in GSM 05.05 with ideal frequency hopping. However, for this model the mobile terminal
speed was set to 3 km/hr. Eight joint modulation-coding schemes are specified, which make
use of two different modulation schemes and various convolutional coding rates.
Modulation is either GMSK, as used in GSM and GPRS, or 8-PSK, which gives higher data
rates. Two GMSK schemes were used here: MCS-1 and MCS-2 correspond to convolutional
code rates of 0.53 and 0.66. Two 8-PSK schemes were also tested: MCS-5 and MCS-6
correspond to convolutional code rates of 0.37 and 0.49. The other modulation-coding
schemes resulted in the transmitted data being subjected to error rates too high to consider
for transmission of FAPs.
Results with the EDGE channel model are shown in the figure below. The results show
that transmission of FAPs using the 8-PSK modulation scheme is likely to result in
unacceptable quality unless the C/I ratio is greater than 18 dB. Even with GMSK
modulation, acceptable-quality decoding of FAPs may only realistically be possible using
MCS-1, unless a C/I ratio greater than 15 dB can be guaranteed. In terms of error rates,
EDGE provides a more hostile environment to multimedia than GPRS.
Figure 6.2.1: PSNR results of FAP transmission over an EDGE channel at 11 frames per
second.
6.3 RESULTS

Two kinds of error effect were observed in the simulations:

1. Freezing of the animation: corrupted data is detected before it is displayed. The
display freezes while the decoder searches for the next resync code.
2. Catastrophic display of corrupted data: corrupted data is not detected before it is
displayed. This leads to highly obvious, "catastrophic" errors being visible in the
decoder display (see figure below).
CHAPTER 7
ERRORS IN MOBILE FACIAL ANIMATION
A barrier to the introduction of FAP technology to mobile devices is computational
complexity. This is an issue for both the encoding and decoding terminals. At the decoder,
the 3-D model must be reconstructed and rendered. Fortunately, MPEG-4 FAP models are
relatively simple compared to many modern 3-D applications, and can be rendered on
relatively cheap, low-power hardware.

Producing a compressed FAP bit stream from a text-based FAP file consumes very
little processing power. However, some of the image processing algorithms required to
produce the FAP file parameters are complex. This does not necessarily prohibit the
use of FAP encoding in mobile devices: if the application is not real-time, the
processing could be performed in the background by the mobile device.
Predictive coding means that errors encountered in one P-frame propagate to
following P-frames. This makes the regular insertion of I-frames vital for combating the
effects of channel errors. Channel errors also cause loss of synchronization, whose effects
are commonly limited through the insertion of resynchronization code words. For MPEG-4
natural video coding, resync code words are inserted at the beginning of every frame, and
also at regular locations within each frame when the error resilience modes are used.
However, because FAPs can be compressed down to such low bit rates, it would be
inappropriate to insert lengthy resync code words at such a frequency.

In the absence of effective error detection and concealment algorithms, resync code
words were inserted before every I-frame. Undetected errors can often cause very serious
problems in the displayed output. P-frames following such serious errors would not improve
the quality, and can therefore be skipped.
At the receiver, error resilience can be achieved through two complementary
mechanisms: early error detection and interpolation-based concealment. The first guarantees
that packet losses and bit stream errors are signaled to the concealment module as soon as
possible. The second is responsible for recovering missing or corrupted frames.
Typically, an error is detected only indirectly, after some frames, and perfect
localization of the error is often impossible. It is thus very important to detect the error as
early as possible. With this in mind, the bit stream syntax was analyzed in order to pinpoint
the places where an error could be detected.

Based on several assumptions that hold for most on-line transmissions, optional
checks were introduced for the values of certain fields of the stream. While these fields,
like the gender bit, coding type, and object mask, are theoretically unconstrained, in practice
they are not supposed to change during a single session.
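A minimal sketch of this invariant-field check follows; the field names mirror those mentioned above, but the dictionary-based "header" is a stand-in for real bit stream parsing.

```python
class InvariantFieldChecker:
    """Early error detection via fields that should not change during a
    session (e.g. the gender bit, coding type, object mask): record the
    first-seen value of each field, flag any later deviation."""
    def __init__(self):
        self.expected = {}

    def check(self, header):
        """Return True if the header is consistent with the session so
        far; False signals a likely bit stream error to the concealment
        module."""
        for field, value in header.items():
            if field not in self.expected:
                self.expected[field] = value  # learn from first frame
            elif self.expected[field] != value:
                return False                  # invariant violated
        return True
```

Because the check fires on the first inconsistent field rather than on a failed decode, errors are signaled to the concealment module several frames earlier than they otherwise would be.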
On one hand, the need for an error concealment module is increased by the fact that
the loss of a single P-frame prevents the correct decoding of the following ones. On the
other hand, given that the facial animation parameters represent 1-D displacements of the
Feature Points, and that loss bursts are typically comparable with the length of a phoneme
(during which the mouth position, or viseme, does not vary significantly), the use of error
concealment techniques based on interpolation proves effective in reconstructing FAP
trajectories.
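The interpolation-based concealment can be sketched as a straight-line fill over each loss burst; representing lost frames as None is an implementation convenience here, not part of the standard.

```python
def conceal(track):
    """Linear interpolation over loss bursts in one FAP trajectory.
    `track` is a list of FAP values with None marking lost frames;
    bursts are assumed short (about a phoneme), so a straight line
    between the surrounding good values is a reasonable fill."""
    out = list(track)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                      # find the end of the burst
            left = out[i - 1] if i > 0 else (out[j] if j < len(out) else 0.0)
            right = out[j] if j < len(out) else left
            gap = j - i + 1
            for k in range(i, j):
                t = (k - i + 1) / gap       # fraction of the way across
                out[k] = left + t * (right - left)
            i = j
        else:
            i += 1
    return out
```

For bursts shorter than a viseme the interpolated trajectory is visually close to the lost one, which is why this simple scheme works as well as the text suggests.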
Different software implementations are being developed for MPEG-4 FAP encoding
and decoding on mobile platforms, but it is necessary for them to add the following
functionality:

• Error-resilient decoding: when errors are detected, the decoder freezes the display
and searches for the next resync code. There is no error concealment built in; it has
to be added.
• Regular insertion of resync codes: a 32-bit resync code specified in the MPEG-4
standard is inserted before every I-frame to limit the effects of synchronization loss.
• Output of decoded visual data to file: the displayed output is written to a series of
bitmap files to aid quality evaluation and comparison of test results.
CHAPTER 8
APPLICATIONS
8.1 EMBODIED AGENTS IN SPOKEN DIALOGUE SYSTEMS
Using facial animation, we can create talking heads that deliver services
through mobile phones. In one such system, users were able to ask the talking heads on
their mobile phones questions about available services; examples of the services are
timetables for trains, and accommodation and location of hotels. The system may use a
graphical interface.

Other than providing lip movements to accompany the synthesized voice output, the
head was capable of deictic movements: when information (e.g. a timetable) was presented
somewhere in the graphical interface, the face would look and turn towards that location on
the screen, thereby guiding the user's attention.
Figure 8.1.1 Talking head for service assistance
8.2 LANGUAGE TRAINING WITH TALKING HEADS

Using a multimedia communication device, facial animation can be used as a
language training tool. Rather than aiming at building a fixed set of speech training
applications, this work focused on integrating a number of relevant technologies into an
interactive, easy-to-use environment, making it possible for teachers, parents, and other
interested parties to construct applications involving multimodal speech technology. Using
the graphical user interface (GUI), users could select different views of the face and tongue.
Figure 8.2.1: Software tool for language training (remote assistance); talking head with
animated mouth.
8.3 SYNTHETIC FACES AS AIDS IN COMMUNICATION
This application aims at a communication device that, in a speaker-independent
fashion, translates telephone-quality speech signals into visible articulatory motion in a
synthetic talking head, with sufficient accuracy to provide significant speech-reading support
to the hearing-impaired user, improving his or her ability to communicate over mobile
phones. Two factors make this difficult:

I. The device has to be speaker independent.
II. It has to work in real time, with no more than about 100 ms delay.

Efforts are ongoing to solve this problem.
CHAPTER 9
CONCLUSION
FAP coding provides a method of supplying animated 3-D representations of
speakers at very low bandwidths. Although the processing power involved in acquiring FAP
information appropriate for encoding may be challenging for mobile terminals, trading
quality for complexity may produce feasible solutions. Simulations carried out using the
GPRS and EDGE channel models revealed that FAP-coded streams are reasonably robust to
error when compared to conventionally coded video. However, certain channel errors
produce highly disturbing effects that indicate the need for efficient error detection and
concealment schemes. Investigation of more advanced resynchronization code insertion
schemes is also recommended.