
FHTW Berlin, Germany

University of Applied Sciences for Engineering and Economy

International Media and Computing (Bachelor)

Bachelor Thesis

Mobile Games over 3G Video Calling

Analysis of Interactive Voice and Video Response

for Mobile Applications and Games

Author

Christoph Köpernick

Mühlenstr. 20A

14167 Berlin, Germany

+49 171 4527999

[email protected]

Matr.-No.: s0514154

Start: 24 November 2008

Hand In: 27 February 2009

1st reviewer

Prof. Dr. Ing. Carsten Busch

Kopfbau, Raum: 109

Wilhelminenhofstraße 75A

12459 Berlin

+49 30 5019-2214

[email protected]

2nd reviewer

Prof. Thomas Bremer

Kopfbau, Raum: 109

Wilhelminenhofstraße 75A

12459 Berlin

+49 30 5019-2481

[email protected]


Abstract

Interactive voice and video response (IVVR) is a

mobile technology enabling interactive services

based on 3G video telephony. IVVR applications can

take advantage of bidirectional real-time multimedia

streaming, enabling speech and camera-based

interaction. Games are a proven way to draw users

into new technologies. This thesis analyses IVVR’s

underlying technologies, examines the usability aspects of IVVR applications, and identifies game concepts

suitable for 3G video call games. 3GBattle is a

prototype for a camera-based card battle game,

demonstrating IVVR’s capabilities and ways to

adapt to its limitations.


Preface

When I came across interactive voice and video response (IVVR) during my internship in

Malaysia, I was looking for substantial literature about 3G-324M and IVVR. I discovered

that—for 2008 and so far in 2009—there are no books or studies that exclusively cover IVVR

in depth. I foresee that systems founded on the notion of IVVR will have good prospects,

including Mobile Rich Media applications that mix media streaming with interactivity,

creating an innovative mobile experience for consumers. In consideration of my upcoming

final thesis and the fact that IVVR and games excite me, I have decided to write about Mobile

Games over 3G Video Calling with the goal of studying IVVR in depth and thinking beyond

current IVVR services. The topic of my thesis includes many areas covered in my studies and

beyond; therefore, it is a good opportunity to interconnect different specialities into one multidisciplinary work.

Acknowledgements

I would like to thank FHTW for providing me with a high-quality education and especially

my professors Prof. Dr. Ing. Carsten Busch and Prof. Thomas Bremer for supporting my wish

to write about this special topic and for their advice. I am also grateful that my friends and

family have always encouraged me to work hard on my thesis and focus on my studies. Furthermore, I thank the professional editors from papercheck.com for proofreading my thesis.

Christoph Köpernick, February 2009


Contents

Introduction .............................................................................................................................. 1

Chapter 1 Basic Concepts ....................................................................................................... 4

1 3G Mobile Phone Standards and Technology ............................................................ 5
1.1 Characteristics of Wireless Networks ...................................................................... 6

2 3G-324M – The 3GPP Umbrella Protocol .................................................................. 7
2.1 3G-324M Multimedia Terminal .............................................................................. 8
2.2 3G-324M System ..................................................................................................... 8

3 Multimedia Codecs, Compression, and Streaming .................................................. 10
3.1 Speech Codec ........................................................................................................ 11
3.2 Video Codec .......................................................................................................... 12
3.3 Interaction Delay ................................................................................................... 13
3.4 Side-Effects on the User Experience ..................................................................... 14

4 Interactive Voice and Video Response ...................................................................... 16
4.1 IVVR Applications ................................................................................................ 17
4.2 IVR Supplements ................................................................................................... 17
4.3 Information Portals ................................................................................................ 18
4.4 Video on Demand .................................................................................................. 18
4.5 Mobile TV ............................................................................................................. 18
4.6 Video Sharing ........................................................................................................ 19
4.7 P2P Video Avatar .................................................................................................. 19
4.8 3G-to-IP ................................................................................................................. 19
4.9 3G-to-TV ............................................................................................................... 19
4.10 Mobile Banking ..................................................................................................... 20
4.11 Mobile Games ....................................................................................................... 20

5 IVVR Game Santa Claus Sleigh Ride ....................................................................... 21

6 Classification of Mobile Games over 3G Video Calling .......................................... 22
6.1 Thin-Clients and Gaming Terminals ..................................................................... 22
6.2 Mobile Game Streaming ........................................................................................ 23
6.3 Person-to-Application ............................................................................................ 24
6.4 Direction, Interaction and Conversation ................................................................ 26
6.5 All-IP Approach ..................................................................................................... 27
6.6 Mobile Games over 3G Video Calling Defined ..................................................... 27

Chapter 2 Usability and Design Opportunities of IVVR ................................................... 29

7 Considerations about Mobile Video Telephony ....................................................... 30

8 Usability Opportunities and Design Rules for IVVR .............................................. 31
8.1 Simplicity ............................................................................................................... 31
8.2 Sounds ................................................................................................................... 32
8.3 Visual Design Rules .............................................................................................. 32
8.4 Resuming Sessions ................................................................................................ 34
8.5 Consistency and Multi-tap Text Entry ................................................................... 34
8.6 Camera-Based Information Entry and Interaction ................................................. 35

Chapter 3 Design of Mobile Games ...................................................................................... 38

9 Technical Foundation for IVVR Games ................................................................... 39

10 Appropriate Game Concepts .................................................................................. 41
10.1 Visual Novels ........................................................................................................ 41
10.2 Mobile Gambling ................................................................................................... 43
10.3 IVVR Multiplayer Games ..................................................................................... 43
10.4 Parallel Reality Games .......................................................................................... 44

Chapter 4 Mobile Role-Playing Game: 3GBattle ............................................................... 45

11 Early Prototype ....................................................................................................... 46
11.1 Setting .................................................................................................................... 46
11.2 Game Concept ....................................................................................................... 46

12 Preparation for a Working Prototype ................................................................... 48
12.1 Machine-Readable Playing Cards ......................................................................... 48
12.2 Theme .................................................................................................................... 48

13 Further Improvements ........................................................................................... 50

Conclusion .............................................................................................................................. 51

14 Further Studies ........................................................................................................ 52


List of Figures

Figure 1. High-level architecture of UMTS network ................................................................. 5

Figure 2. 3G-324M system diagram .......................................................................................... 9 

Figure 3. Visual distortion of a H.263 stream due to transmission errors (with error

concealment). ........................................................................................................................... 13 

Figure 4. Person-to-person 3G video telephony ...................................................................... 16 

Figure 5. Person-to-application video telephony (IVVR)........................................................ 16 

Figure 6. Example of IVVR supplement application for customer care with barcode

recognition ............................................................................................................................... 18 

Figure 7. IVVR game “Santa Claus” from CreaLog GmbH ................................................... 21 

Figure 8. IVVR application template with 16x16 raster .......................................................... 33 

Figure 9. 12-digit numpad........................................................................................................ 34 

Figure 10. High-level system architecture for the delivery of dynamic IVVR services .......... 39 

Figure 11. Screenshot from popular visual novel “Brass Restoration”. .................................. 42 

Figure 12. IVVR slot machine to win coupons........................................................................ 43 

Figure 13. 3GBattle prototype configuration ........................................................................... 46 

Figure 14. Semacode tag representing number 1. .................................................................... 48 

Figure 15. Example character card 1. ...................................................................................... 49 

Figure 16. Example character card 2. ...................................................................................... 49 

Figure 17. Example battle card 1. ............................................................................................ 49 

Figure 18. Example battle card 2. ............................................................................................ 49 

List of Tables

Table 1 Evolution and Comparison of H.324, H.324M and 3G-324M ..................................... 7

Table 2 Various UMTS Services from User Point of View .................................................... 23 

Table 3 UMTS Services from Network Point of View............................................................ 24 


Introduction

Games are a proven way to draw users into new applications and devices. Video games are

popular and widely adopted by all age groups and in all social environments. First, video

games moved out of arcade cabinets, becoming available on personal computers and game consoles for home use. Soon, gaming was possible on the go with handheld game

consoles and even mobile phones. In 2009, it is common to play casual games on one’s

mobile phone using Java technology or BREW, or on other platforms such as Symbian,

iPhone, or Windows Mobile. Some games are available as Flash Lite applications or over

WAP, but the majority needs to be installed on a phone, and the devices have to meet certain

software and hardware requirements. Moreover, multiplayer games are also quite popular on

many platforms. Mobile connectivity makes mobile phones a perfect platform for online

and/or multiplayer games.

Current mobile phone capabilities offer numerous ways for service providers, mobile network

operators, and content providers to create profitable mobile services. These services include

(1) WAP Push-driven, premium-rated short message services; (2) mobile instant messaging;

(3) mobile dating; (4) video and game downloads; (5) TV voting; (6) colour ring-back tones;

(7) web services extended to mobile users over mobile IP data services; or (8) premium-rate

telephone services such as customer care, tech support, or adult chat lines. Most of these

services—such as SMS or premium telephony services—use the circuit-switched

characteristics of mobile networks; others are based on the packet-switched mobile data

services of GPRS, UMTS, or HSDPA. A number of these services generate direct revenue for

both the service provider and the mobile network operator; for others, only the mobile network operator benefits, as chargeable voice or data traffic is generated on the mobile network.

However, all these services lack a user-friendly combination of real-time interaction and

ease of use coupled with an instant multimedia experience. Moreover, most do not support features such as content protection or push communication, and they do not use an ultra-thin-client approach. This is where the new video call capabilities of contemporary 3G handsets

come into play to create the new mobile technology interactive voice and video response

(IVVR). Utilizing the full potential of IVVR can enable service providers to create the

ultimate thin client service that is easy to use and features bidirectional multimedia

communication in real time, with full content control, in a user-friendly and easy-to-grasp manner.


Exploiting the capabilities of IVVR based on 3G video telephony for mobile gaming opens

up many opportunities but also many challenges. Mobile games can be delivered without

prior installation, are operating-system independent, and can be played without additional

software such as J2ME. Games stream instantly to the phone and do not require any

additional local storage or processing power. Mobile games using IVVR technology also

enable developers to create games where levels, avatars, and objects can update when

desired, based on information sources on the web or from other gamers. This opens up a

variety of services featuring multiplayer and social networking capabilities. During an IVVR

session, the caller is automatically sending his camera and microphone signal to an

application server, paving the ground for motion- or gesture-based interaction, voice

commands, and game concepts that ask gamers to take pictures of special symbols to control

the game.

Although 3G phones with video call capabilities have been available since 2001 and are

pervasive nowadays, they are still unused for mobile games over 3G video calling. Network

operators are slowly realizing that 30-plus years of evidence prove that people just do not like

the idea of seeing whom they are calling, and this preference will not change dramatically.

Therefore, they are looking for the next “killer-application” that offers a unique user

experience to co-exist with the mobile web for 3G and classic communication services such

as voice calling.

As of spring 2009, there are no known successful games over 3G video calling. In this thesis,

I will analyse the advantages and opportunities of exploiting the 3G video call feature for

mobile games, present design guidelines, and evaluate which game concepts are

appropriate for IVVR games.

In detail, I will cover the following aspects:

The first chapter introduces the basic concepts behind 3G video calling including the relevant

UMTS architecture for circuit-switched video calling, multimedia streaming and codecs, the

3G-324M umbrella protocol, and research about IVVR and 3G video call applications in

detail.

Chapter 2 focuses on user interaction by speech commands and camera-based interaction,

covering usability aspects for IVVR applications.


Chapter 3 explains the current state of my findings for a system architecture to provide IVVR

games and discusses which game concepts are suitable for mobile games over 3G Video

Calling.

Chapter 4 features the game design and conceptual prototyping for my IVVR game 3GBattle

that uses a camera-based interaction approach.


Chapter 1 Basic Concepts

Before designing, developing and analysing IVVR applications—and especially Mobile

Games over 3G Video Calling—it is essential to understand numerous basic concepts on

which those applications rely. These basic concepts influence the creation of IVVR

applications, help developers exploit all 3G video call features, and help them cope with the downsides of the bearer technologies to achieve an undiluted user experience.

From the technological perspective of 3G video calling in general, the essential technologies

involved are (1) the circuit-switched characteristics of the UMTS mobile network system for

video telephony; and (2) the 3G-324M umbrella protocol used for conversational multimedia

services. The 3G-324M standard recommends that the media codecs AMR and H.263(+) be

used for audio and video streaming. To assess the quality of IVVR services, it is essential to

understand the characteristics of these codecs in the mobile environment.

Aside from the details of IVVR technology and provision of IVVR application examples, the

idea of Mobile Games over 3G Video Calling is classified and clearly defined.


1 3G Mobile Phone Standards and Technology

Third-generation (3G) systems were designed with the notion of enabling a single global

standard to fulfil the needs of anywhere and anytime communication (Etoh). Compared to 2G

systems, 3G systems focus more on multimedia communication such as video conferencing

and multimedia streaming. ITU defined IMT-2000 as a global standard for 3G wireless

communications and, within this framework, 3GPP developed UMTS as one of today’s 3G

systems. W-CDMA is the main 3G air interface for UMTS (Holma and Toskala)

implementing various person-to-person, circuit-switched services such as video telephony.

The high-level UMTS network architecture from 3GPP Release 5 (3GPP-R5), as described in documents from its Technical Specification Group, is shown in the figure below (Etoh 22).

BSS: Base Station System; CS: Circuit-Switched; HSS: Home Subscriber Servers; IMS: IP Multimedia Subsystem; MS: Mobile Station; NMS: Network Management Subsystem; PS: Packet-Switched; RNS: Radio Network Subsystem

Figure 1. High-level architecture of UMTS network

As shown in figure 1, the UMTS core network primarily consists of a circuit-switched (CS)

and a packet-switched (PS) domain. Typically, the PS domain is used for end-to-end packet

data applications, such as mobile Internet browsing and e-mail. On the other hand, the CS

domain is intended for real-time and conversational services, such as voice and video

conferencing. Circuit-switched connections are most efficient for constant, continuous data

streaming by definition (Etoh). In addition to the CS and PS domains, 3GPP-R5 also specifies

the IP Multimedia Subsystem (IMS).

Using the PS domain, IMS is projected to provide IP multimedia services that also satisfy

real-time requirements, including those that were previously possible only in the CS domain.


In this thesis, I will discuss Mobile Games over 3G Video Calling based on 3G video

telephony in the CS domain (see figure 1, highlighted in dark green).

1.1 Characteristics of Wireless Networks

Wireless networks are inherently error prone. Bitrates in wireless systems tend to fluctuate

more than in wired networks. In wired networks, phenomena such as fading, shadowing, or reflection are non-existent, so that the available bandwidth is, for the most part, constant during transmission and much higher than in wireless networks. Influences on signal propagation cause the constantly changing bandwidths in wireless systems. Generally, the receiving power

depends on the distance between sender and receiver. The receiving power p decreases

proportionally to the square of the distance between sender and receiver:

p ∝ 1/d²

where d is the distance between sender and receiver (Schiller).
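To make this relationship concrete, here is a minimal illustrative sketch (free-space propagation only, ignoring the fading, shadowing, and reflection effects discussed next):

```python
# Illustrative sketch of the inverse-square relationship p ~ 1/d^2:
# received power relative to a reference distance, free-space propagation only.

def relative_received_power(d: float, d_ref: float = 1.0) -> float:
    """Received power at distance d relative to the power at distance d_ref."""
    return (d_ref / d) ** 2

for d in (1, 2, 4, 8):
    print(f"d = {d}x reference distance -> p/p_ref = {relative_received_power(d):.4f}")
# Doubling the distance cuts the received power to a quarter.
```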

Receiving power is influenced further by frequency dependent fading, shadowing, reflection

at large obstacles, refraction depending on the density of the medium, scattering at small

objects, and diffraction at edges.

The effect of multipath propagation can cause jitter when the radio signal reaches the

receiver by two or more paths at different times (Schiller). Moreover, the mobility of the user

adds another set of problems that results in fading of received power over time; the channel

characteristics change over time and location. This exacerbates the effect of multipath

propagation because the signal paths change more as the user changes his or her

location. Changes in the distance between sender and receiver cause different delay variations

of different signal parts.

The phenomenon of “cell-breathing” is a special problem in CDM systems. In CDM systems,

all terminals use the same frequency spectrum. Therefore, the more information that

terminals are sending and receiving in a cell, the more noise that is produced. A higher noise

level means that reception for far terminals degrades to the point that it becomes impossible; ergo, the cell shrinks.

The UMTS network (W-CDMA) counters but does not eliminate these effects by implementing

error detection, error correction, and error concealment measures. For example, in W-CDMA,

cell-breathing is effectively prevented by implementing the wideband power-based load


estimation to keep the cell coverage within the planned limits (Holma and Toskala).

Nonetheless, these phenomena can still affect the audiovisual quality of 3G video calls through high delays, bit errors, or varying bitrates. This aspect is covered in more detail in Section

3: Multimedia Codecs, Compression, and Streaming.

2 3G-324M – The 3GPP Umbrella Protocol

The 3G-324M standard implemented in contemporary 3G camera phones enables 3G users to

establish bidirectional multimedia calls in the sense of a person-to-person, circuit-switched

service for the purpose of video telephony or video conferencing.

The 3G-324M umbrella protocol is based on H.324, and its first draft was specified by 3GPP

in 1999. The current Release 7 of the technical specification 3GPP TS 26.110 introduces the set of specifications that apply to 3G-324M multimedia terminals. In TS 26.110, most of these specifications are referred to as multimedia codecs for circuit-switched

3GPP networks. In the sense of TS 26.110, the term codec refers not only to codecs used for

the encoding and decoding of media streams, but also to mechanisms for multiplexing/de-

multiplexing and call control (3GPP). More specifically, the codecs used for media streams

are AMR and H.263; for instance, H.223 and H.245 are the codecs for multiplexing/de-

multiplexing and call control, respectively.

In addition to these codecs, 3G-324M also defines codecs for error detection and correction

since 3GPP networks are inherently error prone (see 1.1 Characteristics of Wireless

Networks).

Table 1 Evolution and Comparison of H.324, H.324M and 3G-324M

                        H.324                   H.324M (a)              3G-324M
Focus                   POTS                    Mobile Networks         3G Wireless Networks
Standardisation Began   1990                    1995                    December 1999
Standardisation Body    ITU-T                   ITU-T                   3GPP
Audio Codecs            G.723.1, AMR            G.723.1 Annex C (b)     G.723.1, AMR
Video Codecs            H.263, MPEG-4 Part 2    H.263 Appendix II (c)   H.263+ (d), MPEG-4, H.264

Notes: a. H.324 Annex C refers to H.324M. b. With bitrate scalable error correction and unequal error correction. c. With error tracking improvements described in Annex K, N and R of H.263 version 2 from 1998. d. H.263 version 2 from 1998.


H.324 was originally developed by ITU-T for low bitrate multimedia communications with

voice, video, and data transmission in the PSTN over analogue (circuit-switched) phone lines

and was later extended to other GSTN networks like ISDN.

H.324 terminals provide real-time video, audio, or data, or any combination, between two

multimedia telephone terminals over a GSTN voice band network connection.

Communication may be either 1-way or 2-way. Multipoint communication using a separate

MCU among more than two H.324 terminals is possible (ITU-T). Over the years, several

extensions have been added. One of them is H.324M, which adapts the H.324 series to mobile

networks to make the system more robust against transmission errors. H.324M was intended

to enable efficient communication over error-prone wireless networks.

One of the general principles set for the development of H.324M and 3G-324M

recommendations was that they should be based upon H.324 as much as possible; this would

simplify further development of existing systems and ease the introduction of new features in

standards derived from H.324 (Table 1 gives key facts about the evolution to 3G-324M).

Technical specification 26.111 contains the modifications in 3G-324M that were made to

H.324 in order to address error-prone environments.

2.1 3G-324M Multimedia Terminal

In the scope of 3G-324M, a terminal that implements the 3G-324M umbrella protocol and all

its features is called a 3G-324M multimedia terminal. Terminals can be 3G handsets with a

W-CDMA air interface and a built-in camera. More generally, any equipment that complies

with the requirements of Technical Specification TS 26.110 is a 3G-324M multimedia

terminal. In this sense, for example, a Linux-based machine connected with an E1 line to the

PSTN can also be a 3G-324M multimedia terminal, as long as it supports all the protocol’s

requirements. More specifically, such IVVR (application) servers are the foundation for providing and delivering IVVR services.

2.2 3G-324M System

Figure 2 shows the 3G-324M system followed by a description of its components relevant for

IVVR.


Figure 2. 3G-324M system diagram Note: e. (3GPP 7)

H.324 Annex H (Optional Multilink) defines the operation of H.324M1 over as many as 8

independent physical connections, aggregated together to provide a higher total bitrate (ITU-

T Study Group No. 16). A single physical connection is defined by one 64 kbit/s circuit-

switched connection that is compatible with N-ISDN (Etoh). In ISDN terms, such a

connection is also known as an S0 interface. Although the total bitrate for a 3G-324M

connection can be multiples of 64 kbit/s, all mobile network operators2 and handsets3 the

author has tested support only a single N-ISDN compatible channel. A sole 64 kbit/s

connection is used for all logical media channels of a session, meaning for both transmission

and reception of multimedia streams and control data. Although bandwidth is allocated dynamically based on demand and wireless network characteristics, a bitrate of roughly 30

kbit/s is available for each party to transmit or receive the media streams, respectively. H.223

is used to multiplex the logical media channels used for speech, video, and data

communication into one bitstream (Etoh).

During call set-up, the 3G-324M multimedia terminal capabilities are exchanged using H.245

messages, the master/slave relationship for the session is determined, the logical channels for

audio and video transmission are opened, and the multiplexing arrangement is agreed upon (Jabri).

1 All characteristics of H.324M also apply to 3G-324M due to the general principles described earlier.
2 Mobile Networks tested: Germany: Vodafone, T-Mobile, E-Plus, o2. Malaysia: Maxis, Celcom.
3 For example, Nokia N96 specification states, "CS max speed 64kbps" (Nokia).


The exchange of terminal capabilities has the same motivation as does the use of the SDP in

SIP. Terminals can have different capabilities, especially concerning supported multimedia

codecs and the need to agree upon codecs that both terminals support. The device capabilities

have qualitative effects on QoS, as evolved multimedia codecs like MPEG-4 Part 2 or AAC

are more efficient and robust than legacy codecs such as H.263 or AMR audio. Furthermore,

H.245 messages are used to transmit DTMF signals during a 3G-324M session, with the

caller using the handset’s numpad to type in numbers or characters. DTMF signals

transmitted through H.245 messages are the foundation for simple interaction with IVVR

applications. NSRP is an optional retransmission protocol, and CCSRL provides a mechanism

for segmenting H.245 messages into fragments to improve performance in conditions where

the likelihood of errors is high (Myers).
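To make the DTMF-based interaction model concrete, the following minimal sketch shows how an application server could map received digits to menu actions. It is purely illustrative; the class and callback names are assumptions and not part of any real 3G-324M or gateway API.

```python
# Hypothetical sketch: mapping DTMF digits received via H.245 user input
# indications to IVVR menu actions. The gateway stack that actually delivers
# the digits is assumed and not shown here.

from typing import Callable, Dict


class DtmfMenu:
    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], None]] = {}

    def on_digit(self, digit: str, action: Callable[[], None]) -> None:
        """Register the action to run when the caller presses a given key."""
        self._actions[digit] = action

    def handle(self, digit: str) -> None:
        # Unknown digits are silently ignored, as in a simple IVR menu.
        self._actions.get(digit, lambda: None)()


menu = DtmfMenu()
menu.on_digit("1", lambda: print("play weather forecast clip"))
menu.on_digit("2", lambda: print("connect to an agent"))
menu.handle("1")  # caller pressed 1 -> play weather forecast clip
```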

3GPP Technical Specification 26.111 requires AMR to be used as the speech codec. For maintaining audio and video synchronisation—that is, lip-synch—the Optional Receive Path Delay can compensate for the video delay. H.263 baseline is required as the video codec to

compress the media stream. The use of MPEG-4 Part 2 is recommended as it provides higher

error robustness capabilities and improved coding efficiency as compared to H.263.
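The outcome of this capability exchange can be pictured as a simple intersection of codec sets, as in the sketch below. This is a deliberate simplification of the actual H.245 procedures, and the preference lists are illustrative only.

```python
# Simplified sketch of the result of H.245 capability exchange: pick a codec
# that both terminals support, preferring the more capable option.

AUDIO_PREFERENCE = ["AMR"]                               # required by TS 26.111
VIDEO_PREFERENCE = ["H.264", "MPEG-4 Part 2", "H.263"]   # most to least capable


def negotiate(preference: list, local: set, remote: set) -> str:
    common = local & remote
    for codec in preference:
        if codec in common:
            return codec
    raise ValueError("no common codec - the call set-up would fail")


handset = {"AMR", "H.263", "MPEG-4 Part 2"}
ivvr_server = {"AMR", "H.263", "MPEG-4 Part 2", "H.264"}
print(negotiate(VIDEO_PREFERENCE, handset, ivvr_server))  # -> MPEG-4 Part 2
print(negotiate(AUDIO_PREFERENCE, handset, ivvr_server))  # -> AMR
```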

TS 26.111 also specifies data protocols like T.140 that could be used for real-time text

conversations. Text conversations may be opened simultaneously with voice and video

applications, or as text-only sessions4 (3GPP). However, ITU-T is not aware of a deployed

terminal having T.140 implemented (ITU-T). Unfortunately, even in 2009 there is no known

handset that implements T.140.

3 Multimedia Codecs, Compression, and Streaming

3G video telephony generally operates over a single 64 kbit/s connection where both parties

need to share the available bandwidth. Effectively, the application is then left with 60 kbit/s or less for both media types, since H.245 call control messages reduce the gross bandwidth. In 3G-324M systems, the bandwidth is allocated dynamically; however, generally speaking, each party has 50% of the bandwidth available for sending audio and video

signals. In a typical unidirectional scenario, 12.2 kbit/s are allocated for the speech codec, and

a bitrate of 43-48 kbit/s is allowed for the video data (Sang-Bong, Tae-Jung and Jae-Won).

4 Also known as textual chat.


By employing rate control methods in the media encoders, the network can dynamically

change these bitrates depending on network conditions and application demand. When two

parties communicate simultaneously, the bitrates for the speech and video codec can be

reduced in the encoders of both parties, keeping the overall bitrate below 64 kbit/s. For

instance, when just one party shows speech activity, the speech bitrate for the other party can

be reduced to a minimum where only comfort noise is generated on the receiver side (Holma

and Toskala); AMR can perform these bitrate changes every 20ms. For video, the encoder

can reduce the average bitrate by either reducing the frame rate or simply dropping frames

during transmission. To increase the overall frame rate on the receiver side, the decoder can

employ H.263 temporal scalability.

In 3G video telephony, the audio and video signals are bidirectionally streamed over

dedicated circuit-switched W-CDMA paths. Streaming means that media is continuously being

received or sent and played back on a terminal. Non-conversational one-way audio or video

streaming requires a transport delay variation of below 2s (3GPP). In contrast, two-way video

telephony introduces even higher real-time requirements with an end-to-end, one-way delay

of below 150-400ms5 (3GPP) to maintain a smooth conversation. The overall one-way delay

in W-CDMA networks6 is already approximately 100ms, and it should be noted that in

addition to the transmission time, media generation time is required when delivering IVVR

services. Due to these tight delay requirements, there is no time for retransmission when

transmission errors are detected. Retransmission would reduce bit errors and consequently

improve video quality, but it would also add undesired delays when resending PDUs.

Therefore, to avoid retransmission, H.223 and the media codecs are working hand-in-hand to

detect errors, accomplish resynchronisation, and perform error concealment.

3.1 Speech Codec

For audio coding, the AMR narrowband (NB) speech codec is used, operating at a nominal

bitrate of 12.2 kbit/s7. The actual bitrate depends on network conditions and speech activity,

and it can switch every 20ms, leading to a different average bitrate. AMR-NB was developed

to handle narrowband speech; that is, digitized speech with a sampling frequency of 8 kHz

(Etoh).

5 Where <150ms is preferred and the lip-synch delay should be <100ms.
6 From the user equipment to the PLMN border.
7 Other operation modes are possible and include bitrates of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, and 10.2 kbit/s.
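As a rough illustration of this bandwidth budget, the sketch below computes the bitrate left for video for each AMR-NB mode. The 64 kbit/s channel and the AMR modes follow Sections 2.2 and 3.1; the roughly 4 kbit/s of control and multiplexing overhead is an assumption.

```python
# Illustrative bandwidth budget for one direction of a 3G-324M video call.
# Assumptions: 64 kbit/s circuit, ~4 kbit/s control/multiplex overhead.

AMR_NB_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]


def video_budget_kbps(amr_mode_kbps: float,
                      channel_kbps: float = 64.0,
                      overhead_kbps: float = 4.0) -> float:
    """Bitrate left for the H.263 video stream after speech and overhead."""
    return channel_kbps - overhead_kbps - amr_mode_kbps


for mode in AMR_NB_MODES_KBPS:
    print(f"AMR mode {mode:5.2f} kbit/s -> ~{video_budget_kbps(mode):4.1f} kbit/s for video")
# With the 12.2 kbit/s mode this leaves roughly 47.8 kbit/s, matching the
# 43-48 kbit/s range cited above.
```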


3.2 Video Codec

In video coding, a sequence of frames is compressed and converted into a bitstream that is

typically many times smaller than the data representing the frames. In H.263+ and other

typical video codecs, a frame may be coded in one of several modes: as an I-frame (intra frame), a P-frame (predicted frame), or a B-frame (bidirectionally predicted frame). It

should be highlighted that the use of B-frames is unsuitable for conversational applications as

it creates additional buffering delays (Etoh). The use of B-frames results in a reordering of

frames, making the display order and coding order different.

The 3G-324M standard recommends a video resolution of 176 by 144 pixels (QCIF) with a frame rate of between 10 and 15 frames per second, encoded using H.263(+). H.263(+) operates in the

YCBCR colour space, and uses 4:2:0 chroma subsampling (YUV420), which compresses the

video signal. For coding, a frame is usually split into macroblocks. A macroblock consists of one block of 16 by 16 luma samples (Y) and two blocks of 8 by 8 blue-difference and red-

difference chroma samples (Cb and Cr).
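To give a feeling for how much H.263(+) has to compress, the following back-of-the-envelope sketch compares raw YUV420 QCIF video with the roughly 45 kbit/s available for video; the 12 fps figure is simply one point within the recommended 10-15 fps range.

```python
# Back-of-the-envelope sketch: raw QCIF 4:2:0 bitrate vs. the ~45 kbit/s
# available for video in a 3G video call.

width, height = 176, 144                             # QCIF
y_samples = width * height                           # one 8-bit luma sample per pixel
chroma_samples = 2 * (width // 2) * (height // 2)    # Cb + Cr, subsampled 2x2
raw_frame_bits = (y_samples + chroma_samples) * 8    # ~304 kbit per frame

fps = 12                                             # within the 10-15 fps range
video_bitrate_bps = 45_000                           # ~43-48 kbit/s available

raw_bitrate_bps = raw_frame_bits * fps
print(f"Raw YUV420 QCIF at {fps} fps: {raw_bitrate_bps / 1000:.0f} kbit/s")
print(f"Required compression ratio: ~{raw_bitrate_bps / video_bitrate_bps:.0f}:1")
```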

The H.263+ decoder can attempt to conceal errors, but first, errors need to be detected.

In Subsection 2.2, we saw that the encoded media streams are multiplexed into a H.223

bitstream. This H.223 bitstream consists of a series of media packets called Adaption Layer

Protocol Data Unit (AL-PDU). Using H.223 AL28, the AL-PDUs can contain a control field

and CRC checksum (NMS Communications) in addition to the payload. With the help of

CRC, H.223 is capable of detecting errors or loss of AL-PDUs and then reporting these to the

H.263+ decoder.

However, errors that may not be detected in H.223 are passed to the H.263+ decoder without

indication; the decoder can still detect errors by watching out for syntactic or semantic

violations of the bitstream (Hansen). When an error is detected—either by transport decoder

H.223 or video decoder H.263+—the video decoder can decide to drop the complete packet

or try resynchronisation. In the H.263+ media stream, at the beginning of each GOB9, a

resynchronisation marker is inserted to help the decoder distinguish valid video information

after an error has occurred (Kwon and Driessen).

8 H.223 Adaption Layer 2.
9 In H.263, it was possible to place a sync marker only at the beginning of each picture.
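The general idea of CRC-protected media packets can be sketched as follows; the 8-bit CRC polynomial and the packet contents are illustrative assumptions and do not reproduce the exact H.223 AL2 bit layout.

```python
# Illustrative sketch of CRC-based error detection on a media packet, in the
# spirit of H.223 AL2. Polynomial and payload are assumptions for illustration.

def crc8(data: bytes, poly: int = 0x07) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def al_pdu_is_valid(payload: bytes, received_crc: int) -> bool:
    """True if the payload matches the CRC carried with the AL-PDU."""
    return crc8(payload) == received_crc


payload = bytes([0x00, 0x00, 0x81, 0x02])   # illustrative video payload bytes
if not al_pdu_is_valid(payload, received_crc=0x3A):
    # A real terminal would now drop the packet or resynchronise at the next
    # GOB marker and conceal the damaged region, as described above.
    print("AL-PDU corrupted: hand over to error concealment")
```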


Source: Warren Miller movie trailer encoded to QCIF H.263 with 64 kbit/s using Real Mobile Producer and streamed in the PS

domain over RTSP.

Figure 3. Visual distortion of a H.263 stream due to transmission errors (with error concealment).

When an error is detected, error concealment rather than retransmission is performed. Error

concealment aims to hide visual artifacts and residual errors. Spatial interpolation and

temporal motion estimation are performed to hide the visual distortion. The effects of transmission errors and error concealment measures can be seen in Figure 3.

3.3 Interaction Delay

To assess the interactivity of IVVR applications later in this thesis and in further studies, it

should be explained how different delays in an IVVR system add together and create the

overall interaction delay d:

d = d_up + d_t1 + d_app + d_gen + d_t2 + d_down

where d_up is the transmission delay of the 3G network from the mobile station (MS) to the

PLMN border. The delay caused by transition from one network to another is represented

by d_t1; this is mainly the transition from a circuit-switched network (3G CS or PSTN) to a

packet-switched network (IP). This transition is obvious in the following scenario:

The dialled number is routed over the PSTN to a PRI in a datacentre and answered by a 3G-

324M Gateway directly connected by a digital telephony card to the PRI line. In this case, it

is assumed that the path between the MS to the answering gateway is completely circuit-

switched. When the gateway routes the video call over an IP-based LAN to an IVVR

Application Server there is a transition from the CS to the PS domain. This transition causes

delays because the continuous bitstream needs to be wrapped in packets to transmit the data

over a LAN.

On the other hand, d_t1 can also describe delays caused by network transitions between the

PLMN border and an E1 line. Nowadays, to reduce costs, parts of the PSTN are already


replaced by NGN that use the PS domain to interconnect circuit-switched networks for voice and video communication (VocalTec).

d_app describes the delay caused on an application server to control the flow of program execution depending on user input.

d_gen is the delay caused by dynamic generation of an audio and video stream to be transmitted

and displayed on the MS.

Similar to d_t1, the variable d_t2 describes delays caused when sending media from an IVVR

Application Server to the MS. The transition from the PS domain (LAN) to the CS domain

(E1 line) particularly causes delays due to packet delay variation. Packet delay variation—

also known as delay jitter—occurs when there is a packet jam on the transition from the PS to

the CS domain. Packets need to be unwrapped and put into a continuous bitstream. When the

available bandwidth in the CS domain is smaller than the bandwidth in the PS domain

(bottleneck), packets jam and a jitter buffer is used to ensure that data is continuously played

out to the CS domain (Wikipedia contributors).

d_down is similar to d_up in that it describes the delay for receiving the bitstream on the MS caused

during transmission between the PLMN border and the MS over the W-CDMA air interface.

The author’s experiments using a similar configuration as described in Section 9: Technical

Foundation for IVVR Games and evaluating IVVR services from companies listed in

Appendix B showed that a typical value for the overall interaction delay d is between 0.8 and 1 second. Although this could be further optimized to get near to the theoretical limit of

200ms, this is the reference value for further studies, especially to select and develop

appropriate game concepts as described in Section 10: Appropriate Game Concepts.
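To make the composition of d tangible, the sketch below adds up example values for the individual components. All figures except the roughly 100ms one-way W-CDMA delay are assumptions chosen to land in the 0.8 to 1 second range observed by the author.

```python
# Illustrative breakdown of the interaction delay d. Individual values are
# assumptions; only the ~100 ms W-CDMA one-way delay is taken from the text.

delays_ms = {
    "d_up":   100,   # MS -> PLMN border over the W-CDMA air interface
    "d_t1":    75,   # CS -> PS transition (gateway to IVVR application server)
    "d_app":  150,   # application logic reacting to the user's input
    "d_gen":  300,   # dynamic generation and encoding of the response stream
    "d_t2":   125,   # PS -> CS transition, including jitter buffering
    "d_down": 100,   # PLMN border -> MS over the W-CDMA air interface
}

d_total_ms = sum(delays_ms.values())
print(f"Estimated overall interaction delay d = {d_total_ms} ms")   # 850 ms
```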

3.4 Side-Effects on the User Experience

Media compression, error concealment measures (see above), and the characteristics of

wireless networks (see Subsection 1.1) have side effects on the quality of 3G video telephony

and IVVR applications.

3G-324M requires only the use of speech codecs. In contrast to audio codecs, speech codecs

are designed for speech transmission within a narrow frequency range, making them

inappropriate for transmission of music or a range of artificial sounds. This fact needs to be


considered when designing IVVR applications—especially games, as most games utilize

music and sound effects to create an immersive atmosphere.

H.263 and MPEG-4 Part II baseline were designed for images of natural scenes with

predominantly low-frequency components, meaning that the colour values of spatially and

temporally adjacent pixels vary smoothly except in regions with sharp edges. In addition,

human eyes can tolerate more distortion of high-frequency components than of the low-

frequency components (Kwon and Driessen). In reference to the explanation of Kwon and

Driessen, video codecs used for 3G-324M video telephony are well suited for natural scenes and

talking-head scenarios. Depending on the type of IVVR application, these characteristics

work against a good user experience.

Typical desktop or web applications have a monochromatic user-interface with boxes,

buttons, and fonts that are clearly readable. Based on user interaction, the user interface can

change its appearance frequently, perhaps only for some parts of the user interface or perhaps

the whole screen. It is obvious that codecs used for 3G-324M video telephony are unsuitable

for this kind of video transmission. Compressing such user interfaces with H.263 creates

blurred fonts and tattered buttons and lines, leading to a user interface too distorted for a good user experience. The comparably high round-trip delays can make interaction tedious for interfaces that require a high rate of user interaction and screen changes.

Depending on the type of game, the compression characteristics of video codecs used in 3G-

324M can be advantageous. Contemporary 3D games such as first-person shooters or

simulation games try to model the game environment as realistically as possible, creating natural-

looking scenes and making them appropriate for compression using video codecs defined for

3G-324M.

However, more problematic are the delay requirements for mobile games that are essential

for a good gameplay experience. 3GPP defines a delay variation of below 75ms for real-time

games10 and considers first-person shooters the most demanding ones with respect to delay

requirements (3GPP). Other types of games, such as turn-based strategy games or visual

novels, may tolerate a higher end-to-end delay and may require lower data rates.

10 In my opinion, 75ms is dedicated to network delays in multiplayer scenarios; interaction delay should

probably be a lot lower.
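One simple way to relate these requirements to IVVR is to compare the measured interaction delay from Subsection 3.3 with rough per-genre tolerances, as in the sketch below; only the 75ms figure comes from 3GPP, and the other tolerances are loose assumptions for illustration.

```python
# Rough comparison of the measured IVVR interaction delay (~0.9 s, see 3.3)
# with per-genre delay tolerances. Only the 75 ms value is from 3GPP; the
# other tolerances are assumptions for illustration.

IVVR_DELAY_S = 0.9

GENRE_TOLERANCE_S = {
    "first-person shooter (real-time)": 0.075,  # 3GPP real-time gaming bound
    "turn-based card battle":           1.5,    # assumed per-move tolerance
    "visual novel":                     2.0,    # assumed click-through pacing
}

for genre, tolerance in GENRE_TOLERANCE_S.items():
    verdict = "feasible over IVVR" if IVVR_DELAY_S <= tolerance else "not feasible"
    print(f"{genre:34s} tolerance {tolerance:5.3f} s -> {verdict}")
```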


4 Interactive Voice and Video Response

Interactive voice and video response (IVVR) is a mobile technology that enables interactive

services based on a 3G video call (Ugunduzi Ltd.). IVVR uses 3G video call

technology to deliver applications and services to 3G users.

Figure 4. Person-to-person 3G video telephony

Figure 5. Person-to-application video telephony (IVVR)

In contrast to person-to-person video telephony (Figure 4), an application server answers when a

3G user places a video call. The application server generates and transmits an audio and

video stream that is shown on the handset of the caller (Figure 5).

Instead of a real person or scene, the application server can transmit any kind of audio and

video—whether a pre-recorded “talking head”, movie trailer, a live feed from TV, or a video

showing traffic information—or even the order process for electronic shopping. As 3G video

calls are generally bidirectional, the user is automatically sending the handset's built-in

camera and microphone signals to the other party. This paves the ground for interactive

applications based on gesture and speech recognition. Another, simpler way to realise

interaction is by processing DTMF signals sent with the 3G-324M session when the caller

uses the handset’s numpad to type.

The term IVVR is derived from interactive voice response (IVR). IVR is an interactive

content-to-person service and the prevalent predecessor technology of IVVR. IVR allows callers to retrieve

information, make bookings, and get connected with a contact based on the caller’s selection.

IVR applications are written mostly in VoiceXML to describe a simple series of voice menus

where the caller chooses from a given selection of options or makes spoken requests when

the IVR system supports natural language speech recognition. Based on the caller’s selection

and purpose of the IVR application, the system plays pre-recorded audio clips, dynamically

generates speech, or connects the caller with a person in charge. Although IVR systems are


flexible, used extensively, and accessible through most mobile and landline phones, they

show some limitations. The IVR system can respond only in the auditive dimension; complex

menus need to be broken down to limited choices. When these choices are nested, it can be

tedious for callers to dig through a complex voice menu or remember the series of choices.

Voice-only communication also excludes deaf and hard-of-hearing users from using IVR.

Simply put, IVVR is based on video telephony and adds a visual dimension to IVR, enabling

service providers to create new services that take advantage of media streaming capabilities.

4.1 IVVR Applications

IVVR applications use the IVVR mobile technology to create services based on 3G video

calling. I have identified various types of IVVR applications, explained in the following

subsections.

4.2 IVR Supplements

IVR Supplements are dialog-based IVVR applications founded on the notion of typical IVR

applications for call dispatching or information services. They add the visual dimension to

IVR applications by showing slides as a graphical representation of an IVR voice menu on

the phone screen. This increases accessibility for deaf and hard-of-hearing users, making it possible to interact

with IVR where listening might not be desired—such as in business meetings or during a

lecture—and accelerating perception of options and information.

Humans receive 80% of information by seeing, but only 15% by hearing—and 5% by feeling

(Dahm). Additionally, seeing is a non-linear process by which people can perceive

information simultaneously when skimming a text, for instance. In contrast, when the focus is

on the transmission of facts without emotions, hearing is a linear process where the listener

must wait until the speaker or recording has finished. The same information might be transmitted faster using written text or visualisations. Because humans can perceive information

visually faster than auditively, IVVR takes advantage of this fact by using the handset’s

display to present information.

Moreover, an IVVR application can use the video transmission capabilities as seen in the

following example (Figure 6). Customers looking for repair service can let the system detect

their product by taking a snapshot of the product’s barcode.


Figure 6. Example of IVVR supplement application for customer care with barcode recognition
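One possible server-side realisation of the barcode step is sketched below, using the third-party Pillow and pyzbar libraries as an example; the frame file name is a placeholder, since grabbing a frame from the 3G-324M media stream is gateway-specific and outside the scope of this sketch.

```python
# Hedged sketch of the barcode step from Figure 6: decode a barcode from one
# frame of the caller's video. Pillow/pyzbar are one possible choice; the
# frame file is a placeholder for whatever the gateway delivers.

from PIL import Image
from pyzbar.pyzbar import decode


def identify_product(frame_path: str):
    """Return the decoded barcode text from a captured video frame, if any."""
    results = decode(Image.open(frame_path))
    if not results:
        return None        # e.g. prompt the caller to hold the product closer
    return results[0].data.decode("utf-8")


code = identify_product("caller_snapshot.png")   # placeholder frame capture
print(code or "no barcode detected")
```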

4.3 Information Portals

Information Portals serve the purpose of making information such as a weather forecast

accessible through a 3G video call. Of course, IVVR information portals compete with

mobile web browsing. A huge advantage of 3G video calling is the easy access to

information: Users do not need a mobile web browser or a data plan for accessing this

information. IVVR is based on 3G video call, which is a W-CDMA circuit-switched service;

therefore, access is charged the same way as voice calling, on a per minute basis11. Moreover,

IVVR applications are optimized for display on mobile devices, and there is no need for device porting, as all 3G-324M compatible multimedia terminals follow the same standards, for

example, video resolution and proportions.

4.4 Video on Demand

Video on demand (VoD) is a pull-based type of communication where people dial a number

and access videos on demand over a 3G video call. Video telephony has a built-in content protection mechanism: No data is stored on the terminal, and in closed systems such as cell phones, there is no possibility of recording copyrighted material.

4.5 Mobile TV

Mobile TV is the use of 3G video telephony to receive TV signals. TV on-the-go is promising

but suffers from high costs or the need for new mobile phones with DVB-H or DMB

11 Although prices may differ compared to voice calling, toll-free numbers and premium numbers are possible.


interfaces. Today, 3G video telephony can make live TV available on phones as well as

affordable with the right call plan12.

4.6 Video Sharing

Video Sharing makes use of the built-in microphone and camera by recording video over an

ongoing video call on the server side, making it available to a video community. Combined

with a menu similar to an IVR menu, community members can select the videos they want to

watch. Should a larger number of videos be accessible—that is, more than 8—navigation

with numbers (using numbers 0–9 from the handset’s numpad) is tedious. Text entry is more

flexible and can be achieved by typing text messages with multi-tap text entry.

4.7 P2P Video Avatar

P2P Video Avatar is an idea to counter the resistance in user acceptance of classic person-to-

person video telephony (for further explanation, see 7 Considerations about Mobile Video

Telephony). The idea is to apply a dynamic overlay before the calling party’s video is

displayed on the handset. A mask or avatar can be superimposed on the caller’s video. It

should be mentioned that a P2P Avatar is not a real IVVR application but an enhancement of

mobile person-to-person video telephony.

4.8 3G-to-IP

3G-to-IP bridges a video call between the wireless and circuit-switched domain to a

(predominantly) wired and packet-switched domain. Connecting 3G and IP users is a

challenge. 3G-324M was designed for circuit-switched networks; fortunately, the media

codecs are generally transport independent. In the PS domain, H.323 is used for video

telephony. Recent VoIP activities have led to the use of SIP for call set-up and RTP for media transport. These protocols are widespread, and their use is similar to H.245 and H.223 in the 3G-324M domain. A typical 3G-to-IP gateway would extract media streams from the H.223 bitstream and encapsulate them into an RTP stream after performing call set-up employing

SIP. Like P2P Avatars, 3G-to-IP is not a genuine IVVR application but a gateway service.

4.9 3G-to-TV

3G-to-TV is a technology that enables innovative TV formats such as live newscasts or

participatory TV. The audio and video signal sent by a 3G participant is transmitted to a

broadcasting studio to be shown on TV (Mirial s.u.r.l.).

12 Vodafone Germany offers unlimited voice and video calls as part of their “SuperFlat” plans.


4.10 Mobile Banking

Mobile Banking is well suited to be performed using 3G video telephony. Security

requirements for mobile banking are exceptionally high and need to be considered along with

the desire for global mobility. 3G video telephony is a streaming service; no data is stored on

the network or the terminals. When accessing an account, for example, the account statement

is not received as ASCII characters but as an encoded video stream, making eavesdropping

even harder. In contrast to HTTP services where no transport layer security is mandatory,

communication on UMTS networks is secure from within, thanks to encryption between the

MS and BTS using procedures stored in the USIM (Schiller). The security of mobile banking

services can also benefit from the fact that the handset’s audio and video signals are available

for identification and authentication purposes. By combining PIN-authentication with

identification based on biometrics—such as the account holder’s voice and facial features—a

banking service would be more secure than typical online banking with PIN-only authentication. This could be done without the need for additional hardware such as

smartcard readers.

4.11 Mobile Games

Mobile Games using IVVR technology are games that are instantly streamed to the player’s

phone and played using various interaction possibilities such as keystrokes, speech

commands, or the use of the handset’s built-in camera for object, symbol, or gesture

recognition. This enables any 3G camera phone to be used as a game console, without the

need to download or install additional software. IVVR is well suited for games that do not

require fast interaction and for casual games played on a per-session basis. Game developers

do not need to worry about device porting or system requirements concerning graphic cards,

for instance, as 3G-324M is a well-defined and widely deployed standard in 3G networks and

devices. Moreover, billing is intuitive for gamers as it works on a per-minute basis with

premium numbers, standard phone numbers, and even toll free numbers. However, despite all

these advantages, there are some drawbacks to using today’s circuit-switched 3G video

telephony for mobile games. Game designers need to cope with these limitations, circumvent

them—or, better, create games that can exploit the possibilities of 3G video calling and cope

with its disadvantages at the same time. Section 10 features game concepts suitable for

IVVR.


5 IVVR Game Santa Claus Sleigh Ride

In 2007, CreaLog GmbH, located in Munich, Germany, developed the example IVVR

game Santa Claus Sleigh Ride. The aim of the game is to steer the reindeer sleigh to the

North Pole by using speech commands or keystrokes. The game is constructed from a series

of pre-recorded video clips showing Santa Claus steering either to the right or the left,

played back depending on the caller’s commands. Although this game cannot be considered a

real-time interactive game and is technically close to an IVR supplement, it perfectly illustrates how to cope with the various limitations of 3G video calling.

Figure 7 shows a number of screenshots from the game that make clear that the game is adapted to the characteristics of IVVR technology. It copes with the high delays by simply

reducing the interaction rate to a minimum. After a 3G video call is placed, video sequences

start automatically, explaining how to play the game and how to steer the reindeer sleigh to

the North Pole. The only caller interaction required is to say left or right, or to use numbers 4

or 6, as seen in the second screenshot. During my testing sessions, the speech recognition

misinterpreted my voice commands approximately one-third of the time. The response

delay after a keystroke was about 1 second before the next video sequence was played back

(for an in-depth explanation of the interaction delay in IVVR applications, see 3.3 Interaction

Delay).

Figure 7. IVVR game “Santa Claus” from CreaLog GmbH

In Subsection 3.4, I have explained that media codecs used in 3G-324M are targeted for

speech communication and talking head scenarios. CreaLog’s game handles this limitation by


having no sound effects but rather a voiceover in which somebody playing Santa gives hilarious comments about the player's decisions. Furthermore, the game uses a 3D-animated environment consisting of colourful gradients with few sharp edges that, unsurprisingly, compresses well with video codecs aimed at natural scenes, giving this mobile game decent visual quality.

6 Classification of Mobile Games over 3G Video Calling

This section introduces a number of concepts and approaches that help to clearly define Mobile Games over 3G Video Calling. Knowing these concepts helps stakeholders in the game development process describe the various types of games based on 3G video telephony using a shared vocabulary.

2G networks were originally designed for efficient delivery of voice services. Not until the

spread of 3G networks was the foundation for circuit-switched and packet-switched

multimedia and data services set up. Moreover, the UMTS network was designed for flexible

delivery of any type of service, making a wide variety of services possible. Video

telephony is such a service; however, thanks to the future-proof approach of the

standardisation bodies ITU-T, 3GPP and UMTS Forum, the technical basis of video

telephony in UMTS networks is not limited only to conversational person-to-person services.

This basic idea enables developers and service providers to create new services—such as

IVVR—based on an existing and well-deployed foundation. In the following, I reference service and application definitions from standardisation bodies and other literature to help classify and define IVVR, especially Mobile Games over 3G Video Calling.

6.1 Thin-Clients and Gaming Terminals

The architectural foundation for IVVR services is a simple client-server system. The

handset—also known as the cell phone, smartphone, PDA, mobile station (MS), 3G-324M

multimedia terminal, handheld game console or gaming terminal—is the client. The server—

more specifically, the combination of 3G-324M Gateway/PBX and IVVR application

server—is the server system in this client-server architecture. The 3G-324M multimedia

terminal is only a device that sends and receives media streams. It runs simple presentation

software that plays back incoming media, and it transmits sound, video, and keystrokes to the

server. A 3G-324M multimedia terminal does not perform any application processing,

persistent application data storage, or even graphics rendering, making it an extremely thin-


client (Sommerville). When using the 3G-324M multimedia terminal for gaming purposes, it

should be called a gaming terminal to be easily understood and identified by users.

6.2 Mobile Game Streaming

The UMTS Forum has identified the following services (Hansen) (UMTS Forum):

Table 2 Various UMTS Services from User Point of View

Information and Education: Web-Browsing, Virtual School, Online Library, Remote Training
Entertainment: Audio/video on demand, Gaming on demand, Live Streaming and Interactive TV
M-Commerce, Business and Financial: Mobile banking, Interactive Shopping, Online billing, Mobile payment
Communication: Video telephony, Video conferencing, Interactive Voice Response
Telematics and Special: Telemedicine, Security monitoring, Office extension

Gaming on demand (GoD) is similar to audio and video on demand: a user pays for a game only when he or she wants to play, without buying or downloading the full game.

According to Table 2, Mobile Games over 3G Video Calling is a combination of GoD and

video telephony. In contrast to progressive downloading of media or games in on-demand

cases, the use of video telephony as a streaming service makes it possible to stream a game to the gaming terminal13 while it is being played, with no wait for delivery. For this approach, I introduce the new term mobile game streaming: the game is streamed instantly to the user's handset over a standard 3G video call, without additional software or UMTS network components. The idea of game streaming is already present and in use in the

wired and stationary world. Tenomichi offers StreamMyGame, a service for broadband

Internet users. Members of StreamMyGame can access and play their games remotely via

broadband or share their games with other members. To provide this service, special software

interconnects the computers hosting games (server) and computers used to play games

(client). Similar to mobile game streaming using the 3G-324M protocol, the game graphics and sound are streamed to the client, and input from the client's peripherals is sent to the server (Tenomichi). Another existing approach, application streaming, is currently defined differently and is more similar to application virtualisation, as it focuses on streaming the


application logic and data to a client for stepwise execution, rather than streaming the application's visual output to the client. However, StreamMyGame technology can also be used to stream applications to clients following the notion of game streaming.

13 In this case, a 3G-324M multimedia terminal or the user's mobile phone.

6.3 Person-to-Application

The service classification in Table 2 and my concept of mobile game streaming were created from a user point of view, as they highlight how users can understand this new service based on services they already know. However, the process-centred classification of

possible UMTS services from Holma et al. (11-30) is created from the network point of view

and emphasises the involved types of communication partners. The table below is based on

the approach from Holma et al.:

Table 3 UMTS Services from Network Point of View

Person-to-Person Circuit-Switched Services
Technology: CS. Connection: Peer-to-peer (or with intermediate server). Parties: Two persons or groups.
Examples: Voice calling, Video calling, Video conferencing.

Person-to-Person Packet-Switched Services14
Technology: PS. Connection: Peer-to-peer (or with intermediate server). Parties: Two persons or groups.
Examples: Multimedia Messages (MMS), Real-time video sharing, Push-to-talk over Cellular (PoC), Voice over IP (VoIP), Multiplayer games.

Content-to-Person Services
Technology: CS / PS. Connection: Client-server. Parties: Content server and receiving user.
Examples: Web browsing, Video on demand, Live streaming, Content download, Multimedia Broadcast Multicast Service (MBMS).

Business Connectivity
Technology: PS. Connection: Multimodal. Parties: Laptop to Internet or Intranet.
Examples: Web browsing, E-mail, Secure access to corporate Intranet.

14 CS services can later be provided through PS services, which opens up more service possibilities due to higher bandwidth.


In reference to Table 3, we can derive the following types of video calls, seen from the network point of view:

Peer-to-peer (P2P) video calling is classic video calling where one person uses a multimedia terminal to communicate with another person who is also using a multimedia terminal. The P2P approach is independent of the type of network. Should a video call be set up

between parties in different networks (wired/wireless or CS/PS domain) or using different

communication protocols (3G-324M/H.324 or proprietary protocols such as Skype), there is

the need for a gateway interconnecting these networks, translating between the different

protocols, sometimes transcoding the media streams to different formats when both parties

cannot negotiate a common set of media codecs (cross-media adaption) (Basso and Kalva).

Peer-to-peer multi-point video calling—better known as video conferencing—is where

multiple participants are connected for a multi-point voice and video conversation. To

interconnect the various parties, a Multi-point Control Unit (MCU) is used, and the feeds

from the participants are displayed on the handset at the same time side-by-side. To achieve

this, either separate logical channels for each participant are opened or the video multiplex

mode in H.263 can be used. The multiplex mode can display up to four different video sub-

bitstreams, sent within the same video channel (ITU-T).

Furthermore, media adaption and transcoding issues need to be considered when

interconnecting parties with different terminals and connections. For example, if one party is

using a low bitrate communication terminal (3G-324M handset) and all other parties are

using broadband Internet connections with high-resolution web cameras, the video streams

from the broadband users need to be transcoded into a lower-bitrate version so they can be transmitted to the handset user. This is done by employing H.263 spatial scalability for

adapting the media to varying display and bandwidth requirements or constraints.

Person-to-application video calling is based on the notion of content-to-person services, introduced in Table 3, but focuses on video calling as a way to access applications in the sense of on-demand or application service providing (ASP) mixed with the thin-client approach. Person-to-application is a central concept behind the author's understanding of IVVR technology as described in Section 4.

Peer-to-peer over Application (P2PoA) video calling is a mixture of peer-to-peer, P2P multi-

point, and person-to-application video calling. Its notion is to enable conversational services


that connect people with each other in a more dynamic way than classic peer-to-peer video

calling does. Examples are video dating, online conferences, ad-hoc groups, or group

coordination similar to Push to Talk over Cellular (PoC). Using P2PoA, callers use an IVVR

application to find and select a person or a group with whom they would like to

communicate, and then initiate a conversation without placing a new call but rather letting a

special MCU or gateway server interconnect them.

6.4 Direction, Interaction and Conversation

Atul and Tsuhan take the dimensions of direction, interaction and conversation into account

when assessing mobile services (51). Based on their approach, these dimensions are discussed below with a focus on IVVR applications and mobile games.

The direction of communication of an IVVR application or game can be either unidirectional

or bidirectional. Even though in a 3G-324M session both parties can automatically send and receive media streams, this does not mean that the other party will consider or process the incoming media stream. The unidirectional case has two variations: On the one hand, the mobile terminal can play back incoming media streams; examples are security monitoring, live TV, or traffic surveillance. On the other hand, the mobile terminal can send a media stream to another mobile user, an application server, or a gateway to other networks; examples are sending videos or photos to media-sharing communities or 3G-to-TV live linking when no satellite connection or professional equipment is on site. However, the dimension of direction is of limited use for describing mobile games, as the direction depends on the point of view. Consider a single-player game over a 3G video call: The gaming server generates the game and streams it to the player's handset, making the communication predominantly unidirectional. However, the player controls the game by sending DTMF tones, making the communication, in a sense, bidirectional.

The idea of interaction generally has two extremes: An application can be interactive or non-interactive. By definition, video games are always interactive; however, the rate of interaction heavily depends on the type of game. Therefore, we need a graduated notion of interaction. There are slow-paced games with a minimum rate of interaction, and there are games that require a large amount of fast interaction. Games that require a high rate of interaction are typically real-time strategy games, shooters, or real-time sports simulations. An example of a game requiring a minimum amount of interaction was covered in Section 5.


In this context, the term conversational always refers to communication among humans; a service enabling a human to communicate with a weak or even strong AI is not considered conversational. We have already learned that the focus

of 3G-324M is to provide conversational services. In contrast to video telephony and video

conferencing, which are typically conversational, IVVR applications and games are not

always conversational.

Most IVVR applications listed in Section 4—such as IVR Supplements, Mobile TV or

Mobile Banking—are non-conversational, as the caller does not hold a conversation with another person. Typical conversational IVVR applications include P2P Video Avatar and

certain types of mobile games. It is not part of this thesis to discuss whether playing a

multiplayer game is already some type of communication among the players; it is presumed

that a multiplayer game is not automatically conversational just because the players could use

game objects as a medium of communication.

A mobile game can have a conversational character when players use text, voice, or even

video chatting. Using such a communication channel can be helpful when teammates need to

coordinate their activities in first-person shooters or when a group of players needs to develop a strategy for defeating a challenge in a dungeon of an MMORPG.

6.5 All-IP Approach

In reference to the UMTS service classification from Etoh, IVVR is a service of the circuit teleservices group, as it operates in the CS domain and consists of simple video calls. This classification will become obsolete once the IMS has eroded the distinction between PS and CS; video telephony, like any other UMTS application and service, will then be based on

the IMS, generally using an “all-IP” approach. Issues such as the need for cross-media

adaption, transcoding or gateways interconnecting different types of networks (introduced in

the subsections above) will become less important when all devices use the same IP bearer

technology.

6.6 Mobile Games over 3G Video Calling Defined

According to the definitions, concepts and approaches above, Mobile Games over 3G Video

Calling can be defined as follows:


Mobile Games over 3G Video Calling is an interactive person-to-application IVVR service

and describes video games played on a 3G handset by establishing a simple 3G video call. By

accessing and playing the game over a 3G video call, the handset turns into a thin gaming

terminal. The game itself is processed, and its sound and graphics are generated on a remote

gaming server. These mobile games can be controlled with the terminal’s built-in camera,

microphone, or keypad while the game graphics and sounds are streamed to the gaming

terminal. The concept of Mobile Games over 3G Video Calling exploits 3G-324M

technology for circuit-switched conversational multimedia services—also known as video

telephony or video conferencing—of today’s 3G infrastructure to create new types of services

with existing technology. Network operators and standardisation bodies are working on the

4G technology IP Multimedia Subsystem (IMS) that will replace, among others, the circuit-

switched video telephony service with an “all-IP” version. This evolution will counter

limitations and drawbacks of the current bearer technology, opening up more possibilities for streamed games.


Chapter 2 Usability and Design Opportunities of IVVR

Usability studies for IVVR face challenges in two major stages in the foreseeable evolution

of IVVR to Mobile Rich Media. In the first stage, designers, information architects, and

usability engineers need to cope with the limitations of the current bearer technology 3G-

324M and its employed media codecs.

In Chapter 1, we saw that W-CDMA networks are inherently capricious. They provide an inconstant and unreliable bandwidth, face round-trip delays of around 200 ms, and suffer from bit errors and mobile noise. The effects of transmission errors were impressively shown in Figure 3. Codecs used for 3G-324M communication take countermeasures to lessen the effects of these wireless characteristics, such as high delays, poor video quality with limited frame rates, and quality problems when fast screen changes occur. However, there can still be

negative impacts on the user experience. By applying good practices, the impact of those

effects can be lessened, and designers can create easy-to-use interfaces and applications. 3G-

324M video telephony restrictions need to be considered not only when making decisions on the visual design but also on the logical design of IVVR applications. In contrast, 3G video calling offers new possibilities to create valuable services by exploiting its flexibility, simplicity, and support for alternative interaction concepts.

Even when H.264/MPEG-4 AVC becomes available as a video codec for 3G-324M-based video telephony, it will not boost the visual experience, as H.264 was designed for natural scenes like H.26315. Only in the second stage can IVVR applications benefit from new bearer technologies, codecs and multimedia frameworks. The second stage will take off when the IMS is deployed and new media codecs from the MPEG-4 suite of standards are available and in use. MPEG-4 Part 20 in particular—also known as MPEG-4 Lightweight Application Scene Representation (LASeR)—is aimed at the delivery of rich media applications to handset users (LASeR Interest Group). In this upcoming stage, usability engineers will have a flexible framework for the creation and delivery of IVVR applications. Most probably, the acronym IVVR will have faded away by then in favour of Mobile Rich Media, a catchier term for users, managers, and marketers.

This chapter focuses on today's usability challenges of IVVR; the findings might also be useful for Mobile Rich Media applications in NGN.

15 Moreover, baseline H.264 has no improvements over H.263.


7 Considerations about Mobile Video Telephony

Even with a great deal of marketing, early attempts to convert users to the video telephony

technology flopped (Jones and Marsden). In contrast, desktop video conferencing is

incredibly popular for private person-to-person conversations and widely used in business environments, for example as telepresence for computer-supported cooperative work (CSCW) (Kleinen).

In desktop video conferencing scenarios, typically a stationary computer is used. Camera and

microphone are fixed and usually maintain the same distance from the person participating

during the conversation. Moreover, lighting conditions are generally better than “on-the-go”,

as a desktop is easier to illuminate correctly than a scene in the mobile environment. When

performing mobile video telephony, lighting conditions change over time when the caller

moves or the environment changes; moreover, the camera is usually not fixed. During mobile

video telephony, the caller is likely to hold the handset in front of his or her face with an extended arm, making the video shaky. In combination with the meagre bandwidth and low-resolution

video, this can considerably degrade the video quality shown on the callee’s side. These

considerations about the video quality problems in the mobile environment also play a major

role in IVVR applications that take advantage of the instant video streaming capabilities that

3G-324M video telephony offers. Bad video quality negatively influences camera-based

games, gesture recognition, or P2P services that intentionally change the video for dynamic

video overlays such as for the P2P Avatar, because motion analysis algorithms perform better

with a sharp and clear video signal.

In desktop video conferencing, the video conferencing application is normally bundled with instant messaging software that includes text chat capabilities. Users can prearrange a video conference using text chat. In contrast, the current evolution of video

telephony in UMTS networks based on the circuit-switched 3G-324M service does not

seamlessly combine video conferencing with other communication channels. The notion of

video telephony in the mobile environment is nearer to standard voice calling than in the

stationary world. Therefore, it is more likely that somebody will place a video call without

prior announcement. This leads to privacy and inconvenience concerns. The callee might not

want to be seen during a conversation for a variety of reasons: A video call "turns you ugly" (Harlow) because the built-in cameras are usually not placed just above the user's line of sight but in the suboptimal position below the nose. Further, the video quality is meagre, and lighting conditions are poor. People might feel that exposing their face over a video call


invades their privacy and, most of the time, callees do not want callers to see how they look.

Furthermore, the use of video telephony can depend on social factors. Societies in South East

Asian countries—for example, Malaysia—are considered non-confrontational. This can be

seen when people make decisions on which channel they use for communication. The

author’s experiences in South East Asia revealed that most people prefer non-confrontational

communication such as SMS, instant messaging or e-mail, even in the business environment

or with good friends. Voice calling is avoided as much as possible for a first or unexpected

contact. It is obvious that P2P video calling is considered even more intrusive—and therefore

unlikely to succeed in these societies.

According to informal research by Sachendra Yadav (Yadav), opinion leaders and technology experts feel that video calling does not add much to a conversation compared to voice calling. In comparison to desktop video conferencing, which is mostly free nowadays, the cost-benefit analysis leads to resistance to using mobile video telephony.

For many reasons, 3G video telephony as a person-to-person conversational service is not as

successful as projected. The existing technical foundation for video calling can be used to

deliver IVVR services. A wide range of IVVR applications is imaginable, and some service

providers and network operators already deploy them. Furthermore, special IVVR

applications such as P2P Video Avatar can even compensate for the drawbacks of classic P2P video telephony, making P2P-like video telephony successful after all.

8 Usability Opportunities and Design Rules for IVVR

Most challenges in mobile user-interface design and usability engineering for mobile

applications and services originate from the platform characteristics of mobile phones, of

course. Mobile Interaction Designers such as Matt Jones and Gary Marsden, and Handheld

Usability evangelists like Scott Weiss have already invented and applied ways to get around

mobile platform limitations—a small screen, lack of a full-blown keyboard, limited

processing power, and restricted storage and memory capabilities—and have summarized

them in their books. Most of their findings can also enhance the usability of IVVR

applications.

8.1 Simplicity

The major advantage of IVVR is its simplicity and easy-to-grasp nature. Consumers can

instantly access IVVR services with any standard 3G camera phone by simply placing a

video call to a special phone number. Users can create a list of IVVR applications for fast


access in their phone’s address book, similar to application homes for J2ME or Symbian

applications on their phone. Furthermore, following the notion of direct dialling, consumers

can use extension numbers to “call-through” to a screen of an IVVR application right away.

Providing shortcuts within the application is always a good idea to enable frequent users to quickly access the functions they want. Like most IVR systems, IVVR-based IVR Supplements should enable users to quickly type in numbers to go straight to a submenu without the need

to wait for each screen in between to appear.

8.2 Sounds

Generating media streams for IVVR is different from media used in person-to-person

communication. This not only applies to video but to sounds, too. Comfort noise is an

artificial background noise that fills the silence in a transmission. In person-to-person

communication, comfort noise is added generally on the receiving end so that the listener can

tell that the transmission is still connected during silent periods. For IVVR applications,

comfort noise is not recommended and should be disabled on the server-side. Moreover,

audio codecs defined for 3G-324M are targeted for speech coding; this limits the use of

artificial sounds (e.g., music) in IVVR applications. Sound designers are advised to create sounds that fit into the narrowband speech spectrum used by the 8 kHz-sampled codecs.

8.3 Visual Design Rules

Applications cannot enlarge a small screen visually, but they can implement techniques that

virtually increase the size of the display. One way is providing horizontal or vertical scrolling

of the user interface to make new information visible while hiding other content. Another

idea is a Peephole display that shows a different portion of a bigger picture when the phone is

moved to the left, right, up or down (Jones and Marsden). Unfortunately, neither approach

works well with IVVR. There are no positional sensors usable with 3G-324M, and scrolling

requires fast screen updates with the ability to hold a key as long as the user wants to scroll.

High delays in the current 3G-324M deployment and the fact that the duration of a key press is not transmitted prevent the implementation of such features. However, applications

can have multiple layers, such as a deck of cards that can be shown or hidden depending on

the user's selection. Furthermore, designs can take advantage of the media-streaming

capabilities and multimodal information channels by providing some information using

speech output, some with pictures or text, and others by using video sequences.


Mobile users demand visually attractive user interfaces that are clearly readable and intuitive

to use. Application flow design is beyond the scope of this thesis, and every application and

game will have its own characteristics to model and challenges to overcome. Nevertheless,

some basic guidelines for slide-based IVVR applications can be given.

Slide-based applications such as IVR Supplements shown in Figure 6 are best visually

designed using pixel-based image editors such as Adobe Photoshop. The video codecs used

in 3G-324M work in the YUV420 colour space, and the target image size is 176 x 144 pixels

(QCIF). With basic understanding of chroma-subsampling and how spatial and temporal

compression in video codecs works, designers can create slides that will compress well while

maintaining sharpness where essential. A precondition is to align the slide’s layout to a raster

of 16x16 pixels with one subdivision (8x8) as seen in the figure below:

Figure 8. IVVR application template with 16x16 raster

To ensure best readability despite video compression, designers should use sans-serif fonts.

Moreover, the font colour and the background colour should have a high difference in

luminance. The human eye can distinguish differences in luminance more easily than differences in colour; this fact is exploited by video compressors and is the foundation of chroma subsampling. For instance, a white font on a light yellow background is already hard to read without compression. After compression, with the colour information subsampled in YUV420, the font will no longer be distinguishable from the background. The author's experiments showed that Microsoft's Calibri font in particular creates a pleasant typeface even after compression. Calibri's subtly rounded stems and corners suit H.263's DCT-based compression, which creates smooth edges. Note that a minimum font size of 18 px is needed for text to remain readable for mobile users after lossy compression. We can only hope that next-generation IVVR applications can use T.140 or similar means to transmit ASCII text directly, making

readability considerations obsolete.
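Because readability after compression hinges on this luminance difference, candidate colour pairs can be checked programmatically before slides are produced. The following Actionscript sketch assumes the ITU-R BT.601 luma weights commonly used with H.263-era YUV video and an arbitrary threshold of 100; both are assumptions, not values taken from this thesis.

package {
    public class LumaContrast {
        // Approximate BT.601 luma (0..255) of a 24-bit RGB colour.
        public static function luma(rgb:uint):Number {
            var r:uint = (rgb >> 16) & 0xFF;
            var g:uint = (rgb >> 8) & 0xFF;
            var b:uint = rgb & 0xFF;
            return 0.299 * r + 0.587 * g + 0.114 * b;
        }

        // True if font and background differ enough in luminance to stay
        // readable after chroma subsampling; the threshold is an assumption.
        public static function readable(fontColor:uint, backgroundColor:uint):Boolean {
            return Math.abs(luma(fontColor) - luma(backgroundColor)) >= 100;
        }
    }
}

For example, LumaContrast.readable(0xFFFFFF, 0xFFFFCC) returns false (white on light yellow), while white on a dark blue background passes the check.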

8.4 Resuming Sessions

Video call set-up times are generally between 1 and 5 seconds16, independent of the IVVR application one is going to use, which is sometimes faster than the initialisation process of

complex J2ME applications. This makes quick on-the-go lookup or entry of information

pleasant. However, what happens when the caller needs to interrupt a gaming session or the

call is interrupted because of missing network coverage or exceeded battery life? Games

should enable users to start and stop with breaks in between, since the time they have to

spend may be brief (Weiss). Mobile games are used especially to pass time for just a couple

of minutes or even seconds when waiting for the bus, riding the subway, or to relieve

boredom during TV commercials. Therefore, all mobile applications need to provide ways to interrupt a session and quickly resume the last state whenever the user desires. This requirement also applies to IVVR applications. As application and user data of IVVR programs can be stored completely on the server side, there are no limitations on auto-saving program states or recording the user's actions. With the caller's unique phone number as an identifier, it is easy to

develop resumable applications.
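A minimal server-side Actionscript sketch of this idea is shown below; the state object and the in-memory store are placeholders, and a deployed service would persist the states in a database instead.

package {
    import flash.utils.Dictionary;

    public class SessionStore {
        // Caller ID (phone number) -> last saved game state.
        private var states:Dictionary = new Dictionary();

        // Called on every turn or when the caller hangs up.
        public function save(callerId:String, gameState:Object):void {
            states[callerId] = gameState;
        }

        // Returns the saved state, or null so the game starts a new session.
        public function resume(callerId:String):Object {
            return (callerId in states) ? states[callerId] : null;
        }
    }
}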

8.5 Consistency and Multi-tap Text Entry

To create applications that are internally consistent, application designers should use not only

the same terms for the same things but should also think about consistent interaction concepts

across a suite of applications. For example, most IVVR applications will be controlled by

keystrokes as this is the simplest to implement and the most understandable and exact method

for consumers. Picture a standard 12-digit numpad, as seen in Figure 9:

Figure 9. 12-digit numpad

16 1-second call set-up when using MONA specified in H.324 Annex K.


A good practice is to reserve certain keys for standard functions such as back, menu/main, and confirm/OK. As people in Western societies generally read from left to right, it seems appropriate to use the star key (*) for the back function and the hash key (#) to confirm an action or as an OK button. The digit zero (0) can be used to return to the main page of an application or to show a menu. Therefore, the application designer is left only with digits 1 to 9 for application-specific interaction such as option selection, game controls, or information input. As screen size is limited and options need to be presented in a fairly large font to

be readable, 9 digits suffice for option selection anyway.
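A minimal Actionscript sketch of such a keypad convention is given below; the command names are illustrative only.

package {
    public class KeypadMapping {
        // Reserved keys: * = back, # = confirm/OK, 0 = menu/main page.
        // Digits 1-9 remain free for application-specific options.
        public static function commandFor(dtmf:String):String {
            switch (dtmf) {
                case "*": return "BACK";
                case "#": return "CONFIRM";
                case "0": return "MENU";
                default:  return "OPTION_" + dtmf;   // "1".."9"
            }
        }
    }
}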

In the case of controlling a game, IVVR games can use the keys 2, 6, 8 and 4 for steering left (4) and right (6), or for accelerating (2) and braking/backing up (8)—as most mobile games already do. When it comes to information input, 9 digits is a fairly limited number of keys for typing text. The figure above does not only show a 12-digit numpad but also the so-called fastap keypad, with eight numeral keys having three or four associated alphabetic characters (Jones and Marsden).

By using multi-tap text entry or T9, users can type in text that is then transmitted by DTMF

signals and processed by an application component. Appendix A shows the source code of

my implementation of a multi-tap text entry component done in Actionscript. The component

receives DTMF tones from an intermediate PBX over a socket server to calculate which characters the user wants to input. The script also features three modes: The caller can input numbers, input text using multi-tap, or select options from the screen.
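The Appendix A component is not reproduced here; the following is a much-reduced Actionscript sketch of the multi-tap idea only: repeated presses of the same digit within a short timeout cycle through that key's letters. The 1.5-second timeout is an assumption.

package {
    import flash.utils.getTimer;

    public class MultiTap {
        // Letters associated with each digit on a standard phone keypad.
        private static const KEYS:Object = {
            "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
            "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"
        };
        private const TIMEOUT_MS:int = 1500;

        private var lastDigit:String = "";
        private var lastTime:int = 0;
        private var tapCount:int = 0;

        // Returns the currently selected character for the received DTMF digit,
        // or "" if the digit carries no letters.
        public function press(digit:String):String {
            var letters:String = KEYS[digit];
            if (letters == null) return "";
            var now:int = getTimer();
            if (digit == lastDigit && now - lastTime < TIMEOUT_MS) {
                tapCount = (tapCount + 1) % letters.length;   // cycle through letters
            } else {
                tapCount = 0;                                 // new key or timeout: first letter
            }
            lastDigit = digit;
            lastTime = now;
            return letters.charAt(tapCount);
        }
    }
}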

A challenge when typing text using the multi-tap method is the lag between keystrokes and visual feedback on the screen, which is due to the well-known delays in mobile networks. Using this method to input a large amount of information will make users feel they are not in control of the system and will inevitably lead to a sluggish, clunky user

experience. Unfortunately, there is no way around this when using keystrokes for text input.

8.6 Camera-Based Information Entry and Interaction

As an alternative to multi-tap text entry, application developers could implement speech

recognition mechanisms to convert spoken words to text. However, these mechanisms are

error prone; correcting recognition errors with speech commands is tedious and can lead to

new errors.


Another way to feed IVVR applications with information is to use the phone’s built-in

camera to transmit video, processing it on the server side to extract information. Section 4 already covered conceivable applications that use video input, mostly for person-to-

person communication or recording of video clips for presentation in VoD portals. A more

advanced usage of the video transmission capabilities in IVVR services is camera-controlled

applications and games enabling new types of handheld game interactions. Optical flow

techniques can be used for position and orientation tracking (Bucolo, Billinghurst and

Sickinger). When the handset user moves the phone, the camera captures a scene from a

different perspective. The basic idea of optical flow techniques is to analyse the video feed

for changes and then calculate the direction and speed of the moving phone. Moreover,

captured video can be used for real-time mixed-reality applications by “placing” virtual

objects in the real scene. Games like Mosquito Hunt apply this interaction model where the

movement of the phone is used to position a crosshair in a mixed-reality environment to

shoot mosquitoes.

Another way of using the camera video for interaction is motion detection in front of a fixed

scene. A user can use gestures to control an application, or a player can use an object or

simply his or her hands to control a game character. The mentioned techniques can be used appropriately only when the user gets real-time feedback about the movement on the screen to adapt his or her next motions accordingly. The current version of IVVR is not capable of achieving this, as there is always a lag of several hundred milliseconds between input and feedback.

Instead of using camera-interaction techniques requiring real-time feedback, application

designers can use simpler ways of camera-based interaction that rely only on the recognition

of a single piece of information where the capture and feedback phases are temporally

separated.
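As an illustration of the simpler, delay-tolerant end of this spectrum, the Actionscript sketch below compares consecutive frames of the (virtual) camera and reports how much of the picture has changed; the sampling step and the noise threshold are assumptions, and a real game would still have to decouple capture and feedback as described above.

package {
    import flash.display.BitmapData;
    import flash.geom.Point;
    import flash.media.Camera;
    import flash.media.Video;

    public class MotionDetector {
        private static const ORIGIN:Point = new Point();
        private var video:Video;
        private var previous:BitmapData;
        private var current:BitmapData;

        public function MotionDetector(camera:Camera) {
            // On the Game Server, "camera" is the virtual Webcam fed by the gateway.
            video = new Video(camera.width, camera.height);
            video.attachCamera(camera);
            previous = new BitmapData(camera.width, camera.height);
            current = new BitmapData(camera.width, camera.height);
        }

        // Fraction (0..1) of sampled pixels that changed noticeably since the
        // last call; call once per frame, e.g. from an ENTER_FRAME handler.
        public function motionLevel():Number {
            current.draw(video);
            var changed:int = 0;
            var samples:int = 0;
            for (var y:int = 0; y < current.height; y += 4) {
                for (var x:int = 0; x < current.width; x += 4) {
                    samples++;
                    var p1:uint = current.getPixel(x, y);
                    var p2:uint = previous.getPixel(x, y);
                    var dr:int = int((p1 >> 16) & 0xFF) - int((p2 >> 16) & 0xFF);
                    var dg:int = int((p1 >> 8) & 0xFF) - int((p2 >> 8) & 0xFF);
                    var db:int = int(p1 & 0xFF) - int(p2 & 0xFF);
                    // Sum of per-channel differences above 60 counts as "changed".
                    if (Math.abs(dr) + Math.abs(dg) + Math.abs(db) > 60) changed++;
                }
            }
            previous.copyPixels(current, current.rect, ORIGIN);
            return changed / samples;
        }
    }
}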

Optical machine-readable representations of data such as barcodes or data matrices are

inexpensive to produce, and computers can recognise them quite easily even if the video

quality is low, lighting conditions are imperfect, or the symbol is captured from different angles17. Examples of data matrices, or so-called "tags", are the QR Code and Semacode. The latter is used, among other things, to visually encode URLs that mobile phone users can capture to quickly access a web page. The use of Semacode for the camera-controlled game 3GBattle is shown in Chapter 4.

17 This is performed with the help of code markers and block matching algorithms (Tran and Huang).


In general, application designers are advised to design applications that do not rely on the

input of large amounts of information or else provide easy ways to do so. This can be

achieved by using voice commands or camera-based input techniques. However, when

applications are developed without comprehensive speech or image recognition technologies,

developers should focus on the presentation of information and provision of entertainment

that require minimal user input.


Chapter 3 Design of Mobile Games

Entertainment and gaming are ideal applications for mobile devices. They are nearly always

in hand, and games provide an easily accessible entertainment mechanism when users are

bored (Weiss). To create entertaining mobile games as IVVR services, game designers, game

developers, and solution providers have to overcome several obstacles. IVVR service

characteristics hinder the realisation of game concepts that require fast interaction, sharp user

interfaces, and sophisticated sound effects and music. Moreover, hosting and programming

IVVR games is not as well developed as it is for games played using other technologies. In

addition to this thesis, the author is working on ways to find appropriate solutions to the latter problem. Therefore, the author briefly describes the current state of the

findings, and then focuses on appropriate game concepts for IVVR that counter its limitations

and take advantage of its interaction opportunities.


9 Technical Foundation for IVVR Games

Commercially available IVVR appliances18 still focus on dialog-based services and use

VoiceXML to create IVVR applications that are generally IVR Supplements, without

sophisticated features like real-time video generation or camera-based interaction. This

hinders the creation of advanced IVVR applications as mentioned in Section 4. Such

applications are generally founded on pre-recorded or pre-generated video sequences that are

played back based on speech commands or keystrokes. For a more flexible solution that is capable of delivering games and interactive services, sound and graphics need to be generated in real time. The current research and testing results recommend the configuration shown in Figure 10.

[Figure 10 is a UML analysis diagram, "Simple System Architecture" (Christoph Köpernick, version 2.1). Its labelled elements: the Caller with a 3G Handset, the U_m Air Interface, BTS, A Interface (PCM-30), MSC, O Interface (SS7), the 3G-324M Gateway, a SIP Registrar (PBX), the Game/Application Server, and a Billing Server. The 3G-324M leg carries H.223 bit streams with H.245 call control and DTMF, audio as AMR or AAC, and video as H.263 or MPEG-4 Part 2 Simple Profile (A/V/DTMF input and A/V reception); between the gateway and the servers, SIP is used for call control and RTP for media transport.]

Figure 10. High-level system architecture for the delivery of dynamic IVVR services

Adobe Flex has evolved into a suite of technologies appropriate for the creation of rich media

applications and games. For the solution presented in this research, Flash is employed as the

Game Engine as it is very flexible and can be extended to create 3D games.19 The Flash

application does not run on a handset, but the audio and video it creates are transmitted over a 3G-324M session to a handset, and the application can be controlled based on the handset's camera and microphone signals and keystrokes. To achieve this, the Flash application runs on a

18 For example, DTG 3000 from Dilithium is a combination of MCU, 3G-324M Gateway, and transcoder.
19 By using the open source real-time 3D engine Papervision3D.


Windows-based Game Server and the media streams are transmitted to the handset using a

3G-324M Gateway. The gateway consists of a digital telephony card, an Asterisk installation,

and the implementation of the 3G-324M protocol stack.20 Adobe’s runtime engine for Flash

applications is called Adobe Flash Player. Although it is available for many operating

systems, its design approach is to present the application on the desktop of the machine that is

used to execute it. To present the output from the Flash application on a 3G-324M

Multimedia Terminal, the Game Server needs to provide Flash’s output as RTP media

streams for the 3G-324M Gateway in order to transmit them over a circuit-switched link to a

mobile phone.

Medialooks’ Flash Source Filter21 is a DirectShow filter capable of instantiating the Flash

runtime engine, executing an SWF application, and providing its output to other DirectShow

filters for further processing in a filter graph. Microsoft DirectShow does not natively include AMR, H.263, or RTP encoder filters; therefore, this research uses VLC Player22 with its FFmpeg library to compress Flash's output with the AMR speech codec and the H.263 video codec. Moreover, the live555 library included in VLC Player is used as an RTP encoder. However, VLC Player and the employed libraries are not compatible with DirectShow filter graphs. To connect the Flash Source Filter with VLC, a special DirectShow filter from Sensoray, acting as a bridge between DirectShow and VLC, is used.

To enable interaction with the Flash application based on DTMF signals from the 3G-324M

multimedia terminal, the author has written a simple XML socket server that transmits DTMF signals relayed by the 3G-324M Gateway to a special Actionscript component in Flash applications (see Appendix A for an example Actionscript).
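The Appendix A script is not reproduced here. The following Actionscript sketch shows only the receiving side of this mechanism, assuming that the socket server delivers one DTMF character per XMLSocket message; the host and port are placeholders.

package {
    import flash.events.DataEvent;
    import flash.events.IOErrorEvent;
    import flash.net.XMLSocket;

    public class DtmfReceiver {
        private var socket:XMLSocket = new XMLSocket();
        private var handler:Function;   // called with each received digit

        public function DtmfReceiver(onDigit:Function) {
            handler = onDigit;
            socket.addEventListener(DataEvent.DATA, onData);
            socket.addEventListener(IOErrorEvent.IO_ERROR, onError);
            socket.connect("localhost", 9000);   // assumed address of the relay
        }

        private function onData(e:DataEvent):void {
            handler(e.data);   // e.g. "5", "*" or "#"
        }

        private function onError(e:IOErrorEvent):void {
            trace("DTMF relay connection failed: " + e.text);
        }
    }
}

A game would pass in a function that maps each digit to a game command, for example using the keypad convention sketched in Subsection 8.5.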

In order to feed the IVVR Flash applications with the camera and microphone signals from a

caller, a special DirectShow filter needs to be developed that receives these signals via RTP

streams from the 3G-324M Gateway, emulating virtual Webcam and microphone devices on

the Game Server. As Flash is capable of processing and playing back input from Webcams

20 The stack was implemented by Sergio García Murillo and can be found at http://sip.fontventa.com/.
21 A trial version of this filter can be found at http://www.medialooks.com/products/directshow_filters/flash_source.html
22 VLC Player can be found at http://www.videolan.org/vlc/


and microphones, motion detection23 algorithms or speech recognition techniques can be

implemented for Flash-based IVVR games.

10 Appropriate Game Concepts

Mobile games played over 3G video calls should cope with IVVR’s limitations and ideally

take advantage of its unique capabilities in order to be entertaining. But, in order to be

playable, an IVVR game must not demand a shorter interaction delay than the overall interaction delay of IVVR services as calculated and stated in Subsection 3.3. A game that

complies with the mentioned requirement has to be a slow-paced game that only requires a

low number of interactions.

More specifically, to deal with the overall interaction delay, the maximum rate of interaction

should be approximately one interaction per second. Slow-paced single-player games are as

suitable for IVVR as multiplayer games that are either turn-based or even asynchronous, with

gamers performing actions that do not have tight temporal restrictions (Koivisto and

Wenninger).
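As a rough orientation (not the exact figures from Subsection 3.3), the delay a designer has to budget for per interaction can be thought of as a sum of components:

T_{interaction} \approx T_{input} + T_{uplink} + T_{recognition} + T_{logic} + T_{encode} + T_{downlink} + T_{display}

With W-CDMA round-trip delays of around 200 ms (Chapter 1) and the roughly one-second keystroke-to-video response observed with the Santa Claus game in Section 5, a target of about one interaction per second leaves only modest headroom for recognition and rendering.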

10.1 Visual Novels

Typical games that have a minimum of gameplay are visual novels and certain types of

mobile gambling. Visual novels are a subgenre of adventure games, featuring mostly static

graphics and written text. Moreover, most visual novels have multiple storylines with

different endings depending on the player’s choices at decision points (Wikipedia

contributors). Figure 11 shows a screenshot from the popular visual novel “Brass

Restoration".

23 A popular example of Flash’s motion detection ability can be seen in the game PlaydoJam

(http://www.playdojam.com/).


Figure 11. Screenshot from popular visual novel “Brass Restoration”.

To tailor the idea of visual novels to the IVVR world, static graphics and written text should

be substituted or complemented to take advantage of IVVR’s capabilities and circumvent its

limitations. Written text should be avoided and substituted with voiceovers from narrators or

synthesised speech. Designers can also use sound effects and music as long as they can be

encoded properly with a narrowband speech codec. Static images can be replaced or

complemented with animated video sequences or short real-life video clips from actors. For

IVVR-based visual novels, it would be advantageous to use IVVR's real-time video and

audio transmission capabilities to create a more immersive experience.

In combination with speech or melody recognition techniques, a visual novel could require

the player to hum a melody to influence the storyline or to solve a quest. Another way to

create a more involved environment in visual novels is to use a player’s voice or portrait to

adapt certain parts of the game. Besides placing a gamer-generated picture or video sequence

within the game, a gamer could virtually communicate with or give orders to game

characters. Using speech commands for controlling game characters, synthesized voice to

receive information, or speech for person-to-person communication is more practical than

using text since a mobile phone is not ideal for typing or reading texts. This is especially the

case when text cannot be displayed sharply and is lossy compressed within a video stream.

Moreover, when a player is on the move and needs his or her eyes for viewing the real-life

environment, voice is a safer option.

However, using voice chat in role-playing games could break the immersion (Koivisto and

Wenninger). Ideally, using voice and video that are adapted to the game environment could


increase the sense of immersion for a player. For example, changing a voice to make it lower

or placing a mask in front of one’s face could create a more realistic game experience.

10.2 Mobile Gambling

Another genre suitable to be played over 3G video calls is Mobile Gambling, also known as

Remote Gambling. Simulating a slot machine or poker game is fairly simple and the results

can be highly addictive.24 Such games can often be played on a brief per-session basis. User

interaction required for a slot machine simulation can be extremely low when the player only

has to spin the reels (see Figure 12). Poker is also ideal, since winning does not depend on fast reactions; rather, concentration, retentiveness, and tactics are helpful.

Figure 12. IVVR slot machine to win coupons

10.3 IVVR Multiplayer Games

The aforementioned games need to implement mechanisms that allow players to suspend

gaming sessions and resume play when desired. Especially in multiplayer scenarios, game

developers need to discover ways to provide players with the freedom to interrupt a game

without displeasing other players. Further, game concepts should cope with the interaction

delays of IVVR technology. In single-player games, the game should automatically pause and save the current game state when a user hangs up the video call, resuming the last game state

when the user chooses to play again. Implementing a similar functionality in multiplayer

games is more challenging. As real-time shooters require rapid interaction, they are

inappropriate for the W-CDMA network. One way to cope with the high 3G network latency

is turn-based multiplayer games in which fast reactions to other players’ decisions are not

required. Moreover, some actions in a game can be performed asynchronously, such as

character development or adding new items for sale in one's in-game shop.

24 Federal laws and regulations in the country where this service would be offered need to be strictly regarded.


Event notifications combine well with asynchronous gameplay, since they allow the game to contact the player when a certain kind of change in the game state has occurred (Koivisto and Wenninger) or when other players are ready to play. In the sense of push communication,

gamers could be alerted by receiving a video call. When these alerts are received on a regular

basis and only from a limited number of friends, they can increase pervasiveness without

annoying the user. In such a case, users can decide whether they would like to accept the call

when they are ready to play. Combining the concepts of asynchronous gameplay, event

notifications, and a turn-based strategy helps developers to create entertaining multiplayer

games suitable for IVVR.

10.4 Parallel Reality Games

Real-time video transmission found in IVVR is a unique feature not as readily available in

other mobile technologies. As discussed in Subsection 8.6, IVVR games can also be controlled by using the handset's camera. In parallel reality games, the game takes place in both

the virtual world and the real world. The basic idea is to motivate gamers to take actions in

real life because events in the real world affect the virtual world and vice versa (Koivisto and

Wenninger). A prominent example of this is location-based games, which are unfortunately

not as feasible with today’s IVVR technology as with other mobile technologies due to the

lack of GPS information available for IVVR applications. However, using a handset’s camera

to take pictures of buildings, objects, or symbols allows game designers to create parallel

reality games using IVVR technology. Such a game could require users to take pictures of

corporate symbols, distinctive buildings, or Semacode tags that are placed in cities or on

campuses, for example, to prove that they visited those places.
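A minimal Actionscript sketch of the server-side bookkeeping for such a game is shown below; the decoded payload strings are invented for illustration, and the actual Semacode decoding would happen in a separate recognition component.

package {
    public class CheckpointRegistry {
        // Decoded tag payload -> checkpoint name (illustrative values only).
        private static const CHECKPOINTS:Object = {
            "campus-library": "Library entrance",
            "campus-media-lab": "Media lab"
        };

        // Returns the checkpoint a decoded tag proves the player visited,
        // or null if the payload is unknown to the game.
        public static function checkpointFor(decodedPayload:String):String {
            var name:* = CHECKPOINTS[decodedPayload];
            return (name == undefined) ? null : String(name);
        }
    }
}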


Chapter 4 Mobile Role-Playing Game: 3GBattle

3GBattle is a turn-based card battle game played with a 3G camera phone over a 3G video

call. Players need a set of physical game cards that can be bought from a shop or printed out

at home. In order to perform actions in the game, the player needs to select an appropriate

game card and hold it in front of the phone’s built-in rear camera, making 3GBattle a camera-

controlled mobile game. This IVVR game is a multiplayer game designed for player versus

player battles. Generally, games using physical cards need to be played in the same location,

where players sit together around a table. To play 3GBattle, players need a 3G camera phone

that is connected to an IVVR game server. This makes it possible—even attractive—to play

the card-based battle game in remote locations.

Rapid prototyping is performed using readily available material: two decks of playing cards, paper, and two 3G camera phones. This first prototyping phase focuses on gameplay and on assessing whether 3GBattle can feasibly be played over 3G video calls. The early prototype

of 3GBattle is not intended to be a full-blown game, but a foundation for creating more

sophisticated card battle games based on IVVR technology. Subsequent prototyping stages

would include the development of machine-readable playing cards and a working prototype

using the configuration recommended in Section 9: Technical Foundation for IVVR Games.

The author’s motivation for creating 3GBattle was to develop a game that exploits IVVR’s

capabilities for camera-based games. Moreover, 3GBattle is inspired by fantasy role-playing

games like Dungeons & Dragons, the notion of controlling a game with camera-based user

interfaces (Tran and Huang), and the PlayStation 3 game The Eye of Judgement, which uses

the PlayStation Eye camera peripheral for capturing physical game cards that trigger battles

on a virtually augmented playing grid.


11 Early Prototype

In the first stage, the game is simulated using playing cards and equipment that is readily

available. For this simulation, two 3G camera phones that are connected to a W-CDMA

network are used. In addition, two decks of French-style playing cards, two players, and one game master are required.

11.1 Setting

From the decks, only ♣ A, ♣ 2-10, ♥ A, and ♥ 2-10 are needed. As seen in Figure 13, the two players sit back to back so that they cannot see each other's cards, simulating the situation in which two players are in remote locations. The game master supervises the game and performs the same tasks the IVVR application would perform in a working prototype.

Figure 13. 3GBattle prototype configuration

11.2 Game Concept

The game concept is fairly easy to understand: There are 10 character cards (♣ A and ♣ 2-10)

and 10 battle cards (♥ A and ♥ 2-10) available for each player, totalling 40 cards. The

numbers on the playing cards represent their power for battles during the game, and aces have

a nominal value of 1. The game master selects 6 different cards of each kind for the players,

making a hand of 6 character and 6 battle cards for each. In each of 6 rounds, players lay their


combination of character and battle cards in two phases. Players “lay” cards by holding them

in front of their handset's (rear) camera. Due to the lack of an MCU, only two handsets with a P2P connection are available. Therefore, the two players pass the first handset back and forth,

and the game master monitors the game with the second handset. To determine who is

winning the current round, the combination of character and battle cards from the first player

is compared with the selection from the second player. The player with the higher combination of cards wins. Laying the cards is performed in two phases; initially, the character cards are laid and then presented to both players simultaneously. In the next phase, each player selects a battle card and lays it. The game master monitors the selection of

cards on the display of his or her 3G handset and can calculate who has won the current

round based on the combination of character and battle cards each player has laid.

Afterwards, the game master notes the point difference on a scoreboard. The scoreboard is

also used to keep track of assigned and laid cards to prevent cheating. A game is won when a

player has a higher total score than the opponent.
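A working prototype would implement on the game server the round resolution that the game master performs by hand here. The Actionscript sketch below assumes that the "combination" of cards is simply the sum of the character and battle card values (1 to 10, aces counting as 1), which is one possible reading of the rules above.

package {
    public class BattleRound {
        // Returns 1 if player 1 wins the round, 2 if player 2 wins, 0 for a draw.
        public static function resolve(character1:int, battle1:int,
                                       character2:int, battle2:int):int {
            var strength1:int = character1 + battle1;
            var strength2:int = character2 + battle2;
            if (strength1 == strength2) return 0;
            return (strength1 > strength2) ? 1 : 2;
        }

        // Point difference the game master would note on the scoreboard.
        public static function pointDifference(character1:int, battle1:int,
                                               character2:int, battle2:int):int {
            return int(Math.abs((character1 + battle1) - (character2 + battle2)));
        }
    }
}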

Informal research and experiments revealed that 3GBattle is feasible, as the game rules are

very easy to understand, and the game concept is ideal for short gaming sessions of 3 to 5

minutes. However, participants mentioned that the playing cards are too dry and that the only fun of the early prototype lies in quickly beating the opponent. Still, a certain degree of tactics is required to win a game: The beginning of the game is determined by luck, as cards

for players’ hands are assigned randomly. When character cards are laid, players need to

choose a battle card without knowing what battle card the other player will lay. This battle

card should be high enough to beat the opponent. Furthermore, by memorising cards the

opponent has already laid, the player can try to determine his or her opponent’s hand to

prudently choose character and battle cards to win the game.

Although this prototyping stage of 3GBattle seems very simple for an entertaining card-based

game and still lacks game elements that create an immersive atmosphere, it shows that card-

based games using 3G video calls are feasible. Video resolution and delays were adequate to capture and interpret the gaming cards. In order to make machine-based interpretation of the playing cards feasible as well, the cards should carry a clearly distinguishable pattern.


12 Preparation for a Working Prototype

To prepare for turning the early prototype of 3GBattle into an IVVR game, the playing cards have to be made machine-readable for camera-based interaction, and a theme has to be found that makes 3GBattle more enjoyable.

12.1 Machine-Readable Playing Cards

A widespread way to make information on physical objects machine-readable was mentioned in Subsection 8.6. For this prototype, Semacode tags are used. These tags represent the numbers 1 to 10, as seen on the French-style playing cards of the early prototype. A different tag

(see Figure 14) needs to be placed on each playing card.

Figure 14. Semacode tag representing number 1.

12.2 Theme

To make the game enjoyable, the author has created a number of example playing cards partly based on the popular TV series South Park. The theme is not meant to be violent or offensive, but rather a parody and an example of thrilling gameplay. Characters were designed using an online character generator25, and illustrations for battle cards were designed with

Photoshop:

25 http://www.sp-studio.de/


Figure 15. Example character card 1. Figure 16. Example character card 2.

Figure 17. Example battle card 1. Figure 18. Example battle card 2.

Instead of simply calculating who has won a round, each battle in 3GBattle: South Park

should be visualised with a short animation. This animation should show the characters

fighting using attacks according to the laid battle cards, followed by a cheer of triumph for

the character that has won the round. The battle sequences, combined with the illustrated

playing cards and perhaps a short introductory story around the game, should create a

pleasant atmosphere for the players.


13 Further Improvements

To enhance gameplay and make 3GBattle more immersive, players should be able to develop

their own characters using a character generator. For cost-recovery or profit reasons, an in-game shop could be offered where players could buy new accessories or clothes for their

characters. These character enhancements should only be ornamental—especially concerning

the battle sequences—not meant to improve the character’s strength. To enhance the players’

enjoyment of interaction and challenges with other individuals, the game could open a

bidirectional voice channel for in-game conversation and provide a high score table.

Furthermore, to make 3GBattle playable on the go, the playing cards should be riveted like a

fan so that a player can easy select a card with one hand while holding the handset in the

other.
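One way to guarantee that purchased items remain purely ornamental is to keep them outside the data that the battle logic reads. The following ActionScript 2 sketch separates a character's battle strength from its cosmetic accessories; the object layout is an assumption for illustration, not part of the prototype.

// Sketch: character data with cosmetic accessories kept apart from battle values (assumed layout).
var character:Object = {
    name: "Example character",
    strength: 7,                        // the only value the battle logic reads
    accessories: ["hat", "sunglasses"]  // used only when rendering battle sequences
};

// The battle logic never looks at the accessories.
function characterStrength(c:Object):Number {
    return c.strength;
}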


Conclusion

Today's IVVR technology is a mixed blessing, but its core concepts foreshadow future thin-client services that combine real-time multimedia streaming with sophisticated interaction concepts for a unique user experience. On the one hand, with IVVR applications and games based on 3G video telephony, developers can overcome interoperability issues and design games that are not subject to memory or processing-power limitations; games that offer multiplayer functionality as well as full control over layout and application flow. Service providers do not need to worry about content protection, can implement time-based billing mechanisms, and can provide seamless connectivity with Web sources or IP users, all without disrupting a call session. Consumers benefit from highly accessible services, inherent security, the availability of 3G handsets, and broad 3G coverage.

IVVR makes mixed-reality games possible and can boost social communication by providing live video feeds to friends and viewers on mobile phones or the Web. On the other hand, IVVR based on the current evolution of mobile video telephony (3G-324M) suffers from a number of drawbacks that hinder the popularity of its services. There are various technical limitations, mostly due to the wireless characteristics of 3G networks, such as limited bandwidth and round-trip delays that are unacceptable for fast-paced games and applications with a high rate of interaction. Proponents of IVVR need to bear in mind that the underlying technology, codecs, and quality-of-service criteria were developed for person-to-person communication, not for the delivery of interactive applications, which have different capability and quality-of-service requirements.

Today’s IVVR technology based on circuit-switched 3G video telephony is just the

preliminary stage of a new era of Mobile Rich Media and Mobile Game Streaming that uses

upcoming technologies such as the 3GPP IP Multimedia Subsystem with MPEG-4 LaSER. In

the future, bandwidth will increase, round-trip delays will decrease, and data protocols for

text and lossless graphic transmission will become available. Moreover, accessing Mobile

Rich Media will not be bound to per-minute billing, but will be charged as any other data

service, making it affordable for consumer.

The IVVR application examples, considerations about appropriate game concepts, and interaction opportunities presented in this paper are even more feasible with next-generation Mobile Rich Media and IP technologies such as SIP and RTP. The concept of a Flash-based IVVR system architecture founded on IP technology is therefore independent of 3G-324M.


The high penetration of VVoIP clients on desktop computers could already open a market for streamed games and applications. VVoIP is generally used over broadband Internet connections with high-quality media codecs, and suitable data protocols are available, making it an appropriate platform for interactive streaming services.

14 Further Studies

As discussed in this thesis, the development of IVVR mobile technology requires extensive effort to find appropriate game concepts and to circumvent its limitations. Future studies should therefore focus on Mobile Rich Media and Mobile Game Streaming based on IMS and MPEG-4 LASeR. The use of bidirectional media streaming for gaming is a relatively unexplored field of study; hence, games that take advantage of this concept deserve additional research, and studies should be conducted to determine how they increase immersion. Furthermore, the IVVR system recommended by the author for delivering interactive real-time games should be implemented so that a prototype of 3GBattle can be run.


Appendices

Appendix A: Source Code of Multi-tap Text Entry with ActionScript

/** Stop at the interactive home page **/
this.stop();

/** Socket Connection **/
var ipAddress:String = '192.168.1.104';
var port = 8099;
var connected = false;

//Creating new socket and connection
keySocket = new XMLSocket();
keySocket.connect(ipAddress, port);

/**
 * Socket onConnect handler
 */
keySocket.onConnect = function(success) {
    if (success) {
        connected = true;
        _root.digitInput.text = ':)';
        showNavigation();
    } else {
        connected = false;
        _root.digitInput.text = ':(';
    }
};

/**
 * Socket onClose handler
 */
keySocket.onClose = function() {
    connected = false;
    _root.digitInput.text = ':|';
};

/**
 * Socket onData handler
 */
XMLSocket.prototype.onData = function(socketMessage) {
    //Symbol is number 0-9, * or #
    var symbol:String = socketMessage;
    processKeyStroke(symbol);
};

/* Key Stroke Processing */

//This indicates if the text field should be cleared first
var firstSymbolEntered = false;

//Input modes
var INPUT_MODE_NUMBER = 1;
var INPUT_MODE_TEXT = 2;
var INPUT_MODE_NAVIGATE = 3;
var inputMode = INPUT_MODE_NAVIGATE;
this.moMultiBar.input_mode.text = "NAV";

/**
 * Processing the key strokes and dispatching.
 * E.g. changing input mode, calling multi-tap etc.
 */
function processKeyStroke(key) {
    switch(key) {
        case '#':
            changeInputMode();
            resetMultiTap();
            break;
        case '0':
            switch(inputMode) {
                case INPUT_MODE_NUMBER:
                    typeSymbol(key);
                    break;
                case INPUT_MODE_TEXT:
                    resetMultiTap();
                    submitText();
                    break;
                case INPUT_MODE_NAVIGATE:
                    resetMultiTap();
                    doAction(key);
                    break;
            }
            break;
        case '*':
            switch(inputMode) {
                case INPUT_MODE_NUMBER:
                    resetMultiTap();
                    backspace();
                    break;
                case INPUT_MODE_TEXT:
                    resetMultiTap();
                    backspace();
                    break;
                case INPUT_MODE_NAVIGATE:
                    resetMultiTap();
                    goBack();
                    break;
            }
            break;
        default:
            switch(inputMode) {
                case INPUT_MODE_NUMBER:
                    typeSymbol(key);
                    break;
                case INPUT_MODE_TEXT:
                    multiTap(key);
                    break;
                case INPUT_MODE_NAVIGATE:
                    doAction(key);
                    break;
            }
            break;
    }
}

/**
 * Cycle through the input modes: number entry, text entry (multi-tap), navigation.
 */
function changeInputMode() {
    switch(inputMode) {
        case INPUT_MODE_NUMBER:
            inputMode = INPUT_MODE_TEXT;
            this.moMultiBar.input_mode.text = "abc";
            showHelp();
            firstSymbolEntered = false;
            break;
        case INPUT_MODE_TEXT:
            inputMode = INPUT_MODE_NAVIGATE;
            this.moMultiBar.input_mode.text = "NAV";
            showNavigation();
            break;
        case INPUT_MODE_NAVIGATE:
            inputMode = INPUT_MODE_NUMBER;
            this.moMultiBar.input_mode.text = "123";
            showHelp();
            firstSymbolEntered = false;
            break;
    }
}

/**
 * Remove the last character from the input field.
 */
function backspace() {
    if(_root.digitInput.text.length > 0) {
        var subStrEnd = _root.digitInput.text.length - 1;
        _root.digitInput.text = _root.digitInput.text.substr(0, subStrEnd);
    }
}

/**
 * Append a symbol to the input field (clearing it on the first entry).
 */
function typeSymbol(symbol) {
    if(!firstSymbolEntered) {
        firstSymbolEntered = true;
        _root.digitInput.text = symbol;
    } else {
        _root.digitInput.text += symbol;
    }
}

/* Multi-tap status variables */
var SWITCH_DELAY = 1000;
var lastKeyPressTime = 0;
var lastKey = null;
var keyPressedTimes = 0;
var keyPosition = -1;
var currentChar = null;

//Symbol mapping of keys
var keys:Array = new Array(
    new Array(" ", ".", "!", "1"),      //1
    new Array("a", "b", "c", "2"),      //2
    new Array("d", "e", "f", "3"),      //3
    new Array("g", "h", "i", "4"),      //4
    new Array("j", "k", "l", "5"),      //5
    new Array("m", "n", "o", "6"),      //6
    new Array("p", "q", "r", "s", "7"), //7
    new Array("t", "u", "v", "8"),      //8
    new Array("w", "x", "y", "z", "9")  //9
);

/**
 * Multi-tap text entry: repeated presses of the same key within
 * SWITCH_DELAY cycle through the letters mapped to that key.
 */
function multiTap(key) {
    //reset multi-tap status variables when a different key is pressed
    if(lastKey != key) {
        resetMultiTap();
    }
    var tmpDate:Date = new Date();
    //reset when the switch delay has expired since the last key press
    if(lastKeyPressTime > 0 && lastKeyPressTime + SWITCH_DELAY < tmpDate.getTime()) {
        resetMultiTap();
    }
    lastKeyPressTime = tmpDate.getTime();
    keyPosition = nextPosition(key, keyPosition);
    currentChar = keys[key-1][keyPosition];
    if(keyPressedTimes >= 1) {
        changeCharacter(currentChar);
    } else {
        typeSymbol(currentChar);
    }
    keyPressedTimes++;
    lastKey = key;
}

/**
 * Reset multi-tap status variables
 */
function resetMultiTap() {
    lastKeyPressTime = 0;
    lastKey = null;
    keyPressedTimes = 0;
    keyPosition = -1;
    currentChar = null;
}

/**
 * Return the next character position for the given key, wrapping around.
 */
function nextPosition(key, position) {
    if(position < keys[key-1].length-1) {
        position++;
    } else {
        position = 0;
    }
    return position;
}

/**
 * Replace the last typed character while cycling through a key's letters.
 */
function changeCharacter(character) {
    if(firstSymbolEntered) {
        backspace();
        _root.digitInput.text += character;
    } else {
        firstSymbolEntered = true;
        _root.digitInput.text = character;
    }
}

function showHelp() {
    _root.digitInput.text = "Info: Use multi-tapping to insert text!";
}

/** Navigation **/
function showNavigation() {
    _root.digitInput.text = "1: Clip 1\n2: Clip 2";
}

function doAction(key) {
    switch(key) {
        case "1":
            this.gotoAndStop(5);
            break;
        case "2":
            this.gotoAndStop(10);
            break;
        case "0":
            this.gotoAndStop(1);
            break;
        default:
            _root.digitInput.text = key + " performed!";
            var song_sound:Sound = new Sound();
            song_sound.attachSound("logon_sound");
            song_sound.start();
            break;
    }
}

Appendix B: Providers of IVVR and Related Services

Celudan Technologies, USA/Spain (http://www.celudan.com/)

CosmoCom, Inc.; USA (http://www.cosmocom.com/)

CreaLog Software Entwicklung und Beratung GmbH, Germany (http://www.crealog.com/)

Dialogic, Worldwide (http://www.dialogic.com/)

Dilithium, USA (http://www.dilithiumnetworks.com/)

Exit Games GmbH, Germany (http://www.exitgames.com/)

I6NET Solutions and Technologies, SL, Spain (http://www.i6net.com/)

Legion Interactive, Australia (http://www.legioninteractive.com.au/)

Mobile Communications Media Sdn. Bhd., Malaysia (http://www.mocome.net/)

Ugunduzi Ltd., Israel (http://www.ugunduzi.com/)

WHATEVER MOBILE GmbH, Germany (http://www.whatevermobile.com/)


Bibliography

3GPP. TS 22.105 Services and service capabilities. December 2008. 22 January 2009

<http://www.3gpp.org/ftp/Specs/html-info/22105.htm>.

—. TS 26.110 3G-324M General description. Vers. Release 7. June 2007. 20 January 2009

<http://www.3gpp.org/ftp/Specs/html-info/26110.htm>.

—. TS 26.111 Modifications to H.324. Vers. Release 7. June 2008. 20 January 2009

<http://www.3gpp.org/ftp/Specs/html-info/26111.htm>.

Anderson, Dean; Lamberson, Jim; Sensoray. Open Source VLC to Directshow Bridge. 2008.

12 January 2009 <http://www.sensoray.com/support/videoLan.htm>.

Puri, Atul and Tsuhan Chen. Multimedia Systems, Standards, and Networks. CRC Press,

2000.

Barth, Peter; Steffen, Thomas; FH Wiesbaden. Usability Lecture WS 04/05. Lecture.

Wiesbaden, 2004.

Basso, Andrea and Hari Kalva. “Beyond 3G video mobile conversational services: An

overview of 3G-324M based messaging and streaming.” IEEE ISMSE'04 (2004).

Bucolo, Sam, Mark Billinghurst and David Sickinger. User Experiences with Mobile Phone

Camera Game Interfaces. Christchurch, New Zealand: University of Canterbury, 2005.

CreaLog GmbH. CreaLog präsentiert erstes interaktives Videotelefon-Gewinnspiel

Deutschlands - Anrufer lenken den Rentier-Schlitten per Sprache. 17 December 2007. 9

February 2009 <http://www.crealog.com/de/news/archiv07.htm>.

—. “IVVR Mobile Game.” Weihnachtsmann. 3G Video Call +49 (89) 381 55 555: CreaLog

GmbH, 2007.

Dahm, Markus. Grundlagen der Mensch-Computer-Interaktion. München: Pearson Studium,

2006.

Etoh, Minoru. Next Generation Mobile Systems 3G and Beyond. John Wiley & Sons, Ltd,

2005.

Furht, Borko and Mohammad Ilyas. Wireless Internet Handbook: Technologies, Standards,

and Applications (Internet and Communications). Auerbach Publications, 2003.


Hansen, Frode Ørbek. “Real Time Video Transmission in UMTS.” Postgraduate Thesis in

Information and Communication Technology. Norway: Agder University College, May

2001.

Harlow, Jo. “Nokia S60 Summit.” Barcelona, May 2008.

Holma, Harri and Antti Toskala. WCDMA for UMTS. Vol. Third Edition. John Wiley &

Sons, Ltd, 2004.

ITU-T. “H.263 Video coding for low bit rate communication.” Recommendation. 2005.

—. “ITU-T Recommendation H.324: Terminal for low bit-rate multimedia.” 2005.

—. “LS Reply on H.324 Text Conversation.” 2007.

ITU-T Study Group No. 16. “Corrigendum to ITU-T Recommendation H.324.” 2002.

Jabri, Marwan. "Mobile Videotelefonie." telekom praxis (2005): 34-36.

Jones, Matt and Gary Marsden. Mobile Interaction Design. Chichester: John Wiley & Sons,

Ltd, 2006.

Kleinen, Barbara. “Lecture FHTW Berlin.” Computer Supported Cooperative Work. 2007.

Koivisto, Elina M.I. and Christian Wenninger. "Enhancing Player Experience in

MMORPGs with Mobile Features.” 2005.

Kwon, David and Peter Driessen. “Error Concealment Techniques for H.263 Video

Transmission.” IEEE (1999): 276-279.

LASeR Interest Group. Overview. 03 March 2006. 17 February 2009 <http://www.mpeg-

laser.org/html/overview_contextO.htm>.

Mirial s.u.r.l. 3G-to-TV Video Calling Solution. 2009. 28 January 2009

<http://www.mirial.com/pdf/Whitepaper/3G-to-TV_Video_Calls.pdf>.

Myers, David J. Mobile Video Telephony. McGraw-Hill Professional, 2004.

NMS Communications. 3G-324M Video Technology Overview. 2008. 28 January 2009

<http://www.nmscommunications.com/DevPlatforms/OpenAccess/Technologies/3G324Man

dIPVideo/TechnologyOverview.htm>.


Nokia. N96 Specifications. 2009. 30 January 2009 <http://www.nokia.co.uk/A4835651>.

Pias, Claus. Computer Spiel Welten. Dissertation. Professur Geschichte und Theorie

künstlicher Welten. Weimar, 2004.

RADVISION Ltd. “3G Powered 3G-324M Protocol.” 2002.

Röber, Niklas and Maic Masuch. Playing Audio-Only Games, A compendium of interacting

with virtual, auditory worlds. Proceedings of DiGRA 2005 Conference. Magdeburg,

Germany: DiGRA, 2005.

Lee, Sang-Bong, et al. "Error Concealment for 3G-324M Mobile Videophones Over a

WCDMA networks." unknown. IEEE. 6 February 2009 <IEEE Xplore, Technische

Universitaet Berlin>.

Schiller, Jochen H. Mobile Communications. Vol. Second Edition. Pearson Education

Limited, 2003.

Shii. Visual Novel Terminology. 09 February 2009. 22 February 2009

<http://www.shii.org/translate/>.

Sommerville, Ian. Software Engineering. Vol. Eight Edition. Pearson Education Limited,

2007.

Sony Computer Entertainment America Inc. THE EYE OF JUDGMENT. 2008. 12 February

2009

<http://www.us.playstation.com/PS3/Games/THE_EYE_OF_JUDGMENT/Description>.

Tenomichi. About StreamMyGame. 2009. 26 January 2009

<http://www.streammygame.com/smg/modules.php?name=About>.

Tran, Khoa Nguyen and Zhiyong Huang. Design and Implementation of a Built-in Camera

based User Interface for Mobile Games. ACM Report. Perth: GRAPHITE, 2007.

Turner, Brough. Video over Mobile IP — Operators Shoot Themselves in the Foot. February

2008. 11 February 2009 <http://www.tmcnet.com/voip/0208/next-wave-redux-video-over-

mobile-ip-operators-shoot-themselves-in-the-foot.htm>.

Turner, Ian. "Trends in Linguistic Technology." CallCenter International Issue 1 2009: 30-34.


Ugunduzi Ltd. Ugunduzi - IVVR Services Summary. 2008. 19 January 2009

<http://www.ugunduzi.com/IVVR_Services.html>.

UMTS Forum. “UMTS Forum Report No. 11.” 2000.

VocalTec. Deutsche Telekom ICSS and VocalTec announce solution for international VoIP

interconnection. 1 July 2008. 10 February 2009 <http://ghs-

internet.telekom.de/dtag/cms/content/ICSS/en/374426;jsessionid=8B4D6A1DE09A119828B

0A3B89CD60663>.

Voip-Info contributors. Asterisk H324M. 24 November 2008. 21 January 2009

<http://www.voip-info.org/wiki/page_history.php?page_id=2104&preview=19>.

Weiss, Scott. Handheld Usability. New York: John Wiley & Sons, Ltd, 2002.

Wikipedia contributors. Jitter. 26 January 2009. 10 February 2009

<http://en.wikipedia.org/w/index.php?title=Jitter&oldid=266478536#Anti-jitter_circuits>.

—. Visual novel. 08 February 2009. 22 February 2009

<http://en.wikipedia.org/w/index.php?title=Visual_novel&oldid=269225498>.

Yadav, Sachendra. Why haven’t Video Calls (Mobile Video Telephony) taken off? 11 June

2008. 30 January 2009 <http://sachendra.wordpress.com/2008/06/11/why-havent-video-calls-

mobile-video-telephony-taken-off/>.

You, Yilun, et al. “Deploying and Evaluating a Mixed Reality Mobile Treasure Hunt:

Snap2Play.” MobileHCI (2008): 335-338.


Acronyms

2G  Second generation mobile networks, services and technologies
3G  Third generation mobile networks, services and technologies
3GPP  3rd Generation Partnership Project
AAC  Advanced Audio Coding
AL-PDU  Adaptation Layer Protocol Data Unit
AMR  Adaptive Multi-Rate
BREW  Binary Runtime Environment for Wireless
BRI  Basic Rate Interface
BSS  Base Station System
BTS  Base Transceiver Station
CCSRL  Control Channel Segmentation and Reassembly Layer
CRC  Cyclic Redundancy Check
CS  Circuit-Switched
DTMF  Dual-Tone Multi-Frequency
GOB  Group of Blocks
GoD  Gaming on Demand
GPRS  General Packet Radio Service
GPS  Global Positioning System
GSTN  Generalised Switched Telephone Network
HSDPA  High-Speed Downlink Packet Access
HSS  Home Subscriber Server
IMS  IP Multimedia Subsystem
ISDN  Integrated Services Digital Network
ITU  International Telecommunication Union
ITU-T  Telecommunication Standardization Sector of ITU
IVR  Interactive Voice Response
IVVR  Interactive Voice and Video Response
J2ME  Java 2 Micro Edition
LAN  Local Area Network
LAPM  Link Access Procedure for Modems
LASeR  Lightweight Application Scene Representation
MCU  Multipoint Control Unit
MS  Mobile Station
MSC  Mobile Switching Center
NB  Narrowband
NGN  Next Generation Network
N-ISDN  Narrowband Integrated Services Digital Network
NMS  Network Management Subsystem
NSRP  Numbered Simple Retransmission Protocol
P2P  Peer-to-peer
PBX  Private Branch Exchange
PCM  Pulse-Code Modulation
PDU  Protocol Data Unit
PLMN  Public Land Mobile Network
PRI  Primary Rate Interface
PS  Packet-Switched
PSTN  Public Switched Telephone Network
QCIF  Quarter Common Intermediate Format
QoS  Quality-of-service
RNS  Radio Network Subsystem
RTP  Real-Time Transport Protocol
SDP  Session Description Protocol
SIM  Subscriber Identity Module
SIP  Session Initiation Protocol
SMS  Short Message Service
SRP  Simple Retransmission Protocol
SS7  Signaling System #7
SWF  Small Web Format for Flash Applications
UMTS  Universal Mobile Telecommunications System
USIM  UMTS SIM
VoIP  Voice over IP
VVoIP  Voice and Video over IP
WAP  Wireless Application Protocol
WB  Wideband
W-CDMA  Wideband Code Division Multiple Access
XML  Extensible Markup Language

Declaration of Independent Work

With this, I declare that I have written this paper on my own, have marked all citations as such, and have used no sources or aids other than those named.

____________________________________ _______________ Signature Date