
APPLICATION OF CHROMA KEY AND LOW BIT-RATE ENCODING

TECHNIQUE IN INTERNET VIDEO STREAMING

A Thesis

Presented to the Faculty

of the Graduate Program in Information Technology

Saint Louis University

In Partial Fulfillment

of the Requirements for the Degree

Master of Science in Information Technology

by

Sushil Kumar Sharma

August 2003

ACKNOWLEDGEMENTS

The complexity of this paper required the help and input of many

capable people. Without their support, this paper could not have been completed.

In this regard, I extend my most heartfelt acknowledgement and appreciation to my adviser,

Mr. Daniel-Rey M. Bayog, who is also the Graduate Program Co-ordinator. He was always

willing and untiringly devoted his precious time to guiding and assisting me; he was

always there when I needed him. I would also like to take this opportunity to express my

sincere thanks to Engr. Angelito C. Peralta, Director of the Management of Information

System Office; Ms. Claire B. Berto; and Mrs. Cecilia A. Mercado, Dean of the College of

Information and Computing Sciences; and the members of the advisory committee for their

encouragement, guidance, valuable suggestions and moral support. I also appreciate the

encouragement and support provided by the instructors of the College of Information and

Computing Sciences Departments and the MIS staff.

Special thanks also go to my colleague Deepak Kumar Shrestha, who is always

patient and understanding with my questions. I also wish to express thanks to my friends

and peer members:

Sanjay Kumar Hirachan

Ram Pulami

Dujendra Kumar Basnet

Prem Sagar Subedi

Yamuna Prasad Sah

Niraj Narsingh Rajbhandari

Niraj Man Shrestha

Narendra Pradhan

Sita Ram Kafle

Rajiv Gupta

Rabin Gupta

Arket Raj Maharjan

Pramod Tuladhar

Manish Pradhan

Keyoor Gautam

Roshan Lal Dhonju

Sudesh Shrestha and

the other Nepalese friends studying in SLU, UB, BCF and BSU who made my stay in

the Philippines pleasant, memorable and meaningful.

I also thank my classmates and friends Tabdi Fargo, Carmelo Fidel Rustia, Grace A. Babate,

Adelaida B. Laking, Dennis Tampoa, Christever Del Rosario and Teofilo Llanes, who

helped me during my study period in this university. I have a special place in my heart

for them.

I would like to dedicate this thesis to my parents who have made many sacrifices

to allow me to be where I am today. My parents have taught me to value integrity,

determination, and concern for other people that I have found tremendously valuable. I

am very grateful for everything that they have provided to me. I also thank my brothers, sisters and

the other family members and relatives, who gave their unending love, financial and

moral support, encouragement, understanding, and inspiration.

Last but not the least, I also gratefully acknowledge the direct encouragement and

support provided by Ms. Ladyluck Katter.

Above all, I thank the Almighty God for showering his unending blessings on me.

Sushil Kumar Sharma


TABLE OF CONTENTS

List of Tables
List of Figures
List of Appendices

CHAPTER

1 Introduction
1.1 Conceptual Framework
1.2 Statement of the Problem
1.3 Objectives of the Study
1.4 Scope and Delimitation of the Study
1.5 Significance of the Study
1.6 Definition of Terms

2 Study of Related Literature
2.1 Delivery of Video-Audio
2.1.1 Video-Audio Streaming
2.1.2 Congestion Control
2.1.3 Transport Protocol
2.1.4 Bandwidth
2.1.5 Chroma-keying Technique
2.1.6 AMOS (Active MPEG-4 Object Segmentation System)
2.2 Technology and Techniques Related to the Video and Audio
2.2.1 Streaming Media Technology
2.2.2 Live Intranet Distance Learning System using MPEG-4 over RTP/RTSP
2.2.3 Some Low Bit-rate Video Coding Technique
2.2.3.1 Model-based Techniques for Low Bit-rate Video Technique
2.2.3.2 Low Bit-rate Speech Coding
2.2.4 Codec
2.2.4.1 Quick Time
2.2.4.2 Real Audio Real Video
2.2.4.3 Window Media Technology
2.3 A Quick Glance at Audio-Video Software and Hardware used in Developing Streaming Media
2.3.1 Capturing Video
2.3.1.1 IEEE 1394, Firewire iLink
2.3.1.2 Osprey-500
2.3.1.3 Pinnacle Video Capture Card
2.3.2 Editing and Encoding Tools
2.3.2.1 Final Cut Pro 4
2.3.2.2 Vegas Video 3.0
2.3.2.3 Adobe Premiere 6.5
2.3.2.4 MGI VideoWave 4.0
2.3.2.5 VideoMach 3.0
2.3.2.6 Window Media Encoder 7.1
2.3.2.7 Helix Producer Basic

3 Solution Methods and Techniques

4 Presentation of Findings
4.1 Effective Delivery of Video and Audio
4.1.1 Packaging Video-audio and Presentation Material
4.2 Technologies and Techniques of Low bit-rate Video-Audio Encoding
4.3 Streaming Media Presentation Development
4.3.1 Hardware and Software used in the Development Procedure
4.3.2 Development Procedure for the Streaming Media Presentation
4.3.2.1 Recording and Capturing the Video (A)
4.3.2.2 Splitting the Video and Audio Extraction (B)
4.3.2.3 Video Positioning Required Before Superimposing the Presentation Slide (C)
4.3.2.4 Compositing the Slides and the Video of the Speaker (D)
4.3.2.4.1 Problems encountered when presentation slide and video file were treated separately
4.3.2.5 Low Bit-rate Video-Audio Encoding (E)

5 Summary, Conclusion and Recommendations
5.1 Summary of Findings
5.1.1 Visual Outcome at Various Stages of the Content Development
5.2 Conclusion
5.3 Recommendations

Bibliography
Curriculum Vitae


LIST OF TABLES

TABLE

1 Growth of Streaming Media Use
2 Recommended Streaming Rates
3 System Requirements for Encoding
4 (a) Listening-quality Scale
  (b) Listening-effort Scale
  (c) Loudness-preference Scale
5 Conversation Difficulty Scale
6 (a) Image Quality Scale
  (b) Image Impairment Scale
  (c) Double Stimulus Continuous Quality Scale
7 Listening-effort Scale
8 Image Impairment Scale
9 Comparative Data Between Original Non-streaming Media and Converted Streaming Media Formats
10 Summary of Converted Streaming Media Formats
11 Test Results for 56 Kbps Modem Users
12 Quality Assessment of Video-Audio and Bandwidth
13 Summary Results of the Different Codecs and Software


LIST OF FIGURES

FIGURE

1 An Architecture for Audio-Video Streaming
2 Conceptual Paradigm
3 Presentation Data Must Fit with Player’s Bandwidth
4 (a) Architecture of the AMOS System
  (b) Regions Inside the Object and Those Outside the Object are Both Tracked Over Time
5 Model-based Coding System
6 The Flow Chart of Streaming Media Presentation Development Procedure
7 Sample Video with Plain Background
8 Positioning the Video in Adobe Premiere 6.5
9 Output of Repositioning the Video
10 Superimposing PowerPoint Slide on the Speaker’s Video
11 Superimpose of the PowerPoint Slide in the Back of the Video
12 Encoding the Video and Audio in Window Media Encoder
13 After Encoding the Video File in .wmv Format
14 Video Frame in the Original Digital Movie
15 Output of Repositioning the Video
16 Superimpose of the PowerPoint Slide in the Back of the Video
17 After Encoding the Video File in .wmv Format


LIST OF APPENDICES

APPENDICES

A Sample Video Frame Encoded in Window Media Encoder
B Sample Video Frame Encoded in Vegas Video
C Sample Video Frame Encoded in Helix Producer Basic and Vegas Video
D Sample Video Frame Encoded in Quick Time Pro and Vegas Video
E Methodology For the Subjective Assessment of the Quality of Television Pictures
F Methods for Subjective Determination of Transmission Quality

THESIS ABSTRACT

1. Title: APPLICATION OF CHROMA KEY AND LOW BIT-RATE ENCODING TECHNIQUE IN INTERNET VIDEO STREAMING

Total Number of Pages: 127

Text Number of Pages: 88

2. Author: SUSHIL KUMAR SHARMA

3. Type of Document: Thesis

4. Type of Publication: Unpublished

5. Host / Accrediting Institution: Saint Louis University (Private), Bonifacio Street, Baguio City, CHED-CAR

6. Sponsor (for funded research): not applicable

7. Keywords: LOW BIT-RATE ENCODING, CHROMA KEY, VIDEO-AUDIO STREAMING

8. Abstract

8.1 Summary: The research presented an application of chroma-key and low bit-rate encoding techniques in Internet video streaming over a 56 kbps connection. For delivering video-audio streams over low-bandwidth communication channels, the low bit-rate encoding technique is effective for normal dial-up modem users. The chroma-key technique is used to combine the presentation material and the video-audio in a single frame. The design of the study is experimental. The data gathering tools are a 5-minute video-audio clip, encoding software and low bit-rate codecs.

8.2 Findings: The study provided an application of chroma-key and low bit-rate encoding techniques in Internet video streaming. It is possible to produce streaming media using the low bit-rate encoding technique that delivers video-audio effectively to users with normal dial-up modem connections. The chroma-key technique is helpful for packaging video-audio and presentation slides in a single frame. There are three major types of streaming media formats: (a) Windows Media Video/Audio (.wmv), (b) Real Video/Audio (.rm) and (c) QuickTime (.mov). Windows Media Encoder, Helix Producer, Vegas Video and QuickTime are the major encoding software packages used to develop streaming media.

8.3 Conclusions: The low bit-rate encoding and the chroma-key techniques can be used to effectively deliver the video-audio and presentation material. The chroma-key tool can be used to package video-audio and presentation material in a single frame for delivery from a web server. Based on a comparison of video-audio codecs and software, the combination of MPEG-4, ACELP.net and Windows Media Encoder is the best, and the combination of Real Video 9, Real Audio 8 and Helix Producer Basic is the second best.

8.4 Recommendations: Some of the limitations of this study, such as the rough edges of the speaker's video and the reduced audio quality, can be addressed by shooting the video footage in a well-lit room with good sound equipment, as found in studios. It is highly recommended that other researchers study the feasibility of using this type of content. Laboratory testing of video-audio quality is recommended. To obtain better quality of presentation material, it is highly recommended to use real-time mixing technology with audio-video and presentation material.

CHAPTER 1

Introduction

Internet technology is changing at a rapid pace. The faster the technology

changes, the more people expect from the Internet. Users were once

satisfied with only text and still images on web pages, but now they want video

delivered at high speed. They expect to see television quality and are dissatisfied when they

see anything less. Because of bandwidth issues, we are still several years away from that

reality, but Internet technology is changing on a daily basis. Internet streaming video is

one way to deliver video over the Internet. With streaming video, many daily

organizational tasks are simpler and cheaper. Designers can broadcast lectures, make

announcements, deliver seminars and show users how certain things work. Users can

view the content live and instantly, satisfying some of their thirst for fast, high-quality video.1

Many large organizations are using the technology to broadcast annual meetings

so that remote offices can view the meeting live, while those who have prior

commitments can record the meeting and view it at a later date. Besides

meetings viewed in the corporate setting, the Internet is also being used for training

purposes. Internet streaming video allows existing training applications to be

administered to a broad, geographically dispersed audience simultaneously, cutting the

costs associated with training.2

Another large user of streaming video is distance learning. This application is

spreading quickly throughout the educational system. Internet classes are administered


using this technology. Many large universities are broadcasting lectures and

archiving them so that students who miss a class are able to view the archives at their

convenience on their own computers. Micke O, Donoghu et al.3 concluded that the use of

PowerPoint material indexed to the presentation created a dynamic presentation

environment in which each medium (text, graphics, audio and video) was designed to

complement the others. Many users commented on the innovative use of the browser

within the presentation which not only provided a sense of personal involvement, but

also brought together other resources related to the discussion.

Chroma-key is a well-known video mixing technique, practiced for many

years in television production, for inserting foreground action shot at a different

location into a selected background scene. Initially, the main reason for using this

technique was the need of artistic directors to show people in places where filming

is hard to achieve. Later, the economic advantages also encouraged filmmakers to use

chroma-key. The next step was the use of synthetically generated

backgrounds, first hand-constructed and painted and later generated by computers, to put

actors into locations which do not exist in reality. Chroma-key is also the basis for

performing the mixing in a virtual studio. However, one essential requirement for a

virtual studio is the ability to combine foreground and background images in

such a way that actors from the foreground image can be covered by set components from

the background image.
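The keying step described above can be illustrated with a minimal sketch. It is not the procedure used in this study, which relied on a video editing tool; the function name `chroma_key`, the pure-green key color and the distance threshold are assumptions chosen for the example, and frames are modeled simply as nested lists of (R, G, B) tuples.

```python
def chroma_key(foreground, background, key=(0, 255, 0), tolerance=100):
    """Replace foreground pixels close to the key color with background pixels.

    Frames are lists of rows; each row is a list of (R, G, B) tuples.
    Both frames are assumed to have the same dimensions.
    """
    tol_sq = tolerance * tolerance
    output = []
    for fg_row, bg_row in zip(foreground, background):
        out_row = []
        for fg_px, bg_px in zip(fg_row, bg_row):
            # Squared Euclidean distance between the pixel and the key color.
            dist_sq = sum((c - k) ** 2 for c, k in zip(fg_px, key))
            out_row.append(bg_px if dist_sq <= tol_sq else fg_px)
        output.append(out_row)
    return output


# A 1 x 3 "speaker" frame: green screen, skin tone, slightly darker green.
speaker = [[(0, 255, 0), (200, 150, 120), (10, 240, 15)]]
# A 1 x 3 "slide" frame: plain white.
slide = [[(255, 255, 255)] * 3]
composite = chroma_key(speaker, slide)
```

In the composite, the two green pixels are replaced by the slide and the skin-tone pixel is kept. A hard threshold of this kind is one plausible source of rough edges around the speaker; production tools typically soften the boundary instead of switching pixels outright.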

Other applications include those associated with websites. Many web developers

are incorporating streaming video-audio into websites to capture the attention and time of

visitors to their sites.


Until recently, audio and video on the web were primarily a download-and-play

technology. We had to first download an entire media file before it could play. It was

like pouring milk into a glass first and then drinking it. Because media files are

usually very large and take a long time to download, the only content found on the web

consisted of clips lasting 30 seconds or less. Even these files took 20 minutes or longer to

download.4 It has become evident that the future of digital media on demand is bright.

Educators are presented with new possibilities but also face new challenges.

However, as bandwidth becomes less problematic and compression technology

more efficient, it is clear that streaming media will become a significant delivery mode

for video-audio instruction as well as presentation materials.
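The download-and-play figures quoted above (a 30-second clip taking 20 minutes to download) can be checked with simple arithmetic. The sketch below assumes a 56 kbps link running at its nominal rate with no protocol overhead; the text does not state the link speed behind those figures, so the value is illustrative, and the function names are hypothetical.

```python
def transfer_time_seconds(size_bytes, link_bps):
    """Ideal time to move a file over a link, ignoring overhead and line noise."""
    return size_bytes * 8 / link_bps


def implied_clip_bitrate_bps(clip_seconds, download_seconds, link_bps):
    """Average bit rate of a clip that takes download_seconds to fetch at link_bps."""
    return link_bps * download_seconds / clip_seconds


LINK_BPS = 56_000  # assumed nominal dial-up rate

# A 30-second clip that needs 20 minutes to download.
clip_bps = implied_clip_bitrate_bps(30, 20 * 60, LINK_BPS)
clip_size_bytes = clip_bps * 30 / 8
```

Under this assumption the clip is encoded at about 2.24 Mbit/s, forty times the link rate, and occupies about 8.4 MB. A clip encoded at or below the link rate could instead be played while it is being received.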

Beyond traditional applications such as interactive terminals (telnet), bulk file transfers

(email, FTP) and the World Wide Web, the Internet is becoming an attractive medium for a broader

spectrum of applications. Applications that rely on the real-time delivery of data,

such as video-conferencing tools, Internet telephony, and streaming audio and video players,

are gaining prominence in the Internet application space. In particular, the streaming

media application has considerable potential to change the way people watch video.

While most people today think of sitting in front of a television to watch a movie, the

ability to deliver high-quality streaming video would allow the Internet to compete with

traditional modes of video content distribution.

Using the Internet as a medium for transmission of real-time interactive video is a

challenging problem. Although recent efforts have been made and some progress has

been achieved in terms of streaming media delivery, today’s solutions are proprietary,

inflexible, and do not provide the user with a pleasant viewing experience.5 In general,


current streaming video applications deliver low-quality pictures and require a large

amount of buffering. Therefore, they neither allow high user interactivity nor respond

well to the changing conditions on the Internet. Because of these problems, end users at remote

locations cannot view video presentations clearly on their desktops, especially those

who have a 56 kbps connection.

The importance of visual communication has increased tremendously in the last

decade. The progress in micro-electronics and computer technology together with the

creation of networks operating with various channel capacities is the basis of an

infrastructure for a new era of telecommunications. New applications are preparing a

revolution in the everyday life of modern society. Emerging applications such as

videoconferencing, cellular videophones and multimedia will have a great impact on

professional life, education and entertainment. The digital representation of

visual information in its canonical form leads to a huge amount of data. In order to

meet the requirements of the new applications, powerful image sequence compression

techniques are needed to drastically reduce the global bit rate.

A number of standards have been defined for the compression of visual

information. The JPEG still-image compressor was proposed by the Joint Photographic

Experts Group and is also a general-purpose image compression standard. The Moving

Picture Experts Group (MPEG) standards address the compression of video signals.

MPEG-1 operates at bit rates of about 1.5 Mbit/s and targets storage and transmission

over communication channels such as the Integrated Services Digital Network (ISDN) or a

local area network (LAN). MPEG-2 operates at bit rates around 10 Mbit/s and is

designed for the compression of higher-resolution video signals. Recommendation

H.261 was proposed by the International Telegraph and Telephone Consultative

Committee (CCITT). Based on this standard, videoconferencing at bit rates of 64 kbit/s has become

feasible. This requires the capacity of one ISDN channel. In the near future,

modern visual communication applications will become available to the general public. For

that purpose, the transmission media must extend to the Public Switched Telephone

Network (PSTN) or mobile channels. The transmission of video sequences at bit

rates as low as 9.6 kbit/s will be strongly needed. Efforts to define new standards for

these applications are still in the beginning phase. Several expert groups have been

created to pursue this objective. The major ones are ISO/MPEG-4 and ITU-T/H.26P.6

An uncompressed video sequence for very low bit-rate applications typically

requires a bit-stream of up to 10 Mbit/s. In order to achieve very low data rates,

compression ratios of about 1000:1 are required to meet the needs of the general public.7
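The 1000:1 figure follows directly from the two bit rates quoted above. The short sketch below reproduces the arithmetic; the QCIF source format used for the second calculation (176 x 144 pixels, 12 bits per pixel, 30 frames per second) is an assumption introduced for illustration and is not stated in the text.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Bit rate of an uncompressed video sequence, in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def compression_ratio(source_bps, target_bps):
    """How many times the source bit rate must be reduced to reach the target."""
    return source_bps / target_bps


# Figures quoted in the text: a 10 Mbit/s source and a 9.6 kbit/s channel.
ratio_from_text = compression_ratio(10_000_000, 9_600)

# Assumed source format: QCIF with 4:2:0 chroma subsampling at 30 frames/s.
qcif_bps = raw_bitrate_bps(176, 144, 12, 30)
ratio_from_qcif = compression_ratio(qcif_bps, 9_600)
```

With the quoted figures the required ratio is roughly 1042:1; with the assumed QCIF source (about 9.1 Mbit/s) it is roughly 950:1. Both are consistent with the order of magnitude given above.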

Today’s streaming applications are closed and proprietary; the emerging MPEG-4

standard has gained some acceptance and appears promising as an open standard for Internet

video.8 It is believed that MPEG-4 has the potential to make significant inroads as

the preferred streaming media format over the next few years because of its superior

compression, its ability to code individual objects in a video stream, and the increasing

interest in it within the industry. It also incorporates several feature-based schemes for very low

bit-rate coding: FDP and FAP sets (Facial Definition and Facial Animation Parameters).

These can, for example, be used for videophone communication requiring lower

bandwidth than was ever imaginable with traditional stream-coding techniques. Feature-based

media coding is not yet widely available, and it is not expected to become popular except in a

limited number of application areas.


In the context of delivering video-audio and presentation material over the Internet,

users at remote locations expect a rich viewing environment and a pleasing

presentation on their desktops. With streaming media, quick playback can be

achieved because the player can start playback without waiting for the whole media file to

be downloaded to local storage. To achieve quick playback over

today’s narrow communication channels, low bit-rate audio-video coding is a natural

choice.

The researcher focused on the procedure for developing presentation content for a

rich viewing environment and on delivering that material under low-bandwidth conditions

while maintaining reasonable quality of the slides on the client’s desktop at the

remote location. The procedure is coupled with streaming media technology to deliver the

presentation material and audio-video using the existing web server infrastructure.

1.1 Conceptual Framework

Recent advances in computing technology, compression technology,

high-bandwidth storage devices, and high-speed networks have made it feasible to provide

real-time multimedia services over the Internet. Real-time multimedia, as the name implies,

has timing constraints. For example, audio and video data must be played out

continuously. If the data do not arrive in time, the playout process will pause, which is

annoying to human ears and eyes.

Real-time transport of live or stored video is the predominant part of real-time

multimedia. There are two modes for the transmission of stored video over the Internet,

namely, the download mode and the streaming mode (i.e., video streaming). In the download

mode, a user downloads the entire video file and then plays back the video file. However,

full file transfer in the download mode usually suffers from a long and perhaps unacceptable

transfer time. In contrast, in the streaming mode, the video content need not be

downloaded in full, but is played out while parts of the content are being received

and decoded. Due to its real-time nature, video streaming typically has bandwidth, delay

and loss requirements. However, the current best-effort Internet does not offer any

quality of service (QoS) guarantees to streaming video.9 In addition, it

is difficult to support multicast video efficiently while providing service

flexibility to meet a wide range of QoS requirements from users. Thus,

designing mechanisms and protocols for Internet streaming video poses many

challenges.
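The difference between the two modes can be made concrete with a simplified model. The sketch below assumes a constant encoding bit rate, a constant link rate and no packet loss, none of which holds on the real Internet; the function names and the example rates are hypothetical.

```python
def download_wait_seconds(duration_s, encode_kbps, link_kbps):
    """Download mode: the whole file must arrive before playback starts."""
    return duration_s * encode_kbps / link_kbps


def min_startup_delay_seconds(duration_s, encode_kbps, link_kbps):
    """Streaming mode: shortest initial buffering that avoids a mid-clip pause."""
    if encode_kbps <= link_kbps:
        # The link keeps up with playout, so no pre-buffering is needed
        # in this idealized model.
        return 0.0
    # Otherwise the buffer drains during playout; enough data must be
    # received in advance to cover the shortfall over the whole clip.
    return duration_s * (encode_kbps / link_kbps - 1)


# A 5-minute clip over a link that sustains 40 kbps (assumed value).
low_rate_download = download_wait_seconds(300, 34, 40)        # 255.0
low_rate_streaming = min_startup_delay_seconds(300, 34, 40)   # 0.0
high_rate_download = download_wait_seconds(300, 100, 40)      # 750.0
high_rate_streaming = min_startup_delay_seconds(300, 100, 40) # 450.0
```

In this model a clip encoded below the sustained link rate plays without pre-buffering when streamed but still requires a wait of more than four minutes in download mode, which is why a low encoding bit rate matters for dial-up users.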

[Figure 1. An Architecture for Audio-Video Streaming. Streaming server: raw video and raw audio pass through video compression and audio compression into a storage device holding compressed video and compressed audio, followed by application-layer QoS control and transport protocols. The packets cross the Internet to the client/receiver: transport protocols, application-layer QoS control, video and audio decoders, and media synchronization.]

Figure 1 shows an architecture for audio-video streaming. Raw video and audio

data are pre-compressed by video compression and audio compression algorithms and

then saved in storage devices. Upon the client's request, a streaming server retrieves

compressed video/audio data from storage devices, and then the application-layer quality

of service control module adapts the video/audio bit-streams according to the network

status and quality of service requirements. After the adaptation, the transport protocols

packetize the compressed bit-streams and send the video/audio packets over the Internet.

Packets may be dropped or may experience excessive delay inside the Internet due to

congestion. To improve the quality of video-audio transmission, continuous media

distribution services (e.g., caching) are deployed in the Internet. Packets that are

successfully delivered to the receiver first pass through the transport layers and then

are processed by the application layer before being decoded at the video/audio decoder.

To achieve synchronization between video and audio presentation, media synchronization

mechanisms are required. As Figure 1 shows, six areas (video compression; application-layer

QoS control; continuous media distribution services; streaming servers; media

synchronization mechanisms; and protocols for streaming) are closely related and

are coherent constituents of the video streaming architecture.
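The packetizing step performed by the transport protocols, and the reassembly at the receiver, can be sketched as follows. The 6-byte header (a sequence number and a timestamp) is a hypothetical layout invented for this example; it is not the header of RTP or of any other real protocol, and loss and sequence-number wraparound are not handled.

```python
import struct

# Hypothetical header: 16-bit sequence number, 32-bit timestamp in milliseconds.
HEADER = struct.Struct("!HI")


def packetize(bitstream, payload_size, ms_per_packet):
    """Split a compressed bit-stream into packets with a small header each."""
    packets = []
    for seq, start in enumerate(range(0, len(bitstream), payload_size)):
        payload = bitstream[start:start + payload_size]
        header = HEADER.pack(seq & 0xFFFF, seq * ms_per_packet)
        packets.append(header + payload)
    return packets


def depacketize(packets):
    """Reorder packets by sequence number and strip the headers."""
    ordered = sorted(packets, key=lambda p: HEADER.unpack(p[:HEADER.size])[0])
    return b"".join(p[HEADER.size:] for p in ordered)


compressed = bytes(range(256)) * 4           # stand-in for 1024 bytes of media
sent = packetize(compressed, 500, 20)        # three packets: 500 + 500 + 24 bytes
received = depacketize(list(reversed(sent))) # arrival order scrambled by the network
```

The sequence number lets the receiver restore the order of packets that arrive out of sequence, and the timestamp is the kind of information a media synchronization mechanism would use to align audio with video.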

Since the early history of computing, our main communication medium with the

computer has been the console (screen). The input and output of text and bitmaps are natively

incorporated in the computer system. It is well understood that anything that can be

displayed on the monitor can also be captured from the screen and saved to a file.

Based on this idea, business presentation materials such as presentation slides can be

captured from the monitor and saved in different file formats.

[Figure 2. Conceptual Paradigm. Input: non-streaming media (presentation material; raw video and audio). Process: procedure in developing the presentation, using streaming media technology (low bit-rate encoding technique and chroma-key tool, based on an acceptable data rate on a 56 kbps modem). Output: streaming media with reasonable quality of video-audio and presentation slides; technologies and techniques of streaming media; procedure of developing streaming media.]

Traditional media content (audio/video such as VCD, VHS and Betamax movies)

was not optimized for delivery over the Internet because, at the time, the Internet was not so

popular and those formats were developed solely with specialized hardware devices in

mind. Now, with the rapid growth in multimedia technology, those traditional formats

can be captured in digital form and converted to any format we wish.

Due to the increased popularity of the Internet for communication and business,

and due to recent developments in Information Technology, large uncompressed

media files can be compressed to save disk space without compromising

the quality of the media content; they can also be converted to a

streaming format, which reduces waiting time on the client side.

High demand from the film industry for using the computer as a tool to generate special

effects has produced many technologies related to multimedia, including rich

graphical interfaces, sound and video. Professional computer tools are now available

for producing special effects through mixing, animation, sound effects, etc. Among mixing

technologies, chroma-key and luma-key are well-known techniques for

superimposing one graphical content over another.

The researcher aims to develop a procedure for applying the chroma-key and the

low bit-rate encoding technique in Internet video streaming. This conceptual framework

is presented diagrammatically in the conceptual paradigm shown in Figure 2.

The framework is based on an input-process-output model, where the non-

streaming media (composed of the raw video and audio, and the presentation slide) is the

input to the study. The process (which is determined in the study) is the procedure to

convert the non-streaming media to streaming media, which is the output of the study.

Different low bit-rate encoding techniques as well as the chroma-key tool are used in the

conversion. These streaming media technologies served as the independent variables.


The resulting streaming media is evaluated and compared in terms of video and audio

quality as well as the acceptability of the data rate in a 56 kbps modem. These

measurements are the dependent variables in the study.

In the delivery of video-audio and presentation material over the existing web infrastructure,

remote clients expect a rich viewing environment and a pleasing presentation. The

streaming media technology using the low bit-rate encoding technique resolves the issue of

quick playback: the player starts playing immediately without waiting for the

whole media file to be downloaded on the client side. The chroma-key tool is the

mixing technology that superimposes the video over the presentation material. In this study,

the researcher focused on the delivery of video-audio and presentation material in a single

frame using the low bit-rate encoding technique and the chroma-key tool for a normal dial-up

connection of 56 kbps.

A simplistic approach to low bit-rate video coding degrades the quality of

the presentation video, especially the slides being explained by the speaker. The

researcher focused on the procedure for developing presentation content for a rich viewing

environment and on delivering that material through a normal dial-up connection while

maintaining reasonable quality of the slides on the client’s desktop at the remote

location. Thus, a client must meet minimum hardware, software and network

transmission requirements, treated as controlled variables, in order to play the video and audio on the

client’s desktop. These variables influence the delivery of video-audio

over the network. The client needs at least:

• A 56 kbps Internet connection,

• A Pentium 166 MHz processor or equivalent,

• A minimum of 32 MB RAM,

• A 16-bit sound card (any type),

• Speakers,

• A color monitor with a color depth of 16 bits,

• A 256-color video display card,

• A screen resolution of at least 640 x 480 pixels,

• Any streaming video-audio player that supports the file format and codec version.

1.2 Statement of the Problem

Today’s video streaming technology provides various means to preserve content in

archives and reuse it whenever needed. However, most of the streaming

technology exploited so far targets entertainment media and is designed to be

used on high-bandwidth connections. It follows that most

people with regular low-bandwidth connections (usually up to 56 kbps) cannot enjoy this

technology.

Given the current economic condition of people, especially in

developing countries, it is hard to afford and maintain a personal computer and an Internet

connection for personal use. Even among the few who can afford it, most have dial-up

connections of up to 56 kbps. Technologies like DSL and ISDN are mostly used only by

ISPs (Internet Service Providers), Internet cafés and other businesses. It is not

feasible for an individual to maintain this type of connection for personal use due to its high

cost. Even though businesses use high-speed connections, the connection is

usually slow for the end user (depending on varying conditions) because the single

connection is branched to its maximum limit.

Keeping this scenario in mind, if an end user tries to watch the streaming video

content intended for high-speed connection, the following anomalies would be

encountered:

1. Video playback will be jerky due to the speed limitations.

2. Sometimes the required video frame may not arrive on time and may go blank

during playback.

3. If the end user’s computer is below the standard for video playback (in memory,

processor speed, etc.), the computer may take time to decode

the high-resolution video, which causes rough and slow playback.

4. Sometimes all the anomalies mentioned could be encountered even if the

requirements are met, because network congestion might occur due to the growing

number of users connected to the Internet.

From the points just mentioned, it is natural for someone to think of the solution

for this problem and experiment with the possible available technologies to at least

overcome these kinds of problems. We naturally ask:

1. How can video-audio and presentation be packaged into a single frame and be

delivered effectively over a normal dial-up connection?

2. What are the best technologies and techniques for developing streaming media that

use a low bit-rate encoding technique and the chroma-key tool?

3. What is the procedure in developing a presentation using the streaming media

technology?

1.3 Objectives of the Study

This research aims to provide quality video-audio presentation material to the

clients via web infrastructure using available technology. Specifically, its main

objectives are as follows:

1. To develop a presentation of reasonable quality that contains video and audio

together in a single frame through streaming media technology.

2. To investigate and experiment with low bit-rate encoding and chroma-key

technologies and techniques for developing streaming media containing video-audio

and presentation.

3. To define a procedure for developing a presentation using streaming media

technology.

1.4 Scope and Delimitation of the Study

This study centered on the issue of uninterrupted playback while maintaining

reasonable quality of the slides on the remote client’s desktop. Video streaming here

refers to stored video delivered to users at remote locations.

This research mainly focused on the normal dial-up Internet connection. It was

implemented on a Windows platform because of its friendly interface and because

it is widely used throughout the world. This study was tested for research

purposes only. Furthermore, this study did not explain the technical part regarding

uploading video-audio and presentation material on the server side.

1.5 Significance of the Study

There is no doubt that a revolution in technology-enhanced educational practices

is upon us. Institutions worldwide are leveraging new technologies in Internet audio and

video to maximize the impact of professors’ time, energy, creativity and intellect to reach

students in more efficient and effective ways.

This study is mainly beneficial to professionals and students who are

involved in distance education. A student at a remote location can clearly view the

classroom lecture video and the presentation slides of the lecture on his/her desktop.

This research is also important to business firms for marketing purposes,

because it reduces download time and delivers presentation slides of reasonable

quality, thus saving time and effort when publishing streams on a

website. Providing on-demand information is a powerful marketing tool that can help

businesses communicate.

This study will also benefit research activities on streaming media and

distance education. It may serve as an introductory piece for researchers interested in

these areas.

1.6 Definition of Terms

Terms used throughout the study are as follows:

ACELP: Algebraic-Code-Excited Linear Prediction (ACELP) technology is a

standard in low-bit rate speech compression within wireless and Internet applications.

AMOS: Active MPEG-4 Object Segmentation System (AMOS) is a system that

effectively combines an automatic region segmentation method with an active method for

defining and tracking video objects at a higher level.

Annoying: Video and text that are unpleasant to watch.

Bit-rate: The rate at which the encoded bitstream is delivered from the storage

medium to the input of a decoder.

Bandwidth: Amount of data a given connection can pass in a given amount of time.

Blur: The video image becomes less clear or distinct; marks on the video make it

unclear.

Buffering: The situation which occurs when a streaming media player is saving

small portions of a streaming media file to local storage for playback.

Capture: It is the process to digitize and record audio and video content from an

analog format.

Chroma-key: A video mixer-based electronic effect, in which a second image

source is substituted for a color (or range of shades within a color) within a video shot.

Compression: Process by which files are reduced in size to save space by the

removal of redundant or less important data. Compressed media files are then

decompressed on the user's end.

Codecs: Compressor/decompressor algorithms that convert audio and video files into

streaming media files by removing redundant data, and decode them for playback.

Effective Delivery: Delivery of the video-audio and

presentation material over a 56 kbps connection line while maintaining reasonable quality of

the presentation slides for remote users.

Encoding: Compressing audio and video to turn it into streaming media format.

Frame Rate: Number of video frames displayed per second.

HTTP: The HyperText Transfer Protocol is an application level protocol designed

for distribution of hypertext and multimedia documents over the World Wide Web.

Imperceptible: The degradation of video quality is barely noticeable.

Jerky: Motion that is not smooth, or interruptions that occur while playing video-audio.

Jitter: The variable network latency, normally caused by the queuing of packets

at each router between source and destination.

Low bit-rate video coding: The video compression technique which outputs a

coded video stream of not greater than 64 kbps bit-rate.

MPEG: The Moving Picture Experts Group (MPEG) standards are used for coding and

compressing video.

Multimedia Presentation: The integrated presentation of text, audio and video in

a single frame.

Network Congestion: When network equipment, including switches and routers,

experiences more traffic than it can handle, performance degrades. This

situation is called network congestion.

Non-Streaming Media: Video-audio that cannot be played on the desktop without

first downloading the entire file over the Internet or another network.

QoS: Quality of Service (QoS) provides guarantee on the ability of a network to

deliver predictable results. Elements of network performance within the scope of QoS

often include availability (uptime), bandwidth, latency, and error.

Reasonable Quality: It means users can read text and watch the video and listen

to the audio without any extra effort.

Real-Time Delivery of Data: If the actual rate of data delivery matches the rate of

data consumption, then it is called real-time delivery of data.

RTSP: The Real Time Streaming Protocol or RTSP, is an application level

protocol for control over the delivery of data with real-time properties. RTSP provides an

extensible framework to enable controlled, on-demand delivery of real-time data, such as

video and audio from live data and stored clips. The protocol is intended to control

multiple data delivery sessions, provide a means for choosing delivery channels such as

UDP (User Datagram Protocol), multicast UDP and TCP (Transmission Control

Protocol), and provide a means for choosing delivery mechanisms based upon RTP

(Real-time Transport Protocol).

Superimposing: It is the process of combining two images to yield a resulting,

enriched image.

Streaming Media: The process of sending media over the Internet or another network,

allowing playback on the desktop as the media is received, rather than requiring that the

entire file be downloaded prior to playback.

Streaming Video: A sequence of moving images, with sound, that is sent in compressed

form over the Internet and displayed by the viewer as it arrives.

TCP: Transmission Control Protocol (TCP) is a reliable, connection-oriented, end-to-end

protocol. It breaks data up into chunks that never exceed 64K bytes and sends each as

a separate IP datagram.

UDP: User Datagram Protocol (UDP). It is an unreliable, connectionless

transport protocol. There is no ordering of packets, no retransmission of lost or damaged

packets, and no splitting of data into packets.

NOTES

1“A review of Video Streaming over the Internet”. SuperNOVA Project. DSTC Teaching Report. <http://archieve.dstc.edu.au/RDUstaff/Jane-hunter/videostreaming.html>, 10 August 1997. Accessed in August, 2002.

2Cunningham, David and Francis, Neil. “An introduction to Streaming Video”. Cultivate Interactive, Issue 4, <http://www.cultivate-int.org/issue4/video/> 7 May, 2001. Accessed in August, 2002.

3Micke O,Donoghu, et al., The Role of Streaming Media in the delivery of distance learning, Pre-Submission Draft – Lancaster University, August 2000.

4Streaming Methods. Web Server Vs. Streaming Media Server. <http://www.microsoft.com/Windows/windowsmedia/compare/webservvstreamserv.asp> Last updated Thursday, March 21, 2001. Accessed in August, 2002.

5Microsoft Windows Media Player <http://www.microsoft.com/windows/mediaplayer> June 2002. Accessed in July, 2002.

6O. Egger, et al., Very Low Bit Rate Coding of Visual Information, Swiss Federal Institute of Technology at Lausanne, Switzerland. 1996.

7T. Ebrahimi, et al., New Trends in Very Low Bitrate Video Coding, Proceedings of the IEEE, July 1995.

8Divx Networks. <http://divxnetworks.com/,2001> (2001). Accessed in September, 2002.

9Dapeng Wu, et al., Streaming Video over the Internet: Approaches and Directions, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 1. February 2001.

10Micke O,Donoghu, Michael Barber and Steve Childs, op. cit.

CHAPTER 2

Study of Related Literature

Video-audio streaming technology is important to individuals and organizations,

especially remote users. It is an emerging technology for delivering video-audio and

presentation material over the Internet. This chapter reviews the materials related to

streaming media, chroma-key and low bit-rate encoding techniques. Section 2.1 covers

the delivery of audio-video, Section 2.2 discusses technologies and techniques related to

audio-video, and Section 2.3 takes a quick look at the audio-video

software and hardware used in developing streaming media.

2.1 Delivery of Video-Audio

2.1.1 Video-Audio Streaming. An emerging, less expensive alternative to full

broadcast video is “streaming” sound and video over the Internet. The video-audio is

streamed through the computer and displayed on the screen without needing to be saved

onto its hard drive, as most hard drives are not large enough to store a whole video

programme. Streaming video-audio can also be broadcast simultaneously around the

world (called streaming or webcasting), or archived and accessed as required.

Video-audio streaming technology has become of interest and importance to the

educational community. Such interest can be explained by widespread increases in

bandwidth and computation, but it may also be fueled by institutional needs to

increase income by widening participation through the use of information and

communication technology, or to realize lifelong learning policies, or perhaps to develop

cost-effective flexible resources to support off-campus learning.

By using streamed media, it is possible for a user with a suitable computer, web

browser and media player to receive live and recorded video and audio materials without

the need to download large video and audio files or have an ISDN connection. Microsoft

have made use of this technology to support users and developers through a number of

live information and Q&A sessions; the BBC and ITN currently make use of it on their

web sites to show news clips. Television has been used to support educational activities

since the 1950s and 1960s, but the cost of producing television programmes has been

prohibitive to many educational establishments1. Satellite broadcasting has been used to

support a range of learning activities across the world, and though transmission uplink

and equipment reception costs have continued to decrease, the cost of producing

programmes and materials has remained a barrier to widespread use2. Using broadcast

materials within a networked learning framework has been reported to motivate and

attract adult learners, though there are issues surrounding pedagogic design and

embedding educational practices which need careful consideration.3

Streaming media is a new technology that has entered the distance learning arena.

The technical requirements and issues of production have overshadowed the important

issues related to change in the educational community. How we use it will determine its

future as a viable delivery method for distance learners.

“Integrating a pedagogical framework based on well-established learning theories

through the design of a Web-delivered classroom encounter is key to establishing many

potential roles of streaming media as a delivery method”4. We need to develop

“appropriate instruction technology frameworks” for streaming media which will lay the

“foundation for educational change and deep learning”.5

In addition, Sircar6 argues that the streaming media via the Web can remove the

differences between learning onsite and learning online and that the power of the stream

to deliver anytime, anywhere learning makes it economically viable. The traditional

lecture-test-homework paradigm does not exist when instructional technologies are used

to enhance problem-solving skills, collaboration, and interaction. Video technologies can

be examined within the framework of learner centered principles. Streaming allows us to

restructure the delivery of content by creating small units of instruction that can reflect

best practices.

Streaming media is pervasive on the Internet now and is continuing to grow

rapidly. Most streaming media systems have adopted the model of broadcast. As shown in

Table 1, streaming media was used by 9% of enterprises in 1998, 17% in 1999 and

30% in 2000.

Table 1

Growth of Streaming Media Use 6

Year    Streaming Media Use
1998    9% of enterprises
1999    17% of enterprises
2000    30% of enterprises

By 2004, streaming media services will grow 20-fold to $2.5 billion.

2.1.2 Congestion Control. Loss and excessive delay have a devastating effect on

video presentation quality, and they are usually caused by network congestion. Thus,

congestion control mechanisms at end systems are necessary to help reduce packet loss

and delay.

Congestion control takes the form of rate control7. Rate control is a technique

used to determine the sending rate of video traffic based on the estimated available

bandwidth in the network. There are three kinds of rate control: source-based,

receiver-based and hybrid. Source-based rate control is suitable for unicast; receiver-based

and hybrid rate controls are suitable for multicast video8.
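
As a minimal sketch of source-based rate control, the function below adjusts a sender's rate from feedback about packet loss. The additive-increase/multiplicative-decrease rule, the name next_send_rate and all parameter values are illustrative assumptions; they are not taken from the works cited above.

```python
def next_send_rate(current_kbps, loss_detected, step_kbps=2.0,
                   backoff=0.5, min_kbps=8.0, max_kbps=34.0):
    """Return the sending rate for the next interval (hypothetical rule).

    Packet loss is treated as a sign of congestion, so the rate is cut
    multiplicatively; otherwise the rate probes upward by a small step.
    """
    if loss_detected:
        rate = current_kbps * backoff
    else:
        rate = current_kbps + step_kbps
    # Keep the rate inside the range the encoder is assumed to support.
    return max(min_kbps, min(max_kbps, rate))
```

Calling the function once per feedback interval lets the sender back off quickly when congestion appears and recover gradually afterwards.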

2.1.3 Transport Protocol9. Streaming audio and video packets can be delivered

using several transport protocols, each with some advantages and disadvantages.

UDP is the best choice in most cases, because of its lack of retransmissions and

data-rate management. This is ideal for transmitting real-time audio and video data,

which can tolerate some lost packets and need a steady stream of data. Most streaming

servers and proxies implement intelligent retransmission schemes on top of UDP, so that

only lost packets that can be sent to the client in time to get played are retransmitted.
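
The retransmission decision described above can be sketched as a single comparison. The function name, the millisecond units and the assumption that a resend costs about one round trip are hypothetical choices made for illustration only.

```python
def should_retransmit(now_ms, playout_deadline_ms, round_trip_ms):
    """Return True if a lost packet can still arrive before it is played.

    The request and the resent packet together are assumed to take about
    one round trip, so a resend is useful only if it fits before the
    packet's playout deadline.
    """
    return now_ms + round_trip_ms <= playout_deadline_ms
```

A packet due for playback in 500 ms is worth resending over a 300 ms round trip, while one due in 200 ms is simply skipped.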

TCP provides an adequate, though not necessarily efficient, protocol for

delivering streaming media content. Its slow-start and its automatic retransmission of

lost packets add unnecessary overhead without improving quality. However, TCP traffic

is much more likely to pass through a firewall than UDP.

2.1.4 Bandwidth. Bandwidth is a big issue and is being addressed by Internet

service providers who are offering distributed servers so that users get local performance.

“The services point web traffic to the network hub closest to the user’s location, thereby

reducing the number of router hops, a packet must make and circumventing the already

jammed national and international Internet backbones”10.

Most users do not have adequate bandwidth to receive streaming video at an

acceptable quality, and won’t have it until around 200311.

When the requested video does not stream quickly enough, the presentation is not

smooth. Those connecting at less than T1 speeds will see “choppy, ‘freeze frame’”

pictures.12

The issue of bandwidth is an important one. In the most basic sense, bandwidth

can be defined as the amount of information that can be moved at one time. A good

analogy is that of water passing through a funnel. If the funnel has a wide opening at the

bottom a lot of water can pass through at once. If we fill the funnel at a rate faster than it

can pour out the bottom, then we have exceeded the available bandwidth. If the

bandwidth is limited then the transmission of data can be delayed. This leads to long

waits for the client as information is downloaded.

In the case of streaming video or audio this becomes a crucial point because the

player is interpreting the data stream as it is received. If the information is delayed, the

playback will either skip over the lost data packet or wait for its buffer to fill with enough

packets to continue. Therefore, we have to ensure that the bit rates of encoded files do

not exceed the target audience’s connection speed. Web users with 56 kbps modems, for

example can view only those presentations that stream less than 56 kb of data per second.

Presentations that stream more than that per second may stall because the data cannot get

over the modem fast enough to keep the clip flowing as shown in figure 3.

Figure 3

Presentation Data Must Fit with Player’s Bandwidth13

[Figure labels: Server; Player; 56 Kbps of Data Over 28.8 Kbps Modem; 28.8 Kbps Modem; 56 Kbps Modem; Stalled Presentation]

Streaming presentations should never consume all of the audience's connection

bandwidth. They must always leave bandwidth for network overhead, error correction,

resending lost data, and so on. Otherwise, they may require frequent re-buffering. Table

2 recommends maximum streaming speeds for common network connections. To reach

56 Kbps modems, for example, a presentation should stream no more than 34 Kb of data

per second.14

Table 2

Recommended Streaming Rates

Target Audience        Maximum Streaming Rate
14.4 Kbps modem        10 Kbps
28.8 Kbps modem        20 Kbps
56 Kbps modem          34 Kbps
64 Kbps ISDN           45 Kbps
112 Kbps dual ISDN     80 Kbps
Corporate LAN          150 Kbps
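
The recommendation in Table 2 can be applied as a simple budget check before encoding. The sketch below is illustrative only: the function names are hypothetical, the split between video and audio rates is an assumption, and the download-time helper ignores protocol overhead.

```python
# Maximum streaming rates from Table 2, in Kbps.
RECOMMENDED_MAX_KBPS = {
    "14.4 Kbps modem": 10,
    "28.8 Kbps modem": 20,
    "56 Kbps modem": 34,
    "64 Kbps ISDN": 45,
    "112 Kbps dual ISDN": 80,
    "Corporate LAN": 150,
}


def fits_connection(video_kbps, audio_kbps, audience):
    """Check whether video plus audio stays within the Table 2 limit."""
    return video_kbps + audio_kbps <= RECOMMENDED_MAX_KBPS[audience]


def download_seconds(file_kilobytes, link_kbps):
    """Time to fetch a whole file first: kilobytes * 8 bits / link rate."""
    return file_kilobytes * 8 / link_kbps
```

For a 56 Kbps modem audience, 26 Kbps of video with 8 Kbps of audio just fits the 34 Kbps budget, whereas 30 Kbps of video with the same audio does not.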

2.1.5 Chroma-Keying Technique. Keying means electronically cutting out

portions of a television picture and filling them in with another image.15 Chroma-key is a

special effect that uses color (chroma) for keying. Basically, the chroma-key process

uses a specific color, usually green or blue, for the background over which the keying

occurs. The green/blue becomes transparent during the keying and lets the picture of a

second source show through, without interfering with the foreground image.
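
The keying step can be sketched on small in-memory images. This is a simplified illustration, not the procedure used in this study: the function name, the pure green default key, the Euclidean distance test and the tolerance value are all assumptions.

```python
def chroma_key(foreground, background, key=(0, 255, 0), tolerance=60):
    """Composite two equally sized RGB images given as lists of pixel rows.

    A foreground pixel whose colour lies within `tolerance` (Euclidean
    distance in RGB space, an assumed similarity measure) of the key colour
    is treated as transparent and replaced by the background pixel.
    """
    result = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for fg, bg in zip(fg_row, bg_row):
            distance = sum((a - b) ** 2 for a, b in zip(fg, key)) ** 0.5
            row.append(bg if distance <= tolerance else fg)
        result.append(row)
    return result
```

Here the speaker video plays the role of the foreground and the presentation slide the role of the second source that shows through the keyed area.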

Blue screen chroma-keying is a technique widely used in video production to

separate the objects in the foreground from a particular background whose color is

usually blue or green. The separated object can then be digitally composited on top of a

virtual background. Blue-screen chroma-keying and model-based foreground/background

segmentation are techniques to separate the foreground from the background.

This technique has been adopted by organizations such as the United States Marine

Corps, which uses it for ground combat training and simulation applications. The technique

enables the insertion of real objects within the visual frame of a Head-Mounted Display

(HMD). It allows individuals and actual equipment, such as maps, weapons, and other

items, to be effectively inserted into the visual display of a simulated environment.16

2.1.6 AMOS (Active MPEG-4 Object Segmentation System). To support

highly interactive functionalities in future multimedia applications, the MPEG-4

standard proposed an object-based coding representation of audio-visual data. Compared

with MPEG-1 and MPEG-2, which provide efficient compression of conventional image

sequences, MPEG-4 provides great potential for content-based search of video data, but

video object segmentation and content-based search techniques remain challenging

tasks.17

To solve these problems, AMOS18 (Active MPEG-4 Object Segmentation

System) offers an innovative method that combines low-level automatic region segmentation

and tracking methods with an active method for defining and tracking video objects at a

higher level. The architecture of the AMOS software is shown in figures 4(a) and 4(b).

The system allows users to identify a semantic object by using a mouse in the

starting frame of a video object. The object is defined by an outline polygon whose

vertices and edges are roughly along the desired object boundary. To tolerate user input

error, a snake algorithm19 is used to align the user-specified polygon to the actual object

boundary. The snake algorithm is based on the minimization of a specific energy function

associated with edge pixels. Users may also choose to skip the snake module if a

relatively accurate outline is already provided. Users can then start the object tracking

process by specifying a few parameters such as color threshold and motion threshold.

At any frame, users may stop the tracking process to refine the object boundary,

change the tracking parameters and resume the tracking process. The original footage

could be recorded in a controlled or uncontrolled manner.

Under controlled recording, the footage is recorded in a well-prepared

manner, i.e., all the settings, such as the background for the shooting, are fixed, and the

speaker has practiced his/her lecture beforehand and is ready for the shooting.

Under uncontrolled recording, the footage has to be recorded in real time,

without any preparation, while the speaker is giving the presentation. In this situation,

there is no control over the background (color, light, etc.) or the flow of the speech.

An example would be the recording of footage in a seminar, where the lecture material

is given out after the seminar. This case is considered here because the same

presentation material and content may need to be converted into

a presentation video for remote users for various reasons (such as the difficulty of

arranging the presentation again for shooting in a prepared manner, budget, etc.).

If the video is recorded in an uncontrolled manner and its background has to be

removed, the AMOS software is used. It is an active object segmentation and tracking

system for general video.

Figure 4(a)

Architecture of the AMOS System.

[Figure labels: Starting frame; Succeeding Frames; Object Definition (User Input); Region Segmentation; Homogeneous Region; Region Aggregation; Motion Projection; Region Tracking; Video Object]

Figure 4(b)

Regions Inside the Object (foreground regions) and those Outside the Object (background regions) are Both Tracked Over Time.

[Figure labels: Foreground region; Background region; Video object]

The input video can be a PPM (Portable Pixel Map) sequence or MPEG motion

picture. The system generates a binary-mask in the PGM (Portable Gray Map) format for

the tracked object at each frame. To integrate the two frames and implement the

chroma-key, a small program has to be written in ANSI C. It takes the PPM files generated

by the preprocessing step and the PGM files generated by the active object segmentation

step as input. It treats a pixel in a PPM file as a background pixel if the corresponding PGM

file contains a black pixel at the same position, and replaces that pixel with the

chroma-key color. Otherwise, the pixel in the PPM file is left unchanged.
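
The per-pixel rule above can be sketched as follows. The text describes an ANSI C program; this Python version only illustrates the rule and is not that program. It assumes the PPM frame and PGM mask have already been decoded into rows of values, and the function name and the blue default key color are hypothetical.

```python
def apply_mask(ppm_pixels, pgm_mask, key_color=(0, 0, 255)):
    """Replace background pixels with the chroma-key colour.

    ppm_pixels: rows of (r, g, b) tuples decoded from one PPM frame.
    pgm_mask:   rows of grey values decoded from the matching PGM mask,
                where 0 (black) marks a background pixel.
    """
    if len(ppm_pixels) != len(pgm_mask):
        raise ValueError("frame and mask must have the same height")
    output = []
    for pixel_row, mask_row in zip(ppm_pixels, pgm_mask):
        if len(pixel_row) != len(mask_row):
            raise ValueError("frame and mask must have the same width")
        # Black mask value -> background -> key colour; otherwise keep pixel.
        output.append([
            key_color if mask_value == 0 else pixel
            for pixel, mask_value in zip(pixel_row, mask_row)
        ])
    return output
```

The output frame then has a uniform key-colored background, so an ordinary chroma-key tool can superimpose it over the presentation slide.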

2.2 Technology and Techniques Related to the Video and Audio

2.2.1 Streaming Media Technology. Real-time audio-video signals present

special needs for network transmission. Audio/video is time-critical and it needs Quality

of Service (QoS) transmission. The main problems in Internet real-time audio-video are

latency (network delay) and maintaining the bandwidth.

The current basic Internet architecture unfortunately allows absolutely no control

over either of these factors. On the contrary, over the Internet, latency may vary and it is

extremely difficult to estimate its value.20

Rajeev Sehgal21 pointed out that Microsoft’s and Real Networks’ technologies work

well only if there is bandwidth to spare in the user’s connection, such as on corporate

LANs, which link a group of computers together within a building. But the technologies

typically fail over wide-area networks on the public Internet during peak usage periods.

Whatever the solution, the streaming industry badly needs to overcome the

buffering problem. Real Networks, Microsoft and Apple have made vast improvements

to their encoding and decoding technologies to deliver more content faster and in smaller

files. A better solution to the buffering problem is to dynamically drop the client

bit rate and increase it again when network congestion eases. Real Networks and

Windows Media (but not Quick Time) automatically detect the user’s Internet connection

speed and change the transmitted video quality to suit the client’s connection.
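
The idea of matching the transmitted quality to the connection can be sketched as choosing among several pre-encoded bit rates. This is not how any particular vendor's server works; the function name and the headroom factor are assumptions made for illustration.

```python
def select_stream(available_kbps, measured_kbps, headroom=0.75):
    """Pick the highest encoded bit rate that fits the measured bandwidth.

    `headroom` is an assumed safety factor that reserves part of the link
    for overhead; if nothing fits, the lowest encoding is returned.
    """
    budget = measured_kbps * headroom
    candidates = [rate for rate in available_kbps if rate <= budget]
    return max(candidates) if candidates else min(available_kbps)
```

Re-running the selection whenever the measured bandwidth changes drops the bit rate during congestion and raises it again when congestion eases.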

2.2.2 Live Intranet Distance Learning System using MPEG-4 over

RTP/RTSP. A recent attempt uses MPEG-4 to realize a distance education application.

MPEG-4 is a recent standard from ISO/IEC for the coding of natural and synthetic

audio-visual data in the form of audio-visual objects that are arranged into an audio-visual

scene by means of a scene description.22

The scenario involves the video and audio of a speaker’s room and the

overhead foils the speaker uses, each sent as a separate data stream. The three streams are

encapsulated in MPEG-4 Systems, which adds synchronization among the streams and

a composition (i.e., positioning and sizing) that combines them into a single multimedia

presentation.

This attempt uses Real-time Transport Protocol/Real Time Streaming Protocol

(RTP/RTSP) and HyperText Transfer Protocol (HTTP) as the transport mechanisms to

deliver the presentation material to the clients at the remote location.

2.2.3 Some Low Bit-rate Video Coding Techniques

2.2.3.1 Model-based techniques for low bit-rate video coding23. Model-based

video coding is a technique suited to low bit-rate coding of video

that contains repeated or similar actions of a human body. The actions are modeled in

some way, and only the model information is sent to the other side. The human body image

is then reconstructed on the other side using this information. This results in a fairly low

bit-rate, as the model data, and not the real image, is sent over the network.

Various methods to create 2-D or 3-D models of the human face and human body

have been studied. The general concept of model-based image coding is shown in

figure 5. The scheme consists of three parts: the common knowledge, the encoder, and

the decoder. The encoder first extracts the initial fitting information for the wire-frame

model, which corresponds to the initial image, then estimates the global motion and

the local motion parameters. The decoder modifies the initial wire-frame to the specific

face model and synthesizes the output images using the global and local motion

parameters.

Figure 5

Model-Based Coding System

[Figure labels: Input image; Encoder; Analysis; Analysis data; Network; Decoder; Synthesis; Output image; Image; Source; Model]

The MPEG-4 standard supports very low bit-rate coding of virtual human animation,24

using a model-based approach, with bit-rate requirements as low as 1 kbps.

2.2.3.2 Low Bit-rate Speech Coding.25 Speech coding techniques can be

broadly divided into two classes: waveform coders and vocoders (voice coders).

Waveform coders are able to produce high-quality speech at high bit-rates;

vocoders produce intelligible speech at much lower bit-rates, but the level of speech

quality in terms of its naturalness and uniformity for different speakers is also much

lower.

For rates of 16 kbps and lower, high speech quality is achieved by using more

complex adaptive prediction, such as linear predictive coding (LPC) and pitch prediction,

and by exploiting auditory masking and the underlying perceptual limitations of the ear.

Important examples of such coders are multi-pulse excitation, regular-pulse excitation,

and code-excited linear prediction (CELP) coders. The CELP technique combines the

high quality potential of waveform coding with the compression efficiency of model-based

vocoders. At present, the CELP technique is the technology of choice for coding

speech at bit-rates of 16 kbps and lower.
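
The analysis-by-synthesis idea behind CELP can be sketched as a codebook search. This is a heavily simplified illustration that omits perceptual weighting and pitch prediction, which practical coders include; the function names, the tiny codebook in the usage note and the single filter coefficient are all hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole LPC synthesis: s[n] = e[n] + sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        past = sum(a * out[n - 1 - k]
                   for k, a in enumerate(lpc) if n - 1 - k >= 0)
        out.append(e + past)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain) of the codebook entry that best matches target.

    Each candidate excitation is passed through the synthesis filter and
    scaled by its least-squares gain; the entry with the smallest squared
    error is selected.
    """
    best = None
    for index, code in enumerate(codebook):
        shaped = synthesize(code, lpc)
        energy = sum(x * x for x in shaped)
        if energy == 0:
            continue  # a silent entry cannot match a non-zero target
        gain = sum(t * x for t, x in zip(target, shaped)) / energy
        error = sum((t - gain * x) ** 2 for t, x in zip(target, shaped))
        if best is None or error < best[0]:
            best = (error, index, gain)
    if best is None:
        raise ValueError("codebook contains no usable entry")
    return best[1], best[2]
```

Only the selected index, the gain and the filter coefficients would need to be transmitted, which is why the bit-rate can be far lower than sending the waveform samples themselves.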

2.2.4 Codec. The term codec is short for coder-decoder or

compressor-decompressor. It is a software algorithm that transforms data from one format to

another.26 Different codecs use different algorithms to compress data. Each codec has its

advantages and disadvantages. In the streaming media process, codecs are used to reduce

the size of raw media files so they can be streamed across the Internet, and to convert the

files back into audio or video on the receiving end.

Video-audio codecs are probably the single most important factor in determining

what makes a great video technology. Bandwidth on the web is still quite limited, and

trying to get high quality video to a consumer is like shoving an elephant.

There are several underlying technologies used by the different video-audio

codecs for Windows. Some commonly used codecs are claimed to be more robust than others in

streaming situations, because they were designed from the ground up as streaming codecs

rather than just data reduction schemes.

2.2.4.1 Quick Time. Although there are dozens of different codecs available in

Quick Time, Sorenson Video is the Web champ. It is the most flexible codec around,

providing competitive quality over data rates ranging from modems to CD-ROMs. It is

especially suitable for videoconferencing. Quick Time comes with the basic version of the

Sorenson encoder.

With the release of Quick Time version 4, Apple finally offered a true streaming

solution that included support for a number of codecs:

1. Video: H.261 and H.263; Radius Cinepak; Sorenson Video; MPEG-4; VP3

files. H.261 and H.263 are standard videoconferencing codecs.

2. Audio: QDesign Music codec; Qualcomm PureVoice codec; MP3; IMA 4:1.

2.2.4.2 Real Audio Real Video. Real Networks has only one modern

video-audio codec, the Real G2. Based on videoconferencing technology from Intel, it

provides high quality and a very fast encoder for high communication channels. This

video codec supports features such as edge artifact filtering and motion compensation.27

Real Video uses scalable video technology, which means that the performance is

optimized for the speed and bandwidth of each computer.

The Real System natively supports a number of codecs:

1. Video: Real Video 8, Real Video G2 and Real Video 9

2. Audio: Real Audio 8.0; Real Audio G2; ACELP.net voice codec; Real Audio

1.0, 2.0 and 3.0 legacy codecs.

By providing dramatically improved compression over previous generation

technologies, RealVideo 9 reduces bandwidth costs while enabling high-quality, rich

media experiences at any bit rate and on any device. RealVideo 9 offers a 30% improvement

over RealVideo 8 and a 50% improvement over RealVideo G2, with the same quality as MPEG-4.28

2.2.4.3 Windows Media Technology. The important video codec in Windows

Media is MPEG-4. It is a great codec, providing good quality and performance over a wide

range of data rates. It is a fast compressor, but it does not offer two-pass or VBR (Variable

Bit Rate) encoding. The following are natively supported for encoding:

1. Video: Windows Media Video 8, Windows Media Video 7, Microsoft MPEG-4.

2. Audio: Windows Media Audio V7 and the ACELP.net voice codec.

2.3 A Quick Glance at Audio-Video Software and Hardware used in Developing Streaming Media

2.3.1 Capturing Video. The process of transferring the video content from a

camera or video tape to the computer is called digitizing or capturing. This process

involves playing back the video content while recording it into the computer by using

either a dedicated capture utility or a video editing platform. To transfer the video

content from a camera or video tape to the computer, we need to run some software on

the PC which reads in the video data from the analogue capture card and places it in a

file on the PC’s hard disk.

There are many capture programs available for capturing video from an analogue

capture card and sound from a PC’s sound card.

2.3.1.1 IEEE 1394, Firewire iLink. This card records video in digital format on

tape and it has i.link (IEEE 1394). This type of output offers a high data transfer rate, up

to 400 Mbps, which is necessary to transfer the large volume of data required for

full-motion video.29 The information transfer is digital and does not rely on a real-time

conversion process; the transfer is lossless. We automatically get the full frame rate and

full screen size, and Firewire is more than fast enough to handle the data rate.
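As a rough check of the data-rate claim above, the uncompressed bit rate of a video stream can be estimated from its frame size, color depth and frame rate. The sketch below is illustrative only: the function name is hypothetical, and the frame parameters are assumptions (the 352 x 240, 24-bit, 30 fps figures are the VCD specifications quoted later in this study).

```python
FIREWIRE_MBPS = 400  # peak IEEE 1394 transfer rate quoted in the text


def raw_video_mbps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in megabits per second (10^6 bits)."""
    return width * height * bits_per_pixel * frames_per_second / 1_000_000


# VCD-sized frame: 352 x 240 pixels, 24-bit color, 30 frames per second
vcd_rate = raw_video_mbps(352, 240, 24, 30)

# True when the raw stream fits within the quoted FireWire rate
fits_firewire = vcd_rate <= FIREWIRE_MBPS
```

Under these assumptions the raw stream is about 61 Mbps, well inside the 400 Mbps quoted for IEEE 1394 but far above what a dial-up line can carry, which is why compression is needed before streaming.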

2.3.1.2 Osprey-500. This professional streaming capture card provides

unparalleled quality through end-to-end digital encoding and advanced preprocessing

features. With these features, the Osprey-500 family provides the best video quality

possible for streaming audio and video.30

2.3.1.3 Pinnacle Video Capture Card. This card turns the PC into a TV and a digital

video recorder for a new way of recording. It can convert video into high quality MPEG-1 or

MPEG-2 format with compression. The minimum system requirements for

installation are as follows.31

• Pentium II 450 or Celeron 600 or equivalent

• PC with 128 MB of RAM

• Direct X 8 or higher compatible graphics board and sound card

• One available PCI slot, PCI 2.1 compliant

• CD-ROM drive, mouse

• Windows98 (FE, SE)/ME/2000/XP

2.3.2 Editing and Encoding Tools. After capturing video clips, we can edit

them by using a video editing software application. To edit the video-audio, the computer

must be equipped with the necessary hardware and software. When capturing the video for

streaming, it is important to maintain high quality video. Video files will likely be very

large. Capturing, digitizing and editing video files for streaming is a much more

technical process than working with audio only.32

There are several software packages for capturing, digitizing and editing video-audio files.

Some popular packages are as follows:

2.3.2.1 Final Cut Pro 4. It has sophisticated editing, compositing, effects and

audio tools that allow professional editors to meet demanding post-production deadlines

while maintaining their creativity.33 It is a powerful solution for creating high-quality

programming in a broad range of formats, frame rates and resolutions.

2.3.2.2 Vegas Video 3.0. It allows users to capture, record, mix, edit, composite, add

titles and effects, and manage media with control and higher quality. It sets a new

standard for professional multimedia production: apply high-quality transitions, filters

and text animations; create sophisticated composites, keyframe track motion and

pan/crop, all with unlimited tracks and unsurpassed flexibility. It also provides integrated

tools and high quality output options for streaming media technology.34

2.3.2.3 Adobe Premiere 6.5. Adobe Premiere is a professional digital video

editing tool offering unmatched hardware support and real-time feedback. Adobe

Premiere handles the most demanding projects. It efficiently creates broadcast quality video

productions using extensive software real-time previewing features, including real-time

titles, transitions and effects.

It works with the widest range of video hardware, from the latest digital video

decks and camcorders to third-party capture cards and real-time hardware. It supports the

latest operating systems, including Windows XP and Mac OS X. It can deliver to virtually any

medium. We can use the new Adobe MPEG encoder to create MPEG-2 files for DVD,

and export MPEG-1 files and other leading formats for delivery to VCD, SVCD,

CD-ROM, streaming media for the web, and DV or analog tape.

Adobe Premiere provides 15 keys (methods for creating transparency) that can be

applied to a clip to create transparency in many different ways.35 We can use color-based

keys for superimposing, brightness keys for adding texture or special effects, alpha

channel keys for clips or images already containing an alpha channel, and matte keys for

adding traveling mattes or superimpositions.

It supports the chroma key, which is used to select a range of color in the clip

to be made transparent. We can use this key when we have shot a scene against a screen that

contains a range of one color, such as a shadowy blue screen. Moreover, this software

fulfills the requirements of this study, namely chroma-key based transparency, reasonably

good audio-video synchronization and scalability. It efficiently creates broadcast quality

video productions using extensive software real-time previewing features, including

real-time titles, transitions, and effects.36
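The chroma-key idea described above (select a range of one color and treat it as transparent) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm used by Adobe Premiere: the function names, the pure-blue key color and the per-channel tolerance are all hypothetical choices, and frames are represented as plain nested lists of RGB tuples.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Frame = List[List[Pixel]]      # rows of pixels


def is_key_color(pixel: Pixel, key: Pixel, tolerance: int) -> bool:
    """True when every channel of the pixel lies within tolerance of the key."""
    return all(abs(p - k) <= tolerance for p, k in zip(pixel, key))


def chroma_key_composite(foreground: Frame, background: Frame,
                         key: Pixel = (0, 0, 255), tolerance: int = 60) -> Frame:
    """Replace key-colored foreground pixels with the background pixels."""
    same_shape = len(foreground) == len(background) and all(
        len(f_row) == len(b_row) for f_row, b_row in zip(foreground, background))
    if not same_shape:
        raise ValueError("frames must have identical dimensions")
    return [[bg if is_key_color(fg, key, tolerance) else fg
             for fg, bg in zip(fg_row, bg_row)]
            for fg_row, bg_row in zip(foreground, background)]


# Tiny illustrative frames: a speaker shot against an unevenly lit blue
# screen, composited over a white presentation slide.
BLUE, SHADOW_BLUE = (0, 0, 255), (20, 30, 220)
SKIN, SLIDE = (210, 160, 140), (255, 255, 255)
speaker = [[BLUE, SKIN], [SHADOW_BLUE, SKIN]]
slide = [[SLIDE, SLIDE], [SLIDE, SLIDE]]
composite = chroma_key_composite(speaker, slide)
```

The tolerance is what lets a "shadowy" blue screen be keyed out: a larger value removes more of the unevenly lit background, at the risk of also removing bluish parts of the speaker.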

2.3.2.4 MGI VideoWave 4.0. MGI VideoWave has been setting the standard for

years for PC video authors, helping novice and advanced users create superb video

presentations, complete with titles, special effects, transitions, audio mixing, video

mixing and overlay, and more. MGI VideoWave 4 has a bold new interface that is

designed to take advantage of powerful new features while remaining fast and intuitive.

MGI VideoWave 4 allows users to use and combine both formats seamlessly, utilizing

existing equipment while being ready for the latest generation of hardware. Because the

data rate of raw video is quite high, the software used with the analog capture or IEEE

1394 card compresses the video as it is saved to disk to reduce the file size. Video is

captured to disk at a fixed compression ratio.37

Here is a partial list of the many features found in MGI VideoWave 4:

• Motion video, image, and audio capture from analog and DV sources

• Text, video mixing, transitions, special effects

• Real-time video preview of live video feed and capture

• Produce to AVI, MPEG-1, MPEG-2, DV, RealVideo, or WMV (ASF files)

• Smart DV for faster production to DV files

• Hardware accelerated production to MPEG-2

• Output video directly to a connected VCR or DV device

2.3.2.5 VideoMach 3.0. VideoMach is a powerful audio/video builder and

converter. Use it to construct video clips from still images, enhance recorded material or

convert video, audio and image files between many supported formats. VideoMach is the

successor of Fast Movie Processor. With VideoMach we can construct AVI, MPEG,

FLIC and HAV clips, join or break apart media clips, including image sequences, extract

images from videos, extract audio tracks from movies, resize movies to widescreen (16:9)

or any other aspect ratio without deforming video content.38

2.3.2.6 Windows Media Encoder 7.1. Microsoft Windows Media Encoder has been

enhanced with the latest audio and video compression technologies, such as the Microsoft

Windows Media Audio 8 codec and Microsoft Windows Media Video 8 codec, for real-time

capture and streaming applications. These codecs, or compressor/decompressors, deliver

incomparable audio and video quality at lower bit-rates than were possible with previous

codec versions.39

Compressor/decompressors, or codecs, are the hardware or software used to

compress and decompress audio or video data. For example, Windows Media Audio and

Windows Media Video are software codecs used to decrease the bit rate of digital media

files so they can be delivered efficiently over a network. Windows Media Encoder uses

these codecs to compress the data for streaming, while Windows Media Player decompresses

the data for playback. Table 3 shows minimum configurations for various encoding

scenarios.

Table 3

System Requirements for Encoding40

Encoding Task: Convert existing .wav, .avi, .mpg, and .mp3 files to Windows Media Format

Minimum Configuration: 200 MHz processor, such as an Intel Pentium with MMX; Microsoft Windows 98 Second Edition; 32 MB of RAM

Recommended Configuration: 500 MHz processor or higher, such as a Pentium III; Windows 2000; 128 MB RAM or more

Encoding Task: Real-time capture and broadcast of audio and video files for dial-up modem and mid-bandwidth audiences using the Windows Media 7 codecs

Minimum Configuration: Single stream and multiple bit-rate content for 28.8 kbps and 56 kbps modems: 300 MHz processor, such as a Pentium II or AMD; Windows 98 Second Edition; 32 MB of RAM; supported audio and video capture devices

Recommended Configuration: Single stream and multiple bit-rate content for 100 kbps through 500 kbps: 450 MHz processor or higher, such as a Pentium III; Windows 2000; 250 MB RAM; supported audio and video capture devices

The Windows Media Audio 8 and Windows Media Video 8 codecs offer excellent

compression quality and efficiency. The Windows Media Audio 8 codec delivers a .wma

file of the same quality as an .mp3 file, but at nearly one-third the size. However, Windows

Media Encoder 7.1 is still the best choice for encoding and streaming live content.41

While the quality of encoded video depends on the content being encoded, Windows

Media Video 8 can deliver near-VHS quality at bit rates ranging from 250 kbps

to 450 kbps, and near-DVD quality at 500 kbps to several megabits per second. The Windows

Media Video 8 codec is appropriate for both streaming and downloading digital media files.

2.3.2.7 Helix Producer Basic. Helix Producer from RealNetworks is the next

generation digital media production tool for broadcast streaming and download. It

provides robust, reliable, and fault-tolerant encoding to convert audio and video into

RealMedia format.42 Using RealMedia Events, Helix Producer can also be used to create

synchronized multimedia presentations for playback within the RealOne Player.

Helix Producer is one of the key elements of the RealNetworks system based on

the Helix platform, an integrated media-delivery system designed for rich media delivery

over the Internet and corporate intranets.

NOTES

1 Bates, A.W., Technology, Open Learning, and Distance Education, Routledge 1995.

2 O’ Donoghue, M. and et.al, Interactivity beyond Belief, Interfaces, Vol 8, Paris VIII universite, 1995.

3 Banks, S., and Mc Connell, D., On-line learning using broadcast materials: Case study of the BBC On-line Learning Pilot Programme in Women’s Health, proceedings of the second international networked learning conference pp 374-380, Lancaster University (2000).

4 Sircar, J., Streaming Media Technology: Laying the Foundation for Education Change, Syllabus, 14 (3), 2000 p. 56.

5 ibid p. 57

6 <http://www.zdnet.com> updated May 31, 2000. Accessed in July 2002.

7 Dapeng Wu, and et al., Transporting Real-time Video Over the Internet: Challenges and Approaches, Proceeding of the IEEE, Vol. 88, no. 12, Dec. 2000.

8 Dapeng Wu and et al., Streaming video Over the Internet: Approaches and Directions, IEEE Transaction on Circuits and Systems for Video Technology, Vol. 11. No. 1 February 2001.

9 Dario Luparello and et al., “Streaming Media Traffic: An Empirical Study,” <www.bell-labs.com/user/sanjoy/streaming-media-edgix.doc> (2000). Accessed in September, 2002.

10 Radosevich, Lynda and Fitzoff, Emily, “Damming The Stream,” <http:www.britannica.com/bcom/magazine/article/0,5744,212643,00.html> March 2, 1998. Accessed in August, 2002.

11 Nielsen, Jakob, “Video and Streaming Media,” <http:www.useit.com/alertbox/990808.html> August 8, 1999. Accessed in October, 2002.

12 Larson, Don. (1996), “Does Multimedia Have a Dark Side?,” <http://www.webdeveloper.com/multimedia/mutimedia_dark_side.html> (1996). Accessed in November, 2002.

13 <http://service.real.com/help/library/guides/productiongiq/HTML/htmfiles/ realsys.htm #63065> Accessed in October 2002.

14 ibid.

15 Chromakey techniques: Advanced <http://www.mvcc.net/comm/Tips/shtm> (2001). Accessed in August, 2002.

16 John D. Micheletti and Malachi J. Wurpts, “Applying Chroma-Keying Techniques in a Virtual Environment,” <http://www.tss.swri.edu/pub/2000AEROSENSE_HMD.htm> (2000). Accessed in July, 2002.

17 Di Zhong and Shih-Fu Chang, AMOS: AN ACTIVE SYSTEM FOR MPEG-4 VIDEO OBJECT SEGMENTATION, 1998 International Conference on Image Processing, October 4-7, 1998, Chicago, Illinois, USA

18 ibid.

19 M.Kass, A.Witkin, D., Snakes: Active contour models, International Journal of Computer Vision (1988), 321-331.

20 Rahkila, Martti and Huopaniemi, Jyri, Real Time Internet Audio-Problems and Solutions, AES 102nd International Convention, Munich, Germany, March 22-25, 1997.

21 Sehgal, Rajeev. “Net Video’s Obstacle to a steady stream” <http://news.com.com/2100-1023-900617.html> November, 2002. Accessed in November, 2002.

22 P. Westerink, L. Amini, S. Veliah W. "A Live Intranet Distance Learning System Using MPEG-4 over RTP/RTSP". IEEE, 601-604. <http://www.informatik.uni-trier.de/~ley/db/conf/icme2000html> (2000). Accessed in August, 2002.

23 Interactive Model-Based Coding for Face Metaphor User: Interface in Network Communications <http://www.iuiconf.org/97pdf/1997-002-0036.pdf> (1997). Accessed in August, 2002.

24 Very Low Bitrate Coding of Virtual Human Animation in MPEG-4. <http://www.research.att.com/projects/tts/papers/2000_ICME/Coding pdf> (2000). Accessed in September, 2002.

25 Speech Coding <http://cslu./cse.ogi.edu/HLT survey/ch10node4html> Accessed in October, 2002.

26 Mack, Steve, Streaming Media Bible, 2002 p 73.

27 <http://www.cs.csustan.edu/~framirez/video.html> updated January 31, 2001. Accessed in December, 2002.

28 <http://www.realnetworks.com/solutions/leadership/realvideo.html> Accessed in December, 2002.

29 Capturing from Digital Sources <http://www.microsoft.com/windowsxp/expertzone/columns/bridgman/02february18.asp> posted February 18, 2002. Accessed in December, 2002.

30 <http://www.viewcast.com> 2000. Accessed in February, 2003.

31 <http://www.tigerdirect.com/applications/Category/category_slc.asp?Id=2806> 2002. Accessed in January, 2003.

32 Acquiring and Digitizing Media <http://www.doit.wisc.edu/services/streaming/tutorial/transcripts/transcripts6.htm> 2001. Accessed in September, 2002.

33 <http://www.apple.com/finalcutpro> 2003. Accessed in January, 2003.

34 <http://www.sonicfoundary.com/PRODUCTS/minisites/vegas3/video-eit.htm> Accessed in December, 2002.

35 Adobe Premier: <http://www.adobe.com/prodcts/premier/overview.html> 4/24/2002. Accessed in December, 2002.

36 Adobe Premiere 6.5 help file.

37 <http://support.jp.dell.com/docs/video/dazzle/sw/en/Index.htm> Initial release on November, 2000. Accessed in February, 2003.

38 <http://www.videomach.com/VideoMach.html> Accessed in October, 2002.

39 Tricia Gill, “An Introduction to Windows Media Encoder 7.1,” <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/encode71.asp> May 16, 2001. Accessed in September, 2002.

40 Ibid.

41 Tricia Gill, “An Introduction to Windows Media 8 Encoding Utility,” <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/wmencodutil.asp> March 16, 2001. Accessed in September, 2002.

42 Helix Producer. <http://www.realnetworks.com/products/producer/> Accessed in February, 2003.

CHAPTER 3

Solution Methods and Techniques

The method of research used in this study is experimentation. The experimental

method of research is defined by Good1 as “a method or procedure involving the control

or manipulation of conditions for the purpose of studying the relative effects of various

treatments applied to members of a sample, or of the same treatment applied to members

of different samples.” In this study, the sample included the video-audio media and the

presentation slide. The use of the different low bit-rate encoding technologies and

techniques, as well as the chroma-key tool, in developing streaming media served as the

independent variables. The basic purpose of this research is to determine which of

the low bit-rate encoding techniques is best for producing streaming media for delivery

over a normal dial-up connection while maintaining reasonable video and audio quality.

A client must meet the minimum hardware, software and network transmission

requirements needed to play the video and audio on the client’s desktop. These variables may

influence the delivery of video-audio over the network. The client

needs at least:

• A 56 kbps Internet connection

• A Pentium 166 MHz processor or equivalent

• A minimum of 32 MB RAM

• A 16-bit sound card (any type)

• Speakers

• Color monitor capable of displaying color depth of 16 bit

• Video card capable of displaying 16 bit color depth

• Screen resolution of at least 640 x 480 pixels or higher

• Any streaming video-audio player which supports the file format and codec version.

First, it was important to establish that low bit-rate encoding techniques could be

used for video-audio with presentation content in a single frame to produce streaming

media within the 56 kbps bandwidth. The recommended streaming rate of 34 kbps for

the 56 kbps modem was adopted.2 This was accomplished by using the chroma-key

technique to remove the background of the video and to superimpose the presentation

material in the video. The presentation material explained by the speaker was

synchronized with the speaker's audio and video, based on the captured video.

Likewise, the output video file was encoded into a streaming media file using a low bit-rate.

Before following the procedure, one sample VCD video footage was tested. This step

was important because it determined the direction of the study, whether to continue

with the rest of the problems in chapter 1 or not. If the results proved that the

low bit-rate encoding techniques could not produce streaming media in the 56 kbps

bandwidth with reasonable quality, then the remaining issues or problems raised in this

study would be considered irrelevant.

It was proven that the low bit-rate encoding techniques can produce streaming

video-audio with presentation content in a single frame within the 56 kbps bandwidth.

The next step was to test a number of common low bit-rate codecs for audio and video to

determine which combination would produce the best quality. The following codecs and

streaming formats were studied:

• Video Codec: MPEG-4 Video Codec; Real Media Codec; Windows Media

Video Codec

• Audio Codec: Qualcomm PureVoice Codec; ACELP.net Codec; Real Audio

Codec; Windows Media Audio Codec

• File Format: Apple Quick Time (.mov); Windows Media (Audio/Video - .wma

& .wmv); Real Media (Audio/Video - .ra & .rm)

• Encoding Software: Windows Media Encoder 7.1; Vegas Video 3.0; Helix

Producer Basic; and Quick Time Pro 6.0

To measure the quality of the streaming media, the basic criteria for audio and

video proposed by the International Telecommunication Union (ITU) were used.

The ITU-T (International Telecommunication Union – Telecommunication

standardization sector)3 and ITU-R (International Telecommunication Union – Radio

communication sector)4 recommendations addressed the speech transmission over

telephone networks and image quality over television systems, respectively. A series of

ITU-T recommendations also addressed the subjective assessment of multimedia

applications. The recommended scales are briefly presented below.

Speech Quality Scales

Opinion Scales Recommended by the ITU-T. For the assessment of speech

quality, the recommended rating scale for both listening-only and conversation tests is a

5-point category scale commonly known as the quality scale. Listening-only tests can

also be assessed via the listening effort scale. These scales are shown in Table 4(a-c).

Table 4 (a)

Listening-quality Scale

Quality of the speech / connection Score

Excellent 5

Good 4

Fair 3

Poor 2

Bad 1

Table 4 (b)

Listening-effort Scale

Effort required to understand the meaning of sentences Score

Complete relaxation possible; no effort required 5

Attention necessary; no appreciable effort required 4

Moderate effort required 3

Considerable effort required 2

No meaning understood with any feasible effort 1

Table 4 (c)

Loudness-preference Scale

Loudness-preference Score

Much louder than preferred 5

Louder than preferred 4

Preferred 3

Quieter than preferred 2

Much quieter than preferred 1

Table 5

Conversation Difficulty Scale

Did you or your partner have any difficulty in talking or hearing over the connection?

Yes 1

No 0

Difficulty Scale. This is a binary response obtained from each subject at the end

of each conversation. The scale is shown in table 5.

Image Quality Scales

For the assessment of image quality, single stimulus methods are rated using the

quality scale or impairment scale, and comparisons to reference conditions are made

using the double-stimulus continuous quality scale (DSCQS) or the double stimulus

impairment scale. The DSCQS method is cyclic, in a sense that the assessor is asked to

view a pair of pictures, each from the same source, but one via the process under

examination, and the other one directly from the source. These scales are shown in

Tables 6 (a-c).

Table 6 (a)

Image Quality Scale

Image quality Score

Excellent 5

Good 4

Fair 3

Poor 2

Bad 1

Table 6 (b)

Image Impairment Scale

Image impairment Score

Imperceptible 5

Perceptible, but not annoying 4

Slightly annoying 3

Annoying 2

Very annoying 1

Table 6 (c)

Double Stimulus Continuous Quality Scale

A B

Excellent

Good

Fair

Poor

Bad

Non-categorical Judgement Methods

In non-categorical judgement, an observer assigns a value to each image or image

sequence shown.

In continuous scaling, a variant of the categorical method, the assessor assigns

each image or image sequence to a point on a line drawn between two semantic labels.

The scale includes additional labels at intermediate points for reference. The distance

from an end of the scale is taken as the index for each condition.

In numerical scaling, the assessor assigns each image or image sequence a

number that reflects its judged level on a specified dimension (e.g. image sharpness).

The range of the numbers used may be restricted. Sometimes the number assigned

describes the judged level in "absolute" terms (without direct reference to the level of any

other image or image sequence, as in some forms of magnitude estimation). In other cases,

the number describes the judged level relative to that of a previously seen "standard" (e.g.

magnitude estimation, fractionation, and ratio estimation).

Both forms result in a distribution of numbers for each condition. The method

of analysis depends upon the type of judgement and the information required.

For this study, the listening-effort scale was used in measuring speech quality and the

image impairment scale was used in measuring image quality because these scales are

more descriptive. The double stimulus continuous quality scale was not used due to the

unavailability of a system to test the quality of the video. The conversation difficulty scale

was not used in this study because it depends on a binary response obtained from each

subject at the end of each conversation, whereas this research depended on a unitary response.

The researcher was not able to use any testing that requires the use of equipment

due to lack of resources. Tests were therefore conducted in a normal environment. The

modified scaling is described in Tables 7 and 8.

Table 7

Listening-effort Scale

Quality Effort required to understand the meaning of sentences Score

Excellent Complete relaxation possible; no effort required 5

Good Attention necessary; no appreciable effort required 4

Fair Moderate effort required 3

Poor Considerable effort required 2

Bad No meaning understood with any feasible effort 1

Evaluations of the qualities of video and audio were classified within the same

scale, but there were also cases of slight differences between the quality of video and text

(e.g. jerky moving video, blurred text and video). To emphasize these differences, the

non-categorical judgement method was used. Based on the non-categorical method, the

quality scale was modified for video and audio by using plus notations, where a "++" is

greater than a "+", which is greater than no plus. Each score interval was split into

thirds, and a value of 0.67 was used for "++" and 0.33 for "+".
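The plus notation described above can be turned into a numeric value with a short helper. This is a minimal sketch: the function name is hypothetical, and it assumes that the plus value is added to the base score (so that "3++" becomes 3.67), which is how the 0.67 and 0.33 ratings are read here.

```python
PLUS_VALUES = {"": 0.0, "+": 0.33, "++": 0.67}
VALID_BASE_SCORES = {"1", "2", "3", "4", "5"}


def score_value(rating: str) -> float:
    """Convert a rating such as '3', '3+' or '3++' into a numeric score."""
    rating = rating.strip()
    base = rating.rstrip("+")       # the digit part, e.g. '3'
    plus = rating[len(base):]       # the trailing '+' signs, if any
    if base not in VALID_BASE_SCORES or plus not in PLUS_VALUES:
        raise ValueError(f"unrecognized rating: {rating!r}")
    return round(int(base) + PLUS_VALUES[plus], 2)


# Ratings written in the plus notation, ordered from best to worst
ratings = ["4", "3++", "3+", "3"]
values = [score_value(r) for r in ratings]
```

Converting the ratings this way keeps the ordering "++" > "+" > no plus while allowing ordinary numeric comparison and averaging.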

Table 8

Image Impairment Scale

Quality Image impairment Score

Excellent Imperceptible 5

Good Perceptible, but not annoying 4

Fair Slightly annoying 3

Poor Annoying 2

Bad Very annoying 1

Bandwidth Rate Scales

The bandwidth rate measurement was based on the recommended streaming

rate, which should not be more than 34 kilobits of data per second. Therefore, the 34 kbps

rate was used as the maximum streaming rate of the media over a 56 kbps modem. The

bandwidths were ranked from highest to lowest, with 1 being the highest bandwidth.
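The ranking rule and the 34 kbps ceiling described above can be expressed as a small sketch. The function names and the sample labels are hypothetical, the sample rates are illustrative values only, and ties are not given any special treatment because the text does not say how they were handled.

```python
MAX_STREAM_KBPS = 34.0  # recommended ceiling for a 56 kbps modem


def rank_bandwidths(rates_kbps):
    """Map each name to its rank, with rank 1 for the highest bit rate."""
    ordered = sorted(rates_kbps.items(), key=lambda item: item[1], reverse=True)
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}


def within_limit(rate_kbps, limit=MAX_STREAM_KBPS):
    """True when the stream does not exceed the recommended streaming rate."""
    return rate_kbps <= limit


# Illustrative average data rates (kbps) for three hypothetical encodings
sample_rates = {"encoding A": 34.9, "encoding B": 33.7, "encoding C": 32.8}
ranks = rank_bandwidths(sample_rates)
usable = {name: within_limit(rate) for name, rate in sample_rates.items()}
```

With these sample values, encoding A ranks first but exceeds the 34 kbps ceiling, while B and C fit within it.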

The factors that were measured are: Video, presentation content, audio, and

bandwidth rate. The procedure to produce the streaming media addressed the third

problem which is developing a presentation using the streaming media technology.

NOTES

1 Jose F. Calderon and Expectacion C. Gonzales, Methods of Research and Thesis Writing, National Book Store. 1993. p. 83.

2 <http://service.real.com/help/library/guides/productiongiq/HTML/htmfiles/ realsys.htm #63065>. Accessed in October 2002.

3 International Telecommunication Union, ITU-T (Telecommunication Standardization Sector of ITU): Methods for Subjective Determination of Transmission Quality. <http://www.doc.ua.pt/arch/itu/rec/product/p.htm> June, 2002. Accessed in June, 2003.

4 International Telecommunication Union, ITU-R (Radio Communication Sector of ITU): Methodology for the Subjective Assessment of the Quality of Television Pictures. <http://www.itu.int/itudoc/itu-r/archives/rsg/1996-97/rsg11/34813.html> 2000. Accessed in June, 2003.

CHAPTER 4

Presentation of Findings

In the context of multimedia presentation and viewing on the Internet, remote

users expect a rich viewing environment and a pleasing presentation. Normally, the

video-audio and presentation material are available in the form of a digital movie. In the movie,

a fixed portion of each video frame is occupied by the presentation slide, with the

speaker’s body movements occupying a small region and frequently overlapping the

slide. When the video-audio and presentation material are delivered in the streaming

media format, the client starts showing it immediately, before downloading it completely.

Therefore, the issue of playback startup time was resolved by the use of streaming media

technology. The low bit-rate encoding technique was a natural choice in order to deliver

video-audio and presentation material effectively over a normal dial-up connection line.

The current implementation gives a partial solution towards delivering

presentation materials such as business presentations or lecture videos for business and

educational purposes. Here, the researcher followed a scheme for the content

development so that the presentation slides and audio-video content do not exceed the 34

kbps bit-rate.

The goal is to develop the video-audio with presentation slide in a single frame

maintaining the quality of the slide and to be delivered effectively in the streaming media

format, which is supported by the player.

It needed to be shown that the audio-video with presentation material is in a single

frame using low bit-rate. For this, there are two semantically different entities in the

digital movie of the presentation i.e., the slides and the speaker who is explaining the

slides. In this approach, these two entities were treated separately and later integrated

together. The developed presentation material with low bit-rate coded audio-video of the

speaker was made available through a web server.

4.1 Effective Delivery of Video and Audio

Before the content development procedure, a 5-minute video footage was treated

to solve the problem of effective delivery of video and audio over a normal dial-up

connection stated in section 1.2. In the first step, the existing video file’s resolution,

video-audio quality, frame rate, and data transfer rate were checked. The normal video

files found in VCDs have the following specifications:

Format encoded: MPEG1

Dimension: 352 x 240 pixel

Pixel per cm (Monitor resolution): 22

Pixel Depth/Colors: 24/16 million

Frame Rate: 30 fps (NTSC)

This format was clearly not suited to streaming, since it requires 600 to 650 MB of
disk space for about 45 minutes of content, needs high bandwidth, and must be
downloaded completely before it can be played back. The sample footage (about 5
minutes of content) was converted into a streaming format using a low bit-rate
encoder for different bandwidth requirements to check if it was still of


Table 9

Comparative Data Between Original Non-streaming Media and Converted Streaming Media Formats

                      Before encoding      After encoding, by encoding bit rate
Parameter             (5-minute clip)      56 kbps              35 kbps              34 kbps              33 kbps
Resolution            352 x 240 pixels     320 x 240 pixels     320 x 240 pixels     320 x 240 pixels     320 x 240 pixels
Video quality         5                    4                    3++                  3+                   3
Audio quality         5                    4                    3++                  3+                   3
Video codec           -                    MPEG-4               MPEG-4               MPEG-4               MPEG-4
Audio codec           -                    ACELP.net            ACELP.net            ACELP.net            ACELP.net
Frame rate            30 fps               15 fps               15 fps               15 fps               15 fps
Video data rate       1152 kbps            46 kbps              26 kbps              25 kbps              24 kbps
Audio data rate       224 kbps             64 kbps              43 kbps              42 kbps              41 kbps
Average data rate     -                    55.5 kbps            34.9 kbps            33.7 kbps            32.8 kbps
  (audio and video)
File format           MPEG                 Windows Media Video  Windows Media Video  Windows Media Video  Windows Media Video
File size             49.9 MB              2.20 MB              1.58 MB              1.54 MB              1.49 MB

Legend:

Quality of Audio
5 = Complete relaxation possible; no effort required = Excellent
4 = Attention necessary; no appreciable effort required = Good
3 = Moderate effort required = Fair
2 = Considerable effort required = Poor
1 = No meaning understood with any feasible effort = Bad

Quality of Moving Video and Text
5 = Imperceptible = Excellent
4 = Perceptible, but not annoying = Good
3 = Slightly annoying = Fair
2 = Annoying = Poor
1 = Very annoying = Bad

"++" = 'slightly better' quality than the scale value (adds 0.67); "+" = 'slightly good' quality compared with the scale value (adds 0.33)


the same quality and whether a reasonable quality could be maintained after it was
compressed to a streaming media format. To measure the quality of audio and video, the
researcher used the measurement scales suggested by the ITU-T (International
Telecommunication Union, Telecommunication Standardization Sector)1 and the ITU-R
(International Telecommunication Union, Radiocommunication Sector)2. The
comparative results after encoding in streaming media format with different parameters
are shown in Table 9.
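
The relation between data rate, duration, and file size that underlies this comparison can be checked with a short calculation. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes 1 kbps = 1000 bits per second, 1 MB = 1,000,000 bytes, a modem that sustains its nominal rate, and no container overhead, so its results are estimates rather than the measured sizes in Table 9.

```python
def estimated_size_mb(video_kbps: float, audio_kbps: float, duration_s: float) -> float:
    """Estimate file size in MB (10**6 bytes) from the stream data rates."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1_000_000


def download_time_s(size_mb: float, link_kbps: float) -> float:
    """Time to transfer a file of size_mb over a link running at link_kbps."""
    return size_mb * 1_000_000 * 8 / (link_kbps * 1000)


# Original VCD clip: 1152 kbps video + 224 kbps audio for 5 minutes.
vcd_mb = estimated_size_mb(1152, 224, 5 * 60)
print(f"Estimated VCD clip size: {vcd_mb:.1f} MB")  # 51.6 MB (measured: 49.9 MB)

# Downloading the measured 49.9 MB file over an ideal 56 kbps modem.
minutes = download_time_s(49.9, 56) / 60
print(f"Full download at 56 kbps: about {minutes:.0f} minutes")
```

Under these assumptions the original clip would take roughly two hours to download before playback could begin, which illustrates why a non-streaming format is unsuitable for dial-up users.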

Windows Media Encoder 7.1 was used to convert the MPEG file to Windows
Media Video (.wmv) format. This software is available free of charge on the web.
Vegas Video could also be used, but the unregistered version does not support
encoding from MPEG video files.

Initially, the highest data rate of a 56 kbps modem, which is 56 kbps, was used as
the encoding bit rate. The video file size was drastically reduced from 49.9 MB to
2.20 MB while retaining video quality that was 'perceptible, but not annoying'. For the
audio, it was 'necessary to pay attention but no appreciable effort required'. The video
data rate was 46 kbps and the audio data rate was 64 kbps. Although the audio data rate
exceeded 56 kbps, the average data rate of audio and video reported by the software was
55.5 kbps. This was still not acceptable when compared with the recommended 34 kbps
data rate for streaming media over a 56 kbps modem. Reducing the encoding bit rate
also reduced the data rate.

To test whether a lower bit rate would produce acceptable results, the researcher
re-encoded the original video file at a 35 kbps bit rate. The 35 kbps rate was chosen
based on the results at 56 kbps, where the average data rate for audio and video
(55.5 kbps) was slightly lower than the encoding bit rate; encoding at 35 kbps might
therefore result in an average of 34 kbps or lower. After re-encoding, the file size
was 1.58 MB, which is much smaller than the original video file (49.9 MB). The video
data rate of the encoded file was 26 kbps and the audio data rate was 43 kbps. The
average data rate of audio and video was 34.9 kbps, which was nearly 35 kbps. The
quality of video and audio was 'slightly better' than a scale of 3. Again, this data
rate exceeded the recommended data rate, so the bit rate was further reduced from
35 kbps to 34 kbps. At this bit rate the video data rate was 25 kbps and the audio data
rate was 42 kbps. The average data rate of audio and video was 33.7 kbps, which is
within 34 kbps. The quality of video was 'slightly good' compared with a scale of 3 and
the audio was 'slightly good' compared with 'moderate effort required'.

To test whether the file would maintain similar video and audio quality at a still
lower bit rate, the file was encoded at 33 kbps. At this bit rate the video and audio
data rates were 24 kbps and 41 kbps respectively, and the average data rate was
32.8 kbps. However, the quality of video and audio was lower than that of the media
encoded at 34 kbps. Within a given bit rate, reducing the audio data rate increases the
quality of video and vice versa. Reducing the data rate of both audio and video
decreases the quality of both.

The resulting video and audio quality evaluations of the media encoded at 35, 34,
and 33 kbps were all within level 3 of the scale. Although the qualities of video and
audio were classified under the same level, there were still slight differences in
quality. Applying the modified scaling showed that the quality of video and audio
encoded at a 34 kbps bit rate was preferred over the quality at a 33 kbps bit rate.

These results showed that it was possible to produce a streaming media using a


Table 10

Summary of Converted Streaming Media Formats

Encoding bit rate     56 kbps              35 kbps              34 kbps              33 kbps
Acceptability         Not acceptable       Not acceptable       Acceptable           Not acceptable
                      (exceeds data rate)  (exceeds data rate)                       (due to quality loss)
Average data rate     55.5 kbps            34.9 kbps            33.7 kbps            32.8 kbps
  (audio and video)
File format           Windows Media Video  Windows Media Video  Windows Media Video  Windows Media Video
File size             2.20 MB              1.58 MB              1.54 MB              1.49 MB

low bit-rate encoding technique that delivers audio-video effectively to normal
dial-up modem users, and that the ideal encoding bit rate was 34 kbps.
Table 10 shows that the 34 kbps bit rate was acceptable for developing streaming media
for 56 kbps modem users.
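
The selection logic summarized in Table 10 can be expressed as a small Python sketch. It is only an illustration of the reasoning, not the procedure actually run by the researcher: the function name is hypothetical, and the quality values are the Table 9 ratings converted with the legend ("++" = 0.67, "+" = 0.33). The rule assumed here is to keep only the encodings whose average data rate is within the recommended 34 kbps and then prefer the one with the highest quality.

```python
RECOMMENDED_KBPS = 34.0

# (encoding bit rate in kbps, average data rate in kbps, quality score)
# Values taken from Tables 9 and 10; 3++ -> 3.67, 3+ -> 3.33.
RESULTS = [
    (56, 55.5, 4.00),
    (35, 34.9, 3.67),
    (34, 33.7, 3.33),
    (33, 32.8, 3.00),
]


def pick_bit_rate(rows, limit_kbps=RECOMMENDED_KBPS):
    """Return the bit rate with the best quality among rows within the limit."""
    within_limit = [row for row in rows if row[1] <= limit_kbps]
    if not within_limit:
        return None
    best = max(within_limit, key=lambda row: row[2])
    return best[0]


print(pick_bit_rate(RESULTS))  # 34
```

The 56 and 35 kbps encodings are rejected because their average data rates exceed the limit, and 34 kbps is preferred over 33 kbps because of its higher quality score.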

Based on these results, a 5-minute video presentation was developed using chroma-key,
and the file was encoded with a low bit-rate encoder within the 34 kbps bit rate.

4.1.1 Packaging Video-Audio and Presentation Material. To combine the
video and presentation materials in a single frame, the chroma-key technique was used.
The chroma-key technique separates the objects in the foreground from a background of
a particular color, usually blue or green. There are other possible techniques for
combining the presentation and the video in the same frame. The idea behind combining
them is to remove the background of the speaker's video. The technique used to remove
the background varies greatly depending on how the video was taken (light settings,
background, etc.). Since the video for this study was recorded against a plain
background, the chroma-key technique was best suited for removing the background and
mixing the content with the presentation slide. In the absence of a plain background,
other techniques such as video segmentation could be used as an alternative method for
the same purpose. This procedure was not used because it is a lengthy and
time-consuming process.

In the video, a certain portion of each video frame was occupied by the
presentation slide, with the speaker's body movements occupying a small region and
frequently overlapping the slide. Because of this, some parts of the presentation slide
could not be read. The video was therefore positioned in the lower right corner. It was
assumed that, when the presentation was prepared, the lower right corner of each slide
was left blank so that the speaker's movement would not cover the presentation
content. Adobe Premiere 6.5 was used to set the video position as well as to mix the
audio and presentation slide, because with this software no other intermediate software
was needed for applying the chroma-key and synchronizing the presentation material,
audio, and video. The sequence of presentation slides and audio was arranged using the
timing information noted when the video was recorded. The detailed procedure of
content development is discussed in section 4.3.

4.2 Technologies and Techniques of Low bit-rate Video-Audio Encoding

Streaming media is becoming the de facto standard for delivering audio and video
through the Internet. Streaming video quality depends in part on how it was encoded
for transmission and on the amount of bandwidth required for it to be viewed
properly. It is possible, however, to prepare alternate video clips of higher video and
audio quality that are specially meant for transmission to visitors who are connected
to the website at higher speeds. These audio and video formats use many different
codecs for different bandwidths and quality requirements.

A low bit-rate encoder was used to deliver the video-audio and presentation slide
effectively over a normal dial-up connection. The results for the different codecs
and software are shown in table 11, while the quality assessment is shown in table 12.

This study used four kinds of encoding software that are commonly used in
streaming media production: Windows Media Encoder 7.1, Vegas Video 3.0,
Helix Producer Basic, and Quick Time Pro 6.0. The codecs studied were the
following: MPEG-4; Real Video 8 and 9; Windows Media Video 8 and Windows Media
Audio 8; ACELP.net; and Qualcomm Pure Voice. The criteria for measurement included
the streaming data rate and comparisons of the quality of audio, text, and moving
video. The impairment scale was used to measure the quality of video and text, while
the listening effort scale, described in chapter 3, was used for audio. The frame size
was fixed at 320 x 240 pixels because one objective of this study was to integrate the
video-audio with the presentation material and to show the content clearly; with a
smaller frame size, the presentation content could not be read. Frame rate is expressed
in frames per second (fps). Typically, 15 fps is the standard for streaming video. This
rate allows for smooth playback over the Internet for 56 kbps modem users. Setting the
frame rate below 15 fps might make video playback appear slow and sluggish.3
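
To give a sense of how tight these constraints are, the bit budget available to each video frame can be worked out from the figures in this section. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the audio takes its nominal codec rate (8.5 kbps for ACELP.net mono), that the remainder of the 34 kbps target goes entirely to video, and that container overhead is ignored.

```python
def video_bits_per_frame(total_kbps: float, audio_kbps: float, fps: float) -> float:
    """Average number of bits available per video frame."""
    return (total_kbps - audio_kbps) * 1000 / fps


def bits_per_pixel(bits_per_frame: float, width: int, height: int) -> float:
    """Average number of bits available per pixel of one frame."""
    return bits_per_frame / (width * height)


frame_bits = video_bits_per_frame(total_kbps=34, audio_kbps=8.5, fps=15)
print(frame_bits)  # 1700.0 bits per frame
print(round(bits_per_pixel(frame_bits, 320, 240), 4))  # 0.0221 bits per pixel
```

With only about 0.02 bits per pixel on average, the encoder must rely heavily on the similarity between consecutive frames, which a mostly static presentation slide provides.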

After encoding the video-audio file in Vegas Video for 56 kbps modem users,
using ISO MPEG-4 and ACELP.net 8.5 kbps mono as video and audio codecs under
the Windows Media Video file format, the researcher achieved a 32 kbps streaming data
rate. This gave 'no appreciable effort required' to understand the meaning of sentences
and a 'slightly annoying' image in the quality of text and moving video.

The use of Windows Media Video 8 and Windows Media Audio 8 as video and
audio codecs under the same file format and encoding software resulted in a streaming
data rate of 33 kbps, 1 kbps higher than with the ISO MPEG-4 and ACELP.net codecs,
and a file size of 1.42 MB. It gave 'slightly better' than 'moderate effort required' to
understand the meaning of sentences, and the text was 'perceptible, but not annoying'.
The quality of moving video was 'slightly annoying'. Compared to Windows Media
Encoder with the same codecs, Vegas Video reduced the file size as well as the data
rate when using the ISO MPEG-4 and ACELP.net codecs, but the quality of video was
lower than that of Windows Media Encoder. With Real Video 8 and Real Audio 8 as
video and audio codecs under the Real Media file format, the file size was 1.33 MB,
resulting in a streaming data rate of 34 kbps and giving 'no appreciable effort
required' to understand the meaning of sentences, and 'slightly annoying' quality of
text and moving video.

Under the Quick Time Movie file format, using ISO MPEG-4 and Qualcomm
Pure Voice as video and audio codecs, the file size was 2.57 MB. This resulted in a
64 kbps streaming data rate, which was higher than the recommended 34 kbps. Quick
Time measures the streaming data rate in bytes per second, whereas RealNetworks and
Windows Media measure it in bits per second; the Quick Time figure is converted at
eight bits to one byte.4 This combination gave 'slightly good' quality compared with
'moderate effort required' to understand the meaning of sentences spoken; the quality
of text was 'slightly annoying' but 'slightly good' compared with a scale of 3; and the
moving video was 'slightly better' in quality than a scale of 2.
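
The unit conversion used for the Quick Time figures is simple but easy to overlook when comparing encoders. The following Python sketch shows it; the function name is hypothetical, and the input values are the kilobyte-per-second figures listed for the two Quick Time rows in table 11.

```python
BITS_PER_BYTE = 8


def kilobytes_to_kilobits_per_second(rate_kBps: float) -> float:
    """Convert a data rate in kilobytes per second to kilobits per second."""
    return rate_kBps * BITS_PER_BYTE


print(kilobytes_to_kilobits_per_second(8.0))  # 64.0 kbps (Vegas Video 3.0)
print(kilobytes_to_kilobits_per_second(6.6))  # 52.8 kbps (Quick Time Pro 6.0)
```

Both converted rates remain well above the recommended 34 kbps for 56 kbps modem users.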

Second, Windows Media Encoder 7.1 was used to encode the file in
Windows Media Video (.wmv) format. Using ISO MPEG-4 and ACELP.net 8.5 kbps
mono as video and audio codecs, the streaming data rate was 34 kbps and the file size
was 1.48 MB. With these codecs and this file format, 'no appreciable effort required'
to understand the meaning of sentences spoken was observed, 'slightly good' compared
with a scale of 4; the quality of text was 'slightly better' than 'perceptible, but not
annoying'; and the quality of moving video was 'slightly annoying' but 'slightly
better' than a scale of 3. When Windows Media Video 8 and Windows Media Audio 8
were used as video and audio codecs under the Windows Media Video file format, the
file size was 1.40 MB, resulting in a streaming data rate of 34 kbps. The audio was
'slightly good' compared with 'moderate effort required' to understand the meaning of
sentences; the quality of text was 'perceptible, but not annoying'; and the moving
video was 'slightly good' compared with 'annoying'.

Third, Helix Producer Basic was used as encoding software. Real Video 9 and
Real Audio 8 (8.5 kbps voice) were used as video and audio codecs under this software,
and the file format was Real Media. The resulting file size was 2.49 MB with a
streaming data rate of 34 kbps. 'No appreciable effort required' to understand the
meaning of sentences spoken was observed, 'slightly good' compared with a scale of 4;
the quality of text was 'slightly better' than 'perceptible, but not annoying'; and the
moving video was 'slightly annoying' but 'slightly good' compared with a scale of 3.


Lastly, the Quick Time Pro 6.0 encoding software was used with MPEG-4 and
Qualcomm Pure Voice 8 kHz mono as video and audio codecs. With the Quick Time
Movie file format under these codecs, the resulting file size was 2.11 MB and
the streaming data rate was 52.8 kbps. This was beyond the recommended streaming
data rate of 34 kbps for a 56 kbps modem. The audio required 'moderate effort' but was
'slightly better' in quality than a scale of 3 for understanding the meaning of
sentences; the quality of text was 'slightly annoying' but 'slightly good' compared
with a scale of 3; and the moving video was 'annoying' but 'slightly better' in quality
than a scale of 2.

Among the combinations using the ISO MPEG-4 and ACELP.net video-audio codecs
and the .wmv file format, Windows Media Encoder was better because it achieved a
streaming data rate appropriate for 56 kbps modem users and gave good quality.
However, Vegas Video was better with the same codecs (ISO MPEG-4 and ACELP.net)
and file format (.wmv) in terms of bandwidth, with a data rate 2 kbps lower than
Windows Media Encoder. With the Windows Media Video 8 and Windows Media Audio 8
codecs, the Vegas Video data rate was 1 kbps lower than that of Windows Media
Encoder, and its quality of moving video was also higher.

Vegas Video and Helix Producer had the same file format (.rm) and audio
codec. The video codecs were Real Video 8 and Real Video 9 respectively. The quality
of text, audio, and moving video of Helix Producer was better than that of Vegas Video,
but the bandwidth rates were the same. Vegas Video and Quick Time had the same video
and audio codecs; the bandwidth data rate in Quick Time was 11.2 kbps lower than in
Vegas Video. The quality of moving video and text was the same, but Quick Time's
quality of audio was 'slightly better' than that of Vegas Video. Quick Time movies are
commonly used for presentations played on portable computers, whether in presentation
software like Persuasion or as stand-alone full-screen movies.5 These discussions and
comparisons are shown in tables 11 and 12.

Table 11

Test Results for 56 Kbps Modem Users

(For every row: time 5 min, frame rate 15 fps, image size 320 x 240 pixels.)

  Video codec            Audio codec                    File format          File size  Bandwidth data rate

Vegas Video 3.0
  ISO MPEG-4             ACELP.net 8.5 kbits/s mono     Windows Media Video  1.35 MB    32 kbps
  Windows Media Video 8  Windows Media Audio 8          Windows Media Video  1.42 MB    33 kbps
  Real Video 8           Real Audio 8                   Real Media           1.33 MB    34 kbps
  ISO MPEG-4             Qualcomm Pure Voice            Quick Time Movie     2.57 MB    64 kbps (8 KBps*)

Windows Media Encoder 7.1
  ISO MPEG-4             ACELP.net 8.5 kbits/s mono     Windows Media Video  1.48 MB    34 kbps
  Windows Media Video 8  Windows Media Audio 8          Windows Media Video  1.40 MB    34 kbps

Helix Producer Basic
  Real Video 9           Real Audio 8, 8.5 kbps voice   Real Media           2.49 MB    34 kbps

Quick Time Pro 6.0
  ISO MPEG-4             Qualcomm Pure Voice 8 kHz mono Quick Time Movie     2.11 MB    52.8 kbps (6.6 KBps*)

* Quick Time measures the streaming data rate in bytes per second (eight bits equal one byte), but RealNetworks and Windows Media measure in bits per second.6

Table 12

Quality Assessment of Video-Audio and Bandwidth

(Audio, Text, and Video are the quality of audio, text, and moving video; Bandwidth is the bandwidth data rate score.)

  Video codec            Audio codec                    File format          Audio  Text  Video  Bandwidth  Total score

Vegas Video 3.0
  ISO MPEG-4             ACELP.net 8.5 kbits/s mono     Windows Media Video  4      3     3      5          15
  Windows Media Video 8  Windows Media Audio 8          Windows Media Video  3++    4     3      4          14.67
  Real Video 8           Real Audio 8                   Real Media           4      3     3      3          13
  ISO MPEG-4             Qualcomm Pure Voice            Quick Time Movie     3+     3+    2++    1          10.33

Windows Media Encoder 7.1
  ISO MPEG-4             ACELP.net 8.5 kbits/s mono     Windows Media Video  4+     4++   3++    3          15.67
  Windows Media Video 8  Windows Media Audio 8          Windows Media Video  3+     4     2+     3          12.67

Helix Producer Basic
  Real Video 9           Real Audio 8, 8.5 kbps voice   Real Media           4+     4++   3+     3          15.33

Quick Time Pro 6.0
  ISO MPEG-4             Qualcomm Pure Voice 8 kHz mono Quick Time Movie     3++    3+    2++    2          11.67

Legend:

Quality of Audio
5 = Complete relaxation possible; no effort required = Excellent
4 = Attention necessary; no appreciable effort required = Good
3 = Moderate effort required = Fair
2 = Considerable effort required = Poor
1 = No meaning understood with any feasible effort = Bad

Quality of Moving Video and Text
5 = Imperceptible = Excellent
4 = Perceptible, but not annoying = Good
3 = Slightly annoying = Fair
2 = Annoying = Poor
1 = Very annoying = Bad

"++" = 'slightly better' quality than the scale value (adds 0.67); "+" = 'slightly good' quality compared with the scale value (adds 0.33)


In conclusion, based on total score, Windows Media Encoder was the best software
for producing streaming media in .wmv format, using MPEG-4 and ACELP.net as video
and audio codecs respectively. Helix Producer Basic was the best for the .rm file
format with the Real Video 9 and Real Audio 8 codecs, followed by Vegas Video with the
Real Video 8 and Real Audio 8 codecs. The encoding software and their codecs were
ranked as shown in table 13, based on the assessment in table 12.

Table 13

Summary Results of the Different Codecs and Software

Rank  Encoding software          Video codec            Audio codec                    File format
1     Windows Media Encoder 7.1  ISO MPEG-4             ACELP.net 8.5 kbits/s mono     Windows Media Video
2     Helix Producer Basic       Real Video 9           Real Audio 8, 8.5 kbps voice   Real Media
3     Vegas Video 3.0            ISO MPEG-4             ACELP.net 8.5 kbits/s mono     Windows Media Video
4     Vegas Video 3.0            Windows Media Video 8  Windows Media Audio 8          Windows Media Video
5     Vegas Video 3.0            Real Video 8           Real Audio 8                   Real Media
6     Windows Media Encoder 7.1  Windows Media Video 8  Windows Media Audio 8          Windows Media Video
7     Quick Time Pro 6.0         ISO MPEG-4             Qualcomm Pure Voice 8 kHz mono Quick Time Movie
8     Vegas Video 3.0            ISO MPEG-4             Qualcomm Pure Voice            Quick Time Movie


4.3 Streaming Media Presentation Development

4.3.1 Hardware and Software Used in the Development Procedure.
Throughout the study, the procedure for developing the streaming media using low
bit-rate encoding techniques was recorded. The hardware and software used in the study
were the following:

Intel Celeron 600 MHz processor

Microsoft Windows XP

128 MB of RAM

Video display adapter NVIDIA GeForce2 MX/MX 400

Pinnacle Video capture card

Panasonic Video Camera (CCD-TRV 21E Video 8 Handycam)

VideoMach 3.0

Vegas Video 3.0

Adobe Premiere 6.5



Quick Time Pro 6.0

4.3.2 Development Procedure for the Streaming Media Presentation. The
procedure followed to develop the low bit-rate coded audio-video of the speaker
suggested a way to integrate the existing software tools. The block diagram
representing the scheme for the use of the various tools in the content development
procedure is shown in figure 6. The processes designated as A, B, C, D, and E in the
figure are discussed below.


Figure 6

The Flow Chart of Streaming Media Presentation Development Procedure

[Flow chart. Elements shown: Sample video with plain background; Audio; Video; Video with speaker's position on the lower right corner; Synchronized video-audio and presentation material; Video-audio and presentation content in single frame (final output).]

Legend:
A: Recording, Capturing and Digitizing the Video
B: Splitting the Video and Audio Extraction
C: Video Positioning Required Before Superimposing Presentation Slide
D: Compositing the Slide and the Video of the Speaker
E: Low Bit-rate Audio-Video Encoding


The presentation contents were developed in the following steps:

4.3.2.1 Recording and Capturing the Video (A). A 5-minute sample video was
recorded against the plain blue background shown in figure 7. The blue background was
chosen for the study because it is the color most opposite to skin colors, providing
the least distortion, and most people do not wear vivid blue or green clothing. It is
also easy to key, which means electronically cutting out portions of a picture and
filling them in with another image, as already described in section 2.5. The recorded
video was digitized in MPEG-1 format using the Pinnacle capture card. The file size was
75.5 MB with a bit rate of 2380 kbps, a frame rate of 25 fps, and a frame size of
320 x 240 pixels.

Figure 7

Sample Video with Plain Background


4.3.2.2 Splitting the Video and Audio Extraction (B). This step was essential to
discard unnecessary, redundant audio-visual information from the lecture movie.
It defined the scope for the further development steps.

The VideoMach software was used for this step. It can be used to construct video
clips from still images, to enhance recorded material, and to convert video, audio, and
image files between many supported formats.7 It took the digital movie of the captured
presentation in MPEG-1 format as input and generated the audio and the video of the
presentation as two separate files. The audio of the speaker was saved as a .wav file.
The video contained only the significant rectangular portion of the visuals of the
presentation and was saved as a separate AVI file, as shown in the flow chart.

4.3.2.3 Video Positioning Required Before Superimposing the Presentation Slide (C).
This step was essential for positioning the video. In the movie, a fixed portion of each
video frame was occupied by the presentation slide, with the speaker's body movements
occupying a small region and frequently overlapping the slide. Because of this, clients
could not read some parts of the presentation slide. The video was therefore positioned
in the lower right corner. When the presentation material was prepared, blank space
was left for the video in the lower right corner. Adobe Premiere 6.5 was used to set the
video position. The zoom value was fixed at 75% from the start to the end of the
timeline so that the video would not cover the presentation slide being explained by
the speaker, as shown in figure 8. The zoom value could be changed as needed.
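
The geometry of this positioning step can be illustrated with a short calculation. The Python sketch below is an assumption-laden illustration, not a description of how Adobe Premiere computes its motion settings: it assumes that the zoom value is a uniform scale factor applied to the 320 x 240 clip and that the scaled clip is placed flush against the lower right corner of the frame. The function name is hypothetical.

```python
def lower_right_placement(frame_w: int, frame_h: int, zoom: float):
    """Return (x, y, width, height) of a clip scaled by zoom and placed
    flush against the lower right corner of the frame.

    The origin (0, 0) is the top left corner of the frame.
    """
    scaled_w = round(frame_w * zoom)
    scaled_h = round(frame_h * zoom)
    return (frame_w - scaled_w, frame_h - scaled_h, scaled_w, scaled_h)


print(lower_right_placement(320, 240, 0.75))  # (80, 60, 240, 180)
```

Under these assumptions, a 75% zoom leaves a strip 80 pixels wide on the left and 60 pixels high at the top of the frame that the speaker's video never covers.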


Figure 8

Positioning the Video in Adobe Premiere 6.5.

Figure 9

Output of Repositioning the Video


If there were differences in color value, the color similarity value could be
increased. The real-time preview option could be checked to test the movie. The
timeline was exported and saved in AVI format. The outcome of repositioning the video
is shown in figure 9.

4.3.2.4 Compositing the Slides and the Video of the Speaker (D). Compositing, also
known as superimposing, is the process of combining two or more images to yield a
resulting, enriched image. Compositing can be done with still or moving images.
Compositing or superimposing simply means playing one clip on top of another. Adobe
Premiere 6.5 was used for this step. It supports the chroma-key, which is used to
select a range of colors in the clip to be made transparent.

The terms matting and keying, in video and film production, refer to specific
compositing techniques. Keying uses different types of transparency keys to find pixels
in an image that match a specified color or brightness, and makes those pixels
transparent or semitransparent. For example, given a clip of a weather man standing in
front of a blue-screen background, the blue could be keyed out using a blue-screen key
and replaced with a weather map. Matting uses a mask or matte to apply transparency or
semitransparency to specified areas of an image. When keying or matting is used to
apply transparency, portions of the lower images are revealed. Adobe Premiere 6.5
supports these features, and the keying concept was used to develop a mix of the
presentation slide and the video of the speaker.
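
The keying concept described above can be illustrated with a minimal sketch. The Python code below is not the algorithm used by Adobe Premiere; it is a simplified illustration that assumes a hard threshold on the Euclidean distance between each pixel's RGB value and the key color. The function names, the tiny 2 x 2 sample images, and the threshold value are all hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255
Image = List[List[Pixel]]     # rows of pixels


def color_distance(a: Pixel, b: Pixel) -> float:
    """Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def chroma_key(foreground: Image, background: Image,
               key_color: Pixel, similarity: float) -> Image:
    """Replace every foreground pixel close to key_color with the
    background pixel at the same position."""
    return [
        [bg_px if color_distance(fg_px, key_color) <= similarity else fg_px
         for fg_px, bg_px in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


BLUE: Pixel = (0, 0, 255)
SKIN: Pixel = (224, 172, 105)
WHITE: Pixel = (255, 255, 255)

# Speaker frame: blue backdrop, one skin-colored pixel, one slightly
# uneven blue pixel (for example, caused by lighting).
speaker = [[BLUE, SKIN], [(10, 5, 240), BLUE]]
slide = [[WHITE, WHITE], [WHITE, WHITE]]

composite = chroma_key(speaker, slide, key_color=BLUE, similarity=40)
print(composite)
```

In this sketch the slightly uneven blue pixel is still keyed out because it lies within the similarity threshold, while the skin-colored pixel is kept. Raising the threshold tolerates more variation in the backdrop but risks removing parts of the speaker.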

To composite the video clip, audio, and presentation slides, they were first imported.
Before the PowerPoint presentation was imported, all the slides were saved as a bitmap
image sequence, because Adobe Premiere does not support PowerPoint slides directly.
After the video, audio, and presentation slides were imported, the media were placed on
the timeline. The video was placed on track 2 and the presentation slide on track 1, as
shown in figure 10.

The settings for the video composition were as follows:

Compressor: None

Color Depth: millions

Frame Size: 320 x 240 pixels

Frame Rate: 15 fps

Pixel Aspect Ratio: Square Pixels (1.0)

Editing Mode: Video for Windows

Time base: 30

Time Display: 30 fps Non Drop Frame Time code

Figure 10

Superimposing PowerPoint Slide on the Speaker's Video

To key out the video background and replace it with a presentation slide, the
transparency setting was used: the background color (blue) was picked with the color
picker and the chroma key type was selected. The output was previewed in a sample
frame, as shown in figure 10. The time sequence of audio and presentation slides had
already been recorded at the time of video recording. The video-audio and presentation
slides were placed on the timeline according to this timing information, so that the
slide would change when the speaker's content changed. The outcome of superimposing
the PowerPoint slide is shown in figure 11.
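
The synchronization described above amounts to looking up which slide is active at a given playback time. The Python sketch below illustrates the idea under stated assumptions: the slide start times are hypothetical values, not the timings recorded in this study, and the function name is likewise invented for illustration.

```python
from bisect import bisect_right

# Hypothetical start time of each slide, in seconds from the start of the
# video (slide 1 starts at 0 s, slide 2 at 45 s, and so on).
SLIDE_START_S = [0, 45, 110, 200]


def slide_at(time_s: float, starts=SLIDE_START_S) -> int:
    """Return the 1-based number of the slide shown at time_s."""
    if time_s < 0:
        raise ValueError("time must not be negative")
    # bisect_right counts how many slides have started at or before time_s.
    return bisect_right(starts, time_s)


print(slide_at(30))   # 1
print(slide_at(45))   # 2
print(slide_at(299))  # 4
```

Placing each bitmap slide on the timeline at its recorded start time has the same effect: the slide changes exactly when the speaker moves on to the next topic.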

Figure 11

Superimpose of the PowerPoint Slide in the Back of the Video.


4.3.2.4.1 Problems Encountered When Presentation Slide and Video File Were
Treated Separately. When the two entities (the presentation slide and the speaker who
was explaining the slides) were treated separately, it was not possible to make the
background of the video file of the speaker translucent. The main concept of this study
was to develop presentation contents, namely the presentation file created with
Microsoft PowerPoint and the separate low bit-rate coded audio-video of the speaker,
and to make them available through a web server. Additionally, the presentation file
included the timing information for each slide, already recorded during the
presentation event.

To play the presentation, the player treated the presentation slides and the
speaker as two separate inputs and achieved realistic synchronization between the
running slides and the speaker's explanation of the slides on the remote client's
desktop. The player included Intel's RDX (Realistic Display Mixer), which mixed
the slides and the movie frames and displayed each frame of the presentation in a
realistic fashion. This technology was based on Intel's MMX technology but had already
been phased out by the Intel Corporation.8

Further, only one codec, Intel's Indeo Video version 5.0, supported transparency.9
However, it was an AVI codec and not a streaming media format. Therefore, the
researcher mixed the presentation slide and video in Adobe Premiere and produced a
single output.

4.3.2.5 Low Bit-rate Video-Audio Encoding (E). This was the final step, essential
for generating the low bit-rate encoded audio-video of the speaker as a single file.
Different video and audio codecs were used for encoding the speaker's video. Windows
Media Encoder, Vegas Video, Helix Producer, and Quick Time were used for encoding;
these software packages convert media into streaming media formats. The results were
discussed earlier. The visual outcomes of the different codecs and software are shown
in appendices A, B, C, and D. Windows Media Encoder is free and is available from the
Windows Media website.10 The encoding process in Windows Media Encoder is shown in
figure 12. The visual outcome of Windows Media Encoder using the ISO MPEG-4 and
ACELP.net codecs is shown in figure 13.

Vegas Video 3.0 was used to encode in the Real Media (.rm), Windows Media Video
(.wmv), and Quick Time Movie (.mov) formats. It could render directly from the timeline
to all three of the major streaming media formats. Each streaming media platform had a
number of default templates. In addition, templates could be modified and new ones
created by clicking on the custom button. A demo version of this software is available
free on the Web.11

Figure 12

Encoding the Video and Audio in Window Media Encoder


Figure 13

After Encoding the Video File in .wmv Format.

Windows Media technology includes intelligent streaming and also uses the
concept of target audiences. With Windows Media Encoder, the desired target
audiences could be selected and the frame size and audio codec specified. Unlike
RealNetworks' SureStream technology, however, it could not specify different audio
codecs for different target audiences.12

When encoding the video-audio file, as many target audiences as needed could be
selected in Windows Media Encoder. However, if one of the selected target audiences
was under 80 kbps, the encoder did not allow choosing a target audience above 300 kbps.


NOTES

1 International Telecommunication Union, ITU-T (Telecommunication Standardization Sector of ITU): Methods for Subjective Determination of Transmission Quality. <http://www.doc.ua.pt/arch/itu/rec/product/p.htm> June, 2002. Accessed in June, 2003.

2 International Telecommunication Union, ITU-R (Radio Communication Sector of ITU):

3 Crowell, Nancy. “How Do Your Media Add Up?” <http://www.workz.com/cgi-bin/gt/tpl_page.html,template=1&content=1897&nav1=1&> 2003. Accessed in March, 2003.

4 Video over the Internet. <http://www.matrox.tv/includes/pdf> Accessed in May, 2003.

5 <http://www.siggraph.org/education/materials/HyperGraph/video/codecs/PureVoice html> 2002. Accessed in June, 2003.

6 Ibid.

7 <http://www.videomach.com/VideoMach.html> Accessed in October, 2002.

8 Intel® Realistic Display Mixer (RDX). <http://www.intel.com/labs/archive/rdx.htm> Accessed in January, 2003.

9 Intel Indeo® Video 5. <http://www.siggraph.org/education/materials/HyperGraph/video/codecs/indeo_v5/overview.htm> (1997). Accessed in January, 2003.

10 <www.microsoft.com/windowmedia> Accessed in July, 2002.

11 Steve Mack, Streaming Media Bible, Hungry Minds, Inc. New York, NY 10022. (2002).

12 <http://www.sonicfoundry.com/PRODUCTS/showproduct.asp? PID=612&FeatureID=5447> Accessed in August, 2002.


CHAPTER 5

Summary, Conclusion and Recommendations

The study focused primarily on video-audio streaming technology. Specifically, it aimed to combine video-audio and presentation slides in a single frame while maintaining the quality of the slides, and to deliver the result effectively in a streaming media format supported by the player.

5.1 Summary of Findings

A five-minute VCD video footage in MPEG format was converted into a streaming media format using Windows Media Encoder with the ISO MPEG-4 and ACELP.net codecs. The footage was encoded at a bit-rate of 34 kbps, which gave an average data rate of 33.7 kbps, i.e., within the 34 kbps target. The video quality was rated slightly better than ‘slightly annoying’, and the audio quality slightly better than ‘moderate effort required’ to understand the speaker. The video-audio and presentation materials were packaged in a single frame by the use of the chroma-key tool.

Low bit-rate encoding and the chroma-key technique were used to develop and deliver presentation material over a low bandwidth Internet connection. A five-minute presentation movie (in MPEG-1 format) was used as the sample. The size of the movie was 75.5 MB, with a bit-rate of 2380 kbps, a frame rate of 25 fps, and a frame size of 320x240 pixels. Using this movie, the presentation content was developed in the form of a .wmv file combining the audio-video of the speaker and the presentation slides (made using Microsoft PowerPoint). When encoded in Windows Media Encoder, the resulting .wmv file was 1.48 MB with a bit-rate of 34 kbps; when encoded in Vegas Video with the ISO MPEG-4 and ACELP.net codecs, the bit-rate was 32 kbps and the file size was 1.35 MB.
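The relationship between bit-rate, duration and file size behind these figures can be checked with simple arithmetic. The following Python sketch is illustrative only: it assumes a constant bit-rate, a duration of exactly five minutes, 1 kbps = 1000 bits per second and 1 MB = 10^6 bytes, none of which is stated in the study, and the function names are hypothetical.

```python
def expected_size_mb(bit_rate_kbps, duration_s):
    """Approximate size in MB of a constant bit-rate stream.

    Assumes 1 kbps = 1000 bit/s and 1 MB = 10**6 bytes (assumptions,
    not stated in the study). Container overhead is ignored.
    """
    total_bits = bit_rate_kbps * 1000 * duration_s
    return total_bits / 8 / 1_000_000


def size_ratio(original_mb, encoded_mb):
    """How many times smaller the encoded file is than the original."""
    return original_mb / encoded_mb


DURATION_S = 5 * 60  # "five-minute" sample, assumed to be exactly 300 s

# Estimated size of a 34 kbps stream of that length.
print(round(expected_size_mb(34, DURATION_S), 3))

# Reduction from the 75.5 MB MPEG-1 source to the 1.48 MB .wmv file.
print(round(size_ratio(75.5, 1.48), 1))
```

Under these assumptions the estimate for a 34 kbps stream is of the same order as the reported .wmv file sizes; the sketch does not model container overhead or the exact clip length, so it is not expected to match them exactly.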

In Real Media format using Helix Producer Basic, the bit-rate was 34 kbps and the file size was 2.49 MB. When using Vegas Video, the bit-rate was 34 kbps and the file size was 1.33 MB. Besides Windows Media (.wmv), other file formats were also studied with the same encoding schemes (such as the MPEG-4 codec with QuickTime), and they gave different results. Among the other file types, the Real audio-video format with the Real Video 9 codec for video and the Real Audio 8 codec for audio showed the next most satisfactory result. The results revealed the following limitations in the current implementation:

• The decoded video frames show blocking artifacts at the boundary of the speaker’s body, so the chroma-key color (in this case, blue) appears at the boundary instead of a smooth edge. This is clearly visible in Figure 16. It may be due to the light setting and the camera: the video footage was taken with an ordinary Panasonic camera in natural light.

• The audio of the speaker varies with the distance between the speaker and the camera’s microphone. Better quality could be achieved by using a microphone in a controlled manner.

• The presentation slide explained by the speaker did not retain its original quality, because it was combined and encoded together with the video and audio. At the time of the study, the researcher did not have enough resources (such as real-time display mixing and the necessary software) to study other possible mixing technologies.

5.1.1 Visual Outcome at Various Stages of the Content Development. The content of the media was developed on the Windows XP platform (with a Celeron 600 MHz processor and 128 MB RAM), and the low bit-rate encoded audio-video was generated in the Windows Media Video (.wmv), RealMedia (.rm) and QuickTime Movie (.mov) formats.

The complete content development task took 2 GB of disk space on the machine. The following are the intermediate results, each represented by one video frame of the presentation, generated by the different steps of the content development discussed in section 4.3.

Figure 14 shows a video frame from the original digital movie, with a frame size of 320 x 240 pixels. It represents the input to the first step of the content development.

Figure 14

Video Frame in the Original Digital Movie


Figure 15 shows the video frame after repositioning the video of the speaker. It represents the outcome of the first step.

Figure 15

Output of Repositioning the Video

Figure 16 shows the video frame when the PowerPoint slide is superimposed with the video of the speaker. It represents the outcome of the third step, saved in AVI format.

Figure 16

Superimposition of the PowerPoint Slide behind the Video


Figure 17 shows the video frame after the video-audio file was encoded. It represents the outcome of the final step, saved in .wmv format.

Figure 17

After Encoding the Video File in .wmv Format.

The following are the achievements made by the test:

When the video-audio file was tested using Windows Media Player, QuickTime and Real Media, the presentation played without interruption. In terms of quality, the Windows Media Video (.wmv) file format gave the best video and audio quality, followed by Real Media. When encoded in Vegas Video, the playback data rates were 32 and 33 kbps in Windows Media Video format, 64 kbps in QuickTime format and 34 kbps in Real Media format, and the file sizes were 1.35 and 1.42 MB, 2.57 MB and 1.33 MB respectively, as shown in Table 11. Smooth video was observed in Real Media. Using Windows Media Encoder and Helix Producer, the data rate was 34 kbps, but the codec combination (MPEG-4 and ACELP.net) in Windows Media Encoder gave better overall quality than Helix Producer Basic. In QuickTime, the data rate using QuickTime Pro 6.0 was 52.8 kbps, which exceeded the acceptable data rate; its audio quality was slightly better than that of Vegas Video, but the text and moving video were the same. Both programs used the ISO MPEG-4 and Qualcomm PureVoice codecs.

When the video-audio file was encoded in different formats and with different compression technologies, the streaming data rate and file size varied accordingly.

Therefore, the researcher used low bandwidth compression technology, which helped to achieve the 34 kbps data rate suitable for 56 kbps modem users.
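The comparison against the 34 kbps target can be sketched as follows. The data rates are those reported above for each format and encoder; the Python names (`MEASURED_KBPS`, `within_target`) are hypothetical, and the sketch only illustrates the selection, it is not part of the original test procedure.

```python
# (format, encoder, data rate in kbps) as reported in Section 5.1.1.
MEASURED_KBPS = [
    ("Windows Media Video", "Vegas Video", 32),
    ("Windows Media Video", "Vegas Video", 33),
    ("QuickTime", "Vegas Video", 64),
    ("Real Media", "Vegas Video", 34),
    ("Windows Media Video", "Windows Media Encoder", 34),
    ("Real Media", "Helix Producer Basic", 34),
    ("QuickTime", "QuickTime Pro 6.0", 52.8),
]


def within_target(results, target_kbps=34):
    """Keep only the results whose data rate does not exceed the target."""
    return [row for row in results if row[2] <= target_kbps]


for fmt, encoder, kbps in within_target(MEASURED_KBPS):
    print(f"{fmt} via {encoder}: {kbps} kbps")
```

Only the Windows Media Video and Real Media results stay within the target; both QuickTime results exceed it, which is consistent with the observation above that the QuickTime data rate exceeded the acceptable rate.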

In addition, the chroma-key technique made it possible to superimpose the presentation slide on the presentation video, in which the speaker explains the slide in front of the white board. Consequently, it was found that such content can give clients the experience of viewing a live presentation video.

5.2 Conclusion

Streaming multimedia is a logical step in the development of the Internet, which has moved from text-based content, through graphics and animation, to downloadable video and audio files. It enhances the traditional text- and image-based presentation and creates a rich viewing environment.

Based on the current study of different audio/video compression technologies and low bit-rate encoding techniques, it can be concluded that rich media can be developed and delivered over a low bandwidth connection using low bit-rate codecs such as MPEG-4 and ACELP.net. The chroma-key tool can be used to package video-audio and presentation material in a single frame for delivery from a web server.

Of the codecs studied, the MPEG-4 codec for video with the ACELP.net codec for audio was the best low bit-rate combination, and Real Video 9 for video with Real Audio 8 for audio was the second best. In terms of encoding software, Windows Media Encoder was the best encoder for the .wmv format and Helix Producer Basic was the best for the .rm format.

Likewise, to develop streaming media presentation material with video-audio in a single frame, the video can be recorded against a plain background color, especially blue or green. Video and audio quality also depends on the camera, light setting, capture card, microphone, etc. After digitizing the video and audio, the video and audio are split into separate files and unnecessary, redundant audio-visual information is discarded from the presentation video. Before superimposing the presentation material on the video, it is necessary to set the speaker's position using video editing software so that the speaker's body movement does not overlap the presentation slides. The chroma-key tool is then used to superimpose the presentation material on the video. After the presentation material has been superimposed, it is synchronized with the video-audio using time information. Lastly, the video-audio presentation is encoded in video-audio encoder software in the desired streaming format using the low bit-rate encoding technique.
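The superimposing step described above can be illustrated with a minimal chroma-key sketch in Python. This is not the tool used in the study, which relied on video editing software; it is a simplified illustration that assumes frames are lists of rows of (R, G, B) tuples, a pure blue key color and a hard distance threshold. The function names and the `tolerance` value are hypothetical.

```python
def is_key_color(pixel, key=(0, 0, 255), tolerance=60):
    """True if `pixel` lies within `tolerance` (Euclidean RGB distance) of `key`."""
    distance_sq = sum((p - k) ** 2 for p, k in zip(pixel, key))
    return distance_sq <= tolerance ** 2


def chroma_key(foreground, background, key=(0, 0, 255), tolerance=60):
    """Replace key-colored foreground pixels with the background pixels.

    Both frames are assumed to have the same dimensions.
    """
    return [
        [bg if is_key_color(fg, key, tolerance) else fg
         for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


# A tiny 2x2 example: blue background pixels are replaced by the slide.
speaker = [[(0, 0, 255), (200, 150, 120)],
           [(10, 5, 250), (0, 0, 255)]]
slide = [[(255, 255, 255), (255, 255, 255)],
         [(30, 30, 30), (255, 255, 255)]]

composite = chroma_key(speaker, slide)
print(composite)
```

A hard threshold of this kind makes a yes-or-no decision for every pixel, so pixels near the speaker's outline that are only partly blue are either kept or replaced as a whole; production tools usually soften this boundary.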

The use of the chroma-key technique to remove the video background and superimpose the presentation slide helps to reduce the file size and maintain the quality of the video for streaming media. A single-color video background compresses more effectively than a multi-color background. This makes it possible to deliver the media content to users on low bandwidth connections.


5.3 Recommendations

The researcher believes that the study provides early support and guidelines for

the development of the streaming media technology. It also provides guidelines in the

use of the chroma-key technology for streaming presentation content development and

delivery for the low bandwidth connection.

Some of the limitations faced in this study, such as the rough edges of the speaker's video and the reduced audio quality, can be improved by taking the video footage in a well-lit room with good sound equipment, as found in studios. If the recording and mixing of audio and video are handled by professionals, rich quality can be achieved.

The type of content developed for this study, in which presentation slides and the speaker (or instructor) appear together in one frame, appears suitable for business presentations and for lecture materials in distance education. Since such content can be developed for low bandwidth connections, it is highly recommended that other researchers study the feasibility of using this type of content in the distance education and business presentation contexts.

Due to lack of resources, the quality of the audio-video was measured subjectively. Laboratory testing of audio-video quality is recommended.

To obtain better quality presentation material, it is highly recommended to use real-time mixing technology for the audio-video and presentation material.


BIBLIOGRAPHY

I. BOOKS

Bates, A.W. Technology, Open Learning, and Distance Education. Routledge, 1995.

Jose F. Calderon and Expectacion C. Gonzales. Methods of Research and Thesis Writing. National Book Store, 1993.

Mack, Steve. Streaming Media Bible. Hungry Minds, Inc., New York, NY 10022 (2002).

II. Articles & Periodicals

Banks, S., and McConnell, D. On-line learning using broadcast materials: Case study of the BBC On-line Learning Pilot Programme in Women’s Health. Proceedings of the Second International Networked Learning Conference, pp. 374-380, Lancaster University (2000).

Dapeng Wu, Yiwei Thomas Hou, Wenwu Zhu, Ya-Qin Zhang, Jon M. Peha. Streaming Video over the Internet: Approaches and Directions. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 1. February 2001.

Dapeng Wu, Yiwei Thomas Hou and Ya-Qin Zhang. Transporting Real-time Video Over the Internet: Challenges and Approaches. Proceedings of the IEEE, Vol. 88, No. 12, Dec. 2000.

Di Zhong and Shih-Fu Chang. AMOS: An Active System for MPEG-4 Video Object Segmentation. 1998 International Conference on Image Processing, October 4-7, 1998, Chicago, Illinois, USA.

M. Kass, A. Witkin, D. "Snakes: Active contour models". International Journal of Computer Vision (1988): 321-331.

Micke O'Donoghue, Michael Barber and Steve Childs. The Role of Streaming Media in the Delivery of Distance Learning. Pre-Submission Draft, Lancaster University, August 2000.

O'Donoghue, M. and Thily, H. Interactivity beyond Belief. Interfaces, Vol. 8, Paris VIII Universite, 1995.

O. Egger, E. Reusens, T. Ebrahimi and M. Kunt. Very Low Bit Rate Coding of Visual Information. Swiss Federal Institute of Technology at Lausanne, Switzerland. 1996.

Rahkila, Martti and Huopaniemi, Jyri. Real Time Internet Audio-Problems and Solutions. AES 102nd International Convention, Munich, Germany, March 22-25, 1997.

Sircar, J. Streaming Media Technology: Laying the Foundation for Education Change. Syllabus, 14 (3), 2000, p. 26.

T. Ebrahimi, E. Reusens, W. Li, and P. Cicconi. New Trends in Very Low Bitrate Video Coding. Proceedings of the IEEE, July 1995.

III. Internet

Acquiring and Digitizing Media. <http://www.doit.wisc.edu/services/streaming/tutorial/transcripts/transcripts6.htm> 2001. Accessed in September, 2002.

"A review of Video Streaming over the Internet". SuperNOVA Project. DSTC Teaching Report. <http://archieve.dstc.edu.au/RDUstaff/Jane-hunter/videostreaming.html>, 10 August 1997. Accessed in August, 2002.

Adobe Premiere. <http://www.adobe.com/prodcts/premier/overview.html> 4/24/2002. Accessed in December, 2002.

Adobe Premiere 6.5 help file.

Cunningham, David and Francis, Neil. "An introduction to Streaming Video". Cultivate Interactive, Issue 4. <http://www.cultivate-int.org/issue4/video/> 7 May, 2001. Accessed in August, 2002.

Chromakey techniques: Advanced. <http://www.mvcc.net/comm/Tips/shtm> (2001). Accessed in August, 2002.

Capturing from Digital Sources. <http://www.microsoft.com/windowsxp/expertzone/columns/bridgman/02february18.asp> Posted February 18, 2002. Accessed in December, 2002.


Crowell, Nancy. "How Do Your Media Add Up?" <http://www.workz.com/cgi-bin/gt/tpl_page.html,template=1&content=1897&nav1=1&> 2003. Accessed in March, 2003.

Divx Networks. <http://divxnetworks.com/,2001> (2001). Accessed in September, 2002.

Dario Luparello, Sarit Mukherjee and Sanjoy Paul. "Streaming Media Traffic: An Empirical Study". <www.bell-labs.com/user/sanjoy/streaming-media-edgix.doc> (2000). Accessed in September, 2002.

Helix Producer. <http://www.realnetworks.com/products/producer/> Accessed in February, 2003.

Intel® Realistic Display Mixer (RDX). <http://www.intel.com/labs/archive/rdx.htm> Accessed in January, 2003.

Intel Indeo® Video 5. <http://www.siggraph.org/education/materials/HyperGraph/video/codecs/indeo_v5/overview.htm> (1997). Accessed in January, 2003.

International Telecommunication Union, ITU-T (Telecommunication Standardization Sector of ITU): Series P: Telephone Transmission Quality-Methods for objective and subjective assessment of quality. <http://www.doc.ua.pt/arch/itu/rec/product/p.html> June, 2002. Accessed in June, 2003.

International Telecommunication Union, ITU-R (Radio Communication Sector of ITU):

Interactive Model-Based Coding for Face Metaphor User: Interface in Network Communications. <http://www.iuiconf.org/97pdf/1997-002-0036.pdf> (1997). Accessed in August, 2002.

John D. Micheletti and Malachi J. Wurpts. "Applying Chroma-Keying Techniques in a Virtual Environment". <http://www.tss.swri.edu/pub/2000AEROSENSE_HMD.htm> (2000). Accessed in July, 2002.

Larson, Don. "Does Multimedia Have a Dark Side?" <http://wwwwebdeveloper.com/multimedia/mutimedia_dark_side.html> (1996). Accessed in November, 2002.

Microsoft Windows Media Player. <http://www.microsoft.com/windows/mediaplayer> June 2002. Accessed in July, 2002.


Nielsen, Jakob. "Video and Streaming Media". <http:www.useit.com/alertbox/990808.html> August 8, 1999. Accessed in October, 2002.

P. Westerink, L. Amini, S. Veliah W. "A Live Intranet Distance Learning System Using MPEG-4 over RTP/RTSP". IEEE, 601-604. <http://www.informatik.uni-trier.de/~ley/db/conf/icme2000html> (2000). Accessed in August, 2002.

Radosevich, Lynda and Fitzoff, Emily. "Damming The Stream". <http:www.britannica.com/bcom/magazine/article/0,5744,212643,00.html> March 2, 1998. Accessed in August, 2002.

Streaming Methods. Web Server Vs. Streaming Media Server. <http://www.microsoft.com/Windows/windowsmedia/compare/webservvstreamserv.asp> Last updated Thursday, March 21, 2001. Accessed in August, 2002.

Sehgal, Rajeev. "Net Video’s Obstacle to a steady stream". <http://news.com.com/2100-1023-900617.html> November, 2002. Accessed in November, 2002.

Speech Coding. <http://cslu./cse.ogi.edu/HLT survey/ch10node4html> Accessed in October, 2002.

Tricia Gill. "An Introduction to Windows Media Encoder 7.1". <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/encode71.asp> May 16, 2001. Accessed in September, 2002.

Tricia Gill. "An Introduction to Windows Media 8 Encoding Utility". <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/wmencodutil.asp> March 16, 2001. Accessed in September, 2002.

Very Low Bitrate Coding of Virtual Human Animation in MPEG-4. <http://www.research.att.com/projects/tts/papers/2000_ICME/Coding pdf> (2000). Accessed in September, 2002.

Video over the Internet. <http://www.matrox.tv/includes/pdf> Accessed in May, 2003.

<http://www.zdnet.com> Updated May 31, 2000. Accessed in July, 2002.

<http://service.real.com/help/library/guides/productiongiq/HTML/htmfiles/realsys.htm #63065>. Accessed in October, 2002.

<http://www.cs.csustan.edu/~framirez/video.html> Updated January 31, 2001. Accessed in December, 2002.

<http://www.viewcast.com> 2000. Accessed in February, 2003.

<http://www.tigerdirect.com/applications/Category/category_slc.asp?Id=2806> 2002. Accessed in January, 2003.

<http://www.apple.com/finalcutpro> 2003. Accessed in January, 2003.

<http://www.sonicfoundary.com/PRODUCTS/minisites/vegas3/video-eit.htm> Accessed in December, 2002.

<http://support.jp.dell.com/docs/video/dazzle/sw/en/Index.htm> Initial release on November, 2000. Accessed in February, 2003.

<http://www.realnetworks.com/solutions/leadership/realvideo.html> Accessed in December, 2002.

<http://www.videomach.com/VideoMach.html> Accessed in October, 2002.

<http://www.siggraph.org/education/materials/HyperGraph/video/codecs/PureVoice.html> 2002. Accessed in June, 2003.

<www.microsoft.com/windowmedia> Accessed in July, 2002.

<http://www.sonicfoundry.com/PRODUCTS/showproduct.asp?PID=612&FeatureID=5447> Accessed in August, 2002.

APPENDICES

APPENDIX – A

Sample Video Frame Encoded in Windows Media Encoder

Encoded in Windows Media Encoder (file format .wmv, ISO MPEG-4 codec)

Encoded in Windows Media Encoder 7.1 (file format .wmv, Windows Media Video 8 codec)


APPENDIX – B

Sample Video Frame Encoded in Vegas Video

Encoded in Vegas Video 3.0 (file format .wmv, ISO MPEG-4 codec)

Encoded in Vegas Video 3.0 (file format .wmv, Windows Media Video 8 codec)


APPENDIX – C

Sample Video Frame Encoded in Helix Producer Basic and Vegas Video

Encoded in Helix Producer Basic (file format .rm, Real Video 9 codec)

Encoded in Vegas Video (file format .rm, Real Video 8 codec)


APPENDIX – D

Sample Video Frame Encoded in Quick Time Pro and Vegas Video

Encoded in Vegas Video (file format .mov, ISO MPEG-4 codec)

Encoded in Quick Time Pro (file format .mov, ISO MPEG-4 codec)


APPENDIX – E

Methodology for the Subjective Assessment of the Quality of Television Pictures Rec. ITU-R BT.500-10

The ITU Radiocommunication Assembly,

considering

a) that a large amount of information has been collected about the methods used in various laboratories for the assessment of picture quality;

b) that examination of these methods shows that there exists a considerable measure of agreement between the different laboratories about a number of aspects of the tests;

c) that the adoption of standardized methods is of importance in the exchange of information between various laboratories;

d) that routine or operational assessments of picture quality and/or impairments using a five-grade quality and impairment scale made during routine or special operations by certain supervisory engineers, can also make some use of certain aspects of the methods recommended for laboratory assessments;

e) that the introduction of new kinds of television signal processing such as digital coding and bit-rate reduction, new kinds of television signals using time-multiplexed components and, possibly, new services such as enhanced television and HDTV may require changes in the methods of making subjective assessments;

f) that the introduction of such processing, signals and services, will increase the likelihood that the performance of each section of the signal chain will be conditioned by processes carried out in previous parts of the chain,

recommends

1 that the general methods of test, the grading scales and the viewing conditions for the assessment of picture quality should be used for laboratory experiments and whenever possible for operational assessments;

2 that, in the near future and notwithstanding the existence of alternative methods and the development of new methods, Recommendation should be used when possible; and

3 that, in view of the importance of establishing the basis of subjective assessments, the fullest descriptions possible of test configurations, test materials, observers, and methods should be provided in all test reports.


Description of assessment methods

1 Introduction

Subjective assessment methods are used to establish the performance of television systems using measurements that more directly anticipate the reactions of those who might view the systems tested. In this regard, it is understood that it may not be possible to fully characterize system performance by objective means; consequently, it is necessary to supplement objective measurements with subjective measurements.

In general, there are two classes of subjective assessments. First, there are assessments that establish the performance of systems under optimum conditions. These typically are called quality assessments. Second, there are assessments that establish the ability of systems to retain quality under non-optimum conditions that relate to transmission or emission. These typically are called impairment assessments.

To conduct appropriate subjective assessments, it is first necessary to select from the different options available those that best suit the objectives and circumstances of the assessment problem at hand.

The purpose of this Annex is limited to the detailed description of the assessment methods. The choice of the most appropriate method is nevertheless dependent on the service objectives the system under test aims at. The complete evaluation procedures of specific applications are therefore reported in other ITU-R Recommendations.

2 Common features

General viewing conditions for subjective assessments are given. Specific viewing conditions, for subjective assessments of specific systems, are given in the related Recommendations.

2.1 General viewing conditions

Different environments with different viewing conditions are described.

The laboratory viewing environment is intended to provide critical conditions to check systems. General viewing conditions for subjective assessments in the laboratory environment.

The home viewing environment is intended to provide a means to evaluate quality at the consumer side of the TV chain. General viewing conditions reproduce a near to home environment. These parameters have been selected to define an environment slightly more critical than the typical home viewing situations.

3 Selection of test methods

A wide variety of basic test methods have been used in television assessments. In practice, however, particular methods should be used to address particular assessment problems. A survey of typical assessment problems and of methods used to address these problems is given in Table 1.

TABLE 1

Selection of test methods

Assessment problem | Method used | Description
Measure the quality of systems relative to a reference | Double-stimulus continuous quality-scale (DSCQS) method(1) | Rec. ITU-R BT.500, § 5
Measure the robustness of systems (i.e. failure characteristics) | Double-stimulus impairment scale (DSIS) method(1) |
Quantify the quality of systems (when no reference is available) | Ratio-scaling method(2) or categorical scaling (under study) | Report ITU-R BT.1082
Compare the quality of alternative systems (when no reference is available) | Method of direct comparison, ratio-scaling method(2) or categorical scaling (under study) |
Identify factors on which systems are perceived to differ and measure their perceptual influence | Method under study | Report ITU-R BT.1082
Establish the point at which an impairment becomes visible | Threshold estimation by forced-choice method or method of adjustment (under study) |
Determine whether systems are perceived to differ | Forced-choice method (under study) | Report ITU-R BT.1082
Measure the quality of stereoscopic image coding | Double stimulus continuous quality-scale (DSCQS) method(3) |
Measure the fidelity between two impaired video sequences | Simultaneous double stimulus for continuous evaluation (SDSCE) method | Rec. ITU-R BT.500, § 6.4
Compare different error resilience tools | Simultaneous double stimulus for continuous evaluation (SDSCE) method | Rec. ITU-R BT.500, § 6.4

(1) Some studies on contextual effects were carried out for the DSCQS and the DSIS methods. It was found that the results of the DSIS method are biased to a certain degree by contextual effects.

(2) Some studies suggest that this method is more stable when a full range of quality is available.

(3) Due to the possibility of high fatigue when evaluating stereoscopic images, the overall duration of a test session should be shortened to be less than 30 min.

4 The double-stimulus impairment scale (DSIS) method (the EBU method)

4.1 General description

A typical assessment might call for an evaluation of either a new system, or the effect of a transmission path impairment. The initial steps for the test organizer would include the selection of sufficient test material to allow a meaningful evaluation to be made, and the establishment of which test conditions should be used. If the effect of parameter variation is of interest, it is necessary to choose a set of parameter values which cover the impairment grade range in a small number of roughly equal steps. If a new system, for which the parameter values cannot be so varied, is being evaluated, then either additional, but subjectively similar, impairments need to be added.

The double-stimulus (EBU) method is cyclic in that the assessor is first presented with an unimpaired reference, then with the same picture impaired. Following this, he is asked to vote on the second, keeping in mind the first. In sessions, which last up to half an hour, the assessor is presented with a series of pictures or sequences in random order and with random impairments covering all required combinations. The unimpaired picture is included in the pictures or sequences to be assessed.

The method uses the impairment scale, for which it is usually found that the stability of the results is greater for small impairments than for large impairments.

Although the method sometimes has been used with limited ranges of impairments, it is more properly used with a full range of impairments.

4.2 General arrangement

The way viewing conditions, source signals, test material and the observers and the presentation of results are defined or selected.

The generalized arrangement for the test system should be as shown in Fig. 1.

FIGURE 1


The assessors view an assessment display which is supplied with a signal via a timed switch. The signal path to the timed switch can be either directly from the source signal or indirectly via the system under test. Assessors are presented with a series of test pictures or sequences. They are arranged in pairs such that the first in the pair comes direct from the source, and the second is the same picture via the system under test.

4.3 Presentation of the test material

A test session comprises a number of presentations. There are two variants to the structure of presentations, I and II outlined below.

Variant I: The reference picture or sequence and the test picture or sequence are presented only once as is shown in Fig. 2a).

Variant II: The reference picture or sequence and the test picture or sequence are presented twice as is shown in Fig. 2b).


Variant II, which is more time consuming than variant I, may be applied if the

discrimination of very small impairments is required or moving sequences are under test.

FIGURE 2

Phases of presentation:

T1 = 10 s Reference picture
T2 = 3 s Mid-grey produced by a video level of around 200 mV
T3 = 10 s Test condition
T4 = 5-11 s Mid-grey

Experience suggests that extending the periods T1 and T3 beyond 10 s does not improve the assessors' ability to grade the pictures or sequences.
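The phase durations above determine how many presentations fit into one session. The following Python sketch works this out under stated assumptions: the session length of half an hour is taken from section 4.1, and the phase order assumed for variant II (T1, T2, T3, T2, T1, T2, T3, T4) is an assumption, since the text only states that the reference and test are presented twice. The function names are hypothetical.

```python
T1 = 10          # s, reference picture
T2 = 3           # s, mid-grey between reference and test
T3 = 10          # s, test condition
T4_RANGE = (5, 11)  # s, mid-grey voting interval


def presentation_length(variant, t4):
    """Length in seconds of one presentation for variant 'I' or 'II'."""
    if not T4_RANGE[0] <= t4 <= T4_RANGE[1]:
        raise ValueError("T4 must lie between 5 and 11 seconds")
    if variant == "I":
        # Reference and test shown once: T1, T2, T3, T4.
        return T1 + T2 + T3 + t4
    if variant == "II":
        # Assumed order: T1, T2, T3, T2, T1, T2, T3, T4.
        return 2 * (T1 + T2 + T3) + T2 + t4
    raise ValueError("variant must be 'I' or 'II'")


def max_presentations(variant, t4, session_s=30 * 60):
    """Whole presentations that fit in a session of `session_s` seconds."""
    return session_s // presentation_length(variant, t4)


print(presentation_length("I", 5), max_presentations("I", 5))
print(presentation_length("II", 11), max_presentations("II", 11))
```

The comparison makes concrete why variant II is described as more time consuming: each presentation is roughly twice as long, so about half as many conditions can be assessed in one session.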

4.4 Grading scales

The five-grade impairment scale should be used:

5 imperceptible
4 perceptible, but not annoying
3 slightly annoying
2 annoying
1 very annoying

Assessors should use a form which gives the scale very clearly, and has numbered boxes or some other means to record the gradings.

4.5 The introduction to the assessments

At the beginning of each session, an explanation is given to the observers about the type of assessment, the grading scale, the sequence and timing. The range and type of the impairments to be assessed should be illustrated on pictures other than those used in the tests, but of comparable sensitivity. It must not be implied that the worst quality seen necessarily corresponds to the lowest subjective grade. Observers should be asked to base their judgement on the overall impression given by the picture, and to express these judgements in terms of the wordings used to define the subjective scale.

5 The double-stimulus continuous quality-scale (DSCQS) method

5.1 General description

A typical assessment might call for evaluation of a new system or of the effects of transmission paths on quality. The double-stimulus method is thought to be especially useful when it is not possible to provide test stimulus test conditions that exhibit the full range of quality.

The method is cyclic in that the assessor is asked to view a pair of pictures, each from the same source, but one via the process under examination, and the other one directly from the source. He is asked to assess the quality of both. In sessions which last up to half an hour, the assessor is presented with a series of picture pairs (internally random) in random order, and with random impairments covering all required combinations.

5.2 General arrangement

The way viewing conditions, source signals, test material, the observers and the introduction to the assessment are defined or selected.

The generalized arrangement for the test system should be as shown in Fig. 3.

FIGURE 3


5.3 Presentation of the test material

A test session comprises a number of presentations. For variant I which has a single observer, for each presentation the assessor is free to switch between the A and B signals until the assessor has the mental measure of the quality associated with each signal. The assessor may typically choose to do this two or three times for periods of up to 10 s. For variant II which uses a number of observers simultaneously, prior to recording results, the pair of conditions is shown one or more times for an equal length of time to allow the assessor to gain the mental measure of the qualities associated with them, then the pair is shown again one or more times while the results are recorded. The number of repetitions depends on the length of the test sequences. For still pictures, a 3-4 s sequence and five repetitions may be appropriate. For moving pictures with time-varying artefacts, a 10 s sequence with two repetitions may be appropriate.

Where practical considerations limit the duration of sequences available to less than 10 s, compositions may be made using these shorter sequences as segments, to extend the display time to 10 s. In order to minimize discontinuity at the joints, successive sequence segments may be reversed in time (sometimes called “palindromic” display). Care must be taken to ensure that test conditions displayed as reverse time segments represent causal processes, that is, they must be obtained by passing the reversed-time source signal through the system under test.

5.4 Grading scale

The method requires the assessment of two versions of each test picture. One of each pair of test pictures is unimpaired while the other presentation might or might not contain an impairment. The unimpaired picture is included to serve as a reference, but the observers are not told which is the reference picture. In the series of tests, the position of the reference picture is changed in pseudo-random fashion.
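The pseudo-random placement of the reference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_session`, the slot labels "A" and "B", the 50/50 placement rule and the use of a fixed seed are hypothetical choices, not part of the method as quoted.

```python
import random


def plan_session(conditions, seed=0):
    """Return one presentation per test condition, in random order,
    with the unimpaired reference placed pseudo-randomly in slot A or B.

    The seed is an assumption made so that a session plan can be
    reproduced later when the score sheets are decoded.
    """
    rng = random.Random(seed)
    order = list(conditions)
    rng.shuffle(order)  # picture pairs are presented in random order

    plan = []
    for condition in order:
        # Assumed rule: the reference goes first in about half of the pairs.
        if rng.random() < 0.5:
            plan.append({"A": "reference", "B": condition})
        else:
            plan.append({"A": condition, "B": "reference"})
    return plan


if __name__ == "__main__":
    for pair in plan_session(["codec_128k", "codec_64k", "codec_32k"], seed=1):
        print(pair)
```

The experimenter keeps the plan; the observers see only the labels A and B, so they are not told which picture is the reference.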

The observers are simply asked to assess the overall picture quality of each presentation by inserting a mark on a vertical scale. The vertical scales are printed in pairs to accommodate the double presentation of each test picture. The scales provide a continuous rating system to avoid quantizing errors. The associated terms categorizing the different levels are the same as those normally used; but here they are included for general guidance and are printed only on the left of the first scale in each row of ten double columns on the score sheet. Figure 4 shows a section of a typical score sheet. Any possibility of confusion between the scale divisions and the test results is avoided by printing the scales in blue and recording the results in black.

FIGURE 4

Portion of quality-rating form using continuous scales*


6 Adjectival categorical judgement methods

In adjectival categorical judgements, observers assign an image or image sequence to one of a set of categories that, typically, are defined in semantic terms. The categories may reflect judgements of whether or not an attribute is detected (e.g. to establish the impairment threshold). Categorical scales that assess image quality and image impairment have been used most often, and the ITU-R scales are given in Tables 2 and 3. In operational monitoring, half grades are sometimes used. Scales that assess text legibility, reading effort, and image usefulness have been used in special cases.

TABLE 2

ITU-R quality scales

Five-grade scale

Quality Score

Excellent 5

Good 4

Fair 3

Poor 2

Bad 1

TABLE 3

ITU-R impairment scales

Image Impairment Score

Imperceptible 5

Perceptible, but not annoying 4

Slightly annoying 3

Annoying 2

Very annoying 1

This method yields a distribution of judgements across scale categories for each condition. The way in which responses are analysed depends upon the judgement (detection, etc.) and the information sought (detection threshold, ranks or central tendency of conditions, psychological “distances” among conditions).
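As a minimal sketch of what "a distribution of judgements across scale categories for each condition" looks like in practice, the Python below tallies five-grade scores per condition and reports the median as one possible measure of central tendency. The function names, the input layout (a dictionary of condition name to list of scores) and the choice of the median are assumptions made for illustration; the text above does not prescribe a particular analysis.

```python
from collections import Counter
from statistics import median

# Grades of the five-grade ITU-R scales (Tables 2 and 3), highest first.
GRADES = (5, 4, 3, 2, 1)


def judgement_distribution(scores):
    """Count how many judgements fell into each scale category."""
    for score in scores:
        if score not in GRADES:
            raise ValueError(f"score {score!r} is not on the five-grade scale")
    counts = Counter(scores)
    return {grade: counts[grade] for grade in GRADES}


def summarise(judgements_by_condition):
    """Return the distribution and the median grade for every condition."""
    return {
        condition: {
            "distribution": judgement_distribution(scores),
            "median": median(scores),
        }
        for condition, scores in judgements_by_condition.items()
    }


if __name__ == "__main__":
    # Hypothetical judgements from four observers for two conditions.
    print(summarise({"64 kbit/s": [5, 4, 4, 2], "32 kbit/s": [3, 2, 2, 1]}))
```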

6.1 Non-categorical judgement methods

In non-categorical judgements, observers assign a value to each image or image sequence shown. There are two forms of the method.


In continuous scaling, a variant of the categorical method, the assessor assigns each image or image sequence to a point on a line drawn between two semantic labels (e.g. the ends of a categorical scale as in Table 3). The scale may include additional labels at intermediate points for reference. The distance from an end of the scale is taken as the index for each condition.
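The idea of taking "the distance from an end of the scale" as the index can be sketched as follows. The normalisation to the range 0 to 1, the function name `scale_index` and the use of millimetres in the example are assumptions for illustration only; the text itself speaks only of the distance.

```python
def scale_index(mark_position, scale_length):
    """Return the mark's distance from the lower end of the scale as a
    fraction of the total scale length (0.0 = lower end, 1.0 = upper end).

    Both arguments must be in the same unit, for example millimetres
    measured on the printed score sheet.
    """
    if scale_length <= 0:
        raise ValueError("scale_length must be positive")
    if not 0 <= mark_position <= scale_length:
        raise ValueError("mark_position must lie on the scale")
    return mark_position / scale_length


if __name__ == "__main__":
    # Hypothetical example: a mark 25 mm above the lower end of a 100 mm line.
    print(scale_index(25.0, 100.0))
```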

In numerical scaling, the assessor assigns each image or image sequence a number that reflects its judged level on a specified dimension (e.g. image sharpness). The range of the numbers used may be restricted (e.g. 0-100) or not. Sometimes, the number assigned describes the judged level in “absolute” terms (without direct reference to the level of any other image or image sequence), as in some forms of magnitude estimation. In other cases, the number describes the judged level relative to that of a previously seen “standard” (e.g. magnitude estimation, fractionation, and ratio estimation). Both forms result in a distribution of numbers for each condition. The method of analysis used depends upon the type of judgement and the information required (e.g. ranks, central tendency, psychological “distances”).


APPENDIX – F

Methods for Subjective Determination of Transmission Quality

1 Scope

This Recommendation contains advice to Administrations on conducting subjective tests of transmission quality in their own laboratories. It does not however deal with types of tests described in detail in other ITU–T Recommendations and documentation, namely:

a) determination of Reference and Relative Equivalents – see Handbook on Telephonometry, Geneva, 1993;

b) determination of Loudness Ratings – see Recommendation P.78;

c) determination of Articulation Ratings (A.E.N. values) – see Handbook on Telephonometry, Geneva, 1993.

Neither does it deal with the various kinds of specialized tests used in the course of developing items of telephone equipment, for the purpose of diagnosing faults and shortcomings, such as Diagnostic Rhyme Tests [1] and other tests dedicated to the study of specific aspects of speech output.

This Recommendation gives the approved methods which are considered to be suitable for determining how satisfactorily given telephone connections may be expected to perform.

The methods indicated here are intended to be generally applicable whatever the form of degradation factors present. Examples of degrading factors include: loss (often frequency dependent); circuit noise; transmission errors (random bit errors as well as erased frames that occur in systems such as mobile communications); environmental noise; sidetone; talker echo; non-linear distortion of various kinds including low bit-rate encoding; propagation time; harmful effects of voice-operated devices; distortions of the time scale arising from packet switching; and time-varying degradations of the communication channel, including those arising in loudspeaking sets. Combinations of two or more of such factors also have to be catered for. Further guidance for specific applications is available in Recommendations P.830 (digital speech codecs), P.84 (DCME/PCME), and P.85 (speech output devices).

2 References

The following Recommendations and other references contain provisions that, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated are valid. All Recommendations and other references are subject to revision; all users of this Recommendation are therefore encouraged to investigate the possibility of applying the most recent edition of the Recommendations listed below. A list of the currently valid ITU–T Recommendations is regularly published.

– IEC Publication 1260: 1995, Electroacoustics – Octave-band and fractional-octave-band filters.

– IEC Publication 581-5: 1981, High fidelity audio equipment and systems; Minimum performance requirements – Part 5: Microphones.

– IEC Publication 651: 1979, Sound level meters. (Amendment 1-1993) (Corrigendum March 1994).

– ISO 266: 1975, Acoustics – Preferred frequencies for measurements.

– ISO 1996-1: 1982, Acoustics – Description and measurement of environmental noise – Part 1: Basic quantities and procedures.

– ISO 1996-2: Acoustics – Description and measurement of environmental noise – Part 2: Acquisition of data pertinent to land use.

– ISO 1996-3: 1987, Acoustics – Description and measurement of environmental noise – Part 3: Application to noise limits.

– ITU-T Recommendation G.113 (1996), Transmission impairments.

– CCITT Recommendation G.722 (1988), 7 kHz audio-coding within 64 kbit/s.

– CCITT Recommendation G.726 (1990), 40, 32, 24 and 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM).

– CCITT Recommendation G.728 (1992), Coding of speech at 16 kbit/s using low-delay code excited linear prediction.

– ITU-T Recommendation G.729 (1996), Coding of speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP).

– ITU-T Recommendation P.10 (1993), Vocabulary of terms on telephone transmission quality and telephone sets.

– ITU-T Recommendation P.11 (1993), Effect of transmission impairments.

– CCITT Recommendation P.48 (1988), Specification for an intermediate reference system.

– ITU-T Recommendation P.56 (1993), Objective measurement of active speech level.

– ITU-T Recommendation P.78 (1993), Subjective testing method for determination of loudness ratings in accordance with Recommendation P.76.

– ITU-T Recommendation P.810 (1996), Modulated Noise Reference Unit (MNRU).

– CCITT Recommendation P.82 (1984), Method for evaluation of service from the standpoint of speech transmission quality.

– ITU-T Recommendation P.830 (1996), Subjective performance assessment of telephone-band and wideband digital codecs.

– ITU-T Recommendation P.84 (1993), Subjective listening test method for evaluating digital circuit multiplication and packetized voice systems.

– ITU-T Recommendation P.85 (1994), A method for subjective performance assessment of the quality of speech voice output devices.

3 Definitions

For the purposes of this Recommendation, the following definitions apply:

3.1 dBov: dB relative to the overload of a digital system.

3.2 Q: The ratio, in dB, of speech power to modulated noise power in the Modulated Noise Reference Unit, as described in Recommendation P.810.
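Both definitions are power ratios expressed in decibels, which can be illustrated with the usual relation of ten times the base-10 logarithm of a power ratio. The sketch below is an illustration only: the function names are hypothetical, the powers are assumed to be given in the same linear unit, and the exact overload reference used for dBov in a particular system is not specified in the text above.

```python
import math


def power_ratio_db(power, reference_power):
    """Express the ratio of two powers (same linear unit) in decibels."""
    if power <= 0 or reference_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(power / reference_power)


def q_db(speech_power, modulated_noise_power):
    """Q: ratio, in dB, of speech power to modulated noise power."""
    return power_ratio_db(speech_power, modulated_noise_power)


def level_dbov(signal_power, overload_power):
    """Signal level in dB relative to the (assumed known) overload power."""
    return power_ratio_db(signal_power, overload_power)


if __name__ == "__main__":
    # Hypothetical powers: speech 100 times stronger than the modulated noise.
    print(q_db(100.0, 1.0))
    # Hypothetical signal at one hundredth of the overload power.
    print(level_dbov(0.01, 1.0))
```

A signal below the overload point gives a negative dBov value, and a larger Q means less modulated noise relative to the speech.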


4 Conventions

Subjective evaluation of telecommunications equipment and systems may, in principle, be conducted using listening-only or conversational methods of subjective testing. As a practical matter, listening-only tests may be the only feasible method of subjective testing during the development of new transmission equipment or telecommunication services.

5 Recommended methods

5.1 Opinion scale

The following opinion scales are those recommended by the ITU-T.

5.1.1 Conversation opinion scale

Various five-point category-judgement scales may be used for different purposes. The layout and wording of opinion scales, as seen by subjects in experiments, is very important, and should follow the standard arrived at through years of experience. The following opinion scale is the most frequently used for ITU-T applications and equivalent wording should be used depending on language which might result in small variations to the original English text.

This is a category rating obtained from each subject at the end of each conversation.

Opinion of the connection you have just been using

Excellent

Good

Fair

Poor

Bad

The experimenter allocates the following values to the scores: Excellent = 5; Good = 4; Fair = 3; Poor = 2; Bad = 1
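The allocation of values to the category labels can be sketched as a simple lookup. The arithmetic mean computed below is an assumption added for illustration, being one common way to summarise such scores; the function names and the sample responses are hypothetical.

```python
# Values allocated by the experimenter to the conversation opinion scale.
OPINION_VALUES = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}


def score_opinions(responses):
    """Translate category labels into their allocated numeric values."""
    try:
        return [OPINION_VALUES[label] for label in responses]
    except KeyError as exc:
        raise ValueError(f"unknown opinion category: {exc.args[0]!r}") from None


def mean_score(responses):
    """Arithmetic mean of the allocated values (assumed summary statistic)."""
    values = score_opinions(responses)
    if not values:
        raise ValueError("at least one response is required")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical responses collected at the end of three conversations.
    print(mean_score(["Excellent", "Good", "Fair"]))
```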


5.1.2 Difficulty scale

This is a binary response obtained from each subject at the end of each conversation.

Did you or your partner have any difficulty in talking or hearing over the connection?

Yes

No

The experimenter allocates the following values to the scores: Yes = 1; No = 0.

The quantity evaluated (percentage of "yes" responses) is called percentage Difficulty or per cent "Difficult", and is denoted by the symbol %D. The corresponding simple proportion is denoted by the symbol d; in other words, %D = 100d.

NOTE – It is often the case that the nature of the difficulty is required and then it is usual for the experimenter to ask the subject to describe in his/her own words their perception of the difficulty.
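The relation %D = 100d defined above can be worked through with a short Python sketch. The function names and the sample responses are hypothetical; only the Yes = 1, No = 0 allocation and the formula itself come from the text.

```python
# Values allocated by the experimenter to the difficulty scale.
DIFFICULTY_VALUES = {"Yes": 1, "No": 0}


def difficulty_proportion(responses):
    """Return d, the proportion of "Yes" responses."""
    if not responses:
        raise ValueError("at least one response is required")
    try:
        values = [DIFFICULTY_VALUES[answer] for answer in responses]
    except KeyError as exc:
        raise ValueError(f"unknown response: {exc.args[0]!r}") from None
    return sum(values) / len(values)


def percent_difficulty(responses):
    """Return %D = 100 * d."""
    return 100.0 * difficulty_proportion(responses)


if __name__ == "__main__":
    # Hypothetical answers from four conversations: one reported difficulty.
    print(percent_difficulty(["Yes", "No", "No", "No"]))
```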

5.2 Opinion scales recommended by the ITU-T

Various five-point category-judgement scales may be used for different purposes. The layout and wording of opinion scales, as seen by subjects in experiments, is very important, and should follow the standard arrived at through years of experience. The following opinion scales are those most frequently used for ITU-T applications and equivalent wording should be used depending on language which might result in small variations to the original English text:

a) Listening-quality scale

Quality of the speech Score

Excellent 5

Good 4

Fair 3

Poor 2

Bad 1


b) Listening-effort scale

The heading of the listening-effort opinion scale is particularly important. Without it, the other descriptions are liable to be seriously misunderstood.

Effort required to understand the meanings of sentences Score

Complete relaxation possible; no effort required 5

Attention necessary; no appreciable effort required 4

Moderate effort required 3

Considerable effort required 2

No meaning understood with any feasible effort 1

c) Loudness-preference scale

Loudness preference Score

Much louder than preferred 5

Louder than preferred 4

Preferred 3

Quieter than preferred 2

Much quieter than preferred 1

CURRICULUM VITAE

Sushil Kumar Sharma Hekuli – 4, Mouli, Dang, Nepal


PERSONAL INFORMATION

Date of Birth : 18 August, 1971

Mailing Address : G.P.O. Box 4513,

Kathmandu, Nepal

EDUCATIONAL BACKGROUND

Master of Science in Information Technology : Saint Louis University

Baguio City, Philippines

August, 2003

Master's Degree in Economics : Tribhuvan University

Kathmandu, Nepal

June, 1994

Bachelor of Arts : Tribhuvan University

Kathmandu, Nepal

July, 1992

WORK EXPERIENCE

Administrative Officer : Shiva Nirman Company

Kalimati, Kathmandu, Nepal

January 1998 to October, 2000

Administrative Assistant : National Forensic Science Laboratory

Khumaltar, Lalitpur, Nepal.

February, 1996 to December, 1997
