APPLICATION OF CHROMA KEY AND LOW BIT-RATE ENCODING
TECHNIQUE IN INTERNET VIDEO STREAMING
A Thesis
Presented to the Faculty
of the Graduate Program in Information Technology
Saint Louis University
In Partial Fulfillment
of the Requirements for the Degree
Master of Science in Information Technology
by
Sushil Kumar Sharma
August 2003
ACKNOWLEDGEMENTS
The complexity of this paper required the help and input of many
capable people. Without their support, this paper could not have been completed.
In this regard, I extend my most heartfelt acknowledgement and appreciation to my adviser,
Mr. Daniel-Rey M. Bayog, who is also the Graduate Program Co-ordinator. He was always
willing to help and untiringly devoted his precious time to guiding and assisting me; he was
always there when I needed him. I would also like to take this opportunity to express my
sincere thanks to Engr. Angelito C. Peralta, Director of the Management of Information
System Office; Ms. Claire B. Berto; Mrs. Cecilia A. Mercado, Dean of the College of
Information and Computing Sciences; and the members of the advisory committee for their
encouragement, guidance, valuable suggestions and moral support. I also appreciate the
encouragement and support provided by the instructors of the College of Information and
Computing Sciences Departments and the MIS staff.
Special thanks also go to my colleague Deepak Kumar Shrestha, who was always
patient and understanding with my questions. I also wish to express thanks to my friends
and peer members:
Sanjay Kumar Hirachan
Ram Pulami
Dujendra Kumar Basnet
Prem Sagar Subedi
Yamuna Prasad Sah
Niraj Narsingh Rajbhandari
Niraj Man Shrestha
Narendra Pradhan
Sita Ram Kafle
Rajiv Gupta
Rabin Gupta
Arket Raj Maharjan
Pramod Tuladhar
Manish Pradhan
Keyoor Gautam
Roshan Lal Dhonju
Sudesh Shrestha and
the other Nepalese friends studying at SLU, UB, BCF and BSU who made my stay in
the Philippines pleasant, memorable and meaningful.
I also thank my classmates and friends Tabdi Fargo, Carmelo Fidel Rustia, Grace A. Babate,
Adelaida B. Laking, Dennis Tampoa, Christever Del Rosario and Teofilo Llanes, who
helped me during my study period in this university. I have a special place in my heart
for them.
I would like to dedicate this thesis to my parents, who have made many sacrifices
to allow me to be where I am today. My parents have taught me to value integrity,
determination and concern for other people, which I have found tremendously valuable. I
am very grateful for everything that they have provided to me. I also thank my brothers,
sisters and the other family members and relatives, who gave their unending love, financial and
moral support, encouragement, understanding and inspiration.
Last but not least, I gratefully acknowledge the direct encouragement and
support provided by Ms. Ladyluck Katter.
Above all, I thank the Almighty God for showering His unending blessings on me.
Sushil Kumar Sharma
TABLE OF CONTENTS
List of Tables
List of Figures
List of Appendices
CHAPTER
1 Introduction
1.1 Conceptual Framework
1.2 Statement of the Problem
1.3 Objectives of the Study
1.4 Scope and Delimitation of the Study
1.5 Significance of the Study
1.6 Definition of Terms
2 Study of Related Literature
2.1 Delivery of Video-Audio
2.1.1 Video-Audio Streaming
2.1.2 Congestion Control
2.1.3 Transport Protocol
2.1.4 Bandwidth
2.1.5 Chroma-keying Technique
2.1.6 AMOS (Active MPEG-4 Object Segmentation System)
2.2 Technology and Techniques Related to the Video and Audio
2.2.1 Streaming Media Technology
2.2.2 Live Intranet Distance Learning System using MPEG-4 over RTP/RTSP
2.2.3 Some Low Bit-rate Video Coding Technique
2.2.3.1 Model-based Techniques for Low Bit-rate Video Technique
2.2.3.2 Low Bit-rate Speech Coding
2.2.4 Codec
2.2.4.1 Quick Time
2.2.4.2 Real Audio Real Video
2.2.4.3 Windows Media Technology
2.3 A Quick Glance at Audio-Video Software and Hardware used in Developing Streaming Media
2.3.1 Capturing Video
2.3.1.1 IEEE 1394, Firewire iLink
2.3.1.2 Osprey-500
2.3.1.3 Pinnacle Video Capture Card
2.3.2 Editing and Encoding Tools
2.3.2.1 Final Cut Pro 4
2.3.2.2 Vegas Video 3.0
2.3.2.3 Adobe Premiere 6.5
2.3.2.4 MGI VideoWave 4.0
2.3.2.5 VideoMach 3.0
2.3.2.6 Windows Media Encoder 7.1
2.3.2.7 Helix Producer Basic
3 Solution Methods and Techniques
4 Presentation of Findings
4.1 Effective Delivery of Video and Audio
4.1.1 Packaging Video-audio and Presentation Material
4.2 Technologies and Techniques of Low bit-rate Video-Audio Encoding
4.3 Streaming Media Presentation Development
4.3.1 Hardware and Software used in the Development Procedure
4.3.2 Development Procedure for the Streaming Media Presentation
4.3.2.1 Recording and Capturing the Video (A)
4.3.2.2 Splitting the Video and Audio Extraction (B)
4.3.2.3 Video Positioning Required Before Superimposing the Presentation Slide (C)
4.3.2.4 Compositing the Slides and the Video of the Speaker (D)
4.3.2.4.1 Problems encountered when presentation slide and video file were treated separately
4.3.2.5 Low Bit-rate Video-Audio Encoding (E)
5 Summary, Conclusion and Recommendations
5.1 Summary of Findings
5.1.1 Visual Outcome at Various Stages of the Content Development
5.2 Conclusion
5.3 Recommendations
Bibliography
Curriculum Vitae
LIST OF TABLES
TABLE
1 Growth of Streaming Media Use
2 Recommended Streaming Rates
3 System Requirements for Encoding
4 (a) Listening-quality Scale
4 (b) Listening-effort Scale
4 (c) Loudness-preference Scale
5 Conversation Difficulty Scale
6 (a) Image Quality Scale
6 (b) Image Impairment Scale
6 (c) Double Stimulus Continuous Quality Scale
7 Listening-effort Scale
8 Image Impairment Scale
9 Comparative Data Between Original Non-streaming Media and Converted Streaming Media Formats
10 Summary of Converted Streaming Media Formats
11 Test Results for 56 Kbps Modem Users
12 Quality Assessment of Video-Audio and Bandwidth
13 Summary Results of the Different Codecs and Software
LIST OF FIGURES
FIGURE
1 An Architecture for Audio-Video Streaming
2 Conceptual Paradigm
3 Presentation Data Must Fit with Player's Bandwidth
4 (a) Architecture of the AMOS System
4 (b) Regions Inside the Object and Those Outside the Object Are Both Tracked over Time
5 Model-based Coding System
6 The Flow Chart of Streaming Media Presentation Development Procedure
7 Sample Video with Plain Background
8 Positioning the Video in Adobe Premiere 6.5
9 Output of Repositioning the Video
10 Superimposing PowerPoint Slide on the Speaker's Video
11 Superimpose of the PowerPoint Slide in the Back of the Video
12 Encoding the Video and Audio in Windows Media Encoder
13 After Encoding the Video File in .wmv Format
14 Video Frame in the Original Digital Movie
15 Output of Repositioning the Video
16 Superimpose of the PowerPoint Slide in the Back of the Video
17 After Encoding the Video File in .wmv Format
LIST OF APPENDICES
APPENDICES
A Sample Video Frame Encoded in Windows Media Encoder
B Sample Video Frame Encoded in Vegas Video
C Sample Video Frame Encoded in Helix Producer Basic and Vegas Video
D Sample Video Frame Encoded in Quick Time Pro and Vegas Video
E Methodology For the Subjective Assessment of the Quality of Television Pictures
F Methods for Subjective Determination of Transmission Quality
THESIS ABSTRACT
1. Title: APPLICATION OF CHROMA KEY AND LOW BIT-RATE ENCODING TECHNIQUE IN INTERNET VIDEO STREAMING
Total Number of Pages: 127
Text Number of Pages: 88
2. Author: SUSHIL KUMAR SHARMA
3. Type of Document: Thesis
4. Type of Publication: Unpublished
5. Host / Accrediting Institution: Saint Louis University (Private), Bonifacio Street, Baguio City, CHED-CAR
6. Sponsor (for funded research): not applicable
7. Keywords: LOW BIT-RATE ENCODING, CHROMA KEY, VIDEO-AUDIO STREAMING
8. Abstract
8.1 Summary: The research presented an application of chroma-key and low bit-rate encoding techniques in Internet video streaming over a 56 kbps connection. For delivering streaming video-audio over low-bandwidth communication channels, the low bit-rate encoding technique is effective for normal dial-up modem users. The chroma-key technique is used to combine the presentation material and the video-audio in a single frame. The design of the study is experimental. The data gathering tools are a 5-minute video-audio clip, encoding software and low bit-rate codecs.
8.2 Findings: The study provided an application of chroma-key and low bit-rate encoding techniques in Internet video streaming. It is possible to produce streaming media using the low bit-rate encoding technique that delivers video-audio effectively to users with a normal dial-up modem connection. The chroma-key technique is helpful for packaging video-audio and presentation slides in a single frame. There are three major types of streaming media formats: (a) Windows Media Video Audio (.wmv), (b) Real Video Audio (.rm) and (c) Quick Time (.mov). Windows Media Encoder, Helix Producer, Vegas Video and Quick Time are the major encoding software packages used to develop streaming media.
8.3 Conclusions: The low bit-rate encoding and chroma-key techniques can be used to deliver video-audio and presentation material effectively. The chroma-key tool can be used to package video-audio and presentation material in a single frame for delivery from a web server. Based on a comparison of video-audio codecs and software, the combination of MPEG-4, ACELP.net and Windows Media Encoder is the best, and the combination of Real Video 9, Real Audio 8 and Helix Producer Basic is the second best.
8.4 Recommendations: Some limitations of this study, such as the unsmooth edges of the speaker's video and the reduced audio quality, can be improved by taking the video footage in a well-lit room with good sound equipment, as found in studios. It is highly recommended that other researchers study the feasibility of using this type of content. Laboratory testing of video-audio quality is recommended. To obtain better quality of presentation material, it is highly recommended to use real-time mixing technology with the audio-video and presentation material.
CHAPTER 1
Introduction
Internet technology is changing at a rapid pace. The faster the technology
changes, the more people expect from the Internet. Users were once
satisfied with text and still images on their web pages, but now they want video
delivered at high speed. They expect television quality and are dissatisfied when they
see anything less. Because of bandwidth limitations, that reality is still several years
away, but Internet technology continues to change daily. Internet streaming video is
one way to deliver video over the Internet. With streaming video, many daily
organizational tasks become simpler and cheaper. Designers can broadcast lectures, make
announcements, deliver seminars and show users how certain things work. Users can
view the content live, which satisfies some of their demand for fast, high-quality video.1
Many large organizations use the technology to broadcast annual meetings
so that remote offices can view the meeting live, while those who have prior
commitments can record the meeting and view it at a later date. Beyond meetings
in the corporate setting, Internet streaming is also used for training
purposes. Internet streaming video allows training applications to be
administered to a broad, geographically dispersed audience simultaneously, cutting the
costs associated with training.2
Another large application of streaming video is distance learning. This application is
spreading quickly throughout the educational system. Internet classes are administered
using this technology. Many large universities broadcast lectures and
archive them so that students who miss a class can view the archives at their
convenience on their own computers. Micke O, Donoghu et al.3 concluded that the use of
PowerPoint material indexed to the presentation created a dynamic presentation
environment in which each medium (text, graphics, audio and video) was designed to
complement the others. Many users commented on the innovative use of the browser
within the presentation, which not only provided a sense of personal involvement but
also brought together other resources related to the discussion.
The chroma-key is a well-known video mixing technique, practiced for many
years in television production, for inserting foreground action shot at a different
location into a selected background scene. Initially, the main reason for using this
technique was the need of artistic directors to show people in places where filming
is hard to achieve. Later, its economic advantages also encouraged film makers to use
the chroma-key. The next step was the use of synthetically generated
backgrounds, first hand-constructed and painted and later generated by computers, to put
actors into locations that do not exist in reality. The chroma-key is also the basis for
performing the mixing in a virtual studio. However, one essential requirement for
virtual studio construction is the ability to combine foreground and background images in
such a way that actors in the foreground image can be covered by set components from
the background image.
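The keying operation described above can be illustrated with a minimal sketch. It is not the procedure used in this study, which relied on a video editing package; the function name `chroma_key`, the colour-distance test and the threshold value are assumptions made only for illustration.

```python
def chroma_key(foreground, background, key_color=(0, 255, 0), threshold=100):
    """Composite foreground over background, replacing pixels near key_color.

    Images are lists of rows; each pixel is an (R, G, B) tuple.
    The distance test and the threshold are illustrative assumptions.
    """
    if len(foreground) != len(background):
        raise ValueError("images must have the same height")
    output = []
    for fg_row, bg_row in zip(foreground, background):
        if len(fg_row) != len(bg_row):
            raise ValueError("images must have the same width")
        out_row = []
        for fg_px, bg_px in zip(fg_row, bg_row):
            # Squared Euclidean distance between the pixel and the key colour.
            dist_sq = sum((c - k) ** 2 for c, k in zip(fg_px, key_color))
            # Pixels close to the key colour are treated as backdrop.
            out_row.append(bg_px if dist_sq <= threshold ** 2 else fg_px)
        output.append(out_row)
    return output


# Tiny 1x3 example: green backdrop, a skin-toned pixel, near-green backdrop.
speaker = [[(0, 255, 0), (200, 150, 120), (10, 250, 5)]]
slide = [[(255, 255, 255), (255, 255, 255), (0, 0, 128)]]
composite = chroma_key(speaker, slide)
```

In this toy frame the two backdrop pixels are replaced by the slide and the speaker pixel is kept. A hard threshold like this one tends to leave rough edges around the subject; production tools usually soften the matte instead.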
Other applications include those associated with websites. Many web developers
are incorporating streaming video-audio into websites to capture the attention and time of
visitors to their sites.
Until recently, audio and video on the web were primarily a download-and-play
technology. An entire media file had to be downloaded before it could play. It was
like pouring milk into a glass first and then drinking it. Because media files are
usually very large and take a long time to download, the only content found on the web
consisted of clips lasting 30 seconds or less. Even these files could take 20 minutes or longer to
download.4 It has become evident that the future of digital media on demand is bright.
Educators are presented with new possibilities but also face new challenges.
However, as bandwidth becomes less problematic and compression technology
more efficient, it is clear that streaming media will become a significant delivery mode
for video-audio instruction as well as presentation materials.
Beyond traditional applications such as interactive terminals (telnet), bulk file transfers
(email, FTP) and the World Wide Web, the Internet is becoming an attractive medium for a broader
spectrum of applications. Applications that rely on the real-time delivery of data,
such as video-conferencing tools, Internet telephony, and streaming audio and video players,
are gaining prominence in the Internet application space. In particular, the streaming
media application has considerable potential to change the way people watch video.
While most people today think of sitting in front of a television to watch a movie, the
ability to deliver high-quality streaming video would allow the Internet to compete with
traditional modes of video content distribution.
Using the Internet as a medium for the transmission of real-time interactive video is a
challenging problem. Although recent efforts have achieved some progress
in streaming media delivery, today's solutions are proprietary and
inflexible, and do not provide the user with a pleasant viewing experience.5 In general,
current streaming video applications deliver low-quality pictures and require a large
amount of buffering. Therefore, they neither allow high user interactivity nor respond
well to changing conditions on the Internet. Because of this problem, end users at remote
locations cannot view video presentations clearly on their desktops, especially those
who have a 56 kbps connection.
The importance of visual communication has increased tremendously in the last
decade. The progress in micro-electronics and computer technology, together with the
creation of networks operating with various channel capacities, is the basis of an
infrastructure for a new era of telecommunications. New applications are preparing a
revolution in the everyday life of our modern society. Emerging applications such as
videoconferencing, cellular videophones and multimedia will have a great impact on
today's professional life, education and entertainment. The digital representation of
visual information in its canonical form leads to a huge amount of data. In order to
meet the requirements of the new applications, powerful image sequence compression
techniques are needed to drastically reduce the global bit rate.
A number of standards have been defined for the compression of visual
information. The JPEG still image compressor was proposed by the Joint Photographic
Experts Group and is a general-purpose image compression standard. The Moving
Picture Experts Group (MPEG) standards address the compression of video signals.
MPEG-1 operates at bit rates of about 1.5 Mbit/s and targets storage and transmission
over communication channels such as the integrated-services digital network (ISDN) or the
local area network (LAN). MPEG-2 operates at bit rates around 10 Mbit/s and is
designed for the compression of higher resolution video signals. The recommendation
H.261 was proposed by the International Telegraph and Telephone Consultative
Committee. Based on this standard, videoconferencing at bit rates of 64 kbit/s has become
feasible. This requires the capacity of one ISDN channel. In the near future,
modern visual communication applications will become possible for the general public. For
that objective, the transmission media must switch to Public Switched Telephone
Networks (PSTN) or mobile channels. The transmission of video sequences at bit
rates as low as 9.6 kbit/s will be strongly needed. Efforts to define new standards for
these applications are still in the beginning phase. Several expert groups have been
created to pursue this objective. The major ones are ISO/MPEG-4 and ITU-T/H.26P.6
An uncompressed video sequence for very low bit-rate applications typically
requires a bit-stream of up to 10 Mbit/s. To reach the very low data rates
needed by the general public, compression ratios of about 1000:1 are required.7
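The ratio quoted above follows from the two bit rates given in the text (a source of up to 10 Mbit/s and a target of 9.6 kbit/s). The short calculation below is only a check of that arithmetic; the function and variable names are illustrative.

```python
def compression_ratio(source_bps, target_bps):
    """Return how many times the source bit rate must be reduced."""
    if target_bps <= 0:
        raise ValueError("target bit rate must be positive")
    return source_bps / target_bps


uncompressed_bps = 10_000_000  # up to 10 Mbit/s, as stated in the text
target_bps = 9_600             # 9.6 kbit/s PSTN/mobile target from the text

ratio = compression_ratio(uncompressed_bps, target_bps)
print(f"required compression ratio is about {ratio:.0f}:1")
```

The result is slightly above 1000:1, which agrees with the order of magnitude stated in the paragraph.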
Today's streaming applications are closed and proprietary; the emerging MPEG-4
standard has gained some acceptance and appears promising as an open standard for Internet
video.8 MPEG-4 is believed to have the potential to make significant inroads as
the preferred streaming media format over the next few years because of its superior
compression, its ability to code individual objects in a video stream, and the increasing
interest in it within the industry. It also incorporates several feature-based schemes for very low
bit-rate coding: the FDP and FAP sets (Facial Definition and Facial Animation Parameters).
These can be used, for example, for real videophone communication requiring lower
bandwidth than was ever imaginable with traditional stream-coding techniques. Feature-based
media coding is not yet widely available and is unlikely to become popular except in a
limited number of application areas.
In the context of delivering video-audio and presentation material over the Internet,
users at remote locations expect a rich viewing environment and a pleasing
presentation on their desktops. With streaming media, quick playback is possible
because the player can start playback without waiting for the whole media file to
be downloaded to local storage. To resolve the issue of quick playback over
today's narrow communication channels, low bit-rate audio-video coding is a natural
choice.
The researcher focused on a presentation content development procedure that provides a
rich viewing environment and delivers the material under low-bandwidth conditions
while maintaining reasonable quality of the slides on the client's desktop at the
remote location. The procedure is coupled with streaming media technology to deliver the
presentation material and audio-video over the existing web server infrastructure.
1.1 Conceptual Framework
Recent advances in computing technology, compression technology, high-
bandwidth storage devices and high-speed networks have made it feasible to provide real-
time multimedia services over the Internet. Real-time multimedia, as the name implies,
has timing constraints. For example, audio and video data must be played out
continuously. If the data do not arrive in time, the playout process will pause, which is
annoying to human ears and eyes.
Real-time transport of live video or stored video is the predominant part of real-
time multimedia. There are two modes for the transmission of stored video over the Internet,
namely the download mode and the streaming mode (i.e., video streaming). In the download
mode, a user downloads the entire video file and then plays it back. However,
full file transfer in the download mode usually suffers from a long and perhaps unacceptable
transfer time. In contrast, in the streaming mode the video content need not be
downloaded in full, but is played out while parts of the content are being received
and decoded. Due to its real-time nature, video streaming typically has bandwidth, delay
and loss requirements. However, the current best-effort Internet does not offer any
quality of service (QoS) guarantees to streaming video.9 In addition, it
is difficult to support multicast video efficiently while providing the service
flexibility needed to meet a wide range of quality of service requirements from users. Thus,
designing mechanisms and protocols for Internet streaming video poses many
challenges.
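The difference between the two modes can be made concrete with a rough estimate of how long a viewer waits before playback starts. The sketch below is illustrative only: the encoded bit rate and the start-up buffer length are hypothetical values, and the nominal 56 kbps modem rate is treated as fully available, which a real dial-up line does not achieve.

```python
def download_wait_seconds(file_size_bytes, link_bps):
    """Wait before playback when the whole file must arrive first."""
    return file_size_bytes * 8 / link_bps


def streaming_wait_seconds(stream_bps, link_bps, buffer_seconds):
    """Wait needed to pre-fill a playout buffer of buffer_seconds of media."""
    if stream_bps > link_bps:
        raise ValueError("stream bit rate exceeds the link capacity")
    return buffer_seconds * stream_bps / link_bps


LINK_BPS = 56_000        # nominal dial-up modem rate considered in this study
CLIP_SECONDS = 5 * 60    # a five-minute clip
STREAM_BPS = 34_000      # hypothetical encoded rate below the link rate
BUFFER_SECONDS = 5       # hypothetical start-up buffer

file_size = STREAM_BPS * CLIP_SECONDS // 8   # bytes in the encoded clip
download_wait = download_wait_seconds(file_size, LINK_BPS)
streaming_wait = streaming_wait_seconds(STREAM_BPS, LINK_BPS, BUFFER_SECONDS)
print(f"download mode:  {download_wait:.0f} s before playback")
print(f"streaming mode: {streaming_wait:.1f} s before playback")
```

Under these assumptions the download mode waits about three minutes while the streaming mode waits about three seconds. The streaming figure holds only while the encoded rate stays below the rate the link can sustain.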
Figure 1 shows an architecture for audio-video streaming. Raw video and audio
data are pre-compressed by video compression and audio compression algorithms and
then saved in storage devices. Upon the client's request, a streaming server retrieves
compressed video/audio data from storage devices, and the application-layer quality
of service control module adapts the video/audio bit-streams according to the network
status and quality of service requirements. After the adaptation, the transport protocols
packetize the compressed bit-streams and send the video/audio packets to the Internet.
Packets may be dropped or may experience excessive delay inside the Internet due to
congestion. To improve the quality of video-audio transmission, continuous media
distribution services (e.g., caching) are deployed in the Internet. Packets that are
successfully delivered to the receiver first pass through the transport layers and are then
processed by the application layer before being decoded at the video/audio decoder.
To achieve synchronization between video and audio presentation, media synchronization
mechanisms are required. As Figure 1 shows, the six areas (video compression; application-layer
QoS control; continuous media distribution services; streaming servers; media
synchronization mechanisms; and protocols for streaming) are closely related and
are coherent constituents of the video streaming architecture.
Figure 1
An Architecture for Audio-Video Streaming
[Diagram: on the streaming server, raw video and raw audio pass through video and audio compression into a storage device holding compressed video and compressed audio, then through application-layer QoS control and transport protocols to the Internet; on the client/receiver, packets pass through transport protocols and application-layer QoS control to the video and audio decoders, which are linked by media synchronization.]
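The packetization step in this architecture can be sketched as follows. This is a simplified illustration, not the format of any actual transport protocol: the packet fields, the payload size and the function names are assumptions chosen to show why sequence numbers and timestamps travel with each packet.

```python
from collections import namedtuple

# Hypothetical packet layout: the sequence number lets the receiver detect
# loss and reordering; the timestamp ties the payload to its playout time.
MediaPacket = namedtuple("MediaPacket", ["sequence", "timestamp_ms", "payload"])


def packetize(bitstream, timestamp_ms, payload_size=500, first_sequence=0):
    """Split one compressed frame into packets (the last may be shorter)."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    packets = []
    for offset in range(0, len(bitstream), payload_size):
        packets.append(MediaPacket(
            sequence=first_sequence + len(packets),
            timestamp_ms=timestamp_ms,
            payload=bitstream[offset:offset + payload_size],
        ))
    return packets


def reassemble(packets):
    """Receiver side: order by sequence number and join the payloads."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return b"".join(p.payload for p in ordered)


frame = bytes(range(256)) * 5                    # 1280 bytes standing in for a frame
packets = packetize(frame, timestamp_ms=40)
restored = reassemble(list(reversed(packets)))   # simulate out-of-order arrival
```

Here the 1280-byte frame is split into three packets, and the receiver restores the original bytes even though the packets arrive in reverse order. A dropped packet would appear as a gap in the sequence numbers, which is the information the application layer needs in order to react to loss.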
Since the early history of computing, the main medium of communication with the
computer has been the console (screen). The input and output of text and bitmaps are natively
incorporated in the computer system. It is well understood that anything that can be
displayed on the monitor can also be extracted from the screen and saved in a file.
Based on this idea, business presentation materials such as presentation slides can be
captured from the monitor and saved in different file formats.
Traditional media contents (audio/video such as VCD, VHS and BETAMAX movies)
are not optimized for delivery over the Internet because, at the time they were developed,
the Internet was not so popular and those formats were designed solely with specialized
hardware devices in mind. Now, with the rapid growth in multimedia technology, those traditional formats
can be captured in digital form and converted to any format we wish.
Figure 2
Conceptual Paradigm
[Diagram: Input: non-streaming media (presentation material; raw video and audio). Process: procedure in developing the presentation with streaming media technology (using low bit-rate encoding technique and chroma-key tool, based on an acceptable data rate in a 56 kbps modem). Output: streaming media with reasonable quality of video-audio and presentation slide; technologies and techniques of streaming media; procedure of developing streaming media.]
Due to the increased popularity of the Internet for communication and business,
and due to recent developments in information technology, large uncompressed
media files can be compressed to save disk space without compromising
the quality of the media content. They can also be converted to a
streaming format, which reduces waiting time on the client side.
High demand from the film industry for the computer as a tool to generate special
effects has produced many technologies related to multimedia, including rich
graphical interfaces, sound and video. Professional computer tools are now available
for producing special effects such as mixing, animation and sound effects. Among mixing
technologies, the chroma-key and the luma-key are well-known techniques for
superimposing one graphical content over another.
The researcher aims to develop a procedure for applying the chroma-key and the
low bit-rate encoding technique in Internet video streaming. This conceptual framework
is presented diagrammatically in the conceptual paradigm shown in Figure 2.
The framework is based on an input-process-output model, in which the non-
streaming media (composed of the raw video and audio and the presentation slide) is the
input to the study. The process (which is determined in the study) is the procedure that
converts the non-streaming media to streaming media, which is the output of the study.
Different low bit-rate encoding techniques as well as the chroma-key tool are used in the
conversion. These streaming media technologies serve as the independent variables.
The resulting streaming media is evaluated and compared in terms of video and audio
quality as well as the acceptability of the data rate over a 56 kbps modem. These
measurements are the dependent variables in the study.
In the delivery of video-audio and presentation material over the existing web infrastructure,
remote clients expect a rich viewing environment and a pleasing presentation. Streaming
media technology using the low bit-rate encoding technique resolves the issue of
quick playback, since the player starts playing without waiting for the
whole media file to be downloaded on the client side, while the chroma-key tool is the
mixing technology that superimposes the video over the presentation material. In this study,
the researcher focused on the delivery of video-audio and presentation material in a single
frame using the low bit-rate encoding technique and the chroma-key tool for a normal dial-up
connection of 56 kbps.
A simplistic approach to low bit-rate video coding degrades the quality of
the presentation video, especially the slides being explained by the speaker. The
researcher focused on a presentation content development procedure that provides a rich viewing
environment and delivers the material through a normal dial-up connection while
maintaining reasonable quality of the slides on the client's desktop at the remote
location. Thus, the client must meet minimum hardware, software and network
transmission requirements, which are the controlled variables, in order to play the video and audio on
the client's desktop. These variables influence the delivery of video-audio
over the network. The client needs at least:
• A 56 kbps Internet connection,
• A Pentium 166 MHz processor or equivalent,
• A minimum of 32 MB RAM,
• A 16-bit sound card (any type),
• Speakers,
• A color monitor with a color depth of 16 bits,
• A 256-color video display card,
• A screen resolution of at least 640 x 480 pixels,
• Any streaming video-audio player that supports the file format and codec version.
1.2 Statement of the Problem
Today’s video streaming technology provides various means to preserve content in
archives and reuse it whenever needed. However, most of the streaming technology
exploited so far targets entertainment media and is designed for high-bandwidth
connections. From this scenario, it is clear that most people with regular low-bandwidth
connections (usually up to 56 kbps) cannot enjoy this technology.
Given the current economic condition of people, especially in developing countries,
it is hard to afford and maintain a personal computer and Internet connection for
personal use. Even among the few who can afford it, most have dial-up connections of
up to 56 kbps. Technologies like DSL and ISDN are mostly used only by Internet Service
Providers (ISPs), Internet cafés and other businesses. It is not feasible for a person to
maintain this type of connection for personal use due to its high cost. Even though
businesses use high-speed connections, the connection speed experienced by the end user
is usually slow (depending on varying conditions) because the single connection is shared
among as many users as it can carry.
Keeping this scenario in mind, if an end user tries to watch streaming video
content intended for a high-speed connection, the following anomalies would be
encountered:
1. Video playback will be jerky due to the speed limitations.
2. Sometimes a required video frame may not arrive on time, and the picture may go
blank during playback.
3. If the end user’s computer is below the standard for video playback (in memory,
processor speed, etc.), the computer may take time to decode the high-resolution
video, causing rough and slow playback.
4. Sometimes all the anomalies mentioned could be encountered even if the
requirements are met, because network congestion might occur due to the growing
number of users connected to the Internet.
From the points just mentioned, it is natural to look for a solution to this
problem and to experiment with the available technologies to overcome these kinds of
problems. We naturally ask:
1. How can video-audio and presentation material be packaged into a single frame and
be delivered effectively over a normal dial-up connection?
2. What are the best technologies and techniques for developing streaming media that
use a low bit-rate encoding technique and the chroma-key tool?
3. What is the procedure for developing a presentation using streaming media
technology?
1.3 Objectives of the Study
This research aims to provide quality video-audio presentation material to
clients via the web infrastructure using available technology. Specifically, its main
objectives are as follows:
1. To develop a reasonable-quality presentation that contains video and audio
together in a single frame through streaming media technology.
2. To investigate and experiment with low bit-rate encoding and chroma-key
technologies and techniques for developing streaming media containing video-audio
and presentation material.
3. To define a procedure for developing a presentation using streaming media
technology.
1.4 Scope and Delimitation of the Study
This study centered on the issue of uninterrupted playback while maintaining
reasonable quality of the slides on the remote client’s desktop with video streaming,
which here refers to stored video for users at remote locations.
This research mainly focused on the normal dial-up Internet connection. It was
implemented on a Windows platform due to its friendly interface and the fact that it is
widely used throughout the world. This study was tested for research purposes only.
Furthermore, this study did not explain the technical aspects of uploading video-audio
and presentation material on the server side.
1.5 Significance of the Study
There is no doubt that a revolution in technology-enhanced educational practices
is upon us. Institutions worldwide are leveraging new technologies in Internet audio and
video to maximize the impact of professors’ time, energy, creativity and intellect to reach
students in more efficient and effective ways.
This study is mainly beneficial to professionals and students who are
involved in distance education. A student at a remote location can clearly view the
classroom lecture video and the presentation slides of the lecture on his/her desktop.
This research is also important to business firms for marketing, because it reduces
download time and delivers reasonable-quality presentation slides, thus saving time and
effort in publishing streams on a website. Providing on-demand information is a powerful
marketing tool that can help businesses communicate.
This study will also benefit research activities on streaming media and
distance education. It may serve as an introductory piece for interested researchers in
the said areas.
1.6 Definition of Terms
Terms used throughout the study are as follows:
ACELP: Algebraic-Code-Excited Linear Prediction (ACELP) technology is a
standard in low-bit rate speech compression within wireless and Internet applications.
AMOS: Active MPEG-4 Object Segmentation System (AMOS), which
effectively combines an automatic region segmentation method with an active method for
defining and tracking video objects at a higher level.
Annoying: Video and text that are not pleasant to watch.
Bit-rate: The rate at which the encoded bitstream is delivered from the storage
medium to the input of a decoder.
Bandwidth: Amount of data a given connection can pass in a given amount of time.
Blur: The video image becomes less clear or distinct; marks on the video make it
unclear.
Buffering: The situation which occurs when a streaming media player is saving
small portions of a streaming media file to local storage for playback.
Capture: The process of digitizing and recording audio and video content from an
analog format.
Chroma-key: A video mixer-based electronic effect, in which a second image
source is substituted for a color (or range of shades within a color) within a video shot.
Compression: Process by which files are reduced in size to save space by the
removal of redundant or less important data. Compressed media files are then
decompressed on the user's end.
Codecs: Compress audio and video files into streaming media files by removing
redundant data, and decompress them for playback.
Effective Delivery: Delivering the video-audio and presentation material over a
56 kbps connection line while maintaining reasonable quality of the presentation slides
for remote users.
Encoding: Compressing audio and video to turn it into a streaming media format.
Frame Rate: Number of video frames displayed per second.
HTTP: The HyperText Transfer Protocol is an application level protocol designed
for distribution of hypertext and multimedia documents over the World Wide Web.
Imperceptible: The degradation of quality of video is less noticeable.
Jerky: Not in smooth motion, or interruptions occur while playing video-audio.
Jitter: The variable network latency, normally caused by the queuing of packets
at each router between source and destination.
Low bit-rate video coding: The video compression technique which outputs a
coded video stream of not greater than 64 kbps bit-rate.
MPEG: The Moving Picture Experts Group (MPEG) standards are used for coding and
compressing video.
Multimedia Presentation: The integrated presentation of text, audio and video in
a single frame.
Network Congestion: When network equipment, including switches and routers,
experiences more traffic than it can handle, performance degrades. This
situation is called network congestion.
Non-Streaming Media: Video-audio that cannot be played on the desktop without
downloading the entire file over the Internet or other networks.
QoS: Quality of Service (QoS) provides guarantees on the ability of a network to
deliver predictable results. Elements of network performance within the scope of QoS
often include availability (uptime), bandwidth, latency, and error rate.
Reasonable Quality: It means users can read text, watch the video and listen
to the audio without any extra effort.
Real-Time Delivery of Data: If the actual rate of data delivery matches the rate of
data consumption, then it is called real-time delivery of data.
RTSP: The Real Time Streaming Protocol (RTSP) is an application-level
protocol for control over the delivery of data with real-time properties. RTSP provides an
extensible framework to enable controlled, on-demand delivery of real-time data, such as
video and audio from live data and stored clips. The protocol is intended to control
multiple data delivery sessions, provide a means for choosing delivery channels such as
UDP (User Datagram Protocol), multicast UDP and TCP (Transmission Control
Protocol), and provide a means for choosing delivery mechanisms based upon RTP
(Real-time Transport Protocol).
Superimposing: The process of combining two images to yield a resulting,
enriched image.
Streaming Media: The process of sending media over the Internet or other network,
allowing playback on the desktop as the video is received, rather than requiring that the
entire file be downloaded prior to playback.
Streaming Video: A sequence of moving images, with sound, that are sent in
compressed form over the Internet and displayed by the viewer as they arrive.
TCP: Transmission Control Protocol (TCP) is a reliable, connection-oriented,
end-to-end protocol. It breaks data into chunks that never exceed 64 KB and sends each
as a separate IP datagram.
UDP: User Datagram Protocol (UDP) is an unreliable, connectionless
transport protocol. There is no ordering of packets, no retransmission of lost or damaged
packets, and no splitting of data into packets.
NOTES
1“A review of Video Streaming over the Internet”. SuperNOVA Project. DSTC Teaching Report. <http://archieve.dstc.edu.au/RDUstaff/Jane-hunter/videostreaming.html>, 10 August 1997. Accessed in August, 2002.
2Cunningham, David and Francis, Neil. “An introduction to Streaming Video”. Cultivate Interactive, Issue 4, <http://www.cultivate-int.org/issue4/video/> 7 May, 2001. Accessed in August, 2002.
3Micke O,Donoghu, and et al., The Role of Streaming Media in the delivery of distance learning, Pre-Submission Draft – Lancaster University, August 2000.
4Streaming Methods. Web Server Vs. Streaming Media Server. <http://www.microsoft.com/Windows/windowsmedia/compare/webservvstreamserv.asp> Last updated Thursday, March 21, 2001. Accessed in August, 2002.
5Microsoft Windows Media Player <http://www.microsoft.com/windows/mediaplayer> June 2002. Accessed in July, 2002.
6O. Egger, and et al., Very Low Bit Rate Coding of Visual Information, Swiss Federal Institute of Technology at Lausanne, Switzerland. 1996.
7T. Ebrahimi, and et. al., New Trends in Very Low Bitrate Video Coding, Proceedings of the IEEE, July 1995.
8Divx Networks. <http://divxnetworks.com/,2001> (2001). Accessed in September, 2002.
9Dapeng Wu, and et al., Streaming Video over the Internet: Approaches and Directions, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 1. February 2001.
10Micke O,Donoghu, Michael Barber and Steve Childs, op. cit.
CHAPTER 2
Study of Related Literature
Video-audio streaming technology is important to individuals and organizations,
especially remote users. It is an emerging technology for delivering video-audio and
presentation material over the Internet. This chapter reviews the materials related to
streaming media, chroma-key and low bit-rate encoding techniques. Section 2.1 presents
the delivery of video-audio, Section 2.2 discusses the technologies and techniques related
to video and audio, and Section 2.3 takes a quick look at the audio-video software and
hardware used in developing streaming media.
2.1 Delivery of Video-Audio
2.1.1 Video-Audio Streaming. An emerging, less expensive alternative to full
broadcast video is “streaming” sound and video over the Internet. The video-audio is
streamed to the computer and displayed on the screen without needing to be saved
onto its hard drive, as most hard drives are not large enough to store a whole video
programme. Streaming video-audio can also be broadcast simultaneously around the
world (called streaming or webcasting), or archived and accessed as required.
Video-audio streaming technology has become of interest and importance to the
educational community. Such interest can be explained by widespread increases in
bandwidth and computing power, but it may also be fueled by institutional needs to
increase income by widening participation through the use of information and
communication technology, to realize lifelong learning policies, or perhaps to develop
cost-effective flexible resources to support off-campus learning.
By using streamed media, it is possible for a user with a suitable computer, web
browser and media player to receive live and recorded video and audio materials without
the need to download large video and audio files or have an ISDN connection. Microsoft
has made use of this technology to support users and developers through a number of
live information and Q&A sessions; the BBC and ITN currently make use of it on their
web sites to show news clips. Television has been used to support educational activities
since the 1950s and 1960s, but the cost of producing television programmes has been
prohibitive to many educational establishments1. Satellite broadcasting has been used to
support a range of learning activities across the world, and though transmission uplink
and equipment reception costs have continued to decrease, the cost of producing
programmes and materials has remained a barrier to widespread use2. Using broadcast
materials within a networked learning framework has been reported to motivate and
attract adult learners, though there are issues surrounding pedagogic design and
embedding educational practices which need careful consideration.3
Streaming media is a new technology that has entered the distance learning arena.
The technical requirements and issues of production have overshadowed the important
issues related to change in the educational community. How we use it will determine its
future as a viable delivery method for distance learners.
“Integrating a pedagogical framework based on well-established learning theories
through the design of a Web-delivered classroom encounter is key to establishing many
potential roles of streaming media as a delivery method”4. We need to develop
“appropriate instruction technology frameworks” for streaming media which will lay the
“foundation for educational change and deep learning”.5
In addition, Sircar6 argues that streaming media via the Web can remove the
differences between learning onsite and learning online, and that the power of the stream
to deliver anytime, anywhere learning makes it economically viable. The traditional
lecture-test-homework paradigm does not exist when instructional technologies are used
to enhance problem-solving skills, collaboration, and interaction. Video technologies can
be examined within the framework of learner-centered principles. Streaming allows us to
restructure the delivery of content by creating small units of instruction that can reflect
best practices.
Streaming media is pervasive on the Internet now and is continuing to grow
rapidly. Most streaming media systems have adopted the broadcast model. As shown in
Table 1, the share of enterprises using streaming media rose from 9% in 1998 to 17% in
1999 and to 30% in 2000.
Table 1
Growth of Streaming Media Use6
Year          Growth Rate
1998          9% of enterprises
1999          17% of enterprises
2000          30% of enterprises
Until 2004    Streaming Media Services will grow 20 fold to $2.5 billion.
2.1.2 Congestion Control. Loss and excessive delay have a devastating effect on
video presentation quality, and they are usually caused by network congestion. Thus,
congestion control mechanisms at end systems are necessary to help reduce packet loss
and delay.
Congestion control takes the form of rate control7. Rate control is a technique
used to determine the sending rate of video traffic based on the estimated available
bandwidth in the network. There are three kinds of rate control: source-based,
receiver-based and hybrid. Source-based rate control is suitable for unicast;
receiver-based and hybrid rate controls are suitable for multicast video8.
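As a rough illustration of source-based rate control, the sketch below adjusts a sender's rate from the loss ratio reported by the receiver, using additive increase and multiplicative decrease (AIMD). AIMD is one common probing scheme and is assumed here for illustration; it is not taken from the cited works. The function name, the 2% loss threshold and the 5 to 34 kbps bounds are hypothetical values chosen with a dial-up client in mind.

```python
def next_send_rate(current_kbps, loss_ratio, *, loss_threshold=0.02,
                   increase_kbps=1.0, decrease_factor=0.5,
                   min_kbps=5.0, max_kbps=34.0):
    """Return the sending rate for the next feedback interval.

    Hypothetical AIMD rule: probe upward slowly while loss stays low,
    back off sharply when the receiver reports congestion.
    """
    if not 0.0 <= loss_ratio <= 1.0:
        raise ValueError("loss_ratio must be between 0 and 1")
    if loss_ratio <= loss_threshold:
        candidate = current_kbps + increase_kbps    # additive increase
    else:
        candidate = current_kbps * decrease_factor  # multiplicative decrease
    # Keep the rate inside the assumed operating range.
    return min(max_kbps, max(min_kbps, candidate))
```

With these assumed defaults, a 20 kbps stream with no reported loss would move to 21 kbps, while the same stream with 10% loss would drop to 10 kbps.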
2.1.3 Transport Protocol9. Streaming audio and video packets can be delivered
using several transport protocols, each with some advantages and disadvantages.
UDP is the best choice in most cases, because of its lack of retransmissions and
data-rate management. This is ideal for transmitting real-time audio and video data,
which can tolerate some lost packets and need a steady stream of data. Most streaming
servers and proxies implement intelligent retransmission schemes on top of UDP, so that
only lost packets that can be sent to the client in time to get played are retransmitted.
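The idea of retransmitting only those packets that can still arrive in time can be sketched as a simple deadline check. The sketch below takes the receiver's point of view and assumes that a retransmission costs roughly one round-trip time (the request travels to the server and the packet travels back). The function name, parameters and the 50 ms safety margin are hypothetical; real servers and players use their own schemes.

```python
def should_request_retransmission(now_s, playout_deadline_s, rtt_s,
                                  safety_margin_s=0.05):
    """Decide whether asking for a lost packet again is still worthwhile.

    All times are in seconds on the receiver's clock. The packet is only
    useful if it can arrive before the moment it is due to be played.
    """
    expected_arrival_s = now_s + rtt_s + safety_margin_s
    return expected_arrival_s < playout_deadline_s
```

For example, with a 0.3 s round trip, a packet due in 0.5 s is worth requesting again, while one due in 0.2 s is skipped and the player conceals the loss instead.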
TCP provides an adequate, though not necessarily efficient, protocol for
delivering streaming media content. Its slow-start and its automatic retransmission of
lost packets add unnecessary overhead without improving quality. However, TCP traffic
is much more likely to pass through a firewall than UDP.
2.1.4 Bandwidth. Bandwidth is a big issue and is being addressed by Internet
service providers who are offering distributed servers so that users get local performance.
“The services point web traffic to the network hub closest to the user’s location, thereby
reducing the number of router hops, a packet must make and circumventing the already
jammed national and international Internet backbones”10.
Most users do not have adequate bandwidth to receive streaming video at an
acceptable quality, and won’t have it until around 2003.11
When the requested video does not stream quickly enough, the presentation is not
smooth. Those connecting at less than T1 speeds will see “choppy, ‘freeze frame’”
pictures.12
The issue of bandwidth is an important one. In the most basic sense, bandwidth
can be defined as the amount of information that can be moved at one time. A good
analogy is that of water passing through a funnel. If the funnel has a wide opening at the
bottom a lot of water can pass through at once. If we fill the funnel at a rate faster than it
can pour out the bottom, then we have exceeded the available bandwidth. If the
bandwidth is limited then the transmission of data can be delayed. This leads to long
waits for the client as information is downloaded.
In the case of streaming video or audio this becomes a crucial point because the
player is interpreting the data stream as it is received. If the information is delayed, the
playback will either skip over the lost data packet or wait for its buffer to fill with enough
packets to continue. Therefore, we have to ensure that the bit rates of encoded files do
not exceed the target audience’s connection speed. Web users with 56 kbps modems, for
example, can view only those presentations that stream less than 56 kb of data per second.
Presentations that stream more than that per second may stall because the data cannot get
over the modem fast enough to keep the clip flowing, as shown in Figure 3.
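The stalling condition can be made concrete with a little arithmetic. Assuming a constant encoded bit rate and a constant link rate (both simplifications), a clip of duration T encoded at R kbps needs R*T/C seconds to cross a link of C kbps, so the player must wait at least R*T/C - T seconds before starting if it is to play without stalling. The function below is a hypothetical helper that computes this wait.

```python
def min_startup_delay_s(encoded_kbps, link_kbps, duration_s):
    """Smallest pre-buffering wait (seconds) that avoids a stall.

    Assumes constant bit rates and ignores latency and protocol overhead.
    """
    if encoded_kbps <= 0 or link_kbps <= 0 or duration_s < 0:
        raise ValueError("rates must be positive and duration non-negative")
    download_time_s = encoded_kbps * duration_s / link_kbps
    # If the link is at least as fast as the stream, no wait is needed.
    return max(0.0, download_time_s - duration_s)
```

Under these assumptions, a 60-second clip encoded at 56 kbps and sent over a 28.8 kbps modem would need a wait of about 57 seconds, whereas a 34 kbps clip over a 56 kbps modem needs none.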
Figure 3
Presentation Data Must Fit with Player’s Bandwidth13
[Figure 3 diagram labels: Server; Player; 56 Kbps Modem; 28.8 Kbps Modem; 56 Kbps of Data Over 28.8 Kbps Modem; Stalled Presentation]
Streaming presentations should never consume all of the audience’s connection
bandwidth. They must always leave bandwidth for network overhead, error correction,
resending lost data, and so on. Otherwise, they may require frequent re-buffering. Table
2 recommends maximum streaming speeds for common network connections. To reach
56 Kbps modems, for example, a presentation should stream no more than 34 Kb of data
per second.14
Table 2
Recommended Streaming Rates
Target Audience        Maximum Streaming Rate
14.4 Kbps modem        10 Kbps
28.8 Kbps modem        20 Kbps
56 Kbps modem          34 Kbps
64 Kbps ISDN           45 Kbps
112 Kbps dual ISDN     80 Kbps
Corporate LAN          150 Kbps
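The recommendations of Table 2 can be applied mechanically when planning an encode. The sketch below stores the table values as given and checks whether a planned total stream rate (video plus audio) stays within the recommended maximum. The dictionary and function names are hypothetical.

```python
# Values copied from Table 2 (maximum streaming rate in Kbps).
RECOMMENDED_MAX_KBPS = {
    "14.4 Kbps modem": 10,
    "28.8 Kbps modem": 20,
    "56 Kbps modem": 34,
    "64 Kbps ISDN": 45,
    "112 Kbps dual ISDN": 80,
    "Corporate LAN": 150,
}


def fits_connection(target_audience, total_stream_kbps):
    """True if the planned stream stays within the Table 2 recommendation."""
    return total_stream_kbps <= RECOMMENDED_MAX_KBPS[target_audience]


def headroom_fraction(nominal_kbps, recommended_kbps):
    """Share of the nominal link rate left free for overhead and resends."""
    return 1.0 - recommended_kbps / nominal_kbps
```

For the 56 Kbps modem row, the recommendation leaves roughly 39% of the nominal rate (1 - 34/56) for network overhead, error correction and resent data.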
2.1.5 Chroma-Keying Technique. Keying means electronically cutting out
portions of a television picture and filling them in with another image.15 Chroma-key is a
special effect that uses color (chroma) for keying. Basically, the chroma-key process
uses a specific color, usually green or blue, for the background over which the keying
occurs. The green/blue becomes transparent during the keying and lets the picture of a
second source show through, without interfering with the foreground image.
Blue screen chroma-keying is a technique widely used in video production to
separate the objects in the foreground from a particular background whose color is
usually blue or green. The separated object can then be digitally composited on top of a
virtual background. Blue-screen chroma-keying and model-based foreground/background
segmentation are both techniques for separating the foreground from the background.
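The keying step itself can be sketched in a few lines. The example below is a minimal illustration, not the mixer hardware or software used in this study: frames are plain nested lists of (R, G, B) tuples, the key color defaults to pure blue, and a pixel counts as background when every channel lies within an assumed tolerance of the key color. The function name and the tolerance value are hypothetical.

```python
def chroma_key(foreground, background, key_color=(0, 0, 255), tolerance=60):
    """Composite `foreground` over `background` by keying out `key_color`.

    Both frames are lists of rows, each row a list of (R, G, B) tuples,
    and must have the same dimensions.
    """
    if len(foreground) != len(background) or any(
        len(f_row) != len(b_row) for f_row, b_row in zip(foreground, background)
    ):
        raise ValueError("frames must have the same dimensions")

    def is_key(pixel):
        # A pixel is treated as key-colored when all channels are close enough.
        return all(abs(c - k) <= tolerance for c, k in zip(pixel, key_color))

    return [
        [bg if is_key(fg) else fg for fg, bg in zip(f_row, b_row)]
        for f_row, b_row in zip(foreground, background)
    ]
```

Where the speaker's frame shows the key color, the slide pixel shows through; everywhere else the speaker's pixel is kept, which is how the video is superimposed over the presentation material.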
This technique has been adopted by organizations such as the United States Marine
Corps, which uses it for ground combat training and simulation applications. The technique
enables the insertion of real objects within the visual frame of a Head-Mounted Display
(HMD). It allows individuals and actual equipment, such as maps, weapons, and other
items, to be effectively inserted into the visual display of a simulated environment.16
2.1.6 AMOS (Active MPEG-4 Object Segmentation System). To support
highly interactive functionalities in future multimedia applications, the MPEG-4
standard proposed an object-based coding representation of audio-visual data. Compared
with MPEG-1 and MPEG-2, which provide efficient compression of conventional image
sequences, MPEG-4 provides great potential for content-based search of video data, but
video object segmentation and content-based search techniques remain challenging
tasks.17
To address these problems, AMOS18 (Active MPEG-4 Object Segmentation
System) offers an innovative method that combines low-level automatic region
segmentation and tracking methods with an active method for defining and tracking video
objects at a higher level. The architecture of the AMOS software is shown in
Figures 4(a) and 4(b).
The system allows users to identify a semantic object by using a mouse in the
starting frame of a video object. The object is defined by an outline polygon whose
vertices and edges are roughly along the desired object boundary. To tolerate user input
error, a snake algorithm19 is used to align the user-specified polygon to the actual object
boundary. The snake algorithm is based on the minimization of a specific energy function
associated with edge pixels. Users may also choose to skip the snake module if a
relatively accurate outline is already provided. Users can then start the object tracking
process by specifying a few parameters such as color threshold and motion threshold.
At any frame, users may stop the tracking process to refine the object boundary,
change the tracking parameters and resume the tracking process. The original footage
could be recorded in a controlled or uncontrolled manner.
Under controlled recording, the footage is recorded in a well-prepared manner:
all the settings, such as the background for the shooting, are fixed, and the speaker has
practiced his/her lecture beforehand and is ready for the shooting.
Under uncontrolled recording, the footage has to be recorded in real time,
without any preparation, while the speaker is giving the presentation. In this situation,
there is no control over the background (color, light, etc.) or the flow of the speech.
An example would be the recording of footage in a seminar, where the lecture material
is provided after the seminar. This kind of case is considered here because a situation
might arise in which the same presentation material and content need to be converted into
a presentation video for remote users, for various reasons (such as budget, or the
difficulty of arranging the same presentation for shooting again in a prepared manner).
Figure 4(a)
Architecture of the AMOS System.
Figure 4(b)
Regions Inside the Object (foreground regions) and those Outside the Object (background regions) are Both Tracked Over Time.
[Diagram labels for Figures 4(a) and 4(b): Starting frame; Succeeding Frames; Object Definition (User Input); Region Segmentation; Region Aggregation; Homogeneous Region; Motion Projection; Region Tracking; Video Object; Foreground region; Background region]
If the video is recorded in an uncontrolled manner and its background has to be
removed, the AMOS software is used. It is an active object segmentation and tracking
system for general video.
The input video can be a PPM (Portable Pixel Map) sequence or an MPEG motion
picture. The system generates a binary mask in the PGM (Portable Gray Map) format for
the tracked object at each frame. To integrate the two frames and implement the
chroma-key, a small program in ANSI C needs to be developed. It takes the PPM files
generated by the preprocessing step and the PGM files generated by the active object
segmentation step as input. It treats a pixel in a PPM file as a background pixel if the
corresponding PGM file contains a black pixel at the same position, and replaces that
pixel with the chroma-key color. Otherwise, the pixel in the PPM file is left unchanged.
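The masking rule described above is simple enough to sketch directly. The text calls for a small ANSI C program; the version below uses Python purely for illustration and works on frames already decoded into memory (file parsing is omitted). A PPM frame is assumed to be a list of rows of (R, G, B) tuples and a PGM mask a list of rows of gray values, with 0 meaning black. The function name and the default key color (pure blue) are hypothetical.

```python
def paint_background(ppm_pixels, pgm_mask, key_color=(0, 0, 255)):
    """Replace every background pixel with the chroma-key color.

    A pixel is background when the mask value at the same position is
    black (0); all other pixels are copied unchanged.
    """
    if len(ppm_pixels) != len(pgm_mask) or any(
        len(p_row) != len(m_row) for p_row, m_row in zip(ppm_pixels, pgm_mask)
    ):
        raise ValueError("frame and mask must have the same dimensions")
    return [
        [key_color if m == 0 else p for p, m in zip(p_row, m_row)]
        for p_row, m_row in zip(ppm_pixels, pgm_mask)
    ]
```

The output frame then has a uniform key-colored background, so footage recorded in an uncontrolled setting can be keyed over the slides in the same way as footage shot against a blue screen.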
2.2 Technology and Techniques Related to the Video and Audio
2.2.1 Streaming Media Technology. Real-time audio-video signals present
special needs for network transmission. Audio/video is time-critical and it needs Quality
of Service (QoS) transmission. The main problems in Internet real-time audio-video are
latency (network delay) and maintaining the bandwidth.
The current basic Internet architecture unfortunately allows absolutely no control
over either of these factors. On the contrary, over the Internet, latency may vary and it is
extremely difficult to estimate its value.20
Rajeev Sehgal21 pointed out that Microsoft’s and Real Networks’ technologies work
well only if there is bandwidth to spare in the user’s connection, such as on corporate
LANs, which link a group of computers together within a building. But the technologies
typically fail over wide-area networks on the public Internet during peak usage periods.
Whatever the solution, the streaming industry badly needs to overcome the
buffering problem. Real Networks, Microsoft and Apple have made vast improvements
to their encoding and decoding technologies to deliver more content faster and in smaller
files. A better solution to the buffering problem is to dynamically drop the client
bit rate and increase it again when network congestion eases. Real Networks and
Windows Media (but not Quick Time) automatically detect the user’s Internet connection
speed and change the transmitted video quality to suit the client’s connection.
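Dropping and raising the client bit rate can be pictured as choosing, at each measurement, the best pre-encoded stream that the measured throughput can sustain. The sketch below is a hypothetical selection rule, not the algorithm used by Real Networks or Windows Media: it assumes several encodings of the same clip are available and keeps an assumed 20% safety margin below the measured rate.

```python
def pick_stream(available_kbps, measured_kbps, safety=0.8):
    """Return the highest encoded rate that fits the measured throughput.

    Falls back to the lowest available rate when nothing fits, so that
    playback can continue at reduced quality instead of stopping.
    """
    if not available_kbps:
        raise ValueError("at least one encoded rate is required")
    budget_kbps = measured_kbps * safety
    fitting = [rate for rate in sorted(available_kbps) if rate <= budget_kbps]
    return fitting[-1] if fitting else min(available_kbps)
```

With encodings at 10, 20, 34, 45 and 80 kbps, a measured 56 kbps would select the 34 kbps stream; if congestion cuts the measured rate to 20 kbps, the selection drops to 10 kbps and rises again once the measurement recovers.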
2.2.2 Live Intranet Distance Learning System using MPEG-4 over
RTP/RTSP. A recent attempt uses MPEG-4 to realize a distance education application.
MPEG-4 is a recent standard from ISO/IEC for the coding of natural and synthetic
audio-visual data in the form of audio-visual objects that are arranged into an
audio-visual scene by means of a scene description.22
The scenario involves the video and audio of a speaker’s room, with the
overhead foils the speaker uses sent as a separate data stream. The three streams are
encapsulated in MPEG-4 Systems, which adds synchronization among the streams and
combines them, through positioning and sizing, into a single multimedia presentation.
This attempt uses Real-time Transport Protocol/Real Time Streaming Protocol
(RTP/RTSP) and HyperText Transfer Protocol (HTTP) as the transport mechanisms to
deliver the presentation material to clients at remote locations.
2.2.3 Some Low Bit-rate Video Coding Techniques
2.2.3.1 Model-based techniques for low bit-rate video coding23. Model-based
video coding is a technique suited to low bit-rate coding of video that contains repeated
or similar actions of a human body. Only the model information is sent to the other side,
where the human body image is reconstructed using this information. This results in a
fairly low bit-rate, as the model data, and not the real image, is sent over the network.
Various methods to create 2-D or 3-D models of the human face and human body
have been studied. The general concept of model-based image coding is shown in
Figure 5. The scheme consists of three parts: the common knowledge, the encoder, and
the decoder. The encoder first extracts initial fitting information for the wire-frame
model, which corresponds to the initial image, then estimates the global motion and
the local motion parameters. The decoder modifies the initial wire-frame to the specific
face model and synthesizes the output images using the global and local motion
parameters.
Figure 5
Model-Based Coding System
[Figure 5 diagram labels: Input image; Encoder; Analysis; Analysis data; Network; Decoder; Synthesis; Output image; Image; Source; Model]
The MPEG-4 standard supports very low bit-rate coding of virtual human animation,24
using a model-based approach, with bit-rate requirements as low as 1 kbps.
2.2.3.2 Low Bit-rate Speech Coding.25 Speech coding techniques can be
broadly divided into two classes: waveform coding and vocoder (voice coder)
techniques. Waveform coders are able to produce high-quality speech at high bit-rates;
vocoders produce intelligible speech at much lower bit rates, but the level of speech
quality, in terms of its naturalness and uniformity for different speakers, is also much
lower.
For rates of 16 kbps and lower, high speech quality is achieved by using more
complex adaptive prediction, such as linear predictive coding (LPC) and pitch prediction,
and by exploiting auditory masking and the underlying perceptual limitations of the ear.
Important examples of such coders are multi-pulse excitation, regular-pulse excitation,
and code-excited linear prediction (CELP) coders. The CELP technique combines the
high-quality potential of waveform coding with the compression efficiency of
model-based vocoders. At present, the CELP technique is the technology of choice for
coding speech at bit-rates of 16 kbps and lower.
2.2.4 Codec. The term codec is short for coder-decoder or
compressor-decompressor. It is a software algorithm that transforms data from one format
to another.26 Different codecs use different algorithms to compress data. Each codec has
its advantages and disadvantages. In the streaming media process, codecs are used to
reduce the size of raw media files so they can be streamed across the Internet, and to
convert the files back into audio or video on the receiving end.
Video-audio codecs are probably the single most important factor in determining
what makes a great video technology. Bandwidth on the web is still quite limited, and
trying to get high quality video to a consumer is like shoving an elephant.
There are several underlying technologies used by the different video-audio
codecs for Windows. Some commonly used codecs claim to be more robust than others in
streaming situations, because they were designed from the ground up as streaming codecs
rather than just data reduction schemes.
2.2.4.1 Quick Time. Although there are dozens of different codecs available in
Quick Time, Sorenson Video is the Web champ. It is the most flexible codec around,
providing competitive quality over data rates ranging from modems to CD-ROMs. It is
especially suitable for videoconferencing. Quick Time comes with the basic version of the
Sorenson encoder.
With the release of Quick Time version 4, Apple finally offered a true streaming
solution that included support for a number of codecs:
1. Video: H.261 and H.263; Radius Cinepak; Sorenson Video; MPEG-4; VP3
files. H.261 and H.263 are standard videoconferencing codecs.
2. Audio: QDesign Music codec; Qualcomm PureVoice codec; MP3; IMA 4:1.
2.2.4.2 RealAudio and RealVideo. RealNetworks has only one modern video-audio
codec: the Real G2. Based on videoconferencing technology from Intel, it
provides high quality and a very fast encoder for high-speed communication channels. This
video codec supports features such as edge artifact filtering and motion compensation.27
RealVideo uses scalable video technology, which means that the performance will be
optimized for the speed and bandwidth of each computer.
The Real System natively supports a number of codecs:
1. Video: RealVideo 8, RealVideo G2 and RealVideo 9
2. Audio: RealAudio 8.0; RealAudio G2; ACELP.net voice codec; RealAudio
1.0, 2.0 and 3.0 legacy codecs.
By providing dramatically improved compression over previous generation
technologies, RealVideo 9 reduces bandwidth costs while enabling high-quality, rich
media experiences at any bit rate and on any device. RealVideo 9 is reported to be a 30%
improvement over RealVideo 8 and a 50% improvement over RealVideo G2, with the same
quality as MPEG-4.28
2.2.4.3 Windows Media Technology. There is one important video codec in Windows
Media: MPEG-4. It is a great codec, providing good quality and performance over a wide
range of data rates. It is a fast compressor, but it does not offer two-pass or VBR (Variable
Bit Rate) encoding. The following encoding formats are natively supported:
1. Video: Windows Media Video 8, Windows Media Video 7, Microsoft MPEG-4.
2. Audio: Windows Media Audio V7 and the ACELP.net voice codec.
2.3 A Quick Glance at Audio-Video Software and Hardware Used in Developing Streaming Media
2.3.1 Capturing Video. The process of transferring the video content from a
camera or video tape to the computer is called digitizing or capturing. This process
involves playing back the video content while recording it into the computer by using
either a dedicated capture utility or a video editing platform. To transfer the video
content from a camera or video tape to the computer, we need to run software on
the PC that reads the video data from the analogue capture card and places it in a
file on the PC’s hard disk.
There are many capture programs available for capturing video from an analogue
capture card and sound from a PC’s sound card.
2.3.1.1 IEEE 1394 (FireWire, i.LINK). This interface is found on devices that record
video in digital format on tape and provide an i.LINK (IEEE 1394) output. This type of
output offers a high data transfer rate, up to 400 Mbps, which is necessary to transfer the
large volume of data required for full-motion video.29 The information transfer is digital
and does not rely on a real-time conversion process; the transfer is lossless. We
automatically get full frame rate and full screen size, and FireWire is more than fast
enough to handle the data rate.
2.3.1.2 Osprey – 500. This professional streaming capture card provides
unparalleled quality through end-to-end digital encoding and advanced preprocessing
features. With these new features, the Osprey-500 family provides the best video quality
possible for streaming audio and videos.30
2.3.1.3 Pinnacle Video Capture Card. This card turns a PC into a TV and a digital
video recorder, offering a new way of recording. It can convert video into high-quality
MPEG-1 or MPEG-2 format with compression. The following minimum system requirements
are needed for installation:31
• Pentium II 450 or Celeron 600 or equivalent
• PC with 128 MB of RAM
• Direct X 8 or higher compatible graphics board and sound card
• One available PCI slot, PCI 2.1 compliant
• CD-ROM drive, mouse
• Windows98 (FE, SE)/ME/2000/XP
2.3.2 Editing and Encoding Tools. After capturing video clips, we can edit
them by using a video editing software application. To edit the video-audio, the computer
must be equipped with the necessary hardware and software. When capturing video for
streaming, it is important to maintain high video quality. Video files will likely be very
large. Capturing, digitizing and editing video files for streaming is a much more
technical process than working with audio only.32
There are several software packages for capturing, digitizing and editing video-audio
files. Some popular packages are as follows:
2.3.2.1 Final Cut Pro 4. It has sophisticated editing, compositing, effects and
audio tools that allow professional editors to meet demanding post-production deadlines
while maintaining their creativity.33 It is a powerful solution for creating high-quality
programming in a broad range of formats, frame rates and resolutions.
2.3.2.2 Vegas Video 3.0. It allows users to capture, record, mix, edit, composite, add
titles and effects, and manage media with more control and higher quality. It sets a new
standard for professional multimedia production: apply high-quality transitions, filters
and text animations; create sophisticated composites, keyframe track motion and
pan/crop, all with unlimited tracks and unsurpassed flexibility. It also provides integrated
tools and high-quality output options for streaming media technology.34
2.3.2.3 Adobe Premiere 6.5. Adobe Premiere is a professional digital video
editing tool offering unmatched hardware support and real-time feedback. Adobe
Premiere handles the most demanding projects. It efficiently creates broadcast-quality
video productions using extensive software real-time previewing features, including
real-time titles, transitions and effects.
It works with the widest range of video hardware, from the latest digital video
decks and camcorders to third-party capture cards and real-time hardware. It supports the
latest operating systems, including Windows XP and Mac OS X. It can deliver to virtually
any medium. We can use the new Adobe MPEG encoder to create MPEG-2 files for DVD,
and export MPEG-1 files and other leading formats for delivery to VCD, SVCD, CD-ROM,
streaming media for the web, and DV or analog tape.
Adobe Premiere provides 15 keys (methods for creating transparency) that can be
applied to a clip to create transparency in many different ways.35 We can use color-based
keys for superimposing, brightness keys for adding texture or special effects, alpha
channel keys for clips or images already containing an alpha channel, and matte keys for
adding traveling mattes or superimpositions.
It supports the chroma key, which is used to select a color or a range of colors in the
clip to be made transparent. We can use this key when we have shot a scene against a
screen that contains a range of one color, such as a shadowy blue screen. Moreover, this
software fulfills the requirements of this study, namely chroma-key based transparency,
reasonably good audio-video synchronization and scalability. It creates broadcast-quality
video productions efficiently, using extensive software real-time previewing features,
including real-time titles, transitions, and effects.36
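The chroma-key idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method Adobe Premiere uses internally: images are represented as nested lists of (r, g, b) tuples, similarity to the key color is measured by Euclidean distance in RGB space, and the function name `chroma_key`, the sample pixels and the tolerance value are all hypothetical.

```python
def chroma_key(foreground, background, key_color, tolerance):
    """Composite two equally sized RGB images (lists of rows of (r, g, b)).
    Foreground pixels within `tolerance` of `key_color` are treated as
    transparent, so the background pixel shows through."""
    tol_sq = tolerance * tolerance
    result = []
    for fg_row, bg_row in zip(foreground, background):
        out_row = []
        for fg_px, bg_px in zip(fg_row, bg_row):
            # Squared RGB distance between this pixel and the key color.
            dist_sq = sum((c - k) ** 2 for c, k in zip(fg_px, key_color))
            out_row.append(bg_px if dist_sq <= tol_sq else fg_px)
        result.append(out_row)
    return result


# Hypothetical 2 x 2 example: a speaker shot against a blue screen,
# composited over a presentation slide.
BLUE = (0, 0, 255)
speaker = [[(0, 0, 255), (200, 150, 120)],
           [(10, 5, 250), (0, 0, 255)]]
slide = [[(255, 255, 255), (255, 255, 255)],
         [(30, 30, 30), (30, 30, 30)]]
composite = chroma_key(speaker, slide, BLUE, tolerance=40)
```

The tolerance plays the role of the "range of one color": a shadowed blue screen produces several nearby shades of blue, and all of them should be keyed out.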
2.3.2.4 MGI VideoWave 4.0. MGI VideoWave has been setting the standard for
years for PC video authors, helping novice and advanced users create superb video
presentations, complete with titles, special effects, transitions, audio mixing, video
mixing and overlay, and more. MGI VideoWave 4 has a bold new interface that is
designed to take advantage of powerful new features while remaining fast and intuitive.
MGI VideoWave 4 allows users to combine both formats (analog and DV) seamlessly,
utilizing existing equipment while being ready for the latest generation of hardware.
Because the data rate of raw video is quite high, the software used with the analog
capture or IEEE 1394 card compresses the video as it is saved to disk to reduce the file
size. Video is captured to disk at a fixed compression ratio.37
Here is a partial list of the many features found in MGI VideoWave 4:
• Motion video, image, and audio capture from analog and DV sources
• Text, video mixing, transitions, special effects
• Real-time video preview of live video feed and capture
• Produce to AVI, MPEG-1, MPEG-2, DV, RealVideo, or WMV (ASF files)
• Smart DV for faster production to DV files
• Hardware-accelerated production to MPEG-2
• Output video directly to a connected VCR or DV device
2.3.2.5 VideoMach 3.0. VideoMach is a powerful audio/video builder and
converter. Use it to construct video clips from still images, enhance recorded material or
convert video, audio and image files between many supported formats. VideoMach is the
successor of Fast Movie Processor. With VideoMach we can construct AVI, MPEG,
FLIC and HAV clips, join or break apart media clips, including image sequences, extract
images from videos, extract audio tracks from movies, resize movies to widescreen (16:9)
or any other aspect ratio without deforming video content.38
2.3.2.6 Windows Media Encoder 7.1. Microsoft Windows Media Encoder 7.1 has been
enhanced with the latest audio and video compression technologies, such as the Microsoft
Windows Media Audio 8 codec and the Microsoft Windows Media Video 8 codec, for
real-time capture and streaming applications. These codecs, or compressor/decompressors,
deliver incomparable audio and video quality at lower bit rates than were possible with
previous codec versions.39
Compressor/decompressors, or codecs, are the hardware or software used to
compress and decompress audio or video data. For example, Windows Media Audio and
Windows Media Video are software codecs used to decrease the bit rate of digital media
files so they can be delivered efficiently over a network. Windows Media Encoder uses
these codecs to compress the data for streaming, while Windows Media Player decompresses
the data for playback. Table 3 shows minimum configurations for various encoding
scenarios.
Table 3
System Requirements for Encoding40
Encoding Task | Minimum Configuration | Recommended Configuration
Convert existing .wav, .avi, .mpg, and .mp3 files to Windows Media format | 200 MHz processor, such as an Intel Pentium with MMX; Microsoft Windows 98 Second Edition; 32 MB of RAM | 500 MHz processor or higher, such as a Pentium III; Windows 2000; 128 MB of RAM or more
Real-time capture and broadcast of audio and video files for dial-up modem and mid-bandwidth audiences using the Windows Media 7 codecs | Single stream and multiple bit-rate content for 28.8 kbps and 56 kbps modems: 300 MHz processor, such as a Pentium II or AMD; Windows 98 Second Edition; 32 MB of RAM; supported audio and video capture devices | Single stream and multiple bit-rate content for 100 kbps through 500 kbps: 450 MHz processor or higher, such as a Pentium III; Windows 2000; 250 MB of RAM; supported audio and video capture devices
The Windows Media Audio 8 and Windows Media Video 8 codecs offer excellent
compression quality and efficiency. The Windows Media Audio 8 codec delivers a .wma
file of the same quality as an .mp3 file, but at nearly one-third the size. However, Windows
Media Encoder 7.1 is still the best choice for encoding and streaming live content.41
While the quality of encoded video depends on the content being encoded, Windows
Media Video 8 can deliver near-VHS quality at bit rates ranging from 250 kbps
to 450 kbps, and near-DVD quality at 500 kbps to several megabits per second. The Windows
Media Video 8 codec is appropriate for both streaming and downloading digital media files.
2.3.2.7 Helix Producer Basic. Helix Producer from RealNetworks is the next
generation digital media production tool for broadcast streaming and download. It
provides robust, reliable, and fault-tolerant encoding to convert audio and video into
RealMedia format.42 Using RealMedia Events, Helix Producer can also be used to create
synchronized multimedia presentations for playback within the RealOne Player.
Helix Producer is one of the key elements of the RealNetworks system based on
the Helix platform, an integrated media-delivery system designed for rich media delivery
over the Internet and corporate intranets.
NOTES
1 Bates, A.W., Technology, Open Learning, and Distance Education, Routledge, 1995.
2 O’Donoghue, M., et al., Interactivity beyond Belief, Interfaces, Vol. 8, Paris VIII universite, 1995.
3 Banks, S., and McConnell, D., On-line learning using broadcast materials: Case study of the BBC On-line Learning Pilot Programme in Women’s Health, Proceedings of the Second International Networked Learning Conference, pp. 374-380, Lancaster University (2000).
4 Sircar, J., Streaming Media Technology: Laying the Foundation for Education Change, Syllabus, 14 (3), 2000, p. 56.
5 ibid., p. 57.
6 <http://www.zdnet.com> updated May 31, 2000. Accessed in July 2002.
7 Dapeng Wu, et al., Transporting Real-time Video Over the Internet: Challenges and Approaches, Proceedings of the IEEE, Vol. 88, No. 12, Dec. 2000.
8 Dapeng Wu, et al., Streaming Video Over the Internet: Approaches and Directions, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 1, February 2001.
9 Dario Luparello, et al., “Streaming Media Traffic: An Empirical Study,” <www.bell-labs.com/user/sanjoy/streaming-media-edgix.doc> (2000). Accessed in September, 2002.
10 Radosevich, Lynda and Fitzoff, Emily, “Damming The Stream,” <http:www.britannica.com/bcom/magazine/article/0,5744,212643,00.html> March 2, 1998. Accessed in August, 2002.
11 Nielsen, Jakob, “Video and Streaming Media,” <http:www.useit.com/alertbox/990808.html> August 8, 1999. Accessed in October, 2002.
12 Larson, Don, “Does Multimedia Have a Dark Side?,” <http://www.webdeveloper.com/multimedia/mutimedia_dark_side.html> (1996). Accessed in November, 2002.
13 <http://service.real.com/help/library/guides/productiongiq/HTML/htmfiles/ realsys.htm #63065> Accessed in October 2002.
14 ibid.
15 Chromakey techniques: Advanced <http://www.mvcc.net/comm/Tips/shtm> (2001). Accessed in August, 2002.
16 John D. Micheletti and Malachi J. Wurpts, “Applying Chroma-Keying Techniques in a Virtual Environment,” <http://www.tss.swri.edu/pub/2000AEROSENSE_HMD.htm> (2000). Accessed in July, 2002.
17 Di Zhong and Shih-Fu Chang, AMOS: An Active System for MPEG-4 Video Object Segmentation, 1998 International Conference on Image Processing, October 4-7, 1998, Chicago, Illinois, USA.
18 ibid.
19 M.Kass, A.Witkin, D., Snakes: Active contour models, International Journal of Computer Vision (1988), 321-331.
20 Rahkila, Martti and Huopaniemi, Jyri, Real Time Internet Audio-Problems and Solutions, AES 102nd International Convention, Munich, Germany, March 22-25, 1997.
21 Sehgal, Rajeev, “Net Video’s Obstacle to a steady stream” <http://news.com.com/2100-1023-900617.html> November, 2002. Accessed in November, 2002.
22 P. Westerink, L. Amini, S. Veliah W., "A Live Intranet Distance Learning System Using MPEG-4 over RTP/RTSP". IEEE, 601-604. <http://www.informatik.uni-trier.de/~ley/db/conf/icme2000html> (2000). Accessed in August, 2002.
23 Interactive Model-Based Coding for Face Metaphor User: Interface in Network Communications <http://www.iuiconf.org/97pdf/1997-002-0036.pdf> (1997). Accessed in August, 2002.
24 Very Low Bitrate Coding of Virtual Human Animation in MPEG-4. <http://www.research.att.com/projects/tts/papers/2000_ICME/Coding pdf> (2000). Accessed in September, 2002.
25 Speech Coding <http://cslu./cse.ogi.edu/HLT survey/ch10node4html> Accessed in October, 2002.
26 Mack, Steve, Streaming Media Bible, 2002, p. 73.
27 <http://www.cs.csustan.edu/~framirez/video.html> updated January 31, 2001. Accessed in December, 2002.
28 <http://www.realnetworks.com/solutions/leadership/realvideo.html> Accessed in December, 2002.
29 Capturing from Digital Sources <http://www.microsoft.com/windowsxp/expertzone/columns/bridgman/02february18.asp> posted February 18, 2002. Accessed in December, 2002.
30 <http://www.viewcast.com> 2000. Accessed in February, 2003.
31 <http://www.tigerdirect.com/applications/Category/category_slc.asp?Id=2806> 2002. Accessed in January, 2003.
32 Acquiring and Digitizing Media <http://www.doit.wisc.edu/services/streaming/tutorial/transcripts/transcripts6.htm> 2001. Accessed in September, 2002.
33 <http://www.apple.com/finalcutpro> 2003. Accessed in January, 2003.
34 <http://www.sonicfoundary.com/PRODUCTS/minisites/vegas3/video-eit.htm> Accessed in December, 2002.
35 Adobe Premiere: <http://www.adobe.com/prodcts/premier/overview.html> 4/24/2002. Accessed in December, 2002.
36 Adobe Premiere 6.5 help file.
37 <http://support.jp.dell.com/docs/video/dazzle/sw/en/Index.htm> Initial release on November, 2000. Accessed in February, 2003.
38 <http://www.videomach.com/VideoMach.html> Accessed in October, 2002.
39 Tricia Gill, “An Introduction to Windows Media Encoder 7.1,” <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/encode71.asp> May 16, 2001. Accessed in September, 2002.
40 Ibid.
41 Tricia Gill, “An Introduction to Windows Media 8 Encoding Utility,” <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/wmencodutil.asp> March 16, 2001. Accessed in September, 2002.
42 Helix Producer. <http://www.realnetworks.com/products/producer/> Accessed in February, 2003.
CHAPTER 3
Solution Methods and Techniques
The method of research used in this study is experimentation. The experimental
method of research is defined by Good1 as “a method or procedure involving the control
or manipulation of conditions for the purpose of studying the relative effects of various
treatments applied to members of a sample, or of the same treatment applied to members
of different samples.” In this study, the sample included the video-audio media and the
presentation slide. The different low bit-rate encoding technologies and techniques, as
well as the chroma-key tool, used in developing a streaming media served as the
independent variables. The basic purpose of this research is to determine which of
the low bit-rate encoding techniques is best for producing a streaming media that can be
delivered over a normal dial-up connection while maintaining reasonable video and audio
quality.
A client must meet minimum hardware, software and network transmission
requirements to play the video and audio on the client’s desktop. These variables may
exert influence on the delivery of video-audio over the network. The client
needs at least:
• A 56 kbps Internet connection
• A Pentium 166 MHz processor or equivalent
• 32 MB of RAM
• A 16-bit sound card (any type)
• Speakers
• A color monitor capable of displaying a color depth of 16 bits
• A video card capable of displaying a 16-bit color depth
• A screen resolution of at least 640 x 480 pixels
• Any streaming video-audio player that supports the file format and codec version.
First, it is important to establish that low bit-rate encoding techniques could be
used for video-audio with presentation content in a single frame to produce a streaming
media within the 56 kbps bandwidth. The recommended streaming rate of 34 kbps for
the 56 kbps modem was adopted.2 This was accomplished by using the chroma-key
technique to remove the background of the video and to superimpose the presentation
material on the video. The presentation material explained by the speaker was
synchronized with the speaker's audio and video, based on the captured video.
The output video file was then encoded into a streaming media file at a low bit rate.
Before following the procedure, one sample VCD video footage was tested. This step
was important because it determined the direction of the study: whether to continue
with the rest of the problems in Chapter 1 or not. If the results proved that the low
bit-rate encoding techniques could not produce a streaming media within the 56 kbps
bandwidth with a reasonable quality, then the remaining issues or problems raised in this
study would be considered irrelevant.
It was proven that low bit-rate encoding techniques can produce a streaming
video-audio with presentation content in a single frame within the 56 kbps bandwidth.
The next step was to test a number of common low bit-rate codecs for audio and video to
determine which combination would produce the best quality. The following codecs and
streaming formats were studied:
• Video Codecs: MPEG-4 Video Codec; Real Media Codec; Windows Media
Video Codec
• Audio Codecs: Qualcomm PureVoice Codec; ACELP.net Codec; Real Audio
Codec; Windows Media Audio Codec
• File Formats: Apple QuickTime (.mov); Windows Media (Audio/Video - .wma
& .wmv); Real Media (Audio/Video - .ra & .rm)
• Encoding Software: Windows Media Encoder 7.1; Vegas Video 3.0; Helix
Producer Basic; and QuickTime Pro 6.0
To measure the quality of the streaming media, the basic criteria for audio and
video proposed by the International Telecommunication Union (ITU) were used.
The ITU-T (International Telecommunication Union – Telecommunication
standardization sector)3 and ITU-R (International Telecommunication Union – Radio
communication sector)4 recommendations address speech transmission over
telephone networks and image quality over television systems, respectively. A series of
ITU-T recommendations also address the subjective assessment of multimedia
applications. The recommended scales are briefly presented below.
Speech Quality Scales
Opinion Scales Recommended by the ITU-T. For the assessment of speech
quality, the recommended rating scale for both listening-only and conversation tests is a
5-point category scale commonly known as the quality scale. Listening-only tests can
also be assessed via the listening effort scale. These scales are shown in Table 4(a-c).
Table 4 (a)
Listening-quality Scale
Quality of the speech / connection Score
Excellent 5
Good 4
Fair 3
Poor 2
Bad 1
Table 4 (b)
Listening-effort Scale
Effort required to understand the meaning of sentences Score
Complete relaxation possible; no effort required 5
Attention necessary; no appreciable effort required 4
Moderate effort required 3
Considerable effort required 2
No meaning understood with any feasible effort 1
Table 4 (c)
Loudness-preference Scale
Loudness-preference Score
Much louder than preferred 5
Louder than preferred 4
Preferred 3
Quieter than preferred 2
Much quieter than preferred 1
Table 5
Conversation Difficulty Scale
Did you or your partner have any difficulty in talking or hearing over the connection?
Yes 1
No 0
Difficulty Scale. This is a binary response obtained from each subject at the end
of each conversation. The scale is shown in Table 5.
Image Quality Scales
For the assessment of image quality, single stimulus methods are rated using the
quality scale or impairment scale, and comparisons to reference conditions are made
using the double-stimulus continuous quality scale (DSCQS) or the double stimulus
impairment scale. The DSCQS method is cyclic, in a sense that the assessor is asked to
view a pair of pictures, each from the same source, but one via the process under
examination, and the other one directly from the source. These scales are shown in
Tables 6 (a-c).
Table 6 (a)
Image Quality Scale
Image quality Score
Excellent 5
Good 4
Fair 3
Poor 2
Bad 1
Table 6 (b)
Image Impairment Scale
Image impairment Score
Imperceptible 5
Perceptible, but not annoying 4
Slightly annoying 3
Annoying 2
Very annoying 1
Table 6 (c)
Double Stimulus Continuous Quality Scale
A B
Excellent
Good
Fair
Poor
Bad
Non-categorical Judgement Methods
In non-categorical judgement, an observer assigns a value to each image or image
sequence shown.
In continuous scaling, a variant of the categorical method, the assessor assigns
each image or image sequence to a point on a line drawn between two semantic labels.
The scale includes additional labels at intermediate points for reference. The distance
from an end of the scale is taken as the index for each condition.
In numerical scaling, the assessor assigns each image or image sequence a
number that reflects its judged level on a specified dimension (e.g. image sharpness).
The range of the numbers used may be restricted. Sometimes the number assigned
describes the judged level in "absolute" terms (without direct reference to the level of any
other image or image sequence, as in some forms of magnitude estimation). In other cases,
the number describes the judged level relative to that of a previously seen "standard" (e.g.
magnitude estimation, fractionation, and ratio estimation).
Both forms result in a distribution of numbers for each condition. The method
of analysis depends upon the type of judgement and the information required.
For this study, the listening-effort scale was used in measuring speech quality and the
image impairment scale was used in measuring image quality because these scales are
more descriptive. The double-stimulus continuous quality scale was not used due to the
unavailability of a system to test the quality of the video. The conversation difficulty scale
was not used in this study because it depends on a binary response obtained from each
subject at the end of each conversation, whereas this research depended on a unitary
response.
The researcher was not able to use any testing that requires the use of equipment
due to lack of resources. Tests were therefore conducted in a normal environment. The
modified scaling is described in Tables 7 and 8.
Table 7
Listening-effort Scale
Quality Effort required to understand the meaning of sentences Score
Excellent Complete relaxation possible; no effort required 5
Good Attention necessary; no appreciable effort required 4
Fair Moderate effort required 3
Poor Considerable effort required 2
Bad No meaning understood with any feasible effort 1
Evaluations of the qualities of video and audio were classified within the same
scale, but there were also cases of slight differences between the quality of video and text
(e.g. jerky moving video, blurred text and video). To emphasize these differences, the
non-categorical judgement method was used. Based on this method, the quality scale was
modified for video and audio by using plus notations, where a "++" is greater than a "+",
which is greater than no plus. Each interval between adjacent scores was divided into
thirds: a value of 0.67 was used for "++" and 0.33 for "+".
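The plus notation can be converted to a number for comparison or averaging. The following Python sketch shows one way to do this under the values stated above (0.33 for "+" and 0.67 for "++"); the function name `rating_to_number` is an assumption made for this example and does not come from the study.

```python
# Increment added to the base score for zero, one or two plus signs.
PLUS_VALUES = (0.0, 0.33, 0.67)


def rating_to_number(rating):
    """Convert a rating such as '3', '3+' or '3++' into a numeric score."""
    base_text = rating.rstrip("+")
    pluses = len(rating) - len(base_text)
    if pluses > 2:
        raise ValueError("at most two plus signs are defined: " + rating)
    base = int(base_text)
    if not 1 <= base <= 5:
        raise ValueError("base score must be between 1 and 5: " + rating)
    return base + PLUS_VALUES[pluses]


# Example: the three ratings between "Fair" (3) and "Good" (4).
scores = [rating_to_number(r) for r in ("3", "3+", "3++")]
```

Under this reading, a rating of 3++ sits two thirds of the way from "Fair" to "Good".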
Table 8
Image Impairment Scale
Quality Image impairment Score
Excellent Imperceptible 5
Good Perceptible, but not annoying 4
Fair Slightly annoying 3
Poor Annoying 2
Bad Very annoying 1
Bandwidth Rate Scales
The bandwidth rate measurement was based on the recommended streaming
rate, which should not be more than 34 kilobits of data per second. Therefore, the 34 kbps
rate was used as the maximum streaming rate of the media for a 56 kbps modem. The
bandwidths were ranked from highest to lowest, with 1 being the highest bandwidth.
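The limit check and the ranking described above can be sketched as follows. This is an illustrative Python sketch only: the names `MAX_STREAM_KBPS`, `within_limit` and `rank_bandwidths`, the clip labels and the sample rates are assumptions for the example, and ties between equal rates are not treated specially.

```python
MAX_STREAM_KBPS = 34  # recommended maximum streaming rate for a 56 kbps modem


def within_limit(rate_kbps):
    """Return True if the measured rate does not exceed the 34 kbps limit."""
    return rate_kbps <= MAX_STREAM_KBPS


def rank_bandwidths(rates_kbps):
    """Rank labelled rates from highest to lowest; rank 1 is the highest."""
    ordered = sorted(rates_kbps.items(), key=lambda item: item[1], reverse=True)
    return {label: position
            for position, (label, _rate) in enumerate(ordered, start=1)}


# Hypothetical measured rates (kbps) for three encoded clips.
measured = {"clip_a": 33.7, "clip_b": 32.8, "clip_c": 34.9}
ranks = rank_bandwidths(measured)
acceptable = {label: within_limit(rate) for label, rate in measured.items()}
```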
The factors that were measured are video, presentation content, audio, and
bandwidth rate. The procedure to produce the streaming media addressed the third
problem, which is developing a presentation using the streaming media technology.
NOTES
1 Jose F. Calderon and Expectacion C. Gonzales, Methods of Research and Thesis Writing, National Book Store, 1993, p. 83.
2 <http://service.real.com/help/library/guides/productiongiq/HTML/htmfiles/ realsys.htm #63065>. Accessed in October 2002.
3 International Telecommunication Union, ITU-T (Telecommunication Standardization Sector of ITU): Methods for Subjective Determination of Transmission Quality. <http://www.doc.ua.pt/arch/itu/rec/product/p.htm> June, 2002. Accessed in June, 2003.
4 International Telecommunication Union, ITU-R (Radio Communication Sector of ITU): Methodology for the Subjective Assessment of the Quality of Television Pictures. <http://www.itu.int/itudoc/itu-r/archives/rsg/1996-97/rsg11/34813.html> 2000. Accessed in June, 2003.
CHAPTER 4
Presentation of Findings
In the context of multimedia presentation and viewing on the Internet, remote
users expect a rich viewing environment and a pleasing presentation. Normally, the
video-audio and presentation material are available in the form of a digital movie. In the
movie, a fixed portion of each video frame is occupied by the presentation slide, with the
speaker’s body movements occupying a small region and frequently overlapping the
slide. When the video-audio and presentation material are delivered in a streaming
media format, the client starts showing it immediately, before downloading it completely.
Therefore, the issue of playback startup time was resolved by the use of streaming media
technology. The low bit-rate encoding technique was a natural choice in order to deliver
video-audio and presentation material effectively over a normal dial-up connection line.
The current implementation gives a partial solution towards delivering
presentation materials like business presentations or lecture videos for business and
educational purposes. Here, the researcher followed a scheme for content
development so that the presentation slides and audio-video content do not exceed the 34
kbps bit rate.
The goal is to develop the video-audio with the presentation slide in a single frame,
maintaining the quality of the slide, and to deliver it effectively in a streaming media
format that is supported by the player.
It needed to be shown that the audio-video with presentation material can be placed in
a single frame using a low bit rate. There are two semantically different entities in the
digital movie of the presentation, i.e., the slides and the speaker who is explaining the
slides. In this approach, these two entities were treated separately and later integrated
together. The developed presentation material with low bit-rate coded audio-video of the
speaker was made available through a web server.
4.1 Effective Delivery of Video and Audio
Before the content development procedure, a 5-minute video footage was tested
to address the problem of effective delivery of video and audio over a normal dial-up
connection, stated in section 1.2. In the first step, the existing video file’s resolution,
video-audio quality, frame rate, and data transfer rate were checked. The normal video
files found in VCDs have the following specifications:
Format encoded: MPEG-1
Dimension: 352 x 240 pixels
Pixels per cm (monitor resolution): 22
Pixel depth/colors: 24/16 million
Frame rate: 30 fps (NTSC)
This format was clearly not suited for streaming, since it requires at least 600 to 650 MB
of disk space for about 45 minutes of content and requires high bandwidth. It is also not
a streaming format (i.e., the whole content must be downloaded before it can be played back).
The sample footage (about 5 minutes of content) was converted into a streaming format
using a low bit-rate encoder for different bandwidth requirements, to check whether it was
still of the same quality and whether a reasonable quality could be maintained after it was
compressed to a streaming media format. To measure the quality of audio and video, the
researcher used the measurement scales suggested by the ITU-T (International
Telecommunication Union – Telecommunication standardization sector)1 and ITU-R
(International Telecommunication Union – Radio communication sector)2. The
comparative results after encoding in streaming media format with different parameters are
shown in Table 9.
Table 9
Comparative Data Between Original Non-streaming Media and Converted Streaming Media Formats
Parameter | Before Encoding (5-minute video clip) | After Encoding (56 kbps bit-rate) | After Encoding (35 kbps bit-rate) | After Encoding (34 kbps bit-rate) | After Encoding (33 kbps bit-rate)
Resolution | 352 x 240 pixels | 320 x 240 pixels | 320 x 240 pixels | 320 x 240 pixels | 320 x 240 pixels
Video quality | 5 | 4 | 3++ | 3+ | 3
Audio quality | 5 | 4 | 3++ | 3+ | 3
Video codec | - | MPEG-4 | MPEG-4 | MPEG-4 | MPEG-4
Audio codec | - | ACELP.net | ACELP.net | ACELP.net | ACELP.net
Frame rate | 30 fps | 15 fps | 15 fps | 15 fps | 15 fps
Video data rate | 1152 kbps | 46 kbps | 26 kbps | 25 kbps | 24 kbps
Audio data rate | 224 kbps | 64 kbps | 43 kbps | 42 kbps | 41 kbps
Average data rate of audio and video | - | 55.5 kbps | 34.9 kbps | 33.7 kbps | 32.8 kbps
File format | MPEG | Windows Media Video | Windows Media Video | Windows Media Video | Windows Media Video
File size | 49.9 MB | 2.20 MB | 1.58 MB | 1.54 MB | 1.49 MB
Legend:
Score | Quality of Audio | Quality of Moving Video and Text
5 | Complete relaxation possible; no effort required (Excellent) | Imperceptible (Excellent)
4 | Attention necessary; no appreciable effort required (Good) | Perceptible, but not annoying (Good)
3 | Moderate effort required (Fair) | Slightly annoying (Fair)
2 | Considerable effort required (Poor) | Annoying (Poor)
1 | No meaning understood with any feasible effort (Bad) | Very annoying (Bad)
"++" = better quality than the base score (0.67); "+" = slightly better quality than the base score (0.33)
Windows Media Encoder 7.1 was used to convert the MPEG file to Windows
Media Video (.wmv) format. This software is available free on the web. Vegas Video
could also be used, but its unregistered version does not accept MPEG video files for encoding.
Initially, the file was encoded at 56 kbps, the highest data rate of a 56 kbps modem.
The video file size was drastically reduced from 49.9 MB to 2.20 MB
while the video quality remained 'perceptible, but not annoying'. For the audio,
'attention was necessary but no appreciable effort was required'. The video data rate was
46 kbps and the audio data rate was 64 kbps. Although the audio data rate exceeded 56 kbps,
the average data rate of audio and video reported by the software was 55.5 kbps. This was
still not acceptable when compared with the recommended data rate of 34 kbps for streaming
media over a 56 kbps modem. Reducing the encoding bit rate also reduces the data rate.
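The relationship between encoding bit rate, clip duration, and file size in the paragraph above can be checked with simple arithmetic. The following is a minimal sketch and not part of the original study: the function names are hypothetical, and it assumes a constant bit rate, 1 kbps = 1000 bits per second, and 1 MB = 1,000,000 bytes. Actual files differ somewhat from the estimate because of container overhead and variation in the data rate.

```python
def estimated_size_mb(bit_rate_kbps: float, duration_s: float) -> float:
    """Estimate the file size in MB of a constant-bit-rate stream."""
    total_bits = bit_rate_kbps * 1000 * duration_s
    return total_bits / 8 / 1_000_000


def compression_ratio(original_mb: float, encoded_mb: float) -> float:
    """Return how many times smaller the encoded file is."""
    return original_mb / encoded_mb


if __name__ == "__main__":
    clip_seconds = 5 * 60  # the 5-minute sample clip
    for rate in (56, 35, 34, 33):
        size = estimated_size_mb(rate, clip_seconds)
        print(f"{rate} kbps -> about {size:.2f} MB")
    # File sizes reported in Table 9 for the 56 kbps encoding.
    print(f"49.9 MB -> 2.20 MB is about {compression_ratio(49.9, 2.20):.1f}:1")
```

The estimate for 56 kbps (about 2.1 MB) is close to the 2.20 MB reported in Table 9.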
To test whether a lower bit rate would produce acceptable results, the researcher
re-encoded the original video file at a 35 kbps bit rate. The 35 kbps bit rate was chosen
because the 56 kbps encoding had produced an average audio and video data rate
(55.5 kbps) slightly below its nominal bit rate, so a 35 kbps encoding might result in an
average data rate of 34 kbps or lower. After re-encoding, the file size of the
video was 1.58 MB, far smaller than the original video file (49.9 MB). The video
data rate of the encoded file was 26 kbps and the audio data rate was 43 kbps. The
average data rate of audio and video was 34.9 kbps, which was nearly 35 kbps. The
quality of video and audio was 'slightly better' than a scale of 3 (3++).
Again, this data rate exceeded the recommended data rate, so the bit rate was
further reduced from 35 kbps to 34 kbps. At this bit rate the video data rate was 25 kbps
and the audio data rate was 42 kbps. The average data rate of audio and video was
33.7 kbps, which is within 34 kbps. The quality of video was 'slightly good' above a scale
of 3 (3+), and the audio was 'slightly good' above 'moderate effort required' (3+).
To test whether the file would maintain a similar quality of video and audio or whether
the quality would decrease, the file was encoded at a lower bit rate of 33 kbps. At this
bit rate the video and audio data rates were 24 kbps and 41 kbps respectively, and the
average data rate was 32.8 kbps. However, the quality of video and audio was lower than
that of the media encoded at the 34 kbps bit rate. At a given bit rate, reducing the audio
data rate increases the quality of the video, and vice versa. Reducing the data rates of
both audio and video decreases the quality of both.
The video and audio quality evaluations of the media encoded at 35, 34,
and 33 kbps showed that they were all within the same level of 3. Although the
qualities of video and audio were classified under the same level, there were still
slight differences in quality. Applying the modified scaling ("+" and "++") showed that
the quality of video and audio in the media encoded at a 34 kbps bit rate was preferred
over the quality when it was encoded at a 33 kbps bit rate.
These results showed that it was possible to produce streaming media using a
low bit-rate encoding technique that delivers audio-video effectively to users of a normal
dial-up modem connection, and that the ideal encoding bit rate was 34 kbps.
Table 10 shows that the 34 kbps bit rate was acceptable for developing streaming media for
56 kbps modem users.
Table 10
Summary of Converted Streaming Media Formats
| | After encoding (56 kbps bit-rate) | After encoding (35 kbps bit-rate) | After encoding (34 kbps bit-rate) | After encoding (33 kbps bit-rate) |
|---|---|---|---|---|
| Acceptability | Not acceptable (exceeds data rate) | Not acceptable (exceeds data rate) | Acceptable | Not acceptable due to quality loss |
| Average data rate of audio and video | 55.5 kbps | 34.9 kbps | 33.7 kbps | 32.8 kbps |
| File format | Windows Media Video | Windows Media Video | Windows Media Video | Windows Media Video |
| File size | 2.20 MB | 1.58 MB | 1.54 MB | 1.49 MB |
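The selection of the 34 kbps bit rate can be expressed as a simple rule: discard every encoding whose average data rate exceeds the recommended 34 kbps, then keep the one with the best quality among those that remain. The sketch below is an illustrative assumption about how that reasoning could be written down, not a procedure taken from the study; the names are hypothetical, and the quality values use the modified scale, where "+" adds one third and "++" adds two thirds to level 3.

```python
RECOMMENDED_KBPS = 34.0  # recommended streaming data rate for a 56 kbps modem

# (encoding bit rate in kbps, average data rate in kbps, quality score)
# Values are taken from Table 9; video and audio received the same score.
RESULTS = [
    (56, 55.5, 4.0),
    (35, 34.9, 3 + 2 / 3),  # 3++
    (34, 33.7, 3 + 1 / 3),  # 3+
    (33, 32.8, 3.0),
]


def pick_bit_rate(results, limit_kbps):
    """Return the bit rate with the best quality among encodings within the limit."""
    within_limit = [row for row in results if row[1] <= limit_kbps]
    if not within_limit:
        return None  # no encoding fits the connection
    best = max(within_limit, key=lambda row: row[2])
    return best[0]


if __name__ == "__main__":
    print(pick_bit_rate(RESULTS, RECOMMENDED_KBPS))
```

Under this rule the 56 and 35 kbps encodings are rejected for exceeding the data rate, and 34 kbps is chosen over 33 kbps because of its higher quality score.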
Based on these results, a 5-minute video presentation was developed using chroma-key,
and the resulting file was encoded with a low bit-rate encoder at the 34 kbps bit rate.
4.1.1 Packaging Video-Audio and Presentation material. The chroma-key technique
was used to combine the video and the presentation material in a single frame.
Chroma-key separates the objects in the foreground from a background of a particular
color, usually blue or green. Other techniques could also combine the presentation and
the video in the same frame; the idea behind all of them is to remove the background of
the speaker's video. The technique used to remove the background varies greatly with the
type of video taken (lighting, background, etc.). Since the video for this study was
recorded against a plain background, the chroma-key technique was best suited for
removing the background and mixing the content with the presentation slide. In the
absence of a plain background, other techniques such as video segmentation could be used
as an alternative for the same purpose, but this procedure was not used because it is
lengthy and time-consuming.
In the video, a certain portion of each frame was occupied by the
presentation slide, with the speaker's body movements occupying a small region and
frequently overlapping the slide. Because of this, some parts of the presentation slide
could not be read. The video was therefore positioned in the lower right corner. It was
assumed that, when the presentation was prepared, the lower right corner of each slide
was left blank so that the speaker's movements would not cover the presentation
content. Adobe Premiere 6.5 was used to set the video position and to mix the audio and
presentation slides, because with this software no intermediate software was needed
for applying the chroma-key and synchronizing the presentation material, audio, and
video. The sequence of presentation slides and audio was arranged according to the
timing information noted when the video was recorded. The detailed content-development
procedure is discussed in section 4.3.
4.2 Technologies and Techniques of Low bit-rate Video-Audio Encoding
Streaming media is becoming the de facto standard for delivering audio and video
through the Internet. Streaming video quality depends in part on how the video
was encoded for transmission and on the amount of bandwidth required for it to be viewed
properly. It is possible, however, to prepare alternate video clips of higher
video and audio quality that are specially meant for transmission to visitors who
are connected to the website at higher speeds. Streaming audio and video formats use many
different types of codec for different bandwidths and quality requirements.
A low bit-rate encoder was used to deliver video-audio and presentation slides
effectively over a normal dial-up connection. The results for the different codecs
and software are shown in table 11, while the quality assessment is shown in table 12.
This study used four kinds of encoding software that are commonly used in
streaming media production: Windows Media Encoder 7.1, Vegas Video 3.0,
Helix Producer Basic, and QuickTime Pro 6.0. The codecs studied were
MPEG-4, Real Video 8 and 9, Windows Media Video and Audio 8,
ACELP.net, and Qualcomm PureVoice. The criteria for measurement were the streaming
data rate and the quality of the audio, the text, and the moving video. The
impairment scale was used to measure the quality of video and text, while the
listening-effort scale, described in chapter 3, was used for audio. The frame size was
fixed at 320 x 240 pixels because one objective of this study was to integrate the
video-audio with the presentation material and to show the content clearly. With a
smaller frame size, the presentation content could not be read. Frame rate is expressed
in frames per second (fps). Typically, 15 fps is the standard for streaming video. This
rate allows for smooth playback over the Internet for 56 kbps modem users. Setting the
frame rate below 15 fps might make video playback appear slow and sluggish.3
First, the video-audio file was encoded in Vegas Video for 56 kbps modem users,
using ISO MPEG-4 and ACELP.net 8.5 kbps mono as the video and audio codecs under
the Windows Media Video file format. This achieved a streaming data rate of 32 kbps.
The audio required 'no appreciable effort' to understand the meaning of sentences, and
the quality of the text and the moving video was 'slightly annoying'.
Using Windows Media Video 8 and Windows Media Audio 8 as the video and
audio codecs under the same file format and encoding software gave a streaming data
rate of 33 kbps, one kbps higher than Vegas Video with the ISO MPEG-4 and ACELP.net
codecs, and a file size of 1.42 MB. The audio was 'slightly better' than 'moderate
effort required' to understand the meaning of sentences, the text was 'perceptible, but
not annoying', and the moving video was 'slightly annoying'. Compared with Windows
Media Encoder using the same codecs, Vegas Video reduced the file size as well as the
data rate when using the ISO MPEG-4 and ACELP.net codecs, but the quality of the video
was lower than that of Windows Media Encoder. With Real Video 8 and Real Audio 8 as
the video and audio codecs under the Real Media file format, the file size was 1.33 MB
and the streaming data rate was 34 kbps. The audio required 'no appreciable effort' to
understand the meaning of sentences, while the quality of the text and the moving video
was 'slightly annoying'.
Under the QuickTime Movie file format, using ISO MPEG-4 and Qualcomm
PureVoice as the video and audio codecs, the file size was 2.57 MB. This resulted in a
64 kbps streaming data rate, higher than the recommended 34 kbps streaming data
rate. QuickTime measures the streaming data rate in bytes per second, whereas
RealNetworks and Windows Media measure it in bits per second; the two are related by
the eight-bits-equal-one-byte formula.4 The audio was 'slightly good' above 'moderate
effort required' to understand the meaning of sentences spoken, the quality of the text
was 'slightly annoying' but 'slightly good' above a scale of 3, and the moving video was
'slightly better' than a scale of 2.
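Because QuickTime reports kilobytes per second while the other encoders report kilobits per second, the figures must be converted before they can be compared. The short sketch below applies the eight-bits-per-byte rule to the two QuickTime values listed in table 11 (8 KBps and 6.6 KBps); the function name is hypothetical.

```python
BITS_PER_BYTE = 8


def kilobytes_per_s_to_kbps(rate_kbytes_per_s: float) -> float:
    """Convert a data rate in kilobytes per second to kilobits per second."""
    return rate_kbytes_per_s * BITS_PER_BYTE


if __name__ == "__main__":
    for reported in (8, 6.6):
        print(f"{reported} KBps = {kilobytes_per_s_to_kbps(reported):.1f} kbps")
```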
Second, Windows Media Encoder 7.1 was used to encode the file in
Windows Media Video (.wmv) format, using ISO MPEG-4 and ACELP.net 8.5 kbps mono
as the video and audio codecs. The streaming data rate was 34 kbps and the file size
was 1.48 MB. With these codecs and this file format, the audio required 'no appreciable
effort' to understand the meaning of sentences spoken and was 'slightly good' above a
scale of 4; the text was 'slightly better' than 'perceptible, but not annoying'; and the
moving video was 'slightly annoying' but 'slightly better' than a scale of 3. When
Windows Media Video 8 and Windows Media Audio 8 were used as the video and audio
codecs under the Windows Media Video file format, the file size was 1.40 MB and the
streaming data rate was 34 kbps. The audio was 'slightly good' above 'moderate effort
required' to understand the meaning of sentences, the text was 'perceptible, but not
annoying', and the moving video was 'slightly good' above 'annoying'.
Third, Helix Producer Basic was used as the encoding software, with Real Video 9 and
Real Audio 8, 8.5 kbps voice, as the video and audio codecs and Real Media as the
file format. The resulting file size was 2.49 MB and the streaming data
rate was 34 kbps. The audio required 'no appreciable effort' to understand the meaning
of sentences spoken and was 'slightly good' above a scale of 4; the text was
'slightly better' than 'perceptible, but not annoying'; and the moving video was
'slightly annoying' but 'slightly good' above a scale of 3.
Lastly, the QuickTime Pro 6.0 encoding software was used, with MPEG-4 and
Qualcomm PureVoice 8 kHz mono as the video and audio codecs. With the QuickTime
Movie file format under these codecs, the resulting file size was 2.11 MB and
the streaming data rate was 52.8 kbps. This was beyond the recommended streaming
data rate of 34 kbps for a 56 kbps modem. The audio needed 'moderate effort' to
understand the meaning of sentences but was 'slightly better' than a scale of 3, the
text was 'slightly annoying' but 'slightly good' above a scale of 3, and the moving
video was 'annoying' but 'slightly better' than a scale of 2.
Among the software combinations using the ISO MPEG-4 and ACELP.net
video-audio codecs and the .wmv file format, Windows Media Encoder was better because
it achieved a streaming data rate appropriate for 56 kbps modem users and gave
good quality. However, with the same codecs (ISO MPEG-4 and ACELP.net) and file
format (.wmv), Vegas Video was better in terms of bandwidth, with a data rate 2 kbps
lower than Windows Media Encoder. With the Windows Media Video 8 and Windows Media
Audio 8 codecs, Vegas Video was 1 kbps lower than Windows Media Encoder, and
its quality of moving video was also higher.
Vegas Video and Helix Producer used the same file format (.rm) and audio
codec. The video codecs were Real Video 8 and Real Video 9 respectively. The quality
of text, audio, and moving video from Helix Producer was better than from Vegas Video,
but the bandwidth rates were the same. Vegas Video and QuickTime Pro used the same
video and audio codecs, but the bandwidth data rate in QuickTime was 11.2 kbps lower
than in Vegas Video. The quality of moving video and text was the same, but QuickTime's
quality of audio was 'slightly better' than that of Vegas Video. QuickTime movies are
commonly used for presentations played on portable computers, whether in presentation
software like Persuasion or as stand-alone full-screen movies.5 These discussions and
comparisons are shown in tables 11 and 12.
Table 11
Test Results for 56 Kbps Modem Users
| Encoding Software | Video Codec | Audio Codec | File Format | File Size | Time | Frame Rate | Image Size (pixels) | Bandwidth Data Rate |
|---|---|---|---|---|---|---|---|---|
| Vegas Video 3.0 | ISO MPEG-4 | ACELP.net 8.5 kbits/s mono | Windows Media Video | 1.35 MB | 5 min | 15 fps | 320 x 240 | 32 kbps |
| Vegas Video 3.0 | Windows Media Video 8 | Windows Media Audio 8 | Windows Media Video | 1.42 MB | 5 min | 15 fps | 320 x 240 | 33 kbps |
| Vegas Video 3.0 | Real Video 8 | Real Audio 8 | Real Media | 1.33 MB | 5 min | 15 fps | 320 x 240 | 34 kbps |
| Vegas Video 3.0 | ISO MPEG-4 | Qualcomm PureVoice | QuickTime Movie | 2.57 MB | 5 min | 15 fps | 320 x 240 | 64 kbps (8 KBps*) |
| Windows Media Encoder 7.1 | ISO MPEG-4 | ACELP.net 8.5 kbits/s mono | Windows Media Video | 1.48 MB | 5 min | 15 fps | 320 x 240 | 34 kbps |
| Windows Media Encoder 7.1 | Windows Media Video 8 | Windows Media Audio 8 | Windows Media Video | 1.40 MB | 5 min | 15 fps | 320 x 240 | 34 kbps |
| Helix Producer Basic | Real Video 9 | Real Audio 8, 8.5 kbps voice | Real Media | 2.49 MB | 5 min | 15 fps | 320 x 240 | 34 kbps |
| QuickTime Pro 6.0 | ISO MPEG-4 | Qualcomm PureVoice 8 kHz mono | QuickTime Movie | 2.11 MB | 5 min | 15 fps | 320 x 240 | 52.8 kbps (6.6 KBps*) |
* QuickTime measures the streaming data rate in bytes per second (eight bits equal one byte), whereas RealNetworks and Windows Media measure in bits per second.6
Table 12
Quality Assessment of Video-Audio and Bandwidth
| Encoding Software | Video Codec | Audio Codec | File Format | Quality of Audio | Quality of Text | Quality of Moving Video | Bandwidth Data Rate | Total Score |
|---|---|---|---|---|---|---|---|---|
| Vegas Video 3.0 | ISO MPEG-4 | ACELP.net 8.5 kbits/s mono | Windows Media Video | 4 | 3 | 3 | 5 | 15 |
| Vegas Video 3.0 | Windows Media Video 8 | Windows Media Audio 8 | Windows Media Video | 3++ | 4 | 3 | 4 | 14.67 |
| Vegas Video 3.0 | Real Video 8 | Real Audio 8 | Real Media | 4 | 3 | 3 | 3 | 13 |
| Vegas Video 3.0 | ISO MPEG-4 | Qualcomm PureVoice | QuickTime Movie | 3+ | 3+ | 2++ | 1 | 10.33 |
| Windows Media Encoder 7.1 | ISO MPEG-4 | ACELP.net 8.5 kbits/s mono | Windows Media Video | 4+ | 4++ | 3++ | 3 | 15.67 |
| Windows Media Encoder 7.1 | Windows Media Video 8 | Windows Media Audio 8 | Windows Media Video | 3+ | 4 | 2+ | 3 | 12.67 |
| Helix Producer Basic | Real Video 9 | Real Audio 8, 8.5 kbps voice | Real Media | 4+ | 4++ | 3+ | 3 | 15.33 |
| QuickTime Pro 6.0 | ISO MPEG-4 | Qualcomm PureVoice 8 kHz mono | QuickTime Movie | 3++ | 3+ | 2++ | 2 | 11.67 |
Legend:
Quality of audio (listening-effort scale):
5 = Complete relaxation possible; no effort required (Excellent)
4 = Attention necessary; no appreciable effort required (Good)
3 = Moderate effort required (Fair)
2 = Considerable effort required (Poor)
1 = No meaning understood with any feasible effort (Bad)
Quality of moving video and text (impairment scale):
5 = Imperceptible (Excellent)
4 = Perceptible, but not annoying (Good)
3 = Slightly annoying (Fair)
2 = Annoying (Poor)
1 = Very annoying (Bad)
"++" = 'slightly better' quality than the scale value (+0.67); "+" = 'slightly good' quality above the scale value (+0.33)
In conclusion, based on total score, Windows Media Encoder was the best software for
producing streaming media in .wmv format, using MPEG-4 and ACELP.net as the video and
audio codecs respectively. Helix Producer Basic was the best for the .rm file format,
with the Real Video 9 and Real Audio 8 codecs, followed by Vegas Video with the Real
Video 8 and Real Audio 8 codecs. The encoding software and their codecs were ranked as
shown in table 13, based on the assessment in table 12.
Table 13
Summary Results of the Different Codecs and Software
| Rank | Encoding Software | Video Codec | Audio Codec | File Format |
|---|---|---|---|---|
| 1 | Windows Media Encoder 7.1 | ISO MPEG-4 | ACELP.net 8.5 kbits/s mono | Windows Media Video |
| 2 | Helix Producer Basic | Real Video 9 | Real Audio 8, 8.5 kbps voice | Real Media |
| 3 | Vegas Video 3.0 | ISO MPEG-4 | ACELP.net 8.5 kbits/s mono | Windows Media Video |
| 4 | Vegas Video 3.0 | Windows Media Video 8 | Windows Media Audio 8 | Windows Media Video |
| 5 | Vegas Video 3.0 | Real Video 8 | Real Audio 8 | Real Media |
| 6 | Windows Media Encoder 7.1 | Windows Media Video 8 | Windows Media Audio 8 | Windows Media Video |
| 7 | QuickTime Pro 6.0 | ISO MPEG-4 | Qualcomm PureVoice 8 kHz mono | QuickTime Movie |
| 8 | Vegas Video 3.0 | ISO MPEG-4 | Qualcomm PureVoice | QuickTime Movie |
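The total scores in table 12 and the ranking in table 13 follow from the modified scale, in which "+" adds one third and "++" adds two thirds to a whole-number score. The sketch below reproduces that calculation; it is an illustration rather than a tool used in the study, and the function names and the shortened codec labels are hypothetical. Exact fractions are used so that rounding happens only when the totals are printed.

```python
from fractions import Fraction


def parse_score(text: str) -> Fraction:
    """Convert a score such as '3', '3+' or '3++' to a number."""
    base = text.rstrip("+")
    plus_count = len(text) - len(base)
    return Fraction(int(base)) + Fraction(plus_count, 3)


def total(scores) -> Fraction:
    """Sum the audio, text, moving video, and bandwidth scores."""
    return sum(parse_score(s) for s in scores)


# (software, codecs, [audio, text, moving video, bandwidth]) from table 12
ROWS = [
    ("Vegas Video 3.0", "MPEG-4 / ACELP.net", ["4", "3", "3", "5"]),
    ("Vegas Video 3.0", "WMV 8 / WMA 8", ["3++", "4", "3", "4"]),
    ("Vegas Video 3.0", "Real Video 8 / Real Audio 8", ["4", "3", "3", "3"]),
    ("Vegas Video 3.0", "MPEG-4 / PureVoice", ["3+", "3+", "2++", "1"]),
    ("Windows Media Encoder 7.1", "MPEG-4 / ACELP.net", ["4+", "4++", "3++", "3"]),
    ("Windows Media Encoder 7.1", "WMV 8 / WMA 8", ["3+", "4", "2+", "3"]),
    ("Helix Producer Basic", "Real Video 9 / Real Audio 8", ["4+", "4++", "3+", "3"]),
    ("QuickTime Pro 6.0", "MPEG-4 / PureVoice", ["3++", "3+", "2++", "2"]),
]


def ranked(rows):
    """Order the rows from the highest total score to the lowest."""
    return sorted(rows, key=lambda row: total(row[2]), reverse=True)


if __name__ == "__main__":
    for rank, (software, codecs, scores) in enumerate(ranked(ROWS), start=1):
        print(f"{rank}. {software} ({codecs}): {float(total(scores)):.2f}")
```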
4.3 Streaming Media Presentation Development
4.3.1 Hardware and Software used in the Development Procedure.
Throughout the study, the procedure for developing the streaming media using low bit-rate
encoding techniques was recorded. The hardware and software used in the study
were the following:
Intel Celeron 600 MHz processor
Microsoft Windows XP
128 MB of RAM
Video display adapter NVIDIA GeForce2 MX/MX 400
Pinnacle Video capture card
Panasonic Video Camera (CCD-TRV 21E Video 8 Handycam)
VideoMach 3.0
Vegas Video 3.0
Adobe Premiere 6.5
Helix Producer Basic
Windows Media Encoder 7.1
QuickTime Pro 6.0
4.3.2 Development procedure for the Streaming Media Presentation. The
procedure followed to develop the low bit-rate coded audio-video of the speaker suggested
a way to integrate the existing software tools. The block diagram representing the scheme
for using the various tools in the content development procedure is shown in figure 6.
The processes designated as A, B, C, D and E in the figure are discussed below.
Figure 6
The Flow Chart of Streaming Media Presentation Development Procedure
[Flow chart of processes A to E with their inputs and outputs: sample video with plain background; audio and video; video with speaker's position on the lower right corner; synchronized video-audio and presentation material; final output: video-audio and presentation content in a single frame.]
Legend:
A: Recording, Capturing and Digitizing the Video
B: Splitting the Video and Audio Extraction
C: Video Positioning Required Before Superimposing Presentation Slide
D: Compositing the Slide and the Video of the Speaker
E: Low bit-rate Audio-Video Encoding
The presentation contents were developed in the following steps:
4.3.2.1 Recording and Capturing the Video (A). A 5-minute sample video was
recorded against the plain blue background shown in figure 7. The blue background was
chosen for the study because blue is the color most opposite to skin colors, providing
the least distortion, and most people do not wear bright blue or green clothing. It is
also easy to key, which means electronically cutting out portions of a picture and
filling them in with another image, as already described in section 2.5. The recorded
video was digitized in MPEG-1 format using the Pinnacle capture card. The file size was
75.5 MB, with a bit rate of 2380 kbps, a frame rate of 25 fps, and a frame size of
320 x 240 pixels.
Figure 7
Sample Video with Plain Background
4.3.2.2 Splitting the Video and Audio Extraction (B). This step was essential to
discard unnecessary and redundant audio-visual information from the lecture movie.
It defined the scope for the further development steps.
The VideoMach software was used for this step. VideoMach can construct video
clips from still images, enhance recorded material, and convert video, audio and
image files between many supported formats.7 It took the digital movie of the captured
presentation in MPEG-1 format as input and generated the audio and the video of the
presentation as two separate files. The audio of the speaker was
generated as a .wav file. The video contained only the significant rectangular portion of
the visuals of the presentation and was saved as a separate AVI file, as shown in the
flow chart.
4.3.2.3 Video Positioning Required Before Superimposing the Presentation Slide (C).
This step was essential for positioning the video. In the movie, a fixed portion of each
video frame was occupied by the presentation slide, with the speaker's body movements
occupying a small region and frequently overlapping the slide. Because of this, clients
could not read some parts of the presentation slide. The video was therefore positioned
in the lower right corner. When the presentation material was prepared, blank space
was left for the video in the lower right corner. Adobe Premiere 6.5 was used to set the
video position. The zoom value was fixed at 75% from the start to the end of the
timeline so that the video would not cover the part of the presentation slide being
explained by the speaker, as shown in figure 8. The zoom value could be changed as needed.
Figure 8
Positioning the Video in Adobe Premiere 6.5.
Figure 9
Output of Repositioning the Video
If there were some differences in color value, the color similarity value could be
increased. The real-time preview option could be checked to test the movie. The
timeline was exported and saved in AVI format. The outcome of repositioning the video
is shown in Figure 9.
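The geometry of the positioning step can be illustrated with a small calculation. The sketch below assumes that the zoom value scales the speaker's clip uniformly and that the scaled clip is pinned to the lower right corner of the 320 x 240 frame; this is a simplification for illustration and not a description of how Adobe Premiere implements its motion settings. The function name is hypothetical.

```python
def lower_right_placement(frame_w: int, frame_h: int, zoom_percent: int):
    """Return (x, y, width, height) of a clip scaled and pinned to the lower right."""
    scaled_w = frame_w * zoom_percent // 100
    scaled_h = frame_h * zoom_percent // 100
    x = frame_w - scaled_w  # left edge of the scaled clip
    y = frame_h - scaled_h  # top edge of the scaled clip
    return x, y, scaled_w, scaled_h


if __name__ == "__main__":
    # 75% zoom on a 320 x 240 frame, as used in this step
    print(lower_right_placement(320, 240, 75))
```

With these assumptions, a strip 80 pixels wide on the left and 60 pixels high at the top of the frame stays free of the speaker's video.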
4.3.2.4 Compositing the Slides and the Video of the speaker (D). Compositing, also
known as superimposing, is the process of combining two or more images to yield a
single, enriched image. Compositing can be done with still or moving images.
In video editing, it simply means playing one clip on top of another. Adobe
Premiere 6.5 was used for this step. It supports chroma-key, which is used to select a
range of color in the clip to be made transparent.
The terms matting and keying, in video and film production, refer to specific
compositing techniques. Keying uses different types of transparency keys to find pixels
in an image that match a specified color or brightness and make those pixels transparent
or semitransparent. For example, given a clip of a weatherman standing in front of a
blue-screen background, the blue could be keyed out using a blue-screen key and replaced
with a weather map. Matting uses a mask or matte to apply transparency or
semitransparency to specified areas of an image. When keying or matting is used to apply
transparency, portions of the lower images are revealed. Adobe Premiere 6.5 supports
these features, and the keying concept was used to develop the mix of presentation slide
and video of the speaker.
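The keying idea can be illustrated on a tiny image. The following is a minimal sketch under stated assumptions, not the algorithm used by Adobe Premiere: a pixel is treated as background when its RGB distance to the key color is within a similarity threshold, and it is then replaced by the corresponding slide pixel. The function names, the threshold value, and the sample pixels are all hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0 to 255
Image = List[List[Pixel]]     # rows of pixels


def color_distance(a: Pixel, b: Pixel) -> float:
    """Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def chroma_key(foreground: Image, background: Image,
               key_color: Pixel, similarity: float) -> Image:
    """Replace foreground pixels close to key_color with background pixels."""
    return [
        [bg if color_distance(fg, key_color) <= similarity else fg
         for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


if __name__ == "__main__":
    blue = (0, 0, 255)
    skin = (200, 150, 120)
    white = (255, 255, 255)
    speaker = [[blue, skin], [(10, 5, 250), blue]]  # 2 x 2 speaker frame
    slide = [[white, white], [white, white]]        # 2 x 2 slide frame
    print(chroma_key(speaker, slide, blue, similarity=30))
```

Raising the similarity threshold keys out a wider range of blues, which is useful when the background is unevenly lit, at the risk of removing parts of the speaker.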
To composite the video clip, audio, and presentation slides, they were first imported.
Adobe Premiere does not support PowerPoint slides directly, so before importing the
PowerPoint presentation, all the slides were saved as a bitmap image sequence.
After the video, audio, and presentation slides were imported, the media was placed
on a timeline. The video was placed on track 2 and the presentation slide on track 1,
as shown in figure 10.
The settings for the video composition were as follows:
Compressor: None
Color Depth: Millions
Frame Size: 320 x 240 pixels
Frame Rate: 15 fps
Pixel Aspect Ratio: Square Pixels (1.0)
Editing Mode: Video for Windows
Time base: 30
Time Display: 30 fps Non Drop Frame Time code
Figure 10
Superimposing PowerPoint Slide on the Speaker's Video
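The composition settings give a sense of how much compression the final encoding step must achieve. The sketch below computes the uncompressed video data rate for a 320 x 240 frame at 15 fps, assuming that a color depth of "millions" means 24 bits per pixel and ignoring audio; the function name is hypothetical and 1 kbps is taken as 1000 bits per second.

```python
def uncompressed_kbps(width: int, height: int, bits_per_pixel: int, fps: int) -> float:
    """Data rate of uncompressed video in kilobits per second."""
    return width * height * bits_per_pixel * fps / 1000


if __name__ == "__main__":
    raw = uncompressed_kbps(320, 240, 24, 15)
    target = 34  # recommended streaming data rate in kbps for a 56 kbps modem
    print(f"Uncompressed: {raw:.0f} kbps")
    print(f"Reduction needed for {target} kbps: about {raw / target:.0f}:1")
```

Under these assumptions the uncompressed composition is several hundred times larger than a 34 kbps stream, which is why a low bit-rate codec is essential for dial-up delivery.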
To key out the video background and replace it with a presentation slide, the
transparency setting was used: the background color (blue) was picked with the color
picker, and the chroma key type was selected. The output was previewed in a sample
frame, as shown in figure 10. The time sequence of the audio and presentation slides had
already been recorded at the time of video recording. The video-audio and presentation
slides were placed on the timeline according to this timing information, so that the
slide would change when the speaker's content changed. The outcome of superimposing
the PowerPoint slide is shown in figure 11.
Figure 11
PowerPoint Slide Superimposed Behind the Video.
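Placing slides on the timeline by their recorded timing amounts to answering one question for any moment of the video: which slide should be visible? The sketch below shows one way to do that lookup with a sorted list of slide start times. The start times are invented for illustration and are not taken from the study, and the function name is hypothetical.

```python
from bisect import bisect_right


def slide_at(slide_start_times, t: float) -> int:
    """Return the index of the slide visible at time t (in seconds).

    slide_start_times must be sorted in ascending order.
    """
    index = bisect_right(slide_start_times, t) - 1
    return max(index, 0)  # before the first start time, show the first slide


if __name__ == "__main__":
    # Hypothetical start times (seconds) for four slides in a 5-minute clip.
    starts = [0, 45, 110, 200]
    for moment in (0, 44.9, 45, 150, 299):
        print(f"t = {moment:>5} s -> slide {slide_at(starts, moment) + 1}")
```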
4.3.2.4.1 Problems encountered when presentation slide and video file were
treated separately. When the two entities (the presentation slide and the speaker
explaining the slides) were treated separately, it was not possible to make the background
of the speaker's video file transparent. The main concept of this study was to
develop the presentation contents as two items, namely the presentation file made with
Microsoft PowerPoint and the separate low bit-rate coded audio-video of the speaker,
both available through a web server. Additionally, the presentation file contained the
timing information for each slide, already recorded during the presentation event.
To play the presentation, the player treated the presentation slides and the
speaker as two separate inputs and achieved realistic synchronization between the
running slides and the speaker's explanation of the slides on the remote client's desktop.
The player used Intel's RDX (Realistic Display Mixer), which mixed
the slides and the movie frames and displayed each frame of the presentation in a realistic
fashion. This technology was based on Intel's MMX technology but had already been
phased out by Intel Corporation.8
Further, only one codec supported transparency, Intel's Indeo Video 5.0,9
but it is an AVI codec and does not produce a streaming media file.
Therefore, the researcher mixed the presentation slide and the video in Adobe
Premiere and produced a single output.
4.3.2.5 Low Bit-rate Video-Audio Encoding (E). This was the final step, essential
to generate the low bit-rate encoded audio-video of the speaker as a single file. Different
video and audio codecs were used for encoding the speaker's video. Windows Media
Encoder, Vegas Video, Helix Producer and QuickTime were used for encoding. These
programs were useful for converting to a streaming media format. The results were
discussed earlier. The visual outcomes of the different codecs and software are shown in
appendices A, B, C and D. Windows Media Encoder is free and is available from the
Windows Media website.10 The encoding process in Windows Media Encoder is shown in
figure 12. The visual outcome of Windows Media Encoder using the ISO MPEG-4 and
ACELP.net codecs is shown in Figure 13.
Vegas Video 3.0 was used to encode in Real Media (.rm), Windows Media Video
(.wmv), and QuickTime Movie (.mov) formats. It can render directly from the timeline to
all three of the major streaming media formats. Each streaming media platform has a
number of default templates. In addition, templates can be modified and new ones created
by clicking the custom button. A demo version of this software is available free11 on the Web.
Figure 12
Encoding the Video and Audio in Windows Media Encoder
Figure 13
After Encoding the Video File in .wmv Format.
Windows Media technology includes intelligent streaming and also uses the
concept of target audiences. In Windows Media Encoder, the desired target
audiences can be selected and the frame size and audio codec specified. Unlike
RealNetworks' SureStream technology, it cannot specify different audio codecs for
different target audiences.12
When encoding the video-audio file, as many target audiences as needed could be
selected in Windows Media Encoder. However, if one of the selected target audiences was
under 80 kbps, the encoder did not allow choosing a target audience above 300 kbps.
NOTES
1 International Telecommunication Union, ITU-T (Telecommunication Standardization Sector of ITU): Methods for Subjective Determination of Transmission Quality. <http://www.doc.ua.pt/arch/itu/rec/product/p.htm> June, 2002. Accessed in June, 2003.
2 International Telecommunication Union, ITU-R (Radio Communication Sector of ITU): Methodology for the Subjective Assessment of the Quality of Television Pictures. <http://www.itu.int/itudoc/itu-r/archives/rsg/1996-97/rsg11/34813.html> 2000. Accessed in June, 2003.
3 Crowell, Nancy. “How Do Your Media Add Up?” <http://www.workz.com/cgi-bin/gt/tpl_page.html,template=1&content=1897&nav1=1&> 2003. Accessed in March, 2003.
4 Video over the Internet. <http://www.matrox.tv/includes/pdf> Accessed in May, 2003.
5 <http://www.siggraph.org/education/materials/HyperGraph/video/codecs/PureVoice html> 2002. Accessed in June, 2003.
6 Ibid.
7 <http://www.videomach.com/VideoMach.html> Accessed in October, 2002.
8 Intel® Realistic Display Mixer (RDX). <http://www.intel.com/labs/archive/rdx.htm> Accessed in January, 2003.
9 Intel Indeo® Video 5. <http://www.siggraph.org/education/materials/HyperGraph/video/codecs/indeo_v5/overview.htm> (1997). Accessed in January, 2003.
10 <www.microsoft.com/windowmedia> Accessed in July, 2002.
11 Steve Mack, Streaming Media Bible, Hungry Minds, Inc. New York, NY 10022. (2002).
12 <http://www.sonicfoundry.com/PRODUCTS/showproduct.asp? PID=612&FeatureID=5447> Accessed in August, 2002.
CHAPTER 5
Summary, Conclusion and Recommendations
The study focused primarily on video-audio streaming technology. Specifically,
it aimed to develop video-audio with presentation slides in a single frame,
maintaining the quality of the slides and delivering the result effectively in a
streaming media format supported by the player.
5.1 Summary of Findings
A five-minute VCD video footage in MPEG format was converted into streaming
media format using Windows Media Encoder with the ISO MPEG-4 and ACELP.net
codecs. The footage was encoded at a bit-rate of 34 kbps, giving an average data
rate of 33.7 kbps, i.e., within the 34 kbps target. The video quality was rated closer to
‘slightly good’ than to ‘slightly annoying’, and the audio quality closer to ‘slightly good’
than to ‘moderate effort required’ to understand the speaker. Video-audio and
presentation materials were packaged in a single frame by the use of the chroma-key tool.
Low bit-rate encoding and the chroma-key technique were used to develop and
deliver presentation material over a low-bandwidth Internet connection. A five-minute
presentation movie (in MPEG-1 format) was used as the sample. The movie was
75.5 MB in size, with a bit-rate of 2380 kbps, a frame rate of 25 fps, and a frame size of
320x240 pixels. From this movie, the presentation content was developed as a .wmv
file containing the audio-video of the speaker and the presentation slides (made with
Microsoft PowerPoint). With the ISO MPEG-4 and ACELP.net codecs, the resulting
.wmv file was 1.48 MB at a bit-rate of 34 kbps when encoded in Windows Media
Encoder, and 1.35 MB at a bit-rate of 32 kbps when encoded in Vegas Video.
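The reported file sizes can be sanity-checked from the bit-rate and the duration. The following is a minimal sketch, not part of the original tests; it assumes a nominal duration of exactly 300 seconds, 1 kbps = 1000 bits per second and 1 MB = 1,048,576 bytes (the thesis does not state which convention the encoders report), and the function names are illustrative.

```python
def expected_size_mb(bitrate_kbps, duration_s):
    """Approximate payload size in MB (2**20 bytes) of a constant bit-rate stream."""
    total_bits = bitrate_kbps * 1000 * duration_s
    return total_bits / 8 / (1024 * 1024)


def compression_ratio(original_mb, encoded_mb):
    """How many times smaller the encoded file is than the original."""
    return original_mb / encoded_mb


duration_s = 5 * 60  # assumed nominal length of the sample movie

# Payload expected at the 34 kbps target used in the study.
print(round(expected_size_mb(34, duration_s), 2))

# Source MPEG-1 movie (75.5 MB) against the Windows Media Encoder output (1.48 MB).
print(round(compression_ratio(75.5, 1.48), 1))
```

Under these assumptions the estimated payload is about 1.2 MB, somewhat below the reported 1.48 MB; the gap may reflect container overhead or the exact clip length, neither of which is stated in the study.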
In Real Media format using Helix Producer Basic, the bit-rate was 34 kbps and the
file size 2.49 MB; using Vegas Video, the bit-rate was 34 kbps and the file size
1.33 MB. Besides Windows Media (.wmv), other file formats were studied with
the same encoding schemes (such as the MPEG-4 codec with QuickTime) and gave
different results. Among the other file types, the Real audio-video format, with the Real
Video 9 codec for video and the Real Audio 8 codec for audio, showed the next most
satisfactory result. The results also revealed some limitations in the current
implementation, which are as follows:
• The decoded video frame shows blocking artifacts at the boundary of the speaker’s
body, so the chroma-key color (in this case, blue) is visible at the boundary
instead of a smooth edge, as shown in Figure 16. This may be due to the light
setting and the camera: the video footage was taken with an ordinary Panasonic
camera in natural light.
• The audio of the speaker sounds different because of the distance between the
speaker and the microphone of the camera. Better quality would require a
microphone used in a controlled manner.
• The presentation slide explained by the speaker did not retain its original quality,
because the video and audio are combined and encoded together. At the time,
the researcher did not have enough resources (such as real-time display mixing
and the necessary software) to study other mixing technologies.
5.1.1 Visual Outcome at Various Stages of the Content Development. The
media content was developed on a Windows XP platform (Celeron 600 MHz,
128 MB RAM), and low bit-rate encoded audio-video was generated in the Windows
Media Video (.wmv), RealMedia (.rm) and QuickTime movie (.mov) formats.
The complete content development task took 2 GB of disk space. The
following are the intermediate results, each represented by one video frame of the
presentation, generated by the different steps of content development discussed in
section 4.3.
Figure 14 shows a video frame in the original digital movie of the frame size 320
x 240 pixels. It represents the input to the first step of the content development.
Figure 14
Video Frame in the Original Digital Movie
Figure 15 shows the video frame when repositioning the video of the speaker. It
represents the outcome of the first step.
Figure 15
Output of Repositioning the Video
Figure 16 shows the video frame when the PowerPoint slide is superimposed with
the video of the speaker. It represents the outcome of the third step, saved in AVI format.
Figure 16
Superimpose of the PowerPoint Slide in the Back of the Video.
Figure 17 shows the video frame after the video-audio file was encoded. It
represents the outcome of the final step, saved in .wmv format.
Figure 17
After Encoding the Video File in .wmv Format.
The following are the achievements made by the test:
When the video-audio file was tested using Windows Media Player, QuickTime
and Real Media, the presentation showed no interruption. In terms of quality, the
Windows Media Video (.wmv) format gave the best video and audio quality, followed
by Real Media. When encoded in Vegas Video, the playing data rates for the Windows
Media Video, QuickTime and Real Media formats were 32 and 33 kbps, 64 kbps, and
34 kbps, and the file sizes were 1.35 and 1.42 MB, 2.57 MB, and 1.33 MB,
respectively, as shown in Table 11. Smooth video playback was observed in Real
Media. Using Windows Media Encoder and Helix Producer, the data rate was 34 kbps,
but the codec combination (MPEG-4 and ACELP.net) in Windows Media Encoder gave
better overall quality than Helix Producer Basic. In QuickTime, the data rate was
52.8 kbps using QuickTime Pro 6.0, which exceeded the acceptable data rate; the audio
quality was slightly better than with Vegas Video, but text and moving video were the
same. Both programs used the ISO MPEG-4 and Qualcomm PureVoice codecs.
When the video-audio file is encoded in different formats and with different
compression technologies, the streaming data rate and file size vary accordingly.
The researcher therefore used low-bandwidth compression technology, which
helps to achieve the 34 kbps data rate for 56 kbps modem users.
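The comparison against the 34 kbps target can be expressed as a simple check. The sketch below is illustrative only: it uses the data rates reported in this section, takes 32 kbps for the Vegas Video .wmv file, and the dictionary layout and function name are assumptions rather than part of the study.

```python
TARGET_KBPS = 34  # target data rate stated in the study for 56 kbps modem users

# (encoder, format) -> data rate in kbps, as reported in section 5.1
measured = {
    ("Windows Media Encoder", ".wmv"): 34,
    ("Vegas Video", ".wmv"): 32,
    ("Helix Producer Basic", ".rm"): 34,
    ("Vegas Video", ".rm"): 34,
    ("Vegas Video", ".mov"): 64,
    ("QuickTime Pro 6.0", ".mov"): 52.8,
}


def within_target(rate_kbps, target_kbps=TARGET_KBPS):
    """True when the stream does not exceed the target data rate."""
    return rate_kbps <= target_kbps


# Encoder/format pairs whose data rate exceeds the target.
over = sorted(key for key, rate in measured.items() if not within_target(rate))
for encoder, fmt in over:
    print(encoder, fmt, measured[(encoder, fmt)], "kbps")
```

With these figures only the two QuickTime (.mov) encodings exceed the target, which matches the observation above that QuickTime exceeded the acceptable data rate.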
In addition, the chroma-key technique made it possible to superimpose the
presentation slide on the presentation video, in which the speaker explains the slide in
front of the white board. With content developed in this way, it was found possible to
give clients the experience of viewing a live presentation.
5.2 Conclusion
Streaming multimedia is a logical step in the development of the Internet, moving
from text-based content, then graphics and animation, to downloadable video and audio
files. It enhances the traditional text- and image-based presentation and creates a rich
viewing environment.
Based on the current study of different audio/video compression technologies
and low bit-rate encoding techniques, it can be concluded that rich media can be
developed and delivered over a low-bandwidth connection using low bit-rate codecs
such as MPEG-4 and ACELP.net. The chroma-key tool can be used to package video-
audio and presentation material in a single frame for delivery from a web server.
Among the codecs tested, the MPEG-4 codec for video with the ACELP.net codec
for audio was the best low bit-rate combination, and Real Video 9 for video with Real
Audio 8 for audio was the second best. In terms of encoding software, Windows Media
Encoder was the best encoder for the .wmv format and Helix Producer Basic was the
best for the .rm format.
Likewise, to develop streaming media presentation material with video-audio
in a single frame, the video should be recorded against a plain background color,
preferably blue or green. Video and audio quality also depend on the camera, light
setting, capture card, microphone, etc. After digitizing the video and audio, split them
into separate files and discard unnecessary, redundant audio-visual information from the
presentation video. Before superimposing the presentation material on the video, it is
necessary to set the speaker's position using video editing software so that the speaker's
body movement does not overlap the presentation slides. The chroma-key tool is then
used to superimpose the presentation material on the video, after which the presentation
material and the video-audio are synchronized using time information. Lastly, the
video-audio presentation is encoded with video-audio encoder software in the desired
streaming format using a low bit-rate encoding technique.
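The superimposing step can be illustrated with a very small chroma-key routine. This is a minimal sketch under stated assumptions, not the algorithm of the editing software used in the study: frames are plain lists of (r, g, b) tuples, the key color is blue, and the thresholds `min_blue` and `dominance` are hypothetical values chosen for the example.

```python
def is_key_color(pixel, min_blue=120, dominance=1.4):
    """True when blue clearly dominates red and green (assumed thresholds)."""
    r, g, b = pixel
    return b >= min_blue and b >= dominance * max(r, g)


def chroma_key(foreground, background):
    """Replace key-colored foreground pixels with the background (slide) pixels.

    Both frames are lists of rows of (r, g, b) tuples with equal dimensions.
    """
    if len(foreground) != len(background):
        raise ValueError("frames must have the same number of rows")
    composite = []
    for fg_row, bg_row in zip(foreground, background):
        if len(fg_row) != len(bg_row):
            raise ValueError("rows must have the same number of pixels")
        composite.append(
            [bg if is_key_color(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        )
    return composite


BLUE = (10, 20, 230)     # background behind the speaker
SKIN = (200, 160, 140)   # a pixel belonging to the speaker
SLIDE = (255, 255, 255)  # a pixel of the presentation slide

frame = [[BLUE, SKIN, BLUE], [BLUE, SKIN, SKIN]]
slide = [[SLIDE] * 3 for _ in range(2)]
composite = chroma_key(frame, slide)
print(composite)
```

A hard per-pixel decision like this one keeps or replaces each pixel outright, so pixels on the speaker's outline that are partly blue may be misclassified; this is one plausible source of the rough edges noted in the findings.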
The use of the chroma-key technique to remove the background of the video and
superimpose the presentation slide helps to reduce the file size and maintain the quality
of video for the streaming media. A single-color video background compresses more
effectively than a multi-color background. This makes it possible to deliver the media
content to users connected over a low-bandwidth connection.
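The effect of a single-color background on compression can be illustrated with a general-purpose compressor. The sketch below is only an analogy: it uses lossless zlib compression on one raw frame, not the MPEG-4 codec used in the study, and the random "busy" background is an artificial worst case chosen for the example.

```python
import random
import zlib

WIDTH, HEIGHT = 320, 240  # frame size of the sample movie


def frame_bytes(background_pixel):
    """Build one raw RGB frame whose pixels come from background_pixel(x, y)."""
    data = bytearray()
    for y in range(HEIGHT):
        for x in range(WIDTH):
            data.extend(background_pixel(x, y))
    return bytes(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# A uniform blue background against a background of random colors.
flat = frame_bytes(lambda x, y: (0, 0, 255))
busy = frame_bytes(
    lambda x, y: (rng.randrange(256), rng.randrange(256), rng.randrange(256))
)

flat_size = len(zlib.compress(flat))
busy_size = len(zlib.compress(busy))
print(flat_size, busy_size)
```

The uniform frame shrinks to a small fraction of its raw size while the random frame barely compresses at all; video codecs exploit the same redundancy, although by different means.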
5.3 Recommendations
The researcher believes that the study provides early support and guidelines for
the development of the streaming media technology. It also provides guidelines in the
use of the chroma-key technology for streaming presentation content development and
delivery for the low bandwidth connection.
Some of the limitations faced in this study, such as the rough edges of the speaker's
video and the reduced audio quality, can be addressed by taking the video footage in a
well-lit room with good sound equipment, as found in studios. If the recording and mixing
of audio and video are handled by professionals, rich quality can be achieved.
The type of content developed for this study, presentation slides together with
the speaker (or instructor) in one frame, looks suitable for business presentations
and for lecture materials in distance education. Since such content can be developed
for low-bandwidth connections, other researchers are highly recommended to study
the feasibility of using this type of content in the distance education and business
presentation contexts.
Due to lack of resources, the quality of the audio-video was measured in a subjective
way. Laboratory testing of audio-video quality is recommended.
To get better quality of presentation material, it is highly recommended to use
real-time mixing technology with the audio-video and presentation material.
B I B L I O G R A P H Y
BIBLIOGRAPHY
I. BOOKS
Bates, A.W. Technology, Open Learning, and Distance Education. Routledge. 1995.
Jose F. Calderon and Expectacion C. Gonzales. Methods of Research and Thesis Writing. National Book Store. 1993.
Mack, Steve. Streaming Media Bible. Hungry Minds, Inc. New York, NY 10022. (2002).
II. Articles & Periodicals
Banks, S., and Mc Connell, D. On-line learning using broadcast materials: Case study of the BBC On-line Learning Pilot Programme in Women’s Health. Proceedings of the Second International Networked Learning Conference, pp 374-380, Lancaster University (2000).
Dapeng Wu, Yiwei Thomas Hou, Wenwu Zhu, Ya-Qin Zhang, Jon M. Peha. Streaming Video over the Internet: Approaches and Directions. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 1. February 2001.
Dapeng Wu, Yiwei Thomas Hou and Ya-Qin Zhang. Transporting Real-time Video Over the Internet: Challenges and Approaches. Proceedings of the IEEE, Vol. 88, No. 12, Dec. 2000.
Di Zhong and Shih-Fu Chang. AMOS: An Active System for MPEG-4 Video Object Segmentation. 1998 International Conference on Image Processing, October 4-7, 1998, Chicago, Illinois, USA.
M. Kass, A. Witkin, D. "Snakes: Active contour models". International Journal of Computer Vision (1988): 321-331.
Micke O’Donoghue, Michael Barber and Steve Childs. The Role of Streaming Media in the Delivery of Distance Learning. Pre-Submission Draft – Lancaster University, August 2000.
O’Donoghue, M. and Thily, H. Interactivity beyond Belief. Interfaces, Vol 8, Paris VIII universite, 1995.
O. Egger, E. Reusens, T. Ebrahimi and M. Kunt. Very Low Bit Rate Coding of Visual Information. Swiss Federal Institute of Technology at Lausanne, Switzerland. 1996.
Rahkila, Martti and Huopaniemi, Jyri. Real Time Internet Audio-Problems and Solutions. AES 102nd International Convention, Munich, Germany, March 22-25, 1997.
Sircar, J. Streaming Media Technology: Laying the Foundation for Education Change. Syllabus, 14 (3), 2000 p. 26.
T. Ebrahimi, E. Reusens, W. Li, and P. Cicconi. New Trends in Very Low Bitrate Video Coding. Proceedings of the IEEE, July 1995.
III. Internet
Acquiring and Digitizing Media. <http://www.doit.wisc.edu/services/streaming/tutorial/transcripts/transcripts6.htm> 2001. Accessed in September, 2002.
"A review of Video Streaming over the Internet". SuperNOVA Project. DSTC Teaching Report. <http://archieve.dstc.edu.au/RDUstaff/Jane-hunter/videostreaming.html>, 10 August 1997. Accessed in August, 2002.
Adobe Premiere. <http://www.adobe.com/prodcts/premier/overview.html> 4/24/2002. Accessed in December, 2002.
Adobe Premiere 6.5 help file.
Cunningham, David and Francis, Neil. "An introduction to Streaming Video". Cultivate Interactive, Issue 4. <http://www.cultivate-int.org/issue4/video/> 7 May, 2001. Accessed in August, 2002.
Chromakey techniques: Advanced. <http://www.mvcc.net/comm/Tips/shtm> (2001). Accessed in August, 2002.
Capturing from Digital Sources. <http://www.microsoft.com/windowsxp/expertzone/columns/bridgman/02february18.asp> : posted February 18, 2002. Accessed in December, 2002.
Crowell, Nancy. "How Do Your Media Add Up?". <http://www.workz.com/cgi-bin/gt/tpl_page.html,template=1&content=1897&nav1=1&> 2003. Accessed in March, 2003.
Divx Networks. <http://divxnetworks.com/,2001> (2001). Accessed in September, 2002.
Dario Luparello, Sarit Mukherjee and Sanjoy Paul. "Streaming Media Traffic: An Empirical Study", <www.bell-labs.com/user/sanjoy/streaming-media-edgix.doc> (2000). Accessed in September, 2002.
Helix Producer. <http://www.realnetworks.com/products/producer/> Accessed in February, 2003.
Intel® Realistic Display Mixer (RDX). <http://www.intel.com/labs/archive/rdx.htm> Accessed in January, 2003.
Intel Indeo® Video 5. <http://www.siggraph.org/education/materials/HyperGraph/video/codecs/indeo_v5/overview.htm> (1997). Accessed in January, 2003.
International Telecommunication Union, ITU-T (Telecommunication Standardization Sector of ITU): Series P: Telephone Transmission Quality-Methods for objective and subjective assessment of quality. <http://www.doc.ua.pt/arch/itu/rec/product/p.html> June, 2002. Accessed in June, 2003.
International Telecommunication Union, ITU-R (Radio Communication Sector of ITU): Methodology for the Subjective Assessment of the Quality of Television Pictures. <http://www.itu.int/itudoc/itu-r/archives/rsg/1996-97/rsg11/34813.html> 2000. Accessed in June, 2003.
Interactive Model-Based Coding for Face Metaphor User: Interface in Network Communications. <http://www.iuiconf.org/97pdf/1997-002-0036.pdf> (1997). Accessed in August, 2002.
John D. Micheletti and Malachi J. Wurpts, "Applying Chroma-Keying Techniques in a Virtual Environment". <http://www.tss.swri.edu/pub/2000AEROSENSE_HMD.htm> (2000). Accessed in July, 2002.
Larson, Don. "Does Multimedia Have a Dark Side?" <http://wwwwebdeveloper.com/multimedia/mutimedia_dark_side.html> (1996). Accessed in November, 2002.
Microsoft Windows Media Player. <http://www.microsoft.com/windows/mediaplayer> June 2002. Accessed in July, 2002.
Nielsen, Jakob, "Video and Streaming Media," <http:www.useit.com/alertbox/990808.html> August 8, 1999. Accessed in October, 2002.
P. Westerink, L. Amini, S. Veliah W. "A Live Intranet Distance Learning System Using MPEG-4 over RTP/RTSP," IEEE, 601-604. <http://www.informatik.uni-trier.de/~ley/db/conf/icme2000html> (2000). Accessed in August, 2002.
Radosevich, Lynda and Fitzoff, Emily, "Damming The Stream," <http:www.britannica.com/bcom/magazine/article/0,5744,212643,00.html> March 2, 1998. Accessed in August, 2002.
Streaming Methods. Web Server Vs. Streaming Media Server. <http://www.microsoft.com/Windows/windowsmedia/compare/webservvstreamserv.asp> Last updated Thursday, March 21, 2001. Accessed in August, 2002.
Sehgal, Rajeev. "Net Video’s Obstacle to a steady stream," <http://news.com.com/2100-1023-900617.html> November, 2002. Accessed in November, 2002.
Speech Coding. <http://cslu./cse.ogi.edu/HLT survey/ch10node4html> Accessed in October, 2002.
Tricia Gill, "An Introduction to Windows Media Encoder 7.1," <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/encode71.asp> May 16, 2001. Accessed in September, 2002.
Tricia Gill, "An Introduction to Windows Media 8 Encoding Utility," <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/wmencodutil.asp> March 16, 2001. Accessed in September, 2002.
Very Low Bitrate Coding of Virtual Human Animation in MPEG-4. <http://www.research.att.com/projects/tts/papers/2000_ICME/Coding pdf> (2000). Accessed in September, 2002.
Video over the Internet. <http://www.matrox.tv/includes/pdf> Accessed in May, 2003.
<http://www.zdnet.com> updated May 31, 2000. Accessed in July 2002.
<http://service.real.com/help/library/guides/productiongiq/HTML/htmfiles/realsys.htm #63065>. Accessed in October 2002.
<http://www.cs.csustan.edu/~framirez/video.html> updated January 31, 2001. Accessed in December, 2002.
<http://www.viewcast.com> 2000. Accessed in February, 2003.
<http://www.tigerdirect.com/applications/Category/category_slc.asp?Id=2806> 2002. Accessed in January, 2003.
<http://www.apple.com/finalcutpro> 2003. Accessed in January, 2003.
<http://www.sonicfoundary.com/PRODUCTS/minisites/vegas3/video-eit.htm> Accessed in December, 2002.
<http://support.jp.dell.com/docs/video/dazzle/sw/en/Index.htm> Initial release on November, 2000. Accessed in February, 2003.
<http://www.realnetworks.com/solutions/leadership/realvideo.html> Accessed in December, 2002.
<http://www.videomach.com/VideoMach.html> Accessed in October, 2002.
<http://www.siggraph.org/education/materials/HyperGraph/video/codecs/PureVoice.html> 2002. Accessed in June, 2003.
<www.microsoft.com/windowmedia> Accessed in July, 2002.
<http://www.sonicfoundry.com/PRODUCTS/showproduct.asp?PID=612&FeatureID=5447> Accessed in August, 2002.
A P P E N D I C E S
APPENDIX – A
Sample Video Frame Encoded in Windows Media Encoder
Encoded in Windows Media Encoder (file format .wmv, ISO MPEG-4 codec)
Encoded in Windows Media Encoder 7.1
(file format .wmv, Windows Media Video 8 codec)
APPENDIX – B
Sample Video Frame Encoded in Vegas Video
Encoded in Vegas Video 3.0 (file format .wmv, ISO MPEG-4 codec)
Encoded in Vegas Video 3.0 (file format .wmv, Windows Media Video 8 codec)
APPENDIX – C
Sample Video Frame Encoded in Helix Producer Basic and Vegas Video
Encoded in Helix Producer Basic (file format .rm, Real Video 9 codec)
Encoded in Vegas Video (file format .rm, Real Video 8 codec)
APPENDIX – D
Sample Video Frame Encoded in QuickTime Pro and Vegas Video
Encoded in Vegas Video (file format .mov, ISO MPEG-4 codec)
Encoded in QuickTime Pro (file format .mov, ISO MPEG-4 codec)
APPENDIX – E
Methodology for the Subjective Assessment of the Quality of Television Pictures Rec. ITU-R BT.500-10
The ITU Radiocommunication Assembly,
considering
a) that a large amount of information has been collected about the methods used in various laboratories for the assessment of picture quality;
b) that examination of these methods shows that there exists a considerable measure of agreement between the different laboratories about a number of aspects of the tests;
c) that the adoption of standardized methods is of importance in the exchange of information between various laboratories;
d) that routine or operational assessments of picture quality and/or impairments using a five-grade quality and impairment scale made during routine or special operations by certain supervisory engineers, can also make some use of certain aspects of the methods recommended for laboratory assessments;
e) that the introduction of new kinds of television signal processing such as digital coding and bit-rate reduction, new kinds of television signals using time-multiplexed components and, possibly, new services such as enhanced television and HDTV may require changes in the methods of making subjective assessments;
f) that the introduction of such processing, signals and services, will increase the likelihood that the performance of each section of the signal chain will be conditioned by processes carried out in previous parts of the chain,
recommends
1 that the general methods of test, the grading scales and the viewing conditions for the assessment of picture quality should be used for laboratory experiments and whenever possible for operational assessments;
2 that, in the near future and notwithstanding the existence of alternative methods and the development of new methods, Recommendation should be used when possible; and
3 that, in view of the importance of establishing the basis of subjective assessments, the fullest descriptions possible of test configurations, test materials, observers, and methods should be provided in all test reports.
Description of assessment methods
1 Introduction
Subjective assessment methods are used to establish the performance of television systems using measurements that more directly anticipate the reactions of those who might view the systems tested. In this regard, it is understood that it may not be possible to fully characterize system performance by objective means; consequently, it is necessary to supplement objective measurements with subjective measurements.
In general, there are two classes of subjective assessments. First, there are assessments that establish the performance of systems under optimum conditions. These typically are called quality assessments. Second, there are assessments that establish the ability of systems to retain quality under non-optimum conditions that relate to transmission or emission. These typically are called impairment assessments.
To conduct appropriate subjective assessments, it is first necessary to select from the different options available those that best suit the objectives and circumstances of the assessment problem at hand.
The purpose of this Annex is limited to the detailed description of the assessment methods. The choice of the most appropriate method is nevertheless dependent on the service objectives the system under test aims at. The complete evaluation procedures of specific applications are therefore reported in other ITU-R Recommendations.
2 Common features
General viewing conditions for subjective assessments are given. Specific viewing conditions, for subjective assessments of specific systems, are given in the related Recommendations.
2.1 General viewing conditions
Different environments with different viewing conditions are described.
The laboratory viewing environment is intended to provide critical conditions to check systems. General viewing conditions for subjective assessments in the laboratory environment.
The home viewing environment is intended to provide a means to evaluate quality at the consumer side of the TV chain. General viewing conditions reproduce a near to home environment. These parameters have been selected to define an environment slightly more critical than the typical home viewing situations.
3 Selection of test methods
A wide variety of basic test methods have been used in television assessments. In practice, however, particular methods should be used to address particular assessment problems. A survey of typical assessment problems and of methods used to address these problems is given in Table 1.
TABLE 1
Selection of test methods

| Assessment problem | Method used | Description |
|---|---|---|
| Measure the quality of systems relative to a reference | Double-stimulus continuous quality-scale (DSCQS) method(1) | Rec. ITU-R BT.500, § 5 |
| Measure the robustness of systems (i.e. failure characteristics) | Double-stimulus impairment scale (DSIS) method(1) | Rec. ITU-R BT.500, § 4 |
| Quantify the quality of systems (when no reference is available) | Ratio-scaling method(2) or categorical scaling (under study) | Report ITU-R BT.1082 |
| Compare the quality of alternative systems (when no reference is available) | Method of direct comparison, ratio-scaling method(2) or categorical scaling (under study) | Report ITU-R BT.1082 |
| Identify factors on which systems are perceived to differ and measure their perceptual influence | Method under study | Report ITU-R BT.1082 |
| Establish the point at which an impairment becomes visible | Threshold estimation by forced-choice method or method of adjustment (under study) | Report ITU-R BT.1082 |
| Determine whether systems are perceived to differ | Forced-choice method (under study) | Report ITU-R BT.1082 |
| Measure the quality of stereoscopic image coding | Double stimulus continuous quality-scale (DSCQS) method(3) | Rec. ITU-R BT.500, § 5 |
| Measure the fidelity between two impaired video sequences | Simultaneous double stimulus for continuous evaluation (SDSCE) method | Rec. ITU-R BT.500, § 6.4 |
| Compare different error resilience tools | Simultaneous double stimulus for continuous evaluation (SDSCE) method | Rec. ITU-R BT.500, § 6.4 |

(1) Some studies on contextual effects were carried out for the DSCQS and the DSIS methods. It was found that the results of the DSIS method are biased to a certain degree by contextual effects.
(2) Some studies suggest that this method is more stable when a full range of quality is available.
(3) Due to the possibility of high fatigue when evaluating stereoscopic images, the overall duration of a test session should be shortened to be less than 30 min.

4 The double-stimulus impairment scale (DSIS) method (the EBU method)
4.1 General description
A typical assessment might call for an evaluation of either a new system, or the effect of a transmission path impairment. The initial steps for the test organizer would include the selection of sufficient test material to allow a meaningful evaluation to be made, and the establishment of which test conditions should be used. If the effect of parameter variation is of interest, it is necessary to choose a set of parameter values which cover the impairment grade range in a small number of roughly equal steps. If a new system, for which the parameter values cannot be so varied, is being evaluated, then either additional, but subjectively similar, impairments need to be added.
The double-stimulus (EBU) method is cyclic in that the assessor is first presented with an unimpaired reference, then with the same picture impaired. Following this, he is asked to vote on the second, keeping in mind the first. In sessions, which last up to half an hour, the assessor is presented with a series of pictures or sequences in random order and with random impairments covering all required combinations. The unimpaired picture is included in the pictures or sequences to be assessed.
The method uses the impairment scale, for which it is usually found that the stability of the results is greater for small impairments than for large impairments. Although the method sometimes has been used with limited ranges of impairments, it is more properly used with a full range of impairments.
4.2 General arrangement
The way viewing conditions, source signals, test material and the observers and the presentation of results are defined or selected.
The generalized arrangement for the test system should be as shown in Fig. 1.
FIGURE 1
The assessors view an assessment display which is supplied with a signal via a timed switch. The signal path to the timed switch can be either directly from the source signal or indirectly via the system under test. Assessors are presented with a series of test pictures or sequences. They are arranged in pairs such that the first in the pair comes direct from the source, and the second is the same picture via the system under test.
4.3 Presentation of the test material
A test session comprises a number of presentations. There are two variants to the structure of presentations, I and II outlined below.
Variant I: The reference picture or sequence and the test picture or sequence are presented only once as is shown in Fig. 2a).
Variant II: The reference picture or sequence and the test picture or sequence are presented twice as is shown in Fig. 2b).
Variant II, which is more time consuming than variant I, may be applied if the
discrimination of very small impairments is required or moving sequences are under test.
FIGURE 2
Phases of presentation (panels: a) Variant I; b) Variant II)
T1 = 10 s Reference picture
T2 = 3 s Mid-grey produced by a video level of around 200 mV
T3 = 10 s Test condition
T4 = 5-11 s Mid-grey
Experience suggests that extending the periods T1 and T3 beyond 10 s does not improve the assessors' ability to grade the pictures or sequences.
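The phase durations determine how long one presentation lasts and how many presentations fit into a session. The sketch below is illustrative: Variant I follows the phases listed for Figure 2, while the Variant II ordering (reference and test shown twice, with a mid-grey interval between every picture and a single voting period at the end) is an assumption, as are the function names and the half-hour session length taken from section 4.1.

```python
T1, T2, T3 = 10, 3, 10     # seconds: reference, mid-grey, test condition
T4_MIN, T4_MAX = 5, 11     # seconds: range of the final mid-grey (voting) period


def variant_i_duration(t4):
    """Reference, grey, test, then the voting period: T1 + T2 + T3 + T4."""
    return T1 + T2 + T3 + t4


def variant_ii_duration(t4):
    """Assumed order T1 T2 T3 T2 T1 T2 T3 T4: the pair shown twice, one vote."""
    return 2 * (T1 + T2 + T3) + T2 + t4


def presentations_per_session(duration_s, session_s=30 * 60):
    """Whole presentations that fit in a session of session_s seconds."""
    return session_s // duration_s


for name, duration in (("I", variant_i_duration), ("II", variant_ii_duration)):
    shortest, longest = duration(T4_MIN), duration(T4_MAX)
    print(name, shortest, longest, presentations_per_session(longest))
```

Under these assumptions a Variant I presentation takes 28 to 34 s and a Variant II presentation 54 to 60 s, which is consistent with the statement that Variant II is more time consuming.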
4.4 Grading scales
The five-grade impairment scale should be used:
5 imperceptible
4 perceptible, but not annoying
3 slightly annoying
2 annoying
1 very annoying
Assessors should use a form which gives the scale very clearly, and has numbered boxes or some other means to record the gradings.
4.5 The introduction to the assessments
At the beginning of each session, an explanation is given to the observers about the type of assessment, the grading scale, the sequence and timing. The range and type of the impairments to be assessed should be illustrated on pictures other than those used in the tests, but of comparable sensitivity. It must not be implied that the worst quality seen necessarily corresponds to the lowest subjective grade. Observers should be asked to base their judgement on the overall impression given by the picture, and to express these judgements in terms of the wordings used to define the subjective scale.
5 The double-stimulus continuous quality-scale (DSCQS) method
5.1 General description
A typical assessment might call for evaluation of a new system or of the effects of transmission paths on quality. The double-stimulus method is thought to be especially useful when it is not possible to provide test stimulus test conditions that exhibit the full range of quality.
The method is cyclic in that the assessor is asked to view a pair of pictures, each from the same source, but one via the process under examination, and the other one directly from the source. He is asked to assess the quality of both. In sessions which last up to half an hour, the assessor is presented with a series of picture pairs (internally random) in random order, and with random impairments covering all required combinations.
5.2 General arrangement
The way viewing conditions, source signals, test material, the observers and the introduction to the assessment are defined or selected.
The generalized arrangement for the test system should be as shown in Fig. 3.
FIGURE 3
5.3 Presentation of the test material
A test session comprises a number of presentations. For variant I which has a single observer, for each presentation the assessor is free to switch between the A and B signals until the assessor has the mental measure of the quality associated with each signal. The assessor may typically choose to do this two or three times for periods of up to 10 s. For variant II which uses a number of observers simultaneously, prior to recording results, the pair of conditions is shown one or more times for an equal length of time to allow the assessor to gain the mental measure of the qualities associated with them, then the pair is shown again one or more times while the results are recorded.
The number of repetitions depends on the length of the test sequences. For still pictures, a 3-4 s sequence and five repetitions may be appropriate. For moving pictures with time-varying artefacts, a 10 s sequence with two repetitions may be appropriate.
Where practical considerations limit the duration of sequences available to less than 10 s, compositions may be made using these shorter sequences as segments, to extend the display time to 10 s. In order to minimize discontinuity at the joints, successive sequence segments may be reversed in time (sometimes called “palindromic” display). Care must be taken to ensure that test conditions displayed as reverse time segments represent causal processes, that is, they must be obtained by passing the reversed-time source signal through the system under test.
5.4 Grading scale
The method requires the assessment of two versions of each test picture. One of each pair of test pictures is unimpaired while the other presentation might or might not contain an impairment. The unimpaired picture is included to serve as a reference, but the observers are not told which is the reference picture. In the series of tests, the position of the reference picture is changed in pseudo-random fashion.
The observers are simply asked to assess the overall picture quality of each
presentation by inserting a mark on a vertical scale. The vertical scales are printed in pairs to accommodate the double presentation of each test picture. The scales provide a continuous rating system to avoid quantizing errors. The associated terms categorizing the different levels are the same as those normally used; but here they are included for general guidance and are printed only on the left of the first scale in each row of ten double columns on the score sheet. Figure 4 shows a section of a typical score sheet. Any possibility of confusion between the scale divisions and the test results is avoided by printing the scales in blue and recording the results in black.
FIGURE 4
Portion of quality-rating form using continuous scales*
6 Adjectival categorical judgement methods
In adjectival categorical judgements, observers assign an image or image sequence to one of a set of categories that, typically, are defined in semantic terms. The categories may reflect judgements of whether or not an attribute is detected (e.g. to establish the impairment threshold). Categorical scales that assess image quality and image impairment have been used most often, and the ITU-R scales are given in Tables 2 and 3. In operational monitoring, half grades are sometimes used. Scales that assess text legibility, reading effort, and image usefulness have been used in special cases.
TABLE 2
ITU-R quality scales
Five-grade scale
Quality Score
Excellent 5
Good 4
Fair 3
Poor 2
Bad 1
TABLE 3
ITU-R impairment scales
Image Impairment Score
Imperceptible 5
Perceptible, but not annoying 4
Slightly annoying 3
Annoying 2
Very annoying 1
This method yields a distribution of judgements across scale categories for each
condition. The way in which responses are analysed depends upon the judgement (detection, etc.) and the information sought (detection threshold, ranks or central tendency of conditions, psychological “distances” among conditions).
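As an illustration of the analysis described above, the following minimal sketch tallies the distribution of judgements across the five categories of Table 2 for one condition and derives two measures of central tendency. The function name `summarize_judgements` and the sample votes are hypothetical and serve only as an example; they are not taken from the Recommendation.

```python
from collections import Counter
from statistics import mean, median

# ITU-R five-grade quality scale (Table 2).
QUALITY_SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}


def summarize_judgements(labels):
    """Return the per-category counts plus the mean and median score."""
    counts = Counter(labels)
    # Report every category, including those that no observer chose.
    distribution = {name: counts.get(name, 0) for name in QUALITY_SCALE}
    scores = [QUALITY_SCALE[label] for label in labels]
    return distribution, mean(scores), median(scores)


# Hypothetical votes from eight observers for a single test condition.
votes = ["Good", "Fair", "Good", "Excellent", "Fair", "Good", "Poor", "Good"]
distribution, mean_score, median_score = summarize_judgements(votes)
print(distribution, mean_score, median_score)
```

Whether the mean, the median or the full distribution is reported depends on the information sought, as noted above.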
6.1 Non-categorical judgement methods
In non-categorical judgements, observers assign a value to each image or image sequence shown. There are two forms of the method.
In continuous scaling, a variant of the categorical method, the assessor assigns
each image or image sequence to a point on a line drawn between two semantic labels (e.g. the ends of a categorical scale as in Table 3). The scale may include additional labels at intermediate points for reference. The distance from an end of the scale is taken as the index for each condition.
In numerical scaling, the assessor assigns each image or image sequence a number that reflects its judged level on a specified dimension (e.g. image sharpness). The range of the numbers used may be restricted (e.g. 0-100) or not. Sometimes, the number assigned describes the judged level in “absolute” terms (without direct reference to the level of any other image or image sequence), as in some forms of magnitude estimation. In other cases, the number describes the judged level relative to that of a previously seen “standard” (e.g. magnitude estimation, fractionation, and ratio estimation). Both forms result in a distribution of numbers for each condition. The method of analysis used depends upon the type of judgement and the information required (e.g. ranks, central tendency, psychological “distances”).
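For continuous scaling, the conversion of a mark into an index can be sketched as follows. The sketch assumes that the distance of the mark is measured in millimetres from the lower end of the line and that the index is normalized to the range 0-100; the function name `scale_index`, the default scale length and the sample measurements are assumptions made for illustration only.

```python
def scale_index(mark_mm, scale_length_mm=100.0, max_index=100.0):
    """Convert the distance of a mark from the lower end of the line to an index."""
    if not 0.0 <= mark_mm <= scale_length_mm:
        raise ValueError("mark lies outside the printed scale")
    return max_index * mark_mm / scale_length_mm


# Hypothetical marks (in mm) made by three assessors for one condition.
marks_mm = [72.0, 65.5, 80.0]
indices = [scale_index(m) for m in marks_mm]
condition_mean = sum(indices) / len(indices)
print(indices, condition_mean)
```

The resulting indices form the distribution of numbers for the condition, to which the chosen method of analysis is then applied.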
APPENDIX – F
Methods for Subjective Determination of Transmission Quality
1 Scope
This Recommendation contains advice to Administrations on conducting subjective tests of transmission quality in their own laboratories. It does not however deal with types of tests described in detail in other ITU–T Recommendations and documentation, namely:
a) determination of Reference and Relative Equivalents – see Handbook on Telephonometry, Geneva, 1993;
b) determination of Loudness Ratings – see Recommendation P.78;
c) determination of Articulation Ratings (A.E.N. values) – see Handbook on Telephonometry, Geneva, 1993.
Neither does it deal with the various kinds of specialized tests used in the course of developing items of telephone equipment, for the purpose of diagnosing faults and shortcomings, such as Diagnostic Rhyme Tests [1] and other tests dedicated to the study of specific aspects of speech output.
This Recommendation gives the approved methods which are considered to be
suitable for determining how satisfactorily given telephone connections may be expected to perform.
The methods indicated here are intended to be generally applicable whatever the
form of degradation factors present. Examples of degrading factors include: loss (often frequency dependent); circuit noise; transmission errors (random bit errors as well as erased frames that occur in systems such as mobile communications); environmental noise; sidetone; talker echo; non-linear distortion of various kinds including low bit-rate encoding; propagation time; harmful effects of voice-operated devices; distortions of the time scale arising from packet switching; and time-varying degradations of the communication channel, including those arising in loudspeaking sets. Combinations of two or more of such factors also have to be catered for. Further guidance for specific applications is available in Recommendations P.830 (digital speech codecs), P.84 (DCME/PCME), and P.85 (speech output devices).
2 References
The following Recommendations and other references contain provisions that, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated are valid. All Recommendations and other references are subject to revision; all users of this Recommendation are therefore encouraged to investigate the possibility of applying the most recent edition of the
Recommendations listed below. A list of the currently valid ITU–T Recommendations is regularly published.
– IEC Publication 1260: 1995, Electroacoustics – Octave-band and fractional-octave-band filters.
– IEC Publication 581-5: 1981, High fidelity audio equipment and systems; Minimum performance requirements – Part 5: Microphones.
– IEC Publication 651: 1979, Sound level meters. (Amendment 1-1993) (Corrigendum March 1994).
– ISO 266: 1975, Acoustics – Preferred frequencies for measurements.
– ISO 1996-1: 1982, Acoustics – Description and measurement of environmental noise – Part 1: Basic quantities and procedures.
– ISO 1996-2: Acoustics – Description and measurement of environmental noise – Part 2: Acquisition of data pertinent to land use.
– ISO 1996-3: 1987, Acoustics – Description and measurement of environmental noise – Part 3: Application to noise limits.
– ITU-T Recommendation G.113 (1996), Transmission impairments.
– CCITT Recommendation G.722 (1988), 7 kHz audio-coding within 64 kbit/s.
– CCITT Recommendation G.726 (1990), 40, 32, 24 and 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM).
– CCITT Recommendation G.728 (1992), Coding of speech at 16 kbit/s using low-delay code excited linear prediction.
– ITU-T Recommendation G.729 (1996), Coding of speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP).
– ITU-T Recommendation P.10 (1993), Vocabulary of terms on telephone transmission quality and telephone sets.
– ITU-T Recommendation P.11 (1993), Effect of transmission impairments.
– CCITT Recommendation P.48 (1988), Specification for an intermediate reference system.
– ITU-T Recommendation P.56 (1993), Objective measurement of active speech level.
– ITU-T Recommendation P.78 (1993), Subjective testing method for determination of loudness ratings in accordance with Recommendation P.76.
– ITU-T Recommendation P.810 (1996), Modulated Noise Reference Unit (MNRU).
– CCITT Recommendation P.82 (1984), Method for evaluation of service from the standpoint of speech transmission quality.
– ITU-T Recommendation P.830 (1996), Subjective performance assessment of telephone-band and wideband digital codecs.
– ITU-T Recommendation P.84 (1993), Subjective listening test method for evaluating digital circuit multiplication and packetized voice systems.
– ITU-T Recommendation P.85 (1994), A method for subjective performance assessment of the quality of speech voice output devices.
3 Definitions
For the purposes of this Recommendation, the following definitions apply:
3.1 dBov: dB relative to the overload of a digital system.
3.2 Q: The ratio, in dB, of speech power to modulated noise power in the Modulated Noise Reference Unit, as described in Recommendation P.810.
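The definition of Q can be illustrated with a short calculation. The sketch below assumes that both powers are available as positive values in the same linear unit and applies the usual decibel conversion for a power ratio; the function name `q_db` and the sample values are hypothetical and are not part of Recommendation P.810.

```python
import math


def q_db(speech_power, noise_power):
    """Return Q, the speech-to-modulated-noise power ratio, in dB."""
    if speech_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    # A power ratio is expressed in dB as 10 * log10(ratio).
    return 10.0 * math.log10(speech_power / noise_power)


# Hypothetical example: speech power 100 times the modulated noise power.
q_value = q_db(100.0, 1.0)
print(q_value)
```

A larger Q therefore corresponds to less modulated noise relative to the speech signal.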
4 Conventions
Subjective evaluation of telecommunications equipment and systems may, in principle, be conducted using listening-only or conversational methods of subjective testing. As a practical matter, listening-only tests may be the only feasible method of subjective testing during the development of new transmission equipment or telecommunication services.
5 Recommended methods
5.1 Opinion scale
The following opinion scales are those recommended by the ITU-T.
5.1.1 Conversation opinion scale
Various five-point category-judgement scales may be used for different purposes. The layout and wording of opinion scales, as seen by subjects in experiments, are very important, and should follow the standard arrived at through years of experience. The following opinion scale is the most frequently used for ITU-T applications; equivalent wording should be used in other languages, which may result in small variations from the original English text.
This is a category rating obtained from each subject at the end of each conversation.
Opinion of the connection you have just been using
Excellent
Good
Fair
Poor
Bad
The experimenter allocates the following values to the scores: Excellent = 5; Good = 4; Fair = 3; Poor = 2; Bad = 1
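The allocation of values above can be sketched in code as follows. The example assumes that each record pairs a connection identifier with the opinion voted by one subject, and that the experimenter wishes to average the allocated values per connection; the function name `mean_opinion_by_connection` and the sample records are hypothetical.

```python
from collections import defaultdict

# Values allocated by the experimenter to the conversation opinion scale.
OPINION_VALUES = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}


def mean_opinion_by_connection(records):
    """Average the allocated values for each connection under test."""
    values = defaultdict(list)
    for connection, opinion in records:
        values[connection].append(OPINION_VALUES[opinion])
    return {conn: sum(v) / len(v) for conn, v in values.items()}


# Hypothetical votes collected at the end of five conversations.
records = [
    ("A", "Good"), ("A", "Fair"), ("A", "Excellent"),
    ("B", "Poor"), ("B", "Fair"),
]
means = mean_opinion_by_connection(records)
print(means)
```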
5.1.2 Difficulty scale
This is a binary response obtained from each subject at the end of each conversation.
Did you or your partner have any difficulty in talking or hearing over the connection?
Yes
No
The experimenter allocates the following values to the scores: Yes = 1; No = 0.
The quantity evaluated (percentage of "yes" responses) is called percentage Difficulty or per cent "Difficult", and is denoted by the symbol %D. The corresponding simple proportion is denoted by the symbol d; in other words, %D = 100d.
NOTE – It is often the case that the nature of the difficulty is required and then it is usual for the experimenter to ask the subject to describe in his/her own words their perception of the difficulty.
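The relation %D = 100d can be sketched as a short calculation. The example assumes that the responses are recorded as the strings "Yes" and "No"; the function name `percent_difficulty` and the sample responses are hypothetical.

```python
# Values allocated by the experimenter to the difficulty scale.
DIFFICULTY_VALUES = {"Yes": 1, "No": 0}


def percent_difficulty(responses):
    """Return the proportion d of "Yes" responses and the percentage %D = 100d."""
    if not responses:
        raise ValueError("at least one response is required")
    values = [DIFFICULTY_VALUES[r] for r in responses]
    d = sum(values) / len(values)
    return d, 100.0 * d


# Hypothetical responses from twelve subjects, three of whom reported difficulty.
responses = ["Yes"] * 3 + ["No"] * 9
d, percent_d = percent_difficulty(responses)
print(d, percent_d)
```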
5.2 Opinion scales recommended by the ITU-T
Various five-point category-judgement scales may be used for different purposes. The layout and wording of opinion scales, as seen by subjects in experiments, are very important, and should follow the standard arrived at through years of experience. The following opinion scales are those most frequently used for ITU-T applications; equivalent wording should be used in other languages, which may result in small variations from the original English text:
a) Listening-quality scale
Quality of the speech Score
Excellent 5
Good 4
Fair 3
Poor 2
Bad 1
b) Listening-effort scale
The heading of the listening-effort opinion scale is particularly important. Without it, the other descriptions are liable to be seriously misunderstood.
Effort required to understand the meanings of sentences Score
Complete relaxation possible; no effort required 5
Attention necessary; no appreciable effort required 4
Moderate effort required 3
Considerable effort required 2
No meaning understood with any feasible effort 1
c) Loudness-preference scale
Loudness preference Score
Much louder than preferred 5
Louder than preferred 4
Preferred 3
Quieter than preferred 2
Much quieter than preferred 1
CURRICULUM VITAE
Sushil Kumar Sharma
Hekuli – 4, Mouli, Dang, Nepal
Email: [email protected]
PERSONAL INFORMATION
Date of Birth : 18 August, 1971
Mailing Address : G.P.O. Box 4513,
Kathmandu, Nepal
EDUCATIONAL BACKGROUND
Master of Science in Information Technology : Saint Louis University
Baguio City, Philippines
August, 2003
Master Degree in Economics : Tribhuvan University
Kathmandu, Nepal
June, 1994
Bachelor of Arts : Tribhuvan University
Kathmandu, Nepal
July, 1992
WORK EXPERIENCES
Administrative Officer : Shiva Nirman Company
Kalimati, Kathmandu, Nepal
January 1998 to October, 2000
Administrative Assistant : National Forensic Science Laboratory
Khumaltar, Lalitpur, Nepal.
February, 1996 to December, 1997