On automatic generation of multimedia presentations

~ Applications NORTfl - I-IOUAN~

On A~atomatie Generation of Multimedia Presentations

CHI-bl ING C H U N G

and

TIMOTHY K. SHIH

Department of Computer Science and Information Engineering, Tamkang University, Tamsui, Taiwan, Republic of China

Communicated by Louis R. Chow

ABSTRACT

We propose a mechanism and system for the automatic generation of interactive multiraedia presentations from their specifications. A presentation specification contains three parts: the resources, the temporal information, and the spatial information. We u:~e an ICON programming technique, and a graphical user interface allows the presentation designer to quickly specify what h e / s h e wants. Our system takes these requiiements and relies on inference rules written in Prolog to generate interactive presentations. These inference rules are based on interval temporal logic and important issues in multimedia presentations,1 such as the hardware limitations of an ordinary personal computer and the properties of a multimedia resource. Our prototype system runs under MS Windows. The early experience of using the system shows that it is feasible to use logic inference rules to assist the design of good multimedia presentations. ©Elsevier Science Inc. 1997

1. INTRODUCTION

A~ multimedia technologies largely increase communication effective- ness between humans and computers, the importance of efficient multime-

This work was supported by NSC Grant NSC 85-2213-E-032-011 of the Republic of China.

INFORMATION SCIENCES 97, 293-321 (1997) © Elsevier Science Inc. 1997 655 Avenue of the Americas, New York, NY 10010

0020-0255/97/$17.00 PII S0020-0255(96)00195-8

294 C.-M. CHUNG AND T. K. SHIH

dia authoring tools is being brought to the attention of both researchers and software vendors. Many presentation or authoring tools were developed for presenters or artists in various fields. Some researchers have developed domain specific presentations using artificial intelligence techniques. For example, COMET (COordinated Multimedia Explanation Testbed) [10, 11] uses a knowledge base and AI techniques to generate coordinated, interactive explanations with text and graphics that illustrates how to repair a military radio receiver-transmitter. We also propose a system [6, 18-20] that uses object-oriented techniques to incorporate Expert System inference mechanisms into multimedia presentation designs. The system supports interactive designs of presentation layouts and navigation, incorporated with an underlying inference and learning system that deduces useful outputs for intelligent presentations. In this paper, we propose some new results of our research, especially in multimedia inter- stream synchronizations. Work related to solving multimedia synchronization problems using, interval temporal relations/temporal logic includes that discussed in [2, 13, 8, 7, 15, 14]. Other researchers [16, 1, 14] used timed Petri nets. Our approach, based on the 13 relationships between two time intervals proposed in [2], considers multimedia resource properties as important issues in order to make good multimedia presentations. Other authoring systems that discuss the synchronization of multimedia streams can be found in [24, 9].

The related works addressed above are mostly academic research. On the other hand, we also looked at some commercial products related to multimedia authoring or presentation designs:

1. Authorware Professional by Macromedia, Inc. 2. Multimedia Viewer by Microsoft 3. Multimedia Toolbook by Asymetrix Corporation 4. Hypermedia System by ITRI 5. Action! by Macromedia, Inc. 6. Audio Visual Connection by IBM 7. Astound by Gold Disk Inc. 8. Director by Macromedia, Inc.

Authorware uses an event control flow diagram allowing the presenter to specify presentation objects and controls, which can be decomposed into several levels in a hierarchy structure. The system also provides a simple script language for calculation and data manipulation. Other systems (i.e., 2, 3, and 6 above) also provide script languages and API (application

GENERATION OF MULTIMEDIA PRESENTATIONS 295

program interface) functions. Hypermedia System, Action!, and Director use a time line table allowing actions or objects to be dropped in a particular time slot. Most systems allow users to cut and paste presentation objects or actions via button click and drawing. Multimedia Viewer also provides a set of medium editing tools. Presentation objects produced by these l:ools can be linked together by a script language supporting functions, data structures, and commands.

Even the above presentation design systems provide fancy user interface and flexible presentation control mechanisms; these systems, however, have same drawbacks:

• At the beginning of a presentation design, it is possible that the designer may not have a big picture of what the detail presentation schedule should look like. For instance, when a designer uses the Director in the early stage of a design, it is difficult for he / she to precisely lay out the temporal relations among all multimedia resources. As a result, the designer has to modify the timetable many times. Each modification costs some changes of timing of other re.sources not directly related to the modification. This will certainly increase the burden of the designer.

• If an user uses Authorware to design a presentation, it is difficult for tile user to precisely specify the synchronization point between two resources, or among three or more resources. As a result, it is difficult to design, for instance, a presentation with a special sound effect which occurs at a particular time point of the presentation.

The purpose of our presentation generator is to reduce a designer's load as much as possible while still allowing the designer to make multimedia presentations. We support the stepwise refinement of a presentation design~. As long as the user knows the relation between two resources, he / she can start the initial design. Other resources are added one by one to the initial design. We also provide an efficient mechanism for the user to precisely control the synchronization points by means of a "synchronizes" specification statement. Multimedia presentations can be designed in ou~r proposed specification language. Or, for the convenience of the user, 'we also develop an ICON programming graphical user interface. A multimedia presentation specification, according to our system, contains three parts:

• Resource Specif ication carried information of multimedia resources to be used in the presentation, which is obtained from our multimedia resource database via a resource browser.

296 C.-M. C H U N G AND T. K. SHIH

• Temporal Specification describes the temporal relations among resources, which is specified in a predicate format.

• Spatial Specification provides a layout of a presentation.

Our presentation generator takes as input the above specifications, uses logic inference rules defined in our system, and generates presentation implementation information for a presentation frame [6, 19] (e.g., screen layouts, presentation schedule, or possible error diagnosis). A presentation contains a number of presentation frames. These frames activate each other via message passing. The navigation of a presentation is based on the user's interaction which introduces messages. The interaction, as well as the temporal synchronization requirements and the spatial layouts of multimedia resources, are the three important elements construct a multimedia presentation.

This paper is organized as follows. Section 2 introduces some temporal operators and functions in order for us to represent the temporal information of a multimedia resource within a presentation. Section 3 presents our specification language, as well as a number of inferences rules to generate presentation implementations from specifications. In Section 4, we discuss an ICON programming mechanism and interface for the user to design the presentation easily. Section 5 addresses some implementation issues, and a sample specification is given. Finally, Section 6 highlights our contributions, and points out some difficult problems for our further research direction.

2. T E M P O R A L OPERATORS AND FUNCTIONS

In order to represent the temporal information of multimedia presentation resources, a representation of a time model is necessary. The time model of our presentation is discrete. That is, a presentation consists of a number of continuous time intervals (or cycles). Next, we need a representation of embed presentation resources within these time intervals. Thus, a number of temporal operators and functions are defined in our inference system for the representation of resource temporal information. The concatenation operator . . . . is to connect two sequential resource streams. The silent operator "_ , " while applying to a number, denotes a silent stream of many cycles. For instance, "_ 10" is a stream of no action which lasts 10 cycles. Note that a silent can be concatenated with another stream

G E N E R A T I O N OF M U L T I M E D I A P R E S E N T A T I O N S 297

using the concatenation operator. The extension operator . . . . extends a resource stream, according to the synchronization extent of that resource (e.g., repeat, keep the last frame, no extensions, etc.). The synchronization extent of mult imedia resources describes how the resources should be presented after its regular ending. We discuss this concept in Section 3. The truncate operator, associated with a number, can be used in two ways. " r 10!" says that resource r is played for the first 10 cycles only. And " ! 1 0 r " denotes that r is played after cutting the first 10 cycles. These two operators can be applied together to a resource if the total cutting t ime is smaller than or equal to the duration of that resource. Otherwise, the presentation of that resource is omitted. The concurrent function $ is an overloaded function that accepts one or more parameters. All resources with tZheir names specified as parameters of the concurrent function start concurrently. The sequential function, denoted by " - , " is also an overload function. The resources specified as parameters of a sequential function are presented one by one. There is no semantic difference between the sequential function and the concatenation operator. However, sequential functi,ons serve as the principle functors of the final representation of a multimedia presentation. The concatenation operators are used in the intermediate process, or used as parts of the final representation of a presentation. The last function is the identical function " # . " This function is similar to the concurrent function with a further restriction indicating that all resources end at the same time as well. The following table sumnlarizes these operators and functions.

Temporal Operators and Functions

Operators/ Name Functions Usages

Concatenation rl ~r2 Silent _ _ 10 or _o~ ]Extension r- 10 or r-oo Truncate ! r 10! or !10 r Concurrent $ $(rl), $(rl, r2) or $(rl, r2, r3), etc. :~equential - - (rl), - (rl, r2) or - (rl, r2, r3), etc. Identical # #(rl), #(rl, r2) or #(rl, r2, r3), etc.

T h e fol lowing tab le shows s o m e e x a m p l e s of using these operators / funct ions .

298 C.-M. C H U N G A N D T. K. S H I H

Examples of Temporal Arithmetic and Meanings

Examples Presentation meanings

rl^r2^r3 rl^$(r2,r3) $(r l '~ ,r2^_10)

$(rl 10!, !10 r2)

- ( r l , r2) #(rl , r2, r3)

Sequential presentation of rl, r2, and r3. Play rl first, and then play r2 and r3 concurrently. Play rl, extend it to the end; at the same time, also

play r2, and keep the stream of r2 silent for 10 cycles. Note that the presentation lasts the duration of r2 plus 10 cycles.

Play rl for 10 cycles, cut rl, and play r2 after cutting the front 10 cycles concurrently.

Play rl first, then play r2. rl, r2, and r3 start and end at the same time.

3. P R E S E N T A T I O N S P E C I F I C A T I O N S

In this section, we p ropose a n u m b e r of presenta t ion specification s ta tements as well as some inference rules for the au tomat ic genera t ion of mul t imedia presentat ions . The specification s ta tements are used in our system as internal representa t ions . They are quite difficult to be used directly by the user. In Section 4, we will discuss how our graphical user interface is able to make the design of p resen ta t ions much easier.

3.1. MULTIMEDIA RESOURCE PROPERTIES

To crea te a high-quali ty mul t imedia presenta t ion , not only is a good presen ta t ion designing env i ronmen t essential, but good mul t imed ia resources are key. Mul t imedia resources are recorded or cap tu red via camera , tape recorder , or v ideo camera , conver ted to their digital formats , and saved on the disk. These resource files can be used in different presentat ions . A resource is associated with a n u m b e r of at tr ibutes. We consider the following at t r ibutes for a mul t imed ia resource:

• N a m e : n a m e of the resource. • Keyword: one or m o r e keywords are used as the descript ion of a

mul t imedia resource. For instance, n a m e of the city is a keyword of the b i t m a p p e d picture of Paris.

• Usage: how the resource is used (e.g., background, navigation, or focus).

G E N E R A T I O N OF M U L T I M E D I A PRESENTATIONS 299

• M e d i u m : what multimedia device is used to carry out this resource (e.g., sound, video, MPEG, or picture).

• l d o d e l : how the resource is presented (e.g., table, map, chart, or spoken language).

• T e m p o r a l e n d u r a n c e : how long the resource lasts in a presentation (e.g., 20 seconds or permanent).

• S y n c h r o n i z a t i o n to lerance: how a participant feels about the synchronization delay of a resource (e.g., small, medium, or large). For instance, a user usually expects the immediate response after pushing a button for the next page of text. But the user might be able to l:olerate a video play being delayed for 2 seconds.

• S y n c h r o n i z a t i o n extent: what kind of action the system should take after the presentation of a resource is finished. For instance, the ~system can repeat a piece of music, keep the last frame of a video, or simply drop a picture.

• Detec tabi l i ty : how the resource attracts a listener (e.g., high, medium, or low).

• S tar tup delay: the time between a message being used and the corresponding resource being presented, especially when the resource is on a remote computer connected via network.

• H a r d w a r e l imi ta t ion: what kind of hardware is essential for carrying out the resource (e.g., MPC level 1, level 2, and level 3, or other limitations).

• Vers ion: the version of this resource file. • Da te : the date and time this resource file is created. • R e s o l u t i o n : the resolution of this resource file, specified by X X Y (for

0 x 0 for sound) screen units. • S t a r t ~ e n d t ime: for nonpermanent resources, the starting cycle and

the ending cycle of a video play, sound play, or other resources, especially used as a presentation resource. A cycle can be a second, one-tenth of a second, or a frame number of a video/animation.

• R e s o u r c e descriptor: a logical descriptor to a physical resource data segment on the disk.

• A s s o c i a t i o n l inks: pointers to other resources who have the coexis- tence relation with the current resource)

Since each resource has a number of attributes, for the user to find a suitzble resource for h i s /he r presentation, if a query needs to contain all

1Association links are primitive connectors that serve as tools to group resource objec, ts into a reusable object in our multimedia database.


of these attributes, it will be tiresome. Thus, we propose an intelligent mechanism that makes a query easier. The database is built with a number of inference rules. Each rule describes an i f - then relation between two attributes. The following are some of the rules used in our database:

If usage : focus then detectability : high

If model : illustration then medium : picture

If medium = picture then temporal_endurance = permanent

If medium = MPEG then hardware_limitation = MPEG_card

If model = map then medium = picture

If ...

Some unspecified attributes in a query can be deduced from others. Thus, a user does not need to specify all attributes of a resource while using a query. A discussion of a multimedia resource browser who facili- tates queries is given in Section 4.

3.2. RESOURCE SPECIFICATION

The kinds of properties we consider here are essential for presentation generations. For instance, properties related to time and space are included. Other properties, such as keywords, are not included while the user is issuing the resource specification. Only those resources used in a presentation are specified in the resource specification by the resource browser. And only those attributes related to the automatic generation of presentations will be included in the resource specification statements. Each resource is given a unique name, which maps to a resource descriptor (e.g., the file name or a database entry). Temporal endurance, indicated by an integer as the number of cycles, specifies how long a resource lasts in the presentation. The reserved word 09 represents a permanent temporal endurance. For example, a picture can last as long as possible until it is dropped from its presentation window. Synchronization extent specifies how a resource is extended if requested. Some resources may not be extended. And some resources could end with a fade-out effect} Detectability indicates how sensitively a resource attracts the user. Re-

eWe also consider other special effects such as "rain," "title," etc.


source:~ that occupy screen space will be given their resolutions. Otherwise, a resolution of 0"0 is given (e.g., for sound or music). Note that we introduce a silent resource as a time slot holder which takes no action in a presentation.

Resource Proper t ies

Tempora l Sync Resource Type Name endurance extent Detectabil i ty Resolu t ion descr ip tor

Sound integer no I repea t I s I m I 1 0"0 Uni ts "fn.wav" fade out

Animat ion integer last f rame I s I m ] 1 X * Y Uni ts "fn.fli" fade out

Video resource integer last f rame ] s ] m I 1 X*Y Uni t s "fn.avi" fade out

Picture names oo yes I fade out s I m I1 X*Y Uni t s " fn .bmp" Text co yes [ fade out s I m [1 X * Y Uni ts "fn . r t f" MIDI integer no [ repea t I s I m [ 1 0"0 Uni ts " fn ,mid"

fade out Silent integer yes none 0"0 Uni ts " fn , tmp"

A presentation consists of multiple streams carrying multiple resources. Some streams, due to the limitation of hardware, may not be played concurrently. On some occasions, it is not appropriate to play two resources at the same time since it is hard for a person, for instance, to watch two videos simultaneously. Or, according to everyday presentation experiences, some streams are encouraged to be played concurrently. For instance, it is nice to have an MIDI music background for an animation play. Our approach is to reduce the load of the user by giving these good suggestions. However, if the user is strongly against our suggestion, it is still possible for he / she to change the final presentation generated. Given two ~tpes of resources in a specification, we consider the following relations:

• m u t u a l exc lus ive (represented by an x): two resources cannot be played at the same time

• pos s ib l e c o n c u r r e n t (represented by 7): two resources could be played at the same time; however, it is the decision of the user to play them simultaneously

• m u t u a l inc lus ive (represented by an ©): two resources are encouraged ~Lo be played concurrently.


The following table summarizes these relations used between two types of resources.

Resource Relations

Sound Animation Video Picture Text MIDI

x o ? /o o o ? /x ? /x ? /x o o o

X 0 O ? / 0

0 0 0

0 0

X

Sound Animation Video Picture Text MIDI

Note that, in some situations, we suggest more than one relation (e.g., ? /0 ) . In this case, the user is asked to provide the decision first. If a decision is not given, the second relation (i.e., O) is used.

When two types of resources are presented in the same interval, if the hardware is able to carry out only one, we need to decide which resource to play. We consider that different types of resources have different presentation priorities. Generally, dynamic resources, such as sound and video, have higher priorities since they can easily catch the attention of the user. Also, resource detectability (i.e., the degree of the resource that attracts a listener) is considered in presentation priority calculations. Generally, if a resource has a high detectability, it should not be omitted from the presentation. The use of priority is to assist the inference rules to make a better presentation. The following table gives the priorities of resource types. The higher the number, the higher is the priority.

Resource Priorities

Resource types Resource type priorities

Video 2 Animation 2 Sound 1 MIDI 1 Picture 0 Text 0 Silent 0

The detectability of a resource, specified in the resource database by the user, can be high, medium, or low. Given two resources, r l and r2 ,


the priority checking expression (denoted by r l > > r 2 ) is decided by the following:

if the detectability of rl is greater than r2 then

rl >> r2 is true

else if the detectability of rl is equal to r2 then

if the resource type priority of rl is greater than

r2 then

:~i >> r2 is true

else

rl >> r2 is false

else

rl >> r2 is false

Resource specifications are given in a Prolog predicate format contain- ing several parameters. The following are examples of resource specifications given as the input to the generator.

resource

{sound, "Explain," 20, no, large,

resource

(animation, "Guide," 40, last_ medium,

frame,

resource

Ivideo, "Tour," 20,no, large,

resource

4picture, "WallPaper," oo, yes, small,

re:3ource

text, "General," oo, yes small,

resource

imidi, "Musicl," 25, repeat, medium,

resource

~silent, "Quiet," i0, yes, no,

"0x0," "explain,wav").

"400x200," "guide.fli").

"400x200," "tour.avi").

"800x400," "wallpap.bmp").

"300x300," "general.rtf").

"0x0," "musicl.mid").

"0,0," "quiet.tmp").

Note that some restrictions may be applied to resource specifications. For instance, there is no screen resolution for sound and MIDI music. 3 Moreover, the temporal endurance for pictures and text files, theoretically,

3Even sound can be recorded in different sampling frequencies and 8-bit or 16-bit options; we are not considering the difference in the generated presentation as long as the underlying hardware works properly.


should be infinite. However , we also allow the user to specify a finite temporal endurance for a picture or text. The not ion of screen resolution is important . Not only does it decide the screen layouts, but the resolution informat ion also works with the temporal information in that any screen region cannot be occupied by two or more resources at the same time for our current system. 4

3.3. TEMPORAL SPECIFICATION

The research discussed in [2] proposes 13 types o f relations between two tempora l intervals. The 13 relations cover all possible situations. However , mul t imedia presenta t ion synchronizat ion needs precise timing information of resources. Some modificat ions to the relations are necessary in order for us to achieve mult imedia resource synchronization. Some statements are given additional a rguments to explicitly indicate a synchronizat ion point. The following table shows our revised relations.

Revised Interval Temporal Relations

Relations Meanings

always(rl, n) meets(r1, r2) before(rl, r2, n) starts(rl, r2) finishes(rl, r2) overlaps(rl, r2, n)

embraces(rl,r2,n)

equal(rl, r2)

simultaneous(r1, nl, r2, n2)

Always present rl for n cycles. r2 is presented right after rl finishes. r2 is presented n cycles after rl finishes. rl and r2 are synchronized at the beginning. rl and r2 are synchronized at the end. rl overlaps r2, rl starts first, r2 starts n

cycles after rl starts, rl ends before r2. rl embraces r2, rl starts first, r2 starts n

cycles after rl starts, r2 ends before rl. rl and r2 carry the same duration

concurrently. The nlth cycle of rl and the n2th cycle of r2

happen at the same time.

Our purpose is to reduce the load of the user by means of automation. We try to make specification s tatements as general as possible while still maintaining all possible temporal relations between two mult imedia resources. Considering our revised relations, it is possible to combine some of them by adding extra parameters . For example, by introducing a delay parameter , we can combine the "mee t s" and the "be fo re" relations. If the

4Overlap regions will be allowed in our next version of software.

G E N E R A T I O N O F M U L T I M E D I A P R E S E N T A T I O N S 305

de lay p a r a m e t e r in the " s equen t i a l " speci f ica t ion is zero, the speci f ica t ion indica tes a " m e e t s " re la t ion . Otherwise , it indica tes a " b e f o r e " re la t ion . No te tha t all t iming p a r a m e t e r s in our speci f ica t ions are nonnega t ive . T h e fol lowing are spec i f ica t ion s t a t emen t s used in ou r system.

Temporal Specification Statements

Specifications Meanings

always(rl, n)

sequential(rl, r2, n) intersects(rl, r2, n)

s,ynchronizes(rl, nl, r2, n2)

The system always presents r l for n cycles. This statement applies to picture or text resources only.

r2 is presented n cycles after r l finishes. r l and r2 intersect each other, r l starts first,

and r2 starts after n cycles. The nlth cycle of r l and the n2th cycle of r2

are synchronized.

O u r speci f ica t ion s t a t emen t s a re the gene ra l i zed vers ion of the revised relat:ions based on Al l en ' s results . Since the t iming a rgumen t s a re p rov ided by the user and the t e m p o r a l e n d u r a n c e o f r e sources can be o b t a i n e d f rom the da tabase , we can m a k e a compa r i son be twe e n our s t a t emen t s and the revised re la t ions . No te tha t " r l . T E " deno te s the t e m p o r a l e n d u r a n c e of resource r l .

Comparison of Temporal Specifications and Relations

Specification statements Conditions Revised relations

always(rl,n) sequential(rl,r2,n) sequential(rl,r2,n) intersects(rl,r2,n) intersects(rl,r2,n) intersects(rl,r2,n) intersects(rl,r2,n)

intersects(rl,r2,n)

intersects(rl,r2,n)

synchronizes (rl ,nl,r2,n2)

synchronizes (rl ,nl ,r2,n2)

synchronizes (rl, nl, r2, n2)

synchronizes (rl, nl, r2, n2)

n = 0 n > 0 n = 0 A rl .TE = r2.TE n = 0 A rl.TE #: r2.TE n > 0 A rl .TE = r2.TE n > 0 A rl.TE ~ r2.TE

A rl.TE < n + r2.TE n > 0/x rl .TE ~ r2.TE

A rl.TIE = n + r2.TE n > 0 A rl.TE ~ r2.TE

A rl.TE > n + r2.TE nl = 0 A n 2 = 0

n l = 0 A n 2 > 0

n l > 0 A n 2 = 0

n l > 0 A n 2 > 0

none meets(rl, r2) before(rl, r2, n) equal(rl, r2) starts(rl, r2) overlaps(rl, r2, n) overlaps(r 1, r2, n)

finishes(rl, r2)

embraces(rl, r2, n)

starts(rl, r2)

intersects(rl,r2,n2)

synchronizes (r2, n2, rl , nl)

simultaneous (rl, nl, r2, n2)


Note that, other transformations are also possible. For instance, synchronizes(rl, rl.TE, r2, r2.TE) = finishes(rl, r2). Also, the "intersects" and the "synchronizes" statements in the third column of the above table can be further translated to relations.

In order to generate temporal implementation from temporal specification, we have a number of inference rules defined in our system to carry out the derivation. Two types of inference rules are designed, specification inference rules and implementation inference rules, to generate interme- dia representations from the specification and to derive the final implementation from the intermediate representations. We use a shorthand notation to extract the temporal endurance (i.e., TE) from a source (designated by r , " r l , "r2, etc.). And r l > > r 2 is the priority checking expression comparing the presentation priorities of resources r l and r2. The relation between two resources (i.e., mutual exclusive, possible concurrent, or mutual inclusive) is represented by an expression like " r l op r2" where "op" can be an "x," a "7," or a "0" for the three types of relations, respectively. In the inference rules, we use the temporal operators and functions discussed in Section 2. The following table gives the specification inference rules.

The Specification In~rence Rules

Inference rules and Specifications intermediate representations

always(rl, n) if rl.TE>=n then

-(rl n!) else

-(rl-n-rl.TE)

-(rl, r2) -(rl, _n, r2)

if rl X r2 then

declare_error(mutual_exclusive)

else if rI.TE>r2.TE

then if rl >> r2

then if rl O r2

then $(rl, r2-rI.TE-r2.TE)

else $(rl, r2 ^ _rI.TE-r2.TE)

else $(rl r2.TE!, r2)

meets(rl, r2) before(rl, r2, n)

starts(rl, r2)


The Specificationln~renceRules--con~nued

Inference rules and Specifications intermediate representations

finishes(rl, r2)

overlap(rl, r2, n)

embrace(rl, r2, n)

else if rl >> r2

then $(rl, r2 rI.TE!)

else if rl O r2

then $(rl-r2.TE-rI.TE, r2)

else $(rl ^ _r2.TE-rI.TE, r2)

if rl X r2 then declare_error(mutual_exclusive)

else if rI.TE > r2.TE

then $(rl, _rI.TE-r2.TE ^ r2)

[ $(!rI.TE-r2.TE rl, r2)

else $(_r2.TE-rI.TE ^ rl, r2)

[ $(rl, !r2.TE-rI.TE r2 if rl X r2 then if rl >> r2 then meets(rl, !rl.TE-nr2 else meets(rln!, r2)

else if rl >> r2 then

starts(rl, _n ^ r2)

[ finishes(rl, !rl.TE-n r2)

else if rl O r2

then

finishes(rl-r2.TE+n-rl.TE, r2)

else starts(!nrl, r2)

I finishes(rl ^ _r2.TE+n-rI.TE, r2)

if rl X r2 then if rl >> r2 then

starts(!nrl, r2) [ finishes(rl n+r2.TE!, r2) else

meets(rln!, r2) & meets(r2, !rI.TE-n-r2.TE rl)


The Specification Inference Rules--continued

Inference rules and

Specifications intermediate representations

equal(rl,r2

simultaneous

(rl,nl,r2,n2)

else

if rl >> r2

then

if rl 0 r2

then

finishes(rl, r2-rl.TE-n-r2.TE)

else

starts(rl, _n ^ r2)

I finishes(rl, r2 ^ _rI.TE-n-r2.TE)

else

starts(!nrl, r2)

I finishes(rl n+r2.TE!, r2)

if rl X r2

then declare_error(mutual_exclusive)

else $(rl,r2)

if rl X r2

then

declare_error(mutual_exclusive)

else

if nl > n2

then starts(rl, _nl-n2 ^ r2)

else starts(_n2-nl ^ rl, r2)

A specification shown on the left side of the table is translated, according to the rules specified on the right, to some intermediate representations. These intermediate representations are translated again to the final form of presentation using some implementation inference rules (to be discussed). Note that it is possible for a rule to generate two or more intermediate forms from a single specification statement. In this case, we use the "&" operator to represent the situation. Also, alternatives are possible, which is denoted by the "[" operator in the inference rules. The declare-error function raising exceptions depends on the exception signal (i.e., mutual-exclusive, possible-concurrent, or mu- tual_inclusive) received. In the exceptional case, the user has to modify his /her presentation specifications to avoid the abnormal situation.

After presentation specifications are translated to their intermediate form, the system takes a further step to simplify the intermediate representations. The following are some of the implementation inference rules used in our system to generate final temporal implementations.

G E N E R A T I O N OF M U L T I M E D I A P R E S E N T A T I O N S

The Implementa t ion Inference Rules

309

Intermediate Equivalent

representations representations

$(rl, r2) & $(r2, r3)

_n2 ^ _n2

rl ^ r2 & r2 ^ r3

$(rl, r2) & rI.TE : r2.TE

$(rl, r2) & rl ^ r3 & r2 ^ r3

#(rl, r2) & r2 ^ r3

#(rl, r2) & r3 ^ r2

rI.TE > r2.TE & $(rl, r2) & r2 ^ r3

S(rl, r2) & r2.TE - rI.TE = n

S(rl ^ r2, rl ^ r3)

S(rl ^ r2, r3 ^ r2)

etc.

9(rl, r2, r3)

_nl+n2

rl ^ r2 ^ r3

#(rl, r2)

#(rl, r2)

rl ^ r3

r3 ^ rl

rl r2.TE! ^ r3

rl-n I r2 rI.TE[

rl ^ $(r2, r3)

$(rl, r3) ^ r2

etc.

The final temporal implementation of a presentation is a sequence of temporal operat ions/ funct ions . Our system needs to take as input the sequence and carries out the actual presentation. A presentation schedule can be represented by a time chart. The X-axis represents time cycles, and the Y-axis represents resource streams. Two types of assertions, the X-assertion and the Y-assertion, are used in constructing the presentation time chart. An X-assertion extends the duration of a stream; a Y-assertion adds a new stream to the presentation.

Algorithm for an X-Assertion

Given a stream S, and - ( R 1 , R2 ) or - ( R 1 , R 2 , R3 ), etc., where Rn represents a resource, the system checks for each R with S. If there exists an :{n in S such that Rn and R n + l are parameters of the sequential func, tion (i.e., - ( . . . . RN, R n + l . . . . )), the system adds Rn and its appending Rs to steam S. If there is a collision in the process of Y-a:ssertion occurs.

Examples:

S = R2, R3, R4, R7, R8

given -(R8, R9),

S = R2, R3, R4, R7, R8, R9

given -(RI, R2),


S : RI, R2, R3,

given -(R4, RS, R6)

S - RI, R2, R3,

S'=

R4, R7, R8, R9

R4, R7, R8, R9

R5, R6 (a Y-assertion)

Algorithm for a Y-Assertion

Given a s t r eam s , and $ (R1, }{2 or $ (R1, R2, R3) , etc., whe re Rn rep resen t s a resource , the system checks for each R with S. I f the re exists an Rn in S such tha t Rn and Rn+ 1 are p a r a m e t e r s o f the concur ren t funct ion (i.e., $ ( . . . . Rn, R n + l . . . . )), the system adds new s t reams within which Rn and Rn+l are ~ n c h r o n i z e d .

Example:

S = RI, R2, R3

given $(R2, R4, R5),

S = RI, R2, R3

S'= R4

S"= R5

The ident ical funct ion # ( . . . . Rn, Rn+l . . . . ) is t r ea t ed the same as the concur ren t funct ion. In the init ial process , one o r m o r e s t reams are c r ea t ed d e p e n d i n g on the given s ta tement . F o r each sequence o r concurrent s t a t emen t in the inference result , the system appl ies one o f the assert ions. I f a s t a t emen t conta ins only resources that do not occur in any o f the existing s t reams in the giant chart , the s t a t emen t r ep resen t s some iso la ted resources . The system signals an except ion cond i t ion to tel l the user to p rov ide a new speci f ica t ion for the i so la ted resources to be in t eg ra t ed to the existing s t reams. The i m p l e m e n t a t i o n g e n e r a t e d mus t mee t the fol lowing ha rdware l imi ta t ion for o rd ina ry PCs:

• one Sound s t ream • one or m o r e A n i m a t i o n s t reams • one V i d e o s t ream 5

5We implement video play using Microsoft Video for Windows. Even Quick Time for Windows supports two video resources to be played concurrently, and the future version of Video for Windows may support multiple video streams; we still suggest that one video stream attracts the user's focus better than two streams.


• one MIDI stream • one or more Picture steams • one or more Text streams.

So far, we have discussed the temporal issues of a presentation. On the other side, the spatial issues of a presentation are also very important. However, using the spatial specification of a presentation to generate the presentation layout automatically is a difficult task. The process needs a not of heuristics in the inference rules, especially for solving the region overlap problem. We leave this topic for our future research.

3.4. E V A L U A T I O N OF G E N E R A T E D P R E S E N T A T I O N S

The presentation generator generates more than one presentation implementation with respect to a specification due to the alternatives allowed in the specification inference rules. The system evaluates these implementations based on the following criteria, and choose the best implementation for the user:

• less cu t : the less cut, the better an implementation • less ex tens ion: the less extension of a resource, the better an imple-

mentation • less s t r eam: the fewer the number of streams, the better an implemen-

tation.

The user is also given an option of previewing each presentation generated and is encouraged to select the best one.

4. AN ICON PROGRAMMING GRAPHICAL USER INTERFACE

A presentation specification is a set of temporal relations among multimedia resources with the spatial information for those resources. Even though presentation specifications have their representations and intermediate forms, it is quite difficult for a presenter who is not familiar with computer programming to design a presentation using presentation specification statements. For the convenience of the user, we develop an ICON programming technique as well as its graphical user interface.

W]aen a presentation designer is about to design h is /her presentation, after the presentation topics and a rough schedule are designed, the designer will collect resources, convert them to some digital forms, and store these resources with the properties discussed in Section 3.1 in our multimedia database [18]. Figure 1 shows that a resource browser user


Eile Edit Temporal ~oatial Slow Insert .Update Delete Help

Video

Resource Icons

JP Intmm.avl Intmm2.avl intmm3.avi

Select ion Criteria: Key Words:

TempomJ Endurance:

Stm'tup Delay:

Hardware Limitation:

Resolution:

Medium:

I~ideo I W Usage: Model:

Ibackgo,u.d I It,,ble ]nmligation ] map

l intelligent multimedia

20 sec

1 sec

I MPC2

280 x 210

Detectability:

I medi.m I [ ]

S o u n d / M I D I

I ~ l i M ~ l m l l m l I I I I 1

Sync. Tolerance:

Start/End Time:

Fig. 1. The mul t imedia resource browser.

G E N E R A T I O N OF MULTIMEDIA PRESENTATIONS 313

interface allows the presenter to inser t /dele te resources, and to retrieve multimedia resources according to properties given in separated text or list boxes. When these properties are decided, the user is able to push the S e l e c t button which names a number of resource ICONs shown in the resource ICONs subwindow. The Video, the Picture/Text/ Animation, and the Sound/MIDI subwindows are to preview these resources selected before and user pushes the Use button to add these resources to a presentation. These resources, while passed to our presentation generator, are represented in the resource specification format discussed in Section 3.2.

Next, the presentation designer has to use our ICON programming technique to design the schedule of the presentation. We have four temporal ICONs with respect to the four temporal ICONs with respect to the four temporal specification statements discussed in Section 3.3. In Figure 2, we propose four ICONs. A box with four surrounding thick lines in an ICON represents a place holder for a multimedia resource (indicated by its name " r , " " r l , " or " r2" ) . A box with thin lines is for the user to fill in a timing parameter (shown as "n," " n l , " or "n2"). After the user selects the resources that h e / s h e plans to use from our multimedia resource browser, in Figure 3, when the user pushes one of the temporal ICONs listed in the tool bar, the user is able to drag and place the temporal ICON in the Temporal Specification Design area. Next, the user has to fill in the resources and timing parameters by dragging resource ICONs from the Multimedia Resource area and typing in integers in the temporal ICON. After a temporal ICON has all of its parameters filled, the ICON is highlighted. The user thus can drag this temporal ICON to the Temporal Specification Deposit area which contains the final presentation schedule desigm

At the same time, the user may want to design the presentation layout. To specify the spatial information, the user has to use the Spatial Specifi- cation Editor shown in Figure 4. The Multimedia Resource area contains resourcer ICONs slightly different from those shown in Figure 3. Each resource ICON, annotated by a check box, allows the user to display the

always(r, n)

o 1~3

sequential(rl, r2, n) inters¢cts(rl, r'2, n)

Fig. 2. T e m p o r a l spec i f i ca t ion I C O N s .

synchronizcs(rl, nl, r2, n2)


m o,>) ~ Muic~

Temporal Specification Design

P i l ~ m

0 '

i

Temporal Specification Deposit

Sequential

IJP*" II I

Fig. 3. The Temporal Specification Editor.

- [ -

m F"fl

spatial information of a picture/video, or the duration of a sound/MIDI resource, in the Spatial Specification Design area. While the check box is selected the first time, the system displays a box (for picture, video, etc.) with eight resize control points on the upper-left corner of the Spatial Specification Design area. The user is able to move the box, or resize the box in the X-axis direction, the Y-axis direction, or both directions within the design area. Using this technique, the presentation designer is able to specify the layout of a presentation.

After the presentation designer gives the temporal and spatial specifications, a Presentation Story Board appears, as shown in Figure 5. The story board has a timing ruler showing the presentation cycles. The small circles below the ruler indicate that the presentation has three entry points. Each entry point starts a segment of the presentation upon receiving a specific message. Messages are generated by a push button or a logic inference


,patial Specification Edito

Ble EdR Browser Iempro~d S%ory Help

j MuRimedla Resources [] [] [] [] [] [] [] []

m Waf'alper IntK:du~e Guide M~i~l P~t~lt ~ n e m l Tour SkFp~ng [] [] []

;patial Specification Desi9 r

I W,~W:*a;xir 7.

I l Pi~m

i l

I

1

m " ~ e, erie ud " •

Idusi~ 1:25 Inboduoe:30

Fig. 4. The Spatial Specification Editor.

result [18]. The small triangles below the ruler are break points. While spec.ified, these break points stop a presentation and wait for the user's interaction, which may send a message to continue the presentation.

In Figure 5, each stream is represented by a series of horizontal boxes with resource names embedded. A horizontal line in a series represents a silent event. A box surrounded by dashed lines represents a resource extension. And a box filled with slant lines to the upper-right (or upper-left) side of a resource box means that the resource is truncated. After the user is satisfied with the presentation design, he / she can play the presentation by issue the P l a y command in the menu bar.

The temporal and spatial specification editors produce a presentation specification based on presentation resource attributes, the temporal spec-


Jr~ts~:l~lati~rt ~;h)l'¢ tJ(i;Jrl

File Play Browser Spatial Temporal Help

0 10 20 30 40 50 60 70 , i

, .

,A,

Wallpaper

pl.u.o I To., I "° .

Go.o.., i =.op.,.0 U

Explain I O

¢dnl~ i!i;i!ii!i!jil ~! Viii;?~ ii ii A i!iliiii'~il ii~iJii i iiiii iii!i iS,!:ii!':~l~

Fig. 5. The Presentation Story Board.

ification ICONs the user used, and the layout the user designed. The presentation generator (to be discussed in Section 5) takes this presentation specification, and generates a presentation implementation. The presentation schedule shown in Figure 5 is a graphical result based on the presentation implementation.

5. AN IMPLEMENTATION OF THE PRESENTATION GENERATOR

We have a prototype system implemented under MS Windows. The ICON programming graphical user interface is implemented in Visual Basic. The presentation generator is implemented in Arity Prolog. Also, some stub functions implemented in Visual C + + serve as the bridge to pass parameters between the presentation generator and the graphical user interface. The presentation generator, written in Prolog, has two parts. The first phase generates the temporal relations from four kinds of temporal specification statements. The second phase takes these relations and generates temporal implementations specified by the "sequential," and "concurrent," and the "identical" functions discussed in Section 2.


Resource specification and temporal specification are represented as Prolog predicates. Spatial specification is not used yet in our prototype generator. Spatial information is resources is kept internally in the user interfzce. For the later version of our generator, spatial inference rules will be', incorporated. The following are some example specifications.

Examples of Resource Specifications

resource "WallPaper," oo, yes, small,

(picture,

resource "Picture," oo, fade_out, small,

(picture,

resource "Map," oo, yes, small,

(picture,

resource "Tour," 20, no, large,

(video,

resource "General," oo, yes, small,

(text,

resource "Shopping," oo, yes, small,

(text,

resource "Musicl," 25, repeat, medium

(midi,

resource "Music2," 25, repeat, medium

(mi£i,

resource "Explain," 20, no, large,

(sound,

resou:-ce "Introduce," 30, no, large,

(sound,

resou:~ce "Guide," 40, last medium

(animation, frame,

"800x400,"

"300x300,"

"400x300,"

"400x200,"

"300x300,"

"300x300,"

"0x0,"

"0x0,"

"0X0,"

"0x0,"

"400x200,"

"wallpap.bmp")

"picture.bmp")

"map.bmp").

"tour.avi").

"general.rtf")

"shopping.rtf").

"musicl.mid").

"music2.mid").

"explain.wav").

"intro.wav").

"guide.fli").

Examples of Temporal Specifications

always(wall_paper, oo).

sequential(picture, tour, 0).

sequential(tour, map, 0).

sequential(general, shopping, 0).

intersects(picture, musicl, 0).

intersects(map, music2, 0).

intersects(general, explain, 0).

synchronizes(wall_paper, 0, picture, 0) .

synchronizes(picture, 0, general, 0) .

Our generator relies on the pattern matching and the depth first searching mechanisms of Prolog, and generates the following temporal relations and functions.

318 C.-M. C H U N G A N D T. K. SHIH

Examples of Temporal Relations and Functions Generated

Temporal relations Temporal functions

generated (Phase i) generated (Phase 2)

always(wall_paper, oo) .

meets(picture, tour).

meets(tour, map).

meets(general, shopping).

equal(picture, musicl).

equal(map, music2).

equal(general, explain).

starts(wall_paper, picture).

starts(picture, general).

-(wall_paper oo!) .

-(picture, tour).

-(tour, map).

-(general, shopping).

$(picture, musicl) .

S(map, music2) .

$(general, explain).

$(wall_paper, picture).

$(picture, general).

Finally, the time chart of the presentation is derived according to the algorithms for the X-assertion and the Y-assertion.

An Example of a Presentation Time Chart

oo170 Wa I 1 Paper ..............................................

ool20 Picture ...............

ool30 Map ....................

2O

Tour ...............

00140 General ...............................

o0130 Shopping

20 6

Musicl ............... \\\\\\

Music 2

2O

Explain ...............

24 6

In the time chart, "oo[2 0" represents that the resource is displayed for 2 0 cycles, "\ \ \ \ \ \" with superscript 6 means the last six cycles of the resource are truncated, and " . . ." with superscript 6 represents that the resource is extended for six cycles. In the presentation, W a l l P a p e r is in stream one, Picture, Tour, and Map are in stream two, General and S h o p p i n g are in stream three, M u s i e l and M u s i c 2 are in stream four, and E x p l a i n is in stream five. W a l l p a p e r is the background picture of the presentation window. H u s i c l is the background music for P i c t u r e . After M u s i c l finishes, a video resource (i.e., Tour) is played, tMad


Music2 serves as the background music of Map starts after the video. G e n e r a l and Shopp ing are text resources explain P i c t u r e / T o u r and Map, respectively. Finally, E x p l a i n is a human sound resource explains Picture.

Us.ing the backtracking technique available in a Prolog system, it is possible for the generator to generate more than one set of the above functions. In this case, our graphical user interface will take the multiple copies of presentations, and suggests that the user choose the one he/she prefers.

6. CONCLUSIONS

We propose a mechanism and a system for the automatic generation of interactive multimedia presentations. The mechanism uses interval temporal logic inference rules as a tool to achieve multimedia resource synchronization. These rules incorporate issues such as hardware limitations, properties of multimedia resources, and good presentation principles. The specification statements we proposed cover all temporal relations given in [2]. To precisely specify synchronization points, we annotate the relations with timing parameters. An ICON programming user interface and the presentation generator are developed. We use our prototype system to generate some simple presentations, such as city tour and an introductory lecture to multimedia PCs.

However, it is very difficult to construct good inference rules to assist a presentation design in the layout of a presentation, especially in solving the resource boundary overlap problems. In the prototype system, we provide an editor for the designer to arrange the presentation layouts. We are ,;till seeking good heuristics for spatial inference rules.

Another important issue of presentation generation is to allow the user to change the presentation generated. And the system can take the change from the user and synthesize a new specification. Moreover, the use of mullimedia resources is interdependent. For instance, a picture showing the city of Paris and a text file describing the history of the city are generally used together. The dependency of important when reuse of resources is considered. We have also developed a multimedia resource and a browser to support the reuse of resources and presentations. But this topic is outside the scope of this paper.

Consequently, our contributions in the paper are: first, we propose four useful temporal specification statements by showing the mapping between the :~tatements and the interval temporal relations [2]. Second, a number of inference rules are developed and a presentation generator is imple-


mented. Third, an I C O N programming interface is designed for the convenience o f the user. Using our system, the presenta t ion designer is able to specify what h e / s h e wants instead of telling the compute r exactly how the presenta t ion show be scheduled.

R E F E R E N C E S

1. Y. Y. Al-Salquan et al., MediaWare: On multimedia synchronization, in: Proc. Int. Conf. on Multimedia Computing and Systems, Washington, DC, May 1995, pp. 150-157.

2. J. F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM 26(11) (1983).

3. D. P. Anderson and G. Homsy, A continuous media I / O server and its synchronization mechanism, IEEE Computer 51-57 (Oct. 1991).

4. D. S. Backer, Multimedia presentation and authoring, in: J. F. K. Buford (Ed.), Multimedia Systems, ACM Press, 1994, pp. 285-303.

5. H.-Y. Chen et al., A novel audio/video synchronization model and its application in multimedia authoring system, in: Proc. 1994 HD-MEDIA Technical and Applications Workshop, Taipei, Taiwan, Oct. 1994.

6. C.-M. Chung, T. K. Shih, J.-Y. Huang, Y.-H. Wang, and T.-F. Kuo, An object- oriented approach and system for intelligent multimedia presentation designs, in: Proc. 2nd ICMCS'95 Conf., 1995, pp. 278-281.

7. Y. F. Day et al., Spatio-temporal modeling of video data for on-line object-oriented query processing, in: Proc. Int. Conf. on Multimedia Computing and Systems, Washington, DC, May 1995, pp. 98-105.

8. A. J. M. Donaldson and P. L. Simeonov, Addressing real time with temporal logic in multimedia communications, in: Proc. ISATED/ISMM Int. Conf. on Distributed Multimedia Systems and Applications, Stanford, CA, Aug. 1995, pp. 215-219.

9. S. Eun, E. S. No, H. C. Kim, H. Yoon, and S. R. Maeng, Eventor: An authoring system for interactive multimedia applications, Multimedia Systems 2:129-140 (1994).

10. S. K. Feiner and K. R. McKeown, Automating the generation of coordinated multimedia explanations, in: M. T. Baybury (Ed.), Intelligent Multimedia Interfaces, American Association for Artificial Intelligence, 1993, pp. 117-138.

11. S. K. Feiner et al., Towards coordinated temporal multimedia presentations, in: M. T. Maybury (Ed.), Intelligent Multimedia Interfaces, American Association for Artificial Intelligence, 1993, pp. 139-147.

12. W. A. Halang and M. Wannemacher, High-precision temporal synchronization in distributed multimedia systems, in: Proc. ISATED/ISMM Int. Conf. on Distributed Multimedia Systems and Applications, Stanford, CA, Aug. 1995, pp. 221-223.

13. P. Hoepner, Synchronizing the presentation of multimedia objects--ODA extensions, ACM SIGOIS 19-31 (1991).

14. T. D. C. Little and A. Ghafoor, Synchronization of storage models for multimedia objects, IEEE J. Selected Areas in Commun. 8(3):413-427 (1990)..,

15. T. D. C. Little and A. Ghafoor, Interval-based conceptual models for time-depen- dent multimedia data, IEEE Trans. Knowledge and Data Engineering 5(4):551-563 (1993).

16. B. Prabhakaran and S. V. Raghavan, Synchronization models for multimedia presentation with user participation, Multimedia Systems 2:53-62 (1994).

G E N E R A T I O N O F M U L T I M E D I A P R E S E N T A T I O N S 321

17. J. Schnepf et al., Doing FLIPS: Flexible interactive presentation synchronization, in: Proc. Int. Conf. on Multimedia Computing and Systems, Washington, DC, May 1995, pp 213-222.

18. T. K. Shih, An artificial intelligent approach to multimedia authoring, in: Proc. 2nd IASTED/ISMM Int. Conf. on Distributed Multimedia Systems and Applications, Stamford, CA, Aug. 1995, pp. 71-74.

19. T. K. Shih, On making a better interactive multimedia presentation, in: Proc. Int. C~nf. on Multimedia Modeling, Singapore, 1995.

20. T. K. Shih, C.-H. Kuo, and K.-S. An, Multimedia presentation designs with database support, in: Proc. NCS'95 Conf., Taiwan, 1995.

21. N. Shivakumar, C. J. Sreenan et al., The Concord Algorithm for synchronization of networked multimedia streams, in: Proc. Int. Conf. on Multimedia Computing and Systems, Washington, DC, May 1995, pp. 31-40.

22. R. Steinmetz, Synchronization properties in multimedia systems, IEEE J. Selected Areas in Commun. 8(3):401-412 (1990).

23. M. Vazirgiannis and C. Mourlas, An object-oriented model for interactive multimedia presentations, The Computer J. 26(1) (1993).

24. T. Wahl et al., TIEMPO: Temporal modeling and authoring and interactive multimedia, in: Proc. Int. Conf. on Multimedia Computing and Systems, Washington, DC, May 1995, pp. 274-277.

Received 1 January 1995; revised 30 May 1995, 22 June 1996

Documents

On automatic generation of multimedia presentations