
Computational Media Aesthetics

Editing out Video Editing

Marc Davis
University of California at Berkeley



This article outlines a paradigm shift in media production: the advent of computational media production that will automate the capture, editing, and reuse of video content. By integrating metadata creation and (re)use throughout the media production process, we’ll enable the mass customization of video.

For the majority of people to not just watch but make video on a daily basis, the current media production process must be transformed from a mechanical process to a computational one. To investigate this transformation, let’s begin with a thought experiment—consider these four simple questions:

1. How many of you reading this article read text on a daily basis (including email, Web pages, newspapers, and so on)? Answer: All of you.

2. How many of you write text on a daily basis (including email, notes, and so forth)? Answer: Very nearly all of you.

3. How many of you watch video on a daily basis (including all media forms of moving images with sound—movies, television, and videos)? Answer: Many more of you than you might care to admit.

4. How many of you make video on a daily basis (including all media forms of moving images with sound—movies, television, and videos)? Answer: ?

Although a majority of IEEE MultiMedia readers work on digital media technology, probably only a few of you answered “yes” to the fourth question. In this article I describe the technology and a vision for computational media production that will enable us to answer “yes” to the fourth question. I initially articulated this vision of the widespread daily production and sharing of video content for the 50th anniversary edition of the Communications of the ACM.1 In that article, the key technological challenges I identified that would enable a new “garage cinema” were tools for accessing and manipulating content. While the challenges of making media accessible and manipulable are still with us, my further research has revealed that for video to become a daily means of communication we need to invent new forms of computational media production that don’t merely computerize existing media production tools and methods.

What’s wrong with media production today?

The current paradigms for media production arose within the context of social, economic, legal, and technological factors that differ greatly from a situation in which video could be a widespread, daily medium of many-to-many communication in the way that text is today.

It takes teams of highly trained people to create professionally produced video content (including movies, television, and videos) for mass audiences. The professional video production process usually requires three distinct phases:

1. Preproduction: concept formation, scriptwriting, storyboarding, and production planning.

2. Production: video and audio recording.

3. Postproduction: video and audio editing, special effects, soundtrack composition, and re-recording video and audio.

This media production methodology is expensive in time, money, and human resources. It’s especially time-consuming: production can range from many hours (television news) to many years (Hollywood feature films). The current media production process requires a variety of expertise at each step as well as expensive personnel and equipment. The media production process itself is also often wasteful of effort and lossy of information.

Rarely do we orient current media production toward creating reusable media assets. Because of the difficulties of finding and reusing appropriate media assets, new production often occurs when media reuse could have been an option. In addition, we create far more footage in professional media production than we use in the final edit (shooting ratios range from 10:1 to 100:1, depending on the type of production). Furthermore, almost all metadata created during the various media production phases are neither easily available to subsequent phases, nor easily accessible after finishing production. As a result, most research on creating video metadata through signal analysis attempts to recreate metadata that was available (but lost) at various points in the production process. The expense and inefficiencies of current professional media production are symptomatic of it still being in the craft mode of production (for more about the history of media production, see the sidebar “From Mechanical to Computational”).

Amateur video production by consumers using video camcorders is fraught with its own difficulties because the tools and methods available to consumers are modeled after those of professional video production, while consumers usually possess neither the time, money, nor expertise that professional production methods require. Quite simply, far more amateur video is shot than watched, and people almost never edit it. Introducing supposedly consumer-friendly video editing tools such as Apple’s iMovie doesn’t solve the problem of consumer video production except for the devoted hobbyist and enthusiast. For consumers to quickly, easily, and regularly produce and share video, we need a new mode that isn’t based on the current professional or amateur production paradigm.


From Mechanical to Computational

To envision the future of media production, it’s helpful to understand its past. The invention of motion pictures occurred in the late 1890s, and while the technologies for motion picture cameras and editing systems have changed, the underlying paradigms of capturing and manipulating motion pictures haven’t (see Figure A). First let’s examine the history of motion picture capture technologies. This history is detailed and complex,1 but for our purposes, a thumbnail sketch will suffice.

In 1895, the Lumière brothers invented the Cinématographe, a portable film camera that also functioned as a film developing unit and projector. In 1951, the first videotape recording was invented for use in television broadcasting, and it took another 20 years—until the invention of the Port-a-pak—for a video camera connected to a videotape recording unit to become portable.2 With the advent of the video camcorder in the 1980s, we’ve nearly come full circle to the capabilities of the Lumière Cinématographe. While modern video camcorders use computation to automate many important functions related to capturing focused, properly exposed video, the paradigm of video capture (like that of video editing) has remained unchanged. Video camcorders encode almost no metadata about what they’re recording and don’t proactively assist in capturing video assets that we can easily edit and reuse—they simply record moving images and sound.

Motion picture editing technology underwent three major technological phases in the last century:

❚ physical film cutting,

❚ electronic videotape editing, and

❚ digital nonlinear editing.3

Film editing began with the direct manipulation of the film reel: a cut was literally that, the cutting of the film reel at one point and the taping together of it and another length of film at the cut point. The invention of electronic videotape editing in the early 1960s, while a cost and time savings over film editing, was actually a step backward in the manipulability of motion picture content given the difficulty of insert editing and the invisibility of all but the current frame being used. The advent of nonlinear editing in the late 1980s was in many ways a return to the affordances of early film editing with its ability for the editor to both see and manipulate sequences of frames. While the technological developments in motion picture editing in the last part of the 20th century involved computers, the paradigm of video editing remained and remains mechanical rather than computational. The most advanced video editing programs today still manipulate motion picture content in much the same way as the first film editors did. They provide direct manipulation interfaces for cutting, pasting, and trimming sequences of frames, while both the representation of the motion picture content and the methods of its restructuring reside in the editor’s mind. What we need is technology that can embody knowledge of both motion picture content and structure into a new computational media production process.

I’d like to further situate the technologies and methods of media production within the history of industrial production processes.


Figure A. While motion picture technology has changed in the last 100 years, the underlying paradigms for motion picture capture and editing haven’t. (1) Lumière Cinématographe. (2) Viewcam camcorder. (3) Moviola film editing machine (courtesy of the University of Southern California’s Moving Image Archive). (4) Nonlinear editing suite.




For the most part, adding computation to video production, both professional and amateur, has merely attempted to assist the current mode of media production. Instead, we should rethink and reinvent the relationship between video and computation to create a new mode of computational media production.

Computational media production

Motion pictures enable us to record and construct sequences of images and sounds. Computation enables us to construct universal machines that—by manipulating representations of processes and objects—can create new processes and objects, and even new machines. Computational media production isn’t about the application of computer technology to the existing media production methodology; it’s about the reinvention of media production as a computational process. To reimagine movies as programs, we need to make motion pictures computable. That is, rather than producing a single motion-picture artifact, we produce a computer program that can, based on input media and parameters, produce motion pictures. Consequently, media production shifts from the craft-based production of single movies to the production of computer programs that can output mass-customized movies.
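To make this shift concrete, here is a minimal Python sketch that treats a movie as a program: a function from input media and parameters to an edit list. The function and field names are illustrative inventions, not drawn from any system described in this article.

    def movie(input_media, parameters):
        """A movie as a program: compute an edit list from inputs
        rather than storing one fixed cut."""
        pace = parameters.get("pace", 2.0)  # seconds allotted to each shot
        # Select and order shots computationally instead of by hand.
        shots = sorted(input_media, key=lambda clip: clip["priority"])
        return [(clip["asset_id"], pace) for clip in shots]

    # Two viewers with different input media get two different movies
    # from the same program.
    print(movie([{"asset_id": "me.mov", "priority": 0},
                 {"asset_id": "friend.mov", "priority": 1}], {"pace": 1.5}))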

To illustrate the idea of mass-customized computational media, consider the famous Budweiser Super Bowl commercial in which a sequence of friends asks each other “Wassup?” on the telephone. Imagine this same commercial personalized to the level of each individual viewer so that when you see the ad on TV you see your friends and yourself saying “Wassup?” and when I see my version of the same commercial I see my friends and myself. Current media production technology and methodology would render this scenario impossible because of the time and money required to custom shoot and edit the commercial for each Super Bowl viewer. With a computational media production process, software programs would assist consumers in the automated capture of the required video assets and automatically integrate them into the final personalized commercial based on a professionally authored template. Rather than merely handcrafting a single commercial that millions of viewers would see, for basically the same amount of effort, computational media production methods could produce millions of personalized and customized versions of the commercial. We can envision similar examples for other commercials, movie trailers, music videos, and video greetings and postcards.

My project teams and I have been working on the concepts and prototypes for computational media production over the last decade. Initially, we conducted research at the MIT Media Lab, then at Interval Research, at Amova (http://www.amova.com), and now at my Garage Cinema Research group (http://garage.sims.berkeley.edu) at UC Berkeley’s School of Information Management and Systems (SIMS).


From Mechanical to Computational (continued)

Industrial production underwent three major developmental phases in the last 300 years. Before the Industrial Revolution, skilled craftspeople custom-made goods for consumers. Production was individually tailored, but expensive in time and money. With the advent of standardized interchangeable parts and mass production processes, goods were no longer created for individual consumers, but could be produced cheaply and quickly. In the late 20th century, mass customization processes began to combine the personalization of the customized production of goods with the efficiencies of mass production methods.4 When we look at modern media production in light of the development of industrial production methods, we see that media production is still largely in the mode of craft-based, customized production. Skilled craftspeople work to create one media production; they can then use mass production methods to reproduce and distribute the media production. However, the production process itself doesn’t take advantage of the efficiencies of industrial production, especially of standardized interchangeable parts and production methods that don’t require skilled craftspeople. For media production to become a process of mass customization, it must be transformed from a mechanical process to a computational one.

References

1. B. Salt, Film Style and Technology: History and Analysis, 2nd ed., Starword, 1992.
2. I. Fang, “Videotape,” The Encyclopedia of Television, H. Newcomb, ed., Museum of Broadcast Comm., 1997.
3. J.T. Caldwell, “Video Editing,” The Encyclopedia of Television, H. Newcomb, ed., Museum of Broadcast Comm., 1997.
4. B.J. Pine, Mass Customization: The New Frontier in Business Competition, Harvard Business School Press, 1993.



As the diagram (Figure 1) from our time-based media processing system patent2 illustrates, computational media processing uses content representation and functional dependency to compute new media content from input media content and stored assets.

Figure 1. Movies as programs is the central idea of the time-based media processing system: parsers derive content representations from input media, and a producer computes new media from them.

We used extensions to this simple computational framework to systematically automate many of the functions human editors perform: audio–video synchronization, audio substitution, cutting on motion, L-cutting for dialogue, and postproduction reframing of shots. By reimagining media editing as a computational process, we also developed a new class of parametric special effects by which we can automatically manipulate video and/or audio as a function of continuous parameters of other media. An example is to automatically rumble video to a music soundtrack by having the video’s scale and position be a function of the music’s amplitude.2 Note that these media automation functions (and structures and applications built with them) aren’t used to make better or more automated editing tools for consumer video production. Rather, our intent is to remove the editing process as we know it from consumers and replace it with a variety of quick, simple, and satisfying ways that users can create, shape, and share video content. The most complex “editing” process a consumer should ever have to perform should resemble the simple selection and control processes involved in using a TV remote.
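As a rough illustration of such a parametric effect, the following sketch (a hypothetical reconstruction, not code from the patented system) derives a per-frame zoom factor from the soundtrack’s short-time amplitude; a renderer could then scale and reposition each video frame by this factor to produce the rumble.

    import numpy as np

    def rumble_scale(audio, sample_rate, fps, min_scale=1.0, max_scale=1.15):
        """Map the soundtrack's short-time RMS amplitude to a per-frame
        zoom factor for the video."""
        samples_per_frame = int(sample_rate / fps)
        n_frames = len(audio) // samples_per_frame
        rms = np.array([
            np.sqrt(np.mean(audio[i * samples_per_frame:(i + 1) * samples_per_frame] ** 2))
            for i in range(n_frames)
        ])
        span = rms.max() - rms.min()
        norm = (rms - rms.min()) / span if span > 0 else np.zeros_like(rms)
        return min_scale + norm * (max_scale - min_scale)  # one zoom factor per frame

    # Example: a synthetic 3-second soundtrack driving a 30-fps clip.
    sr, fps = 44100, 30.0
    t = np.linspace(0, 3, int(sr * 3))
    audio = np.sin(2 * np.pi * 440 * t) * np.abs(np.sin(2 * np.pi * 0.5 * t))
    print(rumble_scale(audio, sr, fps)[:5])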

To make video computable, we need to represent its content (especially its semantics and syntax). This effort must address the semantic gap between the representations of video content that we can derive automatically and the content and functional knowledge that cinematographers, directors, and editors use in the traditional media production process.3 We address this semantic gap by

❚ developing metadata frameworks that represent the semantic and syntactic structures of video content and support computation with and reuse of media and metadata (a minimal shot-annotation sketch follows this list);

❚ integrating metadata creation and reuse throughout the computational media production process;

❚ redesigning media capture technology and processes so as to facilitate capturing metadata at the beginning of the production process and proactively capturing highly reusable media assets; and

❚ inventing new paradigms of media construction that automatically recombine and adapt media assets by using representations of media content, semantics, and syntax.
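To make the first of these points concrete, here is a minimal sketch of what a shot-level semantic annotation might look like; the field set is an illustrative assumption, far coarser than Media Streams’ actual iconic descriptors.

    from dataclasses import dataclass, field

    @dataclass
    class ShotAnnotation:
        """Semantic and syntactic description of one shot, created once
        and reusable for retrieval and recombination."""
        asset_id: str
        start: float                                     # seconds into the source media
        end: float
        characters: list = field(default_factory=list)   # who appears
        actions: list = field(default_factory=list)      # what happens
        setting: str = ""                                # where it happens
        camera: dict = field(default_factory=dict)       # framing, movement, etc.

    scream = ShotAnnotation(
        asset_id="clip-0042", start=12.0, end=14.5,
        characters=["user"], actions=["scream"], setting="indoors",
        camera={"framing": "close-up", "movement": "static"})
    print(scream.actions)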

Metadata and media reuse

Since 1990, my project teams and I have developed a comprehensive system for video metadata creation and reuse: Media Streams.4,5 Media Streams provides a standardized iconic visual language using composable semantic descriptors for annotating, retrieving, and repurposing video content. Recent standardization efforts in MPEG-7 are also working to provide a uniform and reusable framework for media metadata.6 To create metadata that describe the semantic and syntactic content of motion pictures, we can draw on semiotic7,8 and formalist9,10 film theoretic analysis of the structures and functions of motion pictures. Computational media aesthetics researchers3 have a key role—not only in digital technology, but in film theory itself—in developing theoretical and practical frameworks for describing computationally usable and generable media. We need to develop frameworks that are not only analytic (designed to parse motion pictures into their structural and functional components) but also synthetic (designed to support the production, combination, and reuse of motion picture content).

By readjusting our focus on the synthesis of motion pictures, we can reenvision the opportunities for creating and using media metadata. While Media Streams supports retrieval-by-composition methods for media synthesis (assembling sequences from disparate database elements in response to user queries), the system’s main usage paradigm is for annotating video content (analysis) in postproduction. Our recent work has focused on reinventing the production process to generate Media Streams’ metadata throughout the computational media production process. We especially focus on the point of capture, where so much of the metadata that analysis seeks to reconstruct is readily available.
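A toy version of retrieval-by-composition, under the simplifying assumption that shots carry plain action labels rather than Media Streams’ iconic descriptions, might look like this:

    # A small library of shots, each described by metadata like that above.
    library = [
        {"asset_id": "clip-01", "actions": ["wave"]},
        {"asset_id": "clip-02", "actions": ["scream"]},
        {"asset_id": "clip-03", "actions": ["laugh"]},
    ]

    def retrieve_by_composition(query, shots):
        """For each slot of the query, pick the first shot whose annotated
        actions satisfy it; fail if any slot cannot be filled."""
        sequence = []
        for wanted_action in query:
            match = next((s for s in shots if wanted_action in s["actions"]), None)
            if match is None:
                return None  # no coherent sequence can be assembled
            sequence.append(match["asset_id"])
        return sequence

    print(retrieve_by_composition(["wave", "scream"], library))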





Rethinking capture

What’s the purpose of video capture? What type of production process is it designed to serve? Earlier in this article I alluded to the inherently wasteful process of current media production: producers shoot 10 times (or more) the footage they use in the final edit and rarely reuse captured media assets in future productions.

How can integrating computation and video make the media capture and production process more efficient and yield reusable media assets? Researchers have been exploring creating data-cameras that encode camera parameters and, with human assistance, semantic information during capture.11,12 While these approaches correctly attempt to frontload the annotation effort to the point of capture (and in some cases even earlier to the preproduction phase), they assume that the process and goals of video capture and production remain largely unchanged. The point of capture is the best opportunity to encode much useful metadata (camera settings, temporal and spatial information, and so forth), yet we can capture semantically richer metadata by rethinking the goals and processes of video capture.

Our work in computational media production automates direction and cinematography to create a new media capture process whereby the capture device interacts with the user and the world to proactively capture metadata and annotated reusable media assets in real time. The key insight is to invert, as Figure 2 shows, the current media capture process. Rather than capturing 10 (or more) minutes of footage to get one usable minute, we redesign the process to capture 1 minute of (re)usable footage that could be used in creating 10 (or many more) minutes of media content.

To automatically capture a small set of highly reusable video assets, we’ve developed a new method of video capture that we call active capture. Active capture reinvents the media capture and production process by integrating capture, processing, and interaction (see Figure 3) into a control system with feedback connecting the capture device, human agents, and the environment shared by the device and agents. Active capture overcomes the limitations of standard computer vision and audition techniques by using human–computer interaction designs to simplify the world and the actions that the vision (and audition) algorithms need to parse.



Figure 2. New capture paradigm for creating reusable media assets. (a) The current capture paradigm involves multiple captures to get one good capture. (b) With the new capture paradigm, one good capture drives multiple uses.

Figure 3. Active capture integrates capture, processing, and interaction, drawing respectively on direction/cinematography, computer vision and audition, and human–computer interaction.


With active capture, the media capture process can then become an interactive session between a recording device and its subject—much in the same way that a director and cinematographer work with actors today or the ways amateur photographers prompt or instruct their subjects. Importantly, the active capture process requires no special expertise on the part of the subject being captured. This low threshold for the subject of a computational capture process is important for its potential viability as a capture paradigm for consumers.

In portrait photography and motion picture direction, humans both prompt and evaluate the capture subjects’ responses. In active capture, both prompting and response evaluation can be achieved by the device itself. To do so, an active capture device uses audio and/or visual cues to prompt the capture subject to perform some desired action (such as smiling and looking at the camera). Through real-time audio and video analysis, an active capture device can determine the fitness of the subject’s response in relation to some predetermined capture parameters. If the capture meets these parameters, the capture process is complete. If not, the active capture device can prompt the user again until it achieves a suitable response or the process has timed out.

Real-time audio and video analysis of the subject’s responses can enable an active capture device to offer specific suggestions as to how to improve the subject’s response. Figure 4 illustrates an example of active capture’s control process with feedback. This example depicts the active capture routine for capturing a high-quality and highly reusable shot of a user screaming. To capture a shot of the user screaming, the system prompts the user to look at the camera and scream. The system has a minimum average loudness and overall duration it’s looking for, and like a human director, it can prompt the user accordingly (such as scream louder or longer) to capture a loud and long enough scream shot.

Figure 4. Active capture process for capturing a “scream” shot. Quotes are verbal instructions from the active capture device (“Hi, we’re going to take a few simple shots to make an automatic movie. Please look at the camera.”; “Now we’re going to do something really simple. What I want you to do is scream.”; “That was great, but I need you to scream louder!”; “That was great, but I need you to scream a little bit longer. Let’s try it again, OK, scream.”; “That was great!”). The green arrows represent an error-free path; the yellow arrows are error-correction loops.
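The control loop just described can be sketched as follows. Here record and prompt stand in for the device’s microphone and voice prompts, and the loudness and duration thresholds are invented for illustration.

    import numpy as np

    def loudness(clip):
        """Root-mean-square amplitude of a mono clip."""
        return float(np.sqrt(np.mean(np.square(clip))))

    def capture_scream(record, prompt, sample_rate=44100,
                       min_rms=0.1, min_seconds=1.5, max_attempts=5):
        """Prompt, record, evaluate, and re-prompt until the clip meets
        the capture parameters or the process times out."""
        prompt("Please look at the camera. What I want you to do is scream.")
        for _ in range(max_attempts):
            clip = record()  # mono samples from the capture device
            loud_enough = loudness(clip) >= min_rms
            long_enough = len(clip) / sample_rate >= min_seconds
            if loud_enough and long_enough:
                prompt("That was great!")
                return clip  # a preannotated, reusable scream shot
            if not loud_enough:
                prompt("That was great, but I need you to scream louder!")
            else:
                prompt("That was great, but I need you to scream a little bit longer.")
        return None  # timed out without a suitable response

    # Stub take standing in for a real recording session.
    fake_take = 0.3 * np.random.randn(2 * 44100)  # 2 seconds of 'screaming'
    print(capture_scream(record=lambda: fake_take, prompt=print) is not None)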

By using a recording device that can interact with its subject (through verbal, nonspeech audio, and visual cues), we can quickly and easily capture reusable assets of people (reaction shots, focus of attention, and so on) that the system preannotates through the capture process. Metadata creation then is no longer predominantly a postproduction process, but a real-time and interactive process to create metadata and annotated reusable media assets at the point of capture. The active capture process can create an inventory of annotated reusable media assets that can serve as resources for automated media production and mass customization of video.

New paradigms for media construction

In designing and inventing new paradigms for computational media construction, I turned to childhood constructive play experiences that seemed effortless and satisfying: Mad Libs and Lego. These two toys let children quickly and easily construct complex, coherent structures in text and plastic. They leverage simple actions within the context of constrained structures to generate complex and varied outputs. These toys also provide two outstanding paradigms for computational media construction.



In 1953, Leonard Stern and Roger Price invented Mad Libs while working as comedy writers for the popular Honeymooners TV show. In Mad Libs, an existing syntactic structure (usually a brief narrative) has parts of its structure turned into slots for semantic substitution. To play Mad Libs, a person or persons are prompted to fill in the empty semantic slots, without foreknowledge of the syntactic structure in which they occur. Once all slots are filled, the completed Mad Lib is read aloud. To illustrate the concept, I created a Mad Lib version of the first sentence of the call for papers for this special issue of IEEE MultiMedia (see Figure 5).

Figure 5. A Mad Lib version of the first sentence of the call for papers for this special issue of IEEE MultiMedia: “One of the ________ (ADJECTIVE) hurdles facing ________ (ADJECTIVE) media management systems is the semantic ________ (NOUN) between the ________ (ADJECTIVE) meaning that users desire to ________ (VERB, PRESENT TENSE) when they query to ________ (VERB, PRESENT TENSE) and ________ (VERB, PRESENT TENSE) media and the ________ (ADJECTIVE + ‘NESS’) of the content descriptions that we can ________ (ADVERB) compute today for media ________ (VERB, ‘ING’ ENDING) and search.”

Lego blocks are known and loved worldwide. Invented by master carpenter and joiner Ole Kirk Christiansen in the 1950s, Lego plastic blocks use a patented stud-and-tube coupling system that lets the blocks be easily and securely snapped together and apart in a myriad of configurations. Lego blocks are a supreme example of combinatoric richness from constrained structure—just six 8-studded Lego blocks can fit together in 102,981,500 ways.13

Before turning to the explanation of applying the Mad Lib and Lego construction paradigms to computational media, it’s important to note that a different concept of authorship is embodied in the construction paradigms of Mad Libs and Lego than underlies popular conceptions of traditional media production and authorship. With Mad Libs and Lego, skilled craftspeople construct reusable artifacts whose constrained structures enable children to use them to create new artifacts. While this bricolage14 process is highly creative, our inherited 19th-century Romantic notions of creativity (in which an artistic genius creates something out of nothing) can blind us to the actual process of creativity.

In the creative process, we always begin with a repertoire of preexisting cultural and formal materials within which we construct new artifacts and meanings. The continued marketing of cinematic auteurs (even though cinema is our most collectively produced art form) is symptomatic of these centuries-old notions of authorship, craft, copyright, and genius. In the 20th century, both art practice and theory have offered alternative notions of authorship (in which authorship is a process of recombining existing artifacts) that are more aligned with the construction paradigms of Mad Libs and Lego and help provide a framework for conceptualizing and situating new construction paradigms for computational media production.15

While Mad Libs are made of text and Lego blocks of plastic, both construction paradigms can be understood in semiotic terms. Semiotics16 is the study of sign systems. Semiotic systems, with language as the primary example, are our fundamental means of human communication and sense-making with each other and the world. Semiotic systems are composed of signs. Signs consist of two interrelated parts: the signifier (a mental acoustic image like the perception of the sight or sound of the word “tree” or an image of a tree) and the signified (the mental concept to which the signifier refers). Signs gain their definition and value by their differences from one another. Semiotic systems also have two major forms of organization that can be understood as orthogonal axes: the paradigmatic and syntagmatic. Figure 6 illustrates the interrelationship of these axes of organization.

Figure 6. The paradigmatic and syntagmatic axes of selection and combination in semiotic systems.

The paradigmatic axis relates to signs that can be selected from a set of signs and thereby substituted for one another based on their function in the syntagmatic axis (semantics is the linguistic form of paradigmatic organization). The syntagmatic axis relates to signs that can be combined in sequential structures (syntax is the linguistic form of syntagmatic organization). For example, in the syntagma “My daughter likes tofu,” “son” is an acceptable paradigmatic substitution for “daughter,” while “eats” is not.




Mad Libs and Lego blocks offer users differing constraints and opportunities in the paradigmatic and syntagmatic dimensions. In playing with Mad Libs, users provide acceptable paradigmatic substitutes for missing elements in a fixed syntagmatic structure—users don’t write the sentences of Mad Libs, they provide their own words to complete them. Playing with Lego blocks involves constructing syntagmatic structures from a fixed set of paradigmatic terms—users don’t make their own Lego blocks, they make structures with Lego blocks. These differing construction paradigms offer us compelling user interaction models for computational media production.

Video Mad Libs

Based on the construction paradigm and user experience model of Mad Libs, we developed adaptive media templates (AMTs) that support automatic paradigmatic substitution and adaptation of new media elements within a set syntagmatic structure. In an AMT, the structure is largely fixed, while the content can be varied. AMTs have template assets (media that are referenced by the functional dependency network) and input media assets (assets that fill empty slots in the AMT and/or can be substituted for existing template assets). AMTs coadapt template and input media assets based on the content of the media assets and a set of functions and parameters to compute unique customized and personalized media results. To date, we’ve developed systems for authoring AMTs (MediaCalc written in Macintosh Common Lisp and MediaFlow written in C++ for Windows) and specific instances of these templates that use and produce television commercials, movie trailers, music videos, movie scenes, banner ads, and Macromedia Flash animations. Figure 7 shows a storyboard of an AMT for an MCI commercial that incorporates and adapts the scream shot captured in Figure 4. The commercial illustrates the speed of an MCI networking product by showing a rocket car and its pilot. In this case, the AMT version of the commercial processes the scream shot to substitute the subject of the scream shot for the pilot of the rocket car.

Figure 7. Adaptive media template storyboard for a personalized MCI commercial.
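A drastically simplified AMT might be modeled as follows. The slot and asset names echo the MCI example but are otherwise invented, and a real AMT coadapts assets through functions and parameters rather than merely splicing them.

    # Template assets and one empty slot (marked <...>) in a fixed structure.
    template = {
        "structure": ["intro_logo", "<pilot_shot>", "rocket_run", "outro_tag"],
        "template_assets": {
            "intro_logo": "mci_logo.mov",
            "rocket_run": "rocket_car.mov",
            "outro_tag": "tagline.mov",
        },
    }

    def instantiate(template, input_assets):
        """Paradigmatic substitution: fill each empty slot with an input
        asset while the syntagmatic structure stays fixed."""
        timeline = []
        for element in template["structure"]:
            if element.startswith("<"):
                timeline.append(input_assets[element.strip("<>")])
            else:
                timeline.append(template["template_assets"][element])
        return timeline

    # Each viewer's captured scream shot yields a personalized commercial.
    print(instantiate(template, {"pilot_shot": "my_scream.mov"}))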

Template-based production has an important place in the history of consumer software. While many people may think of programs like Aldus PageMaker when remembering the desktop publishing revolution, for average consumers in the 1980s it was “The Print Shop” distributed by Brøderbund that revolutionized document production (especially of greeting cards, invitations, posters, and banners). The Print Shop’s success was due in large part to its simple user experience that leveraged a large library of templates (syntagmatic structures) and clip art designed for use in these templates (paradigmatic elements). Similarly, computational media production methods let us radically simplify the media construction paradigm by offering users prebuilt AMTs and variable template assets, as well as the ability to easily capture input assets for use in templates.

Video Lego

While still in the design stage, Video Lego will use and extend the construction paradigm of Lego blocks. Ideally, Video Lego will be a set of reusable media components that know how to fit together.





Unlike the fixed syntagmatic structures of Video Mad Libs, Video Lego will facilitate and automate the construction of a variety of syntagmatic structures from paradigmatic media elements (see Figure 8). Furthermore, unlike physical Lego blocks, a Video Lego system will support creating new paradigmatic components. With the creation of Video Lego, Video Mad Libs will become a special case of highly constrained Video Lego structures made out of Video Lego components. The evolution of software design also provides an inspiration for the design of Video Lego components: the invention of object-oriented programming and reusable software components.17

Figure 8. Visualization of MCI ad built with Video Lego.
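Because Video Lego is still at the design stage, any code can only be speculative. The following sketch assumes each component declares typed connectors, and a toy continuity rule decides whether two blocks snap together.

    from dataclasses import dataclass

    @dataclass
    class MediaComponent:
        """A reusable clip that 'knows how to fit together' by declaring
        what kind of shot it can follow and lead into."""
        asset_id: str
        in_type: str   # e.g., "establishing", "close-up", "reaction"
        out_type: str

    def can_snap(a, b):
        # Toy continuity rule: b may follow a only if the connectors match.
        return a.out_type == b.in_type

    wide = MediaComponent("rocket_wide", in_type="establishing", out_type="close-up")
    scream = MediaComponent("pilot_scream", in_type="close-up", out_type="reaction")
    print(can_snap(wide, scream))   # True: these blocks snap together
    print(can_snap(scream, wide))   # False: a reaction shot doesn't lead into an establishing shot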

Mass customization

Accustomed as we are to the commercial products of a craft-based media production process, it may seem counterintuitive that we could automatically mass customize movies from reusable components. Two assumptions should be made explicit to understand this transformation: the role of craft and the scope of media repurposability. While our technology envisions and enables the automated mass customization of media, it does so by redeploying existing knowledge inherent in craft-based media production in two ways. First, we encapsulate and represent media production and postproduction expert knowledge in our software programs in terms of how we describe media content and structure, and the functions and parameters that guide media adaptation and recombination. Second, the production process of AMTs involves craft-based production, not of single media artifacts, but rather of machines that can mass customize them. As a result, we now use cinematic production and postproduction knowledge to make programs that make movies.

The question of media repurposability is more complex. Work in film theory points to minimal levels of cinematic structure that form the basis of intrashot coherence, intershot montage, prenarrative action sequences, and narrative sequences.7–10,18,19 While these low-level continuity systems provide an essential framework for repurposing content to create coherent sequences, the work of the avant-garde, music video, and fan video production points toward other possible continuity systems for repurposing. Movies made out of parts of other movies won’t be Hollywood feature films (though Dead Men Don’t Wear Plaid may be an important exception), but they’ll usher in new possibilities for human expression in much the same way that digital sampling has begun to reshape popular music.

Existing professional motion picture production practice involves a certain amount of repurposing of elements (from the reuse of sets and props to the incorporation of stock footage). However, it’s in the practice of television fans who repurpose program content to make new music videos and television scenes that we see the harbinger of things to come.20 Television fans repurpose and personalize popular television content to express and share their personal and collective desires. Recombination of video elements happens on a variety of levels including, in addition to traditional cuts editing, the composition of various image planes and the recombination of visual and audio elements (music soundtrack and voice-over narration can be especially powerful repurposing techniques). Finally, the connective tissue of popular culture and intimate social groups can overcome many of the limitations of traditional narrative and continuity principles to engender new media forms that hybridize personal and popular content.





Finally, why not use 3D computer graphics to avoid the difficulties of recombination of motion picture elements? The knowledge and technology of paradigmatically and syntagmatically recombining media and subsequently automating the media production process applies as well to 3D computer graphics. Both for video and for computer graphics, we need semiotically informed technologies for computational media production that capture and automate the use of descriptions of media content and structure.

Toward new user experiences

By reinventing media production as a computational process, we can remove the obstacles to consumer participation in motion picture production, sharing, and reuse. Rather than directly manipulating media assets and attempting to cut and paste them into sequences, consumers can use new computational media production paradigms. These systems enable automated capture and automatic assembly by integrating metadata throughout the production process (see Figure 9) and by applying functions that can compute motion pictures according to representations of their content.

Figure 9. Automated computational media production process. (a) Automated capture, (b) annotation and retrieval via a reusable online asset database, (c) automatic editing by an adaptive media engine, and (d) personalized and customized delivery through Web integration and streaming media services (Flash generation, HTML email, WAP, and print/physical media).

With computational media production methods, users can

❚ interact with a capture device that guides the capture process and automatically outputs reusable shots and edited movies;

❚ describe the sequence they want to see and view the results computed automatically;

❚ select media assets and see edited sequences featuring them computed automatically;

❚ select media sequences and easily vary the media assets within them; and

❚ play with media asset combinations and easily construct well-formed sequences.

Future work

Technologies and methods for componentized recombinant production can bring great economic and societal benefits.21 We must therefore work not only to address the technical challenges to making computational media production a reality, but also to transform the barriers that exist in the spheres of law and public policy for digital media copyright and fair use.22

The key technical challenge that we must address is the continuing integration of signal processing, human–computer interaction, and media theory and practice to facilitate the creation and use of media metadata to automate media production and reuse.

We can achieve these research goals by crossing the disciplinary divide that separates the various communities of practice needed to design innovative solutions and thereby work together to bridge the semantic gap that separates what our systems can parse from the meaning people ascribe to media content. Through interdisciplinary research in multimedia we’ll continue to invent new paradigms of computational media production that will help us quickly, easily, and pleasurably produce, share, and reuse video content every day.






Acknowledgments

I want to give my deepest appreciation to the brilliant and energetic colleagues who have worked with me on this research at the MIT Media Laboratory, Interval Research, and Amova: Brian Williams, Golan Levin, Kenneth Haase, Henry Jenkins, Seymour Papert, Warren Sack, David Levitt, Gordon Kotik, Kevin McGee, Dave Becker, Steve Bull, Kevin Ramsey, John Szinger, Ron Levaco, Alan Ruttenberg, Neil Mayle, Malcolm Slaney, Nuno Correia, Rachel Strickland, Dick Shoup, Meg Withgott, Diane Schiano, Mitch Yawitz, and Jim Lanahan.

References

1. M. Davis, “Garage Cinema and the Future of Media Technology,” Comm. ACM (50th anniversary edition), vol. 40, no. 2, 1997, pp. 42-48.
2. M. Davis and D. Levitt, Time-Based Media Processing System, US patent 6,243,087, Interval Research, 2001.
3. C. Dorai and S. Venkatesh, “Computational Media Aesthetics: Finding Meaning Beautiful,” IEEE MultiMedia, vol. 8, no. 4, Oct.–Dec. 2001, pp. 10-12.
4. M. Davis, Media Streams: Representing Video for Retrieval and Repurposing, doctoral dissertation, Massachusetts Inst. of Technology, 1995.
5. M. Davis, “Media Streams: An Iconic Visual Language for Video Representation,” Readings in Human–Computer Interaction: Toward the Year 2000, R.M. Baecker et al., eds., 2nd ed., Morgan Kaufmann, 1995, pp. 854-866.
6. MPEG-7 Requirements Group, Overview of the MPEG-7 Standard (version 6.0), ISO/IEC JTC1/SC29/WG11 N4509, 2001.
7. C. Metz, Film Language: A Semiotics of Cinema, Univ. of Chicago Press, 1974.
8. U. Eco, “Articulations of the Cinematic Code,” Movies and Methods: An Anthology, B. Nichols, ed., Univ. of California Press, 1976, pp. 590-607.
9. N. Burch, Theory of Film Practice, Princeton Univ. Press, 1969.
10. D. Bordwell and K. Thompson, Film Art: An Introduction, 6th ed., McGraw Hill, 2001.
11. G. Davenport, T.G. Aguierre-Smith, and N. Pincever, “Cinematic Primitives for Multimedia,” IEEE Computer Graphics and Applications, vol. 11, no. 4, July 1991, pp. 67-74.
12. F. Nack, “The Future of Media Computing: From Ontology-Based Semiosis to Computational Intelligence,” Media Computing: Computational Media Aesthetics, C. Dorai and S. Venkatesh, eds., Kluwer Academic, 2002, pp. 159-196.
13. P. Keegan, “Lego: Intellectual Property Is Not a Toy,” Business 2.0, Oct. 2001, http://www.business2.com/articles/mag/0,1640,16981,00.html.
14. C. Lévi-Strauss, The Savage Mind, Univ. of Chicago Press, 1966.
15. R. Barthes, Image Music Text, Hill and Wang, 1977.
16. F. de Saussure, Course in General Linguistics, McGraw-Hill, 1983.
17. L. Barroca, J. Hall, and P. Hall, “An Introduction and History of Software Architectures, Components, and Reuse,” Software Architectures: Advances and Applications, L. Barroca, J. Hall, and P. Hall, eds., Springer-Verlag, 2000, pp. 1-11.
18. S.M. Eisenstein, Film Form: Essays in Film Theory, Harcourt Brace Jovanovich, 1949.
19. R. Barthes, “The Sequences of Actions,” Patterns of Literary Style, J. Strelka, ed., State Univ. of Pennsylvania Press, 1971.
20. H. Jenkins, Textual Poachers: Television Fans and Participatory Culture, Routledge, 1992.
21. H.R. Varian, “The Law of Recombinant Growth,” The Industry Standard, 6 Mar. 2000, http://www.thestandard.com/article/display/0,1151,11884,00.html.
22. M. Davis, “From Pirates to Patriots: Fair Use for Digital Media,” IEEE MultiMedia, vol. 9, no. 4, Oct.–Dec. 2002, pp. 4-7.

Marc Davis is an assistant professor at the School of Information Management and Systems (SIMS) at the University of California at Berkeley where he directs Garage Cinema Research (http://garage.sims.berkeley.edu). Davis’ work focuses on creating the technology and applications that will enable daily media consumers to become daily media producers. His research and teaching encompass the theory, design, and development of digital media systems for creating and using media metadata to automate media production and reuse.

Readers may contact Marc Davis at the School of Information Management and Systems, Univ. of California at Berkeley, 314 South Hall, Berkeley, CA 94720-4600; [email protected].