Virtual Studios: An Overview

Simon Gibbs, Costas Arapis, Christian Breiteneder, Vali Lalioti, Sina Mostafawy, and Josef Speier
GMD Institute for Media Communication

Virtual studio systems began as experimental prototypes that extended traditional chromakeying. Now commercial products based on graphics supercomputers are commonly used for broadcast production. We discuss this evolution and consider extensions, alternative approaches, and issues facing broadcasters who introduce virtual studio systems.

In video production, virtual studios provide seamless real-time compositing of live video with synthetic or natural imagery. This new technique, sometimes called third-person virtual reality, lets those observing the "mixed signal" see people and other physical objects combined with a virtual environment. The hardware and software needed to achieve this comprise a virtual studio system, while the imagery combined with the live video constitutes a virtual set.

Figure 1 shows 3DK, a typical virtual studio system developed at GMD (German National Research Center for Information Technology). A set of cameras equipped with tracking systems provides information about camera movement. The cameras produce foreground (FG) video signals, while an image generation system such as a graphics supercomputer produces corresponding background (BG) signals and, optionally, mask signals (Figure 2). The foreground and background are then composited, often by chromakeying, and the composited outputs (foreground + background) are made available to downstream components such as mixers, special effects devices, monitors, and recorders. In addition, each composited signal is fed back to the corresponding camera's viewfinder to assist the camera operator in shot placement.

As Figure 1 shows, virtual studio systems divide naturally into three main subsystems: tracking, rendering, and compositing. For real-time performance, each uses special hardware capable of video-rate operation (50-60 Hz).

Background
Virtual studios use several video production techniques. We'll review them first and give more detailed technical descriptions later.

Chromakeying
Chromakeying, a staple of video production, provides a good starting point for understanding the historical development of virtual studios. In traditional chromakeying, the subject is shot against a constant background such as a blue curtain or screen. This "blue screen" shot then passes through a chromakeyer, where it is combined with a second shot containing the new background. Conceptually, chromakeyer operation is simple: Replace the foreground with the background in those places where the foreground contains a particular color known as the key color. (Strictly speaking, a key is associated with a region in color space rather than a single color.) The chromakeyer operator selects the key color, or the chromakeyer itself may do so automatically.
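To make the operation concrete, the following sketch implements the naive "replace where the color matches" rule in Python with NumPy. The distance-to-key-color test and its threshold are illustrative simplifications; production keyers work on a region of color space and produce a soft key (discussed under Compositing).

```python
import numpy as np

def naive_chromakey(fg, bg, key_color, threshold=60.0):
    """Replace foreground pixels near key_color with background pixels (hard key).

    fg, bg: H x W x 3 uint8 images of the same size; key_color: an RGB triple.
    The binary decision here is only meant to illustrate the concept.
    """
    diff = fg.astype(np.float32) - np.asarray(key_color, dtype=np.float32)
    dist = np.linalg.norm(diff, axis=2)   # per-pixel distance to the key color
    key = dist < threshold                # True where the blue backing shows
    out = fg.copy()
    out[key] = bg[key]                    # punch the new background through
    return out
```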

Unfortunately, simple comparison of foreground pixels to the key color produces unnaturally hard edges around foreground objects (such as hair with a sculpted appearance). Also, shadows of foreground objects falling on the blue screen will be clipped out. Modern chromakey systems therefore apply techniques to preserve foreground detail and suppress artifacts such as blue fringes around foreground objects.

Despite chromakey systems' sophistication, their operation imposes a fundamental constraint: The foreground camera (and background camera, if the background is a camera shot) cannot move—it must be "locked off" for the shot's duration. Consider a typical chromakeying situation, the weather announcer and map. If the camera shooting the announcer pans, the announcer appears to slide over the (fixed) background; if the camera moves forward or back, or zooms, the announcer appears to grow or shrink. In other words, the spatial relationships existing between the two layers (such as a foreground object appearing left of a background object) are not consistently maintained. Fixing camera position helps maintain the coherence of foreground and background layers but is extremely limiting. As a result, a number of techniques have been developed to extend chromakeying's usefulness.

[Figure 1. A virtual studio system based on the SGI Onyx and Sirius video board. Three cameras (camera1-3), each with a tracking unit, produce foreground signals FG1-3; each Onyx rendering pipe produces a background BG1-3 and mask via a Sirius board, and each keyer outputs a composited FG + BG signal to a video mixer or switch. The subsystems group into tracking, rendering, and compositing, with monitoring, recording, and other studio equipment downstream.]

[Figure 2. Video signals used in a virtual studio system: (a) background, (b) foreground, (c) background plus foreground, (d) foreground mask, (e) background mask, and (f) foreground mask plus background mask.]

Synthevision
In the late 1980s, researchers at the Japanese broadcast company NHK developed a form of chromakeying called Synthevision [1] (now a product). Sensors attached to the foreground camera produce pan, tilt, zoom, and focus data. The camera motions are simulated on the background layer in real time, which involves altering the background perspective to match that of the foreground. The foreground and modified background are then composited using a conventional chromakeyer (see Figure 3). The background consists of an HDTV (high-definition television, or Hi-Vision) still frame, and the larger size gives more image area to simulate panning, tilting, and zooming.

[Figure 3. NHK's Synthevision. The tracked camera's foreground and a perspective-corrected background (from a background processor driven by the tracking data) are combined in a conventional keyer.]

Synthevision first appeared during the Seoul Olympics in 1988, and NHK uses it daily for newscasts. Although the background images are generally still frames, the designers anticipated the true virtual studio and noted that "by using CG images in backgrounds, studio sets that cannot possibly exist can be used easily for program production." [1] NHK continues to experiment with camera tracking and extensions to chromakeying. For example, they use multiple video layers to provide realistic motion parallax and have incorporated real-time computer-generated sets in several productions.

Camera ganging
An alternative to coupling a real and a virtual camera is "ganging" two real cameras in a master-slave relationship (see Figure 4). Typically, the foreground camera operates as master, and the background camera duplicates any motion it makes. This technique is common in special-effects production and proves particularly useful if more than one layer contains live (as opposed to computer-generated) video. Possible forms of ganging include

❚ recording one camera's moves and using the data to drive a second camera,

❚ physically interconnecting two cameras so that the slave camera duplicates the master camera's motion as the operator moves, and

❚ controlling cameras by an external source (such as a computer) that directs them either simultaneously or at different times through a sequence of moves.

[Figure 4. Camera ganging. The tracked master camera's motion drives a slave camera; the two shots are combined in a keyer.]

Prerendered virtual sets
If camera motion and settings are known, it is possible to prerender the virtual set and run the rendered material and the camera material through a chromakeyer. BBC has used this technique in newscasts [2], rendering the virtual set field by field to a hard disk recorder and then mixing it with a live camera (showing the announcer) just prior to broadcast. This technique has two basic variants:

❚ Record the foreground and either track the camera or calculate camera motion via frame-by-frame analysis (not necessarily in real time). Then use the motion data to position the virtual camera and render the virtual set, then composite the prerecorded foreground and prerendered background.

❚ Specify camera motion, for example using animation software, and prerender the background and store it on disk or tape. Produce the foreground using a servo-control system that drives the foreground camera through the predetermined move sequence. Playing back the background in sync with the foreground permits live compositing (see Figure 5).

[Figure 5. Prerendered virtual set. A frame sequence is stored on a digital disk recorder (DDR) and the corresponding sequence of camera motion parameters is stored in the camera drive unit. The two sequences are played back in sync to a reference time code signal.]

Early virtual studio systems
Real-time virtual sets were most likely first used in Japan in 1991. NHK used a prototype virtual studio system to produce "Nanospace", a science documentary. The show was broadcast in 1992 and, in hindsight, was ahead of its time. (For more about this work, see "An Image Compositing System Based on Virtual Cameras" in this issue, p. 36.)

NHK's pioneering work contained the main elements of a virtual studio system: real-time background rendering (using an SGI VTX) coupled with real-time foreground tracking (using a sensor system developed in-house). Unfortunately, NHK's efforts were hampered by the graphics hardware performance then available. Commercial virtual studio systems did not arrive until after SGI's 1993 introduction of the RealityEngine2.

Two other early virtual studio systems were developed in Europe: Platform by IMP, a company with expertise in real-time graphics for video production; and Electronic Set (ELSET), developed by the Mona Lisa project [3, 4], a European Union RACE project led by the Hamburg production company VAP. In 1995, Discreet Logic acquired Platform and now markets it as Vapour. ELSET is available from Accom.

Table 1, adapted from Hughes [5], summarizes virtual studio development (our apologies for any errors or omissions). We've added the first public showing of our own virtual studio system, 3DK.

Table 1 clearly indicates that virtual studio systems are proliferating. Broadcasters show strong interest, and many are planning or already engaged in test productions. They are also starting to realize that virtual studios form just part of a much larger trend. Broadcasting industry issues range from the move from analog to digital technology, the future of HDTV, and the role of compression and digital transmission systems to interactivity and the Internet. Studios and production processes will likely undergo major restructuring over the next decade, and the adoption of virtual studio techniques constitutes just one element of this industry transformation.

Table 1. Virtual studio chronology.

Date (Event) | Company (Product)
1988 (Seoul Olympics) | NHK (Synthevision, 2D background)
1991 | NHK (in-house system)
1992 | Ultimatte (prerendered virtual set)
1993 | BBC (in-house system, prerendered virtual set)
March 1994 (National Association of Broadcasters, NAB) | IMP (Platform)
September 1994 (International Broadcasting Convention, IBC) | IMP (Platform), VAP (ELSET)
November 1994 | GMD (3DK)
April 1995 (NAB) | IMP (Platform), Accom (ELSET), RT-Set Ltd. (Larus, Otus), ElectroGIG (Reality Tracking), Softimage/INA (Virtual Theater/Hybrid Vision)
September 1995 (IBC) | IMP (Platform), Accom (ELSET), RT-Set Ltd. (Larus, Otus), ElectroGIG (Reality Tracking), Softimage (Virtual Theater), Orad (Virtual Set)
April 1996 (NAB) | Discreet Logic (Vapour), Accom (ELSET), RT-Set Ltd./Chyron (Larus, Otus), ElectroGIG (Reality Tracking), Orad (Cyberset), Evans and Sutherland (MindSet), Vinten (VideoScape), Radamec (Virtual Scenario)
1997 | Primarily the above, with PC-based versions now appearing

Digital video technology
Virtual studio systems generally use the digital video standard CCIR 601, which specifies an uncompressed digital representation for analog YUV video. It uses a base sampling frequency of approximately 3.375 MHz and samples the analog components at various multiples of this frequency. For instance, "4:2:2" indicates that Y (the luminance component) is sampled at four times the base frequency while U and V (the two chroma components) are sampled at two times the base frequency. Each component's sample size is 10 bits (older equipment uses 8 bits), yielding a data rate of 270 Mbps for 4:2:2. Frame sizes are 720 × 576 for PAL (Phase Alternate Line, a primarily European television standard) and 720 × 486 for NTSC (National Television System Committee); unfortunately, the pixel aspect ratio is not 1:1.
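The 270 Mbps figure follows directly from this sampling structure; the short calculation below simply reproduces the arithmetic on the numbers given above.

```python
BASE_HZ = 3.375e6   # CCIR 601 base sampling frequency
BITS = 10           # bits per sample on current equipment

def ccir601_rate_mbps(y_mult, u_mult, v_mult, bits=BITS):
    """Gross data rate in Mbps for a Y:U:V sampling pattern such as 4:2:2."""
    samples_per_second = (y_mult + u_mult + v_mult) * BASE_HZ
    return samples_per_second * bits / 1e6

print(ccir601_rate_mbps(4, 2, 2))   # 270.0 Mbps for 4:2:2 video
print(ccir601_rate_mbps(4, 0, 0))   # 135.0 Mbps for a 4:0:0 luma-only key signal
```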

An extension to CCIR 601 specifies an interconnect standard (often called serial digital interface, or SDI) that uses normal coaxial cable to carry digital video signals. This industry-supported interface allows the connection of a wide variety of video equipment including digital video tape recorders (D1, D5, and digital Betacam), disk recorders, mixers, routing switches, frame synchronizers, and converters. The MPEG-2 encoders found in digital broadcast systems often have serial digital interfaces.

The 4:2:2 format, and 4:0:0 for luma-only signals such as keys, is a mainstay of video postproduction environments. The D1 and D5 video tape recorders and the new disk-based storage systems permit building up complex multilayer composites without the generation loss found in analog systems. This format also works well for virtual studio systems, since background generation is typically a digital process. In other words, the foreground, background, and mask signals used in a virtual studio system can all be based on the same digital video format. No conversions between analog and digital are required (other than initial digitization of the camera signal). Of course, after production and any editing or subsequent processing, the signal may be converted to analog for distribution.

Blue room design and lighting
Since virtual studio systems allow camera movement, the design of the blue backing becomes more important than with "locked off" cameras. For a weather announcer and map, one can probably get away with a fairly small blue screen (perhaps 2 × 1.5 meters) placed behind the announcer. Once the camera is free to move, a larger blue screen gives better results. Furthermore, long shots require blue not only behind the announcer but also on the floor. For this reason, shooting with a virtual studio system often occurs in a blue room (or "blue box") consisting of one or more blue walls and a blue floor.

Lighting is probably the most important consideration in blue room design. Good compositing depends on uniform background lighting. Adding a floor complicates lighting—because the floor and walls have different orientations to light sources, they appear as slightly different shades of blue. This is particularly noticeable along the edges and may show through as shadowy bands in the composite. Curved sections between walls and the floor and between the walls soften these edges, and using the same material for floor and walls makes lighting easier. Although adding curved sections diminishes the blue room's usable area, the virtual set can be much larger than the physical set.

The physical set need not be blue—chromakeyers and other compositing devices work with red or green levels, luminance levels, or even arbitrary colors. However, blue offers several advantages. Blue complements flesh tones and so better preserves them (at least by some compositing processes); also, a slight blue tinge around objects is less objectionable than a green tinge. Actors probably find working in a blue room more pleasant than working in a strongly saturated green or red room.

In addition to the need for overall uniform lighting on the backing, other factors influence blue room lighting. First, a lot of blue light reflects onto objects. This "blue spill" can be removed during the compositing process but results in a loss of side and back lighting on the subject that should be compensated for with additional lighting. Second, the intensity and tone of lighting on foreground objects should balance that of the background—a brightly lit real object placed amidst dimly lit virtual objects will look unnatural. Third, some compositing processes, such as Ultimatte, can preserve shadows cast by objects in the blue room. This is extremely useful, since shadows "anchor" real objects on the virtual floor. This requires special attention to lighting, both to produce the shadows and to assure consistency with shadows cast by virtual objects (they should at least fall in more or less the same direction). Finally, set designers can place blue objects in the blue room and then "cover" them with virtual objects, allowing virtual objects to support or occlude real objects. Adding blue objects again complicates lighting: An unwanted shadowy outline may appear around the blue object and disappear only when the compositing process is adjusted to remove shadows completely.

Camera tracking
Obviously, to render the virtual background, we need to know the real camera's position and orientation. We must also obtain its zoom setting and, if simulating depth of field on the virtual background, the focus and aperture settings as well. (In general, the aperture seldom changes and, if not tracked, can be entered manually into the virtual studio system.) The tracking systems used in virtual studios fall into two main categories: electromechanical and optical. No ideal camera tracking system exists as yet, and both approaches have deficiencies.

Electromechanical tracking. Developed first and still more commonly used, electromechanical systems can be active or passive. In active systems, servo-control mechanisms "drive" the camera to allow, for instance, remote control. In passive systems, sensors detect the camera operator's actions. (The sensors are often optical encoders that detect small markings on rings or other moving parts; despite this we refer to these systems as electromechanical.)

For both active and passive tracking, the system must determine the lens objective's zoom and focus ring positions plus the camera mount's various degrees of freedom. Possible mounts, with their degrees of freedom, include

❚ a fixed tripod: pan and tilt

❚ a stationary pedestal: pan, tilt, and elevation (see Figure 6)

❚ a moving pedestal or dolly: pan, tilt, elevation, and displacement in one (rolling along a track) or two dimensions

❚ a stationary boom: pan, tilt, and boom lift and swing (see Figure 6)

❚ a moving boom: the above plus displacement in one or two dimensions

[Figure 6. Camera mounts: a studio pedestal (pan, tilt, elevation) and a boom (pan, tilt, lift, swing).]

Electromechanical tracking can be highly accurate, providing angular measurements to thousandths of a degree and displacements to hundredths of a millimeter [6]. In addition to ruling out the use of hand-held cameras, however, it has several drawbacks:

❚ calibration—accurate registration requires detailed measurement of lens characteristics.

❚ alignment—prior to shooting, the tracking system must be aligned using potentially time-consuming procedures to determine the camera's initial position and orientation and to level the camera mount.

❚ backlash and vibration—the camera mount is not perfectly rigid and may bend slightly during rapid acceleration or deceleration or when under heavy load. This is particularly severe for long boom arms.

The electromagnetic tracking systems used in many virtual reality applications are not suitable for virtual studio work. Aside from interference due to the proximity of lighting and video equipment, questions of accuracy and stability arise. With a video camera fully zoomed in, the horizontal field of view may be as little as five degrees. Each pixel in a video frame then covers an angle of less than 0.01 degrees, and even a slight jitter in the tracking system can shift the background by several pixels. A jumpy background is itself annoying, but when composited with a stationary foreground, the result is unacceptable. If one layer of the composite shifts unexpectedly with respect to the other, the image integrity breaks down and the viewer becomes distracted.

Optical tracking. The alternative to electromechanical tracking—optical tracking—uses pattern recognition. With visible reference points or grid lines in the blue room, image processing techniques can extract the camera's position, orientation, and field of view. Though extremely difficult to achieve in real time with the accuracy needed for virtual studio work, this eliminates the need for painstaking calibration of the lens system and can be used with any camera mount, including hand-held cameras. Optical tracking has some potential pitfalls, though:

❚ The tracking system may become disoriented if markers are out of focus or if an insufficient number are in view.

❚ The markers' real-world coordinates must be known precisely.

❚ Processing demands may make the tracking system lag behind camera movement by several frames, or keep operation below the video field (or even frame) rate.

❚ Markers in a blue room may create compositing difficulties. The markers must be distinguishable from the blue backing but should not be visible in the composite.

Tracking systems now on the market. The Ultimatte Memory Head, shown in Figure 7, was the first commercial camera tracking system sufficiently accurate for virtual studio work. This active electromechanical system measures camera pan, tilt, zoom, and focus four times per frame (supporting frame rates of 24, 25, and 30 Hz). Tracking data can be recorded onto a floppy disk residing in a control unit, sent directly to a computer over a serial line, or sent to a slave camera. In addition, the Ultimatte Memory Head can respond to tracking data coming from a second camera or a computer and can thus be used for ganging and with prerendered virtual sets (such as the configuration shown in Figure 5).

The Institut National de l'Audiovisuel (INA) in France produced an early virtual studio system called Hybrid Vision. It uses a passive electromechanical tracking system based on a Sachtler pan/tilt head with two high-resolution motion sensors and similar sensors on the zoom and focus rings. A useful audio encoder/decoder converts tracking data to an audio signal and permits recording camera motion on video tape with the accompanying camera image signal.

[Figure 7. The Ultimatte Memory Head (left) and a Thoma System with dolly and track (right).]

The Thoma System, a passive electromechanical system, consists of up to eight motion sensors. In addition to pan, tilt, zoom, and focus, the system can be configured for use with pedestal, dolly, and boom mounts (see Figure 7). It produces data at four times the video frame rate, sending each set of measurements in a 25-byte package over a high-speed serial line to the host computer.

The Radamec Virtual Reality Encoder Head is an active electromechanical system that provides pan and tilt data. An optional lens encoder provides zoom and focus. The precision is very high—the zoom and focus resolution exceeds 10,000 samples.

The Vinten TSM AutoCam, also an active electromechanical system, is a high-precision pan/tilt head with an accuracy of 36 arc-seconds (about 0.01 degrees). A lens encoder can be added for zoom and focus control.

Milo [6] is a boom and track system developed by Mark Roberts Motion Control. Its extremely rigid construction permits fast moves along several axes. This active electromechanical system tracks pan, tilt, zoom, and focus; boom lift and swing; and track displacement. (Head roll and an additional tilt-like axis are optional.)

Orad Hi-Tech Systems' optical tracking system is currently the only product of its kind on the market. It places a grid-like pattern in the blue room and passes the camera signal through a specially designed 30 gigaops-per-second digital video processor. Orad's pattern recognition software can then estimate camera position, orientation, and zoom. The video processor uses several Texas Instruments single-chip multimedia video processors.

Rendering
A virtual studio system's rendering component must produce the background layer and mask signals in sync with camera movement. If produced at the video frame rate, the background will appear jerky during fast camera moves, hence rendering should match the video field rate. For CCIR 601 video with PAL timing, the renderer must then generate a 720 × 576 image every 20 ms. Two basic approaches to background rendering exist: the first uses a 2D background image, and the second requires a 3D model.

2D background rendering. Suppose we have an image, the background source, with a known center of projection. Placing the virtual camera at this point lets us simulate pan, tilt, and zoom (and camera roll if needed) by applying a perspective transformation to a region of the background source image (see Figure 8). We can achieve the perspective change using a digital video effects (DVE) processor or a 3D graphics system:

❚ DVE processors perform various 2D and 3D transformations in real time; those with external control interfaces can collect the necessary transformation parameters from the camera tracking system.

❚ A 3D graphics system permits applying the background source image as a texture on a plane and positioning the virtual camera at the center of projection. The virtual camera is then coupled to the camera tracking system.

The background source image may be a prerendered view of a virtual set, a photograph, a video sequence (the background source need not be static, but the center of projection cannot change), or a composite of these. Thus we can have arbitrarily complex real and synthetic imagery in the background—for instance, an outdoor camera shot combined with a rendered model of a spaceship.
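For a camera that only rotates about its center of projection and zooms, the perspective change is a 3 × 3 homography, H = K_current * R * inverse(K_reference), applied to the source image. The sketch below shows the idea in Python with NumPy and OpenCV; the sign conventions for pan and tilt, the pixel focal lengths, and the absence of lens distortion are simplifying assumptions, not how any particular DVE or graphics system implements it.

```python
import numpy as np
import cv2  # used only for the final image warp

def rotation_pan_tilt(pan_deg, tilt_deg):
    """Relative rotation of the current view with respect to the reference view."""
    p, t = np.radians([pan_deg, tilt_deg])
    pan = np.array([[np.cos(p), 0, np.sin(p)],
                    [0, 1, 0],
                    [-np.sin(p), 0, np.cos(p)]])
    tilt = np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t), np.cos(t)]])
    return tilt @ pan

def simulate_pan_tilt_zoom(src, f_ref, f_cur, pan_deg, tilt_deg, out_size):
    """Warp the background source as if the camera panned, tilted, and zoomed."""
    h, w = src.shape[:2]
    K_ref = np.array([[f_ref, 0, w / 2], [0, f_ref, h / 2], [0, 0, 1]])
    out_w, out_h = out_size
    K_cur = np.array([[f_cur, 0, out_w / 2], [0, f_cur, out_h / 2], [0, 0, 1]])
    H = K_cur @ rotation_pan_tilt(pan_deg, tilt_deg) @ np.linalg.inv(K_ref)
    return cv2.warpPerspective(src, H, out_size)
```

Zooming is simply a larger f_cur than f_ref; panning and tilting move the sampled region of a high-resolution source, as Figure 8 illustrates.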

[Figure 8. 2D background rendering: simulating a pan and zoom. Image courtesy VOX/GMD.]

This technique has several limitations. First, it requires some planning, since the position and orientation of the camera used to produce the background source must correspond to that of the foreground camera in the studio. Second, the field of view and resolution of the background source may limit foreground camera pan, tilt, and zoom. This poses a problem when using a DVE because the background source is limited to video resolution. Using a graphics system for background processing, we can render a very high-resolution environment map that gives full pan and tilt motion and much greater zoom range. Third, perhaps the main drawback, this technique constrains foreground camera movement—the camera must remain at the original center of projection. As we'll see later, a pan or tilt also causes a camera displacement that, unless corrected for (by similarly displacing the virtual camera), will cause a registration error between foreground and background.

3D background rendering. This approach models the virtual set in 3D and renders it with a virtual camera having the same position, orientation, and field of view as the foreground camera. There are no constraints on camera motion—the camera operator is free to move and position shots as needed (provided the motion can be tracked).

Constraints do affect model complexity. With only 20 ms available for rendering, virtual set models must be carefully designed and tuned. Models typically have a fairly low polygon count and make heavy use of textures. The virtual set designer should also consider performance optimization techniques such as spatial organization for efficient culling and multiple levels of detail. Another common technique is to pass the model through a radiosity calculation (which tends to explode the polygon count) and then apply the solution using texture maps on the original model.

Clearly, hardware support for texture mapping is essential; the hardware must also take care of antialiasing. Currently, most virtual studio systems that perform 3D rendering use either the SGI RealityEngine2 or the new InfiniteReality (although lower performance machines can be used to preview off-air cameras).

Compositing
The final stage of a virtual studio system combines the live foreground signal with the rendered background signal. The video term for this is keying, since an intermediate signal acts as a "keyhole" and determines where one video layer shows through another (see Figure 9). Blue screen compositing derives the key signal from the foreground signal (self-keying). Examples of self-keying processes include luminance keying, where the key depends on the signal's luminance values, and chromakeying.

[Figure 9. Compositing with a key signal: two layers plus a key yield the composite.]

[Figure 10. Compositing: (a) color wheel, (b) color wheel plus black background (chromakey), (c) color wheel plus black background (Ultimatte), (d) blue screen shot, (e) blue screen shot plus gray background (chromakey), (f) blue screen shot plus gray background (Ultimatte).]

A binary key signal makes edges appear jagged and cut out, and does not support transparency effects. To remedy this, allow the key signal to vary over a range and control the mixing of the two layers, much like alpha blending. This linear or additive keying gives more natural edges and preserves fine detail but tends to leave a blue fringe around objects. It also fails to compensate for "blue spill" and gives poor results with partially transparent objects such as glass and smoke.
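In code, the mixing step of linear keying is exactly an alpha blend driven by the key signal; generating a good soft key is the hard part and is assumed to have happened already in this minimal sketch.

```python
import numpy as np

def linear_key(fg, bg, key):
    """Mix foreground over background using a soft key.

    fg, bg: H x W x 3 float arrays; key: H x W array with 1.0 = fully foreground.
    Intermediate key values preserve fine detail such as hair, but any blue
    spill present in the foreground is passed through untouched.
    """
    k = np.clip(key, 0.0, 1.0)[..., None]   # broadcast the key over color channels
    return k * fg + (1.0 - k) * bg
```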

Ultimatte, a more complex compositing process and a near de facto standard in virtual studio systems, is not a chromakeyer, strictly speaking. Like a chromakeyer, however, it produces composites from blue screen shots. An Ultimatte keyer operates in three main steps. First, it factors out the backing color's blue level to leave the foreground black where backing was present and reduce blue elsewhere. Next, it generates a key signal based on the amount of backing color removed. Finally, it mixes the processed foreground and background together using the key signal. Ultimatte preserves foreground edge detail and can composite out-of-focus edges and semitransparent objects; it also preserves shadows cast by foreground objects onto the blue backing.

Figure 10 illustrates both chromakeying and Ultimatte. To highlight their differences, we adjusted the chromakey so that the key is nearly binary (notice how the altered region of the color wheel in Figure 10b is clearly defined and there are hard edges around foreground objects in Figure 10e).

Techniques
A successful virtual studio depends not only on powerful technology but also on knowing how best to use it. Here we identify some common techniques.

Camera calibration
With blue room shots, it's useful to think of "anchor points," places in the foreground that should appear to be in physical contact with the background. The feet of people standing in the blue room form anchor points, as do physical props. The absence of such points makes judging foreground object position difficult and hides slight errors in registration between the foreground and background. Anchor points amplify registration errors by drawing attention to foreground objects that appear to "slide" over the background during camera movement. The loss of continuity from such sliding is extremely distracting and generally unacceptable, making accurate registration essential to virtual studio systems. The key to accurate registration is, in turn, accurate camera calibration. Depending on the type of tracking used—electromechanical or optical—different calibration procedures apply.

Calibration for electromechanical camera tracking. Consider an electromechanical pan/tilt head with sensors on the focus and zoom rings; assume the camera is placed on a fixed tripod. The tracking system generates a set of measurements <mp, mt, mz, mf> for each video field (referring to the pan, tilt, zoom, and focus measurements, respectively). Now we must deduce the parameters of the virtual camera used to render the virtual set. At first glance, since the position of the real camera is fixed, we need only rotate the virtual camera by the pan and tilt angles and set the field of view to that indicated by the zoom measurement. However, this assumes that the real camera behaves like a pinhole camera mounted at the intersection of the pan and tilt axes, which is unlikely. In general, for cameras with moving lens assemblies, the center of projection shifts as the various lenses move.

First-order or "thin lens" optics model a lens system using six parameters: a pair of focal points (F1, F2), principal planes (H1, H2), and nodal points (N1, N2). The nodal points are of particular interest because if a ray is traced through the first nodal point, the lens system displaces the ray so that it emerges from the second point (see Figure 11).

[Figure 11. A lens system (left) is modeled (right) by a pair of focal points (F1, F2), principal planes (H1, H2), and nodal points (N1, N2). The first nodal point makes incoming rays appear as if originating from the second nodal point.]

N1 thus behaves as a center of projection on which the virtual camera must be positioned to achieve consistent registration of virtual and real images. A major difficulty is that N1 shifts along the optical axis as the zoom and focus lenses move.

In addition to the nodal point, we need two other pieces of information for registration: the field of view (which varies with zoom and focus settings) and the first radial distortion coefficient (zoom lenses introduce considerable radial distortion, which the rendering system should correct).

To summarize, we must determine the following functions for the camera lens when using an electromechanical tracking system:

❚ N1(mz, mf): position of the first nodal point (along the optical axis)

❚ fov(mz, mf): horizontal field of view

❚ k1(mz, mf): first-order radial distortion coefficient

These functions will vary from one lens assembly to another but can be determined offline and semi-automatically (for instance, using special targets as shown in Figure 12). Figure 13 shows some typical plots—note that the nodal point shifts by nearly one meter, and that radial distortion introduces errors of up to 30 pixels (or about four percent of frame width).

[Figure 12. Lens calibration targets: nodal point and field of view.]

[Figure 13. Nodal point shift (millimeters), horizontal field of view (degrees), and maximum horizontal distortion correction (pixels) plotted against zoom and focus settings (data for a 12:1 zoom courtesy Hendrik Fehlis and Thorsten Mika).]

When the camera is mounted, two quantities must be measured accurately: Oh, the height of the optical axis above the tilt axis, and Lc, the distance from the pan axis to the point from which N1 is measured (see Figure 14). Knowing these quantities and the camera mount position permits calculating the world coordinates of the nodal point for any <mp, mt, mz, mf> provided by the tracking system.

[Figure 14. Optical axis and nodal point offsets (Oh and Lc).]
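As a sketch of how these pieces combine per field, the code below interpolates the three calibration tables at the measured zoom and focus encoder values and then places the nodal point by rotating it about the tilt and pan axes. The table layout, the use of SciPy's grid interpolator, and the assumption that the pan and tilt axes intersect at the mount position are illustrative choices, not the article's (or any product's) actual implementation.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def make_lens_model(zoom_ticks, focus_ticks, n1_table, fov_table, k1_table):
    """Wrap offline-measured calibration tables, sampled on a zoom x focus grid."""
    build = lambda table: RegularGridInterpolator((zoom_ticks, focus_ticks), table)
    return {"n1": build(n1_table), "fov": build(fov_table), "k1": build(k1_table)}

def virtual_camera(lens, m_pan, m_tilt, m_zoom, m_focus, mount_pos, o_h, l_c):
    """Virtual camera parameters for one field of tracking data <mp, mt, mz, mf>.

    mount_pos: world position where the pan and tilt axes are assumed to intersect.
    o_h: height of the optical axis above the tilt axis (Oh).
    l_c: distance from the pan axis to the point from which N1 is measured (Lc).
    Angles are in degrees; a y-up, z-forward convention is assumed.
    """
    zf = np.array([[m_zoom, m_focus]])
    n1 = float(lens["n1"](zf)[0])     # nodal point shift along the optical axis
    fov = float(lens["fov"](zf)[0])   # horizontal field of view for the renderer
    k1 = float(lens["k1"](zf)[0])     # radial distortion for the renderer to correct
    p, t = np.radians([m_pan, m_tilt])
    # Nodal point in head coordinates before rotation: forward along the optical
    # axis by Lc + N1, and Oh above the tilt axis.
    local = np.array([0.0, o_h, l_c + n1])
    tilt = np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t), np.cos(t)]])
    pan = np.array([[np.cos(p), 0, np.sin(p)],
                    [0, 1, 0],
                    [-np.sin(p), 0, np.cos(p)]])
    position = np.asarray(mount_pos, dtype=float) + pan @ (tilt @ local)
    return {"position": position, "pan": m_pan, "tilt": m_tilt, "fov": fov, "k1": k1}
```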

Calibration for optical camera tracking. Optical tracking systems match real-world points to image points. More specifically, given a set of points for which the real-world coordinates (x, y, z) are known, the tracking system identifies the corresponding point in image coordinates (u, v). With sufficient matches it is possible to derive camera position and orientation. A popular model used for this purpose, Tsai's camera model [7], has 11 parameters, including five internal parameters:

❚ f: effective focal length

❚ k1: first-order radial distortion coefficient


❚ Cu, Cv: center coordinates of the radial distortion

❚ su: scale factor

and six external parameters:

❚ p, t, r: camera pan, tilt, and roll with respect to world coordinates

❚ x, y, z: camera position in world coordinates

The model also includes several constants (such as the total number of pixels in the u and v directions) that can be determined from manufacturers' specifications or measured directly.

Tsai's calibration algorithm takes a set of matched points and solves for the model parameters. Generally, an optimized solution requires at least 11 points, though if the real-world points are coplanar, then as few as five suffice. As an example, one experimental optical tracking system [8] uses eight points arranged as two coplanar groups (Figure 15).

[Figure 15. Optical tracking: foreground (blue room) shot on top, composite on bottom. Eight reference points are used, four on the podium and four on the hanging blue frame.]
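Tsai's algorithm itself is usually implemented from scratch, but the per-field step it enables, recovering the camera's pose from matched world and image points once the internal parameters are known, can be sketched with OpenCV's pose solver. The function names and the assumption that focal length and distortion come from a prior calibration are illustrative; Tsai's method estimates the internal and external parameters jointly.

```python
import numpy as np
import cv2

def camera_pose_from_markers(world_pts, image_pts, f_px, c_u, c_v, k1):
    """Estimate camera orientation and position from matched reference points.

    world_pts: N x 3 marker coordinates in the blue room (at least 4 points).
    image_pts: N x 2 detected marker positions in the video field (pixels).
    f_px, c_u, c_v, k1: internal parameters from a prior lens calibration.
    """
    K = np.array([[f_px, 0, c_u],
                  [0, f_px, c_v],
                  [0, 0, 1]], dtype=np.float64)
    dist = np.array([k1, 0, 0, 0], dtype=np.float64)   # first-order radial term only
    ok, rvec, tvec = cv2.solvePnP(np.asarray(world_pts, dtype=np.float64),
                                  np.asarray(image_pts, dtype=np.float64),
                                  K, dist)
    if not ok:
        raise RuntimeError("pose estimation failed; check the marker matches")
    R, _ = cv2.Rodrigues(rvec)              # rotation from world to camera frame
    position = (-R.T @ tvec).ravel()        # camera center in world coordinates
    return R, position
```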

Synchronization
Figure 16 shows a timing diagram for a virtual studio system. Note that all components are "genlocked" to a master sync signal (provided by a video sync pulse generator). Genlocking is useful for several reasons. First, video mixing equipment such as a chromakeyer or Ultimatte requires that the foreground and background signals be in video sync. Second, the master sync provides a common time base for all components to simplify detecting and correcting timing problems.

[Figure 16. A virtual studio timing diagram. Time proceeds left to right, and the horizontal intervals denote field periods.]

In Figure 16, video fields are numbered sequentially, so fg0 indicates the first foreground field. Assume that the tracking system produces data at the field rate; these measurements are indicated by the sequence of mi. Also assume that the renderer is double-buffered and renders at the field rate. It will take some time for a measurement to be transmitted to the renderer and the background to be rendered, hence bg0 is not placed in the front frame buffer until field period 1.

Next the renderer's output (located in the front frame buffer) is scan converted to a CCIR 601 signal; assume this takes two field periods. The final result is that bg0 is not composited with fg0 but with fg3, or, in other words, the background is delayed by three fields—for PAL, a delay of 60 ms.

Visually, delay in a virtual studio system manifests itself as loss of registration during camera moves. For instance, if the operator pans the camera and comes to a rapid halt, the viewer sees a stationary foreground while the background continues to pan for a few fields. The delay can be hidden by low camera acceleration or deceleration, but this constrains the camera operator. In general, virtual studio systems must compensate for background delay. Some possible techniques include

❚ Advancing the tracking data. Apply predictive filtering to tracking measurements and extrapolate the data forward in time (a minimal sketch follows this list).

❚ Delaying the camera head. Some active electromechanical tracking systems can be programmed to delay before acting on move requests (though the move would be passed on to the rendering system without delay).

❚ Delaying the foreground. A video delay line can be introduced between the camera and the compositing system (see Figure 17).
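As a minimal sketch of the first option (advancing the tracking data), a simple linear predictor extrapolates each tracked parameter ahead by the known pipeline delay. Real systems would use better filters; plain extrapolation amplifies measurement noise and overshoots when the operator stops abruptly.

```python
from collections import deque

class LinearPredictor:
    """Extrapolate one tracked parameter (for example, the pan angle) forward.

    delay_fields: how many field periods the background lags the foreground
    (three in the timing diagram of Figure 16).
    """

    def __init__(self, delay_fields=3):
        self.delay = delay_fields
        self.history = deque(maxlen=2)

    def update(self, measurement):
        self.history.append(measurement)
        if len(self.history) < 2:
            return measurement                        # nothing to extrapolate from yet
        velocity = self.history[-1] - self.history[-2]     # change per field period
        return self.history[-1] + velocity * self.delay    # value expected delay fields ahead
```

Feeding update() each per-field measurement yields the value to hand to the renderer so that, after the rendering and scan-conversion delay, the background lines up with the undelayed foreground.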

A correctly compensated delay, no matter how long, is not noticeable to someone viewing only the final composite. It may, however, still be noticeable to people in the studio. For example, the video delay line, the simplest to implement, requires a matching audio delay to maintain lip sync. A delay of more than a few frames can be extremely distracting to an actor should he hear both his own voice and the delayed signal. Similarly, the camera operator, observing the delayed composite signal in the viewfinder, will notice that the image does not respond immediately to camera movements.

[Figure 17. A virtual studio system configured with a foreground delay: a video delay between the camera and the keyer brings the foreground into step with the rendered background.]

Advancing tracking data or delaying the camera head also present problems, in particular when the delay is very long (four frames or more). Advancing tracking data is error-prone and may be visible during abrupt camera movement, while adding a delay to the camera head can make the camera difficult to handle.

Real-time effects
A main strength of virtual studio systems is that they are "live": rendering and compositing are done in real time rather than offline or during postproduction. Virtual studios can perform many animations and special effects seen with offline systems—for example, props can move and change their size or morph from one form to another. Such animation is manually triggered and controlled or, in a tightly scripted production, triggered by some form of time code. Since these effects must be performed in real time, the modeler must be aware of their performance impact; thus virtual studio effects are very "device dependent." Here we discuss special effects that rely on features of the rendering and compositing hardware, in particular the commonly used Ultimatte System 7 (or System 8), the SGI RealityEngine2 (or InfiniteReality), and the SGI Sirius video board.

Foreground masks. The Ultimatte, like many chromakeyers, accepts an external key that is combined with the key generated from the foreground signal. The external key identifies background regions that should appear over the foreground, hence it masks out the foreground. As an example, Figure 2 shows a globe (a background element) forced in front of the blue room (foreground) signal. This is easy to implement and forms the basis for effects such as people moving behind virtual walls or furniture. Using a RealityEngine2, the foreground mask is typically rendered in the alpha buffer, then output by the Sirius board in 4:0:0 format and sent directly to the keyer (see Figure 18).

[Figure 18. A virtual studio system configured with a foreground mask. The camera's 4:2:2 foreground and tracking data feed the system; the RealityEngine2 renders the background into the RGB buffer and the mask into the alpha buffer, and the Sirius board outputs the background as 4:2:2 video and the mask as a 4:0:0 external key to the Ultimatte System 7.]

Garbage masks. Suppose the blue room has a low ceiling or is too narrow for wide-angle shots. Knowing the geometry of the blue room and the position, orientation, and field of view of the camera permits generating a "garbage mask" that forces the real ceiling (or side walls) of the blue room to be keyed out. This can be accomplished by modeling a ceiling for the virtual set and creating a foreground mask, generating a horizontal band in the alpha buffer, or controlling a keying window in the compositing device (the latter two can be used even if the virtual set has no ceiling). Figure 19 shows an example of the second method.
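A sketch of the second method appears below: it computes the image row at which the ceiling/back-wall junction projects and writes a solid band into the mask above it. The pinhole model, the no-roll assumption, and the parameter names are simplifications for illustration; with roll or wide-angle distortion the junction is no longer a horizontal line in the image.

```python
import numpy as np

def ceiling_garbage_mask(width, height, f_px, tilt_deg, junction_dy, junction_dist):
    """Mask (1.0 = force background) covering everything above the real ceiling line.

    f_px: focal length in pixels; tilt_deg: camera tilt, positive looking up.
    junction_dy: height of the ceiling/back-wall junction above the camera.
    junction_dist: horizontal distance from the camera to the back wall.
    """
    # Elevation of the junction relative to the optical axis.
    elevation = np.arctan2(junction_dy, junction_dist) - np.radians(tilt_deg)
    # Image rows count downward, so points above the axis map to smaller row numbers.
    junction_row = int(round(height / 2 - f_px * np.tan(elevation)))
    junction_row = max(0, min(junction_row, height))
    mask = np.zeros((height, width), dtype=np.float32)
    mask[:junction_row] = 1.0      # band above the junction keys in the virtual ceiling
    return mask
```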

[Figure 19. Garbage masks: (a) foreground, (b) garbage mask, (c) foreground + background, (d) foreground + background using garbage mask.]

Z-mixing. A z-mixer combines two video sequences based on their depth values. Obtaining the depth information proves difficult. VMix [9], a hardware device for z-mixing, takes the renderer's z-buffer as a per-pixel source of depth values for the background; the foreground is assigned a constant z value. For a constant foreground z value, we can also simulate z-mixing by rendering a foreground mask with the far clipping plane set to the foreground z value, which is entered either manually or via a tracking system. For instance, Orad fixes a small video camera above the blue room and determines a representative foreground z value by identifying objects in the blue room. How the foreground z value is obtained is important, since many values produce bizarre results (note how the people disappear into the globe in the top image of Figure 20).
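The constant-foreground-z case reduces to a per-pixel comparison, sketched below: wherever the rendered background is nearer than the single foreground depth, the background covers the keyed subject. The inputs (a keyed composite, the background alone, and the renderer's depth buffer) are assumptions of this illustration rather than a description of VMix or Orad's hardware.

```python
import numpy as np

def z_mix_constant_fg(composite, background, bg_depth, fg_z):
    """Simulate z-mixing with a single representative foreground depth.

    composite: keyed foreground-over-background image (H x W x 3).
    background: the rendered background alone (H x W x 3).
    bg_depth: per-pixel background depth from the renderer's z-buffer (H x W).
    fg_z: one depth value for the whole foreground layer.
    """
    nearer = (bg_depth < fg_z)[..., None]        # virtual object in front of the subject
    return np.where(nearer, background, composite)
```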

Video textures. The Sirius board supports full-frame-rate video textures, a powerful feature. The texture source can be any video device—usually a video tape or hard disk recorder, although for teleconferencing in the virtual studio, the source could be a live camera or satellite feed. The video texture may appear as a simple flat "video wall" in the virtual set or, like other textures, be mapped onto more complex geometry. If the video texture has an accompanying key signal, the key can be treated as an alpha channel and parts of the video texture can be made transparent. For instance, the announcer appearing in the pyramid in Figure 21 is actually a video texture with an alpha channel created by a chromakeyer.

Blue props. Placing blue objects of various shapes and sizes in the blue room lets real objects "interact" with virtual objects. For instance, a blue box aligned with a virtual table permits placing real objects on or behind the table; or several blue boxes on the floor might allow an actor to sit on the steps of a virtual staircase (see Figure 22). Using blue blinds lets actors disappear behind virtual walls and walk through virtual doorways. Although masks can achieve the same effect, blinds have the advantage that real shadows on their surfaces are preserved through compositing and will appear correctly placed on the corresponding virtual surfaces.

[Figure 20. Z-mixing with poorly chosen foreground z (top), and with foreground mask (bottom).]

[Figure 21. Video texture with transparency. Image courtesy WDR/GMD.]

[Figure 22. Sitting on virtual steps.]

Virtual actors. Real-time animated characters—virtual actors—provide another form of interaction between real and virtual objects. A virtual actor can respond to real actors' movements and gestures in the blue room, and can carry on conversations with real actors or other virtual actors. One or more operators typically control a virtual actor's movements using an input device such as a body or facial motion capture system, a dataglove, or simply a joystick. One operator also provides the virtual actor's voice. In a live broadcast, remote viewers can be put in control of a virtual actor, for instance using telephone touch tones on a phone-in show to "steer" the virtual actor about the set.

Shadows. A final challenge is adding shadows to virtual sets. Here we differentiate between real shadows on real objects, real shadows on virtual objects, virtual shadows on virtual objects, and virtual shadows on real objects. The first happens naturally and depends on studio lighting; the third is a traditional rendering problem. Unique to virtual sets are techniques for casting real shadows on virtual objects and virtual shadows on real objects. As mentioned earlier, compositing hardware can generate a key that is only partially transparent in regions where shadows darken the blue room walls or floors. This also darkens the corresponding areas of the background to create the appearance of real shadows falling on the virtual background. However, this method requires that real and virtual geometry coincide and would not give correct results if, for example, the blue room floor is empty while the virtual floor is populated with objects.

Making real shadows actually wrap around virtual objects requires multipass rendering techniques such as "projecting" the key signal as an additional texture onto the virtual set. Techniques for casting virtual shadows on real objects are still experimental: possible approaches include physically projecting a rendered signal onto the blue room and its occupants, or calculating a depth map for the camera signal and providing this to a z-mixer.

Commercial systems
Table 2 lists the virtual studio systems now on the market (most Web page references also appear at the Virtual Studio Home Page, http://viswiz.gmd.de/DML/vstudioHome.html). In addition to the system and company names, the table indicates whether electromechanical or optical tracking is used, and whether background rendering is 2D or 3D. Although most systems currently use electromechanical tracking, integrating optical tracking proves fairly simple. Also note that the 3D systems can perform 2D rendering if needed.

Table 2. Commercial virtual studio systems.

System | Company | Tracking | Rendering
3DK (http://viswiz.gmd.de/DML/vst/vst.html) | GMD | Electromechanical | 3D
Best (http://www.studio.sgi.com/Features/VirtualSets/brainstorm.html) | Brainstorm | Electromechanical | 3D
Cyberset I/O (http://www.orad.co.il) | Orad | Optical | 3D
Cyberset D/E (http://www.orad.co.il) | Orad | Optical | 2D
Elset (http://www.studio.sgi.com/Features/VirtualSets/accom.html) | Accom | Electromechanical | 3D
Hybrid Vision | INA | Electromechanical | 3D
Larus, Otus (http://www.rtset.co.il) | RT-Set, Chyron | Electromechanical | 3D
MindSet (http://www.es.com/PR/pr04-15-96.html) | Evans and Sutherland | Optical | 3D
Reality Tracking | ElectroGIG | Electromechanical | 3D
Synthevision | NHK | Electromechanical | 2D (Hi-Vision background)
Vapour (http://www.discreet.com) | Discreet Logic | Electromechanical | 3D
VideoScape | Vinten | Optical, electromechanical | Ganged cameras
Virtual Scenario | Radamec | Electromechanical | 2D
Virtual Theater (http://www.softimage.com) | Softimage | Electromechanical | 3D

The commercial 3D virtual studio systems show fairly similar functionality and performance (assuming similar hardware) but differ in accuracy of tracking and registration, ease of use, and general robustness. Rather than attempt to evaluate them (difficult to do without actually running them side by side), we'll simply list some of their more useful features:

❚ support for multiple cameras

❚ preview of “off-air” cameras using less expensive rendering hardware

❚ interfaces for several tracking systems

❚ acquisition of lens calibration data

❚ simple camera alignment procedures

❚ multiple model import formats

❚ high-resolution textures

❚ real-time antialiasing and texture mapping

❚ real-time effects

❚ export of tracking data for offline rendering

The preview capability refers to a multicamera configuration where low-end hardware renders the background for off-air cameras and a high-end system runs with the on-air camera. This decreases the cost of a multicamera system and is fairly easy to implement (though switching must be properly timed to lessen the danger of the new camera being keyed with the old background for a frame or two after a cut). When setting up an electromechanical tracking system, alignment involves locating the camera in the virtual set’s world coordinates (or the blue room’s coordinate system). Any automatic or semi-automatic procedure (such as placing a target of known dimensions at a reference point in the blue room) will save much time during shooting.
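
Locating the camera from a target of known dimensions is, at its core, a camera-pose (perspective-n-point) problem. Purely as an illustration, and not how any of the listed products implement alignment, the Python sketch below uses OpenCV’s solvePnP with made-up measurements for a hypothetical 1 m square floor target to recover the camera position in blue-room coordinates; the intrinsics are assumed to come from lens calibration data such as that listed among the features above.

    import numpy as np
    import cv2  # OpenCV

    # Corners of a hypothetical 1 m x 1 m alignment target placed at a known
    # reference point on the blue-room floor (blue-room coordinates, metres).
    target_world = np.array([[0.0, 0.0, 0.0],
                             [1.0, 0.0, 0.0],
                             [1.0, 1.0, 0.0],
                             [0.0, 1.0, 0.0]], dtype=np.float64)

    # The same corners as detected in the camera image (pixel coordinates);
    # these values are placeholders for whatever the detection step returns.
    target_image = np.array([[412.0, 530.0],
                             [700.0, 522.0],
                             [688.0, 310.0],
                             [405.0, 318.0]], dtype=np.float64)

    # Intrinsics from lens calibration (focal length and principal point,
    # here for a nominal 720 x 576 frame).
    K = np.array([[1250.0,    0.0, 360.0],
                  [   0.0, 1250.0, 288.0],
                  [   0.0,    0.0,   1.0]])
    dist = np.zeros(5)  # assume lens distortion already corrected

    ok, rvec, tvec = cv2.solvePnP(target_world, target_image, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    camera_position = -R.T @ tvec  # camera location in blue-room coordinates
    print(camera_position.ravel())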

Discussion
Perhaps because of virtual studios’ novelty, many people either overlook or do not clearly state their limitations. We therefore discuss some concerns we have for the current generation, at least, of virtual studio systems.

First, production personnel must be aware of the constraints virtual studio technology imposes. The most obvious is the need to shoot in a blue room. Blue rooms generally have bad acoustics (from the bare walls and curved corners), and actors find it difficult to position themselves or interact with invisible objects. Also, blue room lighting is difficult, and once the lighting is set, changing it (perhaps for some dramatic effect) poses yet another challenge.

Another constraint stems from virtual set complexity. The rendering system limits such things as polygon counts, texture sizes, and lighting effects. The set designer, and any other people involved with constructing the virtual set, must know the limitations of the rendering system.

Second, virtual studio systems demand new skills for successful operation. Camera operators must understand the tracking system’s capabilities and be comfortable with alignment procedures. Actors need training in blue room techniques, and directors must be aware of the overall complexity of operation and the functioning of the various subsystems for rendering, tracking, and compositing.

Third, start-up costs, rapid obsolescence, and integration with existing studio equipment make it difficult to decide when best to acquire virtual studio technology. This is especially true in light of the broadcast industry’s current restructuring.

Virtual studio systems have many advantages, however. Many users note economic benefits such as reduced storage costs (virtual sets do not take up much physical space), reduced construction costs (fewer expensive props or materials, easy to set up and dismantle), reduced studio costs (one blue room can be “configured” for different productions simply by switching virtual sets), and reduced design costs (easy to change a wall’s color or floor’s texture).

Virtual studios also open up new creative possibilities in set design and visual effects; they may reduce production time and allow greater experimentation. For instance, since virtual studios operate in real time, those involved in a production can see results immediately without waiting through possibly extensive postproduction delays. Virtual studios may also lead to new forms of audience interaction and participation: Because we can now place actors into a virtual environment in real time, viewers also may soon be able to enter the virtual world.

Virtual studio systems will continue improving technically as rendering and tracking systems become more advanced. More flexible compositing is also likely. Exactly how real studios will change to accommodate the virtual studio remains unknown, but this technology promises to play a significant role in the future of broadcasting production. MM


Acknowledgments
We would like to thank our colleagues in the Visualization and Media Systems Design group and members of Horz & Schnepf Co. for their assistance in developing the virtual studio project at GMD.


Simon Gibbs is a staff engineer with Sony Distributed Systems Laboratory in Santa Clara, California. Before joining Sony he was a senior scientist at GMD (the German National Research Center for Information Technology), where he was responsible for the Digital Media Lab within the Institute for Media Communication. Gibbs earned a PhD in computer science from the University of Toronto in 1983. Research interests include office information systems, computer-supported collaborative work (CSCW), object-oriented programming, and multimedia systems.

Constantin Arapis obtained an MS and a PhD in computer science from the University of Geneva in 1986 and 1992, respectively. From 1994 to 1996 he was a research scientist at GMD (the German National Research Center for Information Technology). Since September 1996 he has been Maitre d’enseignement et de Recherche at the University of Geneva. Current research interests include tele-education applications, teleconferencing systems, and Internet multimedia applications.

Christian Breiteneder is an associate professor at the Institute of Applied Computer Science and Information Systems at the University of Vienna, Austria. He received a Dipl. Eng. in computer science from Johannes Kepler University, Linz, Austria, in 1978 and a PhD in computer science from the University of Technology, Vienna, in 1991. His current interests include multimedia information systems, media processing systems, and telepresence systems. Breiteneder is a member of ACM, IEEE, and the IEEE Computer Society.

Vali Lalioti is a research scientist with the Visualization and Media Systems Design group at GMD, Germany. She received her BS and MS from the University of Crete, Greece, and a PhD in computation from the University of Manchester Institute of Science and Technology, UK. Her main research interests include virtual studios and collaborative virtual reality environments.

Contact Gibbs at Sony Distributed Systems Laboratory, 2350 Mission College Blvd., Santa Clara, CA 95054, email [email protected].
