Improved Deep Image Compositing Using Subpixel Masks

Jonathan Egstad *        Mark Davis †        Dylan Lacewell ‡

DWA Technical Memo 2015-348, copyright DreamWorks Animation

Abstract

We present an improved method of producing and manipulating deep pixel data which retains important surface information calculated during the execution of the rendering algorithm for later use during compositing, allowing operations normally performed in the renderer to be deferred until compositing. These include pixel-coverage calculation, pixel filtering, hard-surface blending, and matte object handling. Current methodologies for representing and transmitting deep pixel data work well for combining volumetric and hard-surface renders but are not very successful at combining hard surfaces. By retaining additional surface information a renderer's final integration steps can be reconstructed later in compositing.

CR Categories: I.3.3 [Computer Graphics]: Deep Compositing: Compositing;

Keywords: deep compositing, compositing, rendering, subpixel masks

1  Deep Image Compositing

A typical pixel produced by a CG renderer is normally a series of separate surface shading calculations combined, or merged, together. Most renderers sample surfaces at a higher rate than the output pixel resolution in order to reduce aliasing, and integrate samples, possibly from adjoining pixels, through a pixel filter to produce a smoother final result.

At each subpixel location the renderer evaluates and shades overlapping surfaces and flattens the result before integrating that subpixel's result with the surrounding ones. The final result is termed a flat render since the result is a flat 2D image. Surfaces are typically shaded front to back so that surfaces hidden behind an opaque one are ignored, saving render time. Unfortunately, hidden surfaces are potentially useful during certain post-processing operations.

*e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]

To generate a deep image, the renderer outputs shaded surface fragments as a series of deep samples contained within a deep pixel, without flattening, optionally retaining some hidden surfaces. The final flattening step is performed at the very end of the post process, typically in a compositing package. Deferring flattening can avoid costly re-renders in production; one common use is rendering volumetric and hard-surface elements in separate render passes, or in completely different render packages, and combining them later with accurate partially-transparent depth intersections.

2  Deep Workflow Challenges

The current industry-standard workflow for rendering and handling deep images was outlined in [Hillman 2013], and implemented in the OpenEXR library starting in version 2.0 [Kainz 2013]. The manipulation of deep data for compositing is often performed in The Foundry's Nuke compositing package, which provides a specialized tool set for manipulating deep data that conforms to the OpenEXR 2.0 recommendations.

In this workflow each deep sample contains at least one Z-depth value defining the sample's distance from camera, or two Z values (termed Zfront and Zback) which define the depth range the sample covers. A depth range of 0 indicates a hard surface while a range > 0 indicates a homogeneous volume segment. The color encoded into such a volumetric sample is the color at Zback, with logarithmic interpolation being used to determine a color value in between Zfront and Zback.

While this workflow works well for combining volumetric and hard-surface samples, or combining multiple volumetric samples, it does not work so well when combining hard-surface samples. This is primarily due to: a) lack of subpixel spatial information, b) no ability to pixel-filter deep samples, and c) only logarithmic interpolation of samples being supported.

Lacking additional surface information there is no way to determine x/y correlation between samples, so it is impossible to correctly weight their respective contributions to the final flattened pixel. One way around this is to multiply (pre-weight) the sample color & alpha by its pixel contribution, or coverage, but that only works when the samples are kept isolated: since flattening is performed on the depth-sorted samples front-to-back by successive under [Porter and Duff 1984] operations, the weighting must be non-uniform to compensate for the decreasing contribution of each successive sample. When these pre-weighted samples are interleaved with other samples during a deep merge operation there is no guarantee the correct sample weighting will still exist, leading to visual artifacts.

Another common issue is the need to handle the merging of mutually-cutout objects while accurately applying filter effects like camera defocus (bokeh) in preparation for normal, flat compositing. Mutual cutouts occur when two or more objects are


in close proximity, often overlapping or intersecting, but need to be rendered separately from each other. Rendering the objects as deep images can defer overlap/intersect resolution to the deep merging and flattening steps, and allows operations like defocus to work accurately since the algorithm has the information to resolve depth intersections and reveal hidden surfaces. However, keeping all rendered elements as deep data to defer the cutout issue is often impractical due to the high memory and CPU cost of interactively compositing deep data, and it leads to a loss in compositing control since many common comp operations cannot be performed on deep data. To be most flexible in production we generally still want to composite with flat 2D images, but have them pre-processed with proper cutouts and defocusing, ready to merge.

3  Subpixel Masks

To illustrate the subpixel spatial problem let's take pixel filtering. No renderer can perform pixel filtering on output deep samples, and without it there's a perceptual difference between a flat render and a deep render after flattening. Since pixel filtering must always be performed last, we need a method of retaining and transmitting the subpixel surface fragment information through all post deep operations until flattening is finally performed.

Outputting all subpixel surface fragments as deep samples requires a tremendous amount of memory and disk space, which are already stressed by existing deep compositing workflows. And even if we did, we would still be missing their x/y locations within the pixel and thus be unable to determine their relative distances within the pixel filter disc. We could store the subpixel x/y coordinate in an additional deep data channel, but we are still outputting all the surface fragments and they still need to be coverage pre-weighted. A brute-force solution is to scale up the render resolution to match the subpixel rate and only sample once per pixel, producing subpixel deep samples that have implicit x/y alignment. Unfortunately the increased image resolution would need to be retained through all deep compositing ops until flattening / pixel-filtering / down-sampling is performed.

A better solution for retaining the subpixel spatial information and reducing deep sample count is to combine (collapse) subpixel surface fragments together while simultaneously building up a subpixel bitmask which is interpreted as a 2D array of bits. This bitmask provides x/y correlation and pixel coverage information in a minimum of additional per-sample storage - see section 4 for details on collapsing. The bitmask size should be at least 8x8 (64 bits) to adequately capture high-frequency details like fur and hair. Larger masks could be used, but their storage needs become prohibitive and supporting variable-sized masks severely complicates sample management. A 4x4 mask was tested first since it fits nicely into 32 bits with bits left over for flag use, but was determined to be too low a resolution for production use.

Deep pixel flattening is performed at each subpixel mask bit by finding all deep samples that have that bit enabled, depth-sorting them and merging front-to-back while handling sample overlaps. The subpixel's flattened result is integrated with other subpixel results in a pixel filter to produce the final result. This produces more accurate results from overlapping and common-edge surfaces since the depth ordering of the deep samples at each subpixel location is handled uniquely. It also reduces aliasing along surface edges and eliminates the incorrect color mixing of overlapping opaque samples (Figures 1a, 1b, 1c.)

As mentioned in section 1 it is important to keep surface opacity and pixel coverage separated, so that interleaving uncorrelated deep samples together does not produce weighting artifacts upon flattening. The surface color is still premultiplied by the surface opacity but is not premultiplied by coverage, which is captured in the subpixel mask pattern (Figure 2.) The final result from this flattening and pixel filtering is not exactly the same as the result from the renderer's filter (for example, jittered subpixel locations have been lost,) but it is significantly better than no filtering at all.

However, while subpixel masks solve aliasing issues at the edges of surfaces, there will still be aliasing when flattening uncorrelated deep samples since there are often no surface edges at those pixels and the subpixel masks are saturated (all bits on.) This happens when separate hard-surface renders are deep merged together and the renderer has collapsed the subpixel surface fragments into a single deep sample with a saturated subpixel mask. For example, two walls at right angles to each other and intersecting, but with each wall rendered separately and deep merged. Since all subpixel bits in the mask share the same Z depth value, the slope of one surface relative to another cannot be determined at a subpixel level. To anti-alias these types of hard-surface intersections we also need the Z-depth range of the subpixel fragments before they were collapsed - see section 5 for more details on how this is handled.

While adding a 64-bit mask to each deep sample may at first seem very expensive, in practice this is mitigated by the collapsing together of surface fragments. Each deep sample has a relatively high memory cost, typically taking a minimum of six 32-bit floats (R, G, B, A, Zfront, Zback) totaling 24 bytes, while the mask adds another 8 bytes, so dropping every other sample will save 16 bytes per sample. When written to an OpenEXR file with lossless compression the mask data channels will usually compress very well, since the mask pattern typically only changes along the edges of surfaces. Most hard-surface renders will have little to no change in the pattern along a scanline. Furry/hairy objects are an obvious exception to this, but they still enjoy a sample count reduction from the collapsing of subsamples - see section 4 and Appendix B for details.

For objects that don't readily produce subpixel information, such as volumes, the subpixel mask can be completely omitted. A mask value of 0x0 is handled as a special case and indicates full coverage, since a sample with zero coverage would not be output in the first place. This means that legacy renders which do not provide subpixel masks will be interpreted as having full coverage during flattening.


Figure 1a: Relationship of the front and back surfaces for figures 1b and 1c. In both figures the surfaces are separated in Z and have one edge that lines up as viewed from camera. In figure 1c the back sample is completely hidden by the front sample.

Figure 1b: Common-edge surface compositing results between a renderer subpixel algorithm, the OpenEXR 2.0 method, and our modified method using subpixel masks. Note the final color resulting from the subpixel mask method closely matches the renderer's result.

Figure 1c: Overlapping surface compositing results between a renderer subpixel algorithm, the OpenEXR 2.0 method, and our modified method using subpixel masks. Note the final color resulting from the subpixel mask method closely matches the renderer's result.

Figure 2: Comparison of flattening results. In (a) note the presence of seams along common-edge surfaces due to the premultiplication of samples by coverage in the renderer. In (b) no coverage-premultiplication resolves the seams but results in a loss of anti-aliasing. In (c) using subpixel masks to provide coverage restores anti-aliasing with no seams.


Figure 3: Comparison of pixel filtering kernels of varying sizes. The blackman-sinc and box filter implementations of the flattener match the production renderer implementation, so the result is a very close match to the renderer's pixel filter output.

4  Sample Collapsing

As discussed, the collapsing of surface fragments together while producing subpixel masks can significantly reduce the resulting deep sample count, but what criteria can be used to associate surface fragments together? In our experience there is no one correct way, and a number of factors can be taken into account. Here are a few:

Primitive and/or surface ID - do the fragments come from the same geometric primitive or surface?

Surface normal - do the fragments share the same general orientation?

Z-depth - are the fragments close together in Z?

Brightness - are the fragments generally the same brightness?

Some or all of these factors can be considered when collapsing fragments. However, care must be taken with respect to subpixel brightness changes (or any high-contrast color change), as a loss of high-frequency detail can result if the contrast changes are pre-filtered out during fragment collapsing and are no longer reconstructable during flattening/pixel-filtering. In those instances it's better to generate multiple deep samples with correct subpixel masks to capture the extremes of the contrast change. A good example of this would be a furry character with glinting fur highlights.

The implementation of this scheme will be unique to every renderer, but the basic operation of combining fragments involves averaging their color channels together and finding the min/max depth range for them all. Each fragment's x/y subpixel location and shape is used to build up the final subpixel mask by enabling the corresponding bit(s) in the mask. Since the sample rate of the renderer can be higher or lower than 8x8, more than one bit may need to be enabled, so it's important that the sampling locations are stratified to guarantee at least one falls inside each bit bin; otherwise holes can occur in the mask, producing visible cracks in the flattened result.

At this point the final deep sample with the extra subpixel mask data channels is written into the OpenEXR deep file - see section 6 for more info.

For example, in a REYES-style renderer one or more micropolygons will intersect a given pixel but only cover a subset of the subpixels. Each micropolygon could be inserted into the deep pixel as a discrete sample with its corresponding subpixel coverage mask. However, in the presence of motion blur and/or highly tessellated geometry there could be as many samples as there are subpixels. In order to mitigate this our renderer builds clusters of micropolygon samples with the same geometry and material ID and collapses them into a single sample. The sample color is derived from the average color of each micropolygon in the cluster weighted by their respective subpixel contribution. The Zfront and Zback of the sample are derived from the min/max Z of all the micropolygons in the cluster.

Figure 6 illustrates two common scenarios for sample collapsing. The smooth-surfaced teapot will have a high sample reduction rate, while the complex overlapping surfaces of the fur-covered teapot have a much lower rate. When OpenEXR file sizes are compared for renders with and without sample collapsing, the renders with sample collapsing will typically show a reduction in file size, since the byte cost of each removed sample is usually greater than the byte cost of the additional subpixel mask data, and the mask data tends to compress better. See Appendix B for details.

5  Surface Flags and Sample Interpolation

Deep merging of separate hard-surface renders will often produce aliasing along any surface intersections due to the lack of surface slope information at the intersection points. Subpixel masks will not help here, as both surfaces likely have full pixel coverage at these locations. The normal of the surface is of limited value, as orientation alone does not provide enough information without corresponding depth information. What is needed is the Z-depth range that the surface covers in the deep pixel, so that the visible portion of each surface's contribution can be linearly weighted relative to the other surface's contribution, resulting in an anti-aliased intersection (Figure 4.) This Z-blending effect can be visually significant at the edges of rounded objects, where the angle of the surface to camera is most oblique, and in regions where the slopes of intersecting surfaces are nearly equal.

However, actually performing the linear interpolation is a challenge in the current OpenEXR workflow, as samples with thickness > 0 are assumed to be volumetric segments and only log interpolation is supported. We need some way of knowing whether a deep sample is hard-surface or volumetric, and it must be stored on the sample so that deep merging hard-surface and volumetric samples together retains that info through subsequent deep operations.

To handle this we add a hard-surface flag bit to each deep sample and store the bit in a 16-bit half-float flag channel as an integer value. If the flag is on (1.0) the flattener performs linear interpolation of the sample's depth range, and if it's off (0.0) it performs the normal log interpolation for a volumetric sample. The flattening algorithm must carefully handle overlaps of differing surface types when combining sample segments (Figure 5.)


This scheme is backward-compatible with existing volumetric images written without the hard-surface flag, since the flag channel will be filled with zeros when deep pixels are merged together during compositing. We have modified our renderer to set this flag when outputting deep samples, as it is aware of the surface type during shading. We also expose controls for the user to set/change this bit on Nuke's deep OpenEXR reader, or by using a separate custom DeepSurfaceType operator.

Figure 4: Comparing the hard-surface intersections of two teapots offset and overlapped, with and without linear interpolation. Note that (a) and (b) appear identical: although the samples in (b) have depth ranges, log interpolation fails when sample alpha is 1.0, a very common case with hard surfaces.

Another useful custom attribute to store on each sample is whether or not it is a matte object. Making an object the matte source for another object is a common operation in production rendering. The result is similar to setting the matte object's color to black (0) and merging it with other objects, so the black matte object becomes a holdout of the others. Unfortunately, just setting the object color to zero does not produce a correctly held-out alpha, and setting the matte object's alpha to zero simply makes it a transparent black object, producing no holdout at all. Because of this the matte operation is handled as a special case in a renderer when merging the surface samples together, and requires some indicator that a surface is matte, either via a shader setting or a geometry attribute. This surface information is normally only valid inside the renderer and is difficult to pass on to later compositing steps.

The matte flag bit has float value 2.0 and is checked during flattening (or by any operation that needs to take matte-ness into account), and the matte operation is handled just like in the renderer.

One common use case for deferring matte object handling until flattening is the defocusing of deep pixels. Defocusing deep samples produces a flattened pixel result, so the defocus operation must perform both the blurring logic and the flattening logic simultaneously. Performing the defocusing after flattening, on already held-out images, will often produce incorrectly blurred edges along intersections, leading to artifacts when merged. A matte sample that's closer to camera can become blurred and partially obscure non-matte object samples with black, and since depth intersections are being handled at that moment the intersection edges of the objects are accurately blurred.

Figure 5: Flattening steps handling a combination of overlapping volumetric (s0) and hard-surface (s1, s2) samples.

6  Storage of Subpixel Mask and Flags

An 8x8 subpixel mask requires 64 bits, or 8 bytes. Storing a 64-bit unsigned integer value uncompressed in an OpenEXR file or loading it into Nuke's deep system is not possible, since both only support 32-bit pixel data channels. The 64-bit mask can be split into two 32-bit unsigned integer channels, but in practice the mask is split into two 32-bit float channels, since applications often convert integers into floats on read, destroying the bitmask pattern. The bitmask pattern is copied directly into the two floats, and the reverse is done in the flattener to reassemble the mask (possible endian issues are currently ignored.) To avoid the conversion risk the mask could also be split into four 16-bit integer values and stored in four 32-bit float channels, but that incurs a heavier disk-space and memory footprint, which was deemed undesirable. In practice we have not seen problems with keeping the bitmask pattern stored in floats, except when writing OpenEXR deep files back out of Nuke, as care must be taken to not inadvertently drop the 32-bit floats down to 16-bit half-floats.

This is not such a concern for the flag bits, as they are stored as integer float values (0.0, 1.0, 2.0, 3.0, etc.) and will survive integer and half-float conversions.

When writing an OpenEXR deep file there will be the usual RGBA channels plus AOVs, the Z(front) & ZBack channels, and three new spmask channels storing the mask (spmask.1, spmask.2) and the surface flags (spmask.3.)


This is the channel list from a typical OpenEXR deep file written by our renderer with the customized deep data:

    A, 16-bit floating-point, sampling 1 1
    B, 16-bit floating-point, sampling 1 1
    G, 16-bit floating-point, sampling 1 1
    R, 16-bit floating-point, sampling 1 1
    Z, 32-bit floating-point, sampling 1 1
    ZBack, 32-bit floating-point, sampling 1 1
    spmask.1, 32-bit floating-point, sampling 1 1
    spmask.2, 32-bit floating-point, sampling 1 1
    spmask.3, 16-bit floating-point, sampling 1 1

Note again that spmask.1 and spmask.2 are 32-bit floating-point rather than unsigned int.

7  Nuke Deep System Changes

Since the current Nuke deep system (as of Nuke 9, 2015) does not support these workflow changes, some deep nodes needed to be modified or added:

DeepToImage: Replacement for the stock Foundry flattening node, implementing all the features described in this paper

DeepSurfaceType: Set/clear the hard-surface and matte flags

DeepMatte: Sets the matte flag for all samples to mark an image as a matte source

DeepPremult: Premults/unpremults the color channels by the subpixel coverage value

DeepCamDefocus: Performs camera bokeh defocusing of deep data with support for the matte flag and subpixel coverage during flattening (same flattening algorithm as DeepToImage)

DeepGrade: Simplified per-sample color correction with no attempt to re-weight samples

exrReaderDeep: Modified to add override controls for controlling/setting the subpixel mask and surface flags

exrWriterDeep: Modified to always write the subpixel mask channels as 32-bit floats

8  Conclusion

We have described a method for reproducing several rendering operations that are unavailable in current deep image compositing workflows. The production challenges of the current workflows are: a) lack of subpixel spatial information to correctly merge overlapping or common-edge samples containing pixel coverage weights, b) lack of ability to pixel-filter, c) limiting the definition of a thick sample segment (> 0 depth range) to volumetric data, and d) lack of formal support for matte objects during merging and flattening.

By extending rather than replacing the current methodology we maintain backward-compatibility while offering new functionality. Adding per-sample subpixel masks provides this minimum set of advantages:

Improved merging of overlapping surfaces

Improved merging of common-edge surfaces

Pixel filtering can be performed

Reduced sample count

By supporting a hard-surface indicator flag and linear interpolation of thick samples, we can blend the intersections of these hard surfaces and reduce aliasing.

Adding support for a matte flag avoids the current destructive holdout methodology and allows complex filtering effects to be applied with accurate holdouts.

This workflow is a work in progress, and we've identified several areas that would benefit from future work:

Can OpenEXR and Nuke's deep pixel representation be extended to store the additional bits for the subpixel mask and surface flags as deep sample metadata rather than as raw channel data? This would dramatically simplify the management of the data and avoid its accidental destruction.

Commercial renderer providers should be encouraged to include the subpixel mask and surface flag information when they write deep OpenEXRs.

Storing the x/y of the surface normal as 16-bit half-floats to better define the slope direction of the surface, and combining this with the subpixel mask location to find a more accurate per-subpixel Z-depth intersection.

References

1. Hillman, P. 2013. The Theory of OpenEXR Deep Samples. http://www.openexr.com/TheoryDeepPixels.pdf

2. Kainz, F. 2013. Interpreting OpenEXR Deep Pixels. http://www.openexr.com/InterpretingDeepPixels.pdf

3. Porter, T., Duff, T. 1984. Compositing Digital Images. http://graphics.pixar.com/library/Compositing/paper.pdf

Appendix A: Example Subpixel Mask Float Conversion Code

#include <stdint.h>

typedef uint64_t SpMask;

static const size_t SPMASK_WIDTH = 8;
static const size_t SPMASK_NUM_BITS = 64;
static const SpMask SPMASK_ZERO_COVERAGE = 0x0ull;
static const SpMask SPMASK_FULL_COVERAGE = 0xffffffffffffffffull;

union SpMaskFloatUnion {
    SpMask as_mask;
    float  as_float[2];
};

/* Construct an 8x8 subpixel mask from 2 float values */
SpMask mask8x8From2Floats(float sp0, float sp1)
{
    SpMaskFloatUnion mask_union;
    mask_union.as_float[0] = sp0;
    mask_union.as_float[1] = sp1;
    return mask_union.as_mask;
}

/* Split an 8x8 subpixel mask into 2 floats */
void mask8x8To2Floats(const SpMask& spmask,
                      float& sp0, float& sp1)
{
    SpMaskFloatUnion mask_union;
    mask_union.as_mask = spmask;
    sp0 = mask_union.as_float[0];
    sp1 = mask_union.as_float[1];
}


Appendix B: File Size Comparison

OpenEXR image file sizes from Figure 6, ZIP1 compression, with and without sample collapsing and subpixel mask creation. Produced with a micropolygon renderer as discussed in section 4. Note the small size of the spmask channels in the no-fur render, due to high compression of the low-varying mask patterns, and the larger size in the fur render due to the highly-varying patterns.

Channels    No Fur              Fur
RGBA        1.6 Mb (1591280)    3.2 Mb (3204936)

Table 1: Non-deep (flat) render.

Channels    No Fur                Fur
RGBA, Z     37.1 Mb (37112439)    85.7 Mb (85740917)

Table 2: Deep render, no sample collapsing or subpixel masks.

Channels                           No Fur              Fur
RGBA, Z contribution               5.3 Mb (5261304)    17.8 Mb (17752422)
ZBack contribution                 3.0 Mb (3021930)    7.5 Mb (7510822)
spmask contribution                0.14 Mb (139957)    9.7 Mb (9650972)
Total (RGBA,Z + ZBack + spmask)    8.4 Mb (8423191)    34.9 Mb (34914216)

Table 3: Deep render with sample collapsing and subpixel masks. Contributions are broken out to illustrate the compression rate of the spmask channels.

Figure 6: Two teapot renders representing typical cases for sample collapsing. Smooth surfaces have a high sample collapsing rate, while complex overlapping surfaces have a lower rate.