SURE-based Optimization for Adaptive Sampling and Reconstruction

Tzu-Mao Li Yu-Ting Wu Yung-Yu Chuang

National Taiwan University

Figure 1: Comparisons between greedy error minimization (GEM) [Rousselle et al. 2011] and our SURE-based filtering. With SURE, we are able to use kernels (cross bilateral filters in this case) that are more effective than GEM's isotropic Gaussians. Thus, our approach better adapts to anisotropic features (such as the motion blur pattern due to the motion of the airplane) and preserves scene details (such as the textures on the floor and curtains). The kernels of both methods are visualized for comparison.

Abstract

We apply Stein's Unbiased Risk Estimator (SURE) to adaptive sampling and reconstruction to reduce noise in Monte Carlo rendering. SURE is a general unbiased estimator for mean squared error (MSE) in statistics. With SURE, we are able to estimate error for an arbitrary reconstruction kernel, enabling us to use more effective kernels rather than being restricted to the symmetric ones used in previous work. It also allows us to allocate more samples to areas with higher estimated MSE. Adaptive sampling and reconstruction can therefore be processed within an optimization framework. We also propose an efficient and memory-friendly approach to reduce the impact of noisy geometry features where there is depth of field or motion blur. Experiments show that our method produces images with less noise and crisper details than previous methods.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Ray Tracing.

Keywords: Sampling, reconstruction, ray tracing, cross bilateral filter, Stein's unbiased risk estimator (SURE).

Links: DL PDF WEB

1 Introduction

Monte Carlo (MC) integration is a common technique for rendering images with distributed effects such as antialiasing, depth of field, motion blur, and global illumination. It simulates a variety of sophisticated light transport paths in a unified manner; it estimates pixel values by using stochastic point samples in the integral domain. Despite its generality and simplicity, however, the MC approach converges slowly. A complex scene with multiple distributed effects usually requires several thousand expensive samples per pixel to produce a noise-free image.

Adaptive sampling and reconstruction (or filtering, used interchangeably in the paper) are two effective techniques for reducing noise. Given a fixed budget of samples, adaptive sampling determines the optimal sample distribution by concentrating more samples on difficult regions. To decide which pixels are worth more effort, we require a robust criterion for measuring errors. Accurate estimation of errors is challenging in our application because the ground truth is not available. Reconstruction algorithms, in contrast, properly construct smooth results from the discrete samples at hand. One key issue that reconstruction must resolve is how to select the filters for each pixel, as the optimal reconstruction kernels are usually spatially-varying and anisotropic. Recently, approaches have been developed to address the challenge of spatially-varying filters [Chen et al. 2011; Rousselle et al. 2011], producing better results than those that use a single filter across the whole image. However, these methods are limited to symmetric filters and do not work well for scenes with anisotropic features such as the high-frequency textures on the floor and curtains in Figure 1.

We here propose an adaptive sampling and reconstruction algorithm to improve the efficiency of Monte Carlo ray tracing. The core idea is to adopt Stein's Unbiased Risk Estimator (SURE) [Stein 1981], a general unbiased estimator for mean squared error (MSE) in statistics, to determine the optimal sample density and per-pixel reconstruction kernels. The advantages of using SURE are twofold. For one, it provides a means by which to measure the quality of arbitrary reconstruction kernels – not just those that are symmetric (e.g. the isotropic Gaussians used in greedy error minimization (GEM) [Rousselle et al. 2011]). As such, it allows for the use of more effective filters such as cross bilateral filters and cross non-local means filters. Another advantage is that the per-pixel errors estimated by SURE can be used to guide further sample distribution. Thus more samples can be allocated to difficult regions. In addition to applying SURE, we propose an efficient and memory-friendly approach to maintain the quality of cross bilateral filtering in the presence of noisy geometric features when rendering depth-of-field or motion blur effects. We propose a normalized distance to alleviate the impact of noisy features, making the proposed method more robust for all types of distributed effects. Experiments show the proposed method provides significant improvements over previous techniques for adaptive sampling and reconstruction.

2 Related work

Adaptive sampling and reconstruction. The seminal work of Mitchell [1987; 1991] over twenty years ago laid the foundation for methods for adaptive sampling and reconstruction. We categorize these techniques as image space methods, multidimensional methods, and adaptive filtering.

Image space methods estimate per-pixel errors with various criteria and allocate additional samples to difficult regions. These approaches are usually more general and can be used for various types of effects. Bala et al. [2003] combine edge information and sparse point samples to perform edge-aware interpolation. The quality of their results is highly dependent on the accuracy of edge detection. Overbeck et al. [2009] proposed adaptive wavelet rendering (AWR), a general wavelet-based adaptive sampling and reconstruction framework. They distinguish the sources of variance: the coarse level for distributed effects and the fine level for edges. These two types of noise are addressed separately by sampling the hierarchical wavelet coefficients adaptively; smooth images are reconstructed by thresholding wavelet coefficients. Recently, several approaches [Chen et al. 2011; Rousselle et al. 2011] have been proposed to smooth out noise using multi-scale filters. Chen et al. [2011] focus on depth of field and describe a criterion to select spatially-varying Gaussian filters from a predefined filterbank based on the depth map. Rousselle et al. [2011] also use Gaussian filters to form a filterbank. Although their method uses an error minimization framework for adaptive sampling and reconstruction, and is more general, the framework can be used only for symmetric filters. We apply SURE to estimate MSE for more general filters and allow the use of more effective filters, which yields significant improvements.

Other approaches perform adaptive sampling and reconstruction in a multidimensional space. Hachisuka et al. [2008] distribute more samples to discontinuities in the high-dimensional space and reconstruct them anisotropically using structure tensors. Their approach achieves good quality but becomes less efficient as the number of dimensions increases. Approaches have also been developed that are based on transform domain analysis, focusing on specific distributed effects such as depth of field [Soler et al. 2009], motion blur [Egan et al. 2009], and soft shadows [Egan et al. 2011]. Recently Lehtinen et al. proposed a novel method for reconstructing temporal light-field samples [2011], and later extended it to handle the indirect light field for global illumination [2012]. By reprojecting samples along the sample trajectory in the multidimensional space, expensive samples can be reused. In general, to avoid the curse of dimensionality, multidimensional approaches usually focus on only one or two specific effects. These methods are able to generate better results for the effects they focus on because the anisotropy of the integrand is taken into account. Our method, however, is more general, efficient, and memory-friendly.

Some methods focus on reconstruction only. Xu and Pattanaik [2005] proposed smoothing out Monte Carlo noise with a modified bilateral filter with smoothed range values. Segovia et al. [2006], Dammertz et al. [2010], and Bauszat et al. [2011] focus on interactive global illumination. They exploit geometric properties such as depths and surface normals to identify edges. Shirley et al. [2011] use the depth buffer to help filtering, but target defocus and motion blur. Sen and Darabi [2012] proposed a general adaptive filtering approach based on information theory. By identifying the dependency of sample colors and scene features on the random parameters, they reduce the importance of sample values influenced by MC noise.

Denoising using SURE. Stein [1981] proposed an MSE estimator, called Stein's Unbiased Risk Estimator (SURE), for estimators on samples with normal distributions. Donoho and Johnstone [1995] incorporate the estimator into a wavelet shrinkage algorithm to reconstruct functions from noisy inputs. Recently, SURE has received much attention from the image denoising community and has been widely used to optimize denoising parameters [Blu and Luisier 2007; Van De Ville and Kocher 2009].

3 Stein’s Unbiased Risk Estimator (SURE)

Monte Carlo ray tracing estimates true pixel colors by randomly sampling the integral domain and reconstructing from samples. Unbiased Monte Carlo rendering techniques have a stochastic error bound which can be estimated using the variance of the estimator. According to the central limit theorem, if Y is the pixel color estimated by an unbiased Monte Carlo renderer and x is the true color, as the number of samples n approaches infinity, Y's distribution approximates a normal distribution with mean x and variance σ²/n:

\[ Y \xrightarrow{d} N\!\left(x, \frac{\sigma^2}{n}\right), \tag{1} \]

where σ² is the variance of the Monte Carlo samples. Previous investigations have demonstrated that, for a finite number of samples, this relationship is still a good approximation [Tamstorf and Jensen 1997; Fiorio 2004; Hachisuka et al. 2010].
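
As a concrete illustration (ours, not from the paper's code), the quantities in Equation 1 can be estimated directly from a pixel's Monte Carlo samples; the array layout below is an assumption:

import numpy as np

def pixel_estimates(samples):
    """Estimate a pixel's color and the variance of that estimate.

    `samples` is a hypothetical (n, 3) array holding the n RGB samples
    of one pixel. Returns Y (the mean) and sigma^2/n from Eq. (1).
    """
    n = samples.shape[0]
    y = samples.mean(axis=0)              # pixel color estimate Y
    sigma2 = samples.var(axis=0, ddof=1)  # unbiased sample variance sigma^2
    return y, sigma2 / n                  # variance of the estimate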

Since a Monte Carlo renderer is an estimator, it is often necessary to estimate its accuracy for applications such as adaptive sampling. Stein's Unbiased Risk Estimator (SURE) offers a means for estimating the accuracy of a given estimator [Stein 1981; Blu and Luisier 2007]. SURE states that, if y is a measurement on x with a normal distribution N(x, σ_y²), and F is a weakly differentiable function, then the following estimation of error¹,

\[ \mathrm{SURE}(F(y)) = \|F(y) - y\|^2 + 2\sigma_y^2\,\frac{dF(y)}{dy} - \sigma_y^2, \tag{2} \]

is an unbiased estimator of the mean square error (MSE) of F(y), that is,

\[ E[\mathrm{SURE}(F(y))] = \|F(y) - x\|^2. \tag{3} \]

¹We followed Blu and Luisier's SURE formulation [2007], which was derived from the original SURE [Stein 1981]; their equivalence has been shown. Since we apply SURE to estimate MSE errors for each color channel independently, the filter kernel F is a scalar function. Thus, the dimension is 1 and the divergence in the original SURE formulation becomes a derivative.

Page 3: SURE-based Optimization for Adaptive Sampling and ... · SURE-based Optimization for Adaptive Sampling and Reconstruction ... our approach better ... (such as the motion blur pattern

Equations 2 and 3 indicate that if we can compute σ_y and dF(y)/dy, we can estimate the error of an estimator F without knowing the true underlying value x.

Our goal is to use the above formula to estimate the reconstruction error of an arbitrary kernel F. As mentioned above, rendering samples y follow a normal distribution. Thus, SURE can be used as the estimator for the MSE of any filter F if we can compute dF(y)/dy for the reconstruction kernel (σ_y can be directly estimated from the Monte Carlo samples). Section 4 describes details of the algorithm, including the definition of F, the derivation of dF(y)/dy, and how to use SURE for optimal filter selection and adaptive sampling.
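
As a minimal sketch (our own, with assumed array shapes), Equation 2 becomes one line of per-channel arithmetic once the filtered image, the input image, its variance, and the filter derivative are available:

import numpy as np

def sure(filtered, noisy, var_y, dF_dy):
    """Per-pixel, per-channel SURE of Eq. (2).

    filtered : F(y), the filtered pixel colors
    noisy    : y, the unfiltered pixel color estimates
    var_y    : sigma_y^2, estimated from the Monte Carlo samples
    dF_dy    : dF(y)/dy, the filter's derivative w.r.t. its input
    All arguments are assumed to be (H, W, 3) arrays.
    """
    return (filtered - noisy) ** 2 + 2.0 * var_y * dF_dy - var_y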

It is worth noting that Rousselle et al. [2011] attempted to estimate MSE for the same application. They decomposed the error into variance and bias and exploited the relation between the biases of two filters to estimate the error. However, they used a quadratic approximation which is valid only for symmetric filter kernels, thus limiting the effectiveness of their approach. In contrast, our SURE-based estimation works for arbitrary reconstruction kernels and provides more flexibility in the choice of kernels.

4 Method

Figure 2 shows the flowchart of the proposed method. Our method starts by rendering a small number of initial samples for each pixel and then iterates between filter selection and adaptive sampling stages until reaching the sample budget. After each sampling phase, we reconstruct a set of images using a filterbank. Each pixel is filtered multiple times with all candidate filters in the filterbank, and the error of each filtered pixel color is estimated by computing SURE. For each pixel, the filtered color with the least SURE error is chosen and filled into the reconstructed image. If more sample budget is available, the adaptive sampling stage is invoked and a batch of new samples is distributed to pixels with larger SURE errors. Details are in the following subsections.
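
The overall loop can be summarized as below. This is a structural sketch only: `trace` and `reconstruct` are hypothetical callables standing in for the ray tracer and for the SURE-based filter selection of Section 4.2, the channel-averaged luminance is a simplification, and clamping negative SURE values is our pragmatic choice rather than something the paper specifies:

import numpy as np

def adaptive_render(trace, reconstruct, h, w, budget, init_spp=8, batch_spp=4):
    """Sampling/reconstruction loop of Figure 2 (sketch).

    trace(counts)          -> (mean, var): per-pixel color estimates and
                              their variances, (h, w, 3) arrays
    reconstruct(mean, var) -> (image, err): reconstructed (h, w, 3) image
                              and per-pixel minimal SURE error, (h, w)
    `budget` is the average samples-per-pixel budget.
    """
    counts = np.full((h, w), init_spp)       # uniform initial samples
    mean, var = trace(counts)
    image, err = reconstruct(mean, var)
    while counts.sum() < budget * h * w:     # sample budget left?
        # Distribute a batch proportionally to the sampling function
        # S(i) of Eq. (9); using the variance of the pixel estimate as
        # the heuristic variance term is a simplification here.
        lum2 = image.mean(axis=-1) ** 2
        s = (np.maximum(err, 0.0) + var.sum(axis=-1)) / (lum2 + 1e-3)
        counts += np.ceil(batch_spp * h * w * s / s.sum()).astype(int)
        mean, var = trace(counts)
        image, err = reconstruct(mean, var)
    return image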

4.1 Initial samples

At the beginning, a small number of initial samples (usually 8 or 16 samples per pixel) are taken to explore the scene. The samples are generated by low discrepancy sequences and distributed evenly to each pixel. After rendering with these samples, our method performs the first reconstruction phase with the gathered information.

4.2 Filter selection using SURE

The main advantage of our method over greedy error minimization [Rousselle et al. 2011] is its ability to use arbitrary filters. We have experimented with three different filters: isotropic Gaussian, cross bilateral, and a modified non-local means filter [Buades et al. 2005] with additional scene feature information (we call this a "cross non-local means filter"; see Section 5.4). Here, we use the cross bilateral filter as an example since it offers the best compromise between performance and quality amongst these filters. Algorithms for different filters are the same except that different filters are used for constructing the filterbank; the derivatives in SURE also differ.

Cross bilateral filters. Cross bilateral filters have been shown effective for removing Monte Carlo noise [Dammertz et al. 2010; Sen and Darabi 2012]. Similar to previous work, auxiliary data including surface normals, depths, and texture colors are collected by caching information after tracing rays. We compute the per-pixel mean and variance of each feature and store them as feature vectors for cross bilateral filters.

Figure 2: An overview of our algorithm, which alternates between sampling and reconstruction until reaching the sample budget. During sampling, a set of samples collects colors, normals, textures, and depths over the image plane. These are used as side information for the filters in the reconstruction stage, during which, for each pixel, a set of filters is applied and the filtered value with the minimal SURE value is filled into the reconstructed image. In addition, the minimal SURE value of each pixel is recorded to guide adaptive sampling. If sample budget remains, more samples are shot for pixels with larger SURE errors.

For the cross bilateral filter, the filter weight w_ij between a pixel i and its neighbor j is defined as

\[ w_{ij} = \exp\!\left(-\frac{\|p_i - p_j\|^2}{2\sigma_s^2}\right) \exp\!\left(-\frac{\|c_i - c_j\|^2}{2\sigma_r^2}\right) \prod_{k=1}^{m} \exp\!\left(-\frac{D(f_{ik}, f_{jk})^2}{2\sigma_{f_k}^2}\right), \tag{4} \]

where f_ik is the sample mean of the k-th feature for pixel i; σ_s, σ_r, and σ_fk are the standard deviation parameters of the spatial, range (sample color), and feature terms respectively. D is a distance function used to address noisy scene features for depth of field and motion blur; it is discussed later in this section. The filtered pixel color ĉ_i of pixel i is computed as the weighted combination of the colors c_j of all neighboring pixels j:

\[ \hat{c}_i = \frac{\sum_{j=1}^{n} w_{ij} c_j}{\sum_{j=1}^{n} w_{ij}}. \tag{5} \]

Note that the cross bilateral kernels are spatially-varying due to the range and feature terms. Figure 1 shows examples.

In our current implementation, the filterbank is composed of cross bilateral filters with different σ_s, corresponding to different spatial scales. The other parameters σ_r and σ_fk are fixed; their values are discussed in Section 5.
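
A brute-force sketch of Equations 4 and 5 for a single pixel follows. It is our illustration, not the paper's PBRT2 code; features are flattened into K scalar channels, and the feature term uses the plain squared distance between feature means, where the normalized distance D of Equation 6 below would slot in:

import numpy as np

def cross_bilateral_pixel(i, j, color, feat_mean, sigma_s, sigma_r, sigma_f,
                          radius=8):
    """Filter pixel (i, j) with Eqs. (4)-(5), brute force over a window.

    color     : (H, W, 3) noisy image
    feat_mean : (H, W, K) per-pixel feature means (normal, depth, texture)
    sigma_f   : (K,) per-feature bandwidths
    """
    h, w, _ = color.shape
    y0, y1 = max(i - radius, 0), min(i + radius + 1, h)
    x0, x1 = max(j - radius, 0), min(j + radius + 1, w)
    ys, xs = np.mgrid[y0:y1, x0:x1]
    spatial = ((ys - i) ** 2 + (xs - j) ** 2) / (2 * sigma_s ** 2)
    range_t = ((color[y0:y1, x0:x1] - color[i, j]) ** 2).sum(-1) / (2 * sigma_r ** 2)
    d2 = (feat_mean[y0:y1, x0:x1] - feat_mean[i, j]) ** 2   # D^2; cf. Eq. (6)
    feat = (d2 / (2 * sigma_f ** 2)).sum(-1)
    wgt = np.exp(-(spatial + range_t + feat))               # Eq. (4)
    return (wgt[..., None] * color[y0:y1, x0:x1]).sum((0, 1)) / wgt.sum()  # Eq. (5)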

Page 4: SURE-based Optimization for Adaptive Sampling and ... · SURE-based Optimization for Adaptive Sampling and Reconstruction ... our approach better ... (such as the motion blur pattern

Figure 3: Comparisons of filtering the SIBENIK scene with depth-of-field effects using the L2 distance and our normalized distance: (a) SIBENIK, (b) depth, (c) depth variance, (d) L2 distance, (e) normalized distance, (f) reference. The area with strong depth of field has noisy geometry information, preventing us from filtering when using the L2 distance between sample means. By incorporating sample variances, the normalized distance allows us to filter these areas even given noisy geometry information. Note that the large depth variances in (c) allow us to filter areas around the pillars, removing the artifacts exhibited in the result with the L2 distance (d).

Depth of field and motion blur. Sen and Darabi [2012] point out that when rendering depth of field and motion blur effects, the geometric features (surface normal and depth) can be noisy due to MC noise. In these situations, as the weighting function is not accurate, using the features dogmatically for filtering can fail to remove the noise. They resolve this problem by computing the functional dependency between the MC random parameters and the scene features and using it to reduce the weight of samples if their features are highly dependent on the random parameters. Although their method successfully handles depth of field and motion blur, it operates at the sample level. Thus, performance and memory consumption become issues, since computing pairwise mutual information between samples and parameters is not only time-consuming but also requires considerable memory for storing samples.

We propose a more efficient and memory-friendly approach to prevent cross bilateral filters from being affected by noisy scene features. Each pixel has a set of samples for feature k. Given two pixels i and j, the naive metric for their distance with regard to the feature would be the distance between the sample means, f_ik and f_jk. This, however, completely ignores sample variances. If we model samples as Gaussians, the distance between two sample sets should be normalized by their variances. Thus we define the normalized distance as

\[ D(f_{ik}, f_{jk}) = \sqrt{\frac{\|f_{ik} - f_{jk}\|^2}{\sigma_{ik}^2 + \sigma_{jk}^2}}, \tag{6} \]

where σ²_ik and σ²_jk are the sample variances of the k-th feature of pixels i and j respectively. Intuitively, for a pixel with strong depth of field or motion blur, its samples tend to have a large variance since they usually span a large region in the spatio-temporal domain. Thus, it tends to have smaller distances and larger weights even when the geometric features are far apart. In the extreme case that two feature sets are inseparable due to strong depth of field or fast motion, the cross bilateral filter reduces to a Gaussian filter and does not use the unreliable geometric features. This approach allows us to evaluate feature importance at the pixel level and store only the per-pixel sample mean and variance of each feature. Figure 3 shows the effect of the proposed distance metric.
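
In code, the squared form of Equation 6 is a one-liner; this sketch is ours, and the small epsilon guarding the zero-variance case is our addition:

def normalized_distance_sq(f_i, f_j, var_i, var_j, eps=1e-10):
    """Squared normalized distance D^2 of Eq. (6) for one scalar feature.

    f_i, f_j     : sample means of the feature at pixels i and j
    var_i, var_j : the corresponding per-pixel sample variances
    """
    return (f_i - f_j) ** 2 / (var_i + var_j + eps)

Plugged into the feature term of Equation 4, a large variance at either pixel shrinks D and restores the weight, which is exactly the fallback-to-Gaussian behavior described above.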

Figure 4: Visualization of the scale selection map for σ_s of our method: (a) the SIBENIK gargoyle scene, (b) scale selection map, (c) global cross bilateral filter (MSE: 0.011718), (d) our method (MSE: 0.002148), (e) reference. We have also compared our approach to a global cross bilateral filter, which uses the same scale parameter (the largest σ_s in the filterbank) for all pixels. It is clear that the global cross bilateral filter produces large bias in the shadow areas. Our approach adapts better and uses fewer samples in these areas, thus leading to a smaller error. The sampling rate of the noisy input image is about 32 samples per pixel.


Computing SURE and selecting the per-pixel optimal filter. For each pixel, we use the minimal SURE error to determine the optimal scale for the cross bilateral filters in the filterbank. As mentioned in Section 3, to calculate SURE we need to compute dF(c_i)/dc_i for the cross bilateral filter F defined in Equation 5. We have obtained its analytic form as

\[ \frac{dF(c_i)}{dc_i} = \frac{1}{\sum_{j=1}^{n} w_{ij}} + \frac{1}{\sigma_r^2}\left(F^2(c_i) - F(c_i)^2\right), \tag{7} \]

where

\[ F^2(c_i) = \frac{\sum_{j=1}^{n} w_{ij} c_j^2}{\sum_{j=1}^{n} w_{ij}}. \tag{8} \]

The derivation is in Appendix A. We then compute SURE to estimate the MSE for each filter in the filterbank using Equation 2. For each pixel, the filter with the least SURE error is selected and its filtered color is used to update the pixel.

We have observed that computing SURE using MC samples usually leads to noisy filter selection and thus yields noisy results. This is because SURE is an unbiased estimator of MSE and has its own variance. To reduce this variance, one can either add more samples or perform filtering. For the sake of efficiency, we opted to perform filtering to reduce the variance of SURE. More concretely, we prefilter the estimated MSE image using a cross bilateral filter with fixed parameters before SURE optimization. A similar problem was encountered in the previous method [Rousselle et al. 2011]; they smoothed out the selected scales of filters to deal with the variance of their estimator.
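
Putting Equations 2, 7, and 8 together, per-pixel filter selection reduces to an argmin over the filterbank. The sketch below is our own; the stacked array layout and the `prefilter` callable (standing in for the cross bilateral prefiltering of the estimated MSE) are assumptions:

import numpy as np

def select_filters(noisy, var_y, filtered, filtered_sq, weight_sum, sigma_r,
                   prefilter=lambda e: e):
    """Pick, per pixel, the filterbank entry with minimal SURE.

    noisy, var_y          : (H, W, 3) input image and its variance
    filtered, filtered_sq : (S, H, W, 3) F(c) and F^2(c) for the S scales
                            (Eqs. 5 and 8)
    weight_sum            : (S, H, W) sum of bilateral weights per pixel
    prefilter             : smooths an (S, H, W) error stack to reduce
                            the estimator's own variance (see text)
    """
    dF = 1.0 / weight_sum[..., None] \
         + (filtered_sq - filtered ** 2) / sigma_r ** 2      # Eq. (7)
    sure = (filtered - noisy) ** 2 + 2.0 * var_y * dF - var_y  # Eq. (2)
    err = prefilter(sure.sum(axis=-1))   # sum channels, then prefilter
    best = err.argmin(axis=0)            # (H, W) index of the chosen scale
    h, w = np.indices(best.shape)
    return filtered[best, h, w], err[best, h, w]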


Figure 5: Visualizations of the sampling density of our approach (left: sampling density; right: reconstructed image).

Figure 4 shows the scale selection map and compares our SURE-based filtering with a global cross bilateral filter (with the same scale for each pixel). It is obvious that spatially-varying scale selection yields better results both visually and quantitatively.

4.3 Adaptive sampling

The MSE estimated using SURE can be taken as feedback to the renderer; the sampling density should be proportional to the estimated MSE. However, since our MSE estimation is not perfect (note that Equation 2 can be negative), a heuristic variance term is included to ensure that regions with higher variances are allocated more samples. In addition, to guide more samples to darker areas, we scale our sampling function with the squared luminance of the filtered color. This strategy was also adopted by a previous approach [Overbeck et al. 2009] because human eyes are more sensitive to error in dark regions. As a result, the sampling function for a pixel i is determined by

\[ S(i) = \frac{\mathrm{SURE}(F(c_i)) + \sigma_i^2}{I(F(c_i))^2 + \epsilon}, \tag{9} \]

where σ_i² is the variance of samples within the pixel, I(F(c_i))² is the squared luminance of the filtered pixel color, and ε is a small number used to prevent a null denominator (set to 0.001). If the current sampling budget is m, pixel i receives ⌈mS(i)/∑_j S(j)⌉ samples. Figure 5 visualizes the sampling density for two examples. It is clear that samples concentrate on areas with geometry or texture details, discontinuities, or more noise.
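
Allocation is then a direct transcription of Equation 9 and the ceiling rule above. This sketch is ours; clamping negative SURE values to zero is an assumption rather than something the paper specifies:

import numpy as np

def allocate_samples(sure_err, sample_var, luminance, m, eps=1e-3):
    """Distribute a batch of m samples according to Eq. (9).

    sure_err   : (H, W) per-pixel minimal SURE from reconstruction
    sample_var : (H, W) variance of the samples within each pixel
    luminance  : (H, W) luminance of the filtered image
    Returns the integer sample count for every pixel.
    """
    s = (np.maximum(sure_err, 0.0) + sample_var) / (luminance ** 2 + eps)
    return np.ceil(m * s / s.sum()).astype(int)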

5 Results and discussions

We implemented the algorithm on top of the PBRT2 system [Pharr and Humphreys 2010]. All results were generated on a machine with an Intel dual quad-core Xeon E5420 CPU at 2.5GHz, 32GB of RAM, and using 8 threads. As mentioned, we mainly used cross bilateral filters in the proposed SURE-based framework. In Section 5.4 we discuss results with other filters.

5.1 Parameter setting

There are a number of parameters for the features in Equation 4. They were set as σ_fk = 0.4 for normal, σ_fk = 0.125 for texture color, and σ_fk = 0.3 for depth throughout all experiments. We did not use σ_r in our current implementation since in practice we found the color term in the cross bilateral filter does not help much. We varied the spatial scale parameter σ_s to form the filterbank. We used σ_s = 1, 2, 4 to construct the filterbank in intermediate iterations and σ_s = 1, √2, 2, 2√2, 4, 4√2, 8 for the final reconstruction. We used fewer filters for intermediate phases as we found it sufficient and more efficient. Experiments show this setting strikes a good compromise between performance and quality. For the parameters used in prefiltering before the SURE computation, we set σ_s = 8 and the same σ_fk as mentioned above. In practice, results are not very sensitive to these parameters and a wide range of parameters works equally well. Although parameters can be fine-tuned for each scene, this yields only marginal improvements.

5.2 Comparisons

We applied our algorithm to rendering four scenes – SIBENIK (1024x1024), TEAPOT (800x800), SPONZA (1600x1200), and TOWN (800x600) – with a variety of effects, including global illumination, motion blur, depth of field, area lighting, and glossy reflection (Figures 6 to 9). We have also compared our method on these scenes with the following methods:

• MC: Uniform sample distribution and per-pixel box filter. This approach is used as the baseline without adaptive sampling and reconstruction.

• GEM: Adaptive sampling and reconstruction using greedy error minimization [Rousselle et al. 2011]. The results were produced by the authors' implementation on the PBRT2 system. For all scenes, we set the γ parameter in their algorithm to 0.2 as the paper suggested.

• RPF: Adaptive filtering using random parameter filtering [Sen and Darabi 2012]. We implemented their approach on the PBRT2 system. The σ² in their algorithm is set to 0.002 according to the authors' suggestion.

The number of samples for each method was carefully adjusted to make equal-time comparisons. However, since RPF consumes considerable memory and time compared to the other methods, its number of samples was limited to 8 or 16. For very complex scenes, the time for reconstruction can be negligible if taking samples is very expensive. To make fair comparisons under such situations, we also include equal-sample comparisons between RPF and our method. Finally, we also compared all methods quantitatively with the relative MSE proposed by Rousselle et al. [2011]. It is defined as the average of (y − x)²/(x² + ε), where y is the estimated pixel color, x is the pixel color in the reference image, and ε is set to 0.01 to prevent division by zero.
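
For completeness, the relative MSE metric defined above is trivial to compute; the sketch (ours) averages over all pixels and channels:

import numpy as np

def relative_mse(estimate, reference, eps=0.01):
    """Relative MSE of Rousselle et al. [2011]: the mean of
    (y - x)^2 / (x^2 + eps) over an image pair of equal shape."""
    return float(np.mean((estimate - reference) ** 2 / (reference ** 2 + eps)))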

Figure 6 compares these algorithms on SIBENIK, a scene with global illumination and depth of field. The image produced by MC retains considerable high-frequency noise even in simple areas such as the floor. GEM eliminates floor noise, while at the same time oversmoothing the area with textures due to its use of isotropic filters. Note that, although its relative MSE seems good, GEM tends to yield oversmoothed images. RPF produces a slightly sharper image than GEM but it is still oversmoothed, especially where there are depth-of-field effects. Our approach produces an image with much less noise while faithfully preserving textures.

The TEAPOT scene (Figure 7) demonstrates a challenging case with very high-frequency bump mapping and glossy reflections. None of the four methods preserve the bump map on the floor well. Again, MC produces a very noisy image. It is also worth noting that RPF fails to reproduce the self-reflection on the teapot. Overall, our approach still produces an image that is visually more pleasing and quantitatively more accurate than the other methods.


Figure 6: A comparison on the SIBENIK scene with global illumination and depth of field. MC: 44 spp (140s), relative MSE 0.029946; GEM: 39.86 spp (135s), 0.002070; RPF: 8 spp (363s), 0.006103; Our (8 spp): 8 spp (64.2s), 0.003100; Our: 26.69 spp (140s), 0.001489; reference: 4096 spp. GEM adapts poorly to the texture on the floor and produces oversmoothed results. RPF detects a high dependency between the u-v parameters and the color, thus filtering the area heavily and also producing oversmoothed results. The noise in the RPF image comes from the sampling approximation of the bilateral filter.

Figure 7: The TEAPOT scene with a glossy teapot; the floor contains complex texture and bump maps. MC: 35 spp (42s), relative MSE 0.199485; GEM: 23.96 spp (44.3s), 0.171002; RPF: 8 spp (374.4s), 0.233701; Our: 8 spp (40.4s), 0.143123; reference: 4096 spp. All methods oversmooth the floor. RPF also oversmooths the glossy self-reflection of the teapot indicated by the arrow.

The SPONZA scene in Figure 8 contains motion blur effects. As shown in the first row of insets, the anisotropic pattern produced by the motion of the wing is more vivid in our result than in the others. In addition, our approach more faithfully preserves the textures on the floor and the curtains.

Finally, the TOWN scene shown in Figure 9 was designed to test environment lighting, area lights, and motion blur. The scene is also challenging due to the heavy occlusion between the buildings and skyscrapers. Despite its competitive relative MSE, GEM fails to reconstruct all the textures in the scene, which are preserved well in our results. RPF, on the other hand, produces a very noisy image. This could be related to the sampling procedure in their bilateral filtering computation. Our approach outperforms the others by producing less noise and crisper details.

5.3 Discussions

GEM performs adaptive sampling and selects per-pixel filters in an attempt to minimize MSE. From the results, it does achieve lower relative errors compared to MC and RPF (and comparable to our approach). However, as mentioned, GEM is limited to symmetric filters and does not adapt well to high-frequency textures and detailed scene features. In all our test scenes, the results produced by GEM exhibit obviously oversmoothed artifacts. In addition, the GEM adaptive sampling criterion tends to send very few rays to regions where most of the samples carry null radiance (for example, the right pillar of SPONZA in Figure 8). Our approach significantly alleviates these problems by using cross bilateral filters and prefiltering the MSE before SURE optimization.

RPF adjusts the weights of cross bilateral filters by using mutual information and adapts well to scene features in most cases. It also removes the noise produced by few samples when rendering depth-of-field or motion blur effects. However, its multi-pass reconstruction algorithm can produce slightly oversmoothed results, such as the texture on the floor in SIBENIK (Figure 6), the missing shadows in SPONZA (Figure 8), and the glossy reflection on the teapot in TEAPOT (Figure 7). Another severe limitation of this approach is that the mutual information must be computed at the sample level, making the computation inefficient in both performance and memory consumption. To render one high-quality image at 1920x1080 full HD resolution with 64 samples per pixel, it takes up to 13 GB to store the samples (108 bytes per sample as described in the paper). Finally, RPF is designed for reconstruction and has no feedback mechanism to the renderer for adaptive sampling.


Figure 8: Comparisons on the complex SPONZA scene with global illumination and motion blur. MC: 68 spp, 890.5s, relative MSE 0.133096; GEM: 63.84 spp, 906.2s, 0.017605; RPF: 16 spp, 1676.1s, 0.031972; Our (16 spp): 16 spp, 273.3s, 0.020549; Our: 63.24 spp, 896s, 0.012097; reference: 8192 spp. The image on the top is our result. Insets show that GEM does not preserve details with symmetric filters, while RPF tends to oversmooth the shadows.

Our method does away with the limitations of both GEM and RPF. On one hand, we adopt SURE to estimate the error of an arbitrary reconstruction kernel. This allows us to optimize over a discrete set of cross bilateral filters for each pixel and determine the optimal sample distribution. On the other, we propose a memory-friendly method to detect noisy geometric features when rendering depth of field and motion blur. As a result, our method successfully eliminates MC noise for a wide range of effects while preserving high-frequency textures and fine geometry details.

5.4 Other filters

To demonstrate the flexibility of the proposed framework with respect to different filters, in addition to cross bilateral filters we have also experimented with isotropic Gaussian filters and cross non-local means filters. For isotropic Gaussians, we compare the results with GEM [2011], which is specifically designed for optimizing over an isotropic Gaussian filterbank. To be fair, we filter the SURE-estimated MSE using an isotropic Gaussian filter without using scene feature information. As shown in Figure 10, the results of both methods are comparable and the scale selection maps are similar. This means that our SURE optimization is comparable to the specifically-designed GEM for the isotropic case.

Figure 9: Comparisons on the TOWN scene with an environment light, an area light, and heavy occlusion. MC: 82 spp, 59.9s, relative MSE 0.029797; GEM: 51.82 spp, 61.8s, 0.018352; RPF: 8 spp, 272.4s, 0.057937; Our (8 spp): 8 spp, 20s, 0.034708; Our: 39.79 spp, 60.9s, 0.018023; reference: 4096 spp. GEM fails to adapt to textures, and RPF does not obtain enough samples to reconstruct the scene within the given time. Also, RPF contains heavy noise due to its sampling bilateral filtering approach. Our method adaptively samples the dark noisy area and preserves details well.

The non-local means filter [Buades et al. 2005] is a popular method for image denoising. It assigns filter weights based on the similarity between pixel neighborhoods. In the context of rendering, we can further utilize scene features for better results. Thus, the cross non-local means filter assigns the weight w_ij between two pixels i and j as

\[ w_{ij} = \exp\!\left(-\frac{\sum_{n \in N} \|c_{i+n} - c_{j+n}\|^2}{2|N|\sigma_r^2}\right) \prod_{k=1}^{m} \exp\!\left(-\frac{D(f_{ik}, f_{jk})^2}{2\sigma_{f_k}^2}\right), \tag{10} \]

where N is the neighbourhood (N = {(x, y) | −2 ≤ x, y ≤ 2} in our implementation). Other symbols are the same as defined in Section 4.2. Note that we use the patch-based distance only for color information, since a patch-based distance for scene features tends to smooth out features. The filtered pixel color ĉ_i of pixel i is computed as the weighted combination of the colors c_j of all neighboring pixels j within a 41 × 41 neighborhood.
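
The color term of Equation 10 compares the 5x5 patches around the two pixels; below is a small sketch of it (ours, with clipped image borders as a simplification):

import numpy as np

def nlm_color_weight(color, i, j, sigma_r, half=2):
    """Color term of the cross non-local means weight, Eq. (10).

    color : (H, W, 3) image; i, j : (row, col) pixel coordinates.
    Compares the (2*half+1)^2 patches around i and j; N = {-2..2}^2.
    The feature terms of Eq. (10) would multiply into the result.
    """
    h, w, _ = color.shape
    def patch(p):
        ys = np.clip(np.arange(p[0] - half, p[0] + half + 1), 0, h - 1)
        xs = np.clip(np.arange(p[1] - half, p[1] + half + 1), 0, w - 1)
        return color[np.ix_(ys, xs)]
    size = (2 * half + 1) ** 2                        # |N| = 25
    d2 = ((patch(i) - patch(j)) ** 2).sum()
    return np.exp(-d2 / (2.0 * size * sigma_r ** 2))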

Figure 10: The proposed SURE-based framework incorporated with an isotropic Gaussian filterbank and compared with GEM [Rousselle et al. 2011]. GEM: MSE 0.000287; Our: MSE 0.000356. The results and scale selection maps generated by both methods are similar.

To demonstrate the utility of SURE-based filter selection, we applied cross non-local means filters in two settings. For the first setting – the global cross non-local means filter – we used the same range parameter σ_r across the whole image. For the second one – the SURE cross non-local means filter – we constructed a cross non-local means filterbank by varying σ_r and used SURE to select the best filters and guide sampling. Figure 11 shows the comparisons between the two. It is clear that the SURE-based framework significantly alleviates the over-smoothness problem of the global filter, especially in shadows and in the motion blur of the moving car. From our experiments, filtering with cross non-local means filters sometimes generated slightly better results than cross bilateral filters. However, it is about 10 times slower than the cross bilateral filter. As a compromise between quality and performance, we opted to use cross bilateral filters for most results in the paper.

5.5 Limitations

The TEAPOT scene (Figure 7) reveals a limitation of our approach. The bump-mapped floor contains a large number of very high-frequency textures. At the same time, it suffers from a large amount of MC noise due to the environment lighting and glossy reflection. With a low sample budget, our approach does not preserve all the details well. In addition, as with most reconstruction approaches, our method is susceptible to oversmoothing.

6 Conclusion and Future Work

We have presented an efficient adaptive sampling and reconstruction algorithm for reducing noise in Monte Carlo rendering by using Stein's Unbiased Risk Estimator (SURE) in the error estimation framework. For reconstruction, the use of SURE enables us to measure the reconstruction quality for arbitrary filter kernels. It does away with the limitation of using only symmetric kernels imposed by previous work. This freedom to use non-symmetric kernels significantly improves the effectiveness of the framework. When performing adaptive sampling, SURE can be used to determine the sampling density. Another contribution of this paper is an efficient and memory-friendly approach to detect noisy geometric features when rendering depth of field and motion blur. As a result, the proposed adaptive sampling and reconstruction method efficiently eliminates MC noise while preserving the vivid details of a scene. Experiments show the proposed method offers significant improvement over state-of-the-art approaches.

Figure 11: Comparison of cross non-local means filters without and with the SURE-based framework. Global cross non-local means: MSE 0.085032; SURE cross non-local means: MSE 0.012010. Compared to the global cross non-local means filter, our SURE-based optimization largely alleviates the oversmoothing problem. The sampling rate of the noisy input image is about 41 samples per pixel.

One possible future direction is to implement the proposed algorithms on GPUs for interactive applications. We would also like to extend the SURE-based framework to animation rendering. In the current algorithm, as there is no built-in mechanism specifically designed for temporal data, temporal coherence cannot be guaranteed. In practice, we have experimented with a naive approach that renders each frame independently. The results look good enough, with only very subtle temporal flickering. However, a better way to handle animation would be to consider temporal samples and perform filtering in the spatio-temporal domain. Finally, it would also be interesting to adapt SURE to other rendering applications that require error estimation.

A Derivatives for filters

To compute SURE for reconstruction filters, we must calculate their derivatives dF(c_i)/dc_i and substitute them into Equation 2. For the cross bilateral filter, from Equation 5, we have (note that w_ii = 1 according to the definition in Equation 4)

\[ F(c_i) = \frac{\sum_{j=1}^{n} w_{ij} c_j}{\sum_{j=1}^{n} w_{ij}} = \frac{\sum_{j \neq i} w_{ij} c_j + c_i}{\sum_{j=1}^{n} w_{ij}}. \tag{11} \]

Let W_i = ∑_{j=1}^{n} w_ij. After applying the quotient rule of derivatives, we have

\[ \frac{dF(c_i)}{dc_i} = \frac{\left(\frac{d\sum_{j \neq i} w_{ij} c_j}{dc_i} + 1\right) - \frac{dW_i}{dc_i} F(c_i)}{W_i}. \tag{12} \]

After substituting \( \frac{dw_{ij}}{dc_i} = \frac{(c_j - c_i)}{\sigma_r^2} w_{ij} \) into Equation 12, and after some manipulations, we obtain

\[ \frac{dF(c_i)}{dc_i} = \frac{1}{W_i} + \frac{1}{\sigma_r^2}\left(F^2(c_i) - F(c_i)^2\right), \tag{13} \]

where

\[ F^2(c_i) = \frac{\sum_{j=1}^{n} w_{ij} c_j^2}{\sum_{j=1}^{n} w_{ij}}. \]


For the cross non-local means filter, dF(c_i)/dc_i is

\[ \frac{dF(c_i)}{dc_i} = \frac{1}{W_i} + \frac{1}{|N|\sigma_r^2}\left(F^2(c_i) - F(c_i)^2\right) + \frac{1}{|N|\sigma_r^2 W_i} \sum_{n \in N} w_{i,i-n}\,(c_i - c_{i+n})\left(F(c_i) - c_{i-n}\right), \tag{14} \]

where W_i and F²(c_i) are defined similarly as above. Similar derivations can be found in Van De Ville and Kocher's paper [2009].
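
A quick way to gain confidence in Equation 13 is a finite-difference check on a toy 1-D signal; the sketch below is our own verification harness, not part of the paper:

import numpy as np

def check_bilateral_derivative(sigma_r=0.3, n=7, eps=1e-5, seed=0):
    """Compare the analytic derivative of Eq. (13) against a central
    finite difference for a toy range-only bilateral filter (w_ii = 1)."""
    c = np.random.default_rng(seed).random(n)

    def filt(c0):
        # Weights depend on the center value c0; the center weight is 1.
        wgt = np.exp(-((c - c0) ** 2) / (2 * sigma_r ** 2))
        wgt[0] = 1.0
        vals = c.copy()
        vals[0] = c0
        return (wgt * vals).sum() / wgt.sum(), wgt, vals

    f0, wgt, vals = filt(c[0])
    W = wgt.sum()
    f2 = (wgt * vals ** 2).sum() / W                    # Eq. (8)
    analytic = 1.0 / W + (f2 - f0 ** 2) / sigma_r ** 2  # Eq. (13)
    numeric = (filt(c[0] + eps)[0] - filt(c[0] - eps)[0]) / (2 * eps)
    assert abs(analytic - numeric) < 1e-5
    return analytic, numeric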

Acknowledgements. We would like to thank the anonymous reviewers for their valuable comments and the creators of the models used in this paper: airplane (Pedro Caparros), Ferrari (RenderHere), buildings, street cones (UnityDevelopment), western buildings (Tippy), Big Bens (Skipper25), street lamps (jotijoti), downloaded from ShareCG.com; elevated walk (dogbite1066) from ShareAEC.com; plants (Tiago Crisostomo) from Google Sketchup; Crytek Sponza (Marko Dabrovic and Crytek GmbH), Sibenik (Marko Dabrovic and Mihovil Odak), gargoyle (INRIA) via the AIM@SHAPE repository; Toasters scene (Andrew Kensler) via the Utah 3D Animation Repository. This work was partly supported by grants NTU101R7609-5 and NSC101-2628-E-002-031-MY3.

References

BALA, K., WALTER, B., AND GREENBERG, D. P. 2003. Combining edges and points for interactive high-quality rendering. ACM Trans. Graph. (Proceedings of SIGGRAPH 2003) 22, 3, 631–640.

BAUSZAT, P., EISEMANN, M., AND MAGNOR, M. 2011. Guided image filtering for interactive high-quality global illumination. Computer Graphics Forum (Proceedings of EGSR 2011) 30, 4, 1361–1368.

BLU, T., AND LUISIER, F. 2007. The SURE-LET approach to image denoising. IEEE Transactions on Image Processing 16, 11, 2778–2786.

BUADES, A., COLL, B., AND MOREL, J.-M. 2005. A non-local algorithm for image denoising. In Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR 2005), 60–65.

CHEN, J., WANG, B., WANG, Y., OVERBECK, R. S., YONG, J.-H., AND WANG, W. 2011. Efficient depth-of-field rendering with adaptive sampling and multiscale reconstruction. Computer Graphics Forum 30, 6, 1667–1680.

DAMMERTZ, H., SEWTZ, D., HANIKA, J., AND LENSCH, H. P. A. 2010. Edge-avoiding À-Trous wavelet transform for fast global illumination filtering. In Proceedings of the Conference on High Performance Graphics (HPG 2010), 67–75.

DONOHO, D., AND JOHNSTONE, I. M. 1995. Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association 90, 1200–1224.

EGAN, K., TSENG, Y.-T., HOLZSCHUCH, N., DURAND, F., AND RAMAMOORTHI, R. 2009. Frequency analysis and sheared reconstruction for rendering motion blur. ACM Trans. Graph. (Proceedings of SIGGRAPH 2009) 28, 3, Article 93.

EGAN, K., HECHT, F., DURAND, F., AND RAMAMOORTHI, R. 2011. Frequency analysis and sheared filtering for shadow light fields of complex occluders. ACM Trans. Graph. 30, 2, Article 9.

FIORIO, C. V. 2004. Confidence intervals for kernel density estimation. Stata Journal 4, 2, 168–179.

HACHISUKA, T., JAROSZ, W., WEISTROFFER, R. P., DALE, K., HUMPHREYS, G., ZWICKER, M., AND JENSEN, H. W. 2008. Multidimensional adaptive sampling and reconstruction for ray tracing. ACM Trans. Graph. (Proceedings of SIGGRAPH 2008) 27, 3, Article 33.

HACHISUKA, T., JAROSZ, W., AND JENSEN, H. W. 2010. A progressive error estimation framework for photon density estimation. ACM Trans. Graph. (Proceedings of SIGGRAPH Asia 2010) 29, 6, Article 144.

LEHTINEN, J., AILA, T., CHEN, J., LAINE, S., AND DURAND, F. 2011. Temporal light field reconstruction for rendering distribution effects. ACM Trans. Graph. (Proceedings of SIGGRAPH 2011) 30, 4, Article 55.

LEHTINEN, J., AILA, T., LAINE, S., AND DURAND, F. 2012. Reconstructing the indirect light field for global illumination. ACM Trans. Graph. (Proceedings of SIGGRAPH 2012) 31, 4, 51:1–51:10.

MITCHELL, D. P. 1987. Generating antialiased images at low sampling densities. In Proceedings of SIGGRAPH 1987, 65–72.

MITCHELL, D. P. 1991. Spectrally optimal sampling for distribution ray tracing. In Proceedings of SIGGRAPH 1991, 157–164.

OVERBECK, R. S., DONNER, C., AND RAMAMOORTHI, R. 2009. Adaptive wavelet rendering. ACM Trans. Graph. (Proceedings of SIGGRAPH Asia 2009) 28, 5, Article 140.

PHARR, M., AND HUMPHREYS, G. 2010. Physically Based Rendering: From Theory To Implementation, 2nd ed. Morgan Kaufmann Publishers Inc.

ROUSSELLE, F., KNAUS, C., AND ZWICKER, M. 2011. Adaptive sampling and reconstruction using greedy error minimization. ACM Trans. Graph. (Proceedings of SIGGRAPH Asia 2011) 30, 6, Article 159.

SEGOVIA, B., IEHL, J. C., MITANCHEY, R., AND PEROCHE, B. 2006. Non-interleaved deferred shading of interleaved sample patterns. In Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, 53–60.

SEN, P., AND DARABI, S. 2012. On filtering the noise from the random parameters in Monte Carlo rendering. ACM Trans. Graph. 31, 3, Article 18.

SHIRLEY, P., AILA, T., COHEN, J., ENDERTON, E., LAINE, S., LUEBKE, D., AND MCGUIRE, M. 2011. A local image reconstruction algorithm for stochastic rendering. In Proceedings of Symposium on Interactive 3D Graphics and Games, 9–14.

SOLER, C., SUBR, K., DURAND, F., HOLZSCHUCH, N., AND SILLION, F. 2009. Fourier depth of field. ACM Trans. Graph. 28, 2, Article 18.

STEIN, C. M. 1981. Estimation of the mean of a multivariate normal distribution. Annals of Statistics 9, 6, 1135–1151.

TAMSTORF, R., AND JENSEN, H. W. 1997. Adaptive sampling and bias estimation in path tracing. In Proceedings of Eurographics Rendering Workshop, 285–295.

VAN DE VILLE, D., AND KOCHER, M. 2009. SURE-based non-local means. IEEE Signal Processing Letters 16, 11, 973–976.

XU, R., AND PATTANAIK, S. N. 2005. A novel Monte Carlo noise reduction operator. IEEE Computer Graphics and Applications 25, 2, 31–35.