
Denoising Low Light Images Nitish Padmanaban, Geet Sethi, Paroma Varma

{nit,gsethi,paroma}@stanford.edu

Introduction

Camera technology has been steadily advancing, but Poisson noise in low-light images remains a fundamental problem. It is exacerbated by the small sensors of near-ubiquitous cellphone cameras. Current denoising techniques often focus on Gaussian noise, require multiple images of the same scene, or are simply too computationally expensive for the capture process. We attempt to solve this with a linear regression approach, which can be implemented efficiently with convolution.

Learning Approach

Dataset

We start with images from the Berkeley Segmentation Data Set and process them to simulate low-light photographs. Using the Image Systems Engineering Toolbox, we convert each image from RGB to photon counts, adjust the color temperature to match natural light, scale down the photon count, and add Poisson noise before converting back to RGB. We use 96 images for training and 57 for testing.
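The actual simulation runs through ISET in MATLAB; as a rough illustration only, the core scale-down-and-add-Poisson-noise step can be sketched in NumPy. The photon level here is an arbitrary illustrative value, and the color-temperature adjustment is omitted:

```python
import numpy as np

def simulate_low_light(img_rgb, peak_photons=50, rng=None):
    """Simplified stand-in for the ISET pipeline: treat each RGB value
    as proportional to a photon count, scale it down to a low-light
    level, apply Poisson (shot) noise, and rescale back to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.clip(img_rgb.astype(np.float64), 0.0, 1.0)
    photons = img * peak_photons   # scale down the photon count
    noisy = rng.poisson(photons)   # shot noise is Poisson in photon units
    return np.clip(noisy / peak_photons, 0.0, 1.0)

# Example: a mid-gray image at a ~50-photon peak shows visible shot noise.
clean = np.full((8, 8, 3), 0.5)
noisy = simulate_low_light(clean, peak_photons=50, rng=np.random.default_rng(0))
```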

Features

Features are generated from a 13×13 pixel region: the pixel values themselves, plus the horizontal and vertical differences between adjacent pixels, for each of the three RGB channels. That gives (13×13 + 13×12 + 12×13) × 3 = 1443 total features.
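The feature count above can be checked with a minimal sketch (the function name is ours, not from the poster):

```python
import numpy as np

def patch_features(patch):
    """Features for one 13x13x3 patch: raw pixel values plus horizontal
    and vertical differences between adjacent pixels, per RGB channel."""
    assert patch.shape == (13, 13, 3)
    feats = []
    for c in range(3):
        p = patch[:, :, c]
        feats.append(p.ravel())                   # 13*13 = 169 pixel values
        feats.append(np.diff(p, axis=1).ravel())  # 13*12 horizontal diffs
        feats.append(np.diff(p, axis=0).ravel())  # 12*13 vertical diffs
    return np.concatenate(feats)

x = patch_features(np.zeros((13, 13, 3)))
assert x.shape == (1443,)   # (169 + 156 + 156) * 3
```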

1×1 Linear Regression

In the first model, we use linear regression on the 13×13 region to predict the value of the center pixel (three RGB values). For a given 13×13 region, if we let x⁽ⁱ⁾ be the features for the region and y⁽ⁱ⁾ be the true value of the center pixel, then we learn θ using batch gradient descent on the cost function:

J(θ) = (1/m) Σᵢ₌₁ᵐ ‖θ x⁽ⁱ⁾ − y⁽ⁱ⁾‖₂²,   y⁽ⁱ⁾ ∈ ℝ³, x⁽ⁱ⁾ ∈ ℝ¹⁴⁴³, θ ∈ ℝ³ˣ¹⁴⁴³   (1)
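Batch gradient descent on this cost can be sketched as follows. The learning rate and iteration count are illustrative only; the poster does not give them:

```python
import numpy as np

def fit_theta(X, Y, lr=1e-3, iters=500):
    """Batch gradient descent on J(theta) = (1/m) sum_i ||theta x_i - y_i||^2.
    X: (m, n) feature matrix, Y: (m, k) targets, theta: (k, n)."""
    m, n = X.shape
    k = Y.shape[1]
    theta = np.zeros((k, n))
    for _ in range(iters):
        resid = X @ theta.T - Y           # (m, k) prediction errors
        grad = (2.0 / m) * resid.T @ X    # (k, n) gradient of the cost
        theta -= lr * grad
    return theta
```

For the 1×1 model, k = 3 and n = 1443; for the 3×3 model below, k = 27.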

3×3 Linear Regression

The second model is similar to the first except that it predicts the center 3×3 patch, for a total of 27 RGB values. The cost function is unchanged except for the dimensions of the variables:

y⁽ⁱ⁾ ∈ ℝ²⁷, x⁽ⁱ⁾ ∈ ℝ¹⁴⁴³, θ ∈ ℝ²⁷ˣ¹⁴⁴³

Baselines

As a machine learning baseline, we retrain a commonly used CNN on our dataset to account for Poisson noise. We also use averaging and a bilateral filter (edge-aware averaging) to compare against more traditional denoising methods.
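For intuition, single-channel versions of the two traditional baselines can be sketched directly in NumPy. The window radius and bilateral sigmas here are illustrative; the poster does not give the parameters it used:

```python
import numpy as np

def box_blur(img, radius=2):
    """Plain averaging baseline: mean over a (2r+1)^2 window (edge padding)."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode='edge')
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def bilateral(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Edge-aware averaging: weights fall off with both spatial distance
    and intensity difference, so strong edges survive the smoothing."""
    h, w = img.shape
    padded = np.pad(img, radius, mode='edge')
    out = np.zeros((h, w), dtype=np.float64)
    norm = np.zeros((h, w), dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            w_spatial = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
            w_range = np.exp(-((shifted - img) ** 2) / (2 * sigma_r ** 2))
            out += w_spatial * w_range * shifted
            norm += w_spatial * w_range
    return out / norm
```

On a step edge, the box blur smears intensity across the boundary while the bilateral filter leaves it nearly intact, which is the trade-off compared in the results.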

Method      PSNR (dB)   VSNR
Linear 3×3  44.72       0.0236
Linear 1×1  44.08       0.0218
CNN         43.40       0.0165
Bilateral   43.01       0.0141
Smooth      41.83       0.0094
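PSNR follows directly from the MSE and can be sketched for images in [0, 1]. (VSNR requires a perceptual model, per Farrell et al., and is not sketched here.)

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((reference.astype(np.float64)
                   - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```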


[Figure 2]
Figure 2. A sample set of images from the test set. The first column is the input noisy image, the second is this image artificially brightened for easier viewing, and the third column is the brightened denoised output for the 3×3 linear regression.

Table 1. Training and test errors.

            Training    Test
Linear 3×3  1.32×10⁻³   3.1×10⁻³
Linear 1×1  1.64×10⁻³   3.3×10⁻³
CNN*        0.621       1.032

*Different error metric from the others, which are MSE.

color channel’s filter does depend on information from the other two colors. This suggests that training for one RGB pixel instead of each color channel separately does improve denoising.

The output image is generated in non-overlapping 1×1 or 3×3 patches using the learned θ.
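A sketch of this non-overlapping tiling for the 3×3 model, assuming a feature function like the one described under Features. The `featurize` argument and the zeroed border handling are our own simplifications:

```python
import numpy as np

def denoise(noisy, theta, featurize, patch=13, out=3):
    """Slide a patch x patch window so that its out x out center tiles the
    image in non-overlapping steps; each center is predicted as
    theta @ features. Borders without full context are left at zero."""
    h, w, _ = noisy.shape
    result = np.zeros((h, w, 3), dtype=np.float64)
    margin = (patch - out) // 2   # context pixels on each side of the center
    for i in range(margin, h - margin - out + 1, out):
        for j in range(margin, w - margin - out + 1, out):
            window = noisy[i - margin:i - margin + patch,
                           j - margin:j - margin + patch]
            pred = theta @ featurize(window)        # 27 RGB values
            result[i:i + out, j:j + out] = pred.reshape(out, out, 3)
    return result
```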

5. Results

We ran the linear regression described in section 4 for both the 1×1 and 3×3 cases. The training errors for these, as well as for a commonly used CNN denoising model (?), can be found in Table 1. The errors for our models are computed according to Equation 1. The CNN was re-trained on our dataset to account for Poisson instead of Gaussian noise, as well as to more directly compare the algorithms.

[Figure 3: Patches A, B, and C compared across Noise-free, Noisy, Linear 3x3, Linear 1x1, CNN, Bilateral, and Averaging]
Figure 3. The various methods trade off between spatial and chromatic blur, residual chromatic noise, and residual noise in intensity. Our 3×3 method achieves comparable chromatic blur to the CNN with smaller spatial blur. The more standard smoothing and bilateral approaches have less noise in the intensities, but have residual chromatic noise.

We include comparisons of PSNR and VSNR (a perceptually accurate measure). These are more relevant than the training and test errors (also included), which are on small patches. We also include patches from a selected image for qualitative comparison.


Discussion

Our regression does surprisingly well at maintaining edges while reducing noise. The edge awareness, judging by the values of the learned θ parameter, comes from the horizontal and vertical difference features. It performs better than the CNN, but this is likely due to the small training dataset, which hurts the CNN more than our approach. The qualitative analysis shows that the ML-based approaches removed chromatic noise while keeping noise in the intensity, the opposite of the traditional approaches.

Future Work

First steps would include adding perceptually and physically linear features, such as pixel values and differences in the Lab and XYZ color spaces. The input and target patch sizes could also be optimized, which may require a larger dataset.
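As one illustration of such a feature, linear sRGB values can be mapped to XYZ through the standard D65 matrix. This assumes the pixel values are already linear (no gamma), and the helper function is hypothetical, not part of the poster's pipeline:

```python
import numpy as np

# Standard linear sRGB -> CIE XYZ matrix (D65 white point).
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])

def xyz_features(patch_rgb):
    """Append XYZ pixel values as extra, physically linear features."""
    xyz = patch_rgb.reshape(-1, 3) @ RGB_TO_XYZ.T
    return np.concatenate([patch_rgb.ravel(), xyz.ravel()])
```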

It would also be informative to compare this against state-of-the-art, but computationally expensive, denoising techniques such as BM3D, and with the CNN on a larger training set.

[Figure: visualization of the learned 1x1 θ, alongside an exaggerated version]

References

Farrell, Joyce, Okincha, Mike, Parmar, Manu, and Wandell, Brian. Using visible snr (vsnr) to compare the image quality of pixel binning and digital resizing. In IS&T/SPIE Electronic Imaging, pp. 75370C–75370C. International Society for Optics and Photonics, 2010.

Hunt, R.W.G. The Reproduction of Colour. The Wiley-IS&T Series in Imaging Science and Technology. Wiley, 2005. ISBN 9780470024263. URL https://books.google.com/books?id=Cd_FVeuO10gC.

Hyvärinen, Aapo, Hoyer, Patrik, and Oja, Erkki. Image denoising by sparse code shrinkage. In Intelligent Signal Processing. Citeseer, 1999.

Jin, Fu, Fieguth, Paul, Winger, Lowell, and Jernigan, Edward. Adaptive wiener filtering of noisy images and image sequences. In Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on, volume 3, pp. III–349. IEEE, 2003.

Kingsbury, Nick. Complex wavelets for shift invariant analysis and filtering of signals. Applied and computational harmonic analysis, 10(3):234–253, 2001.

Martin, D., Fowlkes, C., Tal, D., and Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int’l Conf. Computer Vision, volume 2, pp. 416–423, July 2001.

Ren, Jimmy SJ and Xu, Li. On vectorization of deep convolutional neural networks for vision tasks. arXiv preprint arXiv:1501.07338, 2015.

Wandell, Brian and Farrell, Joyce. Image Systems Engineering Toolbox (ISET). URL http://www.imageval.com/aboutiset/.