
Color Transfer and its Applications

Arvind Nayak1, Subhasis Chaudhuri2, and Shilpa Inamdar2

1 ERP Joint Research Institute in Signal and Image Processing, School of Engineering and Physical Sciences, Heriot-Watt University, Riccarton, Edinburgh – EH14 4AS, UK. [email protected]

2 Vision and Image Processing Laboratory, Department of Electrical Engineering, Indian Institute of Technology-Bombay, Powai, Mumbai – 400 076, INDIA. {sc, shilpa}@ee.iitb.ac.in

Varying illumination conditions result in images of the same scene differing widely in color and contrast. Accommodating such images is a problem ubiquitous in machine vision systems. A general approach is to map colors (or features extracted from colors within some pixel neighborhood) from a source image to those in some target image acquired under canonical conditions. This article reports two different methods, one neural network-based and the other multidimensional probability density function matching-based, developed to address the problem. We explain the problem, discuss the issues related to color correction and show the results of such an effort for specific applications.

1 Introduction

Every machine vision system is built with some underlying constraints, and as long as these are satisfied it works beautifully well. Assumptions such as good image contrast and illumination conditions that do not change significantly are often implicit. In a controlled environment, for example, automated mounting of electronic components on a printed circuit board, where the direction, intensity and wavelength of the illumination are well known in advance, it is feasible to assume that the illumination conditions do not change. Consider another example, a task widely required in image-based applications: tracking features through image sequences. Human gait analysis, gesture-based telerobotic applications, video surveillance, observing ground activities from aerial images and automatic pointing of space telescopes for studying objects in space are a few examples of the outdoor environment. Images acquired during many such applications suffer from drastic variations in color and contrast due to large variations in illumination conditions. A general approach for correcting colors or improving color contrast is to map colors


from a source image to those in some target image acquired under canonical3 conditions. Simply put, this means transforming a source image, taken under unknown illumination conditions, into an image that appears as if the original scene were illuminated by the canonical illumination conditions. Sometimes, the illumination conditions in the source and target images could be such that the mapping is non-linear. The complexity of this problem can be appreciated through the following equation (reproduced here from [6] for brevity), given for the color signal at the sensor. The camera output is affected by surface reflectance and illumination. For the red channel we have:

$$R(x, y) = \int E(\lambda)\, S(x, y, \lambda)\, C_R(\lambda)\, d\lambda, \qquad (1)$$

where $C_R(\lambda)$ is the spectral sensitivity of the camera's red channel (similar equations hold for the green and blue channels G(x, y) and B(x, y), respectively), $E(\lambda)$ is the spectrum of the incident illumination and $S(x, y, \lambda)$ is the spectral reflectance of the surface. The spectral sensitivities of the camera for the three channels (R, G and B) and the spectral reflectance of the surface at (x, y) are the properties which remain constant. Even though the equation appears straightforward, it is impossible to differentiate between the contributions of S and E without any additional information or a scene model [5]. The aim therefore is to correct colors using simple techniques, bypassing the difficult process of estimating S and E.
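As a concrete illustration of Equation 1 (our own sketch, not code from the chapter), the integral can be discretized as a sum over sampled wavelengths; all spectra below are illustrative placeholders:

```python
import numpy as np

# Discretize Equation 1 as a Riemann sum over sampled wavelengths.
# The spectra here are illustrative placeholders, not measured data.
wavelengths = np.linspace(400.0, 700.0, 31)        # nm, 10 nm steps
d_lambda = wavelengths[1] - wavelengths[0]

E = np.ones_like(wavelengths)                      # illuminant spectrum E(lambda), flat here
S = np.exp(-((wavelengths - 620.0) / 60.0) ** 2)   # reflectance S(x, y, lambda) at one pixel
C_R = np.exp(-((wavelengths - 600.0) / 50.0) ** 2) # red-channel sensitivity C_R(lambda)

# Sensor response for the red channel at this pixel; the green and blue
# responses follow by substituting C_G and C_B.
R = np.sum(E * S * C_R) * d_lambda
print("R(x, y) =", R)
```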

This article presents two different methods for color correction. First, we show it is possible to use a relatively simple neural network to learn the mapping (build a discrete lookup table) of colors between an image acquired under vision-unfriendly illumination conditions and an image acquired under some canonical illumination conditions. Second, we show a multidimensional probability density function (pdf) matching scheme that does the work very well by not only correcting colors but also preserving the statistics within the images. Such a technique can also be used to correct multispectral images obtained by remote sensing satellites. We test our techniques on color and gray level imaging scenarios, and in situations with registered and unregistered image pairs4.

For the first method (hereafter referred to as the NN method), this article discusses an approach based on a multi-layer feed-forward network using a back-propagation learning rule. The principal application domain is skin color-based hand tracking. The trajectory information obtained through tracking can be used for recognition of dynamic hand gestures [13]. It is well known that any skin color-based tracking system must be robust enough to accommodate varying illumination conditions that may occur while performing gestures.

3 It is an ambient illumination condition under which the machine vision system is known to perform well.

4 The source and the target images used for training are of dissimilar types, i.e., they do not have a pixel-to-pixel correspondence.


Some robustness may be achieved in cases where the skin color distribution has undergone narrow changes, through the use of luminance-invariant color spaces [14, 20, 22]. But under poor lighting conditions, merely switching to a different color space will not be of any help. One solution is to correct image colors, by appropriately modifying the lookup table, before passing them to the tracking algorithm. The neural network learns the mapping between two illumination conditions with the help of one or more pairs of color palettes and then applies this transformation to every pixel in the successive frames of the image sequence. The initial experiments for training the neural network were performed with computer-generated standard color palettes (see Section 3) and later, an automatically extracted palm region alone was used as a color palette for training the neural network (see Subsection 4.1). Finally, we use the Condensation algorithm [9] for tracking and verify the results of color transfer. The reason behind using this algorithm is its capability for robust tracking of agile motion in clutter.

We extend the NN method (hereafter referred to as the NNx method) and use a cluster-based strategy for contrast restoration5 in arbitrary gray level images (by arbitrary we mean situations where registration between source and target image pairs is not available or possible). The image is first segmented through a fuzzy c-means clustering technique and correspondences are established for segments in source and target image pairs. The neural network then estimates the color transfer function for pixels using their local histograms in the source and the target images.

The second method (hereafter referred to as the ND method) discusses N-dimensional pdf matching which takes into account the correlation among the various spectral channels. Here N is the number of spectral channels in an image. Although for comparison with the NN method we use N = 3, i.e., the R, G and B channels of a color image, the method can easily be extended to N > 3. This is usually the case with multispectral remote sensing images.

The organization of the article is as follows. In Section 2 we very briefly review some of the current approaches to color correction. The proposed NN method for color correction is discussed in Section 3, followed by skin color-based hand tracking (the principal application domain for the NN method) in Section 4. NNx is discussed in Section 5 and ND in Section 6. Results of various experiments using NN, NNx and ND are shown in the respective sections. Finally, Section 7 provides a brief summary.

5 The restoration of contrast is one of the main techniques by which the illumination of a scene may appear to have been corrected. True illumination correction in the context of computer vision, however, should take into account the directions of light sources and surface properties. Such techniques usually require more computation and also prior information about the illuminants, which is not feasible in many applications.


2 Related work

Equation 1 showed that the colors apparent to an imaging system depend on three factors: the light source illuminating the object, the physical properties of the object and the spectral response of the imaging sensor. A small change in any of these factors can bring a drastic change in image colors. This can seriously affect the operation of a machine vision system designed to work only under some specific conditions. One solution is to correct colors in the images before passing them to the machine vision system. Barnard et al. [2] define the goal of computational color constancy as accounting for the effect of the illuminant, either by directly mapping the image to a standardized illuminant-invariant representation, or by determining a description of the illuminant which can be used for subsequent color correction of the image.

Several algorithms for color constancy have been suggested; the most promising among them fall under the following categories: gray world methods, gamut mapping methods, color by correlation, neural net-based methods and algorithms based on the Retinex theory. An extensive comparison of these can be found in [2, 3]. Estimating the illuminant for color constancy is an underdetermined problem and additional constraints must be added to make the solution feasible. Many papers [4, 5, 11, 12, 23] advocate neural network-based methods as they do not make any explicit assumptions and are capable of modeling non-linearities existing in the mapping function.

Choice of color space is another issue in color correction algorithms. Reinhard et al. [18] argue that the lαβ color space, developed by Ruderman et al. [19], minimizes correlation among the color channels of many natural scenes; the low correlation between the axes in lαβ space means different operations on different channels are possible without having to worry about cross-channel artifacts. Their algorithm converts the RGB space to lαβ space, works in that space, and converts back to RGB at the final step. RGB still remains the most popular color space because of its use in display and sensor devices. A comparison of color correction results using different color spaces can be found in [18, 24].

Most color transfer techniques mentioned before are sensitive to the size of matching color clusters in the source and the target images. They produce visually unpleasant colors if there are not enough matching regions or if a certain region occupies a relatively larger fraction in one image than in the other. Pitie et al. [17] develop a non-parametric method that is effective in matching arbitrary distributions. Our work presented in Section 6 is an extension of theirs.

Apart from correcting colors for image processing [16, 21] or for improving the performance of machine vision systems, color transfer techniques have also been employed for region-based artistic recoloring of images [8]. Here the target image is recolored according to the color scheme of a source image. Other applications include color and gray level correction of photos printed in photo printing labs [11] and digital projection environments [23].


3 Color correction for similar images


Fig. 1. The multi-layer feed-forward neural network used to learn the mapping between colors in source and target image pairs.

The first stage in this process involves iteratively estimating the mapping (updating a lookup table) between colors in the source and the target images, and the second stage corrects the image pixel-wise, based on the lookup table. Neural networks are capable of learning non-linear mappings that approximate any continuous function with any required accuracy. The neural network learns the mapping between two illumination conditions with the help of one or more pairs of color palettes. Each pair of color palettes used for training has one image of the palette observed under unknown illumination conditions, which we refer to as the realworld palette, and another image of the same palette observed under canonical lighting conditions, which we refer to as the canonical palette.

We suggest a multi-layer feed-forward network with the back-propagation learning rule for learning the mapping between the realworld palette and the canonical palette. This is illustrated in Figure 1. The weight adjustment is done with the sigmoid as the activation function. Each layer consists of nodes which receive their inputs from nodes in the layer directly below and send their outputs to nodes in the layer directly above. The network is thus fully connected, but there are no connections within a layer. Empirical studies have shown that two hidden layers, with 20 nodes in each, yield acceptable results in learning the mapping between the palettes at a moderate computational cost. The input nodes are merely fan-out nodes; no processing takes place in these nodes. We study the effect of training the neural network by using standard color palettes.

During our initial experimentation we captured images of computer-generated standard color palettes in the desired illumination condition, prior to the start of a gesture (this comes as an application discussed in Section 4). These



Fig. 2. (a), (b) and (c): Standard color palettes as observed under canonical illumination conditions.


Fig. 3. (a), (b) and (c): Standard color palettes as observed under arbitrary illumination conditions.


Fig. 4. Results obtained after applying color correction to the palettes in Figures 3(a), (b) and (c), respectively. The target images in this case were Figures 2(a), (b) and (c), respectively.

palettes train the neural network for the mapping between the realworld palette and the canonical palette. Figure 2(a) shows a computer-generated canonical palette for 16 colors (colors are repeated and placed arbitrarily). Figure 2(b) shows a canonical palette within the skin color range (this comes as an application). The palette is generated in such a manner that it covers the entire range of skin color in the Y CbCr color space. Similarly, Figure 2(c) shows a palette obtained with the whole gray scale range arranged in



Fig. 5. Illustration of color correction for the gesture dataset. (a) Input image, (b) color corrected (a) using weights generated from standard color palette based training.

ascending order from 0 to 255 for 8-bit levels. Figures 3(a), (b) and (c) show the corresponding realworld palettes observed under an arbitrary illumination condition.

The realworld palette and the canonical palette form a pair for each set used to train the neural network. The sizes of the palettes in the image plane are properly scaled to make both palettes equal in size. The R, G and B values for each pixel in the training palettes are also scaled to the range 0–1. The goal of the neural network is then to estimate the weights for mapping the scaled RGB values of each pixel at the input nodes to the scaled RGB values of the corresponding pixel in the canonical palette(s).

When a learning process is initiated, in the first phase, the scaled RGB values of pixels chosen randomly from the realworld palette image are fed into the network. The weights are initially assigned small random values and the activation values are propagated to the output nodes. The actual network output is then compared with the scaled RGB value, i.e., the desired value, of the pixel at the corresponding position in the canonical palette. We usually end up with an error at each of the output nodes. The aim is to reduce these errors. The second phase involves a backward pass through the network during which the error signal is passed to each unit and the appropriate weight changes are calculated.

The network, having two hidden layers with 20 nodes in each layer, was trained by feeding the matched pixel values multiple times, in random order. The purpose of using a fairly large number of weights is to make the histogram transfer function smooth for all color planes. Although there is no guarantee that the learnt mapping will be monotonic in each band, we did not experience any noticeable difficulty during experimentation. The establishment of correspondences between pixels from the source and target palettes is quite simple: we detect the top-left and bottom-right corners of the palettes, then resize and align them to obtain pixel-wise correspondences. The number of iterations (or equivalently, the number of times pixels are fed into the network) depends on both the size of the image and the color variations within the image.




Fig. 6. Color correction using registered source and target images in the living room dataset. (a) Source image 1, (b) source image 2, (c) target image, (d) color corrected (a), and (e) color corrected (b).

Empirical studies show that increasing the number of iterations beyond 20 times the number of image pixels does not appreciably reduce the error. The training is then terminated and the weights thus obtained are used to transform successive frames in the image sequence. The expected output of the color correction algorithm is the realworld image transformed to appear as if the scene had been illuminated by the canonical lighting conditions.
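As a minimal sketch of this training recipe — not the authors' implementation — the following substitutes scikit-learn's MLPRegressor for the hand-written back-propagation network, keeping the two hidden layers of 20 sigmoid (logistic) nodes and the scaling of pixels to 0–1; the epoch budget is a rough stand-in for the "20 passes over the pixels" heuristic:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def learn_color_mapping(realworld, canonical, epochs=20):
    """Learn an RGB -> RGB mapping between two registered palette images
    (uint8 arrays of identical shape (H, W, 3))."""
    src = realworld.reshape(-1, 3) / 255.0   # scale inputs to [0, 1]
    tgt = canonical.reshape(-1, 3) / 255.0   # scale desired outputs to [0, 1]
    net = MLPRegressor(hidden_layer_sizes=(20, 20), activation='logistic',
                       solver='adam', max_iter=epochs, shuffle=True)
    net.fit(src, tgt)                        # back-propagation on random pixel order
    return net

def correct_frame(net, frame):
    """Apply the learnt mapping pixel-wise to a subsequent frame."""
    out = net.predict(frame.reshape(-1, 3) / 255.0)
    return (np.clip(out, 0.0, 1.0) * 255.0).astype(np.uint8).reshape(frame.shape)
```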

We present typical results in Figures 4 and 5. It can be observed that the illumination conditions of Figures 4(a), (b) and (c) are transformed to nearly match the illumination conditions of Figures 2(a), (b) and (c), respectively. Figure 5(a), an image from a gesture sequence, was taken under the same illumination conditions as Figure 3(a), and is used as a test image. Figure 5(b) shows the result when the weights obtained by training the neural network with Figure 3(a) and Figure 2(a) as the source and target image pair, respectively, were used for color correction.

The previous set of experiments makes use of the availability of standard color palettes. In the absence of these, one may use any registered pair of images taken under two different lighting conditions to learn the mapping function, which can subsequently be used to transfer the color of any test image accordingly. However, the learning of the mapping function depends on the richness of the color tones in the training data set.

Figure 6 illustrates the above on a living room dataset. Two color images of similar contents acquired under two different poor illumination conditions, shown in Figures 6(a) and (b), are used as source images. Figure 6(c) is the target image with the desired illumination. Figures 6(d) and (e) show



Fig. 7. 3-D scatter plots for the living room dataset. The x-axis, y-axis and z-axis represent the R, G and B color triplets, respectively. Scatter plots for the source, target and color corrected images are shown in a single plot: red markers represent the source image, green the target image, and blue the color corrected image. (a) Scatter plots for Figures 6(a), (c) and (d). (b) Scatter plots for Figures 6(b), (c) and (e).


Fig. 8. Color correction in synthetically color distorted images of paintings. (a) Source image (synthetically color distorted (b)). (b) Target image (original). (c) Test image (synthetically color distorted (e) with conditions similar to those used for (a)). (d) Color corrected (c). (e) Original test image before synthetic color distortion, shown here for side-by-side visual comparison with (d). Image source: http://www.renaissance-gallery.net/.

color corrected Figures 6(a) and (b), respectively. The color contrast is very similar to that of Figure 6(c). Figure 7 shows 3-D scatter plots for the living room dataset. The x-axis, y-axis and z-axis represent the R, G and B color triplets, respectively. For visual comparison, the scatter plots for the source, target and color corrected images are combined in a single plot. Red markers represent the scatter plot for the source image, green for the target image, and blue for the color corrected image. Figures 7(a) and (b) show two such combined 3-D scatter plots for tests on the source images shown in Figures 6(a) and (b). In Figure 7(a), notice the small cluster of red points; this is for the source




Fig. 9. Color correction in synthetically color distorted satellite images. (a) Source image (synthetically color distorted (b)). (b) Target image (original). (c) Test image (synthetically color distorted (e) with conditions similar to those used for (a)). (d) Color corrected (c). (e) Original test image before synthetic color distortion, shown here for side-by-side visual comparison with (d). Image source: http://www.wikimapia.org/.

image. The larger green cluster is for the target image; notice the blue cluster, which represents the corrected image. It is stretched to limits similar to those of the green cluster and can be seen to overlap very well with the target distribution.

Figure 8 shows another set of experimental results, on synthetically color distorted images of paintings. Original images of two paintings downloaded from an Internet source (http://www.renaissance-gallery.net/) are shown in Figures 8(b) and (e). Both original images were distorted in color with similar parameters. The color distorted images are shown in Figures 8(a) and (c), respectively. For training, Figures 8(a) and 8(b) were used as the source and target images, respectively. Figure 8(c) was then used as a test input and the color corrected image is shown in Figure 8(d). Notice the colors in Figure 8(d), which now have the characteristics of Figure 8(b). It is interesting to note the color of the yellow fruit lying at the base of the flower-vase. After color correction it is changed to near white, while the original color was yellow. This is because the network failed to learn the lookup table for the yellow region due to its absence during training. Figure 9 shows results



Fig. 10. Contrast transfer in gray scale images. (a) Poor and (b) high contrast images obtained using a mobile phone camera (Nokia 3650). (c) Test input and (d) contrast restored image with the NN method.

for synthetically color distorted satellite images. Satellite images of a portion of Ratnagiri city, in Maharashtra, India, were downloaded from Wikimapia (http://www.wikimapia.org). The figure caption explains further details. Notice the quality of color correction in Figure 9(d); it is almost identical to the original image, Figure 9(e).

Needless to say, the above technique is applicable to gray scale images as well. Figures 10(a) and (b) show two images of similar contents captured using a mobile phone camera, one with good contrast and the other with very poor contrast. The images were obtained by simply switching on and off some of the light bulbs in the room. The neural network is trained using this pair of images to generate the lookup table for contrast transfer. Figure 10(c) shows another poor quality image captured with the mobile phone camera. The result of contrast transfer is shown in Figure 10(d), where the details are now visible.

4 Application in skin color-based hand tracking

In most machine vision-based applications, tracking objects in video sequences is a key issue, since it supplies inputs to the recognition engine. Generally, tracking involves following a bounding box around the desired object as time progresses. Skin color-based hand tracking is our principal application domain for testing the NN method.

Figure 11 illustrates the entire scheme of the proposed automatic skin color correction and hand tracking algorithm [15]. A digital camera captures gestures under varying lighting conditions. We are required to perform the color correction at the beginning of a sequence to account for illumination changes between instances of a gesture. Similarly, one may have to update the color correction lookup table during a particular tracking sequence, as the ambient illumination may also change while a gesture is being performed. To decide whether to initiate a new training, the average intensity of the palm region from each frame is compared with the average intensity of the palm region from the frame that was used for the last training. Any change



Fig. 11. Block diagram representation of the color correction-based tracking system.

in average brightness value above some predetermined threshold triggers a fresh training process. The first few frames of an image sequence are used to automatically extract the palm region of the hand. The neural network then learns the mapping between the unknown illumination conditions and the canonical illumination conditions using this palm region and a pre-stored palm palette. The successive frames are then color-corrected using the learnt mapping and given to the Condensation (particle filter) tracker. By modeling the dynamics of the target and incorporating observations from segmented skin regions, the tracker determines the probability density function of the target's state in each frame with an improved accuracy. In the remaining part of this section we briefly describe our self-induced color correction scheme, the model for the state dynamics and the measurements for the Condensation tracker.

4.1 Self-induced color correction


Fig. 12. Results using the palm itself as a palette. (a) Realworld palm as the observed palette; (b) palette (a) enhanced in Linux's XV color editor simply for visualization, as (a) has very little contrast; (c) canonical palm palette; and (d) the palm region after applying color correction to palette (a).

Earlier we discussed the use of standard color palettes for training the neural network. Whether it is feasible to display the palette in front of the camera each time we want to capture a gesture sequence is a question that obviously arises. The answer is, definitely not! We therefore automate the training process and eliminate our previous requirement of displaying the standard color palettes every time. We use the palm region for training, which we call the self-induced training process. Since the gesture vocabulary in our studies [13] starts with a flat, outstretched palm moving from


a fixed, static position, a motion detector is used to crop out the hand region and eventually the palm region. The only constraint that we impose during these first few frames is that there is low background clutter, so that the palm can be extracted reliably. However, unlike in the case of standard color palettes, one may not have a proper pixel-wise correspondence between the palm palettes, as the palm shape itself may change. Nonetheless, we use the correspondences between pixels in a similar way as before, within the rectangles bounding the palms. We reduce the number of nodes in each hidden layer to 5, since a smaller number of matching pixels are available for training the network. Figure 12(a) shows an automatically extracted palm region from the realworld image sequence. Figure 12(c) shows a palm region under canonical illumination. Figure 12(d) shows the result of applying the learnt mapping to Figure 12(a). Since Figure 12(a) has very little contrast, its contrast was manually adjusted for visualization and comparison purposes (shown in Figure 12(b)). The results indicate that the illumination conditions in the transformed image are similar to those in Figure 12(c). Observe that the test and target images are of different sizes, which does not pose any problem during training. The neural network is good only for the skin region; the color mapping for the rest of the color space is expected to be quite inferior, as it has not been trained.

4.2 Model for State Dynamics

Since our purpose is to track rectangular windows bounding the hands, we select the co-ordinates of the center of each rectangular window (x, y) and its height (h) and width (w) as the elements of the 4-dimensional state vector $\phi_t = [x\; y\; h\; w]^T$. We model the state dynamics as a second-order auto-regressive (AR) process:

$$\phi_t = A_2 \phi_{t-2} + A_1 \phi_{t-1} + v_t, \qquad (2)$$

where $A_1$ and $A_2$ are the AR-model parameters and $v_t$ is a zero-mean, white Gaussian random vector. This is intuitively satisfying, since the state dynamics may be thought of as a two-dimensional translation and a change in size of the rectangular window surrounding the hand region. We form an augmented state vector $\Phi_t$ for each window as follows:

$$\Phi_t = \begin{pmatrix} \phi_{t-1} \\ \phi_t \end{pmatrix}. \qquad (3)$$

Thus we may rewrite Equation 2 as $\Phi_t = A\Phi_{t-1} + V_t$. The Markovian nature of the model is evident in this equation.
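In this augmented form, the prediction step of the tracker is a single matrix multiply plus noise. A minimal sketch follows; the AR parameters and noise scale are illustrative values, not those used in the chapter:

```python
import numpy as np

A1 = 1.6 * np.eye(4)    # illustrative weight on phi_{t-1}
A2 = -0.6 * np.eye(4)   # illustrative weight on phi_{t-2}
noise_std = 1.0

# Augmented 8x8 transition matrix A for Phi_t = A Phi_{t-1} + V_t, with
# Phi_t = [phi_{t-1}; phi_t]: the top block copies phi_{t-1} forward,
# the bottom block applies the AR(2) recursion of Equation 2.
A = np.block([[np.zeros((4, 4)), np.eye(4)],
              [A2, A1]])

def propagate(Phi, rng=None):
    """One prediction step for a particle's augmented state (length 8)."""
    rng = np.random.default_rng() if rng is None else rng
    V = np.concatenate([np.zeros(4), noise_std * rng.standard_normal(4)])
    return A @ Phi + V

phi0 = np.array([176.0, 144.0, 60.0, 40.0])   # [x, y, h, w]
Phi = np.concatenate([phi0, phi0])            # bootstrap with phi_{-1} = phi_0
Phi = propagate(Phi)
```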

4.3 Observation

In order to differentiate the hand region from the rest of the image we need a strong feature which is specific to the hand. It has been found that, irrespective

Page 14: Color Transfer and its Applicationssc/papers/book-chapter07.pdf · 2007-02-04 · Color Transfer and its Applications Arvind Nayak1, Subhasis Chaudhuri2, and Shilpa Inamdar2 1 ERP

14 Arvind Nayak, Subhasis Chaudhuri, and Shilpa Inamdar

of race, skin color occupies a small portion of the color space [10, 22]. As a result, skin color is a powerful cue in locating the unadorned hand.

Colors are represented as triplets, e.g., RGB values. However, to detect skin color, the effect of luminance needs to be removed. Hence, a suitable color space representation is required which expresses color independent of intensity or luminance. After an initial survey and experimentation, we chose the Y CbCr color representation scheme. The skin pixels lie in a small cluster in the Y CbCr space and the RGB to Y CbCr conversion is a linear transformation. The intensity information is contained in Y and the position in the color space is given by the Cb and Cr values.

In order to detect pixels with skin-like color, a Bayesian likelihood ratio method is used. A pixel y is classified as skin if the ratio of its probability of being skin to that of not being skin exceeds some threshold, expressed through the likelihood ratio ℓ(y):

$$\ell(y) = \frac{P(\mathrm{color}\,|\,\mathrm{skin})}{P(\mathrm{color}\,|\,\mathrm{not\ skin})} > \mathrm{threshold}. \qquad (4)$$

The likelihood functions P(color|skin) and P(color|not skin) are obtained by learning from a large number of images; portions of the images containing skin are manually segmented to obtain a histogram. It is interesting to note that, unlike in the work of Sigal [20], there is no need to change the likelihood function for skin color detection, since the pre-processing step of illumination correction keeps the ratio test unchanged.

Once each pixel y is assigned a likelihood ratio ℓ(y), a histogram of ℓ(y) is obtained. The 99th percentile of ℓ(y) is chosen as the upper threshold thU. We start from pixels having ℓ(y) > thU and form skin colored blobs around them by including pixels having ℓ(y) > thL, where the lower threshold thL is the 80th percentile of the histogram of ℓ(y). Thus, we choose pixels which have a very high probability of being skin and, starting from them, group together other connected pixels whose likelihood ratio is above the lower threshold thL to form a skin colored blob. In this process we also make use of the constraint that skin colored pixels will be connected to form a skin colored region. It should be noted that skin color detection will yield regions not only of the hands but also of the face and the neck. Apart from this, other objects, such as wooden ones, are likely to be classified as skin. Skin color regions are therefore detected only in the predicted window, which gives better results.
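A minimal sketch of this detection step (our illustration; the chroma histograms would be learnt offline from hand-segmented images, and all names here are hypothetical):

```python
import numpy as np
from scipy import ndimage

def likelihood_map(cb, cr, hist_skin, hist_notskin, eps=1e-6):
    """Per-pixel likelihood ratio of Equation 4 on the (Cb, Cr) plane.
    hist_skin, hist_notskin: 256x256 normalized histograms (placeholders);
    cb, cr: uint8 chroma planes of the frame."""
    return hist_skin[cb, cr] / (hist_notskin[cb, cr] + eps)

def skin_blobs(lmap):
    """Seed blobs at the 99th percentile of l(y) and grow them over
    connected pixels above the 80th percentile, as described in the text."""
    th_u = np.percentile(lmap, 99)
    th_l = np.percentile(lmap, 80)
    regions, _ = ndimage.label(lmap > th_l)     # connected regions above th_l
    keep = np.unique(regions[lmap > th_u])      # regions containing a seed
    return np.isin(regions, keep[keep > 0])     # boolean skin mask
```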

The observation vector at time t is given by $Z_t = [t\; l\; b\; r]^T$, where t, l, b and r correspond to the top, left, bottom and right co-ordinates of the bounding box, respectively. In measurements where more than one blob is detected, we select the measurement corresponding to the blob which has the maximum probability of being skin.


Frames 12, 25, 36 and 48.

Fig. 13. Results of the tracker in an uncluttered environment without applying any color correction. All results are given as gray tone images although the original sequence is in color.

Frames 12, 25, 36 and 48.

Fig. 14. Results of the tracker in an uncluttered environment after correcting the color of the corresponding sequence given in Figure 13. Standard color palettes have been used for training the neural network.

Frames 12, 25, 36 and 48.

Fig. 15. Results of the tracker in an uncluttered environment. Here a self-induced skin palette has been used for training the neural network.

4.4 Tracking using color correction

To test the performance of the proposed algorithm, we capture hand gestures in both uncluttered and cluttered environments with a stationary camera. Each image sequence is of different duration, with each frame being of size 352 × 288 pixels. We present results for both standard color-palette and self-induced palm-palette (where the palm itself serves as the color reference palette) based training. In the case of standard color palette based training, the person displays the color palette in front of the camera before the initiation of the gesture. For the self-induced color correction scheme, there is no such need to flash the palette, as the palm itself serves as the palette. For the back-propagation neural network we used 2 hidden layers, each with 5 nodes and


Frames 0, 22, 26 and 30.

Fig. 16. Results of the tracker in a cluttered environment, without applying the color correction.

Frames 0, 22, 26 and 30.

Fig. 17. Results obtained in a cluttered environment using the skin color palette for subsequent color transformation.

a sigmoidal activation function. After learning the weights of the neural network, the color lookup tables for the subsequent image frames are changed and the Condensation tracker is used to track the bounding box for the palm region.

For the image sequence shown in Figure 13, the input to the tracker is the realworld image sequence, without any color correction applied. We observe that the tracker gradually fails. This is due to the poor lighting conditions: the tracker falsely classifies some part of the background as skin, resulting in a larger bounding rectangle. We now train the neural network using the standard color palettes. The results shown in Figure 14 depict that the bounding box is now closer to the hand for the same video. Though the bounding box does not fit the hand region very tightly, it is definitely much better than the previous results obtained on the uncorrected realworld sequence shown in Figure 13. Since using the standard color palettes for training is infeasible in practice, an automatic color correction is then done by training the neural network using the self-induced skin palettes obtained from the initial couple of frames. The corresponding results are shown in Figure 15. Note that here too the bounding rectangle fits the hand well and the results are comparable to those obtained after training the neural network with the standard color palettes. We observe that the automatic color correction makes the images a bit more yellowish than in Figure 14, although the skin color is definitely enhanced. The reason is that, since in the latter case we restrict the training to the palm region of the hand, the color correction process



Fig. 18. (a) Target MRI image. (b) and (d) Low contrast source images. (c) and (e) Contrast restored (b) and (d), respectively, for the target contrast of (a).

tries to apply the same transformation to the whole image, which disturbs the color of the background and also affects the average intensity as compared to standard color palette based training. However, this creates no problem for the tracker, as the tracker is based on the skin color only.

Figures 16 and 17 show the corresponding results for a cluttered environment where a person is moving, with a shadow in the background, under extremely poor lighting conditions. Note that the brightness and the contrast of the images in Figure 16 have been changed for display purposes; the images are otherwise barely visible due to the poor illumination. The tracker fails within the first 30 frames when no color correction is applied. On training the neural network with the self-induced skin palette and applying successive color correction to the frames, the tracking error reduces substantially. This can be observed from the closely fitting bounding box around the hand region in Figure 17. Figure 17 also shows that the color corrected images appear noisy, with a poor quality. This is because of two reasons: the domination of CCD noise in extremely poor lighting conditions and the restricted training using only the palm region. Even though the quality of the transformation for the background region is quite poor, the improvement in the hand (skin) region is quite significant and it helps in tracking the palm.

5 Correction for dissimilar images

In the previous section we assumed that the source and the target images are nearly identical, so that after scale adjustment between the pair of images a pixel-wise correspondence can be assumed and the neural network can be trained accordingly. However, if the canonical observation involves an image which is very different from the source image, the above training procedure is no longer valid. When the source and the target images are of different types, we segment the images into an equal number of regions (clusters) and use the correspondence between regions to learn the lookup table for contrast transfer. The learning of the contrast transfer function becomes much easier, and hence we use a simplified network with only a single hidden layer of 10 nodes [7].


We use a fuzzy c-means clustering algorithm to obtain region segmentation of the images (both training and test). During training, a canonical and a realworld palette are segmented into M regions. Histograms of pixels from the segmented regions are used to train a single neural network. The following procedure is adopted to segment gray scale images into regions (a code sketch follows the steps):

Algorithm to segment images into regions

1. Each image is divided into blocks of 15×15 pixels.
2. The following features are calculated for each block:
   • the average gray level of the block;
   • the entropy of the histogram of the block;
   • the entropy of the co-occurrence matrix of the block.
3. A fuzzy c-means algorithm is applied to the 3-D feature vectors obtained from each block of the image to classify it into one of M classes6.

For color images, one can use just the values of the three channels as the features. For each of the clusters in the source image, we compute the corresponding gray level histograms. The top and bottom ten percent of the population is removed from each histogram to provide robustness against noise perturbations. Using the average gray level of a cluster as the feature, we establish correspondence between clusters in the source and target images for all M clusters. It may be noted that a cluster in an image does not necessarily represent a contiguous segment. The trimmed histograms of the corresponding clusters in the source and target images are matched in terms of the number of quantization bins and then used to train the same neural network as discussed in the previous section. The training is performed by feeding input and output gray level pairs taken randomly from any of the segments. Since the segmented regions typically have overlapping histograms, the learnt lookup table is fairly continuous, yielding visually pleasing results. The advantage of this type of color transfer is that each segment can have its own color transfer scheme that smoothly blends across the colors of different segments.
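The trimming step can be sketched as below (the ten percent fraction is from the text; the bin count is an assumption):

```python
import numpy as np

def trimmed_histogram(cluster_pixels, frac=0.10, bins=64):
    """Histogram of a cluster's gray levels with the top and bottom 10%
    of the population removed, for robustness against noise."""
    lo, hi = np.percentile(cluster_pixels, [100 * frac, 100 * (1 - frac)])
    kept = cluster_pixels[(cluster_pixels >= lo) & (cluster_pixels <= hi)]
    return np.histogram(kept, bins=bins, range=(0, 256))[0]
```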

In the previous section we used the same type of source and target images for training purposes. Now we show the results of contrast transfer when the canonical image is very different from the input image. We use contrast improvement of MRI images as the application domain. Each image is segmented into 5 clusters and the histograms of corresponding clusters are used for training a simplified neural network as discussed earlier.

Figure 18(a) shows a good quality MRI scan of the brain, and this is used as the target contrast. Figures 18(b) and (d) show low contrast MRI images. Figures 18(c) and (e) show the results of contrast enhancement with this target. One can clearly see the vertebrae in the contrast restored image shown

6 The number of clusters M is assumed to be known and is the same for both the source and target images.



Fig. 19. Contrast transfer in color images. (a) Source image, (b) target image and (c) result of contrast transfer.

in Figure 18(c), and the internal details are clear in Figure 18(e). We also demonstrate the correction for dissimilar color images, as shown in Figure 19. Figures 19(a) and (b) show the source and target image pair and Figure 19(c) shows the result of contrast transfer. Each RGB image is first converted to the lαβ color space to minimize correlation among the color channels [18]. The neural network is trained independently for the corresponding pairs of l, α and β channels. The resultant image obtained after contrast transfer is converted back to RGB. As can be seen from the images, the colors in the result exhibit more contrast and the finer details in the buildings are more visible than in the source.

6 Multidimensional pdf matching

This section discusses a novel multidimensional probability density function (pdf) matching technique for transforming the distribution of one multispectral image to match that of another. As mentioned earlier, this technique takes into account the correlation among the N spectral channels.

Let X1 and X2 denote the source and the target multispectral images, respectively, made up of N spectral channels. Let $X = (X_1, \ldots, X_N)$ be a multivariate N-dimensional random variable. The b-th component $X_b$ (b = 1, …, N) of X represents the random variable associated with the digital number of pixels in the b-th spectral band. We denote the N-dimensional pdf of the source image X1 as $p_1(X)$ and that of the target image X2 as $p_2(X)$. It is worth noting that this technique places no restrictions on the nature and model of the distributions during the pdf matching phase. Our goal is to find a transfer function that maps $p_1(X)$ into a new distribution that is as similar as possible to $p_2(X)$. For a single plane image (single dimension) this problem is much simpler [17] and is known as histogram specification or 1-D transfer. The solution is obtained by finding a monotone mapping function $T(X_b)$ such that:

$$T(X_b) = C_2^{-1}[C_1(X_b)], \qquad (5)$$


where $C_1$ and $C_2$ are the cumulative pdfs of the source and target images, respectively. Next, we present our extension of the algorithm introduced by Pitie et al. [17] to N-dimensional pdf matching.
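The 1-D transfer of Equation 5 itself takes only a few lines; this minimal version (our illustration, not code from [17]) matches one band of the source to the target through the two empirical cumulative distributions:

```python
import numpy as np

def transfer_1d(src, tgt, bins=256):
    """Monotone 1-D transfer T = C2^{-1}(C1(.)) of Equation 5.
    src, tgt: 1-D float arrays holding one spectral band each."""
    lo = min(src.min(), tgt.min())
    hi = max(src.max(), tgt.max())
    edges = np.linspace(lo, hi, bins + 1)
    c1 = np.cumsum(np.histogram(src, edges)[0]).astype(float)
    c2 = np.cumsum(np.histogram(tgt, edges)[0]).astype(float)
    c1 /= c1[-1]                            # cumulative pdf of the source
    c2 /= c2[-1]                            # cumulative pdf of the target
    u = np.interp(src, edges[1:], c1)       # C1 evaluated at each source value
    return np.interp(u, c2, edges[1:])      # inverted through C2
```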

Algorithm for N-dimensional pdf transfer

1. Select X1 and X2, the source and target data sets, respectively. Both have N components corresponding to the N spectral bands.
2. Pick a randomly generated N × N rotation matrix R (see the next subsection on how this is achieved).
3. Rotate the source and target: $X_1^r \leftarrow R X_1^{(\rho)}$ and $X_2^r \leftarrow R X_2$, where $X_1^r$ and $X_2^r$ are the rotated intensity values for the current rotation matrix R, and $X_1^{(\rho)}$ represents the image derived from the source after ρ iterations.
4. Find the marginal density functions $p_1(X_b^{(\rho)})$ and $p_2(X_b)$, b = 1, 2, …, N, by projecting $p_1(X^{(\rho)})$ and $p_2(X)$ on each of the axes. Thus, there will be N marginal density functions for each image, one for each spectral band.
5. Find the 1-D pdf transfer function for each pair of marginal density functions (corresponding to each spectral band) according to Equation 5.
6. For each pair of marginals, perform histogram matching on the rotated bands individually: $p_1(X_1^{(\rho)})$ to $p_2(X_1)$; $p_1(X_2^{(\rho)})$ to $p_2(X_2)$; …; $p_1(X_N^{(\rho)})$ to $p_2(X_N)$. At the end of this step, the source image has been modified.
7. Rotate the source image back: $X_1^{(\rho+1)} \leftarrow R^T X_1^{(\rho)}$.
8. Move to the next iteration: ρ ← ρ + 1.
9. Go to Step 2 and compute a new random rotation matrix, until convergence.

The theoretical justification for carrying out the above steps is given in [17]. In Step 9, convergence is reached when further iterations fail to change the pdf of the modified source image in the N-dimensional space. Step 2 requires that an N-dimensional rotation matrix R be generated randomly at each iteration. The algorithm to generate such a rotation matrix is discussed next.

6.1 Generation of N-dimensional rotation matrix

The generalized approach described in [1] for performing general rotations in a multidimensional Euclidean space is used to obtain the N × N rotation matrix in our case.

An N × N rotation matrix has a number of degrees of freedom equal to the number of ways of choosing 2 axes from N, i.e., $^N C_2$. As an example, for N = 6 there are 15 degrees of freedom and hence one needs 15 rotation angles (also known as Euler angles) to represent a 6-D rotation. The overall rotation matrix R, which depends only on the $^N C_2$ independent angular values, is obtained by sequentially multiplying the corresponding rotation matrices $R_i$, i = 1, 2, …, $^N C_2$. In order to guarantee that each of these $^N C_2$ rotation angles is chosen independently and uniformly distributed in [0, 2π], we propose the following procedure for generating the random matrix:

Algorithm for generating a random rotation matrix R

1. Pick $^N C_2$ angles $\theta_1, \theta_2, \ldots, \theta_{^N C_2}$ randomly, uniformly distributed in the interval [0, 2π].
2. Generate $^N C_2$ matrices $R_1, R_2, \ldots, R_{^N C_2}$ of size N × N by considering one angle at a time, each describing a rotation about an (N−2)-dimensional hyperplane. As an example, for N = 6 the matrices are constructed as follows:

$$R_1 = \begin{pmatrix} \cos\theta_1 & \sin\theta_1 & 0 & 0 & 0 & 0 \\ -\sin\theta_1 & \cos\theta_1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \quad R_2 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & \cos\theta_2 & \sin\theta_2 & 0 & 0 & 0 \\ 0 & -\sin\theta_2 & \cos\theta_2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \ldots,$$

$$R_{15} = \begin{pmatrix} \cos\theta_{15} & 0 & 0 & 0 & 0 & \sin\theta_{15} \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ -\sin\theta_{15} & 0 & 0 & 0 & 0 & \cos\theta_{15} \end{pmatrix}. \qquad (6)$$

3. Generate the final rotation matrix R of size N × N as the product of all the above $^N C_2$ matrices7:

$$R = R_1 \cdot R_2 \cdot \ldots \cdot R_{^N C_2}. \qquad (7)$$

For the special case N = 3, the rotation matrix R can be constructed by multiplying three separate rotation matrices:

$$R = R_1 \cdot R_2 \cdot R_3, \qquad (8)$$

where

$$R_1 = \begin{pmatrix} \cos\theta_1 & \sin\theta_1 & 0 \\ -\sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad R_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_2 & \sin\theta_2 \\ 0 & -\sin\theta_2 & \cos\theta_2 \end{pmatrix}, \quad R_3 = \begin{pmatrix} \cos\theta_3 & 0 & \sin\theta_3 \\ 0 & 1 & 0 \\ -\sin\theta_3 & 0 & \cos\theta_3 \end{pmatrix}. \qquad (9)$$

Here, the 3 Euler angles $\theta_1$, $\theta_2$ and $\theta_3$ should be generated randomly such that $\theta_i \in [0, 2\pi]$, i = 1, 2, 3.

7 Matrix multiplications in Equation 7 do not commute. A change in the order of the hyperplanes about which the Euler rotations are carried out will result in a totally different rotation matrix R. But this has no bearing on our application, as we are interested only in generating a set of random rotation matrices.

Fig. 20. (a) Target image (same as Figure 6(c), included here for side-by-side visual comparison). (b) and (c) Color corrected Figures 6(a) and (b), respectively.
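Putting this subsection together with the transfer loop, a minimal sketch (our own illustration, not code from the chapter): the rotation is built as the product of elementary rotations with angles drawn uniformly from [0, 2π], transfer_1d is the 1-D matching sketch given earlier, and the iteration budget of 25 reflects the convergence behavior reported below.

```python
import numpy as np
from itertools import combinations

def random_rotation(N, rng=None):
    """Product of NC2 elementary rotations with uniformly random Euler
    angles, following the procedure above."""
    rng = np.random.default_rng() if rng is None else rng
    R = np.eye(N)
    for i, j in combinations(range(N), 2):
        th = rng.uniform(0.0, 2.0 * np.pi)
        G = np.eye(N)
        G[i, i] = np.cos(th);  G[i, j] = np.sin(th)
        G[j, i] = -np.sin(th); G[j, j] = np.cos(th)
        R = R @ G
    return R

def pdf_transfer_nd(X1, X2, n_iter=25):
    """Iterative N-D pdf transfer: rotate, match each marginal with
    transfer_1d (defined earlier), rotate back.
    X1, X2: (num_pixels, N) float arrays with pixels as rows."""
    X = X1.astype(float).copy()
    for _ in range(n_iter):
        R = random_rotation(X.shape[1])
        Xr, X2r = X @ R.T, X2 @ R.T          # rotate source and target
        for b in range(X.shape[1]):          # 1-D matching per rotated band
            Xr[:, b] = transfer_1d(Xr[:, b], X2r[:, b])
        X = Xr @ R                           # rotate back (R is orthogonal)
    return X
```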

6.2 Pdf matching with similar source and target images


Fig. 21. 3-D scatter plots for the living room dataset. The labeling convention in the scatter plots remains the same as in Figure 7. (a) Scatter plots for Figure 6(a) and Figures 20(a) and (b). (b) Scatter plots for Figure 6(b) and Figures 20(a) and (c).

Figure 20 shows a test on the living room dataset: two color images of similar contents acquired under two different poor illumination conditions, shown previously in Figures 6(a) and (b). Figure 20(a) is the target image; it is the same as the one shown previously in Figure 6(c), but is included here again to facilitate side-by-side visual comparison. Figures 20(b) and (c) show color corrected Figures 6(a) and (b), respectively. Figure 21



Fig. 22. (a) Source, (b) target and (c) color correction with the ND method.

shows 3-D scatter plots for this set of results. The labeling convention in the scatter plots remains the same as in Figure 7. Notice the colors in Figures 20(b) and (c); they are almost identical to those in Figure 20(a).

For the ND method it was observed that, for the presented dataset, the results converged after around 25 iterations. The convergence of the distributions may be observed through contour plots in the sub-spaces of the global feature domain. It is also worth noting that, after de-rotation of the image, the result is expected to lie in the same range as the original image. However, during the transformation certain spurious values may be observed due to numerical inaccuracies accruing from Equation 5. To take care of these spurious values we normalized the result: the normalization used was a simple linear mapping of the range of the pixel intensities in the result to that in the target image, in each channel separately. This may introduce slight distortions for pixels having values at the limits of the dynamic range of the images. As seen clearly in the results, the ND method effectively corrects the problem due to poor illumination.

6.3 Pdf matching with dissimilar source and target image types

Figures 22(a) and (b) show a source and target image pair that are dissimilar. In the ND method it is very important that the target is well chosen, as the result takes up the color scheme of the target. As illustrated in Figure 22, the result shows more orangish pixels than there actually are. This can be attributed to the contents of the target image: in the process of matching the shapes of the histograms, the overall occurrence of various colors in the result becomes similar to that in the target. In this case again, around 25 iterations were required for convergence.

Another set of results is shown in Figure 23. Here, Figures 23(a) and (b) show the source and the target images for a carefully chosen pair of dissimilar images. The quality of the color correction is obvious in Figure 23(c). It may be noticed that the yellow fruit at the base of the flower vase has now picked up a very natural hue. Convergence is achieved in relatively fewer iterations; 15, in this particular case. Figures 24(a), (b) and (c) show contour plots in the GB subspace for the source, target and corrected images, respectively. Note that the resultant distribution in the GB subspace is much more similar to the target distribution.

Fig. 23. Color correction in synthetically color distorted images of paintings. (a) Source image (synthetically color distorted, same as Figure 8(a)). (b) A well chosen dissimilar target image (notice that the contents of the target image are mostly similar to (a)). (c) Color corrected version of (a).

Fig. 24. Contour plots in the GB subspace for (a) source, (b) target and (c) result of the ND method. The resultant distribution in the GB subspace is more similar to the target distribution.

7 Conclusions

We explained two different techniques for color correction. The first uses a neural network to learn the mapping between colors in a source image and some target image. This requires that the training image pairs have pixel-to-pixel correspondence. To provide a qualitative comparison, we showed results for images in a typical indoor environment, for paintings and for satellite images. Skin color based hand tracking is our principal machine vision application, for which the color corrected images bring improved performance. An improved version of the first technique removes the requirement of registered training image pairs, as it uses a cluster-based strategy to match regions between a source and target image pair. For this, results of experiments on contrast restoration in both MRI images and color images are shown. The second technique, which provides a further improvement over the first, works remarkably well: it not only corrects colors but also ensures that the statistical properties of the images are maintained after color correction.

References

1. A. Aguilera and R. Perez-Aguila. General n-Dimensional Rotations. In International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Czech Republic, February 2004.

2. K. Barnard, V. Cardei, and B. Funt. A Comparison of Computational Color Constancy Algorithms – Part I: Methodology and Experiments With Synthesized Data. IEEE Transactions on Image Processing, 11(9):972–984, September 2002.

3. K. Barnard, L. Martin, A. Coath, and B. Funt. A Comparison of Computational Color Constancy Algorithms – Part II: Experiments With Image Data. IEEE Transactions on Image Processing, 11(9):985–996, September 2002.

4. B. Bascle, O. Bernier, and V. Lemaire. Illumination-invariant Color Image Correction. In International Workshop on Intelligent Computing in Pattern Analysis/Synthesis, Xi'an, China, August 2006.

5. V. Cardei. A Neural Network Approach to Color Constancy. PhD thesis, Simon Fraser University, Burnaby, BC, Canada, 2000.

6. B. Funt, K. Barnard, and L. Martin. Is Machine Color Constancy Good Enough? In 5th European Conference on Computer Vision, pages 445–459, 1998.

7. A. Galinde and S. Chaudhuri. A Cluster-based Target-driven Illumination Correction Scheme for Aerial Images. In IEEE National Conference on Image Processing, Bangalore, India, March 2005.

8. G. Greenfield and D. House. A Palette-Driven Approach to Image Color Transfer. In Computational Aesthetics in Graphics, Visualization and Imaging, pages 91–99, 2005.

9. M. Isard and A. Blake. Condensation – Conditional Density Propagation for Visual Tracking. International Journal of Computer Vision, 28(1):5–28, 1998.

10. R. Kjeldsen and J. Kender. Finding Skin in Color Images. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, pages 312–317, 1996.

11. M. Kocheisen, U. Muller, and G. Troster. A Neural Network for Grey Level and Color Correction Used in Photofinishing. In IEEE International Conference on Neural Networks, pages 2166–2171, Washington, DC, June 1996.

12. H. Lee and D. Han. Implementation of Real Time Color Gamut Mapping Using Neural Networks. In IEEE Mid-Summer Workshop on Soft Computing in Industrial Applications, pages 138–141, Espoo, Finland, June 2005.

13. J. P. Mammen, S. Chaudhuri, and T. Agrawal. Hierarchical Recognition of Dynamic Hand Gestures for Telerobotic Application. IETE Journal of Research, special issue on Visual Media Processing, 48(3&4):49–61, May–August 2002.


14. J. B. Martinkauppi, M. N. Soriano, and M. H. Laaksonen. Behaviour of Skin Color under Varying Illumination Seen by Different Cameras at Different Color Spaces. In M. A. Hunt, editor, Proceedings SPIE, Machine Vision in Industrial Inspection IX, volume 4301, pages 102–113, San Jose, California, 2001.

15. A. Nayak and S. Chaudhuri. Automatic Illumination Correction for Scene Enhancement and Object Tracking. Image and Vision Computing, 24(9):949–959, September 2006.

16. B. Pham and G. Pringle. Color Correction for an Image Sequence. IEEE Computer Graphics and Applications, 15(3):38–42, 1995.

17. F. Pitie, A. Kokaram, and R. Dahyot. N-dimensional Probability Density Function Transfer and its Application to Colour Transfer. In IEEE International Conference on Computer Vision, volume 2, pages 1434–1439, 2005.

18. E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley. Color Transfer between Images. IEEE Computer Graphics and Applications, 21(5):34–41, September 2001.

19. D. Ruderman, T. Cronin, and C. Chiao. Statistics of Cone Responses to Natural Images: Implications for Visual Coding. Journal of the Optical Society of America A, 15(8):2036–2045, August 1998.

20. L. Sigal, S. Sclaroff, and V. Athitsos. Estimation and Prediction of Evolving Color Distributions for Skin Segmentation Under Varying Illumination. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 152–159, June 2000.

21. C. Wang and Y. Huang. A Novel Color Transfer Algorithm for Image Sequences. Journal of Information Science and Engineering, 20(6):1039–1056, November 2004.

22. J. Yang, W. Lu, and A. Waibel. Skin-Color Modeling and Adaptation. Technical Report CMU-CS-97-146, May 1997.

23. J. Yin and J. Cooperstock. Color Correction Methods with Applications to Digital Projection Environments. Journal of the Winter School of Computer Graphics, 12(3):499–506, February 2004.

24. M. Zhang and N. Georganas. Fast Color Correction using Principal Regions Mapping in Different Color Spaces. Journal of Real-Time Imaging, 10(1), 2004.