

Multimed Tools Appl
DOI 10.1007/s11042-013-1443-7

Multiple color space channel fusion for skin detection

Rehanullah Khan · Allan Hanbury · Julian Stöttinger · Farman Ali Khan · Amjad Ullah Khattak · Amjad Ali

© Springer Science+Business Media New York 2013

Abstract Skin detection is used in applications ranging from face detection, tracking body parts and hand gesture analysis, to retrieval and blocking of objectionable content. We investigate color based skin detection. We linearly merge different color space channels, representing it as a fusion process. The aim of fusing different color space channels is to achieve invariance against varying imaging and illumination conditions. The non-perfect correlation between the color spaces is exploited by learning weights based on an optimization for a particular color space channel using the mathematical financial model of Markowitz. The weight learning process develops a color weighted model using positive training data only. Experiments on a database of 8991 images with annotated pixel-level ground truth show that

R. Khan (B) · A. Ali
Sarhad University of Science and IT, Peshawar, Pakistan
e-mail: [email protected]

A. Ali
e-mail: [email protected]

A. Hanbury
Institute for Software Technology & Interactive Systems, TU-Wien, Vienna, Austria
e-mail: [email protected]

J. Stöttinger
Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
e-mail: [email protected]

F. A. Khan
COMSATS Institute of Information Technology, Attock Campus, Attock, Pakistan
e-mail: [email protected]

A. U. Khattak
UET, Peshawar, Pakistan
e-mail: [email protected]


the fusion of color space channels approach is well suited to stable and robust skin detection. In terms of precision and recall, the fusion approach provides a competitive performance to other state-of-the-art approaches which require negative and positive training data, with the exception of the decision tree based classifier (J48). As a real-time application, we show that the weight based color channel fusion approach can be used for learning of weights for skin detection based on detected faces in image sequences.

Keywords Skin detection · Color space fusion · Markowitz model

1 Introduction

Skin detection is a popular and useful technique for the detection and tracking of human body parts in images and video sequences [13–15]. Its applications include, but are not limited to, detecting and tracking faces, naked people detection, hand tracking and human retrieval. Skin detection has also been shown to be useful for blocking objectionable multimedia content [16, 25]. Color based skin detection is mostly favored because of its high processing speed and its invariance against rotation, partial occlusion and pose change. However, standard skin color detection techniques are greatly affected by changing lighting conditions, background colors and objects having skin-like colors. In the color based skin detection domain, decision rules are constructed that differentiate between skin and non-skin pixels given only a color triplet as input. According to [11], the major difficulties in skin color detection are caused by factors such as varying illumination, camera characteristics, ethnicity, individual properties and other factors, e.g. makeup, hairstyle, glasses, sweat and background colors. A robust skin detection approach therefore has to be stable against these artifacts.

The contribution of this paper is an approach for the selection of weights for different color space channels in order to achieve robust skin detection using only positive examples for training. This is a one-class classification problem. Based on a prior approach for color space segmentation [22, 23], we use the Markowitz model [18] of efficient portfolio selection for determining weights in a linear combination of color channels in order to effectively merge them. The Markowitz model is generic, utilizing the non-perfect correlation between portfolios in business and between data features in general. We model the aggregated data as a Gaussian process for the constitution of a classifier for skin detection. By merging different color space channels we exploit the non-perfect correlation between color space channels. The result is stable skin detection under varying viewing conditions, with a good balance between color space invariance and discriminative power. The classification principle can also be used for other generic color based object recognition tasks in image processing. As such, this work is a continuation of our previous work [12, 14, 15] in the color based skin detection area.

Experiments on a database of 8991 images with annotated pixel-level ground truth show that, in terms of precision and recall, the fusion of color space channels approach is well suited to stable and robust skin detection, providing competitive performance to other state-of-the-art approaches which require negative and positive training data, with the exception of the decision tree based classifier (J48). The fusion of color channels approach is recommended for real-time skin detection as well. Since


training is based on positive samples only and runs in real time, the approach can therefore be used for on-line learning of a skin model based on detected faces in image sequences.

As such, we summarize the contribution of our work as follows:

1. Application of the Markowitz color space selection principle [23] to the color based skin detection scenario.

2. Verification of the historic quantitative experiments regarding the choice of color spaces [11] and color channels in skin detection.

3. Outperforming the state-of-the-art skin detection approaches with a one-class linear classifier.

4. Providing a basis for contextual real-time skin detection.

The rest of the paper is organized as follows: In Section 2, we summarize the related work regarding skin detection and the usage of color spaces. Section 3 discusses Markowitz efficient portfolio selection (MPT). In Section 4, we apply the MPT theory to color based skin detection. Experimental details are given in Section 5. Section 6 concludes.

2 Related work

In computer vision, skin detection is mostly used in the pre-processing steps for face detection [9] and gesture tracking systems [1]. It has also been used in the application domain of naked people detection [5, 17] and for filtering objectionable content [25].

Generally, skin detection approaches are grouped into three types of skin modeling: parametric, non-parametric and explicit skin cluster definition methods. Parametric approaches use a Gaussian color distribution [29]. Non-parametric methods estimate the color distribution from a histogram constructed using the training data [10]. Since color histograms are stable object representations that are unaffected by occlusion and changes in view, and can differentiate a large number of objects [28], they can be used to create reliable skin classifiers if the training dataset is sufficiently large [10].

A widely used method for skin detection is the creation of static skin filters (color space thresholding). Static skin filters are used by [20], who explicitly define the boundaries of the skin clusters in a given color space. Static filters in the YCbCr and RGB color domains are reported in [4] and [19]. The main drawback of static filters is increased false detections [11]. Khan et al. [16] addressed the false detections using a multiple model approach, which makes it possible to filter out skin for multiple people with different skin tones, reducing false positives.

The choice of a color space is important for many computer vision algorithms because it induces the equivalence classes of the detection algorithms [22, 23]. Color spaces such as the HS* family transform the RGB color cube into a cylindrical coordinate representation. They are widely used in skin detection scenarios [2, 6, 7]. Perceptually normalized color spaces, for example CIELAB and CIELUV, are used for skin detection in [3]. Color spaces like YCbCr, YCgCr, YIQ, YUV and YES form independent components. YCbCr is one of the most successful color spaces for skin detection [9, 27].


A Markowitz model [18] based weighting scheme has been proposed in [22, 23] for color space channel selection and fusion based on positive samples only. We extend [23], where 12 color channels were used to learn a model from a single image and detection was shown on another image, lacking a skin decision threshold. We concentrate on post-fusion classification rules for flagging a pixel as skin or non-skin by learning a model on representative skin samples. From the linearly fused data we propose an indicator image which is used in a decision threshold for flagging a pixel as skin or non-skin. For the decision threshold we use the model obtained from the training data. For performance tuning we introduce an external parameter K. In total, seven color spaces (19 color channels) are weighted for their role in the skin detection scenario, opting for a complete skin detection system whose performance can be tuned per application. Our extensive evaluation of the proposed approach demonstrates its usefulness in skin detection. We also show that such a positive training weight based skin detection can be used for real-time skin detection, learning a model from detected faces.

Skin detection under varying illumination in videos is addressed in [21, 24, 28] by mapping the illuminance of the image into a common range. These approaches compensate for the variance of the lighting to equalize the appearance of skin color throughout scenes. The methods rely on lighting correction techniques and their ability to estimate the illuminant source. For robust skin detection under varying illumination, we instead combine different color space channels using the Markowitz model, thereby achieving invariance against varying illumination. We augment the different properties offered by different color space channels for robust skin detection under varying illumination circumstances.

3 The Markowitz model

The Markowitz modern portfolio theory (MPT) relates to investment: it tries to maximize return and minimize risk through the proportionate selection of different assets, based on a mathematical formulation of the concept of diversification in investment. The aim of a proportionate, selective collection of investments is to have lower risk than the individual assets. MPT models the returns of assets as normally distributed random variables. The risk is modeled as the standard deviation and the portfolio as the weighted combination of assets; the portfolio return is then the weighted combination of the asset returns. By combining different assets whose returns are not correlated, MPT aims to reduce the total variance of the portfolio. In business, the assets in an investment portfolio cannot be selected individually, each on its own merits. It is important to consider how each asset changes in price relative to how every other asset in the portfolio changes in price. Since investment is a trade-off between risk and return, assets with higher returns are riskier. For a given amount of risk, MPT describes how to select a portfolio with the highest possible return.

For the mathematical formulation of MPT, consider sets of observations of the same quantity, expressed in the same unit but obtained by different methods, with the only knowledge being that the probability distribution of the observations is a unimodal function. In order to combine the output of these methods to obtain the most accurate measurement of the process, the observations can be represented as [23]:

u = μ_u ± σ_u (1)


where μ_u is the mean value and σ_u is the fluctuation of the quantity u. With the mathematical model for efficient portfolio selection of Markowitz [18], in general, N different observations can be fused using the following weighting scheme:

μ = ∑_{i=1}^{N} x_i μ_i (2)

where μ_i is the average return value of a particular method i and x_i is the weight assigned to that method. μ is the total return over all the quantities involved. For weight optimization, the constraints imposed are:

∑_{i=1}^{N} x_i = 1 (3)

−1 ≤ x_i ≤ 1, i = 1, . . . , N. (4)

The Markowitz model finds the set of portfolios that provides minimum risk for all possible returns. In general, the Markowitz model involves maximizing the expected return or minimizing the variance. According to (2), the expected estimate of the quantity μ from a large set of N quantities is the weighted sum of the expected estimates of the individual quantities involved. For computing the variance of the whole set of quantities, we need the covariances between the individual quantities as well. The variance for several combined observations is

V = ∑_{i=1}^{N} x_i^2 v_i + ∑_{i=1}^{N} ∑_{j=1}^{N} x_i x_j q_ij,  i ≠ j (5)

where v_i is the variance of an individual quantity, q_ij is the covariance between the quantities i and j, and q_ij = σ_i σ_j ρ_ij. The term ρ_ij is the correlation between the quantities involved and σ is the standard deviation. The Markowitz model involves minimizing the following:

σ = √V (6)

It is governed by the constraints given in (3) and (4) for a given expected estimate, or by maximizing the expected estimate for a given standard deviation σ. Constraint (3) enforces full allocation of the resources. The search space for the solution of (6) is limited by constraint (4). The objective function in (6) is quadratic with linear constraints, and is solved by quadratic programming.

The Markowitz model weights different observations, taking into account the non-perfect correlation between the observations involved and their individual performance. For the different quantities involved, the efficient frontier (see Fig. 2) obtained with the Markowitz model provides us with mean-standard deviation pairs corresponding to different weightings. Every point on the efficient frontier corresponds to a set of weights over the total number of quantities involved. The weights on the optimal frontier give the optimal combination of the quantities involved. The


selection of the final set of weights from the infinite set of weights on the efficient frontier given by the Markowitz model is generally problem dependent. The optimal weights are those that have the highest signal-to-noise ratio, i.e. those providing the highest ratio between return and risk. This optimal set of weights, known as the risky weights, can be obtained from the optimal frontier by maximizing the objective function W,

W = μ/σ (7)

where μ represents the mean and σ the standard deviation for a particular set of weights generated.
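The weighting scheme of (2)–(7) can be sketched in a few lines of numpy. The sketch below makes a simplifying assumption not stated in the paper: if the bound constraint (4) is inactive, the risky weights maximizing W = μ/σ have the closed form x ∝ Q⁻¹μ, normalized so that the weights sum to one as required by (3). The observation data here is synthetic; a general solver would also enforce (4).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                                      # number of quantities involved
# synthetic observations: N methods measuring the same quantity
obs = rng.normal(loc=np.linspace(1.0, 2.0, N), scale=0.5, size=(1000, N))

mu = obs.mean(axis=0)                      # expected estimates, inputs to Eq. (2)
Q = np.cov(obs, rowvar=False)              # covariances q_ij used in Eq. (5)

x = np.linalg.solve(Q, mu)                 # unnormalized risky weights Q^{-1} mu
x /= x.sum()                               # enforce sum-to-one, Eq. (3)

V = x @ Q @ x                              # combined variance, Eq. (5)
W = (x @ mu) / np.sqrt(V)                  # objective of Eq. (7)
print("weights:", np.round(x, 3), " W:", round(W, 2))
```

Because the ratio μ/σ is invariant to positive rescaling of x, normalizing the solve result preserves optimality; any sum-to-one weighting, such as equal weights, attains at most this value of W.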

4 Markowitz model for color

In this section, we apply the Markowitz model to color based skin detection and explain the fusion operation.

4.1 Color and MPT

For color based detection under different imaging conditions, the choice of a color space is essential, thereby inducing equivalence classes for the detection algorithm [22, 23]. As every color space has its merits and demerits for particular viewing conditions, we aim to exploit the fusion based relative integrative correlation of color spaces via the Markowitz model. For color based feature observations related to a particular object in different color spaces, the combined data should represent the same quantity, with boosting and diminishing behavior due to the inherent properties of the color spaces. For the skin detection scenario, in (1), μ is the average value of the positive skin samples and σ is the standard deviation of that set of samples in a particular color space channel. Regarding color based features for a training/testing sample, the return values in (2) are the pixel values in a particular color space channel; for the color based training/testing scenario, it is the aggregated value of the different color space channels. For skin detection in images, where the boundary of human skin color is defined, minimizing the standard deviation in (6) will increase the concentration towards the trained color of the skin samples in a color space.

The performance of color feature detectors is based on repeatability and discriminative power. Repeatability is concerned with invariant behavior under uncontrolled viewing conditions, such as varying illumination, shading and highlights. However, there is a trade-off between repeatability and distinctiveness. For the skin detection task, which is subject to different viewing conditions, the selection of color models invariant to varying illumination should still provide discriminative power for the skin segmentation algorithm. Therefore, to weight color channels for a proper balance between color invariance and discriminative power, the skin clustering space and the correlation between the color channels have to be taken into account. For the skin detection problem, where different color space channels are the quantities involved, the efficient frontier obtained with the Markowitz model provides us with mean-standard deviation pairs, representing different weightings for the different color space channels.


4.2 Augmentation of color channels

For the training part of the fusion of multiple color channels, the positive training data is transformed into the following 19 color channels. Normalized red (nr) = R/(R + G + B) and normalized green (ng) = G/(R + G + B), where R, G and B are the red, green and blue channels of the RGB color space. The opponent color channels are red-green (RG) = R − G and yellow-blue (YB) = (2B − R + G)/4. The CIE L*a*b* channels are defined as

L* = 116 (Y/Yn)^{1/3} − 16, (8)

a* = 500 [(X/Xn)^{1/3} − (Y/Yn)^{1/3}], (9)

b* = 200 [(Y/Yn)^{1/3} − (Z/Zn)^{1/3}] (10)

where (Xn, Yn, Zn) is the white point reference. In the case of YCbCr, the following values apply:

Y = 0.299(R − G) + G + 0.114(B − G)

Cb = 0.564(B − Y) + 128

Cr = 0.713(R − Y) + 128

The hue channel of the HSV color space is represented in degrees. The saturation S is obtained using S = V − min(R, G, B), where V is defined as V = max(R, G, B). The improved Hue, Luminance and Saturation (iHLS) color space was introduced in [8]. The iHLS model improves on similar color spaces (HLS, HSI, HSV, etc.) by removing the normalization of the saturation by the brightness. The color channels of iHLS are represented as iH (in degrees) for hue, iS for saturation and iY for intensity. iH is defined as a trigonometric angle, iS as iS = max(R, G, B) − min(R, G, B) and iY as iY = 0.2125R + 0.7154G + 0.0721B.

These color models are commonly used in color image processing. They contain both variant and invariant properties with respect to imaging conditions. RGB, CIE L* and SV are sensitive to shadows, shading, illumination and highlights, while nr, ng and CIE a*b* are invariant to shadows, shading and illumination intensity [22, 23]. The opponent color components RG and YB are invariant to highlights, assuming a white light source [23]. The transformation simplicity and the explicit separation of luminance and chrominance components make YCbCr attractive for skin color modeling [26]. The unnormalized saturation of iHLS gives a better distribution in the color space.
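A subset of the channel conversions above can be sketched for a single RGB pixel as follows; a full implementation would produce all 19 channels (the hue angles and the CIE L*a*b* conversion, which needs the XYZ transform and white point, are omitted here for brevity). The saturation formula S = V − min(R, G, B) follows the unnormalized form stated above, which is why S and iS coincide.

```python
def channels(R, G, B):
    """A few of the 19 channels from Section 4.2, for one RGB pixel."""
    s = float(R + G + B) or 1.0          # guard against black (0, 0, 0)
    nr, ng = R / s, G / s                # normalized red / green
    RG = R - G                           # opponent red-green
    YB = (2 * B - R + G) / 4.0           # opponent yellow-blue
    Y = 0.299 * (R - G) + G + 0.114 * (B - G)
    Cb = 0.564 * (B - Y) + 128
    Cr = 0.713 * (R - Y) + 128
    V = max(R, G, B)                     # HSV value
    S = V - min(R, G, B)                 # unnormalized saturation, as above
    iS = max(R, G, B) - min(R, G, B)     # iHLS saturation (identical to S here)
    iY = 0.2125 * R + 0.7154 * G + 0.0721 * B
    return dict(nr=nr, ng=ng, RG=RG, YB=YB, Y=Y, Cb=Cb, Cr=Cr,
                V=V, S=S, iS=iS, iY=iY)

print(channels(200, 120, 90))
```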

4.3 Color fusion algorithm

In this section, we present the training and testing steps for the proposed approach.


4.3.1 Training

For training on skin samples, the following steps are performed:

1. Convert the training data (images) to the 19 color channels. For all color channels of the training samples, the mean of every color channel is calculated.
2. Similarly, the standard deviation of every color channel is computed.
3. Using the standard deviations and the correlations, the covariance between the color channels is calculated.
4. Using the values of the above steps, the optimal weight (w) for each color channel is calculated using the Markowitz model.
5. Using the weights from Step 4, every channel is multiplied by its corresponding weight and then all the channels are added (fusion). From this fused data, the expected value (mean) Etrain and the standard deviation σtrain are calculated. Etrain and σtrain are then used for testing.
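The training steps above can be sketched in numpy as follows, assuming `skin` already holds positive skin samples converted to the 19 channels (synthetic values here). Step 4's Markowitz optimization is stubbed with equal weights for brevity; in the actual method the weights come from the optimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
skin = rng.normal(loc=100.0, scale=10.0, size=(5000, 19))  # stand-in samples

means = skin.mean(axis=0)            # Step 1: mean of every color channel
stds = skin.std(axis=0)              # Step 2: per-channel standard deviation
cov = np.cov(skin, rowvar=False)     # Step 3: covariance between channels

w = np.full(19, 1.0 / 19)            # Step 4 (stub): Markowitz weights

fused = skin @ w                     # Step 5: weighted fusion of channels
E_train, sigma_train = fused.mean(), fused.std()
print(round(E_train, 2), round(sigma_train, 2))
```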

4.3.2 Testing

For the test image, skin detection proceeds as follows:

1. Convert the test image to the 19 color channels.
2. The weight w obtained for each of the 19 color channels in the training phase (Step 4) is multiplied with the corresponding color channel value per pixel. This constitutes the weighted color channels of the test image.
3. All 19 weighted color channels of the test image are then added per pixel, resulting in a gray valued image. Let g_i(x, y) represent the test image converted to color channel i. If w_i is the weight assigned by the Markowitz model to the corresponding color channel, the final gray valued weighted image G(x, y) is:

   G(x, y) = ∑_{i=1}^{N} w_i g_i(x, y) (11)

   where G(x, y) is the gray valued image and (x, y) are the image coordinates. N represents the total number of color channels.
4. From the gray valued image G(x, y), the indicator image I is obtained by:

   I(x, y) = |G(x, y) − Etrain| (12)

   where (x, y) are the image coordinates and Etrain is the statistical expected value of the training samples obtained from training Step 5.
5. The pixel values in the image are labeled as skin or non-skin from the indicator image according to the following decision rule:

   b(x, y) = 1 if I(x, y) < (σtrain + K), and 0 otherwise (13)

where K is a tuning parameter, σtrain is the standard deviation of the training data obtained from training Step 5 and I is the indicator image. b(x, y) is the binary image representing skin/non-skin, with pixel values set to 1 for skin and 0 for non-skin. The skin detection performance (precision and recall) is controlled through the parameter K. A value of K greater than 0 expands the skin decision boundary. A
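The test-time steps (11)–(13) vectorize naturally. In this sketch the weights and training statistics are given (equal weights stand in for the learned ones, and Etrain, σtrain reuse the values reported later in Section 5.5); the 19-channel test image is random data for illustration.

```python
import numpy as np

w = np.full(19, 1.0 / 19)                     # weights from training Step 4
E_train, sigma_train, K = 271.93, 18.41, 2.0  # training stats and tuning K

rng = np.random.default_rng(2)
g = rng.uniform(0, 255, size=(4, 4, 19))      # test image in the 19 channels

G = np.tensordot(g, w, axes=([2], [0]))       # Eq. (11): fused gray image
I = np.abs(G - E_train)                       # Eq. (12): indicator image
b = (I < sigma_train + K).astype(np.uint8)    # Eq. (13): 1 = skin, 0 = non-skin
print(b)
```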


value of K less than 0 produces tighter bounds for the skin decision. The optimal value of K can be experimentally determined, as shown in Section 5.5.

5 Experiments

In this section, the experimental evaluation is presented using an on-line available dataset.

5.1 Dataset

We use images extracted from 25 videos provided by an Internet service provider. The data set is available on-line.1 For reference, see Fig. 1. The images extracted from these 25 videos are arranged into three sets. The first set contains 5817 images, every one of which contains some skin (referred to as skin-only images). The second set contains 3174 images, all of which are without skin (referred to as non-skin images). The third set is the union of the skin-only and non-skin sets, having 8991 images (referred to as hybrid images).

5.2 Classification comparison

The fusion of color space channels technique is compared to classifiers which require negative and positive training data: the histogram approach of Jones and Rehg [10] and the approach of [14], represented as US in the current evaluation. The classifier set includes AdaBoost with decision stump, Bayesian network (BayeNet), Naive Bayes (NaiveBayes), RBF network (RBF) and J48, for skin-only, non-skin and hybrid images.

For the classifiers (AdaBoost, BayeNet, NaiveBayes, RBF and J48), the 19 color channels discussed in Section 4.2 are used as the feature vectors. The 19 color space channels are used in conjunction for the same quantity to justify the performance comparison with the color space fusion technique, which also uses 19 color channels. A representative set of 118 images is used for training the classifiers. Each of the 118 images contains both skin and non-skin pixels. For the approach of [14] (US), the same 118 images are used for constructing the model.

5.3 Mean-standard deviation space and weights

The optimal frontier calculated from the training set is illustrated in Fig. 2, and the weights assigned to the 19 color channels according to the amount of training data are given in Table 1. Figure 2 shows that both Cb and Cr have low standard deviations compared to the other color channels and are therefore assigned higher weights in Table 1. This corresponds to the literature about successful skin detection using Cb and Cr [9, 27]. We also demonstrate that these two color channels have the major effect on skin detection performance and thus high positive weights. The proposed approach therefore supports the empirical results that Cb and Cr are the preferred color

1http://disi.unitn.it/~stottinger/Data-sets.html


Fig. 1 Example frames from the annotated video data-set. (Source: [14])

channels for skin detection. The R and G channels of the RGB color space have high standard deviations in the mean-standard deviation space and are highly correlated. Therefore the weights assigned to these color channels are close to zero and their overall effect is reduced (last row of Table 1). This especially diminishes the role of the R channel for skin detection. The nr channel gets a positive weight of 0.35, whereas the effect of the ng channel is reduced by giving it a lower weight. The effect of the RG channel of the opponent color space is boosted by assigning it a higher negative weight compared to the YB channel. The B, a*, b* channels play their role in the aggregate data with their negative weights. In Fig. 2, H, S, V of HSV and iH, iS, iY of iHLS reside very close to each other in the mean-standard deviation space. Therefore, in Table 1, their weights are almost identical. Because S and iS have an almost perfect correlation, they are placed at exactly the same position in Fig. 2. Their weights are therefore identical, i.e. 0.02.

5.4 Incremental training and weights

The training data for the fusion technique is the same set of 118 images used to train the classifiers, but here only the skin pixels are used for training, as the technique requires positive samples only. The total number of positive skin pixels is 2113703. Table 1 shows the weights obtained for particular sizes of data. These weights correspond to the amount and type of training data. The training data

Fig. 2 Mean-standard deviation space and the corresponding placement of the 19 color channels in this space. The curved blue line is the efficient frontier. The efficient frontier corresponds to different weighting pairs. The weights on the efficient frontier give the optimal combination of the quantities (color channels) involved. An optimal set of weights is one which gives a high signal-to-noise ratio. The right image shows the zoomed positions of the corresponding color channels in this mean-standard deviation space for easy visualization


Table 1 Weights obtained for different color space channels corresponding to different sizes, in number of pixels, of the training sample

Training size  R     G     B      nr    ng    RG     YB     L     a*     b*     H     S     V      Y     Cb    Cr    iH    iS    iY
321000         0.00  0.04  −0.06  0.35  0.30  −0.66  −0.09  0.09  −0.25  −0.44  0.00  0.01  −0.01  0.03  0.68  0.99  0.00  0.01  0.02
642000         0.01  0.07  −0.09  0.33  0.27  −0.61  −0.13  0.06  −0.32  −0.47  0.00  0.03  −0.02  0.05  0.75  1.00  0.00  0.03  0.03
963000         0.01  0.06  −0.08  0.40  0.33  −0.71  −0.13  0.10  −0.30  −0.55  0.00  0.02  −0.02  0.05  0.76  1.00  0.00  0.02  0.03
1284000        0.02  0.05  −0.07  0.38  0.30  −0.74  −0.13  0.10  −0.31  −0.50  0.00  0.03  −0.01  0.04  0.79  1.00  0.00  0.03  0.03
1605000        0.02  0.05  −0.07  0.40  0.30  −0.75  −0.14  0.12  −0.31  −0.51  0.00  0.03  −0.01  0.04  0.78  1.00  0.00  0.03  0.03
2113703        0.01  0.08  −0.11  0.35  0.08  −0.98  −0.19  0.16  −0.26  −0.29  0.02  0.02  −0.01  0.06  1.00  1.00  0.01  0.02  0.04

The training size is the number of pixels of the training data, shown incrementally. The weights shown in the last row are used for the experiments and the skin detection examples


has an effect on the weight distribution. The last row of Table 1 shows stable weights for the representative training data selected. Since skin covers a well defined boundary in a color space, we argue that a further increase in training data will have a negligible effect on the weight distribution. On the other hand, if completely new training data is introduced which covers skin samples in different lighting conditions, the weight distribution will be affected. In our experimental setup, the weights in the last row of Table 1 are used for the experiments and for the skin detection examples.

5.5 Performance parameter

Etrain used in (12), obtained from the training data (Step 5), is 271.93 and σtrain used in (13) is 18.41. The skin detection performance (precision and recall) is controlled through the external parameter K in (13). As shown in Fig. 3, starting from negative values of K, the performance increases. The maximum F-score is achieved using K = 2; any increase beyond K = 2 decreases performance. For the optimal value of K, the skin-only set was used. All the experimental evaluation and skin detection examples are based on this value of K.
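Tuning K amounts to sweeping it over a range and keeping the value that maximizes the F-score on a validation set, which can be sketched as follows; the indicator values and ground truth below are synthetic stand-ins, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(3)
I = rng.uniform(0, 60, size=10000)            # synthetic indicator values
truth = (I + rng.normal(0, 8, I.size)) < 25   # synthetic skin ground truth
sigma_train = 18.41

best_K, best_f = None, -1.0
for K in range(-10, 11):
    pred = I < sigma_train + K                # decision rule of Eq. (13)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    f = 2.0 * tp / (2.0 * tp + fp + fn) if tp else 0.0
    if f > best_f:                            # keep K with the highest F-score
        best_K, best_f = K, f
print("best K:", best_K, " F-score:", round(best_f, 3))
```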

5.6 Skin detection in images

Figure 4 shows successful skin detection examples using the weights from the last row of Table 1. The first column of Fig. 4 shows the original images; the second column shows the indicator images. For easy visualization, the indicator images are shown in reverse. As can be seen, the skin portions are prominent in these indicator images compared to other regions. These prominent areas are closer to Etrain and more probable to be classified as skin. The third column shows the skin detected images based on the thresholding in (13). Figure 5 reports cases where skin detection fails or

Fig. 3 Skin detection performance is controlled through the parameter K. Starting from a negative value of K, the performance increases. The maximum precision and recall, and thus the highest F-Score, is achieved at K = 2. Any increase beyond K = 2 decreases performance. K = 2 is used for all the comparative experiments


Fig. 4 Skin detection based on weighting of color space channels. First column: original images. Second column: the indicator images. For easy visualization, the indicator images are shown in reverse. Third column: skin detected images (black shows non-skin)

non-skin pixels are reported as skin pixels. The fusion technique is trained only on positive images, and therefore non-skin pixels can be detected as skin pixels in these images.
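The indicator images above come from a per-pixel linear fusion of the color space channels. A minimal sketch, assuming a simple weighted sum over a channel stack (the array shapes and the uniform weights are illustrative, not the learned ones):

```python
import numpy as np

def indicator_image(channels, weights) -> np.ndarray:
    """Linearly fuse a color-space channel stack (H, W, C) with a
    weight vector (C,) into a single indicator image (H, W)."""
    channels = np.asarray(channels, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(channels, weights, axes=([2], [0]))

# Toy 2x2 image with 3 channels; only one pixel carries channel values.
chans = np.zeros((2, 2, 3))
chans[0, 0] = [90.0, 91.0, 89.0]
ind = indicator_image(chans, [1/3, 1/3, 1/3])
print(ind[0, 0])        # close to 90.0: channel values averaged by weights
reversed_view = ind.max() - ind   # the "reverse" shown for visualization
```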

5.7 Performance evaluation

The fusion of color space channels approach is compared to the histogram approach of Jones and Rehg [10], US [14], AdaBoost, BayesNet, NaiveBayes, RBF and J48. The evaluation is based on the F-measure and specificity for the skin-only images and the hybrid images. For non-skin images the F-measure is not defined: precision is zero or undefined (every detection is a false positive) and recall is undefined (there are no positive pixels). We therefore use specificity as the only evaluation measure for this set. The


Fig. 5 Skin detection scenarios where either skin is not properly detected or non-skin pixels are reported as skin

F-measure is calculated by evenly weighting precision and recall. The specificity is defined as the true negative rate.

First, we evaluate the eight approaches on the 5817 skin-only images. Figure 6 shows the F-measure and specificity for all eight approaches on the skin-only set, calculated on a per-pixel basis: we first compute the true positives, false positives, true negatives and false negatives over the whole dataset and then apply the F-measure and specificity formulas. Figure 6 shows a higher F-measure for the fusion technique compared to AdaBoost, JR, BayesNet, NaiveBayes, RBF, J48 and US. Since the fusion technique is trained only on positive data, its specificity in Fig. 6 is not as high as that of AdaBoost, the RBF network and J48, though it is greater than that of JR, NaiveBayes, BayesNet and US.
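The per-pixel evaluation described above aggregates the counts over the whole dataset before applying the formulas (micro-averaging). A small sketch with made-up counts:

```python
def micro_metrics(tp: int, fp: int, tn: int, fn: int):
    """F-measure (evenly weighted precision/recall) and specificity
    (true negative rate) from dataset-wide per-pixel counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return f_measure, specificity

# Toy counts aggregated over all images of a set.
f, s = micro_metrics(tp=80, fp=20, tn=80, fn=20)
print(round(f, 3), round(s, 3))   # 0.8 0.8
```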

Next, we compare the performance of the eight approaches on the non-skin images. If a skin detection system detects skin when applied to images containing skin, then it should detect nothing when applied to images containing no skin. For

Fig. 6 F-measure and specificity for the 5817 skin-only images. The fusion of color space channels has higher precision and recall, and thus a higher F-measure, outperforming the other approaches, which require negative and positive training data. The values reported are calculated on a per-pixel basis. (US: the universal seed approach from [14])


Fig. 7 Specificity for the 3174 non-skin images. Since the fusion technique is trained on positive data only, it has low specificity compared to the discriminative learning methods. (US: the universal seed approach from [14])

this purpose, the valid evaluation measure is specificity, the true negative rate. We compare the eight approaches based on specificity on the 3174 images. Figure 7 shows the specificity of the concerned techniques for non-skin images, calculated on a per-pixel basis. The specificity of the fusion technique (0.76) is higher than that of BayesNet (0.74) and US (0.70), and lower than that of NaiveBayes (0.78), JR (0.79), the RBF network (0.95), AdaBoost (0.88) and J48 (0.90). The comparably low specificity is due to the fact that the fusion technique is trained only on positive samples. Interestingly, the RBF network, which performs worse on skin-only images, has a higher true negative rate for the non-skin images.

The final evaluation is based on the 8991 hybrid images. This dataset contains the skin-only and non-skin images. For this set of images, we use the F-measure and specificity as the evaluation measures. Figure 8 shows the F-measure calculated on a per-pixel basis. In Fig. 8, it can be seen that the specificity of the fusion approach is higher than that of JR,

Fig. 8 F-measure and specificity for the 8991 hybrid images. For the combined skin-only and non-skin images, the fusion technique outperforms the other approaches in terms of precision and recall, with the exception of the tree based classifier (J48). The values are calculated on a per-pixel basis. (US: the universal seed approach from [14])


Fig. 9 False positives and false negatives. (US: The universal seed approach from [14])

NaiveBayes, BayesNet and US, and lower than that of AdaBoost, RBF and J48. Figure 9 shows the false positives and false negatives per total number of pixels for the hybrid set. The fusion approach provides a lower false negative rate than RBF, J48, JR, AdaBoost, NaiveBayes and US, whereas BayesNet has fewer false negatives than the fusion approach. The fusion approach has a higher false positive rate than RBF, J48 and JR, and a lower one than AdaBoost, NaiveBayes, US and BayesNet. Regarding precision and recall, Fig. 8 shows that the fusion technique has a higher F-measure. The fusion technique provides increased classification performance of almost 4 % compared to AdaBoost, 7 % to JR, 3.8 % to BayesNet, 18 % to NaiveBayes, 26 %


Fig. 10 Skin detection based on face detection using the fusion approach. The face detected in one frame can be used to learn a skin model for detecting skin in the incoming frames. First column: detected faces. Second column: original images. Third column: detected skin of the second column images based on the faces from the first column. Fourth column: original images. Fifth column: detected skin based on the faces from the first column. Black shows non-skin



Fig. 11 Skin detection based on face detection using the Jones and Rehg technique. First column: detected faces. Second column: original images. Third column: detected skin of the second column images based on the faces from the first column. Fourth column: original images. Fifth column: detected skin based on the faces from the first column. Black shows non-skin

to RBF, 1 % to US, and decreased performance of almost 4 % compared to J48. For the combined skin-only and non-skin images, the fusion technique outperforms the other approaches in terms of precision and recall, with the exception of J48.

5.8 Real-time training and detection

The fusion of color space channels approach can be used to efficiently learn weights for context-based skin detection from faces. Figure 10 shows how a color model can be learnt and used for skin detection in the incoming frames. A face detector detects a face in an image. This face region is used to learn the weights for the 19 color space components. The weights obtained are used to detect skin in the subsequent frames of a particular scene. For a comparison of the fusion approach with JR, see Fig. 11. The fusion approach provides more precise skin detection than the JR approach. This contextual approach is feasible for precise skin detection because the learning of the weights runs in real time and requires only positive data for learning the model. The positive training data is obtained from the face area returned by the face detector. The calculation of the weights takes about 6 milliseconds in Matlab; the time-consuming operation is the color space conversion. Real-time learning of the weights, and thereby real-time skin detection, can be achieved either by reducing the number of color spaces or by a binary implementation of the color space conversion.
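A rough sketch of this face-bootstrapped pipeline. The face pixels are assumed to have already been extracted by a face detector, the model is reduced to the mean and spread of fused indicator values, and all names and numbers below are illustrative:

```python
from statistics import mean, stdev

def learn_skin_model(face_indicators):
    """Learn a positive-only skin model from the pixels of a detected
    face. `face_indicators` holds per-pixel indicator values obtained by
    weighting the color space channels; the weight learning itself and
    the face detector are omitted from this sketch."""
    return mean(face_indicators), stdev(face_indicators)

def classify_frame(frame_indicators, e_train, sigma_train, k=2.0):
    """Label each pixel of a subsequent frame as skin (True) or not."""
    return [abs(v - e_train) <= k * sigma_train for v in frame_indicators]

# Toy run: indicator values from a "face" region, then a new frame.
e, s = learn_skin_model([88.0, 90.0, 92.0, 91.0, 89.0])
labels = classify_frame([90.0, 150.0], e, s)
print(labels)   # [True, False]
```

Because only the detected face supplies training data, the model adapts to the scene's illumination, which is the point of the contextual approach.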

6 Conclusion

We presented a skin detection approach based on the selection and fusion of different color space channels, which are weighted in order to achieve robustness. We use the Markowitz model of efficient portfolio selection to obtain the weights for the different color space channels. These weights are obtained from the positive training data only. By merging different color space channels, we exploit the non-perfect correlation between them. The high weights given by the Markowitz model


to the Cb and Cr channels of the YCbCr color space agree with the state-of-the-art results on successful skin detection using these two color channels. The proposed approach therefore supports the empirical finding that Cb and Cr are the preferred color channels for skin detection. The results show that the proposed scheme of merging different color space channels is well suited to the problem of color based skin detection. A robust skin detector should not only detect skin when it is present, but should detect nothing in images without skin. We account for this in our evaluation by calculating the evaluation measures on subsets, dividing the test data into skin-only, non-skin and hybrid datasets. For the skin-only dataset, the fusion approach outperforms AdaBoost, JR, BayesNet, NaiveBayes, the RBF network, J48 and US in terms of F-Score. For the non-skin data, the specificity of the proposed method is not as high as the F-measure for the skin-only data, though it outperforms BayesNet and US. This is due to the fact that the fusion approach is based on positive training data only. For the hybrid dataset, the fusion approach outperforms all other approaches except J48 on the basis of the F-measure. We also showed that the fusion of color space channels approach is well suited to real-time skin detection and can be used for on-line learning of a skin model based on detected faces.

References

1. Argyros AA, Lourakis MI (2004) Real-time tracking of multiple skin-colored objects with a possibly moving camera. In: ECCV, pp 368–379
2. Brown D, Craw I, Lewthwaite J (2001) A SOM based approach to skin detection with application in real time systems. In: BMVC'01, pp 491–500
3. Cai J, Goshtasby A (1999) Detecting human faces in color images. Image Vis Comput 18:63–75
4. Chai D, Ngan K (1998) Locating facial region of a head-and-shoulders color image. In: Int. conf. automatic face and gesture recognition, pp 124–129
5. Fleck MM, Forsyth DA, Bregler C (1996) Finding naked people. In: ECCV, pp 593–602
6. Fu Z, Yang J, Hu W, Tan T (2004) Mixture clustering using multidimensional histograms for skin detection. In: ICPR, Washington, DC, USA, pp 549–552
7. Garcia C, Tziritas G (1999) Face detection using quantized skin color regions merging and wavelet packet analysis. IEEE Trans Multimedia 1:264–277
8. Hanbury A (2003) A 3d-polar coordinate colour representation well adapted to image analysis. In: SCIA, pp 804–811
9. Hsu R, Abdel-Mottaleb M, Jain A (2002) Face detection in color images. PAMI 24:696–706
10. Jones MJ, Rehg JM (2002) Statistical color models with application to skin detection. IJCV 46:81–96
11. Kakumanu P, Makrogiannis S, Bourbakis N (2007) A survey of skin-color modeling and detection methods. PR 40:1106–1122
12. Khan R, Hanbury A, Sablatnig R, Stöttinger J, Khan F, Khan F (2012) Systematic skin segmentation: merging spatial and non-spatial data. Multimed Tools Appl 1–25
13. Khan R, Hanbury A, Stöttinger J (2010) Skin detection: a random forest approach. In: ICIP, pp 4613–4616
14. Khan R, Hanbury A, Stöttinger J (2010) Universal seed skin segmentation. In: International symposium on visual computing, pp 75–84
15. Khan R, Hanbury A, Stöttinger J, Bais A (2012) Color based skin classification. Pattern Recogn Lett 33(2):157–163
16. Khan R, Stöttinger J, Kampel M (2008) An adaptive multiple model approach for fast content-based skin detection in on-line videos. In: ACM MM, AREA workshop, pp 89–96
17. Lee JS, Kuo YM, Chung PC, Chen EL (2007) Naked image detection based on adaptive and extensible skin color model. PR 40:2261–2270
18. Markowitz H (1952) Portfolio selection. Journal of Finance 7:77–91
19. Peer P, Kovac J, Solina F (2003) Human skin colour clustering for face detection. In: EUROCON, vol 2, pp 144–148
20. Phung SL, Bouzerdoum A, Chai D (2005) Skin segmentation using color pixel classification: analysis and comparison. PAMI 27:148–154
21. Sigal L, Sclaroff S, Athitsos V (2004) Skin color-based video segmentation under time-varying illumination. PAMI 26:862–877
22. Stokman H, Gevers T (2005) Selection and fusion of color models for feature detection. In: Proceedings of the CVPR. IEEE Computer Society, Washington, DC, USA, pp 560–565
23. Stokman H, Gevers T (2007) Selection and fusion of color models for image feature detection. IEEE Trans Pattern Anal Mach Intell 29:371–381
24. Störring M, Andersen H, Granum E (2000) Estimation of the illuminant colour from human skin colour. In: IEEE International conference on automatic face and gesture recognition, pp 64–69
25. Stöttinger J, Hanbury A, Liensberger C, Khan R (2009) Skin paths for contextual flagging adult videos. In: International symposium on visual computing, pp 303–314
26. Vezhnevets V, Sazonov V, Andreev A (2003) A survey on pixel-based skin color detection techniques. In: GraphiCon, pp 85–92
27. Wong K, Lam K, Siu W (2003) A robust scheme for live detection of human faces in color images. Signal Process Image Commun 18:103–114
28. Yang J, Lu W, Waibel A (1997) Skin-color modeling and adaptation. In: ACCV, pp 687–694
29. Yang M, Ahuja N (1999) Gaussian mixture model for human skin color and its application in image and video databases. In: SPIE, pp 458–466

Rehanullah Khan graduated from the University of Engineering and Technology Peshawar with a BSc degree (Computer Engineering) in 2004 and an MSc (Information Systems) in 2006. He obtained a PhD degree (Computer Engineering) in 2011 from the Vienna University of Technology, Austria. He is currently an Associate Professor at Sarhad University of Science and Technology, Peshawar. His research interests include color interpretation, segmentation and object recognition.


Allan Hanbury is a Senior Researcher at the Information & Software Engineering Group of the Vienna University of Technology, Austria. He is scientific coordinator of the EU-funded KHRESMOI Integrated Project on biomedical information search and analysis. He was leader of the Evaluation, Integration and Standards work package of the MUSCLE EU Network of Excellence, and has led a number of Austrian national projects. His research interests include information retrieval, health information retrieval, multimodal information retrieval, and the evaluation of information retrieval results. He is author or co-author of over 60 publications in refereed journals and international conferences.

Julian Stöttinger graduated from the Vienna University of Technology with a BSc degree (media informatics) in 2004, an MSc (computer graphics and pattern recognition) in 2007 and a PhD (computer science) in 2010. He is currently a post-doc researcher at the University of Trento. His research interests include color interpretation, local features and visual learning in computer vision.


Farman Ali Khan completed his PhD degree at the Institute of Software Technology & Interactive Systems, Vienna University of Technology. His research interests include learning processes, adaptivity and personalization in learning environments.

Amjad Ullah Khattak completed his MSc at George Washington University, USA, and his PhD at UET Peshawar, Pakistan. He is currently an Associate Professor in the Electrical Engineering Department at UET Peshawar, Pakistan.


Amjad Ali received his BSc and MSc in Electrical Engineering from the University of Engineering and Technology, Peshawar, Pakistan. He received his doctoral degree from the Information and Communication Engineering Department of the Beijing University of Posts and Telecommunications, China. His research interests focus on pattern recognition, image processing and biometrics.