

Off-Line Driver’s Eye Detection: Multi-Block Local Binary Pattern Histogram Vs. Gabor Wavelets

Djamel Eddine Benrachou, Salah Bensaoula
University Badji Mokhtar Annaba

Laboratory of Automatic and Signals - Annaba
Department of Electronics

BP 12, 23000, Annaba, Algeria
Email: [email protected]

Brahim Boulebtateche, Mohamed-Mourad Lafifi
University Badji Mokhtar Annaba

Laboratory of Automatic and Signals - Annaba
Department of Electronics

BP 12, 23000, Annaba, Algeria
Email: [email protected]

Abstract—Eye detection is a complex problem that arises in several applications, such as human gaze estimation, human-robot interaction, and driver-assistance frameworks for automatic drowsiness detection. This paper presents an algorithm that detects the human eye in still gray-scale images. The proposed scheme is based on a learned statistical appearance model, which involves the extraction of features and their classification. A comprehensive comparison of two photometric feature descriptors, Gabor wavelets and the multi-block Local Binary Pattern histogram (BHLBP), is performed. Facial images are normalized, eye features are extracted with each descriptor, and a Support Vector Machine (SVM) classifier is then used to distinguish the eye class from the non-eye class. The presented schemes are built and tested on frames collected from real driving video sequences of the RobeSafe Driver Monitoring Video dataset (RS-DMV); the sequences are recorded under realistic scenarios with different subjects exposed to a real driving environment. The discriminative performance of both descriptors is reported.

Keywords—Eye detection, BHLBP, Gabor wavelets, driver drowsiness.

I. INTRODUCTION AND RELATED WORKS

Eye detection is an active and challenging research topic that has attracted growing interest over the last decades. It is still widely studied and used in numerous engineering applications, including driver-drowsiness monitoring frameworks and gaze-direction estimation for human-robot interaction. Detecting the ocular region plays a crucial role in face normalization and thus facilitates the localization of other facial features. Some facial expressions are clearly distinguished through the eye state, and driver fatigue has been shown to correlate strongly with the PERCLOS measure [1]. These characteristics are increasingly used in safety applications to detect situations such as sleepiness and lack of attention while driving [2], [3], [4] or while operating hazardous machinery.

In automotive safety frameworks the driver's somnolence state is an important issue, as there are few direct measures for detecting this phenomenon; most of them rely on observed symptoms gathered from the person's behavior while driving. Recently, González-Ortega et al. [4] proposed a real-time vision-based eye-state detection framework to monitor the driver's alertness; it locates the eyes and then recognizes their states. The approach is implemented with a consumer-grade computer and a Universal Serial Bus camera with passive illumination. Lately, Benrachou et al. [5] proposed an eye localization framework that extends the conventional Local Binary Pattern (LBP) to the pyramid transform domain, encoding the eye at 3 levels of the image pyramid with different LBP variants. For the decision step, two well-established classifiers, Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP), were used for binary classification between eye and non-eye classes. In the same fashion, Benrachou et al. [6] presented an algorithm for on-line eye detection in real-world conditions; their framework uses a uniform LBP histogram to describe the eye patches and a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to estimate the eye position on the running video frames. Nanni et al. [7] presented an eye classification module based on multi-resolution LTP and LPQ descriptors. Zheng and Wang [8] proposed a revised projection algorithm, called locally selective projection (LSP), to determine the eye position. Sun and Ma [9] improved the LSP scheme with an algorithm called LBP+SVM mode; this technique can both localize the eyes and recognize their state. In their framework the projection-based algorithm is avoided, because it is sensitive to low image quality and the projection calculation tends to fail in low gray-scale regions, even when those regions are highly likely to contain the true eye position.

In fact, there exist commercial eye localization systems that perform well under relatively controlled environmental conditions but tend to fail under variations caused by factors such as facial expressions, partial occlusion, pose variation, and lighting conditions. Several eye localization techniques have emerged; they are broadly categorized into three main types: methods based on measuring eye characteristics [10], methods exploiting structural information [11], [12], and methods using a learned statistical appearance model [8], [9].

In this paper, the proposed algorithm is based on a conventional learned statistical appearance model. Three stages are involved: i) preprocessing (noise removal, illumination correction, etc.); ii) the extraction of useful visual features of the eyes; and iii) their classification. The advantage of this type of method

International Conference on Automatic control, Telecommunications and Signals (ICATS15)University BADJI Mokhtar - Annaba - Algeria - November 16-18, 2015


compared to the aforementioned ones is that it provides richer and more reliable information about the eye for the subsequent classification. However, the employed feature descriptors must have some essential properties: identifiability of the object of interest, rotation and scale invariance, and robustness against environmental illumination conditions and lighting changes. In agreement with these requirements, Local Binary Patterns have proven their effectiveness for texture description: they are naturally tolerant to monotonic gray-scale changes and lighting variations, simple to compute, and discriminative. To extract useful visual features from the photometric appearance of eyes, we adopt the spatially enhanced Local Binary Pattern histogram (BHLBP) to represent the micro- and macro-structure of eye image patches. The analysis of the discriminative patterns is done with an advanced machine learning approach, the Support Vector Machine (SVM). Thus, our appearance-based eye framework behaves robustly even under challenging imaging conditions (pose, illumination changes, facial expressions). In our experiments, the discriminative performance of BHLBP is compared with that of Gabor wavelet descriptors.

The remainder of this paper is organized as follows. Section II details the proposed approach; Section III presents the multi-block LBP histogram descriptor in detail and briefly introduces conventional Gabor wavelets; Section IV describes the dataset; Section V presents the implementation of our experiments; and finally Section VI contains conclusions, remarks, and future work.

II. METHODOLOGY

In this section, the performance of an extended LBP-based approach for eye detection is analyzed on gray-scale facial images. Experiments are conducted on the RobeSafe Driver Monitoring Video (RS-DMV) database [13] to show the effectiveness of this LBP extension (BHLBP). The discriminative performance of BHLBP is compared against Gabor wavelets (GW) for eye description.

The global architecture of the proposed eye detection system is given in Fig. 1. For a given test image, the ocular region is described and segregated from the other facial features (nose, mouth, eyebrows, etc.). The search for the eye location is carried out by applying the associated detector over a test image I of size (Nx × Ny) pixels, i.e., (300 × 163) pixels for the outdoor videos and (300 × 103) pixels for the indoor video frames. The image I is normalized into the range (0, 1) and fully scanned with the chosen image descriptor by shifting a sub-window of size (27 × 18) pixels over the entire image. The algorithm needs a training image set, which was manually collected for this purpose. Positive (eye) and negative (non-eye) samples used for training and testing are illustrated in Fig. 2 (a) and (b), respectively. The eye detection results are illustrated in Fig. 5.
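The scanning procedure above (normalize into (0, 1), then slide a 27 × 18 window over the image) can be sketched as follows. Here `extract_features` and `classify` are hypothetical stand-ins for the descriptor (BHLBP or Gabor) and the trained SVM, and the stride value is an assumption, since the paper does not state it.

```python
import numpy as np

WIN_H, WIN_W = 18, 27   # sub-window size from the paper: 27x18 pixels (w x h)
STEP = 3                # scan stride (assumed; not stated in the paper)

def normalize(img):
    """Scale a gray-scale image into the (0, 1) range, as described above."""
    img = img.astype(np.float64)
    rng = img.max() - img.min()
    return (img - img.min()) / rng if rng > 0 else np.zeros_like(img)

def scan_for_eyes(img, extract_features, classify):
    """Slide a 27x18 window over the normalized image and collect the
    positions whose feature vector the classifier labels as 'eye'."""
    img = normalize(img)
    hits = []
    for y in range(0, img.shape[0] - WIN_H + 1, STEP):
        for x in range(0, img.shape[1] - WIN_W + 1, STEP):
            patch = img[y:y + WIN_H, x:x + WIN_W]
            if classify(extract_features(patch)) == 1:  # 1 = eye class
                hits.append((x, y))
    return hits
```

The window positions returned by the scan can then be post-processed (e.g., grouped) to yield the detections shown in Fig. 5.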

III. FEATURE EXTRACTION

The feature extraction step transforms the input raw data into meaningful information. This reduces the decision space, which may accelerate processing.


Fig. 1. The overall architecture of the proposed eye detection system: (a) input image; (b) delimitation of the ocular region by omitting the non-eye elements in the image (the white region is an eye, the black region is not); (c) the detected eyes.

A. Global shape descriptor

Gabor wavelets have been successfully applied to a variety of machine vision applications, such as texture segmentation, edge detection, and boundary detection. The filter bank captures a number of salient visual properties quite well, including spatial localization, orientation selectivity, and spatial-frequency selectivity. They have been widely explored in the face recognition domain. This representation is an optimal tool for local feature extraction, in the sense that it reproduces the operating principle of simple cells in the visual cortex of the mammalian brain and responds in multiple directions, which is optimal for measuring local spatial frequencies. Based on these advantages, the Gabor representation is widely used in image analysis, face recognition, and the extraction of features such as facial expressions. Exploiting the effectiveness of this filter, we extract representative features from the eye images collected by the algorithm from the introduced driving frames. The filters used have the form

ψ_{μ,ν}(x) = (‖k_{μ,ν}‖² / σ²) exp(−‖k_{μ,ν}‖² ‖x‖² / (2σ²)) [exp(i k_{μ,ν}·x) − exp(−σ²/2)]   (1)

where σ is the standard deviation of the Gaussian kernel, μ and ν define the orientation and scale of the Gabor kernel, and the wave vector is k_{μ,ν} = k_ν exp(iφ_μ), with k_ν = k_max / f^ν and φ_μ = μπ/8; here k_max is the maximum frequency and f is the spacing factor between kernels in the frequency domain. Each filtered image is obtained by convolving the input with a kernel:

G_{μ,ν}(x, y) = I(x, y) ∗ ψ_{μ,ν}(x, y)   (2)

where ∗ denotes the convolution operator. Consequently, the image I(x, y) can be represented by the Gabor wavelet responses:

{G_{μ,ν}(x, y), μ = 0, . . . , 7; ν = 0, . . . , 4}   (3)

We used a filter bank of eight orientations and five central frequencies, convolving the image with each of the 40 filters to generate the Gabor features.
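As a concrete sketch, the 40-filter bank and the convolution of Eq. (2) can be implemented with numpy alone, as below. The 9 × 9 kernel support and the parameter values σ = 2π, k_max = π/2, and f = √2 are common defaults for this family of filters, not values stated in the paper.

```python
import numpy as np

def gabor_kernel(mu, nu, size=9, sigma=2 * np.pi, kmax=np.pi / 2, f=np.sqrt(2)):
    """Gabor kernel of Eq. (1) for orientation index mu and scale index nu.
    The 9x9 support and the sigma/kmax/f values are assumed defaults."""
    k = (kmax / f ** nu) * np.exp(1j * mu * np.pi / 8)   # wave vector k_{mu,nu}
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    norm2 = np.abs(k) ** 2
    envelope = (norm2 / sigma ** 2) * np.exp(-norm2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

def conv_same(img, ker):
    """'same'-size linear convolution via FFT; the operation of Eq. (2)."""
    H, W = img.shape[0] + ker.shape[0] - 1, img.shape[1] + ker.shape[1] - 1
    full = np.fft.ifft2(np.fft.fft2(img, (H, W)) * np.fft.fft2(ker, (H, W)))
    r0, c0 = (ker.shape[0] - 1) // 2, (ker.shape[1] - 1) // 2
    return full[r0:r0 + img.shape[0], c0:c0 + img.shape[1]]

def gabor_features(patch):
    """Stack the magnitude responses of all 40 filters (8 orientations x
    5 scales) over a 27x18 eye patch into one 19440-dimensional vector."""
    feats = [np.abs(conv_same(patch, gabor_kernel(mu, nu))).ravel()
             for mu in range(8) for nu in range(5)]
    return np.concatenate(feats)
```

With a (18 × 27)-pixel patch, the feature vector length is 18 × 27 × 40 = 19440, matching the dimensionality reported in Section V.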

B. Local texture descriptor

The non-parametric LBP operator was first mentioned by Harwood et al. [14] and then introduced by Ojala et al. [15] for texture image description. The original LBP works on a (3 × 3)-pixel grid around a given pixel of an input gray-scale image: the LBP code is computed by comparing the gray level of the center pixel with those of its neighbors within the grid. It should be pointed out that the gray values of neighbor pixels not covered by the grid are estimated by interpolation. The thresholding is carried out with respect to the center pixel, resulting in a binary number


called the LBP code:

LBP_{P,R} = Σ_{p=0}^{P−1} s(g_p − g_c) 2^p   (4)

where s(z) is the sign function

s(z) = {1 if z ≥ 0; 0 if z < 0}   (5)

Equation (5) gives the signs of the differences computed in (4); g_c is the gray value of the center pixel, g_p the value of its neighbors, P the total number of sampling points (which controls the quantization of the angular space), and R the radius of the neighborhood. Sampling is commonly performed clockwise around the central pixel at the chosen radius, which determines the spatial resolution of the distributed sampling points. In practical tasks the LBP histogram (LBPH) is usually used: the binary pattern of each pixel in the input image is identified, and the LBPH is then computed over the whole LBP image, i.e.,

H(k) = Σ_{m=1}^{M} Σ_{n=1}^{N} f(LBP_{P,R}(m, n), k),   k ∈ [0, K]   (6)

f(x, y) = {1 if x = y; 0 otherwise}   (7)

where K is the maximal LBP pattern value. The ocular

Fig. 2. Feature extraction using BHLBP: per-sub-region uniform-LBP histograms (frequency vs. LBP code) for the left-upper, right-upper, left-bottom, and right-bottom blocks, concatenated into the global descriptor.

region is considered a dynamic, non-rigid object with large variations; its texture is clearly far from uniform, due to intrinsic and environmental variations that manifest in this facial region. Preprocessing such as sub-region division can mitigate these large variations for a given eye image to a certain extent. Hence, it is rational to use the multi-block LBP histogram (BHLBP) as an eye feature descriptor.

In this paper, BHLBP is used to describe the ocular region. It is formed from an eye image of size (27 × 18) pixels, which is divided into 6 non-overlapping sub-blocks of size (9 × 9) pixels; the 59-label LBP^{u2}_{8,1} operator is then applied to extract the LBP features. A local histogram is computed over each sub-region, and the global spatial histogram of 354 bins (59 bins × 6) is obtained by concatenating all sub-regional histograms. These parameter settings for eye representation were suggested in [6]. The LBP^{u2}_{8,1} operator is an extension of the original LBP that maps codes onto the u2 (uniform) LBP patterns of eight neighbors at a distance of one pixel from the center. A binary pattern is considered uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the pattern is viewed circularly. LBP^{u2} is statistically stable and less sensitive to noise [16].
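The BHLBP construction just described (59-label uniform LBP_{8,1} per 9 × 9 block, six blocks, concatenated into 354 bins) can be sketched as follows. This minimal version computes codes only for the interior pixels of each block; how the original implementation handles block borders is not specified in the paper.

```python
import numpy as np

def _uniform_map():
    """Map each of the 256 LBP_{8,1} codes to one of 59 labels: 58 uniform
    patterns (at most 2 circular 0/1 transitions) plus one shared label
    for all non-uniform codes."""
    table, label = np.zeros(256, dtype=int), 0
    for c in range(256):
        bits = [(c >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            table[c] = label
            label += 1
        else:
            table[c] = 58
    return table

U2 = _uniform_map()

def lbp_codes(img):
    """LBP_{8,1} code image (interior pixels only): threshold the 8
    neighbours of each pixel against the centre and pack into a byte."""
    c = img[1:-1, 1:-1]
    # neighbour offsets, clockwise; bit weights 2^0 .. 2^7
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=int)
    for p, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (nb >= c).astype(int) << p
    return code

def bhlbp(patch):
    """354-bin BHLBP descriptor: split an 18x27 patch into six 9x9
    sub-blocks, take a 59-bin uniform-LBP histogram of each, concatenate."""
    hists = []
    for by in range(0, 18, 9):
        for bx in range(0, 27, 9):
            block = patch[by:by + 9, bx:bx + 9]
            hists.append(np.bincount(U2[lbp_codes(block)].ravel(), minlength=59))
    return np.concatenate(hists)
```

The six per-block histograms preserve coarse spatial layout (macro-structure) while each histogram summarizes local texture (micro-structure).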

IV. DATA AND SETTINGS

The RobeSafe Driver Monitoring Video (RS-DMV) database is a publicly available dataset consisting of video sequences of different subjects while driving. It contains 10 video sequences: 7 were recorded outdoors (Type A, Fig. 3), captured in a car cockpit moving around the university campus; the Type B sequences (Fig. 4) comprise 3 videos captured in a realistic truck-simulator cockpit under low-light conditions approaching night-time scenarios. In both cases the cameras were installed on the dashboard in front of the driver. The subjects were fully awake, talked frequently, and were asked to look regularly at the rear-view mirrors and to operate the car sound system. The sequences include some challenging vision scenarios (partial occlusions, illumination changes) and other elements relevant to driver-behavior monitoring systems based on computer vision. Frames are recorded in gray-scale at 30 frames per second and stored as RAW video. The frame sizes of the outdoor and indoor videos are (960 × 480) pixels and (1390 × 480) pixels, respectively.

Fig. 3. Type(A): Outdoor video records

Fig. 4. Type(B): Indoor video records

V. EXPERIMENTAL RESULTS

An eye detection application requires the ability to distinguish the ocular region within the human face, despite the inherent variations in intrinsic eye properties and external appearance changes.


TABLE I. STATISTICAL RESULTS ON THE ROBESAFE DRIVER MONITORING VIDEO (RS-DMV) DATABASE

Approach               TP(%)  FP(%)  TN(%)  FN(%)  Prec    Rec     F1      Acc(%)
BHLBP+SVM(Linear)      4.7    1.1    94.1   0.1    0.8085  0.9743  0.8836  98.57
Gabor+SVM(Linear)      69.0   0.6    29.1   1.3    0.9915  0.9810  0.9862  98.08
BHLBP+SVM(Quadratic)   3.2    2.6    94.0   0.2    0.5531  0.9285  0.6932  97.17
Gabor+SVM(Quadratic)   68.0   0.9    26.7   3.7    0.9872  0.6200  0.7616  95.42

In this section we investigate the classification performance of the spatially enhanced LBP and Gabor wavelet descriptors, both applied to the eye detection problem. For this purpose a large image set was collected from the RS-DMV database, including face images of subjects taken under unconstrained conditions: hard pose variations, facial expressions, different illumination conditions, and occlusions.

The search for the eye location is carried out by applying the associated detector, which scans a full image of size (Nx × Ny) pixels with shifting windows of size (27 × 18) pixels. The algorithm needs a training image set, which was manually collected for this purpose: 2476 images, mixing eye and non-eye image patches. The collected samples are randomly divided, with 67% used for training and the remaining 33% for testing. The discriminative performance of BHLBP is compared against Gabor wavelet features. In this work, BHLBP and Gabor wavelets are trained and validated with SVM classifiers, using the open-source implementation LIBSVM, version 3.12 [17].
The test frames are reduced to (300 × 163) pixels for the outdoor videos and (300 × 103) pixels for the indoor frames, in order to speed up detection. BHLBP is applied with sub-region division and histogram concatenation; this image decomposition strategy yields a global histogram of length 354 bins (see Sec. III-B), a representation that captures the micro- and macro-structure of the eye patch.
The Gabor wavelets, in turn, are applied with 40 filters (8 orientations and 5 scales) to the (27 × 18)-pixel eye patches, yielding a 19440-dimensional feature vector. In our experiments the Gabor filter bank appears quite promising and has several advantages, such as invariance to homogeneous illumination changes and robustness to small changes in head pose and to partial occlusions, but its feature vector is considerably longer than that of the BHLBP descriptor and may entail a substantial computational cost in the feature extraction step.
In the classification phase, two SVM kernels (linear and quadratic) are used to estimate the separability of the collected data while preserving the trade-off between implementation simplicity and detection effectiveness, in terms of detection accuracy (Acc).
Such classification methods tend to suffer from surrounding environmental variations, which implies a generalization problem, discussed theoretically in Vapnik-Chervonenkis (VC) theory [18]. In real-world applications the generalization problem constitutes a major bottleneck for automatic face and facial-feature detection.
To improve the proposed approach, the training data is collected from the driver's environmental space, which makes the trained classifier more relevant and less exposed to false positives.

In practice this can be done with a web camera embedded in the driver's cockpit that records the surrounding driving environment and enriches the dataset used for training.
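A minimal sketch of the 67/33 training pipeline described above follows. The paper trains with LIBSVM 3.12; to stay dependency-free, a Pegasos-style stochastic-subgradient linear SVM stands in for it here, and toy Gaussian features stand in for the real BHLBP/Gabor vectors, so the code only illustrates the split and the decision rule sign(w·x + b).

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Minimal linear SVM via Pegasos-style stochastic subgradient descent
    on the hinge loss with L2 regularization (stand-in for LIBSVM)."""
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w + b) < 1:            # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:
                w = (1 - eta * lam) * w
    return w, b

# Toy two-class set standing in for eye / non-eye feature vectors.
X = np.vstack([rng.normal(+1, 1, (200, 10)), rng.normal(-1, 1, (200, 10))])
y = np.array([1] * 200 + [-1] * 200)

# 67% / 33% random split, as in the paper.
idx = rng.permutation(len(y))
cut = int(0.67 * len(y))
tr, te = idx[:cut], idx[cut:]

w, b = train_linear_svm(X[tr], y[tr])
acc = np.mean(np.sign(X[te] @ w + b) == y[te])       # held-out accuracy
```

A quadratic kernel, as used in the second set of experiments, would require a kernelized solver (e.g., LIBSVM's polynomial kernel of degree 2) rather than this primal linear sketch.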

The performance validation of the detector and the final results are presented, including the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) rates. The final average accuracy (Acc) for each descriptor is given in Table I.
Moreover, precision and recall are calculated; from these two metrics we compute the F-score, interpreted as the harmonic mean of precision and recall, for further comparison of the results. A classifier biased toward predicting the positive class typically shows high recall and low precision; conversely, a model biased toward the negative class yields high precision and very low recall. A model that achieves high precision and high recall simultaneously is therefore desirable.
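As a check, these derived metrics can be recomputed directly from the rates in Table I (BHLBP + linear SVM row). Because the table rates are rounded to one decimal, the recomputed values only approximate the reported precision, recall, and F-score.

```python
# Rounded rates (in %) from Table I, BHLBP+SVM(Linear) row.
TP, FP, TN, FN = 4.7, 1.1, 94.1, 0.1

precision = TP / (TP + FP)                          # ~0.810 (table: 0.8085)
recall = TP / (TP + FN)                             # ~0.979 (table: 0.9743)
f1 = 2 * precision * recall / (precision + recall)  # ~0.887 (table: 0.8836)
accuracy = TP + TN                                  # rates sum to 100% -> 98.8 (table: 98.57)
```

The small discrepancies against the table come purely from the one-decimal rounding of the TP/FP/TN/FN rates.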

The proposed eye detection method was tested on part of the RS-DMV image set. These images stress real-world conditions, including variations in view (resolution changes), pose, illumination, and expression. Visual inspection is used as an evaluation measure to demonstrate the robustness of our method against these variations. Examples of successful eye localization on these data are shown in Fig. 5.

Fig. 5. Eye detection results on some images of the RS-DMV database.

Table I clearly shows that the Gabor wavelet transform applies well to eye detection and provides good results, which means this descriptor is effective for facial-feature detection, besides being efficient for face description. The LBP-based descriptor, meanwhile, has some attractive properties: it is very simple to compute, less complex than Gabor wavelets, and invariant to ambient environment changes, illumination, and alignment. Moreover, it should be pointed out that both approaches reliably detect the


human eyes in different states (open, partially closed, and closed).

VI. CONCLUSION AND FUTURE WORK

In this paper we have presented an algorithm for eye detection in video recordings from an automotive cockpit. Two feature extraction methods were assessed for ocular-region description: the Gabor wavelet representation and BHLBP. In terms of eye-presence classification, the conducted experiments show that BHLBP with a linear SVM performs slightly better than a linear SVM trained on Gabor features; BHLBP with the linear SVM achieved the best recognition accuracy of 98.57%.
This allows us to conclude that the proposed off-line eye detection framework achieves good detection results; the experiments validate the BHLBP/SVM scheme, which could favorably be applied in an on-line eye detection framework on an embedded platform such as a Raspberry Pi. Moreover, the LBP-based approach with sub-region division and histogram concatenation enhances the discriminative power of the descriptor. More complex image decomposition strategies exist (radial divisions, pyramidal image decomposition, etc.) that could be combined with the LBP descriptor and tested in the future.
In future work this approach could be extended to track the position of the eyes while detecting their states within a video stream, in order to monitor the driver's vigilance while driving. The output of this algorithm could constitute an input to Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs), a deep neural architecture that can handle temporal dependencies through time without significant loss. Our system runs on a standard computer and camera, which eases the implementation of the vision-based automotive safety system.

REFERENCES

[1] Dinges, D.F., Grace, R.: PERCLOS: A valid psychophysiological measure of alertness as assessed by psychomotor vigilance. Federal Highway Administration, Office of Motor Carriers, Tech. Rep. MCRT-98-006 (1998)

[2] Benrachou, D.E., Boulebtateche, B., Bensaoula, S.: Gabor/PCA/SVM-based face detection for drivers monitoring. Journal of Automation and Control Engineering 1 (2013) 115–118

[3] Flores, M.J., Armingol, J.M., de la Escalera, A.: Real-time warning system for driver drowsiness detection using visual information. Journal of Intelligent & Robotic Systems 59(2) (2010) 103–125

[4] González-Ortega, D., Díaz-Pernas, F., Antón-Rodríguez, M., Martínez-Zarzuela, M., Díez-Higuera, J.: Real-time vision-based eye state detection for driver alertness monitoring. Pattern Analysis and Applications 16(3) (2013) 285–306

[5] Benrachou, D.E., dos Santos, F.N., Boulebtateche, B., Bensaoula, S.: Automatic eye localization; multi-block LBP vs. pyramidal LBP three-levels image decomposition for eye visual appearance description. In: Pattern Recognition and Image Analysis. Springer (2015) 718–726

[6] Benrachou, D.E., dos Santos, F.N., Boulebtateche, B., Bensaoula, S.: Online vision-based eye detection: LBP/SVM vs LBP/LSTM-RNN. In: CONTROLO'2014 – Proceedings of the 11th Portuguese Conference on Automatic Control, Springer (2015) 659–668

[7] Nanni, L., Lumini, A.: A combination of face/eye detectors for a high performance face detection system. IEEE Multimedia 19 (2012) 20–27

[8] Zheng, Y., Wang, Z.: Robust and precise eye detection based on locally selective projection. In: ICPR (2008) 1–4

[9] Sun, R., Ma, Z.: Robust and efficient eye location and its state detection. In: Advances in Computation and Intelligence. Springer (2009) 318–326

[10] Bourlai, T., Whitelam, C., Kakadiaris, I.: Pupil detection under lighting and pose variations in the visible and active infrared bands. In: Information Forensics and Security (WIFS), 2011 IEEE International Workshop on, IEEE (2011) 1–6

[11] Wu, J., Mei, L.: A face recognition algorithm based on ASM and Gabor features of key points. In: 2012 International Conference on Graphic and Image Processing, International Society for Optics and Photonics (2013) 87686L

[12] Hollingsworth, K., Clark, S., Thompson, J., Flynn, P.J., Bowyer, K.W.: Eyebrow segmentation using active shape models. In: SPIE Defense, Security, and Sensing, International Society for Optics and Photonics (2013) 871208

[13] RS-DMV dataset: http://www.robesafe.com/personal/jnuevo/datasets.html

[14] Harwood, D., Ojala, T., Pietikäinen, M., Kelman, S., Davis, L.: Texture classification by center-symmetric auto-correlation, using Kullback discrimination of distributions. Pattern Recognition Letters 16(1) (1995) 1–10

[15] Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on featured distributions. Pattern Recognition 29(1) (1996) 51–59

[16] Jian, W., Honglian, Z.: Eye detection based on multi-angle template matching. In: Image Analysis and Signal Processing, 2009. IASP 2009. International Conference on, IEEE (2009) 241–244

[17] Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3) (2011) 27

[18] Tan, Y., Wang, J.: A support vector machine with a hybrid kernel and minimal Vapnik-Chervonenkis dimension. Knowledge and Data Engineering, IEEE Transactions on 16(4) (2004) 385–395
