
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.12, NO.2, JUNE, 2012
http://dx.doi.org/10.5573/JSTS.2012.12.2.150

An FPGA-based Parallel Hardware Architecture for Real-time Eye Detection

Dongkyun Kim*, Junhee Jung*, Thuy Tuong Nguyen*, Daijin Kim**, Munsang Kim***, Key Ho Kwon*, and Jae Wook Jeon*

Abstract—Eye detection is widely used in applications such as face recognition, driver behavior analysis, and human-computer interaction. However, it is difficult to achieve real-time performance with software-based eye detection in an embedded environment. In this paper, we propose a parallel hardware architecture for real-time eye detection. We use the AdaBoost algorithm with the modified census transform (MCT) to detect eyes in a face image. We parallelize the iterated part of the algorithm to speed up processing. Several downscaled pyramid images of the eye candidate regions are generated in parallel from the input face image, so the left and right eyes can be detected simultaneously. The sequential data processing bottleneck caused by repetitive operations is removed by employing a pipelined parallel architecture. The proposed architecture is designed in Verilog HDL and implemented on a Virtex-5 FPGA for prototyping and evaluation. The proposed system can detect eyes within 0.15 ms in a VGA image.

Index Terms—Eye detection, hardware architecture, FPGA, image processing, HDL

I. INTRODUCTION

Eye detection is an image processing technique for locating eye positions in an input facial image. It is widely used in applications including face recognition, driver behavior analysis, and human-computer interaction [1-4]. The extraction of local facial features, such as the nose, eyes, and mouth, is important for analyzing the face image in face recognition [1, 2]. The eye is the most important and most commonly adopted feature for human recognition [2]. In driver behavior analysis, eye detection is used to track the eye and gaze to monitor driver vigilance [3]. An eye-gaze tracking system can be used for pointing and selection in human-computer interaction [4].

Z. Zhu and Q. Ji classified traditional eye detection methods into three categories: template-based methods, appearance-based methods, and feature-based methods [5]. In the template-based methods, a generic eye model is designed first and template matching is used to detect the eyes in the image. The deformable template method is a commonly used template-based method [6-8]. In this method, an eye model is designed first and the eye position is then obtained via a recursive process. This method can detect the eye accurately, but it is computationally expensive, and good image contrast is required for the method to converge correctly.

The appearance-based methods detect eyes based on their photometric appearance [9-11]. These methods usually need a large amount of training data representing the eyes of different subjects, under different face orientations, and under different illumination conditions. These data are used to train a classifier; detection is then achieved via classification. The feature-based methods use characteristics such as the edge, iris intensity, color distribution, and the flesh of the eyes to identify features around the eyes [12].

Manuscript received Aug. 23, 2011; revised Dec. 19, 2011.
* Dept. of ECE, Sungkyunkwan University, Suwon, Korea
** Dept. of SCE, POSTECH, Pohang, Korea
*** Advanced Robotics Research Center, KIST, Seoul, Korea
E-mail: [email protected]

There are several approaches to reducing eye detection processing time. These approaches can be broadly classified into two groups: software-based and hardware-based. The focus of the software-based approach is designing and optimizing an eye detection algorithm for real-time performance. Conversely, in the hardware-based approach, the eye detection algorithm is parallelized and implemented on a DSP or FPGA to achieve real-time performance.

T. D'Orazio et al. proposed a real-time eye detection algorithm [13]. They used iris geometrical information to determine the eye region candidate, and symmetry to select the pair of eyes. Their algorithm consists of two steps. First, the candidate region that contains one eye is detected from the whole image by matching the edge directions with an edge template of the iris. Then, in the opposite region, a search is carried out for the second eye, whose distance and orientation are compatible with the range of possible eye positions. The software was implemented using Visual C++ on a 1 GHz Pentium III, with a camera running at 7.5 fps. The search for the eyes in a 640×480 image takes about 0.3 s.

K. Lin et al. proposed a fast eye detection scheme for use in video streams rather than still images [14]. The temporal coherence of sequential frames was used to improve the detection speed. First, an eye detector trained by the AdaBoost algorithm is used to roughly obtain the eye positions. These candidate positions are then filtered according to the geometrical patterns of human eyes. The detected eye regions are taken as the initial detection window, which is updated after each frame is processed. The experimental system was developed with Visual C++ 6.0 on a 2.8 GHz Pentium with 1 GB of memory. In their experiments, the mean detection rate was 92.73% for 320×240 test videos, at a speed of 24.98 ms per frame.

However, CPU-based implementations have two major bottlenecks that limit performance: data transfer bandwidth and the sequential data processing rate [15]. First, video frames are captured from the camera and transferred to the CPU main memory via a transfer channel such as USB or IEEE 1394. These channels have limited bandwidth; hence, they impose a data flow bottleneck. After a frame is grabbed and moved to memory, the CPU processes the pixels sequentially. The CPU adds a large overhead to the actual computation: its time is split between moving data between memory and registers, applying arithmetic and logic operators to the data, maintaining the algorithm flow, handling branches, and fetching code. Additionally, a relatively long latency is imposed by the need to wait for an entire frame to be captured before it is processed.

Deployment and real-time processing of an eye detection algorithm in an embedded environment are difficult for these reasons. Therefore, A. Amir et al. implemented an embedded system for eye detection using a CMOS sensor and an FPGA to overcome the shortcomings of CPU-based eye detection [15]. Their hardware-based embedded system was implemented using simple logic gates, with no CPU and no addressable frame buffers. The image processing algorithm was redesigned to enable a highly parallel, single-pass implementation. Their system processes 640×480 progressive scan frames at 60 fps and outputs a compact list of eye coordinates via USB communication. Although their FPGA-based approach reduces processing time, the eye detection algorithm they used is relatively simple; hence, the eye miss rate is relatively high.

B. Jun and D. Kim proposed a face detection algorithm using MCT and AdaBoost [16]. I. Choi and D. Kim presented an eye detection method using AdaBoost training with MCT-based eye features [17]. The MCT-based AdaBoost algorithm is very robust, even when the illumination varies, and is very fast. They chose the MCT-based AdaBoost training method to detect the face and the eye because of its simplicity of learning and high detection speed.

In this paper, we propose a dedicated hardware architecture for eye detection. We use the MCT-based AdaBoost eye detection algorithm proposed by I. Choi and D. Kim [17], revised slightly for parallelization. We parallelize the iterated part of the algorithm to speed up processing. Several downscaled images of the eye candidate regions are generated in parallel from the input face image in the pyramid downscaling module. These images are passed to the MCT and cascade classifier modules via a pipeline and processed in parallel. Thus, we can detect the left and right eyes simultaneously. The sequential data processing bottleneck caused by repetitive feature evaluation is removed by employing a pipelined parallel architecture.

The proposed architecture is described in Verilog HDL (hardware description language) and implemented on an FPGA. Our system uses CameraLink to interface with the camera. The CameraLink interface transfers the control and data signals synchronously, and the entire system is synchronized with the pixel clock signal from the camera. Because our system does not need an interface such as USB, the bottleneck caused by interface bandwidth is removed. The frame rate of the proposed system scales in proportion to the camera clock frequency. The proposed system can detect the left and right eyes within 0.15 ms using a VGA (640×480) image and a 25 MHz clock frequency.
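As a point of reference, this latency can be expressed in clock cycles (our arithmetic, not a figure stated elsewhere in the paper):

$$0.15\,\text{ms} \times 25\,\text{MHz} = 3750\ \text{cycles},$$

a small fraction of the roughly 400,000-cycle frame period of a 60 fps camera with a 24.5454 MHz pixel clock.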

The remainder of this paper is organized as follows. Section 2 describes the details of the MCT and AdaBoost algorithms and the overall flow of the eye detection algorithm that we use. Section 3 describes the proposed hardware architecture and the simulation results. Section 4 presents the system configuration and the implementation results. Section 5 presents the experimental results. Finally, our conclusions are presented in Section 6.

II. EYE DETECTION ALGORITHM

1. Modified Census Transform

The modified census transform is a non-parametric local transform, a modification of the census transform, introduced by B. Fröba and A. Ernst [18]. It is an ordered set of comparisons of pixel intensities in a local neighborhood that determines which pixels have lower intensity than the mean pixel intensity. In general, the size of the local neighborhood is not restricted, but in this work we always assume a 3×3 neighborhood.

The modified census transform is defined as follows. Let $N(x)$ be the local spatial neighborhood of the pixel $x$, so that $x \notin N(x)$, and let $N'(x) = N(x) \cup \{x\}$. Let $I(x)$ and $\bar{I}(x)$ denote the intensity of the pixel $x$ and the mean intensity over the neighborhood $N'(x)$, respectively. The modified census transform can then be written as

$$\Gamma(x) = \bigotimes_{y \in N'(x)} \zeta\big(\bar{I}(x), I(y)\big),$$

where $\zeta(\bar{I}(x), I(y))$ is a comparison function that yields 1 if $I(y) < \bar{I}(x)$ and 0 otherwise, and $\bigotimes$ denotes a concatenation operation that forms a bit vector (see Fig. 1). If we consider a 3×3 neighborhood, we are able to determine 511 distinct structure kernels. Fig. 1 shows an example of the modified census transform: $\bar{I}(x)$ is 95.11, each pixel in $N'(x)$ is transformed to 0 or 1 by applying $\zeta(\bar{I}(x), I(y))$, and the bits are concatenated in order, as shown in Fig. 1.

Fig. 1. Example modified census transform.
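To make the transform concrete, the following is a minimal software sketch of the 3×3 MCT as defined above (our illustrative C code, not the authors' implementation; the function name `mct3x3` is ours):

```c
#include <stdint.h>

/* 9-bit modified census transform of the 3x3 window centered at
 * (x, y), following the definition above: a bit is 1 when that
 * pixel's intensity is lower than the window mean. Illustrative
 * sketch only. */
uint16_t mct3x3(const uint8_t *img, int width, int x, int y)
{
    int sum = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            sum += img[(y + dy) * width + (x + dx)];
    double mean = sum / 9.0;            /* mean over N'(x) */

    uint16_t bits = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            bits <<= 1;                 /* concatenate in raster order */
            if (img[(y + dy) * width + (x + dx)] < mean)
                bits |= 1;
        }
    return bits;   /* one of the 511 possible structure kernels */
}
```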

2. AdaBoost Algorithm

AdaBoost, short for adaptive boosting, is a machine learning algorithm formulated by Y. Freund and R. Schapire [19]. They introduced the AdaBoost algorithm in an earlier work and showed that it has good generalization capability [20]. It is a meta-algorithm, and can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that classifiers built in subsequent rounds are tweaked in favor of the instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers; however, in some situations it can be less susceptible to the overfitting problem than most learning algorithms [21].

AdaBoost calls a weak classifier repeatedly in a series of rounds $t = 1, 2, \ldots, T$, for a total of $T$ classifiers. For each call, a distribution of weights $D_t$ is updated that indicates the importance of each example in the data set for the classification. In each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on those examples [21].
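For completeness, the weight update performed in each round (standard AdaBoost notation following [19], not reproduced from this paper) is

$$D_{t+1}(i) = \frac{D_t(i)\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t},$$

where $h_t$ is the weak classifier chosen in round $t$, $\alpha_t$ is its weight, $y_i \in \{-1, +1\}$ is the label of example $x_i$, and $Z_t$ is a normalization factor chosen so that $D_{t+1}$ remains a distribution. Misclassified examples ($y_i h_t(x_i) < 0$) thus receive exponentially larger weights.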

3. MCT-based AdaBoost Algorithm

In this paper, we use the AdaBoost training method with MCT-based eye features proposed by I. Choi and D. Kim [17]. In AdaBoost training, they constructed weak classifiers that classify eye and non-eye patterns, and then constructed a strong classifier that is a linear combination of the weak classifiers [17].
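In the standard AdaBoost formulation (our notation, following [19, 20]), the strong classifier is a thresholded linear combination of the weak classifiers,

$$H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right),$$

and the weighted sum itself can serve as the confidence value used in the detection step described next.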

In detection, they scanned the eye candidate region by moving a 12×8 scanning window and obtained a confidence value for each window location using the strong classifier. They then took the window location whose confidence value was maximal as the location of the detected eye. Their MCT-based AdaBoost training was performed using only left-eye and non-eye training images. Therefore, to detect the right eye, they needed to flip the right subregion of the face image [17].

Fig. 2 shows the flowchart of the eye detection algorithm. The algorithm requires a face image and the coordinates of the face region as inputs. If detection completes successfully, the algorithm outputs the coordinates of the left and right eyes.

In the first step, the eye candidate regions are cropped for pyramid downscaling. The face region is split in half horizontally and vertically. The top-left subregion is set as the left eye candidate region and the top-right subregion is set as the right eye candidate region. The right eye candidate region is then flipped, because the training data for the right eye are the same as for the left. These candidate images are downscaled to specific sizes to obtain pyramid images. There are eight pyramid images (four for the left eye and four for the right), with sizes of 33×36, 31×34, 29×32, and 27×29. Four downscaled images are enough for an indoor environment where the supported distance is 0.5 to 8 meters. The sizes of the downscaled images were determined empirically.

In the second step, the MCT is applied to the downscaled pyramid images. The window size for the transform is 3×3; we obtain the MCT images by sliding the window across each pyramid image. The MCT-transformed pyramid images are then passed to the cascade classifier to detect eyes.

In the third step, the MCT images are scanned by the classifier and candidate eye locations are extracted. There are three stages of classifiers, and the window size for the classifier is 15×15. The classifier slides the window across the MCT image and extracts the confidence value at the current window position. If the confidence value at the current stage is lower than that stage's threshold, the confidence value of the next-stage classifier is calculated. If the confidence value at the last stage is also lower than its threshold, then an eye candidate is detected at the current window position, and the eye coordinates can be calculated from that position. If no eye candidates are detected after sliding, then detection has failed.
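The scan over one MCT image can be sketched as follows (our illustrative C code; `stage_confidence` is a hypothetical stand-in for the LUT-based stage evaluation described in Section III, and the pass and selection directions follow the descriptions in this paper, where a confidence below the stage threshold passes and the candidate with maximum confidence is kept):

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stage evaluator: returns the confidence value of
 * one cascade stage for the 15x15 window at (wx, wy). */
extern int stage_confidence(int stage, const unsigned short *mct,
                            int width, int wx, int wy);

/* Scan one MCT image with a 15x15 window and a 3-stage cascade.
 * A window passes a stage when its confidence is below that
 * stage's threshold; among all passing windows, the one with the
 * maximum final-stage confidence is kept. Illustrative sketch. */
bool scan_cascade(const unsigned short *mct, int width, int height,
                  const int thresh[3], int *best_x, int *best_y)
{
    int best_conf = INT_MIN;
    bool found = false;

    for (int wy = 0; wy + 15 <= height; wy++) {
        for (int wx = 0; wx + 15 <= width; wx++) {
            int stage, conf = 0;
            for (stage = 0; stage < 3; stage++) {
                conf = stage_confidence(stage, mct, width, wx, wy);
                if (conf >= thresh[stage])
                    break;                /* rejected at this stage */
            }
            if (stage == 3 && conf > best_conf) {
                found = true;             /* passed all three stages */
                best_conf = conf;
                *best_x = wx;
                *best_y = wy;
            }
        }
    }
    return found;
}
```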

Eye detection for one pyramid image is completed in the third step. The algorithm repeats these steps for all the pyramid images. After left eye detection finishes, the same process is applied to the right eye pyramid images. After this repetitive operation, the lists of left and right eye candidates have been updated. In the post-processing step, if there are several eye candidates in a list, the candidate with the maximum confidence value is selected as the result.

Fig. 2. Eye detection algorithm flowchart.


III. PROPOSED HARDWARE ARCHITECTURE

1. Overall Architecture

Fig. 3 shows the overall hardware architecture of the proposed system. The proposed system consists of four sub-modules: a pyramid downscaling module, an MCT module, a parallel classifier module, and a post-processing module. The face detector and the SRAM are not part of the proposed system.

The face detector receives the image from the camera and detects faces in the image. It outputs the top-left and bottom-right coordinates of the face region, passing them to the eye detection module after detecting the faces. The SRAM interface module writes each frame from the camera to the SRAM. The stored image contains the eye regions at a non-uniform size; the pyramid downscaling module then uses it to generate the fixed-size pyramid images.

The pyramid downscaling module receives the left and right eye region images from the SRAM as input. The SRAM cannot be accessed in parallel, so we have to crop the image serially. First, the downscaled images of the left and right eye regions are generated from the SRAM data and stored in registers. After generating these first downscaled images, we can generate the other downscaled images in parallel, because the registers can be accessed in parallel. This differs from the software algorithm, in which the pyramid images are generated one by one.

All generated pyramid images are passed to the MCT module in parallel through pipelines. There are eight MCT modules for the eight pyramid images. In the MCT modules, a 3×3 window slides across each pyramid image for the census transform. The MCT images are then passed to the classifier module in parallel through pipelines.

The classifier module extracts features from the MCT image. In this module, a window of a pre-defined size is used to scan the MCT image from top-left to bottom-right. At every stage of the window scanning, if the confidence value of the stage passes the corresponding threshold (the per-stage tests $H_i < T_i$ in Fig. 3), the current window region is considered an eye candidate. The threshold values are built in advance (by learning), corresponding to the classifier's stages. The coordinates of the eye candidate can be calculated from the current position of the scanning window. In contrast to the software implementation, in this paper we apply the 3-stage classifier in parallel to the eight MCT images; hence, we need 24 classifiers (3 stages × 8 images). Moreover, our hardware implementation can detect the left and right eyes simultaneously. As a result, several eye candidates can be detected within one left or right eye region. If there is more than one eye candidate, the candidate with the maximum confidence value is selected as the final detection result.
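The post-processing selection is then a simple maximum search (our illustrative C code; the `eye_candidate_t` structure is hypothetical):

```c
typedef struct {
    int x, y;        /* detected window position            */
    int scale;       /* index of the pyramid image (0 to 3) */
    int confidence;  /* confidence value from the cascade   */
} eye_candidate_t;

/* Select the final detection among all candidates reported for one
 * eye region: keep the candidate with the maximum confidence value.
 * Returns -1 if the candidate list is empty. */
int select_best(const eye_candidate_t *cands, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (best < 0 || cands[i].confidence > cands[best].confidence)
            best = i;
    return best;
}
```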

2. Flip / Pyramid Downscaling Module

Fig. 4 shows the pyramid downscaled images of the eye candidate regions. The pyramid downscaling module needs a face image and the face region coordinates to generate the pyramid images. The face image is stored in the SRAM, and the face region coordinates are received from the face detector.

First, the pyramid module splits off the eye candidate regions. As shown in Fig. 4, the top-left region is set as the left eye candidate region and the top-right region is set as the right eye candidate region. Image pixels in the SRAM can be accessed only one per pixel clock period, so we cannot generate all eight pyramid images simultaneously. Therefore, we generate the first pyramid images of the left and right eye regions serially.

Fig. 3. Overall hardware architecture of the proposed system (camera input, face detector, SRAM, right-region flip / downscaling, MCT transform, 8-image × 3-stage parallel classifier with per-stage tests $H_i < T_i$, and post-processing).

In Fig. 4, image (1) and image (2) are generated serially, and their image data are stored in registers. We use nearest-neighbor interpolation to generate the downscaled images. The other downscaled images can then be generated in parallel using the images stored in the registers. In the software algorithm, the right eye image is flipped after cropping. As shown in Fig. 4, the right eye candidate region is scanned from right to left; thus, an explicit flip is unnecessary in our architecture. The sizes of the pyramid images are the same as in the software algorithm (33×36, 31×34, 29×32, and 27×29).

Fig. 4. Pyramid downscaling images.
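A minimal software model of this nearest-neighbor downscaling is sketched below (our illustrative C code; `downscale_nn` is our name, and the hardware would instead generate the corresponding read addresses on the fly):

```c
#include <stdint.h>

/* Nearest-neighbor downscale of a cropped eye candidate region
 * (src, sw x sh) into a fixed-size pyramid image (dst, dw x dh),
 * e.g. dw = 33, dh = 36. Illustrative sketch only. */
void downscale_nn(const uint8_t *src, int sw, int sh,
                  uint8_t *dst, int dw, int dh)
{
    for (int y = 0; y < dh; y++) {
        int sy = y * sh / dh;              /* nearest source row    */
        for (int x = 0; x < dw; x++) {
            int sx = x * sw / dw;          /* nearest source column */
            dst[y * dw + x] = src[sy * sw + sx];
        }
    }
}
```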

3. MCT Module

Fig. 5 shows the realization of the MCT bit computation. The average intensity of the pixels in the 3×3 window is needed to calculate the MCT bit vector, which would normally require a divider module. However, the computation latency of a divider is very long compared to that of an adder, so we use an adder and a shifter instead of a divider. We do not calculate the average intensity itself. Instead, we calculate SUM_I and I(x,y)×9, as shown in Fig. 5. SUM_I, the sum of the pixels in the window, is computed with an adder tree. The I(x,y)×9 value is obtained by multiplying the intensity of each pixel by nine, implemented as a combination of a shifter and an adder, as shown in Fig. 5. By comparing SUM_I with I(x,y)×9, we obtain the MCT bit for each pixel in the window.
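In software terms, the per-pixel comparison in Fig. 5 amounts to the following (our C rendering of the hardware trick; `mct_bit` is our name):

```c
/* Division-free MCT bit, as in Fig. 5: instead of testing
 * I(x,y) < SUM_I / 9, test 9*I(x,y) < SUM_I, where 9*I is built
 * from a shifter and an adder: (I << 3) + I. Illustrative sketch
 * of the comparison realized per pixel by the hardware. */
static inline int mct_bit(unsigned sum_i, unsigned ixy)
{
    unsigned ixy_x9 = (ixy << 3) + ixy;   /* 8*I + I = 9*I */
    return sum_i > ixy_x9;                /* 1 when I(x,y) < mean */
}
```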

4. Parallel Classifier Module

In the classifier module, a window of a pre-defined size (15×15) slides from top-left to bottom-right over the MCT images, as shown in Fig. 6. In contrast to the software algorithm, we apply the 3-stage classifier in parallel to the eight MCT images, so the confidence values of each stage classifier are calculated simultaneously and both eyes can be detected simultaneously. We need 24 classifiers (8 images × 3 stages).

Fig. 5. Realization of MCT bit computation.

Fig. 6. Parallel classifier cascade for left eye detection.

We use a revised version of the parallel classifier cascade architecture proposed by Jin et al. [22] for eye detection, as shown in Fig. 7. They designed the classifiers using look-up tables (LUTs), since each classifier consists of a set of feature locations and the confidence values corresponding to the LBP (local binary pattern). The LUTs contain training data generated by AdaBoost training; we revise the LUTs for the MCT rather than the LBP. The three stages of the cascade have an optimized number of allowed positions: 22, 36, and 100, respectively. In the software algorithm, the classifiers are cascaded serially. In the proposed hardware, however, all the cascade stages are pipelined and operate in parallel, so a feature extraction result is generated every pixel clock after the fixed pipeline latency.
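One plausible software rendering of a LUT-based stage evaluation, a table-driven counterpart of the `stage_confidence` stub in the Section II sketch, is given below (our illustrative C code; the table layout is our assumption, with one 512-entry confidence table per allowed position, indexed by the 9-bit MCT value, and may differ from the layout in [22]):

```c
#include <stdint.h>

#define MCT_VALUES 512   /* 9-bit MCT kernel index */

/* One cascade stage: a set of feature positions inside the 15x15
 * window (22, 36, or 100 positions for stages 1 to 3) and, for
 * each position, a confidence value per MCT pattern, learned
 * offline by AdaBoost training. Layout is our assumption. */
typedef struct {
    int num_pos;                 /* allowed positions in this stage  */
    const uint8_t (*pos)[2];     /* (dx, dy) offsets in the window   */
    const int16_t *lut;          /* [num_pos][MCT_VALUES] confidences */
} stage_t;

/* Accumulate the stage confidence for the window at (wx, wy). */
int stage_confidence_lut(const stage_t *st, const uint16_t *mct,
                         int width, int wx, int wy)
{
    int conf = 0;
    for (int i = 0; i < st->num_pos; i++) {
        int x = wx + st->pos[i][0];
        int y = wy + st->pos[i][1];
        conf += st->lut[i * MCT_VALUES + mct[y * width + x]];
    }
    return conf;   /* compared against the stage threshold */
}
```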

IV. SYSTEM IMPLEMENTATION

1. Synthesis Result

All hardware modules are implemented in Verilog HDL and simulated with the ModelSim 6.5 simulator for functional verification. We use a camera with a 24.5454 MHz pixel clock for the implementation, so for consistency we use a 25 MHz clock frequency in simulation.

The hardware modules are synthesized with the Synplify logic synthesis tool from Synplicity. The target device is a Virtex-5 LX330 FPGA from Xilinx. We use the Xilinx ISE 10.1.3 integrated tool for front-end and back-end implementation, and the place and route phase is performed by the Xilinx PAR (Place and Route) program.

Table 1 shows the device utilization summary reported by the Xilinx ISE. The proposed system occupies 39,914 slices (about 76% of the device) and uses 288 KB of memory, as shown in Table 1. Its equivalent gate count is about 200 K gates.

Table 2 shows the timing summary reported by the Synplify synthesis tool. The requested frequency is 25.0 MHz, and the maximum estimated frequency of the implemented system is 106.3 MHz. We use a progressive camera with a 24.5454 MHz pixel clock, which captures VGA images at 60 fps. Because the frame rate increases linearly with the pixel clock, we can expect a rate of around 240 fps for processing standard VGA images at the reported maximum estimated frequency of the system.

2. System Configuration

Fig. 8 shows the system configuration for testing and verifying the eye detection system. The overall system consists of four PCBs (printed circuit boards): a CameraLink interface board, two FPGA boards for face and eye detection, and an SRAM/DVI interface board. The CameraLink interface board receives sync signals and image data signals from the camera using the CameraLink interface. These signals include the pixel clock, VSYNC (vertical sync, frame valid), HSYNC (horizontal sync, line valid), DVAL (data valid), and the streaming image data. The FPGA boards grab these signals for use in face and eye detection.

Fig. 7. Parallel classifier cascade and feature evaluation (summing the confidence values at stages 1, 2, and 3).

Table 1. Device utilization and equivalent gate count

Logic utilization       | Used    | Available | Utilization
Num. of slice registers | 45,716  | 207,360   | 22%
Num. of slice LUTs      | 133,916 | 207,360   | 64%
Num. of occupied slices | 39,914  | 51,840    | 76%
Memory (KB)             | 288     | 10,368    | 2%
Equivalent gate count: about 200 K gates

Table 2. Timing summary

Requested frequency | 25.0 MHz
Estimated frequency | 106.3 MHz
Requested period    | 40.000 ns
Estimated period    | 9.409 ns
Slack               | 30.591 ns

We use two FPGA boards: one for face detection and the other for eye detection. We use the face detector proposed by Jin et al. [22] to obtain the coordinates of the face regions. A single image frame is stored and updated in the SRAM to generate the pyramid downscaled images. The eye detection result is displayed on a monitor via a DVI interface.

V. EXPERIMENTAL RESULT

The robustness of our system was evaluated using both a public image database and real people's faces captured by our camera system. The database images were downloaded from the FDDB benchmark [23]; we selected its first directory, named 2002-07, for the experiment. This directory contains 934 images with 1394 faces, i.e., left-right eye pairs.

The evaluation was made using four criteria. First, the precision and recall values were analyzed with respect to all left-right eye pairs. These values were calculated from the number of correct detections (true positives), the number of missed detections (false negatives), and the number of wrong detections (false positives). Second, the results were analyzed with out-of-focus images excluded. Similarly, in the two remaining criteria, rotation-in-plane (RIP) cases and with-glasses cases were excluded, respectively. Fig. 9 shows the experimental results in the four different cases. Note that our implementation does not support rotation-out-of-plane (ROP); thus, all faces rotated out of the plane (i.e., faces showing only one eye) were excluded from the evaluation. The experimental results yielded 93% detection accuracy. Fig. 10 shows sample images in which faces and eyes were successfully detected under different conditions.
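For reference, the first criterion uses the standard definitions (our formulas, with the counts as defined above):

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN},$$

where $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, respectively.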

Fig. 11 and Table 3 show the detection results for real people's faces captured by our camera system, with and without glasses. The detection rate is 87% over 1200 frames. As shown in Table 3, the detection rate for people wearing glasses drops significantly: light spots or blooming can appear near the eyes due to reflections from the glasses, and these effects degrade the detection rate. In general, the detection rate for people with glasses is lower than for people without glasses; therefore, additional processing is needed to reduce the effects caused by glasses.

(a) Correctly detected results (b) Falsely detected results
Fig. 11. Online detection results of a person with glasses.

Table 3. Detection results for persons with and without glasses

Person   | Glasses | Frames | Falsely Detected | Detection Rate
Person 1 | Yes     | 300    | 43               | 85.67%
Person 1 | No      | 300    | 10               | 96.67%
Person 2 | Yes     | 300    | 85               | 71.67%
Person 2 | No      | 300    | 17               | 94.33%
Total    | -       | 1200   | 155              | 87.08%

Fig. 8. System configuration for eye detection.

Fig. 9. Evaluation of four different cases: all left-right eye pairs; all except out-of-focus eyes; all except RIP of faces or eyes; all except eyes with glasses.

Fig. 10. Sample results yielded from the face image database.

Considering the detection results for the provided image database (software implementation) and for the real people's faces captured by our camera system (hardware implementation), we can see that the detection rate of the hardware implementation is lower than that of the software implementation in the with-glasses case. However, the processing time of the hardware system is much better than that of the software: 0.15 ms/frame versus 20 ms/frame, a speedup of more than two orders of magnitude. The software implementation was evaluated on a personal computer with an AMD Athlon 64 X2 Dual Core Processor 3800+ at 2.00 GHz and 2 GB of RAM.

Table 4 compares our system with other reported eye detection methods. Our method handles the largest image size at the fastest processing speed, with only a small decrease in detection accuracy, and it is well suited to hardware implementation.

Table 4. Performance summary of reported software-based eye detection systems

Eye Detection Implementation | Image Size | Detection Rate | FPS    | Features
K. Peng 2005 [24]    | 92×112  | 95.2%  | 1 fps  | Feature localization, template matching
P. Wang 2005 [25]    | 320×240 | 94.5%  | 3 fps  | Fisher discriminant analysis, AdaBoost, recursive nonparametric discriminant analysis
Q. Wang 2006 [26]    | 384×286 | 95.6%  | -      | Binary template matching, support vector machines
T. Akashi 2007 [27]  | 320×240 | 97.9%  | 35 fps | Template matching, genetic algorithm
H.-J. Kim 2008 [28]  | 352×240 | 94.6%  | 5 fps  | Zernike moments, support vector machines
M. Shafi 2009 [29]   | 480×540 | 87%    | 1 fps  | Geometry, shape, density features
H. Liu 2010 [30]     | 640×480 | 94.16% | 30 fps | Zernike moments, support vector machines, template matching
Ours                 | 640×480 | 93%    | 50 fps | AdaBoost, modified census transform, multi-scale processing

VI. CONCLUSIONS

In this paper, we proposed a dedicated hardware architecture for eye detection and implemented it on an FPGA. The implemented system processes the pyramid images of the left and right eye candidate regions in parallel. The processing time to detect the left and right eyes of one face is less than 0.15 ms for a VGA image. The maximum estimated frequency of the implemented system is 106.3 MHz; because the frame rate increases linearly with the pixel clock, we can expect a rate of around 240 fps for processing standard VGA images at that frequency.

The detection rate of our system is 93% on the FDDB face database and 87% for real people's faces. However, the detection rate for people with glasses drops significantly, which is a weakness of the proposed system. In future work, we will add a pre-processing step to reduce the effects caused by glasses.

ACKNOWLEDGMENTS

This work was supported by the Ministry of Knowledge Economy and the Korea Institute for Advancement of Technology through the Workforce Development Program in Strategic Technology.

REFERENCES

[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and Machine Recognition of Faces: A Survey," Proceedings of the IEEE, Vol.83, No.5, May, 1995.
[2] G. C. Feng and P. C. Yuen, "Variance projection function and its application to eye detection for human face recognition," Pattern Recognition Letters, Vol.19, No.9, pp.899-906, Jul., 1998.

[3] Q. Ji and X. Yang, "Real-Time Eye, Gaze, and Face Pose Tracking for Monitoring Driver Vigilance," Real-Time Imaging, Vol.8, No.5, pp.357-377, Oct., 2002.
[4] R. J. K. Jacob, "The Use of Eye Movements in Human-Computer Interaction Techniques: What You Look At is What You Get," ACM Transactions on Information Systems (TOIS), Vol.9, No.2, Apr., 1991.
[5] Z. Zhu and Q. Ji, "Robust real-time eye detection and tracking under variable lighting conditions and various face orientations," Computer Vision and Image Understanding, Vol.98, No.1, pp.124-154, Apr., 2005.
[6] A. L. Yuille, P. W. Hallinan, and D. S. Cohen, "Feature Extraction from Faces Using Deformable Templates," International Journal of Computer Vision, Vol.8, No.2, pp.99-111, 1992.
[7] X. Xie, R. Sudhakar, and H. Zhuang, "On improving eye feature extraction using deformable templates," Pattern Recognition, Vol.27, No.6, pp.791-799, Jun., 1994.
[8] K. Lam and H. Yan, "Locating and extracting the eye in human face images," Pattern Recognition, Vol.29, No.5, pp.771-779, May, 1996.
[9] A. Pentland, B. Moghaddam, and T. Starner, "View-Based and Modular Eigenspaces for Face Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'94), pp.84-91, 1994.
[10] W. Huang and R. Mariani, "Face Detection and Precise Eyes Location," in Proceedings of the International Conference on Pattern Recognition (ICPR'00), Vol.4, pp.722-727, 2000.
[11] J. Huang and H. Wechsler, "Eye Detection Using Optimal Wavelet Packets and Radial Basis Functions (RBFs)," International Journal of Pattern Recognition and Artificial Intelligence, Vol.13, No.7, pp.1009-1025, 1999.
[12] S. A. Sirohey and A. Rosenfeld, "Eye detection in a face image using linear and nonlinear filters," Pattern Recognition, Vol.34, No.7, pp.1367-1391, 2001.
[13] T. D'Orazio, M. Leo, G. Cicirelli, and A. Distante, "An Algorithm for real time eye detection in face images," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Vol.3, 2004.
[14] K. Lin, J. Huang, J. Chen, and C. Zhou, "Real-time Eye Detection in Video Streams," Fourth International Conference on Natural Computation (ICNC'08), pp.193-197, Oct., 2008.
[15] A. Amir, L. Zimet, A. Sangiovanni-Vincentelli, and S. Kao, "An embedded system for an eye-detection sensor," Computer Vision and Image Understanding, Vol.98, No.1, pp.104-123, Apr., 2005.
[16] B. Jun and D. Kim, "Robust Real-Time Face Detection Using Face Certainty Map," Lecture Notes in Computer Science, Vol.4642, pp.29-38, 2007.
[17] I. Choi and D. Kim, "Eye Correction Using Correlation Information," Lecture Notes in Computer Science, Vol.4843, pp.698-707, 2007.
[18] B. Fröba and A. Ernst, "Face Detection with the Modified Census Transform," in Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FGR'04), pp.91-96, May, 2004.
[19] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, Vol.55, pp.119-139, Aug., 1997.
[20] Y. Freund and R. E. Schapire, "A Short Introduction to Boosting," Journal of the Japanese Society for Artificial Intelligence, Vol.14, No.5, pp.771-780, 1999.
[21] P. Viola and M. Jones, "Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade," Advances in Neural Information Processing Systems, Vol.2, pp.1311-1318, 2002.
[22] S. Jin, D. Kim, T. T. Nguyen, D. Kim, M. Kim, and J. W. Jeon, "Design and Implementation of a Pipelined Datapath for High-Speed Face Detection Using FPGA," IEEE Transactions on Industrial Informatics, Vol.8, No.1, pp.158-167, Feb., 2012.
[23] V. Jain and E. Learned-Miller, "FDDB: A Benchmark for Face Detection in Unconstrained Settings," Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst, 2010. http://vis-www.cs.umass.edu/fddb
[24] K. Peng, L. Chen, S. Ruan, and G. Kukharev, "A Robust Algorithm for Eye Detection on Gray Intensity Face without Spectacles," Journal of Computer Science and Technology, Vol.5, No.3, pp.127-132, 2005.
[25] P. Wang and Q. Ji, "Learning Discriminant Features for Multi-View Face and Eye Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[26] Q. Wang and J. Yang, "Eye Detection in Facial Images with Unconstrained Background," Journal of Pattern Recognition Research, Vol.1, pp.55-62, 2006.
[27] T. Akashi, Y. Wakasa, K. Tanaka, S. Karungaru, and M. Fukumi, "Using Genetic Algorithm for Eye Detection and Tracking in Video Sequence," Journal of Systemics, Cybernetics, and Informatics, Vol.5, No.2, pp.72-78, 2007.
[28] H.-J. Kim and W.-Y. Kim, "Eye Detection in Facial Images using Zernike Moments with SVM," ETRI Journal, Vol.30, No.2, 2008.
[29] M. Shafi and P. W. H. Chung, "A Hybrid Method for Eyes Detection in Facial Images," International Journal of Electrical, Computer, and Systems Engineering, Vol.42, pp.231-236, 2009.
[30] H. Liu and Q. Liu, "Robust Real-Time Eye Detection and Tracking for Rotated Facial Images under Complex Conditions," in Proceedings of the International Conference on Natural Computation, 2010.

Dongkyun Kim received B.S. and M.S. degrees in electrical and computer engineering from Sungkyunkwan University, Suwon, Korea, in 2007 and 2009, respectively. He is currently working toward a Ph.D. degree in the School of Information and Communication Engineering, Sungkyunkwan University. His research interests include image/speech signal processing, embedded systems, and real-time applications.

Junhee Jung received a B.S. degree from the School of Information and Communication Engineering, Sungkyunkwan University, Korea, in 2009, and an M.S. degree from the Department of Electrical and Computer Engineering, Sungkyunkwan University, Korea, in 2011. His interests include image processing, computer vision, and SoC.

Thuy Tuong Nguyen received a B.S. (magna cum laude) degree in computer science from Nong Lam University, Ho Chi Minh City, Vietnam, in 2006, and M.S. and Ph.D. degrees in electrical and computer engineering from Sungkyunkwan University, Suwon, Korea, in 2009 and 2012, respectively. His research interests include computer vision, image processing, and graphics processing unit computing.

Daijin Kim received a B.S. degree in electronic engineering from Yonsei University, Seoul, Korea, in 1981, an M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, in 1984, and a Ph.D. degree in electrical and computer engineering from Syracuse University, Syracuse, NY, in 1991. He is currently a Professor in the Department of Computer Science and Engineering at POSTECH. His research interests include intelligent fuzzy systems, soft computing, genetic algorithms, and the hybridization of evolution and learning.

Munsang Kim received his B.S. and M.S. degrees in mechanical engineering from Seoul National University in 1980 and 1982, respectively, and the Dr.-Ing. degree in robotics from the Technical University of Berlin, Germany, in 1987. Since 1987, he has been working as a research scientist at the Korea Institute of Science and Technology (KIST) in Korea. He has led the Advanced Robotics Research Center since 2000 and became the director of the Intelligent Robot - The Frontier 21 Program in 2003. His current research interests are the design and control of novel mobile manipulation systems, haptic device design and control, and sensor applications in intelligent robots.

Key Ho Kwon received B.S., M.S., and Ph.D. degrees from the Department of Electronics Engineering, Seoul National University, Korea, in 1975, 1978, and 1989, respectively. He is currently a Professor in the School of Information and Communication Engineering, Sungkyunkwan University. His research interests include artificial intelligence, fuzzy theory, neural networks, and genetic evolutionary algorithms.

Jae Wook Jeon received B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1984 and 1986, respectively, and a Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, in 1990. From 1990 to 1994, he was a Senior Researcher with Samsung Electronics, Suwon, Korea. Since 1994, he has been with the School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, first as an Assistant Professor and currently as a Professor. His research interests include robotics, embedded systems, and factory automation.