
HUMAN COMPUTER INTERFACE BASED ON

FACE TRACKING FOR PHYSICALLY

CHALLENGED USERS

A PROJECT REPORT

Submitted by

ABDUL ASIM A.

AFSHAN S.

ANAND R.

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

INFORMATION TECHNOLOGY

B.S. ABDUR RAHMAN CRESCENT ENGINEERING COLLEGE,

VANDALUR

ANNA UNIVERSITY: CHENNAI 600 025

APRIL 2009


ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “HUMAN COMPUTER INTERFACE BASED

ON FACE TRACKING FOR PHYSICALLY CHALLENGED USERS” is the

bonafide work of “ABDUL ASIM A. (40405205001), AFSHAN S. (40405205005) &

ANAND R. (40405205008)” who carried out the project work under my

supervision.

SIGNATURE                                  SIGNATURE

Dr. T. R. RANGASWAMY                       Dr. ANGELINA GEETHA
HEAD OF THE DEPARTMENT                     SUPERVISOR
                                           Professor
Department of Information Technology      Department of Computer Science
B.S.A. Crescent Engineering College        B.S.A. Crescent Engineering College
Seethakathi Estate                         Seethakathi Estate
G.S.T. Road, Vandalur                      G.S.T. Road, Vandalur
Chennai - 600 048, India                   Chennai - 600 048, India


ANNA UNIVERSITY: CHENNAI 600 025

VIVA VOCE EXAMINATION

The viva-voce examination of the following students who have submitted the

project work “HUMAN COMPUTER INTERFACE BASED ON FACE TRACKING

FOR PHYSICALLY CHALLENGED USERS” is held on _____________

ABDUL ASIM A. (40405205001)

AFSHAN S. (40405205005)

ANAND R. (40405205008)

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

We are grateful to our Principal, Dr. V. M. PERIASAMY, B.S.A. Crescent

Engineering College, for providing us an excellent environment to carry out our course

successfully.

We are deeply indebted to our beloved Head of the Department, Dr. T. R.

RANGASWAMY, Department of Information Technology, who moulded us both

technically and morally for achieving greater success in life.

We express our thanks to our project coordinator Ms. R. REVATHY, Senior Lecturer,

Department of Information Technology, for her valuable suggestions at every stage of

our project.

We record our sincere thanks to our guide Dr. ANGELINA GEETHA, Professor,

Department of Computer Science, for being instrumental in the completion of our

project with her exemplary guidance.

We thank all the staff members of our department for their valuable support and

assistance at various stages of our project development.


TABLE OF CONTENTS

CHAPTER NO. TITLE

ABSTRACT

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

1. INTRODUCTION

1.1 Feature Detection

1.2 Face Detection

1.3 Algorithms on Face Detection

1.4 Human Computer Interaction for the Physically Challenged

1.5 HCI Based on Mouse Movements

1.6 Related Work

2. PROBLEM DEFINITION

3. DEVELOPMENT PROCESS

3.1 Requirement Analysis and Specification

3.1.1 Input Requirements

3.1.2 Output Requirements

3.1.3 Functional Requirements

3.2 Resource Requirements

3.2.1 Hardware

3.2.2 Software

3.3 Design

3.3.1 System Architecture

3.3.2 Detailed Design

3.3.2.1 User Interface

3.3.2.2 Module Description

3.4 Implementation

3.5 Testing

4. APPLICATIONS AND FUTURE ENHANCEMENTS

5. CONCLUSION

APPENDIX A – SCREENSHOTS

REFERENCES


ABSTRACT

Physically challenged people find it difficult to use a computer because

information is presented in an inaccessible form to them. Though many forms of

computer access are available for disabled people, these systems are expensive and

require sophisticated hardware support. In this context, this system focuses on helping

quadriplegic and non-verbal users. The challenge is to develop a Human Computer

Interface for such users which is inexpensive and easy to implement. Human Computer

Interface is a discipline concerned with the design, evaluation and implementation of

interactive computing systems for human use and with the study of major phenomena

surrounding them. We propose an interface for people with severe disabilities based on

face tracking. Body features like the eyes and the lips may also be used for implementing

a human computer interface but with some limitations. In eye tracking, the motion of the

pupil is hard to track with a web camera which would be the primary mode of input in the

proposed system. For a physically challenged user, moving the face itself demands

greater effort and hence the finer intricacies of eyeball and lip movement cannot be considered.

The system depends on a web camera for input and hence would be affordable for the

target users. User friendliness is enhanced as the system is devoid of any sophisticated

hardware requirement.


LIST OF TABLES

S.No Table Name

1. Hardware resource requirement table

2. Software resource requirement table


LIST OF FIGURES

Figure No. Figure Name

1.1 Head tracking system

3.1 Architecture diagram

3.2 System flow diagram

3.3 Code snippet for webcam capture

3.4 Code snippet for face detection

3.5 Code snippet for mouse pointer movement

3.6 Code snippet for playing video clips

3.7 Message board

3.8 Algorithm flow diagram


LIST OF ABBREVIATIONS

S.No Acronym Expansion

1. CAMSHIFT Continuously Adaptive Mean Shift

2. HCI Human Computer Interface

3. SDLC Software Development Life Cycle

4. GUI Graphical User Interface

5. MFC Microsoft Foundation Classes

6. CLR Common Language Runtime

7. ATL Active Template Library

8. OpenCV Open Source Computer Vision

9. COM Component Object Model


1. INTRODUCTION

1.1 Feature Detection

Feature detection is a process by which specialized nerve cells in the brain respond

to specific features of a visual stimulus, such as lines, edges, angles, or movement. The

nerve cells fire selectively in response to stimuli that have specific characteristics.

Feature detection was discovered by David Hubel and Torsten Wiesel of Harvard

University.

In computer vision and image processing the concept of feature detection refers to

methods that aim at computing abstractions of image information and making local

decisions at every image point whether there is an image feature of a given type at that

point or not. The resulting features will be subsets of the image domain, often in the form

of isolated points, continuous curves or connected regions.

Feature detection is a low-level image processing operation. That is, it is usually

performed as the first operation on an image, and examines every pixel to see if there is a

feature present at that pixel. If this is part of a larger algorithm, then the algorithm will

typically only examine the image in the region of the features. As a built-in pre-requisite

to feature detection, the input image is usually smoothed by a Gaussian kernel in a scale-

space representation and one or several feature images are computed, often expressed in

terms of local derivative operations. Occasionally, a higher level algorithm may be used

to guide the feature detection stage, so that only certain parts of the image are searched

for features.

Once features have been detected, a local image patch around the feature can be

extracted. This extraction may involve quite considerable amounts of image processing.

The result is known as a feature descriptor or feature vector.
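As an aside, this pipeline maps directly onto the OpenCV library used later in this report. The fragment below is an illustrative sketch only (not part of the developed system); the file name and threshold values are assumptions.

// Illustrative low-level pipeline using the OpenCV 1.x C API:
// Gaussian smoothing followed by an edge feature image.
IplImage* img = cvLoadImage( "input.jpg", CV_LOAD_IMAGE_GRAYSCALE );
IplImage* smoothed = cvCreateImage( cvGetSize( img ), IPL_DEPTH_8U, 1 );
IplImage* edges = cvCreateImage( cvGetSize( img ), IPL_DEPTH_8U, 1 );
cvSmooth( img, smoothed, CV_GAUSSIAN, 5, 5, 0, 0 ); // Gaussian kernel smoothing
cvCanny( smoothed, edges, 50, 150, 3 );             // edge features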

Types of tracking:

Eye Tracking:

Eye tracking is the process of measuring either the point of gaze or the

motion of an eye relative to the head. An eye tracker is a device for measuring eye

positions and eye movements. There are a number of methods for measuring eye

movements. The most popular variant uses video images from which the eye position is

extracted. Other methods use search coils or are based on the electro-oculogram. Two

general types of eye tracking techniques are used: Bright Pupil and Dark Pupil. Their

difference is based on the location of the illumination source with respect to the optics. If

the illumination is coaxial with the optical path, then the eye acts as a retro-reflector as

the light reflects off the retina creating a bright pupil effect similar to red eye. If the

illumination source is offset from the optical path, then the pupil appears dark because

the retro-reflection from the retina is directed away from the camera.

Head Tracking:

Head tracking technology consists of a device transmitting a signal from atop the

computer monitor and tracking a reflector placed on the user's head or eyeglasses. As a

mouse alternative, this allows the person to control the mouse cursor by moving his/her

head. Once calibrated, the movement of the user's head determines the direction in which

the onscreen cursor travels. An example of a head tracking system is given in Figure 1.1.

Figure 1.1: Head tracking system

1.2 Face Detection


Face detection is a computer technology that determines the locations and sizes of

human faces in arbitrary (digital) images. It detects facial features and ignores anything

else, such as buildings, trees and bodies.

Face detection can be regarded as a more general case of face localization: in face

localization, the task is to find the locations and sizes of a known number of faces

(usually one); in face detection, one does not have this additional information.

Early face-detection algorithms focused on the detection of frontal human faces,

whereas newer algorithms attempt to solve the more general and difficult problem of

multi-view face detection which is the detection of faces that are either rotated along the

axis from the face to the observer (in-plane rotation), or rotated along the vertical or left-

right axis (out-of-plane rotation) or both.

Face detection is used in biometrics, often as a part of (or together with) a facial

recognition system. It is also used in video surveillance, human computer interface and

image database management. Some recent digital cameras use face detection for

autofocus. Also, face detection is useful for selecting regions of interest in photo

slideshows that use a pan-and-scale effect.

1.3 Algorithms on Face Detection

Neural Network-Based Face Detection by Rowley, Baluja and Kanade:

This is a neural network-based algorithm to detect upright, frontal views of faces in

gray-scale images. The algorithm works by applying one or more neural networks

directly to portions of the input image, and arbitrating their results. Each network is

trained to output the presence or absence of a face. The algorithms and training methods

are designed to be general, with little customization for faces. Many face detection

researchers have used the idea that facial images can be characterized directly in terms of

pixel intensities. These images can be characterized by probabilistic models of the set of


face images or implicitly by neural networks or other mechanisms. The parameters for

these models are adjusted either automatically from example images or by hand.

Algorithm by Henry Schneiderman and Takeo Kanade

This algorithm is a statistical method for three dimensional object detection. The

statistics of both object appearance and non-object appearance are represented using histograms. Each

histogram represents the joint statistics of a subset of wavelet coefficients and their

position on the object. This approach uses many such histograms to represent a wide

variety of visual attributes. The algorithm is the first of its kind to reliably detect human

faces with out-of-plane rotation.

CAMSHIFT Algorithm

CAMSHIFT stands for "Continuously Adaptive Mean Shift". It combines the

basic Mean Shift algorithm with an adaptive region-sizing step. The kernel is a simple

step function applied to a skin-probability map. The skin probability of each image pixel

is based on color using a method called histogram back projection. Color is represented

as Hue from the HSV color model. While it is a very fast and simple method of tracking,

because CAMSHIFT tracks the center and size of the probability distribution of an

object, it is only as good as the probability distribution that is produced for the object.
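To make the method concrete, the fragment below sketches one CAMSHIFT iteration with the OpenCV 1.x C API. It is an illustration rather than part of our system, and assumes that hsv, hue and backproject are pre-allocated images of the frame size, that hist is a hue histogram of the tracked face, and that track_window comes from an earlier detection.

// One CAMSHIFT iteration (sketch; setup assumed as described above)
cvCvtColor( frame, hsv, CV_BGR2HSV );          // work in the HSV colour model
cvSplit( hsv, hue, 0, 0, 0 );                  // isolate the Hue plane
cvCalcBackProject( &hue, backproject, hist );  // skin-probability map
CvConnectedComp comp;
CvBox2D box;
cvCamShift( backproject, track_window,
            cvTermCriteria( CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1 ),
            &comp, &box );
track_window = comp.rect;                      // adapt the search window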

1.4 Human Computer Interaction for the Physically Challenged

Human–computer interface (HCI) is the study of interaction between people

(users) and computers. It is often regarded as the intersection of computer science,

behavioral sciences, design and several other fields of study. Interaction between users

and computers occurs at the user interface (or simply interface), which includes both

software and hardware, for example, general-purpose computer peripherals and large-

scale mechanical systems, such as aircraft and power plants.


Persons with severe motion impairment, such as paraplegics and quadriplegics, face

difficulty in accessing computer-based systems since they cannot use conventional

computer access devices like the mouse or keyboard. Alternate computer interfaces based

on tracking of body features need to be developed for these users. The challenge lies in

designing a system which would serve as a general interface between computers and

physically challenged users.

1.5 HCI Based on Mouse Movements:

Pointing devices like the mouse and trackball enable users to control a pointer and

interact with a graphical user interface. The current human-computer interaction mode,

based primarily on the keyboard and the mouse, has seen little change since the

advent of modern computing. Currently, computers come with cameras as standard

equipment. Hence it is desirable to employ them in designing next-generation human

computer interaction devices. The feasibility of interfaces based on speech driven input

has also been extensively investigated.

Relying on input based on human features has opened up the possibility of

developing interfaces for people who cannot use the keyboard or mouse due to severe

disabilities. Such systems make use of human features such as the head, eyes, lips or face

for tracking the movement of the user and translating the movements into mouse

movements on the screen. The purpose of this project is to develop an interface for

quadriplegic and non-verbal users.

1.6 Related Work

In the works of James Gips, Margrit Betke and Peter Fleming (2000), preliminary

investigations have been carried out for the design of a human computer interface for

quadriplegic and non-verbal users. The system has been broken down into two main

components. The first component is the Vision Computer which receives real-time input

from a camera mounted on the monitor. The second component is the User’s Computer

which runs a special driver program in the background to translate the user’s movement

from the input device into mouse movements on the screen.

A camera mouse system was developed by James Gips, Margrit Betke and Peter

Fleming (2002). The system makes use of body features like the tip of the user’s nose or

finger or face to track the position of the mouse. Various body features are examined for

tracking reliability and user convenience. The visual tracking algorithm used in this

system is based on cropping an online template of the tracked feature from the current

image frame and testing where this template correlates in the subsequent frame. The

location of the highest correlation is interpreted as the new location of the feature in the

subsequent frame. Our system adopts part of this algorithm's modules for the regular

updating of image frames.

We study the working of the CAMSHIFT algorithm proposed by Gary R. Bradski

(1998) to develop a Perceptual User Interface. Perceptual interfaces are the ones in which

the computer is given the ability to sense and produce analogs of the human senses. The

CAMSHIFT algorithm is a modification of the mean shift algorithm which is based on

probability distributions. The Continuously Adaptive Mean Shift (CAMSHIFT) algorithm

deals with dynamically changing color probability distributions derived from video

frames. Since CAMSHIFT relies on color distribution alone, errors in color will cause

errors in tracking.

A face detection algorithm based on skin color has been proposed by Sanjay Singh,

D.S. Chauhan, Mayank Vatsa and Richa Singh (2003). The authors have discussed

various algorithms based on skin color. Three main color spaces of RGB, YCbCr and

HSI have been combined to get a new skin-color-based face detection algorithm which

achieves higher accuracy. Our system incorporates the face localization approach discussed

in this publication.


In the works of Rajesh Kumar and Anupam Kumar (2008), alternate input systems

to replace the traditional mouse and keyboard are discussed. The authors have developed

an input system which uses the head and eyes to track the movements of the user. The

algorithm is based upon image matching using correlation coefficients. The system

comprises an image tracer module, and the cursor position is determined by calculating

the correlation coefficient of a tracing window in image space.

Ian R. Fasel and Javier R. Movellan (2002) have conducted a comprehensive

analysis of some techniques used in neurally inspired face detectors. Algorithms such as

SNoW, AdaBoost and Bootstrap have been studied. The AdaBoost algorithm is based on

active sampling of images whereas its counterparts use random sampling. It has been

experimentally shown that AdaBoost delivers consistent performance under various

conditions.

In the works of Zhaomin Zhu, Takashi Morimoto, Hidekazu Adachi, Osamu

Kiriyama, Tetsushi Koide and Hans Juergen Mattausch (2005), a face detection system

has been proposed based on Haar-like features. The detection technique is based on the

idea of the wavelet template that defines the shape of an object in terms of a subset of the

wavelet coefficients of the image. The object in this case is the human face.

Our system makes use of the Haar face detection algorithm to recognize and track

faces from real time video input. The main tasks involved are webcam capture, face

detection and translation of facial movements into mouse movements. A web camera is a

low-resolution capture device. The Haar face detection algorithm processes the video

feed using a large number of classifier evaluations to localize faces. This helps in

achieving a high degree of accuracy.

2. PROBLEM DEFINITION


People with severe disabilities resulting from birth or accidents or from

degenerative diseases and bedridden patients have been excluded from access to

computers and even lack proper means of communication with fellow human beings.

Information is presented in an inaccessible form to them. They are unable to speak and

have very little or no voluntary muscle control. In most cases, these people are able to

move only their heads. Their level of mental functioning might not be known because of

their inability to communicate. People with severe physical disabilities often are isolated,

spending hours in bed or in a wheelchair at home or in an institutional setting.

Computer and communication technology can make all the difference in the world

for people with profound physical disabilities. Our approach is to develop a computer

interface for the disabled using facial tracking. The challenge is to develop a low cost

system devoid of any sophisticated hardware for input. The system should be free from

any special hardware to track the desired feature as this may cause inconvenience to the

user.

The facial movements of the user are captured using a webcam and translated into

mouse pointer movements after preprocessing and applying a face detection algorithm.

Thus by moving the face, the user would be able to control the mouse. The interface

contains options for raising an alarm, summoning a nurse and playing audio and video for

entertainment. An on-screen message board has also been provided to enable the user to

communicate effectively.

3. DEVELOPMENT PROCESS


A software development process is a structure imposed on the development of a

software product. The activities concerned with the development of software are

collectively known as Software Development Life Cycle (SDLC). SDLC is any logical

process used by a systems analyst to develop an information system, including

requirements, validation, training, and user ownership. An SDLC should result in a high

quality system that meets or exceeds customer expectations, reaches completion within

time and cost estimates, and works effectively and efficiently in the current and planned information technology infrastructure.

3.1 Requirement Analysis and Specifications

The requirement engineering process consists of feasibility study, requirements

elicitation and analysis, requirements specification, requirements validation and

requirements management. Requirements elicitation and analysis is an iterative process

that can be represented as a spiral of activities, namely requirements discovery,

requirements classification and organization, requirements negotiation and requirements

documentation.

3.1.1 Input Requirements

The input for the human computer interface will be obtained from a web camera.

Since the interface would solely depend on the camera, care should be taken in choosing

the camera. A web camera is chosen over other media of video capture for

two reasons. First, a web camera is less expensive compared to other visual input devices

and this makes the system affordable to every individual. Also the web camera does not

require any specialized drivers or software support and this makes it easy for the

developer to access real-time video feeds.

3.1.2 Output Requirements

The output will be the movement of the mouse pointer on the interface. The video

stream from the camera will be displayed at the center of the interface along with the

tracking of the face.

3.1.3 Functional Requirements

The facial movements of the user are captured through the camera in Visual C++.

The live video stream is fed to the face detection algorithm. The detected face is given as

input to the tracker module which translates the facial movements into mouse pointer

movements. This can then be used to access the user interface.

3.2 RESOURCE REQUIREMENTS

Software requirements engineering is a sub-field of software engineering that deals with the

elicitation, analysis, specification, and validation of requirements for software.

Requirements analysis, in systems engineering and software engineering, encompasses

those tasks that go into determining the needs or conditions to meet for a new or altered

product, taking account of the possibly conflicting requirements of the various

stakeholders, such as beneficiaries or users. Requirements analysis is critical to the

success of a development project. Requirements must be actionable, measurable, testable,

related to identified business needs or opportunities, and defined to a level of detail

sufficient for system design.

3.2.1 Hardware

The minimum hardware requirements for this project are listed in Table 1.

Table 1: Hardware Requirements


Hardware Requirement

Processor Intel Pentium IV or AMD – 1.8 GHz

Memory 1 GB RAM

Hard Disk 1 GB

Video Capture Device (Input) Logitech or Microsoft web camera

3.2.2 Software

The minimum software requirements for this project are listed in Table 2.

Table 2: Software Requirements

Software Requirement

Operating System Windows 2000/XP

Runtime Package Microsoft Visual C++, Intel OpenCV

Webcam Drivers Logitech/Microsoft SDK

3.3 DESIGN

Software design is a process of problem-solving and planning for a software

solution. After the purpose and specifications of software are determined, software

developers will design or employ designers to develop a plan for a solution. It includes


low-level component and algorithm implementation issues as well as the architectural

view.

3.3.1 System Architecture

Figure 3.1: Architecture Diagram

The architecture of the system is represented in Figure 3.1. The system receives

real-time input from the user via a web camera. The video stream is accessed via the

webcam capture module. The vendor-supplied webcam software cannot be used for

interfacing the webcam and the face detection module.

The input from the camera is given to the face detection module. The core of the

face detection module contains the algorithm which works on localizing the facial


segments from the rest of the image. The algorithm is adapted to detect faces from

streaming video feeds.

After the face has been detected in the video stream, the movements of the face are

translated into mouse cursor movements on the screen and updated accordingly in real-

time. The position of the face is converted into onscreen coordinates and this is mapped

into mouse pointer coordinates in the tracker module. Hence, when the user moves his

face, the mouse cursor is moved correspondingly. This tracking module is interfaced with

the Graphical User Interface (GUI). Using the mouse movements, the user can interact

with the application interface.
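A minimal sketch of such a mapping is given below. The helper function and the proportional scaling are our own illustration; the implemented tracker (Figure 3.5) instead multiplies the face coordinates by a fixed factor.

#include <windows.h>

// Hypothetical helper: map the face-box position (fx, fy) in a
// frameW x frameH camera image onto the desktop resolution.
void MoveCursorToFace( int fx, int fy, int frameW, int frameH )
{
    int screenW = GetSystemMetrics( SM_CXSCREEN );
    int screenH = GetSystemMetrics( SM_CYSCREEN );
    SetCursorPos( fx * screenW / frameW, fy * screenH / frameH );
}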

3.3.2 Detailed Design

Our system provides an efficient way for bedridden people to interact with a

computer and also provides an efficient communication system. The main tasks to be

accomplished in the development of the proposed system are as follows:

• Accessing the video stream from the video camera in real time

• Detecting the facial motion from the captured video

• Development of the user interface to aid the target users

• Translating the facial motion into an input format which can be used to

manipulate the user interface

• Triggering of control signals based on the translated input format

3.3.2.1 User Interface

The system has been developed in Microsoft Visual C++. The system can be

executed by running the project executable file. The web camera has to be set up and


initialized before executing the system. The system will automatically detect the web

camera provided there is only one active camera at execution time.

The web camera must be fixed and focused on the facial region of the target user.

Care should be taken to align the camera in this way. The system tracks the signals

captured by the web camera, analyses and detects the face region. As the video stream

progresses, by applying the algorithm, the facial movement is detected. Once face

detection has been established, control passes to the mouse pointer and the user is able to

move the mouse pointer by moving his/her face.

At the center of the interface is a display window which shows the real time video

stream from the web camera. It displays the detected face which is updated constantly in

real-time. The interface has buttons to invoke various functions. The user is able to raise

an alarm, summon a nurse or play audio and video for entertainment purposes. An

onscreen message board can also be invoked for communication purposes. The invoked

function can be stopped using the stop button and the application can be closed using the

exit button provided in the interface.

3.3.2.2 Module Description

The basic flow of the system is represented in Figure 3.2. The Human computer

interface for physically challenged users is made possible by the video feed from the web

camera. The modules of the proposed system are as follows:

1. Webcam Capture module

2. Face Detector

3. Tracker module


4. Application Interface

Figure 3.2: System Flow diagram

Webcam Capture module:

The input for the system is captured using the web camera. Lighting conditions

should also be favourable. The bundled software supplied with the camera can be used to

capture images and video. But this cannot be interfaced with the application to be

developed. Thus we capture the video stream from the camera in Visual C++ using

Microsoft DirectShow. Microsoft DirectShow is a part of the Microsoft DirectX SDK. It

is a set of low-level application programming interfaces for creating games and other

high performance multimedia applications. DirectShow automatically detects and uses

audio and video acceleration whenever available. The captured video stream is displayed


at the center of the user interface. The video stream is given as input to the face detection

module. The code for webcam capture is given in Figure 3.3.

// Connect to the first available camera
CvCapture* capture = cvCaptureFromCAM( -1 );

// Grab and retrieve the current frame as an IplImage
cvGrabFrame( capture );
IplImage* frame = cvRetrieveFrame( capture );

// Allocate frame_copy with the same size and format as the frame
if( !frame_copy )
    frame_copy = cvCreateImage( cvSize( frame->width, frame->height ),
                                IPL_DEPTH_8U, frame->nChannels );

Figure 3.3 : Code snippet for webcam capture
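In practice this snippet runs inside a frame loop. The loop below is a sketch modelled on the standard OpenCV samples rather than taken from the project source; detect_and_draw stands in for the face detection step of Figure 3.4.

// Assumed capture loop around the snippet of Figure 3.3
for(;;)
{
    if( !cvGrabFrame( capture ) )
        break;
    frame = cvRetrieveFrame( capture );
    if( !frame )
        break;
    // Copy the frame, flipping it if the image origin is bottom-left
    if( frame->origin == IPL_ORIGIN_TL )
        cvCopy( frame, frame_copy, 0 );
    else
        cvFlip( frame, frame_copy, 0 );
    detect_and_draw( frame_copy );   // face detection (see Figure 3.4)
    if( cvWaitKey( 10 ) >= 0 )       // exit on any key press
        break;
}
cvReleaseCapture( &capture );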

Face Detector module:

The facial movements of the user are captured from the web camera and given to

the face detector module. The algorithm used in our system is the Multi-view Face

Detection and Recognition Algorithm using Haar-like Features. Haar-like features are

digital image features used in object recognition. They owe their name to their intuitive

similarity with Haar wavelets. The feature set considers rectangular regions of the image

and sums up the pixels in this region. This sum is used to categorize images: all images

whose Haar-like feature in this rectangular region lies within a certain range of values fall

into one category, and those outside this range into another. This roughly divides the set

of images into those containing faces and those that do not. Once a face has been detected,

a coloured box is drawn around it to localize it. The algorithm constantly localizes the

face in the dynamic video stream.

// Load the trained frontal-face Haar cascade and allocate work storage
const char* cascade_name = "haarcascade_frontalface_alt.xml";
CvHaarClassifierCascade* cascade =
    (CvHaarClassifierCascade*)cvLoad( cascade_name, 0, 0, 0 );
CvMemStorage* storage = cvCreateMemStorage( 0 );

// Create a scaled-down working image from the input image
IplImage* temp = cvCreateImage( cvSize( img->width / scale, img->height / scale ), 8, 3 );

// Detect face objects in the image
CvSeq* faces = cvHaarDetectObjects( img, cascade, storage, 1.1, 2,
                                    CV_HAAR_DO_CANNY_PRUNING, cvSize( 40, 40 ) );

Figure 3.4 : Code snippet for face detection

Tracker module:

The face detector module draws a square around the localized face. The

coordinates of the square are passed to the SetCursorPos function. This

enables the mouse pointer to move when the user moves his/her face. The coordinates are

multiplied by a scaling factor in order to enhance mouse movement. Mouse clicking

function is implemented using a time delay. When the mouse pointer hovers over a

button for a specified time, the button gets clicked. The code snippet for mouse control is given

in Figure 3.5.

// Face box coordinates from the detected rectangle r, restored to
// full-frame coordinates
pt1.x = r->x * scale;
pt2.x = ( r->x + r->width ) * scale;
pt1.y = r->y * scale;
pt2.y = ( r->y + r->height ) * scale;

// Map the face position to screen coordinates (fixed scaling factor)
// and move the pointer
pt3.x = pt1.x * 7;
pt3.y = pt1.y * 7;
SetCursorPos( pt3.x, pt3.y );

// Mouse clicking: synthesize a left button press and release
mouse_event( MOUSEEVENTF_LEFTDOWN, 0, 0, 0, GetMessageExtraInfo() );
mouse_event( MOUSEEVENTF_LEFTUP, 0, 0, 0, GetMessageExtraInfo() );

Figure 3.5 : Code snippet for mouse pointer movement
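The dwell-click timing can be sketched as follows. This is our illustration of the behaviour described above, not the project source; the helper name, dwell time and jitter radius are assumptions.

#include <windows.h>
#include <stdlib.h>

#define DWELL_MS     2000   // hover time before a click fires (assumed)
#define DWELL_RADIUS 15     // tolerated pointer jitter in pixels (assumed)

// Call once per frame with the current cursor position.
void CheckDwellClick( POINT cur )
{
    static POINT anchor = { -10000, -10000 };
    static DWORD start_ms = 0;
    if( abs( cur.x - anchor.x ) > DWELL_RADIUS ||
        abs( cur.y - anchor.y ) > DWELL_RADIUS )
    {
        anchor = cur;                 // pointer moved: restart the timer
        start_ms = GetTickCount();
    }
    else if( GetTickCount() - start_ms > DWELL_MS )
    {
        // Hovered long enough: synthesize a left click
        mouse_event( MOUSEEVENTF_LEFTDOWN, 0, 0, 0, GetMessageExtraInfo() );
        mouse_event( MOUSEEVENTF_LEFTUP, 0, 0, 0, GetMessageExtraInfo() );
        start_ms = GetTickCount();    // avoid repeated clicks while hovering
    }
}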

Application Interface:

The user interface is a Microsoft Foundation Classes (MFC) dialog-based

application built in VC++. A face tracking display is present at the center of the user

interface to display the facial movements of the user. The following function buttons are

present around the face tracking display.

Emergency button - raises an alarm when clicked

Video - plays a small video clip as entertainment. The code

snippet for playing videos is given in Figure 3.6.

// Create an MCI window as a child control and play the clip
clock1 = MCIWndCreate( GetSafeHwnd(), AfxGetInstanceHandle(),
                       WS_VISIBLE | WS_CHILD, "globe.avi" );

Figure 3.6 : Code snippet for playing video clips

Audio - plays a small audio clip as entertainment

Message board - enables the user to display small messages to express their

needs. A screenshot is provided in Figure 3.7.


Figure 3.7 : Message board

Stop - stops the currently invoked function

Exit - used to exit the application

3.4 IMPLEMENTATION

Software implementation involves compilation and execution of the designed

system. Modular and subsystem programming code will be accomplished during this

stage. Unit testing and module testing are done in this stage by the developers. This stage

is intermingled with the next in that individual modules will need testing before

integration into the main project. Planning in the software life cycle involves setting goals,

defining targets, establishing schedules, and estimating budgets for an entire software

project.

Microsoft Visual C++


Microsoft Visual C++ 2005 provides a powerful and flexible development

environment for creating Microsoft Windows–based and Microsoft .NET–based

applications. It can be used as an integrated development system, or as a set of individual

tools. Visual C++ comprises these components:

The Visual C++ 2005 compiler tools - The compiler has new features supporting

developers that target virtual machine platforms like the Common Language Runtime

(CLR). There are now compilers targeting x64 and Itanium. The compiler continues to

support targeting x86 machines directly, and optimizes performance for both platforms.

The Visual C++ 2005 Libraries - This includes the industry-standard Active

Template Library (ATL), the MFC libraries, and standard libraries such as the Standard

C++ Library, and the C RunTime Library, which has been extended to provide security

enhanced alternatives to functions known to pose security issues. A new library, the C++

Support Library, is designed to simplify programs that target the CLR.

The Visual C++ 2005 Development Environment - Although the C++ compiler

tools and libraries can be used from the command-line, the development environment

provides powerful support for project management and configuration (including better

support for large projects), source code editing, source code browsing, and debugging

tools. This environment also supports IntelliSense, which makes informed, context-

sensitive suggestions as code is being authored.

In addition to conventional graphical user-interface applications, Visual C++

enables developers to build Web applications, smart-client Windows-based applications,

and solutions for thin-client and smart-client mobile devices. C++ is the world's most

popular systems-level language, and Visual C++ gives developers a world-class tool with

which to build software.

Intel OpenCV Library


The Intel Open Source Computer Vision (OpenCV) library is a computer vision

library originally developed by Intel. It is free for commercial and research use under a

BSD license. The library is cross-platform, and runs on Windows, Mac OS X, Linux,

PSP, VCRT (Real-Time OS on Smart camera) and other embedded devices. It focuses

mainly on real-time image processing; as such, if it finds Intel's Integrated Performance

Primitives on the system, it will use these commercial optimized routines to accelerate

itself. Officially launched in 1999, the OpenCV project was initially an Intel Research

initiative to advance CPU-intensive applications, part of a series of projects including

real-time ray tracing and 3D display walls. The library is mainly written in C, which

makes it portable to some specific platforms such as digital signal processors. But

wrappers for languages such as C# and Python have been developed to encourage

adoption by a wider audience. Our system makes use of some functions present in this

library in the form of DLLs.

Microsoft DirectShow:

DirectShow, codenamed Quartz, is a multimedia framework and API produced by

Microsoft for software developers to perform various operations with media files or

streams. It is the replacement for Microsoft's earlier Video for Windows technology.

Based on the Microsoft Windows Component Object Model (COM) framework,

DirectShow provides a common interface for media across many programming

languages, and is an extensible, filter-based framework that can render or record media

files on demand at the request of the user or developer. The DirectShow development

tools and documentation were originally distributed as part of the DirectX SDK.

Currently, they are distributed as part of the Windows SDK. DirectShow's counterparts

on other platforms include Apple's QuickTime framework and various Linux multimedia

frameworks such as GStreamer or Xine.


Working of the Algorithm:

The algorithm used in our system is the Multi-view Face Detection and

Recognition Algorithm using Haar-like Features. This algorithm is designed for still

images. It has been modified to detect faces from streaming video feeds.

The working of the algorithm is as follows.

Figure 3.8: Algorithm Flow Diagram (Input Image → Sum Pixel Calculation → Rectangle Node Selection → Haar-like Feature Calculation, with scaling against the Haar-like features in the database → Haar-like Feature Comparison → Face Detection)

The overall algorithm is depicted in Figure 3.8. The detection technique is based

on the idea of a wavelet template that defines the shape of an object in terms of a subset

of the wavelet coefficients of the image.

The input image is scanned across location and scale using a scaling factor of 1.1.

At each location, an independent decision is made regarding the presence of a face.


This leads to a large number of classifier evaluations. Each classifier is a simple function

of rectangular sums followed by a threshold.
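The rectangular sums are what make each classifier cheap to evaluate: given an integral image (computed once per frame, for example by cvIntegral in OpenCV), any rectangle sum costs four lookups regardless of its size. The function below is our illustration of this standard technique, not code from the project.

// Sum of the w x h rectangle at (x, y), given an exclusive integral
// image ii (ii[y][x] = sum of all pixels above and to the left) with
// row stride `stride`.
int rect_sum( const int* ii, int stride, int x, int y, int w, int h )
{
    return ii[(y + h) * stride + (x + w)]   // bottom-right
         - ii[ y      * stride + (x + w)]   // top-right
         - ii[(y + h) * stride +  x     ]   // bottom-left
         + ii[ y      * stride +  x     ];  // top-left
}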

In each round of boosting, one feature is selected: the one with the lowest weighted

error. In subsequent rounds incorrectly labeled examples are given a higher weight while

correctly labeled examples are given a lower weight. In order to reduce the false positive

rate while preserving efficiency, classification is divided into a cascade of classifiers. The

input is passed from one classifier to the next as long as each classifier classifies the

window as a face.

An input window is evaluated on the first classifier of the cascade and if that

classifier returns false, then computation on that window ends and the detector returns false.

If the classifier returns true then the window is passed onto the next classifier in the

cascade. The next classifier evaluates the window in the same way. The more a window

looks like a face, the more classifiers are evaluated on it and the longer it takes to classify the

window.
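In outline, the cascade evaluation reduces to the sketch below; the types and the stage_accepts helper are illustrative stand-ins, not the project source.

// Illustrative types: a stage holds learned parameters, a window is a
// candidate image region.
typedef struct { double threshold; /* learned weak classifiers... */ } Stage;
typedef struct { int x, y, size; } Window;
int stage_accepts( const Stage* s, const Window* w ); // sums weak classifiers

// A window is reported as a face only if every stage accepts it; most
// non-face windows are rejected by the first, cheapest stages.
int classify_window( const Stage* stages, int n_stages, const Window* w )
{
    int i;
    for( i = 0; i < n_stages; i++ )
        if( !stage_accepts( &stages[i], w ) )
            return 0;   // rejected early: stop evaluating this window
    return 1;           // passed every stage: report a face
}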

3.5 TESTING

Testing is the process of evaluating the correctness, the quality and the

completeness of the system developed. Our system was tested across a variety of

participants. It was found that the system was able to detect faces successfully in all cases.

The application is also able to pick out faces at considerably large distances. The user

requires some training in order to move the mouse efficiently. Face detection is found to

be efficient even with a normal web camera and under ordinary lighting conditions.

However, care should be taken to align the web camera with the facial region of the user

for optimum face detection.


4. APPLICATIONS AND FUTURE ENHANCEMENTS

Our system is mainly targeted towards physically disabled people who are

quadriplegic, non-verbal or bedridden. But this human computer interface

has other applications as well. It can be used as an alternative to the traditional mouse and


keyboard. It can be used to control the entire computer, browse the internet, prepare

documents etc. As the system is relatively inexpensive, it can be installed in hospitals as

a communication system for patients. The system may also be used as a hands-free

navigation device to access a computer. This facilitates multitasking. For example, a

doctor, while performing surgery, can make use of this system to issue commands to a

computer.

The system can be enhanced with high-resolution cameras, such as infrared cameras, to

improve face detection. It can be interfaced with external mobile devices to enhance its

communication capabilities. The system can also be extended for use in biometric security systems.

5. CONCLUSION

The objective of this project is to provide an automated system which will capture

the facial movements of the target user and correlate them with mouse pointer movements on

the screen. The developed interface will enable quadriplegic and non-verbal users to

access a computer.

A system has been developed for use by disabled people and bedridden patients. A

webcam interface captures the facial movements of the user. A face detection algorithm is

implemented and integrated with mouse movements on the screen. The system has been

integrated with four functions to aid physically challenged people. An emergency button

is provided for raising an alarm. Clicking on the audio button plays audio files for

entertainment. The video button is used to play videos for entertainment. An onscreen

message board has been provided for communication purposes. It helps the users to

display short messages to express their needs. The future focus is on enabling the system

to incorporate certain hardware-based interfaces, such as moving a robot.

APPENDIX A – SCREENSHOTS

MAIN INTERFACE


FACE TRACKING 1


FACE TRACKING 2


FACE TRACKING 3


FACE TRACKING 4


FACE TRACKING FOR A BED RIDDEN USER


PLAYING VIDEO


MESSAGE BOARD


REFERENCES


1. Gary R. Bradski (1998), "Computer Vision Face Tracking for Use in a Perceptual User Interface", Intel Technology Journal, Q2 '98, Microcomputer Research Lab, Santa Clara, CA, Intel Corporation.

2. James Gips, Margrit Betke and Peter Fleming (2000), "The Camera Mouse: Preliminary Investigation of Automated Visual Tracking for Computer Access", Computer Science Department, Boston College, Chestnut Hill, MA 02467.

3. James Gips, Margrit Betke and Peter Fleming (2002), "The Camera Mouse: Visual Tracking of Body Features to Provide Computer Access for People with Severe Disabilities", IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 10, No. 1.

4. Rajesh Kumar and Anupam Kumar (2008), "Black Pearl: An Alternative for Mouse and Keyboard", ICGST-GVIP Journal, ISSN 1687-398X, Volume 8, Issue III.

5. Sanjay Kr. Singh, D. S. Chauhan, Mayank Vatsa and Richa Singh (2003), "A Robust Skin Color Based Face Detection Algorithm", Tamkang Journal of Science and Engineering, Vol. 6, No. 4, pp. 227-234.

6. Zhaomin Zhu, Takashi Morimoto, Hidekazu Adachi, Osamu Kiriyama, Tetsushi Koide and Hans Juergen Mattausch (2005), "Multi-View Face Detection and Recognition using Haar-like Features", Research Center for Nanodevices and Systems, Hiroshima University.
