FACE RECOGNITION LOCKER FOR
ANDROID DEVICES
Enrol. No. - 10103517    Name of Student - ANKUR MOGRA
Enrol. No. - 10103510    Name of Student - SAHIL DABRA
Name of supervisor(s)-Ms. MUKTA GOEL
December – 2013
Submitted in partial fulfillment of the Degree of Bachelor of Technology
in Computer Science Engineering
DEPARTMENT OF COMPUTER SCIENCE ENGINEERING & INFORMATION TECHNOLOGY
JAYPEE INSTITUTE OF INFORMATION TECHNOLOGY, NOIDA
Contents
Declaration......................................................................................................................................iv
Certificate........................................................................................................................................v
Acknowledgement..........................................................................................................................vi
Summary........................................................................................................................................vii
List of Figures..............................................................................................................................viii
List of Tables..................................................................................................................................ix
List of Symbols and Acronyms.......................................................................................................x
Chapter-1 Introduction................................................................................................................1
1. General Introduction.............................................................................................................1
2. List some relevant current/open problems............................................................................2
3. Problem Statement................................................................................................................3
4. Overview of proposed solution approach and Novelty/benefits...........................................3
Chapter-2 Background Study.....................................................................................................5
2.1 Literature Survey....................................................................................................................5
2.1.1 List all the sources for formulation of problem statement..................................5
2.1.2 Integrated summary of the literature studied....................................................................14
2.1.3 Comparison of other existing approaches to the problem framed....................................17
Chapter 3: Analysis, Design and Modeling...............................................................................21
3.1 Functional Requirements.....................................................................................................21
3.2 Non Functional requirements...............................................................................................22
3.4 Design Documentation.........................................................................................................26
3.4.1 Use Case diagram...........................................................................................26
3.4.2 Control Flow Diagram...................................................................................................26
3.4.3 Activity diagram............................................................................................................27
3.4.4 Algorithms.....................................................................................................................29
3.5 Risk Analysis and Mitigation Plan.......................................................................................33
Chapter-4 Implementation and Testing.....................................................................................37
4.1 Implementation details and issues........................................................................................37
4.2 Testing..................................................................................................................................47
4.2.2 Component decomposition and type of testing required...............................................51
4.2.4 Limitations of the solution............................................................................................55
Chapter-5 Findings & Conclusion............................................................................................56
5.1 Findings................................................................................................................................56
5.2 Conclusion...........................................................................................................................56
5.3 Future Work.........................................................................................................................57
References......................................................................................................................................58
Appendices....................................................................................................................................59
A. Project Plan as Gantt chart....................................................................................................59
B. Details of practice with new tool/technology........................................................................59
Brief Resume................................................................................................................................62
Declaration
We hereby declare that this submission is our own work and that, to the best of our knowledge
and belief, it contains no material previously published or written by another person nor material
which has been accepted for the award of any other degree or diploma of the university or other
institute of higher learning, except where due acknowledgment has been made in the text.
Place: JIIT, Noida Signature:
Date: 13/12/13 Name: ANKUR MOGRA
Enrollment No: 10103517
Signature:
Name: SAHIL DABRA
Enrollment No: 10103510
Certificate
This is to certify that the work titled “Face Recognition Locker for Android
Devices” submitted by Ankur Mogra and Sahil Dabra in partial
fulfillment for the award of degree of B. Tech of Jaypee Institute of Information Technology
University, Noida has been carried out under my supervision. This work has not been submitted
partially or wholly to any other University or Institute for the award of this or any other degree or
diploma.
Signature of Supervisor……………………..
Name of Supervisor Ms Mukta Goel
Designation Assistant Professor
Date 13th Dec.2013
Acknowledgement
We take this opportunity to express our profound gratitude and deep regards to our guide and
mentor Ms. Mukta Goel for her exemplary guidance, monitoring and constant encouragement
throughout the course of this thesis. The blessing, help and guidance given by her from time to
time shall carry us a long way in the journey of life on which we are about to embark.
We are obliged to the staff members of JIIT for the valuable information provided by them in their
respective fields. We are grateful for their cooperation during the period of our assignment.
Lastly, we thank the almighty, our parents, brothers, sisters and friends for their constant
encouragement, without which this assignment would not have been possible.
Signature:
Name: Ankur Mogra
Enrollment No: 10103517
Signature:
Name: Sahil Dabra
Enrollment No: 10103510
Date: 13/12/13
Summary
Are you an Android 2.3.x or 3.x user wishing for face recognition technology to unlock your phone? Or a Nexus S user who didn't get face recognition with the Android ICS update? Or an ICS user curious to try other face recognition technologies? FaceRecog is what you've been waiting for: a free multi-user face recognition App that can be used to lock your phone, block calls and serve as a replacement for the lock screen. You can disable or enable any of these features in the App settings. The App requires one "Administrator" user who can unlock the phone with a password in addition to trained faces.
__________________ __________________
Signature of Student Signature of Supervisor
Name Name
Date Date
__________________
Signature of Student
Name
Date
Chapter-1 Introduction
1. General Introduction
Since your phone is with you at all times, the likelihood of it getting left behind at a
bar, restaurant, gym, or other location that you previously visited is probably pretty
high. And since we live in a world that isn’t always filled with angels, the chances of
that left-behind-phone getting stolen and fondled deeply without your approval is
probably even higher. Your first line of defense against evil doers is your lock screen.
In stock Android, you have six different options to choose from for your lock screen,
all of which offer their own levels of security. If you use a non-stock Android device like
the Galaxy S3, you may see some differences in functionality between the types we’ll
talk about in a minute, but for the most part they all act in a similar fashion.
First, to access your lock screen options, the universal location tends to be in
Settings>Security. From there, you should see an option towards the top called
“Screen lock,” which then takes you to your lock screen options once tapped.
2. List some relevant current/open problems.
Slide is probably the most commonly used lock screen of all – it’s basically the default.
This lock screen is not secure by any means, and only asks that the user of the phone grab
the circle with a lock inside and slide it outside of a larger circle to unlock the phone.
There are no passwords or patterns; it's simply a way to keep your phone from turning
itself on and accessing all sorts of info in your pocket or purse without your
knowing.
The nice thing about using Slide is that you can still access your notification pulldown
without having to fully unlock your phone. None of the other lock screen options allow
for this, as they are technically “secure.”
Face Unlock was introduced back in Ice Cream Sandwich as a fun way to unlock your
phone using your face. In order to set this option up, you have to place your face inside of
a face-shaped ring of dots using your front facing camera until the device decides that it
knows your face enough to be able to unlock with it. Once approved, you’ll also be asked
to provide a backup option in case the device cannot recognize your face. The two
backup options are PIN or pattern.
With Face Unlock set up, you wake your phone and then point your front-facing
camera at your face. If it recognizes you, it will unlock almost immediately. If not, it will
ask that you complete your backup PIN or Pattern unlock.
3. Problem Statement
Face recognition systems are usually limited in performance by lighting conditions, facial pose
and expressions. For example, you might have used systems where face recognition works
perfectly at home, but not in your office, or ones where it confuses your face with your
friend's. To overcome this, FaceRecog gives you complete freedom by allowing you to train
your face multiple times in different conditions and by letting you set your own 'confidence
threshold'. You can experiment with these features to find the optimal configuration for the App!
You can also choose to have others use your phone without revealing your password by letting
them enroll their faces. To configure the App for the first time, download the App to your device
and follow the on-screen directions.
4. Overview of proposed solution approach and Novelty/benefits
Training:
After you're done with configuring your App for the Administrator account, you will want to
train more profiles in different environments where you are most likely to use the phone. For
example you can have an "Erica indoors" profile and an "Erica outdoors" profile. You add a
user/profile by pressing menu button from the App's main screen and selecting enroll user. The
recommended number of profiles to train is 2-4 per user. Make sure that you are in a
reasonably well-lit environment before training. An important point to keep in mind while
training is to pose your face with the same expressions and lighting conditions that you'd use
normally. Also, the App tries to locate your eyes for training. To ensure that the App finds your
eyes, maintain a frontal pose in good lighting conditions with nothing blocking the view of
your eyes (e.g. no sunglasses or hair in the way).
Playing with confidence scores:
The most important setting that you SHOULD CHANGE is the confidence threshold. You can
find this setting by pressing menu and choosing settings. The value of the confidence score
determines how sensitive face recognition should be. Ideally, you'd want to set the highest
possible confidence threshold at which the phone can recognize your face in 10-20 seconds while
not confusing other faces with yours. The default value is 0.50. As you add more training profiles
for your face, you should increase the confidence score. For 1 profile, recommended values to try
are 0.50, 0.55, 0.60. For 2 profiles try the range 0.55 to 0.70. For 3, try the range 0.60-0.80.
These ranges are approximate and depend greatly on a number of factors, including the ones
mentioned before such as lighting conditions and facial expressions. Have fun by trying different
combinations of profiles and confidence scores! Additionally, you'd also want to ensure that the
camera settings are uniform between training and recognition. For example, if you used "Front
camera" to train, you should continue to use the same for scanning your face and unlocking the
phone.
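As a rough illustration of how such a threshold can gate the unlock decision, consider the sketch below. This is a hypothetical example, not the App's actual code; the function name and the score format are assumptions.

```python
# Hypothetical sketch: the recognizer yields a confidence in [0, 1] per
# trained profile; the phone unlocks only if the best match clears the
# user-chosen threshold.

def should_unlock(profile_scores, threshold=0.50):
    """profile_scores: dict mapping profile name -> confidence in [0, 1]."""
    if not profile_scores:
        return False
    return max(profile_scores.values()) >= threshold

scores = {"Erica indoors": 0.63, "Erica outdoors": 0.48}
print(should_unlock(scores, threshold=0.60))  # True: 0.63 >= 0.60
print(should_unlock(scores, threshold=0.70))  # False: no profile reaches 0.70
```

This also shows why more profiles tolerate a higher threshold: only the best-matching profile has to clear it.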
Chapter-2 Background Study
2.1 Literature Survey
2.1 List all the sources for formulation of problem statement
Books:
Pietikäinen, M., Hadid, A., Zhao, G., Ahonen, T.
Computer Vision Using Local Binary Patterns.
Computational Imaging and Vision, Vol. 40
Papers:
Face Recognition with Local Binary Patterns by Timo Ahonen, Abdenour Hadid, and
Matti Pietikäinen
Face Recognition: A Literature Survey by W. ZHAO, R. CHELLAPPA, P. J.
PHILLIPS
Face Recognition Project by Thomas Heseltine
Biometrics and Face Recognition Techniques by Renu Bhatia
Tools:
Android SDK
Eclipse
Emulator
Web camera operation.
2.1.1 Summary of papers
1. Title of paper : Face Recognition with Local Binary Patterns
Authors: Timo Ahonen, Abdenour Hadid, and Matti Pietikäinen
Year of Publication: 2004
Summary: This paper introduces the original LBP operator, proposed by Ojala et al., as a
powerful means of texture description. The operator labels the pixels of an image by
thresholding the 3x3 neighbourhood of each pixel with the center value and considering the
result as a binary number. The histogram of the labels can then be used as a texture
descriptor.
Later the operator was extended to use neighbourhoods of different sizes. Using circular
neighbourhoods and bilinearly interpolating the pixel values allows any radius and number
of pixels in the neighbourhood. For neighbourhoods we will use the notation (P, R), which
means P sampling points on a circle of radius R.
Weblink: http://masters.donntu.edu.ua/2011/frt/dyrul/library/article8.pdf
2. Title of paper: Face Recognition: A Literature Survey
Authors: W. ZHAO, R. CHELLAPPA, P. J. PHILLIPS
Year of Publication: 2003
Summary: As one of the most successful applications of image analysis and
understanding, face
recognition has recently received significant attention, especially during the past
several years. At least two reasons account for this trend: the first is the wide range of
commercial and law enforcement applications, and the second is the availability of
feasible technologies after 30 years of research. Even though current machine
recognition systems have reached a certain level of maturity, their success is limited by
the conditions imposed by many real applications. For example, recognition of face
images acquired in an outdoor environment with changes in illumination and/or pose
remains a largely unsolved problem. In other words, current systems are still far away
from the capability of the human perception system.
This paper provides an up-to-date critical survey of still- and video-based face
recognition research. There are two underlying motivations for us to write this survey
paper: the first is to provide an up-to-date review of the existing literature, and the
second is to offer some insights into the studies of machine recognition of faces. To
provide a comprehensive survey, we not only categorize existing recognition techniques
but also present detailed descriptions of representative methods within each category.
In addition, relevant topics such as psychophysical studies, system evaluation, and
issues of illumination and pose variation are covered.
Weblink: http://www.cs.ucf.edu/~dcm/Teaching/COT4810-Spring2011/Literature/
DiegoVelasquez-FaceRecognitionLiteratureSurvey.pdf
3. Title of paper: Face Recognition Project
Authors: Thomas Heseltine
Year of Publication: 2009
Summary:
The term face recognition encompasses three main procedures. The preliminary step of
face detection (which may include some feature localisation) is often necessary if no
manual (human) intervention is to be used. Many methods have been used to accomplish
this, including template based techniques, motion detection, skin tone segmentation,
principal component analysis and classification by neural networks. All of which present
the difficult task of characterizing “non-face” images. Also, many of the algorithms
currently available are only applicable to specific situations: assumptions are made
regarding the orientation and size of the face in the image, lighting conditions,
background and subject's co-operation. The next procedure is verification. This
describes the process by which two face images are compared, producing a result to
indicate if the two images are of the same person. Another (often more difficult)
procedure is identification. This requires a probe image, for which a matching image is
searched for in a database of known people, thus identifying the probe image as a specific
person.
Web link: http://www-users.cs.york.ac.uk/~nep/research/3Dface/tomh/Biometrics.html
4. Title of paper: Biometrics and Face Recognition Techniques
Authors: Renu Bhatia
Year of Publication: 2009
Summary:
Biometrics refers to automated methods of recognizing a person based on a physiological or
behavioral characteristic. The past of biometrics includes the identification of people by
distinctive body features, scars or a grouping of other physiological criteria, such as height,
eye color and complexion. Present-day features include face recognition, fingerprints,
handwriting, hand geometry, iris, vein, voice and retinal scan. Biometric techniques are now
becoming the foundation of a wide array of highly secure identification and personal
verification methods. As the level of security breaches and transaction fraud increases, the
need for well-secured identification and personal verification technologies is becoming
apparent. Recent world events have led to an increased interest in security that will propel
biometrics into mainstream use. Areas of future use include Internet transactions, workstation
and network access, telephone transactions, and travel and tourism. There are different types
of biometrics: some are old, others are the latest technology. The most recognized biometric
technologies are fingerprinting, retinal scanning, hand geometry, signature verification, voice
recognition, iris scanning and facial recognition.
Web link: http://www.ijarcsse.com/docs/papers/Volume_3/5_May2013/V3I4-0506.pdf
2.1.2 Integrated summary of the literature studied
The LBP feature vector, in its simplest form, is created in the following manner:
1. Divide the examined window into cells (e.g. 16x16 pixels for each cell).
2. For each pixel in a cell, compare the pixel to each of its 8 neighbors (on its left-top,
left-middle, left-bottom, right-top, etc.). Follow the pixels along a circle, i.e. clockwise or
counter-clockwise.
3. Where the center pixel's value is greater than the neighbor's value, write "1". Otherwise,
write "0". This gives an 8-digit binary number (which is usually converted to decimal for
convenience).
4. Compute the histogram, over the cell, of the frequency of each "number" occurring (i.e.,
each combination of which pixels are smaller and which are greater than the center).
5. Optionally normalize the histogram.
6. Concatenate the (normalized) histograms of all cells. This gives the feature vector for the
window.
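The steps above can be sketched in plain Python. This is an illustrative, unoptimized version (a real implementation would use NumPy or OpenCV); the function names are our own, and the bit convention follows the text, setting the bit to 1 where the center pixel exceeds the neighbour.

```python
# Basic 3x3 LBP: code per pixel, histogram per cell, concatenated feature vector.

def lbp_code(img, x, y):
    """8-bit LBP code of interior pixel (x, y); neighbours taken clockwise."""
    c = img[y][x]
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0),   # clockwise from top-left
               (1, 1), (0, 1), (-1, 1), (-1, 0)]
    code = 0
    for bit, (dx, dy) in enumerate(offsets):
        if c > img[y + dy][x + dx]:   # center > neighbour -> write "1"
            code |= 1 << bit
    return code

def cell_histogram(cell, normalize=True):
    """Histogram of LBP codes over the interior pixels of one cell."""
    hist = [0.0] * 256
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            hist[lbp_code(cell, x, y)] += 1
    total = sum(hist)
    if normalize and total > 0:
        hist = [h / total for h in hist]
    return hist

def feature_vector(cells):
    """Concatenate the (normalized) histograms of all cells in the window."""
    vec = []
    for cell in cells:
        vec.extend(cell_histogram(cell))
    return vec
```

For a single 3x3 cell there is only one interior pixel, so the cell histogram has exactly one non-zero bin; larger cells (e.g. 16x16) spread counts over many bins.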
In the LBP approach for texture classification, the occurrences of the LBP codes in an
image are collected into a histogram. The classification is then performed by computing
simple histogram similarities. However, considering a similar approach for facial image
representation results in a loss of spatial information and therefore one should codify the
texture information while also retaining its location. One way to achieve this goal is to
use the LBP texture descriptors to build several local descriptions of the face and
combine them into a global description. Such local descriptions have been gaining
interest lately which is understandable given the limitations of the holistic
representations. These local feature based methods are more robust against variations in
pose or illumination than holistic methods.
The basic methodology for LBP based face description proposed by Ahonen et al. (2006)
is as follows: The facial image is divided into local regions and LBP texture descriptors
are extracted from each region independently. The descriptors are then concatenated to
form a global description of the face, as shown in Fig. 4.
Figure 4: Face description with local binary patterns.
This histogram effectively has a description of the face on three different levels of
locality: the LBP labels for the histogram contain information about the patterns on a
pixel-level, the labels are summed over a small region to produce information on a
regional level and the regional histograms are concatenated to build a global description
of the face.
It should be noted that when using the histogram based methods the regions do not need
to be rectangular. Neither do they need to be of the same size or shape, and they do not
necessarily have to cover the whole image. It is also possible to have partially
overlapping regions.
The two-dimensional face description method has been extended into spatiotemporal
domain (Zhao and Pietikäinen 2007). Fig. 1 depicts facial expression description using
LBP-TOP. Excellent facial expression recognition performance has been obtained with
this approach.
Since the publication of the LBP based face description, the methodology has already
attained an established position in face analysis research and applications. A notable
example is illumination-invariant face recognition system proposed by Li et al. (2007),
combining NIR imaging with LBP features and Adaboost learning. Zhang et al. (2005)
proposed the extraction of LBP features from images obtained by filtering a facial image
with 40 Gabor filters of different scales and orientations, obtaining outstanding results.
Hadid and Pietikäinen (2009) used spatiotemporal LBPs for face and gender recognition
from video sequences, while Zhao et al. (2009) adopted the LBP-TOP approach to visual
speech recognition achieving leading-edge performance without error-prone
segmentation of moving lips.
Chapter 3: Analysis, Design and Modeling
Android SDK:
The Android software development kit (SDK) includes a comprehensive set of development
tools. These include a debugger, libraries, a handset emulator based on QEMU, documentation,
sample code, and tutorials. Currently supported development platforms include computers
running Linux (any modern desktop Linux distribution), Mac OS X 10.5.8 or later, Windows
XP or later; for the moment one can develop Android software on Android itself by using [AIDE
- Android IDE - Java, C++] app and [Android java editor] app. The officially
supported integrated development environment (IDE) is Eclipse using the Android Development
Tools (ADT) Plugin, though IntelliJ IDEA IDE (all editions) fully supports Android development
out of the box, and NetBeans IDE also supports Android development via a plugin. Additionally,
developers may use any text editor to edit Java and XML files, then use command line tools
(Java Development Kit and Apache Ant are required) to create, build and debug Android
applications as well as control attached Android devices (e.g., triggering a reboot, installing
software package(s) remotely).
Enhancements to Android's SDK go hand in hand with the overall Android platform
development. The SDK also supports older versions of the Android platform in case developers
wish to target their applications at older devices. Development tools are downloadable
components, so after one has downloaded the latest version and platform, older platforms and
tools can also be downloaded for compatibility testing.
Android applications are packaged in the .apk format and stored under the /data/app folder on
the Android OS (the folder is accessible only to the root user for security reasons). An APK
package contains .dex files (compiled byte-code files called Dalvik executables), resource files, etc.
Eclipse:
The Eclipse Platform uses plug-ins to provide all the functionality within and on top of the
runtime system. The Eclipse Platform's runtime system is based on Equinox, an implementation
of the OSGi core framework specification.
In addition to allowing the Eclipse Platform to be extended using other programming
languages such as C and Python, the plug-in framework allows the Eclipse Platform to work with
typesetting languages like LaTeX, networking applications such as telnet and database
management systems. The plug-in architecture supports writing any desired extension to the
environment, such as for configuration management. Java and CVS support is provided in the
Eclipse SDK, with support for other version control systems provided by third-party plug-ins.
With the exception of a small run-time kernel, everything in Eclipse is a plug-in. This means that
every plug-in developed integrates with Eclipse in exactly the same way as other plug-ins; in this
respect, all features are "created equal". Eclipse provides plug-ins for a wide variety of features,
some of which are through third parties using both free and commercial models. Examples of
plug-ins include a UML plug-in for Sequence and other UML diagrams, a plug-in for DB
Explorer, and many others.
The Eclipse SDK includes the Eclipse Java development tools (JDT), offering an IDE with a
built-in incremental Java compiler and a full model of the Java source files. This allows for
advanced refactoring techniques and code analysis. The IDE also makes use of a workspace, in
this case a set of metadata over a flat filespace allowing external file modifications as long as the
corresponding workspace "resource" is refreshed afterwards.
Eclipse implements widgets through a widget toolkit for Java called SWT, unlike most Java
applications, which use the Java standard Abstract Window Toolkit (AWT) or Swing. Eclipse's
user interface also uses an intermediate graphical user interface layer called JFace, which
simplifies the construction of applications based on SWT.
Language packs developed by the "Babel project" provide translations into over a dozen natural
languages.
3.4 Design Documentation
3.4.1 Use Case diagram
Figure 1: Use case diagram
3.4.4 Algorithms
The original LBP operator, introduced by Ojala et al., is a powerful means of texture
description. The operator labels the pixels of an image by thresholding the 3x3 neighbourhood
of each pixel with the center value and considering the result as a binary number. The
histogram of the labels can then be used as a texture descriptor.
Later the operator was extended to use neighbourhoods of different sizes. Using circular
neighbourhoods and bilinearly interpolating the pixel values allows any radius and number of
pixels in the neighbourhood. For neighbourhoods we will use the notation (P, R), which means
P sampling points on a circle of radius R, e.g. a circular (8,2) neighbourhood.
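A sketch of the circular (P, R) operator with bilinear interpolation is given below. This is illustrative only; the bit is set where the center value exceeds the interpolated sample, matching the thresholding convention used elsewhere in this report, though implementations differ on the direction of the comparison.

```python
import math

def circular_lbp(img, x, y, P=8, R=2.0):
    """LBP code at (x, y) from P sampling points on a circle of radius R.

    Off-grid sample values are bilinearly interpolated from the four
    surrounding pixels.  Bit convention: 1 where center > sample.
    (x, y) must be at least R+1 pixels away from the image border."""
    c = img[y][x]
    code = 0
    for p in range(P):
        angle = 2.0 * math.pi * p / P
        sx = x + R * math.cos(angle)          # sampling point on the circle
        sy = y - R * math.sin(angle)
        x0, y0 = int(math.floor(sx)), int(math.floor(sy))
        fx, fy = sx - x0, sy - y0
        # Bilinear interpolation over the four surrounding pixels.
        v = (img[y0][x0]         * (1 - fx) * (1 - fy) +
             img[y0][x0 + 1]     * fx       * (1 - fy) +
             img[y0 + 1][x0]     * (1 - fx) * fy +
             img[y0 + 1][x0 + 1] * fx       * fy)
        if c > v:
            code |= 1 << p
    return code
```

With P = 8 and R = 2 this realizes the circular (8,2) neighbourhood mentioned above.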
Another extension to the original operator uses so-called uniform patterns.
A Local Binary Pattern is called uniform if it contains at most two bitwise
transitions from 0 to 1 or vice versa when the binary string is considered circular.
For example, 00000000, 00011110 and 10000011 are uniform patterns. Ojala et al.
noticed that in their experiments with texture images, uniform patterns account
for a bit less than 90 % of all patterns when using the (8,1) neighbourhood and
for around 70 % in the (16,2) neighbourhood.
We use the following notation for the LBP operator: LBP_{P,R}^{u2}. The subscript represents
using the operator in a (P, R) neighbourhood; the superscript u2 stands for using only uniform
patterns and labelling all remaining patterns with a single label.
A histogram of the labeled image f_l(x, y) can be defined as

    H_i = sum over (x, y) of I{ f_l(x, y) = i },    i = 0, ..., n-1,

where n is the number of different labels and I{A} is 1 if A is true and 0 otherwise.
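The uniform-pattern test, the u2 labelling and the histogram H_i can be sketched as follows (illustrative Python; for P = 8 there are 58 uniform patterns, so the u2 operator produces 59 labels):

```python
def is_uniform(code, P=8):
    """At most two 0/1 transitions in the circular P-bit string."""
    bits = [(code >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P)) <= 2

def u2_label_table(P=8):
    """Map each P-bit code to a label: one label per uniform pattern,
    plus a single shared label for all remaining (non-uniform) patterns."""
    table, nxt = {}, 0
    for code in range(1 << P):
        if is_uniform(code, P):
            table[code] = nxt
            nxt += 1
    for code in range(1 << P):
        table.setdefault(code, nxt)   # shared non-uniform label
    return table

def histogram(labeled, n):
    """H_i = sum over (x, y) of I{f_l(x, y) = i}, for i = 0, ..., n-1."""
    H = [0] * n
    for row in labeled:
        for label in row:
            H[label] += 1
    return H
```

For the examples in the text, 00000000 has zero transitions and 00011110 and 10000011 each have exactly two, so all three are uniform.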
In this histogram, we effectively have a description of the face on three different
levels of locality: the labels for the histogram contain information about the
patterns on a pixel-level, the labels are summed over a small region to produce
information on a regional level and the regional histograms are concatenated to
build a global description of the face.
3.5 Risk Analysis and Mitigation Plan

Table 1: Risk analysis and mitigation

Risk 1: Portability of server to different languages
  Risk area: Platform | Prob. (P): M(3) | Impact (I): M(3) | RE (P*I): 9 | Selected for mitigation: Y
  Mitigation plan: Convert the code to Java at a later stage because it is platform independent.

Risk 2: Risk of delivering a quality product
  Risk area: Quality assurance | Prob.: H(5) | Impact: H(5) | RE: 25 | Selected for mitigation: Y
  Mitigation plan: Perform both black-box and white-box testing: give inputs to check outputs and monitor the flow of code throughout.

Risk 3: Consistency - no tool consistently distinguishes between positive and negative reviews
  Risk area: Requirement risk | Prob.: M(3) | Impact: H(5) | RE: 15 | Selected for mitigation: Y
  Mitigation plan: Various machine learning algorithms can be used so as to get a result from each and then improve the overall result by majority vote; also improve a classifier by feedback.

Risk 4: Risk of implementing the code
  Risk area: Implementation | Prob.: M(3) | Impact: H(5) | RE: 15 | Selected for mitigation: Y
  Mitigation plan: Constant review of functioning modules and their integration.

Risk 5: Wide scope of functionality of the project → unrealistic expectations from the project
  Risk area: Project scope | Prob.: L(1) | Impact: H(5) | RE: 5 | Selected for mitigation: N
  Contingency plan: In such a case the expectations can be seen as new features to be implemented and will be added thereafter.

Risk 6: Random input from user → desired outcome is not achieved
  Risk area: External input | Prob.: L(1) | Impact: H(5) | RE: 5 | Selected for mitigation: N
  Contingency plan: In such a situation the program should display "improper input" and ask the user to input again.

Risk 7: Software packages integration → part of code doesn't run
  Risk area: Development environment risk | Prob.: H(5) | Impact: H(5) | RE: 25 | Selected for mitigation: Y
  Mitigation plan: Performed unit testing first and then combined each interface; thus integration testing was performed in the end.

Risk 8: Lack of knowledge of the project domain → improper implementation and testing
  Risk area: Testing environment risk and personnel related | Prob.: M(3) | Impact: H(5) | RE: 15 | Selected for mitigation: Y
  Mitigation plan: Did an adequate amount of literature survey and research so as to gain knowledge of the project domain.
28
Chapter-4 Implementation and Testing
4.1 Implementation details and issues
We implemented our project in Python, as it has a number of very useful libraries for text processing and sentiment analysis, and it is easy to code in.
The first step is gathering opinionated data, i.e. educational reviews for our project. We crawled data from the StudentAdvisor website: about 16,000 positive and negative reviews.
1. Issue: Training the classifiers.
Implementation: The classifiers need to be trained, and for that we need a list of manually classified reviews. We started with 500 positive, 500 negative and 100 neutral reviews. The reviews are clubbed together and stored in a .yml file along with their labels.
2. Issue: A review may carry only a few words that are valuable for determining its sentiment; the rest of the words may not really help.
Implementation: It therefore makes sense to preprocess the reviews.
Preprocessing the reviews:
- Lower case: convert the reviews to lower case.
- URLs: we don't intend to follow shortened URLs and determine the content of the site, so we can eliminate all URLs via regular-expression matching, or replace them with the generic word URL.
- Punctuation and additional white space: remove punctuation at the start and end of each review, e.g. ' the day is beautiful! ' is replaced with 'the day is beautiful'. It is also helpful to replace multiple white spaces with a single white space.
- Replace more than one '.' with a single '.', etc.
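The preprocessing steps above can be sketched with the standard-library re module; the function name `preprocess` is illustrative, not taken from the project code:

```python
import re

def preprocess(review):
    review = review.lower()                          # lower case
    review = re.sub(r'https?://\S+', 'URL', review)  # replace URLs with a generic token
    review = review.strip(' \t\'".,!?')              # trim surrounding punctuation/space
    review = re.sub(r'\s+', ' ', review)             # collapse multiple white spaces
    review = re.sub(r'\.{2,}', '.', review)          # replace runs of '.' with a single '.'
    return review
```

For example, preprocess(" The day is BEAUTIFUL! ") yields "the day is beautiful", as described above.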
Feature Vector
The feature vector is the most important concept in implementing a classifier. A good feature vector directly determines how successful the classifier will be. The feature vector is used to build a model which the classifier learns from the training data, and which can then be used to classify previously unseen data.
We can use the presence/absence of words that appear in reviews as features. In the training data, consisting of positive, negative and neutral reviews, we split each review into words and add each word to the feature vector. Some of the words might not have any say in indicating the sentiment of a review (stop words), and hence we can filter them out. Adding individual (single) words to the feature vector is referred to as the 'unigrams' approach. Some feature vectors also add 'bigrams' in combination with unigrams; initially, we will only consider unigrams.
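A minimal sketch of the unigram presence/absence features described above, in the dict-of-booleans shape that NLTK-style classifiers accept. All names here are illustrative, not taken from the project code:

```python
def build_vocabulary(training_reviews):
    """Collect every word appearing in the training reviews."""
    vocab = set()
    for review in training_reviews:
        vocab.update(review.split())
    return sorted(vocab)

def extract_features(review, vocab):
    """Presence/absence of each vocabulary word in one review."""
    words = set(review.split())
    return {'contains(%s)' % w: (w in words) for w in vocab}
```

Each review is thus mapped to a fixed-length feature dict over the training vocabulary, which is exactly the model input the classifier learns from.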
3. Issue: The feature vector will explode.
Implementation: Before adding words to the feature vector, we preprocess them in order to filter.
Filtering for the feature vector:
- Stop words: a, is, the, with, etc. These words don't indicate any sentiment and can be removed. We use the nltk stop-words corpus to eliminate them.
- Repeating letters: people sometimes repeat letters to stress the emotion, e.g. hunggrryyy or huuuuuuungry for 'hungry'. We look for two or more repeated letters in a word and replace them with two of the same letter.
- Punctuation: we remove punctuation such as commas, single/double quotes and question marks at the start and end of each word, e.g. beautiful!!!!!! is replaced with beautiful.
- Words must start with a letter: for simplicity's sake, we remove all words which don't start with a letter, e.g. 15th, 5.34am.
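The four filters above can be sketched as follows. A small inline stop-word set stands in for the nltk stopwords corpus so the example stays self-contained; the names are illustrative, not taken from the project code:

```python
import re

# Stand-in for nltk.corpus.stopwords.words('english').
STOP_WORDS = {'a', 'is', 'the', 'with', 'and', 'of'}

def filter_words(words):
    """Apply the four feature-vector filters to a list of tokens."""
    kept = []
    for word in words:
        word = re.sub(r'(.)\1+', r'\1\1', word)   # >= 2 repeats -> exactly 2
        word = word.strip('\'".,!?;:')            # trim surrounding punctuation
        if not word or not word[0].isalpha():     # must start with a letter
            continue
        if word in STOP_WORDS:                    # drop stop words
            continue
        kept.append(word)
    return kept
```

For example, the token beautiful!!!!!! survives as beautiful, huuuuuuungry collapses to huungry, and 15th is dropped entirely.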
4.2 Testing
4.2.1 Testing Plan
Table 2 Test plan

| Type of Test | Test Performed? | Comments | Software Component |
|---|---|---|---|
| Unit testing | Yes | The application is mainly divided into three components: data collection, feature extraction and classification. | Preprocessing steps; Bayes classifier; maximum entropy classifier; SVM classifier; accuracy code |
| Integration testing | Yes | Our application's modules are interdependent. | Preprocessing and individual classifiers; combining all classifiers; after adding the GUI, incorporating it with all three classifiers; combined code with feedback |
| Performance testing | Yes | Performance testing is necessary, as samples should be processed quickly and correctly. | Naive Bayes classifier; maximum entropy classifier; SVM classifier |
| Stress testing | No | At this stage we have not identified stress points for this application, as feature development is a higher priority. | |
| Security testing | No | Feature development is higher priority. | |
| Load testing | No | Load testing is not important at this stage, as we are not dealing with real-time data right now. | |
4.2.2 Test Team details:
| Task | Done by |
|---|---|
| Study of research papers and journals | SAHIL DABRA |
| Android SDK | ANKUR MOGRA |
| Eclipse | ANKUR MOGRA |
| Face detection | ANKUR MOGRA |
| LBP | SAHIL DABRA, ANKUR MOGRA |
4.2.3 Test Schedule:
Table 3 Test schedule

| Activity | Start Date | Completion Date | Hours | Comments |
|---|---|---|---|---|
| Face detection | 3rd Sept. 2013 | 5th Sept. 2013 | 4 hrs | Successfully done |
| Local binary pattern | 10th Sept. 2013 | 20th Sept. 2013 | 7 hrs | Successfully done |
| SQLite | 30th Sept. 2013 | 7th Oct. 2013 | 1 hr | Successfully done |
| Face matching | 10th Oct. 2013 | 15th Oct. 2013 | 3 hrs | Successfully done |
| Confidence factor | 25th Oct. 2013 | 30th Oct. 2013 | 2 hrs | Successfully done |
| Histogram | 3rd Nov. 2013 | 7th Nov. 2013 | 45 mins | Successfully done |
| App making | 10th Nov. 2013 | 15th Nov. 2013 | 2.5 hrs | Successfully done |
| Measuring accuracy | 20th Nov. 2013 | 25th Nov. 2013 | 10 hrs | Successfully done |
| Implementing feedback from reviews | 1st Dec. 2013 | 5th Dec. 2013 | 2 hrs | Successfully done |
Test Environment:
Software Items:
Android SDK:
The Android software development kit (SDK) includes a comprehensive set of development tools, including a debugger, libraries, a handset emulator based on QEMU, documentation, sample code, and tutorials. Currently supported development platforms include computers running Linux (any modern desktop Linux distribution), Mac OS X 10.5.8 or later, and Windows XP or later; one can also develop Android software on Android itself by using the "AIDE - Android IDE - Java, C++" app and the "Android java editor" app. The officially supported integrated development environment (IDE) is Eclipse using the Android Development Tools (ADT) plugin, though IntelliJ IDEA (all editions) fully supports Android development out of the box, and NetBeans IDE also supports Android development via a plugin. Additionally, developers may use any text editor to edit Java and XML files, then use command-line tools (the Java Development Kit and Apache Ant are required) to create, build and debug Android applications, as well as control attached Android devices (e.g., triggering a reboot or installing software packages remotely).
Enhancements to Android's SDK go hand in hand with the overall Android platform
development. The SDK also supports older versions of the Android platform in case developers
wish to target their applications at older devices. Development tools are downloadable
components, so after one has downloaded the latest version and platform, older platforms and
tools can also be downloaded for compatibility testing.
Android applications are packaged in the .apk format and stored under the /data/app folder on the Android OS (the folder is accessible only to the root user for security reasons). An APK package contains .dex files (compiled byte-code files called Dalvik executables), resource files, etc.
Eclipse:
The Eclipse Platform uses plug-ins to provide all the functionality within and on top of the runtime system. The Eclipse Platform's runtime system is based on Equinox, an implementation of the OSGi core framework specification.
In addition to allowing the Eclipse Platform to be extended using other programming languages such as C and Python, the plug-in framework allows the Eclipse Platform to work with typesetting languages like LaTeX, networking applications such as telnet, and database management systems. The plug-in architecture supports writing any desired extension to the environment, such as for configuration management. Java and CVS support is provided in the Eclipse SDK, with support for other version control systems provided by third-party plug-ins.
With the exception of a small run-time kernel, everything in Eclipse is a plug-in. This means that
every plug-in developed integrates with Eclipse in exactly the same way as other plug-ins; in this
respect, all features are "created equal". Eclipse provides plug-ins for a wide variety of features,
some of which are through third parties using both free and commercial models. Examples of
plug-ins include a UML plug-in for Sequence and other UML diagrams, a plug-in for DB
Explorer, and many others.
The Eclipse SDK includes the Eclipse Java development tools (JDT), offering an IDE with a built-in incremental Java compiler and a full model of the Java source files. This allows for advanced refactoring techniques and code analysis. The IDE also makes use of a workspace, in this case a set of metadata over a flat filespace, allowing external file modifications as long as the corresponding workspace "resource" is refreshed afterwards.
Eclipse implements widgets through a widget toolkit for Java called SWT, unlike most Java
applications, which use the Java standard Abstract Window Toolkit (AWT) or Swing. Eclipse's
user interface also uses an intermediate graphical user interface layer called JFace, which
simplifies the construction of applications based on SWT.
Language packs developed by the Babel Project provide translations into over a dozen natural languages.
Hardware Items:
The only hardware required is computer systems, which act as the server, data centre and front end for the user:
- PC, 1.6 GHz or higher
- 512 MB RAM or higher
- 5 GB disk space or higher
- Internet access
- Operating system: Windows 98 or higher
4.2.4 Component decomposition and type of testing required

Table 4 Component decomposition and type of testing required

| S.No | Component Requiring Testing | Type of Testing Required | Technique for Writing Test Cases |
|---|---|---|---|
| 1 | Preprocessing | Unit testing | Black box |
| 2 | Feature vector | Unit testing | Black box |
| 3 | Local binary pattern | Unit testing, integration testing | Black box |
| 4 | Eclipse | Unit testing, integration testing | Black box |
| 5 | App making | Unit testing, integration testing | Black box |
| 6 | GUI | Unit testing, integration testing | Black box |
| 7 | Default screen | System testing | Black box |
| 8 | Accuracy module | Unit testing, integration testing | Black box |
| 9 | Feedback module | System testing | Black box |
Chapter-5 Findings & Conclusion
5.1 Findings
Turning the app into a lock-screen replacement:
While Android's security policy does not allow apps to replace the lock screen, you can set FaceRecog to work almost like a lock screen. The only difference is that the phone will not be locked with FaceRecog if you press the power button while the app is not already launched. If you launch the app manually, you can make it behave like a lock screen by enabling the "Disable lockscreen" option in the settings menu. Additionally, you would also want to block incoming calls.
5.2 Conclusion
Face Unlock was introduced back in Ice Cream Sandwich as a fun way to unlock your phone using your face. In order to set this option up, you have to place your face inside a face-shaped ring of dots using your front-facing camera until the device decides that it knows your face well enough to unlock with it. Once approved, you'll also be asked to provide a backup option in case the device cannot recognize your face. The two backup options are PIN and pattern.
With Face Unlock set up, you wake your phone and point your front-facing camera at your face. If it recognizes you, it will unlock almost immediately. If not, it will ask you to complete your backup PIN or pattern unlock.
People have found ways to trick Face Unlock from time to time, so we'd say that while it's more secure than Slide, it's not as secure as the next three.
5.3 Future Work
We have planned the following work for the future:
- Increase the performance of our algorithm to give better results in less time.
- Extend this project to handle varying illumination and different postures.
- Face recognition systems are usually limited in performance by lighting conditions, facial pose and expressions. For example, you might have used systems where face recognition works perfectly at home but not in your office, or ones where it confuses your face with your friend's. To overcome this, FaceRecog gives you complete freedom by allowing you to train your face multiple times in different conditions and to set your own 'confidence threshold'.
References
- Timo Ahonen, Abdenour Hadid, and Matti Pietikäinen, "Face Recognition with Local Binary Patterns"
- W. Zhao, R. Chellappa, and P. J. Phillips, "Face Recognition: A Literature Survey"
- Thomas Heseltine, "Face Recognition Project"
- Renu Bhatia, "Biometrics and Face Recognition Techniques"
Appendices
A. Project Plan as Gantt chart
[Gantt chart: project activities plotted from 6 May to 11 Jan, covering the following tasks]
- Domain finalization
- Read basic NLP papers
- Read papers on sentiment analysis
- Initial topic finalization (politics)
- Read papers on politics
- Implementation using POS tagger and dictionary
- New topic finalization and extraction of opinionated data
- Read papers on supervised learning
- Implemented Naïve Bayes and maximum entropy
- Read paper on SOM (unsupervised machine learning algorithm)
- Implemented tf-idf using gensim
- Implemented supervised learning algorithm (SVM)
- Implemented Python graphics using wxPython
- Calculated and improved accuracy using majority voting
- Implemented feedback for each classifier
- Detecting invalid or spam inputs
- Creating a Google Chrome extension
- Project report and presentation
- Discussion with mentor
B. Details of practice with new tool/technology
Image Histogram:
An image histogram is a type of histogram that acts as a graphical representation of
the tonal distribution in a digital image. It plots the number of pixels for each tonal value. By
looking at the histogram for a specific image a viewer will be able to judge the entire tonal
distribution at a glance.
Image histograms are present on many modern digital cameras. Photographers can use them as
an aid to show the distribution of tones captured, and whether image detail has been lost to
blown-out highlights or blacked-out shadows.
The horizontal axis of the graph represents the tonal variations, while the vertical axis represents
the number of pixels in that particular tone. The left side of the horizontal axis represents the
black and dark areas, the middle represents medium grey and the right hand side represents light
and pure white areas. The vertical axis represents the size of the area that is captured in each one
of these zones. Thus, the histogram for a very dark image will have the majority of its data points
on the left side and center of the graph. Conversely, the histogram for a very bright image with
few dark areas and/or shadows will have most of its data points on the right side and center of
the graph.
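As a small illustration of the above, such a histogram can be computed directly from an 8-bit greyscale image stored as rows of pixel values; this is a hypothetical helper, not part of the project code:

```python
def image_histogram(pixels, levels=256):
    """Count how many pixels fall in each tonal value 0..levels-1."""
    hist = [0] * levels
    for row in pixels:
        for value in row:
            hist[value] += 1
    return hist
```

For a predominantly dark image, nearly all counts land at the low (left) end of the histogram, matching the description above.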
Brief Resume
1. Name: Ankur Mogra
   Enrollment Number: 10103517
   Email ID: [email protected]
   Mobile Number: 9810558237
2. Name: Sahil Dabra
   Enrollment Number: 10103510
   Email ID: [email protected]
   Mobile Number: 9971970399