AdaBoost & Its Applications
Presenter: Tai-Wen Yu (虞台文)
Outline

- Overview
- The AdaBoost Algorithm
- How and why does AdaBoost work?
- AdaBoost for Face Detection
AdaBoost & Its Applications
Overview
Introduction

AdaBoost = Adaptive Boosting: a learning algorithm for building a strong classifier out of a lot of weaker ones.
AdaBoost Concept

Weak classifiers, each slightly better than random:

$h_1(x) \in \{-1,+1\}, \quad h_2(x) \in \{-1,+1\}, \quad \dots, \quad h_T(x) \in \{-1,+1\}$

Strong classifier:

$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
The Weak Classifiers
Each weak classifier learns by considering one simple feature, and the T features most beneficial for classification should be selected. How to:
– define features?
– select beneficial features?
– train weak classifiers?
– manage (weight) training samples?
– associate a weight with each weak classifier?
The Strong Classifier
How good will the strong classifier be?
AdaBoost & Its Applications
The AdaBoost Algorithm
The AdaBoost Algorithm

Given: $(x_1, y_1), \dots, (x_m, y_m)$, where $x_i \in X$, $y_i \in \{-1, +1\}$

Initialization: $D_1(i) = \frac{1}{m}, \quad i = 1, \dots, m$

For $t = 1, \dots, T$:

- Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the weighted error with respect to $D_t$, i.e., $h_t = \arg\min_{h_j} \epsilon_j$, where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$. ($D_t(i)$ is the probability distribution over the $x_i$'s at time $t$.)
- Weight the classifier: $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$ (chosen to minimize the exponential loss).
- Update the distribution: $D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization. This gives misclassified patterns more chance for learning.

Output the final classifier: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
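The loop above translates almost line for line into code. The following is a minimal sketch in Python (an illustration, not the presenter's implementation), using exhaustively searched decision stumps as the weak classifiers; the data layout and helper names are assumptions.

```python
# Minimal AdaBoost sketch: decision stumps as weak learners h_t,
# alpha_t = 0.5*ln((1-eps_t)/eps_t), and the exponential reweighting
# D_{t+1}(i) proportional to D_t(i)*exp(-alpha_t*y_i*h_t(x_i)).
import numpy as np

def stump_predict(X, j, theta, p):
    # h(x) = +1 if p*x_j < p*theta, else -1
    return np.where(p * X[:, j] < p * theta, 1, -1)

def train_stump(X, y, D):
    # Pick (feature, threshold, parity) minimizing sum_i D(i)*[y_i != h(x_i)].
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for p in (+1, -1):
                err = D[stump_predict(X, j, theta, p) != y].sum()
                if err < best_err:
                    best_err, best = err, (j, theta, p)
    return best_err, best

def adaboost(X, y, T):
    m = len(y)                               # labels y in {-1, +1}
    D = np.full(m, 1.0 / m)                  # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        eps, (j, theta, p) = train_stump(X, y, D)
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
        D *= np.exp(-alpha * y * stump_predict(X, j, theta, p))
        D /= D.sum()                         # Z_t normalization
        ensemble.append((alpha, j, theta, p))
    return ensemble

def predict(ensemble, X):
    # H(x) = sign(sum_t alpha_t * h_t(x))
    return np.sign(sum(a * stump_predict(X, j, th, p)
                       for a, j, th, p in ensemble))
```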
Boosting Illustration

[Figure sequence:]
- Weak classifier 1
- Weights increased
- Weak classifier 2
- Weights increased
- Weak classifier 3
- Final classifier is a combination of weak classifiers
AdaBoost & Its Applications
How and why does AdaBoost work?
The AdaBoost Algorithm

What goal does AdaBoost want to reach? In the algorithm above, the classifier weight $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ and the distribution update $D_{t+1}$ are goal dependent.
Goal

Minimize the exponential loss

$\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$

with final classifier $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T}\alpha_t h_t(x)\right)$.

Minimizing $e^{-yH(x)}$ maximizes the margin $yH(x)$.
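To see why minimizing the exponential loss pushes the margin up, one can tabulate $e^{-yH(x)}$ against the 0/1 loss; the margins below are made-up values for illustration.

```python
# e^{-y*H(x)} upper-bounds the 0/1 loss and decays as the margin grows.
import numpy as np

margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])       # hypothetical y*H(x)
print(np.exp(-margins))                # [7.389 1.649 1.    0.607 0.135]
print((margins <= 0).astype(float))    # 0/1 loss: [1. 1. 1. 0. 0.]
```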
Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$. Then $H(x) = H_T(x)$, and

$E_{x,y}\left[e^{-yH_t(x)}\right] = E_x\left[E_{y|x}\left[e^{-yH_t(x)} \mid x\right]\right]$
$= E_x\left[E_{y|x}\left[e^{-y[H_{t-1}(x) + \alpha_t h_t(x)]} \mid x\right]\right]$
$= E_x\left[E_{y|x}\left[e^{-yH_{t-1}(x)}\, e^{-\alpha_t y h_t(x)} \mid x\right]\right]$
$= E_x\left[e^{-yH_{t-1}(x)}\left(e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x))\right)\right]$

$\alpha_t = ?$ Set $\frac{\partial}{\partial \alpha_t} E_{x,y}\left[e^{-yH_t(x)}\right] = 0$:

$E_x\left[e^{-yH_{t-1}(x)}\left(-e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x))\right)\right] = 0$
Solving for $\alpha_t$:

$\alpha_t = \frac{1}{2}\ln\frac{P(y = h_t(x))}{P(y \neq h_t(x))} = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$

where $\epsilon_t = P(\text{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$, taking $P(x_i, y_i) = D_t(i)$. This is exactly the classifier weight used in the algorithm.
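As a quick numerical check (not in the slides), one can minimize the per-round objective $e^{-\alpha}(1-\epsilon_t) + e^{\alpha}\epsilon_t$ on a grid and compare with the closed form:

```python
# Grid-search the objective e^{-a}(1-eps) + e^{a}*eps and compare its
# minimizer with alpha = 0.5*ln((1-eps)/eps).
import numpy as np

eps = 0.2                                    # example weighted error
a = np.linspace(0.0, 2.0, 200001)
objective = np.exp(-a) * (1 - eps) + np.exp(a) * eps
print(a[objective.argmin()])                 # ~0.6931
print(0.5 * np.log((1 - eps) / eps))         # 0.6931...
```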
$D_{t+1} = ?$
Expand $e^{-\alpha_t y h_t}$ to second order:

$E_{x,y}\left[e^{-yH_t}\right] = E_{x,y}\left[e^{-yH_{t-1}}\, e^{-\alpha_t y h_t}\right]$
$\approx E_{x,y}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h_t + \tfrac{1}{2}\alpha_t^2 y^2 h_t^2\right)\right]$

$h_t = \arg\min_{h} E_{x,y}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \tfrac{1}{2}\alpha_t^2 y^2 h^2\right)\right]$

Since $y^2 h^2 = 1$,

$h_t = \arg\min_{h} E_{x,y}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \tfrac{1}{2}\alpha_t^2\right)\right]$
$= \arg\min_{h} E_x\left[E_{y|x}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h(x) + \tfrac{1}{2}\alpha_t^2\right) \mid x\right]\right]$
Dropping the terms that do not depend on $h$,

$h_t = \arg\min_{h} E_x\left[E_{y|x}\left[-e^{-yH_{t-1}}\, \alpha_t\, y\, h(x) \mid x\right]\right]$
$= \arg\max_{h} E_x\left[E_{y|x}\left[e^{-yH_{t-1}}\, y\, h(x) \mid x\right]\right]$
$= \arg\max_{h} E_x\left[(+1)\, h(x)\, e^{-H_{t-1}(x)} P(y = 1 \mid x) + (-1)\, h(x)\, e^{H_{t-1}(x)} P(y = -1 \mid x)\right]$
$h_t = \arg\max_{h} E_{x,\, y \sim e^{-yH_{t-1}(x)} P(y|x)}\left[y\, h(x)\right]$

This is maximized when $h(x) = y$ for each $x$, so

$h_t(x) = \mathrm{sign}\left(E_{y \sim e^{-yH_{t-1}(x)} P(y|x)}\left[y \mid x\right]\right)$
$= \mathrm{sign}\left(P_{y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = 1 \mid x) - P_{y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = -1 \mid x)\right)$
At time $t$: $\quad x, y \sim e^{-yH_{t-1}(x)}\, P(y \mid x)$
At time 1: $x, y \sim P(y \mid x)$ with $P(y_i \mid x_i) = 1$, so $D_1(i) = \frac{1}{m}$.

At time $t+1$: $x, y \sim e^{-yH_t(x)}\, P(y \mid x) = e^{-yH_{t-1}(x)}\, P(y \mid x)\cdot e^{-\alpha_t y h_t(x)}$, which corresponds to weighting each sample by $D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}$. Normalizing gives exactly the update rule of the algorithm:

$D_{t+1}(i) = \frac{D_t(i) \exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is for normalization.
AdaBoost & Its Applications
AdaBoost for Face Detection
The Task of Face Detection
Many slides adapted from P. Viola
Basic Idea
Slide a window across the image and evaluate a face model at every location.
Challenges

- A sliding-window detector must evaluate tens of thousands of location/scale combinations.
- Faces are rare: 0–10 per image.
  – For computational efficiency, we should spend as little time as possible on the non-face windows.
  – A megapixel image has ~10^6 pixels and a comparable number of candidate face locations.
  – To avoid having a false positive in every image, our false positive rate has to be less than 10^-6.
The Viola/Jones Face Detector

A seminal approach to real-time object detection. Training is slow, but detection is very fast. Key ideas:
– Integral images for fast feature evaluation
– Boosting for feature selection
– Attentional cascade for fast rejection of non-face windows

P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
Image Features

Rectangle filters:

Feature value = Σ(pixels in white area) − Σ(pixels in black area)
Size of Feature Space

How many possible rectangle features are there in a 24×24 detection region?

Two-rectangle features (types A and B): $\displaystyle\sum_{w=1}^{12}\sum_{h=1}^{24} 2\,(24 - 2w + 1)(24 - h + 1)$

Three-rectangle features (type C): $\displaystyle\sum_{w=1}^{8}\sum_{h=1}^{24} 2\,(24 - 3w + 1)(24 - h + 1)$

Four-rectangle features (type D): $\displaystyle\sum_{w=1}^{12}\sum_{h=1}^{12} (24 - 2w + 1)(24 - 2h + 1)$

Total: ≈160,000
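The three sums are easy to evaluate directly; a small Python check (assuming the sums as reconstructed above) gives the often-quoted total of 162,336 ≈ 160,000:

```python
# Count the rectangle features in a 24x24 window by feature type.
two = sum(2 * (24 - 2 * w + 1) * (24 - h + 1)
          for w in range(1, 13) for h in range(1, 25))      # types A+B
three = sum(2 * (24 - 3 * w + 1) * (24 - h + 1)
            for w in range(1, 9) for h in range(1, 25))     # type C
four = sum((24 - 2 * w + 1) * (24 - 2 * h + 1)
           for w in range(1, 13) for h in range(1, 13))     # type D
print(two, three, four, two + three + four)  # 86400 55200 20736 162336
```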
Feature Selection

What features are good for face detection? Can we create a good classifier using just a small subset of the ~160,000 possible features? How do we select such a subset?
Integral Images

The integral image computes a value at each pixel (x, y) that is the sum of the pixel values above and to the left of (x, y), inclusive:

$ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$

Computing the Integral Image

The integral image can be computed in one pass over the image, using the recurrences

$s(x, y) = s(x - 1, y) + i(x, y)$
$ii(x, y) = ii(x, y - 1) + s(x, y)$

where $s(x, y)$ is the cumulative sum along the row.
Computing Sum within a Rectangle
A B C Dsum ii ii ii ii D B
C AOnly 3 additions are
required for any size of
rectangle!
Only 3 additions are
required for any size of
rectangle!
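A short sketch of both operations (the recurrences are folded into numpy's cumulative sums; the indexing convention is an assumption):

```python
# Integral image and 4-corner rectangle sum.
import numpy as np

def integral_image(img):
    # ii(x, y) = sum of img over x' <= x and y' <= y (inclusive)
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    # Sum over rows top..bottom and cols left..right (inclusive):
    # sum = ii(D) - ii(B) - ii(C) + ii(A)
    D = ii[bottom, right]
    B = ii[top - 1, right] if top > 0 else 0
    C = ii[bottom, left - 1] if left > 0 else 0
    A = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
    return D - B - C + A

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()  # 30
```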
Scaling

The integral image enables us to evaluate rectangle features of any size in constant time. Therefore, no image scaling is necessary: scale the rectangular features instead!
Boosting

Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier. A weak learner need only do better than chance.

Training consists of multiple boosting rounds:
– During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners.
– "Hardness" is captured by weights attached to training examples.

Y. Freund and R. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999.
Weak Learners for Face Detection

What base learner is appropriate for face detection?
A weak classifier is a single rectangle feature with a threshold:

$h_t(x) = \begin{cases} 1 & \text{if } p_t f_t(x) < p_t \theta_t \\ 0 & \text{otherwise} \end{cases}$

where $x$ is a sub-window, $f_t$ is the value of the rectangle feature, $p_t$ is a parity bit, and $\theta_t$ is a threshold.
Boosting for face detection: the training set contains face and non-face examples, initially with equal weight. For each round of boosting:
– Evaluate each rectangle filter on each example
– Select the best threshold for each filter
– Select the best filter/threshold combination
– Reweight the examples

Computational complexity of learning: O(MNK), with M rounds, N examples, and K features.
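Per round, the dominant cost is finding the best threshold for each filter. Sorting the feature values lets every candidate threshold be scored in one sweep; the sketch below (labels in {1, 0} for face/non-face, matching the weak-learner definition above; the function name is hypothetical) illustrates the idea.

```python
# Best threshold and parity for one rectangle feature in one round:
# sort the feature responses, then sweep once, maintaining the weight
# of faces/non-faces at or below each candidate cut.
import numpy as np

def best_threshold(values, labels, weights):
    order = np.argsort(values)
    v, y, w = values[order], labels[order], weights[order]
    total_pos, total_neg = w[y == 1].sum(), w[y == 0].sum()
    below_pos = np.cumsum(w * (y == 1))   # face weight at or below cut
    below_neg = np.cumsum(w * (y == 0))   # non-face weight at or below cut
    # error if "below cut" is classified face, vs classified non-face
    err_face_below = below_neg + (total_pos - below_pos)
    err_face_above = below_pos + (total_neg - below_neg)
    err = np.minimum(err_face_below, err_face_above)
    i = err.argmin()
    parity = 1 if err_face_below[i] <= err_face_above[i] else -1
    return v[i], parity, err[i]
```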
Features Selected by Boosting

First two features selected by boosting: this feature combination can yield a 100% detection rate and a 50% false positive rate.
ROC Curve for a 200-Feature Classifier

A 200-feature classifier can yield a 95% detection rate and a false positive rate of 1 in 14,084.

Not good enough! To be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.
Attentional Cascade

[Figure: image sub-window → Classifier 1 → Classifier 2 → Classifier 3 → FACE; a negative response at any classifier sends the sub-window to NON-FACE]

- We start with simple classifiers that reject many of the negative sub-windows while detecting almost all positive sub-windows.
- A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on.
- A negative outcome at any point leads to the immediate rejection of the sub-window.
Chain classifiers that are progressively more complex and have lower false positive rates.

[Figure: ROC curve, % detection vs. % false positives]
Detection Rate and False Positive Rate for Chained Classifiers

The detection rate $D$ and the false positive rate $F$ of the cascade are found by multiplying the respective rates $d_i$ and $f_i$ of the individual stages:

$F = \prod_{i=1}^{K} f_i, \qquad D = \prod_{i=1}^{K} d_i$

A detection rate of 0.9 and a false positive rate on the order of $10^{-6}$ can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 ($0.99^{10} \approx 0.9$) and a false positive rate of about 0.30 ($0.3^{10} \approx 6 \times 10^{-6}$).
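The arithmetic is easy to reproduce:

```python
# Overall rates of a 10-stage cascade with per-stage d = 0.99, f = 0.30.
detection = 0.99 ** 10          # ~0.904
false_positive = 0.30 ** 10     # ~5.9e-06
print(detection, false_positive)
```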
Training the Cascade

- Set target detection and false positive rates for each stage.
- Keep adding features to the current stage until its target rates have been met.
  – Need to lower the AdaBoost threshold to maximize detection (as opposed to minimizing total classification error).
  – Test on a validation set.
- If the overall false positive rate is not low enough, add another stage.
- Use the false positives from the current stage as the negative training examples for the next stage.
ROC Curves: Cascaded Classifier vs. Monolithic Classifier

- There is little difference between the two in terms of accuracy.
- There is a big difference in terms of speed: the cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.
The Implemented System

Training data:
- 5,000 faces: all frontal, rescaled to 24×24 pixels, normalized for scale and translation
- 300 million non-face sub-windows drawn from 9,500 non-face images
- Many variations: across individuals, illumination, pose
Structure of the Detector Cascade

Successively more complex classifiers are combined in a cascade:
- 38 stages
- 6,060 features in total

[Figure: all sub-windows → stages 1, 2, 3, ..., 38 → face; a negative result at any stage rejects the sub-window]
The early stages are cheap and aggressive:
- Stage 1: 2 features, rejects 50% of non-faces, detects 100% of faces
- Stage 2: 10 features, rejects 80% of non-faces, detects 100% of faces
- Subsequent stages: 25 features, 50 features, ... (selected by the algorithm)
Speed of the Final Detector

On a 700 MHz Pentium III processor, the face detector can process a 384×288-pixel image in about 0.067 seconds:
- 15 Hz
- 15 times faster than previous detectors of comparable accuracy (Rowley et al., 1998)

An average of 8 features is evaluated per window on the test set.
Image Processing

During training, all example sub-windows were variance normalized to minimize the effect of different lighting conditions. During detection, sub-windows are variance normalized as well:

$\sigma^2 = \frac{1}{N}\sum x^2 - m^2$

where $m$ is the mean and $N$ the number of pixels in the window. The sum $\sum x^2$ can be computed using the integral image of the squared image.
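Using the `integral_image` and `rect_sum` helpers sketched earlier, the per-window variance needs only two rectangle sums, one over the image and one over its square; this is an illustrative sketch, not the paper's code.

```python
# Window variance from two integral images: var = E[x^2] - m^2.
import numpy as np

def window_variance(ii, ii_sq, top, left, bottom, right):
    n = (bottom - top + 1) * (right - left + 1)
    s = rect_sum(ii, top, left, bottom, right)       # sum of pixels
    s2 = rect_sum(ii_sq, top, left, bottom, right)   # sum of squared pixels
    m = s / n
    return s2 / n - m * m

# usage:
#   ii = integral_image(img.astype(np.float64))
#   ii_sq = integral_image(img.astype(np.float64) ** 2)
```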
Scanning the Detector

- Scaling is achieved by scaling the detector itself, rather than scaling the image. Good detection results are obtained for a scaling factor of 1.25.
- The detector is also scanned across locations: subsequent locations are obtained by shifting the window by [sΔ] pixels, where s is the current scale and the brackets denote rounding. Results for Δ = 1.0 and Δ = 1.5 were reported.
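A sketch of the scan loop just described (parameter names and defaults are assumptions):

```python
# Enumerate sub-windows: grow the 24x24 detector by 1.25x per level and
# shift by round(s * delta) pixels at scale s.
def scan_windows(img_w, img_h, base=24, scale_factor=1.25, delta=1.5):
    s = 1.0
    while round(base * s) <= min(img_w, img_h):
        size = int(round(base * s))
        step = max(1, int(round(s * delta)))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        s *= scale_factor
```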
Merging Multiple Detections
ROC Curves for Face Detection
Output of Face Detector on Test Images
Other Detection Tasks

- Facial feature localization
- Profile detection
- Male vs. female classification
Conclusions

- How does AdaBoost work? Why does AdaBoost work?
- AdaBoost for face detection:
  – Rectangle features
  – Integral images for fast computation
  – Boosting for feature selection
  – Attentional cascade for fast rejection of negative windows