Latent SVMs for Human Detection with a Locally Affine Deformation Field




Ľubor Ladický (1), Phil Torr (2), Andrew Zisserman (1)

(1) University of Oxford, (2) Oxford Brookes University

Object Detection

• Find all objects of interest

• Enclose them tightly in a bounding box

HOG Detector

Dalal & Triggs CVPR05

• Sliding window using learnt HOG template

• Post-processing using non-maxima suppression
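
The post-processing step can be illustrated with a minimal sketch of greedy non-maxima suppression. This is a generic version, not necessarily the variant used by Dalal & Triggs. The function names, the (x1, y1, x2, y2) box format and the 0.5 overlap threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by decreasing score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections and one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.68, so only the first and third detections survive.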

Classifier response:

The weights w* and the bias b* learnt using the Linear SVM as:

Does not fit well!
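
For reference, a linear template detector trained with a linear SVM is usually written in the standard form below. The notation is an assumption and may differ from the equations shown on the original slides: $\Phi(x)$ is the HOG feature vector of window $x$, $z_k \in \{-1, 1\}$ is the label of training sample $x_k$, and $\lambda$ is the regularisation weight.

```latex
H(x) = w^{*} \cdot \Phi(x) + b^{*}

(w^{*}, b^{*}) = \arg\min_{w,\,b} \; \lambda \lVert w \rVert^{2}
  + \sum_{k} \max\bigl(0,\; 1 - z_k \,(w \cdot \Phi(x_k) + b)\bigr)
```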

Deformable Part-based Model

Felzenszwalb et al. CVPR08

• Allows parts to move relative to the centre

• Effectively allows the template to deform

• Multiple models based on an aspect ratio


Our approach

Cells c displaced by the deformation field d:

Classifier response:

Regularisation takes the form of a pairwise MRF:

The weights w* and the bias b* learnt using the Latent Linear SVM as:
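
In a generic latent SVM, a deformation field, a pairwise regulariser and a linear classifier are typically combined as sketched below. This is the usual latent-SVM pattern under assumed notation, not a transcription of the slides' equations: $\Phi(x, d)$ denotes features read from the cells displaced by $d$, $R(d)$ the regulariser, $\mathcal{E}$ the set of neighbouring cell pairs and $\psi$ a pairwise penalty.

```latex
H(x) = \max_{d} \bigl( w^{*} \cdot \Phi(x, d) - R(d) \bigr) + b^{*},
\qquad
R(d) = \sum_{(c, c') \in \mathcal{E}} \psi(d_c, d_{c'})
```

Because the response contains a maximisation over $d$, the hinge-loss training objective is no longer convex in general, which is why such models are usually trained by alternating between the latent variables and the weights.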

Comparison with our approach

• HOG template (no deformation)

• Part-based model (rigid movable parts)

• Our model (deformation field)

Our approach

Why hasn’t anyone tried it before?

• Latent models with many latent variables tend to overfit

• Inference not feasible for a sliding window

• Deformation field used before only for:

    • Classification task (Duchenne et al. ICCV11)

    • Rescoring of detections (Ladický, PhD thesis)

Our approach

We restrict the deformation field to be locally affine:



Weights / bias (w*, b*) and the deformation fields d_k are estimated iteratively

Given the deformation fields, the problem is a standard linear SVM:

Given (w*, b*), the problem is a constrained MRF optimisation:

The last can be decomposed as:

By defining the optimisation becomes:


• The locations of the cells in the first row and in the first column fully determine the location of each cell

• Any locally affine deformation field can be reached by two moves:

    • each column i moves by $(\Delta^c d_i^x, \Delta^c d_i^y)$

    • each row j moves by $(\Delta^r d_j^x, \Delta^r d_j^y)$

• Such moves do not alter the local affinity

• Both moves can be solved quickly using dynamic programming
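
The dynamic programming step can be sketched as a min-sum recursion over a chain, for example over the columns of the template with one candidate move per label. This is a generic illustration only: the slides do not give the cost terms, so the `unary` table, the `pairwise` function and the candidate `moves` below are hypothetical.

```python
def chain_dp(unary, pairwise):
    """Minimise sum_i unary[i][l_i] + sum_i pairwise(l_i, l_{i+1}) over a chain.

    unary[i][l] is the cost of giving label l to node i (e.g. column i).
    pairwise(l, m) is the cost of neighbouring nodes taking labels l and m.
    Returns (best_cost, best_labels).
    """
    n, k = len(unary), len(unary[0])
    cost = list(unary[0])  # best cost of any labelling ending in each label
    back = []              # back-pointers, one list per node after the first
    for i in range(1, n):
        new_cost, ptr = [], []
        for m in range(k):
            best_l = min(range(k), key=lambda l: cost[l] + pairwise(l, m))
            new_cost.append(cost[best_l] + pairwise(best_l, m) + unary[i][m])
            ptr.append(best_l)
        cost = new_cost
        back.append(ptr)
    # Trace the optimal labelling back from the best final label.
    last = min(range(k), key=lambda l: cost[l])
    labels = [last]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return cost[last], labels


# Hypothetical example: three columns, candidate horizontal moves -1, 0, +1.
moves = [-1, 0, 1]
unary = [[5, 0, 5], [5, 5, 0], [5, 5, 0]]
best_cost, best_labels = chain_dp(unary, lambda l, m: abs(moves[l] - moves[m]))
```

With n nodes and k candidate moves the recursion costs O(n k^2), compared with k^n for exhaustive search over all labellings.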

Learning multiple poses / viewpoints

• We define a similarity measure between two training samples as:

• K-medoid clustering of the similarity matrix S partitions the data into multiple models
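
A minimal sketch of K-medoid clustering on a precomputed similarity matrix is given below. It assumes S is symmetric with higher values meaning more similar samples. The function name and the naive initialisation (the first k samples) are assumptions made for illustration, and the procedure is sensitive to that initial choice.

```python
def k_medoids(S, k, max_iter=100):
    """Cluster n samples given an n x n similarity matrix S.

    Returns (medoids, assign): the medoid index of each cluster and the
    cluster index of each sample.
    """
    n = len(S)

    def assign_all(medoids):
        # Each sample joins the cluster of its most similar medoid.
        return [max(range(k), key=lambda c: S[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))  # naive initialisation (assumption)
    for _ in range(max_iter):
        assign = assign_all(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # The new medoid maximises the total similarity to its cluster.
            new_medoids.append(
                max(members, key=lambda m: sum(S[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_all(medoids)


# Hypothetical similarity matrix: samples 0 and 2 are alike, as are 1 and 3.
S = [[1.0, 0.1, 0.9, 0.2],
     [0.1, 1.0, 0.2, 0.8],
     [0.9, 0.2, 1.0, 0.1],
     [0.2, 0.8, 0.1, 1.0]]
medoids, assign = k_medoids(S, k=2)
```

Each resulting cluster would then be used to train one model, so that different poses or viewpoints get separate templates.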


• Buffy dataset (typically used for pose estimation)

• Contains a large variety of poses, viewpoints and aspect ratios

• Consists of 748 images

• Episode s5e3 used for training

• Episode s5e4 used for validation

• Episodes s5e2, s5e5 and s5e6 used for testing

Ferrari et al. CVPR08

Clustering of training samples

Each row corresponds to one model (out of 10 models)

Qualitative results

Quantitative results


We propose:

• Novel inference for locally affine deformation field (LADF)

• Object detector using LADF

• Clustering using LADF


Thank you