Describing People: A Poselet-Based Approach to Attribute Classification
Lubomir Bourdev (1,2), Subhransu Maji (1), Jitendra Malik (1)
(1) EECS, U.C. Berkeley; (2) Adobe Systems Inc.
Goal: Extract attributes from images of people
Who has long hair?
Who has short pants?
Male or female?
Prior work on poselets and on attributes
Prior work on Poselets
• Introduced by [Bourdev and Malik, ICCV09]
• Detection with poselets [Bourdev et al, ECCV10]
• Applications
  • Segmentation [Brox et al, ECCV10] [Maire et al, ICCV11]
  • Actions [Yang et al, CVPR10] [Maji et al, CVPR11] [Yao et al, ICCV11]
  • Human parsing [Wang et al, CVPR11]
  • Semantic contours [Hariharan et al, ICCV11]
  • Subordinate level categorization [Farrell et al, ICCV11]
Prior work on Attributes
• Attributes as intermediate parts
• Discovering attributes from text
• Discovering attributes from images
• Attributes from motion capture
• Joint learning of classes & attributes
• Image retrieval with attributes
• Attributes and actions
• Active learning with attributes
• Attributes of people
• Gender attribute
[Cottrell and Metcalfe, NIPS90] [Golomb et al, NIPS90] [Moghaddam & Yang, PAMI02] [Ferrari & Zisserman, NIPS07] [Kumar et al, ECCV08] [Gallagher and Chen, CVPR08] [Cao et al, ACM08] [Lampert et al, CVPR09] [Farhadi et al, CVPR09] [Wang et al, BMVC09] [Wang and Forsyth, ICCV09] [Kumar et al, ICCV09] [Farhadi et al, CVPR10] [Berg et al, ECCV10] [Wang and Mori, ECCV10] [Sigal et al, ECCV10] [Branson et al, ECCV10] [Hwang et al, CVPR11] [Parikh and Grauman, CVPR11] [Douze et al, CVPR11] [Kovashka et al, ICCV11] [Liu et al, CVPR11] [Qiu et al, ICCV11] [Yao et al, ICCV11] [Dhar et al, CVPR11] [Parikh and Grauman, ICCV11] [Siddiquie et al, CVPR11]
Poselets for Attribute Classification
Male or female?
Gender recognition is easier if we factor out the pose
Poselets
[Bourdev & Malik ICCV09]
Examples may differ visually but have common semantics
How do we train a poselet?
Finding correspondences at training time
Given part of a human pose
How do we find a similar pose configuration in the training set?
We use keypoints to annotate the joints, eyes, nose, etc. of people
[Figure: keypoint annotations (Left Hip, Left Shoulder) and the residual error of candidate matches]
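The slides name a residual error for candidate matches but do not define it. As a minimal sketch, assume the residual is the least-squares error left after aligning one person's keypoints to the seed's keypoints with a 2D similarity transform (translation, rotation, uniform scale), normalized by the spread of the seed keypoints. The function name and the normalization are assumptions made for illustration, not the authors' definition.

```python
def alignment_residual(src, dst):
    """Normalized least-squares residual after aligning src keypoints to dst
    with a 2D similarity transform (translation, rotation, uniform scale).
    Assumed definition for illustration; returns a value in [0, 1]."""
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need two equal-length lists with at least 2 keypoints")
    # Represent each (x, y) keypoint as the complex number x + iy.
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    # Remove translation by centering both keypoint sets.
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    z = [p - z_mean for p in z]
    w = [p - w_mean for p in w]
    z_energy = sum(abs(p) ** 2 for p in z)
    w_energy = sum(abs(p) ** 2 for p in w)
    if z_energy == 0 or w_energy == 0:
        raise ValueError("keypoints must not all coincide")
    # Rotation plus scale is one complex factor a that minimizes sum |w - a*z|^2.
    a = sum(p.conjugate() * q for p, q in zip(z, w)) / z_energy
    residual = sum(abs(q - a * p) ** 2 for p, q in zip(z, w))
    return residual / w_energy


# Hypothetical keypoints: left shoulder, right shoulder, left hip.
seed = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
same_pose = [(5.0, 5.0), (5.0, 9.0), (-1.0, 5.0)]   # seed rotated 90 degrees, scaled 2x, shifted
other_pose = [(0.0, 0.0), (2.0, 0.0), (3.0, 0.5)]   # a different configuration
```

With these inputs, `alignment_residual(seed, same_pose)` is numerically zero, while `alignment_residual(seed, other_pose)` is clearly larger, so the second candidate would rank lower as a match.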
Training poselet classifiers
[Figure: residual errors of the candidate patches: 0.15, 0.20, 0.10, 0.35, 0.15, 0.85]
1. Given a seed patch
2. Find the closest patch for every other person
3. Sort them by residual error
4. Threshold them
5. Use them as positive training examples to train a linear SVM with HOG features
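The sorting and thresholding steps can be sketched as follows, using the residual values shown on the slide. The patch identifiers and the threshold of 0.5 are hypothetical; the slides do not state the threshold. HOG extraction and SVM training are outside this sketch.

```python
def select_positive_patches(candidates, max_residual):
    """Sort candidate patches by residual error and keep those under the threshold.

    candidates: list of (patch_id, residual_error) pairs, one per other person.
    Returns the ids of the kept patches, best match first.
    """
    ranked = sorted(candidates, key=lambda c: c[1])
    return [patch_id for patch_id, residual in ranked if residual <= max_residual]


# Residual errors from the slide; patch ids are made up for illustration.
candidates = [("p1", 0.15), ("p2", 0.20), ("p3", 0.10),
              ("p4", 0.35), ("p5", 0.15), ("p6", 0.85)]

# Hypothetical threshold: the patch with residual 0.85 is rejected.
positives = select_positive_patches(candidates, max_residual=0.5)
# The kept patches would then serve as positive examples for a linear SVM
# trained on HOG features (step 5).
```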
Attribute Classification Algorithm at Test Time
Goal: Extract attributes of this person
Input:
• Target person bounds
• Bounds of other nearby people
Step 1: Detect poselet activations
[Bourdev et al, ECCV10]
Step 2: Cluster the activations
[Bourdev et al, ECCV10]
Step 3: Predict person bounds
[Bourdev et al, ECCV10]
Step 4: Identify the correct cluster
Max-flow in bipartite graph
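The slides only name the technique for Step 4. One way to read it, sketched below under assumptions: connect each given person bound to each predicted cluster bound whose overlap is high enough, then compute a maximum bipartite matching (equivalent to max-flow with unit capacities) so that nearby people do not claim the target's cluster. The intersection-over-union edge rule, the 0.3 threshold, and all names are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_people_to_clusters(people, clusters, min_iou=0.3):
    """Maximum bipartite matching via augmenting paths.

    Returns a dict mapping person index -> cluster index.
    """
    edges = [[j for j, c in enumerate(clusters) if iou(p, c) >= min_iou]
             for p in people]
    cluster_owner = {}  # cluster index -> person index

    def try_assign(i, seen):
        for j in edges[i]:
            if j in seen:
                continue
            seen.add(j)
            # Take a free cluster, or re-route the current owner elsewhere.
            if j not in cluster_owner or try_assign(cluster_owner[j], seen):
                cluster_owner[j] = i
                return True
        return False

    for i in range(len(people)):
        try_assign(i, set())
    return {i: j for j, i in cluster_owner.items()}


# Hypothetical boxes: index 0 is the target person, index 1 a nearby person.
people = [(0, 0, 10, 20), (7, 0, 17, 20)]
clusters = [(5, 0, 15, 20), (1, 0, 11, 20)]
matching = match_people_to_clusters(people, clusters)
```

Here the target overlaps both clusters but the neighbor overlaps only cluster 0, so the matching assigns cluster 1 to the target even though cluster 0 is tried first.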
Poselet activations
Start with its poselet activations
Features
• Pyramid HOG
• LAB histogram
• Skin features
  • Hands-skin
  • Legs-skin
[Figure: poselet patch, skin mask, arms mask, and the elementwise product B .* C]
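The figure shows an elementwise product of two masks (B .* C). A minimal sketch of how a hands-skin style feature could be derived from it, assuming binary masks of equal size and assuming the feature is the fraction of the part mask that is skin; the normalization and the function name are assumptions, since the slides show only the product.

```python
def masked_skin_fraction(skin_mask, part_mask):
    """Fraction of the part mask's pixels that are also skin.

    Both masks are equally sized 2D lists of 0/1 values. The elementwise
    product corresponds to the B .* C operation shown on the slide.
    """
    product = [[s * p for s, p in zip(skin_row, part_row)]
               for skin_row, part_row in zip(skin_mask, part_mask)]
    part_area = sum(sum(row) for row in part_mask)
    if part_area == 0:
        return 0.0
    return sum(sum(row) for row in product) / part_area


# Toy 3x3 masks, made up for illustration.
skin_mask = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]
arms_mask = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
hands_skin = masked_skin_fraction(skin_mask, arms_mask)  # 3 of 4 arm pixels are skin
```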
Attribute Classification Overview
1. Poselet activations
2. Features
3. Poselet-level attribute classifiers
4. Person-level attribute classifiers
5. Context-level attribute classifiers
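The slides name three classifier levels without giving their form. The sketch below shows one plausible way the levels could stack, assuming linear poselet-level classifiers and logistic combinations at the person and context levels. Every weight, poselet name, and function name here is hypothetical and chosen only to make the data flow concrete.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def poselet_level_scores(activations, poselet_classifiers):
    """Score each detected poselet with its own linear attribute classifier."""
    scores = {}
    for poselet_id, features in activations.items():
        weights, bias = poselet_classifiers[poselet_id]
        scores[poselet_id] = sum(w * f for w, f in zip(weights, features)) + bias
    return scores


def person_level_score(poselet_scores, poselet_weights, bias):
    """Combine poselet-level scores; poselets that did not fire contribute nothing."""
    total = bias + sum(poselet_weights[p] * s for p, s in poselet_scores.items())
    return sigmoid(total)


def context_level_score(person_scores, attribute_weights, bias):
    """Re-score one attribute from the person-level scores of all attributes."""
    total = bias + sum(attribute_weights[a] * s for a, s in person_scores.items())
    return sigmoid(total)


# Hypothetical numbers for a "long pants" attribute.
poselet_classifiers = {"legs_front": ([1.0, -1.0], 0.0),
                       "legs_side": ([0.5, 0.5], -0.5)}
activations = {"legs_front": [2.0, 0.5]}  # legs_side did not fire
poselet_scores = poselet_level_scores(activations, poselet_classifiers)
long_pants = person_level_score(
    poselet_scores, {"legs_front": 2.0, "legs_side": 1.0}, bias=-1.0)

# Context: a confident "wears shorts" score lowers belief in "long pants".
context_weights = {"long_pants": 4.0, "wears_shorts": -3.0}
with_shorts = context_level_score(
    {"long_pants": long_pants, "wears_shorts": 0.9}, context_weights, bias=-1.0)
without_shorts = context_level_score(
    {"long_pants": long_pants, "wears_shorts": 0.1}, context_weights, bias=-1.0)
```

The point of the layering is that the context level sees all attributes at once, so evidence for one attribute can adjust the score of another.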
Results
Our dataset
• Source: VOC 2010 trainval for Person + H3D
• ~8000 annotations (4000 train + 4000 test)
• 9 binary attributes specified by 5 independent annotators via AMT
• Ground truth label: If 4 of the 5 agree
• Dataset will be made publicly available
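The ground-truth rule can be sketched as a small vote-counting function. It assumes "4 of the 5 agree" means at least four matching votes, and that an attribute without such agreement is left unlabeled (returned as `None` here); both readings are assumptions, as is the function name.

```python
def ground_truth_label(votes, min_agree=4):
    """Aggregate binary annotator votes for one attribute of one person.

    votes: list of booleans, one per annotator.
    Returns True or False when at least min_agree annotators agree,
    otherwise None (no ground-truth label).
    """
    yes = sum(1 for v in votes if v)
    no = len(votes) - yes
    if yes >= min_agree:
        return True
    if no >= min_agree:
        return False
    return None


label_clear = ground_truth_label([True, True, True, True, False])       # True
label_negative = ground_truth_label([False, False, False, False, False])  # False
label_unsure = ground_truth_label([True, True, True, False, False])     # None
```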
Visual search on our test set
“Female”
“Wears hat”
“Has long hair”
“Wears glasses”
“Wears shorts”
“Has long sleeves”
“Doesn’t have long sleeves”
Our baseline
• Canny-modulated HOG with SPM kernel [Lazebnik et al, CVPR06]
• To help the baseline, we trained a separate SPM for each of four viewpoints: full view, head zoom, upper body, legs
• For each attribute we pick the best SPM as our baseline
Precision/recall on our test set
[Figure legend: label frequency (dashed line), SPM, No context, Full model]
State-of-the-art Gender Recognition
• We outperform Cognitec (top-notch face recognizer)
• We outperform any gender recognizer based on frontal faces (are there others?)
  • 61% of our test examples have frontal faces.
  • Even with perfect classification of frontal faces, max AP = 80.5% vs. our AP of 82.4%
Confusions
Men most confused as women
Women most confused as men
[Figure labels: baseball hat, long hair, hair hidden]
Short pants most often confused with long pants
Non-T-shirt most often confused with T-shirt
[Figure labels: annotation errors, "Are these pants short?", wrong person, occlusion]
Best poselets per attribute
Gender:
Long Hair:
Wears glasses:
“A woman with long hair, glasses and long pants”(??)
We can describe a picture of a person
Conclusion
How poselets help in high-level vision
The image is a complex function of the viewpoint, pose, appearance, etc.
Poselets decouple pose and camera view from appearance
Google “poselets” to get:
• The set of published poselet papers
• H3D data set + Matlab tools
• Java3D annotation tool + video tutorial
• Matlab code to detect people using poselets
• Our latest trained poselets
“A man with short hair, glasses, short sleeves and shorts”
“A man with short hair and long sleeves”
“A person with short hair, no hat and long sleeves”
“A woman with long hair, glasses, short sleeves and long pants”
“A person with long pants”
“A computer vision professor who likes machine learning”
Failure mode
Poselets website
http://eecs.berkeley.edu/~lbourdev/poselets