The Promise and Perils of Benchmark Datasets and Challenges
David Forsyth, Alyosha Efros, Fei-Fei Li, Antonio Torralba and Andrew Zisserman
So many datasets …
• Cover many areas of Computer Vision
• Tremendous growth in both the number and the size of datasets over the last decade
• Datasets drive research and enable its successes
• The Tyranny of datasets
[Figure: datasets plotted by year (time axis 1970 to 2010) against number of categories (1, 4, 20, 101, all): COIL-20 (1996), UIUC (2002), Caltech-4 (2003), Caltech 101, PASCAL (2007), 80 million images]
FERET
Caltech 101
Middlebury Stereo Datasets
Berkeley Segmentation Data Set 500
Large-scale instance retrieval: Oxford Buildings Dataset, INRIA Holidays Dataset
The Indoor Scene Dataset
• 67 indoor categories
• 15,620 images
• At least 100 images per category
• Training: 67 × 80 images
• Testing: 67 × 20 images
• A. Quattoni and A. Torralba. Recognizing Indoor Scenes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
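The split above fixes the number of images per category rather than a global percentage, so every class gets equal weight: 67 × 80 = 5,360 training and 67 × 20 = 1,340 test images. The sketch below illustrates that per-category protocol only. It assumes a hypothetical `images_by_category` dict and picks images at random with a fixed seed; it does not reproduce the published split.

```python
import random


def per_category_split(images_by_category, n_train=80, n_test=20, seed=0):
    """Pick n_train training and n_test test images from every category.

    images_by_category: hypothetical dict mapping category name -> list of image ids.
    Random selection with a fixed seed is an illustrative assumption, not the
    procedure used to build the published split.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    train, test = [], []
    for category in sorted(images_by_category):
        images = list(images_by_category[category])
        if len(images) < n_train + n_test:
            raise ValueError(f"{category!r} has only {len(images)} images")
        rng.shuffle(images)
        # Disjoint slices guarantee no image is in both train and test.
        train += [(img, category) for img in images[:n_train]]
        test += [(img, category) for img in images[n_train:n_train + n_test]]
    return train, test


# Synthetic stand-in: 67 categories with 100 hypothetical image ids each.
fake = {f"cat{c:02d}": [f"cat{c:02d}_{i:03d}.jpg" for i in range(100)] for c in range(67)}
train, test = per_category_split(fake)
print(len(train), len(test))  # 5360 1340
```

Because every category contributes the same number of test images, mean per-class accuracy and overall accuracy coincide on such a split.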
Caltech Pedestrian Dataset
• 350,000 labeled pedestrian bounding boxes
• 250,000 frames
Fine-grained visual categorization
Caltech-UCSD Birds 200
The Oxford Flowers 102
Material recognition
Exploring Features in a Bayesian Framework for Material Recognition. Ce Liu, Lavanya Sharan, Edward H. Adelson, and Ruth Rosenholtz. CVPR 2010.
Person layout
Oxford Buffy Stickmen: 276 frames × 6 = 1,656 body parts (sticks)
PASCAL VOC “Person Layout”
Berkeley H3D
ETHZ PASCAL Stickmen: 549 images × 6 = 3,294 body parts (sticks)
Human action recognition
Hollywood2 dataset
Goals of this session
• Tease out what it is about datasets that makes them useful
• Recommendations on how to move forward in designing and selecting new datasets
Program
• Three examples of successful datasets and challenges:
1. LabelMe, 80M tiny images – Antonio
2. PASCAL – AZ
3. ImageNet – Fei-Fei
• Perils & Promise – Alyosha & Antonio
• Promise – David Forsyth
• Discussion