Large Scale Visual Recognition Challenge (ILSVRC) 2013:
Classification spotlights
Additions to the ConvNet Image Classification Pipeline
Andrew Howard, Andrew Howard Consulting
Changes to Training:
- Use more pixels: train on square patches taken from the whole rectangular image instead of from a cropped central square
- Apply additional color manipulation (contrast, brightness, color balance) to the training patches
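The two training changes can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names are hypothetical, images are represented as lists of (r, g, b) floats in [0, 1], and the jitter range [0.8, 1.2] is an assumed value that the slide does not give.

```python
import random


def random_square_patch(width, height, patch, rng=random):
    """Pick a patch x patch square anywhere inside a width x height image.

    Because the patch may come from the full rectangular extent, pixels near
    the long edges are used, which a central-square crop would discard.
    """
    if patch > min(width, height):
        raise ValueError("patch is larger than the image's short side")
    x = rng.randint(0, width - patch)   # randint is inclusive at both ends
    y = rng.randint(0, height - patch)
    return x, y, patch


def jitter_color(pixels, rng=random, low=0.8, high=1.2):
    """Randomly change contrast, brightness and per-channel color balance.

    pixels: non-empty list of (r, g, b) floats in [0, 1].
    low/high: assumed jitter range (hypothetical, not from the slide).
    """
    contrast = rng.uniform(low, high)
    brightness = rng.uniform(low, high)
    balance = [rng.uniform(low, high) for _ in range(3)]
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    out = []
    for p in pixels:
        q = []
        for c in range(3):
            v = mean[c] + contrast * (p[c] - mean[c])  # stretch contrast about the channel mean
            v *= brightness * balance[c]               # global brightness and per-channel balance
            q.append(min(1.0, max(0.0, v)))            # clamp back into [0, 1]
        out.append(tuple(q))
    return out
```

For a 320x240 image and a 224-pixel patch, the top-left corner can land anywhere in a 97x17 grid of positions rather than being confined to the central 240x240 square.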
Changes to Testing:
- Make predictions at different scales and with different views that together use all pixels
- Previous: 10 predictions (2 flips × 5 translations)
- This submission: 90 predictions (2 flips × 5 translations × 3 scales × 3 views)
- The number of predictions can be reduced with no loss of accuracy using stagewise regression
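The prediction count comes from enumerating every combination of flip, translation, scale and view and averaging the model's scores over them. A minimal sketch, assuming a hypothetical `predict(image, flip, translation, scale, view)` callable; the translation names and the scale values (224, 256, 288) are placeholders, since the slide only says "3 scales" and "3 views".

```python
from itertools import product

FLIPS = (False, True)
TRANSLATIONS = ("center", "top_left", "top_right", "bottom_left", "bottom_right")
SCALES = (224, 256, 288)  # hypothetical sizes; the slide does not list them
VIEWS = (0, 1, 2)         # three views that together cover all pixels


def test_time_configs():
    """All 2 * 5 * 3 * 3 = 90 test-time configurations."""
    return list(product(FLIPS, TRANSLATIONS, SCALES, VIEWS))


def averaged_prediction(predict, image):
    """Average class scores over every test-time configuration.

    predict(image, flip, translation, scale, view) -> list of class scores
    (a hypothetical interface used only for this sketch).
    """
    configs = test_time_configs()
    total = None
    for cfg in configs:
        scores = predict(image, *cfg)
        total = list(scores) if total is None else [a + b for a, b in zip(total, scores)]
    return [t / len(configs) for t in total]
```

Dropping `SCALES` and `VIEWS` from the product recovers the previous 10-prediction scheme.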
Higher Resolution Models:
- Take a fully trained model and fine-tune it on patches from a higher-resolution version of the image
- This takes about 1/3 as many epochs as training the base model
- Predictions on higher-resolution images are complementary to those of the base model
Final vision system: 13.6% error, using 5 base models and 5 higher-resolution models
- The architecture is the same as last year's, except that the fully connected layers are twice as large, which doesn't add much value
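Combining the 10 models can be sketched as an equal-weight average of their class scores, followed by the error measurement. Equal weighting is an assumption; the slide does not say how the base and higher-resolution predictions are merged. ILSVRC classification is scored by top-5 error, so `k=5` is the default here; the function names are hypothetical.

```python
def average_scores(model_scores):
    """Equal-weight ensemble: model_scores is one list of class scores per model."""
    n = len(model_scores)
    return [sum(col) / n for col in zip(*model_scores)]


def top_k_error(score_rows, labels, k=5):
    """Fraction of images whose true label is not among the k highest scores."""
    errors = 0
    for scores, label in zip(score_rows, labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        errors += label not in ranked[:k]
    return errors / len(labels)
```

In this setting, `average_scores` would receive 10 score lists per image (5 from base models, 5 from higher-resolution models), and `top_k_error` would run on the averaged rows.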
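[Figure: training patches are taken from the full rectangular image ("Use patches from") instead of from the cropped central square ("Instead of patches from"); the three test-time views are shown as View 1, View 2 and View 3]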
Cognitive Psychology Inspired Image Classification using Deep Neural Network
Kuiyuan Yang, Microsoft Research
Yalong Bai, Harbin Institute of Technology
Yong Rui, Microsoft Research
CognitiveVision team
Our Classification Scheme
Step 1, basic category classification: given an image, first predict its basic category (e.g., Dog vs. Cat, which are easy to distinguish).
Step 2, subcategory prediction: then predict the subcategory within that basic category.
- Dog classification: French bulldog, English setter, Maltese dog, dalmatian, …
- Cat classification: Egyptian cat, Siamese cat, tiger cat, …
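The two-step scheme can be sketched as a greedy coarse-to-fine decision. This is an illustration under assumptions, not the team's implementation: `basic_model` and `sub_models` are hypothetical callables returning `{label: score}` dicts, and the hierarchy lists only the example classes shown on the slide.

```python
# Hypothetical two-level hierarchy built from the slide's example classes.
HIERARCHY = {
    "dog": ["French bulldog", "English setter", "Maltese dog", "dalmatian"],
    "cat": ["Egyptian cat", "Siamese cat", "tiger cat"],
}


def argmax(scores):
    """Return the label with the highest score in a {label: score} dict."""
    return max(scores, key=scores.get)


def classify_two_stage(image, basic_model, sub_models):
    """Predict the basic category first, then a subcategory inside it."""
    basic = argmax(basic_model(image))
    sub_scores = sub_models[basic](image)
    # Only subcategories belonging to the chosen basic category are considered.
    allowed = {label: s for label, s in sub_scores.items() if label in HIERARCHY[basic]}
    return basic, argmax(allowed)
```

One consequence of the greedy design: a mistake in the first (basic category) step cannot be corrected in the second, so the scheme relies on basic categories being easy to distinguish.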
Caffe: Open-Sourcing Deep Learning
Yangqing Jia, Trevor Darrell, UC Berkeley
• Convolutional Architecture for Fast Feature Extraction
  – Seamless switching between CPU and GPU
  – Fast computation (2.5 ms/image with GPU)
  – Full training and testing capability
  – Reference ImageNet model available
• A framework to support multiple applications: classification, embedding, detection, and your next application!
Publicly available at http://caffe.berkeleyvision.org/
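To put the 2.5 ms/image GPU figure in perspective, the arithmetic below converts it to throughput. It assumes the figure holds per image in steady state and ignores data loading and preprocessing; 50,000 is the size of the ILSVRC classification validation set.

```python
MS_PER_IMAGE = 2.5  # GPU figure quoted on the slide


def images_per_second(ms_per_image=MS_PER_IMAGE):
    """Throughput implied by a fixed per-image latency."""
    return 1000.0 / ms_per_image


def seconds_for(num_images, ms_per_image=MS_PER_IMAGE):
    """Total compute time for num_images at the given per-image cost."""
    return num_images * ms_per_image / 1000.0


print(images_per_second())  # 400.0
print(seconds_for(50_000))  # 125.0
```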
Experiments for large scale visual recognition
We tried: Deep CNN (following Krizhevsky et al. '12) + low-level features & spatial granularities
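One common way to combine a CNN feature with low-level features at several spatial granularities is to average-pool the low-level map over 1x1, 2x2 and 4x4 grids (a spatial pyramid) and concatenate the results. The slide does not describe the exact method, so this sketch is an assumption. The grid levels, the single-channel map and the function names are all hypothetical.

```python
def pool_grid(fmap, g):
    """Average-pool an H x W map (list of rows) into g x g cells, row-major."""
    h, w = len(fmap), len(fmap[0])
    if g > h or g > w:
        raise ValueError("grid is finer than the feature map")
    out = []
    for i in range(g):
        r0, r1 = i * h // g, (i + 1) * h // g
        for j in range(g):
            c0, c1 = j * w // g, (j + 1) * w // g
            cell = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out.append(sum(cell) / len(cell))
    return out


def spatial_pyramid(fmap, levels=(1, 2, 4)):
    """Concatenate pooled values from coarse to fine granularity (1 + 4 + 16 = 21)."""
    feats = []
    for g in levels:
        feats.extend(pool_grid(fmap, g))
    return feats


def combined_feature(cnn_feature, lowlevel_map, levels=(1, 2, 4)):
    """Append pyramid-pooled low-level features to a CNN feature vector."""
    return list(cnn_feature) + spatial_pyramid(lowlevel_map, levels)
```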
Where did we fail?
Appliances and instruments are confusing for us, including:
- TV vs. Screen
- Coffee mug vs. Cup
- Flute vs. Microphone
- …
Television (0.18), Hair spray (0.18), Coffee mug (0.10), Flute (0.10)
Top-1 accuracy = 0.567
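A failure analysis like the one above can be produced from predicted and true labels. The sketch below computes top-1 accuracy, per-class accuracy and the most frequent (true, predicted) confusion pairs. The function name and the toy labels in the usage note are illustrative only; they are not the team's data.

```python
from collections import Counter, defaultdict


def error_analysis(y_true, y_pred):
    """Return (top-1 accuracy, per-class accuracy dict, confusion pairs by frequency)."""
    correct = 0
    hits = defaultdict(int)
    counts = Counter(y_true)
    confusions = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct += 1
            hits[t] += 1
        else:
            confusions[(t, p)] += 1  # e.g. ("television", "screen")
    per_class = {c: hits[c] / n for c, n in counts.items()}
    return correct / len(y_true), per_class, confusions.most_common()
```

Sorting `per_class` by value surfaces the weakest classes, and the confusion list shows which pairs (such as TV vs. Screen) account for the errors.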
Agenda
http://www.image-net.org/challenges/LSVRC/2013/iccv2013
8:30 Classification & localization
  Spotlights: 8:50, 9:05, 9:20, 9:35, 9:50
10:30 Detection
  Spotlights: 10:50, 11:10, 11:30, 11:40
Noon Discussion panel
14:00 Invited talk by Vittorio Ferrari: Auto-annotation and self-assessment in ImageNet
14:40 Fine-Grained Challenge 2013