CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/CAP6412.html Boqing Gong Feb 02, 2016

CAP 6412 Advanced Computer Vision - UCF Computer Sciencebgong/CAP6412/lec7.pdf · Week 3 CNN & object localization Week 4 CNN& transfer learning Week 5 CNN& segmentation, super-resolution




Today

• Administrivia
• R-CNN review & Project I
• Image captioning, by Harish
• Neural networks & backpropagation (Part V)


Past due (02/02 Tuesday, 12pm)

• Assignment 3: Review the following paper

{Major} Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." arXiv preprint arXiv:1412.2306 (2014).

Template for paper review: http://www.cs.ucf.edu/~bgong/CAP6412/Review.docx


Upcoming due (02/04 Thursday, 12pm)

• Assignment 4: Review the following paper

{Major} Xu, Kelvin, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. “Show, attend and tell: Neural image caption generation with visual attention.” arXiv preprint arXiv:1502.03044 (2015).

Template for paper review: http://www.cs.ucf.edu/~bgong/CAP6412/Review.docx


Next week

Week 2: CNN visualization & object recognition

Week 3: CNN & object localization

Week 4: CNN & transfer learning

Week 5: CNN & segmentation, super-resolution

Week 6: CNN & videos (optical flow, pose)

Week 7: Image captioning & attention model

Week 8: Visual question answering

Week 9: Attention model, aligning books with movies

Weeks 10-16: Video (tracking, action, surveillance), human-centered CV, 3D CV, low-level CV, etc.


Next week: CNN & segmentation and super-resolution

Tuesday (02/09)

Jose Sanchez

[Super-resolution] Dong, Chao, Chen Change Loy, Kaiming He, and Xiaoou Tang. “Learning a deep convolutional network for image super-resolution.” In Computer Vision–ECCV 2014, pp. 184-199. Springer International Publishing, 2014. (Extended version on arXiv) & secondary papers

Thursday (02/11)

Goran Igic

[Edge detection] Xie, Saining, and Zhuowen Tu. “Holistically-Nested Edge Detection.” In Proceedings of the IEEE International Conference on Computer Vision, 2015. & secondary papers




Slide credit: Ross Girshick


Project I: R-CNN at test time

• INPUT: an image
• 1. Extract detection proposals (cf. Samer’s presentation on 01/26)
• 2. Warp proposals to 227-by-227
• 3. Extract CNN features for each proposal (region) by Caffe
• For class c = 1, 2, …, 20:
  – 4. Output a detection score for each proposal by SVM(proposal, class c)
  – 5. Non-maximum suppression using the scores of class c
  – 6. Bounding-box regression for the surviving proposals

• OUTPUT: bounding boxes, each with a class label & a detection score
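Step 5 can be sketched as greedy non-maximum suppression for a single class. This is a minimal pure-Python version that assumes boxes are (x1, y1, x2, y2) tuples and uses an IoU threshold of 0.3; both the box format and the threshold are assumptions of this sketch, not values given on the slide.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the highest-scoring box, discard every remaining box
    that overlaps it by more than iou_threshold, and repeat.
    Returns the indices of the kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, two heavily overlapping boxes with scores 0.9 and 0.8 plus a distant third box leave the first and the third; NMS is run once per class c with that class's SVM scores.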


Project I: R-CNN at training time (bonus)

• INPUT: an image
• 1. Extract detection proposals (10 pts)
• 2. Warp proposals to 227-by-227
• 3. Extract CNN features for each proposal (region) by Caffe (30 pts)
• For class c = 1, 2, …, 20:
  – 4. Output a detection score for each proposal by SVM(proposal, class c) (10 pts)
  – 5. Non-maximum suppression using the scores of class c
  – 6. Bounding-box regression for the surviving proposals (10 pts)

• OUTPUT: bounding boxes, each with a class label & a detection score
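Training the regression step needs targets that describe how to move a proposal onto its ground-truth box. A minimal sketch, assuming the parameterization described in the R-CNN technical report (center offsets normalized by the proposal's width and height, plus log-space scale changes) and boxes given as (center_x, center_y, width, height); the regressor that predicts these offsets from CNN features is not shown, and the function names are hypothetical.

```python
import math


def regression_targets(proposal, gt):
    """Targets (tx, ty, tw, th) that map proposal P onto ground truth G.
    Both boxes are (center_x, center_y, width, height)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw,          # horizontal shift, in proposal widths
            (gy - py) / ph,          # vertical shift, in proposal heights
            math.log(gw / pw),       # log width ratio
            math.log(gh / ph))       # log height ratio


def apply_regression(proposal, deltas):
    """Inverse of regression_targets: turn predicted deltas into a refined box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))
```

Normalizing by the proposal size makes the targets roughly scale-invariant, so one linear regressor per class can serve proposals of very different sizes.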


Project I: Grading criteria

• Total: 100 points + 60 bonus points + x points to promote innovation

• Quantitative results (65 pts)
  – Detection average precision on VOC 2012 validation (40 pts)
  – Detection average precision on VOC 2012 validation before regression (10 pts)
  – Detection average precision on VOC 2012 validation with 1000 proposals (15 pts)

• Qualitative results (35 pts)
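Detection average precision for one class can be computed from the ranked detections once each detection has been marked as a true or false positive. A minimal sketch, assuming that matching (in PASCAL VOC, IoU of at least 0.5 with a not-yet-claimed ground-truth box) has already been done, and assuming the area-under-the-interpolated-curve definition; the official VOC development kit remains the reference for reported numbers.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """AP as the area under the interpolated precision-recall curve.
    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth box.
    num_ground_truth: number of ground-truth boxes of this class."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:                      # walk detections from most to least confident
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the best precision at any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum interpolated precision over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Averaging this value over the 20 VOC classes gives the mean AP usually reported for a detector.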


Project I: Resources

• Technical report at http://arxiv.org/abs/1311.2524
• Ross’s GitHub repository: https://github.com/rbgirshick/rcnn


Project I: Objective

• Get familiar with the state-of-the-art object detection pipeline
• Learn about PASCAL VOC
• Know how to benchmark different algorithms
  – Benchmark datasets
  – Task specification
  – Evaluation procedure and metrics

• Benefit future research/R&D




Upload slides after class

• See “Paper Presentation” on UCF webcourse

• Sharing your slides
  – Refer to the original sources of images, figures, etc. in your slides
  – Convert them to a PDF file
  – Upload the PDF file to “Paper Presentation” after your presentation


Deep Visual-Semantic Alignments for Generating Image Descriptions

Andrej Karpathy & Li Fei-Fei, Stanford University

Presented by Harish


Motivation

• Humans can do it!

• “Build a bridge between natural language & images” – Karpathy


Problem Statement

• Generate dense image descriptions

• Build a better correspondence between images and their sentence descriptions

Figures from http://bit.ly/rankingdemo


Main Contributions

Slide credit: Karpathy


Approach Outline

• Alignment Inference Model

– R-CNN

– BRNN (Bidirectional Recurrent Neural Network)

– MRF

• Multimodal RNN


R-CNN Stage

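• Use the whole image + the top 19 detected locations from R-CNN (20 regions in total)

• CNN pre-trained on ImageNet & fine-tuned

• Each region is embedded as $v = W_m[\mathrm{CNN}_{\theta_c}(I_b)] + b_m$, where

– $I_b$: pixels inside the bounding box

– $\mathrm{CNN}_{\theta_c}(I_b)$: FC7 output

– $b_m$: bias (to be learned)

– $W_m$: weight matrix (to be learned)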
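Each region's FC7 feature is mapped into the shared image-sentence space by an affine transform built from the weight matrix W_m and bias b_m listed above, v = W_m · CNN(I_b) + b_m. A minimal pure-Python sketch; the toy sizes used in practice here (a 4-dimensional feature instead of 4096, a 2-dimensional embedding) and the function names are illustrative assumptions.

```python
def embed_region(cnn_feature, W_m, b_m):
    """v = W_m @ cnn_feature + b_m, written without NumPy.
    cnn_feature: list of length D (D = 4096 for FC7 features),
    W_m: h rows of length D, b_m: list of length h."""
    return [sum(w * x for w, x in zip(row, cnn_feature)) + b
            for row, b in zip(W_m, b_m)]


def embed_image(region_features, W_m, b_m):
    """Embed every region of one image (whole image + top detections).
    The same W_m and b_m are shared by all regions."""
    return [embed_region(f, W_m, b_m) for f in region_features]
```

With 20 regions per image, embed_image returns 20 vectors of size h that can be compared directly with word vectors of the same size.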


BRNN

Figure from M. Schuster and K. K. Paliwal. Bidirectional recurrent neural


BRNN Training

Figure from M. Schuster and K. K. Paliwal. Bidirectional recurrent neural


• BRNN input – sequence of N words

• BRNN output – N h-dimensional vectors
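The slide fixes only the interface: N words in, N h-dimensional vectors out. Below is a minimal pure-Python sketch of one way to fill it in. The layer equations (a shared ReLU word layer e_t, a forward and a backward ReLU recurrence whose states are summed, then one more ReLU layer) are modeled on the description in the Karpathy & Fei-Fei paper and are an assumption to check against it; word vectors are taken as precomputed, and all weight names are illustrative.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(*vs):
    return [sum(xs) for xs in zip(*vs)]


def relu(v):
    return [max(0.0, x) for x in v]


def brnn(word_vectors, W_e, b_e, W_f, b_f, W_b, b_b, W_d, b_d):
    """Map N word vectors to N h-dimensional vectors s_t. Each s_t sees the
    whole sentence: left context via the forward pass, right context via
    the backward pass."""
    e = [relu(vadd(matvec(W_e, x), b_e)) for x in word_vectors]
    h, n = len(b_f), len(e)
    fwd, prev = [], [0.0] * h
    for t in range(n):                        # left-to-right recurrence
        prev = relu(vadd(e[t], matvec(W_f, prev), b_f))
        fwd.append(prev)
    bwd, nxt = [None] * n, [0.0] * h
    for t in reversed(range(n)):              # right-to-left recurrence
        nxt = relu(vadd(e[t], matvec(W_b, nxt), b_b))
        bwd[t] = nxt
    # Combine both directions and project to the output space.
    return [relu(vadd(matvec(W_d, vadd(fwd[t], bwd[t])), b_d)) for t in range(n)]
```

Summing the two directions keeps s_t at size h, so each word vector can be compared with the h-dimensional region vectors.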


Inferring Word Alignments

Slide credit: Karpathy
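Word-to-region alignment can be driven by a simple image-sentence score. A minimal sketch, assuming the simplified score described in the Karpathy & Fei-Fei paper, S_kl = Σ_t max_i v_iᵀ s_t, where v_i are region vectors from the R-CNN stage and s_t are word vectors from the BRNN: each word is credited with its single best-matching region. The function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def image_sentence_score(region_vectors, word_vectors):
    """S_kl: for every word s_t take its best-matching region v_i, then sum."""
    return sum(max(dot(v, s) for v in region_vectors) for s in word_vectors)


def best_region_per_word(region_vectors, word_vectors):
    """Hard word-to-region alignment implied by the score (argmax per word)."""
    return [max(range(len(region_vectors)), key=lambda i: dot(region_vectors[i], s))
            for s in word_vectors]
```

The per-word argmax is exactly what the MRF on the next slide smooths, since independent argmaxes can jump between regions from one word to the next.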


MRF (Markov Random Field)

• Purpose – Smoothing

• Encourage nearby words to point to the same region
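A chain-structured MRF over the words can be solved exactly with dynamic programming (Viterbi). The sketch below assumes a unary affinity for every (word, region) pair, for example v_iᵀ s_t, and a single constant bonus beta whenever two adjacent words choose the same region; these modeling choices capture the "nearby words point to the same region" idea on the slide but are assumptions of this sketch.

```python
def smooth_alignment(unary, beta):
    """Chain-MRF smoothing of word-to-region alignments via Viterbi.
    unary[j][i]: affinity of word j to region i.
    beta: bonus added when consecutive words pick the same region.
    Returns one region index per word, maximizing the total score."""
    n, m = len(unary), len(unary[0])
    best = list(unary[0])   # best[i]: top score for words 0..j with word j in region i
    back = []               # back[j-1][i]: best region of word j-1 given word j in i
    for j in range(1, n):
        new_best, ptr = [], []
        for i in range(m):
            prev = max(range(m), key=lambda k: best[k] + (beta if k == i else 0.0))
            new_best.append(best[prev] + (beta if prev == i else 0.0) + unary[j][i])
            ptr.append(prev)
        best = new_best
        back.append(ptr)
    # Trace the highest-scoring assignment backwards.
    a = [max(range(m), key=lambda i: best[i])]
    for ptr in reversed(back):
        a.append(ptr[a[-1]])
    return a[::-1]
```

With beta = 0 this reduces to the independent per-word argmax; increasing beta trades unary affinity for longer runs of words aligned to one region.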


Simple RNN

• $w(t)$ – one-hot representation of the current word
• $f_1(\cdot)$ – sigmoid function
• $g_1(\cdot)$ – softmax function

Figure from Mao et al.: Explain Images with Multimodal Recurrent Neural Networks
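The notation above can be made concrete with a single time step of such a model. A minimal sketch, assuming the hidden state r(t) is computed from the concatenation of the one-hot word w(t) and the previous state r(t-1) through a weight matrix U, and the next-word distribution y(t) through a weight matrix V; the names r, U and V are assumptions for illustration, and biases are omitted.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def softmax(v):
    m = max(v)                                   # subtract max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def rnn_step(word_index, r_prev, U, V, vocab_size):
    """One step of a simple (Elman-style) RNN language model:
    x(t) = [w(t); r(t-1)], r(t) = f1(U x(t)), y(t) = g1(V r(t))."""
    w = [0.0] * vocab_size
    w[word_index] = 1.0                          # one-hot w(t)
    x = w + r_prev                               # concatenate word and previous state
    r = sigmoid(matvec(U, x))                    # f1: sigmoid hidden layer
    y = softmax(matvec(V, r))                    # g1: softmax over the next word
    return r, y
```

Feeding the returned r back in as r_prev at the next step runs the model over a whole sentence.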


Multimodal RNN
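The multimodal RNN conditions a word-level RNN on the image. A minimal sketch, assuming the formulation in which the CNN feature of the whole image is injected only at the first time step as an additive bias b_v on the hidden state; the weight names (W_hi, W_hx, W_hh, W_oh), the ReLU activation and the omission of start/end tokens are assumptions of this sketch.

```python
import math


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def mrnn_forward(image_feature, word_vectors, W_hi, W_hx, W_hh, b_h, W_oh, b_o):
    """Run the multimodal RNN over a sentence.
    The image enters only at the first step, as an extra bias b_v = W_hi @ CNN(I).
    Returns one next-word distribution per input word."""
    b_v = matvec(W_hi, image_feature)
    h = [0.0] * len(b_h)
    outputs = []
    for t, x in enumerate(word_vectors):
        pre = [a + b + c for a, b, c in zip(matvec(W_hx, x), matvec(W_hh, h), b_h)]
        if t == 0:
            pre = [p + v for p, v in zip(pre, b_v)]     # image bias at first step only
        h = [max(0.0, p) for p in pre]                    # ReLU hidden layer
        outputs.append(softmax([a + b for a, b in zip(matvec(W_oh, h), b_o)]))
    return outputs
```

Because the image appears only once, later steps see it only through the recurrent state h.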


Experiments

• Datasets

– Flickr8K

– Flickr30K

– MSCOCO

• Preprocessing

– Convert to lowercase

– Eliminate out-of-vocabulary (OOV) words
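The two preprocessing steps can be sketched in a few lines. Here a word counts as out of vocabulary if it appears fewer than min_count times in the training captions; the default threshold of 5, the whitespace tokenizer and the function names are assumptions of this sketch, not values given on the slide.

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Lowercase every training sentence and keep words seen at least min_count times."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    return {w for w, c in counts.items() if c >= min_count}


def preprocess(sentence, vocab):
    """Lowercase, tokenize on whitespace and drop out-of-vocabulary words."""
    return [w for w in sentence.lower().split() if w in vocab]
```

An alternative to dropping OOV words is mapping them to a shared unknown-word token; the slide specifies elimination.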


Generated Descriptions – Full Frame

Figures from http://bit.ly/neuraltalkdemo


Figures from http://bit.ly/neuraltalkdemo


Generated Descriptions – Region


Related Work

Junhua Mao (1, 2), Wei Xu (1), Yi Yang (1), Jiang Wang (1), Alan L. Yuille (2)

(1) Baidu Research

(2) University of California, Los Angeles

Explain Images With Multimodal Recurrent Neural Networks


• Goal: generate novel sentence descriptions to explain the contents of images

Figure from Mao et al.: Explain Images with Multimodal Recurrent Neural Networks


• Tasks

– Sentence generation

– Sentence retrieval

– Image retrieval


Oriol Vinyals, Alexander Toshev, Samy Bengio & Dumitru Erhan

Google

Show and Tell: A Neural Image Caption Generator


• Goal: generate novel sentence descriptions to explain the contents of images

Figures from Vinyals et al.: Show and Tell


Xinlei Chen (1), C. Lawrence Zitnick (2)

(1) Carnegie Mellon University

(2) Microsoft Research

Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation


• Goal: generate novel captions and reconstruct image features given an image description


Comparative Results


Conclusion

• Region-based dense descriptions

• Multimodal RNN

• Novel model to infer alignments


Future Directions

• Use LSTM in the m-RNN model

• Try different CNNs – VGGNet, GoogLeNet

• Change the RNN hidden-layer activation from sigmoid to ReLU

• Add the Mind’s Eye paper’s approach: will it work?


Some Useful Videos

• Recurrent Neural Networks and LSTM: https://www.youtube.com/watch?v=56TYLaQN4N8

• Automated Image Captioning with ConvNets and Recurrent Nets: https://www.youtube.com/watch?v=xKt21ucdBY0
