Recurrent Image Annotator


Recurrent Image Annotator for Arbitrary Length Image Tagging
Jiren Jin, Nakayama Lab


1. Introduction to Automatic Image Annotation


Automatic Image Annotation (AIA)


Difficulties of the Task

Most previous work focuses on several problems:
• label sparsity
• label imbalance
• incorrect/incomplete labels

The basic approach is to exploit:
• image-to-tag correlation
• tag-to-tag correlation


Existing Methods

• generative models (joint distribution over image features and annotation tags), Yu et al.
• discriminatively trained classifiers, Claudio et al.
• K-nearest-neighbor (KNN) based methods, Guillaumin et al.
• object-detection based methods, Song et al.


2. The Missing Part: Annotation Length


Missing Part: Annotation Length

Conventional evaluation uses a fixed annotation length:
• annotate the k most relevant keywords
• evaluate retrieval performance per keyword
• average over keywords
• typical k is 5 or 3

Why did previous work do this?
• for ease of comparison with earlier results
• most existing methods cannot trivially predict the proper number of tags


Why Annotation Length Matters

A fixed annotation length is:
• not the natural way that humans annotate images
• not how realistic images are labeled: the true number of tags varies per image

Problem to solve: predict annotations of arbitrary length.

Legend used in the examples: AL = arbitrary length, T5 = top-5, GT = ground truth


3. Our Solution: Recurrent Image Annotator


Sequence generation
• output tags one by one -> arbitrary annotation length
• previous outputs influence the current output -> tag-to-tag correlation
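The idea of emitting tags one by one can be sketched as a greedy decoding loop. This is a minimal illustration with hypothetical names (`step_fn`, `stop_id`), not the paper's actual implementation: the model runs one recurrent step at a time, feeds its previous output back in, and stops when it emits a STOP token.

```python
import numpy as np

def decode_tags(step_fn, image_feat, stop_id, max_len=10):
    """Greedy tag-sequence decoding: feed the previous output back in,
    stop when the model emits the STOP token or max_len is reached."""
    tags, prev, state = [], None, None
    for _ in range(max_len):
        scores, state = step_fn(image_feat, prev, state)  # one recurrent step
        tag = int(np.argmax(scores))
        if tag == stop_id:
            break
        tags.append(tag)
        prev = tag
    return tags
```

Because the loop exits on the STOP token, the annotation length is decided per image by the model itself rather than fixed to k.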

Inspired by machine translation and image captioning:
• the image (or a sentence in language A) is encoded
• the image description (or a sentence in language B) is decoded

Natural Way for Arbitrary Length Outputs

Karpathy et al. (2014)

Vinyals et al. (2014)


What Else We Need

An order of the tags:
• Both image captioning and machine translation aim to generate sentences, which have a natural order.
• Unfortunately, in the image annotation task, no such order is available.
• We have to choose or learn an order.

Points for a useful order "rule":
• should be based on semantic image and tag information
• tag sequences in every training example should be sorted by the same rule

• easy to learn
• good for generation


Contributions

1. Analyze an insufficiency of existing methods:
◦ they cannot generate an image-dependent number of tags
2. First to formulate image annotation as a sequence generation problem:
◦ propose a novel RNN-based model, the Recurrent Image Annotator
3. Propose and evaluate several orders for sorting the input tag sequences:
◦ show the importance of tag order in the tag sequence generation problem


Recurrent Image Annotator (RIA)


4. Submodules of Recurrent Image Annotator


Neural Networks

Hidden layer: linear transformation + nonlinear activation function (e.g., the sigmoid function).

Figure: a simple fully-connected network (from Wikipedia).
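A hidden layer of this kind fits in a few lines. This NumPy sketch (weight names are illustrative) applies the linear transformation and then the sigmoid nonlinearity:

```python
import numpy as np

def dense_sigmoid(x, W, b):
    """Fully-connected hidden layer: linear transformation + sigmoid."""
    z = W @ x + b                     # linear transformation
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation, elementwise
```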


Convolutional Neural Networks

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

• local connectivity
• shared weights
• 3D volumes of neurons
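The local-connectivity and shared-weight ideas can be illustrated with a naive 2D convolution (NumPy sketch, not an efficient implementation): the same kernel slides over the image, and each output unit sees only a local patch.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D convolution: one shared kernel slides over the image;
    each output value depends only on a local kh x kw patch."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out
```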


Recurrent Neural Networks

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
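The core recurrence is compact enough to write out directly. A NumPy sketch of one vanilla RNN step (weight names illustrative): the new hidden state mixes the current input with the previous hidden state through a tanh nonlinearity.

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, b):
    """One vanilla RNN step: h_t = tanh(Wxh x_t + Whh h_{t-1} + b)."""
    return np.tanh(Wxh @ x + Whh @ h_prev + b)
```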


Long Short-Term Memory Networks

An improved version of the RNN:
• remembers information for long periods of time
• uses gating units to control information flow through time steps

S. Hochreiter and J. Schmidhuber, 1997

Core idea of LSTM: the cell state, which makes it easy for information to just flow along unchanged.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
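The gating mechanism can be sketched as a single LSTM step (NumPy, illustrative only; real implementations use a deep learning library). The forget, input, and output gates decide how much information enters, stays on, and leaves the cell state c.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W stacks all four gate weights (shape 4H x (X+H))."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.size
    f = sigmoid(z[:H])             # forget gate: keep old cell content?
    i = sigmoid(z[H:2 * H])        # input gate: admit new content?
    o = sigmoid(z[2 * H:3 * H])    # output gate: expose cell content?
    g = np.tanh(z[3 * H:])         # candidate cell update
    c = f * c_prev + i * g         # cell state flows along, gated
    h = o * np.tanh(c)             # new hidden state
    return h, c
```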


5. Experiments


Dataset 1: Corel 5K

Vocabulary size: 260
Number of images: 4,493
Words per image: 3.4 (maximum is 5)
Images per word: 58.6 (maximum is 1,004)


Dataset 2: ESP Game

Vocabulary size: 269
Number of images: 18,689
Words per image: 4.7 (maximum is 15)
Images per word: 362.7 (maximum is 4,553)


Dataset 3: IAPR TC-12

Vocabulary size: 291
Number of images: 17,665
Words per image: 5.7 (maximum is 23)
Images per word: 347.7 (maximum is 4,999)


Evaluation Measures

• precision, P (averaged over classes)
• recall, R (averaged over classes)
• F-measure, F (averaged over classes)
• N+: the number of classes with non-zero recall
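These per-class measures can be computed as follows. A small sketch, assuming predictions and ground truth are given as one set of tag ids per image (the function name and data layout are illustrative, not the paper's code):

```python
import numpy as np

def per_class_metrics(pred, truth, n_classes):
    """Per-class precision, recall, F-measure (macro-averaged over
    classes) and N+, the number of classes with non-zero recall.
    pred / truth: lists of tag-id sets, one per image."""
    P, R, F, nplus = [], [], [], 0
    for c in range(n_classes):
        tp = sum(1 for p, t in zip(pred, truth) if c in p and c in t)
        n_pred = sum(1 for p in pred if c in p)    # times c was predicted
        n_true = sum(1 for t in truth if c in t)   # times c is correct
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_true if n_true else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        P.append(p); R.append(r); F.append(f)
        nplus += r > 0
    return np.mean(P), np.mean(R), np.mean(F), nplus
```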


Different Orders for Tag Sequences

• dictionary order: alphabetical order
• random order: randomly shuffle the tags in each training example
• frequent-first order: put frequent tags ahead of rare tags
• rare-first order: put rare tags ahead of frequent tags
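The four orders can be implemented directly. A small sketch (names are illustrative; `freq` maps each tag to its count in the training set):

```python
import random

def order_tags(tags, freq, mode, seed=0):
    """Sort one image's tag list under the four candidate orders."""
    if mode == "dictionary":
        return sorted(tags)                          # alphabetical
    if mode == "random":
        out = list(tags)
        random.Random(seed).shuffle(out)             # per-example shuffle
        return out
    if mode == "frequent-first":
        return sorted(tags, key=lambda t: -freq[t])  # frequent tags first
    if mode == "rare-first":
        return sorted(tags, key=lambda t: freq[t])   # rare tags first
    raise ValueError(mode)
```

The same rule must be applied to every training example so that the RNN sees consistent target sequences.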


6. Analysis and Conclusion


Arbitrary Length Annotation (1)


Arbitrary Length Annotation (2)


Arbitrary Length Annotation (3)


Compare Influence of Different Orders

P: precision; R: recall; F: F-measure; N+: the number of classes with non-zero recall. Larger values indicate better performance.


Analysis of Results for Different Orders

Why rare-first outperforms frequent-first:
• "rare" means rare in the dataset; for a single image, a rare tag may carry more importance
• frequent tags are naturally easier to predict than rare tags; frequent-first order makes the easy task easier but the difficult task more difficult
• correctly predicting rare tags matters more under the per-class evaluation measures


Top-5 Annotation

P: precision; R: recall; F: F-measure; N+: the number of classes with non-zero recall.

Much faster testing speed: constant time (5 ms) per test image, instead of O(N) in KNN-based methods (N: number of training images).


Conclusion

• transformed image annotation into a sequence generation problem
• achieved performance comparable to state-of-the-art methods
• decided the appropriate annotation length automatically
• obtained much faster testing speed
• confirmed the importance of a proper tag sequence order


Output of This Work

1. Accepted by the International Conference on Pattern Recognition (ICPR) 2016 (oral)
2. Web demo for RIA: www.nlab.ci.i.u-tokyo.ac.jp/annotator


Future Work

Improve the strategy for obtaining the tag sequence order:
• e.g., use reinforcement learning to learn the order automatically

Extend to personal-preference annotation:
• consider eye-catching effects, etc.

