
FROM IMAGES TO SENTENCES

SCENE DESCRIPTION

KHUSHALI ACHARYA (1517MECE30008)

AGENDA

1 Introduction

2 Motivation

3 Related Research Work

4 Our Approach

5 Conclusion

1. INTRODUCTION TO SCENE DESCRIPTION

WHAT IS IT?

“Interpreting images and generating sentences.”

“Scene interpretation means understanding every-day occurrences or recognizing rare events.”

“Scene interpretations are Controlled Hallucinations.”

WHY DO WE NEED SUCH SYSTEMS?

Isn't a picture enough to depict things clearly?

Ever imagined a cricket match without commentary? Or a movie without dialogue?

It has been estimated that more than 80% of the activities we do online are text-based.

A layman can’t understand medical reports unless the doctor explains them or they are presented in written form.

Medical reports

A tagged location becomes clear when the place name is described.

Thus we saw that the description of an image adds an interestingness measure to it.

2. MOTIVATION TO THE PROBLEM

2.1 WHAT ARE THE REAL-TIME APPLICATIONS?

• Self-aware cognitive robots
• Assist visually impaired people
• Soccer game analysis
• Image search/retrieval systems
• Street traffic observations
• Criminal act recognition
• Agricultural sector

Some Applications of scene description

2.2 HOW IS IT HELPFUL TO SOCIETY?

Assists Visually Impaired People

Screen Reader

Screen readers are software programs that convert text into synthesized speech so that blind users can listen to web content.

LIMITATIONS:

• Screen readers cannot describe images.
• Screen readers cannot survey the entirety of a web page as a sighted user can, and they cannot always intelligently skip over extraneous content such as advertisements or navigation bars.

Scene description can be helpful to blind people in this manner.

Images are captured and unusual activities are recorded. The features extracted from these images then help in crime investigation.

Criminal Act Recognition

Efficient and consistent scene interpretation is a prerequisite for self-aware cognitive robots to work.

Human Computer Interaction

Object Recognition and scene interpretation

Spatial Relation Extraction

3. RELATED EXISTING RESEARCH WORK

3.1 RESEARCH PAPERS

1. “Midge: Generating image descriptions from computer vision detections”
U. of Aberdeen, Oregon Health and Science University, Stony Brook University, U. of Maryland, Columbia University, U. of Washington, MIT.

Proposed: Introduces a novel generation system that composes human-like descriptions of images from computer vision detections.

Dataset: For training, 700,000 Flickr (2011) images with associated descriptions from the dataset in Ordonez et al. (2011); for evaluation, 840 PASCAL images.

Conclusion: Midge generates a well-formed description of an image by filtering attribute detections that are unlikely and placing objects into an ordered syntactic structure.

2. “Every picture tells a story: Generating sentences from images”
Farhadi, A., Hejrati, S. M. M., Sadeghi, M. A., Young, P., Rashtchian, C., Hockenmaier, J., and Forsyth, D. A. (2010). Springer.

Proposed: Attempts to generate sentences by first learning from a set of human-annotated examples, producing a sentence when an image and a sentence share common properties in terms of their triplets: (Nouns-Verbs-Scenes).

Dataset: PASCAL 2008 images with human annotations.

Conclusion: Sentences are rich, compact and subtle representations of information. Even so, good sentences can be predicted for images. The intermediate meaning representation is one key component of the model, as it allows benefiting from distributional semantics.

3. “Babytalk: Understanding and generating simple image descriptions”
G. Kulkarni, V. Premraj, V. Ordonez. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 12, December 2013.

Proposed: Uses detectors for object and scene detection and builds quadruplets: (Nouns-Verbs-Scenes-Prepositions).

Dataset: PASCAL 2008 images.

Conclusion: Human forced-choice experiments demonstrate the quality of the generated sentences over previous approaches. One key to the system’s success was automatically mining and parsing large text collections to obtain statistical models for visually descriptive language.

4. “Choosing Linguistics over Vision to Describe Images”
Ankush Gupta, Yashaswi Verma, C. V. Jawahar. International Institute of Information Technology, Hyderabad, India – 500032.

Proposed: Automatically generating human-like descriptions for unseen images, given a collection of images and their corresponding human-generated descriptions.

Dataset: PASCAL dataset.

Conclusion: They propose a novel approach for generating relevant, fluent and human-like descriptions for images without relying on any object detectors, classifiers, hand-written rules or heuristics.

3.2 THEIR APPROACH

1. Choosing Linguistics over Vision to Describe Images

i. Given an unseen image,

ii. find the K images most similar to it from the training images, and, using the phrases extracted from their descriptions,

iii. generate a ranked list of triples, which is then used to compose a description for the new image (a sketch of the retrieval step follows the figure).

[Figure: (i) input image; (ii) neighboring images with extracted phrases; (iii) triple selection and sentence generation]
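To make step (ii) concrete, here is a minimal sketch of the nearest-neighbour retrieval, assuming images are already represented as fixed-length feature vectors (the slides do not specify the features or the distance; `k_nearest`, `train_feats` and `query_feat` are hypothetical names):

```python
import numpy as np

def k_nearest(query_feat, train_feats, k=5):
    """Return indices of the K training images closest to the query.

    Euclidean distance over assumed fixed-length feature vectors
    stands in for whatever similarity the paper actually uses.
    """
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: four "training images" with 3-D features, one query.
train_feats = np.array([[0.1, 0.9, 0.0],
                        [0.8, 0.1, 0.1],
                        [0.2, 0.8, 0.1],
                        [0.9, 0.0, 0.2]])
query_feat = np.array([0.15, 0.85, 0.05])
print(k_nearest(query_feat, train_feats, k=2))  # -> [0 2]
```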

FAILURE SCENARIO

A motor racer is speeding through a splash of mud.

A water cow is grazing along a roadside.

An orange fixture is hanging in a messy kitchen.

4. OUR APPROACH

4.1 USING OPENCV, NLP & SVM

OPENCV

• Open source computer vision and machine learning software library.

• More than 2500 optimized algorithms.

• C++, C, Python, Java and MATLAB interfaces

• Supports Windows, Linux, Android and Mac OS
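As a small illustration of the library (not our system itself), OpenCV ships a pretrained HOG + linear-SVM pedestrian detector; the file names below are placeholders:

```python
import cv2

# Built-in HOG descriptor with OpenCV's pretrained people detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("query.jpg")                      # placeholder path
boxes, _weights = hog.detectMultiScale(img, winStride=(8, 8))

# Draw one rectangle per detected person and save the result.
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```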

NLP (Natural Language Processing)

It is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages.
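For illustration only (the slides do not prescribe a toolkit), a few lines of NLTK show the kind of analysis NLP contributes here, tagging the nouns and verbs that a caption generator builds its triplets from:

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

caption = "A brown cow is grazing along a roadside."
tokens = nltk.word_tokenize(caption)
print(nltk.pos_tag(tokens))
# [('A', 'DT'), ('brown', 'JJ'), ('cow', 'NN'), ('is', 'VBZ'),
#  ('grazing', 'VBG'), ...]
```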

SVM (Support Vector Machines)

A discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
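A tiny scikit-learn sketch of that idea, on toy 2-D points rather than real image features:

```python
from sklearn import svm

# Toy labeled training data: class 0 near the origin, class 1 farther out.
X = [[0.0, 0.0], [0.5, 0.5], [3.0, 3.0], [3.5, 2.5]]
y = [0, 0, 1, 1]

clf = svm.SVC(kernel="linear")    # learns the separating hyperplane
clf.fit(X, y)

print(clf.predict([[3.0, 2.8]]))  # -> [1], the new example's category
```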

4.2 SYSTEM FLOW

1. Take the query image as input.

2. Detect objects in the query image.

3. Search the corpus data and extract the shortest sentences.

4. Run an RDF (Resource Description Framework) parser to obtain triples: <object1, predicate1, object2>.

5. Use the Google Image API to retrieve the top 10 images for each triple.

6. Match the query image against each retrieved image and compute a score (one plausible scoring sketch follows the list).

7. The triple whose retrieved images score highest is our matching triplet.
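The slides leave the scoring step open, so the sketch below assumes one plausible measure: correlation between HSV colour histograms of the query and a retrieved image (`image_score` and the file paths are hypothetical):

```python
import cv2

def image_score(query_path, retrieved_path):
    """Score a retrieved image against the query via HSV histogram
    correlation -- an assumed stand-in for the slide's scoring step."""
    def hsv_hist(path):
        img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([img], [0, 1], None, [50, 60],
                            [0, 180, 0, 256])
        return cv2.normalize(hist, hist).flatten()
    return cv2.compareHist(hsv_hist(query_path),
                           hsv_hist(retrieved_path),
                           cv2.HISTCMP_CORREL)

# The triple whose top-10 retrieved images score highest would win, e.g.:
# best = max(triples, key=lambda t: max(image_score("query.jpg", p)
#                                       for p in retrieve(t)))
```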

4.3 EXPERIMENTAL SETUP

DATA SET

PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning)

It provides standardized image datasets for object class recognition.
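PASCAL ships one XML annotation file per image; a short sketch of reading the object class names from it (the path and example output are placeholders):

```python
import xml.etree.ElementTree as ET

def voc_object_names(annotation_path):
    """List the object class names in a PASCAL VOC annotation file."""
    root = ET.parse(annotation_path).getroot()
    return [obj.findtext("name") for obj in root.iter("object")]

# e.g. voc_object_names("VOC2008/Annotations/000001.xml")
# might return ['person', 'dog']
```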

TECHNOLOGY

JAVA / PYTHON

5. CONCLUSION

Thus we saw the fundamentals of scene description, its applications, previous work in this field, and our approach for designing this system.

THANK YOU!
