Video retrieval using an inference network
A. Graves, M. Lalmas
In SIGIR '02
Outline
Background
MPEG-7
Inference network model
Experiment
Conclusion
Background
CBVR: find the desired video on demand
The semantic gap: between high-level concepts and low-level features
Traditional method: retrieval by example
Relevance feedback: tedious for video retrieval
Retrieval by semantics?
MPEG-7
Multimedia Content Description Interface: attaches metadata to multimedia content
Semantics: event, actor, place, …
Structure: shot, scene, group, …
Video as a structured document: the information contained in the MPEG-7 annotation can be exploited to perform semantic-based video retrieval
MPEG-7 does not specify how the annotations are to be extracted
MPEG-7
Description Definition Language
Descriptors
Description Schemes
MPEG-7
Structured document and description:
<TextAnnotation>
  <FreeTextAnnotation>
    Basil attempts to mend the car without success
  </FreeTextAnnotation>
  <StructuredAnnotation>
    <Who>Basil</Who>
    <WhatObject>Car</WhatObject>
    <WhatAction>Mend</WhatAction>
    <Where>Carpark</Where>
  </StructuredAnnotation>
</TextAnnotation>
<Video structures>……
<Video shot 13>… …</>
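Since the annotation is plain XML, it can be read with standard tooling. Below is a minimal sketch using Python's standard library; the element names follow the snippet above, not the full MPEG-7 schema.

```python
# Minimal sketch: reading the TextAnnotation snippet above with
# Python's standard library. Element names are taken from the slide,
# not from the full MPEG-7 schema.
import xml.etree.ElementTree as ET

XML = """
<TextAnnotation>
  <FreeTextAnnotation>Basil attempts to mend the car without success</FreeTextAnnotation>
  <StructuredAnnotation>
    <Who>Basil</Who>
    <WhatObject>Car</WhatObject>
    <WhatAction>Mend</WhatAction>
    <Where>Carpark</Where>
  </StructuredAnnotation>
</TextAnnotation>
"""

root = ET.fromstring(XML)
free_text = root.findtext("FreeTextAnnotation")
structured = {child.tag: child.text
              for child in root.find("StructuredAnnotation")}
print(free_text)     # Basil attempts to mend the car without success
print(structured["Who"], structured["Where"])  # Basil Carpark
```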
Inference network
Performs a ranking given many sources of evidence
Document network (DN): constructed from the document data; represents all the retrievable units
Query network (QN): constructed from the query; represents the information need
Inference network
The document network:
Document layer (the collection of retrievable units)
Contextual layer (represents the contextual information about the document-concept links)
Concept layer (represents all the concepts in the network)
Inference network
Document network
Inference network
Link weight calculation:
Structural (between document nodes): duration ratio
Contextual (between contextual nodes): sibling number, context size, frequency
Context-concept: tf, idf
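As a rough sketch, two of the weight families above could be computed as follows. The slide does not give the paper's exact normalisations, so these are standard textbook forms, not the authors' formulas.

```python
import math

# Context-concept link weight as a plain tf-idf score (standard form;
# the paper's exact normalisation is not given on the slide).
def context_concept_weight(tf, num_contexts, contexts_with_concept):
    idf = math.log(num_contexts / contexts_with_concept)
    return tf * idf

# Structural link weight between document nodes as a duration ratio,
# e.g. a 30 s shot inside a 120 s scene.
def structural_weight(child_duration, parent_duration):
    return child_duration / parent_duration

print(structural_weight(30.0, 120.0))   # 0.25
```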
Inference network
Query network: a framework of nodes that represents the information need
Concept nodes
Constraint operators (and, or, sum, not, …) and context-concept constraints
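The constraint operators can be given the usual probabilistic readings from the inference network literature (Turtle and Croft's model); whether this paper uses exactly these definitions is an assumption.

```python
# Usual probabilistic readings of inference-network constraint
# operators (Turtle/Croft style); the paper's exact definitions
# may differ.
from functools import reduce

def op_and(beliefs):   # all pieces of evidence must hold
    return reduce(lambda a, b: a * b, beliefs, 1.0)

def op_or(beliefs):    # at least one piece of evidence holds
    return 1.0 - reduce(lambda a, b: a * (1.0 - b), beliefs, 1.0)

def op_sum(beliefs):   # average the evidence
    return sum(beliefs) / len(beliefs)

def op_not(belief):
    return 1.0 - belief

print(op_and([0.5, 0.5]), op_or([0.5, 0.5]))   # 0.25 0.75
```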
Inference network
Inference network
Attachment and evaluation
Attachment: match the DN and QN, obtaining a set of links between their nodes, with all the constraints satisfied
Link inheritance: in the DN, a document node can share the context nodes of its parent node
Evaluation: calculate a quantized similarity for each document node in the DN
Inference network
Attachment: link the QN and DN such that:
Concept nodes that contain the same concept (a synonym dictionary is needed) are linked with weight 1 (firm)
For constrained queries, the weight is adjusted
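The matching step above might be sketched as follows; the synonym dictionary entries and concept names are illustrative, not from the paper.

```python
# Sketch of attachment: link each QN concept node to the DN concept
# nodes carrying the same concept, consulting a synonym dictionary;
# matched links get weight 1 ("firm"). Dictionary entries are
# illustrative only.
SYNONYMS = {
    "car":  {"car", "automobile", "vehicle"},
    "mend": {"mend", "repair", "fix"},
}

def same_concept(query_term, doc_term):
    q, d = query_term.lower(), doc_term.lower()
    return q == d or d in SYNONYMS.get(q, set())

def attach(query_concepts, doc_concepts):
    return [(q, d, 1.0)                    # firm link, weight 1
            for q in query_concepts
            for d in doc_concepts
            if same_concept(q, d)]

print(attach(["car"], ["automobile", "basil"]))
# [('car', 'automobile', 1.0)]
```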
Inference network
Attachment
Inference network
Evaluation can be performed over every document node; it is calculated according to the query network
Can be used in different applications: retrieve scenes or shots from one video, retrieve videos from a video collection, …
Inference network
Evaluation: back-propagate the QN-DN link weights to each document node
All nodes then have a value
A ranking can be derived at different granularity levels (video, scene, shot)
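One way to picture the granularity point: scores attached at shot level are aggregated up the shot → scene → video hierarchy, and a ranking is read off at whichever level is wanted. The duration-weighted mean below is an assumption (consistent with the duration-ratio link weights), not the paper's exact formula.

```python
# Illustrative propagation of node scores up a shot -> scene -> video
# hierarchy, using a duration-weighted mean (an assumption consistent
# with the duration-ratio weights; not the paper's exact formula).
def node(name, duration, score=0.0, children=()):
    return {"name": name, "duration": duration,
            "score": score, "children": list(children)}

def propagate(n):
    """Fill internal-node scores from their children; return the score."""
    if n["children"]:
        total = sum(c["duration"] for c in n["children"])
        n["score"] = sum(propagate(c) * c["duration"] / total
                         for c in n["children"])
    return n["score"]

def rank(nodes):
    """Rank any set of nodes (videos, scenes, or shots) by score."""
    return sorted(nodes, key=lambda n: n["score"], reverse=True)

shots = [node("shot1", 10, score=0.9), node("shot2", 30, score=0.1)]
scene = node("scene1", 40, children=shots)
propagate(scene)            # scene score = 0.9*10/40 + 0.1*30/40
print([n["name"] for n in rank(shots)])   # ['shot1', 'shot2']
```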
Experiment
3 manually annotated videos are used as test data, 329 shots in total
Annotations: <Abstract>, <StructuredAnnotation>, and <FreeTextAnnotation> for video shots and scenes
Avg. precision: 69.25
The similarity ordering is as expected
Experiment
Performance is quite dependent on the quality of the annotation data
Efficient annotation methods would therefore be very helpful
Conclusion
Proposes a semantic video retrieval model based on the inference network model that fully exploits the structural, conceptual, and contextual aspects of MPEG-7
Helps to sidestep the semantic gap problem