mpeg7_VisualPart

8/7/2019 mpeg7_VisualPart


I6, 2006

The MPEG-7 Multimedia Content Description Interface



Outline

MPEG-7 motivation and scope

Visual Descriptors (color, texture, shape)

MPEG-7 retrieval evaluation criterion

Similarity measures and MPEG-7 visual descriptors

Building MPEG-7 Descriptors and Description Schemes with the Description Definition Language

MPEG-7 VXM current state

Towards the MPEG-7 Query Format framework (queries and the visual descriptor tools employed by the queries)

Summary



Proliferation of audio-visual content

MPEG-7 motivation and design scenarios (possible queries)

Music/audio: play a few notes and return music with similar music/audio

Images/graphics: draw a sketch and return images with similar graphics

Text/keywords: find AV material whose subject corresponds to a keyword

Movement: describe movements and return video clips with the specified temporal and spatial relations

Scenario: describe actions and return scenarios where similar actions take place

Standardize multimedia metadata descriptions (to facilitate multimedia content-based retrieval) for various types of audiovisual information:

Consumer content (news, sports)

Scientific content

Digital art galleries

Recorded material



Scope of the Standard

Description Production (extraction) -> Standard Description -> Description Consumption

Standard Description: the normative part of the MPEG-7 standard

* MPEG-7 does not specify (non-normative parts of MPEG-7):

- How to extract descriptions (feature extraction, indexing process, annotation & authoring tools, ...)

- How to use descriptions (search engine, filtering tool, retrieval process, browsing device, ...)

- The similarity between contents

-> The goal is to define the minimum that enables interoperability.



Information flow



Visual Descriptors (normative, basic, for localization)

Color Descriptors: Dominant Color, Scalable Color, Color Layout, Color Structure, GoF/GoP Color

Texture Descriptors: Homogeneous Texture, Texture Browsing, Edge Histogram

Shape Descriptors: Region Shape, Contour Shape, 3D Shape

Motion Descriptors for Video: Camera Motion, Motion Trajectory, Parametric Motion, Motion Activity

Localization: Region Locator, Spatio-Temporal Locator

Other: Face Recognition



Color Descriptors

Color spaces: R, G, B; Y, Cr, Cb; H, S, V; Monochrome; linear transformation of R, G, B; HMMD

Constrained color spaces: -> Scalable Color Descriptor uses HSV -> Color Structure Descriptor uses HMMD

Dominant Color

Scalable Color - HSV space

Color Structure - HMMD space

Color Layout - YCbCr space

GroupOfFrames/Pictures
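The HMMD space used by the Color Structure Descriptor can be sketched from RGB as follows, assuming the usual MPEG-7 definitions (Hue as in HSV, Max = max(R, G, B), Min = min(R, G, B), Diff = Max - Min, Sum = (Max + Min)/2). The function name `rgb_to_hmmd` is hypothetical, and the CSD's subsequent non-uniform quantization of this space is not shown.

```python
import colorsys

def rgb_to_hmmd(r, g, b):
    """Convert 8-bit RGB to (Hue in degrees, Max, Min, Diff, Sum)."""
    mx = max(r, g, b)
    mn = min(r, g, b)
    diff = mx - mn            # Diff: spread between the largest and smallest channel
    total = (mx + mn) / 2     # Sum: brightness-like component
    # Hue is the same angle as in HSV; colorsys expects values in [0, 1]
    h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360.0, mx, mn, diff, total
```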



Scalable Color Descriptor (SCD)

A color histogram in the HSV color space, encoded by a Haar transform. Feature vector: {NoCoef, NoBD, Coeff[..], CoeffSign[..]} (NoCoef: number of coefficients; NoBD: number of bit planes discarded)
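A minimal sketch of the Haar idea behind the SCD's scalability, assuming a plain unnormalized 1-D Haar decomposition (pairwise sums as low-pass, pairwise differences as high-pass). The normative SCD uses a specific bin ordering and coefficient quantization that are not reproduced here; `haar_1d` is a hypothetical helper.

```python
def haar_1d(values):
    """Full unnormalized 1-D Haar decomposition (length must be a power of two).

    Returns [overall sum, coarsest differences, ..., finest differences]."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    low = list(values)
    details = []
    while len(low) > 1:
        sums = [low[i] + low[i + 1] for i in range(0, len(low), 2)]    # low-pass
        diffs = [low[i] - low[i + 1] for i in range(0, len(low), 2)]   # high-pass
        details = diffs + details   # coarser levels go in front
        low = sums
    return low + details
```

For example, `haar_1d([1, 2, 3, 4])` gives `[10, -4, -1, -1]`; the first two values alone recover the two half-histogram totals as (10 + (-4))/2 = 3 and (10 - (-4))/2 = 7, which is why truncating the coefficient list still describes a coarser histogram.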



SCD extraction

[Figure: SCD extraction pipeline; labels: to 11 bits/bin, to 4 bits/bin, N bits/bin, (#bins)]



GoF/GoP Color Descriptor

Extends the Scalable Color Descriptor to a video segment or a group of pictures (the joint color histogram is then processed as in the SCD, i.e. encoded with the Haar transform)

Histogram aggregation methods:

Average: sensitive to outliers (lighting changes, occlusion, text overlays)

Median: increased computational complexity for sorting

Intersection: a different viewpoint, capturing the "least common" color traits

Extraction
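The three aggregation options can be compared on hypothetical per-frame histograms in a short Python sketch; it covers only the aggregation step, not the subsequent Haar encoding, and `aggregate` is a hypothetical name.

```python
from statistics import median

def aggregate(histograms, method="average"):
    """Combine per-frame histograms (equal-length lists) into one GoF/GoP histogram."""
    bins = list(zip(*histograms))          # one tuple per bin, across all frames
    if method == "average":
        return [sum(b) / len(b) for b in bins]
    if method == "median":
        return [median(b) for b in bins]   # robust to a few outlier frames
    if method == "intersection":
        return [min(b) for b in bins]      # colors common to every frame
    raise ValueError(f"unknown method: {method}")
```

With the frames `[[4, 0, 2], [4, 0, 2], [4, 9, 2]]`, the third frame's burst in bin 1 (e.g. a text overlay) pulls the average to `[4.0, 3.0, 2.0]`, while the median and the intersection both stay at `[4, 0, 2]`.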



GoF/GoP Color Descriptor

Applications: browsing a large collection of images to find similar images

-> Use histogram intersection as a color similarity measure for clustering a collection of images

-> Represent each cluster by a GoP descriptor
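Histogram intersection as a similarity for clustering can be sketched as below, assuming histograms are first normalized to unit mass; the nearest-representative assignment stands in for whatever clustering procedure is used, and both function names are hypothetical.

```python
def intersection_similarity(h1, h2):
    """Normalized histogram intersection in [0, 1] (1 = identical distributions)."""
    s1, s2 = sum(h1), sum(h2)
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))

def assign_to_cluster(hist, representatives):
    """Index of the cluster representative (e.g. a GoP histogram) most similar to hist."""
    return max(range(len(representatives)),
               key=lambda i: intersection_similarity(hist, representatives[i]))
```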



Dominant Color Descriptor (DCD)

Clustering colors into a small number of representative colors (salient colors)

$F = \{\{c_i, p_i, v_i\}, s\}$

$c_i$: representative colors

$p_i$: their percentages in the region

$v_i$: color variances

$s$: spatial coherency



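DCD Extraction (based on the generalized Lloyd algorithm)

$c_i = \frac{\sum_{x(n) \in C_i} v(n)\, x(n)}{\sum_{x(n) \in C_i} v(n)}$

$c_i$: centroid of cluster $C_i$;

$x(n)$: color vector at pixel $n$;

$v(n)$: perceptual weight for pixel $n$.

+ Spatial coherency: the average number of connecting pixels of a dominant color, computed with a 3x3 masking window

Human visual perception is more sensitive to changes in smooth regions, so pixels in smooth regions receive larger perceptual weights $v(n)$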
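A minimal sketch of the weighted generalized Lloyd iteration, assuming squared Euclidean color distance and caller-supplied perceptual weights v(n). The real DCD derives v(n) from local smoothness and also computes variances, spatial coherency, and cluster splitting, none of which is shown. The loop lowers the weighted distortion D_i = Σ v(n) ||x(n) - c_i||² by alternating nearest-centroid assignment with the weighted centroid update; the function names are hypothetical.

```python
def nearest(x, centroids):
    """Index of the centroid closest to color x (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))

def dominant_colors(pixels, weights, centroids, iterations=20):
    """Weighted Lloyd iteration. pixels: colors x(n); weights: v(n); centroids: initial c_i.

    Returns (centroids c_i, percentages p_i)."""
    centroids = [tuple(float(a) for a in c) for c in centroids]
    dim = len(centroids[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in centroids]
        wsum = [0.0] * len(centroids)
        for x, v in zip(pixels, weights):
            i = nearest(x, centroids)          # assignment step
            wsum[i] += v
            for d in range(dim):
                sums[i][d] += v * x[d]
        # update step: c_i = sum v(n) x(n) / sum v(n); empty clusters keep their centroid
        centroids = [tuple(s / wsum[i] for s in sums[i]) if wsum[i] > 0 else c
                     for i, c in enumerate(centroids)]
    counts = [0] * len(centroids)
    for x in pixels:
        counts[nearest(x, centroids)] += 1
    return centroids, [c / len(pixels) for c in counts]
```

In the test data, a heavily weighted pixel (10, 0, 0) pulls the first centroid to (5, 0, 0) instead of the unweighted mean (2.5, 0, 0).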



Demo: http://debut.cis.nctu.edu.tw/Demo/ContentBasedVideoRetrieval/CBVR/Dominant/index.html



Color Layout Descriptor (CLD)

Partitioning the image into 64 (8x8) blocks

Deriving the average color of each block (or using the DCD)

Applying an (8x8) DCT and encoding

Efficient for:

- Sketch-based image retrieval

- Content filtering using image indexing



CLD extraction

If the spatial-domain data is smooth (little variation), its frequency-domain representation concentrates the energy in the low-frequency coefficients, while the high-frequency coefficients are small.

-> The derived average colors are transformed into a series of coefficients by a DCT (spatial domain -> frequency domain).

-> A few low-frequency coefficients are selected by zigzag scanning and quantized to form the CLD (a large quantization step for the AC coefficients, a small quantization step for the DC coefficients).

-> The color space adopted for the CLD is YCbCr.

F = {CoefPattern, YDCCoef, CbDCCoef, CrDCCoef, YACCoef, CbACCoef, CrACCoef}
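The extraction steps for one channel can be sketched in pure Python, assuming an orthonormal DCT-II, JPEG-style zigzag order, and image dimensions divisible by 8. Quantization is omitted, keeping 6 coefficients is an illustrative choice, and all names are hypothetical.

```python
import math

def block_averages(channel, grid=8):
    """Average value of each cell in a grid x grid partition of a 2-D list."""
    bh, bw = len(channel) // grid, len(channel[0]) // grid
    return [[sum(channel[r][c] for r in range(i * bh, (i + 1) * bh)
                 for c in range(j * bw, (j + 1) * bw)) / (bh * bw)
             for j in range(grid)] for i in range(grid)]

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    n = 8
    def alpha(u):
        return math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def zigzag_order(n=8):
    """(row, col) pairs in JPEG-style zigzag order (low frequencies first)."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))

def cld_channel(channel, n_coeffs=6):
    """Unquantized CLD coefficients of one color channel (e.g. Y)."""
    coeffs = dct2_8x8(block_averages(channel))
    return [coeffs[i][j] for i, j in zigzag_order()[:n_coeffs]]
```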



Color Structure Descriptor (CSD)

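Scanning the image with an 8x8 structuring element

Counting the number of structuring-element positions containing each color

Generating a color histogram (HMMD color space, 4 CSQ (color space quantization) operating points)

[Figure: 8 x 8 structuring element over the image; for the colors present in the element (here C1, C3, C7), the corresponding color bins are incremented by 1]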
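The structuring-element counting can be sketched as follows, assuming the image is already quantized into HMMD color-bin indices (the quantization itself is not shown) and omitting the final normalization by the number of element positions; `structure_histogram` is a hypothetical name.

```python
def structure_histogram(quantized, n_colors, size=8):
    """For each color bin, count the size x size window positions containing that color.

    quantized: 2-D list of color-bin indices."""
    h, w = len(quantized), len(quantized[0])
    hist = [0] * n_colors
    for top in range(h - size + 1):
        for left in range(w - size + 1):
            present = {quantized[r][c]
                       for r in range(top, top + size)
                       for c in range(left, left + size)}
            for color in present:
                hist[color] += 1   # +1 per window, however many pixels of that color it holds
    return hist
```

Unlike a plain histogram, a color whose pixels are spread out falls into more element positions than the same number of pixels packed into one clump, so it scores higher.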



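CSD extraction

The image is subsampled before scanning; for an image of width W and height H, the subsampling factor is K = 2^p, where p is given by:

$p = \max\{0, \operatorname{round}(0.5 \log_2(W H) - 8)\}$ (equivalently, the structuring element spans E x E pixels with E = 8K)

F = {colQuant, Values[m]}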
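A small helper computing the subsampling parameters, assuming the formula p = max{0, round(0.5 log2(W H) - 8)}, K = 2^p, E = 8K as we understand the CSD definition, with round-half-up; `csd_subsampling` is a hypothetical name.

```python
import math

def csd_subsampling(width, height):
    """Return (p, K, E): exponent, subsampling factor, structuring-element extent."""
    # explicit round-half-up (Python's round() would round halves to even)
    p = max(0, math.floor(0.5 * math.log2(width * height) - 8 + 0.5))
    k = 2 ** p      # subsampling factor
    e = 8 * k       # spatial extent of the structuring element in the original image
    return p, k, e
```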



CSD scaling



Texture Descriptors

Homogeneous Texture Descriptor

Non-Homogeneous Texture Descriptor (Edge Histogram)

Texture Browsing



Homogeneous Texture Descriptor (HTD)

Partitioning the frequency domain into 30 channels (each modeled by a 2-D Gabor function)

Computing the energy and energy deviation of each channel

Computing the mean and standard deviation of the image intensity

-> $F = \{f_{DC}, f_{SD}, e_1, \dots, e_{30}, d_1, \dots, d_{30}\}$

An efficient implementation: Radon transform followed by Fourier transform
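The 30-channel frequency layout can be enumerated as 5 octave-spaced radial bands times 6 orientations of 30° each. This sketch assumes the commonly cited radial centers ω_s = ω_0 · 2^(-s) with ω_0 = 3/4 (normalized frequency); `htd_channels` is a hypothetical name, and the per-channel energy and deviation computation is not shown.

```python
def htd_channels(num_radial=5, num_angular=6, omega0=0.75):
    """(radial center frequency, orientation in degrees) of each HTD channel."""
    channels = []
    for s in range(num_radial):
        omega = omega0 * 2 ** (-s)                 # octave spacing: 3/4, 3/8, 3/16, ...
        for r in range(num_angular):
            theta = 180.0 * r / num_angular        # 0, 30, ..., 150 degrees
            channels.append((omega, theta))
    return channels
```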



HTD Extraction: how to obtain a 2-D frequency layout that follows the HVS

2-D image $f(x, y)$ -> Radon transform -> 1-D projection $P_\theta(R)$ -> 1-D Fourier transform $F(P_\theta(R))$

Resulting sampling grid in polar coordinates
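The reason a Radon transform followed by a 1-D Fourier transform yields a polar frequency grid is the projection-slice theorem: the 1-D Fourier transform of the projection at angle θ equals the 2-D Fourier transform along the line through the origin at angle θ. The sketch below checks this for θ = 0 on a small hypothetical image, using naive DFTs; all names are hypothetical.

```python
import cmath

def dft_1d(seq):
    """Naive 1-D discrete Fourier transform."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def dft_2d(img):
    """Naive 2-D DFT of img[y][x]; result is indexed [v][u]."""
    rows = [dft_1d(row) for row in img]               # transform along x
    cols = [dft_1d(list(col)) for col in zip(*rows)]  # then along y
    return [list(r) for r in zip(*cols)]

def projection_x(img):
    """Radon projection at theta = 0: sum each column (integrate along y)."""
    return [sum(col) for col in zip(*img)]
```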



HTD Extraction - Data sampling in feature channels

-> 2-D Gabor functions are deployed to define Gabor filter banks

A 2-D Gabor function is a Gaussian-weighted sinusoid; it is used to model individual channels

Each channel filters a specific type of texture
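A spatial-domain 2-D Gabor kernel (a Gaussian envelope multiplied by an oriented cosine) can be sketched as follows; one kernel per scale/orientation pair forms a filter bank. This is the generic textbook form with hypothetical parameter names, not the HTD's normative channel definition, which, as we understand it, is specified as Gaussians in the polar frequency domain.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0):
    """Real part of a 2-D Gabor function sampled on a size x size grid (size odd).

    theta: orientation in radians; gamma: aspect ratio of the Gaussian envelope."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)     # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel
```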



Radon Transform

Transforms images with lines into a domain of possible line parameters; each line is transformed to a peak point in the resulting image



HTD properties

One can perform:

- Rotation-invariant matching

- Intensity-invariant matching ($f_{DC}$ removed from the feature vector)

- Scale-invariant matching

$F = \{f_{DC}, f_{SD}, e_1, \dots, e_{30}, d_1, \dots, d_{30}\}$
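The invariant matching modes can be sketched as below, assuming the 60 channel values are ordered scale-major (6 orientations per scale) and using an unweighted L1 distance. The normative matching also normalizes each component, and scale invariance (a similar shift along the radial index) is left out; `htd_distance` is hypothetical.

```python
def htd_distance(a, b, num_angular=6, rotation_invariant=True, intensity_invariant=False):
    """Unweighted L1 distance between HTD vectors [f_DC, f_SD, e_1..e_30, d_1..d_30].

    Channel values are assumed ordered scale-major: 6 orientations per scale."""
    start = 1 if intensity_invariant else 0        # drop f_DC for intensity invariance
    head = sum(abs(x - y) for x, y in zip(a[start:2], b[start:2]))

    def rotate(block, k):
        # circularly shift the orientation index within every scale
        out = []
        for s in range(0, len(block), num_angular):
            ring = block[s:s + num_angular]
            out.extend(ring[k:] + ring[:k])
        return out

    best = None
    for k in (range(num_angular) if rotation_invariant else [0]):
        dist = head
        for part in (slice(2, 32), slice(32, 62)):  # energies, then deviations
            dist += sum(abs(x - y) for x, y in zip(a[part], rotate(b[part], k)))
        best = dist if best is None else min(best, dist)
    return best
```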



Texture Browsing Descriptor

-> Same spatial filtering procedure as the HTD (scale- and orientation-selective band-pass filters), summarized into perceptual properties:

Regularity (periodic to random)

Coarseness (grain to coarse)

Directionality (quantized in 30° steps)

-> The Texture Browsing Descriptor can be used to find a set of candidates with similar perceptual properties; the HTD is then used to obtain a precise similarity match list among the candidate images.

e.g. look for textures that are very regular and oriented at 30°



Edge Histogram Descriptor (EHD)

Represents the spatial distribution of five types of edges: vertical, horizontal, 45°, 135°, and non-directional

Dividing the image into 16 (4x4) blocks and generating a 5-bin histogram for each block (16 x 5 = 80 bins)

It is scale invariant

Strong edges are retained by thresholding

[Figure: edge detection with the Canny edge operator]

F = {BinCounts[k]}, k = 1, ..., 80
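The per-block edge classification can be sketched with 2x2 filters, assuming the filter coefficients and default threshold (11) commonly cited for the MPEG-7 EHD reference implementation, rather than the Canny operator shown in the figure. Each image block is reduced to the mean intensities of its four sub-blocks, and all names are hypothetical.

```python
import math

# 2x2 filter coefficients (top-left, top-right, bottom-left, bottom-right),
# as commonly given for the MPEG-7 EHD (assumed here)
FILTERS = {
    "vertical": (1.0, -1.0, 1.0, -1.0),
    "horizontal": (1.0, 1.0, -1.0, -1.0),
    "diag_45": (math.sqrt(2), 0.0, 0.0, -math.sqrt(2)),
    "diag_135": (0.0, math.sqrt(2), -math.sqrt(2), 0.0),
    "non_directional": (2.0, -2.0, -2.0, 2.0),
}

def classify_block(a0, a1, a2, a3, threshold=11.0):
    """a0..a3: mean intensities of the four sub-blocks of an image block.

    Returns the edge type with the strongest response, or None when even the
    strongest response is below the threshold (no edge counted for this block)."""
    responses = {name: abs(f[0] * a0 + f[1] * a1 + f[2] * a2 + f[3] * a3)
                 for name, f in FILTERS.items()}
    best = max(responses, key=responses.get)
    return best if responses[best] >= threshold else None
```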



EHD extraction

Basic (80 bins): local histograms of the 16 blocks

Extended (150 bins): basic + global (5 bins) + semi-global (13 clusters of blocks x 5 bins = 65 bins)

[Figure: edge map image obtained with the Canny edge operator]
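The 150-bin extended histogram can be derived from the 80 local bins as sketched below, assuming the 13 semi-global groups are the 4 block columns, the 4 block rows, the four 2x2 quadrants, and the central 2x2 group, and that each group histogram is a plain average. Any weighting applied at matching time is omitted, and `extended_ehd` is a hypothetical name.

```python
def extended_ehd(local):
    """local: 16 per-block 5-bin histograms (4x4 grid, row-major).

    Returns 150 bins: 80 local + 5 global + 65 semi-global (13 groups x 5)."""
    if len(local) != 16 or any(len(b) != 5 for b in local):
        raise ValueError("expected 16 blocks of 5 bins")

    def mean_of(blocks):
        blocks = list(blocks)
        return [sum(local[i][k] for i in blocks) / len(blocks) for k in range(5)]

    global_bins = mean_of(range(16))
    groups = [[r * 4 + c for r in range(4)] for c in range(4)]      # 4 columns
    groups += [[r * 4 + c for c in range(4)] for r in range(4)]     # 4 rows
    groups += [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13],
               [10, 11, 14, 15], [5, 6, 9, 10]]                     # quadrants + center
    semi_global = [v for g in groups for v in mean_of(g)]
    local_bins = [v for block in local for v in block]
    return local_bins + global_bins + semi_global
```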



EHD evaluation

Cannot be used for object-based image retrieval

If the edge threshold Th_edge is set to 0, the EHD applies to binary edge images (sketch-based retrieval)

The extended EHD achieves better results but does not exhibit the rotation-invariant property



Shape Descriptors

Region-based Descriptor

Contour-based Shape Descriptor

2D/3D Shape Descriptor

3D Shape Descriptor



Region-based Descriptor (RBD)

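Angular Radial Transform (ART) coefficients:

$$F_{nm} = \langle V_{nm}(\rho,\theta), f(\rho,\theta) \rangle = \int_0^{2\pi} \int_0^1 V_{nm}^*(\rho,\theta)\, f(\rho,\theta)\, \rho \, d\rho \, d\theta$$

$$V_{nm}(\rho,\theta) = A_m(\theta)\, R_n(\rho), \qquad A_m(\theta) = \frac{1}{2\pi} \exp(jm\theta), \qquad R_n(\rho) = \begin{cases} 1 & n = 0 \\ 2\cos(\pi n \rho) & n \neq 0 \end{cases}$$

m = 0, ..., 11; n = 0, ..., 2 (36 coefficients; $F_{00}$ is used for normalization, leaving 35)

F = {MagnitudeOfART[k]}, k = n x m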
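A direct numerical evaluation of the ART coefficients on a binary mask can be sketched as below, assuming pixel centers are mapped into the unit disk and summed (the Cartesian pixel area plays the role of ρ dρ dθ), 3 radial by 12 angular basis functions, and normalization by |F_00|. The normative coefficient quantization is not shown, and `art_magnitudes` is a hypothetical name.

```python
import cmath
import math

def art_magnitudes(mask, n_radial=3, n_angular=12):
    """mask: square 2-D list of 0/1 values (the shape region).

    Returns |F_nm| / |F_00| for every (n, m) except (0, 0), ordered by n then m."""
    size = len(mask)
    half = size / 2.0
    pixels = []
    for i in range(size):
        for j in range(size):
            if mask[i][j]:
                x = (j + 0.5 - half) / half   # pixel center mapped into [-1, 1]
                y = (i + 0.5 - half) / half
                rho = math.hypot(x, y)
                if rho <= 1.0:                # keep pixels inside the unit disk
                    pixels.append((rho, math.atan2(y, x)))
    mags = {}
    for n in range(n_radial):
        for m in range(n_angular):
            acc = 0j
            for rho, theta in pixels:
                radial = 1.0 if n == 0 else 2.0 * math.cos(math.pi * n * rho)
                # conjugate basis V*_nm = R_n(rho) * exp(-j m theta) / (2 pi)
                acc += radial * cmath.exp(-1j * m * theta) / (2 * math.pi)
            mags[(n, m)] = abs(acc)
    base = mags[(0, 0)]
    return [mags[(n, m)] / base
            for n in range(n_radial) for m in range(n_angular) if (n, m) != (0, 0)]
```

Because rotating the shape only multiplies each F_nm by a unit-modulus phase factor, the magnitudes are rotation invariant; rotating the mask by 90° leaves the returned list unchanged up to rounding.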



Region-based Descriptor (2)

Applicable to figures (a)-(e); distinguishes (i) from (g) and (h); (j), (k), and (l) are found similar

Advantages:

- Describes complex shapes with disconnected regions

- Robust to segmentation noise

- Small size

- Fast extraction and matching



Contour-Based Descriptor (CBD)

It is based on the Curvature Scale-Space (CSS) representation



Curvature Scale-Space

Finds the curvature zero-crossing points of the shape's contour (key points)

Reduces the number of key points step by step by applying Gaussian smoothing

The positions of key points are expressed relative to the length of the contour curve



CBD Extraction

Location $x_{CSS}$ of the curvature zero-crossing points; filtering pass $y_{CSS}$

Repetitive smoothing of the X and Y contour coordinates with the low-pass kernel (0.25, 0.5, 0.25) until the contour becomes convex

F = {NofPeaks, GlobalCurv[ecc][circ], PrototypeCurv[ecc][circ], HighestPeakY, peakX[k], peakY[k]}
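The repeated smoothing with the (0.25, 0.5, 0.25) kernel can be sketched on a closed, uniformly sampled contour, assuming the sign of the cross product of consecutive edge vectors as a stand-in for curvature and the sample index over the number of samples as the relative position along the contour. The loop records where curvature changes sign at each pass until the contour is convex; peak extraction and the other descriptor fields are not shown, and all names (including the test shape) are hypothetical.

```python
import math

def smooth(xs, ys):
    """One pass of the (0.25, 0.5, 0.25) low-pass kernel on a closed contour."""
    n = len(xs)
    sx = [0.25 * xs[i - 1] + 0.5 * xs[i] + 0.25 * xs[(i + 1) % n] for i in range(n)]
    sy = [0.25 * ys[i - 1] + 0.5 * ys[i] + 0.25 * ys[(i + 1) % n] for i in range(n)]
    return sx, sy

def turning(xs, ys):
    """Curvature sign proxy: cross product of consecutive edge vectors."""
    n = len(xs)
    out = []
    for i in range(n):
        ax, ay = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        bx, by = xs[(i + 1) % n] - xs[i], ys[(i + 1) % n] - ys[i]
        out.append(ax * by - ay * bx)
    return out

def zero_crossings(xs, ys, eps=1e-12):
    """Indices where the curvature sign flips (collinear samples are skipped)."""
    c = turning(xs, ys)
    idx = [i for i, v in enumerate(c) if abs(v) > eps]
    return [idx[k] for k in range(len(idx))
            if (c[idx[k]] > 0) != (c[idx[(k + 1) % len(idx)]] > 0)]

def css_history(xs, ys, max_iter=10000):
    """Smooth until no zero crossings remain (convex contour).

    Returns (number of passes, [(pass, relative positions of zero crossings), ...])."""
    history = []
    for it in range(max_iter):
        zc = zero_crossings(xs, ys)
        if not zc:
            return it, history
        history.append((it, [i / len(xs) for i in zc]))
        xs, ys = smooth(xs, ys)
    return max_iter, history

def star_contour(n=100, lobes=5, amp=0.3):
    """Hypothetical test shape r(t) = 1 + amp*cos(lobes*t), sampled counter-clockwise."""
    ts = [2 * math.pi * k / n for k in range(n)]
    rs = [1 + amp * math.cos(lobes * t) for t in ts]
    return ([r * math.cos(t) for r, t in zip(rs, ts)],
            [r * math.sin(t) for r, t in zip(rs, ts)])
```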



CBD Applicability

Applicable to (a)

Distinguishes differences in (b)

Finds similarities in (c)-(e)

Advantages:

- Captures the shape very well

- Robust to noise, scale, and orientation

- Fast and compact



Comparison (RB/CB descriptors)

Blue: Similar shapes by Region-Based

Yellow: Similar shapes by Contour-Based



How does MPEG-7 compare descriptors?

ANMRR (Average Normalized Modified Retrieval Rank):

- Normalized measures were defined that take into account the different sizes of the ground-truth sets and the actual ranks obtained from the retrieval -> retrievals that miss items are assigned a penalty.

Traditional metric
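ANMRR can be computed as in the sketch below, assuming the definitions commonly given for the MPEG-7 core experiments. For a query q with NG(q) ground-truth items, K(q) = min(4 NG(q), 2 GTM), where GTM is the largest NG over all queries. An item found at rank at most K(q) keeps its rank; otherwise (including items never retrieved) it gets 1.25 K(q). AVR(q) is the mean of these ranks, MRR(q) = AVR(q) - 0.5 - NG(q)/2, and NMRR(q) = MRR(q) / (1.25 K(q) - 0.5 - NG(q)/2). ANMRR averages NMRR over queries: 0 means perfect retrieval, 1 means nothing found. Function names are hypothetical.

```python
def nmrr(ranks, num_ground_truth, gtm):
    """ranks: 1-based retrieval ranks of the ground-truth items that were found.

    num_ground_truth: NG(q); gtm: the largest NG over all queries."""
    ng = num_ground_truth
    k = min(4 * ng, 2 * gtm)
    penalized = [r if r <= k else 1.25 * k for r in ranks]
    penalized += [1.25 * k] * (ng - len(penalized))   # items never retrieved
    avr = sum(penalized) / ng
    mrr = avr - 0.5 - ng / 2
    return mrr / (1.25 * k - 0.5 - ng / 2)

def anmrr(queries):
    """queries: list of (ranks, num_ground_truth) pairs."""
    gtm = max(ng for _, ng in queries)
    return sum(nmrr(r, ng, gtm) for r, ng in queries) / len(queries)
```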



Similarity between features

Typically, descriptors are multidimensional vectors (of low-level features)

Similarity of two images in the feature vector space:

- Range query: all the points within a hyperrectangle aligned with the coordinate axes

- Nearest-neighbour or within-distance (cut) query: a particular metric in the feature space

- Dissimilarity between statistical distributions: the same metrics or specific measures
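The three query types can be sketched over a small in-memory dictionary of feature vectors, assuming Euclidean distance for the nearest-neighbour and within-distance queries. The names and the brute-force scan are illustrative only; a real CBIR system would use an index.

```python
import math

def range_query(db, low, high):
    """Keys whose vectors lie inside the axis-aligned box [low, high]."""
    return [key for key, vec in db.items()
            if all(lo <= v <= hi for v, lo, hi in zip(vec, low, high))]

def knn_query(db, query, k):
    """Keys of the k nearest vectors (Euclidean distance)."""
    return sorted(db, key=lambda key: math.dist(db[key], query))[:k]

def within_distance(db, query, radius):
    """Keys of all vectors within `radius` of the query (a "cut" query)."""
    return [key for key, vec in db.items() if math.dist(vec, query) <= radius]
```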



An example of a CBIR system using the HTD, performing range queries and NN queries: http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html



Criticism of MPEG-7 distance measures

MPEG-7 adopts feature-vector-space distances based on geometric assumptions about the descriptor space, e.g.

.. but these quantitative measures (low-level information) do not fit ideally with human similarity perception -> researchers from other areas have developed alternative predicate-based models (descriptors are assumed to contain just binary elements, as opposed to continuous data), which express the existence of properties and convey high-level information

See the pattern difference: $d(X_i, X_j) = \frac{bc}{K^2}$

K: number of predicates in the data vectors $X_i$, $X_j$

b: number of properties that exist in $X_i$ but not in $X_j$

c: number of properties that exist in $X_j$ but not in $X_i$
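Reading the slide's formula as bc/K², the pattern difference on binary predicate vectors can be sketched as below; this reading of the garbled original is an assumption, consistent with how the measure is usually defined, and `pattern_difference` is a hypothetical name.

```python
def pattern_difference(xi, xj):
    """xi, xj: equal-length 0/1 predicate vectors. Returns b*c / K**2."""
    k = len(xi)
    b = sum(1 for p, q in zip(xi, xj) if p and not q)   # property only in xi
    c = sum(1 for p, q in zip(xi, xj) if q and not p)   # property only in xj
    return b * c / k ** 2
```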



Vector Space Distances



Distances/Similarity measures





How that works: Description Definition Language

-> XML Schema (flexibility):

- XML Schema structural language components

- XML Schema datatype language components

- MPEG-7-specific extensions: support for vectors, matrices, and typed references

-> Binary version (efficiency)

Formats: text format (XML), BiM format, or a mix



A DDL example (instantiation)

Schema: [figure]

This permits VideoDoc elements, as well as types derived from VideoDoc (e.g., NewsDoc), to be used as children of VideoCatalogue.

Instance: [figure] with the values "CNN 6 o'clock News", "David James", "1999", "CNN"



Descriptions enabled by the MPEG-7 tools

Perceptual descriptions:

- the content's spatio-temporal structure

- information on low-level features

- semantic information related to the reality captured by the content

Archival-oriented descriptions:

- the content's creation/production

- information on using the content

- information on storing and representing the content

Additional information for organizing, managing, and accessing the content:

- how objects are related and gathered in collections

- summaries/variations/transcoding to support efficient browsing

- user interaction information

[Figure: Organization/Navigation/Access/User Interaction Tools; Content Description Tools; Content Management Tools]



Type hierarchy for top-level elements





Which DS to choose?

MPEG-7 provides DSs for the description of the structure and semantics of AV content, plus content management

Content management information can be attached to individual segments



Viewpoint of the structure: Segments




Structure description

[Figure: structure description; labels: Video Segment, Segment decomposition, Time, Color, Motion, Texture, Shape, Annotation, Mosaic, Moving region, Relation link "above", Video Segments, Moving regions]



Segment Decomposition

[Figure labels: time, connectivity]




Content structural aspects (Segment DS tree)

Annotate the whole image with a StillRegion; spatial segmentation at different levels

Among different regions we could use SegmentRelationship description tools



Content structural aspects

Temporal segments (Segment Relationship DS graph)



Viewpoint of conceptual notions



Content semantic aspects (Semantic Graph)



Example of Structure-Semantic Link DS




Content abstraction aspects (CoAbstr) - Hierarchical summary of a video

[Figure: hierarchical summary with key frames f0, f00, f01, f02]

-> Enables rapid browsing and navigation (also sequential summary)



(CoAbstr) Partitions and decompositions (ViewDecomposition DS)

Frequency-space graph



(CoAbstr) Content Variation

Universal Multimedia Access: adapt delivery to network and terminal characteristics



(CoAbstr) A collection (CollectionStructure DS)

-> Groups segments, events, or objects into collection clusters and specifies properties that are common to the elements. The CollectionStructure DS also describes statistics and models of the attribute values of the elements, such as a mean color histogram for a collection of images. The CollectionStructure DS also describes relationships among collection clusters.



Reference Software: the XM

The XM implements:

- MPEG-7 Descriptors (Ds)

- MPEG-7 Description Schemes (DSs)

- Coding Schemes

- DDL

[Figure label: extraction]



Beyond MPEG-7 Version 1 (Ds & DSs in the VXM)

ColorTemperature: specifies the perceptual temperature feeling of the illumination color in an image, for browsing and display-preference-control purposes (user friendly). Four perceptual temperature browsing categories are provided: hot, warm, moderate, and cool. Each category is used for browsing images based on its perceptual meaning. It uses the Dominant Color Descriptor.

Illumination Invariant Color: wraps the color descriptors. One or more color descriptors processed by the illumination-invariant method can be included in this descriptor.

Shape Variation: describes shape variations in terms of a Shape Variation Map and the statistics of the region shape description of each binary shape image in the collection. The Shape Variation Map consists of StaticShapeVariation and DynamicShapeVariation. The former corresponds to 35 quantized ART coefficients of a 2-D histogram of the group of shape images, and the latter to those of the inverse of that histogram, excluding the background.

Media-centric description schemes: three visual description schemes are designed to describe several types of visual content. The StillRegionFeatureType contains several elementary descriptors to describe the characteristics of arbitrarily shaped still regions.



Visual CE current phase

The CEs explore new technologies for identifying original images and their modified versions (N-1 modified versions), focusing on the accuracy and robustness of identification

-> Robustness is measured as the accuracy (HitRatio = k/N), calculated separately for each level of modification

Modifications:

- Brightness; size reduction; color to monochrome

- JPEG compression with varying quality factors

- Color reduction; crop; histogram equalization

- Blur; geometric transformation



Towards MPEG-7 Query Format

-> Although an interface to support queries to an MPEG-7 database is not yet standardized, requirements have been drafted

[Figure: Client Application <-> MPEG-7 Database, via the Input Query Format, the Output Query Format, and Query Management Tools]

Input Query Format, e.g.:

- query by textual description

- combinations of query conditions

- specification of the structure of the result set

Output Query Format, e.g.: structure of the response containing the resulting set

Query Management Tools, e.g.:

- specification of the exceptions

- relevance feedback



Basic search functionalities may include:

Query by Description (the client application provides possible query criteria)



Query by example [figure panels (a), (b)]

Segment-based search (selecting subparts or an ROI to refine the search criteria) [figure]



Compositional search: from a globalization page, the user may select a number of interesting (or relevant) images to refine the search criteria



Current state of MPEG-7 VXM in CBIR

Query by modified sketch [segmentation/simplification by assigning a representative color to each segment / modification]

Query within ROI

Situation-based clustering (simple clustering / clustering on visual semantic hints)

Category-based clustering (local-concept lexicon: multiple low-level features of local regions used in learning and detecting local concepts)



Query within ROI uses the EHD and CLD for describing local properties

-> Example: photo search by matching the background regions only

[Figure: 4x4 EHD; CLD on the 8x8 DCT plane -> 8x8 IDCT -> 8x8 spatial-domain average color for each block; combined feature for each 4x4 block]
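The ROI query can be sketched as below, assuming the CLD's 8x8 IDCT average colors are merged 2x2 into the 4x4 EHD grid and that EHD and color differences are simply summed with equal weight (a hypothetical choice). Only the blocks listed in the ROI, such as background blocks, contribute to the distance; all names are hypothetical.

```python
def colors_to_4x4(avg8):
    """avg8: 8x8 grid of (Y, Cb, Cr) block colors, e.g. recovered by the CLD's 8x8 IDCT.

    Averages each 2x2 group of cells so the colors align with the 4x4 EHD blocks."""
    out = []
    for bi in range(4):
        for bj in range(4):
            cells = [avg8[2 * bi + di][2 * bj + dj] for di in (0, 1) for dj in (0, 1)]
            out.append(tuple(sum(c[k] for c in cells) / 4 for k in range(3)))
    return out

def combine(ehd_blocks, colors):
    """Combined feature per 4x4 block: (5 EHD bins, average color)."""
    return {i: (ehd_blocks[i], colors[i]) for i in range(16)}

def roi_distance(feat_a, feat_b, roi):
    """Mean L1 distance over the blocks in roi only (equal EHD/color weighting assumed)."""
    roi = list(roi)
    total = 0.0
    for idx in roi:
        ehd_a, col_a = feat_a[idx]
        ehd_b, col_b = feat_b[idx]
        total += sum(abs(x - y) for x, y in zip(ehd_a, ehd_b))
        total += sum(abs(x - y) for x, y in zip(col_a, col_b))
    return total / len(roi) if roi else 0.0
```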




Situation-based clustering based on visual semantic hints (visual sensation, v.s.)

Colorfulness (CoF) hint: degree of v.s. according to the purity of colors

-> Utilizes the ScalableColorDescriptor

$F_{SCD} = \{f_1, f_2, f_3, \dots, f_j, \dots, f_{N_{SCD}}\}$, where $N_{SCD} \in \{16, 32, 64, 128, 256\}$




Situation-based clustering based on visual semantic hints (visual sensation, v.s.) (2)

Color Coherence (CoC) hint: degree of v.s. according to the spatial coherency of colors

-> Utilizes the DominantColorDescriptor

$F_{DCD} = \{\{c_j, p_j, v_j\}, s\}$, where $j = 1, 2, 3, \dots, N_{DCD}$




Situation-based clustering based on visual semantic hints (visual sensation, v.s.) (3)

Level of Detail (LoD) hint: degree of v.s. for objects appearing more or less detailed

-> Defines a relative compression ratio per pixel, based on the JPEG compression the photo has gone through




Situation-based clustering based on visual semantic hints (visual sensation, v.s.) (4)

Homogeneous Texture (HoT) hint: degree of v.s. according to the homogeneous texture in the photo

-> Expresses texture regularity using the TextureBrowsingDescriptor

Heterogeneous Texture (HeT) hint: degree of v.s. according to how continuous or strong the boundaries in the photo are

-> Utilizes the EdgeHistogramDescriptor



Category-based clustering

Local-concept lexicon: multiple low-level features of local regions are used in learning and detecting local concepts; once the local concepts have been built, confidence values for each sub-region are measured for all local concepts



Related MPEG activities

The functionalities described before are especially useful for developers of the MPEG-A Photo Player, which offers a standardized solution for the carriage of images and associated metadata, to facilitate simple and fully interoperable exchange across different devices and platforms.

-> The set of metadata includes MPEG-7 visual content descriptions, as well as acquisition-based metadata (such as date, time, and camera settings). This allows compliant devices to support new, content-enhanced functionality, such as intelligent browsing, content-based search, or automatic categorization.



Summary

MPEG-7 Standard - MPEG-7 visual and content structure description tools (Ds & DSs using the DDL)

MPEG-7 requirements on the Query Format

MPEG-7 VXM current phase (descriptors and CBIR)

Multimedia segmentation, understanding, and searching, among others, are still a challenge



The end.

Most of the pictures, or their basic ideas, are taken from the listed papers and web pages.
