vishal-pandey
8/7/2019 mpeg7_VisualPart
I6, 2006
The MPEG-7
Multimedia Content Description Interface
Outline
MPEG-7 motivation and scope
Visual Descriptors (color, texture, shape)
MPEG-7 retrieval evaluation criterion
Similarity measures and MPEG-7 visual descriptors
Building MPEG-7 Descriptors and Description Schemes with the Description Definition Language
MPEG-7 VXM current state
Towards an MPEG-7 Query Format Framework (queries and visual descriptor tools employed by the queries)
Summary
Proliferation of audio-visual content
MPEG-7 motivation and design scenarios (possible queries)
Music/audio: play a few notes and return music with similar music/audio
Images/graphics: draw a sketch and return images with similar graphics
Text/keywords: find AV material with subject corresponding to a keyword
Movement: describe movements and return video clips with the specified temporal and spatial relations
Scenario: describe actions and return scenarios where similar actions take place
Standardize multimedia metadata descriptions (facilitate multimedia content-based retrieval) for various types of audiovisual information
Consumer content
news
sports
Scientific content
Digital art galleries
Recorded material
Scope of the Standard
Description Production (extraction) -> Standard Description -> Description Consumption
Normative part of the MPEG-7 standard: the Standard Description
-> The goal is to define the minimum that enables interoperability.
* MPEG-7 does not specify (non-normative parts of MPEG-7):
- How to extract descriptions (feature extraction, indexing process, annotation & authoring tools, ...)
- How to use descriptions (search engine, filtering tool, retrieval process, browsing device, ...)
- The similarity between contents
Information flow
Color Descriptors: Dominant Color, Scalable Color, Color Layout, Color Structure, GoF/GoP Color
Texture Descriptors: Homogeneous Texture, Texture Browsing, Edge Histogram
Shape Descriptors: Region Shape, Contour Shape, 3D Shape
Visual Descriptors Localization: Region Locator, Spatio-Temporal Locator
Other: Face Recognition
Motion Descriptors for Video: Camera Motion, Motion Trajectory, Parametric Motion, Motion Activity
(Normative, basic, for localization)
Color Descriptors
Constrained color spaces:
-> Scalable Color Descriptor uses HSV
-> Color Structure Descriptor uses HMMD
Color Descriptors:
Dominant Color
Scalable Color - HSV space
Color Structure - HMMD space
Color Layout - YCbCr space
GroupOfFrames/Pictures
Color Space:
- R, G, B
- Y, Cr, Cb
- H, S, V
- Monochrome
- Linear transformation of R, G, B
- HMMD
Scalable Color Descriptor (SCD)
A color histogram in HSV color space
Encoded by Haar transform
Feature vector: {NoCoef, NoBD, Coeff[..], CoeffSign[..]}
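The two steps named on this slide (HSV histogram, then Haar transform) can be sketched as follows. This is a minimal illustration, not the normative extraction: the 16 x 4 x 4 split of the H, S and V axes, the unnormalized sum/difference form of the Haar step, and the function names `hsv_histogram` and `haar_1d` are assumptions made for the example, and the bit-depth quantization steps are omitted.

```python
import colorsys


def hsv_histogram(pixels, h_bins=16, s_bins=4, v_bins=4):
    """Count 8-bit RGB pixels in a uniformly quantized HSV histogram.

    The 16x4x4 bin layout is an assumption of this sketch.
    """
    hist = [0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
    return hist


def haar_1d(values):
    """Full 1-D Haar decomposition with unnormalized sums and differences.

    Returns [overall sum, coarsest difference, ..., finest differences],
    so a prefix of the output is a coarser description of the histogram.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    sums = list(values)
    details = []
    while len(sums) > 1:
        pairs = list(zip(sums[0::2], sums[1::2]))
        # Prepend, so coarser levels end up before finer ones.
        details = [a - b for a, b in pairs] + details
        sums = [a + b for a, b in pairs]
    return sums + details
```

Keeping only the first 16, 32, 64 or 128 coefficients of `haar_1d(hsv_histogram(pixels))` gives a coarser descriptor, which is the sense in which the representation is scalable.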
SCD extraction
[Figure: SCD extraction pipeline. Recoverable labels: quantization to 11 bits/bin, to 4 bits/bin, N bits/bin, number of bins.]
GoF/GoP Color Descriptor
Extends the Scalable Color Descriptor to a video segment or a group of pictures (the joint color histogram is then processed as in the SCD: Haar transform encoding)
Extraction
Histogram aggregation methods:
Average: sensitive to outliers (lighting changes, occlusion, text overlays)
Median: increased computational complexity due to sorting
Intersection: differs from the other two; it takes a "least common color trait" viewpoint
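The three aggregation methods can be sketched per histogram bin as below. This is an illustrative sketch only; the function name `aggregate_histograms` and the list-of-lists input format are assumptions, not part of the standard.

```python
from statistics import median


def aggregate_histograms(histograms, method="average"):
    """Combine per-frame histograms (equal length) into one joint histogram."""
    if not histograms:
        raise ValueError("need at least one histogram")
    bins = list(zip(*histograms))  # one tuple of frame values per bin
    if method == "average":
        return [sum(b) / len(b) for b in bins]
    if method == "median":
        return [median(b) for b in bins]  # sorts each bin: the extra cost
    if method == "intersection":
        return [min(b) for b in bins]  # colors present in every frame
    raise ValueError("unknown method: " + method)
```

In the example `[[4, 0, 2], [2, 2, 2], [9, 1, 2]]`, the outlier value 9 pulls the average of the first bin up to 5.0, while the median stays at 4 and the intersection keeps only the common minimum 2.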
GoF/GoP Color Descriptor
Applications: browsing a large collection of images to find similar images
-> Use Histogram Intersection as a color similarity measure for clustering a collection of images
-> Represent each cluster by a GoP descriptor
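A minimal sketch of histogram intersection used as a similarity measure, followed by a deliberately simple grouping step. The normalization by the smaller histogram mass, the greedy first-fit clustering, and the names `histogram_intersection` and `cluster_by_intersection` are assumptions chosen for illustration; the slide does not say which clustering method is used.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: shared mass divided by the smaller total mass."""
    total = min(sum(h1), sum(h2))
    if total == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(h1, h2)) / total


def cluster_by_intersection(histograms, threshold):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    clusters = []  # each cluster is a list of indices into `histograms`
    for index, hist in enumerate(histograms):
        for members in clusters:
            representative = histograms[members[0]]
            if histogram_intersection(hist, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters
```

Each resulting cluster could then be summarized by aggregating the histograms of its members into a single GoP-style descriptor.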
Dominant Color Descriptor (DCD)
Clustering colors into a small number of representative colors (salient colors)
F = {{c_i, p_i, v_i}, s}
c_i: representative colors
p_i: their percentages in the region
v_i: color variances
s: spatial coherency
DCD Extraction (based on the Generalized Lloyd Algorithm)
c_i: centroid of cluster i;
x(n): color vector at pixel n;
v(n): perceptual weight for pixel n.
Human visual perception is more sensitive to changes in smooth regions.
+ spatial coherency: average number of connecting pixels of a dominant color, using a 3x3 masking window
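The clustering step can be sketched as a weighted Lloyd iteration using the symbols defined above. The distortion formula did not survive in the slide text, so this sketch assumes the usual weighted squared-error form, where each cluster i minimizes the sum of v(n) * ||x(n) - c_i||^2. The function names and the fixed iteration count are also assumptions, and the cluster splitting and merging of a full extractor are left out.

```python
def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(x, centroids):
    """Index of the centroid closest to color vector x."""
    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))


def weighted_lloyd(colors, weights, centroids, iterations=10):
    """Alternate assignment and weighted-mean update of the centroids c_i.

    colors  : list of color vectors x(n)
    weights : list of perceptual weights v(n)
    Returns (centroids, percentages), i.e. the c_i and p_i of the DCD.
    """
    centroids = [tuple(map(float, c)) for c in centroids]
    for _ in range(iterations):
        sums = [[0.0] * len(c) for c in centroids]
        totals = [0.0] * len(centroids)
        for x, v in zip(colors, weights):
            i = nearest(x, centroids)
            totals[i] += v
            for d, component in enumerate(x):
                sums[i][d] += v * component
        # Empty clusters keep their previous centroid.
        centroids = [
            tuple(s / t for s in row) if t > 0 else c
            for row, t, c in zip(sums, totals, centroids)
        ]
    counts = [0] * len(centroids)
    for x in colors:
        counts[nearest(x, centroids)] += 1
    percentages = [n / len(colors) for n in counts]
    return centroids, percentages
```

Giving pixels in smooth regions a larger weight v(n) pulls the centroids towards the colors that matter most perceptually.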
http://debut.cis.nctu.edu.tw/Demo/ContentBasedVideoRetrieval/CBVR/Dominant/index.html
Color Layout Descriptor (CLD)
Partitioning the image into 64 (8x8) blocks
Deriving the average color of each block (or using DCD)
Applying an (8x8) DCT and encoding
Efficient for:
Sketch-based image retrieval
Content filtering using image indexing
CLD extraction
-> The derived average colors are transformed into a series of coefficients by performing a DCT (data in the spatial domain -> data in the frequency domain).
If the spatial-domain data is smooth (little variation), the energy concentrates in the low-frequency coefficients: they are large, while the high-frequency coefficients are small.
-> A few low-frequency coefficients are selected using zigzag scanning and quantized to form a CLD (large quantization step for the AC coefficients, small quantization step for the DC coefficients).
-> The color space adopted for CLD is YCbCr.
F = {CoefPattern, YDCCoef, CbDCCoef, CrDCCoef, YACCoef, CbACCoef, CrACCoef}
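The extraction chain for one color channel (block averages, 8x8 DCT, zigzag selection) can be sketched as follows. This is a sketch under stated assumptions: the channel is a 2-D list whose height and width are multiples of 8, the DCT is the orthonormal DCT-II, the quantization step is omitted, and all function names are hypothetical.

```python
import math


def block_averages(channel, grid=8):
    """Average the channel over a grid x grid partition of the image."""
    height, width = len(channel), len(channel[0])
    bh, bw = height // grid, width // grid
    return [
        [
            sum(channel[y][x]
                for y in range(by * bh, (by + 1) * bh)
                for x in range(bx * bw, (bx + 1) * bw)) / (bh * bw)
            for bx in range(grid)
        ]
        for by in range(grid)
    ]


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [
        [
            alpha(u) * alpha(v) * sum(
                block[y][x]
                * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                for y in range(n) for x in range(n))
            for v in range(n)
        ]
        for u in range(n)
    ]


def zigzag_order(n=8):
    """(row, col) positions in zigzag order, low frequencies first."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))


def color_layout_coeffs(channel, num_coeffs=6):
    """First `num_coeffs` zigzag-scanned DCT coefficients of one channel."""
    coeffs = dct_2d(block_averages(channel))
    return [coeffs[u][v] for u, v in zigzag_order(len(coeffs))[:num_coeffs]]
```

For a perfectly flat channel only the DC coefficient is non-zero, which illustrates why a handful of low-frequency coefficients is enough for smooth layouts.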
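Color Structure Descriptor (CSD)
Scanning the image with an 8x8 structuring element
Counting the number of structuring-element positions that contain each color
Generating a color histogram (HMMD color space, 4 CSQ operating points)
[Figure: an 8 x 8 structuring element over the image; color bins C0 to C7, with bins C1, C3 and C7 incremented (+1) for the colors present inside the element]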
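The counting rule can be sketched as below: a bin is incremented once per structuring-element position in which its color occurs, regardless of how many pixels of that color the element covers. The input is assumed to be an image of already quantized color indices; the function name and the absence of sub-sampling are assumptions of this sketch.

```python
def color_structure_histogram(image, num_bins, element=8):
    """Count, per color bin, the element positions that contain that color.

    image : 2-D list of quantized color indices in range(num_bins)
    """
    height, width = len(image), len(image[0])
    hist = [0] * num_bins
    for y in range(height - element + 1):
        for x in range(width - element + 1):
            present = {
                image[y + dy][x + dx]
                for dy in range(element)
                for dx in range(element)
            }
            for color in present:
                hist[color] += 1  # +1 per position, not per pixel
    return hist
```

Unlike a plain histogram, this distinguishes an image where a color forms one compact blob from an image where the same number of pixels is scattered, because scattered pixels fall inside many more element positions.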
CSD extraction
[Equation: the condition ("If ...") and the formula giving the sub-sampling factor p]
F = {colQuant, Values[m]}
CSD scaling
Texture Descriptors
Homogeneous Texture Descriptor
Non-Homogeneous Texture Descriptor (Edge Histogram)
Texture Browsing
Homogeneous Texture Descriptor (HTD)
Partitioning the frequency domain into 30 channels (modeled by a 2D Gabor function)
Computing the energy and energy deviation for each channel
Computing the mean and standard deviation of the frequency coefficients
-> F = {f_DC, f_SD, e_1, ..., e_30, d_1, ..., d_30}
An efficient implementation: Radon transform followed by Fourier transform
HTD Extraction: how to get a 2-D frequency layout following the HVS
2-D image f(x,y) -> Radon transform -> 1-D projection P(R, θ) -> 1-D Fourier transform F(P(R, θ)) -> resulting sampling grid in polar coordinates
HTD Extraction - Data sampling in feature channel
-> 2D Gabor function deployed to define Gabor filter banks
It is a Gaussian-weighted sinusoid
It is used to model individual channels
Each channel filters a specific type of texture
Radon Transform
Transforms images with lines into a domain of possible line parameters
Each line will be transformed to a peak point in the resulting image
HTD properties
One can perform:
Rotation-invariant matching
Intensity-invariant matching (f_DC removed from the feature vector)
Scale-invariant matching
F = {f_DC, f_SD, e_1, ..., e_30, d_1, ..., d_30}
Texture Browsing Descriptor
-> Same spatial filtering procedure as the HTD: scale- and orientation-selective band-pass filters
Regularity (periodic to random)
Coarseness (fine grain to coarse)
Directionality (/30°)
-> The texture browsing descriptor can be used to find a set of candidates with similar perceptual properties; the HTD is then used to get a precise similarity match list among the candidate images.
e.g. look for textures that are very regular and oriented at 30°
Edge Histogram Descriptor (EHD)
Represents the spatial distribution of five types of edges: vertical, horizontal, 45°, 135°, and non-directional
Dividing the image into 16 (4x4) blocks
Generating a 5-bin histogram for each block
It is scale invariant
Retain strong edges by thresholding (Canny edge operator)
F = {BinCounts[k]}, k = 80
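The 16 x 5 = 80 bin layout can be sketched as follows. Note the swapped-in technique: the slide names the Canny operator, whereas this sketch classifies each 2x2 pixel block with five small directional filters, a scheme often described for edge-histogram extraction. The filter coefficients, the default threshold of 11, the use of raw pixels instead of averaged sub-blocks, and all names are assumptions of the example.

```python
import math

R2 = math.sqrt(2)
# 2x2 filter coefficients, ordered (top-left, top-right, bottom-left, bottom-right).
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (R2, 0, 0, -R2),
    "diagonal_135": (0, R2, -R2, 0),
    "non_directional": (2, -2, -2, 2),
}
EDGE_TYPES = list(FILTERS)


def edge_histogram(gray, threshold=11, grid=4):
    """Return grid*grid*5 edge counts for a 2-D list of gray levels."""
    height, width = len(gray), len(gray[0])
    sh, sw = height // grid, width // grid  # sub-image size
    bins = [0] * (grid * grid * len(EDGE_TYPES))
    for gy in range(grid):
        for gx in range(grid):
            base = (gy * grid + gx) * len(EDGE_TYPES)
            for y in range(gy * sh, (gy + 1) * sh - 1, 2):
                for x in range(gx * sw, (gx + 1) * sw - 1, 2):
                    block = (gray[y][x], gray[y][x + 1],
                             gray[y + 1][x], gray[y + 1][x + 1])
                    strengths = [
                        abs(sum(a * f for a, f in zip(block, FILTERS[t])))
                        for t in EDGE_TYPES
                    ]
                    best = max(range(len(EDGE_TYPES)), key=strengths.__getitem__)
                    if strengths[best] > threshold:  # keep strong edges only
                        bins[base + best] += 1
    return bins
```

Setting `threshold=0` keeps every block with any non-zero response, which is the setting that makes sense for binary edge images such as sketches.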
EHD extraction
Basic (80 bins)
Extended (150 bins): basic + semi-global (13 clusters) + global
Edge map image obtained using the Canny edge operator
EHD evaluation
Cannot be used for object-based image retrieval
If Th_edge is set to 0, the EHD applies to binary edge images (sketch-based retrieval)
The extended EHD achieves better results but does not exhibit the rotation invariance property
Shape Descriptors
Region-based Descriptor
Contour-based Shape Descriptor
2D/3D Shape Descriptor
3D Shape Descriptor
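Region-based Descriptor (RBD)
ART (Angular Radial Transform) coefficients:
$$F_{nm} = \langle V_{nm}(\rho,\theta),\, f(\rho,\theta) \rangle = \int_0^{2\pi} \int_0^1 V_{nm}^{*}(\rho,\theta)\, f(\rho,\theta)\, \rho \, d\rho \, d\theta$$
with basis functions $V_{nm}(\rho,\theta) = A_m(\theta)\, R_n(\rho)$, where
$$A_m(\theta) = \frac{1}{2\pi} \exp(jm\theta)$$
$$R_n(\rho) = \begin{cases} 1 & n = 0 \\ 2\cos(\pi n \rho) & n \neq 0 \end{cases}$$
m = 0, ..., 11 (12 angular functions)
n = 0, ..., 2 (3 radial functions)
F = {MagnitudeOfART[k]}, k = n x m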
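A numerical sketch of the ART magnitudes for a square binary image, using the basis functions A_m and R_n given above. Assumptions of this sketch: the unit disk is inscribed in the image, the integral is approximated by a plain sum over pixel centers (the Cartesian pixel area stands in for ρ dρ dθ), and the function name is hypothetical. The normalization and quantization of the coefficients are omitted.

```python
import cmath
import math


def art_magnitudes(image, n_radial=3, m_angular=12):
    """Return |F_nm| as a list of n_radial rows with m_angular entries each."""
    size = len(image)
    centre = (size - 1) / 2.0
    radius = size / 2.0
    coeffs = [[0j] * m_angular for _ in range(n_radial)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if not value:
                continue
            dx, dy = (x - centre) / radius, (y - centre) / radius
            rho = math.hypot(dx, dy)
            if rho > 1.0:
                continue  # outside the unit disk
            theta = math.atan2(dy, dx)
            for n in range(n_radial):
                radial = 1.0 if n == 0 else 2.0 * math.cos(math.pi * n * rho)
                for m in range(m_angular):
                    # Complex conjugate of V_nm = A_m(theta) * R_n(rho)
                    basis = radial * cmath.exp(-1j * m * theta) / (2 * math.pi)
                    coeffs[n][m] += value * basis
    pixel_area = 1.0 / radius ** 2  # dx * dy in unit-disk coordinates
    return [[abs(c) * pixel_area for c in row] for row in coeffs]
```

Taking magnitudes discards the phase term exp(jmθ), which is what makes the resulting values insensitive to rotation of the shape.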
Region-based Descriptor (2)
Applicable to figures (a)-(e)
Distinguishes (i) from (g) and (h)
(j), (k), and (l) are similar
Advantages:
Describes complex shapes with disconnected regions
Robust to segmentation noise
Small size
Fast extraction and matching
Contour-Based Descriptor (CBD)
It is based on the Curvature Scale-Space representation
Curvature Scale-Space
Finds curvature zero-crossing points of the shape's contour (key points)
Reduces the number of key points step by step, by applying Gaussian smoothing
The positions of the key points are expressed relative to the length of the contour curve
CBD Extraction
x_css: location of the curvature zero-crossing points
y_css: filtering pass
Repetitive smoothing of the X and Y contour coordinates by the low-pass kernel (0.25, 0.5, 0.25) until the contour becomes convex
F = {NofPeaks, GlobalCurv[ecc][circ], PrototypeCurv[ecc][circ], HighestPeakY, peakX[k], peakY[k]}
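The repetitive smoothing loop can be sketched as below for a closed polygonal contour. Curvature sign is approximated here by the sign of the turn at each vertex, and a zero crossing is a change of that sign between neighbouring vertices; this discretization, the function names, and the `max_passes` guard are assumptions of the sketch. Peak extraction for the final descriptor is not shown.

```python
def smooth_closed(values):
    """One pass of the (0.25, 0.5, 0.25) kernel over a closed sequence."""
    n = len(values)
    return [
        0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[(i + 1) % n]
        for i in range(n)
    ]


def zero_crossings(xs, ys):
    """Number of curvature sign changes along the closed contour."""
    n = len(xs)
    signs = []
    for i in range(n):
        ax, ay = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        bx, by = xs[(i + 1) % n] - xs[i], ys[(i + 1) % n] - ys[i]
        cross = ax * by - ay * bx  # turn direction at vertex i
        if abs(cross) > 1e-12:
            signs.append(cross > 0)
    return sum(1 for i in range(len(signs)) if signs[i] != signs[i - 1])


def css_profile(xs, ys, max_passes=1000):
    """Zero-crossing count after each filtering pass, until the contour is convex."""
    profile = []
    for _ in range(max_passes):
        count = zero_crossings(xs, ys)
        profile.append(count)
        if count == 0:
            break
        xs, ys = smooth_closed(xs), smooth_closed(ys)
    return profile
```

The index of a pass in the returned profile plays the role of y_css: deeper concavities survive more passes before their zero crossings disappear.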
CBD Applicability
Applicable to (a)
Distinguishes differences in (b)
Finds similarities in (c)-(e)
Advantages:
Captures the shape very well
Robust to noise, scale, and orientation
It is fast and compact
Comparison (RB/CB descriptors)
Blue: Similar shapes by Region-Based
Yellow: Similar shapes by Contour-Based
How does MPEG-7 compare descriptors?
ANMRR (average normalized modified retrieval rank):
-> Normalized measures were defined that take into account the different sizes of the ground truth sets and the actual ranks obtained from the retrieval
-> Retrievals that miss items are assigned a penalty.
[Figure label: traditional metric]
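The slide gives no formulas, so the following sketch uses the definition of NMRR/ANMRR as it is commonly stated in the MPEG-7 evaluation literature; treat the constants (the 4x and 2x window factors, the 1.25 penalty factor) and the function names as assumptions. A value of 0 means every ground-truth item was retrieved at the top ranks, 1 means none was found within the window.

```python
def nmrr(ranks, max_ground_truth):
    """Normalized modified retrieval rank for one query.

    ranks            : 1-based retrieval rank of each ground-truth item,
                       or None if the item was not retrieved at all
    max_ground_truth : largest ground-truth set size over all queries
    """
    ng = len(ranks)
    window = min(4 * ng, 2 * max_ground_truth)
    penalty = 1.25 * window  # rank assigned to items missed within the window
    effective = [
        r if r is not None and r <= window else penalty for r in ranks
    ]
    average_rank = sum(effective) / ng
    modified = average_rank - 0.5 * (1 + ng)
    return modified / (penalty - 0.5 * (1 + ng))


def anmrr(all_ranks):
    """Average NMRR over a list of per-query rank lists."""
    largest = max(len(ranks) for ranks in all_ranks)
    return sum(nmrr(ranks, largest) for ranks in all_ranks) / len(all_ranks)
```

Because the score is normalized by the ground-truth size of each query, queries with 3 relevant items and queries with 30 contribute on the same scale to the average.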
Similarity between features
Typically descriptors are multidimensional vectors (of low-level features)
Similarity of two images in the vector feature space:
the range query: all the points within a hyperrectangle aligned with the coordinate axes
the nearest-neighbour or within-distance (cut) query: a particular metric in the feature space
dissimilarity between statistical distributions: the same metrics or specific measures
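The three query types on feature vectors can be sketched with a brute-force scan. The in-memory dictionary "database", the Euclidean metric, and the function names are assumptions for illustration; a real system would use an index structure instead of scanning every vector.

```python
import math


def range_query(database, low, high):
    """Names of all vectors inside the axis-aligned hyperrectangle [low, high]."""
    return [
        name for name, vec in database.items()
        if all(lo <= v <= hi for lo, v, hi in zip(low, vec, high))
    ]


def nearest_neighbours(database, query, k=1):
    """Names of the k vectors closest to `query` (Euclidean metric)."""
    ordered = sorted(database, key=lambda name: math.dist(database[name], query))
    return ordered[:k]


def within_distance(database, query, radius):
    """Names of all vectors no further than `radius` from `query`."""
    return [
        name for name, vec in database.items()
        if math.dist(vec, query) <= radius
    ]
```

The range query needs no metric at all, only per-dimension bounds, while the other two depend entirely on the choice of distance function.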
An example of a CBIR system using HTD, performing range query and NN query:
http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html
Criticism on MPEG-7 distance measures
MPEG-7 adopts feature vector space distances based on geometric assumptions of descriptor space
...but these quantitative measures (low-level information) do not fit ideally with human similarity perception
-> Researchers from other areas have developed alternative predicate-based models (descriptors are assumed to contain just binary elements, in opposition to continuous data), which express the existence of properties and express high-level information
See pattern difference: $\frac{bc}{K^2}$
K: number of predicates in the data vectors X_i, X_j
b: property exists in X_i
c: property exists in X_j
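A small sketch of the pattern difference bc / K^2 on binary predicate vectors. One interpretation is assumed here and should be checked against the cited literature: b counts the predicates that hold in X_i but not in X_j, and c those that hold in X_j but not in X_i. The function name is hypothetical.

```python
def pattern_difference(xi, xj):
    """Pattern difference b*c / K^2 of two equal-length binary vectors."""
    if len(xi) != len(xj) or not xi:
        raise ValueError("vectors must be non-empty and of equal length")
    k = len(xi)
    b = sum(1 for p, q in zip(xi, xj) if p and not q)  # only in X_i
    c = sum(1 for p, q in zip(xi, xj) if q and not p)  # only in X_j
    return b * c / k ** 2
```

With this reading the measure is 0 whenever one vector's properties are a subset of the other's, and it grows only when each vector has properties the other lacks.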
Vector Space Distances
Distances/Similarity measures
How that works
Description Definition Language:
-> XML Schema (flexibility)
- XMLS structural language components
- XMLS datatype language components
+ MPEG-7 specific extensions: MPEG-7 support for vectors, matrices and typed references
-> Binary version (efficiency)
Text format (XML), BiM format, or a mix
A DDL example (instantiation)
schema: [XML Schema listing not preserved]
This permits VideoDoc elements, as well as types derived from VideoDoc (e.g., NewsDoc), to be used as a child of VideoCatalogue
instance: [XML listing not preserved; surviving values: CNN 6 o'clock News, David James, 1999, CNN]
Descriptions enabled by the MPEG-7 tools
Content description tools - perceptual descriptions:
- content's spatio-temporal structure
- info on low-level features
- semantic info related to the reality captured by the content
Content management tools - archival-oriented descriptions:
- content's creation/production
- info on using the content
- info on storing and representing the content
Organization/Navigation/Access/User Interaction tools - additional info for organizing, managing and accessing the content:
- how objects are related and gathered in collections
- summaries/variations/transcoding to support efficient browsing
- user interaction info
Type hierarchy for top-level elements
What DS to choose?
MPEG-7 provides DSs for description of the structure and semantics of AV content + content management
Content management info can be attached to individual Segments
Viewpoint of the structure: Segments
Structure description
[Figure: segment decomposition of a Video Segment into Video Segments and Moving regions. Recoverable labels: Time, Color, Motion, Texture, Shape, Annotation; Time, Mosaic, Annotation; Relation Link "above".]
Segment Decomposition
time connectivity
Content structural aspects (Segment DS tree)
Annotate the whole image with StillRegion
Spatial segmentation at different levels
Among different regions we could use SegmentRelationship description tools
Content structural aspects
Temporal segments
(Segment Relationship DS graph)
Viewpoint of conceptual notions
Content semantic aspects (Semantic Graph)
Example of Structure-Semantic Link DS
Content abstraction aspects (CoAbstr) - Hierarchical summary of a video
[Figure: summary hierarchy with nodes labeled f0, f00, f01, f02]
-> enables rapid browsing and navigation (also sequential summary)
(CoAbstr) Partitions and decompositions (ViewDecomposition DS)
Frequency-space graph
(CoAbstr) Content Variation
Universal Multimedia Access: adapt delivery to network and terminal characteristics
CoAbstr: A collection (CollectionStructure DS)
-> Groups segments, events, or objects into collection clusters and specifies properties that are common to the elements.
The CollectionStructure DS also describes statistics and models of the attribute values of the elements, such as a mean color histogram for a collection of images.
The CollectionStructure DS also describes relationships among collection clusters.
Reference Software: the XM
XM implements:
MPEG-7 Descriptors (Ds)
MPEG-7 Description Schemes (DSs)
Coding Schemes
DDL
extraction
Beyond MPEG-7 version 1 (D & DS in VXM)
ColorTemperature: This descriptor specifies the perceptual temperature feeling of the illumination color in an image, for browsing and display preference control purposes (user friendly). Four perceptual temperature browsing categories are provided: hot, warm, moderate, and cool. Each category is used for browsing images based upon its perceptual meaning. It uses the dominant color descriptor.
Illumination Invariant Color: wraps the color descriptors. One or more color descriptors processed by the illumination invariant method can be included in this descriptor.
Shape Variation: can describe shape variations in terms of a Shape Variation Map and the statistics of the region shape description of each binary shape image in the collection. The Shape Variation Map consists of StaticShapeVariation and DynamicShapeVariation. The former corresponds to 35 quantized ART coefficients on a 2-dimensional histogram of a group of shape images, and the latter to the inverse of the histogram except the background.
Media-centric description schemes: Three visual description schemes are designed to describe several types of visual content. The StillRegionFeatureType contains several elementary descriptors to describe the characteristics of arbitrarily shaped still regions.
Visual CE current phase
The CE explores new technologies for identifying original images and their modified versions (N-1 modified versions), focused on the accuracy and robustness of identification
-> Robustness is measured as the accuracy (HitRatio = k/N), calculated separately for each level of modification
Modifications:
Brightness
Size reduction
Color to monochrome
JPEG compression with varying quality factors
Color reduction
Crop
Histogram equalization
Blur
Geometric transformation
Towards MPEG-7 Query Format
-> Although the interface to support queries in an MPEG-7 database is not yet supported, requirements have been drafted
Client Application <-> MPEG-7 Database
Input Query Format, e.g.:
- query by textual description
- combinations of query conditions
- specification of the structure of the result set
Output Query Format, e.g.:
- structure of the response containing the resulting set
Query Management Tools, e.g.:
- specification of the exceptions
- relevance feedback
Basic search functionalities may include:
Query by Description (the client application provides possible query criteria)
Query by example
[Figure: query and results, panels a) and b)]
Segment-based search (selecting subparts or ROI to refine the search criteria)
Compositional search: from a globalization page the user may select a number of interesting (or relevant) images to refine the search criteria
Current state of MPEG-7 VXM in CBIR
Query by modified sketch [segmentation / simplification by assigning a representative color to each segment / modification]
Query within ROI
Situation-based clustering (simple clustering / clustering on visual semantic hints)
Category-based clustering (local-concept lexicon: multiple low-level features of local regions used in learning and detecting local concepts)
Query within ROI uses EHD and CLD for describing local properties
-> Example: photo search by matching the background regions only
[Figure: 4x4 EHD; CLD on 8x8 DCT plane -> 8x8 IDCT -> 8x8 spatial domain, average color for each block; combined feature for each 4x4 block]
Situation-based clustering based on visual semantic hints (visual sensation, v.s.)
Colorfulness (CoF) hint: degree of v.s. according to the purity of colors
-> Utilizes ScalableColorDescriptor
$F_{SCD} = \{f_1, f_2, f_3, \ldots, f_j, \ldots, f_{N_{SCD}}\}$, where $N_{SCD} \in \{16, 32, 64, 128, 256\}$
Situation-based clustering based on visual semantic hints (visual sensation, v.s.) (2)
Color Coherence (CoC) hint: degree of v.s. according to the spatial coherency of colors
-> Utilizes DominantColorDescriptor
$F_{DCD} = \{(c_j, p_j, v_j), s\}$, where $j = 1, 2, 3, \ldots, N_{DCD}$
Situation-based clustering based on visual semantic hints (visual sensation, v.s.) (3)
Level of Detail (LoD) hint: degree of v.s. for objects appearing more or less detailed
-> Defines a relative compression ratio per pixel based on the JPEG compression the photo has gone through
[Equation: definition of the LoD hint]
Situation-based clustering based on visual semantic hints (visual sensation, v.s.) (4)
Homogeneous Texture (HoT) hint: degree of v.s. according to the homogeneous texture of the photo
-> Expresses texture regularity using TextureBrowsingDescriptor
Heterogeneous Texture (HeT) hint: degree of v.s. according to how continuous or strong the boundaries are in the photo
-> Utilizes EdgeHistogramDescriptor
Category-based clustering
Local-concept lexicon: multiple low-level features of local regions are used in learning and detecting local concepts. Once the local concepts have been built, confidence values for each sub-region are measured for all local concepts.
MPEG related activities
The functionalities described before are especially useful for developers of the MPEG-A Photo Player:
it offers a standardized solution for the carriage of images and associated metadata, to facilitate simple and fully interoperable exchange across different devices and platforms.
-> The set of metadata includes MPEG-7 visual content descriptions, as well as acquisition-based metadata (such as date, time and camera settings). This allows compliant devices to support new, content-enhanced functionality, such as intelligent browsing, content-based search or automatic categorization.
Summary
MPEG-7 Standard - MPEG-7 visual and content structure description tools (Ds & DSs using DDL)
MPEG-7 requirements on the Query Format
MPEG-7 VXM current phase (descriptors and CBIR)
Multimedia segmentation, understanding, and searching, among others, are still a challenge
The end.
Most of the pictures or their basic ideas are taken from the listed papers and web pages.