Upload
vandat
View
222
Download
1
Embed Size (px)
Citation preview
A System for Content-based Indexing and Retrieval in
Multimedia Databases
System Overview & Applications
by Serkan KIRANYAZ.
MUVIS (v1.6) Overview
MBrowserQuery
BrowsingVideo
SummarizationDisplay
DbsEditor
Encoding-Decoding-Rendering
DatabaseCreation
FeX- AFeXManagement
AVDatabase
Capturing
Encoding
Recording
AV DatabaseCreation
Still Images
*.jpg
*.gif *.bmp
*.pct
*.pcx *.png
*.pgm
*.wmf
*.eps
*.tga
Real TimeVideo-Audio
Stored MM(Video-Audio)
An Image
A Frame
A Video-Audio Clip
Image and MM filesAppending - Deleting
Appending into Dbs.
Image and MMfiles - typesConvertions
DatabaseEditing
*.jp2
*.yuv
AVDatabase
ImageDatabase
HybridDatabase
FeX Modules AFeXModules
Fex & AFeX API
Indexing Retrieval
MUVIS Multimedia
MUVIS Audio MUVIS Video
CodecsSampling
Freq. Channel File Formats Codecs Frame Rate FrameSize File Formats
MP3 8 & 11.025 Khz Mono MP3 H263+ 1..25 fps Any AVI
AAC 12 & 16 KHz Stereo AAC MPEG-4 MP4
G721 22.050 KHz AVI YUV 4:2:0
G723 24 KHz MP4 RGB 24
PCM 32 KHz
44.1 KHz
PGMWMFEPSPCXTGA PCTGIFPCXPNGTIFFBMPJPEG 2KJPEG
MUVIS Images
Multimedia Databases
3 types of databases in MUVIS:Video Database – vdbs : Containing video clips, keyframes and associated feature informationImage Database – idbs : Containing images and associated feature informationHybrid Database – hdbs : Containing video clips, keyframes, images and associated feature information
Applications of MUVISAVDatabase
Real-time video indexing. Main audio/video database creator.Create and Appendmodes.Real time audio/video capturing, encoding, recording and display
Applications of MUVIS..AVDatabase
Capturing:• Various picture formats, i.e. YUV 4:2:0, RGB24,
YUY2, UYVY.• Various frame rates, i.e. 1fps to 25fps • Various sizes, i.e. QSIF, QCIF, SIF, CIF, etc.
Video codecs: built in last generation codecs, i.e. MPEG-4, H263+.Audio/Video encoding parameters can be defined during recording time, i.e. VLBR to High bit-ratesCapability of recording into interlaced (i.e. AVI, MP4) file formats.Support for several Audio codecs: PCM, G721, G723, MPEG 1, 2, 2.5 layer 3 (MP3), MPEG-2,4 AAC.
Applications of MUVIS..DbsEditor
Creating initial (dummy) audio/video or image MUVIS databases. Extracting new features or removing existing features of a database.Appending external audio/video clips and images to any MUVIS database.Removing existing audio/video clips or images from a MUVIS database.Previewing any audio/video clip or image.Enhanced Multimedia Conversions.
Applications of MUVIS..MBrowser
Main media retrieval and browser terminalSupports any kind of database and various image and video typesSupports 5 levels of video display hierarchy:
Single frameshot frames scene framespartial video (ROI)entire video
Queried MM Clip
First 12 Retrieval ResultsPQ Knob
PQ Browser Handle. 10thPSQ is currently active.
NextPrev
Applications of MUVIS..MBrowser
Supports user-friendly media browsing environment.Enables video summarization through shot (key) and scene frames. Region-of-interest (ROI) video handling and query options.Query based on merging of existing database features with user-defined weights. Supports cross querying between video clips, images and frames.Supports Fex & AFeX framework that aural/visual feature extraction modules that are used in run-time.Supports two Query Schemes: Progressive & Normal Queries.
Visual Feature Extraction in MUVIS
Visual Features are extracted from key frames of video clips, and decoded images.Key frames of video clips are INTRA frames chosen by the video encoder, they might be stored in YUV format while recording.Generation of key frames (if not stored already) by decoding bit stream.Already implemented feature extraction modules are used Feature extraction module generates feature vectorsFeature vectors are stored in their associated feature files.
Visual Feature Extraction (FeX) Framework in MUVISFeatures represented by series of numbers, which correspond to feature vectorsNormalization is required within a feature vector. Each feature vector should have unit length. This is required for multiple feature based retrieval.DbsEditor is the responsible application for the management of the feature extraction into any kind of multimedia databaseDbsEditor supports extraction of multiple features and multiple sub-features with different parameters from any single feature available.
FeX in MUVIS (cont.)
MBrowser can be used to query any multimedia item from any MUVIS database, if there is at least on visual or aural feature extracted within the database.Features of video clips are extracted from the key-frames of the clips.Image features are extracted from 24bit RGB frame buffer of the decoded image. Feature extraction algorithms should be implemented as a Dynamic Link Library (DLL) exclusively with respect to the Feature Extraction Module Interface (FEX API), and stored in appropriate folder (i.e. C:\MUVIS\.)
FeX in MUVIS (cont.)
Q uery item
F eatu reE xtractio n
S im ilar ityM easu rem en t
M u ltim ed iaD atab ase
F ea tu res
E nd U s er
F eatu re E xtractio n
B es t M atc hes
O ther S ourc es
O ff-lin e
O n-line
D is p lay R es u lts
F eatu resM u ltim ed ia Item s
F eatu resM u ltim ed ia Item s
F eatu resF eatu res
FEX Module Interface (FeX API)
Generic FeX API is defined in “Fex_API.h”file. The implemented FeX modules (DLLs) should include Fex_API.h fileFive different API functions and necessary data structures are declared in Fex_API.hThe name of the FeX DLL should be named as: Fex_[FourCC].dll (i.e Fex_TEXT.dll)A FeX DLL file should be stored in the same folder with the application (DBSEditor or MBrowser) or in “C:\MUVIS\” folder
FEX Module Interface (cont.)
DBSEditor
MBrowser
Fex_API.hFex_Bind()Fex_Init()Fex_Extract()Fex_GetDistance()Fex_Exit()
Fex_*.DLL
Fex_BindFex_InitFex_ExtractFex_Exit
Fex_GetDistance
FEX_API.h : Structures
FexParam Structure: The data members in the structure are set by the feature extraction module
typedef struct{• char feat_name[255];• long feat_fourcc;• unsigned int feat_param_no;• long* feat_param_fourcc;• double* feat_param_default; • FrameType ftype;
} FexParam;
FrameParam Structure: Frame data is stored in the structure to be used for feature extraction
typedef struct {• unsigned char *buffer;• int Xs, int Ys;
} FrameParam;
FEX_API.h : Functions
int Fex_Bind(FexParam*): Used for handshaking between a MUVIS application and a FeX module (DLL). Module introduces and binds itself to the applicationint Fex_Init(double*): Used to initialize the FeX modulewith given parametersdouble* Fex_Extract(FrameParam, int&): Used to extract the features of the frame given inside the FrameParamstructureint Fex_Exit(FexParam*): Used to reset or terminate the FeX moduledouble Fex_GetDistance(double*,double*,int): Used by only MBrowser to calculate the distance between two feature vectors
DBSEditor Feature Extraction Process
1. Looks for the FeX modules (DLLs) at startup, loads themand calls Fex_Bind of each module
2. When user enters feature extraction parameters, calls Fex_Init with the parameters to initialize the FeX module
3. For each frame in database, calls Fex_Extract with the FrameParam structure. Vector size should be returned as well as the feature vector. Feature vectors are stored
4. Calls Fex_Exit to reset the FeX module5. To extract new features, step 2 to 4 can be repeated6. While closing, calls Fex_Exit again
MBrowser Query & Feature Extraction Processes
1. Looks for the FeX modules (DLLs) at startup, loads and callsFex_Bind of each module
2. When user queries a media, calls Fex_Init with the chosen parameters to initialize the FeX module
3. For each frame of queried media, calls Fex_Extract with the FrameParam structure. Vector size should be returned as well as the feature vector
4. Calls Fex_Exit to reset the FeX module5. Calls Fex_GetDistance for each frame in database, with
associated feature vectors and extracted feature vectors.Results are displayed accordingly
6. To query new media, step 2 to 5 can be repeated7. While closing, calls Fex_Exit again
New Features on MUVIS v1.6
Audio-only clips are supported Now audio has the same priority as video.Audio based Multimedia Indexing & Retrieval is supported.New File Formats: MP3 & AAC.New Video Capturing Formats: YUY2, UYVY, RGB32 & RGB24.
New Features on MUVIS v1.6 (cont..)
New Image Format: JPEG 2000 (*.jp2, *.j2c).A/V Direct Stream Access is supported KF storage of video clips in a MUVIS database is entirely optional. Image Conversion is supported into one of these formats:
JPEG (with a compression quality factor)JPEG 2000 (with a compression rate)BMPPNGTIFF
Progressive Query (PQ)
Sub-QueryFusion
Sub-QueryFusion
PeriodicSub-Query
Results
t 2t 3t 4t
ProgressiveSub-Query
Result
time
PQ Overview
1. Sub-Query Fusion Operation2. Periodic Sub-Query Formation
• Atomic Sub-Query• Sub-query rate estimation• Fractional Sub-Query
3. Progressive Sub-Query Formation (PSQ)
PROGRESSIVE QUERY OVERVIEW
Fusion
Atomic Sub-Query
No items in ASQ >0
Calculate size ofnext Fractional
Sub-Query
FractionalSub-Query
t > Tp-Tw
Implementedonly at the
beginnning ofthe first periodic
sub-query
Yes
No
Yes
NoPeriodic Sub-
Query(Tp)Stop
Advantages of PQ (vs Normal Query)
Reduced Retrieval Time (Faster Query Speed).Independent from Min. System Requirements.
Memory & Speed
Advantages of PQ
Query AccessibilityInstantaneous Query Retrievals.Query Browsing.Query Stop Capability.
The probability to retrieve better (more relevant) results than the final outcome due to the limited sub-set search within the database.
Audio-Based Multimedia Indexing and Retrieval Framework for MUVIS
A global framework implementation in order to achieve a robust and generic solution for audio-based multimedia indexing and retrieval, specifically:
Generic Support for Audio CodecsGeneric Support for File FormatsGeneric Support for Audio Capturing & Encoding ParametersGeneric Support for AFeX Framework Parameters
The main objective is content-based (speaker, subject, “sounds like..”) retrieval of the audio, which is suitable to human judgment and (aural) perception.
Audio Indexing Scheme in MUVIS
Silence MusicSpeech Fuzzy
Audio Framing & Classification Conversion
Uncertain Speech Music Fuzzy
AFeX Module(s)
......
57
3
010
20 1
29 6
15
Audio Indexing
Speech Music Fuzzy
KF Feature Vectors
Classification & Segmentation per granule / frame.1
Audio Stream
AFeX Operationper frame3
Audio Framingin Valid Classes2
KF Extractionvia MST Clustering4
MST
2. Audio Framing with Classification Conversion
M MMM MM M M SS S SSSSS SX XXXX
Music SpeechUncertain
Classification per granule/frame
Final Classification per audio frame
M: Music S: Speech X: Silence
Uncertain
Audio Feature Extraction (AFeX) Framework
Independent AFeX module(s) integration capability into MUVIS framework for audio-based indexing and retrieval.
DBSEditor
MBrowser
AFex_API.hAFex_Bind()AFex_Init()AFex_Extract()AFex_GetDistance()AFex_Exit()
AFex_*.DLL
AFex_BindAFex_InitAFex_ExtractAFex_Exit
AFex_GetDistance
Key-Framing via MST Clustering
S p ee ch L a b
9
8
1
1 12 13
1418
194
'a'
p
1111
12
17
'L'
1
3168
21
'b'
8
7
69
0101
1
20
'S'
1
29 6
2
1 21
15
'ch'
5
7
31
2
'ee'
21
A Sample AFeX Module Imp.: MFCC
MFCC (Mel-Frequency Cepstrum Coefficients)AFeX module provide generic feature vectors independent from the following parameters:
Sampling Frequency.Number of audio channels (mono/stereo).Audio Volume level.
⎟⎠⎞
⎜⎝⎛ −
⋅⋅= ∑ =
)5.0(coslog)/2(1
2/1 jN
imPc P
j jiπ
Audio Retrieval in MUVIS
FV(i)
FV(0)
Sub-feature Vectorsof a Database Clip
FV(i)
FV(0)
FV(i)
FV(0)
Sub-feature Vectorsof Query Clip
For each frame, a search isdone to find a matching frame
which gives minimum distance.
Matching Class Types
Feature vectors perclass type
Audio Retrieval in MUVIS (cont.)
In order to accomplish an audio based query within MUVIS, an audio clip is chosen in a multimedia database and queried through the database if the database includes at least one audio feature. Let NoS be the number of feature sets for a database and let NoF(s)is the number of sub-features per feature. Sub-features are obtained by changing the AFeX module parameters or the audio frame size during the audio feature extraction process.
( )( )
( )
),(),(
,),(
0
),(),,(min),(
)(
fsDfsWQD
fsDfsD
Cjif
CjiffsDFVfsQFVSDfsD
NoS
s
sNoF
fc
Ci
ii
i
iCjC
jC
ii
i
i
ii
∑ ∑
∑
×=
=
⎪⎭
⎪⎬⎫
⎪⎩
⎪⎨⎧
∅=∈
∅≠∈=
∈
∈
”Compression effects on Query Performances”
We used Lossy JPEG image compression method. Why?Widely used,Adjustable quality parameter (compression ratio).
We used MPEG-4 and H.263+ video compression methods. Why?
Widely used,Allow integrating scene-cut detection algorithms (key-frames for features),Adjustable target bit rate.
Features Used in the Experiments
We used;HSV and YUV color histograms.Texture features via Gabor Wavelet Transform (GWT) technique.Texture features via Gray Level Co-occurence Matrix (GLCM) technique.
Why?Commonly used features in indexing and retrieval area,Practically feasible, reasonable computational complexity,HSV and YUV color spaces are close to human visual system.
Experimental Databases
3 base databases created by AVDatabase and Idatabase.Base Image Database: General-purpose database.
1594 uncompressed color images from several sources.Various sizes (from 100x100 up to 1200x1000 pixels), color bit depths (8-24 bits).
Base Rock Texture Image Database: Constraint database.1512 uncompressed gray scale texture images of 6 different rock classes.166x166 pixels size and 8 bits luminance depth.
Base Video Database: General-purpose database. 300 video clips from TV channels. CIF-size (352x288 pixels). MPEG-4 compressed with 2 Mbits/sec bit rate.Video clip durations between 40 and 120 seconds.
Experimental Databases (cont.)
7 more image databases generated from the base image database.Each one has unique JPEG compression quality (10% to 75%).Together with the base image database, used for HSV and YUV colorhistogram based experiments.
10 more video databases generated from the base video database.Each one has unique {compression method, bit rate} pair. i.e. {MPEG-4, 1024 Kbits/sec}Used for HSV and YUV color histogram based experiments.
7 more texture databases generated from the base texture image database
Each one has unique JPEG compression quality (10% to 75%).Together with the base texture image database, used for GWT and GLCM texture based experiments.
Subjective Evaluation of Retrieval Experiments
An evaluation group of 10 people.Performance (P), Query Performance (QP) and Overall Query Performance (OQP)values are calculated.
Grade Subjective Meaning
0 No similarity / Not related
1 Slightly related
2 Related
3 Similar
4 Fairly similar
5 Same / Almost identical
∑=
⋅=12
2iii WGP where Gi∈ [0,5] is the given subjective grade, and Wi =
(13-i) is the weight for the item in the rank i.
N
QPOQP
N
kk∑
== 1
NP
PQP
NP
jj∑
== 1where N is the number of queries, and NP (=10)is the size of the evaluation group.
A Sample Retrieval Setup
Rank Grade
2 5
3 3
4 4
5 0
6 3
7 2
8 1
9 1
10 1
11 1
12 3
P = (5*11) + (3*10) + (4*9) + (0*8) + (3*7) + (2*6) + (1*5) + (1*4) + (1*3) + (1*2) + (3*1) = 171
Color Based Image RetrievalPerformance Results
30 images selected from the base uncompressed databaseIn each database, selected 30 images (compressed with the corresponding compression quality) were queried.Grades were collected to obtain the OQP values.
Conclusions:The best color based retrieval performance in uncompressed image databaseIncreasing compression ratio gradually decreases the retrieval performanceHSV color hist. performs better than YUV color hist. in the uncompressed domain, and vice versa in the compressed domain
In general, 30% quality parameter achieves a satisfying compression ratio (94.4%), and does not cause significant loss in the overall retrieval performance.
100
110
120
130
140
150
160
170
180
Base 75 50 40 30 25 15 10
JPEG Compression Quality
OQ
P Va
lue
HSV YUV
Color Based Video RetrievalPerformance Results
15 video clips selected from the base video databaseIn each database, selected 15 video clips (compressed with the corresponding compression bit rate) were queried.Grades were collected to obtain the OQP values.
Conclusions:Color based video retrieval performance is better in MPEG-4 compressed databases.More stable in H.263+ compressed databases at lower bit rates It might be expected that in H.263+ compressed databases, retrieval performance is better at very low bit rates HSV color hist. gives slightly better performance512 Kbits/sec can be the optimum compression bit rate at high bit ratesFor low bit rate applications, 128 Kbits/sec for MPEG-4 and 64 Kbits/sec for H.263+ can be preferable
100
110
120
130
140
150
160
170
180
1024 512 256 128 64
Video Compression Bitrate
OQ
P Va
lue
HSV / MPEG-4 YUV / MPEG-4 HSV / H.263+ YUV / H.263+
Texture Based Image RetrievalPerformance Results
15 images selected from the base image databaseIn each database, selected 15 images (compressed with the corresponding compression quality) were queried.Grades were collected to obtain the OQP values.
Conclusions:Texture based image retrieval gives more robust performance results color based retrievalGLCM feature extraction technique gives higher preformance than GWT technique Retrieval performance based on GWT is more stable25% compression quality achieves the highest compression ratio without loosing significant retrieval performance
100120140160180200220240260280300320
Base 75 50 40 30 25 15 10
JPEG Compression Quality
OQ
P Va
lue
Gabor GLCM
Conclusion and RemarksMUVIS system is designed to bring a unified and global solution to content-based multimedia indexing and retrieval problem.Furthermore, it supports several user friendly media browsing capabilities.The main objective is to focus on multi-feature support which possibly uses shape, texture and audio along with the color information.MUVIS framework supports integration of the aural and visual feature extraction algorithms explicitly.
Conclusion and Remarks (cont.)
Following visual features are already implemented and integrated into MUVIS framework:
HSV, RGB and YUV Color Histogram BINS,Texture Feature using Gabor Wavelet Transform, Texture Feature using GLCM (Gray Level Cooccurence Matrix).Canny Edge Histogram Feature.
Conclusions and Remarks (cont)
Audio is important. Sometimes it bears more semantic and content information than visual part.Henceforth the preliminary results shows the effectiveness of the audio-based retrieval compared to visual retrievals (similar or better results).Classification and segmentation algorithm has been recently improved. A new approach based on fuzzy-regions and semantic-rule-based classification with intra segment boundary detection has been developed.
Conclusions and Remarks (cont.)
MFCC prove to be as successful tool for audio based indexing & retrieval as in speech recognition and speaker identification systems.Some problems are still open:
Normalization of AFeX feature vectors (for merging). Other AFeX modules should be developed and tested.Studies on merging visual and aural retrievals should be provided to improve overall retrieval performance.
Conclusions and Remarks (cont.)
MUVIS provides the following advantages via Progressive Query:
Query Accessibility• Instantaneous Query Retrievals.• Query Browsing.• Query Stop Capability.
The probability to retrieve better (more relevant) results than the final outcome due to the limited sub-set search within the database.Faster retrievals with low memory consumption