A System for Content-based Indexing and Retrieval in Multimedia Databases System Overview & Applications by Serkan KIRANYAZ.



MUVIS (v1.6) Overview

[Figure: MUVIS v1.6 block diagram.
• AVDatabase: capturing, encoding and recording of real-time video-audio; AV database creation.
• DbsEditor: database creation and editing (appending and deleting image and MM files, type conversions), FeX and AFeX management.
• MBrowser: query, browsing, video summarization and display, with encoding, decoding and rendering of an image, a frame or a video-audio clip.
• Inputs: real-time video-audio, stored MM (video-audio) and still images (*.jpg, *.jp2, *.gif, *.bmp, *.pct, *.pcx, *.png, *.pgm, *.wmf, *.eps, *.tga, *.yuv).
• Databases: AV database, image database and hybrid database.
• FeX and AFeX modules are attached through the FeX & AFeX API for indexing and retrieval.]

MUVIS Multimedia

MUVIS Audio
• Codecs: MP3, AAC, G721, G723, PCM
• Sampling freq.: 8, 11.025, 12, 16, 22.050, 24, 32 and 44.1 KHz
• Channel: Mono, Stereo
• File formats: MP3, AAC, AVI, MP4

MUVIS Video
• Codecs: H263+, MPEG-4, YUV 4:2:0, RGB 24
• Frame rate: 1..25 fps
• Frame size: Any
• File formats: AVI, MP4

MUVIS Images
• JPEG, JPEG 2K, BMP, TIFF, PNG, PCX, GIF, PCT, TGA, EPS, WMF, PGM

Multimedia Databases

Three types of databases in MUVIS:
• Video Database (vdbs): contains video clips, key frames and associated feature information.
• Image Database (idbs): contains images and associated feature information.
• Hybrid Database (hdbs): contains video clips, key frames, images and associated feature information.

Applications of MUVIS: AVDatabase

• Real-time video indexing.
• Main audio/video database creator.
• Create and Append modes.
• Real-time audio/video capturing, encoding, recording and display.

Applications of MUVIS: AVDatabase (cont.)

Capturing:
• Various picture formats, e.g. YUV 4:2:0, RGB24, YUY2, UYVY.
• Various frame rates, from 1 fps to 25 fps.
• Various frame sizes, e.g. QSIF, QCIF, SIF, CIF.

• Video codecs: built-in latest-generation codecs, i.e. MPEG-4 and H263+.
• Audio/video encoding parameters can be set at recording time, from very low bit rate (VLBR) to high bit rates.
• Recording into interleaved file formats (AVI, MP4).
• Support for several audio codecs: PCM, G721, G723, MPEG 1, 2, 2.5 Layer 3 (MP3) and MPEG-2/4 AAC.

Applications of MUVIS: DbsEditor

• Creating initial (dummy) audio/video or image MUVIS databases.
• Extracting new features from, or removing existing features of, a database.
• Appending external audio/video clips and images to any MUVIS database.
• Removing existing audio/video clips or images from a MUVIS database.
• Previewing any audio/video clip or image.
• Enhanced multimedia conversions.

Applications of MUVIS: MBrowser

• Main media retrieval and browser terminal.
• Supports any kind of database and various image and video types.
• Supports 5 levels of video display hierarchy:
  • single frame
  • shot frames
  • scene frames
  • partial video (ROI)
  • entire video

[Figure: MBrowser screenshot showing the queried MM clip, the first 12 retrieval results with Prev and Next buttons, the PQ knob and the PQ browser handle (the 10th PSQ is currently active).]

Applications of MUVIS: MBrowser (cont.)

• Supports a user-friendly media browsing environment.
• Enables video summarization through shot (key) frames and scene frames.
• Region-of-interest (ROI) video handling and query options.
• Query based on merging the existing database features with user-defined weights.
• Supports cross querying between video clips, images and frames.
• Supports the FeX & AFeX framework, in which aural and visual feature extraction modules are used at run time.
• Supports two query schemes: Progressive Query and Normal Query.

Visual Feature Extraction in MUVIS

• Visual features are extracted from the key frames of video clips and from decoded images.
• Key frames of video clips are the INTRA frames chosen by the video encoder; they may be stored in YUV format while recording.
• Key frames that were not stored are generated by decoding the bit stream.
• The already implemented feature extraction modules are used.
• A feature extraction module generates feature vectors.
• Feature vectors are stored in their associated feature files.

Visual Feature Extraction (FeX) Framework in MUVIS

• Features are represented by series of numbers, which form the feature vectors.
• Normalization is required within a feature vector: each feature vector should have unit length. This is required for retrieval based on multiple features.
• DbsEditor is the application responsible for managing feature extraction in any kind of multimedia database.
• DbsEditor supports the extraction of multiple features, and of multiple sub-features with different parameters from any single available feature.
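The unit-length requirement can be illustrated with a minimal sketch. It assumes that "unit length" means unit Euclidean (L2) norm, which the slides do not state explicitly; the function name normalize_unit_length is hypothetical and not part of MUVIS.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scale a feature vector in place so that its Euclidean (L2) norm is 1.
// Assumption: "unit length" in the slides refers to the L2 norm.
// Returns false and leaves the vector unchanged if the norm is zero.
bool normalize_unit_length(std::vector<double>& fv) {
    assert(!fv.empty());  // precondition: a feature vector has at least one element
    double sum_sq = 0.0;
    for (double v : fv) sum_sq += v * v;
    const double norm = std::sqrt(sum_sq);
    if (norm == 0.0) return false;
    for (double& v : fv) v /= norm;
    return true;
}
```

For example, the vector (3, 4) becomes (0.6, 0.8). Once every feature is scaled this way, distances computed from different features fall in comparable ranges, which is presumably why the slides tie normalization to multi-feature retrieval.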

FeX in MUVIS (cont.)

• MBrowser can be used to query any multimedia item from any MUVIS database, provided that at least one visual or aural feature has been extracted within the database.
• Features of video clips are extracted from the key frames of the clips.
• Image features are extracted from the 24-bit RGB frame buffer of the decoded image.
• Feature extraction algorithms should be implemented as a Dynamic Link Library (DLL) that conforms to the Feature Extraction Module Interface (FeX API), and stored in the appropriate folder (i.e. C:\MUVIS\).

FeX in MUVIS (cont.)

[Figure: retrieval data flow.
• Off-line: multimedia items (also from other sources) pass through feature extraction; the items and their features are stored in the multimedia database.
• On-line: the end user submits a query item; its features are extracted and compared with the database features by similarity measurement; the best matches are displayed as results.]

FEX Module Interface (FeX API)

• The generic FeX API is defined in the “Fex_API.h” file.
• The implemented FeX modules (DLLs) should include Fex_API.h.
• Five API functions and the necessary data structures are declared in Fex_API.h.
• A FeX DLL should be named Fex_[FourCC].dll (e.g. Fex_TEXT.dll).
• A FeX DLL should be stored in the same folder as the application (DBSEditor or MBrowser) or in the “C:\MUVIS\” folder.

FEX Module Interface (cont.)

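[Figure: DBSEditor and MBrowser load the Fex_*.DLL modules through Fex_API.h, which declares Fex_Bind(), Fex_Init(), Fex_Extract(), Fex_GetDistance() and Fex_Exit(). Both applications call Fex_Bind, Fex_Init, Fex_Extract and Fex_Exit; Fex_GetDistance is drawn as a separate call.]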

FEX_API.h : Structures

FexParam structure: the data members of the structure are set by the feature extraction module.

    typedef struct {
        char          feat_name[255];
        long          feat_fourcc;
        unsigned int  feat_param_no;
        long*         feat_param_fourcc;
        double*       feat_param_default;
        FrameType     ftype;
    } FexParam;

FrameParam structure: holds the frame data to be used for feature extraction.

    typedef struct {
        unsigned char *buffer;
        int Xs, Ys;
    } FrameParam;

FEX_API.h : Functions

• int Fex_Bind(FexParam*): handshake between a MUVIS application and a FeX module (DLL). The module introduces and binds itself to the application.
• int Fex_Init(double*): initializes the FeX module with the given parameters.
• double* Fex_Extract(FrameParam, int&): extracts the features of the frame given in the FrameParam structure.
• int Fex_Exit(FexParam*): resets or terminates the FeX module.
• double Fex_GetDistance(double*, double*, int): used only by MBrowser to calculate the distance between two feature vectors.
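The five functions can be sketched as a toy module. This is an illustration under stated assumptions, not an actual MUVIS module: the FrameType enum, the return-code convention (0 for success), the packed 3-bytes-per-pixel layout of the frame buffer, the FourCC packing, the "GHST" and "BINS" codes and the rule that the caller frees the returned vector with delete[] are all hypothetical, and DLL export declarations are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>

// Declarations mirroring the slides' Fex_API.h. FrameType is not shown in
// the slides, so this one-value enum is a hypothetical stand-in.
enum FrameType { FRAME_RGB24 };

typedef struct {
    char feat_name[255];
    long feat_fourcc;
    unsigned int feat_param_no;
    long* feat_param_fourcc;
    double* feat_param_default;
    FrameType ftype;
} FexParam;

typedef struct {
    unsigned char* buffer;
    int Xs, Ys;
} FrameParam;

// Module state for a hypothetical gray-level histogram feature.
static long g_param_fourcc[1];
static double g_param_default[1] = {8.0};  // default number of bins
static int g_bins = 8;

// Pack four ASCII characters into a long (assumed packing order).
static long make_fourcc(const char* c) {
    return (long)c[0] | ((long)c[1] << 8) | ((long)c[2] << 16) | ((long)c[3] << 24);
}

// Handshake: the module describes itself to the application.
int Fex_Bind(FexParam* p) {
    if (!p) return -1;
    std::strncpy(p->feat_name, "Gray-level histogram (example)", sizeof(p->feat_name) - 1);
    p->feat_name[sizeof(p->feat_name) - 1] = '\0';
    p->feat_fourcc = make_fourcc("GHST");
    g_param_fourcc[0] = make_fourcc("BINS");
    p->feat_param_no = 1;
    p->feat_param_fourcc = g_param_fourcc;
    p->feat_param_default = g_param_default;
    p->ftype = FRAME_RGB24;
    return 0;
}

// Initialize with the user's parameters (here: the number of bins).
int Fex_Init(double* params) {
    const int bins = params ? (int)params[0] : (int)g_param_default[0];
    if (bins <= 0 || bins > 256) return -1;
    g_bins = bins;
    return 0;
}

// Build a unit-length gray-level histogram; size receives the vector length.
// Assumption: the caller releases the result with delete[].
double* Fex_Extract(FrameParam fp, int& size) {
    assert(fp.buffer && fp.Xs > 0 && fp.Ys > 0);
    size = g_bins;
    double* fv = new double[g_bins]();  // zero-initialized
    const long n = (long)fp.Xs * fp.Ys;
    for (long i = 0; i < n; ++i) {
        const unsigned char* px = fp.buffer + 3 * i;
        const int gray = (px[0] + px[1] + px[2]) / 3;  // 0..255
        fv[gray * g_bins / 256] += 1.0;
    }
    double norm = 0.0;
    for (int b = 0; b < g_bins; ++b) norm += fv[b] * fv[b];
    norm = std::sqrt(norm);  // positive because n > 0
    for (int b = 0; b < g_bins; ++b) fv[b] /= norm;
    return fv;
}

// Reset the module to its default state.
int Fex_Exit(FexParam*) {
    g_bins = (int)g_param_default[0];
    return 0;
}

// Euclidean distance between two feature vectors of equal size.
double Fex_GetDistance(double* a, double* b, int size) {
    double d = 0.0;
    for (int i = 0; i < size; ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(d);
}
```

Following the naming rule above, such a module would be built as Fex_GHST.dll. The histogram is normalized inside Fex_Extract so that the unit-length requirement of the FeX framework holds for every returned vector.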

DBSEditor Feature Extraction Process

1. Looks for the FeX modules (DLLs) at startup, loads them and calls Fex_Bind of each module.

2. When the user enters feature extraction parameters, calls Fex_Init with the parameters to initialize the FeX module.

3. For each frame in the database, calls Fex_Extract with the FrameParam structure. The vector size should be returned as well as the feature vector. Feature vectors are stored.

4. Calls Fex_Exit to reset the FeX module.

5. To extract new features, steps 2 to 4 can be repeated.

6. While closing, calls Fex_Exit again.

MBrowser Query & Feature Extraction Processes

1. Looks for the FeX modules (DLLs) at startup, loads them and calls Fex_Bind of each module.

2. When the user queries a media item, calls Fex_Init with the chosen parameters to initialize the FeX module.

3. For each frame of the queried media, calls Fex_Extract with the FrameParam structure. The vector size should be returned as well as the feature vector.

4. Calls Fex_Exit to reset the FeX module.

5. Calls Fex_GetDistance for each frame in the database, with the associated feature vectors and the extracted feature vectors. Results are displayed accordingly.

6. To query new media, steps 2 to 5 can be repeated.

7. While closing, calls Fex_Exit again.
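Step 5 of the query process amounts to ranking database items by their distance to the query. The sketch below shows one way this could look; it is not MBrowser's actual code. The names rank_by_distance and l1_distance are hypothetical, and l1_distance merely stands in for a module's Fex_GetDistance, whose signature it copies.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Same signature as Fex_GetDistance in the slides.
typedef double (*DistanceFn)(double*, double*, int);

// Hypothetical stand-in for a module's distance function (L1 distance).
double l1_distance(double* a, double* b, int n) {
    double d = 0.0;
    for (int i = 0; i < n; ++i) d += std::fabs(a[i] - b[i]);
    return d;
}

static bool by_distance(const std::pair<double, std::size_t>& a,
                        const std::pair<double, std::size_t>& b) {
    return a.first < b.first;
}

// Return the indices of the top_n database items closest to the query,
// best match first. Ties keep their database order.
std::vector<std::size_t> rank_by_distance(std::vector<double>& query,
                                          std::vector<std::vector<double> >& db,
                                          DistanceFn dist, std::size_t top_n) {
    std::vector<std::pair<double, std::size_t> > scored;
    for (std::size_t i = 0; i < db.size(); ++i) {
        assert(db[i].size() == query.size());  // vectors must be comparable
        const double d = dist(query.data(), db[i].data(), (int)query.size());
        scored.push_back(std::make_pair(d, i));
    }
    std::stable_sort(scored.begin(), scored.end(), by_distance);
    if (scored.size() > top_n) scored.resize(top_n);
    std::vector<std::size_t> result;
    for (std::size_t k = 0; k < scored.size(); ++k) result.push_back(scored[k].second);
    return result;
}
```

Calling rank_by_distance with top_n = 12 would correspond to the first page of 12 retrieval results shown in MBrowser.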

New Features on MUVIS v1.6

• Audio-only clips are supported.
• Audio now has the same priority as video.
• Audio-based multimedia indexing and retrieval is supported.
• New file formats: MP3 and AAC.
• New video capturing formats: YUY2, UYVY, RGB32 and RGB24.

New Features on MUVIS v1.6 (cont.)

• New image format: JPEG 2000 (*.jp2, *.j2c).
• A/V direct stream access is supported.
• Key-frame (KF) storage of video clips in a MUVIS database is entirely optional.
• Image conversion is supported into one of these formats:
  • JPEG (with a compression quality factor)
  • JPEG 2000 (with a compression rate)
  • BMP
  • PNG
  • TIFF

A NOVEL MULTIMEDIA RETRIEVAL TECHNIQUE:

“PROGRESSIVE QUERY”

Progressive Query (PQ)

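[Figure: Progressive Query timeline. Periodic sub-query results become available at times t, 2t, 3t and 4t; each one is merged by a sub-query fusion step with the previous result to form the progressive sub-query result.]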

PQ Overview

1. Sub-Query Fusion Operation
2. Periodic Sub-Query Formation
   • Atomic Sub-Query
   • Sub-query rate estimation
   • Fractional Sub-Query
3. Progressive Sub-Query Formation (PSQ)
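The fusion idea can be sketched as follows, assuming that each sub-query yields a list of matches sorted by increasing distance and that fusion is a plain merge of two sorted lists. The slides do not spell out the fusion rule, so this is only one plausible reading; Match, fuse and progressive_query are hypothetical names, and sub-query rate estimation and fractional sub-queries are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct Match {
    int item_id;      // database item
    double distance;  // distance to the query item
};

static bool closer(const Match& a, const Match& b) { return a.distance < b.distance; }

// Fuse the previous progressive result with the newest periodic sub-query
// result. Both inputs must already be sorted by increasing distance.
std::vector<Match> fuse(const std::vector<Match>& psq, const std::vector<Match>& sub_query) {
    assert(std::is_sorted(psq.begin(), psq.end(), closer));
    assert(std::is_sorted(sub_query.begin(), sub_query.end(), closer));
    std::vector<Match> out;
    out.reserve(psq.size() + sub_query.size());
    std::merge(psq.begin(), psq.end(), sub_query.begin(), sub_query.end(),
               std::back_inserter(out), closer);
    return out;
}

// Each element of periodic_results holds the matches found during one period
// over one sub-set of the database. After every fusion the intermediate
// ranking could be shown to the user, who may browse it or stop the query.
std::vector<Match> progressive_query(const std::vector<std::vector<Match> >& periodic_results) {
    std::vector<Match> psq;
    for (std::size_t k = 0; k < periodic_results.size(); ++k) {
        std::vector<Match> sq = periodic_results[k];
        std::sort(sq.begin(), sq.end(), closer);
        psq = fuse(psq, sq);
    }
    return psq;
}
```

Because each fusion is a linear-time merge, an up-to-date ranking is available after every period without re-sorting all results retrieved so far.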

PROGRESSIVE QUERY OVERVIEW

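[Figure: Progressive Query flowchart. An atomic sub-query (ASQ) is performed only at the beginning of the first periodic sub-query. While the number of items in the ASQ is greater than 0, the size of the next fractional sub-query is calculated and the fractional sub-query is run; the test t > Tp-Tw decides when the periodic sub-query (Tp) is complete. Sub-query results are combined in the Fusion block, and the process ends at Stop.]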

Advantages of PQ (vs Normal Query)

• Reduced retrieval time (faster query speed).
• Independent of minimum system requirements (memory and speed).

Advantages of PQ (cont.)

• Query accessibility:
  • Instantaneous query retrievals.
  • Query browsing.
  • Query stop capability.
• The possibility of retrieving better (more relevant) results than the final outcome, due to the limited sub-set search within the database.

Audio-Based Multimedia Indexing and Retrieval

Framework within MUVIS

Audio-Based Multimedia Indexing and Retrieval Framework for MUVIS

A global framework implementation in order to achieve a robust and generic solution for audio-based multimedia indexing and retrieval, specifically:

• Generic support for audio codecs
• Generic support for file formats
• Generic support for audio capturing and encoding parameters
• Generic support for AFeX framework parameters

The main objective is content-based (speaker, subject, “sounds like..”) retrieval of the audio, which is suitable to human judgment and (aural) perception.

Audio Indexing Scheme in MUVIS

[Figure: audio indexing scheme, applied to the audio stream in four steps:
1. Classification and segmentation per granule/frame (Silence, Speech, Music, Fuzzy).
2. Audio framing in valid classes, with classification conversion (Uncertain, Speech, Music, Fuzzy).
3. AFeX operation per frame, using the AFeX module(s).
4. Key-frame (KF) extraction via MST clustering.
The resulting KF feature vectors (Speech, Music, Fuzzy) are used for audio indexing.]

2. Audio Framing with Classification Conversion

[Figure: classification per granule/frame (M: Music, S: Speech, X: Silence), for example M M M M M M M M S S S S S S S S X X X X X, is converted into a final classification per audio frame (Music, Speech or Uncertain).]

Audio Feature Extraction (AFeX) Framework

Independent AFeX module(s) integration capability into MUVIS framework for audio-based indexing and retrieval.

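[Figure: DBSEditor and MBrowser load the AFex_*.DLL modules through AFex_API.h, which declares AFex_Bind(), AFex_Init(), AFex_Extract(), AFex_GetDistance() and AFex_Exit(). Both applications call AFex_Bind, AFex_Init, AFex_Extract and AFex_Exit; AFex_GetDistance is drawn as a separate call.]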

Key-Framing via MST Clustering

[Figure: key-framing via MST clustering, illustrated on the utterance "Speech Lab" with the sound units 'S', 'p', 'ee', 'ch', 'L', 'a' and 'b' shown as clusters of numbered audio frames connected in a minimum spanning tree.]

A Sample AFeX Module Imp.: MFCC

The MFCC (Mel-Frequency Cepstrum Coefficients) AFeX module provides generic feature vectors that are independent of the following parameters:

• Sampling frequency.
• Number of audio channels (mono/stereo).
• Audio volume level.

$$c_i = \left(\frac{2}{P}\right)^{1/2} \cdot \sum_{j=1}^{P} \log m_j \cdot \cos\left(\frac{\pi i}{N}\,(j - 0.5)\right)$$
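The cepstral transform can be sketched in code, reading the slide's formula as the cosine transform of the log filter-bank outputs: c_i = (2/P)^(1/2) · Σ_{j=1..P} log m_j · cos((π i / N)(j − 0.5)). The slide does not define its symbols, so this sketch assumes that m_j is the j-th mel filter-bank output, P the number of filter-bank channels and N the cosine period (commonly N = P). The function name mfcc_from_filterbank is hypothetical, and the filter bank itself is not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compute cepstral coefficients c_1..c_num_coeffs from filter-bank outputs m.
// Assumptions: m[j-1] is the j-th (positive) mel filter-bank output,
// P = m.size(), and N is the cosine period from the formula.
std::vector<double> mfcc_from_filterbank(const std::vector<double>& m,
                                         int num_coeffs, int N) {
    assert(!m.empty() && num_coeffs > 0 && N > 0);
    const double kPi = std::acos(-1.0);
    const double P = static_cast<double>(m.size());
    std::vector<double> c(static_cast<std::size_t>(num_coeffs), 0.0);
    for (int i = 1; i <= num_coeffs; ++i) {
        double sum = 0.0;
        for (std::size_t j = 1; j <= m.size(); ++j) {
            assert(m[j - 1] > 0.0);  // the logarithm needs positive input
            sum += std::log(m[j - 1]) * std::cos(kPi * i / N * (j - 0.5));
        }
        c[static_cast<std::size_t>(i - 1)] = std::sqrt(2.0 / P) * sum;
    }
    return c;
}
```

With N = P, multiplying every m_j by the same gain adds a constant to every log m_j, and that constant cancels in the cosine sum for 1 ≤ i < P, so the coefficients are unchanged. This is one property of the formula that is consistent with the volume independence listed above.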

Audio Retrieval in MUVIS

[Figure: the sub-feature vectors FV(0) ... FV(i) of the query clip are compared with the sub-feature vectors FV(0) ... FV(i) of a database clip. Feature vectors are grouped per class type and only matching class types are compared. For each frame, a search is done to find the matching frame that gives the minimum distance.]

Audio Retrieval in MUVIS (cont.)

To perform an audio-based query within MUVIS, an audio clip is chosen from a multimedia database and queried through that database, provided the database includes at least one audio feature. Let NoS be the number of feature sets of a database and let NoF(s) be the number of sub-features of feature set s. Sub-features are obtained by changing the AFeX module parameters or the audio frame size during the audio feature extraction process.

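$$D_i(s,f) = \begin{cases} \min_{j \in C_i} \left( SD\left( QFV_i^{C_i}(s,f),\; DFV_j^{C_i}(s,f) \right) \right) & \text{if } j \in C_i \neq \emptyset \\ 0 & \text{if } j \in C_i = \emptyset \end{cases}$$

$$D(s,f) = \sum_{i \in C} D_i(s,f)$$

$$QD = \sum_{s}^{NoS} \sum_{f}^{NoF(s)} W(s,f) \times D(s,f)$$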
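The distance computation can be sketched as follows, reading the slide's formulas as: for every query frame, take the minimum similarity distance SD to the database frames of the same class type (0 if the database clip has no frame of that class), sum these minima to obtain D(s,f), and form the query distance QD as the weighted sum of D(s,f) over all feature sets s and sub-features f with weights W(s,f). Everything concrete here is an assumption: the Euclidean similarity distance, the integer class labels, the flattening of the (s, f) pairs into one index and all identifiers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct AudioFrame {
    int class_id;            // hypothetical labels, e.g. 0 = speech, 1 = music, 2 = fuzzy
    std::vector<double> fv;  // feature vector of this frame
};

// Similarity distance SD; Euclidean distance is an assumption.
double similarity_distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) d += (a[k] - b[k]) * (a[k] - b[k]);
    return std::sqrt(d);
}

// D(s,f) for one sub-feature: sum over the query frames of the minimum
// distance to a database frame of the same class type.
double sub_feature_distance(const std::vector<AudioFrame>& query,
                            const std::vector<AudioFrame>& db) {
    double total = 0.0;
    for (const AudioFrame& q : query) {
        double best = std::numeric_limits<double>::infinity();
        for (const AudioFrame& d : db) {
            if (d.class_id == q.class_id) {
                best = std::min(best, similarity_distance(q.fv, d.fv));
            }
        }
        // A class type that is absent from the database clip contributes 0.
        if (std::isfinite(best)) total += best;
    }
    return total;
}

// QD: weighted sum of the per-sub-feature distances, with the (s, f) pairs
// flattened into a single index.
double query_distance(const std::vector<double>& W, const std::vector<double>& D) {
    assert(W.size() == D.size());
    double qd = 0.0;
    for (std::size_t k = 0; k < W.size(); ++k) qd += W[k] * D[k];
    return qd;
}
```

Restricting the search to matching class types keeps, for instance, speech frames from being compared with music frames, which matches the "Matching Class Types" label in the retrieval figure.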

Experiments in MUVIS Framework:

“Visual Compression Effects on Query Performances”

“Compression Effects on Query Performances”

We used the lossy JPEG image compression method. Why?
• Widely used.
• Adjustable quality parameter (compression ratio).

We used the MPEG-4 and H.263+ video compression methods. Why?
• Widely used.
• Allow integrating scene-cut detection algorithms (key frames for features).
• Adjustable target bit rate.

Features Used in the Experiments

We used:
• HSV and YUV color histograms.
• Texture features via the Gabor Wavelet Transform (GWT) technique.
• Texture features via the Gray Level Co-occurrence Matrix (GLCM) technique.

Why?
• Commonly used features in the indexing and retrieval area.
• Practically feasible, with reasonable computational complexity.
• The HSV and YUV color spaces are close to the human visual system.

Experimental Databases

3 base databases created by AVDatabase and Idatabase.

Base Image Database: general-purpose database.
• 1594 uncompressed color images from several sources.
• Various sizes (from 100x100 up to 1200x1000 pixels) and color bit depths (8-24 bits).

Base Rock Texture Image Database: constrained database.
• 1512 uncompressed gray-scale texture images of 6 different rock classes.
• 166x166 pixels in size, with 8 bits of luminance depth.

Base Video Database: general-purpose database.
• 300 video clips from TV channels.
• CIF size (352x288 pixels).
• MPEG-4 compressed at a bit rate of 2 Mbits/sec.
• Video clip durations between 40 and 120 seconds.

Experimental Databases (cont.)

7 more image databases generated from the base image database.
• Each one has a unique JPEG compression quality (10% to 75%).
• Together with the base image database, used for the HSV and YUV color histogram based experiments.

10 more video databases generated from the base video database.
• Each one has a unique {compression method, bit rate} pair, e.g. {MPEG-4, 1024 Kbits/sec}.
• Used for the HSV and YUV color histogram based experiments.

7 more texture databases generated from the base texture image database.
• Each one has a unique JPEG compression quality (10% to 75%).
• Together with the base texture image database, used for the GWT and GLCM texture based experiments.

Subjective Evaluation of Retrieval Experiments

• An evaluation group of 10 people.
• Performance (P), Query Performance (QP) and Overall Query Performance (OQP) values are calculated.

Grade   Subjective Meaning
0       No similarity / Not related
1       Slightly related
2       Related
3       Similar
4       Fairly similar
5       Same / Almost identical

$$P = \sum_{i=2}^{12} G_i \cdot W_i$$

where $G_i \in [0,5]$ is the given subjective grade and $W_i = (13 - i)$ is the weight of the item at rank $i$.

$$QP = \frac{\sum_{j=1}^{NP} P_j}{NP} \qquad OQP = \frac{\sum_{k=1}^{N} QP_k}{N}$$

where $N$ is the number of queries and $NP$ (= 10) is the size of the evaluation group.

A Sample Retrieval Setup

Rank   Grade
2      5
3      3
4      4
5      0
6      3
7      2
8      1
9      1
10     1
11     1
12     3

P = (5*11) + (3*10) + (4*9) + (0*8) + (3*7) + (2*6) + (1*5) + (1*4) + (1*3) + (1*2) + (3*1) = 171
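The worked example above can be reproduced with a short sketch of the scoring rule P = Σ G_i · W_i over ranks 2 to 12, with W_i = 13 − i. The function names performance and mean are hypothetical; mean applies to both averaging steps, QP (over the evaluators) and OQP (over the queries).

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// P = sum over ranks i = 2..12 of G_i * W_i, with W_i = 13 - i.
// grades[0] is the grade given to rank 2, grades[10] the grade of rank 12.
int performance(const std::vector<int>& grades) {
    assert(grades.size() == 11);
    int p = 0;
    for (std::size_t k = 0; k < grades.size(); ++k) {
        const int rank = static_cast<int>(k) + 2;
        assert(grades[k] >= 0 && grades[k] <= 5);  // subjective grades are 0..5
        p += grades[k] * (13 - rank);
    }
    return p;
}

// Arithmetic mean, usable for QP (mean of P over the evaluation group)
// and for OQP (mean of QP over all queries).
double mean(const std::vector<double>& values) {
    assert(!values.empty());
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}
```

Passing the grades of the sample table, {5, 3, 4, 0, 3, 2, 1, 1, 1, 1, 3}, to performance gives 171, in agreement with the sum above. The largest possible value is 5 × (11 + 10 + ... + 1) = 330.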

Color Based Image Retrieval: Performance Results

• 30 images were selected from the base uncompressed database.
• In each database, the selected 30 images (compressed with the corresponding compression quality) were queried.
• Grades were collected to obtain the OQP values.

Conclusions:
• The best color based retrieval performance is obtained in the uncompressed image database.
• Increasing the compression ratio gradually decreases the retrieval performance.
• The HSV color histogram performs better than the YUV color histogram in the uncompressed domain, and vice versa in the compressed domain.
• In general, a quality parameter of 30% achieves a satisfying compression ratio (94.4%) and does not cause a significant loss in the overall retrieval performance.

[Figure: OQP value (axis from 100 to 180) versus JPEG compression quality (Base, 75, 50, 40, 30, 25, 15, 10) for the HSV and YUV color histograms.]

Color Based Video Retrieval: Performance Results

• 15 video clips were selected from the base video database.
• In each database, the selected 15 video clips (compressed with the corresponding compression bit rate) were queried.
• Grades were collected to obtain the OQP values.

Conclusions:
• Color based video retrieval performance is better in the MPEG-4 compressed databases.
• Performance is more stable in the H.263+ compressed databases at lower bit rates.
• It might be expected that, in H.263+ compressed databases, retrieval performance is better at very low bit rates.
• The HSV color histogram gives slightly better performance.
• 512 Kbits/sec can be the optimum compression bit rate at high bit rates.
• For low bit rate applications, 128 Kbits/sec for MPEG-4 and 64 Kbits/sec for H.263+ can be preferable.

[Figure: OQP value (axis from 100 to 180) versus video compression bit rate (1024, 512, 256, 128, 64) for HSV / MPEG-4, YUV / MPEG-4, HSV / H.263+ and YUV / H.263+.]

Texture Based Image Retrieval: Performance Results

• 15 images were selected from the base image database.
• In each database, the selected 15 images (compressed with the corresponding compression quality) were queried.
• Grades were collected to obtain the OQP values.

Conclusions:
• Texture based image retrieval gives more robust performance results than color based retrieval.
• The GLCM feature extraction technique gives higher performance than the GWT technique.
• Retrieval performance based on GWT is more stable.
• A compression quality of 25% achieves the highest compression ratio without losing significant retrieval performance.

[Figure: OQP value (axis from 100 to 320) versus JPEG compression quality (Base, 75, 50, 40, 30, 25, 15, 10) for the Gabor (GWT) and GLCM texture features.]

Conclusion and Remarks

• The MUVIS system is designed to bring a unified and global solution to the content-based multimedia indexing and retrieval problem.
• Furthermore, it supports several user-friendly media browsing capabilities.
• The main objective is to focus on multi-feature support, which possibly uses shape, texture and audio along with the color information.
• The MUVIS framework explicitly supports the integration of aural and visual feature extraction algorithms.

Conclusion and Remarks (cont.)

Following visual features are already implemented and integrated into MUVIS framework:

• HSV, RGB and YUV color histogram bins.
• Texture feature using the Gabor Wavelet Transform.
• Texture feature using GLCM (Gray Level Co-occurrence Matrix).
• Canny edge histogram feature.

Conclusions and Remarks (cont)

• Audio is important. Sometimes it bears more semantic and content information than the visual part.
• The preliminary results show the effectiveness of audio-based retrieval compared to visual retrieval (similar or better results).
• The classification and segmentation algorithm has recently been improved. A new approach based on fuzzy regions and semantic-rule-based classification with intra-segment boundary detection has been developed.

Conclusions and Remarks (cont.)

• MFCC proves to be as successful a tool for audio-based indexing and retrieval as it is in speech recognition and speaker identification systems.
• Some problems are still open:
  • Normalization of AFeX feature vectors (for merging).
  • Other AFeX modules should be developed and tested.
  • Merging of visual and aural retrievals should be studied to improve the overall retrieval performance.

Conclusions and Remarks (cont.)

MUVIS provides the following advantages via Progressive Query:

• Query accessibility:
  • Instantaneous query retrievals.
  • Query browsing.
  • Query stop capability.
• The possibility of retrieving better (more relevant) results than the final outcome, due to the limited sub-set search within the database.
• Faster retrievals with low memory consumption.