Hierarchical Cellular Tree: An Efficient Indexing Scheme for Content-Based Retrieval on Multimedia Databases

Serkan Kiranyaz and Moncef Gabbouj

Objective

• To present the technique of using a Hierarchical Cellular Tree (HCT) as an indexing scheme for content-based retrieval on multimedia databases.

Why is this technique important?

• Technological hardware and network improvements

• Daily usage of the Internet

• Technique reduces costly I/O operations

HCT Overview

• Is a MAM (Metric Access Method) technique.

• Based on the M-tree

• Is a dynamic, cell-based, hierarchically structured indexing method

• Items are partitioned based on distances and stored within cells based on their similarity proximity

• Self-organized tree implemented via genetic programming principles

Indexing Technique Categories

SAM (spatial access method)

• (Dis-)similarity distance is measured only through Euclidean distance.
  o Not suited for deep spanning trees

MAM (metric access method)

• Support a black-box approach to (dis-)similarity distance.
  o Allows for deep trees

• Do not support dynamic changes*
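
The "black box" point can be illustrated with a minimal sketch: a metric access method only needs a function that returns the distance between two items, so the items do not have to be vectors at all. The names `hamming` and `range_query` are hypothetical, and the query below is a plain linear scan rather than an actual index; it only shows that nothing beyond `dist()` is required.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming() expects strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def range_query(items: Sequence[T], query: T, radius: float,
                dist: Callable[[T, T], float]) -> List[T]:
    """Linear-scan range query: the only thing it needs is dist()."""
    return [x for x in items if dist(query, x) <= radius]


words = ["cat", "cot", "dog"]
near = range_query(words, "cat", 1, hamming)
```

Swapping `hamming` for any other metric (for example `math.dist` on feature vectors) changes nothing in `range_query`, which is the property a MAM relies on.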

*M-tree Similarities

• Is a dynamic MAM

• Has a hierarchical structure based on the mitosis of a cell
  o Tree grows one level upwards whenever a split occurs at the top level

• Each cell is represented by a nucleus (except the topmost cell)

M-tree Problems

• Achieves a balanced tree with low I/O cost in large datasets
  o Problem: Multimedia databases are seldom balanced at all.
  o HCT: Cells are unbalanced and can vary in size

• Must know the size of the database entries/cells before building (capacity M)
  o Problem: All M-tree structures can hit upper limits (size is not dynamic)
  o HCT: Removes the limit on cell size as long as cells keep a definite "compactness" measure

M-tree Problems

• M-tree compactness is measured only with respect to the distance from the nucleus to the furthest object (covering radius)
  o Problem: Determining compactness this way does not allow for dynamic sizing of cells.
  o HCT: Uses all cell items and their minimum distances to the cell (instead of a single nucleus item alone); compactness is constantly being updated.
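
As a small illustration of the covering radius mentioned above, the sketch below computes it for a toy cell of 2-D points. The function name `covering_radius` and the sample data are assumptions made for this example; Euclidean distance (`math.dist`) stands in for whatever (dis-)similarity measure is actually used.

```python
import math


def covering_radius(nucleus, items, dist=math.dist):
    """Largest distance from the nucleus to any item of the cell."""
    return max((dist(nucleus, x) for x in items), default=0.0)


cell = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tight = covering_radius((0.0, 0.0), cell)                        # 1.0
with_outlier = covering_radius((0.0, 0.0), cell + [(3.0, 4.0)])  # 5.0
```

A single distant item moves the measure from 1.0 to 5.0 while the other items are ignored, which is why a measure built on all cell items can describe the cell more faithfully.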

Related Work in Multimedia Databases (SAM trees)

• KD-Trees
  o Hierarchical tree structure
  o Use space-partitioning methods to divide the feature space into predefined hyperplanes

• R-Trees
  o Feature space divided according to distribution of database items
  o Region overlapping may occur

Related Work in Multimedia Databases (SAM trees)

• R*-trees
  o Improves the node splitting of R-tree by taking overlapping areas into consideration

• TV-tree
  o Uses telescope vectors
  o Authors call telescope vectors "so called telescope vectors"
  o Google search does not come up with anything meaningful for telescope vectors

Related Work in Multimedia Databases (SAM trees)

• X-tree
  o Avoids overlapping of region bounding boxes by using a new organization of the directory
  o Boxes can still intersect at higher levels in the tree
  o Paper does not go into detail on what a bounding box is (assumption: bounding box = cell)

• SS-tree
  o Uses minimum bounding spheres instead of boxes
  o Fewer intersections at higher levels

Related Work in Multimedia Databases (MAM trees)

• vp-tree (vantage point)
  o Organizes feature vectors (data points) into two groups according to their similarity distances with respect to a single point (the vantage point)

• mvp-tree (multiple vantage point)
  o Assigns multiple vantage points instead of one
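
The vp-tree grouping can be sketched in a few lines. Splitting at the median distance is the usual textbook choice and is an assumption here, since the slide only says "two groups"; `vp_split` and the sample points are hypothetical.

```python
import math
import statistics


def vp_split(points, vantage, dist=math.dist):
    """Split points into those near and those far from the vantage point."""
    others = [p for p in points if p != vantage]
    mu = statistics.median(dist(vantage, p) for p in others)
    inner = [p for p in others if dist(vantage, p) <= mu]
    outer = [p for p in others if dist(vantage, p) > mu]
    return mu, inner, outer


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
mu, inner, outer = vp_split(points, vantage=(0.0, 0.0))
```

A full vp-tree would apply the same split recursively to `inner` and `outer`; an mvp-tree would use more than one vantage point per node.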

HCT Structure - Cell Structure

• Basic container in which similar database items are stored.

• Ground level cells contain all of the database items

• Cells carry an MST (Minimum Spanning Tree)
  o Holds minimum (dis-)similarity distance of each item to other items within the cell.
  o Used to determine when mitosis should occur.
    ▪ Splits occur at longest branch.
  o This is actually very similar to MVP-tree except every cell is treated as a vantage point, which gives a better idea about the similarity proximity of an item.
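
The MST-based split can be sketched as follows: build a minimum spanning tree over the cell items, break its longest branch, and take the two connected groups that remain. This is a minimal illustration under stated assumptions, not the authors' implementation: Prim's algorithm, Euclidean distance, and the names `minimum_spanning_tree` and `split_at_longest_branch` are all choices made for the example.

```python
import math


def minimum_spanning_tree(items, dist=math.dist):
    """Prim's algorithm. Returns MST branches as (weight, i, j) index tuples."""
    if len(items) < 2:
        return []
    # best[j] = (weight, i): cheapest known branch linking item j to the tree
    best = {j: (dist(items[0], items[j]), 0) for j in range(1, len(items))}
    branches = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        weight, i = best.pop(j)
        branches.append((weight, i, j))
        for k in best:  # relax remaining items against the newly added one
            d = dist(items[j], items[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return branches


def split_at_longest_branch(items, dist=math.dist):
    """Break the longest MST branch and return the two resulting item groups."""
    branches = minimum_spanning_tree(items, dist)
    if not branches:
        raise ValueError("need at least two items to split")
    longest = max(branches)  # tuples compare by weight first
    neighbours = {i: [] for i in range(len(items))}
    for branch in branches:
        if branch != longest:
            _, i, j = branch
            neighbours[i].append(j)
            neighbours[j].append(i)
    # Collect everything still connected to one end of the broken branch.
    seen, stack = {longest[1]}, [longest[1]]
    while stack:
        for v in neighbours[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    first = [items[i] for i in range(len(items)) if i in seen]
    second = [items[i] for i in range(len(items)) if i not in seen]
    return first, second


cell = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
first, second = split_at_longest_branch(cell)
```

In this toy cell the longest branch is the one bridging the two clusters, so breaking it separates the three items near the origin from the two items near (10, 10).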

HCT Structure - Cell Structure

• Cells cannot undergo mitosis before reaching a specific level of maturity
  o This works like real cells
  o The reason for it, however, is not the same as in real cells

• Nucleus
  o Represents the owner cell on the higher level
  o Nucleus is found through MST
    ▪ Item with maximum number of branches
  o Nucleus is updated with every operation performed
    ▪ M-tree does not do this
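
Picking the item with the most MST branches amounts to finding the highest-degree node of the tree. A minimal sketch, assuming the MST is available as a list of `(i, j)` index pairs; `pick_nucleus` is a hypothetical name, and breaking ties by the lowest index is an assumption, since the slides do not say how ties are resolved.

```python
from collections import Counter


def pick_nucleus(mst_branches):
    """Return the index of the item that has the most MST branches."""
    degree = Counter()
    for i, j in mst_branches:
        degree[i] += 1
        degree[j] += 1
    # Highest degree wins; ties go to the lowest index (assumption).
    return min(degree, key=lambda k: (-degree[k], k))


branches = [(0, 1), (1, 2), (1, 3), (3, 4)]
nucleus = pick_nucleus(branches)
```

Here item 1 takes part in three branches, more than any other item, so it is selected.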

HCT Structure - Cell Structure

• Cell Compactness
  o How tightly focused the clustering is for items within the cell
  o High variations are eliminated by using more than a single item (vantage point)

HCT Structure - Cell Structure

• Cell Mitosis
  o Two conditions for mitosis
    ▪ Maturity (Nc > Nm)
      • Nc = number of items in cell
      • Nm = maturity minimum limit
    ▪ Cell Compactness (CFc > CThrL)
      • CFc = Compactness feature
      • CThrL = current level compactness threshold
  o Cell mitosis has no additional cost, as the cell is simply split by breaking the longest branch of its MST
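
The two conditions can be written as a single predicate. This is a sketch under the assumption that both conditions must hold at the same time, which is how the slide reads; the parameter names simply mirror the symbols Nc, Nm, CFc and CThrL, and the numeric values are made up.

```python
def should_undergo_mitosis(n_c, n_m, cf_c, c_thr_l):
    """True when the cell is mature (Nc > Nm) and exceeds the level threshold (CFc > CThrL)."""
    is_mature = n_c > n_m
    exceeds_threshold = cf_c > c_thr_l
    return is_mature and exceeds_threshold


mature_and_loose = should_undergo_mitosis(n_c=10, n_m=6, cf_c=0.9, c_thr_l=0.5)
immature = should_undergo_mitosis(n_c=5, n_m=6, cf_c=0.9, c_thr_l=0.5)
mature_but_compact = should_undergo_mitosis(n_c=10, n_m=6, cf_c=0.3, c_thr_l=0.5)
```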

HCT Structure - Cell Structure

HCT Structure - Level Structure

• Top level is always a single cell
  o If mitosis occurs on the top level, a new top level is created to preserve the single-cell top level.

• Each level attempts to dynamically maximize compactness of cells

HCT Structure - HCT Operations

• Three operations
  o Cell mitosis
  o Item insertion
  o Item removal

• As stated before, all three operations cause a recalculation of compactness

HCT Structure - HCT Operations

• Insert
  o First performs the Pre-Emptive cell search
    ▪ Recursively descends the HCT from the top to the target level
  o Once the target is located, insert the item into the target cell
  o Perform post-processing check
    ▪ Check for mitosis
    ▪ Recalculate compactness for single or multiple cells
  o If mitosis was performed
    ▪ Remove old nucleus item from higher level
    ▪ Consecutively call Insert for each new nucleus
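
The descend-then-insert part of the procedure can be sketched on a toy structure. Two things here are assumptions: the slides do not spell out the criterion used by the Pre-Emptive cell search, so the sketch simply follows the nearest nucleus at each level, and the post-processing steps (mitosis check, compactness update, nucleus re-insertion) are left out. `Cell`, `find_target_cell` and `insert` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cell:
    nucleus: tuple
    items: list = field(default_factory=list)     # items held by a ground-level cell
    children: list = field(default_factory=list)  # cells one level below


def find_target_cell(cell, item, dist=math.dist):
    """Descend from the top cell, following the nearest nucleus (simplification)."""
    while cell.children:
        cell = min(cell.children, key=lambda c: dist(item, c.nucleus))
    return cell


def insert(root, item, dist=math.dist):
    """Locate the target ground-level cell and append the item to it."""
    target = find_target_cell(root, item, dist)
    target.items.append(item)
    return target  # post-processing (mitosis, compactness) would start here


root = Cell(nucleus=(0.0, 0.0), children=[
    Cell(nucleus=(0.0, 0.0), items=[(0.0, 0.0), (1.0, 1.0)]),
    Cell(nucleus=(10.0, 10.0), items=[(10.0, 10.0)]),
])
target = insert(root, (9.0, 9.0))
```

The new item lands in the cell whose nucleus is (10, 10); a greedy descent like this can miss a closer item in a neighbouring cell, which is the kind of case a more thorough search is meant to handle.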

HCT Structure - HCT Indexing

• HCT can index using any set of available features
  o Must have fusion mechanism
  o Must have similarity measure

• Consists of two operations
  o Incremental construction
  o Optional periodic fitness check

HCT Structure - HCT Indexing

• HCT Incremental Construction
  o Takes a database D and appends all new items contained in an array
  o If an HCT does not already exist for database D
    ▪ All current items of D are inserted into the array
    ▪ A new HCT body is constructed from D
  o Else if an HCT does exist for database D
    ▪ HCT body is first loaded
    ▪ HCT body is updated with contents of the array
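
The two branches of incremental construction can be sketched as below. Everything concrete here is a stand-in: the HCT body is represented by a plain list, `insert_item` is a caller-supplied hypothetical insertion routine, and loading an existing body is reduced to passing it in as an argument.

```python
def incremental_construction(existing_body, database_items, new_items, insert_item):
    """Build a new body from all items, or update an existing body with the new ones."""
    if existing_body is None:
        # No HCT yet: the current database items join the array of new items.
        body = []
        pending = list(database_items) + list(new_items)
    else:
        # HCT already exists: only the new items need to be inserted.
        body = existing_body
        pending = list(new_items)
    for item in pending:
        insert_item(body, item)
    return body


def append_item(body, item):
    """Placeholder for the real Insert operation (assumption)."""
    body.append(item)


fresh = incremental_construction(None, ["a", "b"], ["c"], append_item)
updated = incremental_construction(["a", "b"], ["a", "b"], ["c"], append_item)
```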

HCT Structure - HCT Indexing

• HCT Fitness Check
  o Aims to minimize corruption which can happen during construction of the HCT body
    ▪ Corruption happens because the order in which items are inserted is not controlled
  o Outliers Check
    ▪ Reduces the "crowd effect" by removing redundant minority cells
      • Minority cells: cells with only one or a few items
    ▪ The items of all minority cells are reintroduced into the system to see if they fit into another cell

HCT Structure - HCT Indexing

  o Cell Merging
    ▪ If a cell split occurred that is later deemed unnecessary under the cell compactness requirements, the resulting cells can be merged.

HCT - Examples

HCT - Examples

Q&A