FastMap: Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets

Abstract

Describe a fast algorithm to map objects into points in some k-dimensional space, such that the dis-similarities are preserved.

Thus, we can subsequently use fine-tuned spatial access methods (SAMs) to answer queries such as "query by example" or "all pairs query".

Introduction

It is not easy to design k feature-extraction functions that map objects to k-dimensional points

For instance, for typed English words, what distance function should we use to measure the cost of transforming one string into the other?

Solutions

Old: Multi-Dimensional Scaling (MDS)
  Unsuitable for indexing

Proposed: Fast algorithm
  Much faster
  Allows indexing

Applications

Image and multimedia databases
Medical databases
String databases, e.g. OCR
Time series, e.g. financial data
Data mining and visualization applications

Desirable types of queries

Query-by-example: search a collection of objects to find the ones that are within a user-defined distance from the query object

All pairs query: find the pairs of objects that are within a given distance of each other
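The two query types can be sketched as plain linear scans, which is the baseline that a mapping plus a SAM is meant to speed up. This is only an illustration: the function names, the `radius` parameter, and the toy `abs_diff` distance are assumptions made for the example, not part of the slides.

```python
from itertools import combinations

def query_by_example(objects, query, dist, radius):
    """Return the objects within `radius` of `query` (brute-force scan)."""
    return [o for o in objects if dist(o, query) <= radius]

def all_pairs(objects, dist, radius):
    """Return every unordered pair of objects within `radius` of each other."""
    return [(a, b) for a, b in combinations(objects, 2) if dist(a, b) <= radius]

def abs_diff(a, b):
    """Toy distance function on numbers (hypothetical stand-in for D)."""
    return abs(a - b)

data = [1.0, 2.5, 4.0, 9.0]
near = query_by_example(data, 3.0, abs_diff, 1.0)
pairs = all_pairs(data, abs_diff, 1.5)
```

The first scan calls the distance function N times and the second about N^2/2 times, which is why an index over mapped points is attractive.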

Benefit of mapping objects

Accelerates the search time for queries, by employing SAMs like R*-trees and z-ordering

Helps with visualization, clustering and data-mining

Ideal mapping fulfills…

Fast to compute: O(N) or O(N log N), but not O(N^2)

Preserves distances with little discrepancy

Should be very fast to map a new object

MDS

Used to discover the underlying (spatial) structure of a set of data items from the (dis)similarity information

Maps objects to a k-dimensional space, so as to minimize the stress function

Stress function: the average difference between the distances of the "images" and the actual distances
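The slide describes stress only in words. A common concrete form, and the one used in the FastMap paper as far as it can be stated with confidence, is the normalised root of the squared differences: stress = sqrt( sum (d'_ij - d_ij)^2 / sum d_ij^2 ). The sketch below assumes that form; the function name and the matrix layout (lists of lists) are choices made for the example.

```python
import math

def stress(d_true, d_image):
    """Normalised stress between two N x N distance matrices.

    d_true[i][j]  : original distance between objects i and j
    d_image[i][j] : distance between their images in the target space
    """
    n = len(d_true)
    num = 0.0  # sum of squared discrepancies
    den = 0.0  # sum of squared original distances
    for i in range(n):
        for j in range(n):
            num += (d_image[i][j] - d_true[i][j]) ** 2
            den += d_true[i][j] ** 2
    return math.sqrt(num / den)

d_true = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
d_image = [[0, 3, 4], [3, 0, 4], [4, 4, 0]]  # one distance is off by 1
perfect = stress(d_true, d_true)
distorted = stress(d_true, d_image)
```

A value of 0 means every distance is preserved exactly; larger values mean a worse mapping.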

Drawbacks of MDS

Requires O(N^2) time, which is impractical for large databases

Fast retrieval is questionable, as MDS is not prepared for the "query-by-example" operation

Definitions

The k-d point $P_i$ that corresponds to the object $O_i$ will be called the 'image' of object $O_i$. That is, $P_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k})$

The k-d space containing the 'images' will be called the target space

Proposed algorithm

Assumption: a domain expert has only provided us with a distance/dis-similarity function D(*, *)

For instance, the Euclidean distance between two feature vectors can serve as the distance function between the corresponding objects

Pretend that the objects are indeed points in some unknown n-dimensional space, and try to project these points on k mutually orthogonal directions

The challenge is to compute these projections from the distance matrix only

Project the objects on a carefully selected "line"

Choose two objects $O_a$ and $O_b$ to be the "pivot objects"; the line passes through them

Compute the distance of each point from the pivot points using only information we know, i.e., the distances between objects

[Figure: triangle formed by the pivot objects $O_a$ and $O_b$ and an object $O_i$, with $x_i$ marked on the line $O_a O_b$]

By the cosine law, in any triangle $O_a O_i O_b$:

$$d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 - 2 x_i d_{a,b}$$

where $d_{i,j}$ is shorthand for the distance $D(O_i, O_j)$

By simple algebraic manipulation:

$$x_i = \frac{d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2}{2 d_{a,b}}$$

We can thus map objects into points on a line, preserving some of the distance information
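As a quick check of the projection formula, the sketch below computes the coordinate from the three pairwise distances alone and compares it with a case where the geometry is known. The function name `project` and the sample points are assumptions made for the example.

```python
import math

def project(d_ai, d_bi, d_ab):
    """Coordinate of object O_i on the line through pivots O_a and O_b.

    Uses only distances: x_i = (d_ai^2 + d_ab^2 - d_bi^2) / (2 * d_ab).
    """
    return (d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2.0 * d_ab)

# Known geometry, used only to produce the distances: the pivots lie on the
# x-axis, so the projection of O_i should be its x-coordinate, 1.0.
O_a, O_b, O_i = (0.0, 0.0), (4.0, 0.0), (1.0, 2.0)
x_i = project(math.dist(O_a, O_i), math.dist(O_b, O_i), math.dist(O_a, O_b))
x_a = project(0.0, 4.0, 4.0)  # pivot O_a maps to 0
x_b = project(4.0, 0.0, 4.0)  # pivot O_b maps to d_ab
```

Note that the function never sees the coordinates of the points, only their distances, which is the situation the algorithm assumes.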

Solved: 2-d space

Next: extend to higher dimensions
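The slides do not spell out how the extension works. In the original FastMap paper the idea is to project the objects onto the hyperplane perpendicular to the line $O_a O_b$ and to repeat the one-dimensional step there, using a distance between the projections that again depends only on known quantities. The primed symbols $D'$, $O'_i$ and $O'_j$ below are notation assumed for this sketch:

```latex
% Distance between the projections O'_i and O'_j on the hyperplane
% perpendicular to the pivot line, where x_i and x_j are the
% coordinates already computed on that line:
\left( D'(O'_i, O'_j) \right)^2
  = \left( D(O_i, O_j) \right)^2 - (x_i - x_j)^2
```

Each new axis is then obtained by applying the same projection formula to $D'$ instead of $D$.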

Each of the k recursive calls determines the coordinates of the N objects on a new axis

The "pivot objects" of each recursive call are recorded to facilitate queries

The pivot objects are chosen by a heuristic algorithm
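The steps above can be put together in a minimal sketch. Several points are assumptions rather than content of the slides: the loop is iterative instead of recursive, the pivot heuristic shown (start from an arbitrary object, take the farthest object from it, then the farthest from that one) is the one recalled from the original paper, and all names (`fastmap`, `map_new_object`, `residual2`) are made up for the example.

```python
import math

def fastmap(objects, dist, k):
    """Map objects to k-d points; returns (coords, pivots)."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    pivots = []  # one (index_a, index_b, d_ab) entry per axis

    def residual2(i, j, col):
        """Squared distance left after removing the first `col` axes."""
        val = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            val -= (coords[i][c] - coords[j][c]) ** 2
        return max(val, 0.0)  # guard against rounding below zero

    for col in range(k):
        # Heuristic pivot choice: farthest from object 0, then farthest from that.
        a = max(range(n), key=lambda i: residual2(0, i, col))
        b = max(range(n), key=lambda i: residual2(a, i, col))
        d_ab2 = residual2(a, b, col)
        if d_ab2 == 0.0:
            break  # no distance left to explain; remaining coordinates stay 0
        d_ab = math.sqrt(d_ab2)
        pivots.append((a, b, d_ab))
        for i in range(n):
            coords[i][col] = (
                residual2(a, i, col) + d_ab2 - residual2(b, i, col)
            ) / (2.0 * d_ab)
    return coords, pivots

def map_new_object(query, objects, dist, coords, pivots):
    """Map a query object using the recorded pivots (two distance calls per axis)."""
    q = []
    for col, (a, b, d_ab) in enumerate(pivots):
        def to_pivot2(p):
            val = dist(query, objects[p]) ** 2
            val -= sum((q[c] - coords[p][c]) ** 2 for c in range(col))
            return max(val, 0.0)
        q.append((to_pivot2(a) + d_ab ** 2 - to_pivot2(b)) / (2.0 * d_ab))
    return q

# Toy data: 2-d Euclidean points, mapped back into 2 dimensions.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0), (2.0, 1.0)]
coords, pivots = fastmap(points, math.dist, 2)
q = map_new_object((1.0, 1.0), points, math.dist, coords, pivots)
```

On this toy input the target space has as many dimensions as the data, so the pairwise distances are reproduced up to rounding; with a smaller k or a non-Euclidean distance function, some discrepancy is expected. Recording the pivots is what lets `map_new_object` place a query with only two distance computations per axis.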

All steps are linear in N

Complexity is O(N k)

Experiments

Compare FastMap with MDS: speed and quality

Illustrate the visualization and clustering abilities on real and synthetic datasets

Comparison with MDS

Response time vs. database size
Response time vs. number of dimensions k
Response time vs. stress

Clustering/visualization properties of FastMap

Conclusion

A fast algorithm to map objects into points in k-d space

Accelerates searching by using highly optimized SAMs, e.g. R-trees, R*-trees, etc.

Application of the algorithm to multimedia databases, data-mining, clustering, document retrieval, etc.

Reference

Christos Faloutsos, King-Ip (David) Lin. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets.

Joseph B. Kruskal, Myron Wish. Multidimensional Scaling.