Large-Scale Content-Based Image Retrieval

Project Presentation
CMPT 880: Large Scale Multimedia Systems and Cloud Computing

Under the supervision of Dr. Mohamed Hefeeda
By: Ahmed Abdelsadek ([email protected])

Outline
•Introduction
•Project Scope
•Work Flow
•Image Features
•Indexing and Retrieval
•Matching
•Evaluation
•Conclusion

Introduction
•Current image search engines rely heavily on text to retrieve images.
▫The user provides keywords, and images that have those keywords in the filename or in nearby HTML are candidates for retrieval.
•In this project we try content-based retrieval techniques, where the query is itself an image.

Project Scope
•Similarity using local features.
•Extract features from the reference images.
•Index these features in an efficient data structure in a scalable, large-scale environment.
•Process query images.
•Search and match.
•This project is NOT:
▫Recognition, classification, or categorization.

Work Flow
[Figure: workflow diagram with three phases (Building, Querying, Matching).
Building: Reference Multimedia Object → Generate Feature Points → Build KD-Tree Index Bins → Distributed Storage (Save).
Querying: Query Multimedia Object → Generate Feature Points → Direct to KD-Tree Index Bin → Searching for Nearest Neighbors (Load).
Matching: Matching Objects → Sorting and Reporting Results → Results.]

Image Features
•Using SIFT (scale-invariant feature transform) features.
▫A SIFT feature is a selected image region (also called a keypoint) with an associated descriptor.
▫A SIFT descriptor is a histogram of the image gradients surrounding a keypoint.
▫Using PCA for dimensionality reduction.
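To make the "histogram of image gradients" idea concrete, here is a minimal sketch in Python (chosen for brevity; the project itself is implemented in Java and uses VLFeat for extraction). It computes a single magnitude-weighted orientation histogram over one small patch. The function name `orientation_histogram`, the 8-bin choice, and the toy `ramp` patch are assumptions made for illustration. A real SIFT descriptor concatenates such histograms over a grid of cells around the keypoint and adds scale and rotation handling that is omitted here.

```python
import math

def orientation_histogram(patch, bins=8):
    """Magnitude-weighted histogram of gradient orientations over a patch.

    `patch` is a 2D list of grey-level intensities. This is a simplified,
    single-cell illustration, not a full SIFT descriptor.
    """
    hist = [0.0] * bins
    bin_width = 2 * math.pi / bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # Central differences approximate the image gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / bin_width) % bins] += magnitude
    # Normalise to unit length so the histogram is less sensitive to contrast.
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


# Intensity increases left to right, so every gradient points along +x.
ramp = [[float(x) for x in range(6)] for _ in range(6)]
descriptor = orientation_histogram(ramp)
```

For the horizontal ramp, all of the gradient energy falls into the first orientation bin, which is the kind of signature a descriptor uses to tell regions apart.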

KD-Tree
•Using KD-trees
▫Each tree level represents one dimension of a feature.
▫Searching the index for the K nearest neighbours.
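The two points above can be sketched as a small in-memory KD-tree in Python. This is an illustration under stated assumptions, not the project's distributed Java index: the names `build_kdtree`, `squared_distance`, and `knn` are hypothetical, the tree splits at the median, and the search is exact, whereas a large-scale index may scan only a limited number of bins.

```python
import heapq

def build_kdtree(points, depth=0):
    """Split on one dimension per level, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(root, query, k):
    """Return the k points nearest to `query`, nearest first."""
    heap = []  # max-heap on distance, stored as (-squared_distance, point)

    def visit(node):
        if node is None:
            return
        d = squared_distance(node["point"], query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node["point"]))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node["point"]))
        diff = query[node["axis"]] - node["point"][node["axis"]]
        if diff < 0:
            near, far = node["left"], node["right"]
        else:
            near, far = node["right"], node["left"]
        visit(near)
        # Cross the splitting plane only if a closer point could lie beyond it.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [point for _, point in sorted(heap, reverse=True)]


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
query = (9, 2)
nearest = knn(build_kdtree(points), query, k=2)
```

Comparing `nearest` with a brute-force sort of `points` by distance to `query` is a simple way to check the search, in the same spirit as the brute-force comparison used in the evaluation.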

Logical View
[Figure: logical view. Reference Features Points and Query Features Points feed the Multimedia Objects Matcher, which produces Similar Features, then Similar Objects, then Results.]

Physical View
[Figure: physical view. Labels: Directing; Building; Block 1, Block 2, Block 3, ..., Block n; Physical Files on HDFS; Computing Distances Tasks (B1 vs B1, B2 vs B2, B3 vs B3, ..., Bn vs Bn); Map Phase; Reduce Phase; DistributedCache; Queries; References; KD-Tree.]
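The physical view suggests that feature points are directed to bins in a map phase and that distances are then computed bin against bin. A single-process Python sketch of that idea follows. It is one possible reading of the diagram, not the actual Hadoop job: the names `SPLITS`, `bin_of`, `map_phase`, `reduce_phase`, and `run` are hypothetical, the two fixed split thresholds stand in for the top levels of a KD-tree, and a dictionary stands in for the shuffle between the phases.

```python
from collections import defaultdict

# Hypothetical (axis, threshold) pair for each of the top tree levels.
SPLITS = [(0, 5.0), (1, 5.0)]

def bin_of(point):
    """Walk the top tree levels; each comparison adds one bit to the bin id."""
    bin_id = 0
    for axis, threshold in SPLITS:
        bin_id = (bin_id << 1) | (1 if point[axis] >= threshold else 0)
    return bin_id

def map_phase(records):
    """records: iterable of (kind, name, point); kind is 'ref' or 'query'."""
    for kind, name, point in records:
        yield bin_of(point), (kind, name, point)

def reduce_phase(bin_records, k):
    """Compare each query in a bin only against references in the same bin."""
    refs = [(name, p) for kind, name, p in bin_records if kind == "ref"]
    for kind, qname, q in bin_records:
        if kind != "query":
            continue
        ranked = sorted(
            refs, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[1], q))
        )
        yield qname, [name for name, _ in ranked[:k]]

def run(records, k=2):
    groups = defaultdict(list)  # stands in for the shuffle step
    for key, value in map_phase(records):
        groups[key].append(value)
    results = {}
    for bin_records in groups.values():
        for qname, neighbours in reduce_phase(bin_records, k):
            results[qname] = neighbours
    return results


records = [
    ("ref", "r1", (1.0, 1.0)), ("ref", "r2", (2.0, 3.0)),
    ("ref", "r3", (8.0, 9.0)), ("ref", "r4", (7.0, 6.0)),
    ("query", "q1", (1.5, 1.5)), ("query", "q2", (9.0, 9.0)),
]
results = run(records)
```

Because each query is compared only with references in its own bin, the work per reduce task stays bounded, at the cost of possibly missing a true neighbour that falls just across a bin boundary.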

Matching
•For each query, we extract the features and then search the index for the K nearest-neighbour (K-NN) features.
•For each query feature, each of its neighbouring features votes for the image it belongs to, with a score based on its rank.
•The 10 images with the highest totals in the voting array are reported as the most similar images.
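The voting scheme can be sketched in Python as follows. The exact score attached to each rank is not stated in the slides, so the weighting here (the nearest of K neighbours contributes K points, the farthest contributes 1) is an assumption, as are the function name `rank_images` and the sample image ids.

```python
from collections import defaultdict

def rank_images(neighbours_per_feature, k, top_n=10):
    """Accumulate rank-weighted votes and return the top_n (image, score) pairs.

    neighbours_per_feature: for each query feature, the ids of the images
    owning its K nearest reference features, nearest first.
    """
    votes = defaultdict(int)
    for neighbours in neighbours_per_feature:
        for rank, image_id in enumerate(neighbours[:k]):
            # Assumed weighting: nearest neighbour gets k, farthest gets 1.
            votes[image_id] += k - rank
    # Highest score first; ties broken by image id for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


query_neighbours = [
    ["img_a", "img_b", "img_c"],
    ["img_a", "img_c", "img_b"],
    ["img_b", "img_a", "img_d"],
]
top = rank_images(query_neighbours, k=3)
```

An image that appears near the top of many neighbour lists accumulates the highest total, so it is reported first.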

Evaluation
•Core KNN
▫Experiments on a local machine.
▫Our results vs. brute force.
•Image retrieval
▫Caltech and TRECVID datasets.
▫On the Amazon AWS cloud.
▫We use 8 machines, each dual core with 4 GB RAM.

Precision of KNN

Scanned Bins Size

Effect of Data Size

Image Recall @ K

First Correct @ K
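The slides do not define the metrics named in these chart titles, so the Python sketch below shows one common way such measures are computed. All three definitions are assumptions: precision of KNN is taken as the overlap between the index's neighbours and the brute-force neighbours, recall at K as the fraction of relevant images found in the first K results, and "first correct" as the rank of the first relevant result (a query then counts as correct at K if that rank is at most K). The function names and sample ids are hypothetical.

```python
def knn_precision(approx, exact):
    """Fraction of returned neighbours that also appear in the brute-force list."""
    if not approx:
        return 0.0
    return len(set(approx) & set(exact)) / len(approx)

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant images found among the first k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def first_correct_rank(ranked, relevant):
    """1-based rank of the first relevant result, or None if there is none."""
    relevant = set(relevant)
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            return rank
    return None


approx = ["f1", "f2", "f5", "f7"]   # neighbours returned by the index
exact = ["f1", "f2", "f3", "f4"]    # neighbours found by brute force
ranked = ["img_x", "img_a", "img_y", "img_b"]
relevant = ["img_a", "img_b"]

precision = knn_precision(approx, exact)
recall_2 = recall_at_k(ranked, relevant, 2)
recall_4 = recall_at_k(ranked, relevant, 4)
first = first_correct_rank(ranked, relevant)
```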

Implementation Details
•The system is implemented in Java.
•We use Hadoop 1.0.3.
•We run cloud experiments on AWS services:
▫S3
▫EMR
•We use some open-source libraries:
▫For image preprocessing we use FFmpeg.
▫For extracting SIFT features we use VLFeat.

Conclusion
•We implement a full pipeline for the image retrieval problem.
▫The framework can easily support different types of features and different indexing methods.
•We show how a big cloud system can be built from small components.

Conclusion
•Intersection with my research
•Contributions
▫Feature selection and extraction
▫Implement dimensionality reduction
▫Design and implement the Map/Reduce index
▫Implement image matching and ranking

Questions?

Thank you!
