
Content Based Image Retrieval : An Introduction

9/22/2014 Prof. M. K. Kundu, MIU,ISI

Malay K. Kundu
Machine Intelligence Unit
Indian Statistical Institute

Kolkata http://www.isical.ac.in/~malay

Introduction

• Due to the revolutionary growth of digital imaging and internet technology, the development of efficient and intelligent schemes for retrieving images from large collections has become an important research issue. Two important approaches in this regard are:

• Content Based Image Retrieval (CBIR).
• Content Based Image Retrieval with relevance feedback.


The Problem and Motivation

• There are now billions of images on the web and in collections such as Flickr and Google Images.

• Suppose I want to find pictures of monkeys with an existing search model.


Google Image Search -- monkey


Metadata based retrieval systems

• Metadata based retrieval systems
  – text, click-rates, etc.
  – Google Images
  – clearly not sufficient

• What if computers understood images?
  – Content based image retrieval (early 90’s)
  – search based on the image content


Top 12 retrieval results for the query ‘Mountain’

Content Based Image Retrieval(CBIR)

• The process of retrieving relevant images from an image database (or distributed databases) on the basis of primitive (e.g., color, texture, shape) or semantic image features extracted automatically is known as Content Based Image Retrieval.


Problems of Information extraction in CBIR

• It differs generically from conventional information retrieval/data mining (DM) for the following reasons:
  – Unstructured nature of image databases
  – Contains pixel intensities with no inherent meaning
  – Any kind of reasoning about image content is possible only after extraction of some useful image information (e.g., presence of primitive or semantic features)


What is Unstructured Data?

• Any data without a well‐defined model for information access

• Examples:
  – Image, video, sound
  – Word documents
  – E-mails

• Examples of what is structured:
  – Database tables
  – Objects
  – XML tags


Unstructured data

• Unstructured data consists of text, audio, images, etc.

• Unstructured data contains significant scientific and commercial information

• Technologies and tools are being developed for efficient extraction of information in unstructured data

• Problems of information retrieval from unstructured data are not yet fully understood. A paradigm shift is needed.


UDM Increases Informational Content


[Figure: Structured data (10–40%) + Unstructured data (60–90%)]

Representation of Image


Two important questions for content-based image retrieval:
• How are images represented? => features
• How are image representations compared? => distance/similarity measures

CBIR: Visual Signature

• Image is indexed by its visual content. Visual content is described by:
• Low-level features (color, texture, shape, layout, etc.).
• High-level descriptors (face, iconic pattern, context, spatial reasoning, semantic information, etc.).


Animal Image Database


Example 1:

Lion              Rabbit


Lion  Color? Yes Rabbit


Horse  Color? No Rabbit


Horse Shape? Yes Rabbit


Horse  Shape? No Zebra


Horse Texture? Yes Zebra


Lion Color? No Rabbit


Lion Shape? Yes Rabbit


Lion  Color? No Horse


Lion   Shape? Yes Horse


Horse  Color? Different Horse


Deer  Texture? Different Deer


Content Based Image Retrieval (CBIR)


[Diagram: the User submits a Query Image; the system searches the Image Database and returns the Result.]

Content based image retrieval - 1

• Query by Visual Example (QBVE)

– user provides query image

– system extracts image features (texture, color, shape)

– returns nearest neighbors using suitable similarity measure
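The QBVE loop described above can be sketched as a simple nearest-neighbor search. A minimal sketch; the toy feature vectors and image names below are illustrative assumptions, not values from the slides:

```python
import math

def euclidean(a, b):
    # distance between two feature vectors (e.g., concatenated
    # color/texture/shape descriptors)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, database, k=2):
    # database: list of (image_id, feature_vector) pairs;
    # return the k nearest neighbors under the chosen measure
    ranked = sorted(database, key=lambda item: euclidean(query, item[1]))
    return [image_id for image_id, _ in ranked[:k]]

db = [("lion",  [0.90, 0.10, 0.20]),
      ("horse", [0.40, 0.40, 0.30]),
      ("zebra", [0.50, 0.50, 0.10])]
print(retrieve([0.85, 0.15, 0.20], db))  # -> ['lion', 'zebra']
```

Any of the distance measures listed later in the deck can be swapped in for `euclidean` without changing the retrieval loop.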


[Figure: examples of texture similarity, color similarity, and shape similarity]

The Semantic Gap

• First-generation CBIR systems were based on color and texture; however, these do not capture what users really care about: conceptual or semantic categories.

• Perception studies suggest that the most important cue to visual categorization is shape. This was ignored in earlier work (because it was hard!)


Content based image retrieval • Semantic Retrieval (SR)

– User provides a query text (keywords); the system finds images that contain the associated semantic concept.
– Emerged around the year 2000: model semantic classes, learn to annotate images.
– Provides a higher level of abstraction and supports natural-language queries.

query: “people, beach”

Text based Semantic Retrieval (SR)

• Problem of lexical ambiguity
  – multiple meanings of the same word

• Anchor - TV anchor or for Ship?

• Bank - Financial Institution or River bank?

• Multiple semantic interpretations of an image
  – Boating or fishing or people?

• Limited by vocabulary size
  – What if the system was not trained for ‘Fishing’?
  – In other words, it is outside the space of trained semantic concepts


Lake? Fishing? Boating? People?

Fishing! what if not in the vocabulary?

Summary

• SR: higher level of abstraction

– Better generalization inside the space of trained semantic concepts

– But problem of 

• Lexical ambiguity 

• Multiple semantic interpretations

• Vocabulary size 

• QBVE is unrestricted by language.
  – Better generalization outside the space of trained semantic concepts

• a query image of ‘Fishing’ would retrieve visually similar images.

– But weakly correlated with human notion of similarity


Both have visually dissimilar sky

Fishing! what if not in the vocabulary?

Lake? Fishing? Boating? People?

The two systems in many respects are complementary!

What should be the Research Focus ?

• Automatically generate annotations corresponding to object labels or activities in an image

• If possible, combine these with other metadata such as text.


CBIR

• Content‐Based Image Retrieval

– Query‐by‐Example(QBE)

– Query‐by‐Feature (QBF)
– Feature Vector


CBIR Architecture


[Architecture diagram: a query image enters through the User Interface; Feature Extraction produces the Image Representation; Query Comparison applies the Similarity Metric against the Database (built by Database Creation from image data); results are returned for Image Browsing.]

Example: CBIR of Butterflies

• To allow non‐expert users to find the possible species of butterflies they saw, based on the butterflies’ appearance.

• The appearance:– Color, Texture, Shape


Problems

• How can you describe a butterfly?

• How can you communicate with a machine?


Problems

• Different users have different perceptions.
• Users may not remember the appearance of the butterfly clearly.
• Users usually do not have enough knowledge to describe butterflies like experts.
• Users usually do not have the patience to browse too many query results.


Possible Solutions

• A user‐driven interactive query process: QBF/QBE (Query By Features / Query By Example)

• Fuzzy feature description for each butterfly

• A “What You See Is What You Get” query interface

• A representative set for a collection of butterflies


QBF/QBE query process

• QBF query:
  – A QBF query chooses some features of butterflies and expects the system to return all butterflies with those features.

– Features of butterflies:• Dominant color, texture pattern, shape.

• QBE query:
  – A QBE query points to an image and expects the system to return all butterflies similar to it.


Feature Description (1)

• Feature Description for a butterfly:
  – Like metadata describing the appearance of this butterfly.
  – This makes QBF queries possible.
  – A Feature Description consists of some feature descriptors.
• Feature descriptor:
  – A (“feature value”, “match level”) pair.


Feature Description (coarse)


Feature Type | Feature Value               | Degree of Match
Color        | mixed_with_black_and_orange | 52/57
             | orange_yellow               | 12/42
             | orange_red                  | 3/38
Texture      | many_spots                  | 58/62
             | fore_half_different_color   | 27/33
             | horizontal_bands            | 41/60
             | edge_with_different_color   | 10/74
Shape        | wave                        | 98/110

Common Color Features

• Most commonly used color features are:
• Pixel values in RGB or an alternative space like HSV, YCbCr, etc., with uniform quantization.
• Color histogram: joint probability density of the intensities of the three channels, with histogram intersection as the similarity measure.
• Color moments: low-order moments (mean, variance, skewness) characterizing the color distribution, with a Euclidean distance measure.
• Invariant descriptors: invariance to illumination changes.
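The color histogram with histogram-intersection matching can be sketched as below. The 4-bins-per-channel quantization and the toy single-color images are illustrative assumptions:

```python
def color_histogram(pixels, bins=4):
    # quantize each RGB channel (values 0-255) into `bins` bins,
    # count pixels per combined bin, and normalize to sum to 1
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]

def histogram_intersection(h1, h2):
    # sum of bin-wise minima: 1.0 for identical normalized
    # histograms, 0.0 for histograms with no bins in common
    return sum(min(a, b) for a, b in zip(h1, h2))

red_image   = [(250, 10, 10)] * 4
green_image = [(10, 250, 10)] * 4
print(histogram_intersection(color_histogram(red_image),
                             color_histogram(red_image)))    # 1.0
print(histogram_intersection(color_histogram(red_image),
                             color_histogram(green_image)))  # 0.0
```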


What color is the apple ? We are so visual !!!!


“I’d say it is bright red.” “I think it is ‘crimson’.” “It is red!” “I really couldn’t tell you (I am color blind).”

RGB Color Space

• Hardware-oriented model: 3 values to represent a color.

Red Green Blue


HSV Color Space

• HSV – Hue, Saturation, Value
  – close to human perception

– 3 values to represent a color.


[Figure: HSV cone. Hue wheel: Red (0°), Yellow (60°), Green (120°), Cyan (180°), Blue (240°), Magenta (300°); Saturation increases outward; Value runs from Black to White.]
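The hue angles on the wheel can be sanity-checked with Python’s standard `colorsys` module (a small sketch; `colorsys` works on RGB floats in [0, 1] and returns hue in [0, 1), so multiplying by 360 gives the angle):

```python
import colorsys

# pure primaries/secondaries land exactly on the hue-wheel angles
for name, rgb in [("red",   (1.0, 0.0, 0.0)),
                  ("green", (0.0, 1.0, 0.0)),
                  ("cyan",  (0.0, 1.0, 1.0)),
                  ("blue",  (0.0, 0.0, 1.0))]:
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    print(name, round(h * 360))  # red 0, green 120, cyan 180, blue 240
```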

YCbCr Color Space

• Y is the luminance; Cb and Cr are the chrominance values of this color space.

• Decouples intensity and color information

• A monochrome color representation has only the Y value.

• Very close to Perceptual Model
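The decoupling of intensity (Y) from chrominance (Cb, Cr) can be sketched with the common ITU-R BT.601 full-range conversion; the slide does not specify which YCbCr variant is meant, so BT.601 is an assumption here:

```python
def rgb_to_ycbcr(r, g, b):
    # ITU-R BT.601 full-range conversion, RGB inputs in [0, 255];
    # Cb and Cr are centered at the neutral value 128
    y  =         0.299    * r + 0.587    * g + 0.114    * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128.0 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# a gray pixel carries only luminance: Cb and Cr sit at neutral 128
print(rgb_to_ycbcr(128, 128, 128))  # approximately (128.0, 128.0, 128.0)
```

A monochrome representation keeps only the first returned value (Y).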


CIE-LAB Uniform Color Space

• Uniform color spaces: CIE Lab
• The CIE addressed the perceptual non-uniformity of earlier color spaces in 1976 with the development of the Lab color space.


CBIR: Major Challenges

• Major developmental Issues• Efficient feature extraction and similarity measure.

• Multi‐dimensional indexing for multi‐level image queries

• Relevance feedback network for automatic extraction of high-level knowledge.

• Efficient and adaptive retrieval system design. 


Distance Measures

• Heuristic
  – Minkowski‐form
  – Weighted‐Mean‐Variance (WMV)
• Nonparametric test statistics
  – χ² (Chi-square)
  – Kolmogorov‐Smirnov (KS)
  – Cramer/von Mises (CvM)
• Information-theoretic divergences
  – Kullback‐Leibler (KL)
  – Jeffrey divergence (JD)
• Ground distance measures
  – Histogram intersection
  – Quadratic form (QF)
  – Earth Mover's Distance (EMD)
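Two of the listed measures, the Minkowski form and the χ² statistic, compute directly on histogram vectors. A minimal sketch; the `eps` guard against empty bins and the toy histograms are my additions:

```python
def minkowski(h1, h2, p=2):
    # Minkowski-form distance: p=1 is city-block (L1), p=2 Euclidean
    return sum(abs(a - b) ** p for a, b in zip(h1, h2)) ** (1.0 / p)

def chi_square(h1, h2, eps=1e-10):
    # symmetric chi-square statistic between two histograms;
    # eps avoids division by zero when both bins are empty
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

h1 = [0.5, 0.5, 0.0]
h2 = [0.0, 0.5, 0.5]
print(minkowski(h1, h2, p=1))        # 1.0
print(round(chi_square(h1, h2), 6))  # 1.0
```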


Texture

Texture refers to visual patterns that have properties of human perception: homogeneity, contrast, roughness, coarseness, directionality (sense of orientation), regularity (periodic or quasi-periodic distribution), and line-likeness.


Texture

• An innate property of virtually all natural objects (like clouds, trees, hair, wood, stones, etc.) and scenes.


Common Texture features:

Co‐occurrence Matrix
• Explores spatial dependence of gray level or color.
• Frequency counts in a 2‐D array for pairs of pixels at a given direction and distance.
• Statistical features extracted from the matrix are used as descriptors.

Wavelet-based multi‐scale features
• Statistical features extracted from sub‐bands.

Gabor Filter based technique.
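The co-occurrence matrix for one (direction, distance) offset, with contrast as one of the statistical features extracted from it, can be sketched as below; the tiny two-level striped image is an illustrative assumption:

```python
def cooccurrence(image, dx=1, dy=0, levels=2):
    # count pairs of gray levels at pixel (i, j) and its neighbor at
    # offset (dy, dx); image is a 2-D list of quantized gray levels
    glcm = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for i in range(rows):
        for j in range(cols):
            i2, j2 = i + dy, j + dx
            if 0 <= i2 < rows and 0 <= j2 < cols:
                glcm[image[i][j]][image[i2][j2]] += 1
    return glcm

def contrast(glcm):
    # one statistical descriptor: weights each entry by the
    # squared gray-level difference of the pixel pair
    n = len(glcm)
    return sum((i - j) ** 2 * glcm[i][j] for i in range(n) for j in range(n))

stripes = [[0, 1, 0, 1],
           [0, 1, 0, 1]]
g = cooccurrence(stripes)  # horizontal neighbors always differ
print(contrast(g))         # 6
```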


MPEG-7 CBIR Texture Feature

• Performance of different CBIR systems relies greatly on the notion of texture, which may differ between systems. For standardization, MPEG‐7 has recommended several texture feature models:
  – Edge Histogram Descriptor (EHD)
  – Homogeneous Texture Descriptor (HTD)
  – Texture Browsing Descriptor (TBD)


Shape


• Shape of a visual object is one of the most powerful signatures for the human perception mechanism.
• It should remain invariant under different geometric transformations (rotation, scaling, and translation).

Two categories of methods:
• Boundary-based (using only the outer boundary): Fourier descriptor, curvature function, and curvature scale space
• Region-based (using the shape of the entire region)
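A boundary-based Fourier descriptor can be sketched as follows: the contour points are read as complex numbers, and the magnitudes of the low-order DFT coefficients (which are insensitive to rotation and starting point) form the signature. This is one standard construction, not necessarily the variant used in the cited work; the sampled circle is an illustrative assumption:

```python
import cmath
import math

def fourier_descriptor(boundary, n_coeffs=4):
    # boundary: ordered (x, y) points along the outer contour,
    # read as complex numbers z_t = x_t + i*y_t
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    mags = []
    for k in range(1, n_coeffs + 1):
        c = sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) / n
        mags.append(abs(c))  # magnitude drops rotation/start-point phase
    # divide by the first magnitude for scale invariance
    return [m / mags[0] for m in mags]

# sanity check: for a sampled circle all energy sits in the first coefficient
circle = [(math.cos(2 * math.pi * t / 16), math.sin(2 * math.pi * t / 16))
          for t in range(16)]
d = fourier_descriptor(circle)
print([round(m, 6) for m in d])  # [1.0, 0.0, 0.0, 0.0]
```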

Region Based shape Signatures

• Binary and gray level moments

• Zernike Moments

• Grid based descriptor

• Object bounding box.

• Segmentation of different regions (which may not be meaningful objects).


Results: based on shape with a rotation-invariant feature.


M. Banerjee and M. K. Kundu, “Edge based features for content based image retrieval”, Pattern Recognition, Vol. 36, No. 11, pp. 2649-2661, 2003.

Relevance Feedback


The key idea of the interactive relevance feedback technique is that human perceptual subjectivity is incorporated into the retrieval process, providing users with the opportunity to evaluate the retrieval results. Queries or similarity measures are automatically refined on the basis of these evaluations.

Relevance Feedback


Relevance feedback is a supervised active learning technique used to improve the effectiveness of information systems.

It uses positive and negative examples from the user to improve system performance. For a given query, the system first retrieves a list of images ranked according to a predefined similarity metric.

Then the user marks the retrieved images as relevant to the query (positive examples) or irrelevant (negative examples).

The system will refine the query based on the feedback, retrieve a new list of images, and present them to the user. The key issue in relevance feedback is how to use positive and negative examples to refine the query and/or to adjust the similarity measure.
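One classical way to use the positive and negative examples to refine the query is Rocchio-style query-point movement, sketched below. The slides do not name this specific rule, so the rule itself and the alpha/beta/gamma weights are illustrative assumptions:

```python
def mean(vectors, dim):
    # centroid of a set of feature vectors (zeros if the set is empty)
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def refine_query(query, relevant, irrelevant,
                 alpha=1.0, beta=0.75, gamma=0.25):
    # move the query feature vector toward the centroid of the
    # positive examples and away from the centroid of the negatives
    dim = len(query)
    pos = mean(relevant, dim)
    neg = mean(irrelevant, dim)
    return [alpha * q + beta * p - gamma * m
            for q, p, m in zip(query, pos, neg)]

old = [0.2, 0.8]
new = refine_query(old, relevant=[[0.6, 0.4], [0.8, 0.2]],
                   irrelevant=[[0.0, 1.0]])
print(new)  # pulled toward the relevant centroid, pushed off the negative
```

The refined vector then replaces the query in the next retrieval round; adjusting the similarity measure's feature weights (as in the FEI scheme later in the deck) is the complementary option.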

Performance Measure

Precision is defined as:
  (Number of relevant images retrieved) / (Total number of images retrieved)

Recall is defined as:
  (Number of relevant images retrieved) / (Number of relevant images in the database)

* The recall-precision graph is another important measure of performance.
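The two definitions compute directly; the toy image IDs below are illustrative:

```python
def precision(retrieved, relevant):
    # fraction of the retrieved images that are relevant
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved)

def recall(retrieved, relevant):
    # fraction of all relevant images in the database that were retrieved
    hits = len(set(retrieved) & set(relevant))
    return hits / len(relevant)

retrieved = ["img01", "img02", "img03", "img04"]
relevant  = ["img01", "img03", "img07", "img09", "img11"]
print(precision(retrieved, relevant))  # 0.5  (2 of 4 retrieved are relevant)
print(recall(retrieved, relevant))     # 0.4  (2 of 5 relevant were found)
```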


Interactive Image Retrieval with M- band Wavelet Features


Aims & Motivation
• Using an effective representation (features) of the image that supports the human visual system.
• Improving the retrieval results using a fuzzy relevance feedback mechanism (RFM) with automatic feature-weight updating.
• Incorporating the Earth Mover's Distance (EMD) in the RFM.

PROPOSED CBIR SYSTEM

[System diagram: M-band wavelet feature extraction feeds the search engine (retrieval block) over the image database; the user interacts with the query image and retrieval results; the error in relevance drives the fuzzy relevance feedback block, which performs fuzzy feature evaluation and updates EMD weights/distances based on relevance.]

Proposed Algorithm
1. M-band wavelet features are computed for an input image.
2. The EMD similarity measure is applied to the M-band wavelet feature distances, and a retrieval is made.
3. From the first stage of retrieval, the user marks the relevant and irrelevant sets.
4. The intraset and interset ambiguity is computed.
5. The feature evaluation index (FEI) for each component of each plane (Y-Cb-Cr) is computed from the ambiguity measures.
6. The weight of each component is updated automatically according to the FEI as follows: w_q = 2 · FEI_q
7. The next iteration is started with the updated features.

Implementation
The experiments were performed on a Dell Precision T7400 with 4 GB RAM. The performance of the image retrieval system is tested on two databases:

A) SIMPLIcity:1000 images in 10 categories (People, Beach, Buildings, Bus, Dinosaur, Elephant, Flower, Horses, Mountains and Food ).

B) Corel 10000 miscellaneous database:9908 images belonging to 79 semantic categories.

Results are compared with the Color Structure Descriptor (CSD) and Edge Histogram Descriptor (EHD) included in MPEG-7 standard. MPEG-7 Reference Software provided by ISO is used for the computation of CSD and EHD.

The Euclidean similarity measure is used with the CSD and EHD features, and the EMD similarity measure with the M‐band wavelet features for enhanced accuracy.

EXPERIMENTAL RESULTS
M-BAND + ED ON SIMPLICITY DATABASE (FIRST PASS) (5/20)

EHD +ED (FIRST PASS) (1/20)

CSD+ED (FIRST PASS) (5/20)

CONTINUED…
1st ITERATION ON M-BAND + ED (6/20)

1st ITERATION ON EHD+ED (2/20)

1st ITERATION ON CSD+ED (6/20)

CONTINUED…

CSD+ED (FIRST PASS) (5/20)

EHD +ED (FIRST PASS) (1/20)

M-BAND + EMD ON SIMPLICITY DATABASE (FIRST PASS) (6/20)

CONTINUED…

1st ITERATION EHD+ED (2/20)

1st ITERATION CSD+ED (6/20)

1st ITERATION ON M-BAND + EMD (8/20): RANKING IMPROVED

CONTINUED…
M-BAND + EMD ON COREL DATABASE (FIRST PASS) (6/20)

EHD + ED(FIRST PASS) (1/20)

CSD +ED(FIRST PASS)(2/20)

CONTINUED…
1st ITERATION ON M-BAND + EMD (7/20)

1st ITERATION ON CSD +ED(2/20)

1st ITERATION ON EHD +ED (1/20)

CONTINUED…
2nd ITERATION ON M-BAND + EMD (8/20)

2nd ITERATION ON EHD +ED (1/20)

2nd ITERATION ON CSD +ED(2/20)

GRAPHICAL INTERPRETATION

AVERAGE PRECISION OF CSD, EHD & M-BAND vs. NUMBER OF IMAGES DISPLAYED ON SIMPLICITY DATABASE

AVERAGE PRECISION vs. NUMBER OF ITERATIONS

Average precision is computed on 20 displayed images.



Conclusions

• CBIR in general is still very much a research topic at present. The technology is exciting but has yet to achieve a good degree of maturity. The available commercial systems have limited capability in restricted domains.

• Medical image retrieval research is a very important application of CBIR, considering the huge amount of visual data generated for diagnostic purposes. But it still remains an academic exercise due to the lack of integration between visual and clinical information.



Thank You