
Min-Hashing and Geometric min-Hashing

Ondřej Chum, Michal Perdoch, and Jiří Matas

Center for Machine Perception

Czech Technical University

Prague


Outline

1. Looking for a representation of images that:
• is compact (useful for very large datasets)
• is fast (linear in the size of the output)
• supports local (image and object) recognition/retrieval
• is accurate (very low false positive rate)

2. min-Hash is a powerful representation, but we show that Geometric min-Hash is significantly more powerful

3. Applications:
• Large database clustering
• Discovery of (even small) objects


Introduction to min-Hash for Images

min-Hash originates from the text retrieval community, where it was originally used for detecting near-duplicate documents.


Image Representation: a Set of Words

Pipeline: features (local appearance) → SIFT description [Lowe'04] → vector quantization → visual vocabulary (visual words).

[Figure: an image's quantized features give a bag of words (a vector of visual-word counts, e.g. 0 4 0 2 … 0 1 0 1 …) and, discarding the counts, a set of words — the image representation.]


min-Hash

min-Hash is a locality sensitive hashing (LSH) function m that selects an element (visual word) m(Ii) from each set Ii of visual words detected in image i, so that

P{m(I1) = m(I2)} = |I1 ∩ I2| / |I1 ∪ I2|

This probability will be called "image similarity" and denoted sim(I1, I2).


[Example: with a vocabulary {A, B, C, D, E, F} and two sets I1, I2 with set overlap sim(I1, I2) = 1/2, three random orderings of the vocabulary define min-Hashes m1, m2, m3. Here two of the three min-Hashes agree, so the estimated similarity of I1 and I2 from 3 min-Hashes = 2/3.]

In practice: vocabulary of 1M visual words, ~2000 words per image, fixed-size image representation of ~100 min-Hashes.
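The estimator can be sketched in a few lines of Python (illustrative code, not the authors' implementation; explicit permutations are fine for small vocabularies, while real systems use hash functions):

```python
import random

def make_orderings(vocab_size, n_hashes, seed=0):
    """Each min-Hash is defined by a random ordering of the vocabulary."""
    rng = random.Random(seed)
    orderings = []
    for _ in range(n_hashes):
        rank = list(range(vocab_size))
        rng.shuffle(rank)  # rank[w] = position of word w in this ordering
        orderings.append(rank)
    return orderings

def min_hash(ordering, words):
    """m(I): the word of I that comes first in the random ordering."""
    return min(words, key=lambda w: ordering[w])

def estimate_similarity(orderings, I1, I2):
    """Fraction of min-Hashes on which the two sets agree."""
    agree = sum(min_hash(o, I1) == min_hash(o, I2) for o in orderings)
    return agree / len(orderings)
```

With enough min-Hashes the estimate concentrates around the true set overlap, which is exactly why a fixed-size tuple of min-Hashes can stand in for the full set of words.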


min-Hash Retrieval

A sketch is an s-tuple of min-Hashes; k hash tables are built, one per sketch.

Sketch collision of I1 and I2: all s min-Hashes in the sketch must agree:
P{collision} = sim(I1, I2)^s

Retrieval:
1. generate k sketches
2. at least one of the k sketches must collide:
P{retrieval} = 1 – (1 – sim(I1, I2)^s)^k
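Both formulas are simple to evaluate (a minimal sketch; `sim` is the set overlap defined earlier):

```python
def p_collision(sim, s):
    """A sketch collides only if all s independent min-Hashes agree."""
    return sim ** s

def p_retrieval(sim, s, k):
    """At least one of k independent sketches collides."""
    return 1 - (1 - sim ** s) ** k
```

For s = 3, k = 512, an unrelated pair with similarity 0.05 is retrieved with probability of only about 0.06, while a near duplicate with similarity 0.9 is retrieved almost surely; this thresholding behaviour is the point of the s/k trade-off.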


Probability of Retrieving an Image Pair

[Plot: probability of retrieval vs. similarity (set overlap), for s = 3, k = 512. Near-duplicate images and images of the same object fall in the high-similarity region; unrelated images fall in the low-similarity region.]


Near Duplicate Images


Scalable Near Duplicate Image Detection

• Images perceptually (almost) identical but not identical (noise, compression level, small motion, small occlusion)

• Similar images of the same object / scene

• Large databases

• Fast – linear in the number of duplicates

• Store small constant amount of data per image


Representation, Distance and Search

• Naïve method (space and time infeasible):
– Representation: the whole image
– Similarity: sophisticated visual similarity measure
– Search: O(N^2) on N images

• Goal:
– Compact representation – a small constant number of bytes (that captures the visual content addressed by the near-duplicate definition)
– Similarity that allows fast retrieval of similar images (that measures similarity according to the near-duplicate definition)
– Search: linear in the number of near duplicates – requires indexing


Word Weighting for min-Hash

For a hash function based on uniform random orderings (set overlap similarity), all words Xw have the same chance to be the min-Hash.

For a hash function with word-dependent weights dw, the probability of Xw being the min-Hash is proportional to dw.
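One standard way to realize such a weighted min-Hash (a sketch under assumptions, not necessarily the authors' construction) is to give each word an exponential key with rate dw and take the argmin; word Xw then wins with probability proportional to dw:

```python
import hashlib
import math

def word_key(seed, word, weight):
    """Deterministic Exponential(rate=weight) key for (seed, word).
    The same word gets the same key in every set, as min-Hash requires."""
    h = hashlib.sha256(f"{seed}:{word}".encode()).digest()
    u = (int.from_bytes(h[:8], "big") + 1) / (2 ** 64 + 1)  # uniform in (0, 1)
    return -math.log(u) / weight

def weighted_min_hash(words, weights, seed):
    """P{word w is selected} = weights[w] / sum of weights over `words`."""
    return min(words, key=lambda w: word_key(seed, w, weights[w]))
```

Because the key of a word depends only on the seed and the word itself, two images that share a high-weight word are more likely to agree on the min-Hash, which is exactly the weighting behaviour described above.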


Histogram Intersection Using min-Hash

Idea: represent a histogram as a set, use the min-Hash set machinery.

Visual words: A, B, C, D; bags of words tA = (2,1,3,0), tB = (0,2,3,1).

Each word occurrence gets an index, giving the min-Hash vocabulary A1, A2, B1, B2, C1, C2, C3, D1:
Bag of words A → set A' = {A1, A2, B1, C1, C2, C3}
Bag of words B → set B' = {B1, B2, C1, C2, C3, D1}

The set overlap of A' and B' is the histogram intersection of A and B.
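The construction is easy to verify in code (a minimal sketch using the slide's example counts):

```python
def histogram_to_set(counts):
    """Replicate each word w with indices w1..wn, where n is its count."""
    return {f"{w}{i}" for w, n in counts.items() for i in range(1, n + 1)}

tA = {"A": 2, "B": 1, "C": 3, "D": 0}
tB = {"A": 0, "B": 2, "C": 3, "D": 1}
A_set = histogram_to_set(tA)  # {A1, A2, B1, C1, C2, C3}
B_set = histogram_to_set(tB)  # {B1, B2, C1, C2, C3, D1}

intersection = len(A_set & B_set)              # set intersection size
hist_int = sum(min(tA[w], tB[w]) for w in tA)  # histogram intersection
```

Both quantities come out to 4 here: the indexed replication makes the set intersection count exactly min(tA_w, tB_w) copies of each word.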


Similarity Measures for min-Hash

Set representation Bag of words representation

Equa

l wei

ghts

Wei

ghte

d


Min-Hash on TrecVid

Query Retrieved images with decreasing similarity

• DoG features
• vocabulary of 64,635 visual words
• 192 min-Hashes, 3 min-Hashes per sketch, 64 sketches
• similarity threshold 35%

Set overlap similarity measure


Query Examples

Query image:

Results: set overlap, weighted set overlap, weighted histogram intersection


Geometric min-Hash (GmH)

Can geometry help us find sketches of min-Hashes with much higher repeatability than random s-tuples?


The Idea

• For a sketch collision, all s min-Hashes in the sketch must agree

• In the construction of a sketch, we can therefore assume that the first min-Hash is matching

• If the assumption is violated, no harm is done: the sketch simply does not collide

We show how to exploit the assumption of a matching m1(I) to design the rest of the sketch (the min-Hashes are no longer independent). The new procedure, Geometric min-Hash, is superior to the standard min-Hash.


Vocabulary Size and Set Representation

Small vocabulary (1k visual words) vs. large vocabulary (1M visual words)

On a 100k dataset of images, 95% of features have a visual word that is unique within their image.

For large vocabularies, selecting a visual word from an image is (almost) equivalent to selecting a feature (with location and scale).


Repeatability of Feature Sets: Translation and Occlusion

Do not group features that are far apart into sketches. Such sketches have lower repeatability (as groups, not individually) under translation or occlusion.


Repeatability of Feature Sets: Scale Change

Do not group features of very different scale into sketches. Such sketches have lower repeatability under scale change.


Geometric min-Hash algorithm

1. Keep features with a visual word unique in the image
2. Obtain the "central feature" by min-Hash
3. Select a scale and spatial neighbourhood of the central feature
4. Select secondary min-Hash(es) from the neighbourhood
5. The relative pose of the sketch features is a geometric invariant (as in geometric hashing)

Sketch of GmH: s-tuple of visual words + geometric invariant
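The steps above can be sketched as follows (illustrative Python; the feature record, the neighbourhood rule of radius proportional to the central scale with a scale-ratio band, and the invariant encoding are assumptions, not the authors' exact implementation):

```python
import math
from collections import Counter

def gmh_sketch(features, ordering, radius_factor=3.0):
    """features: list of (word, x, y, scale) tuples; ordering: rank per word.
    Returns a GmH sketch (pair of visual words + geometric invariant) or None."""
    # 1. keep features whose visual word is unique in the image
    counts = Counter(w for w, _, _, _ in features)
    unique = [f for f in features if counts[f[0]] == 1]
    if len(unique) < 2:
        return None
    # 2. central feature: min-Hash over the unique words
    central = min(unique, key=lambda f: ordering[f[0]])
    cw, cx, cy, cs = central
    # 3. neighbourhood: nearby features of comparable scale (assumed rule)
    def near(f):
        w, x, y, s = f
        return (w != cw
                and math.hypot(x - cx, y - cy) <= radius_factor * cs
                and 0.5 <= s / cs <= 2.0)
    neigh = [f for f in unique if near(f)]
    if not neigh:
        return None
    # 4. secondary min-Hash selected from the neighbourhood only
    second = min(neigh, key=lambda f: ordering[f[0]])
    sw, sx, sy, ss = second
    # 5. relative pose of the pair, expressed in the central feature's frame
    invariant = ((sx - cx) / cs, (sy - cy) / cs, ss / cs)
    return (cw, sw), invariant
```

Restricting step 4 to the neighbourhood is what makes the sketch survive translation, occlusion and scale change, per the two preceding slides.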


Geometric vs. Standard min-Hash

• Lower false positives
– Additional geometric invariant (part of the hash key or used for verification)
– Lower probability of random sketch collisions (next slide)

• Higher true positives
– Viewpoint change
– Severe occlusion
– Scale change
– Object on a different background

• Faster spatial verification
– A sketch collision defines a geometric transformation

[Two example image pairs, sketches with s = 2, k = 5000: GmH: 37 collisions vs. mH: 4 collisions; GmH: 0 collisions vs. mH: 4 collisions.]


Overlap of Random Sets

False positive = sketch collision of two random images

The probability of two random sets I1 and I2 having a common min-Hash (i.e. the average overlap of two random sets) is approximately

|I1| · |I2| / (w · (|I1| + |I2|))

where w is the size of the vocabulary.

The smaller the sets, the smaller probability of random collision

• Min-Hash: features in the whole image

• Geometric min-Hash: only small subset of the image
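The claim that smaller sets have a smaller random-collision probability is easy to check by simulation (an illustrative sketch):

```python
import random

def avg_random_overlap(a, b, w, trials, seed=0):
    """Monte-Carlo estimate of the expected set overlap (Jaccard
    similarity) of two random sets of sizes a and b from w words."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        I1 = set(rng.sample(range(w), a))
        I2 = set(rng.sample(range(w), b))
        total += len(I1 & I2) / len(I1 | I2)
    return total / trials
```

With w = 10000, two random 50-word sets overlap far less on average than two random 200-word sets, which is why restricting a sketch's min-Hashes to a small spatial neighbourhood lowers the random-collision rate.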


Experiments


Experiment 1: Clustering on Oxford 100k DB

Hertford

Keble

Magdalen

Pitt Rivers

Radcliffe Camera

All Souls

Ashmolean

Balliol

Bodleian

Christ Church

Cornmarket

100,000 images downloaded from Flickr, including 11 Oxford landmarks with manually labelled ground truth.

Randomized clustering: cluster hypotheses by hashing, completion by retrieval.


Clustering Results on Oxford 100k

s = 2, k = 64: ~550 bytes per image, 16 min*
s = 3, k = 256: ~1600 bytes per image, 33 min*

* The time does not include feature detection, SIFT computation, or vector quantization


Experiment 2: Object Discovery

Verification by co-segmentation, critical for small objects
[Cech, Matas, Perdoch, CVPR 2008], code available on WWW; [Ferrari, Tuytelaars, Van Gool, ECCV 2004]

Geometric min-Hash sketch collisions, s = 2, k = 256


Small Object Discovery

Other instances of the discovered object by (sub)image retrieval


Faces are Small Objects


Discovery of the Face Category: the Largest Cluster in Oxford 100k


The Importance of Co-segmentation in Object Discovery

Seed image pair

Result by query expansion using features inside the segmentation

Using the whole bounding box


Visual Content Hyperlinks


Conclusions

• A novel representation for hashing was introduced
– Significantly improves recall (reduces false negatives)
– Reduces the number of false positives
– Reduces the memory footprint of the image representation
– Connects min-Hash and geometric hashing
– The first efficient combination of appearance and geometry in large-scale indexing

• Applications
– Clustering of spatially related images
– Discovery of small objects


Thank you!

Example of discovered object

(cut-outs)
