Adaptive Binary Quantization for Fast Nearest Neighbor...

IBM Research

Zhujin Li1, Xianglong Liu1*, Junjie Wu1, and Hao Su2

1Beihang University, Beijing, China2Stanford University, Stanford, CA, USA

Adaptive Binary Quantization for Fast Nearest Neighbor Search

http://www.nlsde.buaa.edu.cn/~xlliu

Introduction

– Nearest Neighbor Search

– Motivation

Adaptive Binary Quantization

– Formulation

– Optimization

Experiments

Conclusion

Outline

Definition

Given a database 𝑃 = 𝑝𝑖 𝑖=1…𝑛 and a query 𝑞, the nearest neighbor of 𝑞:

𝑝∗ ∈ 𝑃 , such that 𝑑 𝑞, 𝑝∗ ≤ 𝑑(𝑞, 𝑝)

Solutions

– linear scan

• time and memory consuming

– tree-based: KD-tree, VP-tree, etc.

• divide and conquer

• degenerate to linear scan for high dimensional data

Introduction: Nearest Neighbor Search (1)

Hash based nearest neighbor search

– Locality sensitive hashing [Indyk and Motwani, 1998]: close points in the original space have similar hash codes

X x1 x2 x3 x4 x5

h1 0 1 1 0 1

h2 1 0 1 0 1

… … … … … …

hk … … … … …

010… 100… 111… 001… 110…x2

ℎ 𝑥 = 𝑠𝑔𝑛(𝑤𝑇𝑥 + 𝑏)

Hash based nearest neighbor search

– Compressed storage: binary codes

– Efficient computations: hash table lookup or Hamming distance ranking based on binary operations

… …

Hashing Hash Table

Bucket Indexed Image

Linear projection based quantization

Introduction: State-of-the-art Hashing Solutions (1)

ITQ: Rotation, CVPR’12

LSH: random PCAH: PCA

AGH: Kernel, ICML’11

Try to capture the data distribution

Prototype based quantization

– Step 1: find a number of prototypes to represent the data (like clustering)

– Step 2: assign a binary code to the prototype

Introduction: State-of-the-art Hashing Solutions (2)

SPH, CVPR’12

each prototype generate a bitKMH, CVPR’13

each prototype generate multiple bits

∈ {𝟎, 𝟏} with 2 codes ∈ 𝟎, 𝟏 𝒎 with 2𝒎 codes

achieve encouraging performance

Problems

– make use of the complete binary code set (geometrically forms a hypercube), which can hardly characterize the real-world data distribution

Motivation

– a better coding solution only relying on a small subset of binary codes (instead of the complete set) can largely reduce the quantization loss.

Introduction: Motivation

编码空间

𝑑𝑜 𝑥, 𝑦 ≅ 𝑑ℎ(𝑐𝑥 , 𝑐𝑦)

Using the complete binary code set

Using a small subset of binary codes

𝑑𝑜 𝑥, 𝑦 ≅ 𝑑ℎ(𝑐𝑥 , 𝑐𝑦)

Introduction

– Nearest Neighbor Search

– Motivation

Adaptive Binary Quantization

– Formulation

– Optimization

Experiments

Conclusion

Outline

– characterize the inherent data relations, and maintain the affinities between samples in the code space (i.e., Hamming space).

Basic idea: space alignment

– jointly find the discriminative prototypes and their associated binary codes that can align the Hamming space to the original one

Adaptive Binary Quantization: Formulation (1)

𝑑𝑜 𝑥𝑖 , 𝑥𝑗 ≅ 𝑑ℎ(𝑦𝑖 , 𝑦𝑗)

the original space the Hamming space

𝑥𝑖

𝑥𝑗

𝑦𝑖𝑦𝑗

Notations

– The training data set 𝑋 = 𝑥1, 𝑥2, … , 𝑥𝑁 ∈ ℝ𝑑×𝑛

– The code matrix 𝑌 = [𝑦1, 𝑦2, … , 𝑦𝑛] ∈ {−1,1}𝑏×𝑛

– The prototype set 𝑃 = {𝑝𝑘|𝑝𝑘 ∈ ℝ𝑑×𝑛}

– The codebook 𝐶 = {𝑐𝑘|𝑐𝑘 ∈ −1, 1 𝑏}

A prototype based hashing

– learn a hash function ℎ(𝑥) that can map each 𝑥 to 𝑦

(𝑝1, 𝑐1)

(𝑝2, 𝑐2) (𝑝3, 𝑐3)

𝑦 = 𝑐1

ℎ(𝑥) = 𝑐𝑖∗ 𝑥 𝑖∗ 𝑥 = argmink

𝑑𝑜 𝑥, 𝑝𝑘

Space Alignment

– concentrate on the distance consistence so that codes in Hamming space will be aligned with the original data distribution

• global distribution: the prototypes capture the data distribution

• neighbor structure: data belonging to the same prototype share the same code

– Quantization loss

• 𝑑ℎ 𝑦𝑖 , 𝑦𝑗 =1

2𝑦𝑖 − 𝑦𝑗 is the square root of the Hamming distance

𝑸(𝑌, 𝑋) =1

𝑛2σ𝑖,𝑗=1𝑛 𝜆𝑑𝑜 𝑥𝑖 , 𝑥𝑗 − 𝑑ℎ 𝑦𝑖 , 𝑦𝑗

𝑑𝑜 𝑥𝑖 , 𝑥𝑗 ≈ 𝑑𝑜 𝑥𝑖 , 𝑝i∗ xj

𝑸 (𝑃, 𝐶, 𝑖∗(𝑋)) =

𝑖=1

𝑘=1

𝑃𝑤𝑘

𝑛2𝜆𝑑𝑜 𝑥𝑖 , 𝑥𝑗 − 𝑑ℎ 𝑐i∗ xj 𝑖

, 𝑐j∗ xj

Space Alignment

Alternating Optimization

– 1. Adaptive Coding

fixing 𝑃 and 𝑖∗(𝑋), optimize 𝐶

– 2. Prototype Update

fixing 𝐶 and 𝑖∗(𝑋), optimize 𝑃

– 3. Distribution Update

fixing 𝑃 and 𝐶, optimize 𝑖∗(𝑋)

Adaptive Binary Quantization: Optimization (1)

min𝑃,𝐶,𝑖∗(𝑋)

𝑸 (𝑃, 𝐶, 𝑖∗(𝑋))

𝑠. 𝑡. 𝑐𝑘 ∈ −1, 1 𝑏 ; 𝑐𝑘𝑇𝑐𝑙 ≠ 𝑏, 𝑙 ≠ 𝑘

Adaptive Coding

– With the prototype 𝑃 and the assignment index 𝑖∗(𝑋), from 2𝑏 codes find a subset most consistent with the prototypes.

Codebook

101𝑝1

min𝑐𝑘∈ ሚ𝐶

𝑖∗ 𝑥𝑖 =𝑘

𝑘′≠𝑘

𝑤𝑘′ 𝜆𝑑𝑜 𝑥𝑖 , 𝑝𝑘′ − 𝑑ℎ 𝑐𝑘 , 𝑐𝑘′2+

𝑖∗ 𝑥𝑖 ≠𝑘

𝑤𝑘 𝜆𝑑𝑜 𝑥𝑖 , 𝑝𝑘 − 𝑑ℎ 𝑐𝑖∗ 𝑥 , 𝑐𝑘2

Prototype Update

– With the codebook 𝐶 and the assignment index 𝑖∗(𝑋), find the prototypes 𝑃 that can simultaneously capture the data distribution and align with the geometric structure in the code space

– prototypes 𝑃 might be shrunk, and thus gradually adapt the binary codes to the data distribution

Distribution Update

– an assignment updating step to capture the distribution variation

min𝑘′≤ 𝐶

𝑘=1

𝑤𝑘 𝜆𝑑𝑜 𝑥𝑖 , 𝑝𝑘 − 𝑑ℎ 𝑐𝑘′ , 𝑐𝑘2 𝑝𝑘 =

𝑤𝑘

𝑖∗ 𝑥𝑖 =𝑘

𝑥𝑖

𝑖∗ 𝑥𝑖 = arg min𝑘≤ 𝑃

𝑑𝑜 𝑥𝑖 , 𝑝𝑘

Initialization

– k-means clustering to initialize the prototypes 𝑃

– 2𝑏 prototypes and codes to initialize scale parameter 𝜆

Product Quantization

– Generating long (𝑏∗) hash codes by

(1) dividing the original space into 𝑀 = 𝑏∗/𝑏 subspaces

(2) adaptive binary quantization in each subspace

Adaptive Binary Quantization: Algorithm Details (1)

Complexity

– Training: for 𝑛 training samples of dimension 𝑑, to generate 𝑏∗ binary codes, the complexity 𝑂(22𝑏𝑡 ∙ 𝑛𝑑), almost linear to 𝑛 (𝑏 ≤ 8 and # iteration 𝑡 ≤ 20)

– Testing: 𝑂(|𝑃|𝑑), close to the linear projection based hashing

Adaptive Binary Quantization: Algorithm Details (2)

0 10 20

# iterations

0 10 20

# iterations

0 10 201

# iterations

Experiments

Datasets

– SIFT-1M: 1 million 128-D SIFT; GIST-1M: 1 million 960-D GIST

– SIFT-20M: 20 millions 128-D SIFT; Tiny-80M: 80 millions 384-D GIST

Baselines:

– Projection based: LSH, SH, KLSH, AGH, ITQ, KBE

– Prototype based: SPH, KMH

Setting:

– 50,000 and 100,000 training samples and 3,000 queries on each set

– The groundtruth for each query is defined as the top 1,000 nearest neighbors on SIFT-1M, GIST-1M and SIFT-20M, and 5,000 on Tiny-80M based on Euclidean distances

– Average performance of 10 independent runs

Experiments: precision performance

Experiments: recall performance

Experiments: effect of #groundtruth

One observation: in prototype based hashing there might exist a better coding solution that only utilizes a small subset of binary codes instead of the complete set

An adaptive binary quantization method: jointly pursues a set of prototypes in the original space and a subset of binary codes in the Hamming space.

Good properties: enjoys fast computation and the capability of generating long hash codes in product space, with discriminative power for nearest neighbor search.

Encouraging performance: significantly outperforms existing methods on several large datasets, encouraging the further study on the effective binary quantization

Conclusion

Easy to extend for distributed system

– Distributed (map-reduce) + Parallel (PQ)

Future work

Thank you!http://www.nlsde.buaa.edu.cn/~xlliu

Adaptive Binary Quantization for Fast Nearest Neighbor...

Documents

Polarization-multiplexed rate-adaptive non- binary-quasi ...ivan/OE_Murat_01_10.pdf · Polarization-multiplexed rate-adaptive non-binary-quasi-cyclic-LDPC-coded multilevel modulation

Learning Deep Binary Descriptor With Multi-Quantization · 2017-05-31 · learning method called deep binary descriptor with multi-quantization (DBD-MQ) for visual matching. Existing

One-Bit Radar Imaging Via Adaptive Binary Iterative Hard

Context-based adaptive binary arithmetic coding in … IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 7, JULY 2003 Context-Based Adaptive Binary Arithmetic

Self Adaptive Simulated Binary Crossover

OPTIMAL BINARY QUANTIZATION FOR DEEP NEURAL …

Project on Adaptive Binary Coding for Diversity in Communication Systems

Adaptive Binary Quantization for Fast Nearest … a database = =1… and a query , the nearest neighbor of : ∗∈ , such that ... –Projection based: LSH, SH, KLSH, AGH, ITQ, KBE

ADAPTIVE STRUCTURING OF BINARY SEARCH TREES USING

Adaptive Binary Search Treesjonderry/thesis.pdfKeywords: binary search trees, adaptive algorithms, splay trees, Uniﬁed Bound, dy-namic optimality, BST model, lower bounds, partial-sums

Vector quantization of images using modified adaptive ...techlab.bu.edu/files/resources/articles_tt/Vector...VLAJIC AND CARD: VECTOR QUANTIZATION OF IMAGES 1149 For thesake of thefollowing

ADAPTIVE QUANTIZATION OF NEURAL NETWORKS

ADAPTIVE DECOUPLING CONTROL FOR BINARY DISTILLATION COLUMN

Adaptive quantization methods

Distance Encoded Product Quantization...Distance Encoded Product Quantization Jae-Pil Heo 1, Zhe Lin2, Sung-Eui Yoon 1 KAIST 2 Adobe Research Abstract Many binary code embedding techniques

COLOR IMAGE QUANTIZATIONcse.iitd.ac.in/~pkalra/csl783/assignment1/p297-heckbert.pdf · Uniform quantization, though compu- tationally much faster than adaptive, tapered quantization,

Up or Down? Adaptive Rounding for Post-Training Quantization

Adaptive Edge-Based Side-Match Finite-State Classified Vector Quantization with Quadtree Map

Adaptive Quantization Using Piecewise Companding and ... · Adaptive Quantization Using Piecewise Companding and Scaling for Gaussian Mixture Lei Yang and Dapeng Wu leiyang@uﬂ.edu,

Robust Face Hallucination using Quantization-Adaptive ... · Robust Face Hallucination using Quantization-Adaptive Dictionaries Reuben Farrugia Christine Guillemot IEEE Int. Conf