
Multi-dimensional Search Trees CS302 Data Structures Modified from Dr George Bebis


Multi-dimensional Search Trees

CS302 Data Structures

Modified from Dr George Bebis


Query Types

Exact match query: Asks for the object(s) whose key matches query key exactly.

Range query: Asks for the objects whose key lies in a specified query range (interval).

Nearest-neighbor query: Asks for the objects whose key is “close” to query key.


Exact Match Query

Suppose that we store employee records in a database:

ID Name Age Salary #Children

Example: key=ID: retrieve the record with ID=12345


Range Query

Examples:

key=Age: retrieve all records satisfying 20 < Age < 50

key=#Children: retrieve all records satisfying 1 < #Children < 4

Nearest-Neighbor(s) (NN) Query

Examples:

key=Salary: retrieve the employee whose salary is closest to $50,000 (i.e., 1-NN).

key=Age: retrieve the 5 employees whose age is closest to 40 (i.e., k-NN, k=5).
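The three query types can be illustrated on a small in-memory table. This is only a sketch: the `Employee` class, the sample records, and the function names are invented for illustration, and each query is answered by a plain linear scan (no index), just to pin down what each query returns.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    id: int
    name: str
    age: int
    salary: int
    children: int

# Hypothetical records, invented purely for illustration.
RECORDS = [
    Employee(12345, "Ann", 34, 52000, 2),
    Employee(23456, "Bob", 45, 48000, 0),
    Employee(34567, "Cy", 29, 61000, 1),
    Employee(45678, "Dee", 41, 50500, 3),
]

def exact_match(records, emp_id):
    """Exact match query on key=ID."""
    return [r for r in records if r.id == emp_id]

def range_query(records, low, high):
    """Range query on key=Age: low < Age < high."""
    return [r for r in records if low < r.age < high]

def k_nearest(records, target_salary, k=1):
    """k-NN query on key=Salary: the k records closest to target_salary."""
    return sorted(records, key=lambda r: abs(r.salary - target_salary))[:k]
```

Each function scans all n records; the search trees discussed later aim to answer the same queries without touching every record.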


Nearest Neighbor(s) Query

What is the closest restaurant to my hotel?


Nearest Neighbor(s) Query (cont’d)

Find the 4 closest restaurants to my hotel


Multi-dimensional Query

In practice, queries might involve multi-dimensional keys.

key=(Name, Age): retrieve all records with Name=“George” and 50 <= Age <= 70

Nearest Neighbor Query in High Dimensions

Very important and practical problem!

Image retrieval: each image is described by a feature vector (f1, f2, ..., fk); find the N closest matches (i.e., N nearest neighbors).

Nearest Neighbor Query in High Dimensions

Face recognition: find the closest match (i.e., nearest neighbor).

We will discuss …

Range trees

KD-trees


Interpreting Queries Geometrically

Multi-dimensional keys can be thought of as “points” in high-dimensional spaces.

Queries about records become queries about points.

Example 1 – Range Search in 2D

age = 10,000 x year + 100 x month + day
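The formula above packs a (year, month, day) triple into one integer so that a date can serve as a single coordinate. A minimal sketch, assuming the triple is a calendar date such as a date of birth; the function names are hypothetical:

```python
def encode_date(year, month, day):
    """Pack a date into one integer: 10,000 x year + 100 x month + day."""
    return 10_000 * year + 100 * month + day

def decode_date(code):
    """Recover (year, month, day) from the packed integer."""
    year, rest = divmod(code, 10_000)
    month, day = divmod(rest, 100)
    return year, month, day
```

Because month < 100 and day < 100, comparing two packed integers gives the same order as comparing the dates themselves, which is what a range query on this coordinate needs.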


Example 2 – Range Search in 3D


Example 3 – Nearest Neighbors Search


1D Range Search


• Updates take O(n) time

• Does not generalize well to high dimensions.

Example: retrieve all points in [25, 90]

Range: [x, x’]
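The name of the first data structure did not survive in the slide text. The sketch below assumes it is a sorted array, which is consistent with the O(n) update cost listed above. Two binary searches locate the ends of the range; the sample values are invented for illustration.

```python
from bisect import bisect_left, bisect_right

def range_search_sorted(points, x, x_prime):
    """Report all values in [x, x_prime] from a sorted list.

    Two binary searches cost O(log n); slicing out the k answers costs O(k).
    """
    lo = bisect_left(points, x)         # first index with value >= x
    hi = bisect_right(points, x_prime)  # first index with value > x_prime
    return points[lo:hi]

# Hypothetical data, already sorted.
DATA = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105]
```

Inserting or deleting a value means shifting array elements, which is where the O(n) update time comes from.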


1D Range Search

Data Structure 2: BST

Search using the binary search property. Some subtrees are eliminated during the search.

Range: [l, r]. At a node with key x, search using:

if l <= x, search the left subtree
if x <= r, search the right subtree

Example: retrieve all points in [25, 90]
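The BST search rule can be sketched as follows. This is an illustrative, unbalanced BST; the `Node` class, the `insert` helper, and the sample keys are assumptions rather than part of the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root, key):
    """Plain BST insert: smaller keys go left, equal or larger keys go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def range_search(node, l, r, out):
    """Collect all keys in [l, r] in sorted order, skipping useless subtrees."""
    if node is None:
        return out
    if l <= node.key:               # left subtree may still hold keys >= l
        range_search(node.left, l, r, out)
    if l <= node.key <= r:
        out.append(node.key)
    if node.key <= r:               # right subtree may still hold keys <= r
        range_search(node.right, l, r, out)
    return out

root = None
for k in [50, 25, 75, 10, 39, 55, 120, 90]:   # hypothetical keys
    root = insert(root, k)
```

When a node's key falls outside [l, r], one of the two tests fails and that whole subtree is eliminated.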


1D Range Search

Data Structure 3: BST with data stored in leaves

Internal nodes store splitting values (i.e., not necessarily the same as the data).

Data points are stored in the leaf nodes.

BST with data stored in leaves

Data: 10, 39, 55, 120

Splitting values (internal nodes): 50 at the root, with 25 and 75 as its left and right children.

Leaves: 10 and 39 under 25; 55 and 120 under 75.

1D Range Search

Retrieving data in [x, x’]:

Perform binary search twice, once using x and once using x’.

Suppose the binary searches end at leaves l and l’.

The points in [x, x’] are the ones stored between l and l’ plus, possibly, the points stored in l and l’.

1D Range Search

Example: retrieve all points in [25, 90]

The search path for 25 is: (figure)

The search path for 90 is: (figure)

1D Range Search

Example: retrieve all points in [25, 90]

Examine the leaves in the subtrees between the two traversing paths from the root. The two paths separate at the split node.

1D Range Search – Another Example


1D Range Search

How do we find the leaves of interest?

Find the split node (i.e., the node where the paths to x and x’ split).

On the path to x, at each left turn: report the leaves in the right subtree.

On the path to x’, at each right turn: report the leaves in the left subtree.

O(log n + k) time, where k is the number of items reported.
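The split-node procedure can be sketched on a balanced BST with data in the leaves. Assumptions made here for illustration: values are distinct, each internal node stores the largest value of its left subtree as its splitting value, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: float

@dataclass
class Inner:
    split: float            # largest value stored in the left subtree
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Inner]

def build(sorted_vals):
    """Build a balanced leaf-based BST from a non-empty sorted list."""
    if len(sorted_vals) == 1:
        return Leaf(sorted_vals[0])
    mid = (len(sorted_vals) - 1) // 2
    return Inner(sorted_vals[mid], build(sorted_vals[:mid + 1]), build(sorted_vals[mid + 1:]))

def leaves(v, out):
    """Report every leaf below v."""
    if isinstance(v, Leaf):
        out.append(v.value)
    else:
        leaves(v.left, out)
        leaves(v.right, out)

def range_query(root, x, x2):
    """Report all values in [x, x2] (order of the result is not sorted)."""
    out = []
    v = root
    # Find the split node: descend while both searches go the same way.
    while isinstance(v, Inner) and (x2 <= v.split or x > v.split):
        v = v.left if x2 <= v.split else v.right
    if isinstance(v, Leaf):
        if x <= v.value <= x2:
            out.append(v.value)
        return out
    split = v
    v = split.left                      # path to x
    while isinstance(v, Inner):
        if x <= v.split:                # left turn: right subtree is in range
            leaves(v.right, out)
            v = v.left
        else:
            v = v.right
    if x <= v.value <= x2:
        out.append(v.value)
    v = split.right                     # path to x2
    while isinstance(v, Inner):
        if x2 > v.split:                # right turn: left subtree is in range
            leaves(v.left, out)
            v = v.right
        else:
            v = v.left
    if x <= v.value <= x2:
        out.append(v.value)
    return out

TREE = build([3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105])  # hypothetical data
```

The two root-to-leaf paths have O(log n) nodes each, and every reported subtree contains only answers, which gives the O(log n + k) bound.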


1D Range Search

Speed up the search by keeping the leaves in sorted order using a linked list.

2D Range Search

2D Range Search (cont’d)

A 2D range query can be decomposed into two 1D range queries:

One on the x-coordinates of the points.

The other on the y-coordinates of the points.

2D Range Search (cont’d)

Store a primary 1D range tree for all the points based on x-coordinate.

For each node of the primary tree, store a secondary 1D range tree, based on y-coordinate, over the points in that node's subtree.

2D Range Search (cont’d)

Range Tree

Space requirements: O(n log n)

2D Range Search (cont’d)

Search using the x-coordinate only. How to restrict to points with proper y-coordinate?


2D Range Search (cont’d)

Recursively search within each subtree using the y-coordinate.
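A compact sketch of the 2D range tree idea follows. It simplifies the slides in one respect, stated up front: the secondary structure at each primary node is a list sorted by y (searched with binary search) rather than a second tree. All names and the sample points are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class XNode:
    xmin: float                  # smallest x in this subtree
    xmax: float                  # largest x in this subtree
    by_y: List[Point]            # secondary structure: subtree points sorted by y
    ys: List[float]              # y-keys of by_y, used for binary search
    left: Optional["XNode"] = None
    right: Optional["XNode"] = None

def build(points_by_x):
    """Build the primary tree from points already sorted by x."""
    if not points_by_x:
        return None
    by_y = sorted(points_by_x, key=lambda p: p[1])
    node = XNode(points_by_x[0][0], points_by_x[-1][0], by_y, [p[1] for p in by_y])
    if len(points_by_x) > 1:
        mid = len(points_by_x) // 2
        node.left = build(points_by_x[:mid])
        node.right = build(points_by_x[mid:])
    return node

def query(node, x1, x2, y1, y2, out):
    """Report points with x1 <= x <= x2 and y1 <= y <= y2."""
    if node is None or node.xmax < x1 or node.xmin > x2:
        return out                              # subtree is outside the x-range
    if x1 <= node.xmin and node.xmax <= x2:
        # Whole subtree is inside the x-range: answer a 1D query on y.
        lo = bisect_left(node.ys, y1)
        hi = bisect_right(node.ys, y2)
        out.extend(node.by_y[lo:hi])
        return out
    query(node.left, x1, x2, y1, y2, out)
    query(node.right, x1, x2, y1, y2, out)
    return out

PTS = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]   # hypothetical points
ROOT = build(sorted(PTS))
```

Every point appears in the y-sorted list of each of its O(log n) ancestors, which is where the O(n log n) space comes from.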


Range Search in d dimensions

1D query time: O(log n + k)

2D query time: O(log^2 n + k)

d dimensions: O(log^d n + k)

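KD Tree

• A binary search tree where every node is a k-dimensional point.

Example: k=2 (the root splits on x, the next level on y, and so on)

(53, 14): left (27, 28), right (65, 51)
(27, 28): left (30, 11), right (31, 85)
(65, 51): left (70, 3), right (99, 90)
(30, 11): left (29, 16), right (40, 26)
(31, 85): left (7, 39), right (32, 29)
(99, 90): left (82, 64)
(40, 26): left (38, 23)
(7, 39): right (15, 61)
(82, 64): left (55, 62), right (73, 75)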

KD Tree (cont’d)

Example: data stored at the leaves


KD Tree (cont’d)

• Every node (except leaves) represents a hyperplane that divides the space into two parts.

• Points to the left (right) of this hyperplane belong to the left (right) sub-tree of that node.

KD Tree (cont’d)

As we move down the tree, we divide the space along axis-aligned hyperplanes, usually (but not always) alternating between the axes:

Split by x-coordinate: split by a vertical line that has (ideally) half the points to its left or on it, and half to its right.

Split by y-coordinate: split by a horizontal line that has (ideally) half the points below or on it, and half above.

KD Tree - Example

Split by x-coordinate: split by a vertical line that has approximately half the points to its left or on it, and half to its right.

Split by y-coordinate: split by a horizontal line that has half the points below or on it, and half above.

Split by x-coordinate again within each region: a vertical line with half the points to its left or on it, and half to its right.

Split by y-coordinate again: a horizontal line with half the points below or on it, and half above.

Node Structure

A KD-tree node has 5 fields:

• Splitting axis
• Splitting value
• Data
• Left pointer
• Right pointer
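One possible way to write the five-field node down, as a sketch; the class and field names are hypothetical choices, not taken from the slides:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KDNode:
    axis: int                                   # splitting axis (0 = x, 1 = y, ...)
    value: float                                # splitting value along that axis
    data: Optional[Tuple[float, ...]] = None    # the point stored here, if any
    left: Optional["KDNode"] = None             # points below the splitting value
    right: Optional["KDNode"] = None            # points at or above the splitting value

# Example: the root of a 2D tree that splits on x at 53 and stores (53, 14).
root = KDNode(axis=0, value=53, data=(53, 14))
```

When data is stored only at the leaves, internal nodes would leave `data` empty and use only the axis, value, and the two pointers.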


Splitting Strategies

Divide based on order of point insertion
• Assumes that points are given one at a time.

Divide by finding median
• Assumes all the points are available ahead of time.

Divide perpendicular to the axis with widest spread
• Split axes might not alternate.

… and more!
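The median strategy can be sketched as below, with data stored at the nodes and the axes alternating by depth. The names and sample points are hypothetical. For brevity this version re-sorts at every level, which costs more than presorting the points once per dimension.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KDNode:
    axis: int
    value: float
    data: Tuple[float, ...]
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build_kdtree(points, depth=0):
    """Build a KD-tree by splitting at the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])               # alternate x, y, x, y, ...
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    # Step back over ties so the left subtree holds strictly smaller coordinates.
    while mid > 0 and pts[mid - 1][axis] == pts[mid][axis]:
        mid -= 1
    return KDNode(
        axis=axis,
        value=pts[mid][axis],
        data=pts[mid],
        left=build_kdtree(pts[:mid], depth + 1),
        right=build_kdtree(pts[mid + 1:], depth + 1),
    )

root = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])  # hypothetical points
```

With distinct coordinates each split leaves roughly half the points on each side, so the tree has O(log n) depth.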


Example – using order of point insertion

(data stored at nodes)


Example – using median (data stored at the leaves)


Another Example – using median


Example – split perpendicular to the axis with widest spread


KD Tree (cont’d)

Let’s discuss:

• Insert
• Delete
• Search

Insert (55, 62)

Start at the root and alternate the compared coordinate (x, y, x, y):

At (53, 14), compare x: 55 > 53, move right.

At (65, 51), compare y: 62 > 51, move right.

At (99, 90), compare x: 55 < 99, move left.

At (82, 64), compare y: 62 < 64, move left.

Null pointer: attach the new node holding (55, 62) here.
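The insertion walk can be sketched as follows, with an exact-match lookup that follows the same path. The class and function names are hypothetical, and the order in which the existing points are inserted (level by level) is an assumption chosen so that the tree matches the slide's example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KDNode:
    point: Tuple[int, int]
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def insert(node, point, depth=0):
    """Walk down comparing one coordinate per level; attach at a null pointer."""
    if node is None:
        return KDNode(point)
    axis = depth % len(point)                  # x at even depth, y at odd depth
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node

def contains(node, point):
    """Exact search: follow the same comparisons used by insert."""
    depth = 0
    while node is not None:
        if node.point == point:
            return True
        axis = depth % len(point)
        node = node.left if point[axis] < node.point[axis] else node.right
        depth += 1
    return False

# Assumed insertion order (level by level) for the example tree.
EXISTING = [(53, 14), (27, 28), (65, 51), (30, 11), (31, 85), (70, 3), (99, 90),
            (29, 16), (40, 26), (7, 39), (32, 29), (82, 64), (38, 23), (15, 61), (73, 75)]
root = None
for p in EXISTING:
    root = insert(root, p)
root = insert(root, (55, 62))
```

The new point ends up at right, right, left, left from the root, matching the four comparisons listed above.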


KD Tree – Exact Search


KD Tree - Range Search

Query range: [35, 40] x [23, 30], i.e., low[0] = 35, high[0] = 40; low[1] = 23, high[1] = 30.

At each node t (where level selects the coordinate compared at that depth):

In range? If so, print cell.

If low[level] <= data[level], search t.left.

If high[level] >= data[level], search t.right.

Searching is “preorder”. Efficiency is obtained by “pruning” subtrees from the search.

For example, the subtree rooted at (65, 51) is never searched, because high[0] = 40 < 53 at the root (53, 14).
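The range search rules can be sketched as below on the example tree. The names are hypothetical, the tree is built by inserting the points level by level (an assumed order that reproduces the example), and the extra `visited` list exists only to make the pruning visible.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KDNode:
    point: Tuple[int, int]
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def insert(node, point, depth=0):
    if node is None:
        return KDNode(point)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node

def range_search(t, low, high, level=0, found=None, visited=None):
    """Preorder range search; returns (points in range, nodes visited)."""
    found = [] if found is None else found
    visited = [] if visited is None else visited
    if t is None:
        return found, visited
    visited.append(t.point)
    if all(lo <= c <= hi for lo, c, hi in zip(low, t.point, high)):
        found.append(t.point)                  # in range: report it
    axis = level % len(low)
    if low[axis] <= t.point[axis]:             # range reaches the left side
        range_search(t.left, low, high, level + 1, found, visited)
    if high[axis] >= t.point[axis]:            # range reaches the right side
        range_search(t.right, low, high, level + 1, found, visited)
    return found, visited

POINTS = [(53, 14), (27, 28), (65, 51), (30, 11), (31, 85), (70, 3), (99, 90),
          (29, 16), (40, 26), (7, 39), (32, 29), (82, 64), (38, 23), (15, 61), (73, 75)]
root = None
for p in POINTS:
    root = insert(root, p)
found, visited = range_search(root, low=(35, 23), high=(40, 30))
```

On this tree the query visits 7 of the 15 nodes and reports (40, 26) and (38, 23); the whole right subtree of the root is skipped.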


KD Tree (vs Range tree)

• Construction: O(dn log n)
  • Sort points in each dimension: O(dn log n)
  • Determine splitting line (median finding): O(dn)

• Space requirements:
  • KD tree: O(n)
  • Range tree: O(n log^(d-1) n)

• Query requirements:
  • KD tree: O(n^(1-1/d) + k), which approaches O(n + k) as d increases!
  • Range tree: O(log^d n + k)

Nearest Neighbor (NN) Search

Given: a set P of n points in R^d

Goal: find the nearest neighbor p of q in P

Euclidean distance: for p = (x_1, y_1) and q = (x_2, y_2),

$$d(p, q) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

Nearest Neighbor Search - Variations

r-search: the distance tolerance is specified.

k-nearest-neighbor-queries: the number of close matches is specified.


Nearest Neighbor (NN) Search

Naïve approach: compute the distance from the query point to every other point in the database, keeping track of the "best so far".

Running time is O(n).
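The naïve scan is short enough to write out in full. A minimal sketch using Euclidean distance; the function name and the sample points are hypothetical:

```python
import math

def nearest_neighbor(points, q):
    """Linear scan: keep the best point seen so far. O(n) distance computations."""
    best, best_dist = None, math.inf
    for p in points:
        d = math.dist(p, q)          # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best, best_dist = p, d
    return best, best_dist

PTS = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]   # hypothetical points
best, dist = nearest_neighbor(PTS, (9, 2))
```

This is the baseline that the grid and KD-tree methods try to beat by avoiding most of the distance computations.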


Array (Grid) Structure

(1) Subdivide the plane into a grid of M x N square cells (same size)

(2) Assign each point to the cell that contains it.

(3) Store as a 2-D (or N-D in general) array: each cell contains a link to a list of the points stored in that cell.

Array (Grid) Structure

Algorithm

* Look up the cell holding the query point.

* First examine the cell containing the query, then the cells adjacent to it (i.e., there could be points in adjacent cells that are closer).

Comments

* A uniform grid is inefficient if the points are unequally distributed.

- Too close together: long lists in each cell, serial search.
- Too far apart: search a large number of neighbors.

* A multiresolution grid can address some of these issues.
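A sketch of the grid structure and its nearest-neighbor lookup follows. Two choices here go beyond the slides and are assumptions: cells are kept in a dictionary keyed by cell index instead of a dense 2-D array, and the search continues outward ring by ring (not just to the adjacent cells) until no farther ring can contain a closer point. All names and sample points are hypothetical.

```python
import math
from collections import defaultdict

class Grid:
    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)          # cell index -> list of points
        for p in points:
            self.cells[self._cell(p)].append(p)

    def _cell(self, p):
        """Index of the square cell that contains point p."""
        return (math.floor(p[0] / self.cell_size), math.floor(p[1] / self.cell_size))

    def nearest(self, q):
        if not self.cells:
            return None
        cx, cy = self._cell(q)
        # Farthest occupied cell, measured in rings around the query cell.
        max_r = max(max(abs(ix - cx), abs(iy - cy)) for ix, iy in self.cells)
        best, best_d = None, math.inf
        for r in range(max_r + 1):
            # Any point in ring r is more than (r - 1) * cell_size away from q.
            if best_d <= (r - 1) * self.cell_size:
                break
            for dx in range(-r, r + 1):
                for dy in range(-r, r + 1):
                    if max(abs(dx), abs(dy)) != r:
                        continue                # not on ring r
                    for p in self.cells.get((cx + dx, cy + dy), ()):
                        d = math.dist(p, q)
                        if d < best_d:
                            best, best_d = p, d
        return best

PTS = [(1, 1), (2, 8), (6, 5), (9, 9), (4, 3)]   # hypothetical points
grid = Grid(PTS, cell_size=2)
```

When points cluster in a few cells the inner loop degenerates to a serial scan of long lists, which is the weakness noted in the comments above.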


Nearest Neighbor with KD Trees

Traverse the tree, looking for the rectangle that contains the query.

Explore the branch of the tree that is closest to the query point first.

When we reach a leaf, compute the distance to each point in the node.

Then, backtrack and try the other branch at each node visited.

Each time a new closest node is found, we can update the distance bounds.

Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
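The descend, backtrack, and prune steps can be sketched as below. This version simplifies the slides in two ways, both assumptions: a point is stored at every node (instead of buckets of points at the leaves), and the pruning test uses only the distance to the splitting plane rather than full bounding rectangles. The names are hypothetical and the sample points are those of the earlier example tree.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KDNode:
    point: Tuple[float, ...]
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build(points, depth=0):
    """Median-split construction with alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def nearest(node, q, depth=0, best=None):
    """Return (point, distance) of the nearest neighbor of q."""
    if node is None:
        return best
    d = math.dist(node.point, q)
    if best is None or d < best[1]:
        best = (node.point, d)                  # new closest point: tighten the bound
    axis = depth % len(q)
    diff = q[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, depth + 1, best)    # closest branch first
    if abs(diff) < best[1]:                     # far side could still hold a closer point
        best = nearest(far, q, depth + 1, best)
    return best

POINTS = [(53, 14), (27, 28), (65, 51), (30, 11), (31, 85), (70, 3), (99, 90),
          (29, 16), (40, 26), (7, 39), (32, 29), (82, 64), (38, 23), (15, 61), (73, 75)]
root = build(POINTS)
```

The far branch is skipped whenever the splitting plane is at least as far from the query as the best point found so far, since nothing beyond the plane can then be closer.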


K-Nearest Neighbor Search

Can find the k nearest neighbors to a query by maintaining the k current bests instead of just one.

Branches are eliminated only when they cannot contain a point closer than the farthest of the k current bests.
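A sketch of the k-nearest variant keeps the k current bests in a max-heap, so the farthest of them is always at the top and serves as the pruning bound. As assumptions for illustration: points are stored at every node, k is at least 1, and all names are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KDNode:
    point: Tuple[float, ...]
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build(points, depth=0):
    """Median-split construction with alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def k_nearest(root, q, k):
    """Return the k nearest (distance, point) pairs, closest first (k >= 1)."""
    heap = []   # entries are (-distance, point); heap[0] is the farthest current best

    def visit(node, depth):
        if node is None:
            return
        d = math.dist(node.point, q)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.point))   # evict the farthest best
        axis = depth % len(q)
        diff = q[axis] - node.point[axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near, depth + 1)
        # Prune the far branch only if k bests exist and none can be beaten there.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far, depth + 1)

    visit(root, 0)
    return sorted((-neg_d, p) for neg_d, p in heap)

POINTS = [(53, 14), (27, 28), (65, 51), (30, 11), (31, 85), (70, 3), (99, 90),
          (29, 16), (40, 26), (7, 39), (32, 29), (82, 64), (38, 23), (15, 61), (73, 75)]
root = build(POINTS)
```

Until k candidates have been collected no branch is pruned, which is why the search does more work as k grows.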


KD variations - PCP Trees

Splits can be in directions other than x and y.

Divide points perpendicular to the axis with widest spread.

Principal Component Partitioning (PCP)


“Curse” of dimensionality

KD-trees are not suitable for efficiently finding the nearest neighbor in high-dimensional spaces.

Query time: O(n^(1-1/d) + k)

Approximate Nearest-Neighbor (ANN)

• Examine only the N closest bins of the KD-tree.
• Use a heap to identify bins in order by their distance from the query.
• Return nearest neighbors with high probability (e.g., 95%).

J. Beis and D. Lowe, “Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces”, IEEE Computer Vision and Pattern Recognition, 1997.

Dimensionality Reduction

Idea: Find a mapping T to reduce the dimensionality of the data.

Drawback: May not be able to find all similar objects (i.e., distance relationships might not be preserved).