UNIT 4.ppt


UNIT IV: MULTIMEDIA STRUCTURES

SEGMENT TREES
• Originally introduced by Bentley (1977).
• Handle intervals on the real line whose end-points belong to a fixed set of N abscissae.
• A static structure with respect to the abscissae, i.e. it does not support insertions and deletions of abscissae.
• The abscissae can be normalized by replacing each of them by its rank in their left-to-right order.
• Hence, without loss of generality, we may consider these abscissae as the integers in the range [1,N].

SEGMENT TREES
• Consider a set of N abscissae on the x-axis normalized to the integers [1,N] by their rank.
• These N abscissae determine N-1 elementary intervals [i,i+1], for i = 1,2,…,N-1.

SEGMENT TREES
• A segment tree is a rooted binary tree.
• Each node x is assigned a static interval int[x].
• Recursive construction of T(L,R), where 1 ≤ L < R ≤ N are integers:
  – Root r: int[r] = [L,R]
  – For each node x ∈ T_r, if high[int[x]] - low[int[x]] > 1, then a left subtree T_left[x] and a right subtree T_right[x] are constructed such that
    int[left[x]] = [ low[int[x]] , mid[int[x]] ]
    int[right[x]] = [ mid[int[x]] , high[int[x]] ]
    where mid[int[x]] = (low[int[x]] + high[int[x]]) / 2
• For each node x ∈ T_y, int[x] ⊆ int[y]

SEGMENT TREES
• The leaf nodes of T_y store the elementary intervals [i,i+1] for i = low[int[y]], …, high[int[y]]-1.
• The intervals of the nodes of T(L,R) are called the standard intervals of T(L,R).
• T(L,R) is balanced:
  – All leaves belong to two consecutive levels.
  – Height (depth): h(T(L,R)) = ⌈lg(R-L)⌉, so h(T(1,N)) = ⌈lg(N-1)⌉.
• An arbitrary interval i ⊆ [1,N] will be partitioned into at most ⌈lg(N-1)⌉ + ⌈lg(N-1)⌉ - 2 = 2h(T)-2 standard intervals of T(1,N).
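
A minimal Python sketch of this recursive construction (the class name SegmentTreeNode and its field names are illustrative, not from the slides):

  class SegmentTreeNode:
      def __init__(self, low, high):
          self.low, self.high = low, high   # standard interval int[x] = [low, high]
          self.C = 0                        # number of intervals allocated to this node
          self.Z = set()                    # the intervals allocated to this node
          self.left = self.right = None
          if high - low > 1:                # split until the interval is elementary [i, i+1]
              mid = (low + high) // 2
              self.left = SegmentTreeNode(low, mid)
              self.right = SegmentTreeNode(mid, high)

  root = SegmentTreeNode(1, 17)             # builds the tree T(1,17) shown below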

The Segment Tree T(1,17)

Leaves: 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12 12-13 13-14 14-15 15-16 16-17
Level above the leaves: 1-3 3-5 5-7 7-9 9-11 11-13 13-15 15-17
Next level: 1-5 5-9 9-13 13-17
Children of the root: 1-9 9-17
Root: 1-17

The Segment Tree T(4,15)

Deepest level: 7-8 8-9 10-11 11-12 13-14 14-15
Next level: 4-5 5-6 6-7 7-9 9-10 10-12 12-13 13-15
Next level: 4-6 6-9 9-12 12-15
Children of the root: 4-9 9-15
Root: 4-15

Insert Operation in Segment Trees
• Pseudocode for inserting an interval i into a segment tree T
• Initial invocation: INSERT(root[T], i)

INSERT(x, i)
  if low[i] ≤ low[int[x]] and high[i] ≥ high[int[x]]
    then C[x] ← C[x] + 1
         Z[x] ← Z[x] ∪ {i}
    else midx ← (low[int[x]] + high[int[x]]) / 2
         if low[i] < midx then INSERT(left[x], i)
         if high[i] > midx then INSERT(right[x], i)

The three cases in INSERT(x, i):
CASE (1): i covers int[x], so i is allocated to x
CASE (2): low[i] < midx, so the insertion recurses into left[x]
CASE (3): high[i] > midx, so the insertion recurses into right[x]

Insert Operation in Segment Trees

• Z[x] : the set of intervals allocated to node x, i.e. the set of intervals overlapping with the standard interval int[x]

• C[x] : the cardinality of the set Z[x], i.e. the number of intervals overlapping with the standard interval int[x]

Deletion in Segment Trees
• Pseudocode for deleting an interval i from a segment tree T
• Initial invocation: DELETE(root[T], i)

DELETE(x, i)
  if low[i] ≤ low[int[x]] and high[i] ≥ high[int[x]]
    then C[x] ← C[x] - 1
         Z[x] ← Z[x] - {i}
    else midx ← (low[int[x]] + high[int[x]]) / 2
         if low[i] < midx then DELETE(left[x], i)
         if high[i] > midx then DELETE(right[x], i)
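
The INSERT and DELETE routines above can be rendered in Python roughly as follows, reusing the SegmentTreeNode sketch from earlier and representing an interval i as a (low, high) tuple (a sketch, not the slides' own code):

  def insert(x, i):
      lo, hi = i
      if lo <= x.low and hi >= x.high:      # CASE (1): i covers the standard interval of x
          x.C += 1
          x.Z.add(i)
      else:
          mid = (x.low + x.high) // 2
          if lo < mid:                      # CASE (2): i reaches into the left half
              insert(x.left, i)
          if hi > mid:                      # CASE (3): i reaches into the right half
              insert(x.right, i)

  def delete(x, i):
      lo, hi = i
      if lo <= x.low and hi >= x.high:
          x.C -= 1
          x.Z.discard(i)
      else:
          mid = (x.low + x.high) // 2
          if lo < mid:
              delete(x.left, i)
          if hi > mid:
              delete(x.right, i)

  # e.g. insert(root, (5, 11)) and insert(root, (7, 13)) reproduce the allocation
  # of the intervals a and b in the T(1,17) example further below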

Insertion to a Segment Tree
• CASE (1) is mutually exclusive with CASE (2) and CASE (3)

[Figure: the possible positions of an interval i relative to int[x], int[left[x]] and int[right[x]], each labelled with the applicable cases (1), (2), (3) or (2)&(3)]

Operation of INSERT(root[T], i)
• Corresponds to a tour of T with the following structure:
• An initial path P_INIT from the root to a fork node x*
  – P_INIT may be empty (e.g. x* = root[T])
• Two paths P_L and P_R may stem from the fork node x*
  – P_L and P_R are paths in T_left[x*] and T_right[x*], respectively
• Either interval i is allocated entirely to the fork node x*
  – In this case P_L and P_R are both empty
• Or, all right children of nodes on P_L which are not themselves on P_L, together with all left children of nodes on P_R which are not themselves on P_R, identify the fragmentation (allocation) of the interval i

Operation of INSERT(root[T], i)

• For all x ∈ P_INIT
  – either (2) or (3) holds (not both), so that
  – either i ⊆ int[left[x]] or i ⊆ int[right[x]]

[Figure: the two possible positions of i inside int[x]]

Operation of INSERT(root[T], i)
• At the fork node x*
  – either (1) holds, so that int[x*] = i
  – or both (2) & (3) hold, so that low[int[x*]] ≤ low[i] < mid(int[x*]) < high[i] ≤ high[int[x*]]

[Figure: the two possible positions of i relative to int[x*]]

Operation of INSERT(root[T], i)
• Left path P_L from the fork node x* (the discussion for the right path P_R is similar, i.e. dual)
• Traversal along P_L corresponds to locating the point low[i] in the standard intervals of T_left[x*]
• Traversal continues until the allocating node z is reached, where
  – low[int[z]] = low[i]
  – the search terminates on P_L due to (1), since:

[Figure: i and int[z] aligned at low[i] = low[int[z]]]

Operation of INSERT(root[T], i)
• (3) holds for all nodes x on P_L, since
  – high[i] > mid(int[x*]) ≥ high[int[x]] > mid(int[x]) for all x ∈ T_left[x*]
• If only (3) holds for a node x on P_L (x ≠ z), we have
  – mid(int[x]) ≤ low[i] < high[int[x]]
  – P_L goes right due to INSERT(right[x], i)

[Figure: i starting to the right of mid(int[x]) inside int[x]]

Operation of INSERT(root[T], i)
• If both (2) & (3) hold for a node x on P_L (x ≠ z)
  – low[int[x]] < low[i] < mid(int[x])
  – path P_L goes left due to INSERT(left[x], i)
  – node right[x] will be allocated in INSERT(right[x], i) by (1), since low[i] < low[int[right[x]]] and high[i] ≥ high[int[right[x]]]

[Figure: i straddling mid(int[x]), so that int[right[x]] is entirely covered by i]

The Segment Tree T(1,17): an example

Leaves: 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-10 10-11 11-12 12-13 13-14 14-15 15-16 16-17
Level above the leaves: 1-3 3-5 5-7 7-9 9-11 11-13 13-15 15-17
Next level: 1-5 5-9 9-13 13-17
Children of the root: 1-9 9-17
Root: 1-17

a : [5,11], allocated to the nodes 5-9 and 9-11 ({a})
b : [7,13], allocated to the nodes 7-9 and 9-13 ({b})
i : [10,12]

The Segment Tree T(1,17)

[Figure: the same tree annotated with the midpoints tested while descending during an insertion (mid = 9 at the root, mid = 5 at node 1-9, …)]

KD tree definition

• A recursive space-partitioning tree.
• Partitions along the x and y axes in an alternating fashion.
• Each internal node stores the splitting value along x (or y).

K-d tree

• Used for point location and multi-key database queries; k is the number of attributes used to perform the search.
• Geometric interpretation: to perform search in 2D space we use a 2-d tree.
• The search components (x, y) alternate from level to level.

K-d tree example

[Figure: six points a–f in the plane and the corresponding 2-d tree built over them]

Kd tree example

3D kd tree

The canonical method of kd-tree construction is the following:

As one moves down the tree, one cycles through the axes used to select the splitting planes. (For example, the root would have an x-aligned plane, the root's children would both have y-aligned planes, the root's grandchildren would all have z-aligned planes, the next level would have an x-aligned plane, and so on.)

Points are inserted by selecting the median of the points being put into the subtree, with respect to their coordinates in the axis being used to create the splitting plane. (Note the assumption that we feed the entire set of points into the algorithm up-front.)

Construction

This method leads to a balanced kd-tree, in which each leaf node is about the same distance from the root. However, balanced trees are not necessarily optimal for all applications.

Note also that it is not required to select the median point; in that case the result is simply that there is no guarantee that the tree will be balanced. A simple heuristic that avoids coding a complex linear-time median-finding algorithm, or sorting all points in O(n log n) time, is to sort a fixed number of randomly selected points and use their median as the cut line.

Construction
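
A minimal Python sketch of this canonical construction, assuming the whole point set is available up front and using a full sort at each level (so O(n log² n) overall); the dictionary-based node layout is illustrative:

  def build_kdtree(points, depth=0):
      if not points:
          return None
      k = len(points[0])                    # dimensionality of the points
      axis = depth % k                      # cycle through the axes down the tree
      points = sorted(points, key=lambda p: p[axis])
      median = len(points) // 2             # the median point becomes the splitting node
      return {
          'point': points[median],
          'axis': axis,
          'left': build_kdtree(points[:median], depth + 1),
          'right': build_kdtree(points[median + 1:], depth + 1),
      }

  tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])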

Kd tree – mean vs median

kd-tree partitions of a uniform set of data points, using the mean (left image) and the median (right image) thresholding options. Median: the middle value of a set of values. Mean: the arithmetic average. (Andrea Vedaldi and Brian Fulkerson, http://www.vlfeat.org/overview/kdtree.html)

Example of using Median

One adds a new point to a kd-tree in the same way as one adds an element to any other search tree.

First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane.

Once you get to the node under which the new point should be located, add the new point as either the left or right child of that leaf node, again depending on which side of the node's splitting plane contains the new point.

Adding points in this manner can cause the tree to become unbalanced, leading to decreased tree performance.

Additions
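
A small Python sketch of this insertion procedure, using the same illustrative node layout as the construction sketch above; as noted, repeated insertions of this kind can unbalance the tree:

  def kd_insert(node, point, depth=0):
      if node is None:                      # empty spot reached: create a new leaf here
          return {'point': point, 'axis': depth % len(point), 'left': None, 'right': None}
      axis = node['axis']
      if point[axis] < node['point'][axis]: # "left" side of the splitting plane
          node['left'] = kd_insert(node['left'], point, depth + 1)
      else:                                 # ties and larger values go to the "right" side
          node['right'] = kd_insert(node['right'], point, depth + 1)
      return node

  # e.g. kd_insert(tree, (6, 5)) with the tree from the construction sketch above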

To remove a point from an existing kd-tree, without breaking the invariant, the easiest way is to form the set of all nodes and leaves from the children of the target node, and recreate that part of the tree.

Another approach is to find a replacement for the point removed. First, find the node R that contains the point to be removed. For the base case where R is a leaf node, no replacement is required. For the general case, find a replacement point, say p, from the sub-tree rooted at R. Replace the point stored at R with p. Then, recursively remove p.

Deletions

• Balancing a kd-tree requires care. Because kd-trees are sorted in multiple dimensions, the tree rotation technique cannot be used to balance them — this may break the invariant.

• Several variants of balanced kd-trees exist. They include the divided kd-tree, pseudo kd-tree, K-D-B-tree, hB-tree and Bkd-tree. Many of these variants are adaptive k-d trees.

Balancing

A kd-tree query uses a best-bin-first search heuristic. This is a branch-and-bound technique that maintains an estimate of the smallest distance from the query point to any of the data points down all of the open paths.

Kd-tree querying supports two important operations: nearest-neighbor search and k-nearest-neighbor search. The first returns the nearest neighbor to a query point; the latter returns the k nearest neighbors to a given query point Q.

Querying

• Starting with the root node, the algorithm moves down the tree recursively (i.e. it goes right or left depending on whether the point is greater or less than the current node in the split dimension).

• Once the algorithm reaches a leaf node, it saves that node's point as the "current best".

• The algorithm unwinds the recursion of the tree, performing the following steps at each node:

Nearest-neighbor search

◦ If the current node is closer than the current best, then it becomes the current best.

◦ The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the current nearest distance.

◦ If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search. If the hypersphere does not intersect the splitting plane, the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.

Recursion step
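
A compact Python sketch of the nearest-neighbor search just described (same illustrative node layout as before). It checks each node on the way down rather than on the unwind, but the descend-into-the-near-side-first order and the hypersphere pruning test are the same:

  import math

  def kd_nearest(node, query, best=None):
      if node is None:
          return best
      point, axis = node['point'], node['axis']
      # If the current node is closer than the current best, it becomes the current best.
      if best is None or math.dist(point, query) < math.dist(best, query):
          best = point
      # Descend first into the side of the splitting plane that contains the query point.
      if query[axis] < point[axis]:
          near, far = node['left'], node['right']
      else:
          near, far = node['right'], node['left']
      best = kd_nearest(near, query, best)
      # Visit the far side only if the splitting plane is closer to the query than the
      # current best, i.e. the hypersphere around the query crosses the plane.
      if abs(query[axis] - point[axis]) < math.dist(best, query):
          best = kd_nearest(far, query, best)
      return best

  # e.g. kd_nearest(tree, (9, 2)) with the tree from the construction sketch above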

kd-trees are not suitable for efficiently finding the nearest neighbour in high dimensional spaces.

In very high dimensional spaces, the curse of dimensionality causes the algorithm to need to visit many more branches than in lower dimensional spaces. In particular, when the number of points is only slightly higher than the number of dimensions, the algorithm is only slightly better than a linear search of all of the points.

The algorithm can be improved. It can provide the k-Nearest Neighbors to a point by maintaining k current bests instead of just one. Branches are only eliminated when they can't have points closer than any of the k current bests.

Nearest-neighbor search

• Kd-trees provide a convenient tool for range search queries in databases with more than one key. The search may go down from the root in both directions (left and right), but it can be limited by the strict inequality on the key value at each tree level.

• The kd-tree is one of the few data structures that allow easy multi-key search.

Range search
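
A small Python sketch of such a range query on the same illustrative node layout; a subtree is visited only if the query range reaches its side of the splitting value in the node's dimension:

  def kd_range(node, lo, hi, found=None):
      # Report every point p with lo[d] <= p[d] <= hi[d] in all dimensions d.
      if found is None:
          found = []
      if node is None:
          return found
      point, axis = node['point'], node['axis']
      if all(l <= c <= h for l, c, h in zip(lo, point, hi)):
          found.append(point)
      if lo[axis] <= point[axis]:           # the range may reach points in the left subtree
          kd_range(node['left'], lo, hi, found)
      if hi[axis] >= point[axis]:           # the range may reach points in the right subtree
          kd_range(node['right'], lo, hi, found)
      return found

  # e.g. kd_range(tree, (3, 1), (8, 5)) with the tree from the construction sketch above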

Kd tree

http://upload.wikimedia.org/wikipedia/en/9/9c/KDTree-animation.gif

Building a static kd-tree from n points takes O(n log² n) time if an O(n log n) sort is used to compute the median at each level.

The complexity is O(n log n) if a linear-time median-finding algorithm, such as the one described in Cormen et al., is used.

Inserting a new point into a balanced kd-tree takes O(log n) time.

Removing a point from a balanced kd-tree takes O(log n) time.

Querying an axis-parallel range in a balanced kd-tree takes O(n^(1-1/k) + m) time, where m is the number of reported points and k is the dimension of the kd-tree.

Complexity

Instead of points, a kd-tree can also contain rectangles. A 2D rectangle is considered a 4D object (xlow, xhigh, ylow, yhigh). Thus range search becomes the problem of returning all rectangles intersecting the search rectangle. The tree is constructed the usual way with all the rectangles at the leaves.

In an orthogonal range search, the opposite coordinate is used when comparing against the median. For example, if the current level is split along xhigh, we check the xlow coordinate of the search rectangle. If the median is less than the xlow coordinate of the search rectangle, then no rectangle in the left branch can ever intersect the search rectangle, so that branch can be pruned. Otherwise both branches should be traversed.

Note that the interval tree is a 1-dimensional special case.

Kd tree of rectangles
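
A one-line sketch of the pruning test just described, for a level that splits on xhigh (the names are illustrative):

  def can_prune_left(median_xhigh, query_xlow):
      # Every rectangle in the left branch has xhigh <= median; if even that median lies
      # to the left of the query rectangle's xlow, none of them can intersect the query.
      return median_xhigh < query_xlow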

• Query processing in sensor networks
• Nearest-neighbor searches
• Optimization
• Ray tracing
• Database search by multiple keys

Applications

Examples of applications

[Figure: map of the population distribution in Alberta, 1996 census; scale 0–100 km]

Progressive Meshes

Developed by Hugues Hoppe, Microsoft Research Inc. Published first in SIGGRAPH 1996.

Terrain visualization applications

Geometric subdivision

Problems with Geometric Subdivisions

ROAM principle

The basic operating principle of ROAM

Quadtree

• Quadtrees can be used to store different types of data

• In this section, we will describe the variant that stores a set of points in the plane

Quadtree

• The recursive splitting of squares continues as long as there is more than one point in a square

Definition

• Let P be a set of points inside a square σ = [x1 : x2] × [y1 : y2]
• If card(P) ≤ 1 then the quadtree consists of a single leaf where the set P and the square σ are stored
• Otherwise, let σ_NE, σ_NW, σ_SW, σ_SE denote the four quadrants of σ

Definition

• Let
  x_mid := (x1 + x2)/2 and y_mid := (y1 + y2)/2
  P_NE := { p ∈ P : p_x > x_mid and p_y > y_mid }
  P_NW := { p ∈ P : p_x ≤ x_mid and p_y > y_mid }
  P_SW := { p ∈ P : p_x ≤ x_mid and p_y ≤ y_mid }
  P_SE := { p ∈ P : p_x > x_mid and p_y ≤ y_mid }
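
A minimal Python sketch of this recursive definition (distinct points are assumed, and the dictionary node layout is illustrative):

  def build_quadtree(P, x1, x2, y1, y2):
      if len(P) <= 1:                       # leaf: store the (possibly empty) set and its square
          return {'square': (x1, x2, y1, y2), 'points': P}
      x_mid, y_mid = (x1 + x2) / 2, (y1 + y2) / 2
      P_NE = [p for p in P if p[0] >  x_mid and p[1] >  y_mid]
      P_NW = [p for p in P if p[0] <= x_mid and p[1] >  y_mid]
      P_SW = [p for p in P if p[0] <= x_mid and p[1] <= y_mid]
      P_SE = [p for p in P if p[0] >  x_mid and p[1] <= y_mid]
      return {'square': (x1, x2, y1, y2),
              'NE': build_quadtree(P_NE, x_mid, x2, y_mid, y2),
              'NW': build_quadtree(P_NW, x1, x_mid, y_mid, y2),
              'SW': build_quadtree(P_SW, x1, x_mid, y1, y_mid),
              'SE': build_quadtree(P_SE, x_mid, x2, y1, y_mid)}

  # e.g. build_quadtree([(1, 1), (2, 6), (6, 2), (7, 7)], 0, 8, 0, 8)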

Theorem 14.1

• The depth of a quadtree for a set P of points in the plane is at most log(s/c)+3/2, where c is the smallest distance between any two points in P and s is the side length of the initial square that contains P

Theorem 14.2

• A quadtree of depth d storing a set of n points has O((d+1)n) nodes and can be constructed in O((d+1)n) time.

Neighbor Finding

Algorithm NorthNeighbor(v, T)
  if v = root(T) then return nil
  if v = SW-child of parent(v) then return NW-child of parent(v)
  if v = SE-child of parent(v) then return NE-child of parent(v)
  μ ← NorthNeighbor(parent(v), T)
  if μ = nil or μ is a leaf
    then return μ
    else if v = NW-child of parent(v)
      then return SW-child of μ
      else return SE-child of μ

Theorem 14.3

• Let T be a quadtree of depth d. The neighbor of a given node v in T in a direction, as defined above, can be found in O(d+1) time.

Balanced Quadtrees

• A quadtree subdivision is called balanced if any two neighboring squares differ by at most a factor of two in size.

• A quadtree is called balanced if its subdivision is balanced.

Balanced Quadtrees

Balanced Quadtree

Algorithm BalanceQuadtree(T)
  Insert all the leaves of T into a linear list L
  while L is not empty
    do Remove a leaf μ from L
       if σ(μ) has to be split
         then Make μ into an internal node with four children, which are leaves that
              correspond to the four quadrants of σ(μ). If μ stores a point, then store
              the point in the correct new leaf instead.
              Insert the four new leaves into L
              Check if σ(μ) had neighbors that now need to be split and, if so,
              insert them into L
  return T

Theorem 14.4

• Let T be a quadtree with m nodes. Then the balanced version of T has O(m) nodes and it can be constructed in O((d+1)m) time.
  – At most 8m splits are performed.

From Quadtrees to Meshes

• The idea is to use a quadtree subdivision as the first step

• Stop splitting when the square is no longer intersected by any component, or when it has unit size

• Make the quadtree subdivision balanced before triangulating

Generate Mesh

Algorithm GenerateMesh(S)
  …
  for each face σ of M
    do if the interior of σ is intersected by an edge of a component
         then Add the intersection (which is a diagonal) as an edge to M
       else if σ has only vertices at its corners
         then Add a diagonal of σ as an edge to M
       else Add a Steiner point in the center of σ, connect it to all vertices
            on the boundary of σ, and change M accordingly
  return M

An Example

Theorem 14.5

• Let S be a set of disjoint polygonal components inside the square [0:U]×[0:U]. The number of triangles is O(p(S) log U), where p(S) is the sum of the perimeters of the components in S, and the mesh can be constructed in O(p(S) log² U) time.

R-trees

• [Guttman 84] Main idea: allow parents to overlap!
  – => guaranteed 50% utilization
  – => easier insertion/split algorithms
  – (only deal with Minimum Bounding Rectangles - MBRs)

R-trees

• A multi-way external-memory tree
• Index nodes and data (leaf) nodes
• All leaf nodes appear on the same level
• Every node contains between m and M entries
• The root node has at least 2 entries (children)

Example

• e.g., with fanout 4: group nearby rectangles into parent MBRs; each group -> one disk page

[Figure: ten data rectangles A–J in the plane]

Example

• F = 4

[Figure: the rectangles A–J grouped into four leaf-level MBRs P1–P4]

Example

• F = 4

[Figure: the resulting R-tree: the root holds the entries P1 P2 P3 P4, and the leaf pages contain the groups {A, B, C}, {D, E}, {F, G} and {H, I, J}]

R-trees - format of nodes
• {(MBR; obj_ptr)} for leaf nodes

[Figure: a leaf entry stores the MBR coordinates (x-low, x-high; y-low, y-high) together with an obj_ptr to the data object]

R-trees - format of nodes
• {(MBR; node_ptr)} for non-leaf nodes

[Figure: a non-leaf entry stores the MBR coordinates (x-low, x-high; y-low, y-high) together with a node_ptr to the child node]

R-trees: Search

[Figure: a search on the example R-tree: the query region is first compared against the page MBRs P1–P4 at the root, and then against the data rectangles inside the qualifying pages]

R-trees: Search
• Main points:
  – every parent node completely covers its 'children'
  – a child MBR may be covered by more than one parent; it is stored under ONLY ONE of them (i.e., no need for duplicate elimination)
  – a point query may follow multiple branches
  – everything works for any(?) dimensionality

R-trees: Insertion
• Insert X

[Figure: the new rectangle X lies inside an existing page MBR, so it is simply added to the leaf page that already holds A, B and C]

R-trees: Insertion
• Insert Y

[Figure: the new rectangle Y is not contained in any of the existing page MBRs P1–P4]

R-trees: Insertion
• Extend the parent MBR

[Figure: the chosen page MBR is enlarged to cover Y, and Y is stored in that leaf page]

R-trees: Insertion
• How to find the next node in which to insert the new object?
  – Using ChooseLeaf: find the entry that needs the least enlargement to include Y; resolve ties using the smallest area (see the sketch below)
• Other methods (later)
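
A small Python sketch of the least-enlargement test behind ChooseLeaf; the (x-low, y-low, x-high, y-high) rectangle format and the function names are illustrative:

  def enlargement(mbr, r):
      # Area increase needed for the MBR to also cover rectangle r.
      xlo, ylo, xhi, yhi = mbr
      nxlo, nylo = min(xlo, r[0]), min(ylo, r[1])
      nxhi, nyhi = max(xhi, r[2]), max(yhi, r[3])
      return (nxhi - nxlo) * (nyhi - nylo) - (xhi - xlo) * (yhi - ylo)

  def choose_subtree(entries, r):
      # Pick the child MBR needing the least enlargement; break ties by the smallest area.
      return min(entries, key=lambda mbr: (enlargement(mbr, r),
                                           (mbr[2] - mbr[0]) * (mbr[3] - mbr[1])))

  # e.g. choose_subtree([(0, 0, 4, 4), (5, 5, 9, 9)], (3, 3, 5, 4)) picks the first MBR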

R-trees: Insertion
• If the node is full, then Split: e.g., insert W

[Figure: the leaf page that should receive W is already full, so it must be split]

R-trees: Insertion
• If the node is full, then Split: e.g., insert W

[Figure: the split creates a new leaf page P5; the parent level now holds P1–P5 and is itself regrouped under Q1 and Q2]

R-trees: Split
• Split node P1: partition its MBRs into two groups.

[Figure: the entries of the overflowing page (A, B, C, K, W) to be partitioned]

• (A1: plane sweep, until 50% of the rectangles)
• A2: 'linear' split
• A3: quadratic split
• A4: exponential split: 2^(M-1) choices

R-trees: Split
• pick two rectangles as 'seeds'; assign each rectangle R to the 'closest' seed

[Figure: two chosen seeds, seed1 and seed2, and a rectangle R to be assigned to one of them]

R-trees: Split
• pick two rectangles as 'seeds'
• assign each rectangle R to the 'closest' seed
• 'closest': the seed whose group requires the smallest increase in area to include R

[Figure: R is assigned to seed1 or seed2 according to the smaller increase in area]

R-trees: Split
• How to pick seeds:
  – Linear: find the highest and lowest side in each dimension, normalize the separations, and choose the pair with the greatest normalized separation
  – Quadratic: for each pair E1 and E2, calculate the rectangle J = MBR(E1, E2) and the waste d = area(J) - area(E1) - area(E2); choose the pair with the largest d (a small sketch follows)
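
A small Python sketch of the quadratic seed-picking rule; the (x-low, y-low, x-high, y-high) rectangle format is illustrative:

  from itertools import combinations

  def area(r):
      return (r[2] - r[0]) * (r[3] - r[1])

  def mbr(r1, r2):
      return (min(r1[0], r2[0]), min(r1[1], r2[1]), max(r1[2], r2[2]), max(r1[3], r2[3]))

  def pick_seeds_quadratic(rects):
      # For every pair E1, E2 compute the waste d = area(MBR(E1, E2)) - area(E1) - area(E2)
      # and return the pair that would be most wasteful to keep together.
      return max(combinations(rects, 2),
                 key=lambda pair: area(mbr(*pair)) - area(pair[0]) - area(pair[1]))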

R-trees: Insertion
• Use ChooseLeaf to find the leaf node in which to insert an entry E
• If the leaf node is full, then Split; otherwise insert there
  – Propagate the split upwards, if necessary
• Adjust the parent nodes

R-Trees: Deletion
• Find the leaf node that contains the entry E
• Remove E from this node
• If underflow:
  – Eliminate the node by removing the node entries and the parent entry
  – Reinsert the orphaned (other) entries into the tree using Insert
• Other method (later)

R-trees: Variations

• R+-tree: do not allow overlapping, so split the objects (similar to z-values)

• R*-tree: change the insertion and deletion algorithms (minimize not only the area but also the perimeter; forced re-insertion)

• Hilbert R-tree: use the Hilbert values of the objects to insert them into the tree

TV-Tree (Telescopic-Vector tree)

• The basis of the TV-tree is to use dynamically contracting and extending feature vectors (as in classification).

TV-tree

• We also have a hierarchical structure:
• The objects are clustered into leaf nodes of the tree, and their minimum bounding region (MBR) is stored in the parent node.
• Parents are recursively grouped, until the root is formed.
• At the top levels this is efficient because only a few basic features are used.

TV-tree

• The TV-tree can be applied to a tree with nodes that describe bounding regions of any shape (cubes, spheres, rectangles, etc.).

Telescoping function
• The telescoping problem can be described as follows.
• Given an n × 1 feature vector x and an m × n (m ≤ n) contraction matrix A_m, the product A_m x is an m-contraction of x.
• A sequence of such matrices A_m, with m = 1, …, n, describes a telescoping function provided that the following condition is satisfied: if the m1-contractions of two vectors x and y are equal, then so are their respective m2-contractions, for every m2 ≤ m1.
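
A tiny illustration of a telescoping function, assuming the simplest choice of contraction matrices: the truncation matrices A_m = [I_m | 0], which keep the first m features (NumPy is used only for the matrix product):

  import numpy as np

  n = 4
  x = np.array([3.0, 1.0, 4.0, 1.0])
  for m in range(1, n + 1):
      A_m = np.hstack([np.eye(m), np.zeros((m, n - m))])  # truncation matrix [I_m | 0]
      print(m, A_m @ x)                                    # the m-contraction of x

  # If two vectors agree on their first m1 features, they also agree on their first
  # m2 <= m1 features, so the telescoping condition is satisfied.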

Multiple shapes
• We can use, for example, a sphere, because it is described by only a center and a radius r; it represents the set of points within Euclidean distance ≤ r of the center.
• The Euclidean distance is the special case of the Lp metrics with p = 2.
• The L1 metric (Manhattan distance) defines a diamond shape.
• The TV-tree works with any Lp-sphere.

TMBR (Telescopic Minimum Bounding Region)

• Each node in the TV-tree represents the MBR (an Lp-sphere) of all its descendants.

• Each region is represented by a center, which is a vector determined by the telescoping vectors representing the objects, and a scalar radius.

• We use the term TMBR to denote an MBR with such a telescopic vector as a center.