
The Power of Incorrectness

A Brief Introduction to Soft Heaps

The Problem

A heap (priority queue) is a data structure that stores elements with keys chosen from a totally ordered set (e.g. integers).

We need to support the following operations:
- Insert an element
- Update (decrease the key of an element)
- Extract-min (find and delete the element with minimum key)
- Merge (optional)
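The slides contain no code, but the interface can be sketched in Python (a choice made here, not in the original) with the standard-library heapq module, which is itself a binary heap:

```python
import heapq

# Minimal sketch of the heap interface using Python's heapq module,
# which stores a binary heap in a plain list.
h = []
for key in [5, 1, 8, 3]:
    heapq.heappush(h, key)        # insert

smallest = heapq.heappop(h)       # extract-min

# heapq offers neither decrease-key nor an O(lg N) merge out of the
# box; rebuilding the combined list in O(N) is the usual fallback.
other = [2, 7]
heapq.heapify(other)
merged = h + other
heapq.heapify(merged)
```

The missing update and merge operations are exactly what the richer structures on the following slides provide.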

A Note on Notation

We evaluate algorithm speed using big-O notation. Most of the upper bounds on runtime given here are also lower bounds, but we use just big-O to simplify notation.

Some of the runtimes given are amortized, which means averaged over a sequence of operations. They're stated as ordinary bounds to reduce confusion.

All logs are base 2 and denoted lg. N is the number of elements in our heap at any time; we also use it to denote the number of operations. We're working in the comparison model.

What about delete?

Note that we can perform delete 'lazily' by marking elements as deleted with a flag.

Then, during extract-min, whenever the minimum element is already marked, we discard it and extract again.

So delete doesn't need to be treated any differently from extract-min.
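A sketch of this lazy-deletion trick in Python (names are hypothetical; built on the standard heapq module and assuming distinct keys for simplicity):

```python
import heapq

# 'Lazy' deletion on top of heapq: delete only marks the element, and
# extract-min discards marked entries as they surface at the root.
class LazyHeap:
    def __init__(self):
        self._heap = []
        self._deleted = set()   # assumes keys are unique, for simplicity

    def insert(self, key):
        heapq.heappush(self._heap, key)

    def delete(self, key):
        self._deleted.add(key)  # O(1): no heap restructuring

    def extract_min(self):
        # pop repeatedly until the minimum is not marked deleted
        while self._heap:
            key = heapq.heappop(self._heap)
            if key not in self._deleted:
                return key
            self._deleted.discard(key)
        return None

h = LazyHeap()
for k in [4, 1, 6]:
    h.insert(k)
h.delete(1)               # just a mark; 1 still sits in the array
first = h.extract_min()   # skips the marked 1 and returns 4
```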

The General Approach

We store the elements in a tree with a constant branching factor.

Heap condition: the key of any node is always at least the key of its parent.

*Exercise: show that insert and extract-min can be performed in time proportional to the height of the tree.

Binary Heaps

We use a perfectly balanced tree with a constant branching factor.

The height of the tree is O(lg N), so insert/update/extract-min all take O(lg N) time.

Merge is not supported as a 'basic' operation.
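For illustration, a from-scratch Python sketch (not from the slides): the array encodes the perfectly balanced tree, with the children of index i at 2i+1 and 2i+2, so both operations walk a single root-to-leaf path, hence the O(lg N) bounds:

```python
# A minimal binary heap: insert sifts a new leaf up, extract-min moves
# the last leaf to the root and sifts it down. Each walks one
# root-to-leaf path of the balanced tree, so both run in O(lg N).
class BinaryHeap:
    def __init__(self):
        self.a = []

    def insert(self, key):
        self.a.append(key)
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:   # sift up
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2

    def extract_min(self):
        # assumes a non-empty heap, for brevity
        a = self.a
        a[0], a[-1] = a[-1], a[0]
        m = a.pop()
        i = 0
        while True:                                          # sift down
            c = 2 * i + 1
            if c + 1 < len(a) and a[c + 1] < a[c]:
                c += 1                                       # smaller child
            if c >= len(a) or a[i] <= a[c]:
                break
            a[i], a[c] = a[c], a[i]
            i = c
        return m

h = BinaryHeap()
for k in [5, 3, 8, 1]:
    h.insert(k)
order = [h.extract_min() for _ in range(4)]
```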

Binomial Heap

Binomial heaps use trees with branching factor up to O(lg N) and can also support merge in O(lg N) time.

The main idea is to keep a forest of trees, each with a number of nodes that is a power of two, and no two of the same size.

When we merge two trees of the same size, we get another tree whose size is a power of two.

We can merge two such forests in O(lgN) time in a manner analogous to binary addition.
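A Python sketch of that binary-addition-style merge (illustrative; the representation is an assumption made here, with each tree a (key, children) pair whose rank is len(children), and a forest a dict from rank to tree):

```python
# Merging two binomial forests like adding binary numbers: at each rank
# there is at most one tree per forest plus a possible carry; two trees
# of equal rank link into one tree of the next rank (the carry).
def link(t1, t2):
    # combine two trees of equal rank; the smaller root wins (heap order)
    if t1[0] > t2[0]:
        t1, t2 = t2, t1
    return (t1[0], t1[1] + [t2])

def merge(f1, f2):
    # f1, f2: dicts mapping rank -> tree; returns the merged forest
    result, carry = {}, None
    for rank in range(max(list(f1) + list(f2) + [-1]) + 2):
        trees = [t for t in (f1.get(rank), f2.get(rank), carry) if t]
        carry = None
        if len(trees) % 2 == 1:          # one tree left at this rank
            result[rank] = trees.pop()
        if len(trees) >= 2:              # two trees: link into a carry
            carry = link(trees.pop(), trees.pop())
    return result

# insert = merging in a singleton (rank-0) tree
forest = {}
for key in [4, 2, 7, 1]:
    forest = merge(forest, {0: (key, [])})
```

After four insertions the forest holds a single rank-2 tree (binary 100) whose root is the minimum.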

Structure of Binomial Heaps

A binomial tree is built by attaching one binomial tree to the root of another of the same size.

Let the rank of a tree in a binomial heap be the log of the number of nodes it has.

Even More Heap

If we get 'lazy' with binomial heaps, deferring all the work until we perform extract-min, we can get O(1) per insert and merge, but O(lg N) for extract-min and update.

Fibonacci heaps (by Fredman and Tarjan) can do insert, update and merge in O(1) amortized time per operation.

But extract-min still requires O(lg N). Can we get rid of the O(lg N) factor?

No!

WHY?

A Bound on Sorting

We can't sort N numbers in the comparison model faster than O(N lg N) time.

Sketch of proof: there are N! possible permutations. Each operation in the comparison model has only 2 possible results, true/false. So for our algorithm to distinguish all N! inputs, we need lg(N!) = O(N lg N) operations.
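A quick numeric sanity check of that bound in Python (illustrative only), using the standard estimate that lg(N!) lies between (N/2)·lg(N/2) and N·lg N, since the top N/2 factors of N! are each at least N/2:

```python
import math

# lg(N!) computed via the log-gamma function (avoids huge integers):
# lg(N!) = ln(Gamma(N+1)) / ln(2).
def lg_factorial(n):
    return math.lgamma(n + 1) / math.log(2)

n = 1000
# upper bound: N! <= N^N, so lg(N!) <= N lg N
assert lg_factorial(n) <= n * math.log2(n)
# lower bound: N! >= (N/2)^(N/2), so lg(N!) >= (N/2) lg(N/2)
assert lg_factorial(n) >= (n / 2) * math.log2(n / 2)
```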

Now Apply to Heaps

Given an array of N elements, we can insert them into a heap in N inserts.

Performing extract-min once gives the 1st element of the sorted list; a 2nd time gives the 2nd element.

So we can perform extract-min N times to get a sorted list back.

Thus one of insert or extract-min must take at least O(lg N) time per operation.
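The reduction in code, as a Python sketch with the standard heapq module standing in for the heap:

```python
import heapq

# Sorting via a heap: N inserts followed by N extract-mins yields the
# sorted order, so heap operations cannot all beat O(lg N) without
# breaking the O(N lg N) comparison-sorting lower bound.
def heap_sort(items):
    h = []
    for x in items:
        heapq.heappush(h, x)                  # N inserts
    return [heapq.heappop(h) for _ in items]  # N extract-mins
```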

Is There a Way Around This?

Note there is a hidden assumption made in the proof on the previous slide:

The result given by every call of extract-min must be correct.

The Idea

We sacrifice some correctness to get a better runtime. More specifically, we allow a fraction of the answers provided by extract-min to be incorrect.

Soft Heaps

Supports insert, update, extract-min and merge in O(1) amortized time (for any fixed ε).

No more than εN (0<ε<0.5) of all elements have their keys raised at any point.

The Motivation: Car Pooling


The Idea in Words

We modify the binomial heap described earlier: trees no longer have to be full, but the idea of rank carries over.

We put multiple elements on the same node, which is what makes the trees non-full.

This allows a reduction in the height of the tree.

The Catch

If a node has multiple elements stored on it, how do we track which one is the minimum?

Solution: we assign all the elements in a node's list the same key, an upper bound on their original keys.

Some of the keys are thus raised. This is where the error rate comes in.

Example

Modified binomial heap with 8 elements.

Two of the nodes have 2 elements instead of one. Note 2 and 3’s key values are raised.

But two nodes in the deeper parts of the tree are no longer there.

Outline of the Algorithm

Insert is done through merging of heaps. We merge as in binomial heaps, in a manner not so different from adding binary numbers.

When inserting, we do not have to change any of the lists stored in the nodes; all we have to do is maintain heap order when merging trees.

Extract-Min

If the root's list is not empty, we just take something 'close' to the minimum, remove it, and reduce the size of the list by 1. Recall that we're allowed to be wrong sometimes.

Things are trickier when the list is empty: we 'siphon' elements from below the root to replenish the root's list, using a separate procedure called 'sift'.

Sift

We pull some of the elements in a node's list up the tree, concatenating item lists when two lists collide.

Then we perform sift on one of the children of the current node. Note that so far this is the same as sifting in a binary heap.

However, in some cases we call sift on another child of the node as well, which makes the sift calls truly branching. The question is when to do this.

How Many Elements Do We Sift?

This is tricky. If we don't sift enough, the height (and thus the runtime) stays O(lg N).

But if we sift too much, we can get more than εN elements with raised keys.

We use a combination of the size of the tree and the size of the current list to decide when to sift and when to destroy nodes; the branching condition is the key.

Sift Loop Condition

We call sift twice when the rank of the current node is large enough (> r for some parameter r) and odd.

The 'rank is odd' condition prevents branching at two consecutive ranks, so the recursion never blows up.

The parameter r is used to globally control how much we sift.

One More Detail

We need to keep a rank invariant, which states that a node has at least half as many children as its rank.

This prevents excessive merging of lists. We can maintain the invariant as follows: every time we find a violation at a root, we dismantle that root and meld its subtrees back into the heap.

Result of the Analysis

The total cost of merging is O(N), by an argument similar to counting the number of carries from incrementing a binary counter N times.

Result on sift from the paper (no proof): let r = 2 + 2lg(1/ε); then sift runs in O(r) per call, which is O(1) per operation as ε is a constant.

One can also show that a runtime of O(lg(1/ε)) is optimal if at most εN elements may have their keys raised.

Note that if we set ε = 1/(2N), no errors can occur and we get a normal heap back.

Is This Any Useful?

Don't ever submit this for your CS assignment and expect it to give right answers.

A Problem

Given a list of N numbers, we want to find the kth largest in O(N) time.

Randomized quick-select does it in O(N) expected time, but it's randomized.

The best-known deterministic algorithm for this involves finding medians of groups of 5 and taking the median of medians... basically a mess.
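For reference, a sketch of the randomized quick-select mentioned above (Python, not from the slides; this version selects the k-th smallest, 0-indexed):

```python
import random

# Randomized quick-select: partition around a random pivot and recurse
# only into the side containing the k-th smallest element. Expected
# time is O(N) because each level discards a constant fraction.
def quickselect(items, k):
    pivot = random.choice(items)
    lo = [x for x in items if x < pivot]
    eq = [x for x in items if x == pivot]
    if k < len(lo):
        return quickselect(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    hi = [x for x in items if x > pivot]
    return quickselect(hi, k - len(lo) - len(eq))
```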

A Simple Deterministic Solution

We insert all N elements into a soft heap with error rate ε = 1/3 and perform extract-min N/3 times. Since at most εN = N/3 keys are ever raised, the largest number deleted has rank between N/3 and 2N/3.

So we can remove N/3 numbers from consideration each time (the ones on the opposite side of k) and recurse on the rest.

Runtime: N + (2/3)N + (2/3)^2 N + (2/3)^3 N + ... = O(N)

Other applications

Approximate sorting: sort n numbers so they’re ‘nearly’ ordered.

Dynamic maintenance of percentiles.

Minimum spanning trees: this is the problem soft heaps were designed to solve, and they give the best algorithm to date.

With soft heaps (and another 5-6 pages of work), we get an O(Eα(E)) algorithm for minimum spanning trees. (α(E) is the inverse Ackermann function.)

Bibliography

Chazelle, Bernard. The Soft Heap: An Approximate Priority Queue with Optimal Error Rate.

Chazelle, Bernard. A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity.

Wikipedia