36
SOFTWARE IMPLEMENTATION OF A NEW METHOD OF COMBINATOR IAL HASHING bY P. Dubost J. -M. Trousse STAN-CS-75-511 SEPTEMBER 1975 COMPUTER SCIENCE DEPARTMENT School of Humanities and Sciences STANFORD UN IVERS ITY

SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

SOFTWARE IMPLEMENTATION OF A NEW METHODOF COMBINATOR IAL HASHING

bY

P. DubostJ. -M. Trousse

STAN-CS-75-511SEPTEMBER 1975

COMPUTER SCIENCE DEPARTMENTSchool of Humanities and Sciences

STANFORD UN IVERS ITY

Page 2: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

L

L

L

L

L

Software Implementation of a New Method of Combinatorial Hashing

Pierre Dubost and Jean-Michel Trousse. .

Stanford University

Abstract, .

This is a study of the software implementation of a new method of

searching with retrieval on secondary keys.

A new fazr6I.y of partial match file designs is presented, the

'worst case' is determined, a detailed algorithm and program are

given and the average execution time is studied.

This research was supported in part by National Science Foundation grantDCR72-03752AC2and by the Office of Naval Research contract NR 044-4~2.Reproduction in whole or in part is permitted for any purpose of theUnited States Government.

1

Page 3: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

--

1. Retrieval on Secondary Key [2]

L. Common searching techniques use 'primary keys' which uniquely define

a record. But, it is sometimes necessary to make a search on other fields

of a record, called 'secondary keys? We might want to retrieve some

records from a file, given the values of some of these secondary keys.

These values may define zero, one, or several records. A set of values

is called a 'query'.

L.

For example, if we considered an hypothetical CIA file containing.information about all the American people, we might be interested in

knowing which men are married, own two cars, and have been to,France

last year. We do not specify their age, residence, etc.

L

L

L

-

L

We now may assume that each secondary key is mapped by a common

hashing technique into a short bit string. We then need a fast method

of retrieval on a reasonably short binary key with some unspecified bits

(noted * ). This technique must map any specified value of this binary

key into a shorter address. This address will correspond to a group of

records called a 'bucket'. This technique must differ from a common

hashing technique in that it allows some of the bits of the key to remain

unspecified. In this case, it is clear that a query may lead to different

buckets and the better the method is, the fewer buckets there will be for

a given query.

Example: American People File

Record 11 Name Sex Age Married m--m-

Mapping

Binary Key

The previous query might be represented, for example, by: *oHlll--- .

F

2

Page 4: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

i

-

L

L

L

L

L

L

L

2. A New Family of Partial Match File Designs and its Binary Tree

Representation.

W. A. Burkhard has recently presented a new family of partial match

file designs [l]. The interesting aspect of these designs is that they

can be represented by a binary tree. The binary tree leads directly to

a simple software implementation. To obtain an even simpler implementation,

we have slightly modified the Burkhard partial match file (PMF) and

introduced a new family of IMF which has, as we show in the next section,

the same worst case.

The mapping of binary keys into the bucket addresses is described by

a table. Each bucket corresponds to a given value of some of the bits,

the others remaining unspecified. Each entry in the tables gives a

description of-the keys which might be in the corresponding bucket. For

example, the bucket corresponding to the entry *lO*l might contain the

following keys:

01001 , 01011 , 11001 , 11011 l

Any specified key must be mapped into one, and only one, bucket. This

technique works for any query with an odd number of bits. The tables Tn

are constructed by induction.0

TO corresponds to a one bit key and two buckets: 0 0

c l1 1

Te n+l is deduced from Tn by the following method:

0 *

.

.

d cl

Tn :.

*

1*. .. .. .

0

Tn*

1*

Page 5: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

being Tn circularly shifted by one position (instead of the

symmetric image of Tn with columns in the reversing order, as in

the Burkhard technique). It is clear that Tn+l

will have 2n+l columnsand 2 lines. It then maps (2n+l) -bit Keys into 2n+1 buckets.

The representation of each table Tn

simple. A node of the tree Sn

by a tree Sn is then very

contains the position of a bit in the

query to be checked and a leaf contains the address of a bucket.

Let us consider the trees for n = 0 and 1 .

n=O

The table To is0

0 0

c l

which means,1 1

if the bit is 0 , the

Li

corresponding bucket address is 0 and if the bit is 1 , the corres-

ponding bucket address is 1 . We can represent this procedure by a

very simple treesO

.

with the convention that -- if the bit checked at a node is 0 , we go

to the left subtree (here the leaf a ) and to the right sub-tree for

1 (here the leaf 1cl) . If the bit is a -E , we go down in bothsubtrees.

e

=ln

_ Tl is o1

:

0 1 20 0 *01*

II

1* 01*1

the corresponding tree

0A1 2

0 12 36

5 is

Page 6: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

c.

We have seen the passage from Tn to Tn-i-l l

Similarly, we candefine a transformation on the corresponding trees between ('

)‘)nand c

Let us suppose that we have our tree Snn+l l

corresponding to the table 'I' :n

The first bit separates Tn+l into two halves. If the first bitis 0 9 the corresponding bucket address is going to be contained in the

first-half; or, if it is 1 , in the second-half.

root of SW1

subtree.

(node @ ) separates Sn+l

Correspondingly, the

into a left and right

L

1k

L1

In the first-half of Tn+l , Tn is directly inserted, which means

if a query leads to the bucket 1 in Tn , the query constructed with

the preceding query (shifted by 1 position) with a 0 leading bit,

is going to lea&to the same bucket 1 in Tn+l ' So, the left subtree

OfSn+l is nl s but with the content o$ each node increased by one.

In the second-half of Tn+l , Tn is inserted after a circular

shift of one position. In the same way,

is S

the right subtree of Sn+l

n but with the content of each node increased by one, except the

root 00 changed into 02n+2 and the content of each leaf increased

by z?-?

Page 7: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

Let us represent on the same figure, the induction on T.and Sn :

n

Tn

0 k......l

.

.

.

.

.

.

Tn+l

1 -- k+l 2n+2

L-

L

Li ,L

1*. .l .

. .

. .

l .

. .

. .

. .

. .

;;

f

0 . *

. . .

. . .

. . .

. . .

. . .

. . .

. . .

. . .

. . ..

d .

.

*w

.

.

.

.

.

.

.

-

.

.

.

/

.

.

.

Nodek --) k+l

Leafd --+I

'n

Sn+l

Nodek -+ k+l

Root0 -, 2n+2

Leaf1 --b 1+2n+l

L

It is easy to check the transformation betweenS0 and Sl .

. Let us now determine the various properties of the tree Sn

T4 and the corresponding tree S4 on the next page)(see

.

A --‘n is perfectly balanced.

B -- At level i (0 ,< i < n)

i (denoted by, the content of' the nodes is ei.thcjr

min -) or 2n+l-i (denoted byi '\ rrax

min 1). l

i is always a left son and maxC i a right son.

-- The content of a leaf can be computed directly by the path

used from the root/adding at each level i , 0 if we go

to the left and 2n-i , if we go to the right.

6

Page 8: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

lTrb++~T

T-r-c

c?+HHT

l-7: O*T*-r

T-r0*0we+eT

T@+4*lkt-L-

T0+60G-**T.

TO**-r o-E*T

-rO**OO-wT

O*-mwT*T

o*-roH-r*T

O*@-%-c*T*-r

O*@*O*-r*-r

O**T*TO*-J

G~o*To*T

Gw-rOO*T

0H-E000*T

*T-rr~TO

*T-rO*l-0

,*-r~-r**-ro

*-rO*C*T0

*o*-r*-r*-ro

*o*o*-r*-ro

*owa3-%-ro

*o*oo*-ro

*-mwt-00

-Mrowccoo

s+O*T*TOO

*o*o-E-coo

*HT*lIooo

*~~-rooo

***-roooo

****ooooo

QL94+&ZTO

7: 0

Page 9: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

Properties A , B , C hold forso Y s1 l

Suppose that A , B , C

hold for Sn :

Obviously, A holds for Sn+l .-.

The level i of Sn corresponds to the level i+l of Sn+l * Yor

i>O, the node i of Sn gives i+l and the node 2ntl-i gives

2n+2-i = 2(n+l)+l-(i+l) . For i III 0 , the node 0 of Sn gives 1 in

the left subtree and 2n+2 = 2(n+l)-tl-1 in the right subtree. So, the

property B holds for Sn+l .

The increase at level i+l in Sn+l is either 0 or 2n+l-(i+l)

exactly the same as in level i in Sn . If we follow the same path

in the left subtree of Sn+l as in Sn, we find 1 as in Sn . If we

follow the same path in the right subtree of Sn+l , we find R+2n+l

-=.since we had to go right in Sn+l at level 0 and add 2n+1 l

S o>

c ho1ds for %+l l

L

The representation of this family of partial

so simple a binary tree, leads to an easier proof

a very simple search algorithm.

match file designs, by

of the worst case and

3. Worst Case.

The worst case is expressed in terms of the number of buckets wn(k)

found when k bits are unspecified. W. A. Burkhard has shown that theL

,

worst case for his PMF family can be expressed in terms of the Fibonacci

number in the following way:

L.w,(k) = 2k for 0 < k < n+l- F- - rly&k) = 2n+1-k F2k n+l for n+l- ,< k ,< n+l

y-p) = $-b+l)F2n+4-k for --n+l < k < 2n+l .

Lt

Using the tree, we can prove those same results for our new IT4F family.

Let us set the notation: If the position i < n (or i > n ) of the-query contains a * , all the nodes which are left sons (or right sons)

Page 10: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

I . . .\ -\

we are coing to say that the 1cvc.l i ha:; one star. Conscqucntl~, ifall the nodes at a level i arc -X-s, we arc gointr to say that the Lc V‘CI> d - !i has two stars.

Finally, we represent Sn in the following way=. a node is ticnotctl.bya * if it corresponds to an unspecified bit and by a @ if itcorresponds to a specified bit.

As an example, we can have the following tree:

Since in our study of the worst case we considered first the left son tobe unspecified, a dot corresponds, in fact, to a bit 0 l

We can now study three different facts:t

Fact 1. Let us see how the Fibonacci number arises in the worst case

in studying the special case where all the levels contained one star.

Such a tree with n+l levels is denoged Sz .

For n = 0 , we have *sO

*A

2 buckets = F,

For n = 1 , we have x-sl

*

A0

3 buckets = F 4

Let US assume that the formula for the number of buckets found,

Fr

- holds for nn+3 and n-l , and let us study the case for n+l .

Page 11: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

L

i

i

1Lt

t

i

L

which can be expressed by:

So, # of buckets for Sz+l = # of buckets for S*+# of bucketsn

for S+n 1

= Fn+3 + F(n-l)+T = Fn+3 + Fn+2

= Fn+& = F(n+1)+3

and the number of buckets for S*n = F

n+3 l

Fact 2. Suppose a level i in S+n contains two stars, let us see atwhich level i the number of buckets is maximum. Such a tree can berepresented by the following figure:

i - -1

. . .

J

Fi+2 trees S*n-i are explored and we have F

. i+2 Fn-i+3

buckets.

10

Page 12: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

--

L

iL

LL

1

Now we have: Fi+2Fn-i+3

= Fn+3 + Fj.Fn+l-i l

( or

Then it can be easily shown that this product is maximum for i = 1

i = n+l ) since F Fi n+l-i varies like

L 0,c; in a tree S*C-l) i+1CFn-2i + 2Fn-*l+1) '.

niz-1 or

a level i with two *s gives the worst case for

n+I . I

worst

We can now claim that if we have j levels with two stars

case is obtained when those, the

j levels are assembled either atthe top of the tree or at the bottom.

Let us consider the case where j = 2 . We can represent the tree

in the following way:

.I1

i,,/

For any i2 , the tree S*i2-1 gives

the maximum of entries at level i2

if the level i1 is equal to 1 ,

then we obtained the following

representation: see (B).

\ n. .l2:

LiL!!lSn-l S

n-l

(B) Now S*n-l gives the maximum of

buckets if i2 in S** n-l equals 1 ,

so in Sn' i2 =2 . This procedure

can be clearly extended to any

number j l

11

Page 13: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

L

t

i

LLi

Fact 3. The $I of paths generated by the respective positions of

two stars, is the following:

1 2 3 4i-l

i

D-1

/36

I\/*I\

e 0

d J

* -.

\0

/\0 l

level iremoved

*/\

jl p1

1. On the same level, does not affect each other.

2. On two consecutive levels, the second * affects one of the

paths generated by the other.

3. At least one level apart, each of the paths generated by the

first * is affected by the second one.

4. We can also remark that any level without *s can be removed

from the tree without changing the final number of buckets.

The analysis of the worst case for any k follows easily.

For 0 < k < n+l- F- - r1 , the number of stars in this range allows

us to place all the *s one level apart so, going down the tree, each

new * affects every path generated by the others. So

wn(k) = gkI

(equals the upper bound).

We can remark, also, that if we cancel all the levels without *s, we

get a tree with only *s and k levels.

For n+l- , the number of *s is such that n+l-k

levels have no *s since no levels have two *s for the worst case (see

Fact 3). Thus, we can cancel those n+l-k levels and we are left with

a tree with n-(n+l-k) levels in which n+l-k levels have two stars.

By Fact 2, these n+l-k levels with two *s have to be at the top of

the tree for the worst case.' We can then represent the tree in the

following way:

12

Page 14: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

i

i

I

i

L1

n+l-k levels

*cc

%2[n+l-k]

a

l l .

wn(k) = 2n+1-k Fn-2[n+l-k]+3

and finally,

W,(k) = 2n+1-k FS-n+1 I

2n+l-ktrees*

n-2[ n+l-k]

-=.For n+l ,< k < 2n+l- J the number of *s is such that k-(n+l) levelc

have two stars and the others have one * . By Fact 2, the levels withtwo *s have to be at the top for the worst case. So, the tree can be

represented in the following way:

k-(n+l)levels

We get:

wn(k) z 2k-(n+1) Fn-[k-(n+l)]+3

*

%-[k-(n+l)]A

- a m wn(k) = 2k-(n+1) F2n+4-k 1

J2k-(n+l) -3c

treesSn-[k-(n+l)]

13

Page 15: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

-

4. Searching Algorithm.

As we have previously seen, it is not necessary to keep the tree in

memory because the value of the nodes may be computed when going down this. .tree.

i We will use the following variables:

L = the level in the tree, beginning at zero.

B = the bucket address which is computed during the search.r. . QUERY(i) = the i-th bit of the query (starting at zero).

-

The algorithm is similar to a binary search. When getting to a

node, the bit of the query specified in this node is tested. If it is

zero, the left subtree is explored and if it is one, the right subtree

is explored. In the case of an unspecified bit, the current level and-

bucket address at this node are saved on a stack, then the left subtree

is explored. When getting to a leaf, the bucket address is stored in a

table. If the stack is empty, the search is completed; otherwise, the

level and bucket address of a node are popped from the stack and the

L right subtree of this node is explored.

L

1.L

2 .

I4

i

Algorithm S [Searching all the bucket addresses corresponding to a

given query.]

7J*1 4-.

6.

P

79L

[INITIALIZE.] Set U-0, B+O,I,+0.

[TEST BIT OF QIJERY.] set L +- L-t-1, set B+-2B. If query(i) = '1' ,

go to 7.

[UNSPECIFIED BIT.] If query(i) = '*' , push (L,B) on the stack.

[MOVE LEFT.] Set i +& .

[TEST FOR LEAF.] If L < N , go to 2; otherwise, store B in the

bucket table.

[TEST FOR DONE. ] If stack empty, the algorithm terminates;

otherwise, pop (L,B) from the stack.

[MOVE RIGHT.] Set it-2N+l-L, set B *B+l, goto 'j*

14

Page 16: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

and, in particular, the inner loop corresponding to Algorithm :: in

Appendix 1.

5j* Detailed Study of the Average Search !I3ne.

We will now study 1;h(:, average execution time

all the bucket addre--

IJn(k) .t'or corrr~~uI,irt~:

Oties corresponding to it query with k un::pc:c:iVi~!cJ

bits for a given n .

We will consider, in the following argument, two time costs:

L

cDis the time cost for testing a node and getting to the

next node or leaf (left or right son).

L

cSis the additive cost when encountering a "*" for saving

the parameters in the stack and then restoring them.

LFrom the MIX program, we can see that CD has a value of 8 when

L

1

t

the son is not a leaf, and 9 when the son is a leaf. We can also see

that Cs is equal to 10 .

Dy considering that any time we get to the leaf, except the last

time,wehave to pop new parameters from the stack, we may take. C

and add the extra cost when getting to a leaf,

,, 8

to c,, . 'Chen, WC will-

take in the following study:A>

cs = 11

CD = 8Units of MIX time .

_ Let us now try to express U,(k) as a function of k and n . FOX-

a given n , k can take all the values between 0 and 2n+l . We have

seen the direct representation of Algorithm S by the tree Sn and the

relation of the trees for n and n-t-1 . Those preceding observations

lead us to express the average time for n+l in terms of the average

time for n - We can remark that if the key 0 is not specified, the

average time spent in the left and right sub-tree cannot be considered

equal since the root of each of those subtrees are different.

15

Page 17: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

i

LLiL

In respect of this last remark, we have isolated two types ofaveral:e time:

A,(k) = Average time for n if'. k keys are unspecified, the

first key being specified.

U,(k) = Average time for n if k keys are unspecified, the

first key being unspecified.

Seven different cases, regarding the diverse combinations

first keys and the last one, are going to be considered:

Case 0 1

Case 0 2

Case 0 3

Case 0 4

-

Cash 10

*

I\-x- *

X

/\

of the two

or 1

/\* *

or 1

/\

case 70 0 or 1_-

A /\

0 or 1

I\ /\* *

01Bn+# = A,(k-2)+B (k-l)+ 2Cn+"sn

since we have to explore both subtrees, the average time in one isAn(k-2) and Bn(k-1) in the other one. We have also to follow thetwo edges corresponding to the time 2C and to pop in and out thestack corresponding to

DCS . In the same way we can compute the

16

Page 18: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

02 . .Bn+&k) = 2B (k-2) -t- :!cD i- c:

n S

04An+l 04 = An(k-l)+cD

06An+l(k) = A,(k) + CD

05An+l(k) = Bn(k-l) + CD

07An+l(k> = B,(k) + CD

We have now to compute the probability for each case to occur.

[Remark: for ;I'tl we have 2(n+l)+l = 2n+3 bits ].

L- Case 1

c so

r Number of ways to select

(k-2) items out of

(2n+3)-3 r 2n (since the

items 0 , 1 and 2n+%

are already chosen

T2n

/k-2)

-Number of ways

to fill the

%n+3-k left

bits with 1

32 0.

T2%n+;-kk

01Nn+l = 2 22n+3-k

.

17

Page 19: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

,*‘,

L

‘..’

tI

tIn the same way we obtain:

05Nn+l =

22n+5-k 06N 22n+5-k 07n+l = Nni-1 =

The total number of ways 0

BNn+l when the first bit is unspecified is

given by:

0BNn+l =

22n+3-k .

L The total number. of ways 0ANn+l when the first

@ren by

i 0ANn+l = 2;�n+5-k l

We can easily verify that

Le

and finally that

L

t

c

0B 3 0iNn+l = ' Nn+li=l

0ANn+l =

Nn-U

=N-@ @n+l+ Nn+l =

22n+3-k z

We can derive An+lk) and En+l(")

An+lck) 1=-

0ANn+l

; N@i=4 n+l

.0‘n:lCk) Bn-t-l(“)

=

bit is specified is

5 ,oi=4 n+l

total number of

queries having k

bits unspecified

1-_-

0BNn+l

0-i1:n+.l.'k)

18

Page 20: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

and finally un+l 04

urt+l(k) = + Nntl Anil + N@c

0n-t-1 n+l H,+,(k) l

I

If we write W,(k) =

recurrence formula:U,(k) J we get after computation the following

L

i

Li

'n+lck) = an+.lk + 2W (k-2) +n 3Wn-(k 1) + 'nCk)

with:

and the initial condition:

y)(o) - cJ) 9 wo(l) 2 25) + cs l

Assumin,:

Let US now define the generating finction G,(z)= x Wn(k)zk 9k>O

From (1) we directly deduce that:

G&) = x cxti zkk>O

+ (2z2+3z+1)8 ,(z) .

Using the formula (l+~

in ank l We get:

x ankzk =k>o

I* = Zk we can ::implify the cum-

02 + f3j (1-+z)2n with a = 2~;1)

t-- c,,13 and f:', - f:, .

I/

-vlThen we express Gn-l (z) in t'erms of G

n-2(z) and G

of' G (z) , etcn-2 (z) in terms

n-3 .) we finally obtain:

19

Page 21: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

G(z) = (txz4 I;) y1 (pz2+ fz+ 1

i-=0

Ic,(i:) =- cxz+ 1:

so

jL(l+zy- (n-i) + (?c z;‘+ ;, $ l)”r7

‘Go

( n-i( 92 -t ;z t- :I-) :‘(1 I-z) 2J

2z2+32+1 = (l+z)(1+2zj

L

L

so we have-=.

G,(z) = (az+~)(l+z)2n g .i=O

(z

1

We can remark that (l+2z)n+7- = [ (I.+z)+z]n-+l = x (l+z j r1 t-1 -.j ,jz .

Then the product (l+z)n(l+2z)n+1 is:

(l+z)n(l+2z)n+1 =k;. [ j$o (n;1)(2~~~-J)]~k

and finally

Gn(z) = (a+ :);;c, [,j3;(j (n;J-)(2T_!l;j) -(:!%‘>I;;” .- -

20Ir

Page 22: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

We Wt after Computation a djrcct F~rmu& fo.r{J

n(k) _. - - - i i - -

2n-t1 w,C"> :t k. . ( !

p (cII, k+1 - -I-)

with

‘)LI Cnk - ( )

We can remark that this f*omnula pi,es for

resu1ts that W. could have eq-pct~~d: k -=‘ 0 and gjn-t-1

lJn(o> - p (m-1)

U#n+l) == c,x(~~~+~--JJ .

~Qmptot ic Behavior.

Using the big- onota-t;ion we have:

Similarly

and+ O(nw2)

Page 23: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

i

i

i

L

k

%k =502 -t- & jto. (s>c $>� ( 5 3 - j � : � ) + 0(⌧1-�) l

-

Using the first and second derivatives of the generating f'unction

we can give a direct formula for the sum

so

Cnk =. + -$--- (7-k) 2 k-20

+ O(nM2)

and we finally get for rl,@)

-=. II-

U,!k) - A(k)n + B(k) + O(n-')

1

with (($+' -1)

and B(k) = CX(@)k -1) + $$ @Jk+'++ # (&k)(;)k-l .

m ,

We can verify that for k = 0 , we get A(0) = B(0) = p .

e A program has been written to compute the values of Un(k) . Therecurrence (1) has been used instead of the final formula (2) for an

easier and more efficient implementation. The results for n < 20and graphs showing the variation of Un(k) for a given k

-

are shown in Appendix 2.or a, l';ivc37 n \

The method is very efficient when the number of *s stays with5.n a.

reasonable limit. Assuming a 1 ps cycle time, the algorithm forn = 15 (31 bits), will take only 65 ms with 20 X-s but will take

l-77 s for k = 30 .

22

Page 24: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

In any case, the searching time will be negligible cxpared ~~2 %IIP

time spent far retrieving the records themselves frani sewndaTA- st.xx+.

As we have already seen in the asymptotic behavior of U,(k) , the graphs

show the almost linearity in n for U,(k) , k being fi.xed.

Conclusions

The study of this new technique of retrieval on secondary keys ::hows

two different aspects:

- A very 'good' worst case.

- A very simple and efficient software implementation.

-=.

Acknowledgment

The authors

encouragement in

would like to thank D. E. Knuth for his advice and

this work.

IL

LLt

ReferenceseDl W. A. Burkhard, "Partial Match Retrieval File Designs," Computer \I

Science Division, University of California, San Diego. January, 1975.

[21 De E* muth, The Art of Computer Programming, VOL. 3, Sorting and

Searching (Addison-Wesley, 19~3)~ Section 6.5.

L

23

Page 25: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

Appendix 1

t

MIX Program and in particular

the inner loop for Algorithm S.

J

24

Page 26: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

...

c0.UJ

Page 27: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

- m - - - - w - - - - - - - - - - - - _ - _ - - - e - w - - - e - - - - - - - - _ ------w_-*+*****n*l****rhr*

---------mm _-2 E f-v -.I

I -I-*irxkr~$..*~l.k+*

-t Fi

Page 28: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

... -.....

---------_--.u kL3 5 za’N

7I

w ..

5 ..

m l

-I .

.

P .

g :

.

m .WI .“8 .m 0v) .

.

Page 29: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

!---

-

-

Appendix 2

Results and Graphs for U,(k)

the average execution time.

28

Page 30: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

69ZSb

L5Slf

EEl!F

cc<<7

15211

l;_LCT

tc*t

LE

fS

CLC

S

tilt

ilLZ

2t5

1

3331

1531

GGL

ES

5

6C *,

c*-

‘CL

512

141

hi1

6e

cc1ZSLZ

7.L I?Z

6GS

Sl

IrJ 711

st9e

EL F9

L69*r

lS9E

6252

E(rEl

t+c 1

C86

911

1zs

Q8E

112

EC

2

c*t

to 1

CSt

000cLb

LE

?

BLE

O?

69

LL

88LS

9623

ZL

tE

3EE

Z

2t11

zszt

(rt6

99

9

583

ES

E

LSZ

181

9E

f

55ILa

0cccCc5009

8L

tS

ELac

Zf3BZ

*rE17

ELS 1

SST 1

‘738

919

6w?

SZE

9EZ

111

EZ?

69

39L

cB0I!00006 73E

LLGZ

+261

6 291

f501

(rLL

59s

tt'r

862

STZ

s1;t

Ttt

41959

c00n09@ *

0c01 OL?

91zt

696

131

(rt4

lrLE

1 LZ

561

OE?

5553a+?

4

(UCrccc0c000LC

8

sz9

19

6

LE

E

+r9Z

GL?

*;Zl

?864c’r

$

00

00

0’1

1

C0cc0Cx00cc0cSCb

6tZ

Ll?

55

;

bet

c?Clc00c00Lc0@3s030ccGI 7

L3(

= 33

1:

c751

Ql

Lt

?I

“ST91El

21

!11

0168L9S47EZTC

Page 31: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed
Page 32: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

87t532Etl

47935cr)‘,e

ZL

9LE

LE

9

CO

ZE

S9L

3

cr299+ssz

009SS

47Qd

9fZS

’iSC

T

SC

ES

SS

fl

~0209L01

LElLfSL

biO

tb0S

E910623

t613tlE

63tOOC2

C666191

Eb

lE72

1

SS+bbd

SO

95

39

12

cCSS

CE

7995

38 I tE5Z

II

9L

lOL

8IE

4EL?rZ8E

Z

QZ

189LL

l

SS66lZC.t

54 62186

LL

OL

9ZL

5EE

69ES

81 tf?SbC

QL El1

62

37LY

C TZ

BZ

B’r95 1

665E

311

L3@+7CB

321109

cz

3000DZ

S1lE

BZ

91z19ztz

ZLQ S’C64T

Ol’rtTbt 1

flEl698

E LL

S0

99

SIE

106f

06QL

79E

3E6Q

L92

OLS

EL

OT

‘IbSCS*‘I

20BE

901

t+*laLLC

l+8

95

61

cccccC63L

SS

lCl

OC

SfE

9(il

06E

895L

OTZSi6S

9@16E

(r’r

St9C

CE

E

L5Q

L3b

Z

CE

6ClE

1

L l’r9E

El

C 78tQ6

frCO

ZZL

90

56

ZS

0000nL00cT9Q

LL

OL

83111

EC

S +‘A

8 6C

E’7E

LL

6Z

Lo

afiT

zz

L ltscl?

l

lLfrZ

Z?

l

LS

8EC

6

3bsQ

-,o

08E06’7

Ll

bC’ccccc(000iS?r4?59f I

F1 L/JZE I

09z95c,

561WL

53235;

8091TC

5’1

rcCcCC0CccCcCC’CCL+?33

LE

5’r3’)

!i135+

LS

;zLE

00000

l

11

= s3

00G303@3C0030930(‘311=

33

Page 33: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

‘\\- -_.---- .-----

Page 34: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed
Page 35: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

A-n

Page 36: SOFTWARE IMPLEMENTATION OF A NEW METHOD …i.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75...technique works for any query with an odd number of bits. The tables Tn are constructed

G8

3

OtJ ruw-r”

-i-

Iv--s

-. . . ._ , _ _