On identification of CR property in file organisation

Volume 9, number 2 INFORMATION PROCESSING LETTERS 17 August 1979

ON IDENTIFICATION OF CR PROPERTY IN FILE ORGANISATION

Abhijit SENGUPTA, Subir BANDYOPADHYAY and Pradip K. SRIMANI Indian Statistical Institute, Calcutta 700 035, India

Received 6 September 1978; revised version received 30 May 1979

File organization, information retrieval

1. Introduction

Given a set Q = {Q1, Q2, ..., Qn} of queries regarding the records contained in a file F stored on a sequential medium, Q is said to have the Consecutive Retrieval Property (CRP) with respect to F iff the records in F can be arranged such that, for all i, the set of records to be retrieved in response to the query Qi occupies consecutive storage locations in F. Such an organisation of F is defined to be a Consecutive Retrieval File Organisation (CRFO). Obviously, such a file organisation facilitates fast retrieval of the relevant records for any query of Q. Given the set of records of F and the query set Q together with the set of records pertinent to each Qi, the problem of testing the existence of a CRFO of F has been dealt with by many authors [1-6]. Eswaran [1] adopted a graph theoretic approach to solve this problem, which involves the detection of a Hamiltonian path in a directed graph and is applicable only to a restricted class of query sets. In this paper, we present a new approach based on augmentation, where the queries are examined one at a time and at every stage we get all CRFOs of F with respect to the queries considered up to that point.
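The definition of a CRFO can be stated as a small check. The following is a minimal sketch, assuming records are represented by integers and a file arrangement by a sequence of distinct records; the names is_consecutive and is_crfo are illustrative and do not come from the paper.

```python
from typing import Iterable, Sequence, Set


def is_consecutive(order: Sequence[int], query: Set[int]) -> bool:
    """True if the records of `query` occupy consecutive positions in `order`."""
    positions = [i for i, rec in enumerate(order) if rec in query]
    if len(positions) != len(query):
        return False  # some queried record is not stored in the file
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def is_crfo(order: Sequence[int], queries: Iterable[Set[int]]) -> bool:
    """True if `order` is a consecutive retrieval organisation for all queries."""
    return all(is_consecutive(order, q) for q in queries)


queries = [{1, 2}, {2, 3, 4}]
print(is_crfo([1, 2, 3, 4], queries))  # True
print(is_crfo([2, 1, 3, 4], queries))  # False: record 1 splits {2, 3, 4}
```

This only tests one given arrangement; deciding whether any such arrangement exists is the problem addressed in the rest of the paper.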

2. Definitions and notations

Given the query set Q as above, every Qi can be regarded as a set of records that are pertinent to that Qi, i.e., the set of records to be retrieved in response to Qi. Let rj denote the jth record of F. If rj is pertinent to Qi, then rj ∈ Qi. We define a list to be a set of elements, where each element may be either a list or a record of F. A list will be called an unordered list (UL) if any permutation of the elements of the list is allowed. A list will be called a semiordered list (SL) if no permutation of its elements is allowed, with the only exception that a permutation resulting in a mirror image of the list is allowed. Let us illustrate this with an example. (We will enclose a UL in braces and an SL in brackets.) With the above definitions, L = {r1 r2 [r3 r4 r5] r6} is a UL containing an SL = [r3 r4 r5], and {r1 r6 [r5 r4 r3] r2} is a valid form of this UL, where r2 and r6 have been permuted and a mirror image permutation of the SL has been done. Several other valid forms of the above UL can be formed. Given a list L (SL or UL), any element of L will be called directly embedded in L. The above L has the four elements r1, r2, [r3 r4 r5], r6 and all of them are directly embedded in L. If a list L1 is an element of another list L2, then L2 will be called an ancestor of L1, and any ancestor of L2 will also be called an ancestor of L1. Given a list L (SL or UL), any list or record whose ancestor is L will be called indirectly embedded in L. Thus, for the above L, r6 is directly but r4 is indirectly embedded in L. A set of records S will be said to be covered by a list L if every element of S is directly or indirectly embedded in L. A list L will be called a minimal cover of a set of records S if S is covered by L and L is not an ancestor of any list covering S. Thus, if S = {r3 r4} in the above example, the list L is a cover of S but not a minimal cover. The minimal cover of S in L is [r3 r4 r5]. Given a list L and a set of records S, L will

be said to be

(a) totally contributory w.r.t. S if every record directly or indirectly embedded in L is an element of S,

(b) partially contributory w.r.t. S if some, but not all, of the records directly or indirectly embedded in L are elements of S,

(c) non-contributory w.r.t. S if no element of S is directly or indirectly embedded in L.

A trivial list containing a single record will be written without braces or brackets and, since an SL containing only two records is effectively equivalent to a UL containing the same two records, we shall represent it as a UL. With all these definitions, we now proceed to develop an algorithm to test for a CRFO of a file F with respect to a query set Q.
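The definitions above can be made concrete with a small data model. The sketch below is one possible encoding, not the paper's: a record is an integer, a list is a pair (kind, elements) with kind "UL" or "SL", and the helper names (UL, SL, records, valid_forms, minimal_cover, contribution) are all assumptions introduced for illustration. minimal_cover assumes that L already covers S.

```python
from itertools import permutations, product


def UL(*elements):
    return ("UL", list(elements))


def SL(*elements):
    return ("SL", list(elements))


def records(x):
    """All records directly or indirectly embedded in x."""
    return {x} if isinstance(x, int) else set().union(*map(records, x[1]))


def valid_forms(x):
    """Yield every record sequence permitted by x (mirror images included)."""
    if isinstance(x, int):
        yield (x,)
        return
    kind, elements = x
    # A UL allows any order of its elements, an SL only the given order or its mirror.
    orders = permutations(elements) if kind == "UL" else (elements, elements[::-1])
    for order in orders:
        for parts in product(*(tuple(valid_forms(e)) for e in order)):
            yield tuple(r for part in parts for r in part)


def minimal_cover(L, S):
    """Deepest list inside L (or L itself) that still covers S."""
    for e in L[1]:
        if not isinstance(e, int) and S <= records(e):
            return minimal_cover(e, S)
    return L


def contribution(x, S):
    """Classify x as 'total', 'partial' or 'non' contributory w.r.t. S."""
    recs = records(x)
    return "total" if recs <= S else "partial" if recs & S else "non"


L = UL(1, 2, SL(3, 4, 5), 6)
forms = set(valid_forms(L))
print(len(forms))                         # 48 = 4! orders times 2 mirror images
print((1, 6, 5, 4, 3, 2) in forms)        # True, the valid form quoted in the text
print(minimal_cover(L, {3, 4}))           # ('SL', [3, 4, 5])
print(contribution(SL(3, 4, 5), {3, 4}))  # partial
```

Enumerating valid forms is exponential and is only meant to make the semantics of braces and brackets tangible on small lists.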

3. The algorithm

Given a query set Q = {Q1, Q2, ..., Qn}, where each Qi denotes the set of records pertinent to the ith query, and a record set R = {r1, r2, ..., rm}, the set of records of file F, to find a CRFO for F with respect to Q we start with a UL having each element given by a record of F. Thus, L = {r1 r2 ... rm}, indicating that every possible permutation is allowed on the elements of L. We select an arbitrary Qi and 'refine' L to Li, such that every valid form of Li gives a CRFO with respect to Qi. Our basic algorithm is this 'refinement' algorithm, which we describe below. We then select another unconsidered query, say Qj, and refine Li to Lj. As a result, Lj will give a CRFO with respect to Qi and Qj. Proceeding recursively in this fashion, when all the queries have been considered, the ultimate 'refined' list gives us the CRFO of F with respect to Q. If a CRFO does not exist, we will find at some intermediate step that a refined form with respect to some query cannot be constructed. When a CRFO exists, the final refined form will also give all possible CRFOs with respect to Q.

Let us now discuss the refinement algorithm. Suppose we have a list L (UL or SL) that is a cover of R and a query Qi. L will have as its elements a number of lists. Among these lists, only those that are either partially or totally contributory to Qi will be refined.


Thus, from L, find a minimal cover of Qi. Let it be L̂. Evidently, only elements of L̂ will be refined when considering Qi. L̂ in its turn will have as its elements a number of lists; some of them will be partially contributory to Qi, some totally and some not. Let Sp and St represent the sets of all partially and totally contributory lists respectively. Since, after refinement, the records in Qi must occupy consecutive positions in the refined form of L, we can at once say that the maximum number of elements in Sp is two and that these two lists must appear at the two ends of the group of lists given by St. If the number exceeds two, a CRFO does not exist. Let us denote the elements of Sp by Lp1 and Lp2 (if the number is zero or one, we will assume both or one of them void respectively) and those of St by Lt1, Lt2, ..., Ltk. If L̂ is an SL, then for a CRFO the elements of Sp and St must appear in L̂ in the following order: Lp1 followed by any permutation of Lt1, Lt2, ..., Ltk followed by Lp2; otherwise we can infer nonexistence of a CRFO. However, if L̂ is a UL, the elements of Sp and St may appear in any order and we permute them to bring them into the above order. Since in this order only Lt1, Lt2, ..., Ltk can be permuted among themselves, while the positions of Lp1 and Lp2 become fixed, the list L̂ is refined into the form

[Lp1 {Lt1 Lt2 ... Ltk} Lp2]

followed (or preceded) by the non-contributory lists. Of course, in this form, we have assumed, without any loss of generality, that Lp1 forms the head of the refined form of L̂ and Lp2 the tail. If L̂ is an SL, the only difference will be that the braces in the above form will be absent. If Lp1 and Lp2 are null, then the brackets are missing.
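The classification of the elements of the minimal cover into Sp and St, together with the two feasibility checks described above, can be sketched as follows. The encoding (integers for records, (kind, elements) pairs for lists) and the names records, contribution and split are assumptions made for this illustration; the single-letter tags "t", "p" and "n" stand for totally, partially and non-contributory.

```python
import re


def records(x):
    """Records embedded in x; a record is an int, a list is (kind, elements)."""
    return {x} if isinstance(x, int) else set().union(*map(records, x[1]))


def contribution(x, query):
    recs = records(x)
    return "t" if recs <= query else "p" if recs & query else "n"


def split(cover, query):
    """Return (Sp, St, non-contributory elements) of a minimal cover."""
    kind, elements = cover
    tags = [contribution(e, query) for e in elements]
    sp = [e for e, tag in zip(elements, tags) if tag == "p"]
    st = [e for e, tag in zip(elements, tags) if tag == "t"]
    non = [e for e, tag in zip(elements, tags) if tag == "n"]
    if len(sp) > 2:
        raise ValueError("more than two partially contributory lists: no CRFO")
    # In an SL the order is fixed: Lp1, then the totally contributory lists, then Lp2.
    if kind == "SL" and not re.fullmatch("n*p?t*p?n*", "".join(tags)):
        raise ValueError("contributory lists of the SL are not contiguous: no CRFO")
    return sp, st, non


# The list L2 and the query Q3 of the illustrative example in Section 4.
L2 = ("UL", [1, 2, ("UL", [3, 4]), ("UL", [5, 6])] + list(range(7, 18)))
sp, st, non = split(L2, {4, 5, 6, 7})
print(sp)  # [('UL', [3, 4])]
print(st)  # [('UL', [5, 6]), 7]
```

The sketch stops at the first level; the partially contributory lists still have to be refined recursively, as the following paragraphs explain.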

Now each of Lp1 and Lp2 may individually be an SL or a UL, and each of them will contain some lists totally contributory to Qi, some partially and some not contributory. We will first discuss the refinement of Lp1 and show that this refinement needs a recursive procedure. Let Lp11, Lp12, ..., Lp1q be the lists that are directly embedded in Lp1 and totally contributory to Qi. Evidently, in the above form, Lp1 must be refined such that the group Lp11, Lp12, ..., Lp1q appears at the right end of Lp1, and this group can be preceded at the left only by a single partially contributory list in Lp1. Thus Lp1 can contain only a single partially contributory list, Lp1p (say). If there is more than one partially contributory list in Lp1, a CRFO does not exist. All the non-contributory lists in Lp1 will precede Lp1p at the left. Hence if Lp1 is an SL, Lp1 must be of the form

Lp1 = [Lp11 Lp12 ... Lp1q Lp1p (non-contributory lists)]

or

Lp1 = [(non-contributory lists) Lp1p Lp11 Lp12 ... Lp1q],

remembering that mirror image permutation on Lp1 is permissible. After Lp1 has been refined, since no permutation (not even mirror image permutation) can be allowed on Lp1, the brackets will be removed after refinement. If Lp1 is a UL, the lists contained in Lp1 are permuted to the form

(non-contributory lists) Lp1p Lp11 Lp12 ... Lp1q

and, remembering that the non-contributory lists can be permuted among themselves and that the same is the case with Lp11, Lp12, ..., Lp1q, the refined form of Lp1 becomes

{non-contributory lists} Lp1p {Lp11 Lp12 ... Lp1q}.

It may be noted that the braces enclosing all the elements of Lp1 (as it is a UL) will be removed.

Similar treatment will be made with Lp2. Thus L̂ after refinement turns out to be

[{non-contributory lists of Lp1} Lp1p {Lp11 Lp12 ... Lp1q} {Lt1 Lt2 ... Ltk} {Lp21 Lp22 ... Lp2r} Lp2p {non-contributory lists of Lp2}],

where Lp21, Lp22, ..., Lp2r are the totally contributory lists and Lp2p is the partially contributory list directly embedded in Lp2. Once again, Lp1p and Lp2p, the partially contributory lists, will have some lists contained in them, and we give them the same treatment as Lp1 and Lp2 respectively, recursively, until we ultimately get the final refinement of L̂ and hence of L.

In the above discussion, we have described the logic involved in refining a list with respect to a query. We start with a list L0, which is a UL containing all the records, each directly embedded in L0. We select query Q1 and refine L0 to L1, then refine L1 with respect to Q2 to obtain L2, and so on. Ultimately we arrive at Ln, refining Ln-1 with respect to Qn, if there are n queries. If the refinement process never fails in these n steps, a CRFO exists, and Ln gives all possible CRFOs, corresponding to all possible valid forms of Ln. A formal description of the algorithm to refine Li with respect to Qi+1 is given below.

Step 1: From Li, find L̂, the minimal cover of Qi+1.

Step 2: Find Lt1, Lt2, ..., Ltk and Lp1, Lp2, and replace L̂ in Li by [Lp1 {Lt1 Lt2 ... Ltk} Lp2].

Step 3: Replace Lp1 by (non-contributory lists) Lp1p Lp11 Lp12 ... Lp1q if Lp1 is an SL, otherwise by {non-contributory lists} Lp1p {Lp11 Lp12 ... Lp1q}.

Step 4: Do similar processing with Lp2.

Step 5: Using Lp1p in place of Lp1 and Lp2p in place of Lp2, repeat Steps 3 and 4 until Lp1p and Lp2p each is a single record or null.
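Steps 1 to 5 can be sketched in code as follows. This is one reading of the procedure under stated assumptions, not the authors' implementation: records are integers, a list is a pair (kind, elements), every query is a subset of the record set, and the names NoCRFO, recs, kind, group, split, to_end, refine and crfo are introduced here for illustration. The function to_end plays the role of Steps 3 to 5 for a partially contributory list, returning its elements as a flat sequence with the queried records at the right end.

```python
import re


class NoCRFO(Exception):
    """Raised when the refinement with respect to some query fails."""


def recs(x):
    return {x} if isinstance(x, int) else set().union(*map(recs, x[1]))


def kind(x, q):
    r = recs(x)
    return "t" if r <= q else "p" if r & q else "n"


def group(elems):
    """Enclose elems in braces (a UL) unless there are fewer than two."""
    return elems if len(elems) < 2 else [("UL", elems)]


def split(elems, q):
    by = {"n": [], "p": [], "t": []}
    for e in elems:
        by[kind(e, q)].append(e)
    return by["n"], by["p"], by["t"]


def to_end(P, q):
    """Steps 3-5: flatten partial list P so that its q-records end the sequence."""
    k, elems = P
    pattern = lambda es: "".join(kind(e, q) for e in es)
    if k == "SL" and not re.fullmatch("n*p?t*", pattern(elems)):
        elems = elems[::-1]  # try the mirror image of the SL
    n, p, t = split(elems, q)
    if len(p) > 1 or (k == "SL" and not re.fullmatch("n*p?t*", pattern(elems))):
        raise NoCRFO
    mid = to_end(p[0], q) if p else []
    return group(n) + mid + group(t) if k == "UL" else n + mid + t


def refine(L, q):
    k, elems = L
    for i, e in enumerate(elems):  # Step 1: descend to the minimal cover
        if not isinstance(e, int) and q <= recs(e):
            return (k, elems[:i] + [refine(e, q)] + elems[i + 1:])
    n, p, t = split(elems, q)
    if len(p) > 2:
        raise NoCRFO
    if k == "UL":  # Step 2: build [Lp1 {Lt1 ... Ltk} Lp2]
        left = to_end(p[0], q) if p else []
        right = to_end(p[1], q)[::-1] if len(p) == 2 else []
        core = [("SL", left + group(t) + right)] if p else group(t)
        new = n + core
        return new[0] if len(new) == 1 and not isinstance(new[0], int) else ("UL", new)
    kinds = [kind(e, q) for e in elems]
    if not re.fullmatch("n*p?t*p?n*", "".join(kinds)):
        raise NoCRFO
    out = []
    for i, e in enumerate(elems):
        if kinds[i] != "p":
            out.append(e)
        else:  # a partial list at the right of the run needs its q-records on the left
            seq = to_end(e, q)
            out += seq if any(c != "n" for c in kinds[i + 1:]) else seq[::-1]
    return ("SL", out)


def crfo(records, queries):
    L = ("UL", list(records))
    for q in queries:
        L = refine(L, set(q))
    return L


print(crfo(range(1, 5), [{1, 2}, {2, 3}]))  # ('UL', [4, ('SL', [1, 2, 3])])
```

For the records r1 to r4 and the queries {r1, r2} and {r2, r3}, the result corresponds to {r4 [r1 r2 r3]} in the paper's notation. Adding the query {r2, r4} raises NoCRFO, since r2 would then need three neighbours.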

4. An illustrative example

Let us consider an example with the query set Q = {Q1, Q2, ..., Q12} and record set R = {r1, r2, ..., r17}, where

Q1 = {r5, r6}, Q2 = {r3, r4}, Q3 = {r4, r5, r6, r7}, Q4 = {r9, r10},
Q5 = {r3, r4, r5, r6, r7, r8, r9, r10}, Q6 = {r11, r12}, Q7 = {r13, r14, r15},
Q8 = {r16, r17}, Q9 = {r11, r12, r17}, Q10 = {r3, r4, r5, r6, r7, r8, r9, r10, r16},
Q11 = {r1, r2}, Q12 = {r2, r14, r15}.

We start with L0 = {r1 r2 r3 ... r17}. Each list directly embedded in L0 is just a record. Considering Q1, the minimal cover is L0 and all the lists contributing to Q1 are totally contributory; thus, by our algorithm, the refined form of L0 by Q1 is L1, where

L1 = {r1 r2 ... r4 {r5 r6} r7 ... r17}.

Considering Q2 and refining L1 with Q2, we find that the minimal cover is L1 and, using the same argument as in the case of Q1, we have L2, the refined form of L1, as

L2 = {r1 r2 {r3 r4} {r5 r6} r7 ... r17}.

Considering Q3, the minimal cover is L2 and the lists {r5 r6} and r7 are totally contributory, while {r3 r4} is partially contributory. Thus the refined form of L2 is given by L3 as

L3 = {r1 r2 [r3 r4 {{r5 r6} r7}] r8 ... r17}.

Proceeding in this manner, it can be found using the algorithm that, for all i ≤ 11, the lists Li+1, the refined forms of Li considering the query Qi+1, are given by

L4 = {r1 r2 [r3 r4 {{r5 r6} r7}] r8 {r9 r10} r11 ... r17},

L5 = {r1 r2 {[r3 r4 {{r5 r6} r7}] r8 {r9 r10}} r11 ... r17},

L6 = {r1 r2 {[r3 r4 {{r5 r6} r7}] r8 {r9 r10}} {r11 r12} r13 ... r17},

L7 = {r1 r2 {[r3 r4 {{r5 r6} r7}] r8 {r9 r10}} {r11 r12} {r13 r14 r15} r16 r17},

L8 = {r1 r2 {[r3 r4 {{r5 r6} r7}] r8 {r9 r10}} {r11 r12} {r13 r14 r15} {r16 r17}},

L9 = {r1 r2 {[r3 r4 {{r5 r6} r7}] r8 {r9 r10}} {r13 r14 r15} [r16 r17 {r11 r12}]},

L10 = {r1 r2 [{[r3 r4 {{r5 r6} r7}] r8 {r9 r10}} r16 r17 {r11 r12}] {r13 r14 r15}},

L11 = {{r1 r2} [{[r3 r4 {{r5 r6} r7}] r8 {r9 r10}} r16 r17 {r11 r12}] {r13 r14 r15}},

L12 = {[r1 r2 {r14 r15} r13] [{[r3 r4 {{r5 r6} r7}] r8 {r9 r10}} r16 r17 {r11 r12}]}.

This indicates that a CRFO of the above record set is possible with respect to the above query set and every valid form of L12 above gives one possible CRFO.
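The claim that every valid form of the final list is a CRFO can be checked exhaustively for an example of this size. The sketch below makes two assumptions that should be kept in mind: the twelve queries and the list L12 are written out as they are read here from the printed example (parts of which are hard to make out), and the encoding and helper names (UL, SL, valid_forms, consecutive) are illustrative only.

```python
from itertools import permutations, product


def UL(*elements):
    return ("UL", list(elements))


def SL(*elements):
    return ("SL", list(elements))


def valid_forms(x):
    """Yield every record sequence permitted by the list x."""
    if isinstance(x, int):
        yield (x,)
        return
    kind, elements = x
    orders = permutations(elements) if kind == "UL" else (elements, elements[::-1])
    for order in orders:
        for parts in product(*(tuple(valid_forms(e)) for e in order)):
            yield tuple(r for part in parts for r in part)


def consecutive(order, query):
    pos = [i for i, r in enumerate(order) if r in query]
    return len(pos) == len(query) and pos[-1] - pos[0] + 1 == len(pos)


# Queries Q1..Q12 and list L12, as read from the example (an assumption).
queries = [
    {5, 6}, {3, 4}, {4, 5, 6, 7}, {9, 10}, set(range(3, 11)), {11, 12},
    {13, 14, 15}, {16, 17}, {11, 12, 17}, set(range(3, 11)) | {16},
    {1, 2}, {2, 14, 15},
]
G = UL(SL(3, 4, UL(UL(5, 6), 7)), 8, UL(9, 10))
L12 = UL(SL(1, 2, UL(14, 15), 13), SL(G, 16, 17, UL(11, 12)))

forms = list(valid_forms(L12))
all_ok = all(consecutive(f, q) for f in forms for q in queries)
print(len(forms), all_ok)  # 3072 True
```

Under these assumptions the list admits 3072 arrangements, each of which keeps the records of every query in consecutive locations, for example r1 r2 r14 r15 r13 r3 r4 r5 r6 r7 r8 r9 r10 r16 r17 r11 r12.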

5. Conclusion

We have developed in this letter an algorithm to identify a possible CRFO of a given record set F with an arbitrary query set Q, where several theoretical results already stated by Ghosh [5] have been utilized.

No restrictions are imposed on the queries of the set Q, unlike the graph-theoretic approach [1], where the queries in Q are constrained to be pair-wise non-disjoint (i.e., every pair of queries has to have a nonzero number of reply records in common). Thus the present method improves on that of [1] in the sense that it can tackle a much larger, more general class of query sets. Also, the proposed algorithm is evidently faster than n!, i.e., faster than finding all possible permutations of the record set for the queries, since only a portion of the total record set is to be searched in this algorithm for each query to test the possible CR organisation. Work is continuing to give a rigorous analysis of the execution time of the algorithm and we intend to report it in a later communication.

References

[1] K.P. Eswaran, Placement of records in a file and file allocation in a computer network, Proc. IFIP Congr., Stockholm (1974) 304-307.

[2] K.P. Eswaran, Consecutive retrieval information system, Ph.D. Thesis, University of California, Berkeley (1973).

[3] S.P. Ghosh, File organisation: the consecutive retrieval property, Comm. ACM 15 (1972) 802-808.

[4] S.P. Ghosh, On the theory of consecutive storage of relevant records, Information Sci. 6 (1973) 1-9.

[5] S.P. Ghosh, Data Base Organisation for Data Management (Academic Press, New York, 1977).

[6] A. Waksman and M.W. Green, On the consecutive retrieval property in file organisation, IEEE Trans. Computers C-23 (1974) 173-174.
