CS4432: Database Systems II
Lecture #7
Professor Elke A. Rundensteiner



Indexing: helps retrieve data more quickly for certain queries

Chapter 4 (chapter 13 in the "complete book")

SELECT * FROM Emp WHERE salary = 1000000;

[Figure: an index takes a value (here, value = 1,000,000) and leads to the matching record]


Topics

• Sequential Files (chap. 4.1)
• Secondary Indexes (chap. 4.2)


Sequential File

[Figure: sequential file, two records per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]


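Sequential File

Dense Index: every record is in the index.

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); dense index blocks (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120), one index entry per record]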
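As a rough illustration of the dense index above, here is a minimal Python sketch. It assumes the five file blocks shown on the slide, and models a record pointer as a hypothetical `(block_no, slot)` pair; the function names are illustrative, not part of the lecture.

```python
from bisect import bisect_left

# Sequential file from the slide: two records per block, sorted by key.
FILE_BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

def build_dense_index(blocks):
    """One (key, record pointer) entry per record; pointer = (block_no, slot)."""
    return [(key, (b, s))
            for b, block in enumerate(blocks)
            for s, key in enumerate(block)]

def dense_lookup(index, key):
    """Binary-search the index only; the data file is never touched."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i < len(index) and keys[i] == key:
        return index[i][1]
    return None  # key absent: decided from the index alone

dense_index = build_dense_index(FILE_BLOCKS)
```

Because every key appears in the index, a miss such as `dense_lookup(dense_index, 45)` is answered without reading any file block.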


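Sequential File

Sparse Index: only the first record per block is in the index.

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230), one index entry per file block]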
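A sparse lookup can be sketched as follows, assuming the five file blocks from the slide and an index that is just the list of first keys, where position i implicitly points to block i. The names are illustrative.

```python
from bisect import bisect_right

FILE_BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

def build_sparse_index(blocks):
    """Keep only the first key of each block."""
    return [block[0] for block in blocks]

def sparse_lookup(index, blocks, key):
    """Find the last index entry <= key, then search inside that one block."""
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None                      # key is smaller than every indexed key
    if key in blocks[pos]:               # this step reads the data block
        return (pos, blocks[pos].index(key))
    return None

sparse_index = build_sparse_index(FILE_BLOCKS)
```

Unlike the dense case, deciding that a key is absent requires reading the candidate block.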


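Sequential File

Sparse 2nd level

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); first-level sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230); second-level sparse index blocks (10, 90, 170, 250), (330, 410, 490, 570), one entry per first-level index block]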
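The two-level idea can be sketched like this, assuming four entries per index block as in the figure and only the five file blocks that are actually shown. `FANOUT` and the function names are assumptions made for the example.

```python
from bisect import bisect_right

FILE_BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
FANOUT = 4  # index entries per index block, as drawn on the slide

def build_levels(blocks, fanout=FANOUT):
    """First level: first key of each file block, packed into index blocks.
    Second level: first key of each first-level index block."""
    keys = [block[0] for block in blocks]
    first = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]
    second = [index_block[0] for index_block in first]
    return first, second

def two_level_lookup(first, second, blocks, key, fanout=FANOUT):
    ib = bisect_right(second, key) - 1          # which first-level index block
    if ib < 0:
        return None
    pos = bisect_right(first[ib], key) - 1      # which entry inside it
    block_no = ib * fanout + pos
    if key in blocks[block_no]:
        return (block_no, blocks[block_no].index(key))
    return None

first_level, second_level = build_levels(FILE_BLOCKS)
```

Each lookup touches one second-level block, one first-level block, and one data block.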


• NOTE:

A FILE or INDEX may be laid out on disk either contiguously or as a chain of blocks.


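Question:

• Can we (do we want to) build a dense, 2nd level index for a dense index?

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100) with two index levels, labeled "1st level?" and "2nd level?": one with keys 10, 30, 50, ..., 230 and one with keys 10, 90, 170, ..., 570]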


Notes on pointers:

(1) A block pointer (BP), as used in a sparse index, can be smaller than a record pointer (RP).

(2) If the file is contiguous, then we can omit pointers (i.e., compute them).


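[Figure: index keys K1, K2, K3, K4 correspond by position to blocks R1, R2, R3, R4 of a contiguous file; no pointers are stored]

Say: 1024 B per block

• If we want the K3 block: get it at offset (3-1) × 1024 = 2048 bytes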
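The offset arithmetic above generalizes to any key position. A minimal sketch, assuming 1-based key positions and the 1024-byte block size from the slide (the function name is illustrative):

```python
BLOCK_SIZE = 1024  # bytes per block, as on the slide

def block_offset(position, block_size=BLOCK_SIZE):
    """Byte offset of the block belonging to the position-th key (1-based)
    when the file is stored contiguously, so no pointer needs to be stored."""
    if position < 1:
        raise ValueError("positions are 1-based")
    return (position - 1) * block_size

k3_offset = block_offset(3)  # (3 - 1) * 1024 = 2048
```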


Sparse vs. Dense Tradeoff

• Sparse: less index space per record, so we can keep more of the index in memory
(Later: sparse is better for insertions)

• Dense: can tell whether a record exists without accessing the file
(Later: dense is needed for secondary indexes)


Terms

• Index sequential file
• Search key (≠ primary key)
• Primary index (on sequencing field)
• Secondary index
• Dense index (all search key values in)
• Sparse index
• Multi-level index


Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes


Duplicate keys

[Figure: sequential file with duplicate keys, two records per block: (10, 10), (10, 20), (20, 30), (30, 30), (40, 45)]


Duplicate keys

Dense index, one way to implement?

[Figure: same file, blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45); dense index with one entry per record, duplicates included: (10, 10, 10, 20), (20, 30, 30, 30), ...]


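Duplicate keys

Dense index, better way?

[Figure: same file, blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45); dense index with one entry per distinct key: (10, 20, 30, 40)]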


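Duplicate keys

Sparse index, one way?

[Figure: same file, blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45); sparse index holding the first key of each block: (10, 10, 20, 30), ...]

Careful if looking for 20 or 30!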


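Duplicate keys

Sparse index, another way?

– place first new key from block

[Figure: same file, blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45); sparse index (10, 20, 30, 30), with the annotation "should this be 40?" on the index]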
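The "first new key from block" strategy can be sketched as below. One assumption is made that the slide leaves open: when a block contains no new key (the block (30, 30)), its entry simply repeats the block's lowest key. The function names and the `(block_no, slot)` pointers are illustrative.

```python
from bisect import bisect_left

FILE_BLOCKS = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]

def build_first_new_key_index(blocks):
    """Entry i = first key in block i that did not occur in earlier blocks
    (assumption: fall back to the block's lowest key if nothing is new)."""
    index, last_seen = [], None
    for block in blocks:
        new_keys = [k for k in block if last_seen is None or k > last_seen]
        index.append(new_keys[0] if new_keys else block[0])
        last_seen = block[-1]
    return index

def find_all(index, blocks, key):
    """Start at the first entry equal to key (else the last smaller entry),
    then scan forward while blocks can still contain the key."""
    pos = bisect_left(index, key)
    if pos == len(index) or index[pos] != key:
        pos -= 1
    if pos < 0:
        return []
    pointers = []
    for b in range(pos, len(blocks)):
        if blocks[b][0] > key:
            break
        pointers.extend((b, s) for s, k in enumerate(blocks[b]) if k == key)
    return pointers

sparse_index = build_first_new_key_index(FILE_BLOCKS)
```

With this layout a search for 20 or 30 starts at the block where that key first appears, which is the case the previous variant made awkward.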


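Summary: duplicate values, primary index

• Index may point to the first instance of each value only

[Figure: File and Index; the file holds records a, a, a, b, and each index entry points to the first record with its value]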
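A minimal sketch of an index that points only to the first instance of each value, assuming the duplicate-key file used on the earlier slides and hypothetical `(block_no, slot)` record pointers. Because the file is sorted, the remaining duplicates are found by scanning forward from that first instance.

```python
FILE_BLOCKS = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]

def build_first_instance_index(blocks):
    """Map each distinct key to the pointer of its first occurrence."""
    index = {}
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            index.setdefault(key, (b, s))   # later duplicates are ignored
    return index

def find_all(index, blocks, key):
    """Follow the index to the first instance, then read on while keys match."""
    if key not in index:
        return []
    start_block, start_slot = index[key]
    pointers = []
    for b in range(start_block, len(blocks)):
        first = start_slot if b == start_block else 0
        for s in range(first, len(blocks[b])):
            if blocks[b][s] != key:
                return pointers
            pointers.append((b, s))
    return pointers

first_instance_index = build_first_instance_index(FILE_BLOCKS)
```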






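Deletion from sparse index

[Figure: sequential file, two records per block: (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150)]

– delete record 40

[Figure: same file and index; 40 is removed from block (30, 40)]

[Figure: same file and index, with annotations "40" on the file block and "40" on the index: 40 becomes the first record of its block and replaces 30 in the index]

– delete records 30 & 40

[Figure: same file and index, with annotation "50, 70" on the index: the index entries 50 and 70 move up]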
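The deletion cases above can be sketched as follows. This is a simplification under stated assumptions: the index is a plain list of first keys parallel to the list of blocks, and an emptied block is dropped from the list together with its index entry. The function name is illustrative.

```python
from bisect import bisect_right

def delete_from_sparse(blocks, index, key):
    """Delete key from a sparse-indexed sequential file. Returns True if found."""
    pos = bisect_right(index, key) - 1
    if pos < 0 or key not in blocks[pos]:
        return False
    blocks[pos].remove(key)
    if blocks[pos]:
        index[pos] = blocks[pos][0]   # changes only if the first record was deleted
    else:
        del blocks[pos]               # block is empty: drop it and its index entry,
        del index[pos]                # so later entries move up
    return True

blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [10, 30, 50, 70]
delete_from_sparse(blocks, index, 40)   # index stays [10, 30, 50, 70]
delete_from_sparse(blocks, index, 30)   # block now empty: index becomes [10, 50, 70]
```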


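Deletion from dense index

[Figure: sequential file, two records per block: (10, 20), (30, 40), (50, 60), (70, 80); dense index blocks (10, 20, 30, 40), (50, 60, 70, 80)]

[Figure: same file and index, with annotations "40" on the file block and "40" on the index: 40 moves up in its block and in the index, taking the place of 30]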
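For comparison with the sparse case, here is a minimal sketch of dense-index deletion, assuming the figure shows record 30 being deleted. The index is modeled as a sorted list of hypothetical `(key, block_no)` entries; every deletion removes an index entry, since each record has one.

```python
from bisect import bisect_left

def delete_from_dense(blocks, index, key):
    """Remove the record and its index entry. Returns True if the key existed."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(index) or keys[i] != key:
        return False                     # absent: known from the index alone
    _, block_no = index.pop(i)           # later index entries move up
    blocks[block_no].remove(key)         # later records in the block move up
    return True

blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [(k, b) for b, block in enumerate(blocks) for k in block]
delete_from_dense(blocks, index, 30)
```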


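Insertion, sparse index case

[Figure: sequential file with free space, two slots per block: (10, 20), (30, free), (40, 50), (60, free); sparse index (10, 30, 40, 60)]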



– insert record 34

[Figure: same file and index; 34 goes into the free slot of block (30, free), giving (30, 34); the index is unchanged]

• Our lucky day! We have free space where we need it!



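[Figure: same file and index, with annotations "15", "20, 30", and "20": 15 is inserted into the first block, 20 moves to the second block, giving (20, 30), and the second index entry changes from 30 to 20]

• Illustrated: immediate reorganization
• Variation:
  – insert new block (chained file)
  – update index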



[Figure: same file and index; record 25 is placed in an overflow block]

Overflow blocks (reorganize later...)
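The overflow-block alternative can be sketched like this, assuming two slots per primary block and modeling overflow blocks as a hypothetical dictionary from block number to a list of extra keys. The index is left alone when a record goes to overflow, and lookups must also check the overflow chain.

```python
from bisect import bisect_right, insort

def insert_with_overflow(blocks, overflow, index, key, capacity=2):
    """Use free space in the target block if any; otherwise chain to overflow."""
    b = max(bisect_right(index, key) - 1, 0)
    if len(blocks[b]) < capacity:
        insort(blocks[b], key)
        index[b] = blocks[b][0]
    else:
        overflow.setdefault(b, []).append(key)   # reorganize later

def contains(blocks, overflow, index, key):
    """A lookup has to search the primary block and its overflow chain."""
    b = max(bisect_right(index, key) - 1, 0)
    return key in blocks[b] or key in overflow.get(b, [])

blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
overflow = {}
insert_with_overflow(blocks, overflow, index, 25)   # first block is full
```

The trade-off is that insertion is cheap now, while searches slow down as overflow chains grow until the file is reorganized.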


Insertion, dense index case

• Similar

• Often more expensive . . .






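Secondary indexes

[Figure: file ordered on its sequence field, so the values to be indexed appear unordered, two records per block: (30, 50), (20, 70), (80, 40), (100, 10), (90, 60)]

Can I make a sparse index?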



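• Sparse index

[Figure: same file, blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); sparse index holding the first value of each block: 30, 20, 80, 100, 90, ...]

Does not make sense!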



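• Dense index

[Figure: same file, blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); dense index in key order: (10, 20, 30, 40), (50, 60, 70, ...); sparse high level on top of it: (10, 50, 90, ...)]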


With secondary indexes:

• Lowest level is dense
• Other levels are sparse

Also: pointers are record pointers
(not block pointers; not computed)
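The structure described above (dense lowest level with record pointers, sparse level on top) can be sketched as follows. It assumes the unordered file from the figure, four entries per index block, and hypothetical `(block_no, slot)` record pointers; the names are illustrative.

```python
from bisect import bisect_right

# File is sequenced on some other field, so these values are unordered.
FILE_BLOCKS = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]
FANOUT = 4  # entries per index block (assumption matching the figure)

def build_secondary_index(blocks, fanout=FANOUT):
    """Dense level: every (key, record pointer), sorted by key.
    Sparse level: first key of each dense index block."""
    entries = sorted((key, (b, s))
                     for b, block in enumerate(blocks)
                     for s, key in enumerate(block))
    dense = [entries[i:i + fanout] for i in range(0, len(entries), fanout)]
    sparse = [index_block[0][0] for index_block in dense]
    return dense, sparse

def secondary_lookup(dense, sparse, key):
    ib = bisect_right(sparse, key) - 1
    if ib < 0:
        return None
    for k, pointer in dense[ib]:
        if k == key:
            return pointer               # a record pointer, not a block pointer
    return None

dense_level, sparse_level = build_secondary_index(FILE_BLOCKS)
```

The upper level can be sparse because the dense level itself is sorted, even though the data file is not.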


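Duplicate values & secondary indexes

[Figure: file with duplicate values in the indexed field, two records per block: (20, 10), (20, 40), (10, 40), (10, 40), (30, 40)]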



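One option...

[Figure: same file, blocks (20, 10), (20, 40), (10, 40), (10, 40), (30, 40); dense index with one entry per record, duplicates included: (10, 10, 10, 20), (20, 30, 40, 40), (40, 40, ...)]

Problem: excess overhead!

• disk space
• search time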



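Another option...

[Figure: same file, blocks (20, 10), (20, 40), (10, 40), (10, 40), (30, 40); index with one entry per distinct value (10, 20, 30, 40), each entry holding all record pointers for that value]

Problem: variable size records in index!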
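A minimal sketch of this second option, assuming the file from the figure and hypothetical `(block_no, slot)` record pointers. Each distinct value gets one index entry carrying a list of pointers, which is exactly what makes the entries variable in size.

```python
FILE_BLOCKS = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

def build_pointer_list_index(blocks):
    """One entry per distinct value; the entry holds every matching record pointer."""
    index = {}
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            index.setdefault(key, []).append((b, s))
    return dict(sorted(index.items()))   # keep entries in key order

pointer_index = build_pointer_list_index(FILE_BLOCKS)
entry_sizes = {key: len(ptrs) for key, ptrs in pointer_index.items()}
```

Here `entry_sizes` shows the problem named on the slide: the entry for 40 carries four pointers while the entry for 30 carries one.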



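Another idea: chain records with same key?

[Figure: same file, blocks (20, 10), (20, 40), (10, 40), (10, 40), (30, 40); index with one entry per distinct value: (10, 20, 30, 40), (50, 60, ...); records with the same value are chained together]

Problems:
• Need to add fields to records
• Need to follow chain to know records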


Summary: Conventional Indexes

– Basic ideas: sparse, dense, multi-level…
– Duplicate keys
– Deletion/Insertion
– Secondary indexes
