CS 4432 lecture #7 1 CS4432: Database Systems II Lecture #7 Professor Elke A. Rundensteiner

CS 4432 lecture #7 1

CS4432: Database Systems II

Lecture #7

Professor Elke A. Rundensteiner

CS 4432 lecture #7 2

Indexing: helps retrieve data faster for certain queries

SELECT * FROM Emp WHERE salary = 1000000;

Chapter 4 (Chapter 13 in the 'complete book')

[Figure: an index maps a search value (here, value = 1,000,000) to the matching record]

CS 4432 lecture #7 3

Topics

• Sequential Files (chap. 4.1)
• Secondary Indexes (chap. 4.2)

CS 4432 lecture #7 4

Sequential File

[Figure: sequential file stored in key order, two records per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]

CS 4432 lecture #7 5

Sequential File

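[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]

Dense Index

[Figure: index blocks holding keys (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120), each entry pointing to its record]

Every record is in the index.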
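A dense-index lookup can be sketched as follows. This is a minimal in-memory illustration, not a disk implementation: the names `blocks`, `dense_index`, and `dense_lookup` are hypothetical, and only the five file blocks drawn in the figure are modeled.

```python
from bisect import bisect_left

# Data file: two records per block, sorted by key (as in the figure).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, (block_no, slot)) entry per record, in key order.
dense_index = [(key, (b, s))
               for b, blk in enumerate(blocks)
               for s, key in enumerate(blk)]
index_keys = [key for key, _ in dense_index]

def dense_lookup(key):
    """Return the record pointer (block_no, slot) for key, or None."""
    i = bisect_left(index_keys, key)      # binary search inside the index
    if i < len(index_keys) and index_keys[i] == key:
        return dense_index[i][1]
    return None                           # decided without reading the data file

print(dense_lookup(60))
print(dense_lookup(65))
```

Because every key appears in the index, a miss is detected from the index alone.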

CS 4432 lecture #7 6

Sequential File

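[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]

Sparse Index

[Figure: index blocks holding keys (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230), each entry pointing to a file block]

Only the first record per block is in the index.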
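A sparse-index lookup can be sketched like this. It is an in-memory illustration under the assumption of unique keys; `blocks`, `sparse_index`, and `sparse_lookup` are hypothetical names, and the membership test on a block stands in for reading that block from disk.

```python
from bisect import bisect_right

# Data file: two records per block, sorted by key (as in the figure).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: only the first key of each block.
sparse_index = [blk[0] for blk in blocks]   # [10, 30, 50, 70, 90]

def sparse_lookup(key):
    """Return (block_no, slot) for key, or None if it is absent."""
    b = bisect_right(sparse_index, key) - 1   # last entry whose key <= search key
    if b < 0:
        return None                           # smaller than every key in the file
    if key in blocks[b]:                      # models reading the data block
        return (b, blocks[b].index(key))
    return None

print(sparse_lookup(40))
print(sparse_lookup(45))
```

Unlike the dense case, a miss (such as 45) is only known after the candidate data block has been inspected.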

CS 4432 lecture #7 7

Sequential File

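[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]

Sparse 2nd level

[Figure: first-level sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230); second-level sparse index blocks (10, 90, 170, 250), (330, 410, 490, 570), one entry per first-level block]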
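The two-level search in the figure can be sketched as below. This is a sketch under stated assumptions: four keys per index block (`FANOUT`, a hypothetical name), only the first-level keys 10 to 230 shown on the slide, and in-memory lists standing in for index blocks.

```python
from bisect import bisect_right

FANOUT = 4  # keys per index block, as drawn on the slide

# First key of every data block: 10, 30, ..., 230.
first_keys = list(range(10, 250, 20))

# Level 1: sparse index on the data file, split into index blocks.
level1 = [first_keys[i:i + FANOUT] for i in range(0, len(first_keys), FANOUT)]
# Level 2: sparse index on level 1 (first key of each level-1 block).
level2 = [blk[0] for blk in level1]          # [10, 90, 170]

def find_data_block(key):
    """Return the number of the data block that could hold key, or None."""
    j = bisect_right(level2, key) - 1        # pick the level-1 block
    if j < 0:
        return None
    i = bisect_right(level1[j], key) - 1     # pick the entry inside that block
    return j * FANOUT + i

print(find_data_block(100))
```

Each level narrows the search to a single block of the level below, so only one block per level is read.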

CS 4432 lecture #7 8

• NOTE:

A FILE or INDEX may be laid out on disk either contiguously or as a chain of blocks

CS 4432 lecture #7 9

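Question:
• Can we (do we want to) build a dense, 2nd level index for a dense index?

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100) with two index levels labeled "1st level?" and "2nd level?"; one level holds keys 10, 30, 50, ..., 230 and the other holds keys 10, 90, 170, ..., 570]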

CS 4432 lecture #7 10

Notes on pointers:

(1) A block pointer (sparse index) can be smaller than a record pointer

[Figure: a block pointer (BP) drawn shorter than a record pointer (RP)]

(2) If the file is contiguous, then we can omit pointers (i.e., compute them)

CS 4432 lecture #7 11

[Figure: index keys K1, K2, K3, K4 correspond by position to contiguous blocks R1, R2, R3, R4; say 1024 B per block]

• If we want the K3 block: get it at offset (3 - 1) × 1024 = 2048 bytes
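The offset computation generalizes to any block number. A minimal sketch, assuming 1-based block numbers and the 1024-byte block size from the slide (`block_offset` is a hypothetical helper name):

```python
BLOCK_SIZE = 1024  # bytes per block, as on the slide

def block_offset(i, block_size=BLOCK_SIZE):
    """Byte offset of the i-th block (1-based) in a contiguous file."""
    if i < 1:
        raise ValueError("block numbers start at 1")
    return (i - 1) * block_size

print(block_offset(3))  # 2048, matching the K3 example
```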

CS 4432 lecture #7 12

Sparse vs. Dense Tradeoff

• Sparse: less index space per record, so more of the index can be kept in memory (Later: sparse is better for insertions)

• Dense: can tell whether a record exists without accessing the file

(Later: dense is needed for secondary indexes)
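The space side of the tradeoff is simple arithmetic: a dense index has one entry per record, a sparse index one entry per data block. A small sketch, where the record counts are made-up inputs and `index_entries` is a hypothetical helper:

```python
import math

def index_entries(num_records, records_per_block):
    """Return (dense_entries, sparse_entries) for a first-level index."""
    dense = num_records                                   # one entry per record
    sparse = math.ceil(num_records / records_per_block)   # one entry per block
    return dense, sparse

print(index_entries(10, 2))          # the 10-record file from the figures
print(index_entries(1_000_000, 10))  # a hypothetical larger file
```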

CS 4432 lecture #7 13

Terms

• Index sequential file
• Search key (not necessarily the primary key)
• Primary index (on sequencing field)
• Secondary index
• Dense index (all search key values in)
• Sparse index
• Multi-level index

CS 4432 lecture #7 14

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

CS 4432 lecture #7 15

Duplicate keys

[Figure: file blocks with duplicate keys: (10, 10), (10, 20), (20, 30), (30, 30), (40, 45)]

CS 4432 lecture #7 16

Duplicate keys

[Figure: file blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45); index blocks (10, 10, 10, 20), (20, 30, 30, 30), one index entry per record]

Dense index, one way to implement?

CS 4432 lecture #7 17

Duplicate keys

[Figure: file blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45); index block (10, 20, 30, 40), one entry per distinct key]

Dense index, better way?
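The "better way" can be sketched as an index with one entry per distinct key that points to the first record with that key; the remaining duplicates are found by scanning forward in the sorted file. This is an illustrative in-memory sketch; the file is flattened into one list, and `build_distinct_index` and `lookup_all` are hypothetical names.

```python
from bisect import bisect_left

# The sorted file from the figure, flattened to record positions 0..9.
records = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

def build_distinct_index(keys):
    """One (key, first_position) entry per distinct key."""
    index = []
    for pos, k in enumerate(keys):
        if not index or index[-1][0] != k:
            index.append((k, pos))
    return index

def lookup_all(index, keys, target):
    """Return the positions of all records whose key equals target."""
    index_keys = [k for k, _ in index]
    i = bisect_left(index_keys, target)
    if i == len(index) or index_keys[i] != target:
        return []
    pos = index[i][1]
    hits = []
    while pos < len(keys) and keys[pos] == target:   # scan the run of duplicates
        hits.append(pos)
        pos += 1
    return hits

index = build_distinct_index(records)
print(index)
print(lookup_all(index, records, 30))
```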

CS 4432 lecture #7 18

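Duplicate keys

[Figure: file blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45); sparse index block (10, 10, 20, 30) holding the first key of each file block]

Sparse index, one way?

Careful if looking for 20 or 30!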
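The warning about 20 and 30 arises because a key can start in the block before the one whose index entry matches it. One way to handle this, sketched under the assumption that the index stores the first key of every block (`blocks_containing` is a hypothetical name), is to step back one block before scanning forward:

```python
from bisect import bisect_left

blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]
sparse_index = [blk[0] for blk in blocks]   # [10, 10, 20, 30, 40]

def blocks_containing(target):
    """Return the numbers of all blocks that hold target."""
    i = bisect_left(sparse_index, target)   # first entry with key >= target
    start = max(i - 1, 0)                   # previous block may end with target
    hits = []
    for b in range(start, len(blocks)):
        if blocks[b][0] > target:           # later blocks cannot hold target
            break
        if target in blocks[b]:             # models reading the block
            hits.append(b)
    return hits

print(blocks_containing(20))
print(blocks_containing(30))
```

Looking up 20 by its index entry alone would find only the third block and miss the copy at the end of the second.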

CS 4432 lecture #7 19

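Duplicate keys

[Figure: file blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45); sparse index block (10, 20, 30, 30); the last entry is annotated "should this be 40?"]

Sparse index, another way?

– place first new key from block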

CS 4432 lecture #7 20

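Duplicate values, primary index

Summary

• Index may point to the first instance of each value only

[Figure: File and Index; the file holds repeated values (a, a, a, b, ...) and each index entry points to the first record with its value]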

CS 4432 lecture #7 21

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

CS 4432 lecture #7 22

Deletion from sparse index

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150)]

CS 4432 lecture #7 23

Deletion from sparse index

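– delete record 40

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150); record 40 is removed from its block]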

CS 4432 lecture #7 24

Deletion from sparse index

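– delete record 30

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150); after the deletion, 40 becomes the first record of its block and the index entry 30 is replaced by 40]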

CS 4432 lecture #7 25

Deletion from sparse index

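– delete records 30 & 40

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150); the block becomes empty, so its index entry is dropped and entries 50 and 70 move up]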
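The three deletion cases can be sketched in one routine. This is an in-memory illustration assuming unique keys; `make_file` and `sparse_delete` are hypothetical names, and removing an empty block from the list stands in for freeing it on disk.

```python
from bisect import bisect_right

def make_file():
    """Fresh copy of the file and its sparse index from the figure."""
    blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
    index = [blk[0] for blk in blocks]
    return blocks, index

def sparse_delete(blocks, index, key):
    """Delete key; fix the index only if the block's first key changed."""
    b = bisect_right(index, key) - 1
    if b < 0 or key not in blocks[b]:
        return False
    blocks[b].remove(key)
    if not blocks[b]:            # block emptied: drop it and its index entry
        del blocks[b]
        del index[b]
    else:                        # no-op unless the first key was deleted
        index[b] = blocks[b][0]
    return True

blocks, index = make_file()
sparse_delete(blocks, index, 40)
print(index)   # index untouched: 40 was not a first key
```

Deleting 40 leaves the index alone, deleting 30 changes one entry, and deleting both removes an entry.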

CS 4432 lecture #7 26

Deletion from dense index

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80); dense index blocks (10, 20, 30, 40), (50, 60, 70, 80)]

CS 4432 lecture #7 27

Deletion from dense index

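– delete record 30

[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80); dense index blocks (10, 20, 30, 40), (50, 60, 70, 80); after the deletion, 40 moves up in its file block and the index entry 30 is replaced by 40]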
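With a dense index, every deletion removes an index entry as well as the record. A minimal sketch, assuming unique keys and an index of `(key, block_no)` pairs (`dense_delete` is a hypothetical name):

```python
from bisect import bisect_left

def dense_delete(blocks, index, key):
    """index is a sorted list of (key, block_no); remove key from both."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return False                 # absence is known from the index alone
    block_no = index[i][1]
    blocks[block_no].remove(key)     # remove the record from its block
    del index[i]                     # and always remove its index entry
    return True

blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [(k, b) for b, blk in enumerate(blocks) for k in blk]
dense_delete(blocks, index, 30)
print([k for k, _ in index])
```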

CS 4432 lecture #7 28

Insertion, sparse index case

[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free); sparse index block (10, 30, 40, 60)]

CS 4432 lecture #7 29

Insertion, sparse index case

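– insert record 34

[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free); sparse index block (10, 30, 40, 60); 34 goes into the free slot after 30]

• Our lucky day! We have free space where we need it!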

CS 4432 lecture #7 30

Insertion, sparse index case

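– insert record 15

[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free); sparse index block (10, 30, 40, 60); 15 is placed after 10, 20 moves into the next block to give (20, 30), and the index entry 30 becomes 20]

• Illustrated: immediate reorganization
• Variation:
  – insert new block (chained file)
  – update index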
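Immediate reorganization can be sketched as "insert, then push the largest key of an overfull block into the next block". This is an in-memory illustration assuming two records per block (`CAPACITY`) and unique keys; `insert_with_reorg` is a hypothetical name.

```python
from bisect import bisect_right

CAPACITY = 2  # records per block, as drawn on the slide

def insert_with_reorg(blocks, index, key):
    """Insert key and slide overflow into following blocks, fixing the index."""
    i = max(bisect_right(index, key) - 1, 0)   # block chosen via the sparse index
    blocks[i].append(key)
    blocks[i].sort()
    index[i] = blocks[i][0]
    while len(blocks[i]) > CAPACITY:
        spill = blocks[i].pop()                # largest key of the overfull block
        if i + 1 == len(blocks):               # ran off the end: add a new block
            blocks.append([])
            index.append(spill)
        blocks[i + 1].insert(0, spill)         # it is smaller than all keys there
        index[i + 1] = spill                   # first key of that block changed
        i += 1

blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
insert_with_reorg(blocks, index, 15)
print(blocks, index)
```

Inserting 34 into the original file would touch neither a second block nor the index, since the free slot is where it is needed.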

CS 4432 lecture #7 31

Insertion, sparse index case

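– insert record 25

[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free); sparse index block (10, 30, 40, 60); 25 is placed in an overflow block]

overflow blocks (reorganize later...)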
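The overflow variant leaves the primary blocks and the index alone and parks the new record in a side block. A minimal sketch, assuming two records per block and a dictionary from block number to overflow records (`overflow`, `insert_with_overflow`, and `lookup` are hypothetical names):

```python
from bisect import bisect_right

CAPACITY = 2  # records per primary block

def insert_with_overflow(blocks, overflow, index, key):
    """Insert key; if its block is full, put it in that block's overflow."""
    i = max(bisect_right(index, key) - 1, 0)
    if len(blocks[i]) < CAPACITY:
        blocks[i].append(key)
        blocks[i].sort()
        index[i] = blocks[i][0]
    else:
        overflow.setdefault(i, []).append(key)   # reorganize later

def lookup(blocks, overflow, index, key):
    """A search must now check the overflow block as well."""
    i = max(bisect_right(index, key) - 1, 0)
    return key in blocks[i] or key in overflow.get(i, [])

blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
overflow = {}
insert_with_overflow(blocks, overflow, index, 25)
print(overflow, lookup(blocks, overflow, index, 25))
```

The insert is cheap, but every later search of that block pays for the extra overflow read until the file is reorganized.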

CS 4432 lecture #7 32

Insertion, dense index case

• Similar

• Often more expensive . . .

CS 4432 lecture #7 33

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

CS 4432 lecture #7 34

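Secondary indexes

[Figure: file ordered by a separate sequence field; its blocks hold the secondary-key values (30, 50), (20, 70), (80, 40), (100, 10), (90, 60)]

Can I make a sparse index?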

CS 4432 lecture #7 35

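Secondary indexes

[Figure: file ordered by a separate sequence field; its blocks hold the secondary-key values (30, 50), (20, 70), (80, 40), (100, 10), (90, 60)]

• Sparse index

[Figure: sparse index blocks (30, 20, 80, 100), (90, ...) built from the first value of each file block]

does not make sense!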

CS 4432 lecture #7 36

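Secondary indexes

[Figure: file ordered by a separate sequence field; its blocks hold the secondary-key values (30, 50), (20, 70), (80, 40), (100, 10), (90, 60)]

• Dense index

[Figure: dense index blocks (10, 20, 30, 40), (50, 60, 70, ...); a sparse high level (10, 50, 90, ...) sits on top of the dense index]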

CS 4432 lecture #7 37

With secondary indexes:

• Lowest level is dense
• Other levels are sparse

Also: pointers are record pointers

(not block pointers; not computed)
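The structure can be sketched by sorting (key, record pointer) pairs for the dense level and taking the first key of each index block for the sparse level above it. This is an in-memory illustration assuming unique keys and four entries per index block; `file_blocks`, `dense_blocks`, `sparse_top`, and `find` are hypothetical names.

```python
from bisect import bisect_right

# File in sequence-field order; values shown are the secondary key.
file_blocks = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]

ENTRIES_PER_INDEX_BLOCK = 4

# Dense lowest level: (key, record pointer) for every record, sorted by key.
dense = sorted((key, (b, s))
               for b, blk in enumerate(file_blocks)
               for s, key in enumerate(blk))
dense_blocks = [dense[i:i + ENTRIES_PER_INDEX_BLOCK]
                for i in range(0, len(dense), ENTRIES_PER_INDEX_BLOCK)]

# Sparse upper level: first key of each dense index block.
sparse_top = [blk[0][0] for blk in dense_blocks]   # [10, 50, 90]

def find(key):
    """Return the record pointer (block_no, slot) for key, or None."""
    i = bisect_right(sparse_top, key) - 1
    if i < 0:
        return None
    for k, record_ptr in dense_blocks[i]:
        if k == key:
            return record_ptr
    return None

print(sparse_top)
print(find(40))
```

The pointers are full record pointers because neighbouring index entries generally lead to different, unrelated file blocks.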

CS 4432 lecture #7 38

Duplicate values & secondary indexes

[Figure: file blocks holding secondary-key values with duplicates: (20, 10), (20, 40), (10, 40), (10, 40), (30, 40)]

CS 4432 lecture #7 39

Duplicate values & secondary indexes

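[Figure: file blocks (20, 10), (20, 40), (10, 40), (10, 40), (30, 40); dense index blocks (10, 10, 10, 20), (20, 30, 40, 40), (40, 40, ...), one entry per record]

one option...

Problem: excess overhead!
• disk space
• search time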

CS 4432 lecture #7 40

Duplicate values & secondary indexes

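[Figure: file blocks (20, 10), (20, 40), (10, 40), (10, 40), (30, 40); index entries 10, 20, 30, 40, each followed by a list of pointers to all records with that value]

another option...

Problem: variable-size records in index!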

CS 4432 lecture #7 41

Duplicate values & secondary indexes

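[Figure: file blocks (20, 10), (20, 40), (10, 40), (10, 40), (30, 40); index blocks (10, 20, 30, 40), (50, 60, ...), one entry per distinct value; records with the same key are chained together]

Another idea: chain records with the same key?

Problems:
• Need to add fields to records
• Need to follow the chain to know the records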
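The chaining idea and both of its problems can be seen in a small sketch. It is an in-memory illustration: a dictionary stands in for the dense index, `next_ptr` is the extra field added to every record, and `build_chained_index` and `positions` are hypothetical names.

```python
# File in physical order (values are the secondary key), flattened to positions.
records = [20, 10, 20, 40, 10, 40, 10, 40, 30, 40]

def build_chained_index(keys):
    """Return (head, next_ptr): index entry per key, plus a chain field per record."""
    next_ptr = [None] * len(keys)   # the extra field each record must carry
    head, last = {}, {}
    for pos, k in enumerate(keys):
        if k in last:
            next_ptr[last[k]] = pos  # link previous record with this key
        else:
            head[k] = pos            # index points to the first record only
        last[k] = pos
    return head, next_ptr

def positions(head, next_ptr, key):
    """Follow the chain; each hop would be a record read on disk."""
    out = []
    pos = head.get(key)
    while pos is not None:
        out.append(pos)
        pos = next_ptr[pos]
    return out

head, next_ptr = build_chained_index(records)
print(positions(head, next_ptr, 40))
```

Index entries are fixed size again, but the full set of matching records is only known after walking the chain through the file.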

CS 4432 lecture #7 42

Summary : Conventional Indexes

– Basic Ideas: sparse, dense, multi-level…
– Duplicate Keys
– Deletion/Insertion
– Secondary indexes