
Chap6. Organizing Files for Performance


Page 1: Chap6. Organizing Files for  Performance

File Structures, SNU-OOPSLA Lab.

Chap6. Organizing Files for Performance

Seoul National University, School of Computer Science and Engineering
Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB)

Prof. Hyoung-Joo Kim

File Structures by Folk, Zoellick, and Riccardi

Page 2: Chap6. Organizing Files for  Performance

Chapter Objectives (1)

  • Look at several approaches to data compression
  • Look at storage compaction as a simple way of reusing space in a file
  • Develop a procedure for deleting fixed-length records that allows vacated file space to be reused dynamically
  • Illustrate the use of linked lists and stacks to manage an avail list
  • Consider several approaches to the problem of deleting variable-length records
  • Introduce the concepts associated with the terms internal fragmentation and external fragmentation

Page 3: Chap6. Organizing Files for  Performance

Chapter Objectives (2)

  • Outline some placement strategies associated with the reuse of space in a variable-length record file
  • Provide an introduction to the idea underlying a binary search
  • Undertake an examination of the limitations of binary searching
  • Develop a keysort procedure for sorting larger files; investigate the costs associated with keysort
  • Introduce the concept of a pinned record

Page 4: Chap6. Organizing Files for  Performance

Contents

6.1 Data compression

6.2 Reclaiming space in files

6.3 Finding things quickly: An Introduction to internal sorting and binary searching

6.4 Keysorting

Page 5: Chap6. Organizing Files for  Performance

Data Compression (1)

  • Reasons for data compression
      - less storage
      - faster transmission, which decreases access time
      - faster sequential processing

6.1 Data Compression

Page 6: Chap6. Organizing Files for  Performance

Data Compression (2): Using a different notation

  • Fixed-length fields are good candidates
  • Decrease the number of bits by finding a more compact notation
      ex) the original state field takes 16 bits, but because there are only 50 states it can be encoded in 6 bits (2^6 = 64 >= 50)
  • Cons
      - unreadable by humans
      - cost in encoding time
      - decoding modules increase the complexity of the software
    => used only for particular applications
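
The 6-bit claim for the state field can be checked with a few lines of C++. This is only a sketch: the function names, the 10-bit `other` field, and the idea of packing both into one 16-bit word are assumptions made for illustration, not part of the slides.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of bits that can distinguish `count` different values.
unsigned bitsNeeded(unsigned count) {
    assert(count > 0);
    unsigned bits = 0;
    while ((1u << bits) < count) ++bits;
    return bits;
}

// Hypothetical layout: a 6-bit state index (0..49) shares one 16-bit word
// with a 10-bit "other" field, instead of the state code using all 16 bits.
std::uint16_t packState(unsigned stateIndex, unsigned other) {
    assert(stateIndex < 50 && other < 1024);
    return static_cast<std::uint16_t>((other << 6) | stateIndex);
}

unsigned unpackState(std::uint16_t packed) { return packed & 0x3Fu; }
unsigned unpackOther(std::uint16_t packed) { return packed >> 6; }
```

The cost listed on the slide is visible here: the packed word is unreadable without `unpackState`, and every program that touches the file needs the same encoding and decoding routines.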

Page 7: Chap6. Organizing Files for  Performance

Data Compression (3): Suppressing repeating sequences

  • Run-length encoding algorithm
      - read through the pixels, copying pixel values to the file in sequence, except where the same pixel value occurs more than once in succession
      - when the same value occurs more than once in succession, substitute the following three bytes:
          1. the special run-length code indicator (ex: ff)
          2. the pixel value that is repeated
          3. the number of times that value is repeated
  • ex) 22 23 24 24 24 24 24 24 24 25 26 26 26 26 26 26 25 24
       => 22 23 ff 24 07 25 ff 26 06 25 24
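
A minimal C++ sketch of the run-length scheme described on this slide. Two details are assumptions, since the slide does not cover them: a run is capped at 255 so that its count fits in one byte, and a literal ff pixel is always written as a run so that the decoder cannot mistake it for the indicator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

const std::uint8_t kRunMarker = 0xff;  // run-length code indicator, as on the slide

Bytes runLengthEncode(const Bytes& pixels) {
    Bytes out;
    std::size_t i = 0;
    while (i < pixels.size()) {
        // Measure the run that starts at i (capped at 255, an assumption).
        std::size_t run = 1;
        while (i + run < pixels.size() && pixels[i + run] == pixels[i] && run < 255) ++run;
        if (run > 1 || pixels[i] == kRunMarker) {
            out.push_back(kRunMarker);                      // indicator
            out.push_back(pixels[i]);                       // repeated value
            out.push_back(static_cast<std::uint8_t>(run));  // repeat count
        } else {
            out.push_back(pixels[i]);                       // single value, copied as-is
        }
        i += run;
    }
    return out;
}

Bytes runLengthDecode(const Bytes& coded) {
    Bytes out;
    std::size_t i = 0;
    while (i < coded.size()) {
        if (coded[i] == kRunMarker) {
            assert(i + 2 < coded.size());  // indicator must be followed by value and count
            out.insert(out.end(), static_cast<std::size_t>(coded[i + 2]), coded[i + 1]);
            i += 3;
        } else {
            out.push_back(coded[i]);
            ++i;
        }
    }
    return out;
}
```

Applied to the 18 pixel values of the slide's example, `runLengthEncode` produces the 11 bytes shown there. Because every run costs three bytes, a run of only two equal pixels grows from two bytes to three, which is one way the encoded image can end up larger than the original.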

Page 8: Chap6. Organizing Files for  Performance

Representation of images in a computer

[Figure labels: screen; pixel; for each pixel, an electrical signal (analog); light intensity converted to a number (digital)]

Page 9: Chap6. Organizing Files for  Performance

Representation of images in a computer

[Figure: the screen shown as a grid of numeric pixel values]

    12  8 12 33 99 12
    56  7 13 44 66 23
    12  4 34 57 99 12
    ...

Page 10: Chap6. Organizing Files for  Performance

Representation of color images in a computer

[Figure: the screen shown as several overlapping grids of numeric pixel values, each like the one below]

    12  8 12 33 99 12
    56  7 13 44 66 23
    12  4 34 57 99 12
    ...

  • Video: 25 to 30 still images are shown in succession per second

Page 11: Chap6. Organizing Files for  Performance

Data Compression (3): Suppressing repeating sequences

  • Run-length encoding (cont'd)
      - an example of redundancy reduction
      - cons
          - does not guarantee any particular amount of space savings
          - under some circumstances, the compressed image is larger than the original image
            Why? Can you prevent this?

Page 12: Chap6. Organizing Files for  Performance

Data Compression (4): Assigning variable-length codes

  • Morse code: the oldest and most common variable-length code scheme
  • Some values occur more frequently than others
      - those values should take the least amount of space
  • Huffman coding
      - based on the probability of occurrence
      - determine the probability of each value occurring
      - build a binary tree with a search path for each value
      - more frequently occurring values are given shorter search paths in the tree

Page 13: Chap6. Organizing Files for  Performance

Data Compression (5): Assigning variable-length codes

  • Huffman coding

      Letter:  a    b    c    d     e     f     g
      Prob:    0.4  0.1  0.1  0.1   0.1   0.1   0.1
      Code:    1    010  011  0000  0001  0010  0011

      ex) the string "abde" is encoded as 101000000001

Page 14: Chap6. Organizing Files for  Performance

Huffman Tree

[Figure: binary tree for the code table. The path from the root to a leaf spells out that letter's code.]

    root
      1   -> a (1)
      0
        01  -> b (010), c (011)
        00
          000 -> d (0000), e (0001)
          001 -> f (0010), g (0011)
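
The code table from the Huffman coding slide can be exercised directly. The sketch below only applies that fixed table; it does not build the tree from the probabilities, and the function names are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

// Code table copied from the slide.
const std::map<char, std::string> kCodes = {
    {'a', "1"},    {'b', "010"},  {'c', "011"}, {'d', "0000"},
    {'e', "0001"}, {'f', "0010"}, {'g', "0011"}};

std::string encode(const std::string& text) {
    std::string bits;
    for (char ch : text) bits += kCodes.at(ch);  // at() throws for letters outside a..g
    return bits;
}

std::string decode(const std::string& bits) {
    std::string text, pending;
    for (char bit : bits) {
        pending += bit;
        for (const auto& entry : kCodes) {
            // No code is a prefix of another, so the first exact match is the letter.
            if (entry.second == pending) {
                text += entry.first;
                pending.clear();
                break;
            }
        }
    }
    assert(pending.empty());  // leftover bits would mean truncated or invalid input
    return text;
}
```

`encode("abde")` yields the 12-bit string shown on the slide. With the listed probabilities the average code length is 0.4 x 1 + 0.2 x 3 + 0.4 x 4 = 2.6 bits per letter, compared with 3 bits for a fixed-length code over seven letters.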

Page 15: Chap6. Organizing Files for  Performance

Data Compression (6): Irreversible compression techniques

  • Some information can be sacrificed
  • Less common in data files
  • Shrinking a raster image
      - 400-by-400 pixels to 100-by-100 pixels
      - 1 pixel for every 16 pixels
  • Speech compression
      - voice coding (the lost information is of little or no value)

Page 16: Chap6. Organizing Files for  Performance

Compression in UNIX

  • System V
      - pack and unpack use Huffman codes
      - after compressing a file, pack appends ".z" to the file name
  • Berkeley UNIX
      - compress and uncompress use the Lempel-Ziv method
      - after compressing a file, compress appends ".Z" to the file name

Page 17: Chap6. Organizing Files for  Performance

Record Deletion and Storage Compaction

  • Storage compaction
      - record deletion: just mark each deleted record
      - reclamation: the space of all deleted records is reclaimed later, in one pass
    => pros: delete and undelete operations take little effort

  ex)
    Original file:
      Ames|123|OK|…...|Morrison|9035|OK|Brown|625|IA|…...|
    After deleting the second record:
      Ames|123|OK|…|*|rrison|9035|OK|Brown|625|IA|…|
    After compaction:
      Ames|123|OK|…|Brown|625|IA|…|

6.2 Reclaiming Space in Files
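
Mark-then-compact can be sketched in C++ as follows. The file is modeled as an in-memory vector of record strings, and only the first byte is overwritten with the `*` mark; both are simplifying assumptions (the slide's example overwrites the first two bytes with `*|`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// A record whose first byte is '*' is treated as deleted.
bool isDeleted(const std::string& record) {
    return !record.empty() && record[0] == '*';
}

// Deletion only marks the record; no other record moves.
void markDeleted(std::string& record) {
    assert(!record.empty());
    record[0] = '*';
}

// Compaction copies every record that is not marked into a new file.
std::vector<std::string> compact(const std::vector<std::string>& file) {
    std::vector<std::string> packed;
    for (const auto& record : file)
        if (!isDeleted(record)) packed.push_back(record);
    return packed;
}
```

Until `compact` runs, a marked record still occupies its space, which is why undeleting is cheap and why the space is not reusable in the meantime.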

Page 18: Chap6. Organizing Files for  Performance

Deleting Fixed-length Records for Reclaiming Space Dynamically (1)

  • Reuse the space from deleted records as soon as possible
      - deleted records must be marked in a special way
      - we must be able to find the deleted space
  • To reuse records quickly, we need
      - a way to know immediately whether there are empty slots in the file
      - a way to jump directly to one of those slots if they exist
    => linked lists or stacks for the avail list
    * avail list: a list made up of the deleted records

Page 19: Chap6. Organizing Files for  Performance

Deleting Fixed-length Records for Reclaiming Space Dynamically (2)

[Figure: The Linked List. Head pointer -> ptr -> ptr -> ptr -> ptr -> -1]

[Figure: The Stack.
  (a) Head pointer = 5: RRN 5 (holds 2) -> RRN 2 (holds -1)
  (b) after pushing the record with RRN 3, head pointer = 3: RRN 3 (holds 5) -> RRN 5 (holds 2) -> RRN 2 (holds -1)]

Page 20: Chap6. Organizing Files for  Performance

Deleting Fixed-length Records for Reclaiming Space Dynamically (3)

  • Linking and stacking deleted records
      - links are arranged and rearranged so that each available record slot points to the next one
      - the second field of a deleted record points to the next available record

Page 21: Chap6. Organizing Files for  Performance

Sample file showing linked list of deleted records

(a) After deleting records 3 and 5, in that order. List head (first available record): 5

    RRN:     0           1           2          3     4            5     6
    Record:  Edwards...  Bates...    Wills...   *-1   Masters...   *3    Chavez...

(b) After deleting record 1. List head (first available record): 1

    RRN:     0           1           2          3     4            5     6
    Record:  Edwards...  *5          Wills...   *-1   Masters...   *3    Chavez...

(c) After inserting three new records. List head (first available record): -1

    RRN:     0           1            2         3            4           5            6
    Record:  Edwards...  1st new rec  Wills...  3rd new rec  Masters...  2nd new rec  Chavez...
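
The sample file above can be reproduced with a small in-memory model. This is a sketch under assumptions: the `FixedFile` struct, its member names, and storing the link as text after the `*` mark are illustrative choices, and a real file would keep the list head in a header record.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FixedFile {
    std::vector<std::string> slots;  // slot i holds the record with RRN i
    int availHead = -1;              // RRN of the first available slot, -1 if none

    // Delete: write '*' plus the old head into the slot, then make the slot the new head (push).
    void remove(int rrn) {
        assert(rrn >= 0 && rrn < static_cast<int>(slots.size()));
        slots[rrn] = "*" + std::to_string(availHead);
        availHead = rrn;
    }

    // Insert: reuse the slot at the head of the avail list (pop), or append if the list is empty.
    int insert(const std::string& record) {
        if (availHead == -1) {
            slots.push_back(record);
            return static_cast<int>(slots.size()) - 1;
        }
        int rrn = availHead;
        availHead = std::stoi(slots[rrn].substr(1));  // the link is stored after the '*'
        slots[rrn] = record;
        return rrn;
    }
};
```

Deleting RRN 3, then 5, then 1 leaves the head at 1 with links 1 -> 5 -> 3 -> -1, and three inserts then reuse slots 1, 5, and 3 in that order, matching panels (a) to (c). Both operations touch only the head and one slot, so no search of the file is needed.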

Page 22: Chap6. Organizing Files for  Performance

Deleting Variable-length Records

  • Avail list of variable-length records
      - each record has a byte count at its beginning
      - links use byte offsets instead of RRNs
  • Adding and removing records
      - when adding a record, search through the avail list for a slot of the right size (=> big enough)

Page 23: Chap6. Organizing Files for  Performance

Removal of a record from an avail list with variable-length records

(a) Before removal:  head -> Size 47 -> Size 38 -> Size 72 -> Size 68 -> -1

(b) After removal:   head -> Size 47 -> Size 38 -> Size 68 -> -1
                     (new link from Size 38 to Size 68; the Size 72 record is the removed record)
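
Searching the avail list for a big-enough slot and unlinking it can be sketched as below. A `std::list` stands in for the on-disk chain of byte offsets, and the `Slot` struct and the offset values used with it are hypothetical.

```cpp
#include <cassert>
#include <list>

struct Slot {
    long offset;  // byte offset of the deleted record in the file (hypothetical values)
    int size;     // byte count stored at the start of the record
};

// Returns the byte offset of the first slot that is big enough and unlinks it
// from the avail list; returns -1 if nothing fits (the caller would then append
// the new record at the end of the file).
long takeFirstFit(std::list<Slot>& avail, int needed) {
    assert(needed > 0);
    for (auto it = avail.begin(); it != avail.end(); ++it) {
        if (it->size >= needed) {
            long offset = it->offset;
            avail.erase(it);  // the predecessor now links to the successor
            return offset;
        }
    }
    return -1;
}
```

With slots of size 47, 38, 72, and 68, a request for 55 bytes skips the first two, takes the 72-byte slot, and leaves 47 -> 38 -> 68 on the list, as in panel (b). Unlike the fixed-length case, the search may have to walk the whole list.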

Page 24: Chap6. Organizing Files for  Performance

Storage Fragmentation

  • Internal fragmentation (in fixed-length records)
      - wasted space within a record
      - variable-length records minimize wasted space by doing away with internal fragmentation
  • External fragmentation (in variable-length records)
      - unused space outside or between individual records
      - three possible solutions
          - storage compaction
          - coalescing the holes: combine adjacent free slots into a single, larger record slot
          - minimizing fragmentation by adopting a placement strategy

Page 25: Chap6. Organizing Files for  Performance

Internal Fragmentation in Fixed-length Records

    Ames | John | 123 Maple | Stillwater | OK | 74075 |...................................
    Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | 74820 |
    Brown | Martha | 625 Kimbark | Des Moines | IA | 50311 | .........................

64-byte fixed-length records. The unused space at the end of a record (shown as dots) is internal fragmentation.

Page 26: Chap6. Organizing Files for  Performance

External Fragmentation in Variable-length Records

Each record begins with its record length:

    Record[1]:  40 Ames | John | 123 Maple | Stillwater | OK | 74075 |
    Record[2]:  64 Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | 74820 |
    Record[3]:  45 Brown | Martha | 625 Kimbark | Des Moines | IA | 50311 |

ex) Delete Record[2] and insert a new Record[i] in its place: 12 bytes of unused space remain (64 - 52)

    Record[i]:  52 Adams | Kits | 3301 Washington D.C | Forest Village | IA | 43563 |

The 12 unused bytes left over after Record[i] are external fragmentation.

Page 27: Chap6. Organizing Files for  Performance

Placement Strategies

  • First-fit
      - select the first available record slot that is big enough
      - suitable when lost space is due to internal fragmentation
  • Best-fit
      - select the available record slot closest in size
      - keep the avail list in ascending order of size
      - suitable when lost space is due to internal fragmentation
  • Worst-fit
      - select the largest record slot
      - keep the avail list in descending order of size
      - suitable when lost space is due to external fragmentation
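
The three strategies differ only in which big-enough slot they pick. The sketch below scans an unsorted vector of slot sizes for clarity; that representation, the function names, and the `kNone` sentinel are assumptions. With the avail list kept sorted as the slide describes, best-fit and worst-fit reduce to taking the first suitable entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kNone = static_cast<std::size_t>(-1);  // "no slot fits"

// First slot that is big enough, in list order.
std::size_t firstFit(const std::vector<int>& sizes, int needed) {
    for (std::size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= needed) return i;
    return kNone;
}

// Smallest slot that is still big enough.
std::size_t bestFit(const std::vector<int>& sizes, int needed) {
    std::size_t best = kNone;
    for (std::size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= needed && (best == kNone || sizes[i] < sizes[best])) best = i;
    return best;
}

// Largest slot, provided it is big enough.
std::size_t worstFit(const std::vector<int>& sizes, int needed) {
    std::size_t worst = kNone;
    for (std::size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= needed && (worst == kNone || sizes[i] > sizes[worst])) worst = i;
    return worst;
}

// Bytes left unused in the chosen slot.
int leftover(const std::vector<int>& sizes, std::size_t index, int needed) {
    assert(index < sizes.size() && sizes[index] >= needed);
    return sizes[index] - needed;
}
```

For slot sizes 47, 38, 72, 68 and a 50-byte record, first-fit and worst-fit both choose the 72-byte slot (22 bytes left over), while best-fit chooses the 68-byte slot (18 bytes left over). Worst-fit deliberately leaves the largest remainder so that it is more likely to be reusable.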

Page 28: Chap6. Organizing Files for  Performance

Finding Things Quickly (1)

  • Goal: minimize the number of disk accesses
  • Finding things in simple field and record files may require many seeks
  • Binary search algorithm for fixed-size records

    int BinarySearch(FixedRecordFile &file, RecType &obj, KeyType &key)
    // binary search for key; returns 1 if found (record is left in obj), 0 otherwise
    {
        int low = 0;
        int high = file.NumRecs() - 1;
        while (low <= high)
        {
            int guess = low + (high - low) / 2;    // middle of the remaining range
            file.ReadByRRN(obj, guess);
            if (obj.Key() == key) return 1;        // record found
            if (obj.Key() < key) low = guess + 1;  // search after guess
            else high = guess - 1;                 // search before guess
        }
        return 0; // loop ended without finding key
    }

6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching

Page 29: Chap6. Organizing Files for  Performance

Classes and Methods for Binary Search

    class KeyType {
    public:
        int operator == (KeyType &);  // equality operator
        int operator < (KeyType &);   // less-than operator
    };

    class RecType {
    public:
        KeyType Key();
    };

    class FixedRecordFile {
    public:
        int NumRecs();
        int ReadByRRN(RecType &record, int RRN);
    };

Page 30: Chap6. Organizing Files for  Performance

Finding Things Quickly (2)

  • Binary search vs. sequential search
      - binary search
          - O(log n)
          - the list must be sorted by key
      - sequential search
          - O(n)
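
The difference between O(log n) and O(n) can be made concrete by counting how many records each search examines. In this sketch a sorted `std::vector<int>` stands in for the file of fixed-length records and the `reads` counter stands in for disk accesses; both are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Both functions return the index (RRN) of key, or -1 if it is absent,
// and report in `reads` how many records were examined.
int sequentialSearch(const std::vector<int>& keys, int key, int& reads) {
    reads = 0;
    for (int i = 0; i < static_cast<int>(keys.size()); ++i) {
        ++reads;
        if (keys[i] == key) return i;
    }
    return -1;
}

int binarySearch(const std::vector<int>& sortedKeys, int key, int& reads) {
    reads = 0;
    int low = 0;
    int high = static_cast<int>(sortedKeys.size()) - 1;
    while (low <= high) {
        int guess = low + (high - low) / 2;  // middle of the remaining range
        ++reads;
        if (sortedKeys[guess] == key) return guess;
        if (sortedKeys[guess] < key) low = guess + 1;  // search after guess
        else high = guess - 1;                         // search before guess
    }
    return -1;
}
```

For 1,000 sorted keys, finding the last key takes 1,000 reads sequentially but at most 10 with binary search, since each probe halves the remaining range.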

Page 31: Chap6. Organizing Files for  Performance

Finding Things Quickly (3)

  • Sorting a disk file in RAM
      - read the entire file from disk into memory
      - use an internal sort (= sort in memory)
      - the UNIX sort utility uses an internal sort
  • Limitations of binary search and internal sort
      - binary search requires more than one or two accesses
          c.f.) a single access by RRN
      - keeping a file sorted is very expensive
      - an internal sort works only on small files

Page 32: Chap6. Organizing Files for  Performance

Internal Sort

[Figure: the unsorted file on disk is read in its entirety into memory, where it is sorted in memory to produce the sorted file.]

Page 33: Chap6. Organizing Files for  Performance

Key Sorting & Its Limitations

  • Also called "tag sort": only the keys are sorted
  • Sorting procedure
      1. Read only the keys into memory
      2. Sort the keys
      3. Rearrange the records in the file according to the sorted keys
  • Advantage
      - needs less RAM than an internal sort
  • Disadvantages (= limitations)
      - the records on disk must be read twice
      - constructing the new (sorted) file requires a lot of seeking for records

6.4 Keysorting

Page 34: Chap6. Organizing Files for  Performance

[Figure: Conceptual view before sorting]

    KEYNODES array (in RAM)        Records (on secondary storage)
    KEY         RRN
    HARRISON    1                  Harrison|Susan|387 Eastern....
    KELLOG      2                  Kellog|Bill|17 Maple....
    HARRIS      3                  Harris|Margaret|4343 West....
    ...         ...                ...
    BELL        k                  Bell|Robert|8912 Hill....

[Figure: Conceptual view after sorting keys in RAM]

    KEYNODES array (in RAM)        Records (on secondary storage)
    KEY         RRN
    BELL        k                  Harrison|Susan|387 Eastern....
    HARRIS      3                  Kellog|Bill|17 Maple....
    HARRISON    1                  Harris|Margaret|4343 West....
    KELLOG      2                  ...
    ...         ...                Bell|Robert|8912 Hill....

Page 35: Chap6. Organizing Files for  Performance

Pseudocode for keysort (1)

    Program: keysort

        open input file as IN_FILE
        create output file as OUT_FILE

        read header record from IN_FILE and write a copy to OUT_FILE
        REC_COUNT := record count from header record

        /* read in records; set up KEYNODES array */
        for i := 1 to REC_COUNT
            read record from IN_FILE into BUFFER
            extract canonical key and place it in KEYNODES[i].KEY
            KEYNODES[i].RRN := i

    (continued....)

Page 36: Chap6. Organizing Files for  Performance

Pseudocode for keysort (2)

        /* sort KEYNODES[].KEY, thereby ordering RRNs correspondingly */
        sort(KEYNODES, REC_COUNT)

        /* read in records according to sorted order, and write them out in this order */
        for i := 1 to REC_COUNT
            seek in IN_FILE to record with RRN of KEYNODES[i].RRN
            read record from IN_FILE into BUFFER
            write BUFFER contents to OUT_FILE

        close IN_FILE and OUT_FILE

    end PROGRAM
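
The keysort pseudocode translates into C++ roughly as follows. This is a sketch with several assumptions: the input and output files are modeled as in-memory vectors of records, RRNs start at 0 rather than 1, there is no header record, and the canonical key is taken to be the first `|`-delimited field in upper case.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct KeyNode {
    std::string key;  // canonical key
    std::size_t rrn;  // position of the record in the input file
};

// Hypothetical key extraction: first '|'-delimited field, upper-cased.
std::string canonicalKey(const std::string& record) {
    std::string key = record.substr(0, record.find('|'));
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return key;
}

// First pass: read every record, keep only (key, RRN), then sort by key.
std::vector<KeyNode> buildSortedKeyNodes(const std::vector<std::string>& inFile) {
    std::vector<KeyNode> nodes;
    for (std::size_t rrn = 0; rrn < inFile.size(); ++rrn)
        nodes.push_back({canonicalKey(inFile[rrn]), rrn});
    std::sort(nodes.begin(), nodes.end(),
              [](const KeyNode& a, const KeyNode& b) { return a.key < b.key; });
    return nodes;
}

// Second pass: fetch the records in key order and write them out.
std::vector<std::string> keysort(const std::vector<std::string>& inFile) {
    std::vector<std::string> outFile;
    for (const KeyNode& node : buildSortedKeyNodes(inFile)) {
        assert(node.rrn < inFile.size());
        outFile.push_back(inFile[node.rrn]);  // stands in for seek + read + write
    }
    return outFile;
}
```

The indexed access `inFile[node.rrn]` in the second pass is where the real cost lies: on disk each of those lookups is a seek to an arbitrary RRN, and every record is read a second time.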

Page 37: Chap6. Organizing Files for  Performance

Two Solutions: why bother to write the file back?

  • Write out the sorted KEYNODES[] array without writing the records back in sorted order
  • Use the KEYNODES[] array as an index file

Page 38: Chap6. Organizing Files for  Performance

Relationship between the index file and the data file

    Index file                     Original file
    KEY         RRN                Records
    BELL        k                  Harrison|Susan|387 Eastern....
    HARRIS      3                  Kellog|Bill|17 Maple....
    HARRISON    1                  Harris|Margaret|4343 West....
    KELLOG      2                  ...
    ...         ...                Bell|Robert|8912 Hill....

Each index entry's RRN points to the position of its record in the original file, which stays in its original order.

Page 39: Chap6. Organizing Files for  Performance

Pinned records (1)

  • Pinned records: records whose physical location is referenced by other records
  • The physical location of such records cannot be changed freely, in order to avoid dangling references
  • Pinned records make sorting more difficult and sometimes impossible
      - solution: use an index file, while keeping the actual data file in its original order

Page 40: Chap6. Organizing Files for  Performance

Pinned records (2)

[Figure: File with pinned records. Record(i) and Record(i+1) each point to a pinned record; deleting a pinned record leaves a dangling pointer.]

Page 41: Chap6. Organizing Files for  Performance

Let's Review !!!

6.1 Data compression

6.2 Reclaiming space in files

6.3 Finding things quickly: An Introduction to internal sorting and binary searching

6.4 Keysorting