File Structure SNU-OOPSLA Lab. 1
Chap 7. Indexing
Seoul National University, Department of Computer Engineering
Object-Oriented Systems Laboratory
SNU-OOPSLA-LAB
Prof. Hyoung-Joo Kim
File Structures by Folk, Zoellick, and Riccardi
Chapter Objectives (1)
Introduce concepts of indexing that have broad applications in the design of file systems
Introduce the use of a simple linear index to provide rapid access to records in an entry-sequenced, variable-length record file
Investigate the implementation of the use of indexes for file maintenance
Introduce the template features of C++ for object I/O
Describe the object-oriented approach to indexed sequential files
Chapter Objectives (2)
Describe the use of indexes to provide access to records by more than one key
Introduce the idea of an inverted list, illustrating Boolean operations on lists
Discuss when to bind an index key to an address in the data file
Introduce and investigate the implications of self-indexing files
Contents (1)
7.1 What is an Index?
7.2 A Simple Index for Entry-Sequenced Files
7.3 Using Template Classes in C++ for Object I/O
7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects
7.5 Indexes That Are Too Large to Hold in Memory
Contents (2)
7.6 Indexing to Provide Access by Multiple Keys
7.7 Retrieval Using Combinations of Secondary Keys
7.8 Improving the Secondary Index Structure:
Inverted Lists
7.9 Selective Indexes
7.10 Binding
Overview: Index (1)
Index: a data structure that associates given key values with the corresponding record numbers
It is usually physically separate from the file (unlike indexed sequential files, which use tight binding)
Linear indexes (like the indexes found at the back of books)
  Index records are ordered by key value, as in an ordered relative file
  The best algorithm for finding a record with a specific key value is binary search
  Addition requires reorganization
7.1 What Is an Index?
Overview: Index (2)
[Figure: an index file (keys k1, k2, k4, k5, k7, k9) with references into a data file (records AAA, ZZZ, CCC, XXX, EEE, FFF)]
Overview: Index (3)
Tree indexes (like those of indexed sequential files)
  Hierarchical: each level, beginning with the root level, points to the next level
  Leaves point only to the data file
Kinds: indexed sequential file, binary tree index, AVL tree index, B+ tree index
Roles of an Index
Index: keys and reference fields
Fast Random Accesses
Uniform Access Speed
Allow users to impose order on a file without
actually rearranging the file
Provide multiple access paths to a file
Give user keyed access to variable-length
record files
A Simple Index (1)
Data file
  entry-sequenced, variable-length records
  primary key: unique for each entry in the file
Searching a file by key (a common need)
  binary search cannot be used on a variable-length record file (the middle record cannot be located)
  construct an index object for the file
  index object: key field + byte-offset field
7.2 A Simple Index for E-S Files
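The idea of a key field paired with a byte-offset field can be sketched as follows. This is a minimal illustration, not the textbook's code: the data file is replaced by an in-memory string of newline-terminated records, and the names `IndexEntry`, `MakeKey`, `BuildIndex`, and `Search` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One index entry: key plus the byte offset of the record in the data file.
using IndexEntry = std::pair<std::string, std::size_t>;

// Hypothetical helper: derive the key from a record such as
// "LON|2312|Romeo and Juliet|Prokofiev" by joining the first two fields.
std::string MakeKey(const std::string& record) {
    std::size_t first = record.find('|');
    assert(first != std::string::npos);   // records are assumed well formed
    std::size_t second = record.find('|', first + 1);
    assert(second != std::string::npos);
    return record.substr(0, first) + record.substr(first + 1, second - first - 1);
}

// Scan newline-terminated, variable-length records held in `data`
// (a stand-in for the data file) and return an index sorted by key.
std::vector<IndexEntry> BuildIndex(const std::string& data) {
    std::vector<IndexEntry> index;
    std::size_t offset = 0;
    while (offset < data.size()) {
        std::size_t end = data.find('\n', offset);
        if (end == std::string::npos) end = data.size();
        index.emplace_back(MakeKey(data.substr(offset, end - offset)), offset);
        offset = end + 1;
    }
    std::sort(index.begin(), index.end());
    return index;
}

// Binary search on the sorted index; returns the byte offset, or -1 if absent.
long Search(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.first < k; });
    if (it == index.end() || it->first != key) return -1;
    return static_cast<long>(it->second);
}
```

The data records stay in entry order; only the small fixed-size index entries are sorted, which is what makes binary search possible.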
A Simple Index (2)
Index file (sorted by key)
Key         Reference field
ANG3795     167
COL31809    353
COL38358    211
DG139201    396
DG18807     256
FF245       442
LON2312     32
MER75016    300
RCA2626     77
WAR23699    132
Data file (entry-sequenced)
Address of record   Actual data record
32                  LON|2312|Romeo and Juliet|Prokofiev . . .
77                  RCA|2626|Quartet in C Sharp Minor . . .
132                 WAR|23699|Touchstone|Corea . . .
167                 ANG|3795|Symphony No. 9|Beethoven . . .
211                 COL|38358|Nebraska|Springsteen . . .
256                 DG|18807|Symphony No. 9|Beethoven . . .
300                 MER|75016|Coq d'or Suite|Rimsky . . .
353                 COL|31809|Symphony No. 9|Dvorak . . .
396                 DG|139201|Violin Concerto|Beethoven . . .
442                 FF|245|Good News|Sweet Honey In The . . .
A Simple Index (3)
Index file: fixed-size records, sorted by key
Data file: not sorted, because it is entry-sequenced
  Record addition is quick (faster than with a sorted file)
The index can be kept in memory
  records are found more quickly through the index than in a sorted file
Class TextIndex encapsulates the index data and the index operations
Index record layout: Key | Reference field
Let's See Figure 7.4
class TextIndex
{
public:
    TextIndex(int maxKeys = 100, int unique = 1);
    int Insert(const char* key, int recAddr); // add to index
    int Remove(const char* key);              // remove key from index
    int Search(const char* key) const;        // search for key, return recAddr
    void Print(ostream&) const;
protected:
    int MaxKeys;    // maximum number of entries
    int NumKeys;    // actual number of entries
    char** Keys;    // array of key values
    int* RecAddrs;  // array of record references
    int Find(const char* key) const;
    int Init(int maxKeys, int unique);
    int Unique;     // if true, each key must be unique
};
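How Insert, Remove, and Search might keep the two parallel arrays sorted can be sketched as below. This is an assumption-laden stand-in, not the Textind.cpp implementation: `SketchIndex` is a hypothetical name, and `std::vector` and `std::string` replace the raw arrays of the class above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for TextIndex, using std::vector instead of raw arrays.
class SketchIndex {
public:
    explicit SketchIndex(bool unique = true) : unique_(unique) {}

    // Insert key in sorted position; returns false if a unique key already exists.
    bool Insert(const std::string& key, int recAddr) {
        std::size_t pos = LowerBound(key);
        if (unique_ && pos < keys_.size() && keys_[pos] == key) return false;
        keys_.insert(keys_.begin() + pos, key);  // shifts the later entries
        recAddrs_.insert(recAddrs_.begin() + pos, recAddr);
        return true;
    }

    // Remove the first entry with this key; returns false if it is not present.
    bool Remove(const std::string& key) {
        std::size_t pos = LowerBound(key);
        if (pos == keys_.size() || keys_[pos] != key) return false;
        keys_.erase(keys_.begin() + pos);
        recAddrs_.erase(recAddrs_.begin() + pos);
        return true;
    }

    // Return the record address for key, or -1 if the key is absent.
    int Search(const std::string& key) const {
        std::size_t pos = LowerBound(key);
        if (pos == keys_.size() || keys_[pos] != key) return -1;
        return recAddrs_[pos];
    }

private:
    // Binary search for the first position whose key is not less than `key`.
    std::size_t LowerBound(const std::string& key) const {
        return std::lower_bound(keys_.begin(), keys_.end(), key) - keys_.begin();
    }
    std::vector<std::string> keys_;  // key values, kept sorted
    std::vector<int> recAddrs_;      // record references, parallel to keys_
    bool unique_;
};
```

Every insertion shifts the entries behind the new key, which is the reorganization cost of a linear index.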
Index Implementation
Pages 638, 639, 640: G.1 Recording.h, G.2 Recording.cpp, G.3 Makere.cpp
Pages 641, 642: G.4 Textind.h, G.5 Textind.cpp
RetrieveRecording with the Index
RetrieveRecording(KEY...) procedure: retrieves a single record by key from the data file, combining the index search, file read, and buffer unpack operations into a single function
int RetrieveRecording(Recording& recording, char* key,
    TextIndex& RecordingIndex, BufferFile& RecordingFile)
// read and unpack the recording, return TRUE if it succeeds
{
    int result;
    result = RecordingFile.Read(RecordingIndex.Search(key));
    if (result == -1) return FALSE;
    result = recording.Unpack(RecordingFile.GetBuffer());
    return result;
}
Template Class for I/O Object (1)
Template class RecordFile
  we want to make the following code possible
    Person p; RecordFile pFile; pFile.Read(p);
    Recording r; RecordFile rFile; rFile.Read(r);
  it is difficult to support files for different record types without having to modify the class
Template class derived from BufferFile: the actual declarations and calls
    RecordFile<Person> pFile; pFile.Read(p);
    RecordFile<Recording> rFile; rFile.Read(r);
7.3 Using Template Classes in C++ for Object I/O
Template Class for I/O Object (2)
Template class RecordFile
template <class RecType>
class RecordFile : public BufferFile
{
public:
    int Read(RecType& record, int recaddr = -1);
    int Write(const RecType& record, int recaddr = -1);
    int Append(const RecType& record);
    RecordFile(IOBuffer& buffer) : BufferFile(buffer) {}
};
// The template parameter RecType must have the following methods:
//   int Pack(IOBuffer&);   pack record into buffer
//   int Unpack(IOBuffer&); unpack record from buffer
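The way a template file class can lean on Pack and Unpack is sketched below. This is a simplified, self-contained illustration rather than the book's RecordFile: `SketchBuffer`, `SketchRecordFile`, and `Person` are hypothetical, the "file" is an in-memory vector, and record addresses are positions in that vector.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for IOBuffer: fields joined with '|' delimiters.
class SketchBuffer {
public:
    void PackField(const std::string& field) { text_ += field + '|'; }
    // Split the buffer back into its fields.
    std::vector<std::string> Fields() const {
        std::vector<std::string> out;
        std::istringstream in(text_);
        std::string field;
        while (std::getline(in, field, '|')) out.push_back(field);
        return out;
    }
    const std::string& Text() const { return text_; }
    void SetText(const std::string& text) { text_ = text; }
private:
    std::string text_;
};

// RecType must supply Pack(SketchBuffer&) const and Unpack(const SketchBuffer&).
template <class RecType>
class SketchRecordFile {
public:
    // Pack the record and append it; returns its record address.
    int Append(const RecType& record) {
        SketchBuffer buffer;
        record.Pack(buffer);
        records_.push_back(buffer.Text());
        return static_cast<int>(records_.size()) - 1;
    }
    // Read the record at recaddr and unpack it; returns false on a bad address.
    bool Read(RecType& record, int recaddr) const {
        if (recaddr < 0 || recaddr >= static_cast<int>(records_.size())) return false;
        SketchBuffer buffer;
        buffer.SetText(records_[recaddr]);
        return record.Unpack(buffer);
    }
private:
    std::vector<std::string> records_;  // in-memory stand-in for the file
};

// Example record type that satisfies the Pack/Unpack requirement.
struct Person {
    std::string lastName, firstName;
    void Pack(SketchBuffer& b) const { b.PackField(lastName); b.PackField(firstName); }
    bool Unpack(const SketchBuffer& b) {
        std::vector<std::string> f = b.Fields();
        if (f.size() != 2) return false;
        lastName = f[0];
        firstName = f[1];
        return true;
    }
};
```

The file class never inspects the record's fields; any type with the two methods can be stored without modifying the class.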
Template Class for I/O Object (3)
Adding I/O to an existing class with RecordFile
  add methods Pack and Unpack to class Recording
  create a buffer object to use in the I/O
    DelimFieldBuffer Buffer;
  declare an object of type RecordFile<Recording>
    RecordFile<Recording> rFile(Buffer);
Declaration and calls
    Recording r1, r2;
    rFile.Open("myfile");
    rFile.Read(r1);
    rFile.Write(r2);
  directly open a file and read and write objects of class Recording
Object-Oriented Approach to I/O
Class IndexedFile
  adds indexed access to the sequential access provided by class RecordFile
  extends RecordFile with Update, Append, and Read methods
    Update and Append: maintain a primary key index of the data file
    Read: supports access to an object by key
TextIndex, RecordFile ==> IndexedFile
Issues of IndexedFile
  how to make a persistent index of a file
  how to guarantee that the index is an accurate reflection of the contents of the data file
7.4 OO Support for Indexed, E-S Files of Data Objects
Basic Operations of IndexedFile (1)
Create the original empty index and data files
Load the index file into memory
Rewrite the index file from memory
Add records to the data file and index
Delete records from the data file
Update records in the data file
Update the index to reflect changes in the data file
Retrieve records
Basic Operations of TextIndexedFile (1)
Creating the files
  the index file and the data file are created as empty files with header records
  implementation (makeind.cpp in Appendix G): Create method in class BufferFile
Loading the index into memory
  loading and storing objects are supported in the IOBuffer classes
  need to choose a particular buffer class to use for an index file (tindbuff.cpp in Appendix G)
  define class TextIndexBuffer as a derived class of FixedFieldBuffer to support reading and writing of index objects
File Structure SNU-OOPSLA Lab. 22
Rewriting the index file from memory part of the Close operation on an IndexedFile write back index object to the index file should protect the index when failure write changes when out-of-date(use status flag) Implementation
Rewind and Write operations of class BufferFile
Record Addition
7.4 OO Support for Indexed, E-S Files of Data Objects
Basic Operations of TextIndexedFile(2)Basic Operations of TextIndexedFile(2)
Add an entry to the index
Requires rearrangementif in memory, no file access using TextIndex.Insert
Add a new record to data file
using RecordFile<Recording>::Write
+
Basic Operations of TextIndexedFile (3)
Record deletion
  data file: the records need not be moved
  index: either really delete the entry or just mark it
    using TextIndex::Remove
Record updating (2 categories)
  the update changes the value of the key field
    delete/add approach
    reorder both the index and the data file
  the update does not affect the key field
    no rearrangement of the index file
    may need to reconstruct the data file
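The addition, deletion, and two update cases can be sketched together. This is a hedged illustration only: `Rec` and `SketchIndexedFile` are hypothetical names, `std::map` is swapped in for the sorted linear index, and the in-place rewrite ignores the case where a longer variable-length record no longer fits in its old slot.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record with a key field and one non-key field.
struct Rec {
    std::string key;
    std::string title;
};

struct SketchIndexedFile {
    std::vector<Rec> data;             // entry-sequenced data file stand-in
    std::map<std::string, int> index;  // key -> position in `data`

    // Addition: append to the data file, then add the index entry.
    bool Append(const Rec& r) {
        if (index.count(r.key)) return false;  // primary keys are unique
        data.push_back(r);
        index[r.key] = static_cast<int>(data.size()) - 1;
        return true;
    }

    // Deletion: drop the index entry; the data record is not moved.
    bool Delete(const std::string& key) { return index.erase(key) == 1; }

    // Update: rewrite in place if the key is unchanged, else delete then add.
    bool Update(const std::string& oldKey, const Rec& r) {
        std::map<std::string, int>::iterator it = index.find(oldKey);
        if (it == index.end()) return false;
        if (r.key == oldKey) {          // key untouched: index stays as it is
            data[it->second] = r;
            return true;
        }
        if (index.count(r.key)) return false;
        Delete(oldKey);                 // key changed: delete/add approach
        return Append(r);
    }
};
```

After a key-changing update the old data record is still physically present but unreachable through the index, which is why deleted space eventually needs to be reclaimed.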
Class TextIndexedFile (1)
Member methods
  Create, Open, Close, Read (sequential and indexed), Append, and Update operations
Protected members
  ensure the correlation between the index in memory (Index), the index file (IndexFile), and the data file (DataFile)
char* key()
  the template parameter RecType must have the key method, used to extract the key value from the record
Class TextIndexedFile (2)
template <class RecType>
class TextIndexedFile
{
public:
    int Read(RecType& record);             // read next record
    int Read(char* key, RecType& record);  // read by key
    int Append(const RecType& record);
    int Update(char* oldKey, const RecType& record);
    int Create(char* name, int mode = ios::in | ios::out);
    int Open(char* name, int mode = ios::in | ios::out);
    int Close();
    TextIndexedFile(IOBuffer& buffer, int keySize, int maxKeys = 100);
    ~TextIndexedFile(); // close and delete
protected:
    TextIndex Index;
    BufferFile IndexFile;
    TextIndexBuffer IndexBuffer;
    RecordFile<RecType> DataFile;
    char* FileName; // base file name for file
    int SetFileName(char* fName, char*& dFileName, char*& IdxFName);
};
Enhancements to TextIndexedFile (1)
Support other types of keys
  Restriction: the key type is restricted to string (char*)
  Relaxation: support a template class SimpleIndex with a parameter for the key type
Support data object class hierarchies
  Restriction: every object must be of the same type in RecordFile
  Relaxation: the type hierarchy supports virtual pack methods
Enhancements to TextIndexedFile (2)
Support multirecord index files
  Restriction: the entire index must fit in a single record
  Relaxation: add protected methods Insert, Delete, and Search to manipulate the arrays of index objects
Active optimization of operations
  Obvious: the most obvious optimization is to use binary search in the Find method
  Active: add a flag to the index object to avoid writing the index record back to the index file when it has not been changed
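The "changed" flag optimization might look like the sketch below. `FlaggedIndex` is a hypothetical class, `std::map` stands in for the index arrays, and a write counter stands in for the actual Rewind and Write on the index file.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical index wrapper that counts how often it is written back.
class FlaggedIndex {
public:
    void Insert(const std::string& key, int recAddr) {
        entries_[key] = recAddr;
        changed_ = true;  // the in-memory copy is now newer than the file copy
    }

    int Search(const std::string& key) const {
        std::map<std::string, int>::const_iterator it = entries_.find(key);
        return it == entries_.end() ? -1 : it->second;
    }

    // Called from Close: rewrite the index file only if something changed.
    void Close() {
        if (!changed_) return;
        ++writes_;         // stand-in for Rewind + Write on the index file
        changed_ = false;
    }

    int Writes() const { return writes_; }

private:
    std::map<std::string, int> entries_;
    bool changed_ = false;
    int writes_ = 0;
};
```

A session that only reads through the index then closes without touching the index file at all.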
Where are we going?
Plain stream file
Persistency ==> buffer support ==> BufferFile
  <incremental approach>: deriving BufferFile using various other classes
Random access ==> index support ==> IndexedFile
  <incremental approach>: deriving TextIndexedFile using RecordFile and TextIndex
Too Large Index (1)
On secondary storage (large linear index)
Disadvantages
  binary searching of the index requires several seeks (slower than a sorted file)
  index rearrangement requires shifting or sorting records on secondary storage
Alternatives (to be considered later)
  hashed organization
  tree-structured index (e.g. B-tree)
7.5 Indexes That Are Too Large to Hold in Memory
Too Large Index (2)
Advantages over the use of a data file sorted by key
  even if the index is on secondary storage, a binary search can be used
  sorting and maintaining the index is less expensive than sorting and maintaining the data file
  the keys can be rearranged without moving the data records if there are pinned records
Index by Multiple Keys (1)
DB-Schema = (ID-No, Title, Composer, Artist, Label)
Find the record with ID-No "COL38358" (primary key: ID-No)
Find all the recordings of "Beethoven" (secondary key: Composer)
Find all the recordings titled "Violin Concerto" (secondary key: Title)
7.6 Indexing to Provide Access by Multiple Keys
Index by Multiple Keys (2)
Most people don't want to search only by primary key
Secondary keys can be duplicated
Secondary key index: secondary key --> primary key; retrieval consults one additional index (the primary key index)
Composer index
Secondary key           Primary key
BEETHOVEN               ANG3795
BEETHOVEN               DG139201
BEETHOVEN               DG18807
BEETHOVEN               RCA2626
COREA                   WAR23699
DVORAK                  COL31809
PROKOFIEV               LON2312
RIMSKY-KORSAKOV         MER75016
SPRINGSTEEN             COL38358
SWEET HONEY IN THE R    FF245
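Retrieval by secondary key takes two lookups: the secondary index yields primary keys, and the primary index yields addresses. A minimal sketch under stated assumptions: `SecondaryEntry` and `FindByComposer` are hypothetical names, and `std::map` stands in for the primary key index.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SecondaryEntry = std::pair<std::string, std::string>;  // (composer, primary key)

// Return the byte offsets of all records whose composer equals `composer`.
// `secondary` must be sorted; `primary` maps primary key -> byte offset.
std::vector<int> FindByComposer(const std::vector<SecondaryEntry>& secondary,
                                const std::map<std::string, int>& primary,
                                const std::string& composer) {
    std::vector<int> offsets;
    // Binary search for the first entry with this secondary key.
    auto it = std::lower_bound(secondary.begin(), secondary.end(),
                               SecondaryEntry(composer, ""));
    for (; it != secondary.end() && it->first == composer; ++it) {
        auto hit = primary.find(it->second);  // second lookup: primary key index
        if (hit != primary.end())             // skip references to deleted records
            offsets.push_back(hit->second);
    }
    return offsets;
}
```

Because the secondary index stores primary keys rather than addresses, a stale entry for a deleted record simply fails the second lookup.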
Secondary Index: Basic Operations (1)
Record addition
  similar to adding to the primary index
  the secondary index is stored in canonical form
    fixed length (so keys can be truncated)
    the original name can be obtained from the data file
  can contain duplicate keys
    local ordering within the same key group
Secondary Index: Basic Operations (2)
Record deletion (2 cases)
  Secondary index references the record directly
    delete from both the primary index and the secondary index
    rearrange both indexes
  Secondary index references the primary key
    delete only from the primary index
    leave intact the reference to the deleted record
    advantage: fast
    disadvantage: deleted records take up space
Secondary Index: Basic Operations (3)
Record updating
  the primary key index serves as a kind of protective buffer
  Secondary index references the record directly
    update all files containing the record's location
  Secondary index references the primary key (1)
    affects the secondary index only when either the primary or the secondary key is changed
Continued.
Secondary Index: Basic Operations (4)
Secondary index references the primary key (2)
  when the secondary key changes
    rearrange the secondary key index
  when the primary key changes
    update all reference fields
    may require reordering the secondary index
  when the changes are confined to other fields
    the secondary key index is not affected
Retrieval of Records
Types
  primary key access
  secondary key access
  combination of the above
Combination of keys
  easy with secondary key indexes
  Boolean operations (AND, OR)
7.7 Retrieval Using Combinations of Secondary Keys
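Combining secondary keys amounts to set operations on the sorted lists of primary keys that each secondary index returns. The sketch below uses the standard `std::set_intersection` and `std::set_union` algorithms; the names `KeyList`, `And`, and `Or` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

using KeyList = std::vector<std::string>;  // primary keys, sorted ascending

// AND: keys present in both lists.
KeyList And(const KeyList& a, const KeyList& b) {
    KeyList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// OR: keys present in either list, without duplicates.
KeyList Or(const KeyList& a, const KeyList& b) {
    KeyList out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

For example, the composer list for BEETHOVEN (ANG3795, DG139201, DG18807, RCA2626) ANDed with the title list for "Symphony No. 9" (ANG3795, COL31809, DG18807) leaves ANG3795 and DG18807; only those two records then need to be read from the data file.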
Inverted Lists (1)
Inverted list
  a secondary key leads to a set of one or more primary keys
Disadvantages of the secondary index structure
  the index must be rearranged on every addition
  the secondary key is repeated in every entry that duplicates it
Solution A: an array of references
Solution B: a linked list of references
7.8 Improving the Secondary Index Structure
Array of References
Revised composer index
Secondary key           Set of primary key references
BEETHOVEN               ANG3795  DG139201  DG18807  RCA2626
COREA                   WAR23699
DVORAK                  COL31809
PROKOFIEV               LON2312
RIMSKY-KORSAKOV         MER75016
SPRINGSTEEN             COL38358
SWEET HONEY IN THE R    FF245
* no need to rearrange
* limited reference array
* internal fragmentation
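The trade-offs of the array-of-references layout can be made concrete with a small sketch. `ArrayEntry` is a hypothetical record type, and the capacity of four slots per record is an assumption chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Assumed number of reference slots reserved in every index record.
const std::size_t kMaxRefs = 4;

// One record of the revised index: a secondary key plus a fixed number of
// primary key reference slots.
struct ArrayEntry {
    std::string secondaryKey;
    std::array<std::string, kMaxRefs> refs;  // unused slots stay empty
    std::size_t count = 0;

    // Add a primary key reference; fails once every slot is taken.
    bool AddRef(const std::string& primaryKey) {
        if (count == kMaxRefs) return false;  // the "limited reference array" problem
        refs[count++] = primaryKey;
        return true;
    }

    // Slots reserved but not used: the internal fragmentation.
    std::size_t WastedSlots() const { return kMaxRefs - count; }
};
```

A composer with a single recording wastes three slots, while a fifth Beethoven recording cannot be added at all; both problems motivate the linked-list layout.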
Inverted Lists (2)
Guidelines for a better solution
  no reorganization when adding
  no limit on duplicate keys
  no internal fragmentation
Solution B: linking the list of references
  a list of primary key references
    e.g. PROKOFIEV: ANG36193, LON2312
  secondary index record: secondary key field + relative record number of the first corresponding primary key reference
Linking List of References (1)
Improved revision of the composer index
Secondary index file
Rec   Secondary key           First reference (RRN in the Label ID list file)
0     BEETHOVEN               3
1     COREA                   2
2     DVORAK                  7
3     PROKOFIEV               10
4     RIMSKY-KORSAKOV         6
5     SPRINGSTEEN             4
6     SWEET HONEY IN THE R    9
Label ID list file
RRN   Primary key   Next reference (-1 = end of list)
0     LON2312       -1
1     RCA2626       -1
2     WAR23699      -1
3     ANG3795       8
4     COL38358      -1
5     DG18807       1
6     MER75016      -1
7     COL31809      -1
8     DG139201      5
9     FF245         -1
10    ANG36193      0
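Following a chain through the Label ID list file can be sketched as below. `ListEntry`, `CollectRefs`, and `AddRef` are hypothetical names, the list file is an in-memory vector indexed by relative record number, and linking a new reference in at the head of its chain is a simplification that does not keep the chain ordered by primary key.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One record of the reference list file: a primary key and the relative
// record number (RRN) of the next reference for the same secondary key.
struct ListEntry {
    std::string primaryKey;
    int next;  // -1 marks the end of the list
};

// Follow the chain that starts at RRN `first` and collect the primary keys.
std::vector<std::string> CollectRefs(const std::vector<ListEntry>& listFile, int first) {
    std::vector<std::string> keys;
    for (int rrn = first; rrn != -1; rrn = listFile[rrn].next)
        keys.push_back(listFile[rrn].primaryKey);
    return keys;
}

// Append a new reference at the end of the list file and link it in at the
// head of its chain; returns the new head RRN to store in the secondary index.
int AddRef(std::vector<ListEntry>& listFile, int first, const std::string& primaryKey) {
    listFile.push_back(ListEntry{primaryKey, first});
    return static_cast<int>(listFile.size()) - 1;
}
```

With the figure's data, the BEETHOVEN chain starts at RRN 3 and visits records 3, 8, 5, and 1 before reaching -1. Adding a reference appends one record and changes one pointer; nothing is shifted or sorted.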
Linking List of References (2)
The primary key references are kept in a separate, entry-sequenced file
Advantages
  the secondary index is rearranged only when a secondary key changes
  rearrangement is quick
  less penalty for keeping the secondary index file on secondary storage (less need for sorting)
  the Label ID list file does not need to be sorted
  reusing the space of deleted records is easy
Linking List of References (3)
Disadvantage
  references for the same secondary key may not be physically grouped
    lack of locality
    could involve a large amount of seeking
  solution: keep the Label ID list file in memory
    the same Label ID list file can hold the lists of a number of secondary index files
    if it is too large for memory, load only a part of it
Selective Indexes
Selective index: an index on a subset of records
  contains only some part of the entire index
  provides a selective view
  useful when the contents of a file fall into several categories
    e.g. 20 < Age < 30 and $1000 < Salary
7.9 Selective Indexes
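A selective index can be built by indexing only the records that satisfy a condition. The sketch below is illustrative: the `Employee` record and its field names are assumptions, and positions in a vector stand in for record addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical employee record; the field names are assumptions.
struct Employee {
    std::string id;
    int age;
    int salary;
};

// Build an index (id -> position in the data file) that covers only the
// records satisfying `selected`; all other records are left out.
template <class Predicate>
std::vector<std::pair<std::string, std::size_t>>
BuildSelectiveIndex(const std::vector<Employee>& dataFile, Predicate selected) {
    std::vector<std::pair<std::string, std::size_t>> index;
    for (std::size_t pos = 0; pos < dataFile.size(); ++pos)
        if (selected(dataFile[pos]))
            index.emplace_back(dataFile[pos].id, pos);
    std::sort(index.begin(), index.end());
    return index;
}

// The selection condition from the slide: 20 < Age < 30 and $1000 < Salary.
bool YoungAndWellPaid(const Employee& e) {
    return 20 < e.age && e.age < 30 && 1000 < e.salary;
}
```

The data file is untouched; the selective index is just a smaller view over it.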
Index Binding (1)
When should a key index be bound to the physical address of its associated record?
File construction time binding (tight, in-the-data binding)
  tight binding and faster access
  the case of the primary key
  when a secondary key is also bound at that time
    simpler and faster retrieval
    reorganization of the data file results in modifications of all bound index files
7.10 Binding
Index Binding (2)
Postpone binding until a record is actually retrieved (retrieval-time binding)
  minimal reorganization and a safe approach
  mostly for secondary keys
Tight, in-the-data binding is good when
  the file is static, with little or no change
  rapid performance during retrieval is required
  e.g. a mass-produced, read-only optical disk
Let's Review (1)
7.1 What is an Index?
7.2 A Simple Index for Entry-Sequenced Files
7.3 Using Template Classes in C++ for Object I/O
7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects
7.5 Indexes That Are Too Large to Hold in Memory
Let's Review (2)
7.6 Indexing to Provide Access by Multiple Keys
7.7 Retrieval Using Combinations of Secondary Keys
7.8 Improving the Secondary Index Structure:
Inverted Lists
7.9 Selective Indexes
7.10 Binding