LECTURE 21: INDEXED FILES
CSC 213 – Large Scale Programming
Today’s Goals
Look at how Dictionaries are used in the real world
  Where this would occur & why they are used there
  In a real-world setting, what problems can/do occur
Indexed file usage presented and shown
  How & why we split index & data files
  Formatting of each file and how they get used
  Describe what problems are solved using indexed files
  Java coding techniques that simplify using these files
Idea needed when using multiple indexes shown
Dictionaries in Real World
Often need a large database spread across many machines
  Split search terms across machines
  Updating & searching work split between machines
  Database way too large for any single machine
If you think about it, this is incredibly common
  Where?
Split Dictionaries
Splitting Keys From Values
In the real world, we often have many indices
  Simple units (such as positions in a file) record where we can find values
  Values could be searched for in multiple ways
Index & Data Files
Split information into two (or more) files
  Data file uses fixed-size records to store data
  Index files contain search terms & data locations
Fixed-size records usually used in data file
  Each record uses exactly that much space
  Extra space wasted if the value is smaller
  But limits data size; a record cannot get more space
  Makes it far easier to reuse space & rebuild index
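To make the fixed-size idea concrete, here is a minimal sketch assuming a hypothetical two-field layout (a name padded to 50 bytes, then an int price). Because every record is exactly RECORD_SIZE bytes, record i always starts at byte i * RECORD_SIZE. FixedRecords and its methods are illustrative names; the truncation step assumes ASCII names, since cutting a multibyte UTF-8 character in half would garble it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: every record occupies exactly RECORD_SIZE bytes (hypothetical layout).
class FixedRecords {
  static final int NAME_SZ = 50;                 // max bytes for a name
  static final int RECORD_SIZE = NAME_SZ + 4;    // name + 4-byte int price

  // Pad (with zero bytes) or truncate the name to exactly NAME_SZ bytes.
  static byte[] fixedName(String name) {
    byte[] raw = name.getBytes(StandardCharsets.UTF_8);
    return Arrays.copyOf(raw, NAME_SZ);
  }

  // Record i always starts at i * RECORD_SIZE, even if the name is short.
  static void writeRecord(RandomAccessFile f, int i, String name, int price)
      throws IOException {
    f.seek((long) i * RECORD_SIZE);
    f.write(fixedName(name));
    f.writeInt(price);
  }

  static String readName(RandomAccessFile f, int i) throws IOException {
    byte[] buf = new byte[NAME_SZ];
    f.seek((long) i * RECORD_SIZE);
    f.readFully(buf);
    int len = 0;
    while (len < NAME_SZ && buf[len] != 0) len++;  // drop the zero padding
    return new String(buf, 0, len, StandardCharsets.UTF_8);
  }

  static int readPrice(RandomAccessFile f, int i) throws IOException {
    f.seek((long) i * RECORD_SIZE + NAME_SZ);      // skip past the name field
    return f.readInt();
  }
}
```

The padding is the "extra space wasted" the slide mentions; in exchange, any freed slot can be reused by any new record.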
Index File Format
No standard format; depends on type of data
  Often variable-sized, but this is not a specific requirement
Each entry in index file begins with exact search term
  Followed by position containing matching data
  As a result, often find index entries smushed together
Can read indexes at start of program execution
  Reasonably assumes index file smaller than data file
  Changes to data are written to the data file immediately, however
When program starts, do NOT read data file
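The slides leave the index format open. As a minimal sketch, assuming each entry is written with DataOutputStream.writeUTF for the search term followed by writeLong for the position (our choice, not a standard), the index can be saved and then read back into a HashMap at program start without ever touching the data file. IndexFile, save, and load are hypothetical names.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Assumed layout: entries packed back to back, each = term (writeUTF) + position (writeLong).
class IndexFile {
  static void save(File indexFile, Map<String, Long> index) throws IOException {
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(indexFile)))) {
      for (Map.Entry<String, Long> e : index.entrySet()) {
        out.writeUTF(e.getKey());      // exact search term
        out.writeLong(e.getValue());   // position of the matching record
      }
    }
  }

  // Read the whole (small) index at program start; the data file is not read.
  static Map<String, Long> load(File indexFile) throws IOException {
    Map<String, Long> index = new HashMap<>();
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(indexFile)))) {
      while (true) {
        String term;
        try {
          term = in.readUTF();
        } catch (EOFException end) {
          break;                       // no more entries
        }
        index.put(term, in.readLong());
      }
    }
    return index;
  }
}
```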
Never Read Entire Data File
Indexed Files
Enables splitting search terms across computers
  Alphabetical split searches faster on many servers
  Figure: servers each holding one alphabetical range: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z
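To make the alphabetical split concrete, here is a minimal sketch that routes a search term to the server owning its range. It uses TreeMap.floorEntry to find the range whose starting letter is the greatest one not after the term's first letter. RangeRouter and the server names are hypothetical; a real system would map each range to a network address.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: ranges follow the slide's split; server names are made up.
class RangeRouter {
  private final TreeMap<Character, String> startOfRange = new TreeMap<>();

  RangeRouter() {
    startOfRange.put('A', "server-A-C");
    startOfRange.put('D', "server-D-E");
    startOfRange.put('F', "server-F-H");
    startOfRange.put('I', "server-I-P");
    startOfRange.put('Q', "server-Q-R");
    startOfRange.put('S', "server-S-T");
    startOfRange.put('U', "server-U-X");
    startOfRange.put('Y', "server-Y-Z");
  }

  // Owner = the range starting at the greatest letter <= the term's first letter.
  String serverFor(String term) {
    char first = Character.toUpperCase(term.charAt(0));
    Map.Entry<Character, String> e = startOfRange.floorEntry(first);
    if (e == null || first > 'Z') {
      throw new IllegalArgumentException("term must start with A-Z: " + term);
    }
    return e.getValue();
  }
}
```

Each server then only stores and searches its own slice of the keys, which is what lets searches and updates be split across machines.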
Indexed Files
Enables splitting search terms across computers
  Create indexes for different types of searching
  Figure: separate indexes for searching by Song name and by Song length
How Does This Work?
Using index files is simplified by using positions
  Look in index structure to find position of data in file
  With this position, can then seek to specific record
  Create instance & initialize it by reading data from file
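A minimal sketch of these three steps, assuming the index is already loaded into a Map<String, Long> and that each record starts with a writeUTF name followed by an int price (a layout chosen only for this example, not the one the Stock class uses). LookupThenRead and StockRecord are hypothetical names. This version copies the data into a new object; the proxy approach later in the lecture keeps only the position instead.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

class LookupThenRead {
  // Plain object holding a copy of one record's data.
  static final class StockRecord {
    final String name;
    final int price;
    StockRecord(String name, int price) { this.name = name; this.price = price; }
  }

  static StockRecord find(Map<String, Long> index, RandomAccessFile data, String key)
      throws IOException {
    Long pos = index.get(key);      // 1. look in the index structure
    if (pos == null) return null;   //    key is not in this index
    data.seek(pos);                 // 2. jump straight to that record
    String name = data.readUTF();   // 3. build an instance from the file's bytes
    int price = data.readInt();
    return new StockRecord(name, price);
  }
}
```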
Starting with Indexed Files
Figure: data file of fixed-size records at positions 0 (International Business Machines, 106, IBM), 112 (American Telephone & Telegraph, 23, T), and 224 (Ford Motorcars, Inc., 2, F); index file entries F 224, IBM 0, T 112
Where Was "Searching" Used?
Indexed files used in Maps and Dictionaries
  Read data into searchable object after opening file
  For each record, Entry uses indexed data as its key
Single data file has multiple indexes to search it
  Not a problem; each index has its own Collection
Cannot have multiple instances for each data item (the copies would get out of sync)
Cannot have single instance holding each data item (would require reading the entire data file)
Then how can we construct each Entry's value?
Proxy Pattern For The Win!
Create proxy instances to use as Entry's value
  Proxy pretends it has data by defining getters & setters
  Data's position & file are the only fields these objects have
  Whenever a method is called, it looks up & returns the data
Other classes will think proxy has the fields declared
  Simplifies using class & ensures up-to-date data used
  But little memory needed, since data resides on disk!
Coding
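import java.io.IOException;
import java.io.RandomAccessFile;

public class Stock {
  // Offset in record to field start; fixed max. size of each field
  private static final int NAME_OFF = 0;
  private static final int NAME_SZ = 50;
  private static final int PRC_OFF = NAME_OFF + NAME_SZ;
  private static final int PRC_SZ = 4;
  private static final int TICK_OFF = PRC_OFF + PRC_SZ;
  private static final int TICK_SZ = 6;
  // Fixed size of a record in data file
  private static final int SIZE = TICK_OFF + TICK_SZ;

  // A proxy's only fields: where its record starts & which file holds it
  private long position;
  private RandomAccessFile theFile;

  public Stock(long pos, RandomAccessFile file) {
    position = pos;
    theFile = file;
  }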
Coding
public class Stock { // Continues from last time
  public int getStockPrice() throws IOException {
    theFile.seek(position + PRC_OFF);
    return theFile.readInt();
  }
  public void setStockPrice(int price) throws IOException {
    theFile.seek(position + PRC_OFF);
    theFile.writeInt(price);
  }
  public void setTickerSymbol(String sym) throws IOException {
    theFile.seek(position + TICK_OFF);
    // writeUTF stores a 2-byte length plus the bytes, so sym must fit in TICK_SZ
    theFile.writeUTF(sym);
  }
  // More getters & setters from here…
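Putting these pieces together, here is a self-contained sketch of a proxy in the spirit of Stock, using the same field sizes. StockProxy, getName, setName, and getTickerSymbol are hypothetical additions, and storing strings as zero-padded fixed-width UTF-8 bytes (instead of writeUTF) is an assumption that guarantees a string can never spill into the next field.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StockProxy {
  static final int NAME_OFF = 0;
  static final int NAME_SZ = 50;
  static final int PRC_OFF = NAME_OFF + NAME_SZ;
  static final int PRC_SZ = 4;
  static final int TICK_OFF = PRC_OFF + PRC_SZ;
  static final int TICK_SZ = 6;
  static final int SIZE = TICK_OFF + TICK_SZ;   // 60 bytes per record

  // The proxy's only fields: no copy of the data is kept in memory.
  private final long position;
  private final RandomAccessFile theFile;

  StockProxy(long pos, RandomAccessFile file) { position = pos; theFile = file; }

  int getStockPrice() throws IOException {
    theFile.seek(position + PRC_OFF);
    return theFile.readInt();
  }
  void setStockPrice(int price) throws IOException {
    theFile.seek(position + PRC_OFF);
    theFile.writeInt(price);
  }
  String getName() throws IOException { return readFixed(NAME_OFF, NAME_SZ); }
  void setName(String name) throws IOException { writeFixed(NAME_OFF, NAME_SZ, name); }
  String getTickerSymbol() throws IOException { return readFixed(TICK_OFF, TICK_SZ); }
  void setTickerSymbol(String sym) throws IOException { writeFixed(TICK_OFF, TICK_SZ, sym); }

  // Write exactly size bytes: the string's UTF-8 bytes, zero-padded or truncated.
  private void writeFixed(int off, int size, String s) throws IOException {
    theFile.seek(position + off);
    theFile.write(Arrays.copyOf(s.getBytes(StandardCharsets.UTF_8), size));
  }

  // Read size bytes and drop the zero padding.
  private String readFixed(int off, int size) throws IOException {
    byte[] buf = new byte[size];
    theFile.seek(position + off);
    theFile.readFully(buf);
    int len = 0;
    while (len < size && buf[len] != 0) len++;
    return new String(buf, 0, len, StandardCharsets.UTF_8);
  }
}
```

Two proxies created for the same position read and write the same bytes, so whichever one a caller holds always reports up-to-date data.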
Visualizing Indexed Files
Figure: index entries F 224, IBM 0, T 112 point into the data file, whose records are at positions 0 (International Business Machines, 106, IBM), 112 (American Telephone & Telegraph, 23, T), and 224 (Ford Motorcars, Inc., 12, F)
How Do We Add Data?
Adding new records takes only a few steps
  Add space for record by calling setLength on the data file
  Update index structure(s) to include the new record
  Records in data file updated at each change
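A minimal sketch of those steps, assuming fixed-size records made of two ints (a hypothetical price and volume) and an index kept in a Map<String, Long>. The new record is placed at the old end of the file; AddRecord and its parameters are illustrative names, not from the slides.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

class AddRecord {
  static final int RECORD_SIZE = 2 * Integer.BYTES;   // hypothetical: price + volume

  static long add(RandomAccessFile data, Map<String, Long> index,
                  String key, int price, int volume) throws IOException {
    long pos = data.length();            // new record starts at the old end
    data.setLength(pos + RECORD_SIZE);   // 1. add space for exactly one record
    index.put(key, pos);                 // 2. update the in-memory index
    data.seek(pos);                      // 3. write the record's fields right away
    data.writeInt(price);
    data.writeInt(volume);
    return pos;
  }
}
```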
Adding New Data To The Files
Figure: index now also holds C 336 (alongside F 224, IBM 0, T 112); the data file grows by one new, still-empty record (0, Ø) at position 336 after the existing records at 0, 112, and 224
Adding New Data To The Files
Figure: the new record at 336 now holds Citibank, -2, C; the index holds C 336, F 224, IBM 0, T 112
How Does This Work?
Removing records is even easier
  To prevent using a record, remove its items from the indexes
  Do NOT update index file(s) until program completes
  Use impossible magic numbers for the record in the data file
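A minimal sketch of removal, assuming each record begins with an int price and that Integer.MIN_VALUE is a value no real price can take (our stand-in for the "impossible magic number"). Only the in-memory index changes here; writing the index file back out would happen when the program completes. RemoveRecord, remove, and isDeleted are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

class RemoveRecord {
  static final int DELETED = Integer.MIN_VALUE;   // assumed impossible price

  static boolean remove(RandomAccessFile data, Map<String, Long> index, String key)
      throws IOException {
    Long pos = index.remove(key);    // record can no longer be found by searching
    if (pos == null) return false;   // nothing to remove
    data.seek(pos);
    data.writeInt(DELETED);          // mark the slot so it can be reused later
    return true;
  }

  static boolean isDeleted(RandomAccessFile data, long pos) throws IOException {
    data.seek(pos);
    return data.readInt() == DELETED;
  }
}
```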
Removing Data As We Go
Figure: before removal, the index holds C 336, F 224, IBM 0, T 112, and the data file holds International Business Machines (0), American Telephone & Telegraph (112), Ford Motorcars, Inc. (224), and Citibank (336)
Removing Data As We Go
Figure: after removing Ford, the index holds only C 336, IBM 0, T 112, and Ford's record at 224 is overwritten with magic values (0, Ø)
Using Multiple Indexes
Multiple indexes for a data file very often needed
  Provides many ways of searching for important data
Since each index file is read individually, this could also create a problem
  Multiple proxy instances for the same data could be created
  Duplicate instances are created for each index
  Makes removing them all difficult, since they are not linked
Very easy to solve: use a Map while loading indexes
  Converts positions in file to proxy instances to solve this
Linking Multiple Indexes
Use one Map instance while reading all indexes
  For each position in file, check if it is already in the Map
  Use existing proxy instance if position is already in the Map
  If a search in the Map returns null, create a new instance
  Make sure to call put() when we must create a proxy
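A minimal sketch of linking indexes through one Map from file position to proxy. RecordProxy stands in for a proxy class such as Stock, and each index is passed in as already-parsed (term, position) pairs; both simplifications, along with the names SharedProxies, proxyAt, and buildIndex, are assumptions for this example.

```java
import java.util.HashMap;
import java.util.Map;

class SharedProxies {
  // Stand-in for a real proxy: it only knows where its record lives.
  static final class RecordProxy {
    final long position;
    RecordProxy(long position) { this.position = position; }
  }

  // One Map shared by every index being loaded: file position -> proxy.
  private final Map<Long, RecordProxy> byPosition = new HashMap<>();

  // Reuse the proxy for pos if one exists; otherwise create and remember it.
  RecordProxy proxyAt(long pos) {
    RecordProxy p = byPosition.get(pos);
    if (p == null) {                 // first time this position is seen
      p = new RecordProxy(pos);
      byPosition.put(pos, p);        // so later indexes get the same instance
    }
    return p;
  }

  // Build one index (search term -> proxy) from its parsed entries.
  Map<String, RecordProxy> buildIndex(Map<String, Long> entries) {
    Map<String, RecordProxy> index = new HashMap<>();
    for (Map.Entry<String, Long> e : entries.entrySet()) {
      index.put(e.getKey(), proxyAt(e.getValue()));
    }
    return index;
  }
}
```

Because every index obtains its proxies through proxyAt, a ticker index and a name index hand back the very same object for a given record. Map.computeIfAbsent could express the same check-then-put in a single call.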
What to Study for Midterm
Study your Maps and Dictionaries
  When would we use each of the ADTs? Why?
  What do their methods do? Why do they differ?
Consider each implementation of these ADTs
  Explain why each method has its given big-Oh complexity
  Why use an implementation? Where is it used?
  What are negatives or limitations of the implementation?
  What fields are needed by the implementation? Why is this?
What to Study for Midterm
List-based approaches: Why? When?
Hash tables
  How do hash functions work? What does mod do?
  How do we add & remove data from a hash table?
  What are collisions & how do we handle them?
  What is real & pretend big-Oh complexity? Why?
Binary Search Trees
  How do we add, remove, & search in these trees?
  How are data in BSTs organized? Tricks to their use?
  How do we code & use BSTs? What methods exist?
What to Study for Midterm
AVL Trees
  How do we add, remove, & search in these trees?
  How are data in them organized? Tricks to their use?
  When must we reorganize the tree? How is this done?
Splay Trees
  How do we add, remove, & search in these trees?
  For each method, is a node splayed & which one?
  How to chain splayings together? When do we stop?
What to Study for Midterm
Class selection & design
  Where do classes come from? How do we know?
  When to use each connection between classes?
  How to list methods & fields in a UML class diagram?
Comments & Outlines
  When, where, and how much?
  What should & should not be included?
Midterm Process
Open-book & open-note test; do not memorize
  But have methods & information at your fingertips
  Use my slides ONLY with note(s) on that day's slides
  Cannot use daily or weekly activities
  Must submit all printed pages along with test
Problems resemble the tone of those already seen
  All new problems, however; do not memorize answers
  Includes tracing, showing state of ADT, method returns
  Coding, big-Oh analysis, and more can be asked
For Next Lecture
Midterm #1 in class a week from Friday
Project #2 available on Angel on Friday, too
Lab phase #2 due on Friday at midnight
  I will still be out of town, but lab activity will be posted
  Due a week from Friday; chance to use indexed files
No class on Monday; take some time to relax
  I will be out of town serving on an NSF grant panel
  Updated schedule on Angel accounts for the change