CSC 213 – Large Scale Programming Lecture 38: BTrees

Today’s Goal

Look at using advanced tree structures
- Examine the BTree implementation of an (a,b)-tree
- Discuss how to size a BTree

Examine how to implement these structures
- How we can write classes so trees work well
- Better ways to manipulate the files these trees live in

What is “the BTree?”

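BTree: the common implementation of an (a,b)-tree
- Every BTree has an order; we usually talk about a "BTree of order m"
- Internal nodes other than the root then have ⌈m/2⌉ to m children (and so ⌈m/2⌉ - 1 to m - 1 entries)
- The root node has m or fewer children (at most m - 1 entries); unless it is a leaf, it has at least 2 children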

Many variants of the BTree actually exist, but the differences are very minor; we stick to vanilla BTrees for this lecture.
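As plain arithmetic, the order-m bounds come out as follows. This is a small sketch; the class name `BTreeBounds` is hypothetical, and it assumes the common definition in which every non-root node has ⌈m/2⌉ to m children and one fewer entry than children, with the root exempt from the minimums.

```java
// Hypothetical helper: size limits for nodes of a vanilla BTree of order m.
// The root is exempt from the minimums below.
class BTreeBounds {
    final int order; // m: the most children any node may have

    BTreeBounds(int order) {
        if (order < 3) {
            throw new IllegalArgumentException("order must be at least 3");
        }
        this.order = order;
    }

    int maxChildren() { return order; }
    int maxEntries()  { return order - 1; }
    int minChildren() { return (order + 1) / 2; }  // ceil(m/2) using integer math
    int minEntries()  { return minChildren() - 1; }

    // Is this entry count legal for a node other than the root?
    boolean legalNonRoot(int entries) {
        return entries >= minEntries() && entries <= maxEntries();
    }

    public static void main(String[] args) {
        BTreeBounds b = new BTreeBounds(5);
        // order 5: 3..5 children, 2..4 entries per non-root node
        System.out.println(b.minChildren() + ".." + b.maxChildren() + " children, "
                + b.minEntries() + ".." + b.maxEntries() + " entries");
    }
}
```

For order 5 this prints `3..5 children, 2..4 entries`.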

BTree Order

Select the order to minimize paging
- A full node, including its entries and the references to its children, fills a page with no space left over
- Each non-root node holds at least ⌈m/2⌉ - 1 entries (⌈m/2⌉ children), so each page used is roughly at least 50% full

How many pages are touched during an operation?
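One way to put numbers on both questions is sketched below. The byte sizes (4096-byte pages, 16-byte entries such as a long ID plus a long file position, 8-byte child references) are assumptions for illustration, and the sketch ignores any per-node header. The order is the largest m with m·ref + (m - 1)·entry ≤ page; the height uses the standard BTree bound h ≤ log_t((n + 1) / 2) with t = ⌈m/2⌉. A search reads one node (page) per level, so it touches at most h + 1 pages.

```java
// Hypothetical sizing helper; all byte sizes passed in are assumptions, not measurements.
class BTreeSizing {
    // Largest order m such that m child references plus (m - 1) entries fit in one page
    static int order(int pageBytes, int entryBytes, int refBytes) {
        return (pageBytes + entryBytes) / (entryBytes + refBytes);
    }

    // Upper bound on the height h (edges from root to a leaf) of an order-m BTree
    // holding n entries: h <= log_t((n + 1) / 2) with t = ceil(m / 2).
    // Computed with integers so there is no floating-point rounding.
    static int maxHeight(long n, int m) {
        long t = (m + 1) / 2;
        long bound = (n + 1) / 2;
        int h = 0;
        long power = 1;
        while (power <= bound / t) { // same test as power * t <= bound, without overflow
            power *= t;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        int m = order(4096, 16, 8);              // 4 KB page, 16-byte entry, 8-byte reference
        int h = maxHeight(1_000_000_000L, m);    // one billion entries
        System.out.println("order = " + m + ", pages touched per search <= " + (h + 1));
    }
}
```

With these assumed sizes the order is 171, and a search in a tree of one billion entries touches at most 5 pages. Insertions and removals may also touch siblings or split/merge nodes, but only a constant number of extra pages per level.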

Removal from BTree

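- If the entry is not on the bottom level, swap it with its successor (which is always on the bottom level), then remove it there
- If that node now has fewer than ⌈m/2⌉ - 1 entries (underflow):
  - When a sibling has an entry to spare, move an entry from the sibling up to the parent and move the parent's separating entry down into the node
  - Otherwise, merge the node with a sibling & pull their separating entry down from the parent
  - But this might propagate underflow to the parent node!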
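The two underflow fixes can be sketched on in-memory nodes. This is a simplified illustration, not the lecture's code: `BNode` and `Underflow.fix` are hypothetical names, entries are plain `Integer` keys, and `minKeys` is the minimum entry count (⌈m/2⌉ - 1). The return value tells the caller whether the parent underflowed in turn, so it can repeat the fix one level up (or shrink the tree when the root runs out of entries).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory node: entries are plain Integer keys
class BNode {
    final List<Integer> keys = new ArrayList<>();
    final List<BNode> children = new ArrayList<>(); // empty for bottom-level nodes

    boolean isLeaf() { return children.isEmpty(); }
}

class Underflow {
    // Repairs parent.children.get(i) after it dropped below minKeys entries.
    // Returns true if the parent is now below minKeys itself (only possible after a merge).
    static boolean fix(BNode parent, int i, int minKeys) {
        BNode node = parent.children.get(i);
        BNode left = i > 0 ? parent.children.get(i - 1) : null;
        BNode right = i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;

        if (left != null && left.keys.size() > minKeys) {
            // Transfer: parent's separator moves down, left sibling's largest entry moves up
            node.keys.add(0, parent.keys.get(i - 1));
            parent.keys.set(i - 1, left.keys.remove(left.keys.size() - 1));
            if (!left.isLeaf()) {
                node.children.add(0, left.children.remove(left.children.size() - 1));
            }
            return false;
        }
        if (right != null && right.keys.size() > minKeys) {
            // Transfer: parent's separator moves down, right sibling's smallest entry moves up
            node.keys.add(parent.keys.get(i));
            parent.keys.set(i, right.keys.remove(0));
            if (!right.isLeaf()) {
                node.children.add(right.children.remove(0));
            }
            return false;
        }

        // Merge: combine two siblings, pulling their separator down from the parent
        int sep = (left != null) ? i - 1 : i;
        BNode a = parent.children.get(sep);
        BNode b = parent.children.remove(sep + 1);
        a.keys.add(parent.keys.remove(sep));
        a.keys.addAll(b.keys);
        a.children.addAll(b.children);
        return parent.keys.size() < minKeys;
    }
}
```

Trying a transfer first keeps the change local to three nodes; only a merge shrinks the parent, which is why underflow can travel up the tree.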

Where to Find BTrees

Databases are a very common place to find them
- They contain far more data than the machine's RAM
- They perform lots of data accesses & insertions
- They need a simple, efficient organization

Databases also store data permanently
- We do not want to ever lose information
- RAM contents are lost when the machine is powered off
- But files are stored on the hard drive (s-l-o-w)

Database Implementation

Maintain the BTree in memory… but keep a copy of the records on disk
- Each Entry has a unique ID & its location in the file
- Entry changes are written to disk immediately, so the file is always kept up to date; in case of a program crash, just re-read the file
- Ignore virtual memory & instead use the file directly
- Records in the file are stored in no particular order, and the order of the Entrys may change as the program runs
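A minimal sketch of this design, assuming fixed-size records made of an `int` ID and a `double` value. `RecordStore` is a hypothetical name, and a `java.util.TreeMap` stands in for the in-memory BTree. Every change goes straight to the file: `RandomAccessFile` is unbuffered, so mode "rw" hands each write to the operating system at once, while modes "rws" or "rwd" additionally force each write onto the storage device before returning.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.TreeMap;

// Sketch: records (int id, double value) live in the file; only id -> position lives in memory.
// TreeMap is a stand-in for the BTree; the sketch assumes the file starts out empty.
class RecordStore implements AutoCloseable {
    static final int RECORD_BYTES = Integer.BYTES + Double.BYTES; // 12 bytes per record

    private final RandomAccessFile raf;
    private final TreeMap<Integer, Long> index = new TreeMap<>(); // id -> byte position

    RecordStore(String fileName) throws IOException {
        raf = new RandomAccessFile(fileName, "rw");
    }

    // New records go at the end of the file, so the file itself is unsorted
    void insert(int id, double value) throws IOException {
        long pos = raf.length();
        raf.seek(pos);
        raf.writeInt(id);
        raf.writeDouble(value);
        index.put(id, pos);
    }

    // Changes are written to the file immediately, so it always matches memory
    void update(int id, double value) throws IOException {
        raf.seek(positionOf(id) + Integer.BYTES); // skip over the stored id
        raf.writeDouble(value);
    }

    double read(int id) throws IOException {
        raf.seek(positionOf(id) + Integer.BYTES);
        return raf.readDouble();
    }

    private long positionOf(int id) {
        Long pos = index.get(id);
        if (pos == null) {
            throw new IllegalArgumentException("unknown id " + id);
        }
        return pos;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

After a crash, the index can be rebuilt by reading the file record by record, which is the "just re-read the file" step above.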

Better Ways To Access Data

- BTrees do not read & write the file sequentially; instead they must jump around to different locations in the file
- They also need a way to specify each of the Entrys that exist within the file
- Java's solution: RandomAccessFile

RandomAccessFile

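Create new files or work with existing ones:

    RandomAccessFile raf =
        new RandomAccessFile("file.txt", "rw");

- Opens file.txt for reading & writing, creating it if it does not exist; an existing file keeps its contents (it is not rewritten)
- The constructor throws FileNotFoundException (a kind of IOException) if the file cannot be opened or created; the read/write methods throw IOException when a problem arises
- Use raf to access/modify the file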
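A short check of the "rw" behavior described above; `OpenDemo` and the temp-file names are arbitrary. Opening an existing file in "rw" mode keeps its contents, so code that wants an empty file has to call `setLength(0)` itself.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class OpenDemo {
    // Opens (or creates) the file in "rw" mode and returns its current length.
    // try-with-resources closes the file even if an exception is thrown.
    static long lengthAfterOpen(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("open", ".txt");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.writeBytes("hello");                 // 5 bytes
        }
        // Reopening with "rw" does not erase the file: all 5 bytes are still there
        System.out.println(lengthAfterOpen(file));   // 5

        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);                        // truncating must be asked for explicitly
        }
        System.out.println(lengthAfterOpen(file));   // 0
    }
}
```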

Reading RandomAccessFile

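Read from a RandomAccessFile instance using boolean readBoolean(), int readInt(), double readDouble()…
- Each reads and returns the appropriate value, starting at the current position in the file
- Throws EOFException if the end of the file is reached before the whole value is read

int read(byte[] b)
- Reads up to b.length bytes & stores them in b
- Returns the number of bytes read, or -1 at the end of the file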
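The sketch below writes three values and reads them back with the typed methods, then shows `read(byte[])` reporting how many bytes it got. `ReadDemo` is a hypothetical name; `seek` (which repositions the file pointer) is used to rewind before reading.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class ReadDemo {
    // Writes a boolean, an int, and a double, then reads them back in the same order
    static String roundTrip(RandomAccessFile raf) throws IOException {
        raf.setLength(0);        // start from an empty file
        raf.writeBoolean(true);  // 1 byte
        raf.writeInt(42);        // 4 bytes
        raf.writeDouble(2.5);    // 8 bytes
        raf.seek(0);             // reads begin at the file pointer, so rewind first
        return raf.readBoolean() + " " + raf.readInt() + " " + raf.readDouble();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("read", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            System.out.println(roundTrip(raf));   // true 42 2.5

            byte[] buf = new byte[100];
            raf.seek(0);
            int n = raf.read(buf);                // at most buf.length; only 13 bytes exist
            System.out.println(n + " bytes read");

            raf.seek(raf.length());
            System.out.println(raf.read(buf));    // -1: already at the end of the file
        }
    }
}
```

When exactly `b.length` bytes are required, `readFully(byte[])` keeps reading until the array is full (or throws EOFException).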

Writing to RandomAccessFile

Write to a RandomAccessFile instance using void writeInt(int i), void writeDouble(double d)…
- Writes the value to the next location in the file
- Extends the file when at the end of the file
- Otherwise overwrites whatever data had been there

void write(byte[] b)
- Writes the contents of array b to the file
- Overwrites/extends the file as needed
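Both write behaviors, overwriting in place and extending at the end, show up in this sketch. `WriteDemo` is a hypothetical name, and it uses `seek` to move the file pointer between writes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class WriteDemo {
    // Writes ints 1, 2, 3, overwrites the middle one, appends more data, and reads it all back
    static int[] overwriteAndExtend(RandomAccessFile raf) throws IOException {
        raf.setLength(0);
        raf.writeInt(1);
        raf.writeInt(2);
        raf.writeInt(3);            // file is now 12 bytes long
        raf.seek(Integer.BYTES);    // the second int starts at byte 4
        raf.writeInt(20);           // overwrites 2 in place; length is still 12
        raf.seek(raf.length());
        raf.writeInt(4);            // writing at the end extends the file to 16 bytes
        raf.write(new byte[] {0, 0, 0, 5}); // write(byte[]) extends too: big-endian int 5

        raf.seek(0);
        int[] values = new int[(int) (raf.length() / Integer.BYTES)];
        for (int i = 0; i < values.length; i++) {
            values[i] = raf.readInt();
        }
        return values;
    }
}
```

Reading the file back gives 1, 20, 3, 4, 5: the 2 was replaced without moving anything, and the file only grew when writing past its old end.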

Typical File I/O

Ordinarily we read and write files sequentially:

    RandomAccessFile raf = new …;
    char c = '\0';                   // any starting value other than 's'
    while (c != 's') {
        c = (char) raf.readByte();   // readByte, not readChar: readChar reads 2 bytes
        raf.writeByte(c);            // overwrites the NEXT byte with the one just read
    }

Contents of the file accessed through raf, after each pass through the loop:

    This is an example file we access    (start)
    TTis is an example file we access
    TTii is an example file we access
    TTii  s an example file we access
    TTii  ssan example file we access    (loop ends: it just read an 's')
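A runnable version of this loop, for checking the trace above. `SequentialDemo` is a hypothetical name; the sketch assumes ASCII text that contains an `s`, since otherwise `readByte` runs off the end of the file and throws EOFException.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class SequentialDemo {
    // Writes text to the file, runs the loop from the slide, and returns the new contents
    static String run(File file, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            raf.writeBytes(text);          // one byte per character
            raf.seek(0);

            char c = '\0';
            while (c != 's') {
                c = (char) raf.readByte(); // read one byte; file pointer advances by 1
                raf.writeByte(c);          // overwrite the byte after it
            }

            byte[] bytes = new byte[(int) raf.length()];
            raf.seek(0);
            raf.readFully(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sequential", ".txt");
        file.deleteOnExit();
        System.out.println(run(file, "This is an example file we access"));
        // TTii  ssan example file we access
    }
}
```

Because every read and every write moves the same file pointer forward, each pass copies one byte onto its right-hand neighbor and then skips past it.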

Skipping Around The File

Can position a RandomAccessFile to read from/write to anywhere in the file
- void seek(long pos) moves to that position in the file
- Positions are specified as bytes from the beginning of the file
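Since positions are byte offsets, fixed-size data turns "the k-th item" into simple arithmetic. In this sketch (hypothetical `SeekDemo` helpers, assuming a file that holds nothing but ints) item k starts at byte 4·k.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helpers for a file that holds nothing but ints
class SeekDemo {
    // Int number k (counting from 0) starts k * 4 bytes from the beginning of the file
    static long offsetOf(long k) {
        return k * Integer.BYTES;
    }

    static int readIntAt(RandomAccessFile raf, long k) throws IOException {
        raf.seek(offsetOf(k));
        return raf.readInt();
    }

    static void writeIntAt(RandomAccessFile raf, long k, int value) throws IOException {
        raf.seek(offsetOf(k));
        raf.writeInt(value);
    }
}
```

Seeking past the end of the file does not change its length by itself; the file only grows once something is written there.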

RandomAccessFile I/O

With seek, we can jump straight to any position in the file:

    RandomAccessFile raf = new …;
    char c;
    raf.seek(raf.length() - 1);   // move to the last byte
    c = (char) raf.readByte();    // readChar would need 2 bytes and hit the end of file
    raf.seek(0);                  // jump back to the start
    raf.writeByte(c);             // overwrite the first byte

Contents of the file accessed through raf, before and after:

    This is an example file we access    (before)
    shis is an example file we access    (after)

How do we use this?

Use positions to simplify everything
- Each Entry contains the position of its record within the file

Simplify building nodes from the start of the program
- Record new nodes at the end of the file
- Store the nodes' size & number of Entrys at the start of the file
- Each node records the ID & position of each of its children
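One possible layout along these lines is sketched below; the exact format is an assumption, not the lecture's. `NodeFile` is a hypothetical name. The header at the start of the file holds the node size in bytes and the total number of Entrys; every node is a fixed-size record appended at the end of the file, holding (ID, record position) for each Entry plus the positions of its children. Here a child's file position doubles as its ID, which is a simplification of the slide's "ID & position".

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical on-disk layout (an assumption, not the lecture's format):
//   header: int nodeBytes, int entryCount
//   node:   int numEntries, then maxEntries slots of (int id, long recordPos),
//           then maxEntries + 1 child positions (long, -1 when unused)
class NodeFile {
    static final int SLOT_BYTES = Integer.BYTES + Long.BYTES;

    final RandomAccessFile raf;
    final int maxEntries;
    final int nodeBytes;

    // Assumes raf was opened on a new, empty file
    NodeFile(RandomAccessFile raf, int maxEntries) throws IOException {
        this.raf = raf;
        this.maxEntries = maxEntries;
        this.nodeBytes = Integer.BYTES + maxEntries * SLOT_BYTES + (maxEntries + 1) * Long.BYTES;
        raf.seek(0);
        raf.writeInt(nodeBytes);
        raf.writeInt(0); // no Entrys yet
    }

    int entryCount() throws IOException {
        raf.seek(Integer.BYTES);
        return raf.readInt();
    }

    // Appends a node at the end of the file and returns its position
    long appendNode(int[] ids, long[] recordPos, long[] children) throws IOException {
        if (ids.length > maxEntries || children.length > maxEntries + 1) {
            throw new IllegalArgumentException("node too large");
        }
        long pos = raf.length();
        raf.seek(pos);
        raf.writeInt(ids.length);
        for (int i = 0; i < maxEntries; i++) {          // unused slots are padded
            raf.writeInt(i < ids.length ? ids[i] : 0);
            raf.writeLong(i < ids.length ? recordPos[i] : -1L);
        }
        for (int i = 0; i <= maxEntries; i++) {
            raf.writeLong(i < children.length ? children[i] : -1L);
        }
        int count = entryCount();                       // keep the header current
        raf.seek(Integer.BYTES);
        raf.writeInt(count + ids.length);
        return pos;
    }

    int readId(long nodePos, int i) throws IOException {
        raf.seek(nodePos + Integer.BYTES + (long) i * SLOT_BYTES);
        return raf.readInt();
    }

    long readChild(long nodePos, int i) throws IOException {
        raf.seek(nodePos + Integer.BYTES + (long) maxEntries * SLOT_BYTES + (long) i * Long.BYTES);
        return raf.readLong();
    }
}
```

Padding every node to the same size means any node can be reached with a single seek, and appending keeps existing node positions valid as the tree grows.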

For Next Lecture

- Review the end of graphs, (a,b)-trees, & BTrees
- Come with any questions you still have
- Last of these problem days for the year…