Course notes CS2606: Data Structures and Object-Oriented Development Ryan Richardson John Paul Vergara Department of Computer Science Virginia Tech Spring

Course notes

CS2606: Data Structures andObject-Oriented Development

Ryan RichardsonJohn Paul Vergara

Department of Computer ScienceVirginia TechSpring 2008

Goals of this Course

1. Reinforce the concept that costs and benefits exist for every data structure.

2. Learn the commonly used data structures.– These form a programmer's basic data

structure ``toolkit.'‘

3. Understand how to measure the cost of a data structure or program.– These techniques also allow you to judge the

merits of new data structures that you or others might invent.

The Need for Data Structures

Data structures organize data

more efficient programs.

More powerful computers

more complex applications.

More complex applications demand more calculations.

Complex computing tasks are unlike our everyday experience.

Organizing Data

Any organization for a collection of records can be searched, processed in any order, or modified.

The choice of data structure and algorithm can make the difference between a program running in a few seconds or many days.

Efficiency

A solution is said to be efficient if it solves the problem within its resource constraints.– Space– Time

• The cost of a solution is the amount of resources that the solution consumes.

Selecting a Data Structure

Select a data structure as follows:

1. Analyze the problem to determine the basic operations that must be supported.

2. Quantify the resource constraints for each operation.

3. Select the data structure that best meets these requirements.

Some Questions to Ask

• Are all data inserted into the data structure at the beginning, or are insertions interspersed with other operations?

• Can data be deleted?

• Are all data processed in some well-defined order, or is random access allowed?

Costs and Benefits

Each data structure has costs and benefits.

Rarely is one data structure better than another in all situations.

Any data structure requires:– space for each data item it stores,– time to perform each basic operation,– programming effort.

Costs and Benefits (cont)

Each problem has constraints on available space and time.

Only after a careful analysis of problem characteristics can we know the best data structure for the task.

Bank example:– Start account: a few minutes– Transactions: a few seconds– Close account: overnight

Example 1.2

Problem: Create a database containing information about cities and towns.

Tasks: Find by name or attribute or location

• Exact match, range query, spatial queryResource requirements: Times can be

from a few seconds for simple queries to a minute or two for complex queries

Abstract Data Types

Abstract Data Type (ADT): a definition for a data type solely in terms of a set of values and a set of operations on that data type.

Each ADT operation is defined by its inputs and outputs.

Encapsulation: Hide implementation details.

Data Structure

• A data structure is the physical implementation of an ADT.– Each operation associated with the ADT is

implemented by one or more subroutines in the implementation.

• Data structure usually refers to an organization for data in main memory.

• File structure: an organization for data on peripheral storage, such as a disk drive.

Metaphors

An ADT manages complexity through abstraction: metaphor.– Hierarchies of labels

Ex: transistors gates CPU.

In a program, implement an ADT, then think only about the ADT, not its implementation.

Logical vs. Physical Form

Data items have both a logical and a physical form.

Logical form: definition of the data item within an ADT.– Ex: Integers in mathematical sense: +, -

Physical form: implementation of the data item within a data structure.– Ex: 16/32 bit integers, overflow.

Data Type

ADT:TypeOperations

Data Items: Logical Form

Data Items: Physical Form

Data Structure:Storage SpaceSubroutines

Example 1.8

A typical database-style project will have many interacting parts.

Binary Search Trees

What is good and bad about BSTs?• Space requirements?• Time requirements?• Average vs. Worst Case?• What drives worst-case behavior?

What problems can a BST solve?

Example: BST Template Interfacetemplate <typename T> class BST {

private:

BinNodeT<T>* Root;

// additional private members not shown

public:

BST(); // create empty BST

BST(const T& D); // root holds D

BST(const BST<T>& D); // deep copy support

BST<T>& operator=(const BST<T>& D) const;

bool Insert(const T& D); // insert element

bool Delete(const T& D); // delete element

T* const Find(const T& D); // return access to D

const T* const Find(const T& D) const; // return access to D

void Clear(); // restore to empty

// state

~BST();

};

Spatial Data Structures

What if we want to search for points in 2D or 3D space? How do we store the data?

• Could store separate data structures organizing by x- and y-dimensions (list, BST, etc.)

• This is OK for exact-match queries, but doesn’t handle range queries well

• Could combine

We need a spatial data structure to handle spatial queries.


• Spatial data records include a sense of location as an attribute.

• Typically location is represented by coordinate data (in 2D or 3D).

• If we are to search spatial data using the locations as key values, we need data structures that efficiently represent selecting among more than two alternatives during a search.


• One approach for 2D data is to employ quadtrees, in which each internal node can have up to 4 children, each representing a different region obtained by decomposing the coordinate space.

• There are a variety of such quadtrees, many of which are described in:

– The Quadtree and Related Hierarchical Data Structures, Hanan Samet, ACM Computing Surveys, June 1984

Spatial Decomposition

• In binary search trees, the structure of the tree depends not only upon what data values are inserted, but also in what order they are inserted.

• In contrast, the structure of a Point-Region Quadtree is determined entirely by the data values it contains, and is independent of the order of their insertion.

• In effect, each node of a PR Quadtree represents a particular region in a 2D coordinate space.

Spatial Decomposition

• Internal nodes have exactly 4 children (some may be empty), each representing a different, congruent quadrant of the region represented by their parent node.

• Internal nodes do not store data.• Leaf nodes hold a single data value. • Therefore, the coordinate space is partitioned as insertions

are performed so that no region contains more than a single point.

• PR quadtrees represent points in a finitely-bounded coordinate space.

PR Quadtree Insertion

• Insertion proceeds recursively, descending until the appropriate leaf node is found, and then partitioning and descending until there is no more than one point within the region represented by each leaf.

• It is possible for a single insertion to add many levels to the relevant subtree, if points lie close enough together.

• Of course, it is also possible for an insertion to require no splitting whatsoever.

• The shape of the tree is entirely independent of the order in which the data elements are added to it.

PR Quadtree Deletion

• Deletion of elements is more complex (naturally) and may involve collapsing of nodes.

• Since deletion is not required for the first project, a discussion of the details will be deferred until a later time.

PR Quadtree Node Implementation

• Of course, the PR Quadtree will be implemented as a C++ template.

• However, it may be somewhat less generic than the general BST discussed earlier.

• During insertion and search, it is necessary to determine whether one point lies NW, NE, SE or SW of another point. Clearly this cannot be accomplished by using the usual relational operators to compare points.

PR Quadtree Node Implementation

Two possible approaches:1) have the data type provide accessors for the x- and y-

coordinates2) have the type provide a comparator that returns NW, NE,

SE or SW• Either is feasible. It is possible to argue either is better,

depending upon the value placed upon various design goals. It is also possible to deal with the issue in other ways.

• In any case, the PR Quadtree implementation will impose fairly strict requirements on any data type that is to be stored in it.

PR Quadtree Implementation

Here's a possible PR Quadtree interface:template <typename T> class prQuadTree {public:

prQuadTree(int xMinimum, int xMaximum, int yMinimum, int yMaximum);

bool Insert(const T& Elem);bool Delete(const T& Elem);T* const Find(const T& Elem);const T* const Find(const T& Elem) const;void Display(std::ostream& Out) const;

private:prQuadNode<T>* Root;int xMin, xMax, yMin, yMax;

// . . .};

PR Quadtree Implementation

Some comments:• the tree must be created to organize data elements that lie

within a particular, bounded region for the partitioning logic to be correct

• the question of how to manage different types for internal and leaf nodes raises some fascinating design and coding issues…

• there will, of course, be a number of private, recursive helper functions

• how to display the tree also raises some fascinating issues…

Documents

Course notes CS2606: Data Structures and Object-Oriented Development Ryan Richardson John Paul Vergara Department of Computer Science Virginia Tech Spring